net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
Samuel Holland
samuel at sholland.org
Mon Feb 21 18:30:00 PST 2022
On 2/20/22 10:51 AM, Erico Nunes wrote:
> On Mon, Feb 7, 2022 at 11:56 AM Jerome Brunet <jbrunet at baylibre.com> wrote:
>>
>>
>> On Wed 02 Feb 2022 at 21:18, Erico Nunes <nunes.erico at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I've been tracking down an issue with network interfaces from
>>> meson8b-dwmac sometimes not coming up properly at boot.
>>> The target systems are AML-S805X-CC boards (Amlogic S805X SoC), I have
>>> a group of them as part of a CI test farm that uses nfsroot.
>>>
>>> After (hopefully) ruling out potential platform/firmware and network
>>> issues, I bisected the kernel and found that this commit makes a big
>>> difference:
>>>
>>> 46f69ded988d2311e3be2e4c3898fc0edd7e6c5a net: stmmac: Use resolved
>>> link config in mac_link_up()
>>>
>>> With a kernel before that commit, I am able to submit hundreds of test
>>> jobs and the boards always start the network interface properly.
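Because the failure is intermittent, one clean boot does not prove that a bisection step is "good". The sketch below estimates how many consecutive clean boots are needed before a commit can reasonably be marked good. It is a minimal sketch that assumes every boot fails independently with the same probability, which this thread does not establish, and `runs_needed` is a hypothetical helper, not part of any tool.

```python
import math


def runs_needed(p_fail: float, confidence: float) -> int:
    """Return the smallest number of consecutive clean boots n such that a
    kernel which fails each boot with probability p_fail would pass all n
    boots with probability at most 1 - confidence.

    Assumes every boot fails independently with the same probability.
    """
    if not (0.0 < p_fail < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p_fail and confidence must both be in (0, 1)")
    # A bad kernel survives n boots with probability (1 - p_fail) ** n;
    # solve (1 - p_fail) ** n <= 1 - confidence for the smallest integer n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    # Roughly 30% failure rate, as reported for kernels after the suspect commit
    print(runs_needed(0.3, 0.99))
```

Under that assumption, a kernel failing 30% of boots passes 13 boots in a row less than 1% of the time, so hundreds of clean jobs on the older kernel are strong evidence that it is good. A rarer failure (say 10% per boot) would need 44 clean boots for the same bound.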
>>>
>>> After that commit, around 30% of the jobs start hitting this:
>>>
>>> [ 2.178078] meson8b-dwmac c9410000.ethernet eth0: PHY [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
>>> [ 2.183505] meson8b-dwmac c9410000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
>>> [ 2.200784] meson8b-dwmac c9410000.ethernet eth0: No Safety Features support found
>>> [ 2.202713] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
>>> [ 2.209825] meson8b-dwmac c9410000.ethernet eth0: configuring for phy/rmii link mode
>>> [ 3.762108] meson8b-dwmac c9410000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
>>> [ 3.783162] Sending DHCP requests ...... timed out!
>>> [ 93.680402] meson8b-dwmac c9410000.ethernet eth0: Link is Down
>>> [ 93.685712] IP-Config: Retrying forever (NFS root)...
>>> [ 93.756540] meson8b-dwmac c9410000.ethernet eth0: PHY [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
>>> [ 93.763266] meson8b-dwmac c9410000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
>>> [ 93.779340] meson8b-dwmac c9410000.ethernet eth0: No Safety Features support found
>>> [ 93.781336] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
>>> [ 93.788088] meson8b-dwmac c9410000.ethernet eth0: configuring for phy/rmii link mode
>>> [ 93.807459] random: fast init done
>>> [ 95.353076] meson8b-dwmac c9410000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
>>>
>>> This still happens with a kernel from master, currently 5.17-rc2 (less
>>> frequently but still often hit by CI test jobs).
>>> The jobs still usually get to work after restarting the interface a
>>> couple of times, but sometimes it takes 3-4 attempts.
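When going through many CI logs, counting how often the interface was reconfigured and how often DHCP timed out gives a rough per-job attempt count. The sketch below assumes raw, unwrapped dmesg output and matches only the messages quoted in this report. `classify` and `EVENTS` are hypothetical names, and the patterns are not an exhaustive list of driver messages.

```python
import re
from collections import Counter

# Matches dmesg lines of the form "[ 3.762108] <message>"
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Event patterns taken from the log messages quoted in this report
EVENTS = {
    "configure": re.compile(r"configuring for \S+ link mode"),
    "link_up": re.compile(r"Link is Up"),
    "link_down": re.compile(r"Link is Down"),
    "dhcp_timeout": re.compile(r"Sending DHCP requests .*timed out!"),
}


def classify(log: str) -> list[tuple[float, str]]:
    """Return (timestamp, event name) tuples in log order."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a timestamped kernel log line
        ts, msg = float(m.group(1)), m.group(2)
        for name, pattern in EVENTS.items():
            if pattern.search(msg):
                events.append((ts, name))
                break  # at most one event per line
    return events


# Excerpt of the boot log above, with wrapped lines rejoined
sample = """\
[ 2.209825] meson8b-dwmac c9410000.ethernet eth0: configuring for phy/rmii link mode
[ 3.762108] meson8b-dwmac c9410000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
[ 3.783162] Sending DHCP requests ...... timed out!
[ 93.680402] meson8b-dwmac c9410000.ethernet eth0: Link is Down
[ 93.788088] meson8b-dwmac c9410000.ethernet eth0: configuring for phy/rmii link mode
[ 95.353076] meson8b-dwmac c9410000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
"""

counts = Counter(name for _, name in classify(sample))
print(counts["configure"], counts["dhcp_timeout"])
```

For this excerpt, the interface is configured twice and DHCP times out once, which matches the single retry visible in the log.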
>>>
>>> Here is one example and full dmesg:
>>> https://gitlab.freedesktop.org/enunes/mesa/-/jobs/16452399/raw
>>>
>>> Note that DHCP does not seem to be the issue here. Besides the fact
>>> that the problem only appears since the mentioned commit under the
>>> same setup, I also tried setting up the boards with a static IP, but
>>> then the interfaces just don't communicate at all from boot.
>>>
>>> For testing purposes, I attempted to revert
>>> 46f69ded988d2311e3be2e4c3898fc0edd7e6c5a on top of master, but it no
>>> longer applies cleanly, and when reverting it manually I haven't been
>>> able to get a working interface.
>>>
>>> Any advice on how to further debug or fix this?
>>
>> Hi Erico,
>>
>> Thanks a lot for digging into this topic.
>> I'm seeing exactly the same behavior on the g12 based khadas-vim3:
>>
>> * Boot stalled waiting for DHCP - with an NFS-based filesystem
>> * Every minute, the network driver gets reset and tries again
>>
>> Sometimes it works on the first attempt, sometimes it takes up to 5
>> attempts. Eventually, it reaches the prompt which might be why it went
>> unnoticed so far.
>>
>> I think that NFS just makes the problem easier to see.
>> On devices with an eMMC-based filesystem, I noticed that sometimes I
>> had to unplug and replug the Ethernet cable to make it work.
>>
>> So far, the problem is reported on all the Amlogic SoC generations we
>> support. I think a way forward is to ask the other users of
>> stmmac whether they have this problem or not - adding the Allwinner and
>> Rockchip MLs.
>>
>> Since the commit you have identified is in the generic part of the
>> stmmac code, maybe Jose can help us understand what is going on.
>
> Hi all,
>
> thanks for the feedback so far; it's good to know that this is not
> limited to my board farm.
>
> Any more feedback about this from the people in cc?

The commit in question appears to have been merged in v5.7. I have been using
kernels newer than that (including up to v5.17-rc) on various Allwinner
platforms -- A64, H3, H6, D1 -- and I have not seen anything similar. I also
don't remember seeing reports of others having Ethernet issues at boot on
Allwinner boards.

The only issue that's come up recently for us was related to runtime PM, but
that issue was traced to a commit a year later than the one you referenced here
(5ec55823438e).

Regards,
Samuel
More information about the linux-amlogic mailing list