[PATCH net v3] net: stmmac: protect updates of 64-bit statistics counters

Tue Feb 27 22:19:56 PST 2024

Net maintainers, chiming in here, as it seems the handling of this
regression has stalled.

On 13.02.24 16:52, Eric Dumazet wrote:
> On Tue, Feb 13, 2024 at 4:26 PM Guenter Roeck <linux at roeck-us.net> wrote:
>> On Tue, Feb 13, 2024 at 03:51:35PM +0100, Eric Dumazet wrote:
>>> On Tue, Feb 13, 2024 at 3:29 PM Jisheng Zhang <jszhang at kernel.org> wrote:
>>>> On Sun, Feb 11, 2024 at 08:30:21PM -0800, Guenter Roeck wrote:
>>>>> On Sat, Feb 03, 2024 at 08:09:27PM +0100, Petr Tesarik wrote:
>>>>>> As explained by a comment in <linux/u64_stats_sync.h>, write side of struct
>>>>>> u64_stats_sync must ensure mutual exclusion, or one seqcount update could
>>>>>> be lost on 32-bit platforms, thus blocking readers forever. Such lockups
>>>>>> have been observed in real world after stmmac_xmit() on one CPU raced with
>>>>>> stmmac_napi_poll_tx() on another CPU.
>>>>>>
>>>>>> To fix the issue without introducing a new lock, split the statistics into
>>>>>> three parts:
>>>>>>
>>>>>> 1. fields updated only under the tx queue lock,
>>>>>> 2. fields updated only during NAPI poll,
>>>>>> 3. fields updated only from interrupt context,
>>>>>>
>>>>>> Updates to fields in the first two groups are already serialized through
>>>>>> other locks. It is sufficient to split the existing struct u64_stats_sync
>>>>>> so that each group has its own.
>>>>>>
>>>>>> Note that tx_set_ic_bit is updated from both contexts. Split this counter
>>>>>> so that each context gets its own, and calculate their sum to get the total
>>>>>> value in stmmac_get_ethtool_stats().
>>>>>>
>>>>>> For the third group, multiple interrupts may be processed by different CPUs
>>>>>> at the same time, but interrupts on the same CPU will not nest. Move fields
>>>>>> from this group to a newly created per-cpu struct stmmac_pcpu_stats.
>>>>>>
>>>>>> Fixes: 133466c3bbe1 ("net: stmmac: use per-queue 64 bit statistics where necessary")
>>>>>> Link: https://lore.kernel.org/netdev/Za173PhviYg-1qIn@torres.zugschlus.de/t/
>>>>>> Cc: stable at vger.kernel.org
>>>>>> Signed-off-by: Petr Tesarik <petr at tesarici.cz>
>>>>>
>>>>> This patch results in a lockdep splat. Backtrace and bisect results attached.
>>>>>
>>>>> ---
>>>>> [   33.736728] ================================
>>>>> [   33.736805] WARNING: inconsistent lock state
>>>>> [   33.736953] 6.8.0-rc4 #1 Tainted: G                 N
>>>>> [   33.737080] --------------------------------
>>>>> [   33.737155] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
>>>>> [   33.737309] kworker/0:2/39 [HC1[1]:SC0[2]:HE0:SE0] takes:
>>>>> [   33.737459] ef792074 (&syncp->seq#2){?...}-{0:0}, at: sun8i_dwmac_dma_interrupt+0x9c/0x28c
>>>>> [   33.738206] {HARDIRQ-ON-W} state was registered at:
>>>>> [   33.738318]   lock_acquire+0x11c/0x368
>>>>> [   33.738431]   __u64_stats_update_begin+0x104/0x1ac
>>>>> [   33.738525]   stmmac_xmit+0x4d0/0xc58
>>>>
>>>> interesting lockdep splat...
>>>> stmmac_xmit() operates on txq_stats->q_syncp, while the
>>>> sun8i_dwmac_dma_interrupt() operates on pcpu's priv->xstats.pcpu_stats
>>>> they are different syncp, so how does the lockdep splat happen?
>>>
>>> Right, I do not see anything obvious yet.
>>
>> Wild guess: I think it may be saying that due to
>>
>>         inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
>>
>> the critical code may somehow be interrupted and, while handling the
>> interrupt, try to acquire the same lock again.
> 
> This should not happen, the 'syncp' are different. They have different
> lockdep classes.
> 
> One is exclusively used from hard irq context.
> 
> The second one only used from BH context.

Alexis Lothoré has hit this now as well, see yesterday's report in this
thread; apart from that, nothing seems to have happened for two weeks now.
The change recently made it to some stable/longterm kernels, too. Makes
me wonder:

What's the plan forward here? Is this considered to be a false positive?
Or a real problem? Or a kind of situation along the lines of "that
commit should not cause the problem we are seeing, so it might have
exposed an older bug in the code, but nobody looked closer yet to check"?
Or something else?
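
For anyone joining the thread late: the lost seqcount update that the
original patch description refers to can be sketched in plain userspace C.
This is only a toy model, not the stmmac or <linux/u64_stats_sync.h> code;
struct toy_syncp and every function name below are made up for
illustration, and the racy interleaving is replayed by hand instead of
being produced by real threads. It says nothing about whether the lockdep
splat quoted above is a false positive.

```c
#include <assert.h>

/* Hypothetical stand-in for the sequence counter in struct u64_stats_sync. */
struct toy_syncp {
	unsigned int seq;
};

/* A non-atomic increment, split into load and store so that two
 * writers can be interleaved deterministically. */
static unsigned int toy_load(const struct toy_syncp *s)
{
	return s->seq;
}

static void toy_store(struct toy_syncp *s, unsigned int v)
{
	s->seq = v;
}

/* Readers may only finish while the sequence is even. */
int reader_can_proceed(unsigned int seq)
{
	return (seq & 1) == 0;
}

/* Two writers without mutual exclusion: both "update begin" steps
 * load the same value, so one increment is lost. */
unsigned int racing_writers(void)
{
	struct toy_syncp s = { 0 };
	unsigned int a, b;

	a = toy_load(&s);		/* writer A begins, sees 0 */
	b = toy_load(&s);		/* writer B begins, sees 0 */
	toy_store(&s, a + 1);		/* seq = 1 */
	toy_store(&s, b + 1);		/* seq = 1 again, update lost */

	a = toy_load(&s);		/* writer A ends */
	toy_store(&s, a + 1);		/* seq = 2 */
	b = toy_load(&s);		/* writer B ends */
	toy_store(&s, b + 1);		/* seq = 3, stays odd */

	return s.seq;
}

/* The same four increments with the writers serialized. */
unsigned int serialized_writers(void)
{
	struct toy_syncp s = { 0 };
	int i;

	for (i = 0; i < 4; i++)
		toy_store(&s, toy_load(&s) + 1);

	return s.seq;
}

/* Sanity check of the two scenarios. */
void self_check(void)
{
	assert(!reader_can_proceed(racing_writers()));
	assert(reader_can_proceed(serialized_writers()));
}
```

In the racy replay the counter ends up odd with no writer left to make it
even again, which is the "blocking readers forever" case from the commit
message; with serialized writers it ends up even and readers can complete.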

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.