[RFC PATCH] net: stmmac: Fix the problem about interrupt storm

Avi Fishman avifishman70 at gmail.com
Sun Nov 3 11:00:54 PST 2024


Hi all,

We recently encountered the same interrupt storm and the root cause
was the same as here.
The suggested patch solved 99% of the issues, but indeed as written
below on rare cases the issue happens between the dev_open() and
clear_bit(STMMAC_DOWN) calls.
I also agree that stmmac_interrupt() unconditionally ignores
interrupts when the driver is in STMMAC_DOWN state is dangerous.

The issue happened for us in linux 5.10 but I see that this behaviour
wasn't changed also in newer versions.
maybe we should disable the device interrupts before dev_close(), and
enable it after dev_open().

>
> Hi Romain,
>
> On Sun, Mar 31, 2024 at 4:35 PM Romain Gantois
> <romain.gantois at bootlin.com> wrote:
> >
> > Hello Cathy,
> >
> > On Wed, 27 Mar 2024, Cathy Cai wrote:
> >
> > > Tx queue time out then reset adapter. When reset the adapter, stmmac driver
> > > sets the state to STMMAC_DOWN and calls dev_close() function. If an interrupt
> > > is triggered at this instant after setting state to STMMAC_DOWN, before the
> > > dev_close() call.
> > >
> > ...
> > > -     set_bit(STMMAC_DOWN, &priv->state);
> > >       dev_close(priv->dev);
> > > +     set_bit(STMMAC_DOWN, &priv->state);
> > >       dev_open(priv->dev, NULL);
> > >       clear_bit(STMMAC_DOWN, &priv->state);
> > >       clear_bit(STMMAC_RESETING, &priv->state);
> >
> > If this IRQ issue can happen whenever STMMAC_DOWN is set while the net device is
> > open, then it could also happen between the dev_open() and
> > clear_bit(STMMAC_DOWN) calls right? So you'd have to clear STMMAC_DOWN before
> > calling dev_open() but then I don't see the usefulness of setting STMMAC_DOWN
> > and clearing it immediately. Maybe closing and opening the net device should be
> > enough?
Indeed we encounter an issue between the dev_open() and clear_bit(STMMAC_DOWN)..
> >
>  Yes. It could also happen between the dev_open() and
> clear_bit(STMMAC_DOWN) calls.
> Although we did not reproduce this scenario, it should have happened
> if we had increased
> the number of test samples. In addition, I found that other people had
> similar problems before.
> The link is:
> https://lore.kernel.org/all/20210208140820.10410-11-Sergey.Semin@baikalelectronics.ru/
>
> >
> > Moreover, it seems strange to me that stmmac_interrupt() unconditionnally
> > ignores interrupts when the driver is in STMMAC_DOWN state. This seems like
> > dangerous behaviour, since it could cause IRQ storm issues whenever something
> > in the driver sets this state. I'm not too familiar with the interrupt handling
> > in this driver, but maybe stmmac_interrupt() could clear interrupts
> > unconditionnally in the STMMAC_DOWN state?
> >
> Clear interrupts unconditionally in the STMMAC_DOWN state directly
> certainly won't cause this problem.
> This may be too rough, maybe this design has other considerations.
>
But then after the dev_open() you might miss interrupt, no?
> >
> > Best Regards,
> >
> > --
> > Romain Gantois, Bootlin
> > Embedded Linux and Kernel engineering
> > https://bootlin.com
>
>  Best Regards,
> Cathy



-- 
Regards,
Avi



More information about the linux-arm-kernel mailing list