[LEDE-DEV] Bug - spinlock loop in cns3xxx_eth.c

Mon Jun 13 11:17:48 PDT 2016

On Mon, Jun 13, 2016 at 9:12 AM, Koen Vandeputte
<koen.vandeputte at ncentric.com> wrote:
> Hi All,
>
> There seems to be a bug in the function eth_poll() in this driver.
>
> When the RX ring gets full once, the re-schedule is called forever, even
> when the ring is empty afterwards.
>
>
>     if (!received) {
>         napi_complete(napi);
>         enable_irq(sw->rx_irq);
>         budget = 0;
>
>         /* if rx descriptors are full schedule another poll */
>         if (rx_ring->desc[(i-1) & (RX_DESCS-1)].cown)
>         {
>             eth_schedule_poll(sw);    <----  Gets called on each function entry
>         }
>     }
>
>
> This causes SoftIRQ to fully load a core forever.
>
>
> I haven't fixed it yet, but if I get there first, I'll supply a patch.
>

Koen,

We have seen this before, but admittedly don't understand why we hit
the same condition on every subsequent call to eth_poll(). The check
is there to catch the condition described as IRQ rot [1]: the case
where, after processing the full budget, the ring is immediately full
again (easily reproducible with a flood ping). If this occurs we will
no longer get an rx interrupt (because the descriptors are full) and
our NAPI function will never get called again (unless we are
transmitting packets).
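
As a rough illustration of the rotting-packet race, here is a user-space
toy model. It is not the cns3xxx driver code: struct toy_ring,
toy_hw_receive(), toy_poll() and toy_run() are hypothetical names, and
the interrupt is assumed to be edge-like (it fires only when a packet
arrives while the irq is unmasked). It only shows why a recheck after
re-enabling the irq is needed, not why the driver keeps rescheduling.

```c
#include <assert.h>
#include <stdbool.h>

#define RX_DESCS 8                 /* toy ring size, power of two */
#define RX_MASK  (RX_DESCS - 1)

/* Toy model only; cown == true means the CPU owns the descriptor,
 * i.e. a received packet is waiting to be processed. */
struct toy_ring {
    bool cown[RX_DESCS];
    unsigned int hw;       /* next descriptor the "hardware" fills */
    unsigned int cur;      /* next descriptor the poll loop inspects */
    bool irq_enabled;
    bool poll_scheduled;
};

/* Hardware side: deliver one packet, raise the irq only if unmasked. */
static bool toy_hw_receive(struct toy_ring *r)
{
    if (r->cown[r->hw])
        return false;              /* ring full: packet dropped, no irq */
    r->cown[r->hw] = true;
    r->hw = (r->hw + 1) & RX_MASK;
    if (r->irq_enabled) {          /* irq handler: mask irq, schedule poll */
        r->irq_enabled = false;
        r->poll_scheduled = true;
    }
    return true;
}

/* One poll pass. 'late' packets arrive after the ring was last checked
 * but before the irq is unmasked (the race window). */
static int toy_poll(struct toy_ring *r, int budget, bool recheck, int late)
{
    int received = 0;

    r->poll_scheduled = false;
    while (received < budget && r->cown[r->cur]) {
        r->cown[r->cur] = false;   /* "process" and hand back to hardware */
        r->cur = (r->cur + 1) & RX_MASK;
        received++;
    }
    if (received == budget) {
        r->poll_scheduled = true;  /* budget exhausted: stay scheduled */
        return received;
    }
    while (late-- > 0)
        toy_hw_receive(r);         /* irq still masked, so nothing fires */
    r->irq_enabled = true;
    if (recheck && r->cown[r->cur]) {
        r->irq_enabled = false;    /* work is pending: poll again */
        r->poll_scheduled = true;
    }
    return received;
}

/* Run one scenario; returns the number of packets left stuck in the ring. */
static int toy_run(bool recheck, int *polls)
{
    struct toy_ring r = {0};
    int late = 1, pending = 0, i;

    assert(polls);
    r.irq_enabled = true;
    toy_hw_receive(&r);            /* first packet raises the irq */
    *polls = 0;
    while (r.poll_scheduled && *polls < 100) {
        toy_poll(&r, 4, recheck, late);
        late = 0;                  /* only the first poll sees a late packet */
        (*polls)++;
    }
    for (i = 0; i < RX_DESCS; i++)
        pending += r.cown[i];
    return pending;
}
```

Under these assumptions, toy_run(false, &polls) leaves one packet
stranded in the ring after a single poll, while toy_run(true, &polls)
drains it with one extra poll and then stops rescheduling, because the
recheck looks at the descriptor at the current index. Whether the
driver's check of index (i-1) clears in the same way is not something
this sketch establishes.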

What is your proposed patch?

Tim

[1] http://www.linuxfoundation.org/collaborate/workgroups/networking/napi#IRQ_race_a.k.a_rotting_packet