Issue found in Armada 370: "No buffer space available" error during continuous ping

Willy Tarreau w at 1wt.eu
Mon Jul 21 00:03:03 PDT 2014


Hi Maggie,

On Sun, Jul 20, 2014 at 11:33:22PM -0700, Maggie Mae Roxas wrote:
> > As you said that you both applied cd71e2 and reverted 4f3a4f, could you
> > please confirm that with cd71 applied only it was not enough?
> Yes.
> If mvneta.c used is the v3.13.9 + cd71e2, issue still occurs.
> If mvneta.c used is the v3.13.9 + cd71e2 - 4f3a4f, issue does not occur anymore.

Rather strange then.

> Okay. First, I'll check if the interrupts are working by checking
> this, as you suggested:
> <snip>
> Checking /proc/interrupts when you're sending some traffic should show
> that the IRQ is increasing from time to time.
> <snip>
> I'll inform you of the results within the next 2-3 days.

OK.
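The /proc/interrupts check described above can be scripted so that the
counter is sampled before and after a burst of traffic (e.g. while the
continuous ping is running). This is a minimal sketch, not part of the
original thread: it assumes the NIC's IRQ line is labelled "mvneta" in
/proc/interrupts (depending on the kernel it may carry the interface
name instead, so check the file by hand first), and the helper names
are hypothetical.

```python
import time
from typing import Dict, Tuple


def parse_interrupts(text: str) -> Dict[str, Tuple[int, str]]:
    """Map each IRQ label (e.g. '25') to (count summed over CPUs, description)."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header line lists CPU0, CPU1, ...
    result = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        label = fields[0][:-1]
        counts = []
        # Per-CPU counters follow the label; some rows (e.g. "Err:") have fewer.
        for tok in fields[1:1 + ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        desc = " ".join(fields[1 + len(counts):])
        result[label] = (sum(counts), desc)
    return result


def irq_deltas(before, after, match: str) -> Dict[str, int]:
    """Return {label: increase} for IRQs whose description contains `match`."""
    return {
        label: after[label][0] - before.get(label, (0, ""))[0]
        for label in after
        if match in after[label][1]
    }


def watch(match: str = "mvneta", interval: float = 5.0,
          path: str = "/proc/interrupts") -> Dict[str, int]:
    """Sample the file twice, `interval` seconds apart (assumed IRQ label: 'mvneta')."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return irq_deltas(before, after, match)
```

Run `print(watch())` on the board while pinging through the port: if the
NIC's Rx/Tx interrupts are being delivered, the reported increase should
be non-zero; a count stuck at zero under traffic would point at the IRQ
problem discussed in this thread.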

> We'll be using it as a router, so it would really be a problem for us.

OK, so clearly the issue must be found.
Just thinking about something: do you have a custom boot loader? It
is possible that in our case, the Tx IRQ works only because some
obscure or undocumented bits are set by the boot loader, and that in
your case they're not pre-initialized.

> Will check possibilities of shifting to v3.14+ with our customer -
> especially if we found problems in ethernet performance as you
> mentioned.
> Any recommendations on which version to use, specifically?

3.14 would probably interest your customer even more, as it's an LTS
version. In that case, always pick the most recent release (3.14.12
today). You may also be interested in 3.15.6, which contains another
PHY fix that is supposed to fix cd71e2, but if you're saying that
cd71e2 doesn't change anything for you, I guess it will have no effect
(it might still be worth testing to help with troubleshooting).

> > Third, considering that other boards work without applying these
> > changes, it might be possible that there's an issue on your board, and
> > maybe detecting it early would allow you to fix it for all future
> > batches, and maybe only apply these patches for the few very first
> > ones.
> Acknowledged.
> Once we verified that indeed, the performance was slower (or
> interrupts were not increasing) - we will inform our hardware team and
> have them investigate this issue further for possible hardware bugs.

OK. I still have a hard time imagining how hardware itself could prevent
an IRQ from being delivered from a NIC which is located inside the SoC,
but there must be an explanation somewhere :-/

> Thanks a lot for the help again, I'll let you know as soon as I have more info.

Thanks,
Willy




More information about the linux-arm-kernel mailing list