[PATCH net-next] net: stmmac: enable RPS and RBU interrupts
Sam Edwards
cfsworks at gmail.com
Tue Apr 14 19:12:34 PDT 2026
On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle)
<linux at armlinux.org.uk> wrote:
> Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
> survives iperf3 -c -R to the imx6.
Hi Russell,
Aw, you beat me to it! I was about to report that 5.10.104-tegra is
unaffected. And my iperf3 server is a multi-GbE amd64 machine.
> Dumping the registers and comparing, and then forcing the RQS and TQS
> values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
> *256 = 36864 bytes) respectively seems to solve the problem. Under
> net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
> Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
> all four of the MTL receive operation mode registers, but only the
> first MTL transmit operation mode register. However, DMA channels 1-3
> aren't initialised.
Wow, great! I wonder if the problem is that the MTL FIFOs are physically
smaller than the 65536 bytes net-next programs, so when the DMA suffers
a momentary hiccup, the FIFOs are allowed to overflow, putting the
hardware in a bad state.
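For anyone following along, here is that arithmetic spelled out as a
couple of trivial helpers (a sketch in my own words, not the driver's
code; the helper names are made up):

/* RQS/TQS encode a queue size in 256-byte units, minus one. */
static unsigned int mtl_qs_field_to_bytes(unsigned int field)
{
	return (field + 1) * 256;
}

static unsigned int mtl_bytes_to_qs_field(unsigned int bytes)
{
	return bytes / 256 - 1;
}

/*
 * Values from the dump above:
 *   RQS 0x23 -> (35 + 1) * 256  =  9216 bytes per RX queue (x4 = 36864)
 *   TQS 0x8f -> (143 + 1) * 256 = 36864 bytes for TX queue 0
 *   net-next 0xff -> (255 + 1) * 256 = 65536 bytes for both
 */

So the working configuration looks like the 36 KiB RX FIFO split evenly
across the four queues, with TX queue 0 keeping its whole 36 KiB.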
Though I suspect this is only half of the problem: do you still see
RBUs? Everything you've shared so far suggests the DMA failures are
_not_ because the rx ring is drying up. My gut's telling me the DMA
unit is encountering an AXI error, triggering RBU plus some kind of
recovery behavior, and the recovery takes the DMA offline long enough
for the FIFO to overflow (without triggering RPS because the RQS
threshold is unreachable).
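If it would help narrow that down, a tiny bit of instrumentation in the
channel interrupt path could show whether RBU arrives together with a
fatal bus error. Something like the sketch below; the register offset,
bit positions, and function name are my placeholders rather than the
real dwmac4 definitions, so check the databook/headers before using it:

#include <linux/bits.h>
#include <linux/io.h>
#include <linux/types.h>

/* Placeholder layout - verify against the dwmac4 DMA channel registers. */
#define DMA_CH_STATUS(ch)	(0x1160 + 0x80 * (ch))	/* assumed offset */
#define DMA_CH_STATUS_RBU	BIT(7)			/* assumed: RX buffer unavailable */
#define DMA_CH_STATUS_FBE	BIT(12)			/* assumed: fatal (AXI) bus error */

static void count_rx_stall_causes(void __iomem *ioaddr, u32 ch,
				  unsigned long *rbu, unsigned long *fbe)
{
	u32 status = readl(ioaddr + DMA_CH_STATUS(ch));

	if (status & DMA_CH_STATUS_RBU)
		(*rbu)++;
	if (status & DMA_CH_STATUS_FBE)
		(*fbe)++;
}

If the FBE count moves in step with RBU, that would support the AXI-error
theory; if only RBU moves, the ring/FIFO side looks more likely.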
It seems that the problem happens less frequently on my test setup when
I boot with iommu.passthrough=1, but that could be my imagination.
But if the hardware remains stable with RQS and TQS set correctly, I
don't feel an urgent need to dig deeper. :)
> Looking back at 5.10, I don't see any code that would account for these
> values being programmed for TQS and RQS, it looks like the calculations
> are basically the same as we have today.
Note that Nvidia have their own "nvethernet" driver for their vendor
kernel, which appears to pick the FIFO sizes from hardcoded tables in
its eqos_configure_mtl_queue() [1] function.
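Purely to illustrate the shape of that approach (this is not Nvidia's
code; the table entry just reuses the sizes from your dump), the idea is
roughly:

/* Hypothetical per-queue FIFO size table, indexed by (queue count - 1). */
static const struct {
	unsigned int rxq_bytes;		/* per RX queue */
	unsigned int txq0_bytes;	/* TX queue 0 */
} eqos_fifo_tbl[] = {
	[3] = { .rxq_bytes = 9216, .txq0_bytes = 36864 },	/* 4 queues in use */
};

i.e. the per-queue sizes come straight from a lookup rather than being
derived from the total FIFO size the way the stmmac calculation does.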
Cheers,
Sam
[1] https://github.com/proski/nvethernet/blob/main/nvethernetrm/osi/core/eqos_core.c#L263