[PATCH net-next] net: stmmac: enable RPS and RBU interrupts
Sam Edwards
cfsworks at gmail.com
Mon Apr 13 14:54:30 PDT 2026
On Mon, Apr 13, 2026, 11:49 Russell King (Oracle) <linux at armlinux.org.uk> wrote:
>
> On Mon, Apr 13, 2026 at 11:02:22AM -0700, Jakub Kicinski wrote:
> > On Fri, 10 Apr 2026 14:07:51 +0100 Russell King (Oracle) wrote:
> > > Since we are seeing receive buffer exhaustion on several platforms,
> > > let's enable the interrupts so the statistics we publish via ethtool -S
> > > actually work to aid diagnosis. I've been in two minds about whether
> > > to send this patch, but given the problems with stmmac at the moment,
> > > I think it should be merged.
> >
> > Sorry for an under-researched response, but wasn't there a person trying
> > to fix the OOM starvation issue? Who was supposed to add a timer?
> > Is your problem also OOM related or do you suspect something else?
>
> It is not OOM related. I have this patch applied:
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 131ea887bedc..614d0e10e3e6 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -5095,14 +5095,18 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)
>
>  		if (!buf->page) {
>  			buf->page = page_pool_alloc_pages(rx_q->page_pool, gfp);
> -			if (!buf->page)
> +			if (!buf->page) {
> +				netdev_err(priv->dev, "q%u: no buffer 1\n", queue);
>  				break;
> +			}
>  		}
>
>  		if (priv->sph_active && !buf->sec_page) {
>  			buf->sec_page = page_pool_alloc_pages(rx_q->page_pool, gfp);
> -			if (!buf->sec_page)
> +			if (!buf->sec_page) {
> +				netdev_err(priv->dev, "q%u: no buffer 2\n", queue);
>  				break;
> +			}
>
>  			buf->sec_addr = page_pool_get_dma_addr(buf->sec_page);
>  		}
>
> and it is silent, so we are not suffering starvation of buffers.
>
> However, the hardware hangs during iperf3, and because it triggers the
> MAC to stream PAUSE frames, and my network uses Netgear GS108 and GS116
> unmanaged switches that always use flow-control between them (there's no
> way not to) it takes down the entire network - as we've discussed
> before. So, this problem is pretty fatal to the *entire* network.
>
> With this patch, the existing statistical counters for this condition
> are incremented, and thus users can use ethtool -S to see what happened
> and report whether they are seeing the same issue.
>
> Without this patch applied, there are no diagnostics from stmmac that
> report what the state is. ethtool -d doesn't list the appropriate
> registers (as I suspect part of the problem is the number of queues
> is somewhat dynamic - userspace can change that configuration through
> ethtool).
>
> Thus, one has to resort to using devmem2 to find out what's happened.
> That's not user friendly.
>
> For me, devmem2 shows:
>
> Channel 0 status register:
> Value at address 0x02491160: 0x00000484
> bit 10: ETI early transmit interrupt - set
> bit 9 : RWT receive watchdog - clear
> bit 8 : RPS receive process stopped - clear
> bit 7 : RBU receive buffer unavailable - set
> bit 6 : RI receive interrupt - clear
> bit 2 : TBU transmit buffer unavailable - set
> bit 1 : TPS transmit process stopped - clear
> bit 0 : TI transmit interrupt - clear
>
> Debug status register:
> Value at address 0x0249100c: 0x00006300
> TPS[3:0] = 6 = Suspended, Tx descriptor unavailable or Tx buffer
> underflow
> RPS[3:0] = 3 = Running, waiting for Rx packet
>
> MTL Queue 0 debug register:
> Value at address 0x02490d38: 0x002e0020
> PRXQ[13:0] = 0x2e = 46 packets in receive queue
> RXQSTS[1:0] = 2 = Rx queue fill-level above flow-control activate
> threshold
> RRCSTS[1:0] = 0 = Rx Queue Read Controller State = Idle
>
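
For anyone who wants to repeat that decode without doing it by hand, here
is a minimal userspace sketch. The helper names (field(), chan_bit_set(),
print_chan_status()) are made up for illustration and are not stmmac
symbols, and the field positions are only inferred from the values quoted
above, not copied from the driver headers or the databook.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Extract bits hi..lo (inclusive) of v. Only valid for fields narrower
 * than 32 bits, which covers every field used here. */
static unsigned int field(uint32_t v, unsigned int hi, unsigned int lo)
{
	return (v >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

/* Return 1 if the given bit is set in a channel status value. */
static int chan_bit_set(uint32_t status, unsigned int bit)
{
	return (status >> bit) & 1u;
}

/* Bit numbers as listed in the dump above (assumption: not checked
 * against the databook). */
static const struct {
	unsigned int bit;
	const char *name;
} chan_status_bits[] = {
	{ 10, "ETI" }, { 9, "RWT" }, { 8, "RPS" }, { 7, "RBU" },
	{ 6, "RI" }, { 2, "TBU" }, { 1, "TPS" }, { 0, "TI" },
};

/* Print one line per known channel status bit. */
static void print_chan_status(uint32_t status)
{
	size_t i;

	for (i = 0; i < sizeof(chan_status_bits) / sizeof(chan_status_bits[0]); i++)
		printf("bit %2u %-3s: %s\n", chan_status_bits[i].bit,
		       chan_status_bits[i].name,
		       chan_bit_set(status, chan_status_bits[i].bit) ? "set" : "clear");
}

/* Debug status: TPS assumed in bits 15:12, RPS in bits 11:8. */
static void print_debug_status(uint32_t dbg)
{
	printf("TPS=%u RPS=%u\n", field(dbg, 15, 12), field(dbg, 11, 8));
}

/* Queue debug: PRXQ assumed in bits 29:16, RXQSTS in 5:4, RRCSTS in 2:1. */
static void print_rxq_debug(uint32_t mtl)
{
	printf("PRXQ=%u RXQSTS=%u RRCSTS=%u\n", field(mtl, 29, 16),
	       field(mtl, 5, 4), field(mtl, 2, 1));
}
```

Feeding it the three values from the dump (0x00000484, 0x00006300 and
0x002e0020) should give the same picture as the hand decode: RBU set, RPS
clear, and 46 packets sitting in the Rx queue.
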
> > Firing interrupts when Rx fill ring runs dry (which IIUC this patch
> > does?) is not a good idea.
>
> Well, I'm thinking that at least on some platforms, such as the Jetson
> Xavier NX, unless a different solution can be found, we need the RBU
> interrupt to fire off a reset of the stmmac IP when this happens to
> reduce the PAUSE frame flood on the network.

Hi Russell,

Should that reset trigger be RPS, not RBU? My understanding of these
status bits is RBU is just "RxDMA has failed to take a frame from the
RxFIFO" while RPS is "the RxFIFO is full." That would make RBU our
critical threshold to start proactively refilling, and RPS the "too
late, we lose" threshold.

Thinking aloud: Do you suppose the RxDMA waits for a wakeup signal
sent whenever a frame is added to RxFIFO? That might explain why the
former never recovers once the latter is full: a manual wakeup needs
to be sent whenever we resolve RBU. Does the .enable_dma_reception()
op need to be implemented for dwmac5, or have you tried that already?
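
To make the split concrete, a rough sketch of the policy I am describing.
Everything here is hypothetical: the enum, the macro names and
rx_action_for_status() do not exist in stmmac, the bit numbers are taken
from the channel status dump quoted above, and whether the hardware
really behaves this way is exactly the open question.

```c
#include <stdint.h>

/* Bit positions as listed in the quoted channel status dump. */
#define CH_STATUS_RBU	(1u << 7)	/* receive buffer unavailable */
#define CH_STATUS_RPS	(1u << 8)	/* receive process stopped */

/* Hypothetical recovery actions, most severe last. */
enum rx_action {
	RX_OK,			/* nothing to do */
	RX_REFILL_AND_KICK,	/* refill the ring, then wake the RxDMA */
	RX_RESET,		/* too late, reset the IP */
};

/* Map a channel status value to the action under the assumption that
 * RBU is the early warning and RPS is the point of no return. */
static enum rx_action rx_action_for_status(uint32_t status)
{
	if (status & CH_STATUS_RPS)
		return RX_RESET;
	if (status & CH_STATUS_RBU)
		return RX_REFILL_AND_KICK;
	return RX_OK;
}
```

With the 0x00000484 value from your dump this picks the refill-and-kick
path rather than a reset, since RBU is set and RPS is clear.
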
>
> If we can't do that, then I think stmmac on these platforms needs to be
> marked with CONFIG_BROKEN because right now there doesn't seem to be any
> other viable solution.
>
> My intention with this patch is merely to start collecting the already
> existing statistics so other users can start seeing whether they are
> hitting the same or similar problem. If we're not prepared to do that,
> then we should delete the useless statistics from ethtool -S, but I
> suspect they're now part of the UAPI, even though without this patch
> they will remain steadfastly stuck at zero.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!