[PATCH net-next] net: stmmac: enable RPS and RBU interrupts

Sam Edwards cfsworks at gmail.com
Wed Apr 15 13:50:53 PDT 2026


On Wed, Apr 15, 2026 at 12:37 PM Russell King (Oracle)
<linux at armlinux.org.uk> wrote:
>
> It's not a question about how I define RBU - this is defined by Synopsys
> and I'm using it *exactly* that way as stated in the documentation.
>
> "This bit indicates that the host owns the Next Descriptor in the
> Receive List and the DMA cannot acquire it. The Receive Process is
> suspended. ... This bit is set only when the previous Receive
> Descriptor is owned by the DMA."
>
> In other words, DMA has processed the previous receive descriptor which
> _was_ owned by the hardware, written back to clear the OWN bit, and
> then fetches the next descriptor and finds that the OWN bit is also
> clear.

I'm only trying to leave open the possibility that the Synopsys
technical writer and the hardware implementation team weren't
communicating clearly. We already have a situation where RPS isn't
behaving as documented (even if that's likely just hardware
misconfiguration), so while I'm currently pretty sure RBU carries no
other (actual) meaning than "DMA caught up to OWN=0," I'm only about
75% confident.

> > It would seem* that the kernel isn't really failing to keep up with
> > the packet rate. If RBU is firing with a ring that's not even close to
> > empty, that tells me there's another way for it to fire. So I suspect
> > the hardware designers implemented it to mean:
> > "We couldn't read the next descriptor, _or_ it wasn't owned by hardware."
> >
> > (* However, if bit 31 is clear everywhere, wouldn't that mean the ring
> > is actually completely depleted, not full? If count==budget, wouldn't
> > that mean the whole ring hasn't been visited, so we only refilled 64
> > entries and not necessarily the entire ring? Maybe the kernel isn't
> > keeping up after all.)
>
> Ah, I think that's where our terminology differs.
>
> You seem to define full as "populated with empty buffers". I define
> full to mean "the hardware has filled every buffer with a packet that
> it has received and handed it over to software to process." Note even
> the terminology there - filling buffers with data. That ultimately
> ends up filling the ring, and when completely filled, it is full.
>
> I think of buffers like buckets. If a buffer contains no data, it
> is empty. If a buffer contains data, it has been filled or is full.
> Apply that to a list of buffers and you get the same thing. Many
> ethernet driver documentation uses this same terminology, so I
> thought it would be widely understood.

Ah okay, I was beginning to suspect the same. In my defense: though I
also think of buffers in the same way, this driver calls the process
of supplying empty buffers "refilling," which is also the terminology
we've both been using throughout this exchange, and when something is
"completely refilled" I generally call it "full." But I'm realizing
now that the bidirectional (submissions+completions) nature of this
ring means that "full" and "empty" aren't really well-defined
concepts. I'll try to read more carefully (and switch to saying
"completely dirty" and "completely clean") going forward.

So the kernel is able to supply clean buffers without issue, but it
somehow falls behind the incoming packet rate and the DMA is left with
a completely dirty ring. I agree that stmmac_rx() is therefore just
not running fast enough: either it's got really bad scheduler jitter
for the ~6.3ms minimum it takes for 512x full-sized Ethernet frames to
arrive from the PHY (your scenario 1), or -- more likely -- the NAPI
budgets gradually fall behind the hardware (your scenario 2).

> Right, 40KiB. Sorry, I'm getting interrupted almost constantly while
> trying to do anything.
>
> However, I've tested with 0x7f in both fields, and it still falls flat
> on its face. I've also tried other values, but because I had to unplug
> the laptop from the nvidia board to use the laptop portably due to the
> medical emergency situation, that caused screen to quit, so I've lost
> all that. Chaos reigns supreme here :/

I'm sorry to hear about that, please prioritize you/yours and don't
feel like you owe me speedy replies.

> So, I'm not sure we understand what's going on - I don't think it's that
> the FIFOs are smaller than specified. I suspect that the 9KiB vs 36KiB
> results in some kind of throttling that prevents the condition which
> hangs the hardware.

I'll try playing with the FIFO configuration on my end to learn:
a) If a suitably-configured FIFO size makes the RPS status arrive as documented
b) If I can safely fill the FIFO slowly (by manually stalling the
driver and adding frames one at a time) and have it drain on resume
c) Whether the TQS value can be adjusted independently of this
problem's prevalence
d) The maximum RQS value that allows the problem to happen

> I'm not getting as much time as I'd like to really test out scenarios
> due to everything that is going on, and honestly I feel like just
> writing this week off now and giving up.

I have the same hardware, observe the same issue, and find this
interesting enough to keep plugging away at it. I would have no hard
feelings if you left me alone with this problem for a bit. :)

Be well,
Sam



More information about the linux-arm-kernel mailing list