[PATCH net v4 0/2] stmmac crash/stall fixes when under memory pressure
Sam Edwards
cfsworks at gmail.com
Sat Apr 11 13:40:51 PDT 2026
On Fri, Apr 10, 2026 at 5:23 AM Russell King (Oracle)
<linux at armlinux.org.uk> wrote:
>
> On Thu, Apr 02, 2026 at 10:39:32AM -0700, Sam Edwards wrote:
> > On Thu, Apr 2, 2026 at 10:16 AM Russell King (Oracle)
> > <linux at armlinux.org.uk> wrote:
> > > I've tested this on my Jetson Xavier platform. One of the issues I've
> > > had is that running iperf3 results in the receive side stalling because
> > > it runs out of descriptors. However, despite the receive ring
> > > eventually being re-filled and the hardware appropriately prodded, it
> > > steadfastly refuses to restart, despite the descriptors having been
> > > updated.
> >
> > Hi Russell,
> >
> > Just to make sure I understand correctly: before my patches, you've
> > been observing this problem on Xavier for a while (no interrupts, ring
> > goes dry); with my patches, the ring is refilled, but the dwmac5
> > doesn't resume DMA. (Ah, just saw your follow-up email.)
> >
> > > Any ideas?
> >
> > Off the top of my head, my hypothesis is that dwmac5 has an additional
> > tripwire when the receive DMA is exhausted, and the
> > stmmac_set_rx_tail_ptr()/stmmac_enable_dma_reception() at the end of
> > stmmac_rx_refill() aren't sufficient to wake it back up.
> >
> > I think this is new to dwmac5, because my RK3588 (dwmac4.20 iirc)
> > happily resumes after the same condition.
> >
> > You gave a lot of info; thanks! I'll try to scrape up some
> > documentation on dwmac5 to see if there's something more
> > stmmac_rx_refill() ought to be doing. I think I have a Xavier NX
> > around here somewhere, I'll see if I can repro the problem.
>
> I've added dma_rmb() into dwmac4_wrback_get_tx_status() and
> dwmac4_wrback_get_rx_status(), and with that I've had an iperf3
> instance finally complete... but only once:

Hi Russell,

To me it feels relevant that the T194 doesn't use first-party
ARM/Cortex cores but rather Nvidia's in-house "Carmel" architecture.
Do you suppose the cache there is quirky in such a way that either:
1) We're seeing poor cache hygiene in stmmac where other caches are
more forgiving (more likely)
2) Carmel's cache has a subtle hardware bug triggered by stmmac's
specific access pattern (less likely)?
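
Related to 1): Documentation/memory-barriers.txt places dma_rmb() after the ownership check. The barrier sits between the load of the OWN bit and the loads of the other descriptor words, and that placement is what orders them. In your diff the barrier comes before the des3 load, so nothing orders the des1/des2 loads after the des3 load. Also, unless I'm misreading the third hunk, FIELD_GET() now consumes rdes1 before it has been assigned.

Here's a rough userspace model of the documented ordering. It's only a sketch, with these assumptions:
- C11 atomics stand in for dma_rmb(), as a relaxed load of des3 followed by an acquire fence.
- le32_to_cpu() is dropped, assuming a little-endian host.
- OWN is assumed to be bit 31 of des3.
- The model_* names are made up for illustration and are not stmmac's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a 16-byte, four-word RX descriptor. Only des3
 * (which holds OWN) is atomic: it is the word both sides use to hand
 * the descriptor back and forth. */
struct model_desc {
	uint32_t des0;
	uint32_t des1;
	uint32_t des2;
	_Atomic uint32_t des3;
};

static_assert(sizeof(struct model_desc) == 16, "four 32-bit words");

#define MODEL_RDES3_OWN (1u << 31) /* assumption: OWN is bit 31 of des3 */

enum model_rx_result { MODEL_GOOD_FRAME, MODEL_DMA_OWN };

/* "Device" side, simulated in software: write the status words first,
 * then return ownership to the CPU with a release store. */
static void model_device_writeback(struct model_desc *p, uint32_t des1,
				   uint32_t des2, uint32_t des3)
{
	p->des1 = des1;
	p->des2 = des2;
	atomic_store_explicit(&p->des3, des3 & ~MODEL_RDES3_OWN,
			      memory_order_release);
}

/* CPU side, in the order memory-barriers.txt describes. */
static enum model_rx_result model_rx_status(struct model_desc *p,
					    uint32_t *rdes1, uint32_t *rdes2)
{
	/* 1. Load only the word that carries OWN. */
	uint32_t rdes3 = atomic_load_explicit(&p->des3, memory_order_relaxed);

	/* 2. Still owned by the device: read nothing else. */
	if (rdes3 & MODEL_RDES3_OWN)
		return MODEL_DMA_OWN;

	/* 3. Stand-in for dma_rmb(): an acquire fence after the relaxed
	 *    load, so if des3 showed the descriptor as released, the loads
	 *    below observe the device's earlier writes. */
	atomic_thread_fence(memory_order_acquire);

	/* 4. Only now read the remaining write-back words. */
	*rdes1 = p->des1;
	*rdes2 = p->des2;
	return MODEL_GOOD_FRAME;
}
```

The point of this shape is that a still-device-owned descriptor is never read beyond des3. Once OWN is clear, every later load is ordered after the load that observed it.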

I'm still trying to get my Xavier NX to boot on net-next. It's running
into eMMC corruption/stalls very early in the boot process (at a
slightly different point each time; it feels like an autocalibration
problem) that I'm not seeing on older kernels. Once I'm done bisecting that
regression I'll take a deeper look at this stmmac mystery. :)
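
The refill side is the stmmac_set_rx_tail_ptr()/stmmac_enable_dma_reception() step from my earlier hypothesis. For that side, Documentation/memory-barriers.txt gives the mirror-image rule:
- The buffer-address words must be visible before OWN is set (dma_wmb()).
- OWN must be visible before the device is told about the new descriptors (writel()).

A userspace model of that ordering follows. It's only a sketch, with these assumptions:
- A release fence stands in for dma_wmb(), and a release store stands in for writel() to the tail-pointer register.
- le32_to_cpu()/cpu_to_le32() are dropped, assuming a little-endian host.
- OWN is assumed to be bit 31 of des3.
- The ring layout, the dirty index and the tail_ptr variable are hypothetical, not stmmac's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_RDES3_OWN (1u << 31) /* assumption: OWN is bit 31 of des3 */
#define MODEL_RING_SIZE 4          /* hypothetical, tiny for illustration */

/* Hypothetical four-word descriptor; des3 carries OWN. */
struct model_desc {
	uint32_t des0; /* buffer address (low 32 bits) */
	uint32_t des1;
	uint32_t des2;
	_Atomic uint32_t des3;
};

static_assert(sizeof(struct model_desc) == 16, "four 32-bit words");

struct model_ring {
	struct model_desc desc[MODEL_RING_SIZE];
	size_t dirty;              /* next descriptor to refill */
	_Atomic uint32_t tail_ptr; /* stand-in for the DMA tail-pointer register */
};

/* Refill `count` descriptors, then publish them via the tail pointer.
 * The caller must not pass more descriptors than are free; this model
 * does not check. */
static void model_rx_refill(struct model_ring *r, const uint32_t *bufs,
			    size_t count)
{
	for (size_t i = 0; i < count; i++) {
		struct model_desc *p = &r->desc[r->dirty];

		/* 1. Fill everything the device will read. */
		p->des0 = bufs[i];
		p->des1 = 0;
		p->des2 = 0;

		/* 2. Stand-in for dma_wmb(): the stores above are ordered
		 *    before the OWN store below. */
		atomic_thread_fence(memory_order_release);

		/* 3. Hand the descriptor to the device. A real driver sets
		 *    further control bits in des3 alongside OWN; omitted. */
		atomic_store_explicit(&p->des3, MODEL_RDES3_OWN,
				      memory_order_relaxed);

		r->dirty = (r->dirty + 1) % MODEL_RING_SIZE;
	}

	/* 4. Stand-in for writel() to the tail pointer: a release store, so
	 *    every OWN update above is visible before the new tail is. */
	atomic_store_explicit(&r->tail_ptr, (uint32_t)r->dirty,
			      memory_order_release);
}
```

Suppose the refill path already follows this order and the dwmac5 still won't resume. That would point back toward some extra wake-up step on dwmac5 rather than at barrier placement.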

Cheers,
Sam
>
> root at tegra-ubuntu:~# iperf3 -c 192.168.248.1 -R
> Connecting to host 192.168.248.1, port 5201
> Reverse mode, remote host 192.168.248.1 is sending
> [  5] local 192.168.248.174 port 42232 connected to 192.168.248.1 port 5201
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec  50.8 MBytes   426 Mbits/sec
> [  5]   1.00-2.00   sec  54.9 MBytes   460 Mbits/sec
> [  5]   2.00-3.00   sec  54.0 MBytes   453 Mbits/sec
> [  5]   3.00-4.00   sec  53.8 MBytes   452 Mbits/sec
> [  5]   4.00-5.00   sec  52.4 MBytes   438 Mbits/sec
> [  5]   5.00-6.00   sec  54.3 MBytes   455 Mbits/sec
> [  5]   6.00-7.00   sec  53.7 MBytes   452 Mbits/sec
> [  5]   7.00-8.00   sec  52.8 MBytes   443 Mbits/sec
> [  5]   8.00-9.00   sec  53.7 MBytes   451 Mbits/sec
> [  5]   9.00-10.00  sec  54.3 MBytes   455 Mbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.01  sec   537 MBytes   450 Mbits/sec   13             sender
> [  5]   0.00-10.00  sec   535 MBytes   448 Mbits/sec                  receiver
>
> iperf Done.
>
> So, it seems better, but not completely solved.
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> index 2994df41ec2c..119f31c94b61 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> @@ -17,10 +17,12 @@ static int dwmac4_wrback_get_tx_status(struct stmmac_extra_stats *x,
>  				       struct dma_desc *p,
>  				       void __iomem *ioaddr)
>  {
> -	u32 tdes3 = le32_to_cpu(p->des3);
> +	u32 tdes3;
>  	int ret = tx_done;
>
>  	/* Get tx owner first */
> +	dma_rmb();
> +	tdes3 = le32_to_cpu(p->des3);
>  	if (unlikely(tdes3 & TDES3_OWN))
>  		return tx_dma_own;
>
> @@ -70,12 +72,12 @@ static int dwmac4_wrback_get_tx_status(struct stmmac_extra_stats *x,
>  static int dwmac4_wrback_get_rx_status(struct stmmac_extra_stats *x,
>  				       struct dma_desc *p)
>  {
> -	u32 rdes1 = le32_to_cpu(p->des1);
> -	u32 rdes2 = le32_to_cpu(p->des2);
> -	u32 rdes3 = le32_to_cpu(p->des3);
> +	u32 rdes1, rdes2, rdes3;
>  	int message_type;
>  	int ret = good_frame;
>
> +	dma_rmb();
> +	rdes3 = le32_to_cpu(p->des3);
>  	if (unlikely(rdes3 & RDES3_OWN))
>  		return dma_own;
>
> @@ -107,6 +109,7 @@ static int dwmac4_wrback_get_rx_status(struct stmmac_extra_stats *x,
>
>  	message_type = FIELD_GET(RDES1_PTP_MSG_TYPE_MASK, rdes1);
>
> +	rdes1 = le32_to_cpu(p->des1);
>  	if (rdes1 & RDES1_IP_HDR_ERROR) {
>  		x->ip_hdr_err++;
>  		ret |= csum_none;
> @@ -152,6 +155,7 @@ static int dwmac4_wrback_get_rx_status(struct stmmac_extra_stats *x,
>  	if (rdes1 & RDES1_TIMESTAMP_DROPPED)
>  		x->timestamp_dropped++;
>
> +	rdes2 = le32_to_cpu(p->des2);
>  	if (unlikely(rdes2 & RDES2_SA_FILTER_FAIL)) {
>  		x->sa_rx_filter_fail++;
>  		ret = discard_frame;
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!