[PATCH net v4 0/2] stmmac crash/stall fixes when under memory pressure

Sam Edwards cfsworks at gmail.com
Sat Apr 11 13:40:51 PDT 2026


On Fri, Apr 10, 2026 at 5:23 AM Russell King (Oracle)
<linux at armlinux.org.uk> wrote:
>
> On Thu, Apr 02, 2026 at 10:39:32AM -0700, Sam Edwards wrote:
> > On Thu, Apr 2, 2026 at 10:16 AM Russell King (Oracle)
> > <linux at armlinux.org.uk> wrote:
> > > I've tested this on my Jetson Xavier platform. One of the issues I've
> > > had is that running iperf3 results in the receive side stalling because
> > > it runs out of descriptors. However, despite the receive ring
> > > eventually being re-filled and the hardware appropriately prodded, it
> > > steadfastly refuses to restart, despite the descriptors having been
> > > updated.
> >
> > Hi Russell,
> >
> > Just to make sure I understand correctly: before my patches, you've
> > been observing this problem on Xavier for a while (no interrupts, ring
> > goes dry); with my patches, the ring is refilled, but the dwmac5
> > doesn't resume DMA. (Ah, just saw your follow-up email.)
> >
> > > Any ideas?
> >
> > Off the top of my head, my hypothesis is that dwmac5 has an additional
> > tripwire when the receive DMA is exhausted, and the
> > stmmac_set_rx_tail_ptr()/stmmac_enable_dma_reception() at the end of
> > stmmac_rx_refill() aren't sufficient to wake it back up.
> >
> > I think this is new to dwmac5, because my RK3588 (dwmac4.20 iirc)
> > happily resumes after the same condition.
> >
> > You gave a lot of info; thanks! I'll try to scrape up some
> > documentation on dwmac5 to see if there's something more
> > stmmac_rx_refill() ought to be doing. I think I have a Xavier NX
> > around here somewhere, I'll see if I can repro the problem.
>
> I've added dma_rmb() into dwmac4_wrback_get_tx_status() and
> dwmac4_wrback_get_rx_status(), and with that I've had an iperf3
> instance finally complete... but only once:

Hi Russell,

To me it feels relevant that the T194 doesn't use ARM's own Cortex
cores but rather Nvidia's in-house "Carmel" cores. Do you suppose
their cache is quirky in a way that means either:
1) stmmac has poor cache hygiene that other CPUs' caches are more
forgiving of (more likely), or
2) Carmel's cache has a subtle hardware bug triggered by stmmac's
specific access pattern (less likely)?
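One concrete form of the hygiene in option 1 is the ordering of descriptor accesses. The kernel's Documentation/memory-barriers.txt places dma_rmb() after the ownership check and before the descriptor's other fields are read, and dma_wmb() before ownership is handed back to the device. Below is a minimal userspace sketch of that handshake, assuming C11 atomic fences as stand-ins for dma_rmb()/dma_wmb(); the struct, bit positions and sketch_* functions are hypothetical and are not stmmac's real descriptor layout. Two things it makes explicit: the diff quoted below issues dma_rmb() before reading des3 rather than between the OWN check and the remaining reads, and it appears to pass rdes1 to FIELD_GET() before the newly added `rdes1 = le32_to_cpu(p->des1);` load. In the sketch, every field is loaded after the barrier and before it is used.

```c
/* Userspace sketch of a DMA descriptor ownership handshake.
 * ASSUMPTIONS: struct layout, bit positions and function names are
 * hypothetical; C11 fences stand in for the kernel's dma_rmb()/dma_wmb(). */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_DES3_OWN     (UINT32_C(1) << 31) /* hypothetical: set = device owns it */
#define SKETCH_DES1_IPH_ERR (UINT32_C(1) << 3)  /* hypothetical status bit */

struct sketch_desc {
	_Atomic uint32_t des1;
	_Atomic uint32_t des2;
	_Atomic uint32_t des3; /* carries the OWN bit */
};

enum sketch_rx_status { SKETCH_DMA_OWN, SKETCH_GOOD, SKETCH_CSUM_NONE };

/* Driver -> device (refill): clear the other fields first, then set OWN.
 * The release fence plays the role of dma_wmb(). */
static void sketch_give_to_dma(struct sketch_desc *d)
{
	atomic_store_explicit(&d->des1, 0u, memory_order_relaxed);
	atomic_store_explicit(&d->des2, 0u, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->des3, SKETCH_DES3_OWN, memory_order_relaxed);
}

/* Device -> driver (simulated write-back): status words first, then clear OWN. */
static void sketch_fake_writeback(struct sketch_desc *d, uint32_t des1, uint32_t des2)
{
	atomic_store_explicit(&d->des1, des1, memory_order_relaxed);
	atomic_store_explicit(&d->des2, des2, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->des3, 0u, memory_order_relaxed);
}

/* Consumer: check OWN, then an acquire fence (the dma_rmb() position used in
 * memory-barriers.txt), then load every other field before anything uses it. */
static enum sketch_rx_status sketch_get_rx_status(struct sketch_desc *d,
						  uint32_t *des1_out,
						  uint32_t *des2_out)
{
	uint32_t des3 = atomic_load_explicit(&d->des3, memory_order_relaxed);
	uint32_t des1, des2;

	if (des3 & SKETCH_DES3_OWN)
		return SKETCH_DMA_OWN; /* still the device's: touch nothing else */

	atomic_thread_fence(memory_order_acquire);

	/* Load des1/des2 here, before any decode (e.g. a FIELD_GET()) reads them. */
	des1 = atomic_load_explicit(&d->des1, memory_order_relaxed);
	des2 = atomic_load_explicit(&d->des2, memory_order_relaxed);
	*des1_out = des1;
	*des2_out = des2;

	return (des1 & SKETCH_DES1_IPH_ERR) ? SKETCH_CSUM_NONE : SKETCH_GOOD;
}
```

The sketch is single-threaded when exercised on its own, so it only checks the logic; the fences matter when the writer is another thread or, in the real driver, the DMA engine.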

I'm still trying to get my Xavier NX to boot on net-next. Very early
in boot it runs into eMMC corruption/stalls that I'm not seeing on
older kernels; they hit at slightly different points each time, which
feels like a problem in autocalibration. Once I'm done bisecting that
regression, I'll take a deeper look at this stmmac mystery. :)

Cheers,
Sam

>
> root at tegra-ubuntu:~# iperf3 -c 192.168.248.1 -R
> Connecting to host 192.168.248.1, port 5201
> Reverse mode, remote host 192.168.248.1 is sending
> [  5] local 192.168.248.174 port 42232 connected to 192.168.248.1 port 5201
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec  50.8 MBytes   426 Mbits/sec
> [  5]   1.00-2.00   sec  54.9 MBytes   460 Mbits/sec
> [  5]   2.00-3.00   sec  54.0 MBytes   453 Mbits/sec
> [  5]   3.00-4.00   sec  53.8 MBytes   452 Mbits/sec
> [  5]   4.00-5.00   sec  52.4 MBytes   438 Mbits/sec
> [  5]   5.00-6.00   sec  54.3 MBytes   455 Mbits/sec
> [  5]   6.00-7.00   sec  53.7 MBytes   452 Mbits/sec
> [  5]   7.00-8.00   sec  52.8 MBytes   443 Mbits/sec
> [  5]   8.00-9.00   sec  53.7 MBytes   451 Mbits/sec
> [  5]   9.00-10.00  sec  54.3 MBytes   455 Mbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.01  sec   537 MBytes   450 Mbits/sec   13             sender
> [  5]   0.00-10.00  sec   535 MBytes   448 Mbits/sec                  receiver
>
> iperf Done.
>
> So, it seems better, but not completely solved.
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> index 2994df41ec2c..119f31c94b61 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
> @@ -17,10 +17,12 @@ static int dwmac4_wrback_get_tx_status(struct stmmac_extra_stats *x,
>                                        struct dma_desc *p,
>                                        void __iomem *ioaddr)
>  {
> -       u32 tdes3 = le32_to_cpu(p->des3);
> +       u32 tdes3;
>         int ret = tx_done;
>
>         /* Get tx owner first */
> +       dma_rmb();
> +       tdes3 = le32_to_cpu(p->des3);
>         if (unlikely(tdes3 & TDES3_OWN))
>                 return tx_dma_own;
>
> @@ -70,12 +72,12 @@ static int dwmac4_wrback_get_tx_status(struct stmmac_extra_stats *x,
>  static int dwmac4_wrback_get_rx_status(struct stmmac_extra_stats *x,
>                                        struct dma_desc *p)
>  {
> -       u32 rdes1 = le32_to_cpu(p->des1);
> -       u32 rdes2 = le32_to_cpu(p->des2);
> -       u32 rdes3 = le32_to_cpu(p->des3);
> +       u32 rdes1, rdes2, rdes3;
>         int message_type;
>         int ret = good_frame;
>
> +       dma_rmb();
> +       rdes3 = le32_to_cpu(p->des3);
>         if (unlikely(rdes3 & RDES3_OWN))
>                 return dma_own;
>
> @@ -107,6 +109,7 @@ static int dwmac4_wrback_get_rx_status(struct stmmac_extra_stats *x,
>
>         message_type = FIELD_GET(RDES1_PTP_MSG_TYPE_MASK, rdes1);
>
> +       rdes1 = le32_to_cpu(p->des1);
>         if (rdes1 & RDES1_IP_HDR_ERROR) {
>                 x->ip_hdr_err++;
>                 ret |= csum_none;
> @@ -152,6 +155,7 @@ static int dwmac4_wrback_get_rx_status(struct stmmac_extra_stats *x,
>         if (rdes1 & RDES1_TIMESTAMP_DROPPED)
>                 x->timestamp_dropped++;
>
> +       rdes2 = le32_to_cpu(p->des2);
>         if (unlikely(rdes2 & RDES2_SA_FILTER_FAIL)) {
>                 x->sa_rx_filter_fail++;
>                 ret = discard_frame;
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!



More information about the linux-arm-kernel mailing list