[BUG] net: stmmac: crash within stmmac_rx()

Shane Francis bigbeeshane at gmail.com
Mon Aug 19 13:08:57 PDT 2024


Hi Andrew & Eric


I think you are both onto the root cause. The MTU is set to 1500 on each side of
the connection. I also tried 2 other gigabit devices (1 desktop and 1 laptop):
these achieved the full 940 Mbps in each direction, the only minor issue being a
large latency penalty under 900 Mbps load (approx. +120 ms).

The device that triggers the issue when connected is a QCOM IPQ8074-based
router/access point. I'm starting to wonder whether it is doing some unintended
optimizations at the firmware level that the stmmac driver is not happy with. To
confirm: the MTU was set to 1500, and nothing like GRO or jumbo frames is enabled.

I will keep digging and report back with any more information; even with a
potentially misbehaving device connected, the driver should not crash.


Thanks again

On Mon, Aug 19, 2024 at 5:25 PM Andrew Lunn <andrew at lunn.ch> wrote:
>
> On Mon, Aug 19, 2024 at 01:26:37PM +0100, Shane Francis wrote:
> > Summary of the problem:
> > ===================
> > Crash observed within stmmac_rx when under high RX demand
> >
> > Hardware: Rockchip RK3588 platform with an RTL8211F PHY
> >
> > The issue seems identical to the one described here:
> > https://lore.kernel.org/netdev/20210514214927.GC1969@qmqm.qmqm.pl/T/
> >
> > Full description of the problem/report:
> > =============================
> > I have observed that under high upload scenarios the stmmac
> > driver crashes due to what I think is an overflow error. After some
> > debugging I found that stmmac_rx_buf2_len() is returning an
> > unexpectedly high value, which is assigned to buf2_len here:
> > https://github.com/torvalds/linux/blob/v6.6/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L5466
> >
> > An example set of values that I have observed to cause the crash:
> >     buf1_len = 0
> >     buf2_len = 4294966330
> >
> > and, from within the stmmac_rx_buf2_len() function:
> >     plen = 2106
> >     len = 3072
> >
> > The return value would be plen - len, i.e. -966 (4294966330 as a
> > uint32, which matches buf2_len).
> >
> > I am unsure how to debug this further. Would it make sense to clamp
> > stmmac_rx_buf2_len() so that it returns dma_buf_sz if the return
> > value would otherwise have exceeded it?
>
> Clamping will just paper over the problem, not fix it. You need to
> keep debugging to really understand what the issue is.
>
> Clearly len > plen is a problem, so you could add a BUG_ON(len > plen),
> which will give you a stack trace. But I doubt that is very
> interesting. You probably want to get into stmmac_get_rx_frame_len()
> and see how it calculates plen. stmmac obfuscation makes it hard to
> say which of:
>
> dwmac4_descs.c: .get_rx_frame_len = dwmac4_wrback_get_rx_frame_len,
> dwxgmac2_descs.c:       .get_rx_frame_len = dwxgmac2_get_rx_frame_len,
> enh_desc.c:     .get_rx_frame_len = enh_desc_get_rx_frame_len,
> norm_desc.c:    .get_rx_frame_len = ndesc_get_rx_frame_len,
>
> is being used. But they all look pretty similar.
>
> What I find interesting is that both are greater than 1512, a typical
> Ethernet frame size. Are you using jumbo packets? Is the hardware
> doing some sort of GRO?
>
>       Andrew



More information about the linux-arm-kernel mailing list