[PATCH net-next] net: stmmac: fix .ndo_fix_features()

Russell King (Oracle) linux at armlinux.org.uk
Wed Feb 25 06:28:23 PST 2026


On Wed, Feb 25, 2026 at 02:26:42PM +0100, Andrew Lunn wrote:
> > Then, looking at the "Enable TSO" block of code in stmmac_hw_setup(),
> > it looks to me like we can end up with some queues that have TSO
> > enabled, and others which don't (because TBS has been enabled on the
> > queue.) As far as I'm aware, the network layer doesn't support
> > per-queue TSO.
> 
> There is a software implementation of TSO. See for example:
> 
> commit 3ae8f4e0b98b640aadf410c21185ccb6b5b02351
> Author: Ezequiel Garcia <ezequiel.garcia at free-electrons.com>
> Date:   Mon May 19 14:00:00 2014 -0300
> 
>     net: mv643xx_eth: Implement software TSO
> 
> You might be able to use this to fill in the gaps.

That'll likely need some major rework to the descriptor handling, which
is also buggy, certainly in the non-TSO path.

stmmac_xmit() does this:

        if (unlikely(stmmac_tx_avail(priv, queue) < nfrags + 1)) {
                if (!netif_tx_queue_stopped(netdev_get_tx_queue(dev, queue))) {
                        netif_tx_stop_queue(netdev_get_tx_queue(priv->dev,
                                                                queue));
                        /* This is a hard error, log it. */
                        netdev_err(priv->dev,
                                   "%s: Tx Ring full when queue awake\n",
                                   __func__);
                }
                return NETDEV_TX_BUSY;
        }

Then it sets the descriptors, which are consumed as per:

- 1 descriptor if priv->dma_cap.vlins && skb has a vlan
- for non-jumbo:
  - 1 descriptor for skb head (with sarc & tbs)
- for jumbo:
  - if headlen > 8KiB:
    - 1 descriptor for first bmax (2 or 8 KiB) of skb head
        (sets buffer2 of descriptor to data + 4KiB)
    - 1 descriptor for remainder of skb head
        (sets buffer2 of descriptor to data + 4KiB)
  - otherwise:
    - 1 descriptor for skb head
        (sets buffer2 of descriptor to data + 4KiB)
- 1 descriptor per skb fragment

So, the number of descriptors used is 1 + nfrags + overhead, where
overhead is:

overhead += 1 if jumbo > 8KiB is supported
overhead += 1 if priv->dma_cap.vlins

To put it another way:
- if we have a non-jumbo non-vlan skb, then 1 + nfrags descriptors will
  be used.
- if we have a jumbo vlan skb, then 3 + nfrags descriptors will be
  used.

However, the decision to stop the queue is:

        if (unlikely(stmmac_tx_avail(priv, queue) <= (MAX_SKB_FRAGS + 1))) {
                netif_dbg(priv, hw, priv->dev, "%s: stop transmitted packets\n",
                          __func__);
                netif_tx_stop_queue(netdev_get_tx_queue(priv->dev, queue));
        }

So it doesn't account for the extra descriptor for vlan insertion, nor
does it account for the extra descriptor for a jumbo frame.

In the existing TSO path, the same problem exists - its the same check
for the last test, although the first test is different:

        /* Desc availability based on threshold should be enough safe */
        if (unlikely(stmmac_tx_avail(priv, queue) <
                (((skb->len - proto_hdr_len) / TSO_MAX_BUFF_SIZE + 1)))) {

This doesn't deal with vlan insertion by the hardware, nor I think
jumbo frames (I may be wrong on that) but it does use an extra
descriptor each time the MSS (as the driver calls it, actually
skb_shinfo(skb)->gso_size) changes on this specific queue. So, it
can be one extra descriptor on top of that +1 above.

The implications of this is that if the descriptors are almost full,
there is the possibility to overwrite the tail descriptor (that the
hardware has yet to process) as, when setting up the descriptors,
the insertion index is merely incremented, and not individually
checked to see whether it's caught up with the tail.

I think the only thing which saves the driver from this is that the
transmit descriptor ring is large (512 entries by default) and
probably never fills up in a way that we hit the boundary condition
that allows descriptor overwrite to occur. However, there is this
comment:

/* TX and RX Descriptor Length, these need to be power of two.
 * TX descriptor length less than 64 may cause transmit queue timed out error.
 * RX descriptor length less than 64 may cause inconsistent Rx chain error.
 */
in drivers/net/ethernet/stmicro/stmmac/common.h, which hints at the
problem here: if we do overflow the tail, the hardware may have already
processed that descriptor, but we haven't yet reaped it. The driver
will be expecting to write to the following descriptor. However, when
we real the tail, the driver will clear it. This results in following
packets being added after a descriptor with the hardware OWN bit clear,
so transmission will stop when the hardware gets to that descriptor,
eventually causing a transmit timeout.

With this driver, it's really a case of "which of its many problems
do we address first" because the more I look at it, the more problems
I'm finding.

The next problem which can be seen in my analysis of the descriptor
usage is - take a close look at the jumbo processing. In some
configurations, the length fields are 11 bits. The jumbo code seems
to know this because it calculates bmax accordingly. However, the
rest of the code just assumes that - even though we can only describe
up to 2KiB (actually 4KiB-1) bytes in the length field and it limits
the length of the first descriptor to that, it then promptly maps the
bytes from 4KiB to the total length to the second buffer in the
descriptor. What happens to the 2KiB..4KiB part of the data in this
case?!?!

I have a huge pile of cleanups to this driver that have helped to
uncover these problems by making the code more readable than it
currently is - and one of the problems is, I'm generating cleanups
and finding problems faster than I can get the changes merged.

During the last cycle, I tried to avoid adding to my set of patches
for this driver during the -rc phase. That hasn't helped. It also
meant that I didn't get any chance to post much else, such as the
Marvell PTP changes.

I was thinking that, when I get stmmac's phylink mess sorted out,
(essentially a few more patches beyond the part 3 RFC I posted today)
I'll stop hacking on stmmac, but stmmac just seems to be far too
broken. It needs someone to be a proper maintainer, someone who is
prepared to say "no" to patches that increase the messy complexity in
the driver that I think has led to a lot of these problems... to say
no to patches that introduce e.g. TBS without first thinking about
its interaction with TSO and whether the upper networking layers can
cope with some but not all queues being TSO capable... etc.

I do feel that, having got involved with stmmac to try and sort out
its bloody abuse of phylink, it's now become a huge mill stone, and
I don't have time to look at phylink / sfp related stuff (which is
why I've hardly been able to look at Maxime's patches.)

Can't we just delete stmmac to put it out of its misery? :D

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!



More information about the linux-arm-kernel mailing list