[PATCH v28 01/20] net: Introduce direct data placement tcp offload

Aurelien Aptel aaptel at nvidia.com
Fri May 16 07:47:34 PDT 2025


Hi Eric,

We have looked into your suggestions, but both have drawbacks.

The first idea was to make the tailroom small/empty to prevent
condensing. The issue is that the header is already placed at the skb
head, and there could be another PDU after the first payload. Placing
the header at the tail of the skb would require copying (which we want
to avoid) and could potentially overwrite anything after it.

The second idea was to use the unreadable bit. We tried setting the bit
in the driver and updating tcp_collapse() to copy the bit along with
other bits. However, making the skb unreadable causes issues at the
other end when the nvme driver reads from it, as the unreadable bit
makes it, well, unreadable. If you look at __skb_datagram_iter(), you'll
see it errors out if skb_frags_readable(skb) is false.

The offload works by calling the iter copy functions while skipping the
memcpy (see patch 3). We think the unreadable bit would be close to
what we want if it weren't for the __skb_datagram_iter() check. Maybe the
bit could be unset at a later stage, but it's not clear where.
Alternatively, the no_condense bit might be a good compromise? Readable
but not condensable.
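
To make the trade-off concrete, here is a rough userspace sketch of the
two options. It is not kernel code: struct toy_skb, toy_can_condense()
and toy_datagram_iter() are made-up names, the rule that either bit
blocks condensing is our assumption, and -EFAULT only stands in for
"the read fails".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only; every name here is hypothetical, not a kernel API. */
struct toy_skb {
	bool unreadable;   /* models the existing unreadable bit */
	bool no_condense;  /* models the proposed no_condense bit */
	const char *data;  /* stands in for the payload */
	size_t len;
};

/* Assumption: condensing is refused if either bit is set. */
static bool toy_can_condense(const struct toy_skb *skb)
{
	return !skb->unreadable && !skb->no_condense;
}

/*
 * Models the reader side (e.g. the nvme driver going through the
 * datagram iter path). An unreadable skb is rejected up front.
 * skip_copy models the offload case: the iter bookkeeping still
 * happens, but the memcpy is skipped.
 */
static int toy_datagram_iter(const struct toy_skb *skb, char *dst,
			     size_t dst_len, bool skip_copy)
{
	if (skb->unreadable)
		return -EFAULT;
	if (skb->len > dst_len)
		return -EINVAL;
	if (!skip_copy)
		memcpy(dst, skb->data, skb->len);
	return (int)skb->len;
}
```

In this model an skb with only unreadable set is protected from
condensing but cannot be consumed by the reader, while an skb with only
no_condense set is protected and can still be consumed, with or without
the memcpy.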

Thanks


