[PATCH v1 net-next 02/15] net: Introduce direct data placement tcp offload
Jakub Kicinski
kuba at kernel.org
Thu Dec 10 21:01:08 EST 2020
On Wed, 9 Dec 2020 21:26:05 -0700 David Ahern wrote:
> Yes, TCP is a byte stream, so the packets could very well show up like this:
>
> +--------------+---------+-----------+---------+--------+-----+
> | data - seg 1 | PDU hdr | prev data | TCP hdr | IP hdr | eth |
> +--------------+---------+-----------+---------+--------+-----+
> +------------------------------------+---------+--------+-----+
> | payload - seg 2                    | TCP hdr | IP hdr | eth |
> +------------------------------------+---------+--------+-----+
> +---------+--------------------------+---------+--------+-----+
> | PDU hdr | payload - seg 3          | TCP hdr | IP hdr | eth |
> +---------+--------------------------+---------+--------+-----+
>
> If your hardware can extract the NVMe payload into a targeted SGL like
> you want in this set, then it has some logic for parsing headers and
> "snapping" an SGL to a new element. ie., it already knows 'prev data'
> goes with the in-progress PDU, sees more data, recognizes a new PDU
> header and a new payload. That means it already has to handle a
> 'snap-to-PDU' style argument where the end of the payload closes out an
> SGL element and the next PDU hdr starts in a new SGL element (ie., 'prev
> data' closes out sgl[i], and the next PDU hdr starts sgl[i+1]). So in
> this case, you want 'snap-to-PDU' but that could just as easily be 'no
> snap at all', just a byte stream and filling an SGL after the protocol
> headers.
This 'snap-to-PDU' requirement is something that I don't understand
with the current TCP zero copy. In case of, say, a storage application
which wants to send some headers (whatever RPC info, block number,
etc.) and then a 4k block of data - how does the RX side get just the
4k block into a page so it can zero copy it out to its storage device?
Per-connection state in the NIC, and FW parsing headers is one way,
but I wonder how this record split problem is best resolved generically.
Perhaps by passing hints in the headers somehow?
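
To make the per-connection parsing option concrete, here is a minimal
userspace sketch of the state such a parser has to carry across segment
boundaries. It is an illustration only, not code from this series and
not the NVMe/TCP wire format: the 8-byte header with a little-endian
payload length in bytes 4..7, struct pdu_rx and pdu_rx_feed() are all
hypothetical, and memcpy() stands in for whatever DMA placement the
hardware would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: 4 opaque bytes + 4-byte LE payload length. */
#define PDU_HDR_LEN 8

/* Per-connection receive state (all names are illustrative). */
struct pdu_rx {
	uint8_t  hdr[PDU_HDR_LEN]; /* header bytes gathered so far */
	size_t   hdr_have;         /* how many of them are valid */
	uint32_t data_left;        /* payload bytes still owed to this PDU */
	uint8_t *dst;              /* next free byte in the target buffer */
	size_t   dst_left;         /* room left in the target buffer */
};

/*
 * Consume the TCP payload of one segment. Header bytes are collected
 * into rx->hdr, payload bytes are placed into rx->dst. Both headers and
 * payloads may straddle segments. Returns the number of payload bytes
 * placed, or -1 if the target buffer is too small.
 */
static long pdu_rx_feed(struct pdu_rx *rx, const uint8_t *seg, size_t len)
{
	long placed = 0;

	assert(rx != NULL);

	while (len) {
		size_t n;

		if (rx->data_left == 0) {
			/* Between payloads: keep filling the header. */
			n = PDU_HDR_LEN - rx->hdr_have;
			if (n > len)
				n = len;
			memcpy(rx->hdr + rx->hdr_have, seg, n);
			rx->hdr_have += n;

			if (rx->hdr_have == PDU_HDR_LEN) {
				/* Header complete: learn the payload length. */
				rx->data_left = (uint32_t)rx->hdr[4] |
						(uint32_t)rx->hdr[5] << 8 |
						(uint32_t)rx->hdr[6] << 16 |
						(uint32_t)rx->hdr[7] << 24;
				rx->hdr_have = 0;
			}
		} else {
			/* Inside a payload: place as much as this segment has. */
			n = rx->data_left < len ? rx->data_left : len;
			if (n > rx->dst_left)
				return -1;
			memcpy(rx->dst, seg, n);
			rx->dst += n;
			rx->dst_left -= n;
			rx->data_left -= (uint32_t)n;
			placed += (long)n;
		}

		seg += n;
		len -= n;
	}

	return placed;
}
```

The sketch keeps a single destination buffer for brevity. Steering each
PDU to its own buffer would need a lookup keyed on something in the
header, which is where the per-connection state comes from.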
Sorry for the slight off-topic :)