[PATCH v25 00/20] nvme-tcp receive offloads

Sagi Grimberg sagi at grimberg.me
Mon Jun 3 00:09:26 PDT 2024



On 31/05/2024 9:11, Christoph Hellwig wrote:
> FYI, I still absolutely detest this code.  I know people want to
> avoid the page copy for NVMe over TCP (or any TCP based storage
> protocols for that matter), but having these weird vendors specific
> hooks all the way up into the application protocol are just horrible.

I hoped for a transparent DDP offload as well, but I don't see how this
is possible.

>
> IETF has standardized a generic data placement protocol, which is
> part of iWarp.  Even if folks don't like RDMA it exists to solve
> exactly these kinds of problems of data placement.

iWARP changes the wire protocol. Is your suggestion that people should just
use iWARP instead of TCP, or that NVMe/TCP should be extended to natively
support DDP?

I think that the former is limiting, and the latter is unclear.

From what I understand, the offload engine uses the NVMe command-id as
the rkey (or STag) for DDP purposes.
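
To make the analogy concrete, below is a conceptual sketch in plain C (not
code from this series; struct ddp_buffer, ddp_register() and ddp_lookup()
are made-up names) of the idea that the command-id plays the role an
rkey/STag plays in iWARP: the host registers a destination buffer under the
command-id before the command goes out, and the NIC places matching C2H data
PDUs directly into it instead of into socket buffers.

#include <stddef.h>
#include <stdint.h>

struct ddp_buffer {
	uint16_t	command_id;	/* plays the role of the rkey/STag */
	void		*addr;		/* destination of the read data */
	size_t		len;
	int		in_use;
};

/* One slot per outstanding command on the queue (sized arbitrarily). */
static struct ddp_buffer ddp_table[128];

/* Register a buffer under a command-id before sending the command. */
static int ddp_register(uint16_t command_id, void *addr, size_t len)
{
	struct ddp_buffer *b = &ddp_table[command_id % 128];

	if (b->in_use)
		return -1;
	*b = (struct ddp_buffer){ command_id, addr, len, 1 };
	return 0;
}

/*
 * What the offload conceptually does on receive: look up the buffer by the
 * command-id carried in the data PDU and place the payload there, instead
 * of landing it in socket receive buffers and copying it out later.
 */
static struct ddp_buffer *ddp_lookup(uint16_t command_id)
{
	struct ddp_buffer *b = &ddp_table[command_id % 128];

	return (b->in_use && b->command_id == command_id) ? b : NULL;
}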

>    And if we can't
> arse folks into standard data placement methods we at least need it
> vendor independent and without hooks into the actual protocol
> driver.
>

That would be great, but what does a "vendor independent without hooks"
approach look like from your perspective? I'd love for this to translate to
standard (and some new) socket operations, but I could not find a way to do
that given the current architecture.

Early on, I thought that enabling the queue offload could be modeled as a
setsockopt(), and that nvme_tcp_setup_ddp() could be modeled as a new
recvmsg(MSG_DDP_BUFFER, iovec, tag), but where I got stuck was the whole
async teardown mechanism that the NIC has. But if that is solvable, I think
such an interface would be much better.
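
Concretely, here is a hypothetical sketch of what that interface could look
like, written in userspace style just to show its shape. SOL_ULP_DDP,
ULP_DDP_ENABLE, ULP_DDP_TAG, MSG_DDP_BUFFER, ddp_enable_queue() and
ddp_post_buffer() are all made up, and carrying the command-id in a cmsg is
only one possible way to pass the tag:

#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define SOL_ULP_DDP	280		/* made-up socket level */
#define ULP_DDP_ENABLE	1		/* made-up option: offload this queue */
#define ULP_DDP_TAG	2		/* made-up cmsg type carrying the tag */
#define MSG_DDP_BUFFER	0x10000000	/* made-up flag: post a DDP buffer */

/* Queue setup: enable the offload once, instead of a vendor-specific hook. */
static int ddp_enable_queue(int fd)
{
	int one = 1;

	return setsockopt(fd, SOL_ULP_DDP, ULP_DDP_ENABLE, &one, sizeof(one));
}

/*
 * nvme_tcp_setup_ddp() equivalent: post a destination buffer for one
 * command.  The NVMe command-id travels as the DDP tag in a control
 * message, and the NIC would place matching data PDUs directly into iov[].
 */
static ssize_t ddp_post_buffer(int fd, uint16_t command_id,
			       struct iovec *iov, int iovcnt)
{
	union {
		char buf[CMSG_SPACE(sizeof(uint16_t))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {
		.msg_iov = iov,
		.msg_iovlen = iovcnt,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);

	cm->cmsg_level = SOL_ULP_DDP;
	cm->cmsg_type = ULP_DDP_TAG;
	cm->cmsg_len = CMSG_LEN(sizeof(command_id));
	memcpy(CMSG_DATA(cm), &command_id, sizeof(command_id));

	return recvmsg(fd, &msg, MSG_DDP_BUFFER | MSG_DONTWAIT);
}

The open question remains how the async buffer teardown that the NIC
performs would be reported back through such an interface.
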
FWIW, I think the benefit of this is worth having, and I think that the
folks from NVIDIA are committed to supporting and evolving it.


