[PATCH v25 00/20] nvme-tcp receive offloads
Sagi Grimberg
sagi at grimberg.me
Mon Jun 3 00:09:26 PDT 2024
On 31/05/2024 9:11, Christoph Hellwig wrote:
> FYI, I still absolutely detest this code. I know people want to
> avoid the page copy for NVMe over TCP (or any TCP based storage
> protocols for that matter), but having these weird vendor-specific
> hooks all the way up into the application protocol is just horrible.
I hoped for a transparent ddp offload as well, but I don't see how this
is possible.
>
> IETF has standardized a generic data placement protocol, which is
> part of iWarp. Even if folks don't like RDMA it exists to solve
> exactly these kinds of problems of data placement.
iWARP changes the wire protocol. Is your suggestion that people just use
iWARP instead of TCP, or that NVMe/TCP be extended to natively support DDP?
I think the former is limiting, and the latter is unclear.
From what I understand, the offload engine uses the NVMe command-id as
the rkey (or stag) for ddp purposes.
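Roughly, in a minimal software model (the names below are mine for
illustration, not the structures the patches actually use), the mapping
looks like this:

#include <stdint.h>
#include <stddef.h>

/*
 * Illustrative only: the command identifier carried in a C2HData PDU
 * acts like an iWARP stag -- it resolves to a host buffer that was
 * registered when the read command was issued, so the NIC can place
 * the payload there directly instead of into the socket receive queue.
 */
struct ddp_buffer {
	void   *addr;	/* destination for direct data placement */
	size_t	len;
};

/* one slot per possible command identifier */
static struct ddp_buffer ddp_table[UINT16_MAX + 1];

/* on command submission: register the data buffer under its command_id */
static void ddp_setup(uint16_t command_id, void *buf, size_t len)
{
	ddp_table[command_id].addr = buf;
	ddp_table[command_id].len  = len;
}

/* on C2HData reception: resolve command_id -> buffer, like stag -> MR */
static struct ddp_buffer *ddp_resolve(uint16_t command_id)
{
	return &ddp_table[command_id];
}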
> And if we can't
> arse folks into standard data placement methods we at least need it
> vendor independent and without hooks into the actual protocol
> driver.
>
That would be great, but what does "vendor independent and without hooks"
look like from your perspective? I'd love for this to translate into
standard (and some new) socket operations, but I could not find a way to
do that given the current architecture.
Early on, I thought that enabling the queue offload could be modeled as a
setsockopt(), and that nvme_tcp_setup_ddp() could be modeled as a new
recvmsg(MSG_DDP_BUFFER, iovec, tag), but where I got stuck was the whole
asynchronous teardown mechanism that the nic has. If that is solvable, I
think such an interface is much better.
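To make that concrete, here is a rough sketch of the shape such a
socket-level interface could take (SO_DDP_OFFLOAD and MSG_DDP_BUFFER are
made up for illustration, nothing like them exists today, and the async
teardown problem is not modeled):

#include <stdint.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical knobs -- these do not exist; values are placeholders. */
#define SO_DDP_OFFLOAD	100		/* enable DDP on this queue's socket */
#define MSG_DDP_BUFFER	0x1000000	/* post a tagged placement buffer */

/* would replace the driver-private "offload this queue" hook */
static int ddp_enable(int sk)
{
	int one = 1;

	return setsockopt(sk, SOL_SOCKET, SO_DDP_OFFLOAD, &one, sizeof(one));
}

/*
 * Would replace nvme_tcp_setup_ddp(): associate the iovec with a tag
 * (the NVMe command_id) so the NIC can place matching C2HData payloads
 * directly into it.
 */
static ssize_t ddp_post_buffer(int sk, struct iovec *iov, int nr, uint16_t tag)
{
	struct msghdr msg = {
		.msg_iov    = iov,
		.msg_iovlen = nr,
	};

	/* the tag would ride in ancillary data or a new msghdr field;
	 * elided here, as is the asynchronous buffer teardown path */
	(void)tag;

	return recvmsg(sk, &msg, MSG_DDP_BUFFER | MSG_DONTWAIT);
}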
FWIW, I think the benefit here is worth having, and I believe the folks
from NVIDIA are committed to supporting and evolving it.