[PATCH v25 00/20] nvme-tcp receive offloads
Sagi Grimberg
sagi at grimberg.me
Mon Jun 10 07:30:34 PDT 2024
On 10/06/2024 15:29, Christoph Hellwig wrote:
> On Mon, Jun 03, 2024 at 10:09:26AM +0300, Sagi Grimberg wrote:
>>> IETF has standardized a generic data placement protocol, which is
>>> part of iWarp. Even if folks don't like RDMA it exists to solve
>>> exactly these kinds of problems of data placement.
>> iWARP changes the wire protocol.
> Compared to plain NVMe over TCP that's a bit of an understatement :)
Yes :) the comment was that people want to use NVMe/TCP, and adding
DDP awareness inspired by iWARP would change the existing NVMe/TCP wire
protocol.
This offload, does not.
>
>> Is your comment to just go make people
>> use iWARP instead of TCP? or extending NVMe/TCP to natively support DDP?
> I don't know to be honest. In many ways just using RDMA instead of
> NVMe/TCP would solve all the problems this is trying to solve, but
> there are enough big customers that have religious concerns about
> the use of RDMA.
>
> So if people want to use something that looks non-RDMA but have the
> same benefits we have to reinvent it quite similarly under a different
> name. Looking at DDP and what we can learn from it without bringing
> the Verbs API along might be one way to do that.
>
> Another would be to figure out what amount of similarity and what
> amount of state we need in an on the wire protocol to have an
> efficient header splitting in the NIC, either hard coded or even
> better downloadable using something like eBPF.
From what I understand, this is what this offload is trying to do. It uses
the nvme command_id similar to how the read_stag is used in iwarp,
it tracks the NVMe/TCP pdus to split pdus from data transfers, and maps
the command_id to an internal MR for dma purposes.
What I think you don't like about this is the interface that the offload
exposes
to the TCP ulp driver (nvme-tcp in our case)?
>
>> That would be great, but what does a "vendor independent without hooks"
>> look like from
>> your perspective? I'd love having this translate to standard (and some new)
>> socket operations,
>> but I could not find a way that this can be done given the current
>> architecture.
> Any amount of calls into NIC/offload drivers from NVMe is a nogo.
>
Not following you here...
*something* needs to program a buffer for DDP, *something* needs to
invalidate this buffer, *something* needs to declare a TCP stream as DDP
capable.
Unless I interpret what you're saying is that the interface needs to be
generalized to
extend the standard socket operations (i.e.
[s|g]etsockopt/recvmsg/cmsghdr etc) ?
More information about the Linux-nvme
mailing list