[PATCH v25 00/20] nvme-tcp receive offloads

Sagi Grimberg sagi at grimberg.me
Tue Jun 11 04:01:32 PDT 2024



On 11/06/2024 9:41, Christoph Hellwig wrote:
> On Mon, Jun 10, 2024 at 05:30:34PM +0300, Sagi Grimberg wrote:
>>> efficient header splitting in the NIC, either hard coded or even
>>> better downloadable using something like eBPF.
>> From what I understand, this is what this offload is trying to do. It uses
>> the nvme command_id similar to how the read_stag is used in iwarp:
>> it tracks the NVMe/TCP pdus to split pdu headers from the data payload,
>> and maps the command_id to an internal MR for dma purposes.
>>
>> What I think you don't like about this is the interface that the offload
>> exposes to the TCP ulp driver (nvme-tcp in our case)?
> I don't see why a memory registration is needed at all.

I don't see how you can do it without memory registration.
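
To make this concrete, here is a rough host-side sketch. None of these
names come from the actual patches; it is only a toy model of the idea
that the ulp ties the command_id to the destination buffer *before* the
read goes out, so the NIC has a dma target before any payload bytes
arrive:

/* Toy model, all names invented (not the interface from the series):
 * per-command registration keyed by the nvme command_id, i.e. the
 * "tag" the device will later see on the wire. */
#include <stdint.h>
#include <stddef.h>

#define MAX_CMDS 1024

struct ddp_reg {                        /* hypothetical per-command registration */
        void    *buf;                   /* destination buffer (e.g. O_DIRECT pages) */
        size_t   len;
        int      in_use;
};

static struct ddp_reg reg_table[MAX_CMDS];      /* indexed by command_id */

/* Called by the ulp before the read command capsule is sent. */
static int ddp_setup(uint16_t command_id, void *buf, size_t len)
{
        if (command_id >= MAX_CMDS || reg_table[command_id].in_use)
                return -1;              /* tag already in flight */
        reg_table[command_id] = (struct ddp_reg){ buf, len, 1 };
        return 0;
}

/* Called by the ulp when the command completes. */
static void ddp_teardown(uint16_t command_id)
{
        if (command_id < MAX_CMDS)
                reg_table[command_id].in_use = 0;
}

Without something like ddp_setup() there is nothing that relates the
bytes on the wire to a host buffer at the time they arrive, and that
registration step is what I don't see how to avoid.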

>
> By far the biggest pain point when doing storage protocols (including
> file systems) over IP based storage is the data copy on the receive
> path, because the payload is not aligned to a page boundary.
>
> So we need to figure out a way, as stateless as possible, that allows
> aligning the actual data payload on a page boundary in an otherwise
> normal IP receive path.

But the device gets the payload from the network and needs a buffer to
dma it to. In order to dma to the "correct" buffer it needs some sort of
pre-registration, expressed with a tag that the device can infer by some
sort of stream inspection. The socket recv call from the ulp only
happens at a later stage.
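
To illustrate what that stream inspection buys the device, here is the
device side of the same toy model (again invented names and a
deliberately simplified C2HData layout, with dma modeled as a memcpy):
the NIC parses the pdu header out of the TCP byte stream, pulls out the
command_id, and only then knows which pre-registered buffer the payload
may be placed in; anything it cannot match falls back to the normal
receive path:

/* Toy model of the device side, names invented. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct c2h_data_hdr {                   /* simplified NVMe/TCP C2HData header */
        uint8_t  pdu_type;              /* 0x7 for C2HData */
        uint8_t  flags;
        uint8_t  hlen;
        uint8_t  pdo;                   /* payload data offset */
        uint32_t plen;                  /* total pdu length */
        uint16_t command_id;            /* the tag the device keys on */
        uint16_t rsvd;
        uint32_t data_offset;           /* offset into the host buffer */
        uint32_t data_length;
};

/* Stand-in for the lookup into the per-command table from the previous
 * sketch, stubbed out so this fragment compiles on its own. */
static uint8_t *ddp_lookup(uint16_t command_id, size_t *len)
{
        (void)command_id;
        *len = 0;
        return NULL;                    /* nothing registered in this stub */
}

/* Returns 0 if the payload was placed directly, -1 to fall back to the
 * normal (copying) receive path. */
static int handle_c2h_data(const struct c2h_data_hdr *hdr, const void *payload)
{
        size_t reg_len;
        uint8_t *dst = ddp_lookup(hdr->command_id, &reg_len);

        if (!dst || (size_t)hdr->data_offset + hdr->data_length > reg_len)
                return -1;              /* unknown tag or out of bounds */

        /* Payload lands straight in the registered buffer at the right
         * offset; only the pdu header goes up the regular receive path. */
        memcpy(dst + hdr->data_offset, payload, hdr->data_length);
        return 0;
}

The split itself is the header splitting you are asking for; the part I
don't see how to make stateless is the command_id -> buffer mapping that
ddp_lookup() stands in for above.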

I am also not sure I understand how the alignment assurance helps the
NIC dma the payload from the network to the "correct" buffer
(i.e. userspace doing an O_DIRECT read).


