[PATCH v25 00/20] nvme-tcp receive offloads

David Laight David.Laight at ACULAB.COM
Sat Jun 15 14:34:52 PDT 2024


From: Christoph Hellwig
> Sent: 11 June 2024 07:42
> 
> On Mon, Jun 10, 2024 at 05:30:34PM +0300, Sagi Grimberg wrote:
> >> efficient header splitting in the NIC, either hard coded or even
> >> better downloadable using something like eBPF.
> >
> > From what I understand, this is what this offload is trying to do. It uses
> > the nvme command_id similar to how the read_stag is used in iwarp,
> > it tracks the NVMe/TCP pdus to split pdus from data transfers, and maps
> > the command_id to an internal MR for dma purposes.
> >
> > What I think you don't like about this is the interface that the offload
> > exposes
> > to the TCP ulp driver (nvme-tcp in our case)?
> 
> I don't see why a memory registration is needed at all.
> 
> The by far biggest painpoint when doing storage protocols (including
> file systems) over IP based storage is the data copy on the receive
> path because the payload is not aligned to a page boundary.

How much does the copy cost anyway?
If the hardware has merged the segments then it should be a single copy.
On x86 (does anyone care about anything else :-) 'rep mosvb' with a
cache-line aligned destination runs at 64 bytes/clock.
(The source alignment doesn't matter at all.)
I guess it loads the source data into the D-cache, the target is probably
required anyway - or you wouldn't be doing a read.

	David

> 
> So we need to figure out a way that is as stateless as possible that
> allows aligning the actual data payload on a page boundary in an
> otherwise normal IP receive path.

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)




More information about the Linux-nvme mailing list