[PATCH] nvme: uring_cmd specific request_queue for SGLs

Christoph Hellwig hch at lst.de
Mon Jun 30 23:16:09 PDT 2025


On Mon, Jun 30, 2025 at 08:04:47AM -0600, Keith Busch wrote:
> > My back-of-the-envelope calculations (for 8-byte metadata chunks)
> > suggest otherwise, but I never got around to fully benchmarking it.
> > If you do have a representative workload that you care about, I'd love
> > to see the numbers.
> 
> Metadata isn't actually the important part of this patch.
> 
> The workload just receives data over io_uring's zero-copy network
> receive path and writes it out to disk using uring_cmd. The incoming
> data can have various offsets, so it often sends an iovec with page
> gaps.
> 
> Currently the kernel provides a bounce buffer when there are page gaps.
> That's obviously undesirable when the hardware is capable of handling
> the original vector directly.

Yes, the bounce buffer is obviously not very efficient when transferring
large amounts of data.

> The options to avoid the copies are either:
> 
>   a. Force the application to split each iovec into a separate command
> 
>   b. Relax the kernel's limits to match the hardware's capabilities
> 
> This patch is trying to do "b".

a, or a variant of that (not using passthrough), would in general be
my preference.  Why is that not suitable here?



More information about the Linux-nvme mailing list