[PATCH] nvme: uring_cmd specific request_queue for SGLs

Keith Busch kbusch at kernel.org
Mon Jun 30 07:04:47 PDT 2025


On Mon, Jun 30, 2025 at 08:00:16AM +0200, Christoph Hellwig wrote:
> On Fri, Jun 27, 2025 at 09:34:56AM -0600, Keith Busch wrote:
> > Back to this patch, I get that this uring_cmd path wouldn't be able to
> > use the more efficient coalesced mapping when the IOMMU is on, and
> > would instead map each segment individually. I think that's still
> > better than the alternative, though.
> 
> My back-of-the-envelope calculations (for 8 byte metadata chunks)
> suggest otherwise, but I never got around to fully benchmarking it.
> If you do have a representative workload that you care about, I'd love
> to see the numbers.

Metadata isn't actually the important part of this patch.

The workload just receives data from an io_uring zero-copy network and
writes it out to disk using uring_cmd. The incoming data can land at
various offsets, so the application often submits an iovec with page
gaps between segments.
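
Concretely, the submission looks something like this (a minimal sketch
assuming liburing and the generic char device; the /dev/ng0n1 path,
opcode, LBA, lengths, and 512-byte block size are illustration values,
and error handling is omitted):

#include <fcntl.h>
#include <liburing.h>
#include <linux/nvme_ioctl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct nvme_uring_cmd *cmd;
	struct iovec iov[2];
	void *buf0, *buf1;
	int fd, nsid;

	fd = open("/dev/ng0n1", O_RDWR);	/* nvme generic char dev */
	nsid = ioctl(fd, NVME_IOCTL_ID);

	posix_memalign(&buf0, 4096, 4096);
	posix_memalign(&buf1, 4096, 4096);

	/* iov[0] ends mid-page, so there's a "page gap" before iov[1] */
	iov[0] = (struct iovec){ .iov_base = buf0, .iov_len = 3072 };
	iov[1] = (struct iovec){ .iov_base = buf1, .iov_len = 4096 };

	/* nvme passthrough needs the big-SQE/big-CQE ring formats */
	io_uring_queue_init(8, &ring,
			    IORING_SETUP_SQE128 | IORING_SETUP_CQE32);

	sqe = io_uring_get_sqe(&ring);
	memset(sqe, 0, 2 * sizeof(*sqe));	/* SQE128 entry: 128 bytes */
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd = fd;
	sqe->cmd_op = NVME_URING_CMD_IO_VEC;	/* vectored passthrough */

	cmd = (struct nvme_uring_cmd *)sqe->cmd;
	cmd->opcode = 0x01;			/* nvme write */
	cmd->nsid = nsid;
	cmd->addr = (__u64)(uintptr_t)iov;
	cmd->data_len = 2;			/* iovec count for _VEC */
	cmd->cdw10 = 0;				/* starting LBA, low */
	cmd->cdw12 = (3072 + 4096) / 512 - 1;	/* 0's based block count */

	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);
	printf("write status: %d\n", cqe->res);
	io_uring_cqe_seen(&ring, cqe);
	return 0;
}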

Currently the kernel copies the data through a bounce buffer when there
are page gaps. That's obviously undesirable when the hardware is
capable of handling the original vector directly.
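
For reference, the bounce decision comes down to the queue's
virt_boundary check (see blk_rq_map_user_iov() and
iov_iter_gap_alignment()). A simplified standalone rendering of that
test, not the verbatim kernel code:

#include <stdbool.h>
#include <stdint.h>
#include <sys/uio.h>

/* nvme PRPs set virt_boundary_mask to the page mask (4k pages here) */
#define VIRT_BOUNDARY_MASK 4095ULL

static bool iov_has_gaps(const struct iovec *iov, int cnt)
{
	for (int i = 0; i < cnt; i++) {
		uintptr_t addr = (uintptr_t)iov[i].iov_base;

		/* middle segments must start on the boundary ... */
		if (i > 0 && (addr & VIRT_BOUNDARY_MASK))
			return true;
		/* ... and all but the last must end on it */
		if (i < cnt - 1 &&
		    ((addr + iov[i].iov_len) & VIRT_BOUNDARY_MASK))
			return true;
	}
	return false;	/* mappable without a bounce buffer */
}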

The options to avoid the copies are either:

  a. Force the application to split each iovec into a separate command

  b. Relax the kernel's limits to match the hardware's capabilities

This patch is trying to do "b".
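
To make "b" concrete, here's roughly the shape of the idea (a
hypothetical sketch, not the actual patch; the function name is made
up): give uring_cmd passthrough a request_queue whose limits drop the
PRP virt_boundary when the controller supports SGLs, so a gapped iovec
maps directly instead of bouncing.

#include <linux/blk-mq.h>
#include "nvme.h"	/* drivers/nvme/host/nvme.h */

static struct request_queue *nvme_alloc_uring_cmd_queue(struct nvme_ctrl *ctrl)
{
	struct queue_limits lim = { };

	/* PRP-only controllers still need the page-gap restriction */
	if (!nvme_ctrl_sgl_supported(ctrl))
		return ERR_PTR(-EOPNOTSUPP);

	/*
	 * No lim.virt_boundary_mask here: an SGL data block descriptor
	 * carries an arbitrary (address, length) pair per segment, so
	 * gaps between iovec entries are fine for the hardware.
	 */
	lim.max_hw_sectors = ctrl->max_hw_sectors;
	lim.max_segments = ctrl->max_segments;

	return blk_mq_alloc_queue(ctrl->tagset, &lim, NULL);
}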


