[PATCH] nvme: uring_cmd specific request_queue for SGLs

Christoph Hellwig hch at lst.de
Wed Jun 25 22:14:13 PDT 2025


On Wed, Jun 25, 2025 at 04:08:28PM -0600, Keith Busch wrote:
> If you send a readv/writev with a similar iovec to an O_DIRECT block
> device, then it will just get split on the gapped virt boundaries but it
> still uses it directly without bouncing. We can't split passthrough
> requests though, so it'd be preferable to use the iovec in a single
> command if the hardware supports it rather than bounce it.

True.
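
For readers following along, here is a minimal userspace sketch of the
gap rule being discussed. It is modelled on the block layer's virt
boundary check but is not the kernel code: the struct and function names
are made up for illustration, and the mask is assumed to be the
controller page size minus one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor, illustration only. */
struct seg {
	uint64_t addr;	/* start address of the segment */
	uint64_t len;	/* length in bytes */
};

/*
 * A gap exists if the next segment does not start on the boundary or
 * the previous segment does not end on it.
 */
static bool seg_gap_to_prev(const struct seg *prev, const struct seg *next,
			    uint64_t mask)
{
	return (next->addr & mask) || ((prev->addr + prev->len) & mask);
}

/* True if the whole vector can go out as one command without a gap. */
static bool segs_fit_virt_boundary(const struct seg *segs, size_t n,
				   uint64_t mask)
{
	for (size_t i = 1; i < n; i++)
		if (seg_gap_to_prev(&segs[i - 1], &segs[i], mask))
			return false;
	return true;
}
```

A vector that fails this check is what the block layer would split for
regular I/O, and what a passthrough request cannot split.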

> > Note that this directly conflicts with the new DMA API.  There we do
> > rely on the virt boundary to guarantee that the IOMMU path can always
> > coalesce the entire request into a single IOVA mapping.  We could still
> > do it for the direct mapping path, where it makes a difference, but
> > we really should do that everywhere, i.e. revisit the default
> > sgl_threshold and see if we could reduce it to 2 * PAGE_SIZE or so
> > so that we'd only use PRPs for the simple path where we can trivially
> > do the virt_boundary check right in NVMe.
> 
> Sure, that sounds okay if you mean 2 * NVME_CTRL_PAGE_SIZE.
> 
> It looks straightforward to add merging while we iterate for the direct
> mapping result if it returns mergeable IOVAs, but I think we'd have to
> commit to using SGL over PRP for everything but the simple case, and
> drop the PRP-imposed virt boundary. The downside might be we'd lose that
> iova pre-allocation optimization (dma_iova_try_alloc) you have going on,
> but I'm not sure how important that is. Could the direct mapping get too
> fragmented to consistently produce contiguous IOVAs in this path?

I can't really parse this.  Direct mapping means not using an IOMMU
mapping, either because there is none or because it is configured to
do an identity mapping.  In that case we'll never use the IOVA path.

If an IOMMU is configured for dynamic IOMMU mappings we never use the
direct mapping.  In that case, without the virt boundary, we'd have to
do one IOMMU mapping per segment, because the IOVA mapping path requires
(IOMMU) page alignment, and that will be very expensive.
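
To illustrate the cost, here is a rough userspace sketch that counts how
many separate IOVA ranges a segment list would need. It assumes that
segments can only share one range when they meet on an IOMMU granule
boundary. The names are hypothetical and this is not the DMA API, only a
model of the alignment rule described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical physical segment, illustration only. */
struct phys_seg {
	uint64_t addr;	/* start address of the segment */
	uint64_t len;	/* length in bytes */
};

/*
 * Count the IOVA ranges needed under the assumption above.
 * granule_mask is the IOMMU page size minus one.
 */
static size_t iova_ranges_needed(const struct phys_seg *segs, size_t n,
				 uint64_t granule_mask)
{
	size_t ranges = n ? 1 : 0;

	for (size_t i = 1; i < n; i++) {
		uint64_t prev_end = segs[i - 1].addr + segs[i - 1].len;

		/* Misaligned junction: this segment needs its own range. */
		if ((segs[i].addr & granule_mask) || (prev_end & granule_mask))
			ranges++;
	}
	return ranges;
}
```

With the virt boundary enforced the result is always 1.  Without it, a
badly aligned vector degrades to one range per segment.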


