[PATCH 7/9] nvme-pci: convert the data mapping to blk_rq_dma_map
Daniel Gomez
da.gomez at kernel.org
Wed Jun 11 07:13:22 PDT 2025
On 10/06/2025 07.06, Christoph Hellwig wrote:
> Use the blk_rq_dma_map API to DMA map requests instead of scatterlists.
> This removes the need to allocate a scatterlist covering every segment,
> and thus the overall transfer length limit based on the scatterlist
> allocation.
>
> Instead the DMA mapping is done by iterating the bio_vec chain in the
> request directly. The unmap is handled differently depending on how
> we mapped:
>
> - when using an IOMMU only a single IOVA is used, and it is stored in
> iova_state
> - for direct mappings that don't use swiotlb and are cache coherent no
> unmap is needed at all
> - for direct mappings that are not cache coherent or use swiotlb, the
> physical addresses are rebuilt from the PRPs or SGL segments
>
> The latter unfortunately adds a fair amount of code to the driver, but
> it is code not used in the fast path.
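
To make the three unmap cases above concrete, here is a minimal userspace
sketch of the decision as the commit message describes it. It is not the
driver code: struct map_info, enum unmap_kind and pick_unmap() are
hypothetical names invented for illustration, and the real driver derives
this information from the DMA API rather than from plain flags.

```c
#include <stdbool.h>

/* Hypothetical labels for the three unmap strategies in the commit message. */
enum unmap_kind {
	UNMAP_IOVA,	/* IOMMU: a single IOVA range, tracked in iova_state */
	UNMAP_NONE,	/* direct, cache coherent, no swiotlb: nothing to undo */
	UNMAP_REBUILD,	/* direct, non-coherent or swiotlb: walk PRPs/SGLs */
};

/* Hypothetical summary of how a request was mapped. */
struct map_info {
	bool uses_iommu;
	bool uses_swiotlb;
	bool cache_coherent;
};

/* Pick the unmap strategy following the three bullet points above. */
static enum unmap_kind pick_unmap(const struct map_info *m)
{
	if (m->uses_iommu)
		return UNMAP_IOVA;
	if (!m->uses_swiotlb && m->cache_coherent)
		return UNMAP_NONE;
	/* Only this case needs the physical addresses rebuilt. */
	return UNMAP_REBUILD;
}
```

Only the last branch corresponds to the extra code the commit message
mentions, which matches the claim that it stays out of the fast path.
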
>
> The conversion only covers the data mapping path, and still uses a
> scatterlist for the multi-segment metadata case. I plan to convert that
> as soon as we have good test coverage for the multi-segment metadata
> path.
>
> Thanks to Chaitanya Kulkarni for an initial attempt at a new DMA API
> conversion for nvme-pci, Kanchan Joshi for bringing back the single
> segment optimization, Leon Romanovsky for shepherding this through a
> gazillion rebases and Nitesh Shetty for various improvements.
>
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
> drivers/nvme/host/pci.c | 388 +++++++++++++++++++++++++---------------
> 1 file changed, 242 insertions(+), 146 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 04461efb6d27..2d3573293d0c 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
...
> @@ -2908,26 +3018,14 @@ static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown)
> static int nvme_pci_alloc_iod_mempool(struct nvme_dev *dev)

Since this function now allocates only the metadata mempool, it makes sense to
rename it accordingly:

static int nvme_pci_alloc_iod_meta_mempool(struct nvme_dev *dev)

> {
> size_t meta_size = sizeof(struct scatterlist) * (NVME_MAX_META_SEGS + 1);
> - size_t alloc_size = sizeof(struct scatterlist) * NVME_MAX_SEGS;
> -
> - dev->iod_mempool = mempool_create_node(1,
> - mempool_kmalloc, mempool_kfree,
> - (void *)alloc_size, GFP_KERNEL,
> - dev_to_node(dev->dev));
> - if (!dev->iod_mempool)
> - return -ENOMEM;
>
> dev->iod_meta_mempool = mempool_create_node(1,
> mempool_kmalloc, mempool_kfree,
> (void *)meta_size, GFP_KERNEL,
> dev_to_node(dev->dev));
> if (!dev->iod_meta_mempool)
> - goto free;
> -
> + return -ENOMEM;
> return 0;
> -free:
> - mempool_destroy(dev->iod_mempool);
> - return -ENOMEM;
> }
>
> static void nvme_free_tagset(struct nvme_dev *dev)