[PATCH] nvme-pci: fix swapped arguments in SGL DMA unmap path
Alireza Haghdoost
haghdoost at uber.com
Fri Apr 10 15:39:17 PDT 2026
On Fri, Apr 10, 2026 at 3:29 PM Alireza Haghdoost <haghdoost at uber.com> wrote:
>
> The arguments to nvme_free_sgls() in nvme_unmap_data() are swapped for
> the multi-entry SGL case. The first argument (sge) should be the
> segment descriptor from the NVMe command's data pointer (type
> NVME_SGL_FMT_LAST_SEG_DESC), and the second argument (sg_list) should
> be the pool-allocated array of data descriptors.
>
> With the arguments swapped, sge instead points at the first entry of
> the pool-allocated descriptor array (type NVME_SGL_FMT_DATA_DESC).
> nvme_free_sgls() sees a data descriptor, unmaps only that single
> entry, and returns, leaking the DMA mappings of every remaining
> descriptor in the array.
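>
> For context, the dispatch inside nvme_free_sgls() is roughly the
> following (paraphrased from my reading of pci.c, so treat it as a
> sketch rather than a verbatim copy):
>
>     static void nvme_free_sgls(struct request *req,
>                     struct nvme_sgl_desc *sge, struct nvme_sgl_desc *sg_list)
>     {
>             struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
>             struct device *dma_dev = nvmeq->dev->dev;
>             enum dma_data_direction dir = rq_dma_dir(req);
>             unsigned int len = le32_to_cpu(sge->length);
>             unsigned int i;
>
>             /* Inline data descriptor: one mapping, one unmap. */
>             if (sge->type == (NVME_SGL_FMT_DATA_DESC << 4)) {
>                     dma_unmap_page(dma_dev, le64_to_cpu(sge->addr), len, dir);
>                     return;
>             }
>
>             /* Last-segment descriptor: len covers the whole array. */
>             for (i = 0; i < len / sizeof(*sg_list); i++)
>                     dma_unmap_page(dma_dev, le64_to_cpu(sg_list[i].addr),
>                                     le32_to_cpu(sg_list[i].length), dir);
>     }
>
> With the arguments swapped, the type check takes the early-return
> branch and the per-descriptor loop never runs.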
>
> This manifests as unbounded iommu_iova slab growth on ARM64 systems
> with 64K pages and IOMMU DMA translation, where IOVA coalescing is
> disabled due to the NVMe 4K page / IOMMU 64K page granularity
> mismatch. On x86 and ARM64 with 4K pages, IOVA coalescing handles
> the unmap via dma_iova_destroy() and the buggy path is never reached.
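>
> (To make the mismatch concrete: getconf PAGESIZE reports 65536 on
> these hosts and the IOMMU granule is likewise 64K, while the driver's
> NVME_CTRL_PAGE_SIZE is hard-coded to 4096.)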
>
> Fixes: 7ce3c1dd78fc ("nvme-pci: convert the data mapping to blk_rq_dma_map")
> Signed-off-by: Alireza Haghdoost <haghdoost at uber.com>
> ---
> drivers/nvme/host/pci.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 28f638413e122..728999e4247d8 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -761,8 +761,8 @@ static void nvme_unmap_data(struct request *req)
>
> if (!blk_rq_dma_unmap(req, dma_dev, &iod->dma_state, iod->total_len)) {
> if (nvme_pci_cmd_use_sgl(&iod->cmd))
> - nvme_free_sgls(req, iod->descriptors[0],
> - &iod->cmd.common.dptr.sgl);
> + nvme_free_sgls(req, &iod->cmd.common.dptr.sgl,
> + iod->descriptors[0]);
> else
> nvme_free_prps(req);
> }
> --
> 2.39.5
Apologies, I wasn't aware that Roger Pau Monné had already submitted
this fix (commit a54afbc8a2138, "nvme-pci: DMA unmap the correct
regions in nvme_free_sgls"), which is already in 6.19.y. Please
disregard this patch.
For the record, we independently confirmed the bug on production ARM64
hosts (64K pages, IOMMU DMA-FQ) where it caused ~490 GiB of leaked
iommu_iova slab over 42 days. Setting sgl_threshold=0 stopped the leak
immediately.
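For anyone hitting the same symptom: the leak is visible as the
iommu_iova line in /proc/slabinfo (e.g. grep iommu_iova
/proc/slabinfo, or slabtop), and sgl_threshold is the standard nvme
module parameter, so nvme.sgl_threshold=0 on the kernel command line
applies the workaround. With SGLs disabled the driver falls back to
PRPs and never enters the broken unmap path.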
Alireza