[PATCH v1 0/2] nvme-pci: Fix dma-iommu mapping failures when PAGE_SIZE=64KB

Will Deacon will at kernel.org
Wed Feb 14 08:41:38 PST 2024


Hi Nicolin,

On Tue, Feb 13, 2024 at 01:53:55PM -0800, Nicolin Chen wrote:
> It's observed that an NVME device is causing timeouts when Ubuntu boots
> with a kernel configured with PAGE_SIZE=64KB due to failures in swiotlb:
>     systemd[1]: Started Journal Service.
>  => nvme 0000:00:01.0: swiotlb buffer is full (sz: 327680 bytes), total 32768 (slots), used 32 (slots)
>     note: journal-offline[392] exited with irqs disabled
>     note: journal-offline[392] exited with preempt_count 1
> 
> An NVME device under a PCIe bus can be behind an IOMMU, so dma mappings
> going through dma-iommu might also be redirected to swiotlb allocations.
> Similar to dma_direct_max_mapping_size(), dma-iommu should implement its
> dma_map_ops->max_mapping_size to return swiotlb_max_mapping_size() too.
> 
> Though an iommu_dma_max_mapping_size() is a must, it alone can't fix the
> issue. swiotlb_max_mapping_size() returns 252KB, calculated as the
> default 256KB pool minus the min_align_mask of NVME_CTRL_PAGE_SIZE=4KB,
> while dma-iommu can round a 252KB mapping up to 256KB at its "alloc_size"
> when PAGE_SIZE=64KB via iova->granule, which is often set to PAGE_SIZE. So
> this mismatch between NVME_CTRL_PAGE_SIZE=4KB and PAGE_SIZE=64KB results
> in a similar failure, though its signature has a fixed size "256KB":
>     systemd[1]: Started Journal Service.
>  => nvme 0000:00:01.0: swiotlb buffer is full (sz: 262144 bytes), total 32768 (slots), used 128 (slots)
>     note: journal-offline[392] exited with irqs disabled
>     note: journal-offline[392] exited with preempt_count 1
> 
> Both failures above occur on an NVME device behind an IOMMU when
> PAGE_SIZE=64KB. They were likely introduced for the security feature by:
> commit 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers").
> 
> So, this series bundles two fixes together against that. They should be
> taken at the same time to entirely fix the mapping failures.

It's a bit of a shot in the dark, but I've got a pending fix to some of
the alignment handling in swiotlb. It would be interesting to know if
patch 1 has any impact at all on your NVME allocations:

https://lore.kernel.org/r/20240205190127.20685-1-will@kernel.org

Cheers,

Will


