[PATCH v1 0/2] nvme-pci: Fix dma-iommu mapping failures when PAGE_SIZE=64KB

Will Deacon will at kernel.org
Thu Feb 15 06:22:09 PST 2024


On Wed, Feb 14, 2024 at 11:57:32AM -0800, Nicolin Chen wrote:
> On Wed, Feb 14, 2024 at 04:41:38PM +0000, Will Deacon wrote:
> > On Tue, Feb 13, 2024 at 01:53:55PM -0800, Nicolin Chen wrote:
> > > It's observed that an NVMe device causes timeouts when Ubuntu boots with
> > > a kernel configured with PAGE_SIZE=64KB, due to failures in swiotlb:
> > >     systemd[1]: Started Journal Service.
> > >  => nvme 0000:00:01.0: swiotlb buffer is full (sz: 327680 bytes), total 32768 (slots), used 32 (slots)
> > >     note: journal-offline[392] exited with irqs disabled
> > >     note: journal-offline[392] exited with preempt_count 1
> > >
> > > An NVMe device under a PCIe bus can sit behind an IOMMU, so DMA mappings
> > > going through dma-iommu might also be redirected to swiotlb allocations.
> > > Similar to dma_direct_max_mapping_size(), dma-iommu should implement its
> > > dma_map_ops->max_mapping_size to return swiotlb_max_mapping_size() too.
> > >
> > > Though an iommu_dma_max_mapping_size() is a must, it alone can't fix the
> > > issue. swiotlb_max_mapping_size() returns 252KB: the default 256KB pool
> > > segment minus the min_align_mask NVME_CTRL_PAGE_SIZE=4KB. However, with
> > > PAGE_SIZE=64KB, dma-iommu rounds a 252KB mapping up to 256KB for its
> > > "alloc_size", since iova->granule is typically set to PAGE_SIZE. This
> > > mismatch between NVME_CTRL_PAGE_SIZE=4KB and PAGE_SIZE=64KB results in
> > > a similar failure, though its signature has a fixed size of "256KB":
> > >     systemd[1]: Started Journal Service.
> > >  => nvme 0000:00:01.0: swiotlb buffer is full (sz: 262144 bytes), total 32768 (slots), used 128 (slots)
> > >     note: journal-offline[392] exited with irqs disabled
> > >     note: journal-offline[392] exited with preempt_count 1
> > >
> > > Both failures above occur with NVMe behind an IOMMU when PAGE_SIZE=64KB.
> > > They were likely introduced by the bounce-buffer security feature added in
> > > commit 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers").
> > >
> > > So this series bundles two fixes against that commit. They should be
> > > applied together to entirely fix the mapping failures.
> > 
> > It's a bit of a shot in the dark, but I've got a pending fix to some of
> > the alignment handling in swiotlb. It would be interesting to know if
> > patch 1 has any impact at all on your NVME allocations:
> > 
> > https://lore.kernel.org/r/20240205190127.20685-1-will@kernel.org
> 
> I applied these three patches locally for a test.

Thank you!

> Though I am building with a v6.6 kernel, I see some warnings:
>                  from kernel/dma/swiotlb.c:26:
> kernel/dma/swiotlb.c: In function ‘swiotlb_area_find_slots’:
> ./include/linux/minmax.h:21:35: warning: comparison of distinct pointer types lacks a cast
>    21 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
>       |                                   ^~
> ./include/linux/minmax.h:27:18: note: in expansion of macro ‘__typecheck’
>    27 |                 (__typecheck(x, y) && __no_side_effects(x, y))
>       |                  ^~~~~~~~~~~
> ./include/linux/minmax.h:37:31: note: in expansion of macro ‘__safe_cmp’
>    37 |         __builtin_choose_expr(__safe_cmp(x, y), \
>       |                               ^~~~~~~~~~
> ./include/linux/minmax.h:75:25: note: in expansion of macro ‘__careful_cmp’
>    75 | #define max(x, y)       __careful_cmp(x, y, >)
>       |                         ^~~~~~~~~~~~~
> kernel/dma/swiotlb.c:1007:26: note: in expansion of macro ‘max’
>  1007 |                 stride = max(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
>       |                          ^~~
> 
> Replacing with a max_t() can fix these.

Weird, I haven't seen that. I can fix it as you suggest, but please can
you also share your .config so I can look into it further?

> And it seems to get worse, as even a 64KB mapping is failing:
> [    0.239821] nvme 0000:00:01.0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
> 
> With a printk, I found the iotlb_align_mask isn't correct:
>    swiotlb_area_find_slots:alloc_align_mask 0xffff, iotlb_align_mask 0x800
> 
> But fixing the iotlb_align_mask to 0x7ff still fails the 64KB
> mapping...

Hmm. A mask of 0x7ff doesn't make a lot of sense given that the slabs
are 2KiB aligned. I'll try plugging in some of the constants you have
here, as something definitely isn't right...

Will
