[PATCH v1 2/2] nvme-pci: Fix iommu map (via swiotlb) failures when PAGE_SIZE=64KB

Nicolin Chen nicolinc at nvidia.com
Thu Feb 15 17:07:30 PST 2024


On Thu, Feb 15, 2024 at 12:01:34PM +0000, Robin Murphy wrote:
> On 15/02/2024 4:46 am, Nicolin Chen wrote:
> > On Wed, Feb 14, 2024 at 06:36:38PM -0700, Keith Busch wrote:
> > > On Tue, Feb 13, 2024 at 10:09:19PM -0800, Nicolin Chen wrote:
> > > > On Tue, Feb 13, 2024 at 04:31:04PM -0700, Keith Busch wrote:
> > > > > On Tue, Feb 13, 2024 at 01:53:57PM -0800, Nicolin Chen wrote:
> > > > > > @@ -2967,7 +2967,7 @@ static struct nvme_dev *nvme_pci_alloc_dev(struct pci_dev *pdev,
> > > > > >                dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(48));
> > > > > >        else
> > > > > >                dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> > > > > > -     dma_set_min_align_mask(&pdev->dev, NVME_CTRL_PAGE_SIZE - 1);
> > > > > > +     dma_set_min_align_mask(&pdev->dev, PAGE_SIZE - 1);
> > > > > >        dma_set_max_seg_size(&pdev->dev, 0xffffffff);
> > > > > 
> > > > > I recall we had to do this for POWER because they have 64k pages, but
> > > > > page aligned addresses IOMMU map to 4k, so we needed to allow the lower
> > > > > dma alignment to efficiently use it.
> > > > 
> > > > Thanks for the input!
> > > > 
> > > > In that case, we might have to rely on iovad->granule from the
> > > > attached iommu_domain:
> > > 
> > > I explored a bit more, and there is some PPC weirdness that led to
> > > NVME_CTRL_PAGE_SIZE, but I don't find the dma min align mask used in
> > > that path. It looks like swiotlb is the only user for this, so your
> > > original patch may be just fine.
> > 
> > Oh, that'd be great if we can confirm it. And I think I forgot to add
> > a CC line for the stable trees: the two patches should apply cleanly
> > to older kernels too. Let's wait a few days so people can test and
> > review. Then I will respin a v2 with the CC line.
> 
> Hmm, as far as I understand, NVME_CTRL_PAGE_SIZE represents the
> alignment that NVMe actually cares about, so if specifying that per the
> intended purpose of the API doesn't work then it implies the DMA layer
> is still not doing its job properly, thus I'd rather keep digging and
> try to fix that properly.
>
> FWIW I have a strong suspicion that iommu-dma may not be correctly doing
> what it thinks it's trying to do, so I would definitely think it
> worthwhile to give that a really close inspection in light of Will's
> SWIOTLB fixes.

Yes. Let's figure out what's breaking Will's change.
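
As a point of reference, the min align mask asks that a bounce buffer
keep the low address bits selected by the mask. Below is a minimal
userspace sketch of that check, assuming a 4KB NVME_CTRL_PAGE_SIZE and a
64KB PAGE_SIZE. The constants, addresses and helper names are made up
for illustration and are not kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only, not taken from kernel headers. */
#define SKETCH_NVME_CTRL_PAGE_SIZE 4096ULL	/* assumed 4KB controller page */
#define SKETCH_CPU_PAGE_SIZE       65536ULL	/* assumed 64KB CPU PAGE_SIZE */

/*
 * Hypothetical helper: returns 1 if the bounce address keeps the same
 * low bits as the original address under the given alignment mask.
 */
int sketch_keeps_low_bits(uint64_t orig, uint64_t bounce, uint64_t mask)
{
	return (orig & mask) == (bounce & mask);
}

/* Hypothetical self-check using made-up addresses. */
void sketch_self_check(void)
{
	uint64_t orig = 0x12345600ULL;		/* made-up original address */
	uint64_t bounce = 0x80007600ULL;	/* made-up bounce slot address */

	/* Offset within a 4KB controller page is preserved (0x600). */
	assert(sketch_keeps_low_bits(orig, bounce,
				     SKETCH_NVME_CTRL_PAGE_SIZE - 1));

	/* Offset within a 64KB CPU page differs (0x5600 vs 0x7600). */
	assert(!sketch_keeps_low_bits(orig, bounce,
				      SKETCH_CPU_PAGE_SIZE - 1));
}
```

In this sketch the same bounce address satisfies the 4KB mask but not
the 64KB one, so a PAGE_SIZE - 1 mask is the stricter of the two masks
being compared in the patch.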

Thanks
Nicolin


