[PATCH v2 0/3] arm-smmu: select suitable IOVA

Shyam Saini shyamsaini at linux.microsoft.com
Thu May 29 11:22:19 PDT 2025


On Tue, May 27, 2025 at 09:04:25PM -0300, Jason Gunthorpe wrote:
> On Tue, May 27, 2025 at 01:54:28PM -0700, Shyam Saini wrote:
> > > The above is the only place that creates a IOMMU_RESV_SW_MSI so it is
> > > definately called and used, right? If not where does your
> > > IOMMU_RESV_SW_MSI come from?
> > 
> > code tracing and printks in that code path suggests iommu_dma_get_resv_regions()
> > called by vfio-pci driver, 
> 
> Yes, I know it is, that is how it setups the SW_MSI.
> 
> > > As above, I've asked a few times now if your resv_regions() is
> > > correct, meaning there is a reserved range covering the address space
> > > that doesn't have working translation. That means
> > > iommu_get_resv_regions() returns such a range.
> > 
> > sorry about missing that, i see msi iova being reserved:
> > 
> > cat /sys/kernel/iommu_groups/*/reserved_regions
> > 0x0000000008000000 0x00000000080fffff msi
> > 0x0000000008000000 0x00000000080fffff msi
> > 0x0000000008000000 0x00000000080fffff msi
> > 0x0000000008000000 0x00000000080fffff msi
> > [output trimmed]
> 
> But this does not seem correct, you should have a "reserved" region
> covering 0x8000000 as well because you say your platform cannot do DMA
> to 0x8000000 and this is why you are doing all this.
> 
> All IOVA that the platform cannot DMA from should be reported in the
> reserved_regions file as "reserved". You must make your platform
> achieve this.

So should the range be reserved for all of the IOMMU groups?

                no_dma_mem {
                        reg = <0x0 0x8000000 0x0 0x100000>;
                        no-map;
                };
 
I think that's how we reserve memory in general (e.g. ramoops),
but this node doesn't show up in:
  /sys/kernel/iommu_groups/*/reserved_regions
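
My understanding (hedged, from reading of_iommu_get_resv_regions() in
drivers/iommu/of_iommu.c) is that a plain no-map node only carves the range
out of the kernel's physical memory map; the IOMMU layer only picks up
reserved-memory nodes that carry an iommu-addresses property and are
referenced from the consumer device via memory-region. A sketch combining
both (pcieX and faulty_iova are placeholder names, as in the earlier example):

```dts
reserved-memory {
        faulty_iova: resv_faulty {
                /* carve the range out of RAM ... */
                reg = <0x0 0x8000000 0x0 0x100000>;
                no-map;
                /* ... and also declare it unusable as IOVA for pcieX */
                iommu-addresses = <&pcieX 0x8000000 0x100000>;
        };
};

&pcieX {
        memory-region = <&faulty_iova>;
};
```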

 
> > Yes, i tried that,
> > 
> > This is how my dts node looked like
> > reserved-memory {
> >                faulty_iova: resv_faulty {
> >                        iommu-addresses = <&pcieX 0x8000000 0x100000>;
> >                };
> >                ..
> >                ..
> > }
> > 
> > &pcieX {
> >     memory-region = <&faulty_iova>;
> > };
> > 
> > I see it working for the devices which are calling
> > iommu_get_resv_regions(), eg if I specify faulty_iova for dma
> > controller dts node then i see an additional entry in the related
> > group
> 
> Exactly, it has to flow from the DT into the reserved_regions, that is
> essential.
 
> So what is the problem if you have figured out how to fix up
> /sys/kernel/iommu_groups/Y/reserved_regions?

Sorry, I haven't yet.
 
> If you found some cases where you can't get /sys/../reserved_regions
> to report the right things from the DT then that needs to be addressed
> first before you think about fixing SW_MSI.
> 
> I very vaguely recall we have some gaps on OF where the DMA-API code
> is understanding parts of the DT that don't get mapped into
> reserved_regions and nobody has cared to fix it because it only
> effects VFIO. You may have landed in the seat that has to fix it :)

I think this is the case we are dealing with?

> But I still don't have a clear sense of what your actual problem is as
> you are show DT that seems reasonable and saying that
> /sys/../reserved_regions is working..

/sys/../reserved_regions is working for certain devices like the DMA
controller, but it doesn't work for PCIe devices: the vfio-pci driver calls
iommu_get_resv_regions(), but there is no DT node for VFIO to pick the
reservation up from. I have confirmed this for PCIe on two different
platforms, so it looks like the OF DMA-API gap you hinted at above. Happy to
work on that :) and it would be great if you could share any earlier
discussions of that problem.
When I specify this for the DMA controller:

		faulty_iova: resv_faulty {
			iommu-addresses = <&dmaX 0x8000000 0x100000>;
		};
&dmaX {
	memory-region = <&faulty_iova>;
};

I see following:
$ cat /sys/kernel/iommu_groups/y/reserved_regions 
0x0000000008000000 0x00000000080fffff reserved
0x00000000a0000000 0x00000000a00fffff msi

Clarifying the Issue with MSI and SMMU Faults on Our Platform:

We are encountering SMMU faults when using our userspace tool/driver that
relies on MSI. Specifically, the issue arises when the MSI_IOVA_BASE is set
to the current default value of 0x08000000.

The observed fault is as follows:

arm-smmu 64000000.iommu: Unhandled context fault: fsr=0x402, iova=0x00000040,
fsynr=0x2f0013, cbfrsynra=0x102, cb=15

Upon investigation, our hardware team confirmed that the memory region
containing 0x08000000 is already mapped for other peripherals, making it
unavailable for MSI usage.

For example, using 0xa0000000 as MSI_IOVA_BASE solves our problem.

Let me know if you have any other questions.

Thanks,
Shyam
