[PATCH v2 0/3] arm-smmu: select suitable IOVA

Shyam Saini shyamsaini at linux.microsoft.com
Tue May 27 13:54:28 PDT 2025


On Sun, May 25, 2025 at 04:07:03PM -0300, Jason Gunthorpe wrote:
> On Tue, May 20, 2025 at 03:42:24PM -0700, Shyam Saini wrote:
> > Hi Jason,
> > 
> > apologies for the delayed response.
> > 
> > > On Wed, Apr 16, 2025 at 11:04:27AM -0700, Jacob Pan wrote:
> > > 
> > > > Per last discussion "SMMU driver have a list of potential addresses and
> > > > select the first one that does not intersect with the non-working IOVA
> > > > ranges.". If we don't know what the "non-working IOVA" is, how do we
> > > > know it does not intersect the "potential addresses"?
> > > 
> > > I had understood from previous discussions that this platform is
> > > properly creating IOMMU_RESV_RESERVED regions for the IOVA that
> > > doesn't work. Otherwise everything is broken..
> > > 
> > > Presumably that happens through iommu_dma_get_resv_regions() calling
> > > of_iommu_get_resv_regions() on a DT platform. There is a schema
> > > describing how to do this, so platform firmware should be able to do it..
> > > 
> > > So the fix seems trivial enough to me:
> > > 
> > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > index b4c21aaed1266a..ebba18579151bc 100644
> > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > @@ -3562,17 +3562,29 @@ static int arm_smmu_of_xlate(struct device *dev,
> > >  static void arm_smmu_get_resv_regions(struct device *dev,
> > >  				      struct list_head *head)
> > >  {
> > > -	struct iommu_resv_region *region;
> > > -	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> > > -
> > > -	region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
> > > -					 prot, IOMMU_RESV_SW_MSI, GFP_KERNEL);
> > > -	if (!region)
> > > -		return;
> > > -
> > > -	list_add_tail(&region->list, head);
> > > +	static const u64 msi_bases[] = { MSI_IOVA_BASE, 0x12340000 };
> > >  
> > >  	iommu_dma_get_resv_regions(dev, head);
> > 
> > My understanding is that this hook is not called for all devices. For example,
> > the PCIe DTS node doesn't use the "iommus" property [1]; it uses "iommu-map"
> > instead. As a consequence, the while loop at [1] exits prematurely and
> > iommu_dma_get_resv_regions() is not called, so there is no IOVA reservation
> > for the PCIe device.
> 
> I can't really understand this sentence.
> 
> The above is the only place that creates an IOMMU_RESV_SW_MSI so it is
> definitely called and used, right? If not where does your
> IOMMU_RESV_SW_MSI come from?

Code tracing and printks in that code path suggest that iommu_dma_get_resv_regions()
is called by the vfio-pci driver. I didn't mention vfio-pci in my last reply since it
doesn't have an associated device tree node, sorry about that.

By enabling this dev_dbg message [1], I get:

vfio-pci 0000:01:00.2: device is behind an iommu

For the 0000:01:00.2 device, when it invokes iommu_dma_get_resv_regions(),
the code hits this path [2].

> 
> This function is also the only thing that computes the reserved ranges
> that iommu_get_resv_regions() returns.
> 
> As above, I've asked a few times now if your resv_regions() is
> correct, meaning there is a reserved range covering the address space
> that doesn't have working translation. That means
> iommu_get_resv_regions() returns such a range.

Sorry about missing that. I see the MSI IOVA being reserved:

cat /sys/kernel/iommu_groups/*/reserved_regions
0x0000000008000000 0x00000000080fffff msi
0x0000000008000000 0x00000000080fffff msi
0x0000000008000000 0x00000000080fffff msi
0x0000000008000000 0x00000000080fffff msi
[output trimmed]
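
For anyone who wants to check this output programmatically, here is a minimal user-space sketch that parses one line of reserved_regions and tests whether a region of a given type covers an address. It assumes the "<start> <end> <type>" layout shown above; struct resv_line, parse_resv_line() and resv_line_covers() are hypothetical names used only for this illustration, not kernel or library APIs.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One line of /sys/kernel/iommu_groups/<N>/reserved_regions (hypothetical type). */
struct resv_line {
	uint64_t start;  /* first reserved IOVA */
	uint64_t end;    /* last reserved IOVA, inclusive */
	char type[32];   /* region type string, e.g. "msi" or "reserved" */
};

/* Parse "<start> <end> <type>". Returns true only if all three fields were read. */
static bool parse_resv_line(const char *line, struct resv_line *out)
{
	assert(line != NULL && out != NULL);
	return sscanf(line, "%" SCNx64 " %" SCNx64 " %31s",
		      &out->start, &out->end, out->type) == 3;
}

/* True if the region has the given type and contains iova. */
static bool resv_line_covers(const struct resv_line *r, const char *type,
			     uint64_t iova)
{
	return strcmp(r->type, type) == 0 &&
	       iova >= r->start && iova <= r->end;
}
```

Feeding it the first line above gives start 0x8000000 and end 0x80fffff with type "msi". A range that is reserved because translation does not work there would be expected to show up with type "reserved" instead.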

> 
> If you don't have that then you have a bigger platform problem, IMHO,
> as vfio/iommufd only respect reserved ranges.
> 
> Otherwise, what is the issue you see, exactly? Did you even try it?
> 

Yes, I tried that.

This is what my DTS node looked like:
reserved-memory {
               faulty_iova: resv_faulty {
                       iommu-addresses = <&pcieX 0x8000000 0x100000>;
               };
               ..
               ..
};

&pcieX {
    memory-region = <&faulty_iova>;
};

I see it working for devices that call iommu_get_resv_regions(). For example, if I
specify faulty_iova for the DMA controller DTS node, I see an additional entry
in the related group, say Y: /sys/kernel/iommu_groups/Y/reserved_regions
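
To make sure we are talking about the same thing, this is how I read the selection logic proposed earlier in the thread: pick the first candidate MSI base whose window does not intersect any reserved range. The sketch below is plain user-space C, not the kernel patch. struct iova_range, ranges_overlap() and pick_msi_base() are hypothetical names for illustration, and 0x12340000 is just the placeholder value from the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A reserved IOVA range (hypothetical stand-in for struct iommu_resv_region). */
struct iova_range {
	uint64_t start;
	uint64_t length;
};

/* Half-open interval overlap test; assumes start + length does not overflow. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_len,
			   uint64_t b_start, uint64_t b_len)
{
	return a_start < b_start + b_len && b_start < a_start + a_len;
}

/*
 * Store in *base_out the first candidate whose [base, base + msi_len) window
 * intersects none of the reserved ranges. Returns false if every candidate
 * clashes with a reserved range.
 */
static bool pick_msi_base(const uint64_t *candidates, size_t n_candidates,
			  uint64_t msi_len,
			  const struct iova_range *resv, size_t n_resv,
			  uint64_t *base_out)
{
	assert(msi_len > 0);

	for (size_t i = 0; i < n_candidates; i++) {
		bool clash = false;

		for (size_t j = 0; j < n_resv; j++) {
			if (ranges_overlap(candidates[i], msi_len,
					   resv[j].start, resv[j].length)) {
				clash = true;
				break;
			}
		}
		if (!clash) {
			*base_out = candidates[i];
			return true;
		}
	}
	return false;
}
```

With candidates { 0x8000000, 0x12340000 }, a window length of 0x100000 and the faulty_iova range (0x8000000, 0x100000) present as a reserved range, this skips the first candidate and picks the second. If the reserved range never reaches the list for a device, the first candidate is picked, which is the case I am worried about for the PCIe device.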

Did I misunderstand something? I'd appreciate your help on this.

Thanks,
Shyam

[1] https://elixir.bootlin.com/linux/v6.15-rc7/source/drivers/of/device.c#L170
[2] https://elixir.bootlin.com/linux/v6.15-rc7/source/drivers/iommu/of_iommu.c#L145


