[PATCH v4 2/4] iommu/of: fix device tree configuration for PCI devices

Wed Sep 24 10:44:33 PDT 2025

On 2025-09-09 4:45 pm, Shyam Saini wrote:
> Individual PCI devices lack dedicated device tree nodes, but
> their IOMMU configuration (including reserved IOVA regions) is often
> defined at the PCI host controller level in the device tree. The
> "iommu-addresses" property in reserved-memory nodes specifies IOVA
> ranges that should be reserved for specific devices.
> 
> Currently, PCI devices cannot access these configurations because their
> dev->of_node is NULL, preventing of_iommu_get_resv_regions() from
> discovering reserved regions defined in the device tree.
> 
> There are at least three ways to reserve iommu-addresses for individual
> PCI devices:
>   - 1) By dynamically adding DT nodes for individual PCI devices using
>     CONFIG_PCI_DYNAMIC_OF_NODES [2]. This requires hardcoding PCI device
>     IDs in DECLARE_PCI_FIXUP_FINAL.
> 
>   - 2) By adding PCI device nodes either in the DTS or by modifying the
>     FDT at boot time in the firmware, e.g. [3]. However, the of_iommu
>     driver doesn't seem to handle individual PCI devices, and this
>     approach doesn't seem very scalable for complex PCI hierarchies.
> 
>   - 3) By configuring the PCI host controller DTS node for the PCI device
>     so that it can inherit the iommu-addresses defined in the parent node.
> 
> This commit addresses the problem using approach 3) by assigning the
> PCI host controller's device tree node to PCI devices during IOMMU
> configuration, enabling them to inherit the host controller's device
> tree properties. This allows PCI devices to properly discover and
> reserve IOVA regions specified in the device tree.
> 
> Signed-off-by: Shyam Saini <shyamsaini at linux.microsoft.com>
> ---
>   drivers/iommu/of_iommu.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index 6b989a62def20..077482917e3e8 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -145,6 +145,17 @@ int of_iommu_configure(struct device *dev, struct device_node *master_np,
>   		err = pci_for_each_dma_alias(to_pci_dev(dev),
>   					     of_pci_iommu_init, &info);
>   		of_pci_check_device_ats(dev, master_np);
> +
> +		/*
> +		 * For PCI devices, ensure the device's of_node points to the
> +		 * PCI host controller's device tree node so that reserved regions
> +		 * and other DT-specific IOMMU configuration can be found.
> +		 * PCI devices typically don't have individual DT nodes, but
> +		 * their configuration (including reserved regions) is defined
> +		 * at the PCI host controller level.
> +		 */
> +		if (!err && master_np && !dev->of_node)
> +			dev->of_node = of_node_get(master_np);

This is just wrong. Disregarding the fiddly aspects of node reuse that 
are completely ignored here, an endpoint device is not the host 
bridge/root complex device, so it is wildly inappropriate to associate 
one with the other's DT node and all its properties, resources, etc.

If it truly is the case that boot firmware has somehow "reserved" some 
small amount of *IOVA* address space for specific endpoints (but without 
any endpoint or SMMU configuration, given that those both get reset by 
VFIO?) then frankly it *should* populate the PCI hierarchy in DT so it 
can accurately and truthfully describe what it has done.
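
For illustration only, a minimal sketch of what such a description could 
look like, loosely following the "iommu-addresses" example in the 
reserved-memory binding. Every node name, label, address and size below 
is hypothetical, and it assumes a single endpoint at 01:00.0 behind a 
root port at 00:00.0:

	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		/* No "reg": this only carves out IOVA space for &ep0 */
		ep0_resv: iova-carveout {
			iommu-addresses = <&ep0 0x0 0xa0000000 0x0 0x00100000>;
		};
	};

	pcie@40000000 {
		/* compatible, reg, ranges, iommu-map etc. omitted */
		device_type = "pci";
		#address-cells = <3>;
		#size-cells = <2>;

		pci@0,0 {			/* root port 00:00.0 */
			device_type = "pci";
			reg = <0x0 0x0 0x0 0x0 0x0>;
			#address-cells = <3>;
			#size-cells = <2>;
			ranges;

			ep0: endpoint@0,0 {	/* endpoint 01:00.0 */
				reg = <0x10000 0x0 0x0 0x0 0x0>;
				memory-region = <&ep0_resv>;
			};
		};
	};

With nodes along these lines the endpoint would have a DT node of its 
own to hang "memory-region" off, rather than borrowing the host 
bridge's.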

On the other hand, if as I suspect it is simply the case that the host 
bridge has limited windows into system *physical* address space, like 
plenty of other systems do, then just like those other systems that 
should be described as standard "dma-ranges" instead of trying to wave 
silly hacks about.
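
As a rough sketch of that alternative (all names and numbers are 
invented, and the parent bus is assumed to use #address-cells = <2>), a 
host bridge that can only reach the 2GB of system memory starting at 
0x80000000 might be described as:

	pcie@40000000 {
		/* compatible, reg, ranges etc. omitted */
		device_type = "pci";
		#address-cells = <3>;
		#size-cells = <2>;
		/*            PCI address (3 cells)       CPU address     size (2GB) */
		dma-ranges = <0x02000000 0x0 0x80000000  0x0 0x80000000  0x0 0x80000000>;
	};

Here 0x02000000 in the first cell marks 32-bit PCI memory space, and 
the window maps PCI bus addresses 1:1 onto CPU physical addresses.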

Thanks,
Robin.

>   	} else {
>   		err = of_iommu_configure_device(master_np, dev, id);
>   	}