[PATCH v2 3/7] iommu: Add iommu_report_device_broken() to quarantine a broken device
Shuai Xue
xueshuai at linux.alibaba.com
Wed Mar 18 04:45:19 PDT 2026
On 3/18/26 3:15 AM, Nicolin Chen wrote:
> When an IOMMU hardware detects an error due to a faulty device (e.g. an ATS
> invalidation timeout), IOMMU drivers may quarantine the device by disabling
> specific hardware features or dropping translation capabilities.
>
> However, the core-level states of the faulty device are out of sync, as the
> device can be still attached to a translation domain or even potentially be
> moved to a new domain that might overwrite the driver-level quarantine.
>
> Given that such an error can be likely an ISR, introduce a broken_work per
> iommu_group, and add a helper function to allow driver to report the broken
> device, so as to completely quarantine it in the core.
>
> Use the existing pci_dev_reset_iommu_prepare() function to shift the device
> to its resetting_domain/blocking_domain. A later pci_dev_reset_iommu_done()
> call will clear it and move it out of the quarantine.
>
> Signed-off-by: Nicolin Chen <nicolinc at nvidia.com>
> ---
> include/linux/iommu.h | 2 ++
> drivers/iommu/iommu.c | 59 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 61 insertions(+)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 9ba12b2164724..9b5f94e566ff9 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -891,6 +891,8 @@ static inline struct iommu_device *__iommu_get_iommu_dev(struct device *dev)
> #define iommu_get_iommu_dev(dev, type, member) \
> container_of(__iommu_get_iommu_dev(dev), type, member)
>
> +void iommu_report_device_broken(struct device *dev);
> +
This declaration sits inside the #ifdef CONFIG_IOMMU_API section, but
there is no corresponding stub in the #else block. The current
callers (arm-smmu-v3) always have CONFIG_IOMMU_API, but any caller
built without it would fail to compile, so please add a stub for API
completeness.
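
Something along these lines might do, as an untested sketch rather than
a finished patch. To keep it buildable on its own, struct device here is
a stand-in for the kernel type, and example_handle_fault() is a
hypothetical caller that only shows the call site compiling in both
configurations. In the real header, only the static inline stub would go
into the existing #else block.

```c
/* Stand-in for the kernel's struct device so this sketch builds on its own. */
struct device {
	const char *name;
};

/* Leave CONFIG_IOMMU_API undefined to exercise the stub path. */
#ifdef CONFIG_IOMMU_API
void iommu_report_device_broken(struct device *dev);
#else
/* No IOMMU core: there is nothing to quarantine, so do nothing. */
static inline void iommu_report_device_broken(struct device *dev)
{
	(void)dev;
}
#endif

/* Hypothetical caller: compiles unchanged under either configuration. */
static inline int example_handle_fault(struct device *dev)
{
	iommu_report_device_broken(dev);
	return 0;
}
```

With an empty inline stub, callers need no #ifdef of their own, and the
call should compile away when CONFIG_IOMMU_API is not set.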
Thanks.
Shuai