[PATCH v2 3/7] iommu: Add iommu_report_device_broken() to quarantine a broken device

Tian, Kevin kevin.tian at intel.com
Wed Mar 18 19:35:33 PDT 2026


> From: Nicolin Chen <nicolinc at nvidia.com>
> Sent: Thursday, March 19, 2026 9:31 AM
> 
> On Wed, Mar 18, 2026 at 07:31:18AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc at nvidia.com>
> > > Sent: Wednesday, March 18, 2026 3:16 AM
> > >
> > > +static void iommu_group_broken_worker(struct work_struct *work)
> > > +{
> > > +	struct iommu_group *group =
> > > +		container_of(work, struct iommu_group, broken_work);
> > > +	struct pci_dev *pdev = NULL;
> > > +	struct device *dev;
> > > +
> > > +	scoped_guard(mutex, &group->mutex) {
> > > +		/* Do not block the device again if it has been recovered */
> > > +		if (!READ_ONCE(group->requires_reset))
> > > +			goto out_put;
> > > +		if (list_is_singular(&group->devices)) {
> > > +			/* Note: only support group with a single device */
> >
> > this series is about fixing a vulnerability. Then it sounds incomplete to
> > leave certain configuration still under risk. Probably we should first
> > ensure ATS can be enabled only in singleton group, just like how we
> > did for pci_enable_pasid()?
> 
> I understand your concern. But I am not very sure about applying
> limitation to ATS support. Would this block some existing cases?

what about just throwing out a message to warn that enabling ATS
in a non-singleton iommu group may suffer from unquarantined
situation if any device in the group triggers a ATC invalidation timeout?

> 
> > > +			dev = iommu_group_first_dev(group);
> > > +			if (dev_is_pci(dev)) {
> > > +				pdev = to_pci_dev(dev);
> > > +				pci_dev_get(pdev);
> > > +			}
> > > +		}
> > > +	}
> > > +
> > > +	if (pdev) {
> > > +		/*
> > > +		 * Quarantine the device completely. This will be cleared
> > > upon
> > > +		 * a pci_dev_reset_iommu_done() call indicating the recovery.
> > > +		 */
> > > +		pci_dev_lock(pdev);
> > > +		pci_dev_reset_iommu_prepare(pdev);
> >
> > let's rename it to iommu_quarantine_device() to be called here. then
> > have another wrapper pci_dev_reset_iommu_prepare() to call it too.
> >
> > this path has nothing to do with reset.
> 
> But the implementation of that iommu_quarantine_device would be
> still to shift to resetting_domain. Perhaps, we can rename that
> to quarantine_domain or so.
> 

yes let's rename that too. the purpose of this function is clearly about
quarantine. reset is just one user of it.



More information about the linux-arm-kernel mailing list