[PATCH v4 11/24] iommu: Add iommu_report_device_broken() to quarantine a broken device
Jason Gunthorpe
jgg at nvidia.com
Thu May 21 06:12:48 PDT 2026
On Wed, May 20, 2026 at 11:13:14AM -0700, Nicolin Chen wrote:
> > > > We cannot eliminate parallel ATS invalidation. Two threads could be
> > > > concurrently processing the invs list. So it has to handle it, the driver
> > > > is going to have to tolerate a number of redundant error events.
> > >
> > > OK. That sounds like we still need a flag or locking so that at
> > > least pci_disable_ats() would not be called again. I will see
> > > what I can do.
> >
> > I think we can call pci_disable_ats() as many times as we want
>
> That triggers WARN_ON(!dev->ats_enabled) in pci_disable_ats :-(

IMHO I'd rather take that out than add a bunch of complication in the
iommu drivers..
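
To illustrate what dropping the warning would mean, here is a minimal
userspace sketch of an idempotent disable path. It is not the kernel's
pci_disable_ats(): struct fake_pci_dev, its fields, and fake_disable_ats()
are hypothetical stand-ins, and the counter only simulates the config-space
write.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ATS-related state of a PCI device. */
struct fake_pci_dev {
	atomic_bool ats_enabled;
	int ctrl_writes;	/* counts simulated control register writes */
};

/*
 * Idempotent disable: the first caller clears the flag and performs the
 * simulated register write. Later or concurrent callers see the flag
 * already clear and return quietly instead of warning.
 *
 * Returns true only for the call that actually disabled ATS.
 */
static bool fake_disable_ats(struct fake_pci_dev *dev)
{
	if (!atomic_exchange(&dev->ats_enabled, false))
		return false;	/* already disabled, nothing to do */

	dev->ctrl_writes++;	/* stands in for clearing the ATS enable bit */
	return true;
}
```

With this shape, two threads that both hit an ATS invalidation error can
each call the disable function; only one of them touches the (simulated)
hardware and neither trips a warning.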

> > Still, I'd feel better if it was definitive and we didn't rely on
> > this. This further shows that the driver has to merge multiple error
> > notifications if it gets some AERs and a new "ATC ERROR" all for the
> > same key event.
>
> I feel some race here... Part of the complexity of this v4 is to deal
> with concurrent device reset during the async report() between IOMMU
> core and driver. Now, we add AER that could compete on the device side
> as well...

There are always going to be concurrent events; so long as the resets
sequence in an orderly way, it doesn't matter if they overlap.

Most likely the driver will have locking that prevents it from pushing
concurrent resets.
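
As a rough sketch of that idea, the userspace model below merges redundant
error reports into a single reset. Everything in it is an assumption made
for illustration: struct fake_recovery, fake_report_error(), and
fake_reset_done() are hypothetical names, and an atomic flag used as a
try-lock stands in for whatever locking a real driver has.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device recovery state, not taken from any real driver. */
struct fake_recovery {
	atomic_bool reset_in_progress;
	atomic_int resets_started;
	atomic_int events_merged;
};

/*
 * Called for every error notification (an AER, an ATC error, ...).
 * Returns true if this caller should run the reset, false if a reset is
 * already in flight and this event was merged into it.
 */
static bool fake_report_error(struct fake_recovery *r)
{
	bool expected = false;

	/* Only one caller can flip the flag from false to true. */
	if (atomic_compare_exchange_strong(&r->reset_in_progress,
					   &expected, true)) {
		atomic_fetch_add(&r->resets_started, 1);
		return true;
	}

	atomic_fetch_add(&r->events_merged, 1);
	return false;
}

/* Called by the reset owner once the reset has completed. */
static void fake_reset_done(struct fake_recovery *r)
{
	atomic_store(&r->reset_in_progress, false);
}
```

Events that arrive while a reset is in flight are simply counted and
dropped here. Whether such an event should instead schedule a follow-up
reset is a policy choice this sketch does not try to settle.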
Jason