[PATCH v4 11/24] iommu: Add iommu_report_device_broken() to quarantine a broken device

Wed May 20 00:20:25 PDT 2026

On Tue, May 19, 2026 at 09:30:23PM -0300, Jason Gunthorpe wrote:
> On Tue, May 19, 2026 at 05:21:36PM -0700, Nicolin Chen wrote:
> > On Tue, May 19, 2026 at 08:02:04PM -0300, Jason Gunthorpe wrote:
> > > > OK. So you are suggesting a quarantine at the driver-level only:
> > > > 
> > > > 1. Driver detects ATC_INV timeout during an invalidation.
> > > > 2. Driver retries the commands to identify the master.
> > > 
> > > I might argue to push even this out to a followup series given it is
> > > complex and I suspect it becomes much simpler after the batch
> > > removal...
> > 
> > I see you suggest to treat the entire batch as ATS-broken. Just to
> > confirm: without per-SID retry, that might falsely block a healthy
> > device in the ATC batch, right? The driver now batches all ATC_INV
> > commands via arm_smmu_invs_end_batch().
> 
> Yes, it is not good, but a giant complex series is not reviewable. So
> I'd start with trashing all the devices, then come with a narrowing.

I can take that path for now and leave a FIXME.

Another option is to not batch multiple devices, until we support
retry (which shouldn't be hard to add since we've already done the
coding)?
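
For illustration only, here is a small userspace sketch of the per-SID
retry idea discussed above: when a batched invalidation times out, each
SID is retried alone so that only the device that actually times out is
marked. Everything here (sim_hw, sim_atc_inv_one(), sim_atc_inv_batch(),
sim_narrow_batch()) is a hypothetical stand-in for this simulation, not
code from the series or from the SMMUv3 driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated hardware: a SID "times out" if it is listed in broken_sids. */
struct sim_hw {
	const uint32_t *broken_sids;
	size_t num_broken;
};

/* Hypothetical: one invalidation plus sync for one SID; false on timeout. */
static bool sim_atc_inv_one(const struct sim_hw *hw, uint32_t sid)
{
	for (size_t i = 0; i < hw->num_broken; i++)
		if (hw->broken_sids[i] == sid)
			return false;
	return true;
}

/* Hypothetical: a batch fails as a whole and does not say which SID failed. */
static bool sim_atc_inv_batch(const struct sim_hw *hw,
			      const uint32_t *sids, size_t n)
{
	bool ok = true;

	for (size_t i = 0; i < n; i++)
		if (!sim_atc_inv_one(hw, sids[i]))
			ok = false;
	return ok;
}

/*
 * On a batch timeout, retry each SID individually and record only the
 * ones that time out again. Returns the number of SIDs written to out.
 */
static size_t sim_narrow_batch(const struct sim_hw *hw,
			       const uint32_t *sids, size_t n,
			       uint32_t *out, size_t out_cap)
{
	size_t found = 0;

	assert(hw && sids && out);

	if (sim_atc_inv_batch(hw, sids, n))
		return 0;	/* no timeout, nothing to narrow */

	for (size_t i = 0; i < n; i++)
		if (!sim_atc_inv_one(hw, sids[i]) && found < out_cap)
			out[found++] = sids[i];
	return found;
}
```

Without the retry loop, the only safe answer to a failed batch is to
treat every SID in it as broken, which is the "trash all the devices"
first stage mentioned above.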

> > > > 5. Driver sets master->ats_broken to fence concurrent attach:
> > > >    arm_smmu_write_ste() and arm_smmu_ats_supported().
> > > 
> > > Not sure this is needed, if we race some attach then the attach will
> > > re-set EATS, get another timeout and clear EATS. Doesn't seem worth
> > > trying to optimize for.
> > 
> > I didn't see that coming. master->ats_enabled && state->ats_enabled
> > in the commit() for a concurrent attachment would issue an ATC_INV
> > that may time out again and restart step 1.
> > 
> > And since arm_smmu_atc_inv_master() doesn't use domain->invs, it is
> > not affected by INV_TYPE_ATS_BROKEN. So, ATC_INV can continue to be
> > issued in this case.
> > 
> > Ah, I feel that we are walking in a minefield where every single
> > step could be a kaboom. But your insight is clearly a safe pathway.
> 
> We cannot eliminate parallel ATS invalidation. Two threads could be
> concurrently processing the invs list. So it has to handle it; the
> driver is going to have to tolerate a number of redundant error events.

OK. That sounds like we still need a flag or locking so that at
least pci_disable_ats() would not be called again. I will see
what I can do.
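
As a rough sketch of the flag idea, assuming a one-shot flag is enough:
the first thread to report a timeout wins an atomic test-and-set and
performs the ATS disable, and every later redundant report becomes a
no-op. This is userspace C11 with hypothetical names (sim_master,
sim_disable_ats(), sim_report_atc_timeout()); in the kernel,
test_and_set_bit() or a lock would play the same role.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sim_master {
	atomic_bool ats_broken;	/* hypothetical one-shot quarantine flag */
	int ats_disable_calls;	/* counts calls to the simulated disable */
};

/* Hypothetical stand-in for the real ATS disable path. */
static void sim_disable_ats(struct sim_master *m)
{
	m->ats_disable_calls++;
}

/*
 * Called on every ATC invalidation timeout, possibly from several
 * threads. Returns true only for the caller that performed the disable.
 */
static bool sim_report_atc_timeout(struct sim_master *m)
{
	assert(m);

	/* atomic_exchange() returns the old value: true means "already done" */
	if (atomic_exchange(&m->ats_broken, true))
		return false;

	sim_disable_ats(m);
	return true;
}
```

Only the winner of the exchange touches ats_disable_calls, so the
redundant error events are tolerated without repeating the disable.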

> > > We do need to push a pci error event (didn't see that in this series)
> > > so the driver can catch it and start the FLR process. I suppose that
> > > will still need to bounce through a workqueue, and once you have that
> > > it can also set the blocked domain prior to calling out to the driver.
> > 
> > In the specific case that I am trying to tackle with this series, I
> > do see AER error prints from the device already but there is no FLR
> > process.
> 
> It depends on the driver, mlx5 has a FLR RAS flow for instance.

I assume a driver like that would trigger the FLR flow on its own?

> A driver with a device that can blow up ATS should implement the FLR
> flow if it wants automatic RAS. It requires driver co-ordination.

Or FLR via sysfs, which I have been doing...

> But I wasn't thinking we can rely on existing AER events here, yes
> probably there will be AERs associated with the device exploding so
> badly it cannot do ATS, but also maybe not..

So, should I put the AER injection on hold for future work? To be
honest, I am still not very clear on how AER injection could help
here; or is it for the case where an ATC invalidation times out while
the device isn't aware of any AER fault?

> This is also a problem if we shoot healthy devices as the first stage,
> there will not be an AER from healthy..
> 
> So I guess we need to decide which is better to tackle, the dedicated
> event or the single invalidation sequence..

I feel it is safer not to break healthy devices. Otherwise, would a
nesting parent invalidation falsely block all devices if one of
them times out?

Thanks
Nicolin