[PATCH v4 11/24] iommu: Add iommu_report_device_broken() to quarantine a broken device

Nicolin Chen nicolinc at nvidia.com
Tue May 19 17:21:36 PDT 2026


On Tue, May 19, 2026 at 08:02:04PM -0300, Jason Gunthorpe wrote:
> > OK. So you are suggesting a quarantine at the driver-level only:
> > 
> > 1. Driver detects ATC_INV timeout during an invalidation.
> > 2. Driver retries the commands to identify the master.
> 
> I might argue to push even this out to a followup series given it is
> complex and I suspect it becomes much simpler after the batch
> removal...

I see, you are suggesting to treat the entire batch as ATS-broken. Just
to confirm: without a per-SID retry, that might falsely block a healthy
device in the ATC batch, right? The driver now batches all ATC_INV
commands via arm_smmu_invs_end_batch().
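
To make the trade-off concrete, here is a toy user-space model of the
two policies. It is only a sketch, not driver code: every toy_* name,
the MAX_BATCH limit and the simulated timeout are assumptions made up
for illustration, and nothing here touches the real command queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BATCH 8	/* hypothetical cap on the toy batch size */

/* Hypothetical toy type: not one of the SMMUv3 driver's structures. */
struct toy_atc_cmd {
	uint32_t sid;		/* StreamID targeted by this ATC_INV */
	bool ats_hung;		/* simulated: the device never answers */
};

/* Simulated submit + sync: returns true if the sync would time out. */
static bool toy_sync_times_out(const struct toy_atc_cmd *cmds, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (cmds[i].ats_hung)
			return true;
	return false;
}

/* Policy A: no retry; every SID in a timed-out batch is marked broken. */
static size_t toy_quarantine_whole_batch(const struct toy_atc_cmd *cmds,
					 size_t n, uint32_t *broken)
{
	assert(n <= MAX_BATCH);
	if (!toy_sync_times_out(cmds, n))
		return 0;
	for (size_t i = 0; i < n; i++)
		broken[i] = cmds[i].sid;
	return n;
}

/* Policy B: reissue each command alone; mark only those that time out. */
static size_t toy_quarantine_per_sid(const struct toy_atc_cmd *cmds,
				     size_t n, uint32_t *broken)
{
	size_t nr = 0;

	assert(n <= MAX_BATCH);
	if (!toy_sync_times_out(cmds, n))
		return 0;
	for (size_t i = 0; i < n; i++)
		if (toy_sync_times_out(&cmds[i], 1))
			broken[nr++] = cmds[i].sid;
	return nr;
}
```

With a batch of three SIDs where only one is hung, policy A would
report all three as broken while policy B reports just the hung one,
at the cost of one extra sync per command in the batch.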

> > 3. Driver calls pci_disable_ats() and clears STE.EATS.
> > 4. Driver marks domain->invs ATS entries as BROKEN.
> >    (optional since pci_disable_ats() is done?)
> 
> We need to stop sending invs otherwise there will be trouble making
> forward progress.

OK. This needs a surgical invs mutation: maybe the INV_TYPE_ATS_BROKEN
that you suggested.

> > 5. Driver sets master->ats_broken to fence concurrent attach:
> >    arm_smmu_write_ste() and arm_smmu_ats_supported().
> 
> Not sure this is needed, if we race some attach then the attach will
> re-set EATS, get another timeout and clear EATS. Doesn't seem worth
> trying to optimize for.

I didn't see that coming. master->ats_enabled && state->ats_enabled
in the commit() for a concurrent attachment would issue an ATC
invalidation that may time out again and restart from step 1.

And since arm_smmu_atc_inv_master() doesn't use domain->invs, it is
not affected by INV_TYPE_ATS_BROKEN. So, ATC_INV can continue to be
issued in this case.
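
As a sanity check of that race, here is a toy user-space model. It is
a sketch under my own assumptions, not driver code: struct toy_master
and the toy_* helpers are hypothetical, and the "timeout" is just a
flag. It only shows that a racing attach ends up with EATS cleared
again, without any ats_broken fence.

```c
#include <stdbool.h>

/* Hypothetical toy state: not the driver's struct arm_smmu_master. */
struct toy_master {
	bool dev_hung;		/* simulated: device ignores ATC_INV */
	bool eats;		/* models STE.EATS being set */
	unsigned int timeouts;	/* ATC_INV timeouts observed so far */
};

/* Per-master ATC invalidation; on timeout, quarantine by clearing EATS. */
static bool toy_atc_inv_master(struct toy_master *m)
{
	if (!m->eats)
		return true;	/* ATS is off: nothing to invalidate */
	if (m->dev_hung) {
		m->timeouts++;
		m->eats = false;
		return false;
	}
	return true;
}

/* A racing attach re-sets EATS, then invalidates the ATC on commit. */
static bool toy_attach(struct toy_master *m)
{
	m->eats = true;
	return toy_atc_inv_master(m);
}
```

In this model each racing attach against a hung device costs one more
timeout, and EATS is cleared again afterwards, so the state converges
without extra bookkeeping.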

Ah, I feel that we are walking through a minefield where every single
step could go kaboom. But your insight clearly shows a safe path.

> > 6. Something external triggers an FLR (sysfs or AER).
> > 7. FLR goes through pci_dev_reset_iommu_prepare()/done(). done()
> >    reverts 3+4 and calls the reset_device_done callback clearing
> >    master->ats_broken (5).
> 
> It should restore core/driver/hw synchronization of EATS and the
> pci_enable_ats() by installing a blocking domain. Then it can go on to
> re-attach a translating domain and everything is back to correct.

Yeah. We could probably drop master->ats_broken, as done() seems to
be sufficient. I'll do the rework first and see if any corner case
turns up.

> We do need to push a pci error event (didn't see that in this series)
> so the driver can catch it and start the FLR process. I suppose that
> will still need to bounce through a workqueue, and once you have that
> it can also set the blocked domain prior to calling out to the driver.

In the specific case that I am trying to tackle with this series, I
do see AER error prints from the device already but there is no FLR
process. So, I assume that, even if we push a PCI error event, that
wouldn't necessarily trigger an FLR?

Thanks
Nicolin
