[PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag

Mon Jun 1 22:44:36 PDT 2026

On Mon, Jun 01, 2026 at 09:15:47PM -0300, Jason Gunthorpe wrote:
> On Mon, Jun 01, 2026 at 01:41:26PM -0700, Nicolin Chen wrote:
> > On Mon, Jun 01, 2026 at 09:32:31AM -0300, Jason Gunthorpe wrote:
> > > On Fri, May 29, 2026 at 06:27:40PM -0700, Nicolin Chen wrote:
> > > > On Tue, May 19, 2026 at 09:06:58AM -0300, Jason Gunthorpe wrote:
> > > > > On Mon, May 18, 2026 at 08:39:01PM -0700, Nicolin Chen wrote:
> > > > So I've tried INV_TYPE_ATS_BROKEN: during per-domain invalidation,
> > > > each batch is built from domain->invs so it can carry the "invs";
> > > > if the batch times out, we can immediately mutate its ATS entries.
> > > > 
> > > > But I realized a limitation. E.g., if a device attaches to two SVA
> > > > domains on two SSIDs. An invalidation timing out on one of the SVA
> > > > domains could mark INV_TYPE_ATS_BROKEN in its own invs, but not in
> > > > the other SVA domain's invs?
> > > 
> > > You'd have to mark all the S1's sharing the STE.
> > 
> > That would be a bit convoluted as we would have to go through all
> > other domains' invs arrays.
> 
> Ok, that is certainly an annoying problem.
> 
> I don't have a better idea than storing the master unfortunately
> 
> But I think the locking for that is going to be tricky, I'm not sure it does
> actually fully work..

Yes, there can be a race that sets STE.EATS back while per-master
flag is set, which would skip the ATC_INV in commit(), so no more
ATC_INV timeout that resets STE.EATS=0. To close it, we can force 
STE.EATS=0 at the end of commit() when state->ats_enabled and the
per-master flag are both set, which is only possible in a race.

However, after more thought, I found that there's a big trade-off
to pay in the invalidation path, if we go on this path. The invs
array only has SID, so it needs to convert it to master using the
rb_tree for every entry (holding streams_lock).

I asked AI to list and compare the three solutions that we have:

(1) Per-invs marker: INV_TYPE_ATS_BROKEN + master_domains
    disable_ats() in the timeout path walks master->master_domains
    and flips matching ATS invs entries to the BROKEN type.

  + invs walker is free (one case label in the existing type switch).
  + No lock or pointer deref in the invs walker.
  + No master pointer stored in invs; no lifetime concern.

  - disable_ats() walks every (master, domain) and marks each invs
    set; the list needs locking usable from atomic.
  - No master-level flag, so atc_inv_master loses the cheap early
    skip and attach has nothing to gate ats_enabled on. Concurrent
    quarantine self-heals only after the next ATC_INV times out (~1s),
    unless we also keep a master flag (i.e. a hybrid of (1) and (2)).

(2) Per-master flag + streams_lock
    invs walker resolves SID -> master via streams_lock and reads
    master->ats_broken.

  + Single source of truth on the master.
  + disable_ats() is one WRITE_ONCE.
  + atc_inv_master early-skips via one READ_ONCE.
  + attach gates ats_enabled on the flag; a concurrent quarantine
    race can be closed by a short post-attach re-check in commit()
  + No master pointer in invs; no lifetime concern.

  - invs walker pays streams_lock + rb_find(SID) per ATS entry on
    every invalidation. Measurable on ATS-heavy workloads.

(3) Per-master flag + inv->master pointer (v4)
    invs entry carries a master pointer; the invs walker reads
    cur->master->ats_broken directly.

  + invs walker is one READ_ONCE through a cached pointer.
  + disable_ats is one WRITE_ONCE.
  + atc_inv_master early-skip via one READ_ONCE.
  + attach gate + post-attach re-check, same as (2).

  - invs holds a master ptr, so release_device must synchronize_rcu()
    before freeing the master to drain walkers under rcu_read_lock().
    We dropped this from v4 for that reason.

Side-by-side:
                (1)      (2)      (3)
  invs walker   fastest  slow     fast
  disable_ats   slow     fast     fast
  atc_inv skip  slow     fast     fast
  attach race   slow     fast     fast
  master free   fast     fast     slow

FWIW, AI picks (3), but this would reintroduce synchronize_rcu()
that you dislike. My personal feeling is that (1) is the cleanest
shirt in the dirty laundry?

I'd like to hear your thought before finalizing the design.

Thanks
Nicolin