[PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag
Nicolin Chen
nicolinc at nvidia.com
Mon Jun 1 22:44:36 PDT 2026
On Mon, Jun 01, 2026 at 09:15:47PM -0300, Jason Gunthorpe wrote:
> On Mon, Jun 01, 2026 at 01:41:26PM -0700, Nicolin Chen wrote:
> > On Mon, Jun 01, 2026 at 09:32:31AM -0300, Jason Gunthorpe wrote:
> > > On Fri, May 29, 2026 at 06:27:40PM -0700, Nicolin Chen wrote:
> > > > On Tue, May 19, 2026 at 09:06:58AM -0300, Jason Gunthorpe wrote:
> > > > > On Mon, May 18, 2026 at 08:39:01PM -0700, Nicolin Chen wrote:
> > > > So I've tried INV_TYPE_ATS_BROKEN: during per-domain invalidation,
> > > > each batch is built from domain->invs so it can carry the "invs";
> > > > if the batch times out, we can immediately mutate its ATS entries.
> > > >
> > > > But I realized a limitation. E.g., if a device attaches to two SVA
> > > > domains on two SSIDs. An invalidation timing out on one of the SVA
> > > > domains could mark INV_TYPE_ATS_BROKEN in its own invs, but not in
> > > > the other SVA domain's invs?
> > >
> > > You'd have to mark all the S1's sharing the STE.
> >
> > That would be a bit convoluted as we would have to go through all
> > other domains' invs arrays.
>
> Ok, that is certainly an annoying problem.
>
> I don't have a better idea than storing the master unfortunately
>
> But I think the locking for that is going to be tricky, I'm not sure it does
> actually fully work..
Yes, there can be a race that sets STE.EATS back while per-master
flag is set, which would skip the ATC_INV in commit(), so no more
ATC_INV timeout that resets STE.EATS=0. To close it, we can force
STE.EATS=0 at the end of commit() when state->ats_enabled and the
per-master flag are both set, which is only possible in a race.
However, after more thought, I found that there's a big trade-off
to pay in the invalidation path, if we go on this path. The invs
array only has SID, so it needs to convert it to master using the
rb_tree for every entry (holding streams_lock).
I asked AI to list and compare the three solutions that we have:
(1) Per-invs marker: INV_TYPE_ATS_BROKEN + master_domains
disable_ats() in the timeout path walks master->master_domains
and flips matching ATS invs entries to the BROKEN type.
+ invs walker is free (one case label in the existing type switch).
+ No lock or pointer deref in the invs walker.
+ No master pointer stored in invs; no lifetime concern.
- disable_ats() walks every (master, domain) and marks each invs
set; the list needs locking usable from atomic.
- No master-level flag, so atc_inv_master loses the cheap early
skip and attach has nothing to gate ats_enabled on. Concurrent
quarantine self-heals only after the next ATC_INV times out (~1s),
unless we also keep a master flag (i.e. a hybrid of (1) and (2)).
(2) Per-master flag + streams_lock
invs walker resolves SID -> master via streams_lock and reads
master->ats_broken.
+ Single source of truth on the master.
+ disable_ats() is one WRITE_ONCE.
+ atc_inv_master early-skips via one READ_ONCE.
+ attach gates ats_enabled on the flag; a concurrent quarantine
race can be closed by a short post-attach re-check in commit()
+ No master pointer in invs; no lifetime concern.
- invs walker pays streams_lock + rb_find(SID) per ATS entry on
every invalidation. Measurable on ATS-heavy workloads.
(3) Per-master flag + inv->master pointer (v4)
invs entry carries a master pointer; the invs walker reads
cur->master->ats_broken directly.
+ invs walker is one READ_ONCE through a cached pointer.
+ disable_ats is one WRITE_ONCE.
+ atc_inv_master early-skip via one READ_ONCE.
+ attach gate + post-attach re-check, same as (2).
- invs holds a master ptr, so release_device must synchronize_rcu()
before freeing the master to drain walkers under rcu_read_lock().
We dropped this from v4 for that reason.
Side-by-side:
(1) (2) (3)
invs walker fastest slow fast
disable_ats slow fast fast
atc_inv skip slow fast fast
attach race slow fast fast
master free fast fast slow
FWIW, AI picks (3), but this would reintroduce synchronize_rcu()
that you dislike. My personal feeling is that (1) is the cleanest
shirt in the dirty laundry?
I'd like to hear your thought before finalizing the design.
Thanks
Nicolin
More information about the linux-arm-kernel
mailing list