[PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag
Nicolin Chen
nicolinc at nvidia.com
Mon Jun 1 13:41:26 PDT 2026
On Mon, Jun 01, 2026 at 09:32:31AM -0300, Jason Gunthorpe wrote:
> On Fri, May 29, 2026 at 06:27:40PM -0700, Nicolin Chen wrote:
> > On Tue, May 19, 2026 at 09:06:58AM -0300, Jason Gunthorpe wrote:
> > > On Mon, May 18, 2026 at 08:39:01PM -0700, Nicolin Chen wrote:
> > So I've tried INV_TYPE_ATS_BROKEN: during per-domain invalidation,
> > each batch is built from domain->invs so it can carry the "invs";
> > if the batch times out, we can immediately mutate its ATS entries.
> >
> > But I realized a limitation. E.g., if a device attaches to two SVA
> > domains on two SSIDs. An invalidation timing out on one of the SVA
> > domains could mark INV_TYPE_ATS_BROKEN in its own invs, but not in
> > the other SVA domain's invs?
>
> You'd have to mark all the S1's sharing the STE.

That would be a bit convoluted, as we would have to go through all
the other domains' invs arrays.

A master (that timed out an ATC_INV) might be attached to multiple
domains (RID, SVA1, SVA2, ...). Also, we currently don't have any
per-master reverse tracking of its attached domains (for now, a
master_domain is only added to the smmu_domain->devices list).

So, two things would be needed on top of what we currently have.

Firstly, we would need another per-master list tracking all the
attached smmu_domains. Maybe reuse master_domain? Let's call this
master->master_domains for now.

Secondly, locking. We have two paths that can trigger an ATC_INV
timeout: __arm_smmu_domain_inv_range(), which takes the read side
of the rwlock on the current smmu_domain->invs, and
arm_smmu_atc_inv_master(), which doesn't take any rwlock. When
these two paths walk through master->master_domains, we would need
to take a different rwlock for each of those domains. Also, the
__arm_smmu_domain_inv_range() path should skip the invs on the
current master_domain, as its rwlock is already held.
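
To make the two requirements concrete, here is a rough userspace
sketch, not driver code. Everything in it is an assumption made for
illustration: the struct layouts, the helper names, the INV_TYPE_*
values other than INV_TYPE_ATS_BROKEN, and the plain "locked" flag
standing in for the rwlock on each domain's invs. It only models the
reverse list (master->master_domains) and the rule that the walk skips
the domain whose lock the caller already holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INVS    8
#define MAX_DOMAINS 4

/* Stand-in entry types for this sketch only. */
enum inv_type { INV_TYPE_IOTLB, INV_TYPE_ATS, INV_TYPE_ATS_BROKEN };

struct master;

struct inv {
	enum inv_type type;
	const struct master *master;	/* owner of an ATS entry */
};

struct domain {
	bool locked;			/* stand-in for the rwlock on domain->invs */
	size_t num_invs;
	struct inv invs[MAX_INVS];
};

struct master {
	size_t num_domains;
	struct domain *master_domains[MAX_DOMAINS];	/* hypothetical reverse list */
};

/* Attach: add an ATS entry to the domain and record the reverse link. */
static bool master_attach(struct master *m, struct domain *d)
{
	if (d->num_invs == MAX_INVS || m->num_domains == MAX_DOMAINS)
		return false;
	d->invs[d->num_invs].type = INV_TYPE_ATS;
	d->invs[d->num_invs].master = m;
	d->num_invs++;
	m->master_domains[m->num_domains++] = d;
	return true;
}

/* Caller must hold d's lock. Returns the number of entries mutated. */
static size_t domain_mark_ats_broken(struct domain *d, const struct master *m)
{
	size_t i, n = 0;

	assert(d->locked);
	for (i = 0; i < d->num_invs; i++) {
		if (d->invs[i].type == INV_TYPE_ATS && d->invs[i].master == m) {
			d->invs[i].type = INV_TYPE_ATS_BROKEN;
			n++;
		}
	}
	return n;
}

/*
 * Walk every domain the master is attached to. "held" is the domain whose
 * lock the caller already holds (NULL when no lock is held); it is skipped
 * here and left to the caller.
 */
static size_t master_mark_ats_broken(struct master *m, struct domain *held)
{
	size_t i, n = 0;

	for (i = 0; i < m->num_domains; i++) {
		struct domain *d = m->master_domains[i];

		if (d == held)
			continue;
		assert(!d->locked);	/* a real lock would deadlock here */
		d->locked = true;
		n += domain_mark_ats_broken(d, m);
		d->locked = false;
	}
	return n;
}
```

A caller modelling the per-domain path would pass its own domain as
"held" and mutate that domain's entries itself; a caller modelling the
per-master path would pass NULL. The sketch says nothing about lock
ordering between domains, which a real implementation would still have
to settle.
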

What's your opinion on these?

Given all this complexity, I started to wonder whether we could have
implemented the invs as an RCU list rather than an RCU array: all
IOTLB tag nodes would still be allocated, to be added/deleted/read
locklessly; all ATS nodes would be fixed in the master structure, to
be added/deleted/read under the rwlock. Then, a timeout occurring on
either path could simply mutate the ATS entries on the master
directly, without going through the list of domains.

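
A rough userspace sketch of that list-based alternative, with the
caveat that all names and layouts are made up for illustration and a
plain singly linked list stands in for the RCU list (no RCU and no
rwlock are modelled). The point it shows is only that when the ATS
nodes are embedded in the master, marking them broken needs no walk
over domains, yet each domain's list sees the change.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ATTACH 4

/* Stand-in entry types for this sketch only. */
enum inv_type { INV_TYPE_IOTLB, INV_TYPE_ATS, INV_TYPE_ATS_BROKEN };

struct inv_node {
	enum inv_type type;
	struct inv_node *next;
};

struct domain {
	struct inv_node *head;		/* stand-in for the RCU list of invs */
};

struct master {
	size_t num_ats;
	struct inv_node ats[MAX_ATTACH];	/* one embedded node per attachment */
};

/* Link one of the master's embedded ATS nodes into the domain's list. */
static struct inv_node *master_attach(struct master *m, struct domain *d)
{
	struct inv_node *node;

	if (m->num_ats == MAX_ATTACH)
		return NULL;
	node = &m->ats[m->num_ats++];
	node->type = INV_TYPE_ATS;
	node->next = d->head;
	d->head = node;
	return node;
}

/* On a timeout, mutate the nodes on the master; no domain walk needed. */
static void master_mark_ats_broken(struct master *m)
{
	size_t i;

	assert(m->num_ats <= MAX_ATTACH);
	for (i = 0; i < m->num_ats; i++) {
		if (m->ats[i].type == INV_TYPE_ATS)
			m->ats[i].type = INV_TYPE_ATS_BROKEN;
	}
}

/* Count the entries of a given type that a domain's list currently sees. */
static size_t domain_count(const struct domain *d, enum inv_type type)
{
	const struct inv_node *node;
	size_t n = 0;

	for (node = d->head; node; node = node->next) {
		if (node->type == type)
			n++;
	}
	return n;
}
```

With one master attached to two SVA-like domains, a single
master_mark_ats_broken() call flips the entry seen by both lists.
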
> > So, it seems that master->ats_broken is still a cleaner solution?
>
> I don't want the invs code touching master, that is against the entire
> design.

I think I can understand the idea here: we want the invs design to
be in the common code, so anything that's driver-specific (smmu or
master) shouldn't be touched.

> Maybe a flag in the invs list itself is sufficient.

I think we would have to use INV_TYPE_ATS_BROKEN rather than a
per-invs flag: e.g., a nesting parent domain will have multiple ATS
devices, so it cannot use one flag on its big invs array to separate
the broken devices from all the other healthy devices.

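
As a small illustration of why the marker has to be per entry, here is
a userspace sketch. The struct layout, the helper names and the stream
ID values are hypothetical; only the INV_TYPE_ATS_BROKEN name comes
from this thread. One shared array holds ATS entries for several
devices, and only the entry of the device that timed out is excluded
from later ATC invalidations.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in entry types for this sketch only. */
enum inv_type { INV_TYPE_IOTLB, INV_TYPE_ATS, INV_TYPE_ATS_BROKEN };

struct inv {
	enum inv_type type;
	int sid;			/* hypothetical stream ID of the device */
};

/* Mark the ATS entries of one device as broken; returns how many changed. */
static size_t mark_ats_broken(struct inv *invs, size_t num, int sid)
{
	size_t i, n = 0;

	assert(invs);
	for (i = 0; i < num; i++) {
		if (invs[i].type == INV_TYPE_ATS && invs[i].sid == sid) {
			invs[i].type = INV_TYPE_ATS_BROKEN;
			n++;
		}
	}
	return n;
}

/*
 * Collect the stream IDs that would still receive an ATC invalidation.
 * Broken entries are skipped individually, so healthy devices sharing
 * the same array are unaffected.
 */
static size_t collect_atc_targets(const struct inv *invs, size_t num,
				  int *sids, size_t max)
{
	size_t i, n = 0;

	assert(invs && sids);
	for (i = 0; i < num && n < max; i++) {
		if (invs[i].type == INV_TYPE_ATS)
			sids[n++] = invs[i].sid;
	}
	return n;
}
```

A single flag on the whole array could only express "skip ATS for
everyone" or "skip for no one", which is the limitation described
above.
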
Nicolin