[PATCH v8 07/12] iommu/arm-smmu-v3: Add CMDQ_PROD_STOP_FLAG to gate CMDQ submissions
Daniel Mentz
danielmentz at google.com
Wed Jun 10 10:37:46 PDT 2026
On Tue, Jun 9, 2026 at 11:58 AM Pranjal Shrivastava <praan at google.com> wrote:
>
> On Tue, Jun 09, 2026 at 11:20:52AM -0700, Daniel Mentz wrote:
> > On Tue, Jun 9, 2026 at 3:05 AM Pranjal Shrivastava <praan at google.com> wrote:
> > > >
> > > > > Even if the worker CPU reorders the PTE write after the STOP_FLAG check,
> > > > > it is benign because the SMMU is incapable of fetching that (or any) PTE
> > > > > while the gate is closed. Because GATE_CLOSED == SMMUEN = 0, implying no
> > > > > access to any HW structures whatsoever.
> > > > >
> > > > > The real synchronization happens in the Resume Path:
> > > > >
> > > > > 1. arm_smmu_device_reset() clears all caches / TLBs.
> > > > > (None of these can have entries before SMMUEN=1)
> > > > >
> > > > > 2. We execute a full smp_mb() before setting SMMUEN=1. (that's why we
> > > > > need smp_mb before SMMUEN=1). This barrier ensures that any PTE
> > > > > writes made by any thread, including those that were elided while the
> > > > > gate was closed, are globally visible before the SMMU hardware starts
> > > > > fetching into TLBs again. (This is why Jason suggested this in v6 [1])
> > > >
> > > > A barrier on one CPU has no bearing on whether writes by any other CPU
> > > > can be observed by any particular agent in the system.
> > > >
> > > > Let's compare this with the long comment in
> > > > arm_smmu_domain_inv_range() which is what I believe Jason was
> > > > referring to. In that example, you see smp_mb() in the code path on
> > > > CPU0 and dma_wmb() in the code path on CPU1. Hence, barriers exist on
> > > > both sides. If you compare the runtime PM design with
> > > > arm_smmu_domain_inv_range(), then smp_mb() belongs in the CPU thread
> > > > that performs the translation table updates not the one that performs
> > > > the suspend/resume operation.
> > > >
> > >
> > > I might be missing something here, so please bear with me. My
> > > understanding is that it's needed because the IOMMU is live & actively
> > > caching, which is not true for our case.
> >
> > I think the "invs" design (Per-domain invalidation array) is more
> > similar than you think! An SMMU being absent from invs is equivalent
> > to the STOP flag, and the STE pointing to TTB0 is roughly the
> > equivalent of SMMUEN=1, i.e. the IOMMU is not actively caching a
> > particular translation domain until an STE (or CD) points to it.
> >
> > > [Assuming we use non-relaxed semantics & ordering for the STOP flag,
> > > i.e. set STOP_FLAG + barrier & clear STOP_FLAG (implicit dma_wmb())]
> > >
> > > In our case, during the resume op, we first clear the STOP_FLAG before
> > > setting SMMUEN=1 in program order. Thus, any PTE invalidations occurring
> > > before SMMUEN=1 are executed, i.e. EVEN when the SMMU is guaranteed not
> > > to access any structures, we've resumed invalidations.
> >
> > "[...] we first clear the STOP_FLAG before setting SMMUEN=1 in program
> > order." I think this should be modified to "we first clear the
> > STOP_FLAG and ensure that the cleared STOP_FLAG is observable by all
> > other CPUs before setting SMMUEN=1"
> >
>
> Ack. The goal was to explain the algorithm for this thread, I won't be
> commenting it in code. Are you suggesting I should convert my
> explanation of the algorithm above into in-line comments and make sure
> to include the STOP_FLAG observability part?

I believe there would be a benefit to having a comment in the code
that states the requirements for the STOP_FLAG observability.
Something along the lines of either of the following:

* SMMU must be disabled any time another CPU can observe the STOP_FLAG
* Other CPUs must (a) observe the STOP flag only after the SMMU is
  disabled, and (b) observe the cleared STOP flag before the SMMU is
  re-enabled.

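
To make the ordering requirement concrete, here is a minimal user-space sketch using C11 atomics. It is not driver code: struct smmu_model and the model_*() helpers are hypothetical names invented for this illustration, the seq_cst fences merely stand in for smp_mb(), and the counters are plain ints because the model is meant to be stepped through sequentially rather than run on several threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the arm-smmu-v3 driver. */
struct smmu_model {
	atomic_bool stop_flag;	/* stands in for CMDQ_PROD_STOP_FLAG */
	atomic_bool smmuen;	/* stands in for SMMUEN */
	atomic_int pte;		/* a single translation table entry */
	int issued;		/* invalidations actually submitted */
	int deficit;		/* invalidations elided since the last invalidate-all */
};

static void model_init(struct smmu_model *s)
{
	atomic_init(&s->stop_flag, false);
	atomic_init(&s->smmuen, true);
	atomic_init(&s->pte, 0);
	s->issued = 0;
	s->deficit = 0;
}

/* Suspend: disable the SMMU first, only then publish the STOP flag. */
static void model_suspend(struct smmu_model *s)
{
	atomic_store_explicit(&s->smmuen, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() stand-in */
	atomic_store_explicit(&s->stop_flag, true, memory_order_relaxed);
}

/* Resume: clear the STOP flag, invalidate everything, then re-enable. */
static void model_resume(struct smmu_model *s)
{
	atomic_store_explicit(&s->stop_flag, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* flag visible before enable */
	s->deficit = 0;					/* models the invalidate-all */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&s->smmuen, true, memory_order_relaxed);
}

/* Unmap: update the PTE, full barrier, and only then check the STOP flag. */
static bool model_unmap(struct smmu_model *s, int new_pte)
{
	atomic_store_explicit(&s->pte, new_pte, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() stand-in */
	if (atomic_load_explicit(&s->stop_flag, memory_order_relaxed)) {
		/* Invariant check; only meaningful in this sequential model. */
		assert(!atomic_load(&s->smmuen));
		s->deficit++;		/* invalidation elided */
		return false;
	}
	s->issued++;			/* invalidation submitted */
	return true;
}
```

Stepping through init, unmap, suspend, unmap, resume, unmap in that order, the second unmap is elided with the SMMU disabled, and the deficit is back to zero before SMMUEN is set again.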
> >
> > I would define a set of invariants:
> >
> > * If an agent observes the STOP flag, it is guaranteed that SMMUEN=0
> > (with ABORT set) at the time of observation.
> > * Any transition from a set STOP flag to SMMUEN=1 involves an
> > invalidate-all operation prior to setting SMMUEN=1
> >
> > Hence, if a CPU observes the STOP flag, it is assured that (a)
> > transactions are blocked and (b) if the SMMU is ever re-enabled, an
> > invalidate-all is performed prior to it being enabled.
> >
> > I would then argue that all operations support these invariants. For
> > example, we need proper barriers in the iommu_unmap path to ensure
> > that the STOP flag is only checked *after* the translation table
> > update is made. Hence, we need a memory barrier.
> >
> > I look at it this way: Every elided invalidation creates an
> > "invalidation deficit", and this deficit is tolerable for two reasons:
> > (a) SMMU blocks all transactions while there is a deficit. (b) An
> > invalidate-all eliminates any deficit accrued while the STOP flag was
> > set.
>
> Ack, which means you agree with the design proposed in my last reply.
> I'll document these invariants in line if that's what you're suggesting
> here?

Yes, I believe we are in agreement, and yes, I think documenting these
invariants would be beneficial.

[...]
> >
> >
> > > HW structures not accessed means no TLB / CFG
> > > cache accesses as well according to the spec.
> > >
> > > [CPU1] ==> PTE update => Invalidate => Succeeds (although SMMUEN = 0)
> > >
> > > [CPU0] GBPA.Abort set ==> Txns are blocked
> > >
> > > [CPU2] => PTE update => Invalidate => Succeeds [Txns blocked + SMMUEN=0]
> > >
> > > [CPU0] ==> SET STOP_FLAG ==> Elision begins
> > >
> > > [CPU3] ==> PTE update ==> Invalidation ==> Elided [Txns blocked + SMMUEN=0]
> > >
> > > Hence, the races in the suspend sequence are handled correctly.
> >
> > I'm not sure if this description demonstrates that every possible race
> > is handled correctly. If I compare this with Nicolin's presentation in
> > arm_smmu_domain_inv_range, I like that presentation, as it explicitly
> > mentions loads and barriers. For example, it has an smp_mb() followed
> > by "// load the updated invs". I think you should have something
> > like "smp_mb() ; CHECK STOP_FLAG" in your presentation. Currently, the
> > STOP_FLAG checking is somehow implicit in "Invalidation".
>
> Ack. The goal of this diagram was to explain the working of the design,
> this is NOT the comment/document I plan to include in code.
>
> I'll add this as an in-line comment if that's what you're suggesting? OR
> are you also suggesting I should have this in my cover letter?

In arm_smmu_domain_inv_range(), there's Nicolin's sequence diagram
that describes how installing a new domain is synchronized with PTE
updates and subsequent TLBIs. I would recommend adding a similar
diagram that shows how suspend and resume are synchronized with PTE
updates and TLBIs.

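
As a rough starting point only, such a diagram might look something like the sketch below. It is pieced together from the steps discussed in this thread, not taken from the patch. The exact barrier placement on the suspend/resume side is an assumption that would need to be checked against the actual code.

```
 CPU0 (suspend / resume)              CPU1 (iommu_unmap)
 -----------------------              ------------------
 SMMUEN = 0
 GBPA.ABORT = 1 (txns blocked)
 <barrier>
 set STOP_FLAG                        update PTE
                                      smp_mb()
                                      load STOP_FLAG
                                        set:   elide invalidation
                                        clear: issue TLBI + CMD_SYNC
 ...
 clear STOP_FLAG
 <barrier>
 arm_smmu_device_reset()
   (invalidate-all)
 smp_mb()
 SMMUEN = 1
```

The point of spelling out "smp_mb(); load STOP_FLAG" on the CPU1 side is that the flag check becomes explicit instead of being hidden inside "Invalidate".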
Thanks
Daniel