[PATCH] iommu/arm-smmu-v3: Allow disabling Stage 1 translation

Jason Gunthorpe jgg at ziepe.ca
Fri Apr 24 08:42:56 PDT 2026


On Fri, Apr 24, 2026 at 04:16:17PM +0100, Will Deacon wrote:
> > > > STE/CD is pretty simple now, there is only one place to put the CMO
> > > > and the ordering is all handled with that shared code. We no longer
> > > > care about ordering beyond all the writes must be visible to HW before
> > > > issuing the CMDQ invalidation command - which is the same environment
> > > > as the pagetable.
> > > 
> > > You presumably rely on 64-bit single-copy atomicity for hitless updates,
> > > no?
> > 
> > Yes, just like the page table does..
> > 
> > I hope that's not a problem or we have an issue with the PTW :)
> 
> You trimmed the part from my reply where I think we _do_ have an issue
> with the PTW. Here it is again:
> 
>   The non-coherent case looks more fragile, because I don't _think_ the
>   architecture provides any ordering or atomicity guarantees about cache
>   cleaning to the PoC. Presumably, the correct sequence would be to write
>   the PTE with the valid bit clear, do the CMO (with completion barrier),
>   *then* write the bottom byte with the valid bit set and do another CMO.

I wasn't sure if you were being serious.

CMO + barriers must provide an ordering guarantee about cache cleaning
to the PoC, otherwise the entire Linux DMA API is broken. dma_sync must
order with following device DMA. IMHO that's not negotiable for Linux.

All ARM IOMMU drivers rely on 64-bit atomic, non-tearing writes. No bugs
reported?

Any fix to that is going to have major performance downsides..
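
To make the cost concrete, here is a rough userspace sketch of the two
flows being compared. It is not kernel code: clean_to_poc_and_wait() and
both pte_install_*() helpers are made-up names for illustration, the CMO
is only a counter standing in for a clean to the PoC plus a completion
barrier, and the byte store assumes a little-endian layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_VALID 0x1ULL

/*
 * Hypothetical stand-in for "clean to PoC + completion barrier".
 * It only counts calls so the two flows can be compared.
 */
static int cmo_count;

static void clean_to_poc_and_wait(const volatile void *addr, size_t len)
{
	(void)addr;
	(void)len;
	cmo_count++;
}

/*
 * Single-store flow: one naturally aligned 64-bit store, one CMO.
 * This assumes the 64-bit quantity reaches the walker without tearing.
 */
static void pte_install_single(volatile uint64_t *ptep, uint64_t pte)
{
	assert(((uintptr_t)ptep & 7) == 0);
	*ptep = pte | PTE_VALID;
	clean_to_poc_and_wait(ptep, sizeof(*ptep));
}

/*
 * Two-step flow as described in the quoted text: publish the entry
 * with the valid bit clear, clean it, then set the valid bit with a
 * byte store and clean again. Assumes little-endian, so the bottom
 * byte is at the lowest address.
 */
static void pte_install_two_step(volatile uint64_t *ptep, uint64_t pte)
{
	assert(((uintptr_t)ptep & 7) == 0);
	*ptep = pte & ~PTE_VALID;
	clean_to_poc_and_wait(ptep, sizeof(*ptep));

	*(volatile uint8_t *)ptep = (uint8_t)(pte | PTE_VALID);
	clean_to_poc_and_wait(ptep, 1);
}
```

Both helpers leave the same value in memory, but the two-step variant
pays for a second CMO and barrier on every entry it installs.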

I also strongly suspect it is provided on real HW. It would be hard to
even build HW where <= 64 bit quanta can tear.

Maybe this is something ARM should take a look at.

At the very least it would warrant an IORT flag for safe HW to use to
opt into the faster cacheable flow.

> > My argument is that the CMO on STE/CD shouldn't bother mobile, you
> > could even view it as a micro-optimization because we do occasionally
> > read-back the STE/CD fields.
> 
> I was against that read-back, iirc :)

Yes, but it is OK :)

> > And if Samiullah can tackle dma_alloc_coherent then maybe the whole
> > question is moot.
> 
> Yes, that would be great, but we probably need to fix the page-table
> code too.

You really want to deal with the likely perf regressions that would
cause on Android/etc?

Jason


