[PATCH v2] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum

Jason Gunthorpe jgg at nvidia.com
Wed Jun 10 05:50:10 PDT 2026


On Wed, Jun 10, 2026 at 12:28:33PM +0100, Will Deacon wrote:
> > Note: stlr* only supports base-register addressing, so the raw accessors
> > can no longer use the offset addressing introduced by commit d044d6ba6f02
> > ("arm64: io: permit offset addressing"). The str* and stlr* alternates
> > share a single inline-asm operand and the sequence is selected at boot,
> > so the operand form is fixed at compile time; unaffected CPUs keep using
> > str* but also revert to base-register addressing. This keeps the store
> > side as simple as the existing load-side patching (load-acquire) and
> > avoids adding complexity to the device write path; retaining offset
> > addressing only for str* would otherwise require a runtime branch on
> > every write.
> 
> I seem to remember Jason caring about that, possibly because some CPUs
> are very picky about write-combining?

I think it was more a fallout of the work there: after looking at the
assembly, this minor edit to the constraint had a nice codegen
impact. It is certainly a shame to lose it for this bug.
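As a rough illustration of the constraint difference, here is a minimal sketch. The helpers sketch_str32() and sketch_stlr32() are hypothetical and are not the kernel's accessors. The "Qo" memory operand is assumed to resemble what commit d044d6ba6f02 introduced for str*, and the "memory" clobbers and non-arm64 fallbacks are additions so the sketch is safe and runs on any host.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's __raw_writel().
 * "Qo" lets the compiler pick a bare base register ("Q") or an
 * offsettable address ("o"), so base + offset arithmetic can fold
 * into the str instruction itself.
 */
static inline void sketch_str32(uint32_t val, volatile uint32_t *addr)
{
#if defined(__aarch64__)
	/* "memory" clobber added for this sketch so later reads see the store */
	asm volatile("str %w0, %1" : : "rZ"(val), "Qo"(*addr) : "memory");
#else
	*addr = val;	/* portable stand-in so the sketch runs elsewhere */
#endif
}

/*
 * Hypothetical helper: stlr only accepts a bare base register, so the
 * address must be materialised in a register and written as "[%1]".
 */
static inline void sketch_stlr32(uint32_t val, volatile uint32_t *addr)
{
#if defined(__aarch64__)
	asm volatile("stlr %w0, [%1]" : : "rZ"(val), "r"(addr) : "memory");
#else
	__atomic_store_n(addr, val, __ATOMIC_RELEASE);	/* portable stand-in */
#endif
}

/* Usage: both forms store to the same buffer; only the encoding differs. */
void sketch_demo(void)
{
	uint32_t regs[4] = { 0 };

	sketch_str32(0xa5, &regs[3]);	/* arm64: may use base + offset addressing */
	sketch_stlr32(0x5a, &regs[1]);	/* arm64: always a bare [xN] address */
	assert(regs[3] == 0xa5 && regs[1] == 0x5a);
}
```

Because both alternatives must share one operand, switching the shared operand to the stlr-compatible form is what forces str* back to base-register addressing as well.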

If we care about write combining we can't have a branch anyhow, but
that matters most for the specific memcpy operations (which will
need a branch).
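For comparison, the runtime-branch alternative mentioned in the patch note might look roughly like the sketch below. It assumes a hypothetical boot-time flag, sketch_cpu_has_ordering_erratum, and uses compiler builtins instead of inline asm. On arm64 a 32-bit release store typically compiles to stlr, while the plain volatile store may use str with offset addressing. The flag is tested on every write, which is the per-write cost the patch avoids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag standing in for a boot-time erratum check. */
static bool sketch_cpu_has_ordering_erratum;

static inline void sketch_write32_branch(uint32_t val,
					 volatile uint32_t *base, size_t idx)
{
	if (sketch_cpu_has_ordering_erratum)
		/* release store: stlr on arm64, base register only */
		__atomic_store_n(&base[idx], val, __ATOMIC_RELEASE);
	else
		/* plain volatile store: str on arm64, offset addressing allowed */
		base[idx] = val;
}

/* Usage: the same call site takes either path depending on the flag. */
void sketch_branch_demo(void)
{
	uint32_t regs[4] = { 0 };

	sketch_cpu_has_ordering_erratum = false;
	sketch_write32_branch(0x1, regs, 2);	/* plain store path */
	sketch_cpu_has_ordering_erratum = true;
	sketch_write32_branch(0x2, regs, 3);	/* release-store path */
	assert(regs[2] == 0x1 && regs[3] == 0x2);
}
```

A conditional branch between stores is also why this shape is a poor fit where write combining matters, since the branch sits between the device writes.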

Jason



More information about the linux-arm-kernel mailing list