[PATCH v2] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum

Shanker Donthineni sdonthineni at nvidia.com
Wed Jun 10 06:20:28 PDT 2026


Hi Will,

On 6/10/2026 6:28 AM, Will Deacon wrote:
> External email: Use caution opening links or attachments
>
>
> [+Jason G]
>
> On Fri, Jun 05, 2026 at 09:45:51AM -0500, Shanker Donthineni wrote:
>> On systems with NVIDIA Olympus cores, a Device-nGnR* load can be
>> observed by a peripheral before an older, non-overlapping Device-nGnR*
>> store to the same peripheral. This breaks the program-order guarantee
>> that software expects for Device-nGnR* accesses and can leave a
>> peripheral in an incorrect state, as a load is observed before an
>> earlier store takes effect.
>>
>> The erratum can occur only when all of the following apply:
>>
>>    - A PE executes a Device-nGnR* store followed by a younger
>>      Device-nGnR* load.
>>    - The store is not a store-release.
>>    - The accesses target the same peripheral and do not overlap in bytes.
>>    - There is at most one intervening Device-nGnR* store in program
>>      order, and there are no intervening Device-nGnR* loads.
>>    - There is no DSB, and no DMB that orders loads, between the store and
>>      the load.
>>    - Specific micro-architectural and timing conditions occur.
>>
>> Two ways to restore ordering: insert a barrier (any DSB, or a DMB that
>> orders loads) between the store and the load, or make the store a
>> store-release. A load-acquire on the load side would not help, because
>> acquire semantics do not prevent a load from being observed ahead of an
>> older store; only the store side (release or a barrier) closes the
>> window.
> I think you can drop the paragraph above. A store-release isn't enough
> to order against a later load in the architecture either, so we're
> clearly in micro-architecture territory and I don't think you need to
> describe mechanisms that don't work here.
>
>> Promote the raw MMIO store helpers (__raw_writeb/w/l/q) from plain str*
>> to stlr* (Store-Release), which removes the "store is not a
>> store-release" condition for every device write the kernel issues.
>> Because writel() and writel_relaxed() are both built on __raw_writel()
>> in asm-generic/io.h, patching the raw variants covers both the
>> non-relaxed and relaxed APIs without touching the higher layers. Note
>> that writel()'s own barrier sits before the store, so it does not order
>> the store against a subsequent readl(); the store-release promotion is
>> what provides that ordering.

Based on the existing code comments and after reviewing this path again,
__const_memcpy_toio_aligned32() and __const_memcpy_toio_aligned64()
appear to be intended for WC regions. Since the erratum is scoped to
Device-nGnR* accesses, and WC mappings are Normal-NC on arm64, I don’t
think the STLR workaround should apply to these helpers by default.

Applying it there would also break the contiguous STR grouping that
this path relies on for write combining.

-Shanker





More information about the linux-arm-kernel mailing list