[PATCH v1] arm64: errata: Workaround NVIDIA Olympus device store/load ordering erratum
Shanker Donthineni
sdonthineni at nvidia.com
Fri Jun 5 07:34:10 PDT 2026
Hi Vladimir Murzin,
On 6/5/2026 4:26 AM, Vladimir Murzin wrote:
> On 6/5/26 00:12, Shanker Donthineni wrote:
>> On systems with NVIDIA Olympus cores, a Device-nGnR* load can be
>> observed by a peripheral before an older, non-overlapping Device-nGnR*
>> store to the same peripheral. This breaks the program-order guarantee
>> that software expects for Device-nGnR* accesses and can leave a
>> peripheral in an incorrect state, as a load is observed before an
>> earlier store takes effect.
>>
>> The erratum can occur only when all of the following apply:
>>
>> - A PE executes a Device-nGnR* store followed by a younger
>> Device-nGnR* load.
>> - The store is not a store-release.
>> - The accesses target the same peripheral and do not overlap in bytes.
>> - There is at most one intervening Device-nGnR* store in program
>> order, and there are no intervening Device-nGnR* loads.
>> - There is no DSB, and no DMB that orders loads, between the store and
>> the load.
>> - Specific micro-architectural and timing conditions occur.
>>
>> Two ways to restore ordering: insert a barrier (any DSB, or a DMB that
>> orders loads) between the store and the load, or make the store a
>> store-release. A load-acquire on the load side would not help, because
>> acquire semantics do not prevent a load from being observed ahead of an
>> older store; only the store side (release or a barrier) closes the
>> window.
>>
>> Promote the raw MMIO store helpers (__raw_writeb/w/l/q) from plain str*
>> to stlr* (Store-Release), which removes the "store is not a
>> store-release" condition for every device write the kernel issues.
>> Because writel() and writel_relaxed() are both built on __raw_writel()
>> in asm-generic/io.h, patching the raw variants covers both the
>> non-relaxed and relaxed APIs without touching the higher layers. Note
>> that writel()'s own barrier sits before the store, so it does not order
>> the store against a subsequent readl(); the store-release promotion is
>> what provides that ordering.
>>
>> Like ARM64_ERRATUM_832075 on the load side, the change is gated on a new
>> ARM64_WORKAROUND_DEVICE_STORE_RELEASE capability and only activated on
>> parts that match MIDR_NVIDIA_OLYMPUS, so unaffected CPUs continue to use
>> the plain str* sequence.
>>
>> Co-developed-by: Vikram Sethi <vsethi at nvidia.com>
>> Signed-off-by: Vikram Sethi <vsethi at nvidia.com>
>> Signed-off-by: Shanker Donthineni <sdonthineni at nvidia.com>
>> ---
>> Documentation/arch/arm64/silicon-errata.rst | 2 ++
>> arch/arm64/Kconfig | 23 ++++++++++++++++++++
>> arch/arm64/include/asm/io.h | 24 ++++++++++++++-------
>> arch/arm64/kernel/cpu_errata.c | 8 +++++++
>> arch/arm64/tools/cpucaps | 1 +
>> 5 files changed, 50 insertions(+), 8 deletions(-)
>>
>> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
>> index 211119ce7adc..899bed3908bb 100644
>> --- a/Documentation/arch/arm64/silicon-errata.rst
>> +++ b/Documentation/arch/arm64/silicon-errata.rst
>> @@ -256,6 +256,8 @@ stable kernels.
>> +----------------+-----------------+-----------------+-----------------------------+
>> | NVIDIA | Carmel Core | N/A | NVIDIA_CARMEL_CNP_ERRATUM |
>> +----------------+-----------------+-----------------+-----------------------------+
>> +| NVIDIA | Olympus core | T410-OLY-1027 | NVIDIA_OLYMPUS_1027_ERRATUM |
>> ++----------------+-----------------+-----------------+-----------------------------+
>> | NVIDIA | T241 GICv3/4.x | T241-FABRIC-4 | N/A |
>> +----------------+-----------------+-----------------+-----------------------------+
>> | NVIDIA | T241 MPAM | T241-MPAM-1 | N/A |
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index fe60738e5943..a6bac84b05a1 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -564,6 +564,29 @@ config ARM64_ERRATUM_832075
>>
>> If unsure, say Y.
>>
>> +config NVIDIA_OLYMPUS_1027_ERRATUM
>> + bool "NVIDIA Olympus: device store/load ordering erratum"
>> + default y
>> + help
>> + This option adds an alternative code sequence to work around an
>> + NVIDIA Olympus core erratum where a Device-nGnR* store can be
>> + observed by a peripheral after a younger Device-nGnR* load to the
>> + same peripheral. This breaks the program order that drivers rely
>> + on for MMIO and can leave a device in an incorrect state.
>> +
>> + The workaround promotes the raw MMIO store helpers
>> + (__raw_writeb/w/l/q) to Store-Release (STLR), which restores the
>> + required ordering. Because writel() and writel_relaxed() are built
>> + on __raw_writel(), both are covered without changes to the higher
>> + layers.
>> +
>> + The fix is applied through the alternatives framework, so enabling
>> + this option does not by itself activate the workaround: it is
>> + patched in only when an affected CPU is detected, and is a no-op on
>> + unaffected CPUs.
>> +
>> + If unsure, say Y.
>> +
>> config ARM64_ERRATUM_834220
>> bool "Cortex-A57: 834220: Stage 2 translation fault might be incorrectly reported in presence of a Stage 1 fault (rare)"
>> depends on KVM
>> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
>> index 8cbd1e96fd50..b6d7966e9c19 100644
>> --- a/arch/arm64/include/asm/io.h
>> +++ b/arch/arm64/include/asm/io.h
>> @@ -25,29 +25,37 @@
>> #define __raw_writeb __raw_writeb
>> static __always_inline void __raw_writeb(u8 val, volatile void __iomem *addr)
>> {
>> - volatile u8 __iomem *ptr = addr;
>> - asm volatile("strb %w0, %1" : : "rZ" (val), "Qo" (*ptr));
>> + asm volatile(ALTERNATIVE("strb %w0, [%1]",
>> + "stlrb %w0, [%1]",
>> + ARM64_WORKAROUND_DEVICE_STORE_RELEASE)
>> + : : "rZ" (val), "r" (addr));
>> }
>>
> Nitpick:
>
> The change has the side effect of undoing d044d6ba6f02 ("arm64:
> io: permit offset addressing"), since stlr* do not support
> offset addressing. Unaffected CPUs would continue to use str*,
> but would lose the benefit of offset addressing :(
>
> Not sure if this needs to be mentioned in the commit message...
>
Thanks for your feedback. You're right that this reverts the
offset-addressing benefit of d044d6ba6f02 for the str* path too: stlr*
has no offset form, and both alternatives must share one compile-time
operand form because they are patched in place at boot. Keeping offset
addressing only for the unaffected str* path would need a runtime branch
around every str* operation, which isn't worth it for this optimization.
I'll call this out explicitly in the commit message in the v2 patch.

-Shanker
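
For readers following the thread, the operand-form constraint discussed
above can be sketched in user-space C. This is only an illustration under
stated assumptions, not the kernel code: the sketch_* names and the
two-register layout are hypothetical, and the non-arm64 fallbacks exist
only so the file builds elsewhere. The plain store takes a memory operand
("Qo"), so the compiler may emit a base+offset str; the release store has
to take the address in a register, because stlr only accepts [Xn].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block; stands in for a mapped peripheral. */
struct sketch_regs {
	uint32_t ctrl;		/* offset 0 */
	uint32_t status;	/* offset 4 */
};

/* Plain store: "Qo" lets the compiler fold a constant offset into str. */
static inline void sketch_write32_plain(uint32_t val, volatile uint32_t *ptr)
{
#if defined(__aarch64__)
	__asm__ volatile("str %w0, %1" : : "rZ"(val), "Qo"(*ptr) : "memory");
#else
	*ptr = val;	/* portable fallback, no arm64 semantics implied */
#endif
}

/* Release store: stlr has no offset form, so the address is a register. */
static inline void sketch_write32_release(uint32_t val, volatile uint32_t *ptr)
{
	/* Device accesses must be naturally aligned. */
	assert(((uintptr_t)ptr & 3u) == 0);
#if defined(__aarch64__)
	__asm__ volatile("stlr %w0, [%1]" : : "rZ"(val), "r"(ptr) : "memory");
#else
	__atomic_store_n(ptr, val, __ATOMIC_RELEASE);	/* portable fallback */
#endif
}

/*
 * The access pattern from the erratum description: a store followed by a
 * younger, non-overlapping load to the same peripheral. Making the store a
 * store-release is the store-side fix described in the commit message.
 */
static inline uint32_t sketch_store_then_load(volatile struct sketch_regs *regs,
					      uint32_t val)
{
	sketch_write32_release(val, &regs->ctrl);
	return regs->status;
}
```

With the "Qo" variant, a write to &regs->status can typically be encoded
as a single str with a #4 offset; with the register-operand variant the
compiler has to materialise the address first. Since both alternative
sequences share one set of operands, the str* side inherits the register
form as well, which is the cost Vladimir pointed out.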