[PATCH 2/3] arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
Marco Elver
elver at google.com
Mon Jan 26 15:15:38 PST 2026
On Mon, 26 Jan 2026 at 12:16, David Laight <david.laight.linux at gmail.com> wrote:
>
> On Mon, 26 Jan 2026 01:25:11 +0100
> Marco Elver <elver at google.com> wrote:
>
> > Rework arm64 LTO __READ_ONCE() to improve code generation as follows:
> >
> > 1. Replace the _Generic-based __unqual_scalar_typeof() with the builtin
> > typeof_unqual(). This strips qualifiers from all types, not just
> > integer types, which is required to be able to assign (must be
> > non-const) to __u.__val in the non-atomic case (required for #2).
> >
> > One subtle point here is that non-integer types of __val could be const
> > or volatile within the union with the old __unqual_scalar_typeof(), if
> > the passed variable is const or volatile. This would then result in a
> > forced load from the stack if __u.__val is volatile; in the case of
> > const, it does look odd if the underlying storage changes, but the
> > compiler is told said member is "const" -- it smells like UB.
> >
> > 2. Eliminate the atomic flag and ternary conditional expression. Move
> > the fallback volatile load into the default case of the switch,
> > ensuring __u is unconditionally initialized across all paths.
> > The statement expression now unconditionally returns __u.__val.
>
> Does it even need to be a union?
> I think (eg):
> TYPEOF_UNQUAL(*__x) __val; \
> ...
> : "=r" (*(__u32 *)&__val) \
> will have the same effect (might need an __force for sparse).

Unsure, but we might be treading on UB even with -fno-strict-aliasing
given all the inline asm around here.

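For comparison, here is a rough user-space sketch of the two shapes under discussion. It is not the arm64 code: plain volatile loads stand in for the load-acquire inline asm, __typeof__ stands in for TYPEOF_UNQUAL() (so the argument must not be const), and every *_SKETCH name and helper function is made up for illustration. It relies on GNU C statement expressions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the arm64 load-acquire asm: a plain volatile load. */
#define LOAD_SKETCH(type, p)	(*(const volatile type *)(p))

/* Shape 1: union with a char member; the sized store goes through __u.__c. */
#define READ_ONCE_UNION_SKETCH(x)					\
({									\
	__typeof__(&(x)) __x = &(x);					\
	union { __typeof__(x) __val; char __c[1]; } __u;		\
	switch (sizeof(x)) {						\
	case 1: *(uint8_t  *)__u.__c = LOAD_SKETCH(uint8_t,  __x); break; \
	case 2: *(uint16_t *)__u.__c = LOAD_SKETCH(uint16_t, __x); break; \
	case 4: *(uint32_t *)__u.__c = LOAD_SKETCH(uint32_t, __x); break; \
	case 8: *(uint64_t *)__u.__c = LOAD_SKETCH(uint64_t, __x); break; \
	default: /* oversize: non-atomic volatile load, __u still set */ \
		__u.__val = *(const volatile __typeof__(x) *)__x;	\
	}								\
	__u.__val;							\
})

/* Shape 2: no union; the sized store goes through a cast of &__val. */
#define READ_ONCE_CAST_SKETCH(x)					\
({									\
	__typeof__(&(x)) __x = &(x);					\
	__typeof__(x) __val;						\
	switch (sizeof(x)) {						\
	case 1: *(uint8_t  *)&__val = LOAD_SKETCH(uint8_t,  __x); break; \
	case 2: *(uint16_t *)&__val = LOAD_SKETCH(uint16_t, __x); break; \
	case 4: *(uint32_t *)&__val = LOAD_SKETCH(uint32_t, __x); break; \
	case 8: *(uint64_t *)&__val = LOAD_SKETCH(uint64_t, __x); break; \
	default:							\
		__val = *(const volatile __typeof__(x) *)__x;		\
	}								\
	__val;								\
})

/* A type larger than 8 bytes, to exercise the default case. */
struct big_sketch { long w[3]; };

int read_int_union_sketch(int *p) { return READ_ONCE_UNION_SKETCH(*p); }
int read_int_cast_sketch(int *p) { return READ_ONCE_CAST_SKETCH(*p); }
struct big_sketch read_big_union_sketch(struct big_sketch *p)
{
	return READ_ONCE_UNION_SKETCH(*p);
}
struct big_sketch read_big_cast_sketch(struct big_sketch *p)
{
	return READ_ONCE_CAST_SKETCH(*p);
}
```

Both shapes type-pun the destination, so this only shows the structure of the two options; it says nothing about which one the compiler treats better once real inline asm output operands are involved.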
> Also is the 'default' branch even needed?
> READ_ONCE() rejects sizes other than 1, 2, 4 and 8.
> A quick search only found one oversize read - for 'struct vcpu_runstate_info'
> in arch/x86/kvm/xen.c
> Requiring that code use a different define might make sense.
>
> I also did some x86-64 build timings with compiletime_assert_rwonce_type()
> commented out.
> Expanding and compiling that check seems to add just over 1% to the
> build time.
> So anything to shrink that define is likely to be noticeable.

The compiletime_assert_rwonce_type() check is there for the benefit of
the asm-generic variant, whose __READ_ONCE() is by default implemented
like the 'default' case here, i.e. as a plain volatile load. This patch
only touches the arm64 override of all that with LTO.
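
To make the layering concrete, here is a simplified user-space sketch of how I understand the generic side to be structured: a compile-time size check wrapped around a volatile load. The *_SKETCH names and the helper function are hypothetical, the size check is reduced to "1, 2, 4 or 8 bytes", and __typeof__ is used where the kernel strips qualifiers, so treat it as an approximation rather than the actual asm-generic code.

```c
/* Simplified stand-in for compiletime_assert_rwonce_type(). */
#define ASSERT_RWONCE_TYPE_SKETCH(t)					\
	_Static_assert(sizeof(t) == 1 || sizeof(t) == 2 ||		\
		       sizeof(t) == 4 || sizeof(t) == 8,		\
		       "Unsupported access size for READ_ONCE sketch")

/* Stand-in for the generic __READ_ONCE(): just a volatile load. */
#define RAW_READ_ONCE_SKETCH(x)	(*(const volatile __typeof__(x) *)&(x))

/* Stand-in for READ_ONCE(): size check first, then the raw load. */
#define READ_ONCE_SKETCH(x)						\
({									\
	ASSERT_RWONCE_TYPE_SKETCH(x);					\
	RAW_READ_ONCE_SKETCH(x);					\
})

/* Hypothetical helper so the macro can be exercised from a function. */
long read_long_sketch(long *p)
{
	return READ_ONCE_SKETCH(*p);
}
```

An architecture override replaces only the raw load; the size check stays in the outer wrapper, which is why it is expanded for every caller regardless of what the override does.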