[PATCH 2/3] arm64: Optimize __READ_ONCE() with CONFIG_LTO=y

Marco Elver elver at google.com
Mon Jan 26 15:15:38 PST 2026


On Mon, 26 Jan 2026 at 12:16, David Laight <david.laight.linux at gmail.com> wrote:
>
> On Mon, 26 Jan 2026 01:25:11 +0100
> Marco Elver <elver at google.com> wrote:
>
> > Rework arm64 LTO __READ_ONCE() to improve code generation as follows:
> >
> > 1. Replace the _Generic-based __unqual_scalar_typeof() with the builtin
> >    typeof_unqual(). This strips qualifiers from all types, not just
> >    integer types, which is required to be able to assign (must be
> >    non-const) to __u.__val in the non-atomic case (required for #2).
> >
> > One subtle point here is that non-integer types of __val could be const
> > or volatile within the union with the old __unqual_scalar_typeof(), if
> > the passed variable is const or volatile. This would then result in a
> > forced load from the stack if __u.__val is volatile; in the case of
> > const, it does look odd if the underlying storage changes, but the
> > compiler is told said member is "const" -- it smells like UB.
> >
> > 2. Eliminate the atomic flag and ternary conditional expression. Move
> >    the fallback volatile load into the default case of the switch,
> >    ensuring __u is unconditionally initialized across all paths.
> >    The statement expression now unconditionally returns __u.__val.
>
> Does it even need to be a union?
> I think (eg):
>         TYPEOF_UNQUAL(*__x) __val;      \
>         ...
>                 : "=r" (*(__u32 *)&__val)       \
> will have the same effect (might need an __force for sparse).

I'm not sure. We might be treading on UB even with -fno-strict-aliasing,
given all the inline asm around here.
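
For reference, the two forms being compared can be sketched in portable C.
This is only an illustration under loud assumptions: struct seq and
fake_load32() are made-up names, memcpy stands in for the inline asm load,
and the union layout is simplified. It says nothing about whether the cast
form stays safe once a real asm output operand is involved.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte non-scalar type, used only for this sketch. */
struct seq {
	uint32_t v;
};

/*
 * Hypothetical stand-in for the 4-byte inline asm load. The real code
 * writes the result through an asm output operand instead.
 */
static inline uint32_t fake_load32(const void *p)
{
	uint32_t raw;

	memcpy(&raw, p, sizeof(raw));
	return raw;
}

/* Union form: the load fills the raw member, the caller gets the typed one. */
static inline struct seq read_via_union(const struct seq *p)
{
	union {
		struct seq val;
		uint32_t raw;
	} u;

	u.raw = fake_load32(p);
	return u.val;
}

/* Cast form: a plain local, written through a pointer cast. */
static inline struct seq read_via_cast(const struct seq *p)
{
	struct seq val;

	/* Well-defined here only because v is the first member of struct seq. */
	*(uint32_t *)&val = fake_load32(p);
	return val;
}
```

In this toy both helpers return the same value; the open question above is
about the general case, where the value type is arbitrary.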

> Also is the 'default' branch even needed?
> READ_ONCE() rejects sizes other than 1, 2, 4 and 8.
> A quick search only found one oversize read - for 'struct vcpu_runstate_info'
> in arch/x86/kvm/xen.c
> Requiring that code use a different define might make sense.
>
> I also did some x86-64 build timings with compiletime_assert_rwonce_type()
> commented out.
> Expanding and compiling that check seems to add just over 1% to the
> build time.
> So anything to shrink that define is likely to be noticeable.

compiletime_assert_rwonce_type() exists for the benefit of the
asm-generic variant, which by default is implemented like the 'default'
case here. This patch only touches the arm64 override of that, used with
CONFIG_LTO=y.
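
Roughly, the asm-generic shape is a size check followed by a volatile load.
A simplified sketch, assuming GNU C (statement expressions, __typeof__) and
C11 _Static_assert. The SKETCH_* names are made up, the size check is a
simplification of what compiletime_assert_rwonce_type() does, and plain
__typeof__ is used where the kernel strips qualifiers first:

```c
#include <stdint.h>

/* Simplified size check: accept only 1, 2, 4 and 8 byte accesses. */
#define SKETCH_ASSERT_RWONCE_TYPE(x)					\
	_Static_assert(sizeof(x) == 1 || sizeof(x) == 2 ||		\
		       sizeof(x) == 4 || sizeof(x) == 8,		\
		       "Unsupported access size")

/* Fallback load: a single access through a volatile-qualified lvalue. */
#define SKETCH_READ_ONCE_NOCHECK(x)					\
	(*(const volatile __typeof__(x) *)&(x))

/* Check the size at compile time, then do the volatile load. */
#define SKETCH_READ_ONCE(x)						\
({									\
	SKETCH_ASSERT_RWONCE_TYPE(x);					\
	SKETCH_READ_ONCE_NOCHECK(x);					\
})
```

With this shape, an oversized read such as a large struct fails the static
assertion and would have to go through a separate macro that skips the
check.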


