[PATCH 4/5] arm64: atomics: lse: improve constraints for simple ops
Will Deacon
will at kernel.org
Mon Dec 13 11:40:23 PST 2021
On Fri, Dec 10, 2021 at 03:14:09PM +0000, Mark Rutland wrote:
> We have overly conservative assembly constraints for the basic FEAT_LSE
> atomic instructions, and using more accurate and permissive constraints
> will allow for better code generation.
>
> The FEAT_LSE basic atomic instructions come in two forms:
>
> LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
> ST{op}{order}{size} <Rs>, [<Rn>]
>
> The ST* forms are aliases of the LD* forms where:
>
> ST{op}{order}{size} <Rs>, [<Rn>]
> is equivalent to:
> LD{op}{order}{size} <Rs>, XZR, [<Rn>]
>
> For either form, both <Rs> and <Rn> are read but not written back to,
> and <Rt> is written with the original value of the memory location.
> Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the
> other register value(s) are consumed. There are no UNPREDICTABLE or
> CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or
> <Rn> are the same register.
>
> Our current inline assembly always uses <Rs> == <Rt>, treating this
> register as both an input and an output (using a '+r' constraint). This
> forces the compiler to do some unnecessary register shuffling and/or
> redundant value generation.
>
> For example, the compiler cannot reuse the <Rs> value, and currently GCC
> 11.1.0 will compile:
>
> __lse_atomic_add(1, a);
> __lse_atomic_add(1, b);
> __lse_atomic_add(1, c);
>
> As:
>
> mov w3, #0x1
> mov w4, w3
> stadd w4, [x0]
> mov w0, w3
> stadd w0, [x1]
> stadd w3, [x2]
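>
> For reference, a simplified sketch of the pre-patch constraint style
> (not the exact kernel macro, which is generated per-op) is:
>
> static inline void __lse_atomic_add(int i, atomic_t *v)
> {
> 	asm volatile(
> 	"	stadd	%w[i], %[v]\n"
> 	: [i] "+r" (i), [v] "+Q" (v->counter)
> 	: "r" (v));
> }
>
> The '+r' on [i] tells the compiler the register may be modified by the
> asm, which is what forces the mov shuffling above.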
>
> We can improve this with more accurate constraints, separating <Rs> and
> <Rt>, where <Rs> is an input-only operand ('r') and <Rt> is an
> output-only operand ('=r'). As <Rt> is written back only after <Rs> is
> consumed, it does not need to be earlyclobber ('=&r'), leaving the
> compiler free to allocate the same register for both <Rs> and <Rt>
> where this is desirable.
>
> At the same time, the redundant 'r' constraint for `v` is removed, as
> the `+Q` constraint is sufficient.
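>
> Concretely, a sketch of the resulting constraints for a
> non-value-returning op (again simplified from the generated macro):
>
> static inline void __lse_atomic_add(int i, atomic_t *v)
> {
> 	asm volatile(
> 	"	stadd	%w[i], %[v]\n"
> 	: [v] "+Q" (v->counter)
> 	: [i] "r" (i));
> }
>
> With [i] as a plain 'r' input, the compiler knows its register is
> preserved and can feed the same register to several consecutive calls.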
>
> With this change, the above example becomes:
>
> mov w3, #0x1
> stadd w3, [x0]
> stadd w3, [x1]
> stadd w3, [x2]
>
> I've made this change for the non-value-returning and FETCH ops. The
> RETURN ops use a multi-instruction sequence for which we cannot apply
> the same constraints, and a subsequent patch will rewrite the RETURN
> ops in terms of the FETCH ops, relying on the compiler's ability to
> reuse the <Rs> value.
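>
> As a sketch of the direction (simplified, with the memory-ordering
> variants omitted), a FETCH op with the separated constraints looks
> like:
>
> static inline int __lse_atomic_fetch_add(int i, atomic_t *v)
> {
> 	int old;
>
> 	asm volatile(
> 	"	ldadd	%w[i], %w[old], %[v]\n"
> 	: [v] "+Q" (v->counter), [old] "=r" (old)
> 	: [i] "r" (i));
>
> 	return old;
> }
>
> and a RETURN op could then be built on top, roughly as:
>
> static inline int __lse_atomic_add_return(int i, atomic_t *v)
> {
> 	return __lse_atomic_fetch_add(i, v) + i;
> }
>
> leaving the compiler free to reuse the <Rs> register for the final add.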
>
> This is intended as an optimization.
> There should be no functional change as a result of this patch.
>
> Signed-off-by: Mark Rutland <mark.rutland at arm.com>
> Cc: Boqun Feng <boqun.feng at gmail.com>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Peter Zijlstra <peterz at infradead.org>
> Cc: Will Deacon <will at kernel.org>
> ---
> arch/arm64/include/asm/atomic_lse.h | 30 +++++++++++++++++------------
> 1 file changed, 18 insertions(+), 12 deletions(-)
Makes sense to me. I'm not sure _why_ the current constraints are so weird;
maybe a hangover from when we patched them inline? Anywho:
Acked-by: Will Deacon <will at kernel.org>
Will