[PATCH 5/5] arm64: atomics: lse: define RETURN ops in terms of FETCH ops

Will Deacon will at kernel.org
Mon Dec 13 11:43:54 PST 2021


On Fri, Dec 10, 2021 at 03:14:10PM +0000, Mark Rutland wrote:
> The FEAT_LSE atomic instructions include LD* instructions which return
> the original value of a memory location and can be used to directly
> implement FETCH operations. Each RETURN op is implemented as a copy of
> the corresponding FETCH op with a trailing instruction to generate the
> new value of the memory location. We only directly implement
> *_fetch_add*(), for which we have a trailing `add` instruction.
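> 
> For reference, the pre-patch RETURN ops look roughly like the
> following (a sketch only; the exact asm template and constraints in
> atomic_lse.h differ slightly), with the trailing `add` hidden inside
> the asm block:
> 
> | static inline int __lse_atomic_add_return(int i, atomic_t *v)
> | {
> |         u32 tmp;
> |
> |         /* LDADDAL returns the old value in tmp; the add generating
> |          * the new value is invisible to the compiler. */
> |         asm volatile(
> |         "       ldaddal %w[i], %w[tmp], %[v]\n"
> |         "       add     %w[i], %w[i], %w[tmp]"
> |         : [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)
> |         :
> |         : "memory");
> |
> |         return i;
> | }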
> 
> As the compiler has no visibility of the `add`, this leads to less than
> optimal code generation when consuming the result.
> 
> For example, the compiler cannot constant-fold the addition into later
> operations, and currently GCC 11.1.0 will compile:
> 
>        return __lse_atomic_sub_return(1, v) == 0;
> 
> As:
> 
> 	mov     w1, #0xffffffff
> 	ldaddal w1, w2, [x0]
> 	add     w1, w1, w2
> 	cmp     w1, #0x0
> 	cset    w0, eq  // eq = none
> 	ret
> 
> This patch improves this by replacing the `add` with C addition after
> the inline assembly block, e.g.
> 
> 	ret += i;
> 
> This allows the compiler to manipulate `i`, which permits it to merge
> the `add` and `cmp` for the above, e.g.
> 
> 	mov     w1, #0xffffffff
> 	ldaddal w1, w1, [x0]
> 	cmp     w1, #0x1
> 	cset    w0, eq  // eq = none
> 	ret
> 
> With this change the remaining assembly for each RETURN op is identical
> to that of the corresponding FETCH op (including barriers and clobbers),
> so I've removed the RETURN-specific inline assembly and rewritten each
> RETURN op in terms of the corresponding FETCH op, e.g.
> 
> | static inline int __lse_atomic_add_return(int i, atomic_t *v)
> | {
> |       return __lse_atomic_fetch_add(i, v) + i;
> | }
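> 
> The sub variants compose the same way; as a sketch (assuming the
> existing __lse_atomic_fetch_sub() helper), the earlier sub_return
> example becomes:
> 
> | static inline int __lse_atomic_sub_return(int i, atomic_t *v)
> | {
> |       /* old value minus i gives the new value */
> |       return __lse_atomic_fetch_sub(i, v) - i;
> | }
> 
> As the subtraction is now visible to the compiler, the `== 0` check
> in the motivating example can be folded into the `cmp w1, #0x1`
> shown above.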
> 
> The new construction does not adversely affect the common case, and
> before and after this patch GCC 11.1.0 can compile:
> 
> 	__lse_atomic_add_return(i, v)
> 
> As:
> 
> 	ldaddal w0, w2, [x1]
> 	add     w0, w0, w2
> 
> ... while having the freedom to do better elsewhere.
> 
> This is intended as an optimization and cleanup.
> There should be no functional change as a result of this patch.
> 
> Signed-off-by: Mark Rutland <mark.rutland at arm.com>
> Cc: Boqun Feng <boqun.feng at gmail.com>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Peter Zijlstra <peterz at infradead.org>
> Cc: Will Deacon <will at kernel.org>
> ---
>  arch/arm64/include/asm/atomic_lse.h | 48 +++++++++--------------------
>  1 file changed, 14 insertions(+), 34 deletions(-)

Acked-by: Will Deacon <will at kernel.org>

Will


