[PATCH 0/5] arm64: atomics: cleanups and codegen improvements
Mark Rutland
mark.rutland at arm.com
Fri Dec 10 07:14:05 PST 2021
While looking at Peter's recent refcount rework, I spotted that we have
some unfortunate code generation for the LSE atomics. Due to a
combination of assembly constraints and manipulation performed in
assembly which the compiler has no visibility of, the compiler ends up
generating unnecessary register shuffling and redundant manipulation.

This series (based on v5.16-rc4) improves matters by tightening the asm
constraints and moving value manipulation to C, where the compiler can
perform a number of optimizations. This also has the benefit of
simplifying the implementation and deleting 100+ lines of code.

This is purely a cleanup and optimization; there should be no functional
change as a result of the series.
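As a rough sketch of the approach (simplified from the actual patches;
the real definitions in arch/arm64/include/asm/atomic_lse.h are
generated by macros, one per size and ordering variant), the asm body is
reduced to the atomic instruction itself, with operand preparation done
in C:

| /*
|  * Sketch only: __LSE_PREAMBLE and the size/ordering variants are
|  * omitted for brevity.
|  */
| static inline void __lse_atomic64_add(s64 i, atomic64_t *v)
| {
| 	asm volatile(
| 	"	stadd	%[i], %[v]\n"
| 	: [v] "+Q" (v->counter)	/* memory is read-modify-written */
| 	: [i] "r" (i));		/* i is a pure input; no "+r" needed */
| }
|
| static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
| {
| 	/* negate in C, where the compiler can fold constant operands */
| 	__lse_atomic64_add(-i, v);
| }

With the negation visible to the compiler, a constant argument can be
negated at compile time rather than with a runtime NEG.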
I've pushed the series out to my arm64/atomics/improvements branch:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/atomics/improvements
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/atomics/improvements
For comparison, using GCC 11.1.0 to compile the following code (the _f
variants take a constant argument, exercising the compiler's ability to
fold the value manipulation into an immediate):
| s64 example64_fetch_and(s64 i, atomic64_t *v)
| {
| return __lse_atomic64_fetch_and(i, v);
| }
|
| s64 example64_fetch_and_f(atomic64_t *v)
| {
| return __lse_atomic64_fetch_and(0xf, v);
| }
|
| s64 example64_fetch_sub(s64 i, atomic64_t *v)
| {
| return __lse_atomic64_fetch_sub(i, v);
| }
|
| s64 example64_fetch_sub_f(atomic64_t *v)
| {
| return __lse_atomic64_fetch_sub(0xf, v);
| }
|
| s64 example64_sub_return(s64 i, atomic64_t *v)
| {
| return __lse_atomic64_sub_return(i, v);
| }
|
| s64 example64_sub_return_f(atomic64_t *v)
| {
| return __lse_atomic64_sub_return(0xf, v);
| }
Before this series:
| 0000000000000000 <example64_fetch_and>:
| 0: aa2003e0 mvn x0, x0
| 4: f8e01020 ldclral x0, x0, [x1]
| 8: d65f03c0 ret
| c: d503201f nop
|
| 0000000000000010 <example64_fetch_and_f>:
| 10: aa0003e2 mov x2, x0
| 14: d28001e1 mov x1, #0xf // #15
| 18: aa0103e0 mov x0, x1
| 1c: aa2003e0 mvn x0, x0
| 20: f8e01040 ldclral x0, x0, [x2]
| 24: d65f03c0 ret
| 28: d503201f nop
| 2c: d503201f nop
|
| 0000000000000030 <example64_fetch_sub>:
| 30: cb0003e0 neg x0, x0
| 34: f8e00020 ldaddal x0, x0, [x1]
| 38: d65f03c0 ret
| 3c: d503201f nop
|
| 0000000000000040 <example64_fetch_sub_f>:
| 40: aa0003e2 mov x2, x0
| 44: d28001e1 mov x1, #0xf // #15
| 48: aa0103e0 mov x0, x1
| 4c: cb0003e0 neg x0, x0
| 50: f8e00040 ldaddal x0, x0, [x2]
| 54: d65f03c0 ret
| 58: d503201f nop
| 5c: d503201f nop
|
| 0000000000000060 <example64_sub_return>:
| 60: cb0003e0 neg x0, x0
| 64: f8e00022 ldaddal x0, x2, [x1]
| 68: 8b020000 add x0, x0, x2
| 6c: d65f03c0 ret
|
| 0000000000000070 <example64_sub_return_f>:
| 70: aa0003e2 mov x2, x0
| 74: d28001e1 mov x1, #0xf // #15
| 78: aa0103e0 mov x0, x1
| 7c: cb0003e0 neg x0, x0
| 80: f8e00041 ldaddal x0, x1, [x2]
| 84: 8b010000 add x0, x0, x1
| 88: d65f03c0 ret
| 8c: d503201f nop
After this series:
| 0000000000000000 <example64_fetch_and>:
| 0: aa2003e0 mvn x0, x0
| 4: f8e01020 ldclral x0, x0, [x1]
| 8: d65f03c0 ret
| c: d503201f nop
|
| 0000000000000010 <example64_fetch_and_f>:
| 10: 928001e1 mov x1, #0xfffffffffffffff0 // #-16
| 14: f8e11001 ldclral x1, x1, [x0]
| 18: aa0103e0 mov x0, x1
| 1c: d65f03c0 ret
|
| 0000000000000020 <example64_fetch_sub>:
| 20: cb0003e0 neg x0, x0
| 24: f8e00020 ldaddal x0, x0, [x1]
| 28: d65f03c0 ret
| 2c: d503201f nop
|
| 0000000000000030 <example64_fetch_sub_f>:
| 30: 928001c1 mov x1, #0xfffffffffffffff1 // #-15
| 34: f8e10001 ldaddal x1, x1, [x0]
| 38: aa0103e0 mov x0, x1
| 3c: d65f03c0 ret
|
| 0000000000000040 <example64_sub_return>:
| 40: cb0003e2 neg x2, x0
| 44: f8e20022 ldaddal x2, x2, [x1]
| 48: cb000040 sub x0, x2, x0
| 4c: d65f03c0 ret
|
| 0000000000000050 <example64_sub_return_f>:
| 50: 928001c1 mov x1, #0xfffffffffffffff1 // #-15
| 54: f8e10001 ldaddal x1, x1, [x0]
| 58: d1003c20 sub x0, x1, #0xf
| 5c: d65f03c0 ret
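The constant-argument cases show where the wins come from: with the
inversion/negation done in C, GCC folds it into the initial MOV (the
~0xf becoming #-16 in example64_fetch_and_f), and with the RETURN ops
defined as C wrappers around the FETCH ops, the trailing arithmetic
becomes ordinary C that can use an immediate (the `sub x0, x1, #0xf` in
example64_sub_return_f). Roughly (again a simplified sketch of the
macro-generated code):

| /* Sketch only; 32-bit forms and ordering variants omitted. */
| static inline s64 __lse_atomic64_fetch_and(s64 i, atomic64_t *v)
| {
| 	/* invert in C; LDCLR performs the atomic AND-NOT */
| 	return __lse_atomic64_fetch_andnot(~i, v);
| }
|
| static inline s64 __lse_atomic64_sub_return(s64 i, atomic64_t *v)
| {
| 	/* fetch_sub returns the old value; derive the new value in C */
| 	return __lse_atomic64_fetch_sub(i, v) - i;
| }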
Thanks,
Mark.
Mark Rutland (5):
arm64: atomics: format whitespace consistently
arm64: atomics: lse: define SUBs in terms of ADDs
arm64: atomics: lse: define ANDs in terms of ANDNOTs
arm64: atomics: lse: improve constraints for simple ops
arm64: atomics: lse: define RETURN ops in terms of FETCH ops
arch/arm64/include/asm/atomic_ll_sc.h | 86 ++++----
arch/arm64/include/asm/atomic_lse.h | 270 ++++++++------------------
2 files changed, 126 insertions(+), 230 deletions(-)
--
2.30.2