[PATCH 0/5] arm64: atomics: cleanups and codegen improvements
Mark Rutland
mark.rutland at arm.com
Fri Dec 10 07:14:05 PST 2021
While looking at Peter's recent refcount rework, I spotted that we have
some unfortunate code generation for the LSE atomics. Due to a
combination of assembly constraints and manipulation performed in
assembly which the compiler has no visibility of, the compiler ends up
generating unnecessary register shuffling and redundant manipulation.

This series (based on v5.16-rc4) improves matters by tightening the asm
constraints and moving value manipulation to C, where the compiler can
perform a number of optimizations. This also has the benefit of
simplifying the implementation and deleting 100+ lines of code.

This is purely a cleanup and optimization; there should be no functional
change as a result of the series.
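As a rough sketch of the approach (simplified from the actual patches;
the real definitions in arch/arm64/include/asm/atomic_lse.h are
generated by macros, one per size and ordering variant), the asm body is
reduced to the atomic instruction itself, with operand preparation done
in C:

| /*
|  * Sketch only: __LSE_PREAMBLE and the size/ordering variants are
|  * omitted for brevity.
|  */
| static inline void __lse_atomic64_add(s64 i, atomic64_t *v)
| {
| 	asm volatile(
| 	"	stadd	%[i], %[v]\n"
| 	: [v] "+Q" (v->counter)	/* memory is read-modify-written */
| 	: [i] "r" (i));		/* i is a pure input; no "+r" needed */
| }
|
| static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
| {
| 	/* negate in C, where the compiler can fold constant operands */
| 	__lse_atomic64_add(-i, v);
| }

With the negation visible to the compiler, a constant argument can be
negated at compile time rather than with a runtime NEG.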
I've pushed the series out to my arm64/atomics/improvements branch:

  https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/atomics/improvements
  git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/atomics/improvements
For comparison, using GCC 11.1.0 to compile the following code (the _f
variants take a constant argument, exercising the compiler's ability to
fold the value manipulation into an immediate):
| s64 example64_fetch_and(s64 i, atomic64_t *v)
| {
| return __lse_atomic64_fetch_and(i, v);
| }
|
| s64 example64_fetch_and_f(atomic64_t *v)
| {
| return __lse_atomic64_fetch_and(0xf, v);
| }
|
| s64 example64_fetch_sub(s64 i, atomic64_t *v)
| {
| return __lse_atomic64_fetch_sub(i, v);
| }
|
| s64 example64_fetch_sub_f(atomic64_t *v)
| {
| return __lse_atomic64_fetch_sub(0xf, v);
| }
|
| s64 example64_sub_return(s64 i, atomic64_t *v)
| {
| return __lse_atomic64_sub_return(i, v);
| }
|
| s64 example64_sub_return_f(atomic64_t *v)
| {
| return __lse_atomic64_sub_return(0xf, v);
| }
Before this series:
| 0000000000000000 <example64_fetch_and>:
| 0: aa2003e0 mvn x0, x0
| 4: f8e01020 ldclral x0, x0, [x1]
| 8: d65f03c0 ret
| c: d503201f nop
|
| 0000000000000010 <example64_fetch_and_f>:
| 10: aa0003e2 mov x2, x0
| 14: d28001e1 mov x1, #0xf // #15
| 18: aa0103e0 mov x0, x1
| 1c: aa2003e0 mvn x0, x0
| 20: f8e01040 ldclral x0, x0, [x2]
| 24: d65f03c0 ret
| 28: d503201f nop
| 2c: d503201f nop
|
| 0000000000000030 <example64_fetch_sub>:
| 30: cb0003e0 neg x0, x0
| 34: f8e00020 ldaddal x0, x0, [x1]
| 38: d65f03c0 ret
| 3c: d503201f nop
|
| 0000000000000040 <example64_fetch_sub_f>:
| 40: aa0003e2 mov x2, x0
| 44: d28001e1 mov x1, #0xf // #15
| 48: aa0103e0 mov x0, x1
| 4c: cb0003e0 neg x0, x0
| 50: f8e00040 ldaddal x0, x0, [x2]
| 54: d65f03c0 ret
| 58: d503201f nop
| 5c: d503201f nop
|
| 0000000000000060 <example64_sub_return>:
| 60: cb0003e0 neg x0, x0
| 64: f8e00022 ldaddal x0, x2, [x1]
| 68: 8b020000 add x0, x0, x2
| 6c: d65f03c0 ret
|
| 0000000000000070 <example64_sub_return_f>:
| 70: aa0003e2 mov x2, x0
| 74: d28001e1 mov x1, #0xf // #15
| 78: aa0103e0 mov x0, x1
| 7c: cb0003e0 neg x0, x0
| 80: f8e00041 ldaddal x0, x1, [x2]
| 84: 8b010000 add x0, x0, x1
| 88: d65f03c0 ret
| 8c: d503201f nop
After this series:
| 0000000000000000 <example64_fetch_and>:
| 0: aa2003e0 mvn x0, x0
| 4: f8e01020 ldclral x0, x0, [x1]
| 8: d65f03c0 ret
| c: d503201f nop
|
| 0000000000000010 <example64_fetch_and_f>:
| 10: 928001e1 mov x1, #0xfffffffffffffff0 // #-16
| 14: f8e11001 ldclral x1, x1, [x0]
| 18: aa0103e0 mov x0, x1
| 1c: d65f03c0 ret
|
| 0000000000000020 <example64_fetch_sub>:
| 20: cb0003e0 neg x0, x0
| 24: f8e00020 ldaddal x0, x0, [x1]
| 28: d65f03c0 ret
| 2c: d503201f nop
|
| 0000000000000030 <example64_fetch_sub_f>:
| 30: 928001c1 mov x1, #0xfffffffffffffff1 // #-15
| 34: f8e10001 ldaddal x1, x1, [x0]
| 38: aa0103e0 mov x0, x1
| 3c: d65f03c0 ret
|
| 0000000000000040 <example64_sub_return>:
| 40: cb0003e2 neg x2, x0
| 44: f8e20022 ldaddal x2, x2, [x1]
| 48: cb000040 sub x0, x2, x0
| 4c: d65f03c0 ret
|
| 0000000000000050 <example64_sub_return_f>:
| 50: 928001c1 mov x1, #0xfffffffffffffff1 // #-15
| 54: f8e10001 ldaddal x1, x1, [x0]
| 58: d1003c20 sub x0, x1, #0xf
| 5c: d65f03c0 ret
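The constant-argument cases show where the wins come from: with the
inversion/negation done in C, GCC folds it into the initial MOV (the
~0xf becoming #-16 in example64_fetch_and_f), and with the RETURN ops
defined as C wrappers around the FETCH ops, the trailing arithmetic
becomes ordinary C that can use an immediate (the `sub x0, x1, #0xf` in
example64_sub_return_f). Roughly (again a simplified sketch of the
macro-generated code):

| /* Sketch only; 32-bit forms and ordering variants omitted. */
| static inline s64 __lse_atomic64_fetch_and(s64 i, atomic64_t *v)
| {
| 	/* invert in C; LDCLR performs the atomic AND-NOT */
| 	return __lse_atomic64_fetch_andnot(~i, v);
| }
|
| static inline s64 __lse_atomic64_sub_return(s64 i, atomic64_t *v)
| {
| 	/* fetch_sub returns the old value; derive the new value in C */
| 	return __lse_atomic64_fetch_sub(i, v) - i;
| }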
Thanks,
Mark.
Mark Rutland (5):
arm64: atomics: format whitespace consistently
arm64: atomics: lse: define SUBs in terms of ADDs
arm64: atomics: lse: define ANDs in terms of ANDNOTs
arm64: atomics: lse: improve constraints for simple ops
arm64: atomics: lse: define RETURN ops in terms of FETCH ops
arch/arm64/include/asm/atomic_ll_sc.h | 86 ++++----
arch/arm64/include/asm/atomic_lse.h | 270 ++++++++------------------
2 files changed, 126 insertions(+), 230 deletions(-)
--
2.30.2