Overhead of arm64 LSE per-CPU atomics?

Puranjay Mohan puranjay at kernel.org
Tue Nov 4 12:57:53 PST 2025


Hi Breno,

I tried your benchmark on two AWS Graviton platforms:

On EC2 c8g.metal-24xl (96 CPUs, Neoverse-V2, AWS Graviton4):

With ldadd, the results were stable and LSE was always faster than LL/SC.
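For anyone following along, here is a minimal user-space sketch of the three kinds of add being compared. The helper names are hypothetical, and this is neither Breno's benchmark nor the kernel's per-CPU implementation. LL/SC is an ldxr/add/stxr retry loop; LSE ldadd does the add in a single instruction and returns the old value; stadd is the no-return form (an alias of ldadd with XZR as the destination register). All three are relaxed. The LSE paths assume a build with LSE enabled (e.g. -march=armv8.1-a); otherwise, and on non-AArch64 hosts, the sketch falls back to __atomic_fetch_add so it still builds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for the three kinds of 64-bit add compared in
 * this thread. All are relaxed (no acquire/release ordering). Where the
 * instructions are unavailable they fall back to __atomic_fetch_add.
 */

/* LL/SC: load-exclusive, add, store-exclusive; retry if the store fails. */
static inline uint64_t add_llsc(uint64_t *p, uint64_t v)
{
#if defined(__aarch64__)
	uint64_t old, tmp;
	uint32_t fail;

	__asm__ __volatile__(
		"1:\tldxr\t%0, %3\n"
		"\tadd\t%1, %0, %4\n"
		"\tstxr\t%w2, %1, %3\n"
		"\tcbnz\t%w2, 1b\n"
		: "=&r"(old), "=&r"(tmp), "=&r"(fail), "+Q"(*p)
		: "r"(v)
		: "memory");
	return old;
#else
	return __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
#endif
}

/* LSE ldadd: single-instruction atomic add that returns the old value. */
static inline uint64_t add_lse_ldadd(uint64_t *p, uint64_t v)
{
#if defined(__aarch64__) && defined(__ARM_FEATURE_ATOMICS)
	uint64_t old;

	__asm__ __volatile__("ldadd\t%2, %0, %1"
			     : "=r"(old), "+Q"(*p)
			     : "r"(v)
			     : "memory");
	return old;
#else
	return __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
#endif
}

/* LSE stadd: the same add with no result returned (ldadd with XZR). */
static inline void add_lse_stadd(uint64_t *p, uint64_t v)
{
#if defined(__aarch64__) && defined(__ARM_FEATURE_ATOMICS)
	__asm__ __volatile__("stadd\t%1, %0" : "+Q"(*p) : "r"(v) : "memory");
#else
	(void)__atomic_fetch_add(p, v, __ATOMIC_RELAXED);
#endif
}

/* Usage: apply each variant once to a fresh counter; it ends at 6. */
static uint64_t demo_counter(void)
{
	uint64_t c = 0;
	uint64_t old;

	old = add_llsc(&c, 1);      /* c: 0 -> 1 */
	assert(old == 0);
	old = add_lse_ldadd(&c, 2); /* c: 1 -> 3 */
	assert(old == 1);
	add_lse_stadd(&c, 3);       /* c: 3 -> 6 */
	return c;
}
```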

With stadd, however, I saw spikes in p95 and p99 on some CPUs:

 CPU: 28 - Latency Percentiles:
====================
LL/SC:   p50: 6.61 ns     p95: 6.61 ns    p99: 6.62 ns
LSE  :   p50: 4.64 ns     p95: 4.65 ns    p99: 4.65 ns

 CPU: 30 - Latency Percentiles:
====================
LL/SC:   p50: 6.61 ns     p95: 6.61 ns    p99: 6.62 ns
LSE  :   p50: 4.64 ns     p95: 14.24 ns  ***p99: 27.74 ns***
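When reading tail numbers like CPU 30's, it helps to recall how few samples it takes to move a high percentile. Below is a sketch of a nearest-rank percentile over per-iteration samples, assuming the benchmark takes percentiles over the 100 iterations it reports; its actual method is not shown here, and percentile() is a hypothetical helper. The rank is computed in integer arithmetic to avoid floating-point rounding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical nearest-rank percentile: sort a copy of the samples and
 * return the value at 1-based rank ceil(p * n / 100), for p in 1..100.
 */
static double percentile(const double *samples, size_t n, unsigned p)
{
	double *sorted;
	size_t rank;
	double v;

	assert(n > 0 && p >= 1 && p <= 100);
	sorted = malloc(n * sizeof(*sorted));
	if (sorted == NULL)
		abort();
	memcpy(sorted, samples, n * sizeof(*sorted));
	qsort(sorted, n, sizeof(*sorted), cmp_double);
	rank = ((size_t)p * n + 99) / 100; /* integer ceil(p * n / 100) */
	v = sorted[rank - 1];
	free(sorted);
	return v;
}
```

With n = 100 under this definition, p95 rises only once at least 6 samples are slow and p99 once at least 2 are. So a p50 of 4.64 ns alongside p95/p99 of 14.24/27.74 ns would point to a minority of slow iterations rather than a uniform slowdown.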


On EC2 m6g.metal (64 CPUs, Neoverse-N1, AWS Graviton2):

Here both stadd and ldadd were stable, and LSE was always faster than LL/SC.

With ldadd:

ARM64 Per-CPU Atomic Add Benchmark
===================================
Running percentile measurements (100 iterations)...
Detected 64 CPUs

 CPU: 0 - Latency Percentiles:
====================
LL/SC:   p50: 8.40 ns     p95: 8.40 ns    p99: 8.42 ns
LSE  :   p50: 5.60 ns     p95: 5.60 ns    p99: 5.61 ns

 CPU: 1 - Latency Percentiles:
====================
LL/SC:   p50: 8.40 ns     p95: 8.40 ns    p99: 8.41 ns
LSE  :   p50: 5.60 ns     p95: 5.60 ns    p99: 5.61 ns


[....]

 CPU: 62 - Latency Percentiles:
====================
LL/SC:   p50: 8.40 ns     p95: 8.40 ns    p99: 8.40 ns
LSE  :   p50: 5.60 ns     p95: 5.60 ns    p99: 5.60 ns

 CPU: 63 - Latency Percentiles:
====================
LL/SC:   p50: 8.40 ns     p95: 8.40 ns    p99: 8.41 ns
LSE  :   p50: 5.60 ns     p95: 5.60 ns    p99: 5.60 ns

=== Benchmark Complete ===
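Since the results are reported per CPU, the benchmark presumably pins itself to each CPU in turn. Here is a minimal sketch of such a harness, assuming Linux sched_setaffinity() and CLOCK_MONOTONIC timing. pin_to_cpu() and ns_per_add() are hypothetical names, and __atomic_fetch_add stands in for whichever LL/SC or LSE variant is being measured.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: pin the calling thread to one CPU (0 on success). */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: average nanoseconds per relaxed atomic add over
 * `ops` iterations while pinned to `cpu`. The counter is only touched by
 * this thread, so there is no cross-CPU contention; the average includes
 * loop overhead. Returns -1.0 if pinning fails or ops is 0.
 */
static double ns_per_add(int cpu, uint64_t ops)
{
	static uint64_t counter;
	uint64_t start, end;

	if (ops == 0 || pin_to_cpu(cpu) != 0)
		return -1.0;
	start = now_ns();
	for (uint64_t i = 0; i < ops; i++)
		__atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);
	end = now_ns();
	return (double)(end - start) / (double)ops;
}
```

Averaging over many operations amortizes the timer cost, but it also folds in loop overhead. Absolute numbers from a harness like this may therefore not match Breno's tool.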

With stadd:

ARM64 Per-CPU Atomic Add Benchmark
===================================
Running percentile measurements (100 iterations)...
Detected 64 CPUs

 CPU: 0 - Latency Percentiles:
====================
LL/SC:   p50: 8.00 ns     p95: 8.01 ns    p99: 8.02 ns
LSE  :   p50: 5.20 ns     p95: 5.21 ns    p99: 5.21 ns

 CPU: 1 - Latency Percentiles:
====================
LL/SC:   p50: 8.00 ns     p95: 8.01 ns    p99: 8.01 ns
LSE  :   p50: 5.20 ns     p95: 5.21 ns    p99: 5.22 ns


[.....]

 CPU: 62 - Latency Percentiles:
====================
LL/SC:   p50: 8.00 ns     p95: 8.01 ns    p99: 8.14 ns
LSE  :   p50: 5.20 ns     p95: 5.21 ns    p99: 5.21 ns

 CPU: 63 - Latency Percentiles:
====================
LL/SC:   p50: 8.00 ns     p95: 8.01 ns    p99: 8.01 ns
LSE  :   p50: 5.20 ns     p95: 5.20 ns    p99: 5.20 ns

=== Benchmark Complete ===
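To put "faster" into numbers, the relative p50 latency reduction from LL/SC to LSE follows directly from the figures above. It is about 33% with ldadd (8.40 vs 5.60 ns) and 35% with stadd (8.00 vs 5.20 ns) on Graviton2, and about 30% for the Graviton4 CPU 28 stadd run (6.61 vs 4.64 ns). The same ratio drops well below 1 for the Graviton4 CPU 30 p99 (6.62 vs 27.74 ns). The helper names below are hypothetical.

```c
#include <assert.h>

/* Percentage by which LSE lowers latency relative to LL/SC. */
static double lse_reduction_pct(double llsc_ns, double lse_ns)
{
	assert(llsc_ns > 0.0);
	return 100.0 * (llsc_ns - lse_ns) / llsc_ns;
}

/* LL/SC latency divided by LSE latency: > 1 means LSE is faster. */
static double lse_speedup(double llsc_ns, double lse_ns)
{
	assert(lse_ns > 0.0);
	return llsc_ns / lse_ns;
}
```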



More information about the linux-arm-kernel mailing list