Overhead of arm64 LSE per-CPU atomics?
Puranjay Mohan
puranjay at kernel.org
Tue Nov 4 12:57:53 PST 2025
Hi Breno,
I tried your benchmark on AWS Graviton platforms:
On EC2 c8g.metal-24xl (96 CPUs, Neoverse-V2, AWS Graviton 4):
With ldadd, the results were stable and LSE was always better than LL/SC.
But with stadd, I saw some spikes in p95 and p99 on some CPUs:
CPU: 28 - Latency Percentiles:
====================
LL/SC: p50: 6.61 ns p95: 6.61 ns p99: 6.62 ns
LSE : p50: 4.64 ns p95: 4.65 ns p99: 4.65 ns
CPU: 30 - Latency Percentiles:
====================
LL/SC: p50: 6.61 ns p95: 6.61 ns p99: 6.62 ns
LSE : p50: 4.64 ns p95: 14.24 ns ***p99: 27.74 ns***
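For anyone comparing the two variants, here is a minimal C11 sketch of the distinction. It is not the benchmark's actual code. It assumes the build enables LSE (for example -march=armv8.1-a or later), and counter, add_return_old and add_no_return are hypothetical names used only for illustration. When the fetched value is used, compilers generally emit LDADD; when it is discarded, they may emit STADD, which is the architectural alias of LDADD with the zero register as destination. Which instruction you actually get depends on the compiler and its version, so check the disassembly.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter; stands in for a per-CPU variable. */
_Atomic uint64_t counter;

/*
 * The old value is consumed, so the atomic must return it in a register.
 * With LSE enabled this is typically compiled to LDADD.
 */
uint64_t add_return_old(uint64_t n)
{
    return atomic_fetch_add_explicit(&counter, n, memory_order_relaxed);
}

/*
 * The old value is discarded. A compiler may (assumption: not guaranteed)
 * emit STADD here, i.e. LDADD with XZR/WZR as the destination register.
 */
void add_no_return(uint64_t n)
{
    (void)atomic_fetch_add_explicit(&counter, n, memory_order_relaxed);
}
```

Both functions update the counter identically; only the handling of the returned value differs, which is what separates the ldadd and stadd runs in this message.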
On EC2 m6g.metal (64 CPUs, Neoverse-N1, AWS Graviton 2):
Here both stadd and ldadd were stable, and LSE was always better than LL/SC.
With ldadd:
ARM64 Per-CPU Atomic Add Benchmark
===================================
Running percentile measurements (100 iterations)...
Detected 64 CPUs
CPU: 0 - Latency Percentiles:
====================
LL/SC: p50: 8.40 ns p95: 8.40 ns p99: 8.42 ns
LSE : p50: 5.60 ns p95: 5.60 ns p99: 5.61 ns
CPU: 1 - Latency Percentiles:
====================
LL/SC: p50: 8.40 ns p95: 8.40 ns p99: 8.41 ns
LSE : p50: 5.60 ns p95: 5.60 ns p99: 5.61 ns
[....]
CPU: 62 - Latency Percentiles:
====================
LL/SC: p50: 8.40 ns p95: 8.40 ns p99: 8.40 ns
LSE : p50: 5.60 ns p95: 5.60 ns p99: 5.60 ns
CPU: 63 - Latency Percentiles:
====================
LL/SC: p50: 8.40 ns p95: 8.40 ns p99: 8.41 ns
LSE : p50: 5.60 ns p95: 5.60 ns p99: 5.60 ns
=== Benchmark Complete ===
With stadd:
ARM64 Per-CPU Atomic Add Benchmark
===================================
Running percentile measurements (100 iterations)...
Detected 64 CPUs
CPU: 0 - Latency Percentiles:
====================
LL/SC: p50: 8.00 ns p95: 8.01 ns p99: 8.02 ns
LSE : p50: 5.20 ns p95: 5.21 ns p99: 5.21 ns
CPU: 1 - Latency Percentiles:
====================
LL/SC: p50: 8.00 ns p95: 8.01 ns p99: 8.01 ns
LSE : p50: 5.20 ns p95: 5.21 ns p99: 5.22 ns
[.....]
CPU: 62 - Latency Percentiles:
====================
LL/SC: p50: 8.00 ns p95: 8.01 ns p99: 8.14 ns
LSE : p50: 5.20 ns p95: 5.21 ns p99: 5.21 ns
CPU: 63 - Latency Percentiles:
====================
LL/SC: p50: 8.00 ns p95: 8.01 ns p99: 8.01 ns
LSE : p50: 5.20 ns p95: 5.20 ns p99: 5.20 ns
=== Benchmark Complete ===
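As a reading aid for the p50/p95/p99 columns above, here is a minimal sketch of one common way to derive such figures from raw per-iteration latencies. It assumes the nearest-rank method; the benchmark itself may compute percentiles differently. percentile_ns and cmp_double are hypothetical names, not functions from the benchmark.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending order. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile (an assumed method, see the note above).
 * Sorts samples in place, then returns the element at rank
 * ceil(pct / 100 * n), counted from 1. Requires n > 0 and 1 <= pct <= 100.
 */
double percentile_ns(double *samples, size_t n, unsigned pct)
{
    qsort(samples, n, sizeof(*samples), cmp_double);
    size_t rank = (pct * n + 99) / 100;   /* integer ceil(pct * n / 100) */
    return samples[rank - 1];
}
```

With 100 samples, p99 under this method is the 99th smallest value, so as few as two slow iterations out of 100 are enough to move p99 while leaving p50 unchanged.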