Overhead of arm64 LSE per-CPU atomics?

Willy Tarreau w at 1wt.eu
Sat Nov 1 02:44:48 PDT 2025


Hi!

On Fri, Oct 31, 2025 at 08:25:07PM -0700, Paul E. McKenney wrote:
> > > -----------------8<------------------------
> > > diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
> > > index 9abcc8ef3087..e381034324e1 100644
> > > --- a/arch/arm64/include/asm/percpu.h
> > > +++ b/arch/arm64/include/asm/percpu.h
> > > @@ -70,6 +70,7 @@ __percpu_##name##_case_##sz(void *ptr, unsigned long val)		\
> > >  	unsigned int loop;						\
> > >  	u##sz tmp;							\
> > >  									\
> > > +	asm volatile("prfm pstl1strm, %a0\n" : : "p" (ptr));
> > >  	asm volatile (ARM64_LSE_ATOMIC_INSN(				\
> > >  	/* LL/SC */							\
> > >  	"1:	ldxr" #sfx "\t%" #w "[tmp], %[ptr]\n"			\
> > > @@ -91,6 +92,7 @@ __percpu_##name##_return_case_##sz(void *ptr, unsigned long val)	\
> > >  	unsigned int loop;						\
> > >  	u##sz ret;							\
> > >  									\
> > > +	asm volatile("prfm pstl1strm, %a0\n" : : "p" (ptr));
> > >  	asm volatile (ARM64_LSE_ATOMIC_INSN(				\
> > >  	/* LL/SC */							\
> > >  	"1:	ldxr" #sfx "\t%" #w "[ret], %[ptr]\n"			\
> > > -----------------8<------------------------
> > 
> > I will give this a shot, thank you!
> 
> Jackpot!!!
> 
> This reduces the overhead to 8.427, which is significantly better than
> the non-LSE value of 9.853.  Still room for improvement, but much
> better than the 100ns values.

This is super interesting! I've blindly applied a similar change to all
of our atomics in haproxy and am seeing a consistent 2-7% performance
increase, depending on the test, on an 80-core Ampere Altra (Neoverse-N1).
There as well, we make heavy use of atomics to read and update mostly
local variables, since we avoid sharing as much as possible. I'm pretty
sure it does hurt in certain cases, and we don't have a distinction such
as the per_cpu variants here. However, it makes me think about adding a
"mostly local" variant that we can choose depending on the context. I'll
continue to experiment. Thanks for sharing this trick (particularly to
Yicong Yang, the original reporter).
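
For anyone who wants to try the same idea in userland, here is a minimal
sketch of what such a "mostly local" variant could look like. It is only
an illustration under several assumptions: the names local_fetch_add(),
shared_fetch_add() and count_locally() are hypothetical (they are neither
haproxy nor kernel APIs), relaxed ordering is assumed to be sufficient for
the counter, and __builtin_prefetch(ptr, 1, 0) is assumed to be lowered by
the compiler to "prfm pstl1strm" on arm64, which should be checked in the
generated assembly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical "mostly local" variant: hint that the cache line will be
 * written before issuing the atomic, mirroring the prfm added in the
 * quoted patch. rw=1 requests a prefetch for write, locality=0 requests
 * the streaming (non-temporal) flavour.
 */
static inline uint64_t local_fetch_add(_Atomic uint64_t *ptr, uint64_t val)
{
	assert(ptr != NULL);
	__builtin_prefetch((const void *)ptr, 1, 0);
	return atomic_fetch_add_explicit(ptr, val, memory_order_relaxed);
}

/* Plain variant for heavily shared variables: no prefetch hint. */
static inline uint64_t shared_fetch_add(_Atomic uint64_t *ptr, uint64_t val)
{
	assert(ptr != NULL);
	return atomic_fetch_add_explicit(ptr, val, memory_order_relaxed);
}

/* Tiny usage example: bump a counter n times and return its value. */
uint64_t count_locally(uint64_t n)
{
	_Atomic uint64_t counter = 0;

	for (uint64_t i = 0; i < n; i++)
		local_fetch_add(&counter, 1);
	return atomic_load_explicit(&counter, memory_order_relaxed);
}
```

Keeping two separate helpers lets the caller pick per call site, so the
prefetch is only paid where the variable is expected to stay local. Whether
the compiler emits LSE instructions or an LL/SC loop for the atomic depends
on the build flags (for example -march=armv8.1-a or -moutline-atomics).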

Willy


