Overhead of arm64 LSE per-CPU atomics?

Catalin Marinas catalin.marinas at arm.com
Sat Nov 1 04:23:22 PDT 2025


On Fri, Oct 31, 2025 at 08:25:07PM -0700, Paul E. McKenney wrote:
> On Fri, Oct 31, 2025 at 04:38:57PM -0700, Paul E. McKenney wrote:
> > On Fri, Oct 31, 2025 at 10:43:35PM +0000, Catalin Marinas wrote:
> > > I just realised that patch doesn't touch percpu.h at all. So what about
> > > something like (untested):
> > > 
> > > -----------------8<------------------------
> > > diff --git a/arch/arm64/include/asm/percpu.h b/arch/arm64/include/asm/percpu.h
> > > index 9abcc8ef3087..e381034324e1 100644
> > > --- a/arch/arm64/include/asm/percpu.h
> > > +++ b/arch/arm64/include/asm/percpu.h
> > > @@ -70,6 +70,7 @@ __percpu_##name##_case_##sz(void *ptr, unsigned long val)		\
> > >  	unsigned int loop;						\
> > >  	u##sz tmp;							\
> > >  									\
> > > +	asm volatile("prfm pstl1strm, %a0\n" : : "p" (ptr));		\
> > >  	asm volatile (ARM64_LSE_ATOMIC_INSN(				\
> > >  	/* LL/SC */							\
> > >  	"1:	ldxr" #sfx "\t%" #w "[tmp], %[ptr]\n"			\
> > > @@ -91,6 +92,7 @@ __percpu_##name##_return_case_##sz(void *ptr, unsigned long val)	\
> > >  	unsigned int loop;						\
> > >  	u##sz ret;							\
> > >  									\
> > > +	asm volatile("prfm pstl1strm, %a0\n" : : "p" (ptr));		\
> > >  	asm volatile (ARM64_LSE_ATOMIC_INSN(				\
> > >  	/* LL/SC */							\
> > >  	"1:	ldxr" #sfx "\t%" #w "[ret], %[ptr]\n"			\
> > > -----------------8<------------------------
> > 
> > I will give this a shot, thank you!
> 
> Jackpot!!!
> 
> This reduces the overhead to 8.427, which is significantly better than
> the non-LSE value of 9.853.  Still room for improvement, but much
> better than the 100ns values.
> 
> I presume that you will send this up the normal path, but in the meantime,
> I will pull this in for further local testing, and thank you!

I think it may work for this specific case, and for the futex as well,
but not generally. The Neoverse-V2 TRM lists some controls in
IMP_CPUECTLR_EL1, bits 29 to 33:

https://developer.arm.com/documentation/102375/0002

These can be set depending on the system configuration, but they are
too coarse as knobs to cover all use-cases within an OS. This register
is typically configured by firmware; we don't touch it in Linux.

I'll dig some more but we may have to do tricks like prefetch if we
can't find a hardware configuration that satisfies all cases.
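
To make the prefetch trick concrete outside the percpu.h macros, here is
a minimal user-space sketch. It is not the kernel code: the helper name
prefetched_add_return() is made up for illustration, and it assumes the
GCC/Clang builtins. On arm64, GCC typically turns the prefetch below
into "prfm pstl1strm", and with LSE enabled the atomic add typically
becomes an LDADD-family instruction, but that depends on the compiler
and flags.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): hint that the line is about
 * to be written, then perform the atomic add.
 */
static inline uint64_t prefetched_add_return(uint64_t *ptr, uint64_t val)
{
	/* rw = 1: prefetch for store; locality = 0: streaming hint. */
	__builtin_prefetch(ptr, 1, 0);

	/* Relaxed ordering: a plain counter update, no barriers implied. */
	return __atomic_add_fetch(ptr, val, __ATOMIC_RELAXED);
}
```

Whether this helps will depend on the CPU and its configuration, so it
needs measuring rather than assuming.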

-- 
Catalin


