Overhead of arm64 LSE per-CPU atomics?

Catalin Marinas catalin.marinas at arm.com
Wed Nov 5 09:15:51 PST 2025


On Wed, Nov 05, 2025 at 08:25:51AM -0800, Paul E. McKenney wrote:
> On Wed, Nov 05, 2025 at 03:34:21PM +0000, Catalin Marinas wrote:
> > Given that this_cpu_*() are meant for the local CPU, there's less risk
> > of cache line bouncing between CPUs, so I'm happy to change them to
> > either use PRFM or LDADD (I think I prefer the latter). This would not
> > be a generic change for the other atomics, only the per-CPU ones.
> 
> I have easy access to only the one type of ARM system, and of course
> the choice must be driven by a wide range of systems.  But yes, it
> would be much better if we can just use this_cpu_inc().  I will use the
> non-atomics protected by interrupt disabling in the meantime, but look
> forward to being able to switch back.

BTW, did you find a problem with this_cpu_inc() in normal use with SRCU,
or only in a microbenchmark hammering it? From what I understand from
the hardware folk, doing STADD in a tight loop saturates some queues in
the interconnect and eventually slows things down. In normal use, it's
just a posted operation that does not affect the subsequent instructions
(or at least that's the theory).
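
To make the "hammering" case concrete, here is a minimal user-space
sketch, not kernel code and not the benchmark Paul ran. It assumes C11
atomics and that a compiler targeting LSE may emit STADD when the result
of a relaxed fetch-add is discarded, and LDADD when it is consumed; that
has to be confirmed in the disassembly. The names now_ns(), inc_discard(),
inc_use() and bench() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/*
 * Timestamp in ns. timespec_get() reads wall-clock time; a real
 * benchmark would prefer a monotonic clock or the CPU cycle counter.
 */
static uint64_t now_ns(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Result discarded. Assumption: with LSE enabled the compiler may
 * emit STADD here (no destination register needed).
 */
static void inc_discard(atomic_ulong *ctr, unsigned long n)
{
	for (unsigned long i = 0; i < n; i++)
		(void)atomic_fetch_add_explicit(ctr, 1, memory_order_relaxed);
}

/*
 * Result consumed. The old value is needed, so an LDADD-style
 * instruction (or an LL/SC loop without LSE) is expected instead.
 */
static unsigned long inc_use(atomic_ulong *ctr, unsigned long n)
{
	unsigned long sum = 0;

	for (unsigned long i = 0; i < n; i++)
		sum += atomic_fetch_add_explicit(ctr, 1, memory_order_relaxed);
	return sum;
}

struct bench_result {
	uint64_t discard_ns;		/* time for the result-discarded loop */
	uint64_t use_ns;		/* time for the result-consumed loop */
	unsigned long discard_count;	/* final value of the first counter */
	unsigned long use_count;	/* final value of the second counter */
	unsigned long use_sum;		/* sum of the old values returned */
};

/* Run both loops back to back on separate counters, single-threaded. */
static struct bench_result bench(unsigned long n)
{
	struct bench_result r;
	atomic_ulong a, b;
	uint64_t t0, t1, t2;

	atomic_init(&a, 0);
	atomic_init(&b, 0);

	t0 = now_ns();
	inc_discard(&a, n);
	t1 = now_ns();
	r.use_sum = inc_use(&b, n);
	t2 = now_ns();

	r.discard_ns = t1 - t0;
	r.use_ns = t2 - t1;
	r.discard_count = atomic_load(&a);
	r.use_count = atomic_load(&b);
	/* Both loops must have performed exactly n increments. */
	assert(r.discard_count == n && r.use_count == n);
	return r;
}
```

Calling bench() with a large n and comparing discard_ns against use_ns
gives the back-to-back case only. It says nothing about normal use,
where the increments are separated by other work. Building with
something like "gcc -O2 -march=armv8.1-a -mno-outline-atomics" and
checking the generated instructions is needed before reading anything
into the numbers.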

-- 
Catalin


