[PATCH v2] Avoid memory barrier in read_seqcount() through load acquire
Christoph Lameter (Ampere)
cl at gentwo.org
Fri Aug 23 10:56:12 PDT 2024
On Fri, 23 Aug 2024, Will Deacon wrote:
> On Mon, Aug 19, 2024 at 11:30:15AM -0700, Christoph Lameter via B4 Relay wrote:
> > +static __always_inline unsigned \
> > +__seqprop_##lockname##_sequence_acquire(const seqcount_##lockname##_t *s) \
> > +{ \
> > + unsigned seq = smp_load_acquire(&s->seqcount.sequence); \
> > + \
> > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) \
> > + return seq; \
> > + \
> > + if (preemptible && unlikely(seq & 1)) { \
> > + __SEQ_LOCK(lockbase##_lock(s->lock)); \
> > + __SEQ_LOCK(lockbase##_unlock(s->lock)); \
> > + \
> > + /* \
> > + * Re-read the sequence counter since the (possibly \
> > + * preempted) writer made progress. \
> > + */ \
> > + seq = smp_load_acquire(&s->seqcount.sequence); \
>
> We could probably do even better with LDAPR here, as that should be
> sufficient for this. It's a can of worms though, as it's not implemented
> on all CPUs and relaxing smp_load_acquire() might introduce subtle
> breakage in places where it's used to build other types of lock. Maybe
> you can hack something to see if there's any performance left behind
> without it?
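LDAPR is the RCpc form of the acquire load: it still keeps the subsequent
data reads after the load of the sequence count, which is all the read path
needs, while LDAR additionally orders against earlier store-releases.
Modelled in plain C11 atomics (an illustration of the idea only, not the
kernel code; the function names are made up for the example), the series
essentially turns the first form below into the second:

#include <stdatomic.h>

/* Old read side: relaxed load of the sequence plus a read barrier. */
static unsigned read_seq_rmb(atomic_uint *seq)
{
	unsigned s = atomic_load_explicit(seq, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);	/* roughly the smp_rmb() here */
	return s;
}

/* New read side: one acquire load (LDAR today, LDAPR with the hack below). */
static unsigned read_seq_acquire(atomic_uint *seq)
{
	return atomic_load_explicit(seq, memory_order_acquire);
}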
I added the patch below. The kernel booted fine and there is no change in
the read_seq() cycle counts:
LDAPR
---------------------------
Test         Single   2 CPU   4 CPU   8 CPU  16 CPU  32 CPU  64 CPU     ALL
write seq :      13      98     385     764    1551    3043    6259   11922
read seq  :       8       8       8       8       8       8       9      10
rw seq    :       8     101     247     300     467     742    1384    2101

LDAR
---------------------------
Test         Single   2 CPU   4 CPU   8 CPU  16 CPU  32 CPU  64 CPU     ALL
write seq :      13      90     343     785    1533    3032    6315   11073
read seq  :       8       8       8       8       8       8       9      11
rw seq    :       8      79     227     313     423     755    1313    2220
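A "read seq" style measurement boils down to timing a loop over the bare
reader protocol; a hypothetical sketch (not the actual test module used for
the numbers above) looks roughly like this:

#include <linux/seqlock.h>
#include <linux/spinlock.h>
#include <linux/timex.h>

static DEFINE_SPINLOCK(test_lock);
static seqcount_spinlock_t test_seq = SEQCNT_SPINLOCK_ZERO(test_seq, &test_lock);

/* Total cycles for 'iters' passes through read_seqcount_begin()/retry(). */
static cycles_t time_read_seq(unsigned long iters)
{
	cycles_t start = get_cycles();
	unsigned long i;

	for (i = 0; i < iters; i++) {
		unsigned int seq;

		do {
			seq = read_seqcount_begin(&test_seq);
			/* no payload: only the begin/retry cost is measured */
		} while (read_seqcount_retry(&test_seq, seq));
	}

	return get_cycles() - start;
}

The hack itself is just a mechanical ldar -> ldapr swap in
__smp_load_acquire():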
Index: linux/arch/arm64/include/asm/barrier.h
===================================================================
--- linux.orig/arch/arm64/include/asm/barrier.h
+++ linux/arch/arm64/include/asm/barrier.h
@@ -167,22 +167,22 @@ do { \
kasan_check_read(__p, sizeof(*p)); \
switch (sizeof(*p)) { \
case 1: \
- asm volatile ("ldarb %w0, %1" \
+ asm volatile (".arch_extension rcpc\nldaprb %w0, %1" \
: "=r" (*(__u8 *)__u.__c) \
: "Q" (*__p) : "memory"); \
break; \
case 2: \
- asm volatile ("ldarh %w0, %1" \
+ asm volatile (".arch_extension rcpc\nldaprh %w0, %1" \
: "=r" (*(__u16 *)__u.__c) \
: "Q" (*__p) : "memory"); \
break; \
case 4: \
- asm volatile ("ldar %w0, %1" \
+ asm volatile (".arch_extension rcpc\nldapr %w0, %1" \
: "=r" (*(__u32 *)__u.__c) \
: "Q" (*__p) : "memory"); \
break; \
case 8: \
- asm volatile ("ldar %0, %1" \
+ asm volatile (".arch_extension rcpc\nldapr %0, %1" \
: "=r" (*(__u64 *)__u.__c) \
: "Q" (*__p) : "memory"); \
break; \
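If this were to go beyond a quick hack, the LDAPR form presumably has to be
patched in via the ARM64_HAS_LDAPR capability so that CPUs without FEAT_LRCPC
keep using LDAR, similar to what asm/rwonce.h already does for __READ_ONCE()
under LTO. Rough, untested sketch for the 4-byte case (needs
asm/alternative-macros.h):

	case 4:								\
		asm volatile (ALTERNATIVE("ldar %w0, %1",		\
			".arch_extension rcpc\n"			\
			"ldapr %w0, %1",				\
			ARM64_HAS_LDAPR)				\
			: "=r" (*(__u32 *)__u.__c)			\
			: "Q" (*__p) : "memory");			\
		break;							\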