[PATCH v2] Avoid memory barrier in read_seqcount() through load acquire
Christoph Lameter (Ampere)
cl at gentwo.org
Fri Aug 23 10:56:12 PDT 2024
On Fri, 23 Aug 2024, Will Deacon wrote:
> On Mon, Aug 19, 2024 at 11:30:15AM -0700, Christoph Lameter via B4 Relay wrote:
> > +static __always_inline unsigned \
> > +__seqprop_##lockname##_sequence_acquire(const seqcount_##lockname##_t *s) \
> > +{ \
> > + unsigned seq = smp_load_acquire(&s->seqcount.sequence); \
> > + \
> > + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) \
> > + return seq; \
> > + \
> > + if (preemptible && unlikely(seq & 1)) { \
> > + __SEQ_LOCK(lockbase##_lock(s->lock)); \
> > + __SEQ_LOCK(lockbase##_unlock(s->lock)); \
> > + \
> > + /* \
> > + * Re-read the sequence counter since the (possibly \
> > + * preempted) writer made progress. \
> > + */ \
> > + seq = smp_load_acquire(&s->seqcount.sequence); \
>
> We could probably do even better with LDAPR here, as that should be
> sufficient for this. It's a can of worms though, as it's not implemented
> on all CPUs and relaxing smp_load_acquire() might introduce subtle
> breakage in places where it's used to build other types of lock. Maybe
> you can hack something to see if there's any performance left behind
> without it?
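LDAPR is the RCpc form of the acquire load: it still keeps the subsequent
data reads after the load of the sequence count, which is all the read path
needs, while LDAR additionally orders against earlier store-releases.
Modelled in plain C11 atomics (an illustration of the idea only, not the
kernel code; the function names are made up for the example), the series
essentially turns the first form below into the second:

#include <stdatomic.h>

/* Old read side: relaxed load of the sequence plus a read barrier. */
static unsigned read_seq_rmb(atomic_uint *seq)
{
	unsigned s = atomic_load_explicit(seq, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);	/* roughly the smp_rmb() here */
	return s;
}

/* New read side: one acquire load (LDAR today, LDAPR with the hack below). */
static unsigned read_seq_acquire(atomic_uint *seq)
{
	return atomic_load_explicit(seq, memory_order_acquire);
}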
I added the patch below. The kernel booted fine and there is no change in
the read_seq() cycle counts:
LDAPR
---------------------------
Test         Single   2 CPU   4 CPU   8 CPU  16 CPU  32 CPU  64 CPU     ALL
write seq :      13      98     385     764    1551    3043    6259   11922
read seq  :       8       8       8       8       8       8       9      10
rw seq    :       8     101     247     300     467     742    1384    2101

LDAR
---------------------------
Test         Single   2 CPU   4 CPU   8 CPU  16 CPU  32 CPU  64 CPU     ALL
write seq :      13      90     343     785    1533    3032    6315   11073
read seq  :       8       8       8       8       8       8       9      11
rw seq    :       8      79     227     313     423     755    1313    2220
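A "read seq" style measurement boils down to timing a loop over the bare
reader protocol; a hypothetical sketch (not the actual test module used for
the numbers above) looks roughly like this:

#include <linux/seqlock.h>
#include <linux/spinlock.h>
#include <linux/timex.h>

static DEFINE_SPINLOCK(test_lock);
static seqcount_spinlock_t test_seq = SEQCNT_SPINLOCK_ZERO(test_seq, &test_lock);

/* Total cycles for 'iters' passes through read_seqcount_begin()/retry(). */
static cycles_t time_read_seq(unsigned long iters)
{
	cycles_t start = get_cycles();
	unsigned long i;

	for (i = 0; i < iters; i++) {
		unsigned int seq;

		do {
			seq = read_seqcount_begin(&test_seq);
			/* no payload: only the begin/retry cost is measured */
		} while (read_seqcount_retry(&test_seq, seq));
	}

	return get_cycles() - start;
}

The hack itself is just a mechanical ldar -> ldapr swap in
__smp_load_acquire():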
Index: linux/arch/arm64/include/asm/barrier.h
===================================================================
--- linux.orig/arch/arm64/include/asm/barrier.h
+++ linux/arch/arm64/include/asm/barrier.h
@@ -167,22 +167,22 @@ do { \
kasan_check_read(__p, sizeof(*p)); \
switch (sizeof(*p)) { \
case 1: \
- asm volatile ("ldarb %w0, %1" \
+ asm volatile (".arch_extension rcpc\nldaprb %w0, %1" \
: "=r" (*(__u8 *)__u.__c) \
: "Q" (*__p) : "memory"); \
break; \
case 2: \
- asm volatile ("ldarh %w0, %1" \
+ asm volatile (".arch_extension rcpc\nldaprh %w0, %1" \
: "=r" (*(__u16 *)__u.__c) \
: "Q" (*__p) : "memory"); \
break; \
case 4: \
- asm volatile ("ldar %w0, %1" \
+ asm volatile (".arch_extension rcpc\nldapr %w0, %1" \
: "=r" (*(__u32 *)__u.__c) \
: "Q" (*__p) : "memory"); \
break; \
case 8: \
- asm volatile ("ldar %0, %1" \
+ asm volatile (".arch_extension rcpc\nldapr %0, %1" \
: "=r" (*(__u64 *)__u.__c) \
: "Q" (*__p) : "memory"); \
break; \
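If this were to go beyond a quick hack, the LDAPR form presumably has to be
patched in via the ARM64_HAS_LDAPR capability so that CPUs without FEAT_LRCPC
keep using LDAR, similar to what asm/rwonce.h already does for __READ_ONCE()
under LTO. Rough, untested sketch for the 4-byte case (needs
asm/alternative-macros.h):

	case 4:								\
		asm volatile (ALTERNATIVE("ldar %w0, %1",		\
			".arch_extension rcpc\n"			\
			"ldapr %w0, %1",				\
			ARM64_HAS_LDAPR)				\
			: "=r" (*(__u32 *)__u.__c)			\
			: "Q" (*__p) : "memory");			\
		break;							\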