[PATCH v2] Avoid memory barrier in read_seqcount() through load acquire
Christoph Lameter (Ampere)
cl at gentwo.org
Fri Aug 23 12:38:05 PDT 2024
On Fri, 23 Aug 2024, Will Deacon wrote:
> > +#ifdef CONFIG_ARCH_HAS_ACQUIRE_RELEASE
> > +#define raw_read_seqcount_begin(s) \
> > +({ \
> > + unsigned _seq; \
> > + \
> > + while ((_seq = seqprop_sequence_acquire(s)) & 1) \
> > + cpu_relax(); \
>
> It would also be interesting to see whether smp_cond_load_acquire()
> performs any better that this loop in the !RT case.
The hack to do this follows. The kernel boots, and a kernel build completes
just fine, but there is no change in cycles on my tests.
Another benchmark may show more: all my synthetic tests do is run the
function calls in a loop in parallel on multiple CPUs.
The main effect here may be reduced power consumption, since the busy loop
is no longer required. I would favor a solution like this, but the patch is
not clean given the need to drop the const attribute with a cast.
Index: linux/include/linux/seqlock.h
===================================================================
--- linux.orig/include/linux/seqlock.h
+++ linux/include/linux/seqlock.h
@@ -325,9 +325,9 @@ SEQCOUNT_LOCKNAME(mutex, struct m
#define raw_read_seqcount_begin(s) \
({ \
unsigned _seq; \
+ seqcount_t *e = seqprop_ptr((struct seqcount_spinlock *)s); \
\
- while ((_seq = seqprop_sequence_acquire(s)) & 1) \
- cpu_relax(); \
+	_seq = smp_cond_load_acquire(&e->sequence, (VAL & 1) == 0); \
\
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX); \
_seq; \
More information about the linux-arm-kernel mailing list