[PATCH v2] Avoid memory barrier in read_seqcount() through load acquire
Christoph Lameter (Ampere)
cl at gentwo.org
Fri Aug 23 12:38:05 PDT 2024
On Fri, 23 Aug 2024, Will Deacon wrote:
> > +#ifdef CONFIG_ARCH_HAS_ACQUIRE_RELEASE
> > +#define raw_read_seqcount_begin(s) \
> > +({ \
> > + unsigned _seq; \
> > + \
> > + while ((_seq = seqprop_sequence_acquire(s)) & 1) \
> > + cpu_relax(); \
>
> It would also be interesting to see whether smp_cond_load_acquire()
> performs any better that this loop in the !RT case.
The hack to do this follows. The kernel boots, and a kernel build completes
just fine, but there is no change in cycles on my tests.
Another benchmark may show more: all my synthetic tests do is run the
function calls in a loop in parallel on multiple CPUs.
The main effect here may be reduced power consumption, since the busy loop
is no longer required. I would favor a solution like this, but the patch is
not clean given the need to drop the const attribute with a cast.
Index: linux/include/linux/seqlock.h
===================================================================
--- linux.orig/include/linux/seqlock.h
+++ linux/include/linux/seqlock.h
@@ -325,9 +325,9 @@ SEQCOUNT_LOCKNAME(mutex, struct m
#define raw_read_seqcount_begin(s) \
({ \
unsigned _seq; \
+ seqcount_t *e = seqprop_ptr((struct seqcount_spinlock *)s); \
\
- while ((_seq = seqprop_sequence_acquire(s)) & 1) \
- cpu_relax(); \
+	_seq = smp_cond_load_acquire(&e->sequence, (VAL & 1) == 0); \
\
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX); \
_seq; \
More information about the linux-arm-kernel mailing list