[PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

Mon Mar 29 13:13:10 BST 2021

> -----Original Message-----
> From: Peter Zijlstra <peterz at infradead.org>
> Sent: 29 March 2021 16:57
> To: Guo Ren <guoren at kernel.org>
> Cc: linux-riscv <linux-riscv at lists.infradead.org>; Linux Kernel Mailing List
> <linux-kernel at vger.kernel.org>; linux-csky at vger.kernel.org; linux-arch
> <linux-arch at vger.kernel.org>; Guo Ren <guoren at linux.alibaba.com>; Will
> Deacon <will at kernel.org>; Ingo Molnar <mingo at redhat.com>; Waiman
> Long <longman at redhat.com>; Arnd Bergmann <arnd at arndb.de>; Anup
> Patel <anup at brainfault.org>
> Subject: Re: [PATCH v4 3/4] locking/qspinlock: Add
> ARCH_USE_QUEUED_SPINLOCKS_XCHG32
> 
> On Mon, Mar 29, 2021 at 07:19:29PM +0800, Guo Ren wrote:
> > On Mon, Mar 29, 2021 at 3:50 PM Peter Zijlstra <peterz at infradead.org>
> wrote:
> > >
> > > On Sat, Mar 27, 2021 at 06:06:38PM +0000, guoren at kernel.org wrote:
> > > > From: Guo Ren <guoren at linux.alibaba.com>
> > > >
> > > > Some architectures don't have a sub-word atomic swap instruction;
> > > > they only have the full-word one.
> > > >
> > > > The sub-word swap only improves performance when:
> > > > NR_CPUS < 16K
> > > >  *  0- 7: locked byte
> > > >  *     8: pending
> > > >  *  9-15: not used
> > > >  * 16-17: tail index
> > > >  * 18-31: tail cpu (+1)
> > > >
> > > > Bits 9-15 are wasted in order to use xchg16 in xchg_tail.
> > > >
> > > > Please let the architecture select xchg16 or xchg32 to implement
> > > > xchg_tail.
> > >
> > > So I really don't like this, this pushes complexity into the generic
> > > code for something that's really not needed.
> > >
> > > Lots of RISC already implement sub-word atomics using word ll/sc.
> > > Obviously they're not sharing code like they should be :/ See for
> > > example arch/mips/kernel/cmpxchg.c.
> > I see, we've done two versions of this:
> >  - Using cmpxchg codes from MIPS by Michael
> >  - Re-write with assembly codes by Guo
> >
> > But using the full-word atomic xchg instructions implement xchg16 has
> > the semantic risk for atomic operations.
> 
> What? -ENOPARSE
> 
> > > Also, I really do think doing ticket locks first is a far more
> > > sensible step.
> > NACK by Anup
> 
> Who's he when he's not sending NAKs ?

We had discussions in the RISC-V platforms group about this. Over there,
we evaluated all the spinlock approaches (ticket, qspinlock, etc.) tried
in Linux so far. Those discussions concluded that we eventually have to
move to qspinlock (even if we move to ticket spinlocks temporarily)
because qspinlock avoids cache line bouncing. Moving to qspinlock would
also align RISC-V with the other major architectures supported in Linux
(such as x86 and ARM64).

Some of the organizations working on high-end RISC-V systems (> 32
CPUs) are interested in having an optimized spinlock implementation
(just like the other major architectures, x86 and ARM64).

Based on the above, Linux RISC-V should move to qspinlock.

Regards,
Anup
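
For reference, the tail layout quoted above and the two ways of swapping
the tail discussed in this thread can be sketched in portable C11. This is
only an illustration, not the kernel implementation: the names
Q_LOCKED_MASK, Q_PENDING_MASK, Q_TAIL_MASK, encode_tail, xchg_tail32 and
xchg16_emulated are hypothetical, and the memory orderings are assumptions
chosen for the sketch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Bit layout from the quoted patch description (NR_CPUS < 16K). */
#define Q_LOCKED_MASK   0x000000ffu  /* bits  0-7:  locked byte          */
#define Q_PENDING_MASK  0x00000100u  /* bit   8:    pending              */
#define Q_TAIL_MASK     0xffff0000u  /* bits 16-31: tail index + cpu + 1 */

/* Build a tail value from a CPU number and a per-CPU node index (0-3). */
static uint32_t encode_tail(uint32_t cpu, uint32_t idx)
{
    return ((cpu + 1) << 18) | (idx << 16);
}

/*
 * Full-word variant: install a new tail with a 32-bit compare-and-swap
 * loop, leaving the locked and pending bits untouched.
 * Returns the previous tail.
 */
static uint32_t xchg_tail32(_Atomic uint32_t *lock, uint32_t tail)
{
    uint32_t old = atomic_load_explicit(lock, memory_order_relaxed);
    uint32_t new;

    do {
        new = (old & ~Q_TAIL_MASK) | tail;
    } while (!atomic_compare_exchange_weak_explicit(
                 lock, &old, new,
                 memory_order_acq_rel, memory_order_relaxed));

    return old & Q_TAIL_MASK;
}

/*
 * Sub-word variant emulated on top of a word-sized CAS: exchange the
 * 16-bit half located at bit offset `shift` (0 or 16) of a 32-bit word.
 * Returns the previous value of that half.
 */
static uint16_t xchg16_emulated(_Atomic uint32_t *word, unsigned shift,
                                uint16_t val)
{
    uint32_t mask = 0xffffu << shift;
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t new;

    do {
        new = (old & ~mask) | ((uint32_t)val << shift);
    } while (!atomic_compare_exchange_weak_explicit(
                 word, &old, new,
                 memory_order_acq_rel, memory_order_relaxed));

    return (uint16_t)((old & mask) >> shift);
}
```

Both functions reduce to the same retry loop on the containing 32-bit
word; they differ only in which bits are masked. A real sub-word xchg16
instruction would replace the loop in the second function with a single
unconditional swap.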