[PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

Wed Apr 7 09:42:50 BST 2021

On Tue, Apr 6, 2021 at 10:56 AM Stafford Horne <shorne at gmail.com> wrote:
> On Tue, Apr 06, 2021 at 11:50:38AM +0800, Guo Ren wrote:
> > On Wed, Mar 31, 2021 at 3:23 PM Arnd Bergmann <arnd at arndb.de> wrote:
> > > On Wed, Mar 31, 2021 at 12:35 AM Stafford Horne <shorne at gmail.com> wrote:
> >
> > We shouldn't export xchg16/cmpxchg16 (emulated by lr.w/sc.w) in riscv.
> > We should forbid these sub-word atomic primitives and let
> > programmers consider their atomic design.
>
> Fair enough, having the generic sub-word emulation would be something that
> an architecture can select to use/export.

I still have the feeling that we should generalize and unify the exact behavior
across architectures as much as possible here, while possibly also trying to
simplify the interface a little.

Looking through the various xchg()/cmpxchg() implementations, I find eight
distinct ways to do 8-bit and 16-bit atomics:

Full support:
        ia64, m68k (Atari only), x86, arm32 (v6k+), arm64

gcc/clang __sync_{val,bool}_compare_and_swap:
        s390

Emulated through ll/sc:
        alpha, powerpc

Emulated through cmpxchg loop:
        mips, openrisc, xtensa (xchg but not cmpxchg), sparc64 (cmpxchg_u8,
        xchg_u16 but not cmpxchg_u16 and xchg_u8!)

Emulated through local_irq_save (non SMP only):
        h8300, m68k (most), microblaze, mips, nds32, nios2

Emulated through hashed spinlock:
        parisc (8-bit only added in 2020, 16-bit still missing)

Forced compile-time error:
        arm32 (v4/v5/v6 non-SMP), arc, csky, riscv, parisc (16-bit), sparc32,
        sparc64, xtensa (cmpxchg)

Silently broken:
        hexagon
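
For reference, the "emulated through cmpxchg loop" variant in the list
above can be sketched roughly as follows. This is a user-space
illustration, not code from any architecture: emulated_xchg16() is a
made-up name, and the gcc/clang __sync builtin stands in for the
architecture's 32-bit cmpxchg. It assumes a naturally aligned halfword
and a build without strict-aliasing surprises (the kernel uses
-fno-strict-aliasing).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: exchange a naturally aligned
 * 16-bit value using only a 32-bit compare-and-swap on the word that
 * contains it.
 */
static inline uint16_t emulated_xchg16(volatile uint16_t *ptr, uint16_t newval)
{
        uintptr_t addr = (uintptr_t)ptr;
        volatile uint32_t *word = (volatile uint32_t *)(addr & ~(uintptr_t)3);
        unsigned int shift;
        uint32_t mask, old, newword;

        assert((addr & 1) == 0);        /* halfword must not straddle words */

        /* Bit position of the halfword inside the word depends on endianness. */
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        shift = (2 - (addr & 2)) * 8;
#else
        shift = (addr & 2) * 8;
#endif
        mask = (uint32_t)0xffff << shift;

        /* Retry until the containing word was not changed underneath us. */
        do {
                old = *word;
                newword = (old & ~mask) | ((uint32_t)newval << shift);
        } while (__sync_val_compare_and_swap(word, old, newword) != old);

        return (uint16_t)((old & mask) >> shift);
}
```

Note that the loop also retries when only the neighbouring halfword
changes, which is one reason this kind of emulation is something a
caller should know it is getting.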

Since there are really only a handful of instances in the kernel
that use cmpxchg() or xchg() on u8/u16 variables, it would seem
best to disallow those completely and provide a separate set of
functions for them, with only 64-bit architectures using any
variable-type wrapper to handle both 32-bit and 64-bit arguments.
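
A rough sketch of what such a split could look like, purely as an
illustration: the names xchg_u16_sketch() and xchg_word_sketch() are
invented here, and the gcc/clang __atomic_exchange_n() builtin is only a
stand-in for whatever each architecture would actually provide.

```c
#include <stdint.h>

/* Hypothetical names; nothing in this thread defines this interface. */

/* Fixed-size helper: the only way to do a 16-bit exchange. */
static inline uint16_t xchg_u16_sketch(uint16_t *ptr, uint16_t val)
{
        /* An architecture would pick native, ll/sc or cmpxchg-loop code here. */
        return __atomic_exchange_n(ptr, val, __ATOMIC_SEQ_CST);
}

/* Variable-type wrapper: accepts 32-bit and 64-bit objects only. */
#define xchg_word_sketch(ptr, val)                                      \
({                                                                      \
        _Static_assert(sizeof(*(ptr)) == 4 || sizeof(*(ptr)) == 8,      \
                       "sub-word xchg: use a fixed-size helper");       \
        __atomic_exchange_n((ptr), (val), __ATOMIC_SEQ_CST);            \
})
```

With this shape, passing a u8 or u16 pointer to the wrapper fails at
compile time on every architecture, rather than working on some and
breaking on others.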

Interestingly, the s390 version using __sync_val_compare_and_swap()
seems to produce nice output on all architectures that have atomic
instructions, with any supported compiler, to the point where I think
we could just use that to replace most of the inline-asm versions except
for arm64:

#define cmpxchg(ptr, o, n)                                              \
({                                                                      \
        __typeof__(*(ptr)) __o = (o);                                   \
        __typeof__(*(ptr)) __n = (n);                                   \
        (__typeof__(*(ptr))) __sync_val_compare_and_swap((ptr),__o,__n);\
})

Not sure how gcc's acquire/release behavior of __sync_val_compare_and_swap()
relates to what the kernel wants here.

The gcc documentation also recommends using the standard
__atomic_compare_exchange_n() builtin instead, which would allow
constructing release/acquire/relaxed versions as well, but I could not
get it to produce equally good output. (possibly I was using it wrong)
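
For illustration, a minimal sketch of how ordered variants might be
built on that builtin. The macro names are made up for this example,
and whether any of these memory orders matches what the kernel's
cmpxchg() family requires is not something this sketch settles.

```c
/* Hypothetical names, not the kernel's definitions. */
#define cmpxchg_order(ptr, o, n, success, failure)                      \
({                                                                      \
        __typeof__(*(ptr)) __o = (o);                                   \
        __typeof__(*(ptr)) __n = (n);                                   \
        /* On failure the builtin stores the current value into __o; */\
        /* on success __o is left alone, so it holds the old value.  */\
        __atomic_compare_exchange_n((ptr), &__o, __n, 0,                \
                                    (success), (failure));              \
        __o;                                                            \
})

#define cmpxchg_relaxed_sketch(ptr, o, n)                               \
        cmpxchg_order(ptr, o, n, __ATOMIC_RELAXED, __ATOMIC_RELAXED)
#define cmpxchg_acquire_sketch(ptr, o, n)                               \
        cmpxchg_order(ptr, o, n, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE)
#define cmpxchg_release_sketch(ptr, o, n)                               \
        cmpxchg_order(ptr, o, n, __ATOMIC_RELEASE, __ATOMIC_RELAXED)
```

The failure order is RELAXED in the release variant because the builtin
does not accept a release ordering for the failure case.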

       Arnd