[PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

Wed Apr 7 15:29:12 BST 2021

On Wed, Apr 7, 2021 at 11:43 AM Christoph Hellwig <hch at infradead.org> wrote:
>
> On Tue, Apr 06, 2021 at 09:15:50AM +0200, Peter Zijlstra wrote:
> > Anyway, given you have such a crap architecture (and here I thought
> > RISC-V was supposed to be a modern design *sigh*), you had better go
> > look at the sparc64 atomic implementation which has a software backoff
> > for failed CAS in order to make fwd progress.
>
> It wasn't supposed to be modern.  It was supposed to use boring old
> ideas.  Where it actually did that it is a great ISA, in parts where
> academics actually tried to come up with cool or state of the art
> ideas (interrupt handling, tlb shootdowns, the totally fucked up
> memory model) it turned into a trainwreck.

Gentlemen, please rethink your wording.
RISC-V is neither "crap" nor a "trainwreck", regardless of whether you
like it or not.

The comparison with sparc64 is not applicable, as sparc64 does not
have LL/SC instructions.

Further, it is not the case that RISC-V has no guarantees at all.
It just does not provide a forward progress guarantee for a
synchronization implementation that writes in an endless loop to a
memory location while trying to complete an LL/SC loop on the same
memory location at the same time.
If there's a reasonable algorithm that relies on forward progress in
this case, then we should indeed think about that, but I haven't seen
one so far.
The whole MCS lock idea is to actually spin on different memory
locations per CPU to improve scalability (reduce cacheline bouncing).
That's a clear indicator that well-scaling synchronization algorithms
need to avoid contended cache lines anyway.
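
To illustrate the per-CPU spinning described above, here is a minimal
MCS-style lock sketch in C11 atomics. It is not the kernel's qspinlock
or MCS code; the names mcs_node, mcs_lock, mcs_lock_acquire and
mcs_lock_release are hypothetical and chosen only for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-waiter queue node; each CPU/thread brings its own. */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;              /* true while this waiter must spin */
};

/* Hypothetical lock word: only the queue tail is shared by everyone. */
struct mcs_lock {
    _Atomic(struct mcs_node *) tail;
};

static void mcs_lock_acquire(struct mcs_lock *lock, struct mcs_node *me)
{
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* One atomic exchange on the shared tail appends us to the queue. */
    struct mcs_node *prev =
        atomic_exchange_explicit(&lock->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return;                      /* queue was empty: lock acquired */

    /* Link behind the predecessor, then spin on our OWN node only. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ;
}

static void mcs_lock_release(struct mcs_lock *lock, struct mcs_node *me)
{
    struct mcs_node *next =
        atomic_load_explicit(&me->next, memory_order_acquire);

    if (next == NULL) {
        /* No visible successor: try to swing the tail back to empty. */
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                &lock->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;

        /* A successor swapped the tail but has not linked itself yet. */
        while ((next = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }

    /* Hand over the lock by writing to the successor's node. */
    atomic_store_explicit(&next->locked, false, memory_order_release);
}
```

In this sketch the shared tail is touched once per acquire and at most
once per release, while the waiting loop reads only the waiter's own
node, which is what keeps the contended cache line out of the spin
path.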

RISC-V defines LR/SC loops consisting of up to 16 instructions as
constrained LR/SC loops.
Such constrained LR/SC loops provide the expected forward progress
guarantees (similar to what other architectures, like AArch64, have).

What RISC-V does not have is sub-word atomics, and if required, we
would have to implement them as LL/SC sequences. And yes, using atomic
instructions is preferred over using LL/SC, because atomics will tend
to perform better (fewer instructions and fewer spilled registers).
But that depends on the actual ISA implementation.
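
As a rough sketch of what emulating a sub-word atomic looks like, the
following C11 code exchanges one 16-bit half of a 32-bit word with a
compare-and-swap retry loop. The helper name xchg16_emulated and its
shift parameter are hypothetical, this is not the code from the patch,
and it assumes the compiler lowers the 32-bit compare-and-swap to an
LR/SC sequence on RISC-V.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: atomically replace the 16-bit field of *word that
 * starts at bit `shift` (0 or 16) and return the field's previous value.
 * The other half of the word is preserved.
 */
static uint16_t xchg16_emulated(_Atomic uint32_t *word, unsigned shift,
                                uint16_t newval)
{
    const uint32_t mask = (uint32_t)0xffff << shift;
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t desired;

    do {
        /* Keep the neighbouring bits, splice in the new 16-bit value. */
        desired = (old & ~mask) | ((uint32_t)newval << shift);
        /* On failure, `old` is refreshed with the current word value. */
    } while (!atomic_compare_exchange_weak_explicit(word, &old, desired,
                                                    memory_order_acq_rel,
                                                    memory_order_relaxed));

    return (uint16_t)((old & mask) >> shift);
}
```

The mask and shift bookkeeping is where the extra instructions and
registers come from compared with a native sub-word exchange, and the
retry loop can fail whenever the neighbouring half of the word changes,
even though the target field itself is untouched.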

Respectfully,
Christoph