[RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()

Daniel Lustig dlustig at nvidia.com
Thu Mar 1 14:21:17 PST 2018


On 3/1/2018 1:54 PM, Palmer Dabbelt wrote:
> On Thu, 01 Mar 2018 07:11:41 PST (-0800), parri.andrea at gmail.com wrote:
>> Hi Daniel,
>>
>> On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote:
>>> On 2/22/2018 10:27 AM, Peter Zijlstra wrote:
>>> > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
>>> >> So we have something that is not all that rare in the Linux kernel
>>> >> community, namely two conflicting more-or-less concurrent changes.
>>> >> This clearly needs to be resolved, either by us not strengthening the
>>> >> Linux-kernel memory model in the way we were planning to or by you
>>> >> strengthening RISC-V to be no weaker than PowerPC for these sorts of
>>> >> externally viewed release-acquire situations.
>>> >>
>>> >> Other thoughts?
>>> >
>>> > As said in the other email, I would _much_ prefer not to go weaker
>>> > than PPC; I find that PPC is already painfully weak at times.
>>>
>>> Sure, and RISC-V could make this work too by using RCsc instructions
>>> and/or by using lightweight fences instead.  It just wasn't clear at
>>> first whether smp_load_acquire() and smp_store_release() were RCpc,
>>> RCsc, or something else, and hence whether RISC-V would actually need
>>> to use something stronger than pure RCpc there.  Likewise for
>>> spin_unlock()/spin_lock() and everywhere else this comes up.
>>
>> While digging into riscv's locks and atomics to fix the issues discussed
>> earlier in this thread, I became aware of another issue:
>>
>> Considering the CMPXCHG primitives, for example, I noticed that the
>> implementation currently provides the four variants
>>
>>     ATOMIC_OPS(        , .aqrl)
>>     ATOMIC_OPS(_acquire,   .aq)
>>     ATOMIC_OPS(_release,   .rl)
>>     ATOMIC_OPS(_relaxed,      )
>>
>> (corresponding, resp., to
>>
>>     atomic_cmpxchg()
>>     atomic_cmpxchg_acquire()
>>     atomic_cmpxchg_release()
>>     atomic_cmpxchg_relaxed()  )
>>
>> so that the first variant above ends up doing
>>
>>     0:   lr.w.aqrl  %0, %addr
>>          bne        %0, %old, 1f
>>          sc.w.aqrl  %1, %new, %addr
>>          bnez       %1, 0b
>>     1:
>>
>> From Sect. 2.3.5 ("Acquire/Release Ordering") of the spec:
>>
>>  "AMOs with both .aq and .rl set are fully-ordered operations.  Treating
>>   the load part and the store part as independent RCsc operations is not
>>   in and of itself sufficient to enforce full fencing behavior, but this
>>   subtle weak behavior is counterintuitive and not much of an advantage
>>   architecturally, especially with lr and sc also available [...]."
>>
>> I understand that
>>
>>         { x = y = u = v = 0 }
>>
>>     P0()
>>
>>     WRITE_ONCE(x, 1);        ("relaxed" store, sw)
>>     atomic_cmpxchg(&u, 0, 1);
>>     r0 = READ_ONCE(y);        ("relaxed" load, lw)
>>
>>     P1()
>>
>>     WRITE_ONCE(y, 1);
>>     atomic_cmpxchg(&v, 0, 1);
>>     r1 = READ_ONCE(x);
>>
>> could result in (u = v = 1 and r0 = r1 = 0) at the end; can you confirm?
> 
> cmpxchg isn't an AMO, it's an LR/SC sequence, so that blurb doesn't
> apply.  I think "lr.w.aqrl" and "sc.w.aqrl" are not sufficient to
> perform a fully-ordered operation (i.e., it's an incorrect
> implementation of atomic_cmpxchg()), but I was hoping to get some
> time to actually internalize this part of the RISC-V memory model at
> some point to be sure.

Agreed, the current intention is that we'll add a fence rw,rw in there
when we use lr/sc as the implementation, likely similar to what ARM does:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e86f0b409a44193f1587e87b69c5dcf8f65be67
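
To make that concrete, here's a sketch of the kind of sequence I have
in mind, written as a kernel-style inline-asm helper.  This is purely
illustrative -- the function name is made up and nothing here is a
committed implementation; the idea is to keep the LR/SC loop, put a
release annotation on the store-conditional, and append a trailing
full fence on the success path:

    /*
     * Sketch only: one possible fully-ordered cmpxchg, pending the
     * final RISC-V memory consistency model.  The trailing
     * "fence rw,rw" upgrades the LR/SC pair to a full barrier on
     * success; the name and exact constraints are illustrative.
     */
    static inline int cmpxchg_w_fullfence(int *ptr, int old, int new)
    {
            int prev, rc;

            __asm__ __volatile__ (
                    "0:     lr.w     %0, %2\n"      /* load-reserved            */
                    "       bne      %0, %z3, 1f\n" /* bail if *ptr != old      */
                    "       sc.w.rl  %1, %z4, %2\n" /* store-conditional, .rl   */
                    "       bnez     %1, 0b\n"      /* retry if reservation lost */
                    "       fence    rw, rw\n"      /* full fence on success    */
                    "1:\n"
                    : "=&r" (prev), "=&r" (rc), "+A" (*ptr)
                    : "rJ" (old), "rJ" (new)
                    : "memory");

            return prev;
    }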

This is also probably not the only piece of synchronization-related
code we'll have to go back and audit as we finalize the RISC-V memory
consistency model over the next couple of months.
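
As an aside, Andrea's example can also be written as a litmus test in
the format used by the Linux-kernel memory model tooling (a sketch;
the test name is my own).  My understanding is that the LKMM intends
to forbid this outcome for a fully-ordered cmpxchg, so whatever
mapping we settle on needs to rule it out:

    C SB+cmpxchg-fully-ordered

    {}

    P0(int *x, int *y, atomic_t *u)
    {
            int r0;

            WRITE_ONCE(*x, 1);
            atomic_cmpxchg(u, 0, 1);
            r0 = READ_ONCE(*y);
    }

    P1(int *x, int *y, atomic_t *v)
    {
            int r1;

            WRITE_ONCE(*y, 1);
            atomic_cmpxchg(v, 0, 1);
            r1 = READ_ONCE(*x);
    }

    exists (u=1 /\ v=1 /\ 0:r0=0 /\ 1:r1=0)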

Dan
