[RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()

Thu Mar 1 13:54:29 PST 2018

On Thu, 01 Mar 2018 07:11:41 PST (-0800), parri.andrea at gmail.com wrote:
> Hi Daniel,
>
> On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote:
>> On 2/22/2018 10:27 AM, Peter Zijlstra wrote:
>> > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
>> >> So we have something that is not all that rare in the Linux kernel
>> >> community, namely two conflicting more-or-less concurrent changes.
>> >> This clearly needs to be resolved, either by us not strengthening the
>> >> Linux-kernel memory model in the way we were planning to or by you
>> >> strengthening RISC-V to be no weaker than PowerPC for these sorts of
>> >> externally viewed release-acquire situations.
>> >>
>> >> Other thoughts?
>> >
>> > Like said in the other email, I would _much_ prefer to not go weaker
>> > than PPC, I find that PPC is already painfully weak at times.
>>
>> Sure, and RISC-V could make this work too by using RCsc instructions
>> and/or by using lightweight fences instead.  It just wasn't clear at
>> first whether smp_load_acquire() and smp_store_release() were RCpc,
>> RCsc, or something else, and hence whether RISC-V would actually need
>> to use something stronger than pure RCpc there.  Likewise for
>> spin_unlock()/spin_lock() and everywhere else this comes up.
>
> while digging into riscv's locks and atomics to fix the issues discussed
> earlier in this thread, I became aware of another issue:
>
> Considering here the CMPXCHG primitives, for example, I noticed that the
> implementation currently provides the four variants
>
> 	ATOMIC_OPS(        , .aqrl)
> 	ATOMIC_OPS(_acquire,   .aq)
> 	ATOMIC_OPS(_release,   .rl)
> 	ATOMIC_OPS(_relaxed,      )
>
> (corresponding, resp., to
>
> 	atomic_cmpxchg()
> 	atomic_cmpxchg_acquire()
> 	atomic_cmpxchg_release()
> 	atomic_cmpxchg_relaxed()  )
>
> so that the first variant above ends up doing
>
> 	0:	lr.w.aqrl  %0, %addr
> 		bne        %0, %old, 1f
> 		sc.w.aqrl  %1, %new, %addr
> 		bnez       %1, 0b
>         1:
>
> From Sect. 2.3.5. ("Acquire/Release Ordering") of the Spec.,
>
>  "AMOs with both .aq and .rl set are fully-ordered operations.  Treating
>   the load part and the store part as independent RCsc operations is not
>   in and of itself sufficient to enforce full fencing behavior, but this
>   subtle weak behavior is counterintuitive and not much of an advantage
>   architecturally, especially with lr and sc also available [...]."
>
> I understand that
>
>         { x = y = u = v = 0 }
>
> 	P0()
>
> 	WRITE_ONCE(x, 1);		("relaxed" store, sw)
> 	atomic_cmpxchg(&u, 0, 1);
> 	r0 = READ_ONCE(y);		("relaxed" load, lw)
>
> 	P1()
>
> 	WRITE_ONCE(y, 1);
> 	atomic_cmpxchg(&v, 0, 1);
> 	r1 = READ_ONCE(x);
>
> could result in (u = v = 1 and r0 = r1 = 0) at the end; can you confirm?

cmpxchg isn't an AMO, it's an LR SC sequence, so that blurb doesn't apply.  I 
think "lr.w.aqrl" and "sc.w.aqrl" is not sufficient to perform a fully ordered 
operation (ie, it's an incorrect implementation of atomic_cmpxchg()), but I was 
hoping to get some time to actually internalize this part of the RISC-V memory 
model at some point to be sure.