[RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()

Andrea Parri parri.andrea at gmail.com
Thu Mar 1 07:11:41 PST 2018


Hi Daniel,

On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote:
> On 2/22/2018 10:27 AM, Peter Zijlstra wrote:
> > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
> >> So we have something that is not all that rare in the Linux kernel
> >> community, namely two conflicting more-or-less concurrent changes.
> >> This clearly needs to be resolved, either by us not strengthening the
> >> Linux-kernel memory model in the way we were planning to or by you
> >> strengthening RISC-V to be no weaker than PowerPC for these sorts of
> >> externally viewed release-acquire situations.
> >>
> >> Other thoughts?
> > 
> > Like said in the other email, I would _much_ prefer to not go weaker
> > than PPC, I find that PPC is already painfully weak at times.
> 
> Sure, and RISC-V could make this work too by using RCsc instructions
> and/or by using lightweight fences instead.  It just wasn't clear at
> first whether smp_load_acquire() and smp_store_release() were RCpc,
> RCsc, or something else, and hence whether RISC-V would actually need
> to use something stronger than pure RCpc there.  Likewise for
> spin_unlock()/spin_lock() and everywhere else this comes up.

While digging into riscv's locks and atomics to fix the issues discussed
earlier in this thread, I became aware of another issue:

Considering the CMPXCHG primitives, for example, I noticed that the
current implementation provides the four variants

	ATOMIC_OPS(        , .aqrl)
	ATOMIC_OPS(_acquire,   .aq)
	ATOMIC_OPS(_release,   .rl)
	ATOMIC_OPS(_relaxed,      )

(corresponding, respectively, to

	atomic_cmpxchg()
	atomic_cmpxchg_acquire()
	atomic_cmpxchg_release()
	atomic_cmpxchg_relaxed()  )

so that the first variant above ends up doing

	0:	lr.w.aqrl  %0, %addr
		bne        %0, %old, 1f
		sc.w.aqrl  %1, %new, %addr
		bnez       %1, 0b
	1:
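
For reference, here is a minimal sketch of how such a fully-ordered
cmpxchg could be open-coded in C with inline assembly; the function
name and the operand constraints are illustrative only, not the
kernel's actual ATOMIC_OPS()/cmpxchg machinery:

	/*
	 * Illustrative only: a fully-ordered 32-bit compare-and-swap
	 * built from LR/SC with both .aq and .rl set, mirroring the
	 * sequence above; returns the value observed at *ptr.
	 */
	static inline int cmpxchg32_fully_ordered(int *ptr, int old, int new)
	{
		int prev, fail;

		__asm__ __volatile__ (
			"0:	lr.w.aqrl  %0, (%2)\n"
			"	bne        %0, %3, 1f\n"
			"	sc.w.aqrl  %1, %4, (%2)\n"
			"	bnez       %1, 0b\n"
			"1:\n"
			: "=&r" (prev), "=&r" (fail)
			: "r" (ptr), "r" (old), "r" (new)
			: "memory");

		return prev;	/* success iff prev == old */
	}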

From Sect. 2.3.5. ("Acquire/Release Ordering") of the Spec.,

 "AMOs with both .aq and .rl set are fully-ordered operations.  Treating
  the load part and the store part as independent RCsc operations is not
  in and of itself sufficient to enforce full fencing behavior, but this
  subtle weak behavior is counterintuitive and not much of an advantage
  architecturally, especially with lr and sc also available [...]."

I understand that

        { x = y = u = v = 0 }

	P0()

	WRITE_ONCE(x, 1);		("relaxed" store, sw)
	atomic_cmpxchg(&u, 0, 1);
	r0 = READ_ONCE(y);		("relaxed" load, lw)

	P1()

	WRITE_ONCE(y, 1);
	atomic_cmpxchg(&v, 0, 1);
	r1 = READ_ONCE(x);

could result in (u = v = 1 and r0 = r1 = 0) at the end: treating the
lr.w.aqrl/sc.w.aqrl pair as independent RCsc operations, nothing orders
the "relaxed" store preceding each cmpxchg against the "relaxed" load
following it.  Can you confirm?
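
FWIW, the same test can also be written as a litmus test for herd7 and
the LKMM; a sketch, assuming the usual litmus-test conventions (local
register declarations, exists clause), would be roughly:

	C RISCV-cmpxchg-SB

	{}

	P0(int *x, int *y, atomic_t *u)
	{
		int r0;

		WRITE_ONCE(*x, 1);
		atomic_cmpxchg(u, 0, 1);
		r0 = READ_ONCE(*y);
	}

	P1(int *x, int *y, atomic_t *v)
	{
		int r1;

		WRITE_ONCE(*y, 1);
		atomic_cmpxchg(v, 0, 1);
		r1 = READ_ONCE(*x);
	}

	exists (0:r0=0 /\ 1:r1=0)

(the final u = v = 1 is guaranteed here, since each cmpxchg succeeds,
so only r0 and r1 appear in the exists clause).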

Thanks,
  Andrea


