[PATCH] arm64: spinlock: serialise spin_unlock_wait against concurrent lockers

Sun Dec 6 11:27:34 PST 2015

On Sun, Dec 06, 2015 at 04:16:17PM +0800, Boqun Feng wrote:
> Hi Paul,
> 
> On Thu, Dec 03, 2015 at 09:22:07AM -0800, Paul E. McKenney wrote:
> > On Thu, Dec 03, 2015 at 04:32:43PM +0000, Will Deacon wrote:
> > > Hi Peter, Paul,
> > > 
> > > Firstly, thanks for writing that up. I agree that you have something
> > > that can work in theory, but see below.
> > > 
> > > On Thu, Dec 03, 2015 at 02:28:39PM +0100, Peter Zijlstra wrote:
> > > > On Wed, Dec 02, 2015 at 04:11:41PM -0800, Paul E. McKenney wrote:
> > > > > This looks architecture-agnostic to me:
> > > > > 
> > > > > a.	TSO systems have smp_mb__after_unlock_lock() be a no-op, and
> > > > > 	have a read-only implementation for spin_unlock_wait().
> > > > > 
> > > > > b.	Small-scale weakly ordered systems can also have
> > > > > 	smp_mb__after_unlock_lock() be a no-op, but must instead
> > > > > 	have spin_unlock_wait() acquire the lock and immediately 
> > > > > 	release it, or some optimized implementation of this.
> > > > > 
> > > > > c.	Large-scale weakly ordered systems are required to define
> > > > > 	smp_mb__after_unlock_lock() as smp_mb(), but can have a
> > > > > 	read-only implementation of spin_unlock_wait().
> > > > 
> > > > This would still require all relevant spin_lock() sites to be annotated
> > > > with smp_mb__after_unlock_lock(), which is going to be a painful (no
> > > > warning when done wrong) exercise and expensive (added MBs all over the
> > > > place).
> > 
> > On the lack of warning, agreed, but please see below.  On the added MBs,
> > the only alternative I have been able to come up with has even more MBs,
> > as in on every lock acquisition.  If I am missing something, please do
> > not keep it a secret!
> > 
> 
> Maybe we can treat this problem as a problem of data accesses rather
> than one of locks?
> 
> Let's take the example of tsk->flags in do_exit() and tsk->pi_lock: we
> don't need to add a full barrier for every lock acquisition of
> ->pi_lock, because some critical sections of ->pi_lock don't access the
> PF_EXITING bit of ->flags at all. All we need is to add a full
> barrier before reading the PF_EXITING bit in a critical section of
> ->pi_lock. To achieve this, we could introduce a primitive like
> smp_load_in_lock():
> 
> (on PPC and arm64)
> 
> 	#define smp_load_in_lock(x, lock) 		\
> 		({ 					\
> 			smp_mb();			\
> 			READ_ONCE(x);			\
> 		})
> 
> (on other archs)
> 	
> 	#define smp_load_in_lock(x, lock) READ_ONCE(x)
> 
> 
> And call it every time we read data that is not protected by the
> current lock critical section but whose updaters synchronize with the
> current lock critical section via spin_unlock_wait().
> 
> I admit the name may be bad and the second parameter @lock is for a way
> of diagnosing the usage, which I haven't come up with yet ;-)
> 
> Thoughts?

In other words, dispense with smp_mb__after_unlock_lock() in those cases,
and use smp_load_in_lock() to get the desired effect?

If so, one concern is how to check for proper use of smp_load_in_lock().
Another concern is redundant smp_mb() instances in case of multiple
accesses to the data under a given critical section.
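
To make the second concern concrete, here is a minimal userspace sketch,
not kernel code.  smp_mb() and READ_ONCE() are stand-ins built on C11
fences and a volatile cast (with GNU extensions), mb_count exists only to
count barriers, and struct task and both helper functions are hypothetical
names introduced for illustration.  smp_load_in_lock() is the weakly
ordered variant proposed above.

```c
#include <stdatomic.h>

/* Illustration only: counts the full barriers issued (single-threaded). */
static int mb_count;

/* Userspace stand-ins, assumed for this sketch; not the kernel definitions. */
#define smp_mb()							\
	do {								\
		atomic_thread_fence(memory_order_seq_cst);		\
		mb_count++;						\
	} while (0)
#define READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))

/* Weakly ordered variant of the proposed primitive; @lock is unused. */
#define smp_load_in_lock(x, lock)					\
	({								\
		(void)(lock);						\
		smp_mb();						\
		READ_ONCE(x);						\
	})

/* Hypothetical structure with two fields read in one critical section. */
struct task {
	unsigned int flags;
	unsigned int state;
};

/* Two smp_load_in_lock() calls: two full barriers. */
static unsigned int read_both_naive(struct task *t, void *lock)
{
	unsigned int f = smp_load_in_lock(t->flags, lock);
	unsigned int s = smp_load_in_lock(t->state, lock);

	return f | s;
}

/* Hand-coalesced: one full barrier ahead of both loads. */
static unsigned int read_both_coalesced(struct task *t)
{
	smp_mb();
	return READ_ONCE(t->flags) | READ_ONCE(t->state);
}
```

In this sketch read_both_naive() issues two full barriers where
read_both_coalesced() issues one, and nothing in the macro itself lets
the caller or the compiler merge them.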

Or am I missing your point?

							Thanx, Paul