[PATCH] arm64: spinlock: serialise spin_unlock_wait against concurrent lockers
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Sun Dec 6 11:27:34 PST 2015
On Sun, Dec 06, 2015 at 04:16:17PM +0800, Boqun Feng wrote:
> Hi Paul,
>
> On Thu, Dec 03, 2015 at 09:22:07AM -0800, Paul E. McKenney wrote:
> > On Thu, Dec 03, 2015 at 04:32:43PM +0000, Will Deacon wrote:
> > > Hi Peter, Paul,
> > >
> > > Firstly, thanks for writing that up. I agree that you have something
> > > that can work in theory, but see below.
> > >
> > > On Thu, Dec 03, 2015 at 02:28:39PM +0100, Peter Zijlstra wrote:
> > > > On Wed, Dec 02, 2015 at 04:11:41PM -0800, Paul E. McKenney wrote:
> > > > > This looks architecture-agnostic to me:
> > > > >
> > > > > a. TSO systems have smp_mb__after_unlock_lock() be a no-op, and
> > > > > have a read-only implementation for spin_unlock_wait().
> > > > >
> > > > > b. Small-scale weakly ordered systems can also have
> > > > > smp_mb__after_unlock_lock() be a no-op, but must instead
> > > > > have spin_unlock_wait() acquire the lock and immediately
> > > > > release it, or some optimized implementation of this.
> > > > >
> > > > > c. Large-scale weakly ordered systems are required to define
> > > > > smp_mb__after_unlock_lock() as smp_mb(), but can have a
> > > > > read-only implementation of spin_unlock_wait().
> > > >
> > > > This would still require all relevant spin_lock() sites to be annotated
> > > > with smp_mb__after_unlock_lock(), which is going to be a painful (no
> > > > warning when done wrong) exercise and expensive (added MBs all over the
> > > > place).
> >
> > On the lack of warning, agreed, but please see below. On the added MBs,
> > the only alternative I have been able to come up with has even more MBs,
> > as in on every lock acquisition. If I am missing something, please do
> > not keep it a secret!
> >
>
> Maybe we can treat this problem as a problem of data accesses rather
> than one of locks?
>
> Let's take the example of tsk->flags in do_exit() and tsk->pi_lock: we
> don't need to add a full barrier for every lock acquisition of
> ->pi_lock, because some critical sections of ->pi_lock don't access the
> PF_EXITING bit of ->flags at all. All we need is to add a full barrier
> before reading the PF_EXITING bit in a critical section of ->pi_lock.
> To achieve this, we could introduce a primitive like smp_load_in_lock():
>
> (on PPC and ARM64v8)
>
> #define smp_load_in_lock(x, lock)	\
> 	({				\
> 		smp_mb();		\
> 		READ_ONCE(x);		\
> 	})
>
> (on other archs)
>
> #define smp_load_in_lock(x, lock) READ_ONCE(x)
>
>
> And call it every time we read data which is not protected by the
> current lock critical section but whose updaters synchronize with the
> current lock critical section via spin_unlock_wait().
>
> I admit the name may be bad, and the second parameter @lock is meant
> for some way of diagnosing the usage which I haven't come up with
> yet ;-)
>
> Thoughts?

In other words, dispense with smp_mb__after_unlock_lock() in those cases,
and use smp_load_in_lock() to get the desired effect?
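
If I am reading you correctly, the ->pi_lock pairing would then look
roughly like the sketch below.  This is simplified from the exit path
and from a ->pi_lock critical section that tests PF_EXITING (the futex
attach path, for instance), so please take the details as illustrative
only:

	/* Exit side, in do_exit(), after exit_signals() has set PF_EXITING: */
	smp_mb();
	raw_spin_unlock_wait(&tsk->pi_lock);

	/* Reader side, inside a ->pi_lock critical section: */
	raw_spin_lock_irq(&p->pi_lock);
	if (unlikely(smp_load_in_lock(p->flags, &p->pi_lock) & PF_EXITING)) {
		/* Task is on its way out, don't touch its PI state. */
		raw_spin_unlock_irq(&p->pi_lock);
		return -EAGAIN;
	}

On PPC and ARM64v8, the smp_mb() buried in smp_load_in_lock() would then
supply the ordering between the lock acquisition and the ->flags read
that the smp_mb__after_unlock_lock() annotation would otherwise have to
provide.
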
If so, one concern is how to check for proper use of smp_load_in_lock().
Another concern is redundant smp_mb() instances in case of multiple
accesses to the data under a given critical section.
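
For instance, a single critical section needing two such loads (with
->state purely as a stand-in for some second field whose updaters also
rely on spin_unlock_wait()):

	raw_spin_lock_irq(&p->pi_lock);
	flags = smp_load_in_lock(p->flags, &p->pi_lock);	/* smp_mb() #1 */
	/* ... */
	state = smp_load_in_lock(p->state, &p->pi_lock);	/* smp_mb() #2 */
	raw_spin_unlock_irq(&p->pi_lock);

would emit two full barriers on PPC and ARM64v8, even though the first
one already orders the lock acquisition against everything that follows.
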
Or am I missing your point?

Thanx, Paul