[PATCH] arm64: spinlock: serialise spin_unlock_wait against concurrent lockers

Boqun Feng boqun.feng at gmail.com
Wed Dec 2 03:16:10 PST 2015


On Wed, Dec 02, 2015 at 05:40:02PM +0800, Boqun Feng wrote:
> On Tue, Dec 01, 2015 at 04:32:33PM +0000, Will Deacon wrote:
> > On Tue, Dec 01, 2015 at 08:40:17AM +0800, Boqun Feng wrote:
> > > Hi Will,
> > 
> > Hi Boqun,
> > 
> > > On Fri, Nov 27, 2015 at 11:44:06AM +0000, Will Deacon wrote:
> > > > Boqun Feng reported a rather nasty ordering issue with spin_unlock_wait
> > > > on architectures implementing spin_lock with LL/SC sequences and acquire
> > > > semantics:
> > > > 
> > > >  | CPU 1                   CPU 2                     CPU 3
> > > >  | ==================      ====================      ==============
> > > >  |                                                   spin_unlock(&lock);
> > > >  |                         spin_lock(&lock):
> > > >  |                           r1 = *lock; // r1 == 0;
> > > >  |                         o = READ_ONCE(object); // reordered here
> > > >  | object = NULL;
> > > >  | smp_mb();
> > > >  | spin_unlock_wait(&lock);
> > > >  |                           *lock = 1;
> > > >  | smp_mb();
> > > >  | o->dead = true;
> > > >  |                         if (o) // true
> > > >  |                           BUG_ON(o->dead); // true!!
> > > > 
> > > > The crux of the problem is that spin_unlock_wait(&lock) can return on
> > > > CPU 1 whilst CPU 2 is in the process of taking the lock. This can be
> > > > resolved by upgrading spin_unlock_wait to a LOCK operation, forcing it
> > > 
> > > I wonder whether upgrading it to a LOCK operation is necessary. Please
> > > see below.
> > > 
> > > > to serialise against a concurrent locker and giving it acquire semantics
> > > > in the process (although it is not at all clear whether this is needed -
> > > > different callers seem to assume different things about the barrier
> > > > semantics and architectures are similarly disjoint in their
> > > > implementations of the macro).
> > > > 
> > > > This patch implements spin_unlock_wait using an LL/SC sequence with
> > > > acquire semantics on arm64. For v8.1 systems with the LSE atomics, the
> > > 
> > > IIUC, you implement this with acquire semantics because a LOCK requires
> > > acquire semantics, right? I get that spin_unlock_wait() becoming a LOCK
> > > will surely simplify our analysis, because LOCK->LOCK is always globally
> > > ordered. But for this particular problem, it seems a relaxed LL/SC
> > > loop suffices, and the use of spin_unlock_wait() in do_exit() only
> > > requires a control dependency which could be fulfilled by a relaxed
> > > LL/SC loop. So the acquire semantics may be not necessary here.
> > > 
> > > Am I missing something subtle here that is the reason you want to
> > > upgrade spin_unlock_wait() to a LOCK?
> > 
> > There's two things going on here:
> > 
> >   (1) Having spin_unlock_wait do a complete LL/SC sequence (i.e. writing
> >       to the lock, like a LOCK operation would)
> > 
> >   (2) Giving it ACQUIRE semantics
> > 
> > (2) is not necessary to fix the problem you described. However, LOCK
> > operations do imply ACQUIRE, and it's not clear to me what the
> > ordering semantics of spin_unlock_wait are supposed to be.
> > 
> > I'd really like to keep this as simple as possible. Introducing yet
> > another magic barrier macro that isn't well understood or widely used
> > feels like a major step backwards to me, so I opted to upgrade
> > spin_unlock_wait to LOCK on arm64 so that I no longer have to worry
> > about it.
> > 
> 
> Fair enough ;-)
> 
> Another thing about the problem is that it only happens if
> spin_unlock_wait() observes the lock as *unlocked* the first time it
> checks it. IOW, if spin_unlock_wait() observes the transition of the
> lock's state from locked to unlocked, AFAICS, the problem won't
> happen. Maybe we can use this for a little optimization, like (in
> pseudo code):
> 
> 	while (1) { // ll/sc loop
> 		r1 = load_link(*lock);
> 		if (!is_locked(r1)) {
> 			res = store_conditional(lock, r1);
> 			if (res == SUCC) // store succeeded.
> 				return;
> 		}
> 		else
> 			break;
> 	}
> 
> 	smp_rmb();
> 	do {
> 		r1 = *lock;
> 	} while (is_locked(r1));
> 
> I think this works, and will test this using a PPC litmus test in herd
> to verify it.
> 

So herd and PPCMEM both disagree with me ;-(

I'm missing the requirement of B-cumulativity here, so this doesn't
work on PPC.

Regards,
Boqun