CAS implementation may be broken

Tue Nov 24 10:36:24 EST 2009

On Tue, Nov 24, 2009 at 04:15:37PM +0100, Toby Douglass wrote:
> So, now I'm puzzled.  I was under the impression it was possible to use  
> LL/SC to solve ABA.

Well, what you can do is:

	LL
	compute new value
	SC

The problem for LL/SC architectures is that this creates a big region
where we hope that no one else performs another 'load locked' operation.

LL/SC based architectures doing this traditionally fall fowl of a problem.
Consider the following:

do {
	LL
	complex computation to new value
	SC
} while (store failed)

and have that running on two CPUs trying to modify the same location.  It
can hang both CPUs up in that loop for a _very_ long time.  This is
primerily why we've ended up with a CAS implementation, and everything
is implemented as:

	do {
		load current
		compute new value
		do {
			LL old
			old == current?
				SC new
		} while SC failed
	} while old != current

(I argued initially for a LL / SC interface but the arguments I was making
were shot down by Linus et.al. as completely unreasonable and technically
idiotic.)

Actually, I'm rather surprised that our existing LDREX/STREX implementations
don't suffer from this problem.  Consider two CPUs operating in lock-step:

	CPU0		CPU1
	ldrex
	add		ldrex
	strex (fails)	add
	repeat		strex (fails)
	ldrex		repeat
	add		ldrex
	strex (fails)	add
	repeat		strex (fails)
	...		repeat
			...

In this scenario, the loops _never_ terminate.  The usual solution is to
add delays into the loop which depend on the CPU number, thus ensuring
that two CPUs execute a different number of instructions each time around
the loop.

> My feeling in this, for what it's worth, is that this seems an excessive  
> requirement for something which is necessary for lock-free data  
> structures; they are becoming important as the number of cores in SMP  
> systems continues to double.

Hasn't the lock-free issue been solved by things like rcu?