CAS implementation may be broken
Toby Douglass
trd at 45mercystreet.com
Tue Nov 24 11:20:48 EST 2009
Russell King - ARM Linux wrote:
> On Tue, Nov 24, 2009 at 04:15:37PM +0100, Toby Douglass wrote:
>> So, now I'm puzzled. I was under the impression it was possible to use
>> LL/SC to solve ABA.
>
> Well, what you can do is:
>
> LL
> compute new value
> SC
>
> The problem for LL/SC architectures is that this creates a big region
> where we hope that no one else performs another 'load locked' operation.
Yes; live-lock.
> LL/SC based architectures doing this traditionally fall fowl of a problem.
> Consider the following:
>
> do {
> LL
> complex computation to new value
> SC
> } while (store failed)
>
> and have that running on two CPUs trying to modify the same location. It
> can hang both CPUs up in that loop for a _very_ long time.
Yes, but this complex computation; in all of the data structures which
can be implemented using a lock-free algorithm, the only work that is
done is loading an exchange value and a compare value.
> (I argued initially for a LL / SC interface but the arguments I was making
> were shot down by Linus et.al. as completely unreasonable and technically
> idiotic.)
Were they?
> Actually, I'm rather surprised that our existing LDREX/STREX implementations
> don't suffer from this problem. Consider two CPUs operating in lock-step:
>
> CPU0 CPU1
> ldrex
> add ldrex
> strex (fails) add
> repeat strex (fails)
> ldrex repeat
> add ldrex
> strex (fails) add
> repeat strex (fails)
> ... repeat
> ...
>
> In this scenario, the loops _never_ terminate. The usual solution is to
> add delays into the loop which depend on the CPU number, thus ensuring
> that two CPUs execute a different number of instructions each time around
> the loop.
Yes. Algorithmic back-off is the standard solution. For light use,
it's excessive. TBH, I doubt many are using this code, since the kernel
version of it didn't have memory barriers until recently and GCC's
__sync_synchronize() does nothing on ARM.
>> My feeling in this, for what it's worth, is that this seems an excessive
>> requirement for something which is necessary for lock-free data
>> structures; they are becoming important as the number of cores in SMP
>> systems continues to double.
>
> Hasn't the lock-free issue been solved by things like rcu?
I would say yes and no. Memory handling adds significant complexity to
a lock-free algorithm. For some algorithms, it's not necessary, for
others you can get by without it at a cost (using a freelist, for
example). It's a trade-off, rather than an outright solution.
More information about the linux-arm-kernel
mailing list