CAS implementation may be broken
Catalin Marinas
catalin.marinas at arm.com
Tue Nov 24 11:27:14 EST 2009
On Tue, 2009-11-24 at 15:36 +0000, Russell King - ARM Linux wrote:
> On Tue, Nov 24, 2009 at 04:15:37PM +0100, Toby Douglass wrote:
> > So, now I'm puzzled. I was under the impression it was possible to use
> > LL/SC to solve ABA.
>
> Well, what you can do is:
>
> LL
> compute new value
> SC
>
> The problem for LL/SC architectures is that this creates a big region
> where we hope that no one else performs another 'load locked' operation.
As I replied to Jamie, the long computation above may introduce some
live-lock situation on some hardware implementations (I think in general
it needs to avoid cache line evictions).
> (I argued initially for a LL / SC interface but the arguments I was making
> were shot down by Linus et.al. as completely unreasonable and technically
> idiotic.)
Maybe because other architectures don't allow for many instructions
between LL and SC.
> Actually, I'm rather surprised that our existing LDREX/STREX implementations
> don't suffer from this problem. Consider two CPUs operating in lock-step:
>
> CPU0 CPU1
> ldrex
> add ldrex
> strex (fails) add
> repeat strex (fails)
> ldrex repeat
> add ldrex
> strex (fails) add
> repeat strex (fails)
> ... repeat
> ...
>
> In this scenario, the loops _never_ terminate. The usual solution is to
> add delays into the loop which depend on the CPU number, thus ensuring
> that two CPUs execute a different number of instructions each time around
> the loop.
I thought we already have a hardware solution or this. There was an
erratum for ARM11MPCore r0p0 where we had to add the delay in software
but it's no longer needed.
> > My feeling in this, for what it's worth, is that this seems an excessive
> > requirement for something which is necessary for lock-free data
> > structures; they are becoming important as the number of cores in SMP
> > systems continues to double.
>
> Hasn't the lock-free issue been solved by things like rcu?
RCU allows only one writer (and multiple readers). The completely
lock-free algorithm would allow multiple writers (multiple threads
popping from a stack in Toby's original problem).
--
Catalin
More information about the linux-arm-kernel
mailing list