LSE atomic op ordering is weaker than intended?

Hector Martin marcan at marcan.st
Wed Mar 3 13:05:19 GMT 2021


Hi Will and everyone else,

While yak shaving the AIC driver ordering minutiae, I came across this.

atomic_t.txt describes "fully ordered" atomic ops as follows:

 > Fully ordered primitives are ordered against everything prior and
 > everything subsequent. Therefore a fully ordered primitive is like
 > having an smp_mb() before and an smp_mb() after the primitive.

Among those ops are the atomic_fetch_* ops. With LSE, these are implemented
as e.g. LDSETAL, with acquire-release semantics.

However, the *AL LSE ops have acquire semantics *for the read* and 
release semantics *for the write*. As independent components of the same 
atomic op, I cannot find anything in the ARM ARM that would imply 
ordering between the Load-Acquire and *prior* memory operations, nor 
ordering between the Store-Release and *subsequent* memory operations.

So it would seem these ops are not in fact fully ordered, but rather
only order the read component against subsequent ops, and the write
component against prior ops.

Put another way: the current implementation means that unqualified ops
are equivalent to _acquire + _release semantics as they are described in
atomic_t.txt, but that is weaker than "fully ordered".

Throwing this litmus test at herd7 seems to confirm this theory:

AArch64 lse-atomic-al-ops-are-not-fully-ordered
""
{
0:X1=x; 0:X3=y;
1:X1=x; 1:X3=y;
}
  P0                   | P1                  ;
  MOV X0, #1           | MOV X0, #1          ;
  LDSETAL X0, X2, [X1] | LDSETAL X0, X2, [X3];
  LDR X4, [X3]         | LDR X4, [X1]        ;
exists (0:X4=0 /\ 1:X4=0)

The positive result goes away when a DMB ISH (i.e. smp_mb()) is added
after the atomic ops, which contradicts the atomic_t.txt claim that the
op already behaves as if followed by an smp_mb().
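
For reference, a sketch of the variant with the barrier, with a DMB ISH
placed directly after each LDSETAL. The test name and description string
are made up for illustration, and the exact barrier placement is an
assumption based on the description above:

AArch64 lse-atomic-al-ops-plus-dmb-ish
"hypothetical variant: DMB ISH added after each LDSETAL"
{
0:X1=x; 0:X3=y;
1:X1=x; 1:X3=y;
}
  P0                   | P1                  ;
  MOV X0, #1           | MOV X0, #1          ;
  LDSETAL X0, X2, [X1] | LDSETAL X0, X2, [X3];
  DMB ISH              | DMB ISH             ;
  LDR X4, [X3]         | LDR X4, [X1]        ;
exists (0:X4=0 /\ 1:X4=0)

With the barrier between the atomic op and the following load, the
exists clause should no longer be satisfiable.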

Did I miss something, or is this in fact an issue?

(And while I'm talking to the right people: this issue aside, do atomic 
ops on Normal memory create ordering with Device memory ops, or are 
there no guarantees there due to the fact that Normal memory is mapped 
inner-shareable and the ordering guarantees thus do not extend to 
outer-shareable Device accesses? My current understanding is the
latter, but I find the ARM ARM wording hard to conclusively grok here.)

-- 
Hector Martin (marcan at marcan.st)
Public Key: https://mrcn.st/pub


