LSE atomic op ordering is weaker than intended?
Hector Martin
marcan at marcan.st
Wed Mar 3 18:04:20 GMT 2021
On 04/03/2021 00.36, Will Deacon wrote:
>> Did I miss something, or is this in fact an issue?
>
> Both. The -AL atomics are actually special-cased in the
> "barrier-ordered-before" relation in the Arm ARM:
>
> [RW1 is barrier-ordered-before if]
> * RW1 appears in program order before an atomic instruction with both
> Acquire and Release semantics that appears in program order before
> RW2.
>
> However, that isn't sufficient to order prior accesses with the "load part"
> of the RmW and later accesses with the "store part" of the RmW, as you have
> observed in your test. I'm aware of some pending proposals in this area of
> the architecture, so I'm reluctant to make any changes until that's
> bottomed-out, but I'll make a note to chase that up.
I had actually seen that part of the spec, and looked at it sideways a
few times, but concluded it wasn't giving me the ordering guarantees I
was looking for (this was before I wrote the litmus test). You're right,
it does nonetheless make it stronger than the mere combination of
_acquire and _release semantics.
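The toy C model below covers only the single clause quoted above, evaluated over a program-order trace. It is a sketch under stated assumptions. The names ev_kind, EV_ACCESS, EV_ATOMIC_AL and bob_via_al_atomic() are hypothetical, and none of the other barrier-ordered-before clauses (DMB, acquire, release, ...) are modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds for a toy program-order trace. */
enum ev_kind {
    EV_ACCESS,     /* an ordinary load or store: an RW1/RW2 candidate */
    EV_ATOMIC_AL,  /* an LSE atomic with both Acquire and Release (-AL form) */
};

/*
 * Toy version of ONE clause only: events[i] is barrier-ordered-before
 * events[j] if some EV_ATOMIC_AL at index k satisfies i < k < j in program
 * order. The atomic itself is never RW1 or RW2 under this clause, so
 * queries naming it return false. All other clauses are left out.
 */
static bool bob_via_al_atomic(const enum ev_kind *events, size_t n,
                              size_t i, size_t j)
{
    if (i >= n || j >= n || i >= j)
        return false;
    for (size_t k = i + 1; k < j; k++) {
        if (events[k] == EV_ATOMIC_AL)
            return true;   /* RW1 -> AL atomic -> RW2 in program order */
    }
    return false;
}
```

For the trace { EV_ACCESS, EV_ATOMIC_AL, EV_ACCESS }, asking about indices 0 and 2 returns true. Asking about 0 and 1 returns false, because this particular clause only covers accesses strictly on either side of the atomic. Any ordering against the atomic's own load and store parts has to come from other rules.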
Glad to hear this is something being worked on! I've been giving myself
a crash course in memory model minutiae over the past few weeks :)
>> (And while I'm talking to the right people: this issue aside, do atomic ops
>> on Normal memory create ordering with Device memory ops, or are there no
>> guarantees there due to the fact that Normal memory is mapped
>> inner-shareable and the ordering guarantees thus do not extend to
>> outer-shareable Device accesses? My current understanding is the latter,
>> but I find the ARM ARM wording hard to conclusively grok here.)
>
> Outer-shareable is a superset of inner-shareable, but I think this would be
> easier with a specific example. I'll go and look at the AIC patch, since
> this is all a lot easier to talk about in the context of some real code.
>
> Which is the latest version I should look at?
I'm about to send v3 tomorrow, so I'll CC you on that patch
(don't bother with v2; this part of the code is changing a lot). That
said, it basically comes down to the following two sequences:
A:
    // ...stuff that needs to be ordered prior to the atomic here
    ret = atomic_fetch_or_release(flags...);
    if (condition on ret and unrelated stuff) {
        writel(reg_send, ...); // includes pre-barrier
    }
B:
    writel_relaxed(reg_ack, ...);
    dma_wmb(); // need a post-barrier
    atomic_fetch_andnot_acquire(flags...);
    // ...stuff that needs to be ordered after the atomic here
My current understanding is that I cannot drop the dma_wmb() in B,
switch A to writel_relaxed(), and rely on fully ordered atomic ops
instead. The atomic ops operate on Normal Inner Shareable memory, so
they make no guarantees about ordering with Device Outer Shareable
accesses. I need the I/O writes to be ordered with respect to the
atomics.
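The userspace sketch below restates the two sequences using C11 atomics. atomic_fetch_or_release() becomes atomic_fetch_or_explicit(..., memory_order_release). atomic_fetch_andnot_acquire() becomes atomic_fetch_and_explicit() with the complemented mask and memory_order_acquire. The flag bit FLAG_PENDING, the condition on ret, sequence_a()/sequence_b() and the two MMIO helpers are hypothetical stand-ins, and the helpers merely count writes. The sketch runs single-threaded, so it shows the operations rather than the ordering question itself. The C11 memory model has no notion of Device memory, which is why dma_wmb() survives only as a comment.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for writel()/writel_relaxed() on an MMIO
 * register; here they only count how many writes were issued. */
static unsigned int reg_send_writes;
static unsigned int reg_ack_writes;
static void mmio_write_send(void) { reg_send_writes++; }
static void mmio_write_ack(void)  { reg_ack_writes++; }

#define FLAG_PENDING 0x1u          /* hypothetical flag bit */
static atomic_uint flags;          /* static storage: zero-initialized */

/* Sequence A: release fetch_or, then a conditional "writel()". */
static bool sequence_a(void)
{
    /* ...stuff that needs to be ordered prior to the atomic here */
    unsigned int ret = atomic_fetch_or_explicit(&flags, FLAG_PENDING,
                                                memory_order_release);
    if (!(ret & FLAG_PENDING)) {   /* hypothetical condition on ret */
        mmio_write_send();         /* writel(): barrier precedes the write */
        return true;
    }
    return false;
}

/* Sequence B: "writel_relaxed()", barrier, then acquire fetch_andnot. */
static void sequence_b(void)
{
    mmio_write_ack();              /* writel_relaxed() */
    /* dma_wmb() would go here: no portable C11 fence orders MMIO writes. */
    atomic_fetch_and_explicit(&flags, ~FLAG_PENDING, memory_order_acquire);
    /* ...stuff that needs to be ordered after the atomic here */
}
```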
--
Hector Martin (marcan at marcan.st)
Public Key: https://mrcn.st/pub
More information about the linux-arm-kernel mailing list