LSE atomic op ordering is weaker than intended?
Hector Martin
marcan at marcan.st
Wed Mar 3 19:37:31 GMT 2021
On 04/03/2021 03.40, Will Deacon wrote:
>> // ...stuff that needs to be ordered prior to the atomic here
>> ret = atomic_fetch_or_release(flags...)
>> if (condition on ret and unrelated stuff) {
>> writel(reg_send, ...) // includes pre-barrier
>
> Looks you have a control dependency here, so I think that can be
> writel_relaxed() [with a comment!].
Wouldn't the control dependency be strictly on the *read* portion of the
atomic, while I'm looking to order on the write part?
Though, actually, I omitted something; the unrelated stuff includes an
atomic_read_acquire that will create order against the write-release
portion of the atomic (not per the Linux cross-platform docs, but yes
per ARMv8 B2.3.3) and *that* should be a suitable control dependency if
I'm not mistaken.
>> writel_relaxed(reg_ack, ...)
>
> Interesting, is there no MMIO read on the IRQ path?
There's a prior read, but that doesn't help here; the important part is
that this writel (ACKing the hardware IPI) cannot move after the atomic
op after it (figuring out which software IPIs it represents), because if
it does we might ACK a subsequent IPI we're not about to handle.
>> dma_wmb() // need a post-barrier
>> atomic_fetch_andnot_acquire(flags...)
>> // ...stuff that needs to be ordered after the atomic here
>
> This looks pretty dodgy to me as a later read could drift up before teh
> writel_relaxed().
That's fine, as long as those reads are ordered after the atomic fetch
portion itself (which they should be since it's an _acquire).
AIUI the worst case ordering this can result in is:
- read portion of atomic
- the entire rest of the code (if it does no writes)
- writel_relaxed()
- write portion of atomic
That's fine: the result of the atomic read gives us license to run the
rest of the code; the writel doesn't need to have happened. The
important part is that the write portion of the atomic can't happen
before the writel.
> If this is all to do with IPIs between CPUs and the "stuff that needs to be
> ordered <before/after> the atomic" is all normal, cacheable memory then I
> dont't think you need to worry about the outer-shareable domain at all. The
> only reason we have that in dma_*mb() is because the coherent DMA buffers
> might be non-cacheable to deal with non-coherent DMA.
What worried me is whether the order created by atomics on normal memory
might not extend to MMIO ops on device memory, which are treated as
outer-shareable (per D5.5.1).
I don't fully understand how the entire B2.3 section on ordering
interacts across memory regions having different attributes. That
section talks about a Common Shareability Domain, and also explicitly
mentions only Normal memory locations, so I'm having a lot of trouble
piecing together how MMIO ops are treated within the entire memory
ordering framework (and all the litmus test stuff seems to assume
everythign is IS...).
All the actual memory involved in everything but the writel() stuff is
all normal cacheable memory; what I need to ensure is that I can
transitively create memory ordering constraints between normal memory
accesses through MMIO writes.
That is, that I can create two ordered sequences on two PEs:
(normal access 1; mmio write 1)
(mmio write 2; normal access 2)
Such that I can assert that at least one of the following orderings are
true:
(normal access 1; normal access 2)
(as observed by the PEs)
or:
(mmio write 2; mmio write 1)
(as observed by the MMIO device)
But never this combination:
(normal access 2; normal access 1)
(mmio write 1; mmio write 2)
> But hack something together, and I'll look at your v3 to see what's going
> on.
Please do, hopefully the full code makes more sense.
--
Hector Martin (marcan at marcan.st)
Public Key: https://mrcn.st/pub
More information about the linux-arm-kernel
mailing list