LSE atomic op ordering is weaker than intended?

Thu Mar 4 08:16:29 GMT 2021

On 04/03/2021 06.38, Will Deacon wrote:
> One thing to bear in mind here is that the MMIO device cannot "observe"
> anything in the architectural sense because it is a slave interface. In
> order to observe a memory access, you must emit a read or a write
> transaction, and it's this notion of observation which the shareability
> domains are built around.
> 
> So for this example, we can talk about the CPUs (in the inner-shareable
> domain) observing the MMIO writes and inner-shareable barriers are
> sufficient for that. The device mapping of the MMIO registers will then
> ensure that they arrive at the endpoint in that order too.

Ah! That makes sense now. So as long as the CPUs agree about the MMIO 
ordering, the endpoint will see that ordering too.

In that case I should be able to get away with simple SMP/atomic 
barriers (or nothing where the control dependency implies order).

> Hopefully, as I don't grok how this deals with spurious interrupts if it
> only does MMIO writes.

It's an implementation of a virtual (software) interrupt controller 
multiplexing several IPIs over one, hence the atomics stand in for what 
would be MMIO on a real controller. The actual hardware IPI underlying 
it all does use a single MMIO read to fetch/mask the event at the 
controller, but then the tricky ordering is between ACKing that IPI 
itself (which is a write) and the virtual stuff on top.

Spurious hardware IPIs are possible in this model, and are taken care of 
by the atomic flags being the source of truth for what is actually 
pending downstream; what I need to make sure to avoid is the opposite 
case where a virtual IPI ends up pending and unmasked, but the hardware 
IPI is not correctly raised due to a race.
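
Roughly, the scheme above could be sketched as follows in userspace C11. 
This is only an illustration under assumptions, not the actual driver 
code: the names (vipi_pending, vipi_enabled, hw_ipi_send, hw_ipi_ack) are 
made up, a single CPU is modelled, the hardware IPI is replaced by 
counters, and seq_cst fences stand in for whatever SMP barriers end up 
being sufficient.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical state for one CPU; a real version would be per-CPU. */
static atomic_uint vipi_pending;  /* virtual IPIs raised, not yet handled */
static atomic_uint vipi_enabled;  /* virtual IPIs currently unmasked */
static unsigned hw_ipi_raised;    /* stand-in for the MMIO "send IPI" write */
static unsigned hw_ipi_acked;     /* stand-in for the MMIO "ack IPI" write */

static void hw_ipi_send(void) { hw_ipi_raised++; }
static void hw_ipi_ack(void)  { hw_ipi_acked++; }

/* Raise virtual IPI 'bit' on the (single) target CPU. */
static void vipi_send(unsigned bit)
{
    assert(bit < 32);
    unsigned mask = 1u << bit;

    /* Release: earlier shared-memory writes are visible before the flag. */
    unsigned old = atomic_fetch_or_explicit(&vipi_pending, mask,
                                            memory_order_release);
    /* Order the flag set before the enable read (mirrors vipi_unmask). */
    atomic_thread_fence(memory_order_seq_cst);

    if (!(old & mask) &&
        (atomic_load_explicit(&vipi_enabled, memory_order_relaxed) & mask))
        hw_ipi_send();
}

/* Unmask virtual IPI 'bit'; re-raise the hardware IPI if it is pending. */
static void vipi_unmask(unsigned bit)
{
    assert(bit < 32);
    unsigned mask = 1u << bit;

    atomic_fetch_or_explicit(&vipi_enabled, mask, memory_order_relaxed);
    /* Order the enable set before the pending read (mirrors vipi_send). */
    atomic_thread_fence(memory_order_seq_cst);

    if (atomic_load_explicit(&vipi_pending, memory_order_relaxed) & mask)
        hw_ipi_send();
}

/* Mask virtual IPI 'bit'; it may still become pending while masked. */
static void vipi_mask(unsigned bit)
{
    assert(bit < 32);
    atomic_fetch_and_explicit(&vipi_enabled, ~(1u << bit),
                              memory_order_relaxed);
}

/* Hardware IPI handler: returns the set of virtual IPIs to dispatch. */
static unsigned vipi_handle(void)
{
    /* ACK the hardware IPI first, then look at the flags. */
    hw_ipi_ack();
    atomic_thread_fence(memory_order_seq_cst);

    unsigned enabled = atomic_load_explicit(&vipi_enabled,
                                            memory_order_relaxed);
    /* Clear only the enabled bits; masked ones stay pending. */
    unsigned old = atomic_fetch_and_explicit(&vipi_pending, ~enabled,
                                             memory_order_acq_rel);
    return old & enabled; /* 0 means the hardware IPI was spurious */
}
```

The point of the fence in both vipi_send() and vipi_unmask() is that at 
least one side must see the other's write, so a pending and unmasked 
virtual IPI always gets a hardware IPI. A single-threaded run only 
exercises the bookkeeping, not the ordering itself.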

For additional fun: this SoC supports a completely separate "fast IPI" 
mechanism built on IMP-DEF system registers without any MMIO, which 
requires ordering against sysreg accesses rather than loads and stores. I 
have no idea if this stuff is formally defined in the architecture in 
any strict sense (especially since this is IMP-DEF), but I'm probably 
going to have to run some litmus-style experiments to see how the CPU 
behaves in practice. Right now we don't use/support this mechanism; that 
will come later. It still has only one IPI per CPU though, so it won't 
let us get rid of the virtual stuff on top.

-- 
Hector Martin (marcan at marcan.st)
Public Key: https://mrcn.st/pub