Should we use "dsb" or "dmb" between write to buffer and write to register

Wed Sep 7 10:53:43 PDT 2022

On Mon, Aug 22, 2022 at 03:53:42PM +0800, Mark Zhang wrote:
> May I ask when to use dsb or dmb in our device driver? Thanks.
> 
> For example, when sending a command to FW/HW, we usually do it in 3 steps:
>   1. memcpy(buff, src, size);
>   2. wmb();
>   3. write64(ctrl, reg_addr);
> 
> IIUC, in the kernel wmb() is implemented with "dsb st". When we change this
> to "dmb st" we get better performance, but we are not sure whether it's
> safe. I have read Will's post[1] but am still not sure.
> 
> So our questions are:
> 1. can we use "dmb" here?
> 2. If we can then should we use "dmb st", or "dmb oshst"?
> 
> Thank you very much.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f

Will convinced me at the time that a DMB is sufficient, though every time
I revisit this I get confused ;). I'm not sure whether we have updated the
memory model since then to cover such scenarios. In practice, at least
from what I recall, using DMB here should be safe.
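
To make the sequence concrete, here is a minimal user-space sketch of the
three steps with the barrier made selectable. The names fake_dev,
submit_cmd(), barrier_dsb_st() and barrier_dmb_oshst() are made up for
illustration; a real driver would use the kernel's own barriers and MMIO
accessors. In the arm64 kernel, wmb() is "dsb st" and dma_wmb() is
"dmb oshst". The plain store to 'ctrl' only stands in for write64() and
says nothing about what a real device observes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * On AArch64 emit the real instruction. Elsewhere fall back to a full
 * fence (GCC/Clang builtin) so the sketch still builds; that fallback is
 * an assumption of this example, not what the kernel does.
 */
#if defined(__aarch64__)
#define barrier_dsb_st()    __asm__ volatile("dsb st" ::: "memory")
#define barrier_dmb_oshst() __asm__ volatile("dmb oshst" ::: "memory")
#else
#define barrier_dsb_st()    __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define barrier_dmb_oshst() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Hypothetical device: a command buffer plus a control register. */
struct fake_dev {
	uint8_t buff[64];
	volatile uint64_t ctrl;
};

/* Steps 1-3 from the question, with the barrier chosen by the caller. */
void submit_cmd(struct fake_dev *dev, const void *src, size_t size,
		uint64_t ctrl_val, bool use_dmb)
{
	assert(size <= sizeof(dev->buff));

	memcpy(dev->buff, src, size);	/* 1. fill the buffer */

	if (use_dmb)			/* 2. order buffer vs. register write */
		barrier_dmb_oshst();
	else
		barrier_dsb_st();

	dev->ctrl = ctrl_val;		/* 3. stand-in for write64(ctrl, reg_addr) */
}
```

Calling submit_cmd(&dev, cmd, sizeof(cmd), 1, true) takes the DMB path;
passing false takes the DSB path. On non-arm64 hosts both fall back to the
same fence, so this only shows the shape of the sequence, not the
performance difference.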

IIRC, the logic is that if an observer in the same shareability domain
is seeing the write64 (3), it should have observed the memcpy (1) as
well given the DMB. The device in question is one of the observers
observing the memcpy to 'buff' (but it doesn't 'observe' the write64
itself). In a multi-copy atomic world, if a third observer is seeing the
write64 and therefore the memcpy, it means that the device should have
observed the memcpy as well (the multi-copy atomicity requirement).

That's where it looks a bit like Schrodinger's cat to me (the state of
the cat being whether the device observed the memcpy or not). You can't
be sure until you have a third observer seeing the write64 to the
device. In the absence of such a hypothetical observer, the device might
or might not have seen the new data in 'buff', since it cannot observe
the write64 to its control register (and from the commit log, this seems
to be the case with peripherals private to a CPU).

I guess the question is what it means for the device that a third
observer saw the write64. In one interpretation of observability,
another write64 from the third observer is ordered after the original
write64, but to me that still doesn't clarify any order imposed on the
device's access to 'buff':

Initial state:
  buff=0
  ctrl=0

P0:		P1:		Device:
  Wbuff=1	  Wctrl=2	  Ry=buff
  DMB		  DMB
  Wctrl=1	  Rx=buff

If the final 'ctrl' register value is 2 then x==1. But I don't see how
Wctrl=2 influences whether y is 0 or 1. If x==1 on P1, any other
observer, including the device, should see the buff value of 1, but this
assumes that there is some other ordering for when Ry=buff is issued.
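
As a sanity check of the above, here is a small sketch that enumerates
every interleaving of the five accesses and records which (ctrl, x, y)
outcomes occur. It assumes plain sequential consistency, with each DMB
treated as simply keeping program order, which is a big simplification:
it does not model the Arm memory model or how a real device issues its
read. All names (litmus_result, run_litmus(), etc.) are made up for this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Outcomes seen over all interleavings, indexed [final ctrl][x][y]. */
struct litmus_result {
	bool seen[3][2][2];
};

struct state {
	int buff, ctrl, x, y;
};

/* p0, p1 and dev count how many accesses each agent has performed. */
static void explore(struct state s, int p0, int p1, int dev,
		    struct litmus_result *r)
{
	if (p0 == 2 && p1 == 2 && dev == 1) {
		/* Both control writes always happen, so ctrl is 1 or 2. */
		assert(s.ctrl == 1 || s.ctrl == 2);
		r->seen[s.ctrl][s.x][s.y] = true;
		return;
	}
	if (p0 < 2) {			/* P0: Wbuff=1; DMB; Wctrl=1 */
		struct state n = s;
		if (p0 == 0)
			n.buff = 1;
		else
			n.ctrl = 1;
		explore(n, p0 + 1, p1, dev, r);
	}
	if (p1 < 2) {			/* P1: Wctrl=2; DMB; Rx=buff */
		struct state n = s;
		if (p1 == 0)
			n.ctrl = 2;
		else
			n.x = n.buff;
		explore(n, p0, p1 + 1, dev, r);
	}
	if (dev < 1) {			/* Device: Ry=buff, unordered vs. the rest */
		struct state n = s;
		n.y = n.buff;
		explore(n, p0, p1, dev + 1, r);
	}
}

struct litmus_result run_litmus(void)
{
	struct litmus_result r;
	struct state init = { 0, 0, 0, 0 };

	memset(&r, 0, sizeof(r));
	explore(init, 0, 0, 0, &r);
	return r;
}

/* Final ctrl==2 never occurs together with x==0. */
bool ctrl2_implies_x1(const struct litmus_result *r)
{
	return !r->seen[2][0][0] && !r->seen[2][0][1];
}

/* With final ctrl==2 (and so x==1), both y==0 and y==1 still occur. */
bool y_free_when_ctrl2(const struct litmus_result *r)
{
	return r->seen[2][1][0] && r->seen[2][1][1];
}
```

Under this simplified model both helpers return true for the result of
run_litmus(): the final ctrl value pins down x but leaves y open, because
nothing orders the device's read against the other accesses.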

So, as you can see, I'm even more confused than when I started writing
this email ;). I'd leave this to Will to explain and, of course, if your
hardware folks disagree, they should let us know.

-- 
Catalin