Should we use "dsb" or "dmb" between write to buffer and write to register
Will Deacon
will at kernel.org
Thu Sep 8 06:50:17 PDT 2022
On Wed, Sep 07, 2022 at 06:53:43PM +0100, Catalin Marinas wrote:
> On Mon, Aug 22, 2022 at 03:53:42PM +0800, Mark Zhang wrote:
> > May I ask when to use dsb or dmb in our device driver? Thanks.
> >
> > For example, when sending a command to the FW/HW, we usually do it in 3 steps:
> > 1. memcpy(buff, src, size);
> > 2. wmb();
> > 3. write64(ctrl, reg_addr);

I'm assuming that write64 is just a plain 64-bit store to a device mapping
and doesn't imply any further ordering.

> > IIUC, in the kernel wmb() is implemented with "dsb st". When we change this to
> > "dmb st" we get better performance, but we are not sure whether it's safe. I
> > have read Will's post[1] but am still not sure.
> >
> > So our questions are:
> > 1. can we use "dmb" here?
> > 2. If we can then should we use "dmb st", or "dmb oshst"?
> >
> > Thank you very much.
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
>
> Will convinced me at the time that it's sufficient, though every time I
> revisit this I get confused ;). I'm not sure whether we have updated the
> memory model since then to cover such scenarios. In practice, at least
> from what I recall, that should be safe.

The Armv8 memory model is "other-multi-copy-atomic", which means that a
store is either visible _only_ to the observer from which it originates
or it is visible to all observers. It cannot exist in some intermediate
state.

With that, the insight is that a write to the MMIO interface of a shared
peripheral must be observed by all observers when it reaches the endpoint.
Consequently, we only need to ensure that the stores from your memcpy()
in the motivating example are observed before the MMIO write is observed,
and a DMB ST is sufficient for that. We use OSHST in Linux in case the
memory buffer is mapped as non-cacheable, but I'm doubtful whether it makes
a real practical difference.

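A minimal C sketch of the three-step pattern from the original question, using DMB OSHST (what Linux's dma_wmb() expands to on arm64) instead of the DSB ST that wmb() emits. The helper names `order_buf_before_mmio()` and `post_command()` are hypothetical, and `write64()` is modelled, as assumed above, as a plain volatile 64-bit store with no implied ordering. On non-arm64 hosts the barrier falls back to a release fence purely so the sketch compiles and runs; that fallback says nothing about device ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: order earlier stores to the command buffer before
 * later stores, including the MMIO doorbell write. On arm64 this is
 * "dmb oshst" (Linux's dma_wmb()). Elsewhere we fall back to a release
 * fence only so the sketch builds; it is NOT a device-ordering barrier.
 */
static inline void order_buf_before_mmio(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dmb oshst" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_RELEASE);
#endif
}

/* Assumed write64(): plain 64-bit store to the device register, no barrier. */
static inline void write64(uint64_t val, volatile uint64_t *reg)
{
    *reg = val;
}

/* Hypothetical: fill the buffer, order it, then ring the doorbell. */
static void post_command(void *buff, const void *src, size_t size,
                         uint64_t ctrl, volatile uint64_t *reg_addr)
{
    assert(buff != NULL && src != NULL && reg_addr != NULL);
    memcpy(buff, src, size);   /* 1. write the command into the buffer */
    order_buf_before_mmio();   /* 2. DMB OSHST instead of wmb()'s DSB ST */
    write64(ctrl, reg_addr);   /* 3. doorbell: device may now read buff */
}
```

In an actual arm64 Linux driver, the same sequence is normally written as dma_wmb() followed by writeq_relaxed(), or simply as writeq(), whose arm64 implementation issues dma_wmb() before the store.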
> IIRC, the logic is that if an observer in the same shareability domain
> is seeing the write64 (3), it should have observed the memcpy (1) as
> well given the DMB. The device in question is one of the observers
> observing the memcpy to 'buff' (but it doesn't 'observe' the write64
> itself). In a multi-copy atomic world, if a third observer is seeing the
> write64 and therefore the memcpy, it means that the device should have
> observed the memcpy as well (the multi-copy atomicity requirement).
>
> That's where it looks a bit like Schrödinger's cat to me (the state of
> the cat being whether the device observed the memcpy or not). You can't
> be sure until you have a third observer seeing the write64 to the device.
> In the absence of such a hypothetical observer, the device might or might
> not have seen the new data in 'buff', since it cannot observe the write64
> to its control register (and from the commit log, this seems to be the
> case with peripherals private to a CPU).

Yes, CPU-private peripherals may well need additional ordering, but they
likely also roll their own I/O accessors.

> I guess the question is what it means for the device that a third
> observer saw the write64. In one interpretation of observability,
> another write64 from the third observer is ordered after the original
> write64, but to me that still doesn't clarify any ordering imposed on
> the device's access to 'buff':
>
> Initial state:
> buff=0
> ctrl=0
>
> P0:        P1:        Device:
> Wbuff=1    Wctrl=2    Ry=buff
> DMB        DMB
> Wctrl=1    Rx=buff
>
> If the final 'ctrl' register value is 2, then x==1. But I don't see how
> Wctrl=2 influences whether y is 0 or 1. If x==1 on P1, any other
> observer, including the device, should see buff==1, but this assumes
> that there is some other ordering for when Ry=buff is issued.

You need to relate the write to 'ctrl' with the device's read of 'buff'
somehow. Under which circumstances does the device read 'buff' (i.e.
what are the register fields in 'ctrl')?

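The litmus test above can be explored mechanically. This sketch is a simplification under stated assumptions: it enumerates every interleaving under sequential consistency, treats the device's read as just another ordered event, and assumes that with full DMBs on both CPUs (P1's barrier must order a store before a load, which DMB ST does not do) the Armv8 model forbids the same outcomes for this shape. The "gated" mode, in which the device reads 'ctrl' before reading 'buff', is a hypothetical register interface of the kind asked about above; `run()`, `seen` and `gated_bad` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* State of one interleaving of the litmus test (all fields start at 0). */
struct state {
    int buff, ctrl;  /* normal-memory buffer and device control register */
    int x, y;        /* P1's read of buff, device's read of buff */
    int dctrl;       /* device's read of ctrl (gated mode only) */
    int pc[3];       /* next operation of P0, P1 and the device */
};

static bool gated;          /* hypothetical: device reads ctrl, then buff */
static bool seen[3][2][2];  /* outcomes seen, indexed [final ctrl][x][y] */
static bool gated_bad;      /* gated device saw ctrl==1 but read buff==0 */

static int prog_len(int t)
{
    return (t == 2 && !gated) ? 1 : 2;
}

/* Run the next operation of thread t: 0 = P0, 1 = P1, 2 = device. */
static void step(struct state *s, int t)
{
    int op = s->pc[t]++;
    assert(op < prog_len(t));
    if (t == 0) {                   /* P0: Wbuff=1; DMB; Wctrl=1 */
        if (op == 0) s->buff = 1; else s->ctrl = 1;
    } else if (t == 1) {            /* P1: Wctrl=2; DMB; Rx=buff */
        if (op == 0) s->ctrl = 2; else s->x = s->buff;
    } else if (gated && op == 0) {  /* gated device: first read ctrl */
        s->dctrl = s->ctrl;
    } else {                        /* device: Ry=buff */
        s->y = s->buff;
    }
}

/* Depth-first enumeration of every interleaving of the three programs. */
static void explore(struct state s)
{
    bool done = true;
    for (int t = 0; t < 3; t++) {
        if (s.pc[t] < prog_len(t)) {
            struct state next = s;  /* copy, then advance thread t */
            step(&next, t);
            explore(next);
            done = false;
        }
    }
    if (!done)
        return;
    seen[s.ctrl][s.x][s.y] = true;
    if (gated && s.dctrl == 1 && s.y == 0)
        gated_bad = true;
}

/* Enumerate all outcomes with or without the ctrl-gated device read. */
static void run(bool device_reads_ctrl_first)
{
    struct state init = {0};
    gated = device_reads_ctrl_first;
    gated_bad = false;
    memset(seen, 0, sizeof seen);
    explore(init);
}
```

After run(false), no outcome has final ctrl == 2 with x == 0, yet both y == 0 and y == 1 occur alongside ctrl == 2: the device's read is unconstrained. After run(true), gated_bad stays false, because a device that only reads 'buff' after observing P0's doorbell value must see buff == 1.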
Will
More information about the linux-arm-kernel mailing list