Should we use "dsb" or "dmb" between write to buffer and write to register

Mark Zhang markzhang at nvidia.com
Mon Sep 12 01:43:14 PDT 2022


On 9/8/2022 9:50 PM, Will Deacon wrote:
> On Wed, Sep 07, 2022 at 06:53:43PM +0100, Catalin Marinas wrote:
>> On Mon, Aug 22, 2022 at 03:53:42PM +0800, Mark Zhang wrote:
>>> May I ask when to use dsb or dmb in our device driver? Thanks.
>>>
>>> For example, when sending a command to FW/HW, we usually do it in 3 steps:
>>>    1. memcpy(buff, src, size);
>>>    2. wmb();
>>>    3. write64(ctrl, reg_addr);
> 
> I'm assuming that write64 is just a plain 64-bit store to a device mapping
> and doesn't imply any further ordering.
> 
>>> IIUC, in the kernel wmb() is implemented with "dsb st". When we change this to
>>> "dmb st" we get better performance, but we are not sure if it's safe. I
>>> have read Will's post[1] but am still not sure.
>>>
>>> So our questions are:
>>> 1. can we use "dmb" here?
>>> 2. If we can then should we use "dmb st", or "dmb oshst"?
>>>
>>> Thank you very much.
>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
>>
>> Will convinced me at the time that it's sufficient, though every time I
>> revisit this I get confused ;). Not sure whether we have updated the
>> memory model since to cover such scenarios. In practice at least from
>> what I recall that should be safe.
> 
> The Armv8 memory model is "other-multi-copy-atomic" which means that a
> store is either visible _only_ to the observer from which it originates
> or it is visible to all observers. It cannot exist in some intermediate
> state.
> 
> With that, the insight is that a write to the MMIO interface of a shared
> peripheral must be observed by all observers when it reaches the endpoint.
> Consequently, we only need to ensure that the stores from your memcpy()
> in the motivating example are observed before the MMIO write is observed
> and a DMB ST is sufficient for that. We use OSHST in Linux in case the
> memory buffer is mapped as non-cacheable but I'm doubtful whether it makes
> a real practical difference.
> 
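
To make the sequence under discussion concrete, here is a minimal
user-space sketch of the three steps with the barrier made explicit. It
is only an illustration under stated assumptions: struct fake_dev,
send_command() and the barrier_*() macros are hypothetical names, the
"register" is ordinary memory rather than a device mapping, and the
non-Arm fallback fences exist only so the code builds elsewhere (they
are not equivalent to the Arm barriers). For reference, on arm64 Linux
wmb() is "dsb st" and dma_wmb() is "dmb oshst".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
/* The two candidate barriers discussed in this thread. */
#define barrier_dsb_st()    __asm__ volatile("dsb st" ::: "memory")
#define barrier_dmb_oshst() __asm__ volatile("dmb oshst" ::: "memory")
#else
#include <stdatomic.h>
/* Build-only fallback for other architectures; NOT equivalent to dsb/dmb. */
#define barrier_dsb_st()    atomic_thread_fence(memory_order_seq_cst)
#define barrier_dmb_oshst() atomic_thread_fence(memory_order_release)
#endif

/* Hypothetical stand-in for a device: a command buffer plus a doorbell. */
struct fake_dev {
    uint8_t buff[64];            /* memory the device would read */
    volatile uint64_t ctrl_reg;  /* stands in for the MMIO control register */
};

void send_command(struct fake_dev *dev, const void *src, size_t size,
                  uint64_t ctrl)
{
    assert(size <= sizeof(dev->buff));

    memcpy(dev->buff, src, size);  /* 1. fill the command buffer */
    barrier_dmb_oshst();           /* 2. order the buffer stores before (3) */
    dev->ctrl_reg = ctrl;          /* 3. plain 64-bit store to the register */
}
```

Swapping barrier_dmb_oshst() for barrier_dsb_st() in step 2 gives the
stronger variant that the question is about; the sketch does not show
which one a given device needs.
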

Thank you very much Catalin and Will.

However my colleague Suresh still has some concerns:
"I believe the effect of the write64() here is to trigger a side effect 
in the device (that it is not a true write to memory although it is a 
memory access and so the NIC is not actually reading this memory 
address). If that is the case, a dsb is likely needed to guarantee that 
the effects of the memcpy are also observed by the NIC. You can check 
out some examples in Appendix K11 (Barrier Litmus Tests) of the Arm ARM 
– for instance K11.4 and K11.5.4, where a dsb is used for these kinds of 
scenarios.
... There is a subtle difference between observing the execution of an 
instruction and observing the completion of an instruction"

What do you think?
Thanks.

>> IIRC, the logic is that if an observer in the same shareability domain
>> is seeing the write64 (3), it should have observed the memcpy (1) as
>> well given the DMB. The device in question is one of the observers
>> observing the memcpy to 'buff' (but it doesn't 'observe' the write64
>> itself). In a multi-copy atomic world, if a third observer is seeing the
>> write64 and therefore the memcpy, it means that the device should have
>> observed the memcpy as well (the multi-copy atomicity requirement).
>>
>> That's where it looks a bit like Schrodinger's cat to me (the state of
>> the cat being whether the device observed the memcpy or not). You can't
>> be sure until you have a third observer seeing the write64 to device. In
>> the absence of such hypothetical observer, the device might or might not
>> have seen the new data in 'buff' since it cannot observe the write64 to
>> its control register (and from the commit log, this seems to be the case
>> with peripherals private to a CPU).
> 
> Yes, CPU-private peripherals may well need additional ordering, but they
> likely also roll their own I/O accessors.
> 
>> I guess the question is what does it mean for the device that a third
>> observer saw the write64. In one interpretation of observability,
>> another write64 from the third observer is ordered after the original
>> write64 but to me it still doesn't help clarify any order imposed on the
>> device access to 'buff':
>>
>> Initial state:
>>    buff=0
>>    ctrl=0
>>
>> P0:           P1:             Device:
>>    Wbuff=1       Wctrl=2         Ry=buff
>>    DMB           DMB
>>    Wctrl=1       Rx=buff
>>
>> If the final 'ctrl' register value is 2 then x==1. But I don't see how
>> y==0 or 1 is influenced by Wctrl=2. If x==1 on P1, any other observer,
>> including the device, should see the buff value of 1 but this assumes
>> that there is some other ordering for when Ry=buff is issued.
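
The litmus test above can be explored mechanically. The sketch below is
a simplification, not a model of the Arm architecture: it enumerates
only the sequentially consistent interleavings of the five accesses,
with each DMB represented by program order within P0 and P1. Any outcome
it finds is also allowed under the weaker Arm model, so it is enough to
show that y is unconstrained; the x==1 result is only confirmed for
these interleavings. All names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Shared locations plus each observer's result and program counter. */
struct state {
    int buff, ctrl;  /* memory buffer and device control register */
    int x, y;        /* values read by P1 and by the device */
    int pc[3];       /* index of the next access for P0, P1, Device */
};

struct summary {
    int total;       /* complete interleavings explored */
    int ctrl2;       /* interleavings ending with ctrl == 2 */
    int ctrl2_x0;    /* of those, how many have x == 0 */
    bool ctrl2_y0;   /* ctrl == 2 seen together with y == 0 */
    bool ctrl2_y1;   /* ctrl == 2 seen together with y == 1 */
};

static const int num_ops[3] = { 2, 2, 1 };

/* Execute the next access of one observer, in program order. */
static void step(struct state *s, int who)
{
    int op = s->pc[who]++;

    assert(op < num_ops[who]);
    if (who == 0) {         /* P0: Wbuff=1; DMB; Wctrl=1 */
        if (op == 0) s->buff = 1; else s->ctrl = 1;
    } else if (who == 1) {  /* P1: Wctrl=2; DMB; Rx=buff */
        if (op == 0) s->ctrl = 2; else s->x = s->buff;
    } else {                /* Device: Ry=buff */
        s->y = s->buff;
    }
}

/* Depth-first walk over every interleaving of the remaining accesses. */
static void explore(struct state s, struct summary *out)
{
    bool finished = true;

    for (int who = 0; who < 3; who++) {
        if (s.pc[who] < num_ops[who]) {
            struct state next = s;

            finished = false;
            step(&next, who);
            explore(next, out);
        }
    }
    if (!finished)
        return;

    out->total++;
    if (s.ctrl == 2) {
        out->ctrl2++;
        if (s.x == 0) out->ctrl2_x0++;
        if (s.y == 0) out->ctrl2_y0 = true;
        if (s.y == 1) out->ctrl2_y1 = true;
    }
}

struct summary run_litmus(void)
{
    struct state init = { 0 };
    struct summary out = { 0 };

    explore(init, &out);
    return out;
}
```

Within this simplified model, every interleaving that ends with ctrl==2
has x==1, while y takes both 0 and 1 depending on where the device's
read falls, which is the gap described above.
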
> 
> You need to relate the write to 'ctrl' with the device's read of 'buff'
> somehow. Under which circumstances does the device read 'buff' (i.e.
> what are the register fields in 'ctrl')?
> 
> Will



