Question: Enforcing delay() between two I/O writes

Will Deacon will at kernel.org
Mon Apr 17 06:28:05 PDT 2023


Hey Ash,

[For some reason, I can't find your mail in the list archive]

On Fri, Apr 14, 2023 at 12:55:59PM +0100, Ash Wilding wrote:
> From: Ash Wilding <ash at ashw.io>
> 
> Hey Will and fellow arm64 friends :-)

[Trimming context]

> Now, in 22ec71615 ("arm64: io: Relax implicit barriers in default I/O
> accessors"), we downgraded the DSB in __iormb() to a DMB. To be clear,
> I'm well aware of Arm retroactively strengthening the Armv8-A memory
> model to Other-multi-copy-atomic and (I believe) I understand the
> implications of that. However, in the specific case of wanting to
> enforce a minimum delay between the first STR and the second STR
> arriving at the endpoint, I'm struggling to convince myself that a DMB
> is sufficient.

FWIW, I think for the purposes of udelay() ordering, you can remove the
memory barrier altogether, so I don't think DSB vs DMB makes any difference
either. The udelay ordering comes entirely from the control dependency off
the read-back.

In the sequence you had (I've labelled each instruction A-F):


A       STR                   // writel_relaxed()
B       LDR                   // readl()
C       DSB                   // __iormb()
        ...                   // Dummy control dependency
D   1:  ISB                   // get_cycles()
E       MRS   cntvct_el0
        ...
E       B.NE  1b
F       STR                   // writel_relaxed()


Do you agree that:

  1. The load at B cannot complete before the store at A, due to
     same-device ordering?

  2. The ISB at D remains speculative until the load at B has completed,
     due to the dummy control dependency?

  3. The read of the counter at E cannot execute until the ISB at D has
     completed?

  4. The STR at F cannot be observed until the branch at E has
     architecturally resolved (i.e. no speculative accesses to device)?

If so, then I think you just combine all of these properties: the store at F
cannot become visible (and therefore cannot complete) until the udelay loop
has terminated. That loop cannot begin until the ISB has completed, which
cannot happen until the read-back has completed and resolved the dummy
control dependency. The read-back itself cannot complete before the initial
store as they are to the same device.
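
For reference, the sequence looks roughly like this from the driver side.
This is only a userspace sketch: the accessor names match the kernel's, but
the bodies here are stand-ins that just log the call order, and
write_delay_write(), fake_reg and the trace array are hypothetical names
made up for illustration. It shows program order only; the hardware
ordering comes from the argument above, not from anything in this code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the kernel accessors: they only record the call order. */
enum op { OP_WRITE, OP_READ, OP_DELAY };

#define TRACE_MAX 8
static enum op trace[TRACE_MAX];
static int ntrace;
static uint32_t fake_reg;	/* hypothetical device register */

static void log_op(enum op o)
{
	assert(ntrace < TRACE_MAX);
	trace[ntrace++] = o;
}

/* Mock: the real writel_relaxed() is a plain STR with no barrier. */
static void writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	log_op(OP_WRITE);
	*addr = val;
}

/* Mock: on arm64 the real readl() is followed by __iormb(). */
static uint32_t readl(const volatile uint32_t *addr)
{
	log_op(OP_READ);
	return *addr;
}

/* Mock: the real udelay() polls the counter; here it only logs. */
static void udelay(unsigned long usecs)
{
	(void)usecs;
	log_op(OP_DELAY);
}

/* Write, read back from the same device, delay, then write again. */
void write_delay_write(volatile uint32_t *reg, uint32_t first,
		       uint32_t second, unsigned long usecs)
{
	writel_relaxed(first, reg);	/* A */
	(void)readl(reg);		/* B, then C in the listing */
	udelay(usecs);			/* D, E */
	writel_relaxed(second, reg);	/* F */
}
```

Calling write_delay_write(&fake_reg, 1, 2, 10) leaves the trace as write,
read, delay, write, which is the A, B, D/E, F order in the listing.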

Does that make sense?

Will


