Question: Enforcing delay() between two I/O writes
Will Deacon
will at kernel.org
Mon Apr 17 06:28:05 PDT 2023
Hey Ash,
[For some reason, I can't find your mail in the list archive]
On Fri, Apr 14, 2023 at 12:55:59PM +0100, Ash Wilding wrote:
> From: Ash Wilding <ash at ashw.io>
>
> Hey Will and fellow arm64 friends :-)
[Trimming context]
> Now, in 22ec71615 arm64: io: Relax implicit barriers in default I/O
> accessors, we downgraded the DSB in __iormb() to a DMB. To be clear,
> I'm well aware of Arm retroactively strengthening of the Armv8-A memory
> model to Other-multi-copy-atomic and (believe) I understand the
> implications of that. However, in the specific case of wanting to
> enforce a minimum delay between the first STR and the second STR
> arriving at the endpoint, I'm struggling to convince myself that a DMB
> is sufficient.
FWIW, I think for the purposes of udelay() ordering, you can remove the
memory barrier altogether, so I don't think DSB vs DMB makes any difference
either. The udelay ordering comes entirely from the control dependency off
the read-back.
In the sequence you had (I've labelled each instruction A-F):
A       STR             // writel_relaxed()
B       LDR             // readl()
C       DSB             // __iormb()
        ...             // Dummy control dependency
D  1:   ISB             // get_cycles()
E       MRS cntvct_el0
        ...
E       B.NE 1b
F       STR             // writel_relaxed()
Do you agree that:
  1. The load at B cannot complete before the store at A, due to
     same-device ordering?
  2. The ISB at D remains speculative until the load at B has completed,
     due to the dummy control dependency?
  3. The read of the counter at E cannot execute until the ISB at D has
     completed?
  4. The STR at F cannot be observed until the branch at E has
     architecturally resolved (i.e. no speculative accesses to device)?
If so, then I think you just combine all of these properties: the store at F
cannot become visible (and therefore cannot complete) until the udelay loop
has terminated. That loop cannot begin until the ISB has completed, which
cannot happen until the read-back has completed and resolved the dummy
control dependency. The read-back itself cannot complete before the initial
store as they are to the same device.
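For reference, the driver-side pattern behind A-F can be sketched in C roughly
as below. This is only a userspace mock under stated assumptions: the
writel_relaxed(), readl() and udelay() here are stand-ins that record call
order, not the kernel implementations, and write_delay_write(), record() and
trace are made-up names for illustration. The mock provides none of the
hardware ordering discussed above; it only shows the shape of the sequence.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical call-order log: 'W' = write, 'R' = read-back, 'D' = delay. */
static char trace[16];

static void record(const char *ev)
{
	strncat(trace, ev, sizeof(trace) - strlen(trace) - 1);
}

/* Stand-ins (assumptions) for the kernel accessors; they only log. */
static void writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
	record("W");
}

static uint32_t readl(const volatile uint32_t *addr)
{
	record("R");
	return *addr;
}

static void udelay(unsigned long usecs)
{
	(void)usecs;	/* the real one polls the counter (D, E) */
	record("D");
}

/* Write, read back from the same device, delay, then write again. */
static void write_delay_write(volatile uint32_t *reg, uint32_t first,
			      uint32_t second, unsigned long usecs)
{
	writel_relaxed(first, reg);	/* A */
	(void)readl(reg);		/* B, C: read-back to the same device */
	udelay(usecs);			/* D, E: ISB + counter loop */
	writel_relaxed(second, reg);	/* F */
}
```

In a real driver the read-back has to target the same device as the first
write, since that is what property 1 relies on.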
Does that make sense?
Will