[EXT] Re: The problem about arm64: io: Relax implicit barriers in default I/O accessors
Will Deacon
will at kernel.org
Thu Jun 17 14:40:12 PDT 2021
On Thu, Jun 17, 2021 at 08:11:50PM +0000, Frank Li wrote:
>
>
> > -----Original Message-----
> > From: Will Deacon <will at kernel.org>
> > Sent: Thursday, June 17, 2021 12:42 PM
> > To: Catalin Marinas <catalin.marinas at arm.com>
> > Cc: Zhi Li <lznuaa at gmail.com>; Frank Li <frank.li at nxp.com>; Shenwei Wang
> > <shenwei.wang at nxp.com>; Han Xu <han.xu at nxp.com>; Nitin Garg
> > <nitin.garg at nxp.com>; Jason Liu <jason.hui.liu at nxp.com>; linux-arm-
> > kernel at lists.infradead.org
> > Subject: [EXT] Re: The problem about arm64: io: Relax implicit barriers in
> > default I/O accessors
> >
> > Caution: EXT Email
> >
> > On Thu, Jun 17, 2021 at 06:25:28PM +0100, Will Deacon wrote:
> > > On Thu, Jun 17, 2021 at 10:27:44AM +0100, Catalin Marinas wrote:
> > > > On Wed, Jun 16, 2021 at 02:24:39PM -0500, Zhi Li wrote:
> > > > > On Wed, Jun 16, 2021 at 2:18 PM Frank Li <frank.li at nxp.com> wrote:
> > > > > > Will Deacon wrote:
> > > > > > > It would also be helpful to know a bit more about the hardware:
> > > > > > >
> > > > > > > - What is the "internal bus fabric"?
> > > > >
> > > > > > > Look like ARM call as "Interconnect", Multi AXI master and multi AXI slave
> > > > > > > connected together.
> > > > >
> > > > > I drawed simplified bus structure.
> > > > >
> > > > > ┌──────┐ ┌────┐
> > > > > │ A53 │ │A72 │
> > > > > └───┬──┘ └─┬──┘
> > > > > │ │
> > > > > ┌───▼──────▼──┐
> > > > > │ CCI400 │
> > > > > └─────┬───────┘
> > > > > │ 1 (a)write to ddr (normal uncached memory)
> > > > > │ DMB OSHST
> > > > > │ 2 (b)write to usb register(device, nGnRE)
> > > > > ┌─────▼───────────────────────┐ ┌
> > ───────────┐
> > > > > │ ◄───────┤ GPU │
> > > > > │ Bus fabric │ │ │
> > > > > └────────────────────────────┬┘ └
> > ───────────┘
> > > > > 3 (b) reach usb ▲ 4 usb read ▲ │ 6.(a)reach
> > > > > │ │ ddr │ │
> > > > > ┌──▼────────┴─┐ │ │
> > > > > │ │ │ │
> > > > > │ USB │ 5.usb │ │
> > > > > │ │ read │ │
> > > > > └─────────────┘ │ │
> > > > > ┌─┴───▼─┐
> > > > > │ │
> > > > > │ DDR │
> > > > > │ │
> > > > > └───────┘
> > > >
> > > > Since you sent an HTML message, it was rejected by the list server. The
> > > > above is a plain-text rendition by w3m (and changed barrier() to DMB
> > > > OSHST).
> > > >
> > > > Is the DMB propagated to the bus fabric? IIUC, our logic is that if the
> > > > write (b) to USB is observable by, let's say, the GPU, the same GPU
> > > > should also observe the write (a) to DDR. Since the write (a) to DDR is
> > > > globally observable, the USB device read at (4) should also observe it
> > > > (well, we may be wrong).
> > >
> > > It's pretty rare for barriers to propagate onto the fabric -- usually the
> > > CPU just orders everything based on acknowledgements. If the CCI gives the
> > > write response for the non-cacheable write I could see that causing an issue
> > > if the bus fabric can then reorder accesses, but then I would argue that's a
> > > broken system because simple ring buffers in non-cacheable memory would fail
>
> Bus fabric don't reorder the same axi master.
> https://elinux.org/images/7/73/Deacon-weak-to-weedy.pdf
> Page 42 show race condition. I think above race condition happen at our system.
> I am not sure if it is exist at Armv8 system.
Just a word of warning here, but the Armv8 memory model was
*retrospectively* strengthened since I gave that talk, so the stuff in that
pdf is out of date (and wrong).
> > > for peripherals hooking into the bus fabric (i.e. dma_*mb() would be
> > > broken). I think it would also mean that DSB doesn't necessarily fix the
> > > issue, it probably just makes it less likely because it takes longer to
> > > get the device write out after the acknowledgement -- ndelay() would achieve
> > > the same effect :)
>
> That's what I worried.
>
> > >
> > > Frank -- what happens if you try either DMB SY, or DMB OSH (without the ST)
> > > in writel()?
>
> It works well for 2 hours! Normally, problem happen below 10min. So I think DMB SY
> can fix it.
Oh, interesting. Maybe this is a case where OSH vs SY actually makes a
difference. I'm not quite sure what it means for the coherency of normal,
non-cacheable accesses (which are outer-shareable) so that probably needs a
bit more thought.
Can you confirm that the issue *does* still occur if you use dmb(osh)
instead of dmb(oshst), please?
Will
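
For readers following the thread, the three writel() variants being compared
can be sketched roughly as below. This is an illustration only, not the
kernel's code: the sketch_* names are hypothetical, and it assumes that
writel() on arm64 amounts to a barrier followed by a relaxed 32-bit store,
with dmb(oshst) as the barrier currently in use (as the request above
implies). On non-arm64 builds a full fence stands in for the DMB so the
sketch still compiles, but that fallback says nothing about shareability
domains.

```c
#include <stdint.h>

/*
 * Barrier under test. On arm64 this emits the DMB variant named by 'opt'
 * (oshst, osh or sy). Elsewhere a sequentially consistent fence is used as
 * a stand-in (assumption: only there so the sketch builds and runs).
 */
#if defined(__aarch64__)
#define sketch_dmb(opt) __asm__ volatile("dmb " #opt ::: "memory")
#else
#define sketch_dmb(opt) __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Hypothetical stand-in for a relaxed 32-bit MMIO store. */
static inline void sketch_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/* Variant 1: outer-shareable, stores only (the behaviour being questioned). */
static inline void sketch_writel_oshst(uint32_t val, volatile uint32_t *addr)
{
	sketch_dmb(oshst);
	sketch_writel_relaxed(val, addr);
}

/* Variant 2: outer-shareable, loads and stores (the experiment requested). */
static inline void sketch_writel_osh(uint32_t val, volatile uint32_t *addr)
{
	sketch_dmb(osh);
	sketch_writel_relaxed(val, addr);
}

/* Variant 3: full system, loads and stores (reported to run for 2 hours). */
static inline void sketch_writel_sy(uint32_t val, volatile uint32_t *addr)
{
	sketch_dmb(sy);
	sketch_writel_relaxed(val, addr);
}
```

In the scenario from the diagram, write (a) to the DDR buffer is issued
first and the doorbell write (b) to the USB register goes through one of the
variants above. The open question is which barrier is strong enough that the
USB controller's read at step 4 is guaranteed to see (a).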