The problem about arm64: io: Relax implicit barriers in default I/O accessors

Will Deacon will at kernel.org
Thu Jun 17 10:25:28 PDT 2021


On Thu, Jun 17, 2021 at 10:27:44AM +0100, Catalin Marinas wrote:
> On Wed, Jun 16, 2021 at 02:24:39PM -0500, Zhi Li wrote:
> > On Wed, Jun 16, 2021 at 2:18 PM Frank Li <frank.li at nxp.com> wrote:
> > > Will Deacon wrote:
> > > > It would also be helpful to know a bit more about the hardware:
> > > >
> > > >   - What is the "internal bus fabric"?
> > 
> > > It looks like what ARM calls an "interconnect": multiple AXI masters and
> > > multiple AXI slaves connected together.
> > 
> > I drew a simplified bus structure.
> >  
> >         ┌──────┐ ┌────┐
> >         │ A53  │ │A72 │
> >         └───┬──┘ └─┬──┘
> >             │      │
> >         ┌───▼──────▼──┐
> >         │    CCI400   │
> >         └─────┬───────┘
> >               │   1 (a)write to ddr (normal uncached memory)
> >               │   DMB OSHST
> >               │   2 (b)write to usb register(device, nGnRE)
> >         ┌─────▼───────────────────────┐       ┌───────────┐
> >         │                             ◄───────┤   GPU     │
> >         │     Bus fabric              │       │           │
> >         └────────────────────────────┬┘       └───────────┘
> > 3 (b) reach usb   ▲ 4 usb read   ▲   │ 6.(a)reach
> >          │        │   ddr        │   │
> >       ┌──▼────────┴─┐            │   │
> >       │             │            │   │
> >       │  USB        │      5.usb │   │
> >       │             │      read  │   │
> >       └─────────────┘            │   │
> >                                ┌─┴───▼─┐
> >                                │       │
> >                                │ DDR   │
> >                                │       │
> >                                └───────┘
> 
> Since you sent an HTML message, it was rejected by the list server. The
> above is a plain-text rendition by w3m (and changed barrier() to DMB
> OSHST).
> 
> Is the DMB propagated to the bus fabric? IIUC, our logic is that if the
> write (b) to USB is observable by, let's say, the GPU, the same GPU
> should also observe the write (a) to DDR. Since the write (a) to DDR is
> globally observable, the USB device read at (4) should also observe it
> (well, we may be wrong).

It's pretty rare for barriers to propagate onto the fabric -- usually the
CPU just orders everything based on acknowledgements. If the CCI gives the
write response for the non-cacheable write I could see that causing an issue
if the bus fabric can then reorder accesses, but then I would argue that's a
broken system because simple ring buffers in non-cacheable memory would fail
for peripherals hooking into the bus fabric (i.e. dma_*mb() would be
broken). I think it would also mean that DSB doesn't necessarily fix the
issue, it probably just makes it less likely because it takes longer to
get the device write out after the acknowledgement -- ndelay() would achieve
the same effect :)
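
For reference, the ring-buffer pattern mentioned above can be sketched
roughly as follows. This is a portable user-space approximation, not kernel
code: the descriptor layout, publish_desc() and sketch_dma_wmb() are
hypothetical names, and a C11 release fence stands in for dma_wmb() (which
is DMB OSHST on arm64). The point is only that the payload stores must be
observable by the device before the ownership flag is.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor living in non-cacheable (coherent) memory. */
struct desc {
	uint64_t addr;                     /* buffer address for the device */
	uint32_t len;                      /* buffer length in bytes */
	_Atomic uint32_t owned_by_device;  /* 1 once the device may consume it */
};

/* Stand-in for dma_wmb(); assumption: a release fence models the ordering. */
static inline void sketch_dma_wmb(void)
{
	atomic_thread_fence(memory_order_release);
}

/* Fill in the descriptor, then hand it over to the device. */
static void publish_desc(struct desc *d, uint64_t addr, uint32_t len)
{
	d->addr = addr;
	d->len = len;
	/* Payload must be visible before the ownership flag flips. */
	sketch_dma_wmb();
	atomic_store_explicit(&d->owned_by_device, 1, memory_order_relaxed);
}
```

If the fabric could reorder the two groups of stores after the write
response has been given, a device polling owned_by_device could read a
stale addr/len, which is the failure mode described above.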

Frank -- what happens if you try either DMB SY, or DMB OSH (without the ST)
in writel()?
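
A rough sketch of that experiment, assuming a user-space test harness
rather than the real accessor: sketch_writel(), sketch_iowmb() and the enum
are hypothetical names, and on anything other than arm64 the barrier falls
back to a full compiler-provided fence so the code still builds. The kernel
writel() is structured similarly (barrier first, then the relaxed store),
but this is not its actual implementation.

```c
#include <stdint.h>

#if defined(__aarch64__)
/* Emit the requested arm64 barrier instruction. */
#define SKETCH_BARRIER(insn) __asm__ volatile(insn ::: "memory")
#else
/* Assumption: a full fence is an acceptable stand-in on other targets. */
#define SKETCH_BARRIER(insn) __sync_synchronize()
#endif

enum sketch_iowmb_kind {
	IOWMB_DMB_OSHST,  /* the relaxed default under discussion */
	IOWMB_DMB_OSH,    /* orders loads as well as stores */
	IOWMB_DMB_SY,     /* full-system domain */
	IOWMB_DSB_ST,     /* the stronger barrier used before the relaxation */
};

static inline void sketch_iowmb(enum sketch_iowmb_kind kind)
{
	switch (kind) {
	case IOWMB_DMB_OSHST: SKETCH_BARRIER("dmb oshst"); break;
	case IOWMB_DMB_OSH:   SKETCH_BARRIER("dmb osh");   break;
	case IOWMB_DMB_SY:    SKETCH_BARRIER("dmb sy");    break;
	case IOWMB_DSB_ST:    SKETCH_BARRIER("dsb st");    break;
	}
}

/* Barrier first, then the store to the (device) register. */
static inline void sketch_writel(uint32_t val, volatile uint32_t *addr,
				 enum sketch_iowmb_kind kind)
{
	sketch_iowmb(kind);
	*addr = val;
}
```

Swapping the kind argument at the call site is enough to compare the
variants without touching the rest of the driver.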

Will


