The problem about arm64: io: Relax implicit barriers in default I/O accessors

Will Deacon will at kernel.org
Thu Jun 17 10:41:32 PDT 2021


On Thu, Jun 17, 2021 at 06:25:28PM +0100, Will Deacon wrote:
> On Thu, Jun 17, 2021 at 10:27:44AM +0100, Catalin Marinas wrote:
> > On Wed, Jun 16, 2021 at 02:24:39PM -0500, Zhi Li wrote:
> > > On Wed, Jun 16, 2021 at 2:18 PM Frank Li <frank.li at nxp.com> wrote:
> > > > Will Deacon wrote:
> > > > > It would also be helpful to know a bit more about the hardware:
> > > > >
> > > > >   - What is the "internal bus fabric"?
> > > 
> > > > It looks like what ARM calls an "Interconnect": multiple AXI masters and
> > > > multiple AXI slaves connected together.
> > > 
> > > I drew a simplified bus structure.
> > >  
> > >         ┌──────┐ ┌────┐
> > >         │ A53  │ │A72 │
> > >         └───┬──┘ └─┬──┘
> > >             │      │
> > >         ┌───▼──────▼──┐
> > >         │    CCI400   │
> > >         └─────┬───────┘
> > >               │   1 (a)write to ddr (normal uncached memory)
> > >               │   DMB OSHST
> > >               │   2 (b)write to usb register(device, nGnRE)
> > >         ┌─────▼───────────────────────┐       ┌───────────┐
> > >         │                             ◄───────┤   GPU     │
> > >         │     Bus fabric              │       │           │
> > >         └────────────────────────────┬┘       └───────────┘
> > > 3 (b) reach usb   ▲ 4 usb read   ▲   │ 6.(a)reach
> > >          │        │   ddr        │   │
> > >       ┌──▼────────┴─┐            │   │
> > >       │             │            │   │
> > >       │  USB        │      5.usb │   │
> > >       │             │      read  │   │
> > >       └─────────────┘            │   │
> > >                                ┌─┴───▼─┐
> > >                                │       │
> > >                                │ DDR   │
> > >                                │       │
> > >                                └───────┘
> > 
> > Since you sent an HTML message, it was rejected by the list server. The
> > above is a plain-text rendition by w3m (and changed barrier() to DMB
> > OSHST).
> > 
> > Is the DMB propagated to the bus fabric? IIUC, our logic is that if the
> > write (b) to USB is observable by, let's say, the GPU, the same GPU
> > should also observe the write (a) to DDR. Since the write (a) to DDR is
> > globally observable, the USB device read at (4) should also observe it
> > (well, we may be wrong).
> 
> It's pretty rare for barriers to propagate onto the fabric -- usually the
> CPU just orders everything based on acknowledgements. If the CCI gives the
> write response for the non-cacheable write I could see that causing an issue
> if the bus fabric can then reorder accesses, but then I would argue that's a
> broken system because simple ring buffers in non-cacheable memory would fail
> for peripherals hooking into the bus fabric (i.e. dma_*mb() would be
> broken). I think it would also mean that DSB doesn't necessarily fix the
> issue, it probably just makes it less likely because it takes longer to
> get the device write out after the acknowledgement -- ndelay() would achieve
> the same effect :)
> 
> Frank -- what happens if you try either DMB SY, or DMB OSH (without the ST)
> in writel()?

Also, digging into the A72 TRM there are a bunch of configuration signals
in this area; see SYSBARDISABLE and BROADCASTOUTER, for example.

Does the failure happen on both A53 and A72, or only on one CPU type?
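
For reference, the ordering pattern under discussion can be sketched
roughly as below. This is a simplified illustration, not the kernel's
writel() implementation: sketch_io_wmb(), sketch_writel(), sketch_queue()
and struct sketch_desc are hypothetical names made up for this example,
and on non-arm64 builds a C11 release fence is assumed as a stand-in so
that the snippet still compiles.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the barrier issued before the device write.
 * On arm64 this emits DMB OSHST; elsewhere a C11 release fence is used
 * purely so the sketch builds (an assumption, not an equivalent).
 * For the experiments suggested above, the mnemonic would be swapped
 * for "dmb osh" or "dmb sy".
 */
#if defined(__aarch64__)
#define sketch_io_wmb() __asm__ volatile("dmb oshst" ::: "memory")
#else
#define sketch_io_wmb() atomic_thread_fence(memory_order_release)
#endif

/* Hypothetical descriptor living in non-cacheable DDR. */
struct sketch_desc {
	volatile uint64_t addr;
	volatile uint32_t len;
};

/* Barrier first, then the store to the device register. */
static inline void sketch_writel(uint32_t val, volatile uint32_t *reg)
{
	sketch_io_wmb();
	*reg = val;
}

/* Step (a): fill the descriptor; step (b): ring the doorbell. */
static inline void sketch_queue(struct sketch_desc *d,
				volatile uint32_t *doorbell,
				uint64_t addr, uint32_t len)
{
	d->addr = addr;			/* (a) write to DDR */
	d->len = len;
	sketch_writel(1, doorbell);	/* (b) write to device register */
}
```

The expectation being debated is that once the device observes (b), its
subsequent read of the descriptor also observes (a).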

Will


