The problem about arm64: io: Relax implicit barriers in default I/O accessors

Will Deacon will at kernel.org
Wed Jun 16 11:55:36 PDT 2021


On Wed, Jun 16, 2021 at 07:40:23PM +0100, Catalin Marinas wrote:
> On Mon, Jun 14, 2021 at 10:41:38PM +0000, Frank Li wrote:
> > commit 22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > Author: Will Deacon <will at kernel.org>
> > Date:   Fri Jun 7 15:48:58 2019 +0100
> > 
> >     arm64: io: Relax implicit barriers in default I/O accessors
> > 
> >     The arm64 implementation of the default I/O accessors requires barrier
> >     instructions to satisfy the memory ordering requirements documented in
> >     memory-barriers.txt [1], which are largely derived from the behaviour of
> >     I/O accesses on x86.
> [...]
> > 	If I add wmb() before xhci_ring_ep_doorbell(), the problem goes away.
> > 	writel() includes io_wmb, which maps to dma_wmb().
> > 	
> > 	1. write DDR
> > 	2. writel
> > 		2a. io_wmb(): dmb(oshst)
> > 		2b. write USB register
> > 	3. USB DMA reads DDR.
> > 
> > 	
> > 	The internal bus fabric only guarantees ordering for the same AXID.
> > 	The DDR write in step 1 may be slow: the USB register receives its
> > 	data before step 1 completes because the GPU is occupying DDR at
> > 	that moment. So the USB DMA starts reading from DDR, gets the old
> > 	DMA descriptor data, finds it not ready yet, and misses the
> > 	doorbell.
> 
> That's a complex topic; Will should have a better answer. I'll try a
> thought exercise below, introducing a hypothetical second CPU.

It would also be helpful to know a bit more about the hardware:

  - What is the "internal bus fabric"?
  - Can you be more specific about the AxIDs? I can't tell how that
    correlates back to code running on the CPU.
  - Is the device cache coherent?
  - What memory types are used to map the DDR and the USB register on the
    CPU? (I got lost in the indirection)

Also, do you know which part of the data appears to be stale when the device
reads it?

Will



More information about the linux-arm-kernel mailing list