The problem about arm64: io: Relax implicit barriers in default I/O accessors

Frank Li frank.li at nxp.com
Wed Jun 16 09:27:59 PDT 2021



> -----Original Message-----
> From: Frank Li
> Sent: Monday, June 14, 2021 5:42 PM
> To: Will Deacon <will at kernel.org>
> Cc: Shenwei Wang <shenwei.wang at nxp.com>; Han Xu <han.xu at nxp.com>;
> Nitin Garg <nitin.garg at nxp.com>; Jason Liu <jason.hui.liu at nxp.com>;
> linux-arm-kernel at lists.infradead.org; Zhi Li <lznuaa at gmail.com>
> Subject: The problem about arm64: io: Relax implicit barriers in default I/O
> accessors

Added Catalin. 

> 
> Will Deacon:
> 
> 	One of our test cases fails on the 8QM platform (arm64): USB transfers
> fail when run together with a GPU stress test.
> 	I found that it is related to your change below.
> 
> commit 22ec71615d824f4f11d38d0e55a88d8956b7e45f
> Author: Will Deacon <will at kernel.org>
> Date:   Fri Jun 7 15:48:58 2019 +0100
> 
>     arm64: io: Relax implicit barriers in default I/O accessors
> 
>     The arm64 implementation of the default I/O accessors requires barrier
>     instructions to satisfy the memory ordering requirements documented in
>     memory-barriers.txt [1], which are largely derived from the behaviour of
>     I/O accesses on x86.
> 
> drivers/usb/host/xhci-ring.c
> 
> static void giveback_first_trb(struct xhci_hcd *xhci, int slot_id,
>                 unsigned int ep_index, unsigned int stream_id, int start_cycle,
>                 struct xhci_generic_trb *start_trb)
> {
>         /*
>          * Pass all the TRBs to the hardware at once and make sure this write
>          * isn't reordered.
>          */
>         wmb();
>         if (start_cycle)
>                 start_trb->field[3] |= cpu_to_le32(start_cycle);
>         else
>                 start_trb->field[3] &= cpu_to_le32(~TRB_CYCLE);
>         xhci_ring_ep_doorbell(xhci, slot_id, ep_index, stream_id);
> }
> 
> 	If I add a wmb() before xhci_ring_ep_doorbell(), the problem goes away.
> writel() includes __iowmb(), which maps to dma_wmb().
> 
> 	1. CPU writes the descriptor to DDR
> 	2. writel()
> 		2a. __iowmb(), i.e. dmb(oshst)
> 		2b. write to the USB register
> 	3. USB DMA reads DDR
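
The sequence above can be sketched as a small user-space model. This is
not kernel code: the barriers are stubs that only record a trace of what
the CPU would issue in program order, and trace_op(), model_writel() and
model_giveback() are hypothetical names used for illustration. The
mappings assumed here are wmb() -> dsb(st), dma_wmb() -> dmb(oshst), and
__iowmb() -> dma_wmb() (the latter being what the commit introduced).

```c
#include <assert.h>
#include <stdint.h>

/* Operations the CPU would issue, recorded in program order. */
enum op { OP_DSB_ST, OP_DMB_OSHST, OP_DDR_WRITE, OP_MMIO_WRITE };

#define TRACE_MAX 16
static enum op trace[TRACE_MAX];
static int trace_len;

static void trace_reset(void)
{
	trace_len = 0;
}

static void trace_op(enum op o)
{
	assert(trace_len < TRACE_MAX);
	trace[trace_len++] = o;
}

/* Stubs standing in for the arm64 barrier macros (assumed mappings). */
#define model_wmb()     trace_op(OP_DSB_ST)    /* wmb()     -> dsb(st)    */
#define model_dma_wmb() trace_op(OP_DMB_OSHST) /* dma_wmb() -> dmb(oshst) */
#define model_iowmb()   model_dma_wmb()        /* __iowmb() after the commit */

static uint32_t fake_trb_field3; /* stands in for start_trb->field[3] in DDR */
static uint32_t fake_doorbell;   /* stands in for the xHCI doorbell register */

/* Model of writel(): I/O write barrier, then the register store. */
static void model_writel(uint32_t val, uint32_t *reg)
{
	model_iowmb();            /* step 2a */
	*reg = val;               /* step 2b */
	trace_op(OP_MMIO_WRITE);
}

/* Model of giveback_first_trb(); extra_wmb selects the workaround. */
static void model_giveback(int extra_wmb)
{
	model_wmb();              /* existing wmb() at the top of the function */
	fake_trb_field3 |= 1;     /* step 1: set the cycle bit in DDR */
	trace_op(OP_DDR_WRITE);
	if (extra_wmb)
		model_wmb();      /* workaround: wmb() before the doorbell */
	model_writel(1, &fake_doorbell);
}
```

Calling model_giveback(0) records dsb(st), DDR write, dmb(oshst), MMIO
write; model_giveback(1) adds a dsb(st) between the DDR write and the
dmb(oshst), which is the only difference the workaround makes.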
> 
> 
> 	The internal bus fabric only guarantees ordering for transactions with
> the same AXI ID.  The DDR write in step 1 may be slow, so the USB register
> write in step 2b can arrive before step 1 completes because the GPU is
> keeping DDR busy.  The USB DMA then starts reading from DDR, gets the old
> DMA descriptor data, finds it not ready yet, and the doorbell is missed.
> 
> 	If I ring the doorbell 2-3 times, the problem also goes away.
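
The two observations above (a stale descriptor under GPU load, and
repeated doorbells hiding it) can be illustrated with a toy timing model.
It models only the hypothesis described here, not the architecture or the
real fabric: struct fabric, its latency fields and both helper functions
are hypothetical, and the tick values are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical latencies, in arbitrary ticks. */
struct fabric {
	unsigned ddr_write_latency;  /* until a CPU store is visible in DDR */
	unsigned mmio_write_latency; /* until a register write reaches USB */
};

/*
 * Returns true if the descriptor is already in DDR when the doorbell
 * reaches the device. wait_for_completion models a barrier that stalls
 * the CPU until the DDR write has completed (the extra wmb()); without
 * it, the two writes are assumed to travel independently.
 */
static bool device_sees_new_descriptor(const struct fabric *f,
				       bool wait_for_completion)
{
	assert(f);
	unsigned ddr_visible_at = f->ddr_write_latency;
	unsigned mmio_issued_at = wait_for_completion ? ddr_visible_at : 0;
	unsigned doorbell_at = mmio_issued_at + f->mmio_write_latency;

	return doorbell_at >= ddr_visible_at;
}

/*
 * Ring the doorbell `rings` times, `gap` ticks apart, with no completion
 * barrier. Returns true if at least one ring arrives after the
 * descriptor is visible in DDR.
 */
static bool any_ring_sees_new_descriptor(const struct fabric *f,
					 unsigned rings, unsigned gap)
{
	assert(f);
	for (unsigned i = 0; i < rings; i++) {
		unsigned doorbell_at = i * gap + f->mmio_write_latency;

		if (doorbell_at >= f->ddr_write_latency)
			return true;
	}
	return false;
}
```

With a slow DDR write (for example latency 20 against an MMIO latency of
5), a single doorbell arrives too early in this model, while either
waiting for completion or ringing a few more times lets the device see
the new descriptor.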
> 
> 	So I think dmb(oshst) is not enough for writel().  memory-barriers.txt
> says:
> 
>        A writeX() by the CPU to the peripheral will first wait for the
>         completion of all prior CPU writes to memory. For example, this ensures
>         that writes by the CPU to an outbound DMA buffer allocated by
>         dma_alloc_coherent() will be visible to a DMA engine when the CPU writes
>         to its MMIO control register to trigger the transfer.
> 
> 
> Best regards
> Frank Li

