The problem about arm64: io: Relax implicit barriers in default I/O accessors
Frank Li
frank.li at nxp.com
Wed Jun 16 09:27:59 PDT 2021
> -----Original Message-----
> From: Frank Li
> Sent: Monday, June 14, 2021 5:42 PM
> To: Will Deacon <will at kernel.org>
> Cc: Shenwei Wang <shenwei.wang at nxp.com>; Han Xu <han.xu at nxp.com>;
> Nitin Garg <nitin.garg at nxp.com>; Jason Liu <jason.hui.liu at nxp.com>;
> linux-arm-kernel at lists.infradead.org; Zhi Li <lznuaa at gmail.com>
> Subject: The problem about arm64: io: Relax implicit barriers in default I/O
> accessors
Added Catalin.
>
> Will Deacon:
>
> One of our test cases fails on the 8QM platform (arm64): USB transfers
> fail when run together with a GPU stress test.
> I found that it is related to your change below.
>
> commit 22ec71615d824f4f11d38d0e55a88d8956b7e45f
> Author: Will Deacon <will at kernel.org>
> Date: Fri Jun 7 15:48:58 2019 +0100
>
> arm64: io: Relax implicit barriers in default I/O accessors
>
> The arm64 implementation of the default I/O accessors requires barrier
> instructions to satisfy the memory ordering requirements documented in
> memory-barriers.txt [1], which are largely derived from the behaviour of
> I/O accesses on x86.
>
> drivers/usb/host/xhci-ring.c
>
> static void giveback_first_trb(struct xhci_hcd *xhci, int slot_id,
> 		unsigned int ep_index, unsigned int stream_id, int start_cycle,
> 		struct xhci_generic_trb *start_trb)
> {
> 	/*
> 	 * Pass all the TRBs to the hardware at once and make sure this write
> 	 * isn't reordered.
> 	 */
> 	wmb();
> 	if (start_cycle)
> 		start_trb->field[3] |= cpu_to_le32(start_cycle);
> 	else
> 		start_trb->field[3] &= cpu_to_le32(~TRB_CYCLE);
> 	xhci_ring_ep_doorbell(xhci, slot_id, ep_index, stream_id);
> }
>
> If I add a wmb() before xhci_ring_ep_doorbell(), the problem goes away.
> writel() includes io_wmb, which maps to dma_wmb().
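A rough user-space model of that workaround, in case it helps to see where the extra barrier sits. It only records the program order of barriers and stores; it says nothing about what the interconnect does with them. On arm64, wmb() is dsb(st) and dma_wmb() is dmb(oshst); everything else here (the model_* names, the event log, the doorbell value 1) is made up for illustration, and cpu_to_le32() is left out on the assumption of a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>

/* Events recorded by the model, in program order. */
enum event { EV_DSB_ST, EV_DMB_OSHST, EV_TRB_WRITE, EV_MMIO_STORE };

#define MAX_EVENTS 8
static enum event event_log[MAX_EVENTS];
static size_t event_count;

static void record(enum event e)
{
	if (event_count < MAX_EVENTS)
		event_log[event_count++] = e;
}

/* Stand-ins for the arm64 barriers: wmb() is dsb(st), dma_wmb() is dmb(oshst). */
static void model_wmb(void)     { record(EV_DSB_ST); }
static void model_dma_wmb(void) { record(EV_DMB_OSHST); }

/* Stand-in for writel(): an implicit dma_wmb(), then the register store. */
static void model_writel(uint32_t val, volatile uint32_t *reg)
{
	model_dma_wmb();
	*reg = val;
	record(EV_MMIO_STORE);
}

#define TRB_CYCLE (1u << 0)

struct model_trb {
	uint32_t field[4];
};

/* giveback_first_trb() with the extra barrier from the workaround. */
static void model_giveback_first_trb(struct model_trb *start_trb,
				     int start_cycle,
				     volatile uint32_t *doorbell)
{
	model_wmb();			/* barrier already in the driver */
	if (start_cycle)
		start_trb->field[3] |= (uint32_t)start_cycle;
	else
		start_trb->field[3] &= ~TRB_CYCLE;
	record(EV_TRB_WRITE);

	model_wmb();			/* added: dsb(st) before the doorbell */
	model_writel(1, doorbell);	/* stands in for xhci_ring_ep_doorbell() */
}
```

With the workaround the recorded order is dsb(st), TRB write, dsb(st), dmb(oshst), register store. Without it, only the dmb(oshst) inside writel() separates the TRB write from the doorbell.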
>
> 1. write ddr
> 2. writel
> 2a. io_wmb(), dmb(oshst)
> 2b. write usb register
> 3. usb dma read ddr.
>
>
> The internal bus fabric only guarantees ordering for the same AXID. The
> DDR write in step 1 may be slow because the GPU is occupying DDR, so the
> USB register write can land before step 1 completes. The USB DMA then
> reads DDR, gets the old DMA descriptor data, finds it not ready yet, and
> the doorbell is missed.
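The race in steps 1 to 3 can be put into a toy timing model. This is a sketch of the behaviour described above, not of what the Arm architecture guarantees: the function name, the latency parameters and the time units are all hypothetical, and "wait_for_completion" simply stands for a barrier that stalls the CPU until the DDR write has landed (what the extra wmb() appears to achieve here).

```c
#include <stdbool.h>

/*
 * Toy model: the descriptor write and the doorbell write travel on
 * independent paths with their own latencies (arbitrary time units).
 * Returns true if the device's DMA read sees the new descriptor.
 */
static bool device_sees_new_descriptor(unsigned int ddr_latency,
				       unsigned int mmio_latency,
				       bool wait_for_completion)
{
	/* Step 1: descriptor write issued at t = 0, lands in DDR later. */
	unsigned int ddr_arrival = ddr_latency;

	/*
	 * Step 2: the doorbell write is issued right away unless the CPU
	 * first waits for step 1 to complete.
	 */
	unsigned int doorbell_issue = wait_for_completion ? ddr_arrival : 0;
	unsigned int doorbell_arrival = doorbell_issue + mmio_latency;

	/*
	 * Step 3: the device reads DDR when the doorbell arrives; the
	 * descriptor is new only if the DDR write has already landed.
	 */
	return doorbell_arrival >= ddr_arrival;
}
```

With a slow DDR path (say latency 100 against 10 for the register write) the model reports a stale descriptor unless the CPU waits for completion; with a fast DDR path the ordering problem never shows up, which would match the failure appearing only under GPU load.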
>
> If I ring the doorbell 2-3 times, the problem also goes away.
>
> So I think dmb(oshst) is not enough for writel().
>
> A writeX() by the CPU to the peripheral will first wait for the
> completion of all prior CPU writes to memory. For example, this ensures
> that writes by the CPU to an outbound DMA buffer allocated by
> dma_alloc_coherent() will be visible to a DMA engine when the CPU writes
> to its MMIO control register to trigger the transfer.
>
>
> Best regards
> Frank Li