The problem about arm64: io: Relax implicit barriers in default I/O accessors
Frank Li
frank.li at nxp.com
Wed Jun 16 09:29:41 PDT 2021
>
> > -----Original Message-----
> > From: Frank Li
> > Sent: Monday, June 14, 2021 5:42 PM
> > To: Will Deacon <will at kernel.org>
> > Cc: Shenwei Wang <shenwei.wang at nxp.com>; Han Xu <han.xu at nxp.com>;
> > Nitin Garg <nitin.garg at nxp.com>; Jason Liu <jason.hui.liu at nxp.com>;
> > linux-arm-kernel at lists.infradead.org; Zhi Li <lznuaa at gmail.com>
> > Subject: The problem about arm64: io: Relax implicit barriers in default I/O accessors
>
> Added Catalin.
[Frank Li] Sorry, corrected Catalin's address.
>
> >
> > Will Deacon:
> >
> > One of our test cases fails on the 8QM platform (arm64): USB transfers
> > fail when run together with a GPU stress test.
> > I traced it to the following change of yours:
> >
> > commit 22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > Author: Will Deacon <will at kernel.org>
> > Date: Fri Jun 7 15:48:58 2019 +0100
> >
> > arm64: io: Relax implicit barriers in default I/O accessors
> >
> > The arm64 implementation of the default I/O accessors requires barrier
> > instructions to satisfy the memory ordering requirements documented in
> > memory-barriers.txt [1], which are largely derived from the behaviour of
> > I/O accesses on x86.
> >
> > drivers/usb/host/xhci-ring.c
> >
> > static void giveback_first_trb(struct xhci_hcd *xhci, int slot_id,
> > 		unsigned int ep_index, unsigned int stream_id, int start_cycle,
> > 		struct xhci_generic_trb *start_trb)
> > {
> > 	/*
> > 	 * Pass all the TRBs to the hardware at once and make sure this write
> > 	 * isn't reordered.
> > 	 */
> > 	wmb();
> > 	if (start_cycle)
> > 		start_trb->field[3] |= cpu_to_le32(start_cycle);
> > 	else
> > 		start_trb->field[3] &= cpu_to_le32(~TRB_CYCLE);
> > 	xhci_ring_ep_doorbell(xhci, slot_id, ep_index, stream_id);
> > }
> >
> > If I add a wmb() before xhci_ring_ep_doorbell(), the problem goes away.
> > writel() includes __iowmb(), which maps to dma_wmb(), i.e. dmb(oshst).
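A minimal user-space sketch of where the extra barrier goes, mirroring the change described above. Everything around the function body is a hypothetical stand-in: wmb() here only records that a barrier was placed (in the arm64 kernel, wmb() is dsb(st) while dma_wmb() is dmb(oshst)), cpu_to_le32() assumes a little-endian host, and xhci_ring_ep_doorbell_stub() and giveback_first_trb_sketch() are illustrative names, not kernel functions. It shows the order of the calls only and cannot reproduce the hardware race.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-ins, for illustration only. */
#define TRB_CYCLE (1u << 0)
#define cpu_to_le32(x) ((uint32_t)(x))   /* assumes a little-endian host */

struct xhci_generic_trb { uint32_t field[4]; };

#define MAX_EVENTS 8
static const char *event_log[MAX_EVENTS];
static int n_events;

static void log_event(const char *e)
{
	if (n_events < MAX_EVENTS)
		event_log[n_events++] = e;
}

/* Stand-in barrier: records its position instead of emitting dsb(st). */
#define wmb() log_event("wmb")

/* Stand-in for the MMIO doorbell write done by xhci_ring_ep_doorbell(). */
static void xhci_ring_ep_doorbell_stub(void)
{
	log_event("doorbell");
}

/* Sketch of giveback_first_trb() with the extra wmb() from the report. */
static void giveback_first_trb_sketch(int start_cycle,
				      struct xhci_generic_trb *start_trb)
{
	wmb();				/* existing barrier */
	if (start_cycle)
		start_trb->field[3] |= cpu_to_le32(start_cycle);
	else
		start_trb->field[3] &= cpu_to_le32(~TRB_CYCLE);
	log_event("cycle bit written");
	wmb();				/* added: order the TRB update before the doorbell */
	xhci_ring_ep_doorbell_stub();
}
```

Calling giveback_first_trb_sketch(1, &trb) sets the cycle bit and logs the sequence wmb, cycle bit written, wmb, doorbell.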
> >
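> > 1. The CPU writes the TRB (DMA descriptor) to DDR.
> > 2. The CPU calls writel():
> >    2a. __iowmb(), i.e. dmb(oshst)
> >    2b. write to the USB doorbell register
> > 3. The USB DMA reads the descriptor from DDR.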
> >
> >
> > The internal bus fabric only guarantees ordering between transactions
> > with the same AXI ID. The DDR write in step 1 may be slow because the
> > GPU is occupying DDR, so the USB register write can arrive first. The
> > USB DMA then reads the old descriptor data from DDR, finds the TRB not
> > ready yet, and the doorbell is missed.
> >
> > Ringing the doorbell two or three times also makes the problem go away.
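A toy timing model of the race described above, assuming the fabric behaves as the report suggests. All names (doorbell_arrival(), doorbells_needed(), the barrier_kind values) and tick values are hypothetical; this is not a model of the Arm memory model or of any real interconnect. ORDER_ISSUE_ONLY lets the doorbell leave right after the descriptor write, as the report describes for dmb(oshst) across different AXI IDs; WAIT_FOR_COMPLETION holds the doorbell until the DDR write has completed, which is roughly what a dsb(st) adds.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical). Times are arbitrary ticks; the descriptor
 * write is issued at t = 0 and lands in DDR after ddr_latency ticks.
 * The device reads the descriptor as soon as a doorbell lands.
 */
enum barrier_kind {
	ORDER_ISSUE_ONLY,	/* writes leave in order but may complete out of order */
	WAIT_FOR_COMPLETION	/* doorbell issued only after the DDR write completes */
};

/* Time at which the first doorbell reaches the USB register. */
static int doorbell_arrival(int ddr_latency, int mmio_latency,
			    enum barrier_kind b)
{
	int doorbell_issue = (b == WAIT_FOR_COMPLETION) ? ddr_latency : 1;

	return doorbell_issue + mmio_latency;
}

/* True if a doorbell landing at 'arrival' finds the new descriptor in DDR. */
static bool sees_new_descriptor(int ddr_latency, int arrival)
{
	return arrival >= ddr_latency;
}

/* Doorbells (rung every 'gap' ticks) until one sees the new descriptor. */
static int doorbells_needed(int ddr_latency, int mmio_latency, int gap,
			    enum barrier_kind b)
{
	int arrival = doorbell_arrival(ddr_latency, mmio_latency, b);
	int n = 1;

	assert(gap > 0);	/* otherwise the loop could never finish */
	while (!sees_new_descriptor(ddr_latency, arrival)) {
		arrival += gap;
		n++;
	}
	return n;
}
```

With idle DDR (ddr_latency = 2, mmio_latency = 3) one doorbell suffices either way. With contended DDR (ddr_latency = 20, gap = 8) the issue-only variant needs three doorbells, consistent with the observation that ringing two or three times hides the bug, while the completion-wait variant needs one.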
> >
> > So I think dmb(oshst) is not enough for writel(), given what
> > memory-barriers.txt documents:
> >
> > A writeX() by the CPU to the peripheral will first wait for the
> > completion of all prior CPU writes to memory. For example, this ensures
> > that writes by the CPU to an outbound DMA buffer allocated by
> > dma_alloc_coherent() will be visible to a DMA engine when the CPU writes
> > to its MMIO control register to trigger the transfer.
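The guarantee quoted above is a message-passing pattern. A rough CPU-only analogy in C11, assuming a second thread can stand in for the DMA engine: the plain write to descriptor[3] plays "write ddr", the release store to doorbell plays writel(), and the paired acquire load guarantees the engine sees the new descriptor. The names (descriptor, doorbell, fake_dma_engine(), run_message_passing()) are hypothetical. Between CPU threads the language memory model provides this ordering; for a real device it depends on the barrier inside writel() and on the interconnect, which is exactly what is in question here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Analogy only: a second thread plays the DMA engine. */
static uint32_t descriptor[4];	/* stands in for a dma_alloc_coherent() buffer */
static atomic_uint doorbell;	/* stands in for the MMIO doorbell register */

static void *fake_dma_engine(void *arg)
{
	uint32_t *seen = arg;

	/* Spin until the doorbell is rung; acquire pairs with the release store. */
	while (atomic_load_explicit(&doorbell, memory_order_acquire) == 0)
		;
	*seen = descriptor[3];	/* guaranteed to observe the producer's write */
	return NULL;
}

/* Publish 'value' in the descriptor, ring the doorbell, return what the engine saw. */
static uint32_t run_message_passing(uint32_t value)
{
	pthread_t engine;
	uint32_t seen = 0;
	int rc;

	atomic_store(&doorbell, 0);
	descriptor[3] = 0;
	rc = pthread_create(&engine, NULL, fake_dma_engine, &seen);
	assert(rc == 0);
	(void)rc;

	descriptor[3] = value;						/* "write ddr" */
	atomic_store_explicit(&doorbell, 1, memory_order_release);	/* "writel" */

	pthread_join(engine, NULL);
	return seen;
}
```

Replacing the release store with a relaxed one would remove the guarantee, which is the CPU-side counterpart of a doorbell write that can overtake the descriptor write.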
> >
> >
> > Best regards
> > Frank Li