The problem about arm64: io: Relax implicit barriers in default I/O accessors

Frank Li frank.li at nxp.com
Wed Jun 16 09:29:41 PDT 2021




> 
> > -----Original Message-----
> > From: Frank Li
> > Sent: Monday, June 14, 2021 5:42 PM
> > To: Will Deacon <will at kernel.org>
> > Cc: Shenwei Wang <shenwei.wang at nxp.com>; Han Xu <han.xu at nxp.com>;
> > Nitin Garg <nitin.garg at nxp.com>; Jason Liu <jason.hui.liu at nxp.com>;
> > linux-arm-kernel at lists.infradead.org; Zhi Li <lznuaa at gmail.com>
> > Subject: The problem about arm64: io: Relax implicit barriers in default I/O
> > accessors
> 
> Added Catalin.
[Frank Li] Sorry, corrected Catalin's address.
> 
> >
> > Will Deacon:
> >
> > 	One of our test cases fails on the 8QM platform (arm64): USB transfers
> > fail when run together with a GPU stress test.
> > 	I found that it is related to your change below.
> >
> > commit 22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > Author: Will Deacon <will at kernel.org>
> > Date:   Fri Jun 7 15:48:58 2019 +0100
> >
> >     arm64: io: Relax implicit barriers in default I/O accessors
> >
> >     The arm64 implementation of the default I/O accessors requires barrier
> >     instructions to satisfy the memory ordering requirements documented in
> >     memory-barriers.txt [1], which are largely derived from the behaviour of
> >     I/O accesses on x86.
> >
> > drivers/usb/host/xhci-ring.c
> >
> > static void giveback_first_trb(struct xhci_hcd *xhci, int slot_id,
> >                 unsigned int ep_index, unsigned int stream_id, int start_cycle,
> >                 struct xhci_generic_trb *start_trb)
> > {
> >         /*
> >          * Pass all the TRBs to the hardware at once and make sure this write
> >          * isn't reordered.
> >          */
> >         wmb();
> >         if (start_cycle)
> >                 start_trb->field[3] |= cpu_to_le32(start_cycle);
> >         else
> >                 start_trb->field[3] &= cpu_to_le32(~TRB_CYCLE);
> >         xhci_ring_ep_doorbell(xhci, slot_id, ep_index, stream_id);
> > }
> >
> > 	If I add a wmb() before xhci_ring_ep_doorbell(), the problem goes away.
> > writel() includes __iowmb(), which maps to dma_wmb().
> >
> > 	1. CPU write to DDR
> > 	2. writel()
> > 		2a. __iowmb(), i.e. dmb(oshst)
> > 		2b. write to the USB register
> > 	3. USB DMA read from DDR
> >
> >
> > 	The internal bus fabric only guarantees ordering for transactions with
> > the same AXI ID.  The DDR write in step 1 may be slow because the GPU is
> > occupying DDR, so the USB register write can arrive before the step 1
> > data reaches DDR.  The USB DMA then starts reading from DDR, gets the old
> > DMA descriptor data, finds it not ready yet, and so misses the doorbell.
> >
> > 	If I ring the doorbell 2-3 times, the problem also goes away.
> >
> > 	So I think dmb(oshst) is not enough for writel().  memory-barriers.txt says:
> >
> >         A writeX() by the CPU to the peripheral will first wait for the
> >         completion of all prior CPU writes to memory. For example, this
> >         ensures that writes by the CPU to an outbound DMA buffer allocated
> >         by dma_alloc_coherent() will be visible to a DMA engine when the
> >         CPU writes to its MMIO control register to trigger the transfer.
> >
> >
> > Best regards
> > Frank Li
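
For reference, a minimal userspace sketch of the workaround described in the quoted message (an extra barrier between the cycle-bit update and the doorbell write). Everything prefixed with model_ is a hypothetical stand-in, not kernel code: model_wmb() uses a C11 fence in place of the kernel's wmb(), the doorbell is a plain struct instead of an MMIO register, and cpu_to_le32() is omitted. Being single-threaded, it cannot reproduce the bus-fabric reordering; it only shows where the extra barrier sits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TRB_CYCLE (1u << 0)

/* Userspace stand-in for the kernel's struct xhci_generic_trb. */
struct model_trb {
	uint32_t field[4];
};

/* Hypothetical stand-in for the xHCI doorbell register. */
struct model_doorbell {
	uint32_t rings;      /* number of doorbell writes so far */
	uint32_t cycle_seen; /* cycle bit visible when the doorbell was rung */
};

/*
 * Assumption: a C11 sequentially consistent fence stands in for the
 * kernel's wmb(). It is not equivalent to the arm64 barrier instruction.
 */
static inline void model_wmb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Models the doorbell write: records what the "device" would observe. */
static void model_ring_doorbell(struct model_doorbell *db,
				const struct model_trb *trb)
{
	db->cycle_seen = trb->field[3] & TRB_CYCLE;
	db->rings++;
}

/* Mirrors giveback_first_trb() from the quoted code, plus the extra barrier. */
static void model_giveback_first_trb(struct model_trb *start_trb,
				     int start_cycle,
				     struct model_doorbell *db)
{
	assert(start_trb != NULL && db != NULL);

	/* Existing barrier: order the earlier TRB writes before the cycle bit. */
	model_wmb();
	if (start_cycle)
		start_trb->field[3] |= (uint32_t)start_cycle;
	else
		start_trb->field[3] &= ~TRB_CYCLE;

	/* Extra barrier from the workaround: cycle-bit write before doorbell. */
	model_wmb();
	model_ring_doorbell(db, start_trb);
}
```

Calling model_giveback_first_trb() with start_cycle set to 1 on a zeroed TRB leaves the cycle bit set by the time the modelled doorbell is rung.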


