[PATCH] ARM: mm: dma: Update coherent streaming apis with missing memory barrier

Catalin Marinas catalin.marinas at arm.com
Thu Apr 24 02:54:23 PDT 2014


On Wed, Apr 23, 2014 at 07:37:42PM +0100, Russell King - ARM Linux wrote:
> On Wed, Apr 23, 2014 at 06:17:27PM +0100, Will Deacon wrote:
> > On Wed, Apr 23, 2014 at 05:02:16PM +0100, Catalin Marinas wrote:
> > > In the I/O coherency case, I would say it is the responsibility of the
> > > device/hardware to ensure that the data is visible to all observers
> > > (CPUs) prior to issuing an interrupt for DMA-ready. Looking at the mvebu
> > > code, I think it covers such from-device or bidirectional
> > > scenarios.
> > > 
> > > Maybe Santosh still has a point ;) but I don't know what the right
> > > barrier would be here. And I really *hate* per-SoC/snoop unit barriers
> > > (I still hope a dsb would do the trick on newer/ARMv8 systems).
> > 
> > If you have device interrupts which are asynchronous to memory coherency,
> > then you're in a world of pain. I can't think of a generic (architected)
> > solution to this problem, unfortunately -- it's going to be both device
> > and interconnect specific. Adding dsbs doesn't necessarily help at all.
> 
> Think of network devices with NAPI handling.  There, we explicitly turn
> off the device's interrupt, and switch to software polling for received
> packets.
> 
> The memory for the packets has already been mapped, and we're unmapping
> the buffer, and then reading from it (to locate the ether type, and/or
> vlan headers) before passing it up the network stack.
> 
> So in this case, we need to ensure that the cache operations are ordered
> before the subsequent loads read from the DMA'd data.  It's purely an
> ordering thing, not a completion thing.

Well, ordering of completed cache operations ;).

> However, what must not happen is that the unmap is re-ordered
> before reading the descriptor and deciding whether there's a packet
> present to be unmapped.  That probably implies that code _should_ be
> doing this:
> 
> 	status = desc->status;
> 	if (!(status & CPU_OWNS_THIS_DESCRIPTOR))
> 		no_packet;
> 
> 	rmb();
> 
> 	addr = desc->buf;
> 	len = desc->length;
> 
> 	dma_unmap_single(dev, addr, len, DMA_FROM_DEVICE);

Indeed.

> 	...receive skb...reading buffer...

The point Will and I were trying to make is about ordering as observed
by the CPU (rather than ordering of CPU actions). Independently of
whether the DMA is coherent or not, the device's write to the buffer and
its status update (in-memory descriptor or IRQ) *must* be ordered by the
device and interconnects. The rmb() in your example above only orders
the CPU loads and cache maintenance relative to each other, but it has
no effect on the transactions done by the device.
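
To make the driver-side pattern concrete, here is a slightly fuller
sketch of the receive path discussed above. The descriptor layout, ring
fields and function names are invented for illustration (and
CPU_OWNS_THIS_DESCRIPTOR stands for whatever ownership bit the hardware
defines, as in the snippet above); only the status-check / rmb() /
dma_unmap_single() ordering is the point:

	/* Hypothetical descriptor layout; the device writes status last. */
	struct rx_desc {
		u32		status;		/* ownership/status bits */
		dma_addr_t	buf;		/* DMA address of the packet buffer */
		u32		length;		/* number of bytes DMA'd */
	};

	#define RX_RING_SIZE	256		/* made-up ring size */

	struct example_priv {
		struct device	*dev;
		struct rx_desc	*rx_ring;
		unsigned int	rx_tail;
	};

	static int example_rx_poll(struct example_priv *priv, int budget)
	{
		int received = 0;

		while (received < budget) {
			struct rx_desc *desc = &priv->rx_ring[priv->rx_tail];
			u32 status = desc->status;

			if (!(status & CPU_OWNS_THIS_DESCRIPTOR))
				break;	/* device still owns it, no packet */

			/*
			 * Order the ownership check above against the loads
			 * below and the cache maintenance in
			 * dma_unmap_single(); otherwise the unmap and the
			 * reads of the buffer could be re-ordered before we
			 * know the descriptor is ours.
			 */
			rmb();

			dma_unmap_single(priv->dev, desc->buf, desc->length,
					 DMA_FROM_DEVICE);

			/* ...build the skb and read the buffer contents... */

			priv->rx_tail = (priv->rx_tail + 1) % RX_RING_SIZE;
			received++;
		}

		return received;
	}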

The mvebu SoC has exactly this problem. mvebu_hwcc_sync_io_barrier() is
something that should be handled by the hardware automatically,
especially in a system which claims cache coherency.
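
For illustration only, such a per-SoC workaround typically ends up hooked
into the streaming DMA sync/unmap path, roughly along the lines below.
The register offset, base pointer and function names here are
placeholders, not the actual mvebu driver:

	#define IO_SYNC_BARRIER_CTL	0x84	/* placeholder register offset */

	static void __iomem *snoop_unit_base;	/* mapped coherency/snoop unit */

	/* Ask the snoop unit to drain pending I/O writes and wait for it. */
	static void example_hwcc_sync_io_barrier(void)
	{
		writel(0x1, snoop_unit_base + IO_SYNC_BARRIER_CTL);
		while (readl(snoop_unit_base + IO_SYNC_BARRIER_CTL) & 0x1)
			cpu_relax();
	}

	/* Hooked into the dma_map_ops sync_*_for_cpu/unmap callbacks. */
	static void example_hwcc_dma_sync_for_cpu(struct device *dev,
						  dma_addr_t dma_handle,
						  size_t size,
						  enum dma_data_direction dir)
	{
		/* Only inbound data needs the barrier before the CPU reads it. */
		if (dir != DMA_TO_DEVICE)
			example_hwcc_sync_io_barrier();
	}

This is exactly the kind of synchronisation that, ideally, the
interconnect would guarantee before the descriptor update or interrupt
becomes visible to the CPU.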

-- 
Catalin


