[PATCH] ARM: mm: dma: Update coherent streaming apis with missing memory barrier
Catalin Marinas
catalin.marinas at arm.com
Tue Apr 22 08:30:07 PDT 2014
On Tue, Apr 22, 2014 at 04:18:08PM +0100, Santosh Shilimkar wrote:
> On Tuesday 22 April 2014 11:07 AM, Catalin Marinas wrote:
> > On Tue, Apr 22, 2014 at 02:49:06PM +0100, Santosh Shilimkar wrote:
> >> On Tuesday 22 April 2014 06:28 AM, Will Deacon wrote:
> >>> On Mon, Apr 21, 2014 at 07:03:10PM +0100, Santosh Shilimkar wrote:
> >>>> ARM coherent CPU dma map APIS are assumed to be nops on cache coherent
> >>>> machines. While this is true, one still needs to ensure that no
> >>>> outstanding writes are pending in CPU write buffers. To take care
> >>>> of that, we at least need a memory barrier to commit those changes
> >>>> to main memory.
> >>>>
> >>>> Patch is trying to fix those cases. Without such a patch, you will
> >>>> end up patching device drivers to avoid the synchronisation issues.
> >>>
> >>> Don't you only need these barriers if you're passing ownership of a CPU
> >>> buffer to a device? In that case, I would expect a subsequent writel to tell
> >>> the device about the new buffer, which includes the required __iowmb().
> >>> That's the reason for the relaxed accessors: to avoid this barrier when it's
> >>> not needed. Perhaps you're using the relaxed accessors where you actually
> >>> need the stronger ordering guarantees?
> >>
> >> I kind of guessed some one will bring up above point. Infact this is how
> >> mostly people have been living with the issue on coherent machines. On
> >> Keystone too, we did explicit barriers in respective drivers.
> >>
> >> I have added these barriers only on CPU to device streaming APIs because on
> >> other direction, the memory is already upto date from CPU's perspective.
> >>
> >> But if you look at the actual problem, its really responsibility of
> >> DMA streaming APIs which we are trying to push on to drivers. A device
> >> driver should be independent of whether it is running on a coherent or
> >> a non-coherent CPU.
> >>
> >> Lets take a example....
> >> MMC controller driver running on a non-coherent and coherent machine.
> >> Driver has below code sequence which is generic.
> >> 1. Prepare SG list
> >> 2. Perform CMO using DMA streaming API
> >> 3. Start DMA transfer...
> >
> > The key here is how you start the DMA transfer. So far we assumed it's
> > done via an I/O operation like writel() and it has the right barriers.
> > If we have other ways for starting this (like writing the dma_addr in
> > some other memory descriptor), should we use explicit memory barriers?
>
> My point is why we want to rely on the DMA kick mechanism barrier
> for the operation which is suppose to be completed in step 2.
Because 2 only works if you use streaming DMA. Since we cover this with
the DMA kick mechanism already for coherent DMA buffers, we just double
the barriers for the streaming case.
--
Catalin
More information about the linux-arm-kernel
mailing list