[RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework

Tue Jun 18 06:46:13 EDT 2013

On Tue, Jun 18, 2013 at 09:00:16AM +0200, Daniel Vetter wrote:
> On Mon, Jun 17, 2013 at 04:42:37PM +0100, Russell King - ARM Linux wrote:
> > What we need is something along the lines of:
> > (a) dma_buf_map_attachment() _not_ to map the scatterlist for DMA.
> > or
> > (b) drm_gem_prime_import() not to call dma_buf_map_attachment() at all.
> 
> I strongly vote for (b) (plus adding a dma_buf_map_sync interface to allow
> syncing to other devices/cpu whithout dropping the dma mappings). At least
> that's been the idea behind things, but currently all (x86-based) drm
> drivers cut corners here massively.
> 
> Aside: The entire reason behind hiding the dma map step in the dma-buf
> attachment is to allow drivers to expose crazy iommus (e.g. the tiler unit
> on omap) to other devices. So imo (a) isn't the right choice.

That's why I also talk below about adding the dma_buf_map/sync callbacks
below, so that a dma_buf implementation can do whatever is necessary with
the sg at the point of use.

However, you are correct that this should be unnecessary, as DRM should
only be calling dma_buf_map_attachment() when it needs to know about the
memory behind the object.  The problem is that people will cache that
information - it also may be expensive to setup for the dma_buf - as it
involves counting the number of pages, memory allocation, and maybe
obtaining the set of pages from shmem.

With (a) plus the callbacks below, it means that step is only performed
once, and then the DMA API part is entirely separate - we can have our
cake and eat it in that regard.

> > Note: the existing stuff does have the nice side effect of being able
> > to pass buffers which do not have a struct page * associated with them
> > through the dma_buf API - I think we can still preserve that by having
> > dma_buf provide a couple of new APIs to do the SG list map/sync/unmap,
> > but in any case we need to fix the existing API so that:
> >
> > dma_buf_map_attachment() becomes dma_buf_get_sg()
> > dma_buf_unmap_attachment() becomes dma_buf_put_sg()
> >
> > both getting rid of the DMA direction argument, and then we have four
> > new dma_buf calls:
> >
> > dma_buf_map_sg()
> > dma_buf_unmap_sg()
> > dma_buf_sync_sg_for_cpu()
> > dma_buf_sync_sg_for_device()
> >
> > which do the actual sg map/unmap via the DMA API *at the appropriate
> > time for DMA*.
> 
> Hm, my idea was to just add a dma_buf_sync_attchment for the device side
> syncing, since the cpu access stuff is already bracketed with the
> begin/end cpu access stuff. We might need a sync_for_cpu or so for mmap,
> but imo mmap support for dma_buf is a bit insane anyway, so I don't care
> too much about it.
> 
> Since such dma mappings would be really longstanding in most cases anyway
> drivers could just map with BIDIRECTIONAL and do all the real flushing
> with the new sync stuff.

Note that the DMA API debug doesn't allow you to change the direction
argument on an existing mapping (neither should it, again this is
documented in the DMA API stuff in Documentation/).  This is where you
would need the complete set of four functions I mention above which
reflect the functionality of the DMA API.

The dma_buf implementation doesn't _have_ to implement them if they
are no-ops - for example, your dma_buf sg array contains DMA pointers,
and the memory is already coherent in some way (for example, it's a
separate chunk of memory which isn't part of system RAM.)