Introduce a new helper framework for buffer synchronization

Daniel Vetter
Tue May 28 06:32:30 EDT 2013

On Tue, May 28, 2013 at 12:56:57PM +0900, Inki Dae wrote:
> > -----Original Message-----
> > From: linux-fbdev-owner at [mailto:linux-fbdev-owner at] On Behalf Of Rob Clark
> > Sent: Tuesday, May 28, 2013 12:48 AM
> > To: Inki Dae
> > Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho;
> > Kyungmin Park; myungjoo.ham; DRI mailing list; linux-arm-kernel at;
> > linux-media at
> > Subject: Re: Introduce a new helper framework for buffer synchronization
> > 
> > On Mon, May 27, 2013 at 6:38 AM, Inki Dae <inki.dae at> wrote:
> > > Hi all,
> > >
> > > I have removed the previous branch and added a new one with more
> > > cleanup. This time, the fence helper doesn't include user-side
> > > interfaces and cache operation code anymore, for two reasons: we are
> > > not yet sure that coupling those two things in the kernel -
> > > synchronizing caches and buffer access between CPU and CPU, CPU and
> > > DMA, and DMA and DMA with fences - is a good idea, and the existing
> > > user-side code has problems with badly behaved or crashing userspace.
> > > So this could be discussed more later.
> > >
> > > The below is a new branch,
> > >
> > > exynos.git/?h=dma-fence-helper
> > >
> > > And fence helper codes,
> > >
> > > exynos.git/commit/?h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005
> > >
> > > And example codes for device driver,
> > >
> > > exynos.git/commit/?h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae
> > >
> > > I think the time is not yet ripe for an RFC posting: maybe the
> > > existing dma fence and reservation code needs more review and
> > > additional work. So I'd be glad for other opinions and advice in
> > > advance of an RFC posting.
> > 
> > thoughts from a *really* quick, pre-coffee, first look:
> > * any sort of helper to simplify single-buffer sort of use-cases (v4l)
> > probably wouldn't want to bake in the assumption that seqno_fence is
> > used.
> > * I guess g2d is probably not actually a simple use case, since I
> > expect you can submit blits involving multiple buffers :-P
> I don't think so. One or more buffers can be used: seqno_fence also has
> only one buffer. Actually, we have already applied this approach to most
> devices: multimedia, gpu and display controller. And this approach shows
> better performance and reduced power consumption compared to the
> traditional way. The g2d example is just to show you how to apply my
> approach to a device driver.

Note that seqno_fence is an implementation pattern for a certain type of
direct hw->hw synchronization which uses a shared dma_buf to exchange the
sync cookie. The dma_buf attached to the seqno_fence has _nothing_ to do
with the dma_buf the fence actually coordinates access to.

I think that confusion is a large reason why Maarten and I don't
understand what you want to achieve with your fence helpers. Currently
they're using the seqno_fence, but totally not in the way the seqno_fence
was meant to be used.
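
To illustrate (untested sketch; the argument list is from my memory of the
current patches and may have drifted, and the mydev/mydrv names are made
up):

  struct seqno_fence sf;
  struct dma_buf *sync_page;  /* small buffer the hw writes seqnos into */
  struct dma_buf *render_buf; /* the buffer access is coordinated for */

  /* the fence reads its seqno from sync_page at seqno_ofs ... */
  seqno_fence_init(&sf, &mydev->fence_lock, sync_page,
                   context, seqno_ofs, seqno, &mydrv_seqno_ops);

  /* ... while render_buf is tied to the fence only through its
   * reservation object - sync_page never shows up there */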

Note that with the current code there is only a pointer from dma_bufs to
the fence. The fence itself has _no_ pointer to the dma_buf it syncs. This
shouldn't be a problem since the fence fastpath for already signalled
fences is completely barrier&lock free (it's just a load+bit-test), and
fences are meant to be embedded into whatever dma tracking structure you
already have, so no overhead there. The only ugly part is the fence
refcounting, but I don't think we can drop that.
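
The fastpath idea in code (rough sketch, not the literal patch code; the
flag name is from the current patches as far as I remember, the driver
struct is made up):

  struct driver_job {
          struct fence base;      /* embedded, no separate allocation */
          /* ... whatever dma tracking your driver has anyway ... */
  };

  static bool driver_job_is_idle(struct driver_job *job)
  {
          /* just a load + bit-test, no locks or barriers on this path */
          return test_bit(FENCE_FLAG_SIGNALED_BIT, &job->base.flags);
  }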

Note that you completely reinvent this part of Maarten's fence patches by
adding new r/w_complete completions to the reservation object, which
simply replaces the fence machinery.

Also note that a list of reservation entries is again meant to be used
only when submitting the dma to the gpu. With your patches you seem to
hang onto that list until dma completes. This has the ugly side-effect
that you need to allocate these reservation entries with kmalloc, whereas
if you just use them in the execbuf ioctl to construct the dma you can
usually embed them into something else you already need anyway. At least
i915 and ttm based drivers can work that way.
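
E.g. something like this (names made up, but that's the idea):

  struct mydrv_exec_bo {
          struct reservation_entry entry; /* only lives for the ioctl */
          struct mydrv_gem_obj *obj;
          /* relocation state, ... */
  };

The array of these is built in the execbuf ioctl, used to reserve and
fence all the buffers, and then thrown away again - what survives until
the dma completes is only the fence each reservation object points at,
not this list.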

Furthermore, fences are specifically constructed as frankenstein-monsters
between completions/waitqueues and callbacks, because the different
use-cases need different aspects:
- busy/idle checks and bo retiring need the completion semantics
- callbacks (in interrupt context) are used for hybrid hw->irq handler->hw
  sync approaches
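
I.e. the same fence serves both styles (sketch; the fence_* calls are as
in the current patches, the driver_* names are made up):

  /* completion-style: busy/idle check and bo retiring */
  if (fence_is_signaled(fence))
          driver_bo_retire(bo);
  else
          fence_wait(fence, true); /* interruptible wait */

  /* callback-style: from the irq handler, kick the next engine without
   * bouncing through process context */
  fence_add_callback(fence, &job->cb, driver_kick_next_engine);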

> > * otherwise, you probably don't want to depend on dmabuf, which is why
> > reservation/fence is split out the way it is..  you want to be able to
> > use a single reservation/fence mechanism within your driver without
> > having to care about which buffers are exported to dmabuf's and which
> > are not.  Creating a dmabuf for every GEM bo is too heavyweight.
> Right. But I think we should deal with this separately. Actually, we are
> trying to use reservation for gpu pipeline synchronization, such as the
> sgx sync object, and this approach is used without dmabuf. In other words,
> a device can use reservation alone for such pipeline synchronization and,
> at the same time, the fence helper or something similar with dmabuf for
> buffer synchronization.

I think a quick overview of the different pieces would be in order.

- ww_mutex: This is just the locking primitive which allows you to grab
  multiple mutexes without the need to check for ordering (and so without
  fearing deadlocks). There's a sketch below, after this list, of how it
  plugs into reservations.

- fence: This is just a fancy kind of completion with a bit of support for
  hw->hw completion events. The fences themselves have no tie-in with
  buffers, ww_mutexes or anything else.

- reservation: This ties together an object (doesn't need to be a buffer,
  could be any other gpu resource - see the drm/vmwgfx driver if you want
  your mind blown) with fences. Note that there's no connection the other
  way round, i.e. with the current patches you can't see which
  reservations are using which fences. Also note that other objects than
  reservations could point at fences, e.g. when the provided
  shared/exclusive semantics are not good enough for your needs.

  The ww_mutex in the reservation is simply the (fancy) lock which
  protects each reservation. The reason we need something fancy is that
  you need to lock all objects which are synced by the same fence
  together, otherwise you can race and construct deadlocks in the hw->hw
  sync part of fences.

- dma_buf integration: This is simply a pointer to the reservation object
  of the underlying buffer object. We need a pointer here since otherwise
  you can construct deadlocks between dma_buf objects and native buffer
  objects.
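
Putting the first three pieces together, the locking dance at execbuf time
looks roughly like this (untested sketch; the ww_mutex calls are as in the
patches, the mydrv_* helpers and bo fields are made up):

  struct ww_acquire_ctx ctx;
  struct mydrv_bo *bo, *contended = NULL;
  int ret;

  ww_acquire_init(&ctx, &reservation_ww_class);
retry:
  list_for_each_entry(bo, &exec_list, exec_link) {
          if (bo == contended)    /* already locked via the slowpath */
                  continue;
          ret = ww_mutex_lock(&bo->resv->lock, &ctx);
          if (ret == -EDEADLK) {
                  /* drop every lock we hold (including a previously
                   * contended one), then take the contended lock first
                   * and retry - this backoff is what makes the
                   * ordering-free locking deadlock-safe */
                  mydrv_unlock_all_held(&exec_list);
                  contended = bo;
                  ww_mutex_lock_slow(&contended->resv->lock, &ctx);
                  goto retry;
          }
  }
  ww_acquire_done(&ctx);

  /* all objects locked: install this job's fence into each reservation
   * object, submit the dma, then unlock everything and ww_acquire_fini */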

The crux is also that dma_buf integration comes last - before you can do
that you need to have your driver converted over to use reservations (and
the ww_mutex locking that comes with them) for its own buffers first.

I hope that helps to unconfuse things a bit and helps you understand the
different pieces of the fence/reservation/ww_mutex patches floating
around.

Cheers, Daniel
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 -
