[RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework
Lucas Stach
l.stach at pengutronix.de
Fri Jun 21 08:27:08 EDT 2013
Hi Inki,
please refrain from sending HTML mails, it makes it pretty hard to quote
properly without messing up the layout everywhere.
On Friday, 2013-06-21 at 20:01 +0900, Inki Dae wrote:
[...]
> Yeah, you'll need some knowledge and understanding about the API you
> are working with to get things right. But I think it's not an
> unreasonable thing to expect the programmer working directly with
> kernel interfaces to read up on how things work.
>
> Second thing: I'd rather have *one* consistent API for every
> subsystem, even if they differ from each other, than having to
> implement this syncpoint thing in every subsystem. Remember: a single
> execbuf in DRM might reference both GEM objects backed by dma-buf as
> well as native SHM or CMA backed objects. The dma-buf-mgr proposal
> already allows you to handle dma-bufs much the same way during
> validation as native GEM objects.
>
> Actually, at first I had implemented a fence helper framework based on
> reservation and dma fence to provide an easy-to-use interface for
> device drivers. However, that was the wrong implementation: I had not
> only customized the dma fence but also not considered the deadlock
> issue. After that, I reimplemented it as dmabuf sync to solve the
> deadlock issue, and at that time I realized that we first need to
> concentrate on the most basic things: the fact that CPU and CPU, CPU
> and DMA, or DMA and DMA can access the same buffer; the fact that
> simple is best; and the fact that we need not only kernel side but
> also user side interfaces. After that, I collected what is the common
> part for all subsystems, and I devised this dmabuf sync framework for
> it. I'm not really a specialist in the desktop world, so a question:
> isn't the execbuf used only for the GPU? The GPU has dedicated video
> memory (VRAM), so it needs a migration mechanism between system memory
> and the dedicated video memory, and also has to consider ordering
> issues while buffers are being migrated.
>
Yeah, execbuf is pretty GPU specific, but I don't see how this matters
for this discussion. Also I don't see a big difference between embedded
and desktop GPUs. Buffer migration is more of a detail here. Both take
command streams that potentially reference other buffers, which might be
native GEM or dma-buf backed objects. Both have to make sure the buffers
are in the right domain (caches cleaned and address mappings set up) and
are available for the desired operation, meaning you have to sync with
other DMA engines and maybe also with the CPU.
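
To make this a bit more concrete, the validation step boils down to
roughly the sketch below, no matter whether the buffers live in VRAM or
in system memory. All helper and struct names here are made up for
illustration, this is not the actual dma-buf-mgr interface, just the
general flow:

/*
 * Sketch of execbuf validation over a mixed list of native GEM and
 * dma-buf backed objects. Every name here is invented; only the flow
 * matters.
 */
static int drv_validate_buffers(struct drv_cmdstream *stream)
{
	int i, ret;

	for (i = 0; i < stream->nr_bos; i++) {
		struct drv_bo *bo = stream->bos[i];

		/* native GEM and dma-buf backed objects end up with the
		 * same per-buffer reservation/fence state, so they can
		 * be treated alike from here on */
		ret = drv_bo_reserve(bo);
		if (ret)
			goto err_backoff;

		/* put the buffer into the right domain: flush CPU
		 * caches, set up the IOMMU/GART mapping for this
		 * device */
		ret = drv_bo_move_to_device_domain(bo);
		if (!ret)
			/* sync against other users: depend on fences
			 * left behind by other DMA engines or by CPU
			 * access before this job may touch the buffer */
			ret = drv_bo_sync_previous_users(bo, stream->fence);
		if (ret) {
			drv_bo_unreserve(bo);
			goto err_backoff;
		}
	}
	return 0;

err_backoff:
	while (i--)
		drv_bo_unreserve(stream->bos[i]);
	return ret;
}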
The only case where sync isn't clearly defined right now by the current
API entrypoints is when you access memory through the dma-buf fallback
mmap support, which might happen with some software processing element
in a video pipeline or something. I agree that we will need a userspace
interface here, but I think this shouldn't be yet another sync object,
but rather more a prepare/fini_cpu_access ioctl on the dma-buf which
hooks into the existing dma-fence and reservation stuff.
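
Just to sketch what I mean (the ioctl names, numbers and struct below
are made up for illustration, this is not an existing uapi): the
prepare side would simply wait on the fences in the buffer's
reservation object before letting the CPU in, and the fini side would
leave the buffer in a state other devices can depend on again.

/*
 * Hypothetical uapi sketch for bracketed CPU access on a dma-buf fd.
 * Names, numbers and the struct layout are invented for illustration.
 */
#include <linux/ioctl.h>
#include <linux/types.h>

#define DMA_BUF_CPU_ACCESS_READ		(1 << 0)
#define DMA_BUF_CPU_ACCESS_WRITE	(1 << 1)

struct dma_buf_cpu_access {
	__u64 flags;	/* DMA_BUF_CPU_ACCESS_READ/WRITE */
};

/* wait for the fences in the buffer's reservation object, then allow
 * CPU access through the dma-buf mmap */
#define DMA_BUF_IOCTL_PREPARE_CPU_ACCESS \
	_IOW('b', 0x20, struct dma_buf_cpu_access)
/* flush caches if needed so subsequent device access is ordered
 * against the CPU writes */
#define DMA_BUF_IOCTL_FINI_CPU_ACCESS \
	_IOW('b', 0x21, struct dma_buf_cpu_access)

A software processing element in a video pipeline would then just
bracket its accesses:

	struct dma_buf_cpu_access acc = {
		.flags = DMA_BUF_CPU_ACCESS_WRITE,
	};

	ioctl(dmabuf_fd, DMA_BUF_IOCTL_PREPARE_CPU_ACCESS, &acc);
	/* ... touch the frame through the dma-buf mmap ... */
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_FINI_CPU_ACCESS, &acc);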
>
> And to get back to my original point: if you have more than one task
> operating together on a buffer you absolutely need some kind of real
> IPC to sync them up and do something useful. Both your syncpoints and
> the proposed dma-fences only protect the buffer accesses to make sure
> different tasks don't stomp on each other. There is nothing in there
> to make sure that the output of your pipeline is valid. You have to
> take care of that yourself in userspace. I'll reuse your example to
> make it clear what I mean:
>
> Task A                              Task B
> ------                              ------
> dma_buf_sync_lock(buf1)
> CPU write buf1
> dma_buf_sync_unlock(buf1)
>       ---------schedule Task A again-------
> dma_buf_sync_lock(buf1)
> CPU write buf1
> dma_buf_sync_unlock(buf1)
>       ---------schedule Task B---------
>                                     qbuf(buf1)
>
>                                     dma_buf_sync_lock(buf1)
>                                     ....
>
> This is what can happen if you don't take care of proper syncing.
> Task A writes something to the buffer in expectation that Task B will
> take care of it, but before Task B even gets scheduled Task A
> overwrites the buffer again. Not what you wanted, is it?
>
> Exactly the wrong example. I had already mentioned that: "In case the
> data flow goes from A to B, it needs some kind of IPC between the two
> tasks every time." So again, your example would have no problem in
> the case that *two tasks share the same buffer, both tasks access the
> buffer (buf1) for writing, and the data of the buffer (buf1) doesn't
> need to be shared*. They just need to use the buffer as *storage*.
> So all they want is to avoid stomping on the buffer in this case.
>
Sorry, but I don't see the point. If no one is interested in the data of
the buffer, why are you sharing it in the first place?
>
> So to make sure the output of a pipeline of some kind is what you
> expect, you have to do syncing with IPC
>
> So not true.
>
> . And once you do CPU access it is a synchronous thing in the stream
> of events. I see that you might want to have some kind of bracketed
> CPU access even for the fallback mmap case for things like V4L2 that
> don't provide explicit sync of their own, but in no way can I see why
> we would need a user/kernel shared syncpoint for this.
>
> > > A more advanced way to achieve this would be using cross-device
> > > fences to avoid going through userspace for every syncpoint.
> > >
> >
> > Ok, maybe there is something I missed. So a question: what are the
> > cross-device fences? dma fence? And how can we achieve the
> > synchronization mechanism without going through user space for
> > every syncpoint when CPU and DMA share the same buffer? And could
> > you explain it in as much detail as possible, like I did?
> >
>
> Yeah I'm talking about the proposed dma-fences. They would allow you
> to just queue things into the kernel without waiting for a device
> operation to finish. But you still have to make sure that your
> commands have the right order and don't go wild. So for example you
> could do something like this:
>
> Userspace                             Kernel
> ---------                             ------
> 1. build DRM command stream
>    rendering into buf1
> 2. queue command stream with execbuf
>                                       1. validate command stream
>                                       1.1 reference buf1 for writing
>                                           through dma-buf-mgr
>                                       2. kick off GPU processing
> 3. qbuf buf1 into V4L2
>                                       3. reference buf1 for reading
>                                       3.1 wait for fence from GPU to
>                                           signal
>                                       4. kick off V4L2 processing
>
>
> That seems very specific to desktop GPUs, doesn't it?
>
Would you mind explaining what you think is desktop specific about that?
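
To make the point, here is a rough sketch of what the kernel does in
steps 1.1 and 3 above. All names are invented for illustration, but
note that nothing in this flow knows or cares whether the two devices
are a desktop GPU and a capture engine or two blocks in an embedded
SoC:

/* writer side, e.g. the GPU driver at execbuf time */
static int writer_queue_job(struct drv_job *job, struct drv_bo *bo)
{
	struct drv_fence *fence = drv_fence_create();

	/* publish a fence that signals when the write has finished */
	drv_bo_attach_write_fence(bo, fence);
	return drv_hw_submit(job, fence);
}

/* reader side, e.g. the V4L2 driver at qbuf time */
static int reader_queue_job(struct drv_job *job, struct drv_bo *bo)
{
	/* don't start the DMA before the writer's fence has signaled;
	 * this could also be done asynchronously via a fence callback */
	drv_bo_wait_write_fences(bo);
	return drv_hw_submit(job, NULL);
}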
Regards,
Lucas
--
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |