[PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

Rong Qianfeng 11065417 at vivo.com
Fri Apr 12 00:46:35 PDT 2024


On 2024/4/12 0:52, T.J. Mercier wrote:
> On Thu, Apr 11, 2024 at 1:21 AM Rong Qianfeng <11065417 at vivo.com> wrote:
>>
>> On 2024/4/10 0:37, T.J. Mercier wrote:
>>>
>>> On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065417 at vivo.com> wrote:
>>>> On 2024/4/8 15:58, Christian König wrote:
>>>>> On 07.04.24 at 09:50, Rong Qianfeng wrote:
>>>>>> [SNIP]
>>>>>>> On 13.11.21 at 07:22, Jianqun Xu wrote:
>>>>>>>> Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
>>>>>>>> offset and len.
>>>>>>> You have not given a use case for this so it is a bit hard to
>>>>>>> review. And from the existing use cases I don't see why this should
>>>>>>> be necessary.
>>>>>>>
>>>>>>> Even worse, from the existing backend implementation I don't even see
>>>>>>> how drivers should be able to fulfill these semantics.
>>>>>>>
>>>>>>> Please explain further,
>>>>>>> Christian.
>>>>>> Here is a practical case:
>>>>>> The user space can allocate a large chunk of dma-buf for
>>>>>> self-management, used as a shared memory pool.
>>>>>> Small dma-buf can be allocated from this shared memory pool and
>>>>>> released back to it after use, thus improving the speed of dma-buf
>>>>>> allocation and release.
>>>>>> Additionally, custom functionalities such as memory statistics and
>>>>>> boundary checking can be implemented in the user space.
>>>>>> Of course, the above-mentioned functionalities require the
>>>>>> implementation of a partial cache sync interface.
>>>>> Well that is obvious, but where is the code doing that?
>>>>>
>>>>> You can't send out code without an actual user of it. That will
>>>>> obviously be rejected.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>> In fact, we have already used the user-level dma-buf memory pool in the
>>>> camera shooting scenario on the phone.
>>>>
>>>> In our tests, the execution time of the photo shooting algorithm
>>>> was reduced from 3.8s to 3s.
>>>>
>>> For phones, the (out of tree) Android version of the system heap has a
>>> page pool connected to a shrinker. That allows you to skip page
>>> allocation without fully pinning the memory like you get when
>>> allocating a dma-buf that's way larger than necessary. If it's for a
>>> camera maybe you need physically contiguous memory, but it's also
>>> possible to set that up.
>>>
>>> https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/drivers/dma-buf/heaps/system_heap.c#377
>>>
>> Thank you for the reminder.
>>
>> The page pool of the system heap can meet the needs of most scenarios,
>> but the camera shooting scenario is quite special.
>>
>> Why does the camera shooting scenario need a dma-buf memory pool at
>> the user level?
>>
>> (1) The memory demand is extremely high and time requirements are
>> stringent.
> Preallocating for this makes sense to me, whether it is individual
> buffers or one large one.
>
>> (2) The memory in the page pool (system heap) is easily reclaimed or
>> used by other apps.
> Yeah, by design I guess. I have seen an implementation where the page
> pool is proactively refilled after it has been empty for some time
> which would help in scenarios where the pool is often reclaimed and
> low/empty. You could add that in a vendor heap.
Yeah, a similar feature has already been implemented by the vendor.
>
>> (3) High concurrent allocation and release (with deferred-free) lead to
>> high memory usage peaks.
> Hopefully this is not every frame? I saw enough complaints about the
> deferred free of pool pages that it hasn't been carried forward since
> Android 13, so this should be less of a problem on recent kernels.
>
>> Additionally, after the camera exits, the shared memory pool can be
>> released, with minimal impact.
> Why do you care about the difference here? In one case it's when the
> buffer refcount goes to 0 and memory is freed immediately, and in the
> other it's the next time reclaim runs the shrinker.
I'm sorry, my description wasn't clear enough. What I meant is that the
user-level dma-buf memory pool does not use reserved (physically
contiguous) memory, and it does not hold its memory for long, so its
impact on the system is minimal.
>
>
> I don't want to add UAPI for DMA_BUF_IOCTL_SYNC_PARTIAL to Android
> without it being in the upstream kernel. I don't think we can get that
> without an in-kernel user of dma_buf_begin_cpu_access_partial first,
> even though your use case wouldn't rely on that in-kernel usage. :\ So
> if you want to add this to phones for your camera app, then I think
> your best option is to add a vendor driver which implements this IOCTL
> and calls the dma_buf_begin_cpu_access_partial functions which are
> already exported.
OK, thank you very much for your suggestion. I will definitely take it
into consideration.
>
> Best,
> T.J.
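
For readers following the thread, the user-level pool described above can
be sketched roughly as follows. This is only an illustration under stated
assumptions, not the code used on the phone: struct pool, struct range,
pool_alloc() and sync_range() are hypothetical names, the 64-byte cache
line is assumed, and the pool is assumed to be smaller than 4 GiB. The
only part taken from the patch description is that a partial sync is
expressed as an offset and a len within one large dma-buf. The
DMA_BUF_IOCTL_SYNC_PARTIAL call itself is deliberately left out because
that UAPI is not upstream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumed cache-line size; real hardware may differ */

/* Hypothetical bump-style sub-allocator over one large mmap'ed dma-buf. */
struct pool {
    size_t size; /* total size of the backing dma-buf, assumed < 4 GiB */
    size_t next; /* offset of the first byte not yet handed out */
};

/* A byte range inside the big dma-buf, i.e. the "offset and len" to sync. */
struct range {
    uint32_t offset;
    uint32_t len;
};

static size_t align_up(size_t v, size_t a)   { return (v + a - 1) & ~(a - 1); }
static size_t align_down(size_t v, size_t a) { return v & ~(a - 1); }

/*
 * Carve a cache-line aligned sub-buffer out of the pool.
 * Returns 0 and fills *out on success, -1 if the request does not fit.
 */
static int pool_alloc(struct pool *p, size_t len, struct range *out)
{
    assert(p != NULL && out != NULL);
    if (len == 0 || len > p->size)
        return -1;

    size_t start = align_up(p->next, CACHE_LINE);
    size_t padded = align_up(len, CACHE_LINE);
    if (start > p->size || padded > p->size - start)
        return -1;

    out->offset = (uint32_t)start;
    out->len = (uint32_t)padded;
    p->next = start + padded;
    return 0;
}

/*
 * Widen an arbitrary byte range to cache-line boundaries so that a partial
 * sync covers every line the CPU may have touched.
 */
static struct range sync_range(size_t offset, size_t len)
{
    size_t start = align_down(offset, CACHE_LINE);
    size_t end = align_up(offset + len, CACHE_LINE);
    struct range r = { (uint32_t)start, (uint32_t)(end - start) };
    return r;
}
```

Keeping every sub-buffer cache-line aligned means two neighbouring
sub-buffers never share a line, so syncing one of them cannot clobber CPU
writes that are still pending in the other. Whether a given exporter
actually needs this alignment is an assumption of the sketch.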



More information about the Linux-rockchip mailing list