[RFC PATCH v1 6/7] media: video: introduce face detection driver module

Thu Dec 8 18:25:59 EST 2011

On 12/07/2011 02:40 PM, Ming Lei wrote:
>>> I understand the API you mentioned here should belong to kernel internal
>>> API, correct me if it is wrong.
>>
>> Yes, I meant the in kernel design, i.e. generic face detection kernel module
>> and an OMAP4 FDIF driver. It makes lots of sense to separate common code
>> in this way, maybe even when there would be only OMAP devices using it.
> 
> Yes, that is the motivation of the generic FD module. I think we can focus on
> two use cases for the generic FD now:
> 
> - one is to detect faces from user space image data
> 
> - another one is to detect faces in image data generated from HW(SoC
> internal bus, resize hardware)
> 
> For OMAP4 hardware, input data is always from physically continuous
> memory directly, so it is very easy to support the two cases. For the
> use case 2,
> if buffer copy is to be avoided, we can use the coming shared dma-buf[1]
> to pass the image buffer produced by other HW to FD hw directly.

Some H/W uses direct data buses to exchange data between processing blocks,
and there is no need for additional memory. We only need to manage the data
links, for which MC has been designed.

> 
> For other FD hardware, if it supports to detect faces in image data from
> physically continuous memory, I think the patch is OK to support it.
> 
> If the FD hw doesn't support to detect faces from physically continuous
> memory, I have some questions: how does user space app to parse the
> FD result if application can't get the input image data? If user space can

Do we need the map of detected objects on a input image in all cases ?
If an application needs only coordinates of detected object on a video
signal to for example, focus on it, trigger some action, or just count
detected faces, etc. Perhaps there are more practical similar use cases.

> get image data, how does it connect the image data with FD result? and

If hardware provides frame sequence numbers the FD result can be associated
with a frame, whether it's passing through H/W interconnect or is located
in memory.

> what standard v4l2 ways(v4l2_buffer?) can the app use to describe the
> image data?

We have USERPTR and MMAP memeory buffer for streaming IO, those use
v4l2_buffer 1). read()/write() is also used 2), mostly for compressed formats.
Except that there are works on shared buffers.

> 
>> I'm sure now the Samsung devices won't fit in video output node based driver
>> design. They read image data in different ways and also the FD result format
>> is totally different.
> 
> I think user space will need the FD result, so it is very important to define
> API to describe the FD result format to user space. And the input about your
> FD HW result format is certainly helpful to define the API.

I'll post exact attributes generated by our FD detection H/W soon.

> 
>>>
>>>> AFAICS OMAP4 FDIF processes only data stored in memory, thus it seems
>>>> reasonable to use the videodev interface for passing data to the kernel
>>>> from user space.
>>>>
>>>> But there might be face detection devices that accept data from other
>>>> H/W modules, e.g. transferred through SoC internal data buses between
>>>> image processing pipeline blocks. Thus any new interfaces need to be
>>>> designed with such devices in mind.
>>>>
>>>> Also the face detection hardware block might now have an input DMA
>>>> engine in it, the data could be fed from memory through some other
>>>> subsystem (e.g. resize/colour converter). Then the driver for that
>>>> subsystem would implement a video node.
>>>
>>> I think the direct input image or frame data to FD should be from memory
>>> no matter the actual data is from external H/W modules or input DMA because
>>> FD will take lot of time to detect faces in one image or frame and FD can't
>>> have so much memory to cache several images or frames data.
>>
>> Sorry, I cannot provide much details at the moment, but there exists hardware
>> that reads data from internal SoC buses and even if it uses some sort of
>> cache memory it doesn't necessarily have to be available for the user.
> 
> Without some hardware background, it is not easy to give a generic FD module
> design.

Yes, please give me some time so I can prepare the list of requirements.

> 
>> Still the FD result is associated with an image frame for such H/W, but not
>> necessarily with a memory buffer queued by a user application.
> 
> For user space application, it doesn't make sense to handle FD results
> only without image data.  Even though there are other ways of input
> image data to FD, user space still need to know the image data, so it makes
> sense to associate FD result with a memory buffer.
> 
>> How long it approximately takes to process single image for OMAP4 FDIF ?
> 
> See the link[2], and my test result is basically consistent with the data.

Thanks. The processing time is rather low, looks like we could easily detect
objects in each frame with 30..50 fps.

> 
>>>
>>> If you have seen this kind of FD hardware design, please let me know.
>>>
>>>> I'm for leaving the buffer handling details for individual drivers
>>>> and focusing on a standard interface for applications, i.e. new
>>>
>>> I think leaving buffer handling details in generic FD driver or
>>> individual drivers
>>> doesn't matter now, since it don't have effect on interfaces between kernel
>>> and user space.
>>
>> I think you misunderstood me. I wasn't talking about core/driver module split,
>> I meant we should not be making the user interface video node centric.
>>
>> I think for Samsung devices I'll need a capture video node for passing
> 
> Why is it a capture video node instead of OUTPUT v4l2 device? I think the

Let's forget about this capture device, I'm just in progress of learning
our devices internals, that's quite huge IPs..

> device name should be decided from the view of face detection function:
> FD need input image data and produce detection result.

No, we should be flexible about where the data comes from to a FD subsystem
and how it is then processed - if it is passed to some other processing block
or it is transferred to memory with DMA and returned to the user. And we
should use Media Controller API to route the data according to application
requirements.
OMAP4 FDIF is pretty simple, we need to think more about integrating FD with
existing data pipelines.

> 
>> the result to the user. So instead of associating FD result with a buffer index
> 
> See the explanation above.

No, I still insist on using frame sequence number rather than buffer index :-)

> 
>> we could try to use the frame sequence number (struct v4l2_buffer.sequence,
>> http://linuxtv.org/downloads/v4l-dvb-apis/buffer.html#v4l2-buffer).
> 
> [1], http://marc.info/?t=132281644700005&r=1&w=2
> [2], http://e2e.ti.com/support/embedded/linux/f/354/t/128938.aspx#462740

1) http://linuxtv.org/downloads/v4l-dvb-apis/vidioc-qbuf.html
2) http://linuxtv.org/downloads/v4l-dvb-apis/func-write.html

--

Thanks,
Sylwester