[RFC PATCH v1 5/7] media: v4l2: introduce two IOCTLs for face detection
Ming Lei
ming.lei at canonical.com
Thu Dec 8 23:34:01 EST 2011
Hi,
On Fri, Dec 9, 2011 at 6:27 AM, Sylwester Nawrocki <snjw23 at gmail.com> wrote:
> On 12/08/2011 04:42 AM, Ming Lei wrote:
>>>> +/**
>>>> + * struct v4l2_obj_detection
>>>> + * @buf_index: entry, index of v4l2_buffer for face detection
>
> I would prefer having the frame sequence number here. It will be more
> future proof IMHO. If for instance we decide to use such an ioctl on
> a v4l2 sub-device, without dequeuing buffers, there will be no problem
> with that. And still in your specific use case it's not a big deal to
> look up the buffer index given its sequence number in the application.
OK, I will take your suggestion and use the frame sequence number, but I
still have a question about it; see my question in the other thread.
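For illustration only, the application-side lookup mentioned above could
be sketched roughly as below. It assumes the application records the
index and sequence fields of each struct v4l2_buffer it gets back from
VIDIOC_DQBUF; the fd_seq_map type, its helpers and FD_MAX_BUFS are
hypothetical names made up for this sketch and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FD_MAX_BUFS 32	/* assumed upper bound on the number of buffers */

/* Hypothetical table: sequence number last seen on each buffer index. */
struct fd_seq_map {
	uint32_t sequence[FD_MAX_BUFS];
	unsigned char valid[FD_MAX_BUFS];
};

/* Call after each dequeue with the buffer's index and sequence number. */
static void fd_seq_map_record(struct fd_seq_map *m, uint32_t index,
			      uint32_t sequence)
{
	assert(m != NULL);
	if (index >= FD_MAX_BUFS)
		return;
	m->sequence[index] = sequence;
	m->valid[index] = 1;
}

/* Return the buffer index holding the given frame, or -1 if unknown. */
static int fd_seq_map_lookup(const struct fd_seq_map *m, uint32_t sequence)
{
	size_t i;

	assert(m != NULL);
	for (i = 0; i < FD_MAX_BUFS; i++)
		if (m->valid[i] && m->sequence[i] == sequence)
			return (int)i;
	return -1;
}
```

A linear scan should be enough here since the number of buffers is small.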
>
>>>> + * @centerx: return, position in x direction of detected object
>>>> + * @centery: return, position in y direction of detected object
>>>> + * @angle: return, angle of detected object
>>>> + * 0 deg ~ 359 deg, vertical is 0 deg, clockwise
>>>> + * @sizex: return, size in x direction of detected object
>>>> + * @sizey: return, size in y direction of detected object
>>>> + * @confidence: return, confidence level of detection result
>>>> + * 0: the highest level, 9: the lowest level
>>>
>>> Hmm, not a good idea to align a public interface to the capabilities
>>> of a single hardware implementation.
>>
>> I think that the current omap interface is general enough, so why can't
>> we use it as public interface?
>
> I meant exactly the line implying the range. What if for some hardware
> it's 0..11 ?
We can let the driver normalize it for the user, who doesn't care whether
the range is 0~11 or 10~21. A uniform range should always make the user
happy, shouldn't it?
>
>>
>>> min/max confidence could be queried with
>>> relevant controls and here we could remove the line implying range.
>>
>> No, the confidence is used to describe the probability about
>> the correctness of the current detection result. Anyway, no FD can
>> make sure that it is 100% correct. Other HW can normalize its
>> confidence level to 0~9 so that application can handle it easily, IMO.
>
> 1..100 might be better, to minimize rounding errors. Nevertheless IMO if we
> can export an exact range supported by FD device we should do it, and let
> upper layers do the normalization. And the bigger numbers should mean higher
> confidence, consistently for all devices.
1..100 looks better, and I will change it to 1..100.
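As a rough sketch of what such a normalization might look like in a
driver (fd_normalize_confidence and its parameters are hypothetical
names, not code from the patch): the raw value is clamped to the
hardware range, optionally inverted so that a bigger number always means
higher confidence, and then scaled to 1..100 with rounding to nearest.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: map a raw hardware
 * confidence value in [hw_min, hw_max] onto the uniform 1..100 range.
 * If the hardware reports smaller numbers for higher confidence (as the
 * 0..9 range documented in this patch does), pass inverted != 0 so that
 * a bigger result always means higher confidence.
 */
static uint32_t fd_normalize_confidence(uint32_t raw, uint32_t hw_min,
					uint32_t hw_max, int inverted)
{
	uint64_t span, pos;

	assert(hw_min < hw_max);

	/* Clamp out-of-range readings to the hardware range. */
	if (raw < hw_min)
		raw = hw_min;
	if (raw > hw_max)
		raw = hw_max;

	span = (uint64_t)hw_max - hw_min;
	pos = inverted ? (uint64_t)hw_max - raw : (uint64_t)raw - hw_min;

	/* Scale to 0..99 with rounding to nearest, then shift to 1..100. */
	return (uint32_t)(1 + (pos * 99 + span / 2) / span);
}
```

With this, both a 0..11 and a 10..21 hardware range end up as 1..100 for
the application.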
>
> Do you think we could assume that the FD threshold range (FD_LHIT register
> in case of OMAP4) is always same as the result confidence level ?
No, they are different. FD_LHIT is used to guide the FD HW to detect either
more faces with more false positives __or__ fewer faces with fewer false
positives. A control class needs to be introduced for adjusting this value
of the FD HW, and I think a normalized range is better there too.
>
> If so then the confidence level range could possibly be queried with the
> detection threshold control. We could name it V4L2_CID_FD_CONFIDENCE_THRESHOLD
As I said above, there is no advantage in exporting the range to the user,
and a uniform range will make the user happy.
> for example.
> I could take care of preparing the control class draft and the documentation
> for it.
It is great to hear that, :-)
>
>>
>>>> + * @reserved: future extensions
>>>> + */
>>>> +struct v4l2_obj_detection {
>
> How about changing name of this structure to v4l2_fd_primitive or v4l2_fd_shape ?
>
I think v4l2_obj_detection is better because it can be reused to describe
other kinds of object detection from video in the future.
>>>> + __u16 centerx;
>>>> + __u16 centery;
>>>> + __u16 angle;
>>>> + __u16 sizex;
>>>> + __u16 sizey;
>>>
>>> How about using struct v4l2_rect in place of centerx/centery, sizex/sizey ?
>>> After all it describes a rectangle. We could also use struct v4l2_frmsize_discrete
>>> for size but there seems to be missing an equivalent for position, e.g.
>>
>> Maybe user space would like to plot a circle or ellipse over the detected
>> object, and I am sure that I have seen this kind of plot over a detected
>> face before.
>
> OK, in any way I suggest to replace all __u16 with __u32, to minimize performance
> issues and be consistent with the data type specifying pixel values elsewhere in
> V4L.
OK, but it may increase the memory footprint of the FD result.
> It makes sense to make 'confidence' __u32 as well and add a flags attribute to
> indicate the shape.
Sounds good.
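To put a number on the footprint concern, here is a small sketch that
compares only the six fields under discussion. The fd_obj_u16 and
fd_obj_u32 names are made up for this comparison, and reserved[] and the
new flags field are left out since their final sizes are not settled.

```c
#include <assert.h>
#include <stdint.h>

/* The six fields as posted in the patch, 16 bits each. */
struct fd_obj_u16 {
	uint16_t centerx, centery, angle, sizex, sizey, confidence;
};

/* The same six fields widened to 32 bits, as suggested in review. */
struct fd_obj_u32 {
	uint32_t centerx, centery, angle, sizex, sizey, confidence;
};

/* Widening doubles the size of these fields: 12 bytes become 24. */
static_assert(sizeof(struct fd_obj_u16) == 12, "six 16-bit fields");
static_assert(sizeof(struct fd_obj_u32) == 24, "six 32-bit fields");
```

So the cost is 12 extra bytes per described object, before any reserved
space is counted.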
>
>>>
>>>> + __u16 confidence;
>>>> + __u32 reserved[4];
>
> And then
> __u32 reserved[10];
>
> or
> __u32 reserved[2];
>
>>>> +};
>>>> +
>>>> +#define V4L2_FD_HAS_LEFT_EYE 0x1
>>>> +#define V4L2_FD_HAS_RIGHT_EYE 0x2
>>>> +#define V4L2_FD_HAS_MOUTH 0x4
>>>> +#define V4L2_FD_HAS_FACE 0x8
>
> Do you think we could change it to:
>
> #define V4L2_FD_FL_LEFT_EYE (1 << 0)
> #define V4L2_FD_FL_RIGHT_EYE (1 << 1)
> #define V4L2_FD_FL_MOUTH (1 << 2)
> #define V4L2_FD_FL_FACE (1 << 3)
OK
> and add:
>
> #define V4L2_FD_FL_SMILE (1 << 4)
> #define V4L2_FD_FL_BLINK (1 << 5)
Do you have any suggestions about how to describe this kind of
detection?
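For reference, a sketch of how an application might test the flag bits
proposed above. The V4L2_FD_FL_* values are copied from the suggestion
in this thread and are not a merged API; fd_has() is a hypothetical
helper. Whether smile or blink would also need a position of their own
is exactly the open question, and the sketch does not try to answer it.

```c
#include <stdint.h>

/* Flag values as proposed in this thread (not a merged API). */
#define V4L2_FD_FL_LEFT_EYE	(1 << 0)
#define V4L2_FD_FL_RIGHT_EYE	(1 << 1)
#define V4L2_FD_FL_MOUTH	(1 << 2)
#define V4L2_FD_FL_FACE		(1 << 3)
#define V4L2_FD_FL_SMILE	(1 << 4)
#define V4L2_FD_FL_BLINK	(1 << 5)

/* Hypothetical helper: non-zero if every bit in mask is set in flags. */
static int fd_has(uint32_t flags, uint32_t mask)
{
	return (flags & mask) == mask;
}
```

An application would then check, for example,
fd_has(flags, V4L2_FD_FL_FACE | V4L2_FD_FL_SMILE) before reading the
face entry of the result.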
>>>> +
>>>> +/**
>>>> + * struct v4l2_fd_detection - VIDIOC_G_FD_RESULT argument
>>>> + * @flag: return, describe which objects are detected
>>>> + * @left_eye: return, left_eye position if detected
>>>> + * @right_eye: return, right_eye position if detected
>>>> + * @mouth_eye: return, mouth_eye position if detected
>>>
>>> mouth_eye ? ;)
>>
>> Sorry, it should be mouth, :-)
>
> :) also the word return could be omitted.
>
>>
>>>
>>>> + * @face: return, face position if detected
>>>> + */
>>>> +struct v4l2_fd_detection {
>
> How about changing the name to v4l2_fd_object ?
I think the structure is used to describe a single detection result, which
may include several kinds of detected objects, so v4l2_fd_detection sounds
better than v4l2_fd_object(s?).
>
>>>> + __u32 flag;
>>>> + struct v4l2_obj_detection left_eye;
>>>> + struct v4l2_obj_detection right_eye;
>>>> + struct v4l2_obj_detection mouth;
>>>> + struct v4l2_obj_detection face;
>>>
>>> I would do this differently, i.e. put "flag" inside struct v4l2_obj_detection
>>> and then struct v4l2_fd_detection would be simply an array of
>>> struct v4l2_obj_detection, i.e.
>>>
>>> struct v4l2_fd_detection {
>>> unsigned int count;
>>> struct v4l2_obj_detection [V4L2_MAX_FD_OBJECT_NUM];
>>> };
>>>
>>> This might be more flexible, e.g. if in the future some hardware supports
>>> detecting wrinkles, we could easily add that by just defining a new flag:
>>> V4L2_FD_HAS_WRINKLES, etc.
>>
>> This is a bit flexible, but not explicit enough for describing
>> interface, how about reserving these as below for future usage?
>>
>> struct v4l2_fd_detection {
>> __u32 flag;
>> struct v4l2_obj_detection left_eye;
>> struct v4l2_obj_detection right_eye;
>> struct v4l2_obj_detection mouth;
>> struct v4l2_obj_detection face;
>> struct v4l2_obj_detection reserved[4];
>> };
>
> OK, and how about this:
>
> struct v4l2_fd_object {
> struct v4l2_fd_shape left_eye;
> struct v4l2_fd_shape right_eye;
> struct v4l2_fd_shape mouth;
> struct v4l2_fd_shape face;
> __u32 reserved[33];
Why is 'struct v4l2_fd_shape reserved[4]' removed?
> __u32 flags;
> } __packed;
Why is '__packed' needed? It will cause a performance loss, so we should
not use it unless we have a good reason.
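A small sketch of the concern, using stand-in types (fd_shape_sketch,
fd_object_plain and fd_object_packed are hypothetical and much smaller
than the proposed structures) and the GCC/Clang packed attribute. When
every member is a 32-bit value there is no padding for packing to
remove, so the size stays the same. What changes is the alignment of the
structure, which drops to 1, and on targets without efficient unaligned
access the compiler may then fall back to slower byte-wise loads and
stores for the members.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the shape structure; all members are 32 bits wide. */
struct fd_shape_sketch {
	uint32_t x, y, w, h;
};

struct fd_object_plain {
	struct fd_shape_sketch face;
	uint32_t reserved[3];
	uint32_t flags;
};

struct fd_object_packed {
	struct fd_shape_sketch face;
	uint32_t reserved[3];
	uint32_t flags;
} __attribute__((packed));

/* No padding exists to remove, so packing does not shrink the struct. */
static_assert(sizeof(struct fd_object_plain) ==
	      sizeof(struct fd_object_packed), "same size either way");

/* Packing only lowers the guaranteed alignment from 4 bytes to 1. */
static_assert(_Alignof(struct fd_object_plain) == _Alignof(uint32_t),
	      "plain struct keeps natural alignment");
static_assert(_Alignof(struct fd_object_packed) == 1,
	      "packed struct has byte alignment");
```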
thanks,
--
Ming Lei