[RFC PATCH] media: hantro: respect data_offset in vb2_plane
Nicolas Dufresne
nicolas at ndufresne.ca
Sat Apr 1 08:28:04 PDT 2023
Le vendredi 31 mars 2023 à 18:24 +0200, Michael Tretter a écrit :
> Hi Nicolas,
>
> On Fri, 31 Mar 2023 11:07:44 -0400, Nicolas Dufresne wrote:
> > Le lundi 27 mars 2023 à 15:23 +0200, Michael Tretter a écrit :
> > > The vb2_plane in the vb2_v4l2_buffer may have a data_offset, which is
> > > written by user space to tell the driver that the data starts at an
> > > offset into the buffer. Currently the hantro driver assumes that the
> > > data starts directly at the base address of the buffer.
> > >
> > > Add the data_offset to the plane dma_address to make sure that the
> > > encoder actually reads the plane data if the user space put the plane
> > > data at an offset into the buffer. Otherwise the encoded data may not be
> > > the data that userspace expected to be encoded.
> > >
> >
> > The data_offset for this purpose have limited use, and this usage is only valid
> > for encoders (OUTPUT queues). Would it be possible to state this clearly in the
> > subject ?
>
> I'll update the subject.
>
> >
> > >
> > > Signed-off-by: Michael Tretter <m.tretter at pengutronix.de>
> > > ---
> > > Hi,
> > >
> > > Most other drivers also assume that the address returned by
> > > vb2_dma_contig_plane_dma_addr() is the start of the plane data. Maybe it
> > > would be better to change vb2_dma_contig_plane_dma_addr() to already add
> > > the data_offset to the plane address. However, there are a few drivers
> > > that already have a helper that respects the data_offset, but that seems
> > > to be the exception rather than the rule.
> >
> > I had this discussion recently, and its a bit hardware specific. Some HW may
> > support any random offset, in which case you will program the original (and
> > aligned) address, and program the offset in the HW. But some HW may not, in
> > which case you need to add alignment validation in the driver.
>
> This makes sense, but it's still a bit surprising that some drivers don't
> respect the data_offset. Maybe there should be a check that warns or returns
> an error to userspace if the data_offset is set but ignored by the driver?
> With such an error, the behavior of vb2_dma_contig_plane_dma_addr() would be
> obvious. OTOH, the warning would have to be added in the driver, too.
I would not be surprised if this is yet another "after the fact" API, or that we
didn't realize that by adding it, it became mandatory, Hans has been adding
capability flags for all the old one. My suggestion, cause there is no way we
can fix all the drivers is to introduce:
V4L2_BUF_CAP_SUPPORTS_DATA_OFFSET
At least we could fix userspace to avoid it if its not set. By the way, you'd be
impressed how many time we "fixed" GPU and Display drivers because they forgot
to take an offset into account.
>
> >
> > >
> > > What I am actually trying to achieve is to import a V4L2_PIX_FMT_NV12
> > > buffer from a Rockchip RGA2 (which doesn't support the multi-planar API)
> > > as a V4L2_PIX_FMT_NV12M buffer into the Hantro JPEG encoder (which
> > > doesn't support V4L2_PIX_FMT_NV12). Solving this by importing the same
> > > FD for each plane with a respective offset is how one would import such
> > > a buffer with the DRM API. Please tell me, if my approach is wrong and,
> > > if so, how I should solve it differently.
> >
> > The approach is fine, this is the only valid usage of data_offset when passed by
> > userspace. Though, you are making your live much more difficult then needed,
> > adding NV12 (contiguous) support to the jpeg encoder should be fairly easy too.
>
> It is fairly easy to add this in the driver, but the negotiation in GStreamer
> is becomes a bit cumbersome as the caps don't distinguish between
> V4L2_PIX_FMT_NV12 and V4L2_PIX_FMT_NV12M and the encoder will prefer
> V4L2_PIX_FMT_NV12M if NV12 is negotiated. Letting userspace explicitly state
> the offset of the second plane makes things easier as well. Thus, I discarded
> the option of adding NV12 contiguous support to this encoder.
>
By the way, currently encoder imports only works by pure luck. You have to fix
GStreamer video encoder, the authors (yes I'm on of them) forgot to apply the
alignment from the imported buffers, so sometimes that padding will screw-up
your image. In short, it completely ignores the incoming bytesperline, which is
a terrible idea. Guillaume Desmottes introduced a try_import() function that
update the FMT and SELECTION with the padding.
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/blob/main/subprojects/gst-plugins-good/sys/v4l2/gstv4l2object.c#L5372
For CAPTURE queues (his use case was v4l2src), this is handled by the buffer
pool already:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/blob/main/subprojects/gst-plugins-good/sys/v4l2/gstv4l2bufferpool.c#L796
But for OUTPUT queue (sink pads), it has to be handled by the element, like
v4l2transform does today (in theory, we should also re-validate every buffer, in
case it changes, but V4L2 allocator makes the fallback impractical):
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/blob/main/subprojects/gst-plugins-good/sys/v4l2/gstv4l2transform.c#L909
But you'll see this is missing here:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/blob/main/subprojects/gst-plugins-good/sys/v4l2/gstv4l2videoenc.c#L742
DMAbuf importation in this plugin is highly experimental, and missing a lot of
bits (and may also fail to detect some of the V4L2 API limitations). So far,
folks have been adding bits they need, the challenge in adding everything is
that most folks can't actually test it. And so you are warned now, you'll know
what to do if it fails randomly on you with a completely broken image.
I personally think its great that you fix that data_offset here, but clearly
GStreamer use of it will generally not work until we patch all drivers that
literally don't support it to fail QBUF. But that won't make it great, as
userspace won't have a clue if it could have switch NV12M -> NV12.
I think updating this function, which purpose is exactly that, to prefer
changing the format to NV12 rather then padding the same dmabuf N times through
NV12M would improve a lot the general import support.
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/blob/main/subprojects/gst-plugins-good/sys/v4l2/gstv4l2object.c#L4766
Hope this is useful information for the success of your project. In any cases,
your patches are part of the right direction, and one step further.
>
> >
> > >
> > > Michael
> > > ---
> > > .../verisilicon/rockchip_vpu2_hw_jpeg_enc.c | 24 +++++++++++++------
> > > 1 file changed, 17 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/media/platform/verisilicon/rockchip_vpu2_hw_jpeg_enc.c b/drivers/media/platform/verisilicon/rockchip_vpu2_hw_jpeg_enc.c
> > > index 8395c4d48dd0..05df7768187d 100644
> > > --- a/drivers/media/platform/verisilicon/rockchip_vpu2_hw_jpeg_enc.c
> > > +++ b/drivers/media/platform/verisilicon/rockchip_vpu2_hw_jpeg_enc.c
> > > @@ -32,6 +32,16 @@
> > >
> > > #define VEPU_JPEG_QUANT_TABLE_COUNT 16
> > >
> > > +static dma_addr_t rockchip_vpu2_plane_dma_addr(struct vb2_buffer *vb,
> > > + unsigned int plane_no)
> > > +{
> > > + struct vb2_v4l2_buffer *v4l2_buf = to_vb2_v4l2_buffer(vb);
> > > + dma_addr_t base = vb2_dma_contig_plane_dma_addr(vb, plane_no);
> > > + unsigned int offset = v4l2_buf->planes[plane_no].data_offset;
> > > +
> > > + return base + offset;
> >
> > As the offset is not being passed as a control to the HW, you need to go back to
> > the datasheet, and figure-out what is the required alignment. This aligment then
> > needs to be validated. I don't know the exact answer for you, we don't check the
> > dma address because they are all page aligned, so its not needed.
> >
> > If the alignment is not acceptable, you have to fail synchronously in
> > VIDIOC_QBUF, so that userspace knows and can fallback to copy.
>
> I will add the necessary checks and send a proper version of the patch.
>
> Thanks a lot!
>
> Michael
>
Thanks to you,
Nicolas
>
>
> >
> > > +}
> > > +
> > > static void rockchip_vpu2_set_src_img_ctrl(struct hantro_dev *vpu,
> > > struct hantro_ctx *ctx)
> > > {
> > > @@ -79,23 +89,23 @@ static void rockchip_vpu2_jpeg_enc_set_buffers(struct hantro_dev *vpu,
> > >
> > > WARN_ON(pix_fmt->num_planes > 3);
> > >
> > > - vepu_write_relaxed(vpu, vb2_dma_contig_plane_dma_addr(dst_buf, 0) +
> > > + vepu_write_relaxed(vpu, rockchip_vpu2_plane_dma_addr(dst_buf, 0) +
> > > ctx->vpu_dst_fmt->header_size,
> > > VEPU_REG_ADDR_OUTPUT_STREAM);
> > > vepu_write_relaxed(vpu, size_left, VEPU_REG_STR_BUF_LIMIT);
> > >
> > > if (pix_fmt->num_planes == 1) {
> > > - src[0] = vb2_dma_contig_plane_dma_addr(src_buf, 0);
> > > + src[0] = rockchip_vpu2_plane_dma_addr(src_buf, 0);
> > > vepu_write_relaxed(vpu, src[0], VEPU_REG_ADDR_IN_PLANE_0);
> > > } else if (pix_fmt->num_planes == 2) {
> > > - src[0] = vb2_dma_contig_plane_dma_addr(src_buf, 0);
> > > - src[1] = vb2_dma_contig_plane_dma_addr(src_buf, 1);
> > > + src[0] = rockchip_vpu2_plane_dma_addr(src_buf, 0);
> > > + src[1] = rockchip_vpu2_plane_dma_addr(src_buf, 1);
> > > vepu_write_relaxed(vpu, src[0], VEPU_REG_ADDR_IN_PLANE_0);
> > > vepu_write_relaxed(vpu, src[1], VEPU_REG_ADDR_IN_PLANE_1);
> > > } else {
> > > - src[0] = vb2_dma_contig_plane_dma_addr(src_buf, 0);
> > > - src[1] = vb2_dma_contig_plane_dma_addr(src_buf, 1);
> > > - src[2] = vb2_dma_contig_plane_dma_addr(src_buf, 2);
> > > + src[0] = rockchip_vpu2_plane_dma_addr(src_buf, 0);
> > > + src[1] = rockchip_vpu2_plane_dma_addr(src_buf, 1);
> > > + src[2] = rockchip_vpu2_plane_dma_addr(src_buf, 2);
> > > vepu_write_relaxed(vpu, src[0], VEPU_REG_ADDR_IN_PLANE_0);
> > > vepu_write_relaxed(vpu, src[1], VEPU_REG_ADDR_IN_PLANE_1);
> > > vepu_write_relaxed(vpu, src[2], VEPU_REG_ADDR_IN_PLANE_2);
> >
> >
More information about the linux-arm-kernel
mailing list