[BUG] rkvdec-vdpu383-h264: wrong pixels at horizontal de-blocking edges y=4 and y=12

Detlev Casanova detlev.casanova at collabora.com
Tue May 19 09:04:14 PDT 2026


Hi Simon,

Thank you for the complete analysis!

I am not able to reproduce this, but I had a similar report before.

I am using a Radxa Rock 4D for testing and development; your report and
the other one were both on a NanoPi board.
It should be the same SoC, but I'm not ruling out something specific to
that board. I'll see if I can get one.

What I don't have is a test showing that the hardware behaves properly 
with the vendor driver.
Is that something you could try on the NanoPi?

The other report also mentioned that the issue was happening 10% of the
time, whereas you seem to see it every time. That could be nothing, though.

The MPP userspace driver is really specific to VDPU383; I can see no
special case for the NanoPi.

We do not have documentation for this decoder, but could you check your
decoder version? It is stored in register 0 and I get 0x38321746.

Regards,
Detlev.

On 5/15/26 02:20, Simon Wright wrote:
> Hi Detlev,
>
> I'm seeing systematic pixel corruption on VDPU383 H.264 decodes on RK3576
> (NanoPi R76S).  The decoded luma plane is correct for rows 0–3 and row 8,
> but wrong for rows 4 and 12 (and the corresponding rows in every subsequent
> macroblock row).  The error propagates to all following P-frames.
>
> I confirmed the mismatch is in the raw V4L2 CAPTURE buffer two independent
> ways:
>
>   1. GStreamer v4l2slh264dec output compared to avdec_h264 with no
>      videoconvert step.
>   2. A hand-written Rust V4L2 decoder that submits only
>      SPS+PPS+SCALING_MATRIX+DECODE_PARAMS (SLICE_PARAMS returns EINVAL on
>      VIDIOC_QUERY_EXT_CTRL on this BSP, so the control set is the same as
>      GStreamer's actual submission): identical 20.3% mismatch at the
>      identical first-diff byte.  This rules out any GStreamer
>      post-processing or control-submission effect as the cause.
>
> Hardware:
>   Board:      NanoPi R76S (RK3576, VDPU383)
>   Kernel:     Linux 7.0.1 (mainline rkvdec-vdpu383-h264.c, unmodified)
>   GStreamer:  1.28.2 (with v4l2slh264dec from gst-plugins-bad)
>   Content:    1920×1080 Baseline H.264, SMPTE colour bars, openh264enc
>
>
> MINIMAL REPRODUCER
> ------------------
>
> Generate a test file (any H.264 Annex-B with visible content works; I used
> openh264enc with SMPTE bars):
>
>   gst-launch-1.0 videotestsrc num-buffers=60 pattern=smpte \
>     ! video/x-raw,width=1920,height=1080,framerate=30/1 \
>     ! openh264enc ! h264parse ! filesink location=test.h264
>
> Decode via HW, capture raw NV12:
>
>   gst-launch-1.0 filesrc location=test.h264 \
>     ! h264parse ! v4l2slh264dec ! 'video/x-raw' \
>     ! filesink location=hw.raw
>
> Decode via SW, capture raw NV12:
>
>   gst-launch-1.0 filesrc location=test.h264 \
>     ! h264parse ! avdec_h264 ! videoconvert ! 'video/x-raw,format=NV12' \
>     ! filesink location=sw.raw
>
> For a 1920×1080 NV12 frame (frame 0), compare the first 3,110,400 bytes:
>
>   cmp -n 3110400 hw.raw sw.raw
>
> Expected: identical.
> Observed: first mismatch at byte 7680 (Y plane, row=4, col=0).
>
> With SMPTE bars (white region at the top), SW Y[row=3] = 0xe9 (correct
> white-bar luma).
> HW Y[row=4] = 0xaf instead of 0xe9; HW Y[row=3] = 0xe9 (correct).
> Overall mismatch rate: 20.3% of bytes in frame 0.
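
The offset arithmetic quoted above (byte 7680 is Y row 4, column 0, and one 1920×1080 frame is 3,110,400 bytes) can be reproduced with a small helper. This is a minimal sketch that assumes a tightly packed NV12 frame (stride equal to width, no padding rows); the names `Nv12Pos`, `nv12_frame_size` and `nv12_locate` are illustrative and do not come from the report or from GStreamer.

```c
#include <assert.h>
#include <stddef.h>

/* Plane a byte offset falls into within one tightly packed NV12 frame. */
typedef enum { NV12_PLANE_Y, NV12_PLANE_UV, NV12_OUT_OF_FRAME } Nv12Plane;

typedef struct {
    Nv12Plane plane;
    size_t row;   /* row within the plane */
    size_t col;   /* byte column within the row */
} Nv12Pos;

/* Size in bytes of one packed NV12 frame: a full-size Y plane followed by a
 * half-height interleaved UV plane (height assumed even). */
static size_t nv12_frame_size(size_t width, size_t height)
{
    return width * height + width * (height / 2);
}

/* Map a byte offset inside a packed NV12 frame (assumption: stride == width)
 * to plane, row and column. */
static Nv12Pos nv12_locate(size_t offset, size_t width, size_t height)
{
    Nv12Pos pos = { NV12_OUT_OF_FRAME, 0, 0 };
    size_t y_size = width * height;

    assert(width > 0);
    if (offset >= nv12_frame_size(width, height))
        return pos;               /* offset lies beyond this frame */
    if (offset < y_size) {
        pos.plane = NV12_PLANE_Y;
    } else {
        pos.plane = NV12_PLANE_UV;
        offset -= y_size;         /* make the offset UV-plane relative */
    }
    pos.row = offset / width;
    pos.col = offset % width;
    return pos;
}
```

Calling `nv12_locate(7680, 1920, 1080)` yields the Y plane, row 4, column 0, which matches the first-diff position reported above. If the capture buffer has a stride larger than the width, the mapping must use the stride instead.
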
>
>
> QUANTIFIED EVIDENCE (frame 0, IDR)
> -----------------------------------
>
>   SW decode:  Y bytes [7680..7695] = e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9
>   HW decode:  Y bytes [7680..7695] = af af af af af af af af af af af af af af af af
>   First diff: byte 7680 → Y plane row=4, col=0
>
> Error propagation:
>   Frame 0 (IDR):  20.3% mismatch, first_diff = byte 7680 (Y row=4)
>   Frame 1 (P):    23.0% mismatch, first_diff = byte 253 (error propagated to row=0)
>   Frames 5–30 (P): 25–26% mismatch, stable
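
One way to check which luma rows carry the errors in each frame is to bin the mismatched bytes by row position within a 16-row macroblock row. This is a sketch under the assumption that both Y planes are tightly packed (stride equal to width); the function name `luma_mismatch_by_mb_row` is made up for illustration and is not part of the attached reproducer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MB_ROWS 16  /* luma rows per H.264 macroblock row */

/* Count differing luma bytes per row position within a macroblock row.
 * sw and hw point at tightly packed Y planes (assumption: stride == width).
 * hist[r] receives the number of mismatched bytes on rows with row % 16 == r.
 * Returns the total number of mismatched luma bytes. */
static size_t luma_mismatch_by_mb_row(const uint8_t *sw, const uint8_t *hw,
                                      size_t width, size_t height,
                                      size_t hist[MB_ROWS])
{
    size_t total = 0;

    assert(sw && hw && hist);
    for (size_t r = 0; r < MB_ROWS; r++)
        hist[r] = 0;

    for (size_t row = 0; row < height; row++) {
        const uint8_t *a = sw + row * width;
        const uint8_t *b = hw + row * width;
        size_t n = 0;

        for (size_t col = 0; col < width; col++)
            n += (a[col] != b[col]);
        hist[row % MB_ROWS] += n;
        total += n;
    }
    return total;
}
```

On frame 0, a histogram concentrated in bins 4 and 12 would be consistent with the row pattern described in the report. On later P-frames the bins are expected to spread out as the error propagates through motion compensation.
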
>
> ANALYSIS
> --------
>
> A diagnostic experiment implicates the filterd_rcb buffer (RCB index 6).
> Redirecting filterd_rcb buffers 6, 7, 8 to point at the output buffer
> produced 98.4% corruption with first diff at row=1, which indicates the
> hardware reads p-side pixel context from filterd_rcb (rather than from the
> reconstruction buffer) when applying horizontal deblocking.
>
> Based on the error pattern, our hypothesis is that filterd_rcb uses an 8-row
> circular index (slot = row mod 8).  If so, H.264's 4-row deblocking
> boundaries within each 16-row macroblock row would cause a slot collision
> that HEVC (with 8-row CTU boundaries) does not encounter:
>
>   Edge y=4:  p0 from row 3  → slot 3  (zero-initialised on IDR → wrong)
>   Edge y=8:  p0 from row 7  → slot 7  (written before this edge is reached → correct)
>   Edge y=12: p0 from row 11 → slot 3  (still holds row-3 data from the y=4 pass → wrong)
>
> This would explain why y=8 decodes correctly while y=4 and y=12 do not.  We
> don't have hardware documentation for VDPU383, so we can't confirm whether
> this is the actual mechanism.
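
The slot arithmetic in the hypothesis above can be written down as a small model. This sketch only encodes the reported guess (slot = row mod 8, p0 taken from the row just above each internal edge); the 8-slot depth, the function names and the "slot reused" criterion are all assumptions for illustration, and nothing here reflects confirmed VDPU383 behaviour.

```c
#include <assert.h>
#include <stdio.h>

#define RCB_SLOTS 8   /* hypothesised circular-buffer depth, from the report */
#define MB_HEIGHT 16  /* luma rows per macroblock */

/* Hypothetical slot mapping: slot = row mod 8. */
static unsigned rcb_slot(unsigned row)
{
    return row % RCB_SLOTS;
}

/* For the horizontal edge at luma row edge_y, p0 is the row just above it. */
static unsigned p0_row(unsigned edge_y)
{
    assert(edge_y > 0);
    return edge_y - 1;
}

/* Returns 1 if an earlier internal edge of the same macroblock already used
 * the slot that this edge's p0 row maps to. */
static int slot_reused_within_mb(unsigned edge_y)
{
    unsigned slot = rcb_slot(p0_row(edge_y));

    for (unsigned y = 4; y < edge_y; y += 4)
        if (rcb_slot(p0_row(y)) == slot)
            return 1;
    return 0;
}

/* Print the slot used by each internal horizontal edge (y = 4, 8, 12). */
static void print_edge_slots(void)
{
    for (unsigned y = 4; y < MB_HEIGHT; y += 4)
        printf("edge y=%2u: p0 row %2u -> slot %u%s\n", y, p0_row(y),
               rcb_slot(p0_row(y)),
               slot_reused_within_mb(y) ? " (slot reused)" : "");
}
```

Under this model, edges y=4 and y=12 both map to slot 3 and y=8 maps to slot 7, which is the collision the report describes. The model does not capture the zero-initialisation that the report gives as the reason the y=4 edge is wrong on the IDR frame.
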
>
> We tried several register adjustments hoping to change the filterd_rcb
> update granularity: ctu_align_wr_en (reg027), buf_empty_en (reg009), ref
> strides (reg083–106), and num_views in the SPS table.  None changed the
> corruption.
>
> Is there a known configuration difference for H.264's narrower deblocking
> edges, or a BSP-level fix we've missed?
>
>
> ATTACHED REPRODUCER
> -------------------
>
> The C program below (builds against GStreamer on-device, ~100 lines)
> automates the comparison and reports mismatch statistics for the first
> decoded frame:
>
>   gcc -O0 -g -o h264_hw_vs_sw_dump h264_hw_vs_sw_dump.c \
>       $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-video-1.0 gstreamer-app-1.0)
>
>   ./h264_hw_vs_sw_dump /path/to/test.h264
>
> --- BEGIN h264_hw_vs_sw_dump.c ---
> /*
>  * H.264 HW vs SW byte-level comparison via GStreamer appsink.
>  *
>  * Decodes one frame of an H.264 Annex-B file via two paths:
>  *   SW:  h264parse ! avdec_h264 ! videoconvert ! NV12 appsink
>  *   HW:  h264parse ! v4l2slh264dec             ! NV12 appsink
>  *
>  * Reports the first divergent byte and the mismatch percentage.  If HW bytes
>  * differ from SW bytes, the bug is in the kernel rkvdec-vdpu383-h264.c
>  * driver.
>  *
>  * Build on device:
>  *   gcc -O0 -g -o h264_hw_vs_sw_dump h264_hw_vs_sw_dump.c \
>  *       $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-video-1.0 gstreamer-app-1.0)
>  */
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <stdint.h>
> #include <unistd.h>
> #include <gst/gst.h>
> #include <gst/video/video.h>
> #include <gst/app/gstappsink.h>
>
> typedef struct {
>     uint8_t *data;
>     int      width, height;
>     size_t   y_size, uv_size, total;
> } DecodedFrame;
>
> static void free_frame(DecodedFrame *f)
> {
>     if (f) { free(f->data); f->data = NULL; }
> }
>
> static DecodedFrame *run_pipeline(const char *pipeline_str, const char *label)
> {
>     fprintf(stderr, "[%s] pipeline: %s\n", label, pipeline_str);
>     GError *err = NULL;
>     GstElement *pipeline = gst_parse_launch(pipeline_str, &err);
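>     if (!pipeline || err) {
>         fprintf(stderr, "[%s] gst_parse_launch: %s\n", label,
>                 err ? err->message : "unknown");
>         /* Release the error and any partially built pipeline. */
>         g_clear_error(&err);
>         if (pipeline) gst_object_unref(pipeline);
>         return NULL;
>     }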
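>     GstElement *sink = gst_bin_get_by_name(GST_BIN(pipeline), "sink");
>     if (!sink) {
>         fprintf(stderr, "[%s] no element named sink\n", label);
>         gst_object_unref(pipeline);
>         return NULL;
>     }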
>     gst_app_sink_set_emit_signals(GST_APP_SINK(sink), FALSE);
>     gst_app_sink_set_drop(GST_APP_SINK(sink), FALSE);
>     gst_app_sink_set_max_buffers(GST_APP_SINK(sink), 1);
>     gst_element_set_state(pipeline, GST_STATE_PLAYING);
>
>     GstSample *sample = gst_app_sink_pull_sample(GST_APP_SINK(sink));
>     if (!sample) {
>         fprintf(stderr, "[%s] no sample\n", label);
>         gst_element_set_state(pipeline, GST_STATE_NULL);
>         gst_object_unref(sink); gst_object_unref(pipeline);
>         return NULL;
>     }
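>     GstBuffer *buf  = gst_sample_get_buffer(sample);
>     GstCaps   *caps = gst_sample_get_caps(sample);
>     GstVideoInfo vinfo;
>     GstVideoFrame vframe;
>     /* Give up if the caps cannot be parsed or the buffer cannot be mapped. */
>     if (!buf || !caps || !gst_video_info_from_caps(&vinfo, caps) ||
>         !gst_video_frame_map(&vframe, &vinfo, buf, GST_MAP_READ)) {
>         fprintf(stderr, "[%s] cannot map decoded frame\n", label);
>         gst_sample_unref(sample);
>         gst_element_set_state(pipeline, GST_STATE_NULL);
>         gst_object_unref(sink); gst_object_unref(pipeline);
>         return NULL;
>     }
>
>     int w = GST_VIDEO_INFO_WIDTH(&vinfo);
>     int h = GST_VIDEO_INFO_HEIGHT(&vinfo);
>     size_t y_size  = (size_t)w * h;
>     size_t uv_size = (size_t)w * (h / 2);
>     DecodedFrame *frame = calloc(1, sizeof(*frame));
>     uint8_t *data = malloc(y_size + uv_size);
>     if (!frame || !data) {
>         fprintf(stderr, "[%s] out of memory\n", label);
>         free(frame); free(data);
>         gst_video_frame_unmap(&vframe);
>         gst_sample_unref(sample);
>         gst_element_set_state(pipeline, GST_STATE_NULL);
>         gst_object_unref(sink); gst_object_unref(pipeline);
>         return NULL;
>     }
>     frame->data  = data;
>     frame->width = w; frame->height = h;
>     frame->y_size = y_size; frame->uv_size = uv_size;
>     frame->total = y_size + uv_size;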
>
>     uint8_t *y_src = GST_VIDEO_FRAME_PLANE_DATA(&vframe, 0);
>     int y_stride   = GST_VIDEO_FRAME_PLANE_STRIDE(&vframe, 0);
>     for (int row = 0; row < h; row++)
>         memcpy(frame->data + row * w, y_src + row * y_stride, w);
>
>     uint8_t *uv_src = GST_VIDEO_FRAME_PLANE_DATA(&vframe, 1);
>     int uv_stride   = GST_VIDEO_FRAME_PLANE_STRIDE(&vframe, 1);
>     uint8_t *uv_dst = frame->data + y_size;
>     for (int row = 0; row < h / 2; row++)
>         memcpy(uv_dst + row * w, uv_src + row * uv_stride, w);
>
>     gst_video_frame_unmap(&vframe);
>     gst_sample_unref(sample);
>     gst_element_set_state(pipeline, GST_STATE_NULL);
>     gst_object_unref(sink); gst_object_unref(pipeline);
>     return frame;
> }
>
> static void compare_frames(DecodedFrame *sw, DecodedFrame *hw)
> {
>     size_t n = sw->total < hw->total ? sw->total : hw->total;
>     size_t first_diff = (size_t)-1, diffs = 0;
>     for (size_t i = 0; i < n; i++) {
>         if (sw->data[i] != hw->data[i]) {
>             if (first_diff == (size_t)-1) first_diff = i;
>             diffs++;
>         }
>     }
>     if (!diffs) {
>         fprintf(stderr, "MATCH: HW == SW (%zu bytes)\n", n);
>         return;
>     }
>     size_t y_size  = (size_t)sw->width * sw->height;
>     const char *plane = first_diff < y_size ? "Y" : "UV";
>     size_t off = first_diff < y_size ? first_diff : first_diff - y_size;
>     fprintf(stderr, "MISMATCH: %zu/%zu bytes differ (%.1f%%)\n",
>             diffs, n, 100.0 * diffs / n);
>     fprintf(stderr, "  First diff: byte %zu -> %s plane offset %zu "
>             "(row=%zu col=%zu)\n",
>             first_diff, plane, off, off / sw->width, off % sw->width);
>     fprintf(stderr, "  SW[%zu..]: ", first_diff);
>     for (size_t i = first_diff; i < first_diff+16 && i < n; i++)
>         fprintf(stderr, "%02x ", sw->data[i]);
>     fprintf(stderr, "\n  HW[%zu..]: ", first_diff);
>     for (size_t i = first_diff; i < first_diff+16 && i < n; i++)
>         fprintf(stderr, "%02x ", hw->data[i]);
>     fprintf(stderr, "\n");
> }
>
> int main(int argc, char **argv)
> {
>     if (argc < 2) {
>         fprintf(stderr, "Usage: %s <h264_annex_b>\n", argv[0]);
>         return 1;
>     }
>     gst_init(NULL, NULL);
>     char sw_pipe[1024], hw_pipe[1024];
>     snprintf(sw_pipe, sizeof(sw_pipe),
>         "filesrc location=%s ! h264parse ! avdec_h264 ! videoconvert ! "
>         "video/x-raw,format=NV12 ! appsink name=sink", argv[1]);
>     snprintf(hw_pipe, sizeof(hw_pipe),
>         "filesrc location=%s ! h264parse ! v4l2slh264dec ! "
>         "video/x-raw,format=NV12 ! appsink name=sink", argv[1]);
>
>     DecodedFrame *sw = run_pipeline(sw_pipe, "SW");
>     DecodedFrame *hw = run_pipeline(hw_pipe, "HW");
>     if (sw && hw) compare_frames(sw, hw);
>     if (sw) { free_frame(sw); free(sw); }
>     if (hw) { free_frame(hw); free(hw); }
>     return 0;
> }
> --- END h264_hw_vs_sw_dump.c ---
>
> Thanks,
> Simon Wright
> Symple Solutions, Dunedin, New Zealand
>




More information about the Linux-rockchip mailing list