[BUG] rkvdec-vdpu383-h264: wrong pixels at horizontal de-blocking edges y=4 and y=12
Detlev Casanova
detlev.casanova at collabora.com
Tue May 19 09:04:14 PDT 2026
Hi Simon,
Thank you for the complete analysis!
I am not able to reproduce this, but I had a similar report before.
I am using a Radxa Rock 4D for testing and development; yours and the
other report were both on a NanoPi board.
It should be the same SoC, but I'm not ruling out something specific to
that board. I'll see if I can get one.
What I don't have is a test showing that the hardware behaves properly
with the vendor driver.
Is that something you could try on the NanoPi?
The other report also mentioned that the issue happened 10% of the
time, whereas you seem to see it every time. That could be nothing, though.
The MPP userspace driver is really specific to VDPU383; there is no
special case I can see for the NanoPi.
We do not have documentation for this decoder, but could you check your
decoder version? It is stored in register 0, and I get 0x38321746.
Regards,
Detlev.
On 5/15/26 02:20, Simon Wright wrote:
> Hi Detlev,
>
> I'm seeing systematic pixel corruption on VDPU383 H.264 decodes on
> RK3576 (NanoPi R76S). The decoded luma plane is correct for rows 0–3
> and row 8, but wrong for rows 4 and 12 (and the corresponding rows in
> every subsequent macroblock row). The error propagates to all
> following P-frames.
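
One way to check the "rows 4 and 12 of every macroblock row" pattern across a whole frame is to bin the mismatching luma rows by row % 16. The following is a minimal sketch, not part of the original report: the function name mb_row_histogram is hypothetical, and it assumes both luma planes are tightly packed (stride equal to width).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Count mismatching luma rows, binned by their position inside a
 * 16-row macroblock row (row % 16).
 * Assumption of this sketch: both planes are tightly packed,
 * i.e. stride == width.
 */
static void mb_row_histogram(const uint8_t *sw_y, const uint8_t *hw_y,
                             size_t width, size_t height,
                             size_t bins[16])
{
    assert(sw_y && hw_y && bins && width > 0);

    for (size_t i = 0; i < 16; i++)
        bins[i] = 0;

    for (size_t row = 0; row < height; row++) {
        const uint8_t *a = sw_y + row * width;
        const uint8_t *b = hw_y + row * width;

        /* One hit per row that differs anywhere along its width. */
        if (memcmp(a, b, width) != 0)
            bins[row % 16]++;
    }
}
```

If the description above holds for the entire frame, the non-zero bins should cluster at the affected row positions, while bins 0–3 and 8 stay at zero.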
>
> I confirmed that the mismatch is in the raw V4L2 CAPTURE buffer in two
> independent ways:
>
> 1. GStreamer v4l2slh264dec output compared to avdec_h264 with no
>    videoconvert step.
> 2. A hand-written Rust V4L2 decoder that submits only
>    SPS+PPS+SCALING_MATRIX+DECODE_PARAMS (SLICE_PARAMS returns EINVAL
>    on VIDIOC_QUERY_EXT_CTRL on this BSP, so the control set is the
>    same as GStreamer's actual submission): identical 20.3% mismatch
>    at the identical first-diff byte. This rules out any GStreamer
>    post-processing or control-submission effect as the cause.
>
> Hardware:
> Board: NanoPi R76S (RK3576, VDPU383)
> Kernel: Linux 7.0.1 (mainline rkvdec-vdpu383-h264.c, unmodified)
> GStreamer: 1.28.2 (with v4l2slh264dec from gst-plugins-bad)
> Content: 1920×1080 Baseline H.264, SMPTE colour bars, openh264enc
>
>
> MINIMAL REPRODUCER
> ------------------
>
> Generate a test file (any H.264 Annex-B with visible content works; I
> used openh264enc with SMPTE bars):
>
> gst-launch-1.0 videotestsrc num-buffers=60 pattern=smpte \
> ! video/x-raw,width=1920,height=1080,framerate=30/1 \
> ! openh264enc ! h264parse ! filesink location=test.h264
>
> Decode via HW, capture raw NV12:
>
> gst-launch-1.0 filesrc location=test.h264 num-buffers=60 \
> ! h264parse ! v4l2slh264dec ! 'video/x-raw' \
> ! filesink location=hw.raw
>
> Decode via SW, capture raw NV12:
>
> gst-launch-1.0 filesrc location=test.h264 num-buffers=60 \
> ! h264parse ! avdec_h264 ! videoconvert ! 'video/x-raw,format=NV12' \
> ! filesink location=sw.raw
>
> For a 1920×1080 NV12 frame (frame 0), compare the first 3,110,400 bytes:
>
> cmp hw.raw sw.raw
>
> Expected: identical.
> Observed: first mismatch at byte 7680 (Y plane, row=4, col=0).
>
> With SMPTE bars (white region at the top), SW Y[row=3] = 0xe9 (correct
> white-bar luma).
> HW Y[row=4] = 0xaf instead of 0xe9; HW Y[row=3] = 0xe9 (correct).
> Overall mismatch rate: 20.3% of bytes in frame 0.
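
The mapping from a byte offset to a plane position used above (byte 7680 is Y row 4, col 0 at width 1920) can be sketched as follows. The struct and function names are hypothetical, the frame is assumed to be tightly packed NV12, and the offset is 0-based. Note that cmp reports byte numbers starting at 1, so its output needs 1 subtracted before being passed in.

```c
#include <assert.h>
#include <stddef.h>

/* Position of a byte inside a tightly packed NV12 frame. */
struct nv12_pos {
    int    is_uv;  /* 0: Y plane, 1: interleaved UV plane */
    size_t row;    /* row within that plane */
    size_t col;    /* byte column within that row */
};

/*
 * Map a 0-based byte offset to plane/row/col.
 * Assumption of this sketch: no padding, so the Y plane is
 * width * height bytes and each UV row is also width bytes.
 */
static struct nv12_pos nv12_locate(size_t offset, size_t width, size_t height)
{
    assert(width > 0 && offset < width * height * 3 / 2);

    size_t y_size = width * height;
    struct nv12_pos p;

    p.is_uv = offset >= y_size;
    size_t off = p.is_uv ? offset - y_size : offset;
    p.row = off / width;
    p.col = off % width;
    return p;
}
```

For a 1920×1080 frame, nv12_locate(7680, 1920, 1080) lands on the Y plane at row 4, col 0, matching the first mismatch reported above.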
>
>
> QUANTIFIED EVIDENCE (frame 0, IDR)
> -----------------------------------
>
> SW decode: Y bytes [7680..7695] = e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9
> HW decode: Y bytes [7680..7695] = af af af af af af af af af af af af af af af af
> First diff: byte 7680 → Y plane row=4, col=0
>
> Error propagation:
> Frame 0 (IDR): 20.3% mismatch, first_diff = byte 7680 (Y row=4)
> Frame 1 (P): 23.0% mismatch, first_diff = byte 253 (error propagated to row=0)
> Frames 5–30 (P): 25–26% mismatch, stable
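
Per-frame figures like these can be derived from the two raw dumps once they are loaded into memory. The following is a minimal sketch under assumptions: the names frame_diff, diff_frame and mismatch_percent are hypothetical, frames are assumed to be stored back to back with a fixed size (3,110,400 bytes for 1920×1080 NV12), and the caller is assumed to guarantee that both buffers hold at least frame_idx + 1 frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct frame_diff {
    size_t diffs;       /* number of differing bytes in the frame */
    size_t first_diff;  /* offset within the frame, or frame_size if none */
};

/*
 * Compare frame number frame_idx of two raw dumps held in memory.
 * The caller must make sure both buffers contain that frame.
 */
static struct frame_diff diff_frame(const uint8_t *sw, const uint8_t *hw,
                                    size_t frame_size, size_t frame_idx)
{
    assert(sw && hw && frame_size > 0);

    const uint8_t *a = sw + frame_idx * frame_size;
    const uint8_t *b = hw + frame_idx * frame_size;
    struct frame_diff d = { 0, frame_size };

    for (size_t i = 0; i < frame_size; i++) {
        if (a[i] != b[i]) {
            if (d.first_diff == frame_size)
                d.first_diff = i;   /* remember only the first one */
            d.diffs++;
        }
    }
    return d;
}

/* Share of differing bytes in the frame, in percent. */
static double mismatch_percent(struct frame_diff d, size_t frame_size)
{
    return 100.0 * (double)d.diffs / (double)frame_size;
}
```

Calling diff_frame for frame indices 0, 1, 2, ... yields one mismatch count and first-diff offset per frame, which is the form of the table above.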
>
> ANALYSIS
> --------
>
> A diagnostic experiment implicates the filterd_rcb buffer (RCB index
> 6). Redirecting filterd_rcb buffers 6, 7, 8 to point at the output
> buffer produced 98.4% corruption with first diff at row=1, which
> indicates the hardware reads p-side pixel context from filterd_rcb
> (rather than from the reconstruction buffer) when applying horizontal
> deblocking.
>
> Based on the error pattern, our hypothesis is that filterd_rcb uses an
> 8-row circular index (slot = row mod 8). If so, H.264's 4-row
> deblocking boundaries within each 16-row macroblock row would cause a
> slot collision that HEVC (with 8-row CTU boundaries) does not
> encounter:
>
> Edge y=4: p0 from row 3 → slot 3 (zero-initialised on IDR → wrong)
> Edge y=8: p0 from row 7 → slot 7 (written before this edge is reached → correct)
> Edge y=12: p0 from row 11 → slot 3 (still holds row-3 data from the y=4 pass → wrong)
>
> This would explain why y=8 decodes correctly while y=4 and y=12 do
> not. We don't have hardware documentation for VDPU383, so we can't
> confirm whether this is the actual mechanism.
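
The slot arithmetic behind this hypothesis can be written down as a small model. This is only a sketch of the hypothesised indexing (slot = row mod 8), not confirmed hardware behaviour; RCB_SLOTS and all function names are hypothetical. It captures the aliasing between the y=4 and y=12 edges, and says nothing about the zero-initialisation or write timing.

```c
#include <assert.h>

/* Hypothesised depth of the circular buffer, taken from the report. */
#define RCB_SLOTS 8u

/* Slot a luma row would occupy under the slot = row mod 8 hypothesis. */
static unsigned rcb_slot(unsigned row)
{
    return row % RCB_SLOTS;
}

/* Row that supplies p0 for the horizontal edge at luma row edge_y. */
static unsigned p0_row(unsigned edge_y)
{
    assert(edge_y > 0);
    return edge_y - 1;
}

/*
 * Two edges collide when their p0 samples come from different rows
 * that nevertheless map to the same slot.
 */
static int edges_collide(unsigned edge_a, unsigned edge_b)
{
    unsigned ra = p0_row(edge_a);
    unsigned rb = p0_row(edge_b);

    return ra != rb && rcb_slot(ra) == rcb_slot(rb);
}
```

Under this model, edges_collide(4, 12) is true because rows 3 and 11 both map to slot 3, while the y=8 edge (row 7, slot 7) collides with neither of them.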
>
> We tried several register adjustments hoping to change the filterd_rcb
> update granularity: ctu_align_wr_en (reg027), buf_empty_en (reg009),
> ref strides (reg083–106), and num_views in the SPS table. None changed
> the corruption.
>
> Is there a known configuration difference for H.264's narrower
> deblocking edges, or a BSP-level fix we've missed?
>
>
> ATTACHED REPRODUCER
> -------------------
>
> The C program below (builds against GStreamer on-device, ~100 lines)
> automates the comparison and produces per-frame mismatch statistics:
>
> gcc -O0 -g -o h264_hw_vs_sw_dump h264_hw_vs_sw_dump.c \
> $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-video-1.0 gstreamer-app-1.0)
>
> ./h264_hw_vs_sw_dump /path/to/test.h264
>
> --- BEGIN h264_hw_vs_sw_dump.c ---
> /*
> * H.264 HW vs SW byte-level comparison via GStreamer appsink.
> *
> * Decodes one frame of an H.264 Annex-B file via two paths:
> * SW: h264parse ! avdec_h264 ! videoconvert ! NV12 appsink
> * HW: h264parse ! v4l2slh264dec ! NV12 appsink
> *
> * Reports first divergent byte, mismatch percentage, and unique Y
> * values for both decoders. If HW bytes differ from SW bytes, the bug
> * is in the kernel rkvdec-vdpu383-h264.c driver.
> *
> * Build on device:
> * gcc -O0 -g -o h264_hw_vs_sw_dump h264_hw_vs_sw_dump.c \
> * $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-video-1.0 gstreamer-app-1.0)
> */
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <stdint.h>
> #include <unistd.h>
> #include <gst/gst.h>
> #include <gst/video/video.h>
> #include <gst/app/gstappsink.h>
>
> typedef struct {
> uint8_t *data;
> int width, height;
> size_t y_size, uv_size, total;
> } DecodedFrame;
>
> static void free_frame(DecodedFrame *f) { if (f) { free(f->data);
> f->data = NULL; } }
>
> static DecodedFrame *run_pipeline(const char *pipeline_str, const char *label)
> {
> fprintf(stderr, "[%s] pipeline: %s\n", label, pipeline_str);
> GError *err = NULL;
> GstElement *pipeline = gst_parse_launch(pipeline_str, &err);
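> if (!pipeline || err) {
> fprintf(stderr, "[%s] gst_parse_launch: %s\n", label,
> err ? err->message : "unknown");
> /* Release the error and any partially built pipeline. */
> g_clear_error(&err);
> if (pipeline) gst_object_unref(pipeline);
> return NULL;
> }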
> GstElement *sink = gst_bin_get_by_name(GST_BIN(pipeline), "sink");
> gst_app_sink_set_emit_signals(GST_APP_SINK(sink), FALSE);
> gst_app_sink_set_drop(GST_APP_SINK(sink), FALSE);
> gst_app_sink_set_max_buffers(GST_APP_SINK(sink), 1);
> gst_element_set_state(pipeline, GST_STATE_PLAYING);
>
> GstSample *sample = gst_app_sink_pull_sample(GST_APP_SINK(sink));
> if (!sample) {
> fprintf(stderr, "[%s] no sample\n", label);
> gst_element_set_state(pipeline, GST_STATE_NULL);
> gst_object_unref(sink); gst_object_unref(pipeline);
> return NULL;
> }
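> GstBuffer *buf = gst_sample_get_buffer(sample);
> GstCaps *caps = gst_sample_get_caps(sample);
> GstVideoInfo vinfo;
> GstVideoFrame vframe;
> /* Bail out if the caps cannot be parsed or the buffer cannot be mapped. */
> if (!buf || !caps || !gst_video_info_from_caps(&vinfo, caps) ||
> !gst_video_frame_map(&vframe, &vinfo, buf, GST_MAP_READ)) {
> fprintf(stderr, "[%s] cannot map video frame\n", label);
> gst_sample_unref(sample);
> gst_element_set_state(pipeline, GST_STATE_NULL);
> gst_object_unref(sink); gst_object_unref(pipeline);
> return NULL;
> }
>
> int w = GST_VIDEO_INFO_WIDTH(&vinfo);
> int h = GST_VIDEO_INFO_HEIGHT(&vinfo);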
>
> size_t y_size = (size_t)w * h;
> size_t uv_size = (size_t)w * (h / 2);
> DecodedFrame *frame = calloc(1, sizeof(*frame));
> frame->data = malloc(y_size + uv_size);
> frame->width = w; frame->height = h;
> frame->y_size = y_size; frame->uv_size = uv_size;
> frame->total = y_size + uv_size;
>
> uint8_t *y_src = GST_VIDEO_FRAME_PLANE_DATA(&vframe, 0);
> int y_stride = GST_VIDEO_FRAME_PLANE_STRIDE(&vframe, 0);
> for (int row = 0; row < h; row++)
> memcpy(frame->data + row * w, y_src + row * y_stride, w);
>
> uint8_t *uv_src = GST_VIDEO_FRAME_PLANE_DATA(&vframe, 1);
> int uv_stride = GST_VIDEO_FRAME_PLANE_STRIDE(&vframe, 1);
> uint8_t *uv_dst = frame->data + y_size;
> for (int row = 0; row < h / 2; row++)
> memcpy(uv_dst + row * w, uv_src + row * uv_stride, w);
>
> gst_video_frame_unmap(&vframe);
> gst_sample_unref(sample);
> gst_element_set_state(pipeline, GST_STATE_NULL);
> gst_object_unref(sink); gst_object_unref(pipeline);
> return frame;
> }
>
> static void compare_frames(DecodedFrame *sw, DecodedFrame *hw)
> {
> size_t n = sw->total < hw->total ? sw->total : hw->total;
> size_t first_diff = (size_t)-1, diffs = 0;
> for (size_t i = 0; i < n; i++) {
> if (sw->data[i] != hw->data[i]) {
> if (first_diff == (size_t)-1) first_diff = i;
> diffs++;
> }
> }
> if (!diffs) {
> fprintf(stderr, "MATCH: HW == SW (%zu bytes)\n", n);
> return;
> }
> size_t y_size = (size_t)sw->width * sw->height;
> const char *plane = first_diff < y_size ? "Y" : "UV";
> size_t off = first_diff < y_size ? first_diff : first_diff - y_size;
> fprintf(stderr, "MISMATCH: %zu/%zu bytes differ (%.1f%%)\n",
> diffs, n, 100.0*diffs/n);
> fprintf(stderr, " First diff: byte %zu -> %s plane offset %zu "
> "(row=%zu col=%zu)\n",
> first_diff, plane, off, off / sw->width, off % sw->width);
> fprintf(stderr, " SW[%zu..]: ", first_diff);
> for (size_t i = first_diff; i < first_diff+16 && i < n; i++)
> fprintf(stderr, "%02x ", sw->data[i]);
> fprintf(stderr, "\n HW[%zu..]: ", first_diff);
> for (size_t i = first_diff; i < first_diff+16 && i < n; i++)
> fprintf(stderr, "%02x ", hw->data[i]);
> fprintf(stderr, "\n");
> }
>
> int main(int argc, char **argv)
> {
> if (argc < 2) { fprintf(stderr, "Usage: %s <h264_annex_b>\n",
> argv[0]); return 1; }
> gst_init(NULL, NULL);
> char sw_pipe[1024], hw_pipe[1024];
> snprintf(sw_pipe, sizeof(sw_pipe),
> "filesrc location=%s ! h264parse ! avdec_h264 ! videoconvert ! "
> "video/x-raw,format=NV12 ! appsink name=sink", argv[1]);
> snprintf(hw_pipe, sizeof(hw_pipe),
> "filesrc location=%s ! h264parse ! v4l2slh264dec ! "
> "video/x-raw,format=NV12 ! appsink name=sink", argv[1]);
>
> DecodedFrame *sw = run_pipeline(sw_pipe, "SW");
> DecodedFrame *hw = run_pipeline(hw_pipe, "HW");
> if (sw && hw) compare_frames(sw, hw);
> if (sw) { free_frame(sw); free(sw); }
> if (hw) { free_frame(hw); free(hw); }
> return 0;
> }
> --- END h264_hw_vs_sw_dump.c ---
>
> Thanks,
> Simon Wright
> Symple Solutions, Dunedin, New Zealand
>