[BUG] rkvdec-vdpu383-h264: wrong pixels at horizontal de-blocking edges y=4 and y=12
Simon Wright
simon at symple.nz
Thu May 14 23:20:25 PDT 2026
Hi Detlev,
I'm seeing systematic pixel corruption on VDPU383 H.264 decodes on
RK3576 (NanoPi R76S). The decoded luma plane is correct for rows 0–3
and row 8, but wrong for rows 4 and 12 (and the corresponding rows in
every subsequent macroblock row). The error propagates to all
following P-frames.
I confirmed in two independent ways that the mismatch is already
present in the raw V4L2 CAPTURE buffer:
1. GStreamer v4l2slh264dec output compared to avdec_h264 with no
   videoconvert step.
2. A hand-written Rust V4L2 decoder that submits only
   SPS+PPS+SCALING_MATRIX+DECODE_PARAMS (SLICE_PARAMS returns EINVAL
   on VIDIOC_QUERY_EXT_CTRL on this BSP, so the control set is the
   same as GStreamer's actual submission). It shows the identical
   20.3% mismatch at the identical first-diff byte. This rules out
   any GStreamer post-processing or control-submission effect as the
   cause.
Hardware:
Board: NanoPi R76S (RK3576, VDPU383)
Kernel: Linux 7.0.1 (mainline rkvdec-vdpu383-h264.c, unmodified)
GStreamer: 1.28.2 (with v4l2slh264dec from gst-plugins-bad)
Content: 1920×1080 Baseline H.264, SMPTE colour bars, openh264enc
MINIMAL REPRODUCER
------------------
Generate a test file (any H.264 Annex-B with visible content works; I
used openh264enc with SMPTE bars):
gst-launch-1.0 videotestsrc num-buffers=60 pattern=smpte \
! video/x-raw,width=1920,height=1080,framerate=30/1 \
! openh264enc ! h264parse ! filesink location=test.h264
Decode via HW, capture raw NV12:
gst-launch-1.0 filesrc location=test.h264 num-buffers=60 \
! h264parse ! v4l2slh264dec ! 'video/x-raw' \
! filesink location=hw.raw
Decode via SW, capture raw NV12:
gst-launch-1.0 filesrc location=test.h264 num-buffers=60 \
! h264parse ! avdec_h264 ! videoconvert ! 'video/x-raw,format=NV12' \
! filesink location=sw.raw
For a 1920×1080 NV12 frame (frame 0), compare the first 3,110,400
bytes (1920 × 1080 × 3/2), here using the GNU cmp byte limit:
cmp -n 3110400 hw.raw sw.raw
Expected: identical.
Observed: first mismatch at byte 7680 (Y plane, row=4, col=0).
With SMPTE bars (white region at the top), SW Y[row=4] = 0xe9 (correct
white-bar luma).
HW Y[row=4] = 0xaf instead of 0xe9; HW Y[row=3] = 0xe9 (correct).
Overall mismatch rate: 20.3% of bytes in frame 0.
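The row/column figures quoted for a byte offset can be re-derived with a small helper. This is only a sketch and not part of the attached reproducer: the names nv12_frame_size, nv12_locate and Nv12Pos are made up for illustration, and it assumes a tightly packed NV12 frame (stride equal to width, no padding rows). Offsets are treated as zero-based; note that cmp prints one-based byte numbers.

```c
#include <stddef.h>

/* Position of one byte inside a tightly packed NV12 frame (hypothetical type). */
typedef struct {
    char plane;   /* 'Y' for luma, 'C' for the interleaved CbCr plane */
    size_t row;
    size_t col;
} Nv12Pos;

/* Bytes in one packed NV12 frame: full-size Y plane plus half-height CbCr plane. */
static size_t nv12_frame_size(size_t width, size_t height)
{
    return width * height + width * (height / 2);
}

/* Map a zero-based byte offset within one frame to plane, row and column. */
static Nv12Pos nv12_locate(size_t offset, size_t width, size_t height)
{
    size_t y_size = width * height;
    Nv12Pos pos;

    if (offset < y_size) {
        pos.plane = 'Y';
    } else {
        pos.plane = 'C';
        offset -= y_size;   /* offset relative to the start of the CbCr plane */
    }
    pos.row = offset / width;
    pos.col = offset % width;
    return pos;
}
```

For 1920×1080 this gives a frame size of 3,110,400 bytes, and nv12_locate(7680, 1920, 1080) lands on the Y plane at row 4, col 0.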
QUANTIFIED EVIDENCE (frame 0, IDR)
-----------------------------------
SW decode: Y bytes [7680..7695] = e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9 e9
HW decode: Y bytes [7680..7695] = af af af af af af af af af af af af af af af af
First diff: byte 7680 → Y plane row=4, col=0
Error propagation:
Frame 0 (IDR): 20.3% mismatch, first_diff = byte 7680 (Y row=4)
Frame 1 (P): 23.0% mismatch, first_diff = byte 253 (error propagated to row=0)
Frames 5–30 (P): 25–26% mismatch, stable
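One way to check whether the mismatches really cluster on particular rows of each 16-row macroblock row is to bin the differing luma bytes by row mod 16. The helper below is a sketch under assumptions: luma_row_phase_hist and MB_ROWS are illustrative names, and both Y planes are assumed to be tightly packed and of identical size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MB_ROWS 16  /* luma rows per H.264 macroblock row */

/*
 * Count mismatching luma bytes per row phase (row % 16) for two tightly
 * packed Y planes of the same dimensions. hist must hold MB_ROWS entries.
 * Returns the total number of mismatching luma bytes.
 */
static size_t luma_row_phase_hist(const uint8_t *sw, const uint8_t *hw,
                                  size_t width, size_t height,
                                  size_t hist[MB_ROWS])
{
    size_t total = 0;

    memset(hist, 0, MB_ROWS * sizeof(hist[0]));
    for (size_t row = 0; row < height; row++) {
        const uint8_t *a = sw + row * width;
        const uint8_t *b = hw + row * width;

        for (size_t col = 0; col < width; col++) {
            if (a[col] != b[col]) {
                hist[row % MB_ROWS]++;
                total++;
            }
        }
    }
    return total;
}
```

On an intra frame, a histogram dominated by a few phases would support an edge-specific fault; on later P-frames the phases would be expected to smear as the error propagates through prediction.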
ANALYSIS
--------
A diagnostic experiment implicates the filterd_rcb buffer (RCB index
6). Redirecting filterd_rcb buffers 6, 7, 8 to point at the output
buffer produced 98.4% corruption with first diff at row=1, which
indicates the hardware reads p-side pixel context from filterd_rcb
(rather than from the reconstruction buffer) when applying horizontal
deblocking.
Based on the error pattern, our hypothesis is that filterd_rcb uses an
8-row circular index (slot = row mod 8). If so, H.264's horizontal
deblocking edges, which fall every 4 rows within each 16-row macroblock
row, would cause a slot collision that HEVC (which deblocks on an 8×8
grid) does not encounter:
Edge y=4:  p0 from row 3  → slot 3 (zero-initialised on IDR → wrong)
Edge y=8:  p0 from row 7  → slot 7 (written before this edge is reached → correct)
Edge y=12: p0 from row 11 → slot 3 (still holds row-3 data from the y=4 pass → wrong)
This would explain why y=8 decodes correctly while y=4 and y=12 do not.
We don't have hardware documentation for VDPU383, so we can't confirm
whether this is the actual mechanism.
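The slot arithmetic behind this hypothesis can be written down as a tiny model. It only restates the assumed mapping (slot = row mod 8, p0 taken from the row directly above the edge) and says nothing about what the VDPU383 actually does; RCB_SLOTS, rcb_slot, p0_row and edges_share_slot are hypothetical names.

```c
#include <stdbool.h>

#define RCB_SLOTS 8  /* hypothesised ring depth; not confirmed by documentation */

/* Hypothesised ring slot holding a given luma row. */
static unsigned rcb_slot(unsigned row)
{
    return row % RCB_SLOTS;
}

/* Row supplying p0 for the horizontal edge at edge_y (requires edge_y >= 1). */
static unsigned p0_row(unsigned edge_y)
{
    return edge_y - 1;
}

/* True if two edges would fetch p0 from the same hypothesised slot. */
static bool edges_share_slot(unsigned edge_a, unsigned edge_b)
{
    return rcb_slot(p0_row(edge_a)) == rcb_slot(p0_row(edge_b));
}
```

Under this model the edges at y=4 and y=12 both resolve to slot 3, while y=8 resolves to slot 7, matching the pattern of affected rows.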
We tried several register adjustments hoping to change the filterd_rcb
update granularity: ctu_align_wr_en (reg027), buf_empty_en (reg009),
ref strides (reg083–106), and num_views in the SPS table. None changed
the corruption.
Is there a known configuration difference for H.264's narrower
deblocking edges, or a BSP-level fix we've missed?
ATTACHED REPRODUCER
-------------------
The C program below (builds against GStreamer on-device) automates the
comparison for the first decoded frame and reports its mismatch
statistics:
gcc -O0 -g -o h264_hw_vs_sw_dump h264_hw_vs_sw_dump.c \
$(pkg-config --cflags --libs gstreamer-1.0 gstreamer-video-1.0 gstreamer-app-1.0)
./h264_hw_vs_sw_dump /path/to/test.h264
--- BEGIN h264_hw_vs_sw_dump.c ---
/*
 * H.264 HW vs SW byte-level comparison via GStreamer appsink.
 *
 * Decodes the first frame of an H.264 Annex-B file via two paths:
 *   SW: h264parse ! avdec_h264 ! videoconvert ! NV12 appsink
 *   HW: h264parse ! v4l2slh264dec ! NV12 appsink
 *
 * Reports the first divergent byte, the mismatch percentage, and the
 * 16 bytes at the first divergence for both decoders. If the HW bytes
 * differ from the SW bytes, the divergence is on the hardware decode
 * path (rkvdec-vdpu383-h264.c in this setup).
 *
 * Build on device:
 *   gcc -O0 -g -o h264_hw_vs_sw_dump h264_hw_vs_sw_dump.c \
 *   $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-video-1.0 gstreamer-app-1.0)
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <gst/gst.h>
#include <gst/video/video.h>
#include <gst/app/gstappsink.h>

/* One decoded NV12 frame, copied out tightly packed (stride == width). */
typedef struct {
    uint8_t *data;
    int width, height;
    size_t y_size, uv_size, total;
} DecodedFrame;

static void free_frame(DecodedFrame *f)
{
    if (f) {
        free(f->data);
        free(f);
    }
}

/* Copy both NV12 planes row by row, dropping any stride padding. */
static void copy_packed_nv12(DecodedFrame *dst, GstVideoFrame *vf)
{
    const uint8_t *y_src = GST_VIDEO_FRAME_PLANE_DATA(vf, 0);
    int y_stride = GST_VIDEO_FRAME_PLANE_STRIDE(vf, 0);
    const uint8_t *uv_src = GST_VIDEO_FRAME_PLANE_DATA(vf, 1);
    int uv_stride = GST_VIDEO_FRAME_PLANE_STRIDE(vf, 1);
    uint8_t *uv_dst = dst->data + dst->y_size;
    size_t w = (size_t)dst->width;

    for (int row = 0; row < dst->height; row++)
        memcpy(dst->data + row * w, y_src + (size_t)row * y_stride, w);
    for (int row = 0; row < dst->height / 2; row++)
        memcpy(uv_dst + row * w, uv_src + (size_t)row * uv_stride, w);
}

/* Run a pipeline up to its first appsink sample; return a packed copy of it. */
static DecodedFrame *run_pipeline(const char *pipeline_str, const char *label)
{
    DecodedFrame *frame = NULL;
    GstSample *sample = NULL;
    GstElement *sink = NULL;
    GstVideoInfo vinfo;
    GstVideoFrame vframe;
    GError *err = NULL;

    fprintf(stderr, "[%s] pipeline: %s\n", label, pipeline_str);
    GstElement *pipeline = gst_parse_launch(pipeline_str, &err);
    if (!pipeline || err) {
        fprintf(stderr, "[%s] gst_parse_launch: %s\n", label,
                err ? err->message : "unknown");
        g_clear_error(&err);
        if (pipeline)
            gst_object_unref(pipeline);
        return NULL;
    }

    sink = gst_bin_get_by_name(GST_BIN(pipeline), "sink");
    gst_app_sink_set_emit_signals(GST_APP_SINK(sink), FALSE);
    gst_app_sink_set_drop(GST_APP_SINK(sink), FALSE);
    gst_app_sink_set_max_buffers(GST_APP_SINK(sink), 1);
    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    /* Blocks until the first decoded frame; NULL means EOS or error. */
    sample = gst_app_sink_pull_sample(GST_APP_SINK(sink));
    if (!sample) {
        fprintf(stderr, "[%s] no sample\n", label);
        goto out;
    }
    if (!gst_video_info_from_caps(&vinfo, gst_sample_get_caps(sample)) ||
        !gst_video_frame_map(&vframe, &vinfo, gst_sample_get_buffer(sample),
                             GST_MAP_READ)) {
        fprintf(stderr, "[%s] cannot map sample as a video frame\n", label);
        goto out;
    }

    frame = calloc(1, sizeof(*frame));
    if (frame) {
        frame->width = GST_VIDEO_INFO_WIDTH(&vinfo);
        frame->height = GST_VIDEO_INFO_HEIGHT(&vinfo);
        frame->y_size = (size_t)frame->width * frame->height;
        frame->uv_size = (size_t)frame->width * (frame->height / 2);
        frame->total = frame->y_size + frame->uv_size;
        frame->data = malloc(frame->total);
    }
    if (!frame || !frame->data) {
        fprintf(stderr, "[%s] out of memory\n", label);
        free_frame(frame);
        frame = NULL;
    } else {
        copy_packed_nv12(frame, &vframe);
    }
    gst_video_frame_unmap(&vframe);

out:
    if (sample)
        gst_sample_unref(sample);
    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(sink);
    gst_object_unref(pipeline);
    return frame;
}

static void compare_frames(const DecodedFrame *sw, const DecodedFrame *hw)
{
    if (sw->width != hw->width || sw->height != hw->height) {
        fprintf(stderr, "SIZE MISMATCH: SW %dx%d vs HW %dx%d\n",
                sw->width, sw->height, hw->width, hw->height);
        return;
    }

    size_t n = sw->total;
    size_t first_diff = (size_t)-1, diffs = 0;
    for (size_t i = 0; i < n; i++) {
        if (sw->data[i] != hw->data[i]) {
            if (first_diff == (size_t)-1)
                first_diff = i;
            diffs++;
        }
    }
    if (!diffs) {
        fprintf(stderr, "MATCH: HW == SW (%zu bytes)\n", n);
        return;
    }

    /* Translate the flat offset into plane-relative row/column. */
    size_t width = (size_t)sw->width;
    const char *plane = first_diff < sw->y_size ? "Y" : "UV";
    size_t off = first_diff < sw->y_size ? first_diff : first_diff - sw->y_size;

    fprintf(stderr, "MISMATCH: %zu/%zu bytes differ (%.1f%%)\n",
            diffs, n, 100.0 * diffs / n);
    fprintf(stderr, "  First diff: byte %zu -> %s plane offset %zu (row=%zu col=%zu)\n",
            first_diff, plane, off, off / width, off % width);
    fprintf(stderr, "  SW[%zu..]: ", first_diff);
    for (size_t i = first_diff; i < first_diff + 16 && i < n; i++)
        fprintf(stderr, "%02x ", sw->data[i]);
    fprintf(stderr, "\n  HW[%zu..]: ", first_diff);
    for (size_t i = first_diff; i < first_diff + 16 && i < n; i++)
        fprintf(stderr, "%02x ", hw->data[i]);
    fprintf(stderr, "\n");
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <h264_annex_b>\n", argv[0]);
        return 1;
    }
    gst_init(NULL, NULL);

    char sw_pipe[1024], hw_pipe[1024];
    snprintf(sw_pipe, sizeof(sw_pipe),
             "filesrc location=%s ! h264parse ! avdec_h264 ! videoconvert ! "
             "video/x-raw,format=NV12 ! appsink name=sink", argv[1]);
    snprintf(hw_pipe, sizeof(hw_pipe),
             "filesrc location=%s ! h264parse ! v4l2slh264dec ! "
             "video/x-raw,format=NV12 ! appsink name=sink", argv[1]);

    DecodedFrame *sw = run_pipeline(sw_pipe, "SW");
    DecodedFrame *hw = run_pipeline(hw_pipe, "HW");
    if (sw && hw)
        compare_frames(sw, hw);
    free_frame(sw);
    free_frame(hw);
    return 0;
}
--- END h264_hw_vs_sw_dump.c ---
Thanks,
Simon Wright
Symple Solutions, Dunedin, New Zealand