[PATCH 00/16] media: platform: rga: Add RGA3 support
Sven Püschel
s.pueschel at pengutronix.de
Fri Oct 10 01:33:21 PDT 2025
Hi Nicolas,
On 10/7/25 8:06 PM, Nicolas Dufresne wrote:
> Hi,
>
> Le mardi 07 octobre 2025 à 10:31 +0200, Sven Püschel a écrit :
>> This series adds support for the Raster Graphic Acceleration 3 (RGA3)
>> peripheral, which is included in the RK3588 SoC. Unlike the RGA2 it
>> can use the existing rockchip-iommu-v2 driver to handle iommu mappings.
>> Also the RK3588 contains two independent RGA3 cores.
> Thanks for working on this.
>
>> Only scaling and format conversions between common 8bit RGB/YUV formats
>> are implemented. Also the color space conversion is fixed to BT601F.
>> This already allows a practical usage of the RGA3.
> This seems quite limiting, can we expect an update on this, can't be that hard
> to fully implement.
Sorry, but currently there is no need to have a fully featured
implementation from our side. As the datasheet mentions that the RGA3
should do 4 or 2 pixel/cycle depending on the operation
>> This was tested on a Radxa Rock 5T. With the increased clock speeds in
>> the devicetree around 160 fps were measured when scaling and converting
> This is quite vague, I've checked the patch and you didn't extend either there.
> Is that an overclock or was it miss-configured ? Does RK implement a devfreq ?
> Should that be moved with a voltage adjustement ? Is there any thermal nearby we
> should monitor ?
This is mainly the result of a very low performance in the initial
testing. We were quite disappointed looking at 30 fps output. The
datasheet mentions the core should do 2 or 4 pixel/cycle, so we looked
if the clock speed could be increased. The TRM Part1 mentions that the
RGA3 clock uses a default divider of 2, so I've tweaked the dtsi to
avoid the clock divider and run it on the fastest clock.
But this tweaking only increased the frame rate to around 36fps. After
some brainstorming and testing we found the culprit being the
RGA3_WR_SW_OUTSTANDING_MAX value in the command. With this value maxed
out and without the clock tweaks I've got around 80fps. As the clock
increase resulted in the expected doubling of the fps and my few tests
worked, I've included it in the patch set.
I haven't done any stress testing and don't mind to remove the clock
speed adjustments from the dtsi.
>> from RGBA 480x360 to NV12 3840x2160. Without the clock speed scaling a
>> default clock division factor of 2 is used and only around 80 fps are
>> reached with one core. The v4l2-compliance tests only complain about
>> the already failing colorspace propagation:
> Did you do any more format testing to validation all supported combinations ?
> This is a tool [0] you can use to test this using GStreamer and how to use it
> [1].
Thanks for the link!
I've did some simple format conversion tests with a static test pattern.
The tests mainly converts any combination of RGB/YUV formats (hope I
didn't miss anything) to each other. Then I convert it back to rgba with
gstreamer and compare it's hash.
For scaling I've just tested one upscale, downscale and scale to a non
aligned width/height.
> [0]https://gitlab.collabora.com/mediatek/aiot/lava-test-definitions/-/tree/main/avvideocompare?ref_type=heads
> [1]https://gitlab.collabora.com/mediatek/aiot/linux/-/blob/mediatek-next/.gitlab-ci.yml?ref_type=heads#L282
>> v4l2-compliance 1.28.1, 64 bits, 64-bit time_t
>> ...
>> fail: v4l2-test-formats.cpp(923): fmt_cap.g_colorspace() !=
>> col
>> test VIDIOC_S_FMT: FAIL
>> ...
>> Total for rockchip-rga device /dev/video0: 47, Succeeded: 46, Failed: 1,
>> Warnings: 0
>>
>> v4l2-compliance 1.28.1, 64 bits, 64-bit time_t
>> ...
>> fail: v4l2-test-formats.cpp(923): fmt_cap.g_colorspace() !=
>> col
>> test VIDIOC_S_FMT: FAIL
>> ...
>> Total for rockchip-rga device /dev/video1: 47, Succeeded: 46, Failed: 1,
>> Warnings: 0
>>
>> v4l2-compliance 1.28.1, 64 bits, 64-bit time_t
>> ...
>> fail: v4l2-test-formats.cpp(923): fmt_cap.g_colorspace() !=
>> col
>> test VIDIOC_S_FMT: FAIL
>> ...
>> Total for rockchip-rga device /dev/video2: 47, Succeeded: 46, Failed: 1,
>> Warnings: 0
>>
>> Each RGA core is a separate /dev/video device. To distinguish the RGA2
>> core from the RGA3 cores the Card type is set accordingly. Combining all
>> cores into a single device and scheduling tasks to the best core might
>> be a future improvement, if it is desired by upstream to handle the
>> scheduling and selection in kernel space.
> It took me some time to understand why you spoke about multicore here. You
> forgot to say here that you add RGA3 into RGA2 driver. Some information on why
> you went that path instead of a separate driver.
Mostly as I've started by using the rga driver as a basis and just
adjusted the command stream and register values to the RGA3. I was
unsure, if I should create a separate driver.
As it didn't seem unfeasible to have the existing driver handle both
units, I've decided to add it to the existing driver to avoid code
duplication.
But looking at your comments about the wrong announcement of e.g. color
space conversion, I now think that a new driver is probably better to
avoid adding too much of the differences to the struct.
> From high level view, I don't think its a good idea to multi-plex over
> heterogeneous core. They may not even produce the exact same pixels for the same
> operation. They also don't share the same MMU, and at first glance, the use of
> rkiommu in RGA3 means it can no longer handle CPU cache (though I don't know if
> this is implemented/supported in upstream RGA2 driver).
Thanks for the insight. This gives me another reason to create a
separate driver. I'll probably also look into multiplexing the 2 RGA3
cores to only expose one RGA3 video device to userspace (the current
implementation exposes both cores individually to the userspace)
Sincerely
Sven
>> Patch 1-2 are general cleanups
>> Patch 3-12 prepare the rga driver for the RGA3
>> Patch 13 documments the RGA3 compatible value
>> Patch 14 adds the RGA3 cores to the rk3588 dtsi
>> Patch 15 increases the RGA3 core clock speeds
>> Patch 16 adds RGA3 support to the rga driver
>>
>> Signed-off-by: Sven Püschel<s.pueschel at pengutronix.de>
>> ---
>> Sven Püschel (16):
>> media: rockchip: rga: use clk_bulk api
>> media: rockchip: rga: use stride for offset calculation
>> media: rockchip: rga: align stride to 16 bytes
>> media: rockchip: rga: move hw specific parts to a dedicated struct
>> media: rockchip: rga: use card type to specify rga type
>> media: rockchip: rga: change offset to dma_addresses
>> media: rockchip: rga: support external iommus
>> media: rockchip: rga: remove size from rga_frame
>> media: rockchip: rga: remove stride from rga_frame
>> media: rockchip: rga: move rga_fmt to rga-hw.h
>> media: rockchip: rga: add iommu restore function
>> media: rockchip: rga: handle error interrupt
>> media: dt-bindings: media: rockchip-rga: add rockchip,rk3588-rga3
>> arm64: dts: rockchip: add rga3 dt nodes
>> arm64: dts: rockchip: increase rga3 clock speed
>> media: rockchip: rga: add rga3 support
>>
>> .../devicetree/bindings/media/rockchip-rga.yaml | 1 +
>> arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 50 +++
>> drivers/media/platform/rockchip/rga/Makefile | 2 +-
>> drivers/media/platform/rockchip/rga/rga-buf.c | 78 ++--
>> drivers/media/platform/rockchip/rga/rga-hw.c | 356 ++++++++++++---
>> drivers/media/platform/rockchip/rga/rga-hw.h | 15 +-
>> drivers/media/platform/rockchip/rga/rga.c | 404 ++++++-----------
>> drivers/media/platform/rockchip/rga/rga.h | 74 ++--
>> drivers/media/platform/rockchip/rga/rga3-hw.c | 490
>> +++++++++++++++++++++
>> drivers/media/platform/rockchip/rga/rga3-hw.h | 186 ++++++++
>> 10 files changed, 1246 insertions(+), 410 deletions(-)
>> ---
>> base-commit: afb100a5ea7a13d7e6937dcd3b36b19dc6cc9328
>> change-id: 20251001-spu-rga3-8a00e018b120
>>
>> Best regards,
More information about the linux-arm-kernel
mailing list