[PATCH 00/16] media: platform: rga: Add RGA3 support

Nicolas Dufresne nicolas at ndufresne.ca
Fri Oct 10 06:05:56 PDT 2025


Hi Sven,

Le vendredi 10 octobre 2025 à 10:33 +0200, Sven Püschel a écrit :
> Hi Nicolas,
> 
> On 10/7/25 8:06 PM, Nicolas Dufresne wrote:
> > Hi,
> > 
> > Le mardi 07 octobre 2025 à 10:31 +0200, Sven Püschel a écrit :
> > > This series adds support for the Raster Graphic Acceleration 3 (RGA3)
> > > peripheral, which is included in the RK3588 SoC. Unlike the RGA2 it
> > > can use the existing rockchip-iommu-v2 driver to handle iommu mappings.
> > > Also the RK3588 contains two independent RGA3 cores.
> > Thanks for working on this.
> > 
> > > Only scaling and format conversions between common 8bit RGB/YUV formats
> > > are implemented. Also the color space conversion is fixed to BT601F.
> > > This already allows a practical usage of the RGA3.
> > This seems quite limiting, can we expect an update on this, can't be that hard
> > to fully implement.
> 
> Sorry, but currently there is no need to have a fully featured 
> implementation from our side. As the datasheet mentions that the RGA3 
> should do 4 or 2 pixel/cycle depending on the operation

Upon first light review, its only the colorspace (and sub-paramters) aspect that
I believe should be done properly before merging. Proper SELECTION
implementation is on the edge, I'm more on the side of this should be covered
then not to be usable in the wild.

Overall, this is the sad part of adding drivers that don't have a corresponding
spec to define the rules.

> 
> > > This was tested on a Radxa Rock 5T. With the increased clock speeds in
> > > the devicetree around 160 fps were measured when scaling and converting
> > This is quite vague, I've checked the patch and you didn't extend either there.
> > Is that an overclock or was it miss-configured ? Does RK implement a devfreq ?
> > Should that be moved with a voltage adjustement ? Is there any thermal nearby we
> > should monitor ?
> 
> This is mainly the result of a very low performance in the initial 
> testing. We were quite disappointed looking at 30 fps output. The 
> datasheet mentions the core should do 2 or 4 pixel/cycle, so we looked 
> if the clock speed could be increased. The TRM Part1 mentions that the 
> RGA3 clock uses a default divider of 2, so I've tweaked the dtsi to 
> avoid the clock divider and run it on the fastest clock.
> 
> But this tweaking only increased the frame rate to around 36fps. After 
> some brainstorming and testing we found the culprit being the 
> RGA3_WR_SW_OUTSTANDING_MAX value in the command. With this value maxed 
> out and without the clock tweaks I've got around 80fps. As the clock 
> increase resulted in the expected doubling of the fps and my few tests 
> worked, I've included it in the patch set.
> 
> I haven't done any stress testing and don't mind to remove the clock 
> speed adjustments from the dtsi.

Thanks for the input, so yes, in V2 you should stay with the clock rate used by
Rockchip in their BSP, as a safe measure. If there is useful gain needed later
for anyone use case, we can work on the clocking in isolation and find the best
approach without risking wearing out the HW faster then needed.

> 
> > > from RGBA 480x360 to NV12 3840x2160. Without the clock speed scaling a
> > > default clock division factor of 2 is used and only around 80 fps are
> > > reached with one core. The v4l2-compliance tests only complain about
> > > the already failing colorspace propagation:
> > Did you do any more format testing to validation all supported combinations ?
> > This is a tool [0] you can use to test this using GStreamer and how to use it
> > [1].
> 
> Thanks for the link!
> 
> I've did some simple format conversion tests with a static test pattern.
> 
> The tests mainly converts any combination of RGB/YUV formats (hope I 
> didn't miss anything) to each other. Then I convert it back to rgba with 
> gstreamer and compare it's hash.
> 
> For scaling I've just tested one upscale, downscale and scale to a non 
> aligned width/height.

Thanks for the feedback.

> 
> > [0]https://gitlab.collabora.com/mediatek/aiot/lava-test-definitions/-/tree/main/avvideocompare?ref_type=heads
> > [1]https://gitlab.collabora.com/mediatek/aiot/linux/-/blob/mediatek-next/.gitlab-ci.yml?ref_type=heads#L282
> > >    v4l2-compliance 1.28.1, 64 bits, 64-bit time_t
> > >    ...
> > >    		fail: v4l2-test-formats.cpp(923): fmt_cap.g_colorspace() !=
> > > col
> > >    	test VIDIOC_S_FMT: FAIL
> > >    ...
> > >    Total for rockchip-rga device /dev/video0: 47, Succeeded: 46, Failed: 1,
> > > Warnings: 0
> > > 
> > >    v4l2-compliance 1.28.1, 64 bits, 64-bit time_t
> > >    ...
> > >    		fail: v4l2-test-formats.cpp(923): fmt_cap.g_colorspace() !=
> > > col
> > >    	test VIDIOC_S_FMT: FAIL
> > >    ...
> > >    Total for rockchip-rga device /dev/video1: 47, Succeeded: 46, Failed: 1,
> > > Warnings: 0
> > > 
> > >    v4l2-compliance 1.28.1, 64 bits, 64-bit time_t
> > >    ...
> > >    		fail: v4l2-test-formats.cpp(923): fmt_cap.g_colorspace() !=
> > > col
> > >    	test VIDIOC_S_FMT: FAIL
> > >    ...
> > >    Total for rockchip-rga device /dev/video2: 47, Succeeded: 46, Failed: 1,
> > > Warnings: 0
> > > 
> > > Each RGA core is a separate /dev/video device. To distinguish the RGA2
> > > core from the RGA3 cores the Card type is set accordingly. Combining all
> > > cores into a single device and scheduling tasks to the best core might
> > > be a future improvement, if it is desired by upstream to handle the
> > > scheduling and selection in kernel space.
> > It took me some time to understand why you spoke about multicore here. You
> > forgot to say here that you add RGA3 into RGA2 driver. Some information on why
> > you went that path instead of a separate driver.
> 
> Mostly as I've started by using the rga driver as a basis and just 
> adjusted the command stream and register values to the RGA3. I was 
> unsure, if I should create a separate driver.
> As it didn't seem unfeasible to have the existing driver handle both 
> units, I've decided to add it to the existing driver to avoid code 
> duplication.
> 
> But looking at your comments about the wrong announcement of e.g. color 
> space conversion, I now think that a new driver is probably better to 
> avoid adding too much of the differences to the struct.

I've discussed this with other devs, there is clearly no single opinion about
it. But some good argument came in in favour of your approach. Code duplication
is one (in fact, there is a lot of boiler plate code in all V4L2 drivers), but
the main thing is that it brings some maintenance to this close to abandoned
driver.

Though it means in v2, you have to push a little bit harder so the format, frame
size, try_format is properly adapted per hardware variants. We try and keep this
information as static as possible, using constant C structure to describe it.

> 
> >  From high level view, I don't think its a good idea to multi-plex over
> > heterogeneous core. They may not even produce the exact same pixels for the same
> > operation. They also don't share the same MMU, and at first glance, the use of
> > rkiommu in RGA3 means it can no longer handle CPU cache (though I don't know if
> > this is implemented/supported in upstream RGA2 driver).
> 
> Thanks for the insight. This gives me another reason to create a 
> separate driver. I'll probably also look into multiplexing the 2 RGA3 
> cores to only expose one RGA3 video device to userspace (the current 
> implementation exposes both cores individually to the userspace)

So no, you might want to keep it like this. I didn't understood when I read this
cover later that there it wasn't just about the RGA2 and the RGA3 core, but that
there is truly 2 RGA3 cores. For now, for the two RGA3 cores, you should mimic
what Sebastian did for the multiple Hantro G1 cores [0]. If you expose each
cores as its own device, removing it later will possibly break existing drivers
and will prevent implementing an in-kernel fair scheduling later.

Detlev is already prototyping scheduling of the two rockchip decoder cores, and
hopefully that should come with helper in v4l2-m2m to register cores so you can
easily schedule them. Note that both have similar challenges around the iommu,
since the dma integration does not seem (someone correct me if I'm wrong) to
already support a device driver with two or more iommus. We also have to decide
if we want the cores in complete isolation per operation or just map everything
in all iommus.

[0] https://www.spinics.net/lists//devicetree/msg708135.html

cheers,
Nicolas

> 
> Sincerely
> 
>      Sven
> 
> > > Patch 1-2 are general cleanups
> > > Patch 3-12 prepare the rga driver for the RGA3
> > > Patch 13 documments the RGA3 compatible value
> > > Patch 14 adds the RGA3 cores to the rk3588 dtsi
> > > Patch 15 increases the RGA3 core clock speeds
> > > Patch 16 adds RGA3 support to the rga driver
> > > 
> > > Signed-off-by: Sven Püschel<s.pueschel at pengutronix.de>
> > > ---
> > > Sven Püschel (16):
> > >        media: rockchip: rga: use clk_bulk api
> > >        media: rockchip: rga: use stride for offset calculation
> > >        media: rockchip: rga: align stride to 16 bytes
> > >        media: rockchip: rga: move hw specific parts to a dedicated struct
> > >        media: rockchip: rga: use card type to specify rga type
> > >        media: rockchip: rga: change offset to dma_addresses
> > >        media: rockchip: rga: support external iommus
> > >        media: rockchip: rga: remove size from rga_frame
> > >        media: rockchip: rga: remove stride from rga_frame
> > >        media: rockchip: rga: move rga_fmt to rga-hw.h
> > >        media: rockchip: rga: add iommu restore function
> > >        media: rockchip: rga: handle error interrupt
> > >        media: dt-bindings: media: rockchip-rga: add rockchip,rk3588-rga3
> > >        arm64: dts: rockchip: add rga3 dt nodes
> > >        arm64: dts: rockchip: increase rga3 clock speed
> > >        media: rockchip: rga: add rga3 support
> > > 
> > >   .../devicetree/bindings/media/rockchip-rga.yaml    |   1 +
> > >   arch/arm64/boot/dts/rockchip/rk3588-base.dtsi      |  50 +++
> > >   drivers/media/platform/rockchip/rga/Makefile       |   2 +-
> > >   drivers/media/platform/rockchip/rga/rga-buf.c      |  78 ++--
> > >   drivers/media/platform/rockchip/rga/rga-hw.c       | 356 ++++++++++++---
> > >   drivers/media/platform/rockchip/rga/rga-hw.h       |  15 +-
> > >   drivers/media/platform/rockchip/rga/rga.c          | 404 ++++++-----------
> > >   drivers/media/platform/rockchip/rga/rga.h          |  74 ++--
> > >   drivers/media/platform/rockchip/rga/rga3-hw.c      | 490
> > > +++++++++++++++++++++
> > >   drivers/media/platform/rockchip/rga/rga3-hw.h      | 186 ++++++++
> > >   10 files changed, 1246 insertions(+), 410 deletions(-)
> > > ---
> > > base-commit: afb100a5ea7a13d7e6937dcd3b36b19dc6cc9328
> > > change-id: 20251001-spu-rga3-8a00e018b120
> > > 
> > > Best regards,
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: This is a digitally signed message part
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20251010/0cfa45ea/attachment.sig>


More information about the linux-arm-kernel mailing list