[PATCH 1/2] media: rkvdec: reduce excessive stack usage in assemble_hw_pps()
Arnd Bergmann
arnd at arndb.de
Mon Feb 2 06:09:14 PST 2026
On Mon, Feb 2, 2026, at 14:42, Nicolas Dufresne wrote:
> Le lundi 02 février 2026 à 10:47 +0100, Arnd Bergmann a écrit :
>> From: Arnd Bergmann <arnd at arndb.de>
>>
>> The rkvdec_pps had a large set of bitfields, all of which
>> are misaligned. This causes clang-21 and likely other versions to
>> produce absolutely awful object code and a warning about very
>> large stack usage, on targets without unaligned access:
>>
>> drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c:966:12: error: stack frame size (1472) exceeds limit (1280) in 'rkvdec_vp9_start' [-Werror,-Wframe-larger-than]
>
> We had already addressed and validated that on clang-21, which indicates to me that
> we are likely missing an architecture (or a config) in our CI. Can you document
> which architecture, configuration and flags were affected so we can add them on our
> side?
>
> The media pipeline we ran before sending to Linus, and its clang build trace, are at the
> following links, in case it matters.
>
> https://gitlab.freedesktop.org/linux-media/media-committers/-/pipelines/1588731
> https://gitlab.freedesktop.org/linux-media/media-committers/-/jobs/91604655
The configuration that hit this for me was an ARMv7-M NOMMU build. I'm
doing 'randconfig' builds here, so I inevitably hit some corner cases
that all deterministic CI systems miss. I don't think that you should
add ARMv7-M here, since that would take up useful build resources
from something more important. There are no actual users of drivers/media/
on ARMv7-M, and next time it is going to be something else.
>> Part of the problem here is how all the bitfield accesses are
>> inlined into a function that already has large structures on
>> the stack.
>
> Another observation is that you had to enable ASAN to make it misbehave on for-loop
> unrolling (with complex bitfield writes). All I could obtain by visiting
> the Link: is that it's an armv7-a architecture.
Right, this randconfig build likely got closer to the warning
limit because of the inherent overhead in KASAN, but the problem
with the unaligned bitfields was something that I could later
reproduce without KASAN, on ARMv5 and MIPS32r2.
This is something we should fix in clang.
>> Mark set_field_order_cnt() as noinline_for_stack, and split out
>> the following accesses in assemble_hw_pps() into another noinline
>> function, both of which now use around 800 bytes of stack in the
>> same configuration.
>>
>> There is clearly still something wrong with clang here, but
>> splitting it into multiple functions reduces the risk of stack
>> overflow.
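A minimal userspace sketch of the split described above, assuming GCC/Clang attribute syntax. The struct and all function names (struct sketch_pps, sketch_set_poc(), sketch_set_size(), sketch_assemble_pps()) are hypothetical and do not reproduce the real rkvdec_pps layout or driver code. The local noinline_for_stack define stands in for the kernel macro, which expands to plain noinline. A 7-bit leading field pushes every following field off a byte boundary, and packing drops the struct alignment to one byte. On targets without unaligned access, this kind of layout typically forces each store to be built from byte accesses, shifts and masks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in: the kernel defines noinline_for_stack as noinline. */
#define noinline_for_stack __attribute__((noinline))

/*
 * Hypothetical packed layout, not the real rkvdec_pps. The 7-bit header
 * shifts every later field off a byte boundary (bit offsets 7, 39, 71,
 * 103, 135 and 148), and "packed" drops the struct alignment to 1 byte,
 * so the compiler cannot assume any aligned word access is possible.
 */
struct sketch_pps {
	uint32_t header : 7;
	uint32_t poc0 : 32;
	uint32_t poc1 : 32;
	uint32_t poc2 : 32;
	uint32_t poc3 : 32;
	uint32_t width : 13;
	uint32_t height : 13;
} __attribute__((packed));

/* 161 bits of fields, packed into 21 bytes with no padding. */
static_assert(sizeof(struct sketch_pps) == 21, "unexpected packed size");

/* Kept out of line: its temporaries live in a short-lived frame of their own. */
static noinline_for_stack void sketch_set_poc(struct sketch_pps *pps,
					      const uint32_t poc[4])
{
	pps->poc0 = poc[0];
	pps->poc1 = poc[1];
	pps->poc2 = poc[2];
	pps->poc3 = poc[3];
}

static noinline_for_stack void sketch_set_size(struct sketch_pps *pps,
					       uint32_t width, uint32_t height)
{
	pps->width = width;	/* reduced modulo 2^13 by the bitfield */
	pps->height = height;
}

/* The top-level function only clears the struct and calls the helpers. */
void sketch_assemble_pps(struct sketch_pps *pps, uint32_t header,
			 const uint32_t poc[4], uint32_t width, uint32_t height)
{
	memset(pps, 0, sizeof(*pps));
	pps->header = header;
	sketch_set_poc(pps, poc);
	sketch_set_size(pps, width, height);
}
```

Calling sketch_assemble_pps(&pps, 0x55, poc, 1920, 1088) should leave every field readable back unchanged. Keeping the helpers out of line only bounds peak stack use to the caller's frame plus one helper at a time, rather than one frame holding all of the inlined temporaries at once; it does not improve the bitfield code itself, which is why the clang issue still needs fixing.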
>
> We've tried really hard to avoid this noinline_for_stack just because compilers
> are buggy. I'll have another look in case I find some ideas, but meanwhile, with the
> failing architecture added to the commit message:
>
> Reviewed-by: Nicolas Dufresne <nicolas.dufresne at collabora.com>
Thanks!
Arnd
More information about the linux-arm-kernel mailing list