[PATCH] iommu/arm-smmu-v3: Allocate cmdq_batch on the heap
Cheng-Yang Chou
yphbchou0911 at gmail.com
Thu Mar 12 11:24:09 PDT 2026
On Wed, Mar 11, 2026 at 02:22:50PM +0000, Pranjal Shrivastava wrote:
> On Wed, Mar 11, 2026 at 05:44:44PM +0800, Cheng-Yang Chou wrote:
> > The arm_smmu_cmdq_batch structure is large and was being allocated on
> > the stack in four call sites, causing stack frame sizes to exceed the
> > 1024-byte limit:
> >
> > - arm_smmu_atc_inv_domain: 1120 bytes
> > - arm_smmu_atc_inv_master: 1088 bytes
> > - arm_smmu_sync_cd: 1088 bytes
> > - __arm_smmu_tlb_inv_range: 1072 bytes
> >
> > Move these allocations to the heap using kmalloc_obj() and kfree() to
> > eliminate the -Wframe-larger-than=1024 warnings and prevent potential
> > stack overflows.
> >
>
> Thanks for the patch. I agree that we should address these warnings, but
> moving these allocations to the heap via kmalloc_obj() in the fast path
> is problematic. Introducing heap allocation adds unnecessary latency and
> potential for allocation failure in hot paths.
>
> So, yes, we are using a lot of stack but we're using it to do good
> things..
>
> IMO, if we really want to address these, instead of kmalloc, we could
> potentially consider some pre-allocated per-CPU buffers (that's a lot of
> additional book-keeping though) to keep the data off the stack or
> something similar, following a simple rule: the fast path must be
> deterministic, with no slab allocations and no new failure points.
>
> The last thing we'd want is a graphics driver's shrinker calling
> dma-unmaps when the system is already under heavy memory pressure and
> calling kmalloc leading to a circular dependency or allocation failure
> exactly when the system needs to perform the unmap the most.
>
> Thanks,
> Praan
Hi Praan,

Thanks for the feedback.

I agree that kmalloc() is unsuitable for the SMMU fast path due to
potential deadlocks and the need for determinism.

To resolve the stack warnings, I'm considering using per-CPU buffers in v2.
Does this direction sound reasonable, or would you prefer to keep it as-is
to avoid the added complexity?
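
To make the per-CPU idea concrete, here is a rough userspace model of
the bookkeeping involved. It is only a sketch under stated assumptions,
not driver code: the names (cmd_batch_model, batch_get, batch_put,
batch_add), the CPU count, and the batch capacity are all hypothetical,
and a plain array indexed by a CPU number stands in for real per-CPU
storage. In the kernel this would presumably be built on DEFINE_PER_CPU()
with get_cpu_ptr()/put_cpu_ptr(), but that part is an assumption and is
not shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All sizes and names below are hypothetical, chosen for illustration. */
#define NR_CPUS_MODEL   4   /* stand-in for the number of CPUs */
#define BATCH_MAX_CMDS  64  /* assumed batch capacity */
#define CMD_DWORDS      2   /* assumed 64-bit words per command */

/* Model of a command batch that is too large to keep on the stack. */
struct cmd_batch_model {
	uint64_t cmds[BATCH_MAX_CMDS * CMD_DWORDS];
	int num;      /* number of commands currently queued */
	bool in_use;  /* guards against nested use on the same CPU */
};

/* Pre-allocated storage: one batch per CPU, no allocation at runtime. */
static struct cmd_batch_model percpu_batch[NR_CPUS_MODEL];

/*
 * Claim the batch buffer of the given CPU. Returns NULL if the CPU
 * number is out of range or the buffer is already claimed, so the
 * caller must have a fallback for the nested case.
 */
static struct cmd_batch_model *batch_get(unsigned int cpu)
{
	struct cmd_batch_model *b;

	if (cpu >= NR_CPUS_MODEL)
		return NULL;

	b = &percpu_batch[cpu];
	if (b->in_use)
		return NULL;

	b->in_use = true;
	b->num = 0;
	return b;
}

/* Release a buffer previously returned by batch_get(). */
static void batch_put(struct cmd_batch_model *b)
{
	b->in_use = false;
}

/* Append one command; returns false when the batch is full. */
static bool batch_add(struct cmd_batch_model *b, uint64_t lo, uint64_t hi)
{
	if (b->num == BATCH_MAX_CMDS)
		return false;

	b->cmds[b->num * CMD_DWORDS] = lo;
	b->cmds[b->num * CMD_DWORDS + 1] = hi;
	b->num++;
	return true;
}
```

A caller would do batch_get(cpu), a series of batch_add() calls, then
batch_put(). The in_use flag shows the extra bookkeeping you mentioned:
if the same CPU re-enters the path while the buffer is claimed,
batch_get() returns NULL and some fallback is still needed.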
--
Thanks,
Cheng-Yang