[PATCH] iommu/arm-smmu-v3: Allocate cmdq_batch on the heap

Cheng-Yang Chou yphbchou0911 at gmail.com
Thu Mar 12 11:24:09 PDT 2026


On Wed, Mar 11, 2026 at 02:22:50PM +0000, Pranjal Shrivastava wrote:
> On Wed, Mar 11, 2026 at 05:44:44PM +0800, Cheng-Yang Chou wrote:
> > The arm_smmu_cmdq_batch structure is large and was being allocated on
> > the stack in four call sites, causing stack frame sizes to exceed the
> > 1024-byte limit:
> > 
> > - arm_smmu_atc_inv_domain: 1120 bytes
> > - arm_smmu_atc_inv_master: 1088 bytes
> > - arm_smmu_sync_cd: 1088 bytes
> > - __arm_smmu_tlb_inv_range: 1072 bytes
> > 
> > Move these allocations to the heap using kmalloc_obj() and kfree() to
> > eliminate the -Wframe-larger-than=1024 warnings and prevent potential
> > stack overflows.
> > 
> 
> Thanks for the patch. I agree that we should address these warnings, but
> moving these allocations to the heap via kmalloc_obj() in the fast path
> is problematic. Introducing heap allocation adds unnecessary latency and
> potential for allocation failure in hot paths.
> 
> So, yes, we are using a lot of stack, but we're using it to do good
> things.
> 
> IMO, if we really want to address these, instead of kmalloc, we could
> potentially consider some pre-allocated per-CPU buffers (that's a lot of
> additional book-keeping though) to keep the data off the stack, or
> something similar, following a simple rule: the fast path must be
> deterministic, with no slab allocations and no new failure points.
> 
> The last thing we'd want is a graphics driver's shrinker calling
> dma-unmaps when the system is already under heavy memory pressure, with
> kmalloc then leading to a circular dependency or an allocation failure
> exactly when the system most needs to perform the unmap.
> 
> Thanks,
> Praan

Hi Praan,

Thanks for the feedback.
I agree that kmalloc() is unsuitable for the SMMU fast path due to
potential deadlocks and the need for determinism.

To resolve the stack warnings, I'm considering using per-CPU buffers in v2.
Does this direction sound reasonable, or would you prefer to keep the
current on-stack allocation to avoid the added complexity?
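
To make the idea concrete, here is a rough userspace model of the
per-CPU approach, for discussion only. It is not driver code: every
name and size below (model_batch, MODEL_BATCH_ENTS, and so on) is
hypothetical, and the "busy" flag only stands in for whatever the
kernel side would use to own the buffer (presumably something like
get_cpu_ptr()/put_cpu_ptr() around the batch). The point is that the
buffers are set up once, so using one neither calls the allocator nor
can fail for lack of memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes for the model; not taken from the driver. */
#define MODEL_NR_CPUS     4
#define MODEL_BATCH_ENTS  64
#define MODEL_CMD_DWORDS  2

struct model_batch {
	uint64_t cmds[MODEL_BATCH_ENTS * MODEL_CMD_DWORDS];
	int num;	/* commands currently queued in this batch */
	int busy;	/* models "this CPU's buffer is already in use" */
};

/* One statically allocated buffer per CPU: no allocation at use time. */
static struct model_batch model_percpu_batch[MODEL_NR_CPUS];

/* Counts commands handed to the (imaginary) command queue. */
static unsigned long model_submitted;

/*
 * Claim the buffer of the given CPU. Returns NULL if it is already
 * claimed, e.g. by a nested caller on the same CPU. A real design
 * would have to rule that case out or handle it without allocating;
 * that is the extra book-keeping.
 */
static struct model_batch *model_batch_get(int cpu)
{
	struct model_batch *b;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	b = &model_percpu_batch[cpu];
	if (b->busy)
		return NULL;
	b->busy = 1;
	b->num = 0;
	return b;
}

/* Hand all queued commands to the queue and empty the batch. */
static void model_batch_submit(struct model_batch *b)
{
	model_submitted += (unsigned long)b->num;
	b->num = 0;
}

/* Queue one command, flushing first if the batch is full. */
static void model_batch_add(struct model_batch *b, uint64_t d0, uint64_t d1)
{
	if (b->num == MODEL_BATCH_ENTS)
		model_batch_submit(b);
	b->cmds[b->num * MODEL_CMD_DWORDS + 0] = d0;
	b->cmds[b->num * MODEL_CMD_DWORDS + 1] = d1;
	b->num++;
}

/* Release the buffer so the next caller on this CPU can claim it. */
static void model_batch_put(struct model_batch *b)
{
	b->busy = 0;
}
```

A caller would do get, a series of adds, a final submit, then put. The
NULL return on a nested get is exactly the kind of new corner case the
stack version never has, which is why I am asking before going further.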

-- 
Thanks,
Cheng-Yang


