[PATCH] iommu/arm-smmu-v3: Allocate cmdq_batch on the heap

Pranjal Shrivastava praan at google.com
Thu Mar 12 17:06:11 PDT 2026


On Thu, Mar 12, 2026 at 03:50:19PM -0700, Nicolin Chen wrote:
> On Fri, Mar 13, 2026 at 02:24:09AM +0800, Cheng-Yang Chou wrote:
> > On Wed, Mar 11, 2026 at 02:22:50PM +0000, Pranjal Shrivastava wrote:
> > > IMO, if we really want to address these, instead of kmalloc, we could
> > > potentially consider some pre-allocated per-CPU buffers (that's a lot of
> > > additional book-keeping though) to keep the data off the stack or
> > > something similar following a simple rule: The fast path must be 
> > > deterministic- no SLAB allocations and no introducing new failure points
> 
> > To resolve the stack warnings, I'm considering using per-CPU buffers in v2. 
> > Does this direction sound reasonable, or would you prefer to keep it as-is
> > to avoid the added complexity?
> 
> I don't think per-CPU buffers would work here either..
> 
> arm_smmu_atc_inv_master() is used in a preemptible context, while
> arm_smmu_atc_inv_domain() can be called from an irq context.
> 
> Think of a !SMP case for simplification: we only have one per-CPU
> buffer, which is not enough if an IRQ preempts the task context.

+1
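
FWIW, a minimal userspace sketch of that re-entrancy problem, not driver
code: percpu_buf, fake_irq_inv() and task_inv() are hypothetical
stand-ins, and the "IRQ" is simulated by a plain function call made
while the task is still filling the buffer.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_ENTS 4	/* hypothetical buffer size, for illustration only */

/* Stand-in for the single per-CPU buffer on a !SMP system. */
static unsigned long percpu_buf[BUF_ENTS];

/* Stand-in for the IRQ-context caller: it reuses the same buffer. */
static void fake_irq_inv(void)
{
	for (size_t i = 0; i < BUF_ENTS; i++)
		percpu_buf[i] = 0xdead;
}

/*
 * Stand-in for the preemptible caller. When irq_hits is non-zero, the
 * simulated IRQ fires after the second entry has been written.
 * Returns how many of the task's own entries are still intact.
 */
size_t task_inv(int irq_hits)
{
	size_t intact = 0;

	for (size_t i = 0; i < BUF_ENTS; i++) {
		percpu_buf[i] = i + 1;
		if (irq_hits && i == 1)
			fake_irq_inv();
	}

	for (size_t i = 0; i < BUF_ENTS; i++)
		if (percpu_buf[i] == i + 1)
			intact++;

	return intact;
}

/* Usage: the entries queued before the simulated IRQ are lost. */
void demo(void)
{
	assert(task_inv(0) == BUF_ENTS);
	assert(task_inv(1) == 2);
}
```

Avoiding this would mean disabling IRQs around the whole batch or
keeping one buffer per context, which is the extra book-keeping
mentioned earlier in the thread.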

> 
> Maybe having a smaller backup array on the stack that can be used
> when the heap allocation fails? Still, I don't see how to address
> it elegantly without losing some of the performance optimization.
> 

A backup array is no good either IMO. Stack frame sizes are fixed at
compile time, so the compiler will still count those bytes against the
1024-byte limit regardless of whether the heap allocation succeeds or
fails. If the limit changes tomorrow, we'll have to adjust the "backup
array size" as well. Furthermore, in deep call chains even a 'smaller'
array can still be the straw that pushes us over the limit.
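
To illustrate, here is a rough userspace sketch of the fallback idea,
not the actual driver code. All names and sizes (issue_cmds(),
FULL_BATCH_ENTS, BACKUP_BATCH_ENTS, struct cmd) are made up for the
example, and malloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define FULL_BATCH_ENTS		64	/* hypothetical full batch size */
#define BACKUP_BATCH_ENTS	16	/* hypothetical "smaller" backup size */

struct cmd {
	uint64_t dw[2];			/* 16 bytes per command, illustrative */
};

/* Bytes the backup array adds to every frame of issue_cmds(). */
enum { BACKUP_FRAME_BYTES = BACKUP_BATCH_ENTS * sizeof(struct cmd) };

/*
 * Returns the batch capacity that was actually available.
 * fail_alloc simulates a failed heap allocation.
 */
size_t issue_cmds(int fail_alloc)
{
	/* Part of the frame on both paths, even if never used. */
	struct cmd backup[BACKUP_BATCH_ENTS];
	struct cmd *batch;
	size_t cap;

	batch = fail_alloc ? NULL : malloc(FULL_BATCH_ENTS * sizeof(*batch));
	if (batch) {
		cap = FULL_BATCH_ENTS;
	} else {
		batch = backup;
		cap = BACKUP_BATCH_ENTS;
	}
	assert(cap > 0);

	/* Touch the first and last slot so the buffer is really used. */
	batch[0].dw[0] = 0;
	batch[cap - 1].dw[1] = 0;

	if (batch != backup)
		free(batch);

	return cap;
}
```

With these made-up numbers the frame carries 256 bytes of backup array
on every call, and the usable batch size drops from 64 to 16 whenever
the allocation fails, so behavior in the fast path depends on heap
state.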

As for a pre-allocated global buffer, the synchronization and bookkeeping
required to safely handle re-entrancy between task and IRQ contexts would
essentially require writing a custom allocator inside the driver.

Choosing a code path based on transient heap availability also
introduces non-deterministic behavior in a critical path, which must
remain reliable when the system is under pressure.

I'm still open to suggestions if we can come up with a solution that
keeps the unmap paths equally performant and reliable.

Thanks,
Praan


