[PATCH v7 10/11] iommu/arm-smmu-v3: Invoke pm_runtime before hw access

Nicolin Chen nicolinc at nvidia.com
Thu May 28 16:18:58 PDT 2026


On Thu, May 28, 2026 at 10:25:11PM +0000, Pranjal Shrivastava wrote:
> On Thu, May 28, 2026 at 03:01:13PM -0700, Nicolin Chen wrote:
> > On Thu, May 28, 2026 at 09:46:33PM +0000, Pranjal Shrivastava wrote:
> > > On Thu, May 28, 2026 at 01:28:15PM -0700, Nicolin Chen wrote:
> > > > On Wed, May 27, 2026 at 10:14:06PM +0000, Pranjal Shrivastava wrote:
> > > > > TLB and CFG invalidations are
> > > > > elided if the SMMU is suspended by observing the CMDQ_PROD_STOP_FLAG via
> > > > > the arm_smmu_can_elide() helper.
> > > > 
> > > > All the arm_smmu_can_elide() call sites here would eventually elide
> > > > the commands in arm_smmu_cmdq_issue_cmdlist() that is already gated
> > > > by CMDQ_PROD_STOP_FLAG? It doesn't seem necessary to gate again?
> > > 
> > > While issue_cmdlist() would eventually elide these commands, the 
> > > can_elide() check is necessary to return early during suspension. 
> > > 
> > > This avoids unnecessary stack allocation, cmd building, and spinlock
> > > contention on the cmdq->lock for threads that are anyway about to be 
> > > elided. 
> > 
> > We aren't in the perf sensitive path.. most of those aren't going
> > to be that bad.
> > 
> > arm_smmu_cmdq_shared_lock() on the other hand is taken at step 2,
> > and the STOP flag in the same function is gated at step 1?
> 
> DMA unmaps frequently occur from atomic contexts, interrupt handlers, etc.
> The Step 1 check in issue_cmdlist() happens under local_irq_save().
> We may argue that it doesn't happen for long though..

It shouldn't IMHO. At least most of the call sites in this patch
are right before calling issue() functions, so they are merely a
few cycles away from the STOP gate in issue_cmdlist()?

The only place that might be slightly longer is the inv_range(),
if the domain->invs is really long (e.g. nesting parent for VM),
in which case, it might be plausible to add a gate. And even with
that being said, it should be added to the top of the iteration (on
invs->has_ats) rather than before submit()?
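
To make the placement argument concrete, here is a rough userspace
sketch, not the driver code. Every name in it (struct inv_list,
can_elide(), submit_batch(), the two inv_range_*() variants and the
counters) is a hypothetical stand-in, the STOP flag is modeled as a
plain bool, and invs->has_ats is not modeled at all. It only shows that
a gate before submit still pays for walking a long list, while a gate
at the top of the iteration does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of a long invalidation list. */
struct inv_entry {
	int id;
};

struct inv_list {
	const struct inv_entry *entries;
	size_t num;
};

/* Models "the STOP flag has been observed"; not the real flag handling. */
static bool smmu_stopped;

/* Counters so the cost of each gate placement is observable. */
static size_t entries_walked;
static size_t batches_submitted;

static bool can_elide(void)
{
	return smmu_stopped;
}

static void submit_batch(void)
{
	batches_submitted++;
}

/* Gate right before submit: the list is fully walked, then dropped. */
static void inv_range_gate_before_submit(const struct inv_list *invs)
{
	assert(invs != NULL);
	for (size_t i = 0; i < invs->num; i++)
		entries_walked++;	/* stands in for building one command */
	if (can_elide())
		return;
	submit_batch();
}

/* Gate at the top of the iteration: a stopped SMMU skips the walk. */
static void inv_range_gate_at_top(const struct inv_list *invs)
{
	assert(invs != NULL);
	if (can_elide())
		return;
	for (size_t i = 0; i < invs->num; i++)
		entries_walked++;	/* stands in for building one command */
	submit_batch();
}
```

With smmu_stopped set and a 1000-entry list, the first variant still
walks all 1000 entries before dropping the batch, while the second
walks none. For a short list the two are only a few cycles apart.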

> > > By dropping these requests immediately, we significantly reduce cacheline
> > > bouncing and contention during unmap storms.
> > 
> > How significantly, so as to justify invading every command issue()
> > call site, which would be difficult to maintain? If we really need
> > an early return, it would be nicer to have a common place at least.
> 
> Eliding early is more of an early-exit from the DMA unmap paths really..
> If maintaining these high-level elision checks at 4 or 5 call sites is
> a maintenance burden, maybe we could move the logic into the issue_cmd
> macros? 

What kind of macro? Again, if it is added to just a few cycles
right before issue_cmdlist(), it still wouldn't seem necessary.

Nicolin


