[PATCH v7 10/11] iommu/arm-smmu-v3: Invoke pm_runtime before hw access

Pranjal Shrivastava praan at google.com
Fri May 29 07:48:32 PDT 2026


On Thu, May 28, 2026 at 04:18:58PM -0700, Nicolin Chen wrote:
> On Thu, May 28, 2026 at 10:25:11PM +0000, Pranjal Shrivastava wrote:
> > On Thu, May 28, 2026 at 03:01:13PM -0700, Nicolin Chen wrote:
> > > On Thu, May 28, 2026 at 09:46:33PM +0000, Pranjal Shrivastava wrote:
> > > > On Thu, May 28, 2026 at 01:28:15PM -0700, Nicolin Chen wrote:
> > > > > On Wed, May 27, 2026 at 10:14:06PM +0000, Pranjal Shrivastava wrote:
> > > > > > TLB and CFG invalidations are
> > > > > > elided if the SMMU is suspended by observing the CMDQ_PROD_STOP_FLAG via
> > > > > > the arm_smmu_can_elide() helper.
> > > > > 
> > > > > All the arm_smmu_can_elide() call sites here would eventually elide
> > > > > the commands in arm_smmu_cmdq_issue_cmdlist() that is already gated
> > > > > by CMDQ_PROD_STOP_FLAG? It doesn't seem necessary to gate again?
> > > > 
> > > > While issue_cmdlist() would eventually elide these commands, the 
> > > > can_elide() check is necessary to return early during suspension. 
> > > > 
> > > > This avoids unnecessary stack allocation, cmd building, and spinlock
> > > > contention on the cmdq->lock for threads that are anyway about to be 
> > > > elided. 
> > > 
> > > We aren't in the perf sensitive path.. most of those aren't going
> > > to be that bad.
> > > 
> > > arm_smmu_cmdq_shared_lock() on the other hand is taken at step 2,
> > > and the STOP flag in the same function is gated at step 1?
> > 
> > DMA unmaps frequently occur from atomic contexts, interrupt handlers etc.
> > The Step 1 check in issue_cmdlist() happens under local_irq_save().
> > We may argue that it doesn't happen for long though..
> 
> It shouldn't IMHO. At least most of the call sites in this patch
> are right before calling issue() functions, so they are merely a
> few cycles away from the STOP gate in issue_cmdlist()?
>

I agree that eliding right before calling issue_cmdlist() might seem
like an over-optimization. I guess we had this earlier because we didn't
have elision in the CMDQ. I'll think more about it (just in case we're
missing some scenario) and try to perf it to confirm there's no big diff.

Otherwise, I guess I'll drop the "early-exit" in v8..
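
For reference, a rough user-space sketch of why the early exit looks
redundant when it sits right before the issue call. This is not the
driver code: the model_* names, the struct layout and the flag's bit
position are all made up for illustration, and only the "gate at step 1
of issue_cmdlist()" behaviour is taken from the discussion above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: the real bit position of the STOP flag may differ. */
#define CMDQ_PROD_STOP_FLAG (1u << 30)

/* Hypothetical stand-in for the command queue state. */
struct model_cmdq {
	atomic_uint prod;	/* producer word carrying the STOP flag */
	unsigned int issued;	/* commands that actually reached the queue */
};

/* Models arm_smmu_can_elide(): true while the queue is stopped. */
static bool model_can_elide(struct model_cmdq *q)
{
	unsigned int prod = atomic_load_explicit(&q->prod, memory_order_relaxed);

	return (prod & CMDQ_PROD_STOP_FLAG) != 0;
}

/* Models the issue path: the STOP gate is the first thing it checks. */
static int model_issue_cmdlist(struct model_cmdq *q, int n)
{
	assert(n >= 0);		/* model only: batch sizes are non-negative */
	if (model_can_elide(q))
		return 0;	/* elided, nothing is queued */
	q->issued += (unsigned int)n;
	return n;
}

/* Call site with the extra early exit, as in v7. */
static int model_unmap_with_early_exit(struct model_cmdq *q, int n)
{
	if (model_can_elide(q))
		return 0;
	return model_issue_cmdlist(q, n);
}

/* Call site relying only on the gate inside the issue path. */
static int model_unmap_plain(struct model_cmdq *q, int n)
{
	return model_issue_cmdlist(q, n);
}
```

Both call sites queue the same commands whether or not the flag is set,
so in this model the early exit only saves the work done between the two
checks. Whether that work is measurable is what the perf run should show.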

> The only place that might be slightly longer is the inv_range(),
> if the domain->invs is really long (e.g. nesting parent for VM),
> in which case, it might be plausible to add a gate. And even with
> that being said, it should be added to the top of the iteration (on
> invs->has_ats) rather than before submit()?
> 

I agree.. but I'm wondering: if we plan to remove the early exits, does it
make sense to keep this one? Ideally, we shouldn't be dealing with a
long domain->invs if we are in VMs (IOMMUFD & VFIO both get a pm_ref).
So, I guess if we're dropping elisions from everywhere it would be fine.

> > > > By dropping these requests immediately, we significantly reduce cacheline
> > > > bouncing and contention during unmap storms.
> > > 
> > > How significantly, so as to justify invading every command issue()
> > > call site, which would be difficult to maintain? If we really need
> > > an early return, it would be nicer to have a common place at least.
> > 
> > Eliding early is more of an early-exit from the DMA unmap paths really..
> > If maintaining these high-level elision checks at 4 or 5 call sites is
> > a maintenance burden, maybe we could move the logic into the issue_cmd
> > macros? 
> 
> What kind of macro? Again, if it is added to just a few cycles
> right before issue_cmdlist(), it still wouldn't seem necessary.

I was referring to the arm_smmu_cmdq_issue_cmd* macros here.
But I suppose you're right..

Thanks,
Praan


