[PATCH v8 11/12] iommu/arm-smmu-v3: Invoke pm_runtime before hw access

Pranjal Shrivastava praan at google.com
Tue Jun 9 03:34:51 PDT 2026


On Sun, Jun 07, 2026 at 03:22:19PM -0700, Daniel Mentz wrote:
> On Wed, Jun 3, 2026 at 11:27 PM Pranjal Shrivastava <praan at google.com> wrote:
> >
> > On Wed, Jun 03, 2026 at 01:28:19PM -0700, Daniel Mentz wrote:
> > > On Mon, Jun 1, 2026 at 2:59 PM Pranjal Shrivastava <praan at google.com> wrote:
> > > > @@ -2361,8 +2394,33 @@ static irqreturn_t arm_smmu_handle_gerror(struct arm_smmu_device *smmu)
> > > >  static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
> > > >  {
> > > >         struct arm_smmu_device *smmu = dev;
> > > > +       irqreturn_t ret;
> > > > +
> > > > +       /*
> > > > +        * Global Errors are only processed if the SMMU is active.
> > > > +        *
> > > > +        * If the STOP_FLAG is set (can_elide == true), the hardware is
> > > > +        * either already disabled or in the process of being disabled.
> > > > +        * Any errors captured during the quiesce/drain phase will be
> > > > +        * handled by the explicit arm_smmu_handle_gerror() call at the
> > > > +        * end of arm_smmu_runtime_suspend() callback. On resume, the
> > > > +        * STOP_FLAG is cleared before interrupts are re-enabled, ensuring
> > > > +        * no valid errors are missed.
> > > > +        *
> > > > +        * A lockless check is favoured here over a dynamic PM core check
> > > > +        * since pm_runtime_get_if_active() would return false during
> > > > +        * transient states like RPM_RESUMING & ignore level-triggered
> > > > +        * interrupts.
> > > > +        */
> > > > +       if (arm_smmu_cmdq_can_elide(smmu)) {
> > > > +               dev_err(smmu->dev,
> > > > +                       "Ignoring gerror interrupt because the SMMU is suspended\n");
> > > > +               return IRQ_NONE;
> > > > +       }
> > >
> > > Have you considered using arm_smmu_rpm_get() here instead?
> > > I can see two issues with the current proposal:
> > >  * Returning IRQ_NONE when an interrupt is indeed active and needs to
> > > be handled. This might be interpreted as a spurious interrupt
> > >  * Nothing is preventing the suspend handler from running while
> > > arm_smmu_gerror_handler is in the middle of handling an interrupt
> > >
> > > I understand that using arm_smmu_rpm_get() also has downsides,
> > > including an unnecessary resume operation when the SMMU is already in
> > > RPM_SUSPENDING state. However, using arm_smmu_rpm_get() would make it
> > > easier to ensure correctness.
> > >
> >
> > I don't think using arm_smmu_rpm_get() here is possible.
> >
> > GERROR is registered as a hard IRQ handler, so calling rpm_get (which
> > can sleep) would be wrong.
> 
> You're right. Sorry, I missed that arm_smmu_gerror_handler is
> registered as a hard irq handler.
> 
> > Regarding the race, the STOP_FLAG is set at the very beginning of the
> > suspend sequence. If an IRQ fires after that, we return IRQ_NONE and
> > let the explicit arm_smmu_handle_gerror() call at the end of
> > runtime_suspend catch and clear it. After CMDQEN, PRIQEN, EVTQEN &
> > SMMUEN are all cleared, getting a Gerror should be treated as spurious.
> >
> > That said, I understand your concerns about a real IRQ being interpreted
> > as a spurious one, and creating an IRQ storm since the gerror register
> > isn't really written. I have 2 ideas here:
> >
> > 1. We could have a "suspended" flag and check it with can_elide here:
> > arm_smmu_cmdq_can_elide() && is_suspended() to correctly return IRQ_NONE
> >
> > 2. We could explicitly disable Gerror in IRQ_CTRL write after setting
> > the CMDQ_STOP_FLAG. Even if there are Gerrors during the CMDQ drain,
> > we'll catch up to those at the end of our suspend callback.
> >
> > I'm more inclined towards 2 as it prevents potential races (execution of
> > an IRQ handler with handle_gerror calls at the end of the suspend).
> >
> > WDYT?
> 
> I'm not sure if I have a good suggestion here. Have you considered the
> following: Do not call arm_smmu_handle_gerror() from
> arm_smmu_runtime_suspend(). Instead, call disable_irq() at the end of
> the suspend handler (and enable_irq() at the beginning of the resume
> handler)?

I thought about using disable_irq(), but I think doing it at the
hardware level (IRQ_CTRL) is better.

By disabling in IRQ_CTRL and keeping the manual arm_smmu_handle_gerror()
call at the end of suspend, we ensure that we don't lose any gerror info.
We catch and handle any errors that occurred during the drain/quiesce
phase right before the power-down.
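
To make the ordering concrete, here is a rough userspace model of the
sequence described above. It is only a sketch and not driver code: the
struct and every model_* helper are hypothetical names made up for
illustration, the booleans merely stand in for the STOP_FLAG and the
GERROR enable bit in IRQ_CTRL, and the GERROR/GERRORN pair is reduced to
"an error is pending while the two differ".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in the driver. */
struct model_smmu {
	bool stop_flag;       /* stands in for the CMDQ STOP_FLAG */
	bool gerror_irq_en;   /* stands in for the GERROR enable in IRQ_CTRL */
	uint32_t gerror;      /* toggled by the "hardware" on a new error */
	uint32_t gerrorn;     /* written by software to acknowledge */
	unsigned int handled; /* total error bits processed so far */
};

/*
 * "Hardware" side: latch an error bit unless it is already pending.
 * Returns true if an interrupt would be delivered to the CPU.
 */
bool model_raise_gerror(struct model_smmu *s, uint32_t bit)
{
	if (!((s->gerror ^ s->gerrorn) & bit))
		s->gerror ^= bit;
	return s->gerror_irq_en;
}

/* Process and acknowledge every pending error; returns how many. */
unsigned int model_handle_gerror(struct model_smmu *s)
{
	uint32_t active = s->gerror ^ s->gerrorn;
	unsigned int n = 0;

	for (; active; active &= active - 1)
		n++;
	s->gerrorn = s->gerror;
	s->handled += n;
	return n;
}

void model_runtime_suspend(struct model_smmu *s, uint32_t errs_during_drain)
{
	uint32_t bit;

	s->stop_flag = true;      /* 1. stop new command submission */
	s->gerror_irq_en = false; /* 2. mask GERROR at the source */

	/* 3. drain/quiesce: errors can still latch, but no IRQ fires */
	for (bit = 1; bit; bit <<= 1) {
		if (errs_during_drain & bit) {
			bool irq = model_raise_gerror(s, bit);

			assert(!irq);
			(void)irq;
		}
	}

	/* 4. final explicit sweep right before power-down */
	model_handle_gerror(s);
}

void model_runtime_resume(struct model_smmu *s)
{
	s->stop_flag = false;    /* cleared before interrupts are re-enabled */
	s->gerror_irq_en = true;
}
```

With the interrupt masked in step 2, the handler cannot run concurrently
with the sweep in step 4, and anything latched during the drain is still
picked up there instead of being dropped.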

Thanks,
Praan


