[PATCH 1/1] arm64: kexec: no need to do irq_chip->irq_mask if it already masked

Jason Liu jason.hui.liu at nxp.com
Thu Aug 13 02:03:13 EDT 2020



> -----Original Message-----
> From: Marc Zyngier <maz at kernel.org>
> Sent: Thursday, August 6, 2020 8:26 PM
> To: Jason Liu <jason.hui.liu at nxp.com>
> Cc: Sudeep Holla <sudeep.holla at arm.com>; catalin.marinas at arm.com;
> will at kernel.org; sashal at kernel.org; linux-kernel at vger.kernel.org;
> linux-arm-kernel at lists.infradead.org
> Subject: Re: [PATCH 1/1] arm64: kexec: no need to do irq_chip->irq_mask if it
> already masked
> 
> On 2020-08-06 11:05, Jason Liu wrote:
> >> -----Original Message-----
> >> From: Marc Zyngier <maz at kernel.org>
> 
> [...]
> 
> >> > No, this patch is not papering over a much deeper issue in the driver.
> >> > This is just to make things better for the ARM64 kexec.
> >>
> >> Yes, I'm sure it is... However:
> >>
> >> request_irq()
> >> <goes into suspend, panic somewhere after having turned the irqchip
> >> clock off> if (chip->irq_mask && !irqd_irq_masked(&desc->irq_data))
> >>      <explodes, as the interrupt isn't masked>
> >>
> >> This is because the PM in the irqsteer driver is completely busted:
> >> request_irq() should get a reference on the driver to prevent it from
> >> being suspended. Since you don't implement it correctly, this doesn't
> >> happen and your "improvement" doesn't help at all.
> >
> > The request_irq will get a reference to prevent the irqchip from being
> > suspended due to it call irq_chip_pm_get(). I am pretty sure we have
> > implemented correctly and that is also the common Linux code.
> 
> Then it seems you cannot read your own driver. At no point do you set the
> parent_device that would give you a fighting chance to get the device clocked
> and powered on. How does it work? Magic?
> 
> > In order to save power and let the irqchip enter into runtime SUSPEND
> > mode, the driver will call free_irq() When it was not used(idle). Then
> > free_irq() will decrease the reference and let the irqchip enter
> > suspend state.
> 
> The reference count on *what*? There is nothing to take a reference on. So yes,
> you understand how the core kernel works. But you don't seem to notice that
> there is no link between the irq and the device that implements the controller.

See the code, it will call irq_chip_pm_put(&desc->irq_data)

/*
 * Internal function to unregister an irqaction - used to free
 * regular and special interrupts that are part of the architecture.
 */
static struct irqaction *__free_irq(struct irq_desc *desc, void *dev_id)
{
..
        irq_chip_pm_put(&desc->irq_data);
        module_put(desc->owner);
        kfree(action->secondary);
        return action;
}

> 
> > So, when the irqchip entered suspend, which means there is no user for
> > the irqchip and all the irqs were DISABLED + MASKED.
> > Due to the runtimePM support for the irqchip, when kexec runs, it will
> > sometimes meet the situation that the irqchip is suspend due to no
> > users for it. So from either the performance(time cost) or coding
> > logic, the machine_kexec_mask_interrupts() should not do double mask
> > for the irqs which already masked.
> 
> I strongly suggest you start by fixing the damn driver first.

Our driver does not have issue at all. What to fix?

> 
> In the meantime, NAK to this patch.

Anyway, it seems don't really understand this issue and you just simply reject one reasonable fix. Sounds not good at all.

> 
>          M.
> --
> Jazz is not dead. It just smells funny...



More information about the linux-arm-kernel mailing list