[PATCH 1/1] arm64: kexec: no need to do irq_chip->irq_mask if it already masked

Marc Zyngier maz at kernel.org
Wed Aug 5 04:17:38 EDT 2020


On 2020-08-05 07:31, Jason Liu wrote:
>> -----Original Message-----
>> From: Sudeep Holla <sudeep.holla at arm.com>
>> Sent: Tuesday, August 4, 2020 7:39 PM
>> To: Marc Zyngier <maz at kernel.org>
>> Cc: Jason Liu <jason.hui.liu at nxp.com>; catalin.marinas at arm.com;
>> will at kernel.org; linux-kernel at vger.kernel.org; Sudeep Holla
>> <sudeep.holla at arm.com>; linux-arm-kernel at lists.infradead.org
>> Subject: Re: [PATCH 1/1] arm64: kexec: no need to do irq_chip->irq_mask if it already masked
>> 
>> On Tue, Aug 04, 2020 at 11:58:47AM +0100, Marc Zyngier wrote:
>> > On 2020-08-04 09:56, Jason Liu wrote:
>> > > There is no need to call irq_chip->irq_mask() if the interrupt is
>> > > already masked. Besides, calling irq_chip->irq_mask() unconditionally
>> > > causes problems when the irq_chip is runtime PM suspended: accessing
>> > > the irq_chip registers then raises an exception. For example, on
>> > > i.MX:
>> > >
>> > > root at imx8qmmek:~# echo c > /proc/sysrq-trigger
>> > > [  177.796182] sysrq: Trigger a crash
>> > > [  177.799596] Kernel panic - not syncing: sysrq triggered crash
>> > > [  177.875616] SMP: stopping secondary CPUs
>> > > [  177.891936] Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP
>> > > [  177.899429] Modules linked in: crct10dif_ce mxc_jpeg_encdec
>> > > [  177.905018] CPU: 1 PID: 944 Comm: sh Kdump: loaded Not tainted
>> > > [  177.913457] Hardware name: Freescale i.MX8QM MEK (DT)
>> > > [  177.918517] pstate: a0000085 (NzCv daIf -PAN -UAO)
>> > > [  177.923318] pc : imx_irqsteer_irq_mask+0x50/0x80
>> > > [  177.927944] lr : imx_irqsteer_irq_mask+0x38/0x80
>> > > [  177.932561] sp : ffff800011fe3a50
>> > > [  177.935880] x29: ffff800011fe3a50 x28: ffff0008f7708e00
>> > > [  177.941196] x27: 0000000000000000 x26: 0000000000000000
>> > > [  177.946513] x25: ffff800011a30c80 x24: 0000000000000000
>> > > [  177.951830] x23: ffff800011fe3af8 x22: ffff0008f24469d4
>> > > [  177.957147] x21: ffff0008f2446880 x20: ffff0008f25f5658
>> > > [  177.962463] x19: ffff800012611004 x18: 0000000000000001
>> > > [  177.967780] x17: 0000000000000000 x16: 0000000000000000
>> > > [  177.973097] x15: ffff0008f7709270 x14: 0000000060000085
>> > > [  177.978414] x13: ffff800010177570 x12: ffff800011fe3ab0
>> > > [  177.983730] x11: ffff80001017749c x10: 0000000000000040
>> > > [  177.989047] x9 : ffff8000119f1c80 x8 : ffff8000119f1c78
>> > > [  177.994364] x7 : ffff0008f46bedf8 x6 : 0000000000000000
>> > > [  177.999681] x5 : ffff0008f46beda0 x4 : 0000000000000000
>> > > [  178.004997] x3 : ffff0008f24469d4 x2 : ffff800012611000
>> > > [  178.010314] x1 : 0000000000000080 x0 : 0000000000000080
>> > > [  178.015630] Call trace:
>> > > [  178.018077]  imx_irqsteer_irq_mask+0x50/0x80
>> > > [  178.022352]  machine_crash_shutdown+0xa8/0x100
>> > > [  178.026802]  __crash_kexec+0x6c/0x118
>> > > [  178.030464]  panic+0x19c/0x324
>> > > [  178.033524]  sysrq_handle_reboot+0x0/0x20
>> > > [  178.037537]  __handle_sysrq+0x88/0x180
>> > > [  178.041290]  write_sysrq_trigger+0x8c/0xb0
>> > > [  178.045389]  proc_reg_write+0x78/0xb0
>> > > [  178.049055]  __vfs_write+0x18/0x40
>> > > [  178.052461]  vfs_write+0xdc/0x1c8
>> > > [  178.055779]  ksys_write+0x68/0xf0
>> > > [  178.059098]  __arm64_sys_write+0x18/0x20
>> > > [  178.063027]  el0_svc_common.constprop.0+0x68/0x160
>> > > [  178.067821]  el0_svc_handler+0x20/0x80
>> > > [  178.071573]  el0_svc+0x8/0xc
>> > > [  178.074463] Code: 93407e73 91001273 aa0003e1 8b130053 (b9400260)
>> > > [  178.080567] ---[ end trace 652333f6c6d6b05d ]---
>> > >
>> > > Signed-off-by: Jason Liu <jason.hui.liu at nxp.com>
>> > > Cc: <stable at vger.kernel.org>
>> > > Cc: Catalin Marinas <catalin.marinas at arm.com>
>> > > Cc: Will Deacon <will at kernel.org>
>> > > Cc: Sasha Levin <sashal at kernel.org>
>> > > ---
>> > >  arch/arm64/kernel/machine_kexec.c | 2 +-
>> > >  1 file changed, 1 insertion(+), 1 deletion(-)
>> > >
>> > > diff --git a/arch/arm64/kernel/machine_kexec.c
>> > > b/arch/arm64/kernel/machine_kexec.c
>> > > index a0b144cfaea7..8ab263c733bf 100644
>> > > --- a/arch/arm64/kernel/machine_kexec.c
>> > > +++ b/arch/arm64/kernel/machine_kexec.c
>> > > @@ -236,7 +236,7 @@ static void machine_kexec_mask_interrupts(void)
>> > >  		    chip->irq_eoi)
>> > >  			chip->irq_eoi(&desc->irq_data);
>> > >
>> > > -		if (chip->irq_mask)
>> > > +		if (chip->irq_mask && !irqd_irq_masked(&desc->irq_data))
>> > >  			chip->irq_mask(&desc->irq_data);
>> > >
>> > >  		if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
>> >
>> > This is pretty dodgy. irq_mask() should be an idempotent action
>> > (masking twice must not be harmful).
>> >
>> 
>> That was my understanding too, but was not totally against adding it 
>> here.
> 
> Yes, masking twice is at least a waste of time, and there is really no
> need to do it. If you look at the common API mask_irq, it avoids
> masking an interrupt a second or further time. Keep in mind that there
> are many irqs, so doing unnecessary work wastes time. So, from this
> point of view, IMO, this patch is fine.

Let's be serious. You are doing a *kexec*. Rebooting the entire system.
Another 10 or 1000 accesses are completely invisible here.

> 
> void mask_irq(struct irq_desc *desc)
> {
>         if (irqd_irq_masked(&desc->irq_data))
>                 return;
> 
>         if (desc->irq_data.chip->irq_mask) {
>                 desc->irq_data.chip->irq_mask(&desc->irq_data);
>                 irq_state_set_masked(desc);
>         }
> }
> 
>> 
>> > Even more, it really isn't obvious to me how this can work at all, as
>> > even if the interrupt isn't masked, the irqsteer could well be
>> > suspended.
>> >
>> 
>> Indeed, the runtime PM ops in that driver look dodgy. Any call to
>> mask_irq from drivers or anywhere else with the irqchip suspended
>> will just blow up the system.
> 
> If you look at the chip->irq_mask implementations on different
> platforms, almost all of them directly access the registers of the
> irqchip, including irqsteer. They are fine because drivers use the
> common mask_irq API.
> 
>> 
>> > So as is, this change is just papering over a much deeper issue in
>> > your driver.
>> >
>> 
>> Thanks for confirming
> 
> No, this patch is not papering over a much deeper issue in the driver.
> This is just to make things better for the ARM64 kexec.

Yes, I'm sure it is... However:

request_irq()
<goes into suspend, panic somewhere after having turned the irqchip 
clock off>
if (chip->irq_mask && !irqd_irq_masked(&desc->irq_data))
     <explodes, as the interrupt isn't masked>

This is because the PM in the irqsteer driver is completely busted:
request_irq() should get a reference on the driver to prevent it
from being suspended. Since you don't implement it correctly, this
doesn't happen and your "improvement" doesn't help at all.
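
The sequence above can be modelled with a small userspace sketch. This
is a toy model, not kernel code: every toy_* name is hypothetical, the
"clock" is a boolean, and a fault is recorded in a flag instead of
raising an external abort. It only assumes what is described above,
namely that the chip registers are inaccessible once the last runtime
PM reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy irqchip sitting behind a runtime-PM-managed clock (hypothetical). */
struct toy_irqchip {
	int  pm_usage;   /* runtime PM reference count */
	bool clock_on;   /* registers are only accessible while true */
	bool hw_masked;  /* hardware mask bit */
	bool faulted;    /* set when a register is touched with the clock off */
};

struct toy_irq {
	struct toy_irqchip *chip;
	bool masked;     /* software state, stands in for irqd_irq_masked() */
};

static void toy_pm_get(struct toy_irqchip *chip)
{
	if (chip->pm_usage++ == 0)
		chip->clock_on = true;   /* first user resumes the chip */
}

static void toy_pm_put(struct toy_irqchip *chip)
{
	assert(chip->pm_usage > 0);
	if (--chip->pm_usage == 0)
		chip->clock_on = false;  /* last user gone: chip suspends */
}

/* Register write; records a fault if the chip is suspended. */
static void toy_chip_write_mask(struct toy_irqchip *chip, bool mask)
{
	if (!chip->clock_on) {
		chip->faulted = true;
		return;
	}
	chip->hw_masked = mask;
}

/*
 * hold_pm_ref == true models a request_irq() that keeps a PM reference
 * for as long as the interrupt is requested; false models the busted
 * case where the chip is free to suspend with the interrupt unmasked.
 */
static void toy_request_irq(struct toy_irq *irq, bool hold_pm_ref)
{
	toy_pm_get(irq->chip);
	toy_chip_write_mask(irq->chip, false);
	irq->masked = false;
	if (!hold_pm_ref)
		toy_pm_put(irq->chip);
}

/* Masking step at kexec time, with or without the proposed check. */
static void toy_kexec_mask(struct toy_irq *irq, bool skip_if_masked)
{
	if (skip_if_masked && irq->masked)
		return;
	toy_chip_write_mask(irq->chip, true);
	if (!irq->chip->faulted)
		irq->masked = true;
}
```

In this model, toy_request_irq(&irq, false) followed by
toy_kexec_mask(&irq, true) still sets faulted, because the interrupt is
not masked and the chip is suspended. With toy_request_irq(&irq, true)
the mask succeeds, and masking a second time is harmless.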

         M.
-- 
Jazz is not dead. It just smells funny...


