[V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI
mhocko at kernel.org
Thu Jul 30 05:27:47 PDT 2015
On Thu 30-07-15 11:55:52, 河合英宏 / KAWAI，HIDEHIRO wrote:
> > From: Michal Hocko [mailto:mhocko at kernel.org]
> > Could you point me to the code which does that, please? Maybe we are
> > missing that in our 3.0 kernel. I was quite surprised to see this
> > behavior as well.
> Please see the snippet below.
> void setup_local_APIC(void)
> ...
> 	/*
> 	 * only the BP should see the LINT1 NMI signal, obviously.
> 	 */
> 	if (!cpu)
> 		value = APIC_DM_NMI;
> 	else
> 		value = APIC_DM_NMI | APIC_LVT_MASKED;
> 	if (!lapic_is_integrated())		/* 82489DX */
> 		value |= APIC_LVT_LEVEL_TRIGGER;
> 	apic_write(APIC_LVT1, value);
> LINT1 pins of cpus other than CPU 0 are masked here.
> However, at least on some Hitachi servers, the NMI caused by the NMI
> button doesn't seem to be delivered through LINT1, so my use of the
> term `external NMI' may not be correct.
I am not familiar with the details here, but I can tell you that this
particular code snippet is the same in our 3.0-based kernel, so it seems
that the HW is indeed doing something different.
> > You might still get a panic on hardlockup which will happen on all CPUs
> > from the NMI context so we have to be able to handle panic in NMI on
> > many CPUs.
> Do you mean the case of a kernel panic while other CPUs are locked up
> in NMI context? In that case, there is no way to do the things needed
> by the kdump procedure, including saving registers...
I am saying that watchdog_overflow_callback might trigger on more CPUs
and panic from NMI context as well, so this is not limited to the case
where the NMI button sends an NMI to multiple CPUs.
Why can't the panic() context save all the registers if we are going to
loop in NMI context? This would be imho preferable to returning from
the NMI handler.
> > I can provide the full log but it is quite mangled. I guess the
> > CPU130 was the only one allowed to proceed with the panic while others
> > returned from the unknown NMI handling. It took a long time until
> > CPU130 managed to boot the crash kernel, with soft lockup and RCU
> > stall reports. CPU0 is most probably locked up waiting for CPU130 to
> > acknowledge the IPI, which apparently will not happen.
> There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know
> why CPU 130 waits so long. I'll think about it for a while.
Yes, I do not understand the timing here either and the fact that the
log is a complete mess in the important parts doesn't help a wee bit.