[V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

Fri Jul 31 04:23:00 PDT 2015

> From: Michal Hocko [mailto:mhocko at kernel.org]
> 
> On Thu 30-07-15 11:55:52, 河合英宏 / KAWAI，HIDEHIRO wrote:
> > > From: Michal Hocko [mailto:mhocko at kernel.org]
> [...]
> > > Could you point me to the code which does that, please? Maybe we are
> > > missing that in our 3.0 kernel. I was quite surprised to see this
> > > behavior as well.
> >
> > Please see the snippet below.
> >
> > void setup_local_APIC(void)
> > {
> > ...
> >         /*
> >          * only the BP should see the LINT1 NMI signal, obviously.
> >          */
> >         if (!cpu)
> >                 value = APIC_DM_NMI;
> >         else
> >                 value = APIC_DM_NMI | APIC_LVT_MASKED;
> >         if (!lapic_is_integrated())             /* 82489DX */
> >                 value |= APIC_LVT_LEVEL_TRIGGER;
> >         apic_write(APIC_LVT1, value);
> >
> >
> > The LINT1 pins of CPUs other than CPU 0 are masked here.
> > However, at least on some Hitachi servers, the NMI raised by the NMI
> > button doesn't seem to be delivered through LINT1.  So, my term
> > `external NMI' may not be correct.
> 
> I am not familiar with details here but I can tell you that this
> particular code snippet is the same in our 3.0 based kernel so it seems
> that the HW is indeed doing something differently.

Yes, and it turned out my PATCH 3/3 doesn't work at all on some
hardware...

> > > You might still get a panic on hardlockup which will happen on all CPUs
> > > from the NMI context so we have to be able to handle panic in NMI on
> > > many CPUs.
> >
> > Do you mean the case of a kernel panic while other CPUs are locked up
> > in NMI context?  In that case, there is no way to do the things needed by
> > the kdump procedure, including saving registers...
> 
> I am saying that watchdog_overflow_callback might trigger on more CPUs
> and panic from NMI context as well. So this is not limited to the case
> where the NMI button sends an NMI to multiple CPUs.

I understand.  So I also have to modify watchdog_overflow_callback
to call nmi_panic().
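
To make the intent concrete, here is a rough user-space sketch of the
exclusion I have in mind, not the actual patch.  It only models the
"first CPU wins, the others return" behavior with C11 atomics.  All of
the names (panic_cpu, claim_panic, watchdog_nmi_sketch) are made up for
illustration and are assumptions, not existing kernel interfaces.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PANIC_CPU_INVALID (-1)

/* Hypothetical record of which CPU owns the panic path (-1: nobody yet). */
static atomic_int panic_cpu = PANIC_CPU_INVALID;

/*
 * Try to claim the panic path for `cpu`.
 * Returns true only for the first caller; every later caller gets false
 * and is expected to return from its NMI handler instead of panicking.
 */
static bool claim_panic(int cpu)
{
    int expected = PANIC_CPU_INVALID;

    assert(cpu >= 0);
    return atomic_compare_exchange_strong(&panic_cpu, &expected, cpu);
}

/*
 * Hypothetical model of the watchdog NMI firing on `cpu`.
 * Returns true if this CPU would go on to run the panic path.
 */
static bool watchdog_nmi_sketch(int cpu, bool hardlockup_detected)
{
    if (!hardlockup_detected)
        return false;           /* nothing to do on this CPU */
    return claim_panic(cpu);    /* only one CPU may proceed */
}
```

If the watchdog fires on several CPUs at once, only the first call
returns true; the rest see that the panic path is already owned and
back off.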

> Why can't the panic() context save all the registers if we are going to
> loop in NMI context? This would IMHO be preferable to returning from
> panic.

I'm not saying we cannot save registers and do some cleanups in NMI
context.  I feel that it would introduce unneeded complexity.
Since watchdog_overflow_callback is defined as generic code,
we would need to implement the preparation for kdump for other
architectures as well.  I haven't checked which architectures support
both the NMI watchdog and kdump, though.

Anyway, I came up with a simple solution for x86: wait in nmi_panic()
until nmi_shootdown_cpus() is called, then invoke the callback
registered by nmi_shootdown_cpus().
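
Roughly, the waiting side could look like the user-space sketch below.
This is only an illustration under my own assumptions, not the real
implementation: the names (shootdown_callback,
register_shootdown_callback, wait_and_run_shootdown) are hypothetical,
the spin limit exists only so that the sketch terminates, and the sample
callback just records the CPU number where a real one would save
registers and stop the CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*shootdown_fn)(int cpu);

/* Hypothetical slot filled in when the panicking CPU starts the shootdown. */
static _Atomic(shootdown_fn) shootdown_callback = NULL;

/* Sample callback for the sketch: just remember which CPU was stopped. */
static int stopped_cpu = -1;

static void record_stopped_cpu(int cpu)
{
    stopped_cpu = cpu;
}

/* Called by the CPU that owns the panic path. */
static void register_shootdown_callback(shootdown_fn fn)
{
    assert(fn != NULL);
    atomic_store(&shootdown_callback, fn);
}

/*
 * Called by a CPU that lost the race for the panic path.
 * Spins until a callback has been registered, then runs it for `cpu`.
 * Returns false if nothing was registered within `max_spins` polls.
 */
static bool wait_and_run_shootdown(int cpu, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        shootdown_fn fn = atomic_load(&shootdown_callback);

        if (fn) {
            fn(cpu);
            return true;
        }
    }
    return false;
}
```

The point is that a CPU which panics in NMI context no longer returns
early; it stays put until the shootdown callback is available and then
runs the same code it would have run on receiving the shootdown NMI.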

> > > I can provide the full log but it is quite mangled. I guess
> > > CPU130 was the only one allowed to proceed with the panic while others
> > > returned from the unknown NMI handling. It took a lot of time until
> > > CPU130 managed to boot the crash kernel, with soft lockup and RCU stall
> > > reports. CPU0 is most probably locked up waiting for CPU130 to
> > > acknowledge the IPI, which apparently will not happen.
> >
> > There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know
> > why CPU 130 waits so long.  I'll think about it for a while.
> 
> Yes, I do not understand the timing here either and the fact that the
> log is a complete mess in the important parts doesn't help a wee bit.

I'm interested in where "Kernel panic - not syncing: " appears in the
log.  It may give us a clue.

Regards,
Kawai