[V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

Thu Jul 30 00:48:13 PDT 2015

On Thu 30-07-15 01:45:35, 河合英宏 / KAWAI，HIDEHIRO wrote:
> Hi,
> 
> > From: Michal Hocko [mailto:mhocko at kernel.org]
> > 
> > On Wed 29-07-15 09:09:18, 河合英宏 / KAWAI，HIDEHIRO wrote:
[...]
> > > #define nmi_panic(fmt, ...)                                            \
> > >        do {                                                            \
> > >                if (atomic_cmpxchg(&panic_cpu, -1, raw_smp_processor_id()) \
> > >                    == -1)                                              \
> > >                        panic(fmt, ##__VA_ARGS__);                      \
> > >        } while (0)
> > 
> > This would allow returning from NMI too eagerly.
> 
> Yes, but what's the problem?

I believe that panic should be noreturn as much as possible and return
only when we do not have any other options. Moreover, I would ask the
opposite question: what is the problem with looping in NMI on CPUs other
than the one which is performing crash_kexec? We will not save
registers, so what?
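
To make the distinction concrete, here is a minimal user-space sketch,
not kernel code. It models panic_cpu with a C11 atomic; the names
nmi_panic_decide(), enum nmi_action and the spin_losers flag are made up
for illustration. The CPU that wins the cmpxchg proceeds to panic, and
the others either return from NMI (as in the quoted macro) or are told
to park in a loop (what is argued for here).

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PANIC_CPU_INVALID (-1)

/* Illustrative stand-in for the kernel's panic_cpu; -1 means unclaimed. */
static atomic_int panic_cpu = PANIC_CPU_INVALID;

enum nmi_action {
	NMI_DO_PANIC,	/* this CPU won the claim and runs panic() */
	NMI_RETURN,	/* loser returns from the NMI handler */
	NMI_SPIN,	/* loser parks in the NMI handler forever */
};

/*
 * Decide what a CPU entering the NMI panic path should do.
 * spin_losers == false models the quoted macro,
 * spin_losers == true models looping in NMI on the other CPUs.
 */
static enum nmi_action nmi_panic_decide(int cpu, bool spin_losers)
{
	int expected = PANIC_CPU_INVALID;

	/* Only one CPU can swap -1 for its own id. */
	if (atomic_compare_exchange_strong(&panic_cpu, &expected, cpu))
		return NMI_DO_PANIC;

	return spin_losers ? NMI_SPIN : NMI_RETURN;
}
```

In both variants exactly one CPU gets NMI_DO_PANIC, so the reentrancy
is handled either way; the only difference is whether the remaining
CPUs go back to whatever they were doing.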

> The root cause of your case hasn't been clarified yet.
> I can't fix an unclear issue because I don't know what the right
> solution is.
> 
> > When I was testing my
> > previous approach (on 3.0 based kernel) I had basically the same thing
> > (one NMI to process panic) and others to return. This led to a strange
> > behavior when the NMI button triggered NMI on all (hundreds) CPUs.
> 
> It's strange.  Usually, an NMI caused by the NMI button is routed only
> to CPU 0 as an external NMI.  External NMIs for CPUs other than CPU 0
> are masked at boot time.  Does it really happen?

Could you point me to the code which does that, please? Maybe we are
missing that in our 3.0 kernel. I was quite surprised to see this
behavior as well.

> Does the problem still happen on the latest kernel?

I do not have access to the machine, so I have to rely on the customer
to test, and running the current vanilla kernel might be an issue.

> What kind of NMI is delivered to each CPU?

See the log below.

> Traditionally, we have assumed that an NMI for crash dumping is
> delivered to only one CPU.  Otherwise, we would often fail to take
> a proper crash dump.

You might still get a panic on hardlockup, which will happen on all CPUs
from the NMI context, so we have to be able to handle panic in NMI on
many CPUs.

> It seems that your case is another problem to be solved separately.

I do not think so, quite the contrary. If you want to solve the
reentrancy, then the other CPUs can spin in NMI as long as there is a
guarantee that at least one CPU can progress to finish crash_kexec().

> > The
> > crash kernel booted eventually but the log contained lockups when a
> > CPU waited for an IPI to the CPU which was handling the NMI panic.
> 
> Could you explain more precisely?

[  167.843761] Uhhuh. NMI received for unknown reason 3d on CPU 130.
[  167.843763] Do you have a strange power saving mode enabled?
[... Mangled output ....]
[  167.856415] Dazed and confused, but trying to continue
[  167.856428] Dazed and confused, but trying to continue
[  167.856442] Dazed and confused, but trying to continue
[...]
[  193.108440] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:4]
[...]
[  193.108586] Call Trace:
[  193.108595]  [<ffffffff8109baeb>] smp_call_function_single+0x15b/0x170
[  193.108600]  [<ffffffff8109bb4e>] smp_call_function_any+0x4e/0x110
[  193.108607]  [<ffffffffa04a332c>] get_cur_val+0xbc/0x130 [acpi_cpufreq]
[  193.108630]  [<ffffffffa04a3417>] get_cur_freq_on_cpu+0x77/0xf0 [acpi_cpufreq]
[  193.108638]  [<ffffffff8137bc37>] cpufreq_update_policy+0x97/0x140
[  193.108646]  [<ffffffffa00ca04b>] acpi_processor_notify+0x4b/0x145 [processor]
[  193.108654]  [<ffffffff812d2eca>] acpi_ev_notify_dispatch+0x61/0x77
[  193.108659]  [<ffffffff812c1785>] acpi_os_execute_deferred+0x21/0x2c
[  193.108667]  [<ffffffff8107d03c>] process_one_work+0x16c/0x350
[  193.108673]  [<ffffffff8107fd6a>] worker_thread+0x17a/0x410
[  193.108679]  [<ffffffff81084136>] kthread+0x96/0xa0
[  193.108688]  [<ffffffff8146df64>] kernel_thread_helper+0x4/0x10
[...]
[  221.068390] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:4]
[...]
[  227.991235] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 130} (detected by 56, t=15002 jiffies)
[  227.991247] sending NMI to all CPUs:
[  227.991251] NMI backtrace for cpu 0
[  229.074091] INFO: rcu_bh_state detected stalls on CPUs/tasks: { 130} (detected by 105, t=15013 jiffies)
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.0.101-0.47.55.9.8853.0.TEST-default (geeko at buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Thu May 28 08:25:11 UTC 2015 (dc083ee)
[    0.000000] Command line: root=/dev/system/lvroot resume=/dev/system/lvswap intel_idle.max_cstate=0 processor.max_cstate=0 elevator=deadline nmi_watchdog=1 console=tty0 console=ttyS1,115200 elevator=deadline sysrq=yes reset_devices irqpoll maxcpus=1 disable_cpu_apicid=0 noefi acpi_rsdp=0xba7a4014  crashkernel=1024M-:512M memmap=exactmap memmap=576K@64K memmap=523684K@393216K elfcorehdr=916900K memmap=32768K#3018748K memmap=3736K#3051516K memmap=262144K$3145728K

I can provide the full log but it is quite mangled. I guess that
CPU130 was the only one allowed to proceed with the panic while the
others returned from the unknown NMI handling. It took a lot of time
until CPU130 managed to boot the crash kernel, with soft lockup and RCU
stall reports in the meantime. CPU0 is most probably locked up waiting
for CPU130 to acknowledge the IPI, which apparently will never happen.
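
The suspected wait can be pictured with a simplified user-space model.
This is not the kernel's code: struct call_data, target_handle_ipi(),
send_ipi_and_wait() and the max_spins bound are hypothetical, and the
bound exists only so that the example terminates. The sender marks a
request as pending and spins until the target's IPI handler clears it;
a target stuck in the NMI panic path never runs that handler.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-call data the sender waits on. */
struct call_data {
	atomic_bool locked;	/* set by sender, cleared by target's handler */
};

/* What the target CPU would do when it services the IPI. */
static void target_handle_ipi(struct call_data *cd)
{
	atomic_store(&cd->locked, false);
}

/*
 * Sender side: mark the request pending and spin until acknowledged.
 * target_alive says whether the target still services interrupts.
 * Returns true if acknowledged, false if max_spins ran out.
 */
static bool send_ipi_and_wait(struct call_data *cd, bool target_alive,
			      unsigned long max_spins)
{
	atomic_store(&cd->locked, true);

	/* Model delivery: only a live target ever runs its handler. */
	if (target_alive)
		target_handle_ipi(cd);

	while (atomic_load(&cd->locked)) {
		if (max_spins-- == 0)
			return false;	/* an unbounded wait would keep spinning */
	}
	return true;
}
```

With target_alive set to false the call gives up only because of the
artificial bound. Without such a bound the sender would spin
indefinitely, which would be consistent with the soft lockup reported
on CPU0 in smp_call_function_single() above.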

Maybe this is not possible in the current kernels for some reason, but
it tells me that returning from panic is quite fragile, so I would like
to prevent it as much as possible.

> > Anyway, I do not think this is really necessary to solve the panic
> > reentrancy issue.
> > If the missing saved state is a real problem then it
> > should be handled separately - maybe it can be achieved without an IPI
> > and directly from the panic context if we are in NMI.
> 
> What I would like to do via this patchset is to solve race issues
> among NMI, panic() and crash_kexec().

Yes I fully support you in this ;) I just believe that spinning in NMI
vs. saving registers is a separate issue.

> So, I don't think we should fix that separately, although I would need
> to reword some descriptions and titles.

I can have them tested.

-- 
Michal Hocko
SUSE Labs