[V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version

Mon Jul 18 02:02:04 PDT 2016

Hi,

On 07/15/16 at 11:50am, 河合英宏 / KAWAI，HIDEHIRO wrote:
> Hi Dave,
> 
> Thanks for your reply.
> 
> > From: 'Dave Young' [mailto:dyoung at redhat.com]
> > Sent: Wednesday, July 13, 2016 11:04 AM
> > 
> > On 07/12/16 at 02:49am, 河合英宏 / KAWAI，HIDEHIRO wrote:
> > > Hi Dave,
> > >
> > > Thanks for the comments.
> > >
> > > > From: Dave Young [mailto:dyoung at redhat.com]
> > > > Sent: Monday, July 11, 2016 5:35 PM
> > > >
> > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
> > > > > This patch fixes one of the problems reported by Daniel Walker
> > > > > (https://lkml.org/lkml/2015/6/24/44).
> > > > >
> > > > > If crash_kexec_post_notifiers boot option is specified, other CPUs
> > > > > are stopped by smp_send_stop() instead of machine_crash_shutdown()
> > > > > in crash_kexec() path.  This behavior change leads two problems.
> > > > >
> > > > >  Problem 1:
> > > > >  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs
> > > > > are  still online and try to stop their watchdog timer.  If
> > > > >  smp_send_stop() is called before octeon_generic_shutdown(),
> > > > > stopping  watchdog timer will fail because other CPUs have been
> > > > > offlined by  smp_send_stop().
> > > > >
> > > > >    panic()
> > > > >      if crash_kexec_post_notifiers == 1
> > > > >        smp_send_stop()
> > > > >        atomic_notifier_call_chain()
> > > > >        kmsg_dump()
> > > > >      crash_kexec()
> > > > >        machine_crash_shutdown()
> > > > >          octeon_generic_shutdown() // shutdown watchdog for ONLINE
> > > > > CPUs
> > > > >
> > > > >  Problem 2:
> > > > >  Most of architectures stop other CPUs in machine_crash_shutdown()
> > > > > path, and they also do something needed for kdump.  For example,
> > > > > they save registers, disable virtualization extensions, and so on.
> > > > >  However, if smp_send_stop() stops other CPUs before
> > > > > machine_crash_shutdown(), we miss those operations.
> > > > >
> > > > > How do we fix these problems?  In the first place, we should stop
> > > > > other CPUs as soon as possible when panic() was called, otherwise
> > > > > other CPUs may wipe out a clue to the cause of the failure.  So,
> > > > > we replace smp_send_stop() with more suitable one for kdump.
> > > >
> > > > We have been avoiding extra things in panic path, but unfortunately
> > > > crash_kexec_post_notifiers were added. I tend to agree the best
> > > > place for this stuff is in 2nd kernel or purgatory instead of in 1st kernel.
> > >
> > > Several months ago, I posted a patch set which writes regs to SEL,
> > > generate an event to send SNMP message, and start/stop BMC's watchdog
> > > timer in purgatory.  This feature requires BMC with KCS (Keyboard
> > > Controller Style) I/F, but the most of enterprise grade server would have it.
> > > (http://thread.gmane.org/gmane.linux.kernel.kexec/15382)
> > >
> > > Doing kmsg_dump things in purgatory wouldn't be suitable (should be
> > > done in the 2nd kernel before enabling devices and IRQs?)
> > 
> > In theory it is doable maybe do something like oldmem_kmsg_dump while /proc/vmcore initializing?
> 
> Hmm, I checked the case of using ACPI ERST as a persistent
> storage. The feature is initialized by device_initcall():
> 
> device_initcall
>   erst_init
>     pstore_register
> 
> And vmcore is initialized by fs_initcall() which is called
> before device_initcall().  We may be able to change the sequence,
> but anyway, these are done in later phase of the kernel initialization.
> So, it may get less reliable although it would be doable.

Agreed, it is just an idea, it may need more experiments if you need.

> 
> > > > As for this patch I'm not sure it is safe to replace the
> > > > smp_send_stop with the kdump friendly function. I'm also not sure if
> > > > the kdump friendly function is safe for kdump. Will glad to hear
> > > > opinions from other arch experts.
> > >
> > > This stuff depends on architectures, so I speak only about
> > > x86 (the logic doesn't change on other architectures at this time).
> > >
> > > kdump path with crash_kexec_post_notifiers disabled:
> > >  panic()
> > >    __crash_kexec()
> > >      crash_setup_regs()
> > >      crash_save_vmcoreinfo()
> > >      machine_crash_shutdown()
> > >        native_machine_crash_shutdown()
> > >          panic_smp_send_stop() /* mostly same as original
> > >                                 * kdump_nmi_shootdown_cpus()
> > >                                 */
> > >
> > > kdump path with crash_kexec_post_notifiers enabled:
> > >  panic()
> > >    panic_smp_send_stop()
> > >    __crash_kexec()
> > >      crash_setup_regs()
> > >      crash_save_vmcoreinfo()
> > >      machine_crash_shutdown()
> > >        native_machine_crash_shutdown()
> > >          panic_smp_send_stop() // do nothing
> > >
> > > The difference is that stopping other CPUs before crash_setup_regs()
> > > and crash_save_vmcoreinfo() or not.  Since crash_setup_regs() and
> > > crash_save_vmcoreinfo() just save information to some memory area,
> > > they wouldn't be affected by panic_smp_send_stop().  This means
> > > placing panic_smp_send_stop before __crash_kexec is safe.
> > >
> > > BTW, I noticed my patch breaks Xen kernel.  I'll fix it in the next
> > > version.
> > 
> > But it does breaks stuff which depends on cpu not being disabled like problem 1 you mentioned in patch log.
> 
> As I mentioned in the description of this patch, we should stop
> other CPUs ASAP to preserve current state either
> crash_kexec_post_notifiers is enabled or not.
> Then, all remaining procedures should work well
> after stopping other CPUs (but keep the CPU map online).
> 
> Vivek also mentioned similar things:
> https://lkml.org/lkml/2015/7/14/433

The implementation in this patchset is different from suggestion in above link?

I think Vivek's suggestion is a good idea, to drop smp_send_stop and do below:

stop_cpus_save_register_state;

if (!crash_kexec_post_notifiers)
	crash_kexec()
atomic_notifier_call_chain()
kmsg_dump()

I'm just commenting from code flow point of view, the detail implementation
definitely need more comments from Arch experts.

Any reason did not move the kdump friendly function to earlier point like
before previous __crash_kexec() below? 
        if (!crash_kexec_post_notifiers) {
                printk_nmi_flush_on_panic();
                __crash_kexec(NULL);
        }
> 
> > > > BTW, if one want to use crash_kexec_post_notifiers he should take
> > > > the risk of unreliable kdump. How about only call smp_send_stop in
> > > > case no crash_kexec_post_notifiers being used.
> > >
> > > Unlike panic_smp_send_stop()/kdump_nmi_shootdown_cpus(),
> > > smp_send_stop() for x86 tries to stop other CPUs with normal IPI before issuing NMI IPI.
> > > This would be because NMI IPI has a risk of deadlock.  We checked if
> > > the kdump path has a risk of deadlock in the case of NMI panic and
> > > fixed it.  But I'm not sure about normal panic path.  I agree with
> > > that use smp_send_stop if crash_kexec_post_notifiers or kdump is disabled.
> > 
> > What I mean is like below, problem 1 will not exist in this way, but kdump will be unreliable:
> > if (!crash_kexec_post_notifiers)
> > 	smp_send_stop()
> 
> Remaining procedures including atomic_notifier_call_chain and
> kmsg_dump may assume that other CPUs have stopped.  Actually,
> IPMI driver callback assumes so that.  Also, other CPUs may

Ok, if so please ignore my previous suggestion.

> change the current state while calling these callbacks and make
> the dump analysis difficult.
> 
> [snip]
> 
> Best regards,
> 
> Hidehiro Kawai
> 

Thanks
Dave