[RFC V2 PATCH 0/1] kexec: crash_kexec_post_notifiers boot option related fixes

Thu Aug 6 18:38:48 PDT 2015

> From: Eric W. Biederman [mailto:ebiederm at xmission.com]
> >> From: Eric W. Biederman [mailto:ebiederm at xmission.com]
> > [...]
> >> A specific hook for a very specific purpose when there is no other way
> >> we can consider.
> >
> > So, is kmsg_dump like feature admissible?
> >
> >> If you don't have something that generalises well into a general purpose
> >> operation that it makes sense for everyone to call you can always use
> >> the world's largest aka you can run code before the new kernel starts
> >> that is loaded with kexec_load.
> >
> > One of our purposes, notifying "I'm dying", would be achieved by purgatory
> > code provided by kexec command as I stated before.  Since the way of the
> > notification will differ from each vendor, I think we need to modify
> > the purgatory codes pluggable.  Also, I think we need some parameter
> > passing mechanism to the purgatory code.  For example, passing the panic
> > message via boot parameter to save it to SEL.  Although I'm not sure
> > we can do that (I've not investigated well yet).  Is that acceptable?
> 
> I think the address of panic message is available in crash notes.  If
> not that is very reasonable to add.

I had believed the boot parameters were prepared by the 1st kernel, but
that was wrong.  The boot parameters are provided entirely by the kexec
command.  So passing the panic message through a boot parameter is not
feasible.  I'm not sure we can easily access the crash notes from
purgatory, but I think that is a reasonable way to pass the panic message.

> Updating the SEL from purgatory after purgatory has validated the
> checksums of the crash handling code is acceptable.
> 
> All that is desired is to run as little code as possible in a kernel
> that is known broken.  Once the checksums have verified things in
> purgatory you should be in good shape, and there is no possibility of
> relying on broken infrastructure because that code simply is not present
> in purgatory.
> 
> We already have a few early_printk style drivers in purgatory and I
> don't think the code to update the SEL would be much worse.

For developers, an early_printk style feature is the better solution.
For end users, however, that is not always true.  Sometimes they cannot
use a serial port for early_printk because the serial port is used
for another purpose.  Sometimes they cannot set up an additional machine
to receive messages from the serial port.  So we need some plugin or
enable/disable mechanism for specific purgatory code.

> On the flip side there are enough firmware bugs that I personally would
> not want to rely on firmware code running properly when the machine is
> in a known broken state, so I don't want the SEL update to be
> unconditional.

Yes, I don't trust BMC firmware either.  The simplest I/F to the BMC
is KCS (Keyboard Controller Style), which is accessible via two I/O
ports.  If the BMC misbehaves, the state machine for the I/F can go
into an infinite loop.  However, we can avoid this by introducing a
proper timeout.  Of course, I think we should also add some
enable/disable mechanism.
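
To illustrate the timeout idea, here is a rough sketch of a bounded poll
on the KCS status register.  It is not taken from any existing purgatory
code and has not been run on real hardware.  All names below
(kcs_wait_ibf_clear, kcs_read_status_fn, fake_bmc, and so on) are made up
for illustration, and the status reader is passed in as a callback so the
logic can be exercised without touching I/O ports.  The only hardware
detail assumed is that IBF is bit 1 of the status register, as I read the
IPMI KCS description.  A poll budget is used instead of a clock because I
am assuming no timer is available in purgatory.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the IPMI KCS description: bit 1 = input buffer full. */
#define KCS_STATUS_IBF 0x02u

/* Hypothetical accessor; real code would wrap inb() on the status port. */
typedef uint8_t (*kcs_read_status_fn)(void *ctx);

/*
 * Poll until IBF clears or the poll budget is exhausted.
 * Returns 0 when the BMC has consumed our last byte, -1 on timeout.
 */
static int kcs_wait_ibf_clear(kcs_read_status_fn read_status, void *ctx,
                              unsigned long max_polls)
{
    while (max_polls--) {
        if (!(read_status(ctx) & KCS_STATUS_IBF))
            return 0;
    }
    return -1;
}

/* Fake BMC for demonstration: reports IBF set for busy_polls reads. */
struct fake_bmc {
    unsigned long busy_polls;
};

static uint8_t fake_read_status(void *ctx)
{
    struct fake_bmc *bmc = ctx;

    if (bmc->busy_polls == 0)
        return 0x00;
    bmc->busy_polls--;
    return KCS_STATUS_IBF;
}

/* A responsive BMC succeeds; a stuck one gives up instead of spinning. */
void kcs_timeout_demo(void)
{
    struct fake_bmc healthy = { 3 };
    struct fake_bmc stuck = { 1000000 };

    assert(kcs_wait_ibf_clear(fake_read_status, &healthy, 10) == 0);
    assert(kcs_wait_ibf_clear(fake_read_status, &stuck, 1000) == -1);
}
```

On a timeout the caller would simply skip the SEL update and continue
to the crash kernel, so a broken BMC costs at most the poll budget.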

Regards,
Kawai