[RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes
K.Prasad
prasad at linux.vnet.ibm.com
Fri May 27 11:53:46 EDT 2011
On Thu, May 26, 2011 at 07:32:57PM +0200, Andi Kleen wrote:
> On Thu, May 26, 2011 at 10:53:05PM +0530, K.Prasad wrote:
> >
> > slimdump: Capture slimdump for fatal MCE generated crashes
> >
> > System crashes resulting from fatal hardware errors (such as MCE) don't need
> > all the contents from crashing-kernel's memory. Generate a new 'slimdump' that
> > retains only essential information while discarding the old memory.
>
> While this is a good idea, note there may be still poisoned lines
> in memory that haven't resulted in a machine check yet, but could
> still be fatal when read after a full crash dump for some other
> reason.
>
True, this patch does not handle old poisoned lines or new memory errors
that may be encountered while inside the kdump kernel.
> So you still need
>
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=commit;h=fe61906edce9e70d02481a77a617ba1397573dce
> and
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=commit;h=cb58f049ae6709ddbab71be199390dc6852018cd
>
> in addition.
>
> -Andi
So, there are (at least) two ways to handle fatal MCEs in the kdump
kernel:
- Disable MCE exceptions, as done by the patches cited above. However,
the result of a read from corrupted memory is unknown and the system
behaviour is then undefined, so we're unsure whether this is safe to
do.
- Disable kdump capture when panic is invoked from inside the kdump
kernel, and simply reboot the system. Since a memory error inside the
kdump kernel (which runs only for a very short time) is rare, I think
this solution is preferable.
Let me know your thoughts on this.
Thanks,
K.Prasad