[RFC] Kdump and memory error handling

K.Prasad prasad at linux.vnet.ibm.com
Mon May 9 13:29:35 EDT 2011


On Wed, May 04, 2011 at 10:39:14PM +0200, Andi Kleen wrote:
> > Any thoughts/suggestions?
> 
> My old attempts to solve this are
> 
> Don't dump on MCE:
> 
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/xpanic
> 

The problem we see with avoiding panic->crash_kexec->[coredump capture] is
that the user may have no means of knowing the reason for the crash unless
a serial console is connected to capture and store the panic string.

Alternatively, a 'slim' kdump (as described here:
https://lkml.org/lkml/2011/5/4/396) would not contain meaningless data from
the old memory, but would still inform the user about the cause of the
crash. I intend to post some patches with a quick implementation of it soon.

> Handle dumps of corrupted memory regions:
> 
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/crashdump
> 

> IMHO these patches are still the right solutions for this.
> 

As Vatsa pointed out, the processor's behaviour upon reading (or performing
any I/O operation on) the faulty memory location isn't clearly defined (as
far as I could tell from the System Programming Guide Part 1, Volume 3A,
Chapter 15). In such a scenario, disabling MCE for the kdump kernel (which
can potentially read the faulty memory) makes the outcome even harder to
predict.

Thanks,
K.Prasad



