[RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Vivek Goyal vgoyal at redhat.com
Thu May 26 14:26:18 EDT 2011


On Thu, May 26, 2011 at 08:09:31PM +0200, Andi Kleen wrote:
> On Thu, May 26, 2011 at 01:44:47PM -0400, Vivek Goyal wrote:
> > On Thu, May 26, 2011 at 10:53:05PM +0530, K.Prasad wrote:
> > > 
> > > slimdump: Capture slimdump for fatal MCE generated crashes
> > > 
> > > System crashes resulting from fatal hardware errors (such as MCE) don't need
> > > all the contents from crashing-kernel's memory. Generate a new 'slimdump' that
> > > retains only essential information while discarding the old memory.
> > > 
> > 
> > Why enforce zeroing out of the rest of the vmcore data in the kernel? Why
> > not leave it to user space?
> 
> I think it's a good default to not do a full dump on MCE.
> It's very unlikely to be useful for anything, and will just waste
> reboot time (aka nines).

If we are just extracting and saving the MCE registers from vmcore, then
reboot time does not increase. It increases only if the user decides to
extract and save extra data from vmcore.

> 
> That said including the dmesg too may be a good idea.

dmesg is already part of vmcore and user space tools can easily find
it. 

I can easily imagine a default distro policy in user space where, in case
of an MCE crash, we just extract dmesg and the MCE registers (from the vmcore
notes section) and reboot. This will be fast and will reduce the amount of
code in the kernel.

IMHO, we should not introduce any additional notion of slimdump as such in
the kernel. A better approach would be to just read the MCE registers, export
them to user space through ELF notes, and then let user space automate the
rest of it.
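
To illustrate the user-space side, here is a minimal Python sketch that walks
the PT_NOTE segments of an ELF64 core image and yields each note. It reads
only the ELF header, the program headers and the note segments, so the rest
of the memory image is never touched. It assumes a little-endian ELF64 file
(as on x86_64). The note name "MCE" used in the usage note is purely
hypothetical; nothing in this thread fixes the name or layout of such a note.

```python
import struct

PT_NOTE = 4  # ELF program header type for note segments


def _align4(n):
    """Round n up to the next multiple of 4 (note fields are 4-byte padded)."""
    return (n + 3) & ~3


def _parse_notes(seg):
    """Yield (name, type, desc) for each note in a raw PT_NOTE segment."""
    pos = 0
    while pos + 12 <= len(seg):
        namesz, descsz, ntype = struct.unpack_from("<III", seg, pos)
        pos += 12
        name = seg[pos:pos + namesz].rstrip(b"\0").decode("ascii", "replace")
        pos += _align4(namesz)
        desc = seg[pos:pos + descsz]
        pos += _align4(descsz)
        yield name, ntype, desc


def iter_elf64_notes(f):
    """Yield every note from a seekable binary file holding an ELF64 image."""
    f.seek(0)
    ehdr = f.read(64)
    if len(ehdr) < 64 or ehdr[:4] != b"\x7fELF" or ehdr[4] != 2 or ehdr[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    (e_phoff,) = struct.unpack_from("<Q", ehdr, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", ehdr, 54)
    for i in range(e_phnum):
        f.seek(e_phoff + i * e_phentsize)
        phdr = f.read(56)
        # Fields: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _, p_offset, _, _, p_filesz = struct.unpack_from("<IIQQQQ", phdr)
        if p_type != PT_NOTE:
            continue
        f.seek(p_offset)
        yield from _parse_notes(f.read(p_filesz))
```

In a kdump kernel this could be pointed at /proc/vmcore, for example
`[d for n, t, d in iter_elf64_notes(open("/proc/vmcore", "rb")) if n == "MCE"]`,
again with "MCE" standing in for whatever note name the kernel side ends up
using.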

Thanks
Vivek


