[Patch 1/4][kernel][slimdump] Add new elf-note of type NT_NOCOREDUMP to capture slimdump

Mon Oct 3 08:03:36 EDT 2011

On Mon, Oct 03, 2011 at 03:10:43AM -0700, Eric W. Biederman wrote:
> "K.Prasad" <prasad at linux.vnet.ibm.com> writes:
> 
> > There are certain types of crashes induced by faulty hardware in which
> > capturing crashing kernel's memory (through kdump) makes no sense (or sometimes
> > dangerous).
> >
> > A case in point, is unrecoverable memory errors (resulting in fatal machine
> > check exceptions) in which reading from the faulty memory location from the
> > kexec'ed kernel will cause double fault and system reset (leaving no
> > information for the user).
> 
> It does make plenty of sense, and I capture the all of the time.
> It totally doesn't make sense to do this in the kernel when we can
> filter this from userspace just fine.
> 

Interesting. According to Intel's Software Developer's Manual
(Volume 3A, Chapter 15), the MCIP bit in the IA32_MCG_STATUS MSR
behaves as follows:

"MCIP (machine check in progress) flag, bit 2 Indicates (when set)
that a machine-check exception was generated. Software can set or clear this
flag. The occurrence of a second Machine-Check Event while MCIP is set will
cause the processor to enter a shutdown state."

In do_machine_check(), we enter the panic path (for unrecoverable
errors) well before the IA32_MCG_STATUS MSR is cleared, so MCIP is still
set when the kexec'ed kernel runs. This is likely to be dangerous.

void do_machine_check(struct pt_regs *regs, long error_code)
{
        ...
        if (no_way_out && tolerant < 3)
                mce_panic("Fatal machine check on current CPU", final, msg);
        ...
        mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
out:
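
To make the ordering concern concrete, here is a minimal user-space sketch, not kernel code. The struct toy_cpu, toy_machine_check() and second_mce_shuts_down() names are made up for illustration; only the MCIP bit position (bit 2) comes from the manual text quoted above. It models a handler that panics before clearing the status register, so MCIP stays set on the fatal path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_MCIP (1ULL << 2) /* machine check in progress, bit 2 */

static_assert(MCG_STATUS_MCIP == 0x4, "MCIP is bit 2 of IA32_MCG_STATUS");

/* Hypothetical stand-in for CPU state; real code reads and writes the MSR. */
struct toy_cpu {
    uint64_t mcg_status;
    bool panicked;
};

/* Toy model of the ordering in the excerpt: panic first, clear later. */
static void toy_machine_check(struct toy_cpu *cpu, bool no_way_out, int tolerant)
{
    cpu->mcg_status |= MCG_STATUS_MCIP; /* set when the exception is generated */

    if (no_way_out && tolerant < 3) {
        /* Stand-in for mce_panic(): the clear below is never reached. */
        cpu->panicked = true;
        return;
    }

    cpu->mcg_status = 0; /* mirrors mce_wrmsrl(MSR_IA32_MCG_STATUS, 0) */
}

/* Per the quoted manual text, a second event with MCIP set means shutdown. */
static bool second_mce_shuts_down(const struct toy_cpu *cpu)
{
    return (cpu->mcg_status & MCG_STATUS_MCIP) != 0;
}
```

In this model, a fatal error leaves second_mce_shuts_down() returning true, while a recoverable one clears the flag before returning.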

It'd be interesting to know the type of memory error (as classified by
the processor) for which you're able to capture the memory dump.
Maybe a dump of the various MCE status registers (and struct mce) would
help us understand the behaviour on your system better.
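
For anyone collecting such a dump, a small decoder along these lines may help when reading the per-bank status values. This is only a sketch: mci_status_flags() is a hypothetical helper, and it covers just the top flag bits of IA32_MCi_STATUS (VAL, OVER, UC, EN, MISCV, ADDRV, PCC), not the error codes in the low bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Top flag bits of IA32_MCi_STATUS, as laid out in the Intel SDM. */
#define MCI_STATUS_VAL   (1ULL << 63) /* register contents valid */
#define MCI_STATUS_OVER  (1ULL << 62) /* an earlier error was overwritten */
#define MCI_STATUS_UC    (1ULL << 61) /* uncorrected error */
#define MCI_STATUS_EN    (1ULL << 60) /* error reporting enabled */
#define MCI_STATUS_MISCV (1ULL << 59) /* MISC register valid */
#define MCI_STATUS_ADDRV (1ULL << 58) /* ADDR register valid */
#define MCI_STATUS_PCC   (1ULL << 57) /* processor context corrupt */

/* Hypothetical helper: write the names of the set flags into buf. */
static void mci_status_flags(uint64_t status, char *buf, size_t len)
{
    static const struct { uint64_t bit; const char *name; } flags[] = {
        { MCI_STATUS_VAL, "VAL" },     { MCI_STATUS_OVER, "OVER" },
        { MCI_STATUS_UC, "UC" },       { MCI_STATUS_EN, "EN" },
        { MCI_STATUS_MISCV, "MISCV" }, { MCI_STATUS_ADDRV, "ADDRV" },
        { MCI_STATUS_PCC, "PCC" },
    };
    size_t used = 0;

    assert(len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (!(status & flags[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output truncated, stop here */
        used += (size_t)n;
    }
}
```

A status value with UC and PCC both set is the kind of unrecoverable case this patch is concerned with.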

> Nacked-by: "Eric W. Biederman" <ebiederm at xmission.com>
> 
> I thought we already had this discussion.  Why is this silliness coming
> back?
> 
> I especially dislike the notion of hardcoding policy in the kernel like this.
> 

The last time this was discussed in the community, the kernel was hardcoded to
prevent anybody from reading the kernel memory. This time it is NOT.

This kernel patch differs from the previous one in that it only adds an
elf-note to denote a particular type of crash. User-space can still read the
entire coredump from /proc/vmcore, using 'cp' for instance. Similarly,
'makedumpfile' can still be used to extract the dmesg from the crashed kernel;
the new elf-note does not interfere with either.
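
As a rough illustration of how a user-space tool could apply its own policy, the sketch below walks a buffer of ELF notes and reports whether a note of a given type is present. Everything here is an assumption made for illustration: TOY_NT_NOCOREDUMP is a placeholder value (the real number is whatever the patch defines), notes_contain_type() is a hypothetical helper, and matching on type alone ignores the note's name field for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder only: not the value used by the actual patch. */
#define TOY_NT_NOCOREDUMP 0x7fffffffu

/* Same layout as an ELF note header: three 32-bit words. */
struct note_hdr {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
};

/* Note name and descriptor are each padded to a 4-byte boundary. */
static size_t align4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Hypothetical helper: true if any note in buf has the given type. */
static bool notes_contain_type(const unsigned char *buf, size_t len,
                               uint32_t type)
{
    size_t off = 0;

    assert(buf != NULL || len == 0);

    while (len - off >= sizeof(struct note_hdr)) {
        struct note_hdr h;
        size_t body;

        memcpy(&h, buf + off, sizeof(h));
        body = align4(h.namesz) + align4(h.descsz);
        if (body > len - off - sizeof(h))
            break; /* truncated note, stop walking */
        if (h.type == type)
            return true;
        off += sizeof(h) + body;
    }
    return false;
}
```

A dump tool that finds such a note could decide for itself whether to skip the memory copy; the kernel does not enforce anything.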

Hope this addresses your concerns.

Thanks,
K.Prasad