[PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
Borislav Petkov
bp at alien8.de
Mon Jan 23 09:51:30 PST 2017
Hey Tony,
a "welcome back" is in order? :-)
On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
> If the system had experienced some memory corruption, but
> recovered ... then there would be some pages sitting around
> that the old kernel had marked as POISON and stopped using.
> The kexec'd kernel doesn't know about these, so may touch that
> memory while taking a crash dump ...
Hmm, so pass a list of poisoned pages to the kdump kernel so that it
doesn't touch them. Looks like there's already functionality for that:
"makedumpfile can exclude the following types of pages while copying
VMCORE to DUMPFILE, and a user can choose which type of pages will be
excluded.
- Pages filled with zero
- Cache pages
- User process data pages
- Free pages"
(quoted from the makedumpfile(8) manpage)
And apparently the kernel's crash code knows about poisoned pages and handles them:
static int __init crash_save_vmcoreinfo_init(void)
{
	...
#ifdef CONFIG_MEMORY_FAILURE
	VMCOREINFO_NUMBER(PG_hwpoison);
#endif
so if that works, the kexec'd kernel should know about that list.
> and then you have a broadcast machine check (on older[1] Intel CPUs
> that don't support local machine check).
Right.
> This is hard to work around. You really need all the CPUs to have set
> CR4.MCE=1 (if any didn't, then they will force a reset when they see
> the machine check). Also you need to make sure that they jump to the
> copy of do_machine_check() in the new kernel, not the old kernel.
Doesn't matter, right? The new copy is as clueless as the old one about
those MCEs.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.