[RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred

Eric W. Biederman ebiederm at xmission.com
Thu Dec 23 14:56:21 EST 2010


Seiji Aguchi <seiji.aguchi at hds.com> writes:

> Hi,
>
> I agree with Borislav that kexec shouldn't start at all because we can't guarantee 
> a stable system anymore when MCE is reported.

In the case of kexec on panic we can never guarantee a stable system.
But the odds are much better of executing non-corrupt code and of
telling people you had a hardware error if you go through the kexec
on panic process.

If I read Andi's patch correctly, he was suggesting not allowing any
more MCEs to be reported on that path.


> On the other hand, I understand there are people like Andi who want to start kexec 
> even if MCE occurred.
>
> That is why I propose adding a new option controlling kexec behaviour
> when MCE occurred.

What do you gain by not doing the kexec on panic when you have the
system configured to take one?  We already have the big policy knobs
to enable or disable this kind of behavior.

> I don't stick to "sysctl".

I think adding a sysctl in this path or any unnecessary code will make
things less reliable.

The last time this happened to me (about a week ago), the kexec on
panic from an ECC-reported memory error worked just fine.  In other
words, in the real world it seems to work.

So what is the problem you are trying to avoid, and why can't we do
something in the kernel's initialization path to avoid initializing
when there is a problem?

Eric


