crash_kexec in oops_end() and panic()

Eric W. Biederman ebiederm at xmission.com
Wed Jun 7 09:46:27 PDT 2017


Daniel Walker <danielwa at cisco.com> writes:

> Hi,
>
> These two paths seem to be duplicating each other. We have an issue
> where we're using mtdoops to collect kernel logs on oops and panic; we
> also have a crash kernel (which also collects these logs). mtdoops
> saves logs differently for oops and panic: since an oops isn't always
> fatal, it schedules a write to the flash. Since panic() is always fatal,
> it writes the logs immediately. In oops_end() the crash kernel runs
> immediately while still signaling an OOPS condition to mtdoops. mtdoops
> schedules its write to flash for later, but there is no later because
> the crash kernel runs immediately, so we end up without the logs.
>
> I'm wondering what the significance is of having these two paths?
> oops_end() could just call into panic() or a modified
> panic_with_regs(), and then we would collapse multiple paths. There is
> what I would call a hack in kexec_should_crash() which checks if there
> are crash_kexec_post_notifiers and runs panic() if they exist. This
> wouldn't be needed if we always called panic(). I also wonder if
> there are other things in panic() which we should be running, but
> which don't get run because of these two paths.

crash_kexec_post_notifiers is a horrible hack. It is broken by design and
no one should use it.

Looking at the history, and it still seems valid, the point of
kexec_should_crash is so that crash_kexec can be called with the
registers at the time of the crash.
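
As a rough illustration of that decision, here is a userspace model, not
the kernel source. It assumes a crash kernel is loaded, and the names
struct crash_ctx, model_kexec_should_crash and model_oops_end are
hypothetical stand-ins for the kernel's state and checks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the facts the oops path looks at. */
struct crash_ctx {
    bool in_interrupt;               /* oops hit in interrupt context */
    int  pid;                        /* 0 stands for the idle task */
    bool is_global_init;             /* the oopsing task is init */
    bool panic_on_oops;              /* panic_on_oops is set */
    bool crash_kexec_post_notifiers; /* boot option discussed above */
};

enum oops_outcome {
    OOPS_SURVIVES,         /* only the task dies; deferred work can run */
    CRASH_KEXEC_WITH_REGS, /* jump to the crash kernel from the oops path */
    PANIC_PATH             /* fall through to panic() and its notifiers */
};

/* An oops is fatal to the whole system in these four cases. */
bool model_oops_is_fatal(const struct crash_ctx *c)
{
    assert(c != NULL);
    return c->in_interrupt || c->pid == 0 ||
           c->is_global_init || c->panic_on_oops;
}

/* Model of the check: crash right away unless post notifiers are requested. */
bool model_kexec_should_crash(const struct crash_ctx *c)
{
    assert(c != NULL);
    if (c->crash_kexec_post_notifiers)
        return false; /* defer to panic(), which runs the notifiers first */
    return model_oops_is_fatal(c);
}

/* Model of which way the oops path goes. */
enum oops_outcome model_oops_end(const struct crash_ctx *c)
{
    if (model_kexec_should_crash(c))
        return CRASH_KEXEC_WITH_REGS; /* registers from the oops are kept */
    if (model_oops_is_fatal(c))
        return PANIC_PATH;
    return OOPS_SURVIVES;
}
```

In this model a fatal oops reaches the crash kernel directly from the
oops path, carrying the registers from the crash. Only when
crash_kexec_post_notifiers is set does it take the longer route through
panic().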

The more code that is run in the kexec on panic path, the less well it
works. This has been a known fact for years.

Please figure out how to depend on less code running in a broken
kernel.  Trying to figure out how to run more code is not the solution
to making the kernel reliable at the time of a crash.

Eric


