[patch] add kdump_after_notifier

Wed Aug 1 06:00:48 EDT 2007

Takenori Nagano <t-nagano at ah.jp.nec.com> writes:

>> No.  The problem with your patch is that it doesn't have a code
>> impact.  We need to see who is using this and why.
>
> My motivation is very simple. I want to use both kdb and kdump, but I think it
> is too weak to satisfy kexec guys. Then I brought up the example enterprise
> software. But it isn't a lie. I know some drivers which use panic_notifier.
> IMHO, they use only major distribution, and they has the workaround or they
> don't notice this problem yet. I think they will be in trouble if all
> distributions choose only kdump.

Possibly.

> BTW, I use kdb and lkcd now, but I want to use kdb and kdump. I sent a patch to
> kdb community but it was rejected. kdb maintainer Keith Owens said,

>> Both KDB and crash_kexec should be using the panic_notifier_chain, with
>> KDB having a higher priority than crash_exec.  The whole point of
>> notifier chains is to handle cases like this, so we should not be
>> adding more code to the panic routine.
>>
>> The real problem here is the way that the crash_exec code is hard coded
>> into various places instead of using notifier chains.  The same issue
>> exists in arch/ia64/kernel/mca.c because of bad coding practices from
>> kexec.

I respectfully disagree with his opinion, as using notifier chains
assumes more of the kernel works.  Although following it's argument
to it's logical conclusion we should call crash_kexec as the very
first thing inside of panic.  Given how much state something like
bust_spinlocks messes up that might not be a bad idea.

It does make adding an alternative debug mechanism in there difficult.
Does anyone know if this also affects kgdb?

> Then I gave up to merge my patch to kdb, and I tried to send another patch to
> kexec community. I can understand his opinion, but it is very difficult to
> modify that kdump is called from panic_notifier. Because it has a reason why
> kdump don't use panic_notifier. So, I made this patch.
>
> Please do something about this problem.

Hmm.  Tricky.  These appear to be two code bases with a completely different
philosophy on what errors are being avoided.

The kexec on panic assumption is that the kernel is broken and we better not
touch it something horrible has gone wrong.  And this is the reason why
kexec on panic is replacing lkcd.  Because the strong assumption results
in more errors getting captured with less likely hood of messing up your
system.

The kdb assumption appears to be that the kernel is mostly ok, and that there
are just some specific thing that is wrong.

The easiest way I can think to resolve this is for kdb to simply set
a break point at the entry point of panic() when it initializes.  Then
it wouldn't even need to be on the panic_list.  That approach would probably
even give better debug information because you would not have the effects
of bust_spinlocks to undo.

Is there some reason why kdb doesn't want to hook panic with a some
kind of break point?

Eric