[patch] add kdump_after_notifier

Thu Aug 2 04:11:01 EDT 2007

Eric W. Biederman wrote:
> Takenori Nagano <t-nagano at ah.jp.nec.com> writes:
>> Then I gave up to merge my patch to kdb, and I tried to send another patch to
>> kexec community. I can understand his opinion, but it is very difficult to
>> modify that kdump is called from panic_notifier. Because it has a reason why
>> kdump don't use panic_notifier. So, I made this patch.
>>
>> Please do something about this problem.
> 
> Hmm.  Tricky.  These appear to be two code bases with a completely different
> philosophy on what errors are being avoided.
> 
> The kexec on panic assumption is that the kernel is broken and we better not
> touch it something horrible has gone wrong.  And this is the reason why
> kexec on panic is replacing lkcd.  Because the strong assumption results
> in more errors getting captured with less likely hood of messing up your
> system.
> 
> The kdb assumption appears to be that the kernel is mostly ok, and that there
> are just some specific thing that is wrong.

Yes, kdump and kdb have a completely different philosophy. But it's natural,
because their duties are different.

I think kdb is a supplementary debug means. kdump is not perfect, because
hardware sometimes breaks down. The probability that hardware (HDD, HBA, memory,
etc...) breaks down is very low, but it is not zero. If kdump fails taking a
dump, kdb data (process status, backtrace, log buffer, etc...) is very useful to
analyze the panic reason. kdb data is very poor in comparison with kdump, but
better than nothing.

So I request a favor of you again, please do something about this problem.

> The easiest way I can think to resolve this is for kdb to simply set
> a break point at the entry point of panic() when it initializes.  Then
> it wouldn't even need to be on the panic_list.  That approach would probably
> even give better debug information because you would not have the effects
> of bust_spinlocks to undo.
> 
> Is there some reason why kdb doesn't want to hook panic with a some
> kind of break point?

I think there is no technical reason. But panic code will be dirty if every
kernel developers adds their own hook. I think this is a reason why kdb uses
panic_notifier.

Thanks,