watchdogs and kdump

Pádraig Brady P at draigBrady.com
Thu Oct 27 17:43:58 EDT 2011


On 10/27/2011 09:30 PM, Don Zickus wrote:
> Hi,
> 
> I was assisting a customer the other day debugging a kdump[1] problem, when we
> noticed the real problem was the hardware watchdog was firing and
> rebooting the box.
> 
> Of course, this can be inconvenient if the panic happens right before the
> watchdog is supposed to be kicked, leading to a spontaneous reboot before
> the second kernel finishes booting and loading the watchdog module.
> 
> I was trying to think of a way to solve this and thought, one way to
> minimize the problem is to kick the watchdog before we jump into the kdump
> kernel.  Another way is to disable the watchdog entirely, but that doesn't
> work on all hardware I believe.
> 
> Anyway, I was posting on the watchdog mailing list to see if anyone had any
> ideas that might help.  And if my above idea to kick the watchdog before
> jumping into the kdump kernel seems ok, then an API would need to be
> developed.
> 
> I am willing to do any coding and testing necessary, but before I did, I
> wanted help to get a direction to go in first.
> 
> Thoughts?

Seems like the appropriate thing to do is to call all the
reboot notifiers that each watchdog registers.
Since one is not doing a full SYS_RESTART (SYS_DOWN) though,
i.e. not running through the BIOS code again,
it might be worth having a different SYS_JUMP code in notifier.h
that would allow you to kick rather than stop the watchdogs
as the reboot notifiers generally do at the moment.
I think it would be important not to stop the watchdog if possible,
given the large amount of logic that's going to be executed
after the jump.
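
To make the idea concrete, here is a rough userspace model of such a
notifier. It is only a sketch, not kernel code: SYS_JUMP and its value
are hypothetical (no such code exists in notifier.h), the mock_wdt
struct and mock_wdt_* helpers are invented for illustration, and the
callback signature is simplified compared to a real notifier_block
callback. The SYS_DOWN/SYS_HALT branch mirrors what the reboot notifiers
generally do at the moment.

```c
/* Userspace model only; this is NOT kernel code. */

/* These values mirror include/linux/notifier.h. */
#define SYS_DOWN      0x0001
#define SYS_RESTART   SYS_DOWN
#define SYS_HALT      0x0002
#define SYS_POWER_OFF 0x0003

/* Hypothetical: about to jump into a kexec/kdump kernel. */
#define SYS_JUMP      0x0004

#define NOTIFY_DONE   0x0000

enum wdt_state { WDT_RUNNING, WDT_STOPPED };

/* Invented stand-in for a hardware watchdog. */
struct mock_wdt {
    enum wdt_state state;
    unsigned int timeout;    /* configured timeout, in seconds */
    unsigned int remaining;  /* seconds left before the box would reset */
};

static void mock_wdt_stop(struct mock_wdt *w)
{
    w->state = WDT_STOPPED;
}

/* Kick (ping) the watchdog: restart the countdown from the full timeout. */
static void mock_wdt_kick(struct mock_wdt *w)
{
    w->remaining = w->timeout;
}

/*
 * Simplified reboot notifier. A real one would take
 * (struct notifier_block *, unsigned long, void *).
 */
static int mock_wdt_notify_sys(struct mock_wdt *w, unsigned long code)
{
    switch (code) {
    case SYS_DOWN:   /* also covers SYS_RESTART */
    case SYS_HALT:
        /* Existing behaviour: stop the watchdog on a full restart or halt. */
        mock_wdt_stop(w);
        break;
    case SYS_JUMP:
        /* Proposed behaviour: leave it running, but give the next
         * kernel a full timeout period to boot and load its driver. */
        mock_wdt_kick(w);
        break;
    default:
        break;
    }
    return NOTIFY_DONE;
}
```

With this model, passing SYS_JUMP to a running watchdog that has 2 of its
60 seconds left leaves it running with the full 60 seconds restored,
whereas SYS_DOWN stops it as before.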

cheers,
Pádraig.


