[RFC PATCH 1/4] purgatory/ipmi: Support BMC watchdog timer start/stop in purgatory

河合英宏 / KAWAI,HIDEHIRO hidehiro.kawai.ez at hitachi.com
Thu Jan 21 21:10:12 PST 2016


> A general note here.  It does not appear that you implement the
> error recovery states in your state machine.  If the system fails
> in the middle of doing an IPMI operation, it is likely to fail.

The reason I didn't implement the error handling is that I expect
the error rate to be low and error recovery may take many seconds
(I don't have any statistical data, though; this is only my
expectation).

The most important thing is to start booting the 2nd kernel reliably
and as soon as possible.  For example, if a user relies on a feature
like fence_kdump and the execution of fence_kdump is delayed, the
crashed host will be shot down by another host that is waiting for
the notification from fence_kdump.

Keeping the code simple is also important for reliability.

Anyway, I'll reconsider whether I can implement the error handling
with simple logic.
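
To illustrate what "simple logic" could mean, here is a minimal
sketch and not the code from this patch.  It assumes a KCS interface,
whose status register layout (state in bits 7:6, IBF in bit 1) comes
from the IPMI specification.  All names (kcs_wait_state,
kcs_read_status_fn, fake_kcs) are hypothetical.  Instead of running
full error recovery, it polls with a bounded budget and gives up as
soon as the BMC reports the error state, so the delay before booting
the 2nd kernel stays bounded.

```c
#include <stdint.h>

/* KCS status register layout per the IPMI specification:
 * bits 7:6 hold the interface state, bit 1 is IBF. */
#define KCS_STATE(status)  (((status) >> 6) & 0x3)
#define KCS_IDLE_STATE     0x0
#define KCS_READ_STATE     0x1
#define KCS_WRITE_STATE    0x2
#define KCS_ERROR_STATE    0x3
#define KCS_IBF            0x02

enum kcs_wait_result {
	KCS_WAIT_OK      = 0,
	KCS_WAIT_ERROR   = -1,	/* BMC reported ERROR_STATE */
	KCS_WAIT_TIMEOUT = -2,	/* poll budget exhausted */
};

/* Hypothetical accessor: returns the current status register value. */
typedef uint8_t (*kcs_read_status_fn)(void *ctx);

/* Poll until the input buffer is empty and the interface is in the
 * wanted state.  Never polls more than max_polls times, and returns
 * immediately on ERROR_STATE instead of attempting recovery. */
static enum kcs_wait_result
kcs_wait_state(kcs_read_status_fn read_status, void *ctx,
	       uint8_t wanted, unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		uint8_t status = read_status(ctx);

		if (KCS_STATE(status) == KCS_ERROR_STATE)
			return KCS_WAIT_ERROR;
		if (!(status & KCS_IBF) && KCS_STATE(status) == wanted)
			return KCS_WAIT_OK;
	}
	return KCS_WAIT_TIMEOUT;
}

/* Stand-in for the hardware, for illustration only: replays a fixed
 * list of status values and repeats the last one once exhausted. */
struct fake_kcs {
	const uint8_t *script;
	unsigned int len;
	unsigned int polls;
};

static uint8_t fake_read_status(void *ctx)
{
	struct fake_kcs *f = ctx;
	unsigned int i = f->polls < f->len ? f->polls : f->len - 1;

	f->polls++;
	return f->script[i];
}
```

With something like this, the caller could skip the watchdog
operation on any result other than KCS_WAIT_OK and continue to the
2nd kernel, at the cost of leaving the timer in its previous state.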

> If you do this you will need to detect and abort any running
> operation.  Implementing the full state machine is probably the
> best approach, it should handle this, though it is rather complex.
> 
> -corey

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


