[RFC PATCH 1/4] purgatory/ipmi: Support BMC watchdog timer start/stop in purgatory
河合英宏 / KAWAI,HIDEHIRO
hidehiro.kawai.ez at hitachi.com
Thu Jan 21 21:10:12 PST 2016
> A general note here. It does not appear that you implement the
> error recovery states in your state machine. If the system fails
> in the middle of doing an IPMI operation, it is likely to fail.
The reason I didn't implement the error handling is that I expect
the error rate to be low, while recovery may take many seconds
(I don't have any statistical data, though; that's just my expectation).
The most important thing is to start booting the 2nd kernel reliably
and as soon as possible. For example, if a user uses a feature
like fence_kdump and the execution of fence_kdump is delayed,
the crashed host will be shot down by another host that is waiting
for the notification from fence_kdump.
Keeping the code simple is also important for reliability.
Anyway, I'll rethink whether I can implement the error handling
with simple logic.
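
As a rough illustration of what such simple logic might look like
(this is not code from the patch; every name below is hypothetical,
and the fake status source only stands in for the real BMC interface),
each wait on the BMC could be bounded by a fixed poll budget. On
timeout the watchdog operation is simply skipped, so the boot of the
2nd kernel is never delayed beyond that budget:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status source: returns the current interface status byte. */
typedef uint8_t (*status_reader_t)(void *ctx);

/*
 * Poll until (status & mask) == want, giving up after max_polls reads.
 * Returns 0 on success, -1 on timeout.
 */
static int wait_for_status(status_reader_t read_status, void *ctx,
                           uint8_t mask, uint8_t want,
                           unsigned int max_polls)
{
    assert(read_status != NULL);

    for (unsigned int i = 0; i < max_polls; i++) {
        if ((read_status(ctx) & mask) == want)
            return 0;
    }
    return -1;
}

/*
 * Fake device for illustration only: reports "busy" (bit 0 set)
 * for a fixed number of reads, then reports idle.
 */
struct fake_bmc {
    unsigned int busy_reads_left;
};

static uint8_t fake_read_status(void *ctx)
{
    struct fake_bmc *bmc = ctx;

    if (bmc->busy_reads_left > 0) {
        bmc->busy_reads_left--;
        return 0x01;    /* busy */
    }
    return 0x00;        /* idle */
}

/*
 * Best-effort gate: wait for the interface to become idle within the
 * poll budget. Returns 1 if the watchdog step may run, 0 if it should
 * be skipped so that booting continues without further delay.
 */
static int watchdog_step_allowed(struct fake_bmc *bmc, unsigned int max_polls)
{
    return wait_for_status(fake_read_status, bmc, 0x01, 0x00, max_polls) == 0;
}
```

With a budget of 10 polls, a device that is busy for 3 reads lets the
watchdog step run, while one that stays busy is abandoned after exactly
10 reads. This does not abort or recover an operation already in
flight; it only caps the time spent waiting.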
> If you do this you will need to detect and abort any running
> operation. Implementing the full state machine is probably the
> best approach, it should handle this, though it is rather complex.
>
> -corey
Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group