[LEDE-DEV] a Procd alternative to respawn for a died daemon?

Fri Jul 1 01:31:54 PDT 2016

On 01/07/2016 09:14, Jurgen Van Ham wrote:
> Dear all,
> 
> The current version of procd can respond to a failing daemon by
> respawning it. This works for daemons that few or no other daemons
> rely on.
> 
> Is there an elegant way to trigger a restart after a critical service
> dies? By critical I mean that a mere restart of the daemon would
> require too many actions from other daemons.
> 
> I can imagine a workaround: replace the "procd_set_param command
> XXXX" in an init file with "procd_set_param command
> /bin/critical_XXXX.sh", using a file /bin/critical_XXXX.sh that
> first starts XXXX.sh and triggers a reboot when it returns. This could
> lead to a reboot during an intended shutdown.
> 
> Does it make sense to support an extra procd_set_param (e.g.,
> 'reboot') that explicitly triggers a reboot when its 'command' dies?
> This reboot could either have a time argument or a script that is
> executed after the daemon fails.
> 
> I don't see much reason for executing a script, but maybe this makes
> it possible to support more advanced recovery scenarios instead of
> restarting the device.
> 
> Do other developers have other (better) ideas for dealing with a dying
> daemon that is more complex to restart because it requires actions
> from other daemons to deal with its restart?
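
For illustration, the wrapper workaround described in the quoted message
might look roughly like the sketch below. It is not part of procd. The
names DAEMON, REBOOT_CMD and run_critical are hypothetical, and it assumes
that an intended stop or shutdown reaches the wrapper as SIGTERM, which is
how it tries to avoid the reboot-during-shutdown problem mentioned above.

```shell
#!/bin/sh
# Sketch of a hypothetical /bin/critical_XXXX.sh wrapper; not part of procd.
# DAEMON and REBOOT_CMD are illustrative names that can be overridden.
DAEMON="${DAEMON:-XXXX.sh}"
REBOOT_CMD="${REBOOT_CMD:-reboot}"

run_critical() {
    stopping=0
    child=""
    # Assumption: an intended stop or shutdown arrives as SIGTERM.
    # Remember that it happened and pass the signal on to the daemon.
    trap 'stopping=1; [ -n "$child" ] && kill -TERM "$child" 2>/dev/null' TERM

    # Run the daemon as a child so the trap can fire while we wait for it.
    "$DAEMON" "$@" &
    child=$!
    wait "$child"
    rc=$?

    if [ "$stopping" -eq 1 ]; then
        # wait returns early when the trap fires, so reap the daemon here
        # and skip the reboot: this was an intended stop.
        wait "$child" 2>/dev/null
        return 0
    fi

    # Any other exit of the daemon is treated as a failure of a critical service.
    echo "critical daemon $DAEMON exited with status $rc, rebooting" >&2
    $REBOOT_CMD
    return "$rc"
}
```

The wrapper file would end with a call such as run_critical "$@", and the
init file would then use "procd_set_param command /bin/critical_XXXX.sh"
as suggested above.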

It has long been my intent to add some kind of dependency tree. Say you
have daemons A, B, C and D. B and C depend on A being in a good state, and
D depends on C. If A dies, the other three would be restarted over some
defined timeout. If only C dies, then only D will be restarted. I am not
sure what effects your critical process dying has: would a reboot really
be required, or would it be enough to simply restart the dependency chain?
The problem I see is that if there is a fundamental error, the unit
would go into a never-ending reboot loop. The respawn feature has some
detection logic that triggers if the service dies too often too
quickly, and stops respawning after a given number of deaths. We could add
the same to a dependency tree, I guess.
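
To make the idea concrete, here is a rough shell sketch of the two pieces:
working out which services need a restart when one dies, and a
respawn-style guard against endless restart loops. Nothing here exists in
procd. DEPS, dependents_of, allow_restart, THRESHOLD and RETRY are
hypothetical names, the numbers are only illustrative, and the guard is a
simplified guess at "too often too quickly", not procd's actual logic.

```shell
#!/bin/sh
# Edges "X:Y" mean "X depends on Y" (the A/B/C/D example from the text).
DEPS="B:A C:A D:C"

# Print every service that directly or indirectly depends on $1.
# Breadth-first walk; a general graph would need a real topological sort.
dependents_of() {
    queue="$1"
    result=""
    while [ -n "$queue" ]; do
        # Pop the first entry of the queue.
        set -- $queue
        cur=$1
        shift
        queue="$*"
        for edge in $DEPS; do
            svc=${edge%%:*}
            dep=${edge#*:}
            [ "$dep" = "$cur" ] || continue
            # Skip services already collected (also protects against cycles).
            case " $result " in *" $svc "*) continue ;; esac
            result="${result:+$result }$svc"
            queue="${queue:+$queue }$svc"
        done
    done
    echo "$result"
}

THRESHOLD=3600   # a death after less than this many seconds counts as a crash
RETRY=5          # give up once more than this many crashes happen in a row
fails=0

# $1 = seconds the service ran before it died.
# Returns 0 if another restart is allowed, 1 if we should give up.
allow_restart() {
    if [ "$1" -lt "$THRESHOLD" ]; then
        fails=$((fails + 1))
    else
        fails=0
    fi
    [ "$fails" -le "$RETRY" ]
}
```

With the edges above, dependents_of A prints "B C D" and dependents_of C
prints "D". Before restarting a chain, the caller would check
allow_restart with the runtime of the service that died and stop once it
fails, instead of looping or rebooting forever.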

	John