<div dir="ltr"><div dir="ltr"><br></div><div dir="ltr"><div dir="ltr"><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

That loop-kill-all thing should be a kind of last resort really, what's<br>

actually needed is some kind of "init 1" procd equivalent which shuts down all<br>

services in a more or less clean manner.<br>

<br></blockquote><div><br></div><div>Oddly enough, the /lib/upgrade/stage2 script already does some of this. It explicitly kills (kill -9) telnet, dropbear, and ash before looping over the remaining processes with SIGTERM, and then again with SIGKILL.</div><div><br></div><div>I find it very odd that it explicitly singles out telnet, dropbear, and ash. My OpenWrt build doesn't have any of these installed in the first place. E.g. I have OpenSSH. It's also jumping straight to kill -9 instead of sending SIGTERM first like it should.</div><div><br></div><div>I imagine this is the reason why I've had my SSH sessions hang indefinitely when sysupgrading a board with dropbear.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

I'm just not sure offhand how much possible error conditions there are besides<br>

the actual image writing itself, which you cannot recover from if it dies midway.<br></blockquote><div><br></div><div>I would expect that if the image writing fails, at least one more attempt should be made before giving up. Rendering the device soft-bricked is very much not desirable...</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

No it is not. When the logic was implemented there wasn't any cgroup support<br>

in OpenWrt. Sysupgrade was introduced in 2007 when we still supported Linux<br>

2.4 on some targets. Using the freezer cgroup probably makes sense nowadays,<br>

it will however further bloat the kernel which might hurt various lower end<br>

targets, flash space wise.<br>

<br></blockquote><div>OK, noted.</div><div><br></div><div>I suppose I should point out that I'm not personally interested in the lower end devices, but I understand where you're coming from there.</div><div><br></div><div>Perhaps a way to address this reliably:</div><div><br></div><div>1) If cgroups support is detected at runtime (or selected via conditional compilation, to save even more space in the binary), procd, acting in its role as PID 1, places each service that it creates into its own cgroup. I don't know how this interacts with procd jails, but perhaps some code from that can be adapted and reused.</div><div>1.a) I would even add a top-level cgroup that contains all service cgroups as nested cgroups, so that *everything* can be terminated in one fell swoop.</div><div><br></div><div>2) On sysupgrade, just prior to execvp /sbin/upgraded, procd gracefully shuts down all services that are running.</div><div>2.a) If cgroups are available, then after shutting down all services, use the cgroup freezer to freeze and then terminate any service cgroups that still have active processes.</div><div>2.b) Use the global cgroup to nuke everything from orbit.</div><div><br></div><div>3) /sbin/upgraded handles terminating any remaining processes. This isn't something that should practically be handled in a shell script. Moving the logic for this into /sbin/upgraded means that the only safety check is that it not try to terminate PID 1.</div><div><br></div><div>4) Now /lib/upgrade/stage2 doesn't need to worry about terminating processes, and can focus entirely on handling the ramdisk chroot logic.</div></div></div>
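<div><br></div><div>To make steps 2.a and 3 a bit more concrete, here is a rough sketch of the "SIGTERM first, SIGKILL only for survivors, never touch PID 1" logic. It is written in Python purely for illustration (procd and /sbin/upgraded are C), and it is not existing procd code. It assumes a cgroup v2 layout in which a group's member PIDs are listed in its cgroup.procs file; the function names, the grace period, and the injectable kill parameter are all my own inventions for the example.</div>

```python
import os
import signal
import time
from pathlib import Path


def read_cgroup_pids(cgroup_dir):
    """Return the PIDs listed in cgroup_dir/cgroup.procs (cgroup v2 layout)."""
    procs = Path(cgroup_dir) / "cgroup.procs"
    try:
        return [int(tok) for tok in procs.read_text().split()]
    except FileNotFoundError:
        return []  # no cgroup support here, or the group is already gone


def is_alive(pid, kill=os.kill):
    """Signal 0 delivers nothing; it only checks that the PID exists."""
    try:
        kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but we may not signal it
    return True


def terminate_then_kill(pids, grace=2.0, kill=os.kill):
    """SIGTERM every PID, wait up to `grace` seconds, SIGKILL the survivors.

    Returns (termed, killed). PID 1 and the caller itself are never signalled.
    `kill` is injectable only so the logic can be exercised without real
    processes (a hypothetical convenience, not part of any existing API).
    """
    protected = {1, os.getpid()}
    targets = [p for p in pids if p not in protected]

    termed = []
    for pid in targets:
        try:
            kill(pid, signal.SIGTERM)
            termed.append(pid)
        except ProcessLookupError:
            pass  # already exited on its own

    # Give well-behaved processes a chance to exit cleanly.
    deadline = time.monotonic() + grace
    while deadline > time.monotonic() and any(is_alive(p, kill) for p in termed):
        time.sleep(0.05)

    killed = []
    for pid in termed:
        if is_alive(pid, kill):
            try:
                kill(pid, signal.SIGKILL)
                killed.append(pid)
            except ProcessLookupError:
                pass  # exited between the check and the signal
    return termed, killed
```

<div>With something like this, the "nuke everything" case from 2.b would amount to calling terminate_then_kill(read_cgroup_pids(top_level_dir)) on the top-level cgroup, where top_level_dir is whatever path that cgroup ends up being mounted at. On kernels that provide it, writing to the group's cgroup.kill file would be a more direct alternative for the final step.</div>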

</div>