[LEDE-DEV] libubox, procd: init process hangs
Felix Fietkau
nbd at nbd.name
Wed May 18 04:01:39 PDT 2016
On 2016-05-18 11:38, Mats Karrman wrote:
>
>
> On 2016-05-17 17:31, Mats Karrman wrote:
>>
>> On 2016-05-17 13:29, Felix Fietkau wrote:
>>> I just took a look at the code and uloop's processing of signals looked
>>> a bit racy to me. I've pushed a commit that makes it use signalfd if
>>> available. I also found that waitpid wasn't being retried on signal
>>> interrupt, so I added an extra check there. The changes are in libubox
>>> git, but not in OpenWrt/LEDE yet.
>>> Please test if this fixes your issue.
>>>
>>> Thanks,
>>>
>>> - Felix
>> Tried that, with no immediate success, but it might have provided
>> some additional clues. Now the boot hangs early on *every* boot
>> but after logging in I found something different in the ps list.
>> There is a Broadcom utility (smd) that is called from one of the
>> start scripts (S10environment). Its purpose is to set scheduling
>> priority and cpu affinity for some of the Broadcom proprietary
>> processes. The smd program handles fork in a rather ugly way. The
>> parent only loops until it receives SIGCHLD and then exits without
>> any wait. With the modified libubox I get a zombie smd child and
>> sleeping smd parent and S11environment (no other zombie).
>>
>> Not sure exactly how this happened but I got to think about
>> something written in the wait man page:
>>
>> """
>> If a parent process terminates, then its "zombie" children (if any)
>> are adopted by init(8), which automatically performs a wait to
>> remove the zombies.
>> """
>>
>> Is this wait really (unconditionally) implemented in procd or could
>> that be what I accomplished with the "forced timeout" patch?
>>
>> I fixed the ugly fork and got the system to boot once.
>> Then tried the original libubox with the fixed smd program but
>> this was not enough to get things working (it hung after 25 reboots).
>>
>> Now I'm running reboot tests with your new libubox and fixed smd...
> More than 250 reboots without problem :)
>
> Clearly the smd program is broken, but still it doesn't feel good that it
> manages to hang the init process. Considering that timing is involved,
> it's difficult to draw any firm conclusions, but it seems like letting
> uloop's epoll_wait time out occasionally isn't such a bad idea?
I agree, that definitely needs fixing. What kernel are you using?
- Felix