[LEDE-DEV] dnsmasq sometimes fails to start on 17.01.0

Gui Iribarren gui at altermundi.net
Wed Mar 29 07:48:16 PDT 2017


On 23/03/17 16:10, Gui Iribarren wrote:
> Thanks both for the replies!
> 
> I continued yeterday further debugging this, i played with this
> particular number in /e/i/dnsmasq
> (line 815 in http://pastebin.com/FV09f2jG)
> 
>     procd_add_raw_trigger "interface.*" 2000 /etc/init.d/dnsmasq reload
> 
> this number, as the following link suggests
> http://wiki.prplfoundation.org/wiki/Procd_reference#procd_add_raw_trigger.28event.2C_timeout.2C_.5Bscript.5D.29
> is the number of milliseconds that procd will wait after the trigger (in
> this case, anything related to an "interface" AFAIU) before executing
> "dnsmasq reload"
> 
> i put 60000 (60 seconds) and now the reload only happens once (after ~60
> seconds from the first "/e/i/dnsmasq boot") and dnsmasq starts without
> problem
> 
> so it looks to me like a race condition, where two "interface.*" events
> are happening one after the other, triggering two consecutive reloads,
> the first reload doesn't finish its work before the second reload comes,
> and the second reload kills the first reload, and suicides itself for
> some reason.
> 
> setting a long raw_trigger timeout works around the problem because the
> "interface.*" events happen all inside the 60 second window, and procd
> runs "/e/i/dnsmasq reload" only once
> 
> i can imagine that few people came across this since dnsmasq normally
> takes less than 2 seconds to start, and so the first reload is already
> done starting when the second reload is triggered.
> 
> my added "sleeps" help reproduce the issue by artificially making
> dnsmasq take always longer than 2 seconds, but without them the race
> condition is still there, only it doesn't end badly every boot (only
> some boots, and on some hardware)
> 
> is this enough info to point the right direction for someone to look
> into ....procd? (not me, not enough proficient in C, sorry)

bump?
i'm more than happy to do any more needed tests, given any pointer in
the right direction

> 
> (running two long "/e/i/dnsmasq reload" in parallel manually via
> console, the second one doesn't kill the first one, which makes me think
> that procd is the murderer)
> 
> On 22/03/17 23:29, Yousong Zhou wrote:
>> From the log, there are at least 4 runs of /etc/init.d/dnsmasq
>>
>> This is "/e/c/dnsmasq boot" and returns immediately because the script skips
>> this phase and the dnsmasq service is expected to be brought up by interface
>> trigger events
>>
>> 	root at coco:~# logread | grep dns
>> 	Wed Mar 22 18:25:56 2017 user.notice root: guidebug dnsmasq init
>> 	Wed Mar 22 18:25:56 2017 user.notice root: guidebug dnsmasq boot
>> 	Wed Mar 22 18:25:56 2017 user.notice root: guidebug dnsmasq startsrv 1,
>> 	Wed Mar 22 18:25:56 2017 user.notice root: guidebug dnsmasq srvtrg
>>
>> This one I do not have much clue about
>>
>> 	Wed Mar 22 18:25:57 2017 user.notice root: guidebug dnsmasq init
>>
>> This one was brought up by the interface trigger event.  Your log stopped at
>> "wait 18" without "wait 16" because the service_triggers call in the
>> /e/c/dnsmasq has set the timeout of event handling to be 2000ms.  That's how
>> the logger and sleep call were impeding the process
>>
>> 	Wed Mar 22 18:26:06 2017 user.notice root: guidebug dnsmasq init
>> 	Wed Mar 22 18:26:06 2017 user.notice root: guidebug dnsmasq reload
>> 	Wed Mar 22 18:26:06 2017 user.notice root: guidebug dnsmasq startsrv ,
>> 	Wed Mar 22 18:26:06 2017 user.notice root: guidebug dnsmasq realstart cfg02411c wait 20
>> 	Wed Mar 22 18:26:08 2017 user.notice root: guidebug dnsmasq realstart cfg02411c wait 18
>>
>> This one I also have no clue about.  It didn't even make it to the "realstart"
>> part.  So checking further what happened in between was causing that should be
>> helpful
>>
>> 	Wed Mar 22 18:26:10 2017 user.notice root: guidebug dnsmasq init
>> 	Wed Mar 22 18:26:10 2017 user.notice root: guidebug dnsmasq reload
>> 	Wed Mar 22 18:26:10 2017 user.notice root: guidebug dnsmasq startsrv ,
>>
> 



More information about the Lede-dev mailing list