[LEDE-DEV] dnsmasq sometimes fails to start on 17.01.0

Yousong Zhou yszhou4tech at gmail.com
Wed Mar 29 08:57:13 PDT 2017


On 24 March 2017 at 03:17, Gui Iribarren via Lede-dev
<lede-dev at lists.infradead.org> wrote:
> ---------- Forwarded message ----------
> From: Gui Iribarren <gui at altermundi.net>
> To: Yousong Zhou <yszhou4tech at gmail.com>
> Cc: lede-dev at lists.infradead.org
> Bcc:
> Date: Thu, 23 Mar 2017 16:10:06 -0300
> Subject: Re: [LEDE-DEV] dnsmasq sometimes fails to start on 17.01.0
> Thanks both for the replies!
>
> I continued debugging this yesterday. I played with this
> particular number in /etc/init.d/dnsmasq (/e/i/dnsmasq for short)
> (line 815 in http://pastebin.com/FV09f2jG)
>
>     procd_add_raw_trigger "interface.*" 2000 /etc/init.d/dnsmasq reload
>
> this number, as the following link suggests
> http://wiki.prplfoundation.org/wiki/Procd_reference#procd_add_raw_trigger.28event.2C_timeout.2C_.5Bscript.5D.29
> is the number of milliseconds that procd will wait after the trigger (in
> this case, anything related to an "interface" AFAIU) before executing
> "dnsmasq reload"
>
> I set it to 60000 (60 seconds) and now the reload happens only once (~60
> seconds after the first "/e/i/dnsmasq boot") and dnsmasq starts without
> problems
>
> So it looks to me like a race condition: two "interface.*" events
> happen one after the other and trigger two consecutive reloads. The
> first reload doesn't finish its work before the second one arrives, and
> the second reload kills the first and then, for some reason, terminates
> itself.
>
> setting a long raw_trigger timeout works around the problem because the
> "interface.*" events happen all inside the 60 second window, and procd
> runs "/e/i/dnsmasq reload" only once
>

I cannot reproduce the issue yet, but here are some findings from
reading the procd code.

The 2000ms and 60000ms values above, passed as an argument to
procd_add_raw_trigger, are the delay before the specified action is run
after the event occurs.  This is NOT a timeout for the action to
complete before it is killed.

When the event first occurs, procd schedules the action to run
after a "delay".  If the same kind of event happens again

 - during that delay (before the action has started), the delay is
   reset to its initial value (procd waits the full amount of time
   again).
 - after the delay, while the action is still running, the action is
   marked to re-run after the current run completes.
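
To make the two cases above concrete, here is a small Python model of
that scheduling behaviour.  It is only a sketch of my reading, not code
taken from procd: the function name, the event times and the 3000ms
action duration are all made up for illustration, and it assumes the
re-run starts immediately when the current run completes.

```python
def action_start_times(event_times, delay_ms, action_ms):
    """Return the times (in ms) at which the action would start.

    Model, as described above (an assumption, not procd source):
      - an event while idle arms a timer for `delay_ms`
      - an event while the timer is pending resets the timer
      - an event while the action runs marks it to re-run once
        the current run completes
    """
    starts = []
    fire_at = None      # expiry time of the pending delay timer, if any
    busy_until = None   # completion time of the running action, if any
    rerun = False       # set when an event arrives while the action runs

    def settle(now):
        """Process timer expiries and action completions up to `now`."""
        nonlocal fire_at, busy_until, rerun
        while True:
            if fire_at is not None and fire_at <= now:
                # Delay elapsed: the action starts.
                starts.append(fire_at)
                busy_until = fire_at + action_ms
                fire_at = None
            elif busy_until is not None and busy_until <= now:
                if rerun:
                    # Marked for re-run: start again right away.
                    starts.append(busy_until)
                    busy_until += action_ms
                    rerun = False
                else:
                    busy_until = None
            else:
                return

    for t in sorted(event_times):
        settle(t)
        if busy_until is not None:
            rerun = True              # action is running: re-run later
        else:
            fire_at = t + delay_ms    # arm the timer, or reset a pending one
    settle(float("inf"))
    return starts


# Hypothetical numbers: two "interface.*" events 2500ms apart and a
# reload that takes 3000ms.
print(action_start_times([0, 2500], 2000, 3000))   # [2000, 5000]
print(action_start_times([0, 2500], 60000, 3000))  # [62500]
```

With the 2000ms delay the second event arrives while the first reload
is still running, so the reload runs twice back to back.  With 60000ms
both events fall inside the delay window and the reload runs once.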

I wonder whether FS#660 has the same cause:

  https://bugs.lede-project.org/index.php?do=details&task_id=660

Regards,
                yousong


