[LEDE-DEV] [PATCH] dnsmasq: Add upstream patch fixing SERVFAIL issues with multiple servers

Baptiste Jonglez baptiste at bitsofnetworks.org
Mon Feb 20 07:48:49 PST 2017

From: Baptiste Jonglez <git at bitsofnetworks.org>

This fixes FS#391 for lede-17.01

Signed-off-by: Baptiste Jonglez <git at bitsofnetworks.org>
 .../patches/000-fix-servfail-handling.patch        | 130 +++++++++++++++++++++
 1 file changed, 130 insertions(+)
 create mode 100644 package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch

diff --git a/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch b/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch
new file mode 100644
index 0000000000..e311c34729
--- /dev/null
+++ b/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch
@@ -0,0 +1,130 @@
+From 68f6312d4bae30b78daafcd6f51dc441b8685b1e Mon Sep 17 00:00:00 2001
+From: Baptiste Jonglez <git at bitsofnetworks.org>
+Date: Mon, 6 Feb 2017 21:09:11 +0000
+Subject: [PATCH] Stop treating SERVFAIL as a successful response from upstream
+ servers.
+This effectively reverts most of 51967f9807 ("SERVFAIL is an expected
+error return, don't try all servers.") and 4ace25c5d6 ("Treat REFUSED (not
+SERVFAIL) as an unsuccessful upstream response").
+With the current behaviour, as soon as dnsmasq receives a SERVFAIL from an
+upstream server, it stops trying to resolve the query and simply returns
+SERVFAIL to the client.  With this commit, dnsmasq will instead try to
+query other upstream servers upon receiving a SERVFAIL response.
+According to RFC 1034 and 1035, the semantic of SERVFAIL is that of a
+temporary error condition.  Recursive resolvers are expected to encounter
+network or resources issues from time to time, and will respond with
+SERVFAIL in this case.  Similarly, if a validating DNSSEC resolver [RFC
+4033] encounters issues when checking signatures (unknown signing
+algorithm, missing signatures, expired signatures because of a wrong
+system clock, etc), it will respond with SERVFAIL.
+Note that all those behaviours are entirely different from a negative
+response, which would provide a definite indication that the requested
+name does not exist.  In our case, if an upstream server responds with
+SERVFAIL, another upstream server may well provide a positive answer for
+the same query.
+Thus, this commit will increase robustness whenever some upstream servers
+encounter temporary issues or are misconfigured.
+Quoting RFC 1034, Section 4.3.1. "Queries and responses":
+    If recursive service is requested and available, the recursive response
+    to a query will be one of the following:
+       - The answer to the query, possibly preface by one or more CNAME
+         RRs that specify aliases encountered on the way to an answer.
+       - A name error indicating that the name does not exist.  This
+         may include CNAME RRs that indicate that the original query
+	  name was an alias for a name which does not exist.
+       - A temporary error indication.
+Here is Section 5.2.3. of RFC 1034, "Temporary failures":
+    In a less than perfect world, all resolvers will occasionally be unable
+    to resolve a particular request.  This condition can be caused by a
+    resolver which becomes separated from the rest of the network due to a
+    link failure or gateway problem, or less often by coincident failure or
+    unavailability of all servers for a particular domain.
+And finally, RFC 1035 specifies RRCODE 2 for this usage, which is now more
+widely known as SERVFAIL (RFC 1035, Section 4.1.1. "Header section format"):
+    RCODE           Response code - this 4 bit field is set as part of
+                    responses.  The values have the following
+                    interpretation:
+                    (...)
+                    2               Server failure - The name server was
+                                    unable to process this query due to a
+                                    problem with the name server.
+For the DNSSEC-related usage of SERVFAIL, here is RFC 4033
+Section 5. "Scope of the DNSSEC Document Set and Last Hop Issues":
+    A validating resolver can determine the following 4 states:
+    (...)
+    Insecure: The validating resolver has a trust anchor, a chain of
+       trust, and, at some delegation point, signed proof of the
+       non-existence of a DS record.  This indicates that subsequent
+       branches in the tree are provably insecure.  A validating resolver
+       may have a local policy to mark parts of the domain space as
+       insecure.
+    Bogus: The validating resolver has a trust anchor and a secure
+       delegation indicating that subsidiary data is signed, but the
+       response fails to validate for some reason: missing signatures,
+       expired signatures, signatures with unsupported algorithms, data
+       missing that the relevant NSEC RR says should be present, and so
+       forth.
+    (...)
+    This specification only defines how security-aware name servers can
+    signal non-validating stub resolvers that data was found to be bogus
+    (using RCODE=2, "Server Failure"; see [RFC4035]).
+Notice the difference between a definite negative answer ("Insecure"
+state), and an indefinite error condition ("Bogus" state).  The second
+type of error may be specific to a recursive resolver, for instance
+because its system clock has been incorrectly set, or because it does not
+implement newer cryptographic primitives.  Another recursive resolver may
+succeed for the same query.
+There are other similar situations in which the specified behaviour is
+similar to the one implemented by this commit.
+For instance, RFC 2136 specifies the behaviour of a "requestor" that wants
+to update a zone using the DNS UPDATE mechanism.  The requestor tries to
+contact all authoritative name servers for the zone, with the following
+behaviour specified in RFC 2136, Section 4:
+    4.6. If a response is received whose RCODE is SERVFAIL or NOTIMP, or
+    if no response is received within an implementation dependent timeout
+    period, or if an ICMP error is received indicating that the server's
+    port is unreachable, then the requestor will delete the unusable
+    server from its internal name server list and try the next one,
+    repeating until the name server list is empty.  If the requestor runs
+    out of servers to try, an appropriate error will be returned to the
+    requestor's caller.
+ src/forward.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+--- a/src/forward.c
++++ b/src/forward.c
+@@ -853,7 +853,8 @@ void reply_query(int fd, int family, tim
+      we get a good reply from another server. Kill it when we've
+      had replies from all to avoid filling the forwarding table when
+      everything is broken */
+-  if (forward->forwardall == 0 || --forward->forwardall == 1 || RCODE(header) != REFUSED)
++  if (forward->forwardall == 0 || --forward->forwardall == 1 ||
++      (RCODE(header) != REFUSED && RCODE(header) != SERVFAIL))
+     {
+       int check_rebind = 0, no_cache_dnssec = 0, cache_secure = 0, bogusanswer = 0;

More information about the Lede-dev mailing list