[LEDE-DEV] [PATCH] [17.01] dnsmasq: Add upstream patch fixing SERVFAIL issues with multiple servers
Baptiste Jonglez
baptiste at bitsofnetworks.org
Mon Feb 20 07:59:28 PST 2017
From: Baptiste Jonglez <git at bitsofnetworks.org>
This fixes FS#391 for lede-17.01
Signed-off-by: Baptiste Jonglez <git at bitsofnetworks.org>
---
.../patches/000-fix-servfail-handling.patch | 130 +++++++++++++++++++++
1 file changed, 130 insertions(+)
create mode 100644 package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch
diff --git a/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch b/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch
new file mode 100644
index 0000000000..e311c34729
--- /dev/null
+++ b/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch
@@ -0,0 +1,130 @@
+From 68f6312d4bae30b78daafcd6f51dc441b8685b1e Mon Sep 17 00:00:00 2001
+From: Baptiste Jonglez <git at bitsofnetworks.org>
+Date: Mon, 6 Feb 2017 21:09:11 +0000
+Subject: [PATCH] Stop treating SERVFAIL as a successful response from upstream
+ servers.
+
+This effectively reverts most of 51967f9807 ("SERVFAIL is an expected
+error return, don't try all servers.") and 4ace25c5d6 ("Treat REFUSED (not
+SERVFAIL) as an unsuccessful upstream response").
+
+With the current behaviour, as soon as dnsmasq receives a SERVFAIL from an
+upstream server, it stops trying to resolve the query and simply returns
+SERVFAIL to the client. With this commit, dnsmasq will instead try to
+query other upstream servers upon receiving a SERVFAIL response.
+
+According to RFC 1034 and 1035, the semantic of SERVFAIL is that of a
+temporary error condition. Recursive resolvers are expected to encounter
+network or resources issues from time to time, and will respond with
+SERVFAIL in this case. Similarly, if a validating DNSSEC resolver [RFC
+4033] encounters issues when checking signatures (unknown signing
+algorithm, missing signatures, expired signatures because of a wrong
+system clock, etc), it will respond with SERVFAIL.
+
+Note that all those behaviours are entirely different from a negative
+response, which would provide a definite indication that the requested
+name does not exist. In our case, if an upstream server responds with
+SERVFAIL, another upstream server may well provide a positive answer for
+the same query.
+
+Thus, this commit will increase robustness whenever some upstream servers
+encounter temporary issues or are misconfigured.
+
+Quoting RFC 1034, Section 4.3.1. "Queries and responses":
+
+ If recursive service is requested and available, the recursive response
+ to a query will be one of the following:
+
+ - The answer to the query, possibly preface by one or more CNAME
+ RRs that specify aliases encountered on the way to an answer.
+
+ - A name error indicating that the name does not exist. This
+ may include CNAME RRs that indicate that the original query
+ name was an alias for a name which does not exist.
+
+ - A temporary error indication.
+
+Here is Section 5.2.3. of RFC 1034, "Temporary failures":
+
+ In a less than perfect world, all resolvers will occasionally be unable
+ to resolve a particular request. This condition can be caused by a
+ resolver which becomes separated from the rest of the network due to a
+ link failure or gateway problem, or less often by coincident failure or
+ unavailability of all servers for a particular domain.
+
+And finally, RFC 1035 specifies RRCODE 2 for this usage, which is now more
+widely known as SERVFAIL (RFC 1035, Section 4.1.1. "Header section format"):
+
+ RCODE Response code - this 4 bit field is set as part of
+ responses. The values have the following
+ interpretation:
+ (...)
+
+ 2 Server failure - The name server was
+ unable to process this query due to a
+ problem with the name server.
+
+For the DNSSEC-related usage of SERVFAIL, here is RFC 4033
+Section 5. "Scope of the DNSSEC Document Set and Last Hop Issues":
+
+ A validating resolver can determine the following 4 states:
+ (...)
+
+ Insecure: The validating resolver has a trust anchor, a chain of
+ trust, and, at some delegation point, signed proof of the
+ non-existence of a DS record. This indicates that subsequent
+ branches in the tree are provably insecure. A validating resolver
+ may have a local policy to mark parts of the domain space as
+ insecure.
+
+ Bogus: The validating resolver has a trust anchor and a secure
+ delegation indicating that subsidiary data is signed, but the
+ response fails to validate for some reason: missing signatures,
+ expired signatures, signatures with unsupported algorithms, data
+ missing that the relevant NSEC RR says should be present, and so
+ forth.
+ (...)
+
+ This specification only defines how security-aware name servers can
+ signal non-validating stub resolvers that data was found to be bogus
+ (using RCODE=2, "Server Failure"; see [RFC4035]).
+
+Notice the difference between a definite negative answer ("Insecure"
+state), and an indefinite error condition ("Bogus" state). The second
+type of error may be specific to a recursive resolver, for instance
+because its system clock has been incorrectly set, or because it does not
+implement newer cryptographic primitives. Another recursive resolver may
+succeed for the same query.
+
+There are other similar situations in which the specified behaviour is
+similar to the one implemented by this commit.
+
+For instance, RFC 2136 specifies the behaviour of a "requestor" that wants
+to update a zone using the DNS UPDATE mechanism. The requestor tries to
+contact all authoritative name servers for the zone, with the following
+behaviour specified in RFC 2136, Section 4:
+
+ 4.6. If a response is received whose RCODE is SERVFAIL or NOTIMP, or
+ if no response is received within an implementation dependent timeout
+ period, or if an ICMP error is received indicating that the server's
+ port is unreachable, then the requestor will delete the unusable
+ server from its internal name server list and try the next one,
+ repeating until the name server list is empty. If the requestor runs
+ out of servers to try, an appropriate error will be returned to the
+ requestor's caller.
+---
+ src/forward.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+--- a/src/forward.c
++++ b/src/forward.c
+@@ -853,7 +853,8 @@ void reply_query(int fd, int family, tim
+ we get a good reply from another server. Kill it when we've
+ had replies from all to avoid filling the forwarding table when
+ everything is broken */
+- if (forward->forwardall == 0 || --forward->forwardall == 1 || RCODE(header) != REFUSED)
++ if (forward->forwardall == 0 || --forward->forwardall == 1 ||
++ (RCODE(header) != REFUSED && RCODE(header) != SERVFAIL))
+ {
+ int check_rebind = 0, no_cache_dnssec = 0, cache_secure = 0, bogusanswer = 0;
+
--
2.11.1
More information about the Lede-dev
mailing list