[LEDE-DEV] [PATCH] dnsmasq: Add upstream patch fixing SERVFAIL issues with multiple servers
Baptiste Jonglez
baptiste at bitsofnetworks.org
Mon Feb 20 07:58:36 PST 2017
Sorry, I forgot to add the 17.01 tag. I am resending it with the proper tag.
On Mon, Feb 20, 2017 at 04:48:49PM +0100, Baptiste Jonglez wrote:
> From: Baptiste Jonglez <git at bitsofnetworks.org>
>
> This fixes FS#391 for lede-17.01
>
> Signed-off-by: Baptiste Jonglez <git at bitsofnetworks.org>
> ---
> .../patches/000-fix-servfail-handling.patch | 130 +++++++++++++++++++++
> 1 file changed, 130 insertions(+)
> create mode 100644 package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch
>
> diff --git a/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch b/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch
> new file mode 100644
> index 0000000000..e311c34729
> --- /dev/null
> +++ b/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch
> @@ -0,0 +1,130 @@
> +From 68f6312d4bae30b78daafcd6f51dc441b8685b1e Mon Sep 17 00:00:00 2001
> +From: Baptiste Jonglez <git at bitsofnetworks.org>
> +Date: Mon, 6 Feb 2017 21:09:11 +0000
> +Subject: [PATCH] Stop treating SERVFAIL as a successful response from upstream
> + servers.
> +
> +This effectively reverts most of 51967f9807 ("SERVFAIL is an expected
> +error return, don't try all servers.") and 4ace25c5d6 ("Treat REFUSED (not
> +SERVFAIL) as an unsuccessful upstream response").
> +
> +With the current behaviour, as soon as dnsmasq receives a SERVFAIL from an
> +upstream server, it stops trying to resolve the query and simply returns
> +SERVFAIL to the client. With this commit, dnsmasq will instead try to
> +query other upstream servers upon receiving a SERVFAIL response.
> +
> +According to RFC 1034 and 1035, the semantics of SERVFAIL are those of a
> +temporary error condition. Recursive resolvers are expected to encounter
> +network or resource issues from time to time, and will respond with
> +SERVFAIL in this case. Similarly, if a validating DNSSEC resolver [RFC
> +4033] encounters issues when checking signatures (unknown signing
> +algorithm, missing signatures, expired signatures because of a wrong
> +system clock, etc), it will respond with SERVFAIL.
> +
> +Note that all those behaviours are entirely different from a negative
> +response, which would provide a definite indication that the requested
> +name does not exist. In our case, if an upstream server responds with
> +SERVFAIL, another upstream server may well provide a positive answer for
> +the same query.
> +
> +Thus, this commit will increase robustness whenever some upstream servers
> +encounter temporary issues or are misconfigured.
> +
> +Quoting RFC 1034, Section 4.3.1. "Queries and responses":
> +
> + If recursive service is requested and available, the recursive response
> + to a query will be one of the following:
> +
> + - The answer to the query, possibly preface by one or more CNAME
> + RRs that specify aliases encountered on the way to an answer.
> +
> + - A name error indicating that the name does not exist. This
> + may include CNAME RRs that indicate that the original query
> + name was an alias for a name which does not exist.
> +
> + - A temporary error indication.
> +
> +Here is Section 5.2.3. of RFC 1034, "Temporary failures":
> +
> + In a less than perfect world, all resolvers will occasionally be unable
> + to resolve a particular request. This condition can be caused by a
> + resolver which becomes separated from the rest of the network due to a
> + link failure or gateway problem, or less often by coincident failure or
> + unavailability of all servers for a particular domain.
> +
> +And finally, RFC 1035 specifies RCODE 2 for this usage, which is now more
> +widely known as SERVFAIL (RFC 1035, Section 4.1.1. "Header section format"):
> +
> + RCODE Response code - this 4 bit field is set as part of
> + responses. The values have the following
> + interpretation:
> + (...)
> +
> + 2 Server failure - The name server was
> + unable to process this query due to a
> + problem with the name server.
> +
> +For the DNSSEC-related usage of SERVFAIL, here is RFC 4033
> +Section 5. "Scope of the DNSSEC Document Set and Last Hop Issues":
> +
> + A validating resolver can determine the following 4 states:
> + (...)
> +
> + Insecure: The validating resolver has a trust anchor, a chain of
> + trust, and, at some delegation point, signed proof of the
> + non-existence of a DS record. This indicates that subsequent
> + branches in the tree are provably insecure. A validating resolver
> + may have a local policy to mark parts of the domain space as
> + insecure.
> +
> + Bogus: The validating resolver has a trust anchor and a secure
> + delegation indicating that subsidiary data is signed, but the
> + response fails to validate for some reason: missing signatures,
> + expired signatures, signatures with unsupported algorithms, data
> + missing that the relevant NSEC RR says should be present, and so
> + forth.
> + (...)
> +
> + This specification only defines how security-aware name servers can
> + signal non-validating stub resolvers that data was found to be bogus
> + (using RCODE=2, "Server Failure"; see [RFC4035]).
> +
> +Notice the difference between a definite negative answer ("Insecure"
> +state), and an indefinite error condition ("Bogus" state). The second
> +type of error may be specific to a recursive resolver, for instance
> +because its system clock has been incorrectly set, or because it does not
> +implement newer cryptographic primitives. Another recursive resolver may
> +succeed for the same query.
> +
> +There are other similar situations in which the specified behaviour is
> +similar to the one implemented by this commit.
> +
> +For instance, RFC 2136 specifies the behaviour of a "requestor" that wants
> +to update a zone using the DNS UPDATE mechanism. The requestor tries to
> +contact all authoritative name servers for the zone, with the following
> +behaviour specified in RFC 2136, Section 4:
> +
> + 4.6. If a response is received whose RCODE is SERVFAIL or NOTIMP, or
> + if no response is received within an implementation dependent timeout
> + period, or if an ICMP error is received indicating that the server's
> + port is unreachable, then the requestor will delete the unusable
> + server from its internal name server list and try the next one,
> + repeating until the name server list is empty. If the requestor runs
> + out of servers to try, an appropriate error will be returned to the
> + requestor's caller.
> +---
> + src/forward.c | 3 ++-
> + 1 file changed, 2 insertions(+), 1 deletion(-)
> +
> +--- a/src/forward.c
> ++++ b/src/forward.c
> +@@ -853,7 +853,8 @@ void reply_query(int fd, int family, tim
> + we get a good reply from another server. Kill it when we've
> + had replies from all to avoid filling the forwarding table when
> + everything is broken */
> +- if (forward->forwardall == 0 || --forward->forwardall == 1 || RCODE(header) != REFUSED)
> ++ if (forward->forwardall == 0 || --forward->forwardall == 1 ||
> ++ (RCODE(header) != REFUSED && RCODE(header) != SERVFAIL))
> + {
> + int check_rebind = 0, no_cache_dnssec = 0, cache_secure = 0, bogusanswer = 0;
> +
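
For readers who want to see the effect of the one-line change in isolation, here is a minimal sketch of the old and new conditions, not the actual dnsmasq code. The struct, the function names and the RC_* constants are hypothetical stand-ins; only the boolean expressions are taken from the hunk above. The sketch assumes that a forwardall counter of 3 means two replies are still outstanding, which is my reading of the "== 1" test and is not stated in this message.

```c
/* Response codes as numbered in RFC 1035, Section 4.1.1 (REFUSED is 5).
   The RC_ prefix is only there to avoid clashing with system headers. */
enum { RC_NOERROR = 0, RC_SERVFAIL = 2, RC_NXDOMAIN = 3, RC_REFUSED = 5 };

/* Hypothetical stand-in for the forwarding record; only the counter
   used by the condition is modelled. */
struct fwd { int forwardall; };

/* Condition before the patch: only REFUSED makes us wait for another
   upstream server. Returns 1 if the reply is processed now. */
int accept_reply_old(struct fwd *f, int rcode)
{
  return f->forwardall == 0 || --f->forwardall == 1 ||
         rcode != RC_REFUSED;
}

/* Condition after the patch: SERVFAIL is also treated as an
   unsuccessful reply, so we keep waiting while other servers may
   still answer. Returns 1 if the reply is processed now. */
int accept_reply_new(struct fwd *f, int rcode)
{
  return f->forwardall == 0 || --f->forwardall == 1 ||
         (rcode != RC_REFUSED && rcode != RC_SERVFAIL);
}
```

With a counter of 3, the old condition processes a first SERVFAIL immediately, while the new one skips it and only processes the second reply, whatever its RCODE. When the query was not sent to all servers (counter at 0), both conditions process the reply as before.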