[LEDE-DEV] [PATCH] dnsmasq: Add upstream patch fixing SERVFAIL issues with multiple servers

Baptiste Jonglez baptiste at bitsofnetworks.org
Mon Feb 20 07:58:36 PST 2017


Sorry, I forgot to add the 17.01 tag.  I am resending it with the proper tag.

On Mon, Feb 20, 2017 at 04:48:49PM +0100, Baptiste Jonglez wrote:
> From: Baptiste Jonglez <git at bitsofnetworks.org>
> 
> This fixes FS#391 for lede-17.01
> 
> Signed-off-by: Baptiste Jonglez <git at bitsofnetworks.org>
> ---
>  .../patches/000-fix-servfail-handling.patch        | 130 +++++++++++++++++++++
>  1 file changed, 130 insertions(+)
>  create mode 100644 package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch
> 
> diff --git a/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch b/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch
> new file mode 100644
> index 0000000000..e311c34729
> --- /dev/null
> +++ b/package/network/services/dnsmasq/patches/000-fix-servfail-handling.patch
> @@ -0,0 +1,130 @@
> +From 68f6312d4bae30b78daafcd6f51dc441b8685b1e Mon Sep 17 00:00:00 2001
> +From: Baptiste Jonglez <git at bitsofnetworks.org>
> +Date: Mon, 6 Feb 2017 21:09:11 +0000
> +Subject: [PATCH] Stop treating SERVFAIL as a successful response from upstream
> + servers.
> +
> +This effectively reverts most of 51967f9807 ("SERVFAIL is an expected
> +error return, don't try all servers.") and 4ace25c5d6 ("Treat REFUSED (not
> +SERVFAIL) as an unsuccessful upstream response").
> +
> +With the current behaviour, as soon as dnsmasq receives a SERVFAIL from an
> +upstream server, it stops trying to resolve the query and simply returns
> +SERVFAIL to the client.  With this commit, dnsmasq will instead try to
> +query other upstream servers upon receiving a SERVFAIL response.
> +
> +According to RFC 1034 and 1035, the semantic of SERVFAIL is that of a
> +temporary error condition.  Recursive resolvers are expected to encounter
> +network or resources issues from time to time, and will respond with
> +SERVFAIL in this case.  Similarly, if a validating DNSSEC resolver [RFC
> +4033] encounters issues when checking signatures (unknown signing
> +algorithm, missing signatures, expired signatures because of a wrong
> +system clock, etc), it will respond with SERVFAIL.
> +
> +Note that all those behaviours are entirely different from a negative
> +response, which would provide a definite indication that the requested
> +name does not exist.  In our case, if an upstream server responds with
> +SERVFAIL, another upstream server may well provide a positive answer for
> +the same query.
> +
> +Thus, this commit will increase robustness whenever some upstream servers
> +encounter temporary issues or are misconfigured.
> +
> +Quoting RFC 1034, Section 4.3.1. "Queries and responses":
> +
> +    If recursive service is requested and available, the recursive response
> +    to a query will be one of the following:
> +
> +       - The answer to the query, possibly preface by one or more CNAME
> +         RRs that specify aliases encountered on the way to an answer.
> +
> +       - A name error indicating that the name does not exist.  This
> +         may include CNAME RRs that indicate that the original query
> +	  name was an alias for a name which does not exist.
> +
> +       - A temporary error indication.
> +
> +Here is Section 5.2.3. of RFC 1034, "Temporary failures":
> +
> +    In a less than perfect world, all resolvers will occasionally be unable
> +    to resolve a particular request.  This condition can be caused by a
> +    resolver which becomes separated from the rest of the network due to a
> +    link failure or gateway problem, or less often by coincident failure or
> +    unavailability of all servers for a particular domain.
> +
> +And finally, RFC 1035 specifies RRCODE 2 for this usage, which is now more
> +widely known as SERVFAIL (RFC 1035, Section 4.1.1. "Header section format"):
> +
> +    RCODE           Response code - this 4 bit field is set as part of
> +                    responses.  The values have the following
> +                    interpretation:
> +                    (...)
> +
> +                    2               Server failure - The name server was
> +                                    unable to process this query due to a
> +                                    problem with the name server.
> +
> +For the DNSSEC-related usage of SERVFAIL, here is RFC 4033
> +Section 5. "Scope of the DNSSEC Document Set and Last Hop Issues":
> +
> +    A validating resolver can determine the following 4 states:
> +    (...)
> +
> +    Insecure: The validating resolver has a trust anchor, a chain of
> +       trust, and, at some delegation point, signed proof of the
> +       non-existence of a DS record.  This indicates that subsequent
> +       branches in the tree are provably insecure.  A validating resolver
> +       may have a local policy to mark parts of the domain space as
> +       insecure.
> +
> +    Bogus: The validating resolver has a trust anchor and a secure
> +       delegation indicating that subsidiary data is signed, but the
> +       response fails to validate for some reason: missing signatures,
> +       expired signatures, signatures with unsupported algorithms, data
> +       missing that the relevant NSEC RR says should be present, and so
> +       forth.
> +    (...)
> +
> +    This specification only defines how security-aware name servers can
> +    signal non-validating stub resolvers that data was found to be bogus
> +    (using RCODE=2, "Server Failure"; see [RFC4035]).
> +
> +Notice the difference between a definite negative answer ("Insecure"
> +state), and an indefinite error condition ("Bogus" state).  The second
> +type of error may be specific to a recursive resolver, for instance
> +because its system clock has been incorrectly set, or because it does not
> +implement newer cryptographic primitives.  Another recursive resolver may
> +succeed for the same query.
> +
> +There are other similar situations in which the specified behaviour is
> +similar to the one implemented by this commit.
> +
> +For instance, RFC 2136 specifies the behaviour of a "requestor" that wants
> +to update a zone using the DNS UPDATE mechanism.  The requestor tries to
> +contact all authoritative name servers for the zone, with the following
> +behaviour specified in RFC 2136, Section 4:
> +
> +    4.6. If a response is received whose RCODE is SERVFAIL or NOTIMP, or
> +    if no response is received within an implementation dependent timeout
> +    period, or if an ICMP error is received indicating that the server's
> +    port is unreachable, then the requestor will delete the unusable
> +    server from its internal name server list and try the next one,
> +    repeating until the name server list is empty.  If the requestor runs
> +    out of servers to try, an appropriate error will be returned to the
> +    requestor's caller.
> +---
> + src/forward.c | 3 ++-
> + 1 file changed, 2 insertions(+), 1 deletion(-)
> +
> +--- a/src/forward.c
> ++++ b/src/forward.c
> +@@ -853,7 +853,8 @@ void reply_query(int fd, int family, tim
> +      we get a good reply from another server. Kill it when we've
> +      had replies from all to avoid filling the forwarding table when
> +      everything is broken */
> +-  if (forward->forwardall == 0 || --forward->forwardall == 1 || RCODE(header) != REFUSED)
> ++  if (forward->forwardall == 0 || --forward->forwardall == 1 ||
> ++      (RCODE(header) != REFUSED && RCODE(header) != SERVFAIL))
> +     {
> +       int check_rebind = 0, no_cache_dnssec = 0, cache_secure = 0, bogusanswer = 0;
> + 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/lede-dev/attachments/20170220/490fadff/attachment-0001.sig>


More information about the Lede-dev mailing list