net: builtin DNS stub resolver fails to parse responses from consul with "cannot unmarshal DNS message" #11070
Comments
I can continue investigating tomorrow; there must be some difference between the responses from consul and those from the other DNS server (which is an internal AWS recursor). |
I just realized that there is hashicorp/consul#854, so it might be a consul bug after all. Still not 100% sure whether netgo should behave the way it does... |
Is "netcgo" a typo? |
Also, note that in Go 1.5, netgo will be the default most of the time, without a special build tag. It'll decide at runtime whether to use netgo or libc's resolver based on the system's config and the hostname. Please try with Go tip. |
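For readers who want to force the pure-Go resolver path rather than rely on the runtime decision described above, here is a minimal sketch. It assumes a newer toolchain than the one discussed in this thread: the `net.Resolver` type and its `PreferGo` field arrived in Go 1.8, and the `GODEBUG=netdns=go` environment setting is available from Go 1.5. The helper name `newGoResolver` is hypothetical, and the hostname is simply the one from the report.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newGoResolver returns a resolver that asks the net package to use
// Go's builtin DNS stub resolver instead of the libc resolver.
// (Hypothetical helper name; PreferGo requires Go 1.8 or later.)
func newGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

func main() {
	// Bound the lookup so a misbehaving DNS server cannot hang the program.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	addrs, err := newGoResolver().LookupHost(ctx, "registry-1.docker.io")
	if err != nil {
		// With a server that sends responses the stub resolver rejects,
		// this is where an error such as "no such host" would surface.
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("addresses:", addrs)
}
```

Running the same binary with `GODEBUG=netdns=cgo` (on a cgo-enabled build) should make it possible to compare both resolver paths against the same DNS server.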
Please open a new issue for using the builtin DNS stub resolver with the search keyword, the domain keyword, or both. Please don't mix two issues together. For the "cannot unmarshal DNS message" error, as described in hashicorp/consul#854, the builtin DNS stub resolver in go1.4 and above is more RFC 1035 compliant than the one in go1.3 and below. I'm not sure what we should do when the recursor (I guess it's a recursive server) replies with a long response message over UDP transport. The "no such host" error with the search/domain keywords looks pretty interesting. A few possibilities come to mind, but I can't be sure without concrete information. In the new issue, please provide: a) the DNS RR sets for your target alias or canonical name, b) your stub resolver configuration (usually resolv.conf), c) your recursive DNS server configuration. It would be a great help if you could provide an environment for repro. Thanks. |
@mikioh Yes, it seems like those are two issues. "Something" doesn't handle responses >512 bytes, and this gets silently ignored since it then tries to append the domain, which fails again. At least that's what I assume is going on. The "fix" in consul seems to be to compress the response, bringing it below 512 bytes, but it is perfectly valid to have bigger responses, and all major resolvers support that by implementing EDNS or falling back to TCP. I think netgo is doing this as well, right? It's possible that consul is returning a malformed response that most resolvers still manage to parse but that netgo is too strict about. But then the error message could probably be improved. PS: @bradfitz Yes, fixed, I was in a bit of a rush ;) |
Yup, see #6464 for the reason why we hesitate to support EDNS0. If the need for DNSSEC increases, we might implement DNSSEC+EDNS0 in the builtin DNS stub resolver; even in that case we might not allow a simple EDNS0-only conversation, not sure. |
Please let us know if you have any updates on this "cannot unmarshal DNS message" issue at any time. We'll keep this issue open for a while. Also, if you still have "no such host" errors, please open a new issue and provide your environment information. |
@mikioh I can reproduce the "no such host" error consistently. Much of what's below was helped by @discordianfish
As for the "cannot unmarshal DNS" error, I have not reproduced it directly with Go 1.4. The situation in which I produced it was as follows.
|
Can we get a repro case that doesn't involve a massive step like installing a custom DNS server with unknown details on how to configure? Give us a self-contained Go program, or maybe a network dump, or even a Docker environment where we can reproduce it. But not instructions with a massive-yet-undefined setup step. |
@discordianfish If Consul is sending >512 byte DNS responses over UDP without the client indicating support for large DNS responses via EDNS0, then Consul is RFC non-compliant. It's supposed to instead truncate the packet (note: truncated packets should still be valid DNS responses, so don't just blindly chop off bytes past 512), set the TC flag, and support the same query over TCP (which has no size limits). |
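The client side of the rule described above can be sketched as follows: inspect the DNS header of a UDP response, and if the TC bit is set, repeat the query over TCP. This is a minimal illustration based on the RFC 1035 header layout, not the resolver's actual code; the names `needsTCPRetry`, `exceedsPlainUDP`, and `maxPlainUDP` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	headerLen   = 12     // fixed DNS header size (RFC 1035, section 4.1.1)
	flagTC      = 0x0200 // TC (truncation) bit within the 16-bit flags field
	maxPlainUDP = 512    // UDP message limit without EDNS0 (RFC 1035, section 4.2.1)
)

// needsTCPRetry reports whether a UDP response has the TC bit set,
// which tells the client to repeat the query over TCP.
func needsTCPRetry(msg []byte) (bool, error) {
	if len(msg) < headerLen {
		return false, errors.New("DNS message shorter than header")
	}
	flags := binary.BigEndian.Uint16(msg[2:4])
	return flags&flagTC != 0, nil
}

// exceedsPlainUDP reports whether a message is larger than a server may
// send over UDP to a client that did not advertise EDNS0.
func exceedsPlainUDP(msg []byte) bool {
	return len(msg) > maxPlainUDP
}

func main() {
	// Header only: ID 0x1234, flags 0x8380 (QR, TC, RD, RA), one question.
	truncated := []byte{0x12, 0x34, 0x83, 0x80, 0, 1, 0, 0, 0, 0, 0, 0}
	retry, err := needsTCPRetry(truncated)
	fmt.Println(retry, err)
	fmt.Println(exceedsPlainUDP(make([]byte, 600)))
}
```

A server that follows the rule never triggers `exceedsPlainUDP` for a plain query; it sends a valid, shortened message with TC set, and the client's TCP retry retrieves the full answer.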
Can you please open a new issue for investigating the "no such host error with consul" because it would be a long journey and usually an issue related to DNS has multiple combined root causes. |
Okay, the actual problem appears to be an API / usability issue with miekg/dns (miekg/dns#216) and how consul is using it. Will make sure someone opens a new issue for the 'no such host' error. |
Hi,
If you have a CNAME like registry-1.docker.io, use a DNS resolver like consul, and try to resolve the record from a Go 1.4.2 application built with netgo, the resolution fails.
This can be reproduced by running a recursor like consul, pointing /etc/resolv.conf to it, and compiling https://gist.github.com/discordianfish/467ea55ae86426815a21 with:
CGO_ENABLED=0 go build -installsuffix netgo
This particular behaviour is something I have only observed with consul, so it might very well be a bug there. But when compiling the same program with default options / CGO enabled, the resolution works just fine. All other tools can resolve the record just fine as well. And we couldn't reproduce it with Go 1.2 or Go 1.3.
Consul
Run consul and provide upstream -recursors.
Without search domain
tcpdump shows:
With search domain
If you use some search domain, the results are different:
2015/06/04 18:04:14 Get https://registry-1.docker.io: dial tcp: lookup registry-1.docker.io: no such host
Tcpdump:
This latter one is different, but might be related to moby/moby#10863
Other DNS servers
Other DNS servers seem to work, yet the DNS requests look very similar. I'm using the same nameservers I previously provided as upstream recursors for consul, to rule out that the problem is somehow related to those.
Without search domain
Tcpdump:
With search domain
Tcpdump: