[solo] Use unbound to proxy to skydns; fixes shit #900
Conversation
Skydns now needs to be compiled with go 1.5+.
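(For reference, that just means building with a Go 1.5+ toolchain; the import path below is skydns's actual repo, the rest is a generic sketch.)

```
# Assumes a Go 1.5+ toolchain on PATH.
go version   # should report go1.5 or newer
go get github.com/skynetservices/skydns
```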
Force-pushed from 88248a8 to c3b4743
Force-pushed from c3b4743 to 0d1483b
```
@@ -23,7 +23,8 @@ done
 curl -XPUT http://127.0.0.1:4001/v2/keys/skydns/${SKYDNS_PATH} \
   -d value="{\"host\":\"$HOST_ADDRESS\"}"

-skydns $SKYDNS_OPTS &
+skydns $SKYDNS_OPTS -verbose &
```
I'm not sure if `-verbose` should be put in the image. Seems like you could run the docker image with `-e SKYDNS_OPTS=-verbose` at runtime if you really wanted verbose logging.
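For example, something like this (assuming the solo image is published as spotify/helios-solo; the name is a placeholder, not confirmed here):

```
# Pass skydns flags at runtime via the SKYDNS_OPTS environment variable
# instead of baking -verbose into the image.
docker run -d -e SKYDNS_OPTS=-verbose spotify/helios-solo
```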
Yeah, I'll remove it.
👍 overall
Force-pushed from 0d1483b to 8989006
Force-pushed from 8989006 to 886f1dc
@gimaker If it looks good to you, I'll merge and try to get a release out.
@davidxia what happened to the nice commit message? :(
Oh, never mind. I see that it's still there.
👍
TL;DR: skydns does not handle TCP well. We already worked around this in #900. See that PR for more context. However, that change only fixed the problem to an extent, as we still have the same issue once responses are >4096 bytes. This change extends that workaround to allow us to survive responses up to 32768 bytes in size. This change does not fix the issue, but it should make it rarer.
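A sketch of the kind of unbound tweak that implies, assuming the limit is raised via unbound's `edns-buffer-size` option (that's a guess; the message doesn't say how the workaround was extended):

```
# Guess at the mechanism: raise the EDNS0 payload size unbound advertises
# so responses up to 32768 bytes survive without the truncated-message
# failure described in #900. Unbound merges repeated server: clauses.
cat >> /etc/unbound/unbound.conf <<'EOF'
server:
    edns-buffer-size: 32768
EOF
```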
and use ServicesResourceTransformer to relocate class names in META-INF/services/. fixes #900
[solo] Use unbound to proxy to skydns; fixes shit
TL;DR When two DNS servers don't work, add one more!
When running some integration tests with HeliosSoloDeployment on Docker hosts that use a local unbound instance as their DNS resolver (i.e. specified in `/etc/resolv.conf` on the Docker host), we saw test failures due to failed SRV queries to skydns. Skydns is running in the solo container and forwards DNS queries it doesn't know about to nameservers specified in `/etc/resolv.conf` via logic in `start.sh`.
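That forwarding logic amounts to something like the following (a sketch, not the verbatim `start.sh`; `-nameservers` is a real skydns flag, but the exact plumbing here is illustrative):

```
# Collect upstream resolvers from the host's /etc/resolv.conf and pass
# them to skydns as forwarding targets via its -nameservers flag.
NAMESERVERS=$(awk '/^nameserver/ {printf "%s:53,", $2}' /etc/resolv.conf)
skydns $SKYDNS_OPTS -nameservers "${NAMESERVERS%,}" &
```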
The skydns error output from the helios solo container spawned by
HeliosSoloDeployment looked like:

```
skydns: failure to forward request "dns: failed to unpack truncated message"
```
Our guess is that large UDP responses from the upstream unbound
have the "Message Truncated" DNS flag set. When this type of response
reaches skydns, skydns blows up and doesn't tell the client about the
error. The client times out without retrying in TCP mode. The client
would've retried if it had received an error message from skydns.
Running `dig` against skydns works. We think this is because `dig` adds
an OPT record to its query that sets "udp payload size: 4096".
Here are outstanding issues in skydns that seem related:

* skynetservices/skydns#242
* skynetservices/skydns#45
Solution:
We start an unbound instance in the solo container and have it forward
DNS queries via UDP to the upstream skydns in the same container.
Unbound will add the OPT section that makes everything work.
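Roughly, the unbound side looks like this (a sketch only; the listen address and the port skydns moves to are illustrative, not the exact shipped config):

```
# Write a minimal unbound config that takes over port 53 and forwards
# everything over UDP to skydns inside the same container.
cat > /etc/unbound/unbound.conf <<'EOF'
server:
    interface: 127.0.0.1
    port: 53
    # Required so unbound will talk to an upstream on localhost.
    do-not-query-localhost: no
forward-zone:
    name: "."
    # skydns, moved off port 53 inside the container (port is made up).
    forward-addr: 127.0.0.1@5354
EOF
unbound -d -c /etc/unbound/unbound.conf &
```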
Things are fixed. :)
We admit this is super funky... and this might only work for UDP packets
up to 4096 bytes, the default set by unbound in OPT.
Much thanks to @gimaker for helping and suggesting unbound inside the
container.