Fix: server_test.go:1235: dns: failed to unpack truncated message #242

Open
miekg opened this issue Jan 4, 2016 · 0 comments

miekg commented Jan 4, 2016

See https://travis-ci.org/skynetservices/skydns/builds/100020771

=== RUN TestMsgOverflow-2
2016/01/03 23:53:43 skydns: ready for queries on skydns.test. for tcp://127.0.0.1:9500 [rcache 0]
2016/01/03 23:53:43 skydns: ready for queries on skydns.test. for udp://127.0.0.1:9500 [rcache 0]
--- FAIL: TestMsgOverflow-2 (7.21s)
    server_test.go:1235: dns: failed to unpack truncated message
FAIL
FAIL    github.com/skynetservices/skydns/server 18.595s
?       github.com/skynetservices/skydns/stats  [no test files]
The command "go test -v ./..." exited with 1.
Done. Your build exited with 1.
davidxia added a commit to spotify/helios that referenced this issue Apr 6, 2016
When running some integration tests with HeliosSoloDeployment on Docker
hosts that use a local unbound instance as their DNS resolver (i.e.
specified in `/etc/resolv.conf` on the Docker host),
we saw test failures due to failed SRV queries to skydns. Skydns is
running in the solo container and forwards DNS queries it doesn't know
about to the unbound instance via logic in `start.sh`.

The skydns error output from the helios solo container spawned by
HeliosSoloDeployment looked like:

```
skydns: failure to forward request "dns: failed to unpack truncated
message"
```

Our guess is that large UDP responses from the upstream unbound
have the "Message Truncated" DNS flag set. When this type of response
reaches skydns, skydns blows up and doesn't tell the client about the
error. The client times out without retrying in TCP mode. The client
would've retried if it had received an error message from skydns.
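
As a rough illustration of the guess above: the TC ("Message Truncated") flag is a single bit in the flags word of the DNS header (RFC 1035). The Go sketch below checks it on a raw message using only the standard library. `isTruncated` and the sample bytes are made up for illustration and are not skydns code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// headerLen is the fixed size of a DNS message header (RFC 1035, section 4.1.1).
const headerLen = 12

// tcMask selects the TC (truncation) bit in the 16-bit flags word.
const tcMask = 0x0200

// isTruncated reports whether the TC bit is set in a raw DNS message.
// Hypothetical helper for illustration only.
func isTruncated(msg []byte) (bool, error) {
	if len(msg) < headerLen {
		return false, errors.New("dns message shorter than header")
	}
	// Bytes 0-1 hold the message ID, bytes 2-3 hold the flags.
	flags := binary.BigEndian.Uint16(msg[2:4])
	return flags&tcMask != 0, nil
}

func main() {
	// Made-up response header: ID 0xbeef, flags 0x8380 (QR, TC, RD, RA set),
	// QDCOUNT 1, all other counts 0.
	reply := []byte{0xbe, 0xef, 0x83, 0x80, 0, 1, 0, 0, 0, 0, 0, 0}
	tc, err := isTruncated(reply)
	fmt.Println(tc, err)
}
```

A forwarder could use a check like this to pass the truncated reply on to the client, or to retry upstream over TCP, instead of failing silently.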

Running `dig` against skydns works. We think this is because `dig` adds
an OPT record to its query that sets "udp payload size: 4096".
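
For reference, the OPT record mentioned above is an EDNS0 pseudo-record (RFC 6891) in the additional section of the query, and its CLASS field carries the requester's UDP payload size. Below is a minimal Go sketch of building such a query by hand, assuming well-formed labels of at most 63 bytes. `buildQuery`, `encodeName`, `appendU16` and the query ID are hypothetical and are not part of skydns or `dig`.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

const (
	typeSRV = 33 // SRV record type
	typeOPT = 41 // EDNS0 OPT pseudo-record type
	classIN = 1
)

// appendU16 appends v to b in network byte order.
func appendU16(b []byte, v uint16) []byte {
	return append(b, byte(v>>8), byte(v))
}

// encodeName converts a dotted name such as "skydns.test." into DNS wire
// labels. It assumes every label is valid (no length checks).
func encodeName(name string) []byte {
	var out []byte
	for _, label := range strings.Split(strings.TrimSuffix(name, "."), ".") {
		if label == "" {
			continue
		}
		out = append(out, byte(len(label)))
		out = append(out, label...)
	}
	return append(out, 0) // root label terminates the name
}

// buildQuery builds a one-question query that advertises udpSize via OPT.
func buildQuery(id uint16, name string, qtype, udpSize uint16) []byte {
	msg := make([]byte, 12)
	binary.BigEndian.PutUint16(msg[0:2], id)
	binary.BigEndian.PutUint16(msg[2:4], 0x0100) // RD (recursion desired)
	binary.BigEndian.PutUint16(msg[4:6], 1)      // QDCOUNT
	binary.BigEndian.PutUint16(msg[10:12], 1)    // ARCOUNT: the OPT record

	// Question section.
	msg = append(msg, encodeName(name)...)
	msg = appendU16(msg, qtype)
	msg = appendU16(msg, classIN)

	// OPT pseudo-record: root name, TYPE 41, CLASS = UDP payload size,
	// TTL = extended RCODE/version/flags (all zero), RDLENGTH 0.
	msg = append(msg, 0)
	msg = appendU16(msg, typeOPT)
	msg = appendU16(msg, udpSize)
	msg = append(msg, 0, 0, 0, 0)
	msg = appendU16(msg, 0)
	return msg
}

func main() {
	q := buildQuery(0x1234, "skydns.test.", typeSRV, 4096)
	fmt.Printf("%d-byte query: % x\n", len(q), q)
}
```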

Here are outstanding issues in skydns that seem related:

* skynetservices/skydns#242
* skynetservices/skydns#45
davidxia added a commit to spotify/helios that referenced this issue Apr 6, 2016
TL;DR When two DNS servers don't work, add one more!

When running some integration tests with HeliosSoloDeployment on Docker
hosts that use a local unbound instance as their DNS resolver (i.e.
specified in `/etc/resolv.conf` on the Docker host),
we saw test failures due to failed SRV queries to skydns. Skydns is
running in the solo container and forwards DNS queries it doesn't know
about to the unbound instance via logic in `start.sh`.

The skydns error output from the helios solo container spawned by
HeliosSoloDeployment looked like:

```
skydns: failure to forward request "dns: failed to unpack truncated
message"
```

Our guess is that large UDP responses from the upstream unbound
have the "Message Truncated" DNS flag set. When this type of response
reaches skydns, skydns blows up and doesn't tell the client about the
error. The client times out without retrying in TCP mode. The client
would've retried if it had received an error message from skydns.

Running `dig` against skydns works. We think this is because `dig` adds
an OPT record to its query that sets "udp payload size: 4096".

Here are outstanding issues in skydns that seem related:

* skynetservices/skydns#242
* skynetservices/skydns#45

Solution:

We start an unbound instance in the solo container and have it speak
only TCP to the upstream skydns (in the same container) with
`tcp-upstream: yes`. This forces skydns to speak only TCP with its
upstream. No UDP truncation shit. Things are fixed. :)

We admit this is super funky, but there's no way to force skydns to
speak only TCP right now.
davidxia added a commit to spotify/helios that referenced this issue Apr 6, 2016
TL;DR When two DNS servers don't work, add one more!

When running some integration tests with HeliosSoloDeployment on Docker
hosts that use a local unbound instance as their DNS resolver (i.e.
specified in `/etc/resolv.conf` on the Docker host),
we saw test failures due to failed SRV queries to skydns. Skydns is
running in the solo container and forwards DNS queries it doesn't know
about to the unbound instance via logic in `start.sh`.

The skydns error output from the helios solo container spawned by
HeliosSoloDeployment looked like:

```
skydns: failure to forward request "dns: failed to unpack truncated
message"
```

Our guess is that large UDP responses from the upstream unbound
have the "Message Truncated" DNS flag set. When this type of response
reaches skydns, skydns blows up and doesn't tell the client about the
error. The client times out without retrying in TCP mode. The client
would've retried if it had received an error message from skydns.

Running `dig` against skydns works. We think this is because `dig` adds
an OPT record to its query that sets "udp payload size: 4096".

Here are outstanding issues in skydns that seem related:

* skynetservices/skydns#242
* skynetservices/skydns#45

Solution:

We start an unbound instance in the solo container and have it forward
DNS queries via UDP to the upstream skydns in the same container.
Unbound will add the OPT section that makes everything work.
Things are fixed. :)

We admit this is super funky... And this might only work for UDP packets
up to 4096 bytes, the default set by unbound in OPT.
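
The 4096-byte caveat can be sketched as a simple size check. Without an OPT record a requester is assumed to accept at most 512 bytes over UDP (RFC 1035). With one, it accepts the size it advertised, and RFC 6891 says values below 512 are treated as 512. The helper names below are hypothetical, and this is not how unbound or skydns implement it.

```go
package main

import "fmt"

// defaultUDPSize is the classic DNS-over-UDP limit when no OPT record is sent.
const defaultUDPSize = 512

// maxUDPPayload returns the largest response the requester accepts over UDP.
// advertised is the CLASS field of the query's OPT record, or 0 if the query
// carried no OPT record.
func maxUDPPayload(advertised uint16) int {
	if advertised < defaultUDPSize {
		return defaultUDPSize
	}
	return int(advertised)
}

// needsTruncation reports whether a response of respLen bytes is too large
// for UDP and would have to be sent with the TC bit set instead.
func needsTruncation(respLen int, advertised uint16) bool {
	return respLen > maxUDPPayload(advertised)
}

func main() {
	for _, c := range []struct {
		respLen    int
		advertised uint16
	}{
		{600, 0},     // no OPT record
		{600, 4096},  // OPT advertising 4096 bytes
		{5000, 4096}, // larger than the advertised size
	} {
		fmt.Println(c.respLen, c.advertised, needsTruncation(c.respLen, c.advertised))
	}
}
```

Under these assumptions, a response larger than 4096 bytes would still come back truncated, which is the limitation noted above.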
davidxia added a commit to spotify/helios that referenced this issue Apr 7, 2016
TL;DR When two DNS servers don't work, add one more!

When running some integration tests with HeliosSoloDeployment on Docker
hosts that use a local unbound instance as their DNS resolver (i.e.
specified in `/etc/resolv.conf` on the Docker host),
we saw test failures due to failed SRV queries to skydns. Skydns is
running in the solo container and forwards DNS queries it doesn't know
about to nameservers specified in `/etc/resolv.conf` via logic in `start.sh`.

The skydns error output from the helios solo container spawned by
HeliosSoloDeployment looked like:

```
skydns: failure to forward request "dns: failed to unpack truncated
message"
```

Our guess is that large UDP responses from the upstream unbound
have the "Message Truncated" DNS flag set. When this type of response
reaches skydns, skydns blows up and doesn't tell the client about the
error. The client times out without retrying in TCP mode. The client
would've retried if it had received an error message from skydns.

Running `dig` against skydns works. We think this is because `dig` adds
an OPT record to its query that sets "udp payload size: 4096".

Here are outstanding issues in skydns that seem related:

* skynetservices/skydns#242
* skynetservices/skydns#45

Solution:

We start an unbound instance in the solo container and have it forward
DNS queries via UDP to the upstream skydns in the same container.
Unbound will add the OPT section that makes everything work.
Things are fixed. :)

We admit this is super funky... And this might only work for UDP packets
up to 4096 bytes, the default set by unbound in OPT.

Many thanks to @gimaker for helping and suggesting unbound inside the
container.