Tunnel errors with "cannot unmarshal DNS message" #75

Closed
bompus opened this issue Mar 5, 2019 · 16 comments
Labels
Confirmed: Issue has been reproduced and confirmed

Comments

@bompus

bompus commented Mar 5, 2019

INFO[0000] ResolveEdgeIPs err
ERRO[0000] Quitting due to error                         error="lookup cloudflarewarp.com on 127.0.0.53:53: cannot unmarshal DNS message"
INFO[0000] Metrics server stopped

I'm running Ubuntu 18.04.2 LTS with kernel 4.15.0-45-generic.

127.0.0.53 is the default caching DNS stub resolver on Ubuntu Server, provided by systemd-resolved. It's set up to forward queries to 1.1.1.1 and 1.0.0.1 and cache the answers.

If I change my DNS servers in /etc/resolv.conf and hard-code them to 1.1.1.1, the tunnel will start. However, it's weird that it won't work on a default Ubuntu install.

Can you check to see if you can reproduce on your end?

# systemd-resolve --status
Global
         DNS Servers: 1.1.1.1
                      1.0.0.1
                      94.237.127.9
                      94.237.40.9
          DNSSEC NTA: 10.in-addr.arpa
                      16.172.in-addr.arpa
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa
                      18.172.in-addr.arpa
                      19.172.in-addr.arpa
                      20.172.in-addr.arpa
                      21.172.in-addr.arpa
                      22.172.in-addr.arpa
                      23.172.in-addr.arpa
                      24.172.in-addr.arpa
                      25.172.in-addr.arpa
                      26.172.in-addr.arpa
                      27.172.in-addr.arpa
                      28.172.in-addr.arpa
                      29.172.in-addr.arpa
                      30.172.in-addr.arpa
                      31.172.in-addr.arpa
                      corp
                      d.f.ip6.arpa
                      home
                      internal
                      intranet
                      lan
                      local
                      private
                      test

Link 4 (eth2)
      Current Scopes: none
@bompus
Author

bompus commented Mar 5, 2019

I'm going to test with dnsmasq instead, but I'd like you to try reproducing this with the DNS cache daemon that is enabled by default on Ubuntu, so you can determine what the issue might be.

Thanks!

@nickvollmar
Contributor

Thanks for the report!

I suspect this is related to golang/go#27546. In particular, a commenter there specifically mentions SRV responses from systemd-resolved on Ubuntu 18.04. There's an upstream systemd commit to address the issue: systemd/systemd#9828

But I have here a GCP Ubuntu instance running systemd version 239 (that commit landed in version 240), and I'm seeing correct resolution. So that may not be the whole story.
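
If the cause is indeed a compressed name inside the SRV answer, as the linked Go issue describes, the check involved can be sketched as follows. This is a minimal illustration, not the code from Go's resolver: `usesCompression` is a hypothetical helper, and the byte slices in `main` are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// usesCompression reports whether a wire-format domain name contains a
// compression pointer, i.e. a length byte whose top two bits are both set
// (RFC 1035, section 4.1.4). Hypothetical helper, for illustration only.
func usesCompression(name []byte) (bool, error) {
	for i := 0; i < len(name); {
		c := name[i]
		switch {
		case c == 0:
			return false, nil // root label ends the name; no pointer was seen
		case c&0xC0 == 0xC0:
			return true, nil // pointer to an earlier offset in the message
		case c&0xC0 != 0:
			return false, errors.New("reserved label type")
		}
		i += 1 + int(c) // skip the length byte and the label it describes
	}
	return false, errors.New("name is not terminated")
}

func main() {
	// "region1" followed by a pointer to offset 12 (made-up bytes).
	compressed := []byte{7, 'r', 'e', 'g', 'i', 'o', 'n', '1', 0xC0, 0x0C}
	// "a.b." written out in full, with no pointer.
	plain := []byte{1, 'a', 1, 'b', 0}

	fmt.Println(usesCompression(compressed))
	fmt.Println(usesCompression(plain))
}
```

A parser that refuses pointers in the SRV target would reject the first name and accept the second, which would be consistent with the same query succeeding through one resolver and failing through another.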

@bompus
Author

bompus commented Mar 5, 2019

Weird, I'm seeing this on my Ubuntu 18.04.2 with all updates installed:

systemd is already the newest version (237-3ubuntu10.13)

I'm trying dnsmasq and unbound now to see if it works there.

@bompus
Author

bompus commented Mar 6, 2019

Works fine with dnsmasq. Hopefully Ubuntu pulls in a newer version of systemd soon to get the issue resolvd (see what I did there).

@sssilver sssilver added the Confirmed label (Issue has been reproduced and confirmed) Mar 6, 2019
@bigben386

I have the exact same issue on Ubuntu on GCP. I changed my resolv.conf temporarily to 1.1.1.1 to get the tunnel service to start. Any idea if the tunnel will stay up when resolv.conf gets overwritten by the DHCP client?

@bompus
Author

bompus commented Mar 6, 2019

I've tested both dnsmasq 2.79 and unbound 1.9.0 as a local DNS cache on Ubuntu 18.04.2, as well as switching /etc/resolv.conf to use nameserver 1.1.1.1 at the top. All three work, so I'm pretty sure this is purely a systemd-resolved issue. Hopefully they can fix and backport it soon.

Perhaps Cloudflare could add an option to specify the DNS server IP address on the CLI or in the config?
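
In Go, pointing lookups at a specific DNS server can be sketched with a custom `net.Resolver`. This is a minimal sketch under assumptions, not how cloudflared is implemented: `newResolver` is a hypothetical helper, `1.1.1.1:53` is just the server discussed in this thread, and the record queried in `main` may no longer exist.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newResolver returns a resolver that sends every query to server
// (in host:port form) instead of the nameservers in /etc/resolv.conf.
// Hypothetical helper; cloudflared does not necessarily work this way.
func newResolver(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's built-in resolver so that Dial is honoured
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			// Ignore the address Go picked and dial the chosen server.
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
}

func main() {
	r := newResolver("1.1.1.1:53")
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Looks up _warp._tcp.cloudflarewarp.com through the chosen server.
	_, srvs, err := r.LookupSRV(ctx, "warp", "tcp", "cloudflarewarp.com")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for _, s := range srvs {
		fmt.Printf("%s:%d priority=%d weight=%d\n", s.Target, s.Port, s.Priority, s.Weight)
	}
}
```

This bypasses the local stub at 127.0.0.53 for these lookups only, so the rest of the system keeps using systemd-resolved.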

@adaptive

adaptive commented Mar 7, 2019

I had to reduce/kill the tunnels to dodge this bug. Anything using Ubuntu 18.04 is failing.

A simplistic solution could be to use DoH with cloudflared.
curl -v 'https://1.1.1.1/dns-query?ct=application/dns-json&name=cloudflare.com'
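
The same kind of request can be sketched in Go. This is a rough illustration only: `dohURL` and `parseAnswers` are hypothetical helpers, the `type` query parameter and the JSON field names (`Status`, `Answer`, `name`, `type`, `TTL`, `data`) are assumptions about the dns-json format, and fetching the URL over HTTPS is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// answer mirrors one entry of the "Answer" array in a dns-json reply
// (field names are an assumption about that format).
type answer struct {
	Name string `json:"name"`
	Type int    `json:"type"`
	TTL  int    `json:"TTL"`
	Data string `json:"data"`
}

// reply holds the parts of a dns-json reply used in this sketch.
type reply struct {
	Status int      `json:"Status"`
	Answer []answer `json:"Answer"`
}

// dohURL builds a query URL shaped like the curl command above,
// with an added record type such as "SRV".
func dohURL(name, qtype string) string {
	q := url.Values{}
	q.Set("ct", "application/dns-json")
	q.Set("name", name)
	q.Set("type", qtype)
	u := url.URL{Scheme: "https", Host: "1.1.1.1", Path: "/dns-query", RawQuery: q.Encode()}
	return u.String()
}

// parseAnswers decodes a dns-json body and returns its answer records.
// A non-zero Status (anything other than NOERROR) is reported as an error.
func parseAnswers(body []byte) ([]answer, error) {
	var r reply
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if r.Status != 0 {
		return nil, fmt.Errorf("DNS status %d", r.Status)
	}
	return r.Answer, nil
}

func main() {
	fmt.Println(dohURL("_warp._tcp.cloudflarewarp.com", "SRV"))
}
```

Because the answer arrives as JSON over HTTPS, the local stub resolver and its wire-format encoding are not involved in this path.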

@donovan

donovan commented Mar 9, 2019

I am hitting this issue on Ubuntu 18.04.2 LTS in AWS (linux-image-4.15.0-1032-aws). I downgraded to cloudflared version 2018.8.0 as I happened to have the deb handy. This fixed the issue for me.

# dpkg -l | grep 'ii  systemd '
ii  systemd                        237-3ubuntu10.13                   amd64        system and service manager

Is this the problematic SRV record?

# host -v -t srv cloudflarewarp.com
Trying "cloudflarewarp.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38290
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;cloudflarewarp.com.		IN	SRV

Received 36 bytes from 127.0.0.53#53 in 5 ms

@nickvollmar
Contributor

We have just released 2019.3.0, which addresses this issue.

Would y'all please try that version and let us know if it resolves your errors?

@bompus
Author

bompus commented Mar 11, 2019

I've already switched over from systemd-resolved to unbound to give me some more flexibility overall. Perhaps @donovan and @bigben386 can update and confirm the fix?

@bearcage

One of my machines saw this problem too; confirmed 2019.3.0 fixes it.

@bigben386

I can confirm it works in Ubuntu on GCP now. Thanks for the quick fix.

@bompus bompus closed this as completed Mar 12, 2019
@donovan

donovan commented Mar 13, 2019

Fixed for me on Ubuntu 18.04.2 LTS in AWS.

@donovan

donovan commented Mar 13, 2019

Answering my own question about the SRV record above:

$ host -v -t srv _warp._tcp.cloudflarewarp.com
Trying "_warp._tcp.cloudflarewarp.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7748
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;_warp._tcp.cloudflarewarp.com.	IN	SRV

;; ANSWER SECTION:
_warp._tcp.cloudflarewarp.com. 300 IN	SRV	2 1 7844 region2.cloudflarewarp.com.
_warp._tcp.cloudflarewarp.com. 300 IN	SRV	1 1 7844 region1.cloudflarewarp.com.

Received 103 bytes from 127.0.0.53#53 in 39 ms
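
The four values in each answer are priority, weight, port and target. A client normally tries the lowest priority value first, which can be sketched as below. This is a minimal illustration: `exampleRecords` and `byPreference` are hypothetical helpers, and weighted selection among records of equal priority is left out.

```go
package main

import (
	"fmt"
	"net"
	"sort"
)

// exampleRecords returns the two answers from the host output above.
func exampleRecords() []*net.SRV {
	return []*net.SRV{
		{Target: "region2.cloudflarewarp.com.", Port: 7844, Priority: 2, Weight: 1},
		{Target: "region1.cloudflarewarp.com.", Port: 7844, Priority: 1, Weight: 1},
	}
}

// byPreference orders SRV records so that lower priority values come
// first (RFC 2782). Records with equal priority keep their input order.
func byPreference(srvs []*net.SRV) {
	sort.SliceStable(srvs, func(i, j int) bool {
		return srvs[i].Priority < srvs[j].Priority
	})
}

func main() {
	srvs := exampleRecords()
	byPreference(srvs)
	for _, s := range srvs {
		fmt.Printf("%d %d %d %s\n", s.Priority, s.Weight, s.Port, s.Target)
	}
}
```

With these two records, region1 (priority 1) is ordered ahead of region2 (priority 2).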

@rulim34

rulim34 commented Jul 17, 2023

Hi, I'm trying to run cloudflared on an NVIDIA Jetson Nano (arm64) with Ubuntu 18, but I'm also facing this issue, even with the latest 2023.7.1. I tried installing and using dnsmasq, but I'm still facing the same error. Any idea how to fix it?

Log:

2023-07-17T10:47:18Z INF Starting tunnel tunnelID=9ecf8095-0fb5-4161-b419-874a8ceef77c
2023-07-17T10:47:18Z INF Version 2023.7.1
2023-07-17T10:47:18Z INF GOOS: linux, GOVersion: go1.19.11, GoArch: arm64
2023-07-17T10:47:18Z INF Settings: map[no-autoupdate:true token:*****]
2023-07-17T10:47:18Z INF Generated Connector ID: fb01e07c-6696-4411-abc7-01314b417ee2
2023-07-17T10:47:18Z INF Initial protocol quic
2023-07-17T10:47:18Z INF ICMP proxy will use 172.17.0.2 as source for IPv4
2023-07-17T10:47:18Z INF ICMP proxy will use :: as source for IPv6
2023-07-17T10:47:18Z ERR edge discovery: error looking up Cloudflare edge IPs: the DNS query failed error="lookup argotunnel.com on 100.100.100.100:53: cannot unmarshal DNS message" event=0
2023-07-17T10:47:18Z ERR Please try the following things to diagnose this issue: event=0
2023-07-17T10:47:18Z ERR   1. ensure that argotunnel.com is returning "origintunneld" service records. event=0
2023-07-17T10:47:18Z ERR      Run your system's equivalent of: dig srv _origintunneld._tcp.argotunnel.com event=0
2023-07-17T10:47:18Z ERR   2. ensure that your DNS resolver is not returning compressed SRV records. event=0
2023-07-17T10:47:18Z ERR      See GitHub issue https://github.com/golang/go/issues/27546 event=0
2023-07-17T10:47:18Z ERR      For example, you could use Cloudflare's 1.1.1.1 as your resolver: event=0
2023-07-17T10:47:18Z ERR      https://developers.cloudflare.com/1.1.1.1/setting-up-1.1.1.1/ event=0
2023-07-17T10:47:18Z INF Starting metrics server on 127.0.0.1:41147/metrics
2023-07-17T10:47:18Z ERR edge discovery: error looking up Cloudflare edge IPs: the DNS query failed error="lookup argotunnel.com on 100.100.100.100:53: cannot unmarshal DNS message" event=0
2023-07-17T10:47:18Z ERR Please try the following things to diagnose this issue: event=0
2023-07-17T10:47:18Z ERR   1. ensure that argotunnel.com is returning "origintunneld" service records. event=0
2023-07-17T10:47:18Z ERR      Run your system's equivalent of: dig srv _origintunneld._tcp.argotunnel.com event=0
2023-07-17T10:47:18Z ERR   2. ensure that your DNS resolver is not returning compressed SRV records. event=0
2023-07-17T10:47:18Z ERR      See GitHub issue https://github.com/golang/go/issues/27546 event=0
2023-07-17T10:47:18Z ERR      For example, you could use Cloudflare's 1.1.1.1 as your resolver: event=0
2023-07-17T10:47:18Z ERR      https://developers.cloudflare.com/1.1.1.1/setting-up-1.1.1.1/ event=0
2023-07-17T10:47:18Z INF Tunnel server stopped
2023-07-17T10:47:18Z ERR Initiating shutdown error="Could not lookup srv records on _v2-origintunneld._tcp.argotunnel.com: lookup argotunnel.com on 100.100.100.100:53: cannot unmarshal DNS message"
2023-07-17T10:47:18Z INF Metrics server stopped
Could not lookup srv records on _v2-origintunneld._tcp.argotunnel.com: lookup argotunnel.com on 100.100.100.100:53: cannot unmarshal DNS message

@siennathesane

siennathesane commented Sep 9, 2023

I'm also running into this problem with 18.04 on NVIDIA Jetson Nano SoCs

Edit: I upgraded my Jetson Nano from 18.04 to 20.04 and this resolved the problem
