failure because TXT dns records are sometimes filtered #423

Open
Zibri opened this issue Jul 28, 2021 · 18 comments

Comments

@Zibri

Zibri commented Jul 28, 2021

Yesterday it was working perfectly on Ubuntu 18.04.
Today it fails with this error:

2021-07-28T00:47:28Z INF Requesting new Quick Tunnel...
2021-07-28T00:47:32Z INF +------------------------------------------------------+
2021-07-28T00:47:32Z INF |  Your Quick Tunnel has been created! Visit it at:    |
2021-07-28T00:47:32Z INF |  cool-creativity-petersburg-makes.trycloudflare.com  |
2021-07-28T00:47:32Z INF +------------------------------------------------------+
2021-07-28T00:47:32Z INF Version 2021.7.3
2021-07-28T00:47:32Z INF GOOS: linux, GOVersion: devel +11087322f8 Fri Nov 13 03:04:52 2020 +0100, GoArch: amd64
2021-07-28T00:47:32Z INF Generated Connector ID: a02696fb-d996-4047-b4f5-e860be44bfce
2021-07-28T00:47:32Z INF cloudflared will not automatically update when run from the shell. To enable auto-updates, run cloudflared as a service: https://developers.cloudflare.com/argo-tunnel/reference/service/
2021-07-28T00:47:52Z ERR Couldn't start tunnel error="lookup protocol.argotunnel.com on 127.0.0.53:53: read udp 127.0.0.1:39905->127.0.0.53:53: i/o timeout"
lookup protocol.argotunnel.com on 127.0.0.53:53: read udp 127.0.0.1:39905->127.0.0.53:53: i/o timeout

The same happens if I change the DNS server.
Note: the machine is a VM inside my main PC.

on my windows host pc I can do:
cloudflared tunnel --url http://192.168.1.104:XXXX

yesterday the same command worked on the guest machine (192.168.1.104)
today gives that error.

any clue?
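
For anyone triaging a similar log: the error text itself names the resolver that timed out. Below is a minimal Python sketch for pulling those fields out of the message. The regular expression is written against the single error line quoted above and is an assumption about its shape, not a documented format; `parse_lookup_error` is a hypothetical helper name.

```python
import re

# Shape of the error line seen in the log above (an assumption, not a spec):
#   lookup <host> on <resolver>: read udp <local>-><remote>: <cause>
LOOKUP_ERROR = re.compile(
    r"lookup (?P<host>\S+) on (?P<resolver>[\d.]+:\d+): "
    r"read udp (?P<local>[\d.]+:\d+)->(?P<remote>[\d.]+:\d+): (?P<cause>.+)"
)


def parse_lookup_error(message):
    """Return the named fields of a lookup error as a dict, or None."""
    match = LOOKUP_ERROR.search(message)
    return match.groupdict() if match else None


line = (
    "lookup protocol.argotunnel.com on 127.0.0.53:53: "
    "read udp 127.0.0.1:39905->127.0.0.53:53: i/o timeout"
)
info = parse_lookup_error(line)
print(info["host"], info["resolver"], info["cause"])
```

On Ubuntu 18.04, 127.0.0.53:53 is normally the systemd-resolved stub listener, so the timeout here is reported by the local stub rather than by a remote DNS server directly.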

@benbalter

I receive the same error with 2021.7.3 (with both that protocol.argotunnel.com address and a cloudflare-gateway.com teams address). Downgrading to 2021.7.0 resolves the issue.

Could this be related to TUN-4699: Make quick tunnels the default in cloudflared from 2021.7.1?

I'm running proxy-dns on a Raspberry Pi, which has been running without issue for over a year, and then suddenly broke with ~2021.7.1. Happy to help diagnose.

@nmldiegues
Contributor

I receive the same error with 2021.7.3 (with both that protocol.argotunnel.com address and a cloudflare-gateway.com teams address). Downgrading to 2021.7.0 resolves the issue.

Could this be related to TUN-4699: Make quick tunnels the default in cloudflared from 2021.7.1?

I'm running proxy-dns on a Raspberry Pi, which has been running without issue for over a year, and then suddenly broke with ~2021.7.1. Happy to help diagnose.

@benbalter can you show the cloudflared command and config that you are running with that broke with 2021.7.1 onwards?

@nmldiegues
Contributor

@Zibri and @benbalter can you run the following command in the environment where cloudflared is failing?

dig -t txt protocol.argotunnel.com
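
Where `dig` is not installed, roughly the same check can be sketched with the Python standard library. This is only an illustration of what a TXT query looks like on the wire and whether any reply comes back; `build_query` and `answers_within` are hypothetical helper names, and the server address you pass in is whatever resolver your system uses.

```python
import socket
import struct

TXT = 16  # DNS record type number for TXT (RFC 1035)


def build_query(name, qtype=TXT, query_id=0x1234):
    """Encode a single-question DNS query in wire format."""
    # Header: ID, flags (only "recursion desired" set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def answers_within(server, name, timeout=5.0, port=53):
    """Return True if `server` sends any UDP reply to a TXT query in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, port))
            sock.recvfrom(4096)
            return True
        except OSError:  # covers timeouts and other network errors
            return False
```

For example, `answers_within("127.0.0.53", "protocol.argotunnel.com")` would return `False` in an environment where TXT queries are dropped. The sketch does not decode the reply; it only reports whether one arrived.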

@nmldiegues
Contributor

This is the same as #388

@Zibri
Author

Zibri commented Jul 28, 2021

dig -t txt protocol.argotunnel.com

it does not return anything and times out.
in egypt dns queries are very restricted.
perhaps you should do the query using https dns

@Zibri
Author

Zibri commented Jul 28, 2021

SRV queries are not blocked, and neither are a few other types.
So you have 2 choices:
either use an HTTPS DNS resolver, or try other DNS query types such as SRV, SIG or CAA as a backup.
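
The first option can be sketched in Python as follows. This is an illustration only, assuming the endpoint accepts the JSON flavour of DNS over HTTPS (requested with an `Accept: application/dns-json` header); the endpoint URL is the one used as a bootstrap resolver elsewhere in this thread, and `build_request`, `extract_txt` and `query_txt` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: this endpoint answers JSON-format DoH queries.
DOH_ENDPOINT = "https://1.1.1.1/dns-query"


def build_request(name, rtype="TXT", endpoint=DOH_ENDPOINT):
    """Build a GET request asking the DoH endpoint for one record type."""
    query = urllib.parse.urlencode({"name": name, "type": rtype})
    return urllib.request.Request(
        f"{endpoint}?{query}", headers={"Accept": "application/dns-json"}
    )


def extract_txt(payload):
    """Pull TXT strings out of a decoded JSON response, stripping quotes."""
    return [
        answer["data"].strip('"')
        for answer in payload.get("Answer", [])
        if answer.get("type") == 16  # 16 is the TXT record type
    ]


def query_txt(name):
    """Perform the lookup over HTTPS (needs network access)."""
    with urllib.request.urlopen(build_request(name), timeout=10) as resp:
        return extract_txt(json.load(resp))
```

Because the endpoint is addressed by IP, this path does not depend on plain UDP DNS at all, which is the property being asked for here.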

@Zibri changed the title from "very strange bug" to "failure because TXT dns records are sometimes filtered" on Jul 28, 2021

@Zibri
Author

Zibri commented Jul 28, 2021

Downgrading to 2021.7.0 resolves the issue.

Thanks for pointing this out.
Also, to avoid auto-updating, an easy trick is this:

# sed -i "s/2021\.7\.0/2025.7.0/" "$(which cloudflared)"

@nmldiegues
Contributor

About the TXT lookup problem: we haven't addressed it yet, but will soon.

About the "quick tunnel" (i.e., a no-login tunnel) causing the TXT lookup, which seems to fail in rare situations such as those described here: we have reverted that logic in 2021.7.4, meaning it will no longer cause that lookup.

@benbalter

benbalter commented Jul 28, 2021

can you show the cloudflared command and config that you are running with that broke with 2021.7.1 onwards?

I have a service defined to run /usr/local/bin/cloudflared --config /etc/cloudflared/config.yml with the following config:

proxy-dns: true
proxy-dns-port: 5053
proxy-dns-upstream:
  - https://XXX.cloudflare-gateway.com/dns-query
proxy-dns-bootstrap:
  - https://1.1.1.2/dns-query

can you run the following command in the environment where cloudflared is failing?

With cloudflared running (2021.7.0), I get the "http2=100" response, presumably as expected.

Before cloudflared bootstraps, the dig query fails, because the system resolver (set to 127.0.0.1#53) uses cloudflared's proxy-dns as its upstream resolver (127.0.0.1#5053).

perhaps you should do the query using https dns

It seems 2021.7.1's quick tunnels default introduced a dependency on being able to query that TXT record during the bootstrap process, but does so in a way that uses the system resolver, rather than the designated bootstrap resolver / DNS over HTTPS.

Similar to the discussion in #388 and above, non-DoH DNS queries are blocked entirely on my network. To maintain backwards compatibility with versions before 2021.7.1, the bootstrap process should allow use of DoH for its initial resolution, not the system resolver.

All that said, thank you for your quick response and for maintaining such a great project! 🎉

@sudarshan-reddy
Contributor

sudarshan-reddy commented Jul 28, 2021

With cloudflared running (2021.7.0), I get the "http2=100" response, presumably as expected.

Hi @benbalter! Can you share the stdout logs when you run the same command with v2021.7.0 please?

non-DoH DNS queries are blocked entirely, meaning as before 2021.7.1, in order to maintain backwards compatibility, the bootstrap process should allow use of DoH for its initial resolution, not the system resolver.

This should still happen if you were to use the command cloudflared proxy-dns. Can you try it out with the latest version and let me know if that works for you?

@benbalter

benbalter commented Jul 28, 2021

Can you share the stdout logs when you run the same command with v2021.7.0 please?

Of course. Thanks for the quick reply. Here's the output on 2021.7.0:

$ dig -t txt protocol.argotunnel.com

; <<>> DiG 9.11.5-P4-5.1+deb10u2-Raspbian <<>> -t txt protocol.argotunnel.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63169
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;protocol.argotunnel.com.	IN	TXT

;; ANSWER SECTION:
protocol.argotunnel.com. 300	IN	TXT	"http2=100"

;; Query time: 26 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Jul 28 19:24:24 UTC 2021
;; MSG SIZE  rcvd: 97

And if I were to query cloudflared directly (bypassing the downstream pi-hole DNS server), here's the result:

$ dig -t txt protocol.argotunnel.com @127.0.0.1 -p 5053
; <<>> DiG 9.11.5-P4-5.1+deb10u2-Raspbian <<>> -t txt protocol.argotunnel.com @127.0.0.1 -p 5053
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44008
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;protocol.argotunnel.com.	IN	TXT

;; ANSWER SECTION:
protocol.argotunnel.com. 277	IN	TXT	"http2=100"

;; Query time: 30 msec
;; SERVER: 127.0.0.1#5053(127.0.0.1)
;; WHEN: Wed Jul 28 19:26:30 UTC 2021
;; MSG SIZE  rcvd: 97
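
To compare results like the two above from a script, the answer line and the TXT payload can be split with a few lines of Python. This is a sketch: the thread only shows the single value "http2=100", so reading it as whitespace-separated `name=number` pairs is an assumption, and `parse_answer_line` and `parse_protocol_record` are hypothetical helper names.

```python
def parse_answer_line(line):
    """Split one line of dig's ANSWER SECTION into its five fields."""
    name, ttl, rclass, rtype, data = line.split(None, 4)
    return {
        "name": name,
        "ttl": int(ttl),
        "class": rclass,
        "type": rtype,
        "data": data.strip('"'),  # dig prints TXT data in double quotes
    }


def parse_protocol_record(txt):
    """Turn 'name=number' pairs (assumed format) into a dict of ints."""
    result = {}
    for pair in txt.split():
        name, sep, value = pair.partition("=")
        if not sep or not name or not value.isdigit():
            raise ValueError(f"malformed entry: {pair!r}")
        result[name] = int(value)
    return result


answer = parse_answer_line('protocol.argotunnel.com. 300\tIN\tTXT\t"http2=100"')
print(answer["ttl"], parse_protocol_record(answer["data"]))
```

Both queries above carry the same payload; only the TTL differs (300 versus 277), which is consistent with the second answer having been served from a cache.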

@benbalter

benbalter commented Jul 28, 2021

Can you try it out with the latest version and let me know if that works for you?

2021.7.4 bootstraps as expected, both via the cloudflared command + config file and with cloudflared proxy-dns directly.

@sudarshan-reddy
Contributor

Oops. I misspoke. Can you also do me the favour of trying cloudflared proxy-dns out with 2021.7.3?

Of course. Thanks for the quick reply. Here's the output on 2021.7.0:

Thanks for this. Can you also share the output of your cloudflared command please?

@nmldiegues
Contributor

So I think we've understood this a bit better now.

This second case therefore starts a tunnel, besides starting the dns proxy. It's very likely that you are not even using that tunnel at all. So you can just run the first case above and therefore skip the tunnel logic.

The behaviour changed because we moved those "account-less tunnels" (where no --hostname is provided, and no tunnel is pre-created with a login) off our legacy tunnels infrastructure and onto the new one used for named tunnels. This new one looks up a TXT record, and that's what you noticed.
We will make cloudflared more resilient to the TXT lookup.
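
One way to picture "more resilient" is a lookup that tries several resolvers in order and falls back to a fixed default instead of aborting startup. The Python sketch below shows that pattern only; it is not cloudflared's implementation. The resolver callables, their labels and the default value are all hypothetical placeholders.

```python
def lookup_with_fallback(name, resolvers, default):
    """Try each (label, resolve) pair in order; return (value, source).

    `resolve` is any callable that takes a name and returns a list of
    TXT strings, raising OSError (for example TimeoutError) on failure.
    """
    for label, resolve in resolvers:
        try:
            records = resolve(name)
        except OSError:
            continue  # this resolver is unreachable or filtered, try the next
        if records:
            return records[0], label
    return default, "default"


# Stand-ins for a filtered system resolver and a working DoH resolver.
def filtered_resolver(name):
    raise TimeoutError("i/o timeout")


def doh_resolver(name):
    return ["http2=100"]


value, source = lookup_with_fallback(
    "protocol.argotunnel.com",
    [("system", filtered_resolver), ("doh", doh_resolver)],
    default="placeholder-default",  # hypothetical fallback value
)
print(value, source)
```

The design choice worth noting is that a failed lookup degrades to a default rather than turning into a fatal "Couldn't start tunnel" error.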

@nmldiegues
Contributor

We've uncovered that this different behaviour (running a tunnel next to the proxy-dns) was an accidental regression caused by some bad argument handling. FYI, we will revert that.

@benbalter

So you can just run the first case above and therefore skip the tunnel logic.

Came here to post the stdout requested above, and arrived at a similar conclusion.

That said, I may have found another bug (happy to move this to a new issue, if unrelated): either cloudflared proxy-dns is not using the bootstrap resolver (whether specified or default), or I don't understand the purpose of that setting (probably more likely).

Output of cloudflared on 2021.7.3:
pi@raspberrypi:~ $ cloudflared
2021-07-28T22:36:45Z INF Requesting new Quick Tunnel...
failed to request quick tunnel: Post "https://api.trycloudflare.com/tunnel": dial tcp: lookup api.trycloudflare.com on 127.0.0.1:53: read udp 127.0.0.1:46291->127.0.0.1:53: i/o timeout

Output of cloudflared proxy-dns with a Cloudflare gateway upstream (duplicate log entries removed):
^Cpi@raspberrypi:~ $ cloudflared proxy-dns --port 5053 --upstream https://XXX.cloudflare-gateway.com/dns-query --bootstrap "https://1.1.1.1/dns-query"
2021-07-28T22:38:58Z INF Adding DNS upstream url=https://XXX.cloudflare-gateway.com/dns-query
2021-07-28T22:38:58Z INF Starting DNS over HTTPS proxy server address=dns://localhost:5053
2021-07-28T22:38:58Z INF Starting metrics server on 127.0.0.1:41525/metrics
2021-07-28T22:39:05Z ERR failed to connect to an HTTPS backend "https://XXX.cloudflare-gateway.com/dns-query" error="failed to perform an HTTPS request: Post \"https://XXX.cloudflare-gateway.com/dns-query\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
2021-07-28T22:39:10Z ERR failed to connect to an HTTPS backend "https://XXX.cloudflare-gateway.com/dns-query" error="failed to perform an HTTPS request: Post \"https://XXX.cloudflare-gateway.com/dns-query\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

I get similar output for cloudflared proxy-dns on 2021.7.4. As you can see above, in both versions, cloudflared is attempting to resolve the XXX.cloudflare-gateway.com subdomain via the 127.0.0.1#53 resolver, even though the bootstrap resolver is specified in the config (and the default resolver should be Cloudflare's IP). I can also see the XXX.cloudflare-gateway.com requests in my #53 resolver's logs (which uses cloudflared as upstream, resulting in a timeout). cloudflared proxy-dns with no arguments works, as it uses 1.1.1.1 as its upstream.

Is my understanding incorrect in that cloudflared proxy-dns should use the bootstrap resolver to resolve the upstream resolver's domain at startup?
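
To make the expectation in that question concrete, here is a small Python sketch of a bootstrap step: resolve the upstream's hostname through a separate resolver, then connect to the resulting IP while keeping the hostname for TLS. It illustrates the behaviour being asked about and is not confirmed to match what cloudflared does; `connect_target` and `bootstrap_resolve` are hypothetical names, and 192.0.2.10 is a documentation-range address used as a stand-in.

```python
import ipaddress
from urllib.parse import urlsplit


def connect_target(upstream_url, bootstrap_resolve):
    """Return (ip, port, tls_server_name) for an upstream DoH URL.

    `bootstrap_resolve` is a callable mapping a hostname to a list of IPs.
    It is only consulted when the URL's host is not already an IP literal.
    """
    parts = urlsplit(upstream_url)
    host = parts.hostname  # note: urlsplit lower-cases the hostname
    port = parts.port or 443
    try:
        ipaddress.ip_address(host)
        return host, port, host  # IP literal: no lookup needed
    except ValueError:
        pass
    return bootstrap_resolve(host)[0], port, host


# Stand-in bootstrap resolver returning a documentation-range address.
def fake_bootstrap(hostname):
    return ["192.0.2.10"]


print(connect_target("https://XXX.cloudflare-gateway.com/dns-query", fake_bootstrap))
print(connect_target("https://1.1.1.2/dns-query", fake_bootstrap))
```

With this shape, an upstream given by IP (such as 1.1.1.2) never needs the system resolver, while a hostname upstream needs exactly one lookup through the bootstrap resolver.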

If instead I use the following config (moving 1.1.1.2 to a second upstream), when the first DNS lookup fails, it falls back to 1.1.1.2 (I believe only for that request, since the first resolver could then be used), and resolves/proxies requests as expected:

proxy-dns: true
proxy-dns-port: 5053
proxy-dns-upstream:
  - https://XXX.cloudflare-gateway.com/dns-query
  - https://1.1.1.2/dns-query

Again, very grateful for your time and thoughtfulness here, and glad to hear that I found at least one bug, and it wasn't entirely my fault. Eager to hear your thoughts on the bootstrap issue, and again, if unrelated, happy to move it to a new issue. Thanks again!

@benbalter

if you run cloudflared proxy-dns --config ...

One minor note, in case it impacts the above, cloudflared takes a config argument, but it does not appear proxy-dns does.

Placing the --config argument after proxy-dns results in Incorrect Usage: flag provided but not defined: -config and placing it before results in the command succeeding, but with the config ignored.
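
The position-dependent behaviour is typical of CLIs where a flag is defined on the root command but not on the subcommand. The Python `argparse` sketch below reproduces the same pattern by analogy only; it is a hypothetical CLI, not cloudflared's real flag set or parser, and the default port is a placeholder.

```python
import argparse


def build_parser():
    # Hypothetical CLI used purely as an analogy.
    root = argparse.ArgumentParser(prog="demo")
    root.add_argument("--config")  # defined on the root command only
    sub = root.add_subparsers(dest="command")
    proxy = sub.add_parser("proxy-dns")
    proxy.add_argument("--port", type=int, default=53)  # placeholder default
    return root


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects the input."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:  # argparse exits after printing a usage error
        return None


before = try_parse(["--config", "c.yml", "proxy-dns", "--port", "5053"])
after = try_parse(["proxy-dns", "--config", "c.yml"])
print(before.config if before else None, after)
```

In this analogy the flag is accepted before the subcommand and rejected after it. Whether the subcommand then actually reads the root-level value is a separate question, which matches the "succeeds but config ignored" observation above.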

To be clear, I'm not seeking to complain (easy enough to pass as command line vars), but wanted to share in case the change in behavior was helpful.

@sudarshan-reddy
Contributor

Sorry for getting back a bit late here guys. These issues should be fixed in the newest release. Give it a go and let us know what you think.


4 participants