[wicketd] Refine preflight uplink DNS check (#4050)
Prior to this change, we used trust-dns-resolver's
[lookup_ip()](https://docs.rs/trust-dns-resolver/0.22.0/trust_dns_resolver/struct.AsyncResolver.html#method.lookup_ip)
with default options for our preflight DNS checks, but this doesn't
expose as much information as we want in failure cases. In particular,
it will query for an A record, and if that fails for any reason at all
(including I/O errors and timeouts), it will discard the error, query
for an AAAA record, and return the result of that second query.

The substantial changes in this PR are:

* We set several options when creating our resolver to more closely
match how we're using it (see `build_resolver()`). Most notably, we
disable retries (because we're going to retry ourselves).
* Instead of using `lookup_ip()`, we now use `ipv4_lookup()` (A query)
and `ipv6_lookup()` (AAAA query) explicitly so we can see the results of
each, and decide ourselves whether to proceed from A to AAAA. We do
proceed if we get a `NoRecordsFound` from the A query, but do not
proceed if there's any other error (which indicates a problem
communicating with the server that would not be resolved by changing
query types).
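
The A-to-AAAA decision described in the second bullet can be sketched roughly as follows. This is a minimal, synchronous illustration and not the code in this PR: `LookupError`, `Outcome`, and `resolve_with_fallback()` are hypothetical names, and the two closures stand in for the real `ipv4_lookup()` and `ipv6_lookup()` calls.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical stand-in for the resolver's error kinds (not the real type).
#[derive(Debug, Clone, PartialEq, Eq)]
enum LookupError {
    /// The server answered, but has no record of the requested type.
    NoRecordsFound,
    /// The server did not answer in time.
    Timeout,
    /// Any other I/O failure while talking to the server.
    Io(String),
}

/// Hypothetical summary of what happened for one hostname on one server.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Resolved(Vec<IpAddr>),
    /// Neither an A nor an AAAA record exists.
    NoRecords,
    /// We could not communicate with the server.
    ServerFailure(LookupError),
}

/// Run the A query first; fall through to AAAA only on `NoRecordsFound`.
fn resolve_with_fallback(
    a_query: impl FnOnce() -> Result<Vec<Ipv4Addr>, LookupError>,
    aaaa_query: impl FnOnce() -> Result<Vec<Ipv6Addr>, LookupError>,
) -> Outcome {
    match a_query() {
        Ok(addrs) => {
            Outcome::Resolved(addrs.into_iter().map(IpAddr::V4).collect())
        }
        // The server is reachable but has no A record: AAAA may still work.
        Err(LookupError::NoRecordsFound) => match aaaa_query() {
            Ok(addrs) => {
                Outcome::Resolved(addrs.into_iter().map(IpAddr::V6).collect())
            }
            Err(LookupError::NoRecordsFound) => Outcome::NoRecords,
            Err(err) => Outcome::ServerFailure(err),
        },
        // Any other error means changing query types would not help,
        // so the AAAA closure is never called.
        Err(err) => Outcome::ServerFailure(err),
    }
}
```

Taking the queries as closures keeps the sketch independent of any resolver crate and makes it easy to confirm that the AAAA query is skipped after a timeout or I/O error.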

Running a `preflight uplink` check on madrid with two correct DNS
servers (1.1.1.1, 9.9.9.9) and one nonexistent DNS server
(192.168.100.100) and querying for one valid hostname
(`ntp.eng.oxide.computer`) and one invalid hostname
(`ntp2.eng.oxide.computer`) now produces this output:

```
⚠ Checking for external DNS connectivity (33.088334968s)
    DNS server 1.1.1.1 A query attempt 1: resolved ntp.eng.oxide.computer to 172.20.0.5
    DNS server 9.9.9.9 A query attempt 1: resolved ntp.eng.oxide.computer to 172.20.0.5
    DNS server 1.1.1.1 A query attempt 1: failed to look up ntp2.eng.oxide.computer: no record found for Query { name: Name("ntp2.eng.oxide.computer."), query_type: A, query_class: IN }
    DNS server 1.1.1.1 AAAA query attempt 1: failed to look up ntp2.eng.oxide.computer: no record found for Query { name: Name("ntp2.eng.oxide.computer."), query_type: AAAA, query_class: IN }
    DNS server 9.9.9.9 A query attempt 1: failed to look up ntp2.eng.oxide.computer: no record found for Query { name: Name("ntp2.eng.oxide.computer."), query_type: A, query_class: IN }
    DNS server 9.9.9.9 AAAA query attempt 1: failed to look up ntp2.eng.oxide.computer: no record found for Query { name: Name("ntp2.eng.oxide.computer."), query_type: AAAA, query_class: IN }
    DNS server 192.168.100.100 A query attempt 1: failed to look up ntp.eng.oxide.computer: request timed out
    DNS server 192.168.100.100 A query attempt 2: failed to look up ntp.eng.oxide.computer: request timed out
    DNS server 192.168.100.100 A query attempt 3: failed to look up ntp.eng.oxide.computer: request timed out
```

We can see the A and AAAA queries made to the valid servers when failing
to resolve `ntp2.eng.oxide.computer`, and we can see the specific errors
from the three attempts to query the nonexistent server; we never move
on to querying it for AAAA records.
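
The retry behavior visible in that output (one attempt when the answer is "no record found", three attempts when requests time out) could be modeled roughly like this. It is a simplified, synchronous sketch under assumptions: `QueryError` and `query_with_retries()` are hypothetical names, the limit of three attempts is taken from the sample output above, and the real check is async and may differ in detail.

```rust
/// Hypothetical error type for a single DNS query attempt.
#[derive(Debug, Clone, PartialEq, Eq)]
enum QueryError {
    /// Definitive answer from the server: retrying would not change it.
    NoRecordsFound,
    /// Communication problem: worth retrying against the same server.
    Timeout,
}

/// Run `query` until it succeeds, returns `NoRecordsFound`, or has been
/// tried `max_attempts` times. Returns the final result together with the
/// number of attempts actually made.
fn query_with_retries<T>(
    max_attempts: usize,
    mut query: impl FnMut() -> Result<T, QueryError>,
) -> (Result<T, QueryError>, usize) {
    assert!(max_attempts >= 1, "need at least one attempt");
    let mut attempt = 1;
    loop {
        match query() {
            Ok(value) => return (Ok(value), attempt),
            // A definitive "no such record" answer is not retried.
            Err(QueryError::NoRecordsFound) => {
                return (Err(QueryError::NoRecordsFound), attempt);
            }
            // Out of attempts: report the last communication error.
            Err(err) if attempt == max_attempts => {
                return (Err(err), attempt);
            }
            // Communication error with attempts remaining: try again.
            Err(_) => attempt += 1,
        }
    }
}
```

Because the caller owns the retry loop, each attempt's error can be reported individually, which is why the resolver's own retries are disabled.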

Fixes #4044.

---------

Co-authored-by: Andrew J. Stone <[email protected]>
jgallagher and andrewjstone authored Sep 13, 2023
1 parent 9700d44 commit 980c5be
Showing 3 changed files with 318 additions and 135 deletions.
1 change: 1 addition & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions wicketd/Cargo.toml
@@ -15,6 +15,7 @@ debug-ignore.workspace = true
display-error-chain.workspace = true
dpd-client.workspace = true
dropshot.workspace = true
either.workspace = true
flume.workspace = true
futures.workspace = true
gateway-messages.workspace = true