dns servers respond with SRV records when queried for AAAA ones #4051
Comments
I think this behavior is correct and we should write a test to make sure it is retained. Here is the response for a SRV record from Knot DNS, whose target is an A record known to the resolver:
Consider also the similar implementation for glue records:
Wait I ran a different thing entirely. Here's Knot again:
So, yes, our current behavior is likely incorrect.
We noticed this again under #6259. A quick look at the DNS server shows that it seems to ignore the requested record type. Here's where we handle the incoming message: omicron/dns-server/src/dns_server.rs Lines 254 to 266 in 57a07c3
We grab the records from the store at L266 there. That function is documented to return all records associated with the name and says nothing about filtering by record type: omicron/dns-server/src/storage.rs Lines 583 to 594 in 57a07c3
I think Given this, you can see any kind of record when querying for any other kind of record. It's not just limited to getting SRV records when querying for AAAA -- the reverse can also happen. @iliana is also right that we should continue to provide additional records when it makes sense. I think the fix I suggested won't change that because it's a separate hunk of code in |
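To illustrate the point about additional records: a minimal sketch, assuming simplified stand-in types (`Srv`, `additionals`, and the name-to-address map are all hypothetical and are not the `dns-server` crate's actual types). It shows the additional section being derived from the SRV answers alone, which is why filtering the answer section by query type need not affect it.

```rust
use std::collections::HashMap;
use std::net::Ipv6Addr;

/// Hypothetical, simplified SRV record: just the fields this sketch needs.
pub struct Srv {
    pub port: u16,
    pub target: String,
}

/// For each SRV record in the answer section, collect the AAAA records known
/// for its target so they can be placed in the additional section.
/// Returns (owner name, address) pairs.
pub fn additionals(
    answers: &[Srv],
    aaaa_by_name: &HashMap<String, Vec<Ipv6Addr>>,
) -> Vec<(String, Ipv6Addr)> {
    let mut out = Vec::new();
    for srv in answers {
        // Targets with no known AAAA records simply contribute nothing.
        if let Some(addrs) = aaaa_by_name.get(&srv.target) {
            for addr in addrs {
                out.push((srv.target.clone(), *addr));
            }
        }
    }
    out
}
```

With this shape, a query that produces no SRV answers (for example an AAAA query against a name that only has SRV records) also produces no additionals.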
As noted in #4051, queries to the internal DNS server would get any records for a name in response, rather than only records matching the incoming query type. That behavior is confusing, but worse, wrong. Subtly, this is not actually a misbehavior I believe we can observe through `trust_dns_resolver`: the resolver from that crate includes its own `CachingClient`. As a side effect of upstream answers going through that caching client, incorrect Answers sections are cached and only correct answers actually make it out to us as consumers of `trust_dns_resolver`. But as is plenty clear in #4051, `dig` and other DNS clients can get incoherent answers! Simple enough to fix: only return answers that are answers to the question we were asked.
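The fix described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual change: `RecordType`, `Record`, and `answers_for` are hypothetical names standing in for whatever the DNS server really uses.

```rust
use std::net::Ipv6Addr;

/// Hypothetical record types; only the two discussed in this issue.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RecordType {
    Aaaa,
    Srv,
}

/// Hypothetical stored record, simplified for the sketch.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Record {
    Aaaa(Ipv6Addr),
    Srv { port: u16, target: String },
}

impl Record {
    /// The DNS record type this stored record corresponds to.
    pub fn record_type(&self) -> RecordType {
        match self {
            Record::Aaaa(_) => RecordType::Aaaa,
            Record::Srv { .. } => RecordType::Srv,
        }
    }
}

/// Given every record stored for a name, keep only those whose type matches
/// the type asked for in the question.
pub fn answers_for(stored: &[Record], qtype: RecordType) -> Vec<Record> {
    stored
        .iter()
        .filter(|r| r.record_type() == qtype)
        .cloned()
        .collect()
}
```

Under this sketch, an AAAA query for a name that only has SRV records yields an empty answer section, which is the behavior the original report expected.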
`lookup_ipv6` is both buggy and easy to misuse:

* it sends an AAAA query for a domain which should have a SRV record - this works only because #4051 means the SRV record is incorrectly returned, along with the desired AAAA in Additionals
* it looks up an IPv6 address from a SRV record *but ignores the port*.

in places `lookup_ipv6` was used, it was paired consistently with the hardcoded port NEXUS_INTERNAL_PORT and matched what should be in the resolved SRV record. if we for example wanted to move Nexus' port (or start a test Nexus on an atypical port), the authoritative port number in the SRV response would be ignored in favor of the hardcoded port. let's just use the port that we told DNS we're at!

we may still want a bare IPv6 address for a service if we're testing network reachability, for example, but we're not doing that today. and if we need a service's IPv6 address to use with an alternate port to access a different API, we *probably* should have a second SRV record for that API to use instead?

(`lookup_ipv6` would be simple enough to fix and retain, but after looking at its uses it seems it might be more trouble than it's worth right now)
`lookup_ipv6` is both buggy and easy to misuse:

* it sends an AAAA query for a domain which should have a SRV record - this works only because #4051 means the SRV record is incorrectly returned, along with the actually-desired AAAA for the SRV's target in Additionals
* it looks up an IPv6 address from a SRV record *but ignores the port*.

in places `lookup_ipv6` was used, it was paired consistently with the hardcoded port NEXUS_INTERNAL_PORT and matched what should be in the resolved SRV record. if we for example wanted to move Nexus' port (or start a test Nexus on an atypical port), the authoritative port number in the SRV response would be ignored in favor of the hardcoded port. let's just use the port that we told DNS we're at!

we may still want a bare IPv6 address for a service if we're going to test network reachability, for example, but we're not doing that with this function today.

this all is distinct from helpers like `lookup_all_ipv6`. if we need a service's IPv6 address to use with an alternate port to access a different API, we *probably* should have a distinct SRV record for that lookup to use instead? i've found three instances of this:

* wicket assumes the techport proxy is on the same IP as Nexus' API, but that isn't necessarily true
* we assume the CRDB admin service listens on the same IP as CRDB itself, but that doesn't have to be true
* we look up addresses for MGS via `ServiceName::Dendrite`, but there's a `ServiceName::ManagementGatewayService`, so either that's a typo or it can be made to have its own SRV records

there are some uses of `lookup_all_ipv6` that make a lot of sense still, where we're discovering the rack's network and _really_ do not care about the port that Dendrite happens to be on.
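As a rough illustration of "use the port that we told DNS we're at": a minimal sketch, assuming the SRV records and the AAAA records for their targets have already been resolved. `SrvTarget` and `socket_addrs` are hypothetical names for this example, not functions from the resolver crate.

```rust
use std::collections::HashMap;
use std::net::{Ipv6Addr, SocketAddrV6};

/// Hypothetical, simplified view of a resolved SRV record.
pub struct SrvTarget {
    pub target: String,
    pub port: u16,
}

/// Pair each SRV target's IPv6 address with the port from that same SRV
/// record, rather than with a hardcoded port constant.
/// Targets with no known address are skipped.
pub fn socket_addrs(
    srv: &[SrvTarget],
    aaaa: &HashMap<String, Ipv6Addr>,
) -> Vec<SocketAddrV6> {
    srv.iter()
        .filter_map(|s| {
            aaaa.get(&s.target)
                // flowinfo and scope_id are left at 0 in this sketch.
                .map(|ip| SocketAddrV6::new(*ip, s.port, 0, 0))
        })
        .collect()
}
```

The design point is that the port travels with the address: a test Nexus registered on an atypical port would be reached on that port without any caller needing to know about it.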
FWIW, with #6320 landed I believe we no longer depend on this misbehavior.
As noted in #4051, queries to the internal DNS server would get any records we have for a name in response, rather than only records matching the incoming query type. That behavior is confusing, but worse, wrong. Subtly, this is not actually a misbehavior I believe we can observe through `trust_dns_resolver`: the resolver from that crate includes its own `CachingClient`. As a side effect of upstream answers going through that caching client, incorrect Answers records are cached and only correct answers actually make it out to us as consumers of `trust_dns_resolver`. But as is plenty clear in #4051, `dig` and other DNS clients can get incoherent answers! Simple enough to fix: only return answers that are answers to the question we were asked.
and with #6308 we no longer produce this misbehavior 👋
I am not sure this is wrong and I think it's not very important, but I want to at least have a record of this.
@jordanhendricks reported this output from dogfood today:
Here, `dig` is making a query for `AAAA` records. The server is responding with `SRV` records (and `AAAA` records as additionals). I expected the server to report no records since there are no `AAAA` records with that name.

Interestingly, with `+short`, you get this:

As a `dig` user I find that particularly surprising because I asked for `AAAA` records and got records on output with no IPv6 addresses. I suspect `dig` is just dumping whatever it finds in the answer section.