Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[internal-dns] remove lookup_ipv6 in favor of lookup_socket_v6 #6320

Merged
merged 4 commits into from
Aug 16, 2024

Conversation

iximeow
Copy link
Member

@iximeow iximeow commented Aug 13, 2024

lookup_ipv6 is both buggy and easy to misuse:

  • it sends an AAAA query for a domain which should have a SRV record
  • it looks up an IPv6 address from a SRV record but ignores the port. in places lookup_ipv6 was used, it was paired consistently with the hardcoded port NEXUS_INTERNAL_PORT and matched what should be in the resolved SRV record. if we for example wanted to move Nexus' port (or start a test Nexus on an atypical port), the authoritative port number in the SRV response would be ignored for the hardcoded port.

lets just use the port that we told DNS we're at!

we may still want a bare IPv6 address for a service if we're going to test network reachability, for example, but we're not doing that with this function today. this all is distinct from helpers like lookup_all_ipv6.

if we need a service's IPv6 address to use with an alternate port to access a different API, we probably should have a distinct SRV record for that lookup to use instead? i've found three instances of this:

  • wicket assumes that the techport proxy can be done on the same IP as Nexus' API, just a different port
  • we assume the CRDB admin service listens on the same IP as CRDB itself, but i don't think that has to be true?
  • we look up addresses for MGS here via ServiceName::Dendrite, but there's a ServiceName::ManagementGatewayService that - if it doesn't already - seems like it should have a SRV record too.

there are some uses of lookup_all_ipv6 that make a lot of sense still, where we're discovering the rack's network and really do not care about the port that Dendrite happens to be on.

(this all said, lookup_ipv6 would be simple enough to fix and retain. but after looking at its uses it seems it might be more trouble than it's worth right now)

this is the counterpart to #6308 which fixes clients (so far as the test suite and i can tell) to not depend on our internal DNS server returning answers we didn't ask for. we could land this first and #6308 at some point later, or i think since Oximeter is the only place we depended on lookup_ipv6, i'm pretty sure we can land both?

`lookup_ipv6` is both buggy and easy to misuse:
* it sends an AAAA query for a domain which should have a SRV record
  - this works only because #4051
    means the SRV record is incorrectly returned, along with the desired
    AAAA in Additionals
* it looks up an IPv6 address from a SRV record *but ignores the port*.
  in places `lookup_ipv6` was used, it was paired consistently with the
  hardcoded port NEXUS_INTERNAL_PORT and matched what should be in the
  resolved SRV record. if we for example wanted to move Nexus' port (or
  start a test Nexus on an atypical port), the authoritative port number
  in the SRV response would be ignored for the hardcoded port.

lets just use the port that we told DNS we're at!

we may still want a bare IPv6 address for a service if we're testing
network reachability, for example, but we're not doing that today. and
if we need a service's IPv6 address to use with an alternate port to
access a different API, we *probably* should have a second SRV
record for that API to use instead?

(`lookup_ipv6` would be simple enough to fix and retain, but after
looking at its uses it seems it might be more trouble than it's worth
right now)
Copy link
Contributor

@luqmana luqmana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • it looks up an IPv6 address from a SRV record but ignores the port. in places lookup_ipv6 was used, it was paired consistently with the hardcoded port NEXUS_INTERNAL_PORT and matched what should be in the resolved SRV record. if we for example wanted to move Nexus' port (or start a test Nexus on an atypical port), the authoritative port number in the SRV response would be ignored for the hardcoded port.

lets just use the port that we told DNS we're at!

Yea, that's always been the goal so glad to see this!

Dunno about the CRDB admin service case, but for the other two:

  • wicket assumes that the techport proxy can be done on the same IP as Nexus' API, just a different port

Ah yea, the dropshot instance that serves the external api for the techport proxy listens on the underlay network by binding to the same address as the internal nexus api. An explicit SRV/service entry for it though wouldn't be a bad idea to get rid of hardcoding ports everywhere.

  • we look up addresses for MGS here via ServiceName::Dendrite, but there's a ServiceName::ManagementGatewayService that - if it doesn't already - seems like it should have a SRV record too.

Yea, I think that's just a typo. Looking up ManagementGatewayService should work.

PR looks good with one suggestion. The specific service fixes can be done as a follow up.

oximeter/collector/src/agent.rs Outdated Show resolved Hide resolved
@iximeow iximeow force-pushed the ixi/remove-service-bare-ip-resolution branch from 3acffa4 to 0dcec7a Compare August 16, 2024 00:51
@iximeow
Copy link
Member Author

iximeow commented Aug 16, 2024

OK. this should actually pass (finally). failures were not because of changes at any point here, they were because a combination of #6204, this branch being made before #6204, and the illumos used here by buildomat being upgraded to include the change that motivated #6204. so entirely incidentally to this PR, ipadm was not erroring in exactly the way we expected it would to hint that we needed to create the addresses, and instead everything got wedged.

merging main pulls in #6204 and with a ✔️ i'll merge.

@iximeow iximeow merged commit 8ae0833 into main Aug 16, 2024
22 checks passed
@iximeow iximeow deleted the ixi/remove-service-bare-ip-resolution branch August 16, 2024 03:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants