-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
preview DNS changes in reconfigurator-cli #5338
Conversation
Remaining steps at this point are: - have reconfigurator-cli compute and store DNS generations with each blueprint that gets generated. I could do this now but it will probably conflict a little with my other DNS PR. - add DNS diff'ing to blueprint diff'ing. This is probably blocked on the above since otherwise I don't have any sample data to test with. - update all of this to support external DNS too. This is definitely blocked on my other DNS PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - only tiny nitpicks and a couple questions
dev-tools/omdb/src/bin/omdb/db.rs
Outdated
.map(|dns_zone| dns_zone.zone_name) | ||
.collect(); | ||
let state = UnstableReconfiguratorState { | ||
policy: policy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
policy: policy, | |
policy, |
DnsConfigParams { | ||
generation: u64::from(self.generation), | ||
generation: u64::from(Generation::new()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asking this without more context from elsewhere in the PR, which might make this question moot: should this function take a generation number as input instead of always returning params with generation=1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, fair question. The history as I recall it is:
- This struct was originally only used to build the initial DNS configuration so the generation was always 1. I believe it's used in this way by both RSS and the automated test runner setup stuff.
- A few weeks ago I wanted to use this struct in nexus-reconfigurator-execution to build a new DNS configuration, too. There, I needed to produce a different generation, so I added a function to specify the generation and it would use that.
- Now for the reasons mentioned in the description I'm changing the caller in nexus-reconfigurator-execution so that it only gets the zone out of this object. So I'm removing the (now unused) method to change the generation and hardcoding it back to 1, since the only callers that get a
DnsConfigParams
here want generation 1. We could always add the other thing back but at this point it does not appear needed.
I think of it as: this thing used to build a DnsConfigParams
, but now really generates a DnsConfigZone
, with a convenience method for assembling that into a DnsConfigParams
with generation 1 for the callers that want that. I thought about renaming it to build_full_config_for_initial_generation()
or something to better distinguish it from build_zone()
but I didn't bother -- wasn't sure the extra mouthful was really clearer. But since you had this question, I went ahead and did this. (I don't think it makes sense to accept the generation here given that all existing callers provide generation 1. It's not at all hard to build your own DnsConfigParams
if you want a different one and the whole point of this convenience function is to commonize code among the 5 or so callers so if they all have to decide which generation to put here it defeats part of the point.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thanks. I do think the longer name is worth the clarity.
); | ||
let records = &external_dns_config.zones[0].records; | ||
assert_eq!(external_dns_zone.zone_name, String::from("oxide.test"),); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert_eq!(external_dns_zone.zone_name, String::from("oxide.test"),); | |
assert_eq!(external_dns_zone.zone_name, String::from("oxide.test")); |
|
||
fn iter_names(&self) -> impl Iterator<Item = NameDiff<'_>> { | ||
let all_names: BTreeSet<_> = | ||
self.left.keys().chain(self.right.keys()).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to ignore this as premature optimization, but - should we cache all_names
? If a caller calls multiple methods that all internally call iter_names()
, we'll recreate this set every time.
.sled_agents | ||
.iter() | ||
.map(|(sled_id, sled_agent_info)| { | ||
let sled = nexus_reconfigurator_execution::Sled::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to ask if this should consider the sled policy (e.g., to skip expunged sleds), but I don't think that's a property of collections, right? Any sled_agent present in a collection was not expunged at the time, by definition, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really good question. There's definitely something a little fishy about this. tl;dr: I think it doesn't affect what we're currently using this tool for, and fixing it is somewhat hard, so I think it's beyond the scope of this PR.
Background: we're constructing this list of sleds solely to be able to figure out what internal DNS records reconfigurator would create for a given blueprint. Reconfigurator only needs this list of sleds to figure out which sleds are scrimlets so that it can generate the switch zone services' DNS records. Eventually it should generate sled agent DNS records too but we don't seem to do that today.
This code is being used in two contexts:
- when diff'ing two blueprints, to figure out how DNS would change between them
- when diff'ing a blueprint against an actual DNS config, to see how executing that blueprint would change DNS
But as you're observing, the actual DNS contents for a blueprint depends on the policy, not just the blueprint. It's maybe misleading that reconfigurator-cli has an operation to diff blueprints without saying anything about the policy. It should probably let you keep track of multiple policies and make you specify which policy you want. You would need this if you wanted to use these tools to see how DNS would change if you just expunged a sled.
I do think we should think about about this longer term. For now I think the summary is:
- there is one state of the world that reconfigurator-cli knows about
- it's populated either with "sled-add" or "load" commands
- it is implicitly used wherever we need this information
This isn't the first place we implicitly use the policy like this. We use it when generating an inventory and when generating a blueprint from the inventory, too. But I think this is the first place where the user is referencing two points in time and might reasonably want two different policies.
The goal of this PR is to be able to preview the DNS changes that Reconfigurator will make to a real production system. This PR adds:
In omdb:
omdb db reconfigurator-save
now saves details about internal and external DNS versions, plus silo names and external DNS zone namesIn reconfigurator-CLI:
blueprint-diff BLUEPRINT1 BLUEPRINT2
now shows the internal and external DNS changes between two blueprintsblueprint-diff-dns internal|external VERSION BLUEPRINT
: view the internal or external DNS changes between a specific internal/external DNS version and a blueprintImplementation changes as part of this:
DnsDiff
and most of the Reconfigurator DNS execution stuff now works in terms of DNS zones (DnsConfigZone
) rather than complete configs (DnsConfigParams
). The difference is that a whole config contains any number of zones plus a generation number. (Note that we were already assuming there was exactly one DNS zone each for internal and external DNS (inDnsDiff::new
) -- this doesn't change that.) The whole config that we were generating before was not suitable for diff'ing because it would always bump the generation number even if the zone contents hadn't changed (because it didn't know at that point if they had changed). This didn't matter before because later we would check if anything had changed and just throw away the config altogether if nothing changed. Now that we're diff'ing these values, the automatic generation bump showed up as a spurious delta. I think this approach of operating on zones is much cleaner because these helpers now only return what they're really able to reliably compute (the zone contents) and the generation number becomes a concern of the code that's deciding what to do with it (which is also the code that knows what it should be).I tested this against dogfood by:
omdb db reconfigurator-save
to save the relevant state(NOTE: the format here has changed slightly in subsequent commits.)
First, load the dogfood state:
We see that dogfood has internal DNS generation 1 and external generation 25. It also has one blueprint. Here's what would change about internal DNS if we executed this blueprint (elided a bunch of irrelevant stuff with
...
):Reconfigurator wants to add records for mgd. I think this is because we added mgd DNS entries to Omicron some time after Dogfood was set up, so it never had them, but current systems do. I looked around and found no references to
_mgd
orServiceName::Mgd
so I think this change is at worst harmless.What about external DNS?
No changes. Great!
Let's create a new blueprint and see how DNS changes between the two of them:
Great -- no changes. To make this more interesting, I also added a knob for tuning the target number of Nexus nodes. Let's bump that up and generate a new plan and see what changes:
Almost all of that looks right:
The only thing a little weird is the choice of external IP. That's just because the services IP pool range is not configurable in reconfigurator-cli so it uses a different one than dogfood and so picks the 192.0.2.2 address instead of something from dogfood's actual external IP range.