Define a saga for instance start #3873
Conversation
Create a saga that starts instances. This has some immediate benefits:

- It's no longer possible to leak an instance registration during start; previously this could happen if Nexus crashed while handling a start call.
- The saga synchronizes properly with concurrent attempts to delete an instance; the existing start routine may not be handling this correctly (it can look up an instance and decide it's OK to start, then start talking to sled agent about it while a deletion saga runs concurrently and deletes the instance).
- The saga establishes networking state (Dendrite NAT entries, OPTE V2P mappings) for a newly started instance if it wasn't previously established. This is a stopgap measure to ensure that this state exists when restarting an instance after a cluster is restarted. It should eventually be replaced by a step that triggers the appropriate networking RPW(s).

This saga can be used, at least in theory, as a subsaga of the instance create saga to replace that saga's logic for starting a newly-created instance. This work isn't done in this PR, though. (The change isn't trivial because the new start saga expects a prior instance record as a parameter, and the create saga can't construct *a priori* the instance record it intends to insert into CRDB.)
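The crash-safety claim above comes from the saga pattern itself: each step has a forward action and a compensating undo, and a failure unwinds completed steps in reverse order. A toy sketch of that pattern (node contents and types here are purely illustrative, not the PR's actual Steno saga nodes):

```rust
// Illustrative saga runner: each node has a forward action and a
// compensating undo. These are NOT the PR's real saga nodes.
struct Node {
    /// Forward action: perform one step of the start sequence.
    action: fn(&mut Vec<&'static str>) -> Result<(), ()>,
    /// Compensating action: undo the step if a later node fails.
    undo: fn(&mut Vec<&'static str>),
}

fn run_saga(nodes: &[Node], log: &mut Vec<&'static str>) -> Result<(), ()> {
    let mut completed = 0;
    for node in nodes {
        if (node.action)(log).is_err() {
            // Unwind only the nodes that completed, in reverse order,
            // so a mid-saga failure can't leak a partial registration.
            for done in nodes[..completed].iter().rev() {
                (done.undo)(log);
            }
            return Err(());
        }
        completed += 1;
    }
    Ok(())
}

fn main() {
    let mut log = Vec::new();
    let nodes = [
        Node {
            action: |l| { l.push("do: mark instance Starting"); Ok(()) },
            undo: |l| l.push("undo: revert instance to Stopped"),
        },
        Node {
            // Simulate a failure while setting up networking state.
            action: |l| { l.push("do: set up NAT / V2P (fails)"); Err(()) },
            undo: |l| l.push("undo: tear down NAT / V2P"),
        },
    ];
    assert!(run_saga(&nodes, &mut log).is_err());
    // Only the completed first node is undone; the failed node is
    // assumed to have cleaned up after itself.
    assert_eq!(
        log,
        vec![
            "do: mark instance Starting",
            "do: set up NAT / V2P (fails)",
            "undo: revert instance to Stopped",
        ]
    );
}
```

This is why a Nexus crash mid-start no longer leaks state: either the saga completes, or the saga executor (Steno, in omicron's case) drives the undo actions.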
Looks great, thanks for these tests!
```rust
match db_instance.runtime_state.state.0 {
    InstanceState::Starting | InstanceState::Running => {
        return Ok(db_instance)
    }
    InstanceState::Stopped => {}
    _ => {
        return Err(Error::conflict(&format!(
            "instance is in state {} but must be {} to be started",
            db_instance.runtime_state.state.0,
            InstanceState::Stopped
        )))
    }
}
```
Any concern of TOCTTOU here? Kinda seems like this state checking could/should be part of the saga?
We should be covered here by the record's state generation: if the state changes, the saga will fail to transition from Stopped to Starting (because its generation number will be outdated) and will bail. This still isn't a perfect scheme, since it's possible for the saga to hit the "oops the state changed" condition even if the state has come to rest at Stopped. To completely fix all this I think we need to fix another bug first (bet you can guess which one...).
I'll add a comment here about the possible TOCTTOU and how we avoid it. There's another comment in the saga explaining the other remaining problems with the synchronization scheme in this PR.
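The generation-number guard described above is the classic optimistic-concurrency defense against TOCTTOU. A toy sketch of the scheme (field and function names are hypothetical, not Nexus's actual schema; in the real system the check is a conditional CRDB update):

```rust
// Toy model of a generation-guarded state transition. Every successful
// write bumps `gen`, so an update conditioned on a stale generation is
// rejected and the saga bails instead of clobbering newer state.
#[derive(Clone, Copy, PartialEq, Debug)]
enum InstanceState {
    Stopped,
    Starting,
}

struct InstanceRecord {
    state: InstanceState,
    gen: u64,
}

/// Attempt the Stopped -> Starting transition, conditioned on the
/// generation number observed at the initial lookup.
fn try_start(rec: &mut InstanceRecord, observed_gen: u64) -> Result<(), String> {
    if rec.gen != observed_gen {
        return Err(format!(
            "state changed since lookup (gen {} != {})",
            rec.gen, observed_gen
        ));
    }
    rec.state = InstanceState::Starting;
    rec.gen += 1;
    Ok(())
}

fn main() {
    let mut rec = InstanceRecord { state: InstanceState::Stopped, gen: 4 };
    // Normal case: the generation still matches, so the transition wins.
    assert!(try_start(&mut rec, 4).is_ok());
    assert_eq!(rec.state, InstanceState::Starting);
    assert_eq!(rec.gen, 5);
    // TOCTTOU case: a caller still holding the old generation is rejected.
    assert!(try_start(&mut rec, 4).is_err());
}
```

As the comment above notes, this rejects stale writers but can also spuriously fail a start whose instance merely passed through other states before coming to rest at Stopped again.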
```rust
InstanceState::Starting | InstanceState::Running => {
    return Ok(db_instance)
}
InstanceState::Stopped => {}
```
The saga itself seems fine accepting instances in the Creating state -- is that not acceptable here?
I think we have to be careful about that here. There's a small window in the create saga where the instance record is in CRDB (and has the Creating state) while the objects that support it (external IPs, attached disks, etc.) are still being set up, and we don't want to allow a start request to go through while we're in that state.
The saga accepts Creating as a prior state so that it can be used as a subsaga from within the instance create saga (even though that's not hooked up yet). I could buy, though, that a better approach is to have the create saga fully create the instance, then unconditionally move it to Stopped, and then have it invoke the start saga if needed. That would make everything very consistent at the cost of having a Creating instance (where the create request has `start: true`) briefly go to Stopped before entering Starting. That seems to me like a smallish price to pay (and arguably it's strictly better because it's more accurate). I'll mull this over, but WDYT of this approach?
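The proposed uniform path can be sketched as two tiny state functions (the helpers and states here are a hypothetical model of the proposal, not Nexus code):

```rust
// Toy model of the proposed flow: create always lands at Stopped, and a
// `start: true` request then reuses the ordinary start path. Names are
// illustrative only.
#[derive(Clone, Copy, PartialEq, Debug)]
enum InstanceState {
    Creating,
    Stopped,
    Starting,
}

/// Finish the create saga: once IPs, disks, etc. are attached, always
/// move Creating -> Stopped, even when the caller asked to auto-start.
fn finish_create(state: InstanceState) -> InstanceState {
    assert_eq!(state, InstanceState::Creating);
    InstanceState::Stopped
}

/// The start saga only ever begins from Stopped.
fn maybe_start(state: InstanceState, start_requested: bool) -> InstanceState {
    if start_requested && state == InstanceState::Stopped {
        InstanceState::Starting
    } else {
        state
    }
}

fn main() {
    let created = finish_create(InstanceState::Creating);
    assert_eq!(created, InstanceState::Stopped);
    // A create with start: true briefly visits Stopped, then starts.
    assert_eq!(maybe_start(created, true), InstanceState::Starting);
    // A create with start: false simply rests at Stopped.
    assert_eq!(maybe_start(created, false), InstanceState::Stopped);
}
```

The cost is exactly the brief Creating -> Stopped -> Starting detour described above; the benefit is that every start, whether at creation or later, goes through one code path.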
I like that approach. I was considering other options, but they all kinda seem like they incur an equivalent CRDB write to make the instance transition from "being constructed, not visible" to "constructed enough that we can correctly start it", which is basically what you're also proposing with the intermediate stopped state.
Yeah, having slept on it, I'm comfortable saying the value of having a uniform start path outweighs the benefit of going directly from Creating to Starting without very briefly passing through Stopped first. Filed #3883 to track the remaining work here.
Thanks as always for the review! I've added some comments in d59b4cf to cover the topics discussed above. I'll address the remaining open question about whether the instance create saga should move instances to Stopped before starting them in whatever follow-on PR connects the create saga to the start saga.
Finding boundary switches in the instance start saga requires fleet query access. Use the Nexus instance allocation context to get it instead of the saga initiator's operation context. (This was introduced in #3873; it wasn't previously a problem because the instance create saga set up instance NAT state, and that saga received a boundary switch list as a parameter, which parameter was generated by using the instance allocation context. #4194 made this worse by making instance create use the start saga to start instances instead of using its own saga nodes.) Update the instance-in-silo integration test to make sure that instances created by a silo collaborator actually start. This is unfortunately not very elegant. The `instance_simulate` function family normally uses `OpContext::for_tests` to get an operation context, but that context is associated with a user that isn't in the test silo. To get around this, add some simulation interfaces that take an explicit `OpContext` and then generate one corresponding to the silo user for the test case in question. It seems like it'd be nicer to give helper routines like `instance_simulate` access to a context that is omnipotent across all silos, but I wasn't sure how best to do this. I'm definitely open to suggestions here. Tested via cargo tests. Fixes #4272.
Tested via assorted new cargo tests and by launching a dev cluster with the changes, stopping an instance, restarting it, and verifying that the instance restarted correctly and that Nexus logs contained the expected log lines.
Fixes #2824. Fixes #3813.