-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upstairs state machine refactoring (3/3) #1577
base: mkeeter/explicit-stop-state
Are you sure you want to change the base?
Upstairs state machine refactoring (3/3) #1577
Conversation
79652b6
to
93a6788
Compare
d23dd68
to
3835f2a
Compare
3835f2a
to
43a1e30
Compare
29ee56a
to
448a57a
Compare
448a57a
to
958627d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, just questions to help me understand.
I reviewed up to is_state_transition_valid()
in upstairs/src/client.rs
I wanted to post this much so it would be recorded and I'll continue and either
post more later, or post that I'm done.
DsState::WaitActive => "WAC".to_string(), | ||
DsState::WaitQuorum => "WAQ".to_string(), | ||
DsState::Reconcile => "REC".to_string(), | ||
DsState::Connecting { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about how to present this, but what you have done here is fine for now and if I come up with something I like better, I'll put out a PR for it
self.state = DsState::Reconcile; | ||
info!(self.log, "Transition from {:?} to Reconcile", self.state); | ||
let DsState::Connecting { state, mode } = &mut self.state else { | ||
panic!("invalid state {:?}", self.state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was there before, but could we add the client ID to the panic string here?
Sometimes the log right before a panic does not make it out, and it would be nice to
know which client was in the invalid state.
// Otherwise, use live-repair; `ConnectionMode::Replaced` | ||
// indicates that the address is allowed to change. | ||
UpstairsState::Active | UpstairsState::Deactivating { .. } => { | ||
ConnectionMode::Replaced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are deactivating here, could we just go to New? Or we need the replaced state to let ourselves know when we restart this client that we should expect a new downstairs?
DsState::Active | DsState::Offline if !can_replay => { | ||
Some(DsState::Faulted) | ||
} | ||
let new_mode = match current { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still tough to track what happens here. I don't know how to make it
any easier to follow though. The new states do help somewhat, but we still
have the can_replay adding into the layers here.
DsState::Active => Some(DsState::Offline), | ||
// If the Downstairs has spontaneously stopped, we will attempt to | ||
// replay jobs when reconnecting | ||
DsState::Active => ConnectionMode::Offline, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might put this right after line 553 with a comment like you have on line 552 about that covering the other cases for DsState::Connecting
. This way you have all the cases covered for Active
and Connecting
in the same area. I was initially confused when I saw DsState::Active
again here after seeing it at line 544.
DsState::Stopping(ClientStopReason::Replacing) => { | ||
Some(DsState::Replaced) | ||
// XXX NegotiationFailed could also be hit during reconnection, | ||
// in which case we shouldn't use `ConnectionMode::New` (?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the question is, what path would we take that is different if we see NegotiationFailed
on initial startup (before we are active) vs. seeing it after we are active.
Looking at all the reasons we might set NegotiationFailed
, they appear to be something that will fail if we try again (version mismatch), or receiving some message out of expected order (programmer bug??). If we go to New for this client, as long as we have to go back through the negotiation again I think our downstairs will end up in the right place.
@@ -920,9 +869,10 @@ impl DownstairsClient { | |||
) -> EnqueueResult { | |||
match self.state { | |||
// We never send jobs if we're in certain inactive states | |||
DsState::Faulted | |||
| DsState::Replaced | |||
| DsState::LiveRepairReady |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does having a downstairs DsState::Connecting
with state: NegotiationState::LiveRepairReady
end up? In the panic below?
| DsState::Offline => {} // Okay | ||
|
||
DsState::LiveRepairReady if self.cfg.read_only => {} // Okay | ||
"invalid state transition from {:?} -> {:?} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should keep the client ID and upstairs ID in the panic message. And, maybe even add the session_id, as a propolis instance with multiple Upstairs could panic here and we might have trouble figuring out which piece hit this panic.
(staged on top of #1570)
This PR refactors the
DsState
type into more specific, data-bearing types. Here's a quick look at the new types:Many data members of the
DownstairsClient
– which were only valid for a single state – are now data within a specific variant, e.g.auto_promote
andnegotiation_state
. This makes the code both more precise and harder to mess up.+400 of this change is updating the OpenAPI spec, which includes
DsState
. I don't think anyone else is expecting it to be a particular shape, but let me know if I'm wrong (also, it already changed in #1570, so warn me there as well!).Remaining work:
XXX
questions, which are places that I wasn't sure about existing behaviorchecked_state_transition
, which I just deleted for the moment.negotiation_state
withinDsState::Connecting
?)