-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify fault and restart code (1/3) #1568
base: main
Are you sure you want to change the base?
Conversation
3d11ac2
to
de39d6e
Compare
de39d6e
to
26823a9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good, one nit and one question/confirmation.
upstairs/src/client.rs
Outdated
@@ -2202,25 +2180,53 @@ pub(crate) struct DownstairsStats { | |||
#[derive(Debug)] | |||
pub(crate) enum ClientStopReason { | |||
/// We are about to replace the client task | |||
Replacing, | |||
Replaced, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's still Replacing
here, as that's what we are doing.
If we request to replace a downstairs during negotiation, that's ClientNegotiationFailed(Replacing)
If we request to replace a downstairs after activation, does that become ClientStopReason(Replaced)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that ClientStopReason::Replacing
makes more sense here.
If we request to replace a downstairs during negotiation, that's ClientNegotiationFailed(Replacing)
If we request to replace a downstairs after activation, does that become ClientStopReason(Replaced)?
Close, the two options are ClientStopReason::NegotiationFailed(ClientNegotiationFailed::Replaced))
and ClientStopReason::Replaced
.
i, | ||
up_state, | ||
ClientFaultReason::FailedLiveRepair, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we no longer need the Faulted
check here, it means when this DS went to Faulted, it must have triggered the fault_client()
path. That would triggered the skip_all_jobs()
call.
I don't believe that was always true, but it (as far as I can tell) is true now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree with you – the DsState::Faulted
check is now in the catchall _ => { .. }
branch below. You don't want anything to change here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree with you – the
DsState::Faulted
check is now in the catchall_ => { .. }
branch below. You don't want anything to change here, right?
Correct, no changes, just making sure I'm understanding the impact of the change and my logic is correct.
The conditions under which we restart a client IO task are somewhat baroque:
This PR adds more specific types:
enum ClientNegotiationFailed
andenum ClientFaultReason
. Mid-negotiation restarts must provide aClientNegotiationFailed
; faulting the IO task must provide aClientFaultReason
. Both of these types are then converted into aClientStopReason
for logging.In addition, faulting a client is now done through
Downstairs::fault_client
instead of calling bothDownstairs::skip_all_jobs
andDownstairsClient::fault
. Having a single function makes this harder to mess up!