-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set instance state to failed on bad instance_put #1715
Conversation
If a call to sled-agent's `instance_put` fails with anything other than a 400 (bad request), set instance's state to failed - we cannot know what state the instance is in. This exposed another two problems: - Nexus would only delete network interfaces off instances that were stopped - Nexus would only detach disks off instances that were creating or stopped. This commit changes the relevant network interface queries to allow deletion of network interfaces for instances that are either stopped or failed, and adds Failed to the list of ok_to_detach_instance_states for disks. Note that network interface insertion still requires an instance to be stopped, not failed. Closes oxidecomputer#1713
@@ -194,6 +194,13 @@ impl SledAgent { | |||
initial_hardware: InstanceHardware, | |||
target: InstanceRuntimeStateRequested, | |||
) -> Result<InstanceRuntimeState, Error> { | |||
let ncpus: i64 = (&initial_hardware.runtime.ncpus).into(); | |||
if ncpus > 16 { | |||
return Err(Error::internal_error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this should probably be bad_request
instead, but for test_instance_fails_to_boot_with_disk
I needed it to be an internal_error
. I welcome any input for better ways to do this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First thought was InvalidValue
but that also just gets mapped to 400 as well :/
Either way, add a comment here to indicate why this specific error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comments in be9f7a3
// this is unfortunate, but sled_agent_client::Error doesn't | ||
// implement Copy, and can't be match'ed upon below without this | ||
// line. | ||
let e = e.into(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the Copy bit actually matter? It'd doesn't seem like you need to copy anywhere, and the .into()
is gonna be needed regardless.
for e.g. you could do:
match e {
e @ sled_agent_client::Error::InvalidRequest { .. } => Err(e.into()),
e => {
// ...
Err(e.into())
}
}
or just hoist the .into()
:
match e.into() {
e @ Error::InvalidRequest { .. } => Err(e),
e => {
// ...
Err(e)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried all variants of moving around the borrow, and .into()
, and what's there is the only thing I could make work unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you shouldn't need a borrow at all?
@@ -194,6 +194,13 @@ impl SledAgent { | |||
initial_hardware: InstanceHardware, | |||
target: InstanceRuntimeStateRequested, | |||
) -> Result<InstanceRuntimeState, Error> { | |||
let ncpus: i64 = (&initial_hardware.runtime.ncpus).into(); | |||
if ncpus > 16 { | |||
return Err(Error::internal_error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First thought was InvalidValue
but that also just gets mapped to 400 as well :/
Either way, add a comment here to indicate why this specific error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nits but thanks!
I think I might've inadvertently exposed this by changing exactly when an instance transitions from Creating
to Starting
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for doing this!
If a call to sled-agent's
instance_put
fails with anything other than a 400 (bad request), set instance's state to failed - we cannot know what state the instance is in.This exposed another two problems:
This commit changes the relevant network interface queries to allow deletion of network interfaces for instances that are either stopped or failed, and adds Failed to the list of ok_to_detach_instance_states for disks. Note that network interface insertion still requires an instance to be stopped, not failed.
Closes #1713