-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement global images #915
Conversation
Fill in image creation code, adding a few fields to the image from RFD 4 (version and digest specifically). Also add block_size due to the constraint that block_size must be present during any volume creation. Store volume construction requests created from the parameters in global_image_create. This commit does not implement global_image_delete because that involves much work to figure out how to safely do this (are any disks using it? are other volumes using this as a sub volume?). Both "Image" and "GlobalImage" are required to separate authz policy, so this commit separates the two. Project images will be implemented in another PR. Implement authz for global images. GlobalImagesList is a dummy authz resource that was added to have a place to check list_children permissions for.
@davepacheco one unresolved part is that GlobalImage and GlobalImageList are both children of Fleet: https://github.com/oxidecomputer/omicron/pull/915/files#diff-f5f8df6872a0c2cc4cbc022eb186649c66c8aee85c51329b639736a82424d305R2776 I couldn't make Fleet -> GlobalImageList -> GlobalImage work, would appreciate a look |
nexus/src/nexus.rs
Outdated
} | ||
|
||
params::ImageSource::Snapshot(_id) => { | ||
unimplemented!() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about returning a 500 here instead of panicking? (You could even use unimplemented_todo
, but it's not necessary.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opted for 503 in 3b88391
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using 503 for this case makes it harder for us to identify operational issues (usually 503s) vs. bugs (usually 500s) in the field...but hopefully this will never ship so it won't matter!
nexus/src/sagas.rs
Outdated
.global_image_fetch(&opctx, image_id) | ||
.await | ||
.map_err(|e| { | ||
ActionError::action_failed(Error::internal_error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this and the later one an internal error instead of directly using e
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get
the trait `From<omicron_common::api::external::Error>` is not implemented for `ActionError`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant .map_err(ActionError::action_failed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in ef02727
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. There's another one further down I think.
I did the same thing with |
Very nice! I had several suggestions above but none are critical. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awesome, only a couple minor comments.
@@ -54,6 +55,24 @@ async fn test_unauthorized(cptestctx: &ControlPlaneTestContext) { | |||
let client = &cptestctx.external_client; | |||
let log = &cptestctx.logctx.log; | |||
|
|||
// Run a httptest server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this is more explicit, we do this for:
Clickhouse (bind to port zero):
omicron/nexus/test-utils/src/lib.rs
Line 99 in 3817f7e
let clickhouse = dev::clickhouse::ClickHouseInstance::new(0).await.unwrap(); |
Nexus (use ConfigDropshot
's default, which is 127.0.0.1:0
):
omicron/nexus/test-utils/src/lib.rs
Lines 105 to 107 in 3817f7e
let server = omicron_nexus::Server::start(&config, rack_id, &logctx.log) | |
.await | |
.unwrap(); |
Lines 57 to 60 in 3817f7e
/// Dropshot configuration for external API server | |
pub dropshot_external: ConfigDropshot, | |
/// Dropshot configuration for internal API server | |
pub dropshot_internal: ConfigDropshot, |
https://docs.rs/dropshot/0.6.0/src/dropshot/config.rs.html#54-61
Etc.
Dropshot's local_addr()
method makes it easy to access the "real" port that was allocated after the fact:
omicron/nexus/test-utils/src/lib.rs
Line 131 in 3817f7e
server.http_server_internal.local_addr(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you decide not to do something similar with this mock that we do with CockroachDB and others?
@@ -442,11 +442,8 @@ CREATE TABLE omicron.public.image ( | |||
project_id UUID NOT NULL, | |||
volume_id UUID NOT NULL, | |||
|
|||
/* Optional URL: Images may be backed by either a URL or a volume */ | |||
url STRING(8192), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you removed the "optional" comments here but the fields are still optional. Is that the intent or are they supposed to be NOT NULL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are supposed to be optional
nexus/src/authz/api_resources.rs
Outdated
fleet: Fleet, | ||
/// ConsoleSessionList is a synthetic resource used for modeling who has access | ||
/// to create sessions. | ||
#[derive(Clone, Copy, Debug, Eq, PartialEq)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like something about the way this was merged caused this stuff to show up as part of your change. That's weird.
nexus/src/sagas.rs
Outdated
Some(Box::new(serde_json::from_str(volume.data()).map_err( | ||
|e| { | ||
ActionError::action_failed(Error::internal_error( | ||
&e.to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe want a little more context in this message? ("failed to deserialize volume data"?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 added in 4752359
params::DiskSource::Blank { block_size } => { | ||
db::model::BlockSize::try_from(*block_size).map_err(|e| { | ||
ActionError::action_failed(Error::internal_error( | ||
&e.to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe want a little more context in this message? I'm not sure what should go here -- I'm just imagining we see this error in the log. We'll be able to tell it came from this saga action, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll at least see invalid block size {}
from the db::model::BlockSize
try_from
, though I'm not sure how we would ever hit this error because params::BlockSize
try_from
also validates that it's one of the (currently) three supported values?
Crucible: update rust crate base64 to 0.21.3 (#913) update rust crate slog-async to 2.8 (#915) update rust crate async-recursion to 1.0.5 (#912) Move active jobs into a separate data structure and optimize `ackable_work` (#908) Check repair IDs correctly (#910) update actions/checkout action to v4 (#903) update rust crate tokio to 1.32 (#890) Remove single-item Vecs (#898) Move "extent under repair" into a helper function (#907) offset_mod shouldn't be randomized (#905) Only rehash if a write may have failed (#899) Make negotiation state an enum (#901) Test update for fast write ack and gather errors on test failure (#897) Propolis: Update cpuid-gen util for better coverage Make storage backend config more flexible and consistent Use correct register sizes for PIIX3 PM device Update bitflags dependency fix softnpu port order (#517) Use hex formatting for unhandled MMIO/PIO/MSRs Update deps for new crucible and oximeter Update standalone-with-crucible docs (#514)
Crucible: update rust crate base64 to 0.21.3 (#913) update rust crate slog-async to 2.8 (#915) update rust crate async-recursion to 1.0.5 (#912) Move active jobs into a separate data structure and optimize `ackable_work` (#908) Check repair IDs correctly (#910) update actions/checkout action to v4 (#903) update rust crate tokio to 1.32 (#890) Remove single-item Vecs (#898) Move "extent under repair" into a helper function (#907) offset_mod shouldn't be randomized (#905) Only rehash if a write may have failed (#899) Make negotiation state an enum (#901) Test update for fast write ack and gather errors on test failure (#897) Propolis: Update cpuid-gen util for better coverage Make storage backend config more flexible and consistent Use correct register sizes for PIIX3 PM device Update bitflags dependency fix softnpu port order (#517) Use hex formatting for unhandled MMIO/PIO/MSRs Update deps for new crucible and oximeter Update standalone-with-crucible docs (#514) Co-authored-by: Alan Hanson <[email protected]>
Fill in image creation code, adding a few fields to the image from RFD
4 (version and digest specifically). Also add block_size due to the
constraint that block_size must be present during any volume creation.
Store volume construction requests created from the parameters in
global_image_create. This commit does not implement global_image_delete
because that involves much work to figure out how to safely do this (are
any disks using it? are other volumes using this as a sub volume?).
Both "Image" and "GlobalImage" are required to separate authz policy, so
this commit separates the two. Project images will be implemented in
another PR.
Implement authz for global images. GlobalImagesList is a dummy authz
resource that was added to have a place to check list_children
permissions for.