Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[reconfigurator] SledEditor: be more strict about decommissioned sleds #7234

Merged
merged 10 commits into from
Dec 16, 2024

Conversation

jgallagher
Copy link
Contributor

This is a followup from #7204 (comment) and makes two changes, neither of which should affect behavior:

  • SledEditor will now fail if a caller attempts to make changes to a decommissioned sled (internally, this is statically enforced by a type state enum - a sled in the decommissioned state does not have any methods that support editing, so we're forced to return an error)
  • SledEditor::set_state() is now SledEditor::decommission(), and it performs some checks that the sled looks decommissionable

The second bullet is more questionable than I expected it to be:

  1. There are some arguments that SledEditor shouldn't do any checks here; in particular, it doesn't have the full context (e.g., any checks on "should we decommission this sled" that depend on the PlanningInput can't be performed here, because SledEditor intentionally doesn't have access to PlanningInput).
  2. I wanted to check zones + disks + datasets, but in practice it can only check zones today; I left a comment (and the commented-out disks + datasets checks we should do) about why. I think we will eventually be able to turn these on; the current behavior of removing disks/datasets from the blueprint for expunged sleds will have to change to fix Blueprint structure allows a variety of "illegal" combinations #7078, at which point these checks should be valid.

I don't feel super strongly about the checks in decommission() or even this PR as a whole; if this doesn't look like a useful direction, I'd be fine with discarding it. Please review with a pretty critical eye.

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a net improvement but I share your ambivalence.

I like making set_sled_state() more specific to decommissioning. If we don't support other transitions today, then it seems like if/when we do, we'll want to think through what other updates need to be made. That might be easy to forget if somebody can just call set_sled_state(SomeOtherState).

Comment on lines +188 to +199
// We can't take ownership of `editor` from the `&mut self`
// reference we have, and we need ownership to call
// `finalize()`. Steal the contents via `mem::swap()` with an
// empty editor. This isn't panic safe (i.e., if we panic
// between the `mem::swap()` and the reassignment to `self.0`
// below, we'll be left in the active state with an empty sled
// editor), but omicron in general is not panic safe and aborts
// on panic. Plus `finalize()` should never panic.
let mut stolen = ActiveSledEditor::new_empty(
DatasetIdsBackfillFromDb::empty(),
);
mem::swap(editor, &mut stolen);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of all this, what about accepting an owned SledEditor here and returning an owned one back? I think this means the caller would have to remove this sled editor from sled_editors and insert the one it got back. But that seems okay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I think this makes failures super awkward? If we take self and return Result<SledEditor, SledEditError>, then on failure our caller doesn't get an editor back. I guess we could return either (SledEditor, Result<(), SledEditError>) or Result<SledEditor, (SledEditor, SledEditError)> but either of those seems worse to me than the slightly-messy internal details here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough!

) -> impl Iterator<Item = &BlueprintPhysicalDiskConfig> {
match &self.0 {
InnerSledEditor::Active(editor) => {
Either::Left(editor.disks(filter))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never seen Either before. Is this just a way to have a function return either of two different iterators that otherwise would have different types? I think I usually use Box<dyn Iterator<...>> instead. This is neat.

(I think I'm ambivalent between both approaches but it's cool to know this exists.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! This is from itertools. This one doesn't require boxing but does mean there's a match on Left/Right on every call to Iterator::next(). I doubt either choice makes any meaningful difference in this case.

Comment on lines 358 to 392
// TODO-john The disks and datasets checks below don't pass what the
// planner does currently to decommission sleds: if a sled is expunged,
// we'll omit its disks and datasets from the outgoing blueprint
// entirely without setting them all to the `Expunged` disposition.
// Fixing this will conflict with ongoing disk work, so for now these
// checks are commented out.
/*
// Check that all disks are expunged...
if let Some(disk) =
self.disks(DiskFilter::All).find(|disk| match disk.disposition {
BlueprintPhysicalDiskDisposition::InService => true,
BlueprintPhysicalDiskDisposition::Expunged => false,
})
{
return Err(SledEditError::NonDecommissionableDiskInService {
disk_id: disk.id,
zpool_id: disk.pool_id,
});
}

// ... and all datasets are expunged ...
if let Some(dataset) =
self.datasets(BlueprintDatasetFilter::All).find(|dataset| {
match dataset.disposition {
BlueprintDatasetDisposition::InService => true,
BlueprintDatasetDisposition::Expunged => false,
}
})
{
return Err(SledEditError::NonDecommissionableDatasetInService {
dataset_id: dataset.id,
kind: dataset.kind.clone(),
});
}
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You knew I was going to suggest putting this into an issue instead. 😄 It could be in a comment for an existing issue that covers the work that would introduce these checks or a standalone issue we track under Reconfigurator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 79535ec and opened #7238

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of my comments is a blocker.

@jgallagher jgallagher merged commit ca21fe7 into main Dec 16, 2024
17 checks passed
@jgallagher jgallagher deleted the john/sled-editor-decommissioned branch December 16, 2024 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Blueprint structure allows a variety of "illegal" combinations
2 participants