[reconfigurator] SledEditor: be more strict about decommissioned sleds #7234

jgallagher · 2024-12-11T16:20:50Z

This is a followup from #7204 (comment) and makes two changes, neither of which should affect behavior:

- SledEditor will now fail if a caller attempts to make changes to a decommissioned sled. (Internally, this is statically enforced by a type state enum: a sled in the decommissioned state does not have any methods that support editing, so we're forced to return an error.)
- SledEditor::set_state() is now SledEditor::decommission(), and it performs some checks that the sled looks decommissionable.

The second bullet is more questionable than I expected it to be:

- There are some arguments that SledEditor shouldn't do any checks here; in particular, it doesn't have the full context (e.g., any checks on "should we decommission this sled" that depend on the PlanningInput can't be performed here, because SledEditor intentionally doesn't have access to PlanningInput).
- I wanted to check zones + disks + datasets, but in practice it can only check zones today; I left a comment (and the commented-out disks + datasets checks we should do) about why. I think we will eventually be able to turn these on; the current behavior of removing disks/datasets from the blueprint for expunged sleds will have to change to fix #7078 (Blueprint structure allows a variety of "illegal" combinations), at which point these checks should be valid.

I don't feel super strongly about the checks in decommission() or even this PR as a whole; if this doesn't look like a useful direction, I'd be fine with discarding it. Please review with a pretty critical eye.
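
For readers unfamiliar with the type-state approach described in the first change, here is a minimal sketch. It is not the code from this PR: `SledEditor`, `ActiveSledEditor`, `InnerSledEditor`, `SledEditError`, and `decommission()` are names that appear in this thread, while `DecommissionedSled`, the `u32` zone IDs, the error variants, and `add_zone()`/`expunge_zone()` are hypothetical stand-ins. Treating `decommission()` on an already-decommissioned sled as a no-op is also an assumption.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
enum SledEditError {
    // Hypothetical variant: an edit was attempted on a decommissioned sled.
    EditAfterDecommission,
    // Hypothetical variant: a zone is still in service at decommission time.
    ZoneStillInService { zone_id: u32 },
}

// Active state: the only type that has editing methods.
#[derive(Debug, Default)]
struct ActiveSledEditor {
    in_service_zones: BTreeSet<u32>,
}

impl ActiveSledEditor {
    fn add_zone(&mut self, zone_id: u32) {
        self.in_service_zones.insert(zone_id);
    }

    fn expunge_zone(&mut self, zone_id: u32) {
        self.in_service_zones.remove(&zone_id);
    }
}

// Decommissioned state: deliberately has no editing methods at all.
#[derive(Debug)]
struct DecommissionedSled;

enum InnerSledEditor {
    Active(ActiveSledEditor),
    Decommissioned(DecommissionedSled),
}

struct SledEditor(InnerSledEditor);

impl SledEditor {
    fn new_active() -> Self {
        SledEditor(InnerSledEditor::Active(ActiveSledEditor::default()))
    }

    fn add_zone(&mut self, zone_id: u32) -> Result<(), SledEditError> {
        match &mut self.0 {
            InnerSledEditor::Active(editor) => {
                editor.add_zone(zone_id);
                Ok(())
            }
            // There is nothing to call on `DecommissionedSled`, so the only
            // option left is to return an error.
            InnerSledEditor::Decommissioned(_) => {
                Err(SledEditError::EditAfterDecommission)
            }
        }
    }

    fn expunge_zone(&mut self, zone_id: u32) -> Result<(), SledEditError> {
        match &mut self.0 {
            InnerSledEditor::Active(editor) => {
                editor.expunge_zone(zone_id);
                Ok(())
            }
            InnerSledEditor::Decommissioned(_) => {
                Err(SledEditError::EditAfterDecommission)
            }
        }
    }

    // Replaces a general-purpose state setter: the only supported transition
    // is active -> decommissioned, and it is checked first.
    fn decommission(&mut self) -> Result<(), SledEditError> {
        let first_in_service = match &self.0 {
            InnerSledEditor::Active(editor) => {
                editor.in_service_zones.iter().next().copied()
            }
            // Assumption: decommissioning twice is a no-op.
            InnerSledEditor::Decommissioned(_) => return Ok(()),
        };
        if let Some(zone_id) = first_in_service {
            return Err(SledEditError::ZoneStillInService { zone_id });
        }
        self.0 = InnerSledEditor::Decommissioned(DecommissionedSled);
        Ok(())
    }
}
```

In this sketch, a sled with zone 1 still in service is rejected by `decommission()`; after `expunge_zone(1)` the call succeeds, and any later `add_zone()` returns `EditAfterDecommission`.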

davepacheco

I think this is a net improvement but I share your ambivalence.

I like making set_sled_state() more specific to decommissioning. If we don't support other transitions today, then it seems like if/when we do, we'll want to think through what other updates need to be made. That might be easy to forget if somebody can just call set_sled_state(SomeOtherState).

davepacheco · 2024-12-12T19:37:16Z

nexus/reconfigurator/planning/src/blueprint_editor/sled_editor.rs

+                // We can't take ownership of `editor` from the `&mut self`
+                // reference we have, and we need ownership to call
+                // `finalize()`. Steal the contents via `mem::swap()` with an
+                // empty editor. This isn't panic safe (i.e., if we panic
+                // between the `mem::swap()` and the reassignment to `self.0`
+                // below, we'll be left in the active state with an empty sled
+                // editor), but omicron in general is not panic safe and aborts
+                // on panic. Plus `finalize()` should never panic.
+                let mut stolen = ActiveSledEditor::new_empty(
+                    DatasetIdsBackfillFromDb::empty(),
+                );
+                mem::swap(editor, &mut stolen);


Instead of all this, what about accepting an owned SledEditor here and returning an owned one back? I think this means the caller would have to remove this sled editor from sled_editors and insert the one it got back. But that seems okay?

jgallagher · 2024-12-12

Hm, I think this makes failures super awkward? If we take self and return Result<SledEditor, SledEditError>, then on failure our caller doesn't get an editor back. I guess we could return either (SledEditor, Result<(), SledEditError>) or Result<SledEditor, (SledEditor, SledEditError)> but either of those seems worse to me than the slightly-messy internal details here.

davepacheco · 2024-12-12

Fair enough!
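
The tradeoff discussed in this thread can be sketched side by side. This is an illustration under simplified assumptions, not the PR's code: `Editor`, `Inner`, `ActiveEditor`, `EditError`, and both method names are hypothetical, and the "in-service zone count" check stands in for whatever validation the real `decommission()` performs.

```rust
use std::mem;

#[derive(Debug, Default, PartialEq)]
struct ActiveEditor {
    in_service_zones: usize,
    expunged_zones: Vec<u32>,
}

impl ActiveEditor {
    // Consumes the editor, which is why ownership is needed.
    fn finalize(self) -> Vec<u32> {
        self.expunged_zones
    }
}

#[derive(Debug, PartialEq)]
enum Inner {
    Active(ActiveEditor),
    Decommissioned(Vec<u32>),
}

#[derive(Debug, PartialEq)]
struct Editor(Inner);

#[derive(Debug, PartialEq)]
struct EditError;

impl Editor {
    // Option 1: keep `&mut self` and steal the contents with `mem::swap()`.
    // On failure the caller's editor is left untouched.
    fn decommission_in_place(&mut self) -> Result<(), EditError> {
        match &mut self.0 {
            Inner::Active(editor) => {
                if editor.in_service_zones > 0 {
                    return Err(EditError);
                }
                let mut stolen = ActiveEditor::default();
                mem::swap(editor, &mut stolen);
                // A panic between the swap and this assignment would leave
                // an empty active editor behind.
                self.0 = Inner::Decommissioned(stolen.finalize());
                Ok(())
            }
            Inner::Decommissioned(_) => Ok(()),
        }
    }

    // Option 2: take `self` by value. To avoid losing the editor on failure,
    // the error side has to carry it back to the caller.
    fn decommission_owned(self) -> Result<Editor, (Editor, EditError)> {
        let Editor(inner) = self;
        match inner {
            Inner::Active(editor) => {
                if editor.in_service_zones > 0 {
                    return Err((Editor(Inner::Active(editor)), EditError));
                }
                Ok(Editor(Inner::Decommissioned(editor.finalize())))
            }
            done => Ok(Editor(done)),
        }
    }
}
```

With option 2, every caller has to remove the editor from its collection, call the method, and reinsert whichever editor comes back on either the `Ok` or the `Err` path. Option 1 keeps that bookkeeping out of the caller at the cost of the swap.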

davepacheco · 2024-12-12T19:41:03Z

nexus/reconfigurator/planning/src/blueprint_editor/sled_editor.rs

+    ) -> impl Iterator<Item = &BlueprintPhysicalDiskConfig> {
+        match &self.0 {
+            InnerSledEditor::Active(editor) => {
+                Either::Left(editor.disks(filter))


I've never seen Either before. Is this just a way to have a function return either of two different iterators that otherwise would have different types? I think I usually use Box<dyn Iterator<...>> instead. This is neat.

(I think I'm ambivalent between both approaches but it's cool to know this exists.)

jgallagher · 2024-12-12

Yep! This is from itertools. This one doesn't require boxing but does mean there's a match on Left/Right on every call to Iterator::next(). I doubt either choice makes any meaningful difference in this case.
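
To make the comparison concrete, here is a small sketch of both approaches. The `Either` type below is a hand-rolled stand-in written for this example so that it compiles without dependencies; it is not the itertools implementation, though it shows the same per-`next()` match described above. `Inner`, `disks_either()`, and `disks_boxed()` are hypothetical names, and disks are reduced to plain strings.

```rust
use std::collections::BTreeMap;

// Hand-rolled stand-in for `itertools::Either` (illustration only).
enum Either<L, R> {
    Left(L),
    Right(R),
}

impl<L, R> Iterator for Either<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;

    // Every call dispatches on the variant; no heap allocation is involved.
    fn next(&mut self) -> Option<Self::Item> {
        match self {
            Either::Left(left) => left.next(),
            Either::Right(right) => right.next(),
        }
    }
}

// Two states that store disks differently, so their iterators have
// different concrete types.
enum Inner {
    Active(BTreeMap<u32, String>),
    Decommissioned(Vec<String>),
}

// Static dispatch: one concrete return type wrapping both iterators.
fn disks_either(inner: &Inner) -> impl Iterator<Item = &String> {
    match inner {
        Inner::Active(by_id) => Either::Left(by_id.values()),
        Inner::Decommissioned(list) => Either::Right(list.iter()),
    }
}

// Dynamic dispatch: one heap allocation per call, then virtual `next()`.
fn disks_boxed(inner: &Inner) -> Box<dyn Iterator<Item = &String> + '_> {
    match inner {
        Inner::Active(by_id) => Box::new(by_id.values()),
        Inner::Decommissioned(list) => Box::new(list.iter()),
    }
}
```

Both functions yield the same items for the same input; they differ only in how the two iterator types are unified behind a single return type.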

davepacheco · 2024-12-12T19:43:47Z

nexus/reconfigurator/planning/src/blueprint_editor/sled_editor.rs

+        // TODO-john The disks and datasets checks below don't pass what the
+        // planner does currently to decommission sleds: if a sled is expunged,
+        // we'll omit its disks and datasets from the outgoing blueprint
+        // entirely without setting them all to the `Expunged` disposition.
+        // Fixing this will conflict with ongoing disk work, so for now these
+        // checks are commented out.
+        /*
+        // Check that all disks are expunged...
+        if let Some(disk) =
+            self.disks(DiskFilter::All).find(|disk| match disk.disposition {
+                BlueprintPhysicalDiskDisposition::InService => true,
+                BlueprintPhysicalDiskDisposition::Expunged => false,
+            })
+        {
+            return Err(SledEditError::NonDecommissionableDiskInService {
+                disk_id: disk.id,
+                zpool_id: disk.pool_id,
+            });
+        }
+
+        // ... and all datasets are expunged ...
+        if let Some(dataset) =
+            self.datasets(BlueprintDatasetFilter::All).find(|dataset| {
+                match dataset.disposition {
+                    BlueprintDatasetDisposition::InService => true,
+                    BlueprintDatasetDisposition::Expunged => false,
+                }
+            })
+        {
+            return Err(SledEditError::NonDecommissionableDatasetInService {
+                dataset_id: dataset.id,
+                kind: dataset.kind.clone(),
+            });
+        }
+        */


You knew I was going to suggest putting this into an issue instead. 😄 It could be in a comment for an existing issue that covers the work that would introduce these checks or a standalone issue we track under Reconfigurator.

jgallagher · 2024-12-13

Removed in 79535ec and opened #7238

davepacheco

None of my comments is a blocker.

jgallagher added 8 commits December 11, 2024 11:05

use type state to ensure no edits of decommissioned sleds

fe2f36c

rename SledEditor constructors

55e9838

mem::swap panic comment

cb868ad

add validation when expunging a sled (BROKEN)

380a4eb

spelling

5296f23

formatting

ab544d8

disable disks+datasets decommissioning checks

75fb472

remove unnecessary test change

66858a5

jgallagher requested review from andrewjstone and davepacheco December 11, 2024 16:20

jgallagher mentioned this pull request Dec 11, 2024

[reconfigurator] Move resource allocation out of BlueprintBuilder #7235

Open

davepacheco reviewed Dec 12, 2024

davepacheco approved these changes Dec 12, 2024

jgallagher mentioned this pull request Dec 12, 2024

Reconfigurator: Planner should check disks and datasets before decommissioning a sled #7238

Open

jgallagher added 2 commits December 12, 2024 15:12

move dead code to issue #7238

79535ec

Merge remote-tracking branch 'origin/main' into john/sled-editor-deco…

51ce5fb

…mmissioned

jgallagher merged commit ca21fe7 into main Dec 16, 2024
17 checks passed

jgallagher deleted the john/sled-editor-decommissioned branch December 16, 2024 14:04
