Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blueprint structure allows a variety of "illegal" combinations #7078

Open
jgallagher opened this issue Nov 15, 2024 · 1 comment · Fixed by #7234
Open

Blueprint structure allows a variety of "illegal" combinations #7078

jgallagher opened this issue Nov 15, 2024 · 1 comment · Fixed by #7234
Assignees

Comments

@jgallagher
Copy link
Contributor

Blueprint currently has four different maps, all keyed by sled ID:

/// A map of sled id -> desired state of the sled.
///
/// A sled is considered part of the control plane cluster iff it has an
/// entry in this map.
pub sled_state: BTreeMap<SledUuid, SledState>,
/// A map of sled id -> zones deployed on each sled, along with the
/// [`BlueprintZoneDisposition`] for each zone.
///
/// Unlike `sled_state`, this map may contain entries for sleds that are no
/// longer a part of the control plane cluster (e.g., sleds that have been
/// decommissioned, but still have expunged zones where cleanup has not yet
/// completed).
pub blueprint_zones: BTreeMap<SledUuid, BlueprintZonesConfig>,
/// A map of sled id -> disks in use on each sled.
pub blueprint_disks: BTreeMap<SledUuid, BlueprintPhysicalDisksConfig>,
/// A map of sled id -> datasets in use on each sled
pub blueprint_datasets: BTreeMap<SledUuid, BlueprintDatasetsConfig>,

In general we would expect all those maps to have the same keys, but in practice that isn't true. A couple examples:

  • sled_state drops decommissioned sleds, but they may still exist in the other maps (but should only contain expunged values). We have to work around this when diff'ing blueprints.
  • Clients that use BlueprintBuilder, not the planner, that fail to call sled_ensure_disks / sled_ensure_datasets at appropriate times will emit blueprints that contain zones that reference datasets / disks that one would reasonably expect to exist in their relative maps, but those maps may be out of date or empty. Empty maps happen to work today because for backwards compatibility reasons we allow them to be empty, but doing so is a little messy and will eventually be removed once production systems can all be expected to have populated maps. That will break any of these clients (tests and reconfigurator-cli, today).

I think (?) we should probably only have one map here, keyed by sled ID, with values that encompass all the blueprint details for a single sled (today: state + zones + disks + datasets, growing in the future to support update versioning as needed). This may require some fundamental rework of BlueprintBuilder and possibly the planner, as today they manage these maps mostly independently (which is problematic!).

@jgallagher jgallagher self-assigned this Nov 15, 2024
andrewjstone pushed a commit that referenced this issue Nov 22, 2024
…ol, it must be from a disk present in the blueprint (#7106)

Builds and is staged on top of #7105.

The intended change here is in the first commit
(8274174): In
`BlueprintBuilder::sled_select_zpool()`, instead of only looking at the
`PlanningInput`, we also look at the disks present in the blueprint, and
only select a zpool that the planning input says is in service and that
we have in the blueprint.

This had a surprisingly-large blast radius in terms of tests - we had
_many_ tests which were adding zones (which implicitly selects a zpool)
from a `BlueprintBuilder` where there were no disks configured at all,
causing them to emit invalid blueprints. These should all be fixed as of
this PR, but I'm a little worried about test fragility in general,
particularly with an eye toward larger changes like #7078. Nothing to do
about that at the moment, but something to keep an eye on.

Fixes #7079.
jgallagher added a commit that referenced this issue Dec 5, 2024
…uilder` (#7204)

This is a step on the road to #7078. The current `BlueprintBuilder`
internals would not at all be amenable to squishing the disparate maps
in `Blueprint` together, so this PR tries to rework those internals.
`BlueprintBuilder` now does its own map squishing (inside
`new_based_on()`), combining state+zones+disks+datasets into one helper
(`SledEditor`). All modifications are preformed via those editors, then
`BlueprintBuilder::build()` breaks the combined editors' results back
out into four maps for `Blueprint`.

There should be only one functional change in this PR (marked with a
`TODO-john`): previously, expunging a zone checked that that zone had
not been modified in the current planning cycle. I'm not sure that's
really necessary; it felt like some defensiveness born out of the
complexity of the builder itself, maybe? I think we could put it back
(inside `ZonesEditor`, presumably) if folks think it would be good to
keep.

My intention was to change the tests as little as possible to ensure I
didn't break anything as I was moving functionality around, so
`new_based_on()` and `build()` have some arguably-pretty-gross
concessions to behave exactly the way the builder did before these
changes. I'd like to _remove_ those concessions, but those will be
nontrivial behavioral changes that I don't want to try to roll in with
all of this cleanup. I think I landed this pretty well; there are only a
few expectorate changes (due to the slightly reworked `ZonesEditor`
producing a different ordering for a few tests), and none in `omdb`
(which is where I've often seen incidental planner changes bubble out).

Apologies for the size of the PR. I'd lightly recommend that the new
`blueprint_editor/` module and its subcomponents should be pretty
quickly reviewable and should be looked at first; they're _supposed_ to
be simple and obvious, and are largely ported over from the prior
storage / disks / datasets / zones editors. I did rework how we're
handling #6645 backwards compat to try to reduce how much we need to
pass `SledResources` around, so that complicates things in
`DatasetsEditor` some, but not too bad IMO. (I also added a test for
this, since I changed it and I don't think we had one before.) The
`BlueprintBuilter` changes should then be more or less the natural way
one would use a collection of `SledEditor`s without changing its API or
behavior (yet!).
@jgallagher
Copy link
Contributor Author

This shouldn't have been closed; github picked up on "fix #7078" in a commit message, but in context it's clearly not that the commit fixes this.

@jgallagher jgallagher reopened this Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant