Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decommissioning expunged disks should happen in planner #7098

Open
andrewjstone opened this issue Nov 19, 2024 · 2 comments
Open

Decommissioning expunged disks should happen in planner #7098

andrewjstone opened this issue Nov 19, 2024 · 2 comments
Assignees

Comments

@andrewjstone
Copy link
Contributor

andrewjstone commented Nov 19, 2024

Currently, we decommission expunged disks during reconfigurator execution. This expungement step is tied to a previous step in the executor that ensures that any expunged disks no longer exist on the given sled.

Unfortunately, this coupling prevents us from making all execution steps independent and skippable as a solution to #6999.
We can resolve this issue by decommissioning disks in the planner, rather than in the executor as we do with sled decommissioning

@andrewjstone
Copy link
Contributor Author

andrewjstone commented Nov 19, 2024

In order to have the planner do this, it needs to learn about the physical disks that the sled-agent has via inventory. These physical disks are already retrieve in inventory. However they don't include the generation number of the current configuration on the sled-agent. We likely want to return that along with the configuration so that we don't inadvertently decommission disks that have not yet been expunged by the executor.

@andrewjstone
Copy link
Contributor Author

physical disk generation number added to the inventory in #7162

andrewjstone added a commit that referenced this issue Nov 26, 2024
This is the first PR related to fixing #7098
andrewjstone added a commit that referenced this issue Nov 26, 2024
This is the second PR related to fixing #7098.

BlueprintPhysicalDisksConfig is no longer an alias of
OmicronPhysicalDisksConfig, but instead wraps a new
type BlueprintPhysicalDiskConfig which itself wraps an
OmicronPhysicalDiskConfig and a BlueprintPhysicalDiskDisposition.
This change brings blueprint physical disks in line with blueprint
datasets and zones such that expungement is a first class notion in
the blueprint, and not implicit in the disk not being present in a
blueprint.

This change only adds the new types and makes them work with
the existing code. The underlying logic around expungement and
decommissioning has not changed. That will come in a follow up PR.
andrewjstone added a commit that referenced this issue Nov 26, 2024
This is the second PR related to fixing #7098.

BlueprintPhysicalDisksConfig is no longer an alias of
OmicronPhysicalDisksConfig, but instead wraps a new
type BlueprintPhysicalDiskConfig which itself wraps an
OmicronPhysicalDiskConfig and a BlueprintPhysicalDiskDisposition.
This change brings blueprint physical disks in line with blueprint
datasets and zones such that expungement is a first class notion in
the blueprint, and not implicit in the disk not being present in a
blueprint.

This change only adds the new types and makes them work with
the existing code. The underlying logic around expungement and
decommissioning has not changed. That will come in a follow up PR.
andrewjstone added a commit that referenced this issue Nov 26, 2024
This is the second PR related to fixing #7098.

BlueprintPhysicalDisksConfig is no longer an alias of
OmicronPhysicalDisksConfig, but instead wraps a new type
BlueprintPhysicalDiskConfig which itself wraps an
OmicronPhysicalDiskConfig and a BlueprintPhysicalDiskDisposition. This
change brings blueprint physical disks in line with blueprint datasets
and zones such that expungement is a first class notion in the
blueprint, and not implicit in the disk not being present in a
blueprint.

This change only adds the new types and makes them work with the
existing code. The underlying logic around expungement and
decommissioning has not changed. That will come in a follow up PR.
andrewjstone added a commit that referenced this issue Dec 19, 2024
This PR is intended to fix #6999 and #7098.

The executor steps are all independent already except those that will be
fixed in this PR. A disk disposition was added to the blueprint and gets
changed from `InService` to `Expunged` when the `PlanningInput` changes.
A disk state was also added that goes from `Active` -> `Decommissioned`.
This disk state changes when the parent sled is expunged or the
sled-agent has been guaranteed to have seen the disk expungement as
reported via inventory.

This code fully implements the planner bits. The exucutor bits are
going to be pulled in from the [fewer-fatal-errors-in-executor-branch](main...fewer-fatal-errors-in-executor).
The executor bits are going to be expanded to implement the expunge and
decommission as dictated by the blueprint. Because the planner changes
are tightly coupled with the executor,  they will all come in this PR.

Further code will also be added to represent the disk disposition and
state in the blueprint and diff tables for omdb output. This PR is primarily
blocked on [revamping the diff system](#7240).
This is particularly necessary because we need to change those tables
and want to do it rigorously after we have a visitor based mechanism
for doing so.

There is some more cleanup to do here as well, in particular around
the `physical_disk_filter` and things that may not be needed when the
executor bits get fully implemented.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant