-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decommissioning expunged disks should happen in planner #7098
Comments
In order to have the planner do this, it needs to learn about the physical disks that the sled-agent has via inventory. These physical disks are already retrieve in inventory. However they don't include the generation number of the current configuration on the sled-agent. We likely want to return that along with the configuration so that we don't inadvertently decommission disks that have not yet been expunged by the executor. |
physical disk generation number added to the inventory in #7162 |
This is the first PR related to fixing #7098
This is the second PR related to fixing #7098. BlueprintPhysicalDisksConfig is no longer an alias of OmicronPhysicalDisksConfig, but instead wraps a new type BlueprintPhysicalDiskConfig which itself wraps an OmicronPhysicalDiskConfig and a BlueprintPhysicalDiskDisposition. This change brings blueprint physical disks in line with blueprint datasets and zones such that expungement is a first class notion in the blueprint, and not implicit in the disk not being present in a blueprint. This change only adds the new types and makes them work with the existing code. The underlying logic around expungement and decommissioning has not changed. That will come in a follow up PR.
This is the second PR related to fixing #7098. BlueprintPhysicalDisksConfig is no longer an alias of OmicronPhysicalDisksConfig, but instead wraps a new type BlueprintPhysicalDiskConfig which itself wraps an OmicronPhysicalDiskConfig and a BlueprintPhysicalDiskDisposition. This change brings blueprint physical disks in line with blueprint datasets and zones such that expungement is a first class notion in the blueprint, and not implicit in the disk not being present in a blueprint. This change only adds the new types and makes them work with the existing code. The underlying logic around expungement and decommissioning has not changed. That will come in a follow up PR.
This is the second PR related to fixing #7098. BlueprintPhysicalDisksConfig is no longer an alias of OmicronPhysicalDisksConfig, but instead wraps a new type BlueprintPhysicalDiskConfig which itself wraps an OmicronPhysicalDiskConfig and a BlueprintPhysicalDiskDisposition. This change brings blueprint physical disks in line with blueprint datasets and zones such that expungement is a first class notion in the blueprint, and not implicit in the disk not being present in a blueprint. This change only adds the new types and makes them work with the existing code. The underlying logic around expungement and decommissioning has not changed. That will come in a follow up PR.
This PR is intended to fix #6999 and #7098. The executor steps are all independent already except those that will be fixed in this PR. A disk disposition was added to the blueprint and gets changed from `InService` to `Expunged` when the `PlanningInput` changes. A disk state was also added that goes from `Active` -> `Decommissioned`. This disk state changes when the parent sled is expunged or the sled-agent has been guaranteed to have seen the disk expungement as reported via inventory. This code fully implements the planner bits. The exucutor bits are going to be pulled in from the [fewer-fatal-errors-in-executor-branch](main...fewer-fatal-errors-in-executor). The executor bits are going to be expanded to implement the expunge and decommission as dictated by the blueprint. Because the planner changes are tightly coupled with the executor, they will all come in this PR. Further code will also be added to represent the disk disposition and state in the blueprint and diff tables for omdb output. This PR is primarily blocked on [revamping the diff system](#7240). This is particularly necessary because we need to change those tables and want to do it rigorously after we have a visitor based mechanism for doing so. There is some more cleanup to do here as well, in particular around the `physical_disk_filter` and things that may not be needed when the executor bits get fully implemented.
Currently, we decommission expunged disks during reconfigurator execution. This expungement step is tied to a previous step in the executor that ensures that any expunged disks no longer exist on the given sled.
Unfortunately, this coupling prevents us from making all execution steps independent and skippable as a solution to #6999.
We can resolve this issue by decommissioning disks in the planner, rather than in the executor as we do with sled decommissioning
The text was updated successfully, but these errors were encountered: