Skip to content

Commit

Permalink
Add MHCClass spec and details to proposal
Browse files Browse the repository at this point in the history
  • Loading branch information
killianmuldoon committed Oct 27, 2021
1 parent 0697e4d commit a104275
Showing 1 changed file with 87 additions and 15 deletions.
102 changes: 87 additions & 15 deletions docs/proposals/202105256-cluster-class-and-managed-topologies.md
Original file line number Diff line number Diff line change
Expand Up @@ -112,7 +112,7 @@ We are fully aware that in order to exploit the potential of ClusterClass and ma
- Provide an external mechanism which allows to leverage the full power of a programming language to implement more complex customizations.
- **Adoption**: Providing a way to convert existing clusters into managed topologies.
- **Observability**: Build an SDK and enhance the Cluster object status to surface a summary of the status of the topology.
- **Lifecycle integrations**: Extend ClusterClass to include lifecycle management integrations such as MachineHealthCheck and Cluster Autoscaler to manage the state and health of the managed topologies.
- **Lifecycle integrations**: Extend ClusterClass to include lifecycle management integrations such as Cluster Autoscaler to manage the state of the managed topologies.

However we are intentionally leaving them out from this initial iteration for the following reasons:
- We want the community to reach a consensus on cornerstone elements of the design before iterating on additional features.
Expand Down Expand Up @@ -179,10 +179,14 @@ the `controlPlaneMachineType` variable accordingly.

**Notes**: Same notes as in Story 4 apply.

#### Story 8 - Easy UX for MachineHealthChecks
As a cluster operator I want a simple way to define checks to manage the health of the machines in my cluster.
- Instead of defining Machine Health Checks each time a cluster is created, there should be a mechanism for creating the same type of health check for each Cluster stamped by a ClusterClass.

### Implementation Details/Notes/Constraints

The following section provides details about the introduction of new types and modifications to existing types to implement the ClusterClass functionality.
If instead you are eager to see an example of ClusterClass and how the Cluster object will look like, you can jump to the Behavior paragraph.
If instead you are eager to see an example of ClusterClass and how the Cluster object will look, you can jump to the Behavior paragraph.

#### New API types

Expand Down Expand Up @@ -231,23 +235,31 @@ type ClusterClassSpec struct {
// Note: Patches will be applied in the order of the array.
// +optional
Patches []ClusterClassPatch `json:"patches,omitempty"`

// MachineHealthCheckClass defines a MachineHealthCheck for each
// ControlPlaneClass and MachineDeploymentClass in this ClusterClass.
// +optional
MachineHealthCheck *MachineHealthCheckClass `json:"machineHealthCheckClass,omitempty"`
}

// ControlPlaneClass defines the class for the control plane.
type ControlPlaneClass struct {
Metadata ObjectMeta `json:"metadata,omitempty"`
Metadata ObjectMeta `json:"metadata,omitempty"`

// LocalObjectTemplate contains the reference to the control plane provider.
LocalObjectTemplate `json:",inline"`
// LocalObjectTemplate contains the reference to the control plane provider.
LocalObjectTemplate `json:",inline"`

// MachineTemplate defines the metadata and infrastructure information
// for control plane machines.
//
// This field is supported if and only if the control plane provider template
// referenced above is Machine based and supports setting replicas.
//
// +optional
MachineInfrastructure *LocalObjectTemplate `json:"machineInfrastructure,omitempty"`
// MachineTemplate defines the metadata and infrastructure information
// for control plane machines.
//
// This field is supported if and only if the control plane provider template
// referenced above is Machine based and supports setting replicas.
// +optional
MachineInfrastructure *LocalObjectTemplate `json:"machineInfrastructure,omitempty"`

// MachineHealthCheckClass defines a MachineHealthCheck for this ControlPlaneClass.
// +optional
MachineHealthCheck *MachineHealthCheckClass `json:"machineHealthCheckClass,omitempty"`
}

// WorkersClass is a collection of deployment classes.
Expand All @@ -268,6 +280,10 @@ type MachineDeploymentClass struct {
// Template is a local struct containing a collection of templates for creation of
// MachineDeployment objects representing a set of worker nodes.
Template MachineDeploymentClassTemplate `json:"template"`

// MachineHealthCheckClass defines a MachineHealthCheck for this MachineDeploymentClass.
// +optional
MachineHealthCheck *MachineHealthCheckClass `json:"machineHealthCheckClass,omitempty"`
}

// MachineDeploymentClassTemplate defines how a MachineDeployment generated from a MachineDeploymentClass
Expand All @@ -290,6 +306,52 @@ type LocalObjectTemplate struct {
// offered by a provider.
Ref *corev1.ObjectReference `json:"ref"`
}

// MachineHealthCheckClass defines a MachineHealthCheck for a group of Machines.
type MachineHealthCheckClass struct {
// UnhealthyConditions contains a list of the conditions that determine
// whether a node is considered unhealthy. The conditions are combined in a
// logical OR, i.e. if any of the conditions is met, the node is unhealthy.
// If not set this field will contain two values:
// - type: Ready
// status: Unknown
// timeout: 300s
// - type: Ready
// status: "False"
// timeout: 300s
// +optional
UnhealthyConditions []UnhealthyCondition `json:"unhealthyConditions,omitempty"`

// Any further remediation is only allowed if at most "MaxUnhealthy" machines selected by
// "selector" are not healthy.
// If not set this value defaults to 100 percent.
// +optional
MaxUnhealthy *intstr.IntOrString `json:"maxUnhealthy,omitempty"`

// Any further remediation is only allowed if the number of machines selected by "selector" as not healthy
// is within the range of "UnhealthyRange". Takes precedence over MaxUnhealthy.
// Eg. "[3-5]" - This means that remediation will be allowed only when:
// (a) there are at least 3 unhealthy machines (and)
// (b) there are at most 5 unhealthy machines
// +optional
UnhealthyRange *string `json:"unhealthyRange,omitempty"`

// Machines older than this duration without a node will be considered to have
// failed and will be remediated.
// If you wish to disable this feature, set the value explicitly to 0.
// If not set, this value is defaulted to 10 minutes.
// +optional
NodeStartupTimeout *metav1.Duration `json:"nodeStartupTimeout,omitempty"`

// RemediationTemplate is a reference to a remediation template
// provided by an infrastructure provider.
//
// This field is completely optional, when filled, the MachineHealthCheck controller
// creates a new object from the template referenced and hands off remediation of the machine to
// a controller that lives outside of Cluster API.
// +optional
RemediationTemplate *corev1.ObjectReference `json:"remediationTemplate,omitempty"`
}
```

**ClusterClassVariable**
Expand Down Expand Up @@ -631,7 +693,9 @@ Builtin variables are available under the `builtin.` prefix. Some examples:
the template and validating if the result is a valid YAML or JSON value.
- We will only implement syntax validation. A semantic validation is not possible as the patches will be applied
to provider templates. **It’s the responsibility of the ClusterClass author to ensure the patches are semantically valid**.

- `MachineHealthCheck`
- If defined at `ClusterClass.spec.machineHealthCheck` must not be defined in `ClusterClass.spec.workers.machineDeployments[i].machineHealthCheck` or `ClusterClass.spec.workers.controlPlane.machineHealthCheck`
- (defaulting) All fields required for MachineHealthChecks, including UnhealthyConditions, are defaulted as described in the API definition.
- For object updates:
- all the reference must be in the same namespace of `metadata.Namespace`
- `spec.workers.machineDeployments[i].class` field must be unique within a ClusterClass.
Expand Down Expand Up @@ -783,7 +847,7 @@ intentionally use resources without patches and variables to focus on the simple
1. Creates the control plane object.
1. Sets the `spec.infrastructureRef` and `spec.controlPlaneRef` fields for the Cluster object.
1. Saves the cluster object in the API server.
5. Machine deployment object creation
5. MachineDeployment object creation
1. For each `spec.topology.workers.machineDeployments` item in the list
1. Create a name `<cluster-name>-<worker-set-name>` (if too long, hash it)
1. Initializes a new MachineDeployment object.
Expand All @@ -799,6 +863,14 @@ intentionally use resources without patches and variables to focus on the simple
```
Note: The topology label needs to be set on the individual Machine objects as well.
1. Creates the Machine Deployment object in the API server.
6. MachineHealthCheck object creation
1. If `ClusterClass.spec.machineHealthCheck` is set:
1. For each `Cluster.spec.topology.workers.machineDeployments` item in the list create a MachineHealthCheck with the label `topology.cluster.x-k8s.io/deployment-name: <machine-deployment-topology-name>` as the selector and set the `clusterName` field from the cluster.
2. For the Cluster.spec.topology.controlPlane create a MachineHealthCheck with the label `cluster.x-k8s.io/control-plane=` as the selector and set the `clusterName` field from the cluster.
2. For each `Cluster.spec.topology.workers.machineDeployments` with `ClusterClass.spec.workers.machineDeployments[i].machineDeploymentClass` defined create a MachineHealthCheck with the label `topology.cluster.x-k8s.io/deployment-name: <machine-deployment-topology-name>` as the selector and set the `clusterName` field from the cluster.
3. If `Cluster.spec.topology.controlPlane.machineHealthCheckClass` is set create a MachineHealthCheck with the label `cluster.x-k8s.io/control-plane=` as the selector and set the `clusterName` field from the cluster.

**Note** `ClusterClass.spec.workers.machineDeployments[i].machineDeploymentClass` and `Cluster.spec.topology.controlPlane.machineHealthCheckClass` can not be set if `Cluster.spec.topology.workers.machineDeployments` is set.

![Creation of cluster with ClusterClass](./images/cluster-class/create.png)

Expand Down

0 comments on commit a104275

Please sign in to comment.