ClusterClass: Avoid unnecessary rollouts when reconciling topology #8656
Labels: area/clusterclass, kind/bug, triage/accepted
Detailed Description:
When dealing with Kubernetes version upgrades across all the machines in a Cluster, the system is expected to enforce a sequence of operations: first the control plane Machines and then worker Machines.
This is implemented by deferring the Kubernetes version change in MachineDeployments until the control plane upgrade completes, after which the MachineDeployments are upgraded in order.
However, when other changes are made concurrently with the Kubernetes version change (e.g. a change to the failureDomain or replicas fields), they are currently reconciled immediately, in an attempt to move the system to the desired state as fast as possible.
As a result, Machines are rolled out twice: once for the initial set of changes and once more for the Kubernetes version upgrade as soon as it is allowed. Users perceive this as a bug, because they expect the system to minimise the number of Machine rollouts, which ultimately improves the stability of their workloads.
This issue is about fixing the topology controller so that the concurrent changes and the Kubernetes version change are applied at the same time (i.e. both are deferred).
Note: changes which are not concurrent with a Kubernetes version change will continue to be applied immediately, as they are today.
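
To make the proposed behaviour concrete, here is a minimal sketch of the decision in Go. It is not the topology controller's actual code: the `MDSpec` type, the `computeDesired` function and the `versionUpgradeDeferred` flag are hypothetical names invented for illustration, and the real MachineDeployment topology has many more fields.

```go
package main

import "fmt"

// MDSpec is a hypothetical, heavily simplified stand-in for the fields of a
// MachineDeployment that the topology controller reconciles.
type MDSpec struct {
	Version       string
	FailureDomain string
	Replicas      int32
}

// computeDesired returns the spec that should be applied right now.
//
// versionUpgradeDeferred is an assumed input: true while the version change
// must be held back (for example, the control plane upgrade is not complete).
func computeDesired(current, desired MDSpec, versionUpgradeDeferred bool) MDSpec {
	// No version change requested: apply all other changes immediately.
	if desired.Version == current.Version {
		return desired
	}
	// A version change is requested but must wait: hold back the concurrent
	// changes too, so everything lands in a single rollout later.
	if versionUpgradeDeferred {
		return current
	}
	// The version change is allowed: apply it together with the other changes.
	return desired
}

func main() {
	current := MDSpec{Version: "v1.26.0", FailureDomain: "fd-1", Replicas: 3}
	desired := MDSpec{Version: "v1.27.0", FailureDomain: "fd-2", Replicas: 3}

	// While the control plane is still upgrading, nothing changes.
	fmt.Printf("%+v\n", computeDesired(current, desired, true))
	// Once the upgrade is allowed, both changes are applied at once.
	fmt.Printf("%+v\n", computeDesired(current, desired, false))
}
```

With this shape, a failureDomain change that arrives together with a version bump causes one rollout instead of two, while a failureDomain change on its own is still applied right away.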
/kind bug
/area topology