ClusterClass: Avoid unnecessary rollouts when reconciling topology #8656

Closed
ykakarap opened this issue May 14, 2023 · 3 comments · Fixed by #8628
Labels: area/clusterclass, kind/bug, triage/accepted

Comments

ykakarap (Contributor) commented May 14, 2023

Detailed Description:

When rolling out a Kubernetes version upgrade across all the Machines in a Cluster, the system is expected to enforce a sequence of operations: first the control plane Machines, then the worker Machines.

This is implemented by holding back/deferring the Kubernetes version change in MachineDeployments until the control plane upgrade completes (and then following the MachineDeployment upgrade order).
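
A minimal sketch of this deferral, using hypothetical names (this is not the actual Cluster API code): the desired MachineDeployment version follows the topology only once the control plane upgrade has completed.

```go
package main

import "fmt"

// desiredMDVersion is a hypothetical helper: it returns the Kubernetes
// version a MachineDeployment should be reconciled to. While the control
// plane is still upgrading, the current version is kept, deferring the
// rollout of the new version.
func desiredMDVersion(currentVersion, topologyVersion string, controlPlaneStable bool) string {
	if currentVersion == topologyVersion {
		return currentVersion // already at the desired version
	}
	if !controlPlaneStable {
		return currentVersion // hold back until the CP upgrade completes
	}
	return topologyVersion
}

func main() {
	// Control plane still upgrading: the MachineDeployment stays on v1.26.0.
	fmt.Println(desiredMDVersion("v1.26.0", "v1.27.0", false)) // v1.26.0
	// Control plane done: the version change is now allowed to roll out.
	fmt.Println(desiredMDVersion("v1.26.0", "v1.27.0", true)) // v1.27.0
}
```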

However, when there are concurrent changes alongside the Kubernetes version change (e.g. changes to the failureDomain or replicas fields), they are currently reconciled immediately, in an attempt to move the system to the desired state as fast as possible.

This leads to Machines being rolled out twice: one rollout for the initial set of changes, and one for the Kubernetes version upgrade as soon as it is allowed. This is perceived as a bug, because users expect the system to minimise the number of Machine rollouts, which ultimately improves the stability of users’ workloads.

This issue is about fixing the topology controller so that both the concurrent changes and the Kubernetes version change happen at the same time (both are deferred), as sketched below.

Note: changes that are not concurrent with a Kubernetes version change will continue to be applied immediately, as they are today.
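
A sketch of the intended behavior, using hypothetical types and names (not the merged implementation from #8628): while the version change is held back, concurrent field changes are held back with it, so all changes land in a single rollout; when no version change is pending, other changes still apply immediately.

```go
package main

import "fmt"

// mdTopologySpec is a hypothetical, trimmed-down stand-in for the fields the
// topology controller computes for a MachineDeployment.
type mdTopologySpec struct {
	Version       string
	FailureDomain string
	Replicas      int32
}

// desiredMDSpec defers *all* pending changes while the version change is held
// back, so concurrent changes and the upgrade land in one rollout.
func desiredMDSpec(current, topology mdTopologySpec, versionChangeAllowed bool) mdTopologySpec {
	if current.Version != topology.Version && !versionChangeAllowed {
		return current // defer everything until the upgrade may proceed
	}
	return topology // no deferred version change: apply the desired state now
}

func main() {
	current := mdTopologySpec{Version: "v1.26.0", FailureDomain: "fd-1", Replicas: 3}
	desired := mdTopologySpec{Version: "v1.27.0", FailureDomain: "fd-2", Replicas: 5}

	// Control plane still upgrading: failureDomain/replicas changes wait too.
	fmt.Printf("%+v\n", desiredMDSpec(current, desired, false))
	// Upgrade allowed: everything is applied together, causing one rollout.
	fmt.Printf("%+v\n", desiredMDSpec(current, desired, true))
}
```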

/kind bug
/area topology

k8s-ci-robot added the kind/bug label May 14, 2023
k8s-ci-robot (Contributor) commented:
@ykakarap: The label(s) area/topology cannot be applied, because the repository doesn't have them.

In response to this: (the issue description quoted above)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the needs-triage label May 14, 2023
ykakarap (Contributor, Author) commented:
/assign

fabriziopandini (Member) commented:
/triage accepted

k8s-ci-robot added the triage/accepted label and removed the needs-triage label May 15, 2023
sbueringer added the area/clusterclass label May 15, 2023