
MachineSet version changes, upgrading from v1alpha3 to v1alpha4 #5405

Closed
abhinavnagaraj opened this issue Oct 7, 2021 · 15 comments · Fixed by #5406
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

@abhinavnagaraj
Contributor

What steps did you take and what happened:
Launch a cluster in v1alpha3 (capi v0.3.19), with one MachineDeployment containing 2 replicas and with strategyType='RollingUpdate'.

Upgrade the cluster to v1alpha4(capi v0.4.0), without any changes in MachineDeployment spec.

This results in the creation of new MachineSets and hence new machines.

What did you expect to happen:
If there is no change in the MachineDeployment spec, there should be no rolling-upgrade of MachineSets.

Anything else you would like to add:
In v1alpha4, the MachineDeployment and MachineSet template version (spec.template.spec.version) is expected to be prefixed with a 'v'.
A Default() function in the MachineDeployment webhook adds this prefix - #4670
But this is missing in the MachineSet webhook.
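For illustration, the defaulting in question can be sketched like this (a simplified stand-in for the webhook's Default() behaviour; the function name and shape are assumptions, not the actual cluster-api code):

```go
package main

import (
	"fmt"
	"strings"
)

// defaultVersion sketches what the MachineDeployment webhook's defaulting
// does to spec.template.spec.version: prefix it with "v" when missing.
func defaultVersion(version string) string {
	if version != "" && !strings.HasPrefix(version, "v") {
		return "v" + version
	}
	return version
}

func main() {
	fmt.Println(defaultVersion("1.21.1"))  // prints "v1.21.1"
	fmt.Println(defaultVersion("v1.21.1")) // prints "v1.21.1" (unchanged)
}
```

Because the MachineSet webhook lacked this, a MachineSet could end up differing from its parent MachineDeployment's template by exactly this prefix.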

Environment:

  • Cluster-api version: v0.4.0
  • Minikube/KIND version: v0.11.1
  • Kubernetes version: (use kubectl version): v1.21.1
  • OS (e.g. from /etc/os-release):

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 7, 2021
@abhinavnagaraj
Contributor Author

/assign @abhinavnagaraj

@sbueringer
Member

sbueringer commented Oct 7, 2021

I would assume defaulting in the MachineSet leads to an infinite rollout of the MD: the MD tries to roll out a MachineSet without the v prefix, but after creation the MachineSet gets the prefix through the webhook. (afaik there is diff logic that compares the template in the MD with the MachineSet to figure out if the MD must be rolled out)
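The suspected failure mode can be illustrated with a toy comparison (this is an assumption about the shape of the diff logic, not the real cluster-api code):

```go
package main

import "fmt"

// templatesMatch is a toy stand-in for the diff logic that decides whether
// an existing MachineSet already matches the MachineDeployment's template.
// A plain string comparison treats "1.21.1" and "v1.21.1" as different.
func templatesMatch(mdVersion, msVersion string) bool {
	return mdVersion == msVersion
}

func main() {
	// The MD webhook defaults "1.21.1" to "v1.21.1"; if a MachineSet only
	// gets the same defaulting after creation, the MD would see a mismatch
	// on every comparison and keep rolling out new MachineSets.
	fmt.Println(templatesMatch("v1.21.1", "1.21.1")) // prints "false" -> rollout
}
```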

@ykakarap
Contributor

ykakarap commented Oct 7, 2021

@abhinavnagaraj I tried the flow you explained in the issue and I have not been able to reproduce the issue.
Steps I took:

  • Use the latest release 0.3 version of clusterctl (v0.3.23) to initialize a local kind cluster (infrastructure = docker)
  • Use clusterctl config cluster to create a cluster YAML
  • Remove the v prefix from the spec.template.spec.version of the MachineDeployment in the cluster YAML.
  • Apply the cluster YAML (and the CNI) and wait for the cluster to come up.
  • Verify that the API version of the machine deployments is cluster.x-k8s.io/v1alpha3
  • Verify that the version in the MachineDeployment and the MachineSet did not have the v prefix
  • Download the newest version of the release 0.4 version of clusterctl (v0.4.4)
  • Run clusterctl upgrade apply --contract v1alpha4
  • Wait for the upgrade to finish

At the end the machine deployments and the machine sets were updated to the new API version (cluster.x-k8s.io/v1alpha4) and no machineset rollout was triggered.

The version on the MachineDeployment and the MachineSet did not have the v prefix after the upgrade too.

If possible, could you please share some more details about the state of the cluster and the YAMLs used before and after the upgrade?
Any details that can help with reproducing the issue would be helpful 🙂

@abhinavnagaraj
Contributor Author

@ykakarap Yes, these steps are correct. Now, if we scale the MachineDeployment with API version v1alpha4, we would expect the corresponding MachineSet (v1alpha4) to also scale up. Instead, the MachineDeployment version updates to have the prefix 'v'. A new MachineSet is launched with the version set to have the prefix 'v'. The old MachineSet finally reconciles to have 0 replicas, but does not get deleted.

@sbueringer
Member

sbueringer commented Oct 11, 2021

Note: That the old MachineSet doesn't get deleted is intended behaviour because of revisionHistoryLimit (which defaults to 1)

From your description I would guess that the scale triggers the MD defaulting webhook (which runs on create/update) which adds the v prefix and thus triggers the rollout of a new MachineSet.

I wonder how adding the same defaulting to the MachineSet logic can fix the issue as the MS webhook is only run on create/update.

@vincepri vincepri reopened this Oct 13, 2021
@vincepri vincepri added this to the Next milestone Oct 22, 2021
@vincepri
Member

/priority awaiting-more-evidence

@k8s-ci-robot k8s-ci-robot added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Oct 22, 2021
@sbueringer
Member

sbueringer commented Oct 22, 2021

@vincepri I think you reopened the issue because of my comment here: #5406 (comment)

I confirmed it. So post-upgrade the CAPI controller upgrades the template references automatically and, during that, also updates the version field through the webhooks. I think this only leaves small edge cases, like when someone updates from v1alpha3 to v1beta1 without updating the infra or bootstrap provider at the same time. I guess that's unlikely, as most providers will update their code and API version between v1alpha3 and v1beta1.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2022
@fabriziopandini
Member

/remove-lifecycle stale
@sbueringer to re-assess

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2022
@sbueringer
Member

@fabriziopandini To summarize:

  • In a regular CAPI update we update core CAPI, the bootstrap, the controlplane and the infra provider.
  • for this issue it's relevant that either the bootstrap provider or the infra provider used in the MachineSet is updated
  • when the MachineSetReconciler updates either of the references in the MachineSet because of that, this issue is automatically mitigated (because an update to one of the refs, automatically adds the v prefix to the version field)

In my opinion, the issue can now only occur if someone upgrades only core CAPI, without upgrading the bootstrap or infra provider. That's a very uncommon case, so I personally wouldn't try to add handling for it, especially as we didn't get any reports that it is an issue.

(The PR which closed this issue added the v-prefix defaulting in the MS webhook, which made it possible for the reconciler to automatically mitigate)
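The mitigation path can be sketched as follows (the struct and function names here are illustrative assumptions, not the real cluster-api types):

```go
package main

import (
	"fmt"
	"strings"
)

// machineSetSpec is a minimal stand-in for the relevant MachineSet fields.
type machineSetSpec struct {
	Version      string
	BootstrapRef string
}

// defaultMachineSet sketches the v-prefix defaulting added to the
// MachineSet webhook; it runs on every create/update of the object.
func defaultMachineSet(ms *machineSetSpec) {
	if ms.Version != "" && !strings.HasPrefix(ms.Version, "v") {
		ms.Version = "v" + ms.Version
	}
}

func main() {
	// A MachineSet carried over from v1alpha3 without the prefix.
	ms := &machineSetSpec{Version: "1.21.1", BootstrapRef: "bootstrap-old"}

	// During a provider upgrade the reconciler updates a reference; that
	// update passes through the webhook, which also normalizes the version.
	ms.BootstrapRef = "bootstrap-new"
	defaultMachineSet(ms)

	fmt.Println(ms.Version) // prints "v1.21.1"
}
```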

@sbueringer
Member

If we want to mitigate that edge case, we could add code which adds the prefix to the version field during each reconcile (if necessary).

I think I would prefer avoiding that, if we don't have to.

@fabriziopandini
Member

the issue can only occur if someone upgrades only core CAPI, without upgrading the bootstrap or infra provider.

That's a use case we are not supporting, given that clusterctl ensures that all the providers move up at the same time.
I agree we should not fix it.

@sbueringer
Member

the issue can only occur if someone upgrades only core CAPI, without upgrading the bootstrap or infra provider.

That's a use case we are not supporting, given that clusterctl ensures that all the providers move up at the same time I agree we should not fix

As discussed in the grooming on 18th February, let's close this issue based on that it's an unsupported use case to only upgrade core CAPI.

@sbueringer
Member

/close

@k8s-ci-robot
Contributor

@sbueringer: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
