🐛 normalize MachineSet version validation #5406
Hi @abhinavnagaraj. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
d5a1d79 to 929728c
/ok-to-test
/lgtm
/hold
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: vincepri
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
As stated in #5405 (comment), I would really prefer to understand how this PR fixes the issue.
We should probably remove the |
This change adds the
So I wonder what happens after an upgrade. Initially (if I understood it correctly) you will have a MD and a MS without prefix. When you scale up the MD the
Do status updates also trigger mutating webhooks configured for
If it is our controller which triggers the update, IMHO we will now have a race condition after this PR. More or less, the MD and the MS have to get the prefix at the same time (or at least we should never run into a MD reconcile loop where only one of them has the prefix).
I'm probably missing something, but it sounds to me like the PR will make the issue harder to reproduce, not solve it. I think it could be solved for example by also adding the
I played around a bit with this PR, and I think it only helps when the MS is somehow updated at the same time as the MD (but I couldn't reproduce this case).
Setup
I had the following results
Option 1: Run
I think the change is legit, orthogonally to the issue, which I don't think it fixes. First, for consistency and to eventually remove the edge cases you describe here #5406 (comment). Second, because the MachineSet is nothing less than a user-facing API whose UX we should care about just as much as a MachineDeployment's. For example, as a user I might choose to run a MachineSet directly because I need more granular control than I have with a MachineDeployment, and from a UX point of view I'd be surprised if the fields behaved differently.
Sounds reasonable to me.
Users might want to use MachineSet without a MachineDeployment though, which has been a valid use case for quite a while. There are cases where you need strict control on how MachineTemplate is rolled out, which you might want to leverage MachineSet for rather than a MachineDeployment resource. The change as-is seems valid from that point of view?
We usually don't configure our admission or validation webhooks to trigger on status (or any other subresource) changes. See cluster-api/config/webhook/manifests.yaml, lines 322 to 323 in 0caa40d.
Wouldn't this cause a rollout? To clarify, I don't think that the change in this PR solves the related issue; if it does, it might be by chance or because something else gets triggered. We'd have to dig a little deeper into it.
Agree, the change itself is fine so that the MS itself works correctly.
Wouldn't this cause a rollout?
Ah, I think I missed something. Maybe it fixes the issue because after an upgrade the MachineSetReconciler reconciles every MS at least once (after the list call (?)). During reconcile it reconciles the external references (i.e. upgrades the API versions of at least the bootstrap template ref), at least in our case, because we've also upgraded the kubeadm types. Those updates should then trigger the defaulting.
Update: So I think it could be good enough to just fix it with this PR.
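To make the prefix mismatch discussed in this thread concrete, here is a minimal Go sketch. It assumes, as a simplification, that a MachineDeployment decides whether an existing MachineSet still matches by comparing their templates for equality. The type machineTemplate and the helpers withPrefix and needsNewMachineSet are hypothetical and are not taken from the cluster-api code base.

```go
package main

import (
	"fmt"
	"strings"
)

// machineTemplate is a hypothetical, trimmed-down stand-in for a machine
// template. The real API types carry many more fields.
type machineTemplate struct {
	Version      string
	BootstrapRef string
}

// withPrefix prepends "v" when the version string does not already have it.
func withPrefix(version string) string {
	if strings.HasPrefix(version, "v") {
		return version
	}
	return "v" + version
}

// needsNewMachineSet reports whether the deployment template differs from the
// set template. Plain equality is an assumption made for this sketch.
func needsNewMachineSet(md, ms machineTemplate) bool {
	return md != ms
}

func main() {
	// Only the MD has been defaulted: the templates differ, so new machines
	// would be created even though nothing else changed.
	md := machineTemplate{Version: "v1.22.2", BootstrapRef: "tmpl-a"}
	ms := machineTemplate{Version: "1.22.2", BootstrapRef: "tmpl-a"}
	fmt.Println(needsNewMachineSet(md, ms)) // true

	// Once the MS is defaulted as well, the templates match again.
	ms.Version = withPrefix(ms.Version)
	fmt.Println(needsNewMachineSet(md, ms)) // false
}
```

In this simplified model the two objects only agree when both have been normalized, which is why the timing of the MS defaulting matters in the discussion above.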
/lgtm
/hold cancel
@vincepri We should make sure this PR gets into the next v1.0 release. Do we need a cherry-pick? I'm not sure if there was a consensus around fast-forward.
/cherrypick release-1.0
@vincepri: failed to push cherry-picked changes in GitHub: pushing failed, output: "To https://github.com/k8s-infra-cherrypick-robot/cluster-api\n ! [remote rejected] cherry-pick-5406-to-release-1.0 -> cherry-pick-5406-to-release-1.0 (refusing to allow a Personal Access Token to create or update workflow In response to this:
/cherrypick release-1.0
@sbueringer: failed to push cherry-picked changes in GitHub: pushing failed, output: "To https://github.com/k8s-infra-cherrypick-robot/cluster-api\n ! [remote rejected] cherry-pick-5406-to-release-1.0 -> cherry-pick-5406-to-release-1.0 (refusing to allow a Personal Access Token to create or update workflow In response to this:
I've opened a Slack thread: https://kubernetes.slack.com/archives/CDECRSC5U/p1634797739002500
/cherrypick release-1.0
@sbueringer: failed to push cherry-picked changes in GitHub: pushing failed, output: "To https://github.com/k8s-infra-cherrypick-robot/cluster-api\n ! [remote rejected] cherry-pick-5406-to-release-1.0 -> cherry-pick-5406-to-release-1.0 (refusing to allow a Personal Access Token to create or update workflow In response to this:
New thread in #testing-ops: https://kubernetes.slack.com/archives/C7J9RP96G/p1634827485007700
/cherrypick release-1.0
@sbueringer: failed to push cherry-picked changes in GitHub: pushing failed, output: "To https://github.com/k8s-infra-cherrypick-robot/cluster-api\n ! [remote rejected] cherry-pick-5406-to-release-1.0 -> cherry-pick-5406-to-release-1.0 (refusing to allow a Personal Access Token to create or update workflow In response to this:
/cherrypick release-0.4
@sbueringer: new pull request created: #5482 In response to this:
/cherrypick release-1.0
@sbueringer: new pull request created: #5560 In response to this:
What this PR does / why we need it:
This PR normalizes the MachineSet template version to match the pattern v<major>.<minor>.<patch>.
This prevents the creation of new machines when upgrading from v1alpha3 to v1alpha4, when there are no changes in the MachineDeployment spec.
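The normalization described above could look roughly like the following Go sketch. It assumes the version is an optional string field (hence the pointer) and that normalizing only means prepending a missing "v". The function name normalizeVersion is hypothetical and is not claimed to be the code in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeVersion is a hypothetical helper for illustration only.
// It returns the version unchanged when it is nil or already starts with "v",
// and otherwise returns a new string with "v" prepended.
func normalizeVersion(version *string) *string {
	if version == nil || strings.HasPrefix(*version, "v") {
		return version
	}
	normalized := "v" + *version
	return &normalized
}

func main() {
	raw := "1.22.2"
	fmt.Println(*normalizeVersion(&raw)) // v1.22.2
}
```

Applying the same defaulting to both MachineDeployment and MachineSet keeps the two version strings comparable, which is the consistency argument made in the conversation.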
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5405