-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 topology controller should avoid unnecessary rollouts during upgrades #8628
🐛 topology controller should avoid unnecessary rollouts during upgrades #8628
Conversation
/test pull-cluster-api-e2e-informing-dualstack-ipv6-main |
14ee7f0
to
6b2f3a1
Compare
b15064d
to
2b9f1f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/area clusterclass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. I wonder if it would be better to define "upgrade is held back" independent of the pending stuff.
Or we have to re-evaluate what pending exactly means and align to what we need.
In my opinion holding back an upgrade simply means:
- For control plane: Cluster.topology.version != desiredCP.version
- For MD: Cluster.topology.version != desiredMD.version
Maybe it's as easy as that?
37d855a
to
a53b2c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress :)
Let's pair a bit on it. I think we have some good ideas on how we can optimize this further
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress :)
Let's pair a bit on it. I think we have some good ideas on how we can optimize this further
Let's rebase on top of #8658
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made a first pass while the PR is still WIP, and it looks already good
just a couple of nits from my side
a53b2c5
to
db2140a
Compare
/test ? |
@ykakarap: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-cluster-api-e2e-full-main |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only nits, looks great!
/test pull-cluster-api-e2e-full-main |
5fb211a
to
13d456d
Compare
/retest |
Thx, last nit from my side: #8628 (comment) |
pending latest comment from @sbueringer, lgtm from my side |
13d456d
to
0be8e6e
Compare
@ykakarap: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Thank you! /lgtm Let's hold until we have some idea what's going on regarding #8786 (just to keep debugging as easy as possible. This PR could introduce another potential error source) |
/hold |
LGTM label has been added. Git tree hash: a6c0f5bb594609962f3252e4e33d09f8d5f40933
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sbueringer The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
Updated the status of #8786 (comment). I think we're good with merging this PR. The issue is something in KCP. So unrelated to this PR /hold cancel |
What this PR does / why we need it:
This PR modified the topology controller so that it modified pushing any changes to ControlPlane/MachineDeployment if it is pending an upgrade.
If a ControlPlane or MD is pending an upgrade it implies that it will be rollouts after it picks up the upgrade. Therefore it is better to hold out other changes, potentially causing rollouts, to these objects as well in the mean time to avoid unnecessary machine churn.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #8656
Fixes #8695