🏃 Remove annotations on upgradeControlPlane #2887
Conversation
The last line of this PR's description links to my PR rather than to an issue, and it is going to close my PR. Please update the first comment to point to the correct issue.
I updated the reference |
/assign @fabriziopandini @sedefsavas @yastij for extra review 👀 |
/milestone v0.3.4 |
Sorry about that, not sure how I messed that up |
/hold |
/hold cancel |
Chatted yesterday with @benmoss, this PR is only a first step in removing the annotations. We'll follow up with more changes, more specifically:
- Add the concept of `outdated` machines in the filters.
- Make `scaleDownControlPlane` take only the `ownedMachines` as input, and decide on its own which Machine should be scaled down.
- Picking a Machine for scale down will happen in `selectAndMarkMachine` (which should probably be renamed).
- Modify the logic to first look at the outdated Machines. If there are any outdated Machines, use this list as the collection.
- If there are no outdated Machines, pick the oldest one in the FailureDomain with the most machines.
Going to update this to remove all annotations, there's also a bug right now with selectedMachines / ownedMachines in scale down |
Upgrade logic no longer uses machine annotations. Logic is now:
- Check number of nodes in workload cluster
- If node count <= replicas, create new upgrade machine
- If node count > replicas, scale down

Scale up logic ensures that we don't create additional machines if we reconcile while waiting for an upgrade machine to appear in the node list.

Scale up should only consider machines needing upgrade. We never support upgrading a subset of the cluster, but this will ensure that we pick the FD that has the most machines needing upgrade, rather than just the FD with the most machines. Also add a comment to explain why scale up will not cause more than 1 machine to be created.

Scale down always scales down outdated machines first. This removes the need to pass through outdated machines.
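The decision described above boils down to a single comparison between node count and desired replicas. A minimal sketch with a hypothetical `decideUpgradeStep` helper (not the real reconciler code):

```go
package main

import "fmt"

// decideUpgradeStep mirrors the described upgrade loop: while the workload
// cluster has no more nodes than desired replicas, add an upgraded machine;
// once it has more nodes than replicas, scale an outdated machine down.
func decideUpgradeStep(nodeCount, replicas int32) string {
	if nodeCount <= replicas {
		return "scale up"
	}
	return "scale down"
}

func main() {
	fmt.Println(decideUpgradeStep(3, 3)) // new upgrade machine hasn't joined yet
	fmt.Println(decideUpgradeStep(4, 3)) // surplus node: remove an outdated machine
}
```

Because the node count only grows once a new machine actually joins the cluster, re-reconciling while waiting still hits the `<=` branch and does not create extra machines, which is the guard the commit message describes.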
/hold cancel |
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: benmoss, vincepri.
@benmoss looks great!
Only a few questions from my side to double-check some code steps.
```go
	return ctrl.Result{}, &capierrors.RequeueAfterError{RequeueAfter: healthCheckFailedRequeueAfter}
}
if err := workloadCluster.RemoveMachineFromKubeadmConfigMap(ctx, machineToDelete); err != nil {
```
Just to double-check this: if there is an error after removing the machine from the config map but before deleting it, when re-entering this code `machineToDelete` will be the same, right?
I don't think that can be guaranteed, since a user could encounter an error and change their cluster such that the next reconciliation doesn't retry scale down. At that point any number of changes can happen before the next scale down (like more scale up) such that the partially removed machine isn't the same one removed.
```go
func (r *KubeadmControlPlaneReconciler) selectAndMarkMachine(ctx context.Context, machines internal.FilterableMachineCollection, annotation string, controlPlane *internal.ControlPlane) (*clusterv1.Machine, error) {
	selected, err := controlPlane.MachineInFailureDomainWithMostMachines(machines)
```

```go
status, err := workloadCluster.ClusterStatus(ctx)
```
I know this is not part of this PR, but ClusterStatus confused me.
It is ControlPlane status (workers are not considered)...
No I think this is absolutely true, I can't believe this didn't occur to me earlier
Oh, nevermind! I thought your point was that I am looking at all nodes, and not just control plane nodes, and this was a huge bug 🤦 . But in fact it's that this method is only for control plane nodes, even though it seems more like it should be for all nodes.
```go
if needingUpgrade := controlPlane.MachinesNeedingUpgrade(); needingUpgrade.Len() > 0 {
	machines = needingUpgrade
}
return controlPlane.MachineInFailureDomainWithMostMachines(machines)
```
If I'm not wrong, in the case of upgrades this is considering only the machines to upgrade, not all the control-plane machines.
Is it possible this leads to an unexpected machine distribution? Probably this is compensated by scale-up always looking at all the machines for placement...
I don't think it will change the machine distribution. We scale down the FD with the most un-upgraded machines, but we scale back up by smallest FD overall
Here's a run-through of what I understand the algorithm to be: https://gist.github.com/benmoss/9914fb4e09e1e4fed4a651119e983298
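That asymmetry can be illustrated with two toy helpers (hypothetical names, not the cluster-api API): scale down keys off the failure domain with the most un-upgraded machines, while scale up places into the smallest failure domain overall.

```go
package main

import "fmt"

// fdWithMost returns the failure domain with the highest count; the
// sketch for scale down feeds it only the un-upgraded machine counts.
func fdWithMost(counts map[string]int) string {
	best, bestN := "", -1
	for fd, n := range counts {
		if n > bestN {
			best, bestN = fd, n
		}
	}
	return best
}

// fdWithFewest returns the failure domain with the lowest count; the
// sketch for scale up feeds it the counts of ALL machines.
func fdWithFewest(counts map[string]int) string {
	best, bestN := "", int(^uint(0)>>1) // max int
	for fd, n := range counts {
		if n < bestN {
			best, bestN = fd, n
		}
	}
	return best
}

func main() {
	outdatedPerFD := map[string]int{"a": 2, "b": 1}         // un-upgraded machines only
	allPerFD := map[string]int{"a": 2, "b": 2, "c": 1}      // every control plane machine
	fmt.Println(fdWithMost(outdatedPerFD))  // domain chosen for scale down: "a"
	fmt.Println(fdWithFewest(allPerFD))     // domain chosen for scale up: "c"
}
```

The two functions consult different inputs, which is why removing from the "most outdated" domain and adding to the "smallest overall" domain can still converge on a balanced distribution.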
```go
	}
	return selected, nil
```

```go
if status.Nodes <= *kcp.Spec.Replicas {
	// scaleUp ensures that we don't continue scaling up while waiting for Machines to have NodeRefs
```
Just for my knowledge, where is this logic? I can't find it...
/hold |
/hold cancel |
/test pull-cluster-api-apidiff |
@benmoss: The following test failed. |
API changes look good, it's all internal |
@benmoss thanks for your answers! |
What this PR does / why we need it:
Removes use of annotations in upgrade logic. We were duplicating the machine-selection logic in the scale down code, and the annotations also cause bugs like #2430, where the cached state stored in the annotation gets out of sync with the world.
This logic depends on `scaleUp`/`TargetClusterControlPlaneIsHealthy` ensuring a 1:1 correspondence of nodes and machines, so we don't continue scaling up while new machines are still joining the cluster.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #2702