Fix stale replicas issue with cluster-autoscaler CAPI provider #3177
This change adds a mutex to the MachineController structure that gates access to the DeleteNodes function. This is one in a series of PRs to mitigate kubernetes#3104.
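A minimal sketch of the locking idea, assuming a simplified model: `machineController`, its `accessLock` field, and the in-memory `nodeGroup` below are hypothetical stand-ins, not the provider's actual types. The point is that every DeleteNodes call takes the same controller-wide lock, so two concurrent scale-downs cannot both read one replica count and both subtract from it.

```go
package main

import (
	"errors"
	"sync"
)

// machineController is a hypothetical, simplified stand-in for the
// provider's MachineController; only the lock matters here.
type machineController struct {
	// accessLock serializes DeleteNodes calls across all node groups.
	accessLock sync.Mutex
}

// nodeGroup is a hypothetical node group with an in-memory replica count.
type nodeGroup struct {
	controller *machineController
	replicas   int
}

// DeleteNodes removes n nodes while holding the controller-wide lock, so the
// read of ng.replicas and the write back happen as one critical section.
func (ng *nodeGroup) DeleteNodes(n int) error {
	ng.controller.accessLock.Lock()
	defer ng.controller.accessLock.Unlock()

	if n > ng.replicas {
		return errors.New("cannot delete more nodes than there are replicas")
	}
	ng.replicas -= n
	return nil
}
```

Because the lock lives on the shared controller rather than on each node group, it also serializes deletions across groups, which is coarser than strictly necessary but simple to reason about.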
When getting Replicas(), the local struct in the scalable resource might be stale. To mitigate possible side effects, we always want to fetch a fresh replica count. This is one in a series of PRs to mitigate kubernetes#3104.
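A sketch of "always read fresh replicas", assuming the API server can be modelled as a small interface. `apiReader`, `GetReplicas`, `scalableResource`, `cachedReplicas` and `fakeReader` are all hypothetical names; the real provider would go through a Kubernetes client, which is not reproduced here.

```go
package main

import "fmt"

// apiReader is a hypothetical abstraction over an API-server GET of a
// MachineSet or MachineDeployment's replica count.
type apiReader interface {
	GetReplicas(namespace, name string) (int, error)
}

// scalableResource is a hypothetical wrapper; cachedReplicas may be stale.
type scalableResource struct {
	namespace, name string
	cachedReplicas  int
	reader          apiReader
}

// Replicas always asks the API server for the current value instead of
// trusting cachedReplicas, and refreshes the cache on success.
func (r *scalableResource) Replicas() (int, error) {
	n, err := r.reader.GetReplicas(r.namespace, r.name)
	if err != nil {
		return 0, fmt.Errorf("getting replicas for %s/%s: %w", r.namespace, r.name, err)
	}
	r.cachedReplicas = n
	return n, nil
}

// fakeReader is an in-memory apiReader keyed by "namespace/name".
type fakeReader map[string]int

func (f fakeReader) GetReplicas(namespace, name string) (int, error) {
	n, ok := f[namespace+"/"+name]
	if !ok {
		return 0, fmt.Errorf("%s/%s not found", namespace, name)
	}
	return n, nil
}
```

The trade-off is one extra API call per read in exchange for never acting on a count that another actor has already changed.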
When calling deleteNodes(), we should fail early if the operation could delete nodes below the nodeGroup's minSize(). This is one in a series of PRs to mitigate kubernetes#3104.
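A sketch of the early min-size check, assuming a hypothetical `nodeGroup` with plain `minSize` and `replicas` fields (not the provider's real struct). The whole request is rejected before anything changes, instead of only guarding against reaching 0.

```go
package main

import "fmt"

// nodeGroup is a hypothetical simplified node group.
type nodeGroup struct {
	minSize  int
	replicas int
}

// MinSize returns the smallest allowed replica count for the group.
func (ng *nodeGroup) MinSize() int { return ng.minSize }

// DeleteNodes refuses the request up front if removing len(nodes) would take
// the group below MinSize(); otherwise it decrements the replica count.
func (ng *nodeGroup) DeleteNodes(nodes []string) error {
	if ng.replicas-len(nodes) < ng.MinSize() {
		return fmt.Errorf("unable to delete %d nodes: group has %d replicas, min size is %d",
			len(nodes), ng.replicas, ng.MinSize())
	}
	ng.replicas -= len(nodes)
	return nil
}
```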
/lgtm
thanks!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: enxebre
[CA-1.18] #3177 cherry-pick: Fix stale replicas issue with cluster-autoscaler CAPI provider
[CA-1.18] kubernetes#3177 cherry-pick: Fix stale replicas issue with cluster-autoscaler CAPI provider
This change brings in a series of patches to remediate the issues with MachineSet and MachineDeployment replicas becoming stale. These changes protect the deletion path by adding a mutex around it and by changing the check against 0 to use the node group's minimum size. Additionally, the operations that get the replicas now query the API server, so that replica count checks use the freshest available information. Lastly, the unit tests are adjusted slightly to account for replicas now being read from the API server.
fixes: #3104
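Taken together, the ordering matters: the fresh read and the min-size check should both happen while the lock is held, otherwise another caller could change the count between the read and the write. A sketch under stated assumptions: `replicaStore` is a hypothetical in-memory stand-in for the API server, and `controller`, `deleteLock` and `nodeGroup` are illustrative names, not the provider's real types.

```go
package main

import (
	"fmt"
	"sync"
)

// replicaStore is a hypothetical stand-in for the API server, the source of
// truth for each scalable resource's replica count.
type replicaStore struct {
	mu       sync.Mutex
	replicas map[string]int
}

func (s *replicaStore) get(name string) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	n, ok := s.replicas[name]
	if !ok {
		return 0, fmt.Errorf("%s not found", name)
	}
	return n, nil
}

func (s *replicaStore) set(name string, n int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.replicas[name] = n
}

// controller holds the lock that serializes scale-down across node groups.
type controller struct {
	deleteLock sync.Mutex
	store      *replicaStore
}

// nodeGroup is a hypothetical MachineSet/MachineDeployment-backed group.
type nodeGroup struct {
	name    string
	minSize int
	ctrl    *controller
}

// DeleteNodes applies the three mitigations in order: take the lock, read the
// fresh replica count, and fail early if the result would go below minSize.
func (ng *nodeGroup) DeleteNodes(count int) error {
	ng.ctrl.deleteLock.Lock()
	defer ng.ctrl.deleteLock.Unlock()

	current, err := ng.ctrl.store.get(ng.name)
	if err != nil {
		return err
	}
	if current-count < ng.minSize {
		return fmt.Errorf("deleting %d nodes from %s would go below min size %d (replicas: %d)",
			count, ng.name, ng.minSize, current)
	}
	ng.ctrl.store.set(ng.name, current-count)
	return nil
}
```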
/area provider/cluster-api