[CA-1.18] #3177 cherry-pick: Fix stale replicas issue with cluster-autoscaler CAPI provider #3345

Merged


detiber
Member

@detiber detiber commented Jul 23, 2020

Backports #3177 to cluster-autoscaler-release-1.18:

From the original PR:

This change brings in a series of patches to remediate the issues with MachineSet and MachineDeployment replicas becoming stale. These changes protect the deletion mechanism by guarding the operation with a mutex and by comparing the resulting size against the group's minimum size instead of against 0. Additionally, the operations that get the replicas now use API server calls so that the freshest information is available during replica count checks. Lastly, a minor change to the unit tests accommodates the switch to API server queries for replicas.

elmiko and others added 4 commits July 23, 2020 15:15
This change adds a mutex to the MachineController structure which is
used to gate access to the DeleteNodes function.

This is one in a series of PRs to mitigate kubernetes#3104
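
As an illustration of the commit above, here is a minimal sketch of gating deletions behind a controller-level mutex. The names `lockedController`, `lockedGroup`, `deleteLock`, and `deleteConcurrently` are hypothetical and are not taken from the PR; the real `DeleteNodes` talks to the cluster rather than decrementing a counter.

```go
package main

import "sync"

// lockedController is a hypothetical stand-in for the provider's
// MachineController; only the mutex matters for this sketch.
type lockedController struct {
	deleteLock sync.Mutex // illustrative field name, not taken from the PR
}

// lockedGroup is a hypothetical node group that shares one controller.
type lockedGroup struct {
	controller *lockedController
	replicas   int
}

// DeleteNodes lowers the replica count by count while holding the
// controller-wide lock, so concurrent callers cannot interleave their
// read-modify-write of replicas.
func (g *lockedGroup) DeleteNodes(count int) {
	g.controller.deleteLock.Lock()
	defer g.controller.deleteLock.Unlock()
	g.replicas -= count
}

// deleteConcurrently issues n single-node deletions from separate goroutines
// and waits for all of them to finish.
func deleteConcurrently(g *lockedGroup, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.DeleteNodes(1)
		}()
	}
	wg.Wait()
}
```

Because the lock lives on the controller rather than on each group, deletions for every group that shares the controller are serialized, which trades some parallelism for consistency.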
When getting Replicas(), the local struct in the scalable resource might be stale. To mitigate possible side effects, we always want to fetch a fresh replica count.

This is one in a series of PRs to mitigate kubernetes#3104
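
The idea in the commit above can be sketched roughly as follows. The `replicaSource` interface, the `fakeAPIServer` type, and the `cachedReplicas` field are assumptions made for illustration; they stand in for the real API server client and are not the provider's actual types.

```go
package main

import "fmt"

// replicaSource is a hypothetical abstraction over an API server lookup.
type replicaSource interface {
	GetReplicas(name string) (int, error)
}

// fakeAPIServer is an in-memory stand-in for the API server.
type fakeAPIServer struct {
	replicas map[string]int
}

// GetReplicas returns the currently stored replica count for name.
func (f *fakeAPIServer) GetReplicas(name string) (int, error) {
	r, ok := f.replicas[name]
	if !ok {
		return 0, fmt.Errorf("resource %q not found", name)
	}
	return r, nil
}

// freshResource is a hypothetical scalable resource. cachedReplicas models
// the local copy that can go stale between reconciliations.
type freshResource struct {
	name           string
	cachedReplicas int
	source         replicaSource
}

// Replicas ignores the cached value and asks the source on every call,
// so callers always see the most recent count.
func (r *freshResource) Replicas() (int, error) {
	return r.source.GetReplicas(r.name)
}
```

In this sketch a change made on the server after the resource was constructed is visible on the next `Replicas()` call, at the cost of one lookup per call.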
provider

When calling deleteNodes() we should fail early if the operation could delete nodes below the nodeGroup minSize().

This is one in a series of PRs to mitigate kubernetes#3104
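
A minimal sketch of the fail-early check described in the commit above, assuming a simplified group that only tracks a replica count and a minimum size. `boundedGroup` and its fields are hypothetical; the real code works against MachineSet and MachineDeployment resources.

```go
package main

import "fmt"

// boundedGroup is a hypothetical node group with a lower size bound.
type boundedGroup struct {
	minSize  int
	replicas int
}

// deleteNodes rejects the request up front when removing count nodes would
// take the group below minSize; otherwise it applies the deletion.
func (g *boundedGroup) deleteNodes(count int) error {
	if count <= 0 {
		return fmt.Errorf("count must be positive, got %d", count)
	}
	if g.replicas-count < g.minSize {
		return fmt.Errorf("cannot delete %d nodes: %d replicas would fall below min size %d",
			count, g.replicas, g.minSize)
	}
	g.replicas -= count
	return nil
}
```

Checking against `minSize` rather than 0 means a group with a non-zero minimum is never scaled under it, and the state is left untouched when the check fails.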
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 23, 2020
@detiber
Member Author

detiber commented Jul 23, 2020

/assign @elmiko

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 23, 2020
@k8s-ci-robot k8s-ci-robot requested review from enxebre and hardikdr July 23, 2020 19:20
Contributor

@elmiko elmiko left a comment


thanks for backporting Jason!
/approve

@enxebre @JoelSpeed ptal

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2020
@JoelSpeed
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 29, 2020
@k8s-ci-robot k8s-ci-robot merged commit 72178ad into kubernetes:cluster-autoscaler-release-1.18 Jul 29, 2020
benmoss pushed a commit to benmoss/autoscaler that referenced this pull request Sep 28, 2020
[CA-1.18] kubernetes#3177 cherry-pick: Fix stale replicas issue with cluster-autoscaler CAPI provider
colin-welch pushed a commit to Paperspace/autoscaler that referenced this pull request Mar 5, 2021
[CA-1.18] kubernetes#3177 cherry-pick: Fix stale replicas issue with cluster-autoscaler CAPI provider
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
5 participants