
fix(clusterstate): invalidate instance cache when scaling down #6337

Closed
wants to merge 1 commit

Conversation

qianlei90
Contributor

@qianlei90 qianlei90 commented Dec 2, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

After CA finishes scaling down, the instance cache still exists in the ClusterStateRegistry, causing CA to mistakenly believe that there are some unregistered nodes until the cache is refreshed (CloudProviderNodeInstancesCacheRefreshInterval = 2 * time.Minute).

This PR invalidates the cache once the scale down is completed.
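
To make the intended behavior concrete, here is a minimal toy sketch (this is not the actual ClusterStateRegistry code; names such as instanceCache and invalidate are illustrative) of dropping a node group's cached instance list as soon as its scale-down completes rather than waiting for the periodic refresh:

package example

import (
	"sync"
	"time"
)

// Toy model of the behavior described above: cache entries are normally
// refreshed only every CloudProviderNodeInstancesCacheRefreshInterval, so
// without explicit invalidation a deleted instance can keep showing up as
// an unregistered node for up to two minutes.
const cloudProviderNodeInstancesCacheRefreshInterval = 2 * time.Minute

type instanceCache struct {
	mu        sync.Mutex
	instances map[string][]string // node group ID -> cached instance names
}

// invalidate drops the cached entry for a node group. Calling this right
// after a scale-down completes forces the next sync to re-query the cloud
// provider instead of serving the stale instance list.
func (c *instanceCache) invalidate(nodeGroupID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.instances, nodeGroupID)
}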

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added the kind/bug (Categorizes issue or PR as related to a bug.), cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.), size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.), and area/cluster-autoscaler labels Dec 2, 2023
@vadasambar
Member

/assign vadasambar

@vadasambar
Member

Thank you for the PR!

// NodeName is the name of the node to be deleted.
NodeName string
// Node is the node to be deleted.
Node *apiv1.Node
Member

@vadasambar vadasambar Dec 11, 2023

Do we need to use Node instead of NodeName? I see we are using only the name field of the node in the code (in this PR).

Contributor Author

@qianlei90 qianlei90 Jan 2, 2024

I use Node instead of NodeName so that the scaleDownRequest can be deleted when there are no instances on the cloud provider side. If we used NodeName, I would have to fetch the node object from the lister before calling the HasInstance function.

// Delete the scaleDownRequest if there is no instance on the cloud provider side;
// otherwise, check the delete time.
hasInstance, err := csr.cloudProvider.HasInstance(scaleDownRequest.Node)
if err == nil && !hasInstance {
Member

👀
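
To put the excerpt above in context, here is a rough sketch of how a cleanup loop could use HasInstance to prune completed scale-down requests (illustrative only: the loop shape, the currentTime variable, and the ExpectedDeleteTime field are assumptions about the surrounding code, not lines taken from this PR):

// Sketch: drop a scaleDownRequest as soon as its instance is gone on the
// cloud provider side; otherwise keep it until its expected delete time
// passes. Variable and field names here are assumptions.
updatedRequests := make([]*ScaleDownRequest, 0, len(csr.scaleDownRequests))
for _, scaleDownRequest := range csr.scaleDownRequests {
	hasInstance, err := csr.cloudProvider.HasInstance(scaleDownRequest.Node)
	if err == nil && !hasInstance {
		// The instance no longer exists, so the scale-down is complete and
		// the request can be removed without waiting for a timeout.
		continue
	}
	if scaleDownRequest.ExpectedDeleteTime.After(currentTime) {
		// Deletion is still in progress; keep the request for now.
		updatedRequests = append(updatedRequests, scaleDownRequest)
	}
}
csr.scaleDownRequests = updatedRequests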

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: qianlei90, vadasambar
Once this PR has been reviewed and has the lgtm label, please assign bigdarkclown for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

  • Approvers can indicate their approval by writing /approve in a comment
  • Approvers can cancel approval by writing /approve cancel in a comment

@vadasambar
Member

/lgtm

@BigDarkClown @x13n can you please take a look at this PR 🙏

@k8s-ci-robot k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) and needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) labels Jan 9, 2024
@vadasambar
Member

@qianlei90 can you rebase this PR?

@k8s-ci-robot k8s-ci-robot removed the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) label Jan 25, 2024
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) label Jan 25, 2024
@qianlei90
Contributor Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.) label Jan 25, 2024
@x13n
Member

x13n commented Feb 6, 2024

Since there's no linked issue, can you clarify what bug you're trying to fix with this change?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.) label May 6, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.) label and removed the lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.) label Jun 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to the /close command from @k8s-triage-robot above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels
  • area/cluster-autoscaler
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.)
  • kind/bug (Categorizes issue or PR as related to a bug.)
  • lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.)
  • size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.)