Get capi targetsize from cache #5025

enxebre · 2022-07-13T14:40:42Z

Which component this PR applies to?

Cluster autoscaler - cluster api.

What type of PR is this?

#3104 ensured that the replica count read during scale down operations was never stale, by fetching it from the API server.

#3312 preserved that behaviour while moving to the unstructured client.

#4443 regressed that behaviour while trying to reduce the API server load.

#4634 restored the never-stale replicas behaviour, at the cost of putting the load back on the API server.

Currently, on e.g. a cluster that has been up for 48 minutes with no scaling activity, it issues 1.4k GET requests to the scale subresource.
This PR tries to satisfy both goals: non-stale replicas during scale down, and an API server that is not overloaded. To achieve that, it lets TargetSize, which is called on every autoscaling cluster state loop, come from the cache, while fetching fresh replicas at the time the operations are performed.
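
The split described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `scaleClient`, `fakeAPI`, `nodeGroup`, `cachedReplicas` and `ScaleDownBy` are hypothetical names invented for illustration, and the fake API only counts GET calls so the difference in load is visible.

```go
package main

import (
	"errors"
	"fmt"
)

// scaleClient is a hypothetical stand-in for a client that reads and
// writes the scale subresource on the API server.
type scaleClient interface {
	GetReplicas() (int, error)
	SetReplicas(n int) error
}

// fakeAPI is a hypothetical in-memory API server that counts GET calls.
type fakeAPI struct {
	replicas int
	gets     int
}

func (f *fakeAPI) GetReplicas() (int, error) { f.gets++; return f.replicas, nil }
func (f *fakeAPI) SetReplicas(n int) error   { f.replicas = n; return nil }

// nodeGroup is a hypothetical, heavily simplified node group.
type nodeGroup struct {
	cachedReplicas int // assumed to be kept current by a watch/informer
	api            scaleClient
}

// TargetSize runs on every autoscaler loop, so it only reads the cache.
func (ng *nodeGroup) TargetSize() (int, error) {
	return ng.cachedReplicas, nil
}

// ScaleDownBy changes state, so it fetches a fresh replica count first.
func (ng *nodeGroup) ScaleDownBy(n int) error {
	if n <= 0 {
		return errors.New("scale down amount must be positive")
	}
	fresh, err := ng.api.GetReplicas()
	if err != nil {
		return fmt.Errorf("error getting replica count: %w", err)
	}
	if fresh-n < 0 {
		return fmt.Errorf("cannot remove %d of %d replicas", n, fresh)
	}
	if err := ng.api.SetReplicas(fresh - n); err != nil {
		return err
	}
	// Simplification: a real provider would let the watch update the cache.
	ng.cachedReplicas = fresh - n
	return nil
}

func main() {
	api := &fakeAPI{replicas: 3}
	ng := &nodeGroup{cachedReplicas: 3, api: api}
	for i := 0; i < 100; i++ {
		_, _ = ng.TargetSize() // 100 loops, zero API calls
	}
	if err := ng.ScaleDownBy(1); err != nil {
		fmt.Println("scale down failed:", err)
		return
	}
	size, _ := ng.TargetSize()
	fmt.Println(size, api.gets) // prints "2 1"
}
```

In this sketch, the hot read path never touches the API server, and the single GET happens only when a mutation is about to be made, which is the trade-off the description argues for.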

Also note that the scale down implementation has changed https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.

/area provider/cluster-api

/kind bug
/kind cleanup

enxebre · 2022-07-13T14:41:17Z

cc @JoelSpeed @elmiko

elmiko

this makes sense to me, thanks for the update @enxebre

i just have a quick question about

Also note that the scale down implementation has changed https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.

did you have something specific in mind?

arunmk

LGTM with a minor comment

arunmk · 2022-07-13T15:59:34Z

cluster-autoscaler/cloudprovider/clusterapi/clusterapi_nodegroup.go

+		return 0, errors.Wrap(err, "error getting replica count")
+	}
+	if !found {
+		replicas = 0


should we set replicas = ng.scalableResource.MinSize() here?

elmiko

that seems reasonable to me, i guess we still need to default to 0 if there is an issue with minsize though.

enxebre

actually I made it an error, as I don't think it is a valid case for it to not be found.

elmiko

that's probably better

k8s-ci-robot · 2022-07-13T16:00:33Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arunmk, enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cluster-autoscaler/cloudprovider/clusterapi/OWNERS~~ [enxebre]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


elmiko · 2022-07-13T18:46:10Z

LGTM, i'll let @arunmk add the label

arunmk · 2022-07-14T16:37:46Z

/lgtm

/approve

JoelSpeed · 2022-07-18T13:37:45Z

/lgtm

Get capi targetsize from cache

enxebre mentioned this pull request Jul 13, 2022

clusterapi scale from zero support #4840

Merged

k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 13, 2022

k8s-ci-robot requested review from arunmk and mrajashree July 13, 2022 14:40

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 13, 2022

elmiko reviewed Jul 13, 2022

View reviewed changes

arunmk approved these changes Jul 13, 2022

View reviewed changes

enxebre force-pushed the get-targetSize-fromcache branch from a80af8e to b2f1823 Compare July 13, 2022 18:26

jbartosik added the area/cluster-autoscaler label Jul 18, 2022

k8s-ci-robot assigned JoelSpeed Jul 18, 2022

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 18, 2022

k8s-ci-robot merged commit 028584a into kubernetes:master Jul 18, 2022

navinjoy pushed a commit to navinjoy/autoscaler that referenced this pull request Oct 26, 2022

Merge pull request kubernetes#5025 from enxebre/get-targetSize-fromcache

b27f88e

Get capi targetsize from cache
