Sync with upstream v1.21.3 #129

Merged
merged 47 commits into from
Jun 25, 2022

himanshu-kun

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Release note:

Gardener autoscaler now in sync with upstream v1.21.3

bpineau and others added 30 commits August 16, 2021 16:25
While this was previously effectively limited to 50, `DescribeAutoScalingGroups` now supports
fetching 100 ASGs per call in all regions, matching what's documented:
https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_DescribeAutoScalingGroups.html
```
     AutoScalingGroupNames.member.N
       The names of the Auto Scaling groups.
       By default, you can only specify up to 50 names.
       You can optionally increase this limit using the MaxRecords parameter.
     MaxRecords
       The maximum number of items to return with this call.
       The default value is 50 and the maximum value is 100.
```

Doubling this halves API calls on large clusters, which should help to prevent throttling.
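
As a rough illustration of the batching described above, here is a minimal sketch that splits a list of ASG names into batches of at most 100, with one `DescribeAutoScalingGroups` request issued per batch. The helper `chunkNames` and the constant `maxASGNamesPerCall` are hypothetical names chosen for this example; they are not taken from the autoscaler's code.

```go
package main

import "fmt"

// maxASGNamesPerCall mirrors the documented MaxRecords ceiling of 100.
// The constant name is an assumption made for this sketch.
const maxASGNamesPerCall = 100

// chunkNames splits names into consecutive batches of at most size entries.
// Hypothetical helper for illustration only.
func chunkNames(names []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(names); start += size {
		end := start + size
		if end > len(names) {
			end = len(names)
		}
		batches = append(batches, names[start:end])
	}
	return batches
}

func main() {
	names := make([]string, 250)
	for i := range names {
		names[i] = fmt.Sprintf("asg-%d", i)
	}
	// One DescribeAutoScalingGroups request would be issued per batch.
	for i, batch := range chunkNames(names, maxASGNamesPerCall) {
		fmt.Printf("batch %d: %d names\n", i, len(batch))
	}
}
```

With a batch size of 50, the same 250 names would need five requests instead of three.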
Refactor to allow for optimisation
The pricing JSON for us-east-1 is currently 129MB. Fetching it into
memory and parsing it results in a large memory footprint on
startup, and can lead to the autoscaler being OOMKilled.

Change the ReadAll/Unmarshal logic to a stream decoder to significantly
reduce the memory use.
…pick-of-#3999-kubernetes#4199-upstream-cluster-autoscaler-release-1.21

Automated cherry pick of kubernetes#3999 kubernetes#4127 kubernetes#4199 upstream cluster autoscaler release 1.21
Backport Merge pull request kubernetes#4274 to upstream/cluster-autoscaler-release-1.21
Both upscale's `getUpcomingNodeInfos` and the binpacking estimator now use
the same shared DeepCopyTemplateNode function and inherit its naming
pattern, which is great as that fixes a long-standing bug.

Due to that, `getUpcomingNodeInfos` will enrich the cluster snapshots with
generated nodeinfos and nodes having predictable names (using template name
+ an incremental ordinal starting at 0) for upcoming nodes.

Later, when it looks for fitting nodes for unschedulable pods (when upcoming
nodes don't satisfy those, e.g. FitsAnyNodeMatching failing due to node capacity
or pod anti-affinity), the binpacking estimator will also build virtual
nodes and place them in a snapshot fork to evaluate scheduler predicates.

Those temporary virtual nodes are built using the same pattern (template name
and an index ordinal also starting at 0) as the one previously used by
`getUpcomingNodeInfos`, which means it will generate the same nodeinfo/node
names for nodegroups having upcoming nodes.

But adding nodes by the same name in an existing cluster snapshot isn't
allowed, and the evaluation attempt will fail.

Practically this blocks re-upscales for nodegroups having upcoming nodes,
which can cause a significant delay.
Signed-off-by: Sylvain Rabot <[email protected]>
Cherry-pick kubernetes#4130 onto 1.21: dont proactively decrement azure cache for unregistered nodes
…ist-Update-03-10-21-1.21

CA - AWS - Instance List Update 03-10-21 - 1.21 release branch
…ist-Update-29-10-21-1.21

CA - AWS - Instance List Update 29-10-21 - 1.21 release branch
…ist-Update-29-11-21-1.21

Cluster-Autoscaler update AWS EC2 instance types with g5, m6 and r6 - 1.21 release branch
…elease-1.21

Cherry-pick kubernetes#4497 onto 1.21 - add more azure instance types
…ist-Update-13-12-21-1.21

CA - AWS Instance List Update - 13/12/21 - 1.21
k8s-ci-robot and others added 11 commits April 25, 2022 06:31
…ler-release-1.21-nodegroup-minmax

[cluster-autoscaler] backport kubernetes#4022 Publish node group min/max metrics into 1.21
…dd flag to control DaemonSet eviction on non-empty nodes & Allow DaemonSet pods to opt in/out

from eviction.
Instead of logging a fatal error, log a standard error and fall back to
loading instance types from the static list.
…ease-1.21-aws-fallback

CA - AWS Cloud Provider - 1.21 - fix instance type fallback
…r-release-1.21-aws-instance-update-02-06-2022

CA - AWS Cloud Provider - 1.21 Static Instance List Update 02-06-2022
…release-1.21-daemonset-eviction-for-empty-nodes-and-occupied-nodes

Backport kubernetes#4162 and kubernetes#4172 [cluster-autoscaler] "Add a flag to control DaemonSet eviction on non-empty nodes and Allow DaemonSet pods to opt in/out from eviction" into 1.21
…r-release-1.21.3

Cluster Autoscaler - 1.21.3 release
@himanshu-kun himanshu-kun requested review from hardikdr and a team as code owners June 23, 2022 17:26
@gardener-robot gardener-robot added needs/review Needs review size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) needs/second-opinion Needs second review by someone else labels Jun 23, 2022
@CLAassistant

CLAassistant commented Jun 23, 2022

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 15 committers have signed the CLA.

✅ ialidzhikov
✅ himanshu-kun
❌ bpineau
❌ k8s-ci-robot
❌ darkpssngr
❌ aidy
❌ gjtempleton
❌ marwanad
❌ sturman
❌ towca
❌ sylr
❌ Shubham82
❌ lzhecheng
❌ tzneal
❌ MaciekPytel

@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jun 23, 2022
@gardener-robot-ci-3 gardener-robot-ci-3 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jun 24, 2022

@ialidzhikov ialidzhikov left a comment

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review needs/second-opinion Needs second review by someone else labels Jun 24, 2022
@himanshu-kun himanshu-kun merged commit 1b0b9f5 into gardener:rel-v1.21 Jun 25, 2022
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Jun 25, 2022
@himanshu-kun himanshu-kun deleted the rel-v1.21-prep branch June 25, 2022 09:25
Labels
needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)