add more azure instance types #4497

marwanad · 2021-12-06T14:48:22Z

Adds more supported Azure instance types to the static SKU list.
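For readers unfamiliar with the idea of a static SKU list, the following is a minimal sketch of what a lookup against such a table could look like. The `InstanceType` struct, the `staticSKUs` map, and the `lookupSKU` helper are hypothetical names chosen for illustration, and the two entries are examples only; this is not the list added by this PR.

```go
package main

import "fmt"

// InstanceType is a hypothetical record for one VM size in a static SKU table.
type InstanceType struct {
	Name     string
	VCPU     int64
	MemoryMb int64
	GPU      int64
}

// staticSKUs is an illustrative two-entry table, not the list added by this PR.
var staticSKUs = map[string]InstanceType{
	"Standard_D2s_v3": {Name: "Standard_D2s_v3", VCPU: 2, MemoryMb: 8192, GPU: 0},
	"Standard_NC6":    {Name: "Standard_NC6", VCPU: 6, MemoryMb: 57344, GPU: 1},
}

// lookupSKU returns the static capacity data for a VM size, or an error
// when the size is missing from the table.
func lookupSKU(name string) (InstanceType, error) {
	it, ok := staticSKUs[name]
	if !ok {
		return InstanceType{}, fmt.Errorf("instance type %q not in static SKU list", name)
	}
	return it, nil
}

func main() {
	for _, name := range []string{"Standard_D2s_v3", "Standard_Unknown"} {
		it, err := lookupSKU(name)
		if err != nil {
			fmt.Println("lookup failed:", err)
			continue
		}
		fmt.Printf("%s: %d vCPU, %d MB, %d GPU\n", it.Name, it.VCPU, it.MemoryMb, it.GPU)
	}
}
```

In a table like this, a VM size that is missing simply cannot be resolved, which is why extending the list matters for newly released sizes.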

/area provider/azure

k8s-ci-robot · 2021-12-06T14:48:42Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marwanad

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cluster-autoscaler/cloudprovider/azure/OWNERS~~ [marwanad]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

nilo19 · 2021-12-07T13:49:19Z

/lgtm


Cherry-pick #4497 onto 1.22 - add more azure instance types

nilo19 · 2021-12-14T04:43:18Z

@marwanad will this be backported to older versions or 1.22 only?


Cherry-pick #4497 onto 1.20 - add more azure instance types

Cherry-pick #4497 onto 1.21 - add more azure instance types

* Fix cluster-autoscaler clusterapi sample manifest

  This commit fixes sample manifest of cluster-autoscaler clusterapi provider.

  (cherry picked from commit a5fee21)

* Adding functionality to cordon the node before destroying it. This helps load balancer to remove the node from healthy hosts (ALB does have this support).

  This won't fix the issue of 502 completely as there is some time node has to live even after cordoning as to serve In-Flight request but load balancer can be configured to remove Cordon nodes from healthy host list.
  This feature is enabled by cordon-node-before-terminating flag with default value as false to retain existing behavior.

* Set maxAsgNamesPerDescribe to the new maximum value

  While this was previously effectively limited to 50, `DescribeAutoScalingGroups` now supports fetching 100 ASG per calls on all regions, matching what's documented:
  https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_DescribeAutoScalingGroups.html

  ```
  AutoScalingGroupNames.member.N
    The names of the Auto Scaling groups. By default, you can only specify up to 50 names. You can optionally increase this limit using the MaxRecords parameter.
  MaxRecords
    The maximum number of items to return with this call. The default value is 50 and the maximum value is 100.
  ```

  Doubling this halves API calls on large clusters, which should help to prevent throttling.

* Break out unmarshal from GenerateEC2InstanceTypes

  Refactor to allow for optimisation

* Optimise GenerateEC2InstanceTypes unmarshal memory usage

  The pricing json for us-east-1 is currently 129MB. Currently fetching this into memory and parsing results in a large memory footprint on startup, and can lead to the autoscaler being OOMKilled.

  Change the ReadAll/Unmarshal logic to a stream decoder to significantly reduce the memory use.

* use aws sdk to find region

* update readme

* Update cluster-autoscaler/cloudprovider/aws/README.md

  Co-authored-by: Guy Templeton <[email protected]>

* Merge pull request kubernetes#4274 from kinvolk/imran/cloud-provider-packet-fix

  Cloud provider[Packet] fixes

* Fix bug where a node that becomes ready after 2 mins can be treated as unready. Deprecated LongNotStarted

  In cases where node n1 would:
  1) Be created at t=0min
  2) Ready condition is true at t=2.5min
  3) Not ready taint is removed at t=3min
  the ready node is counted as unready

  Tested cases after fix:
  1) Case described above
  2) Nodes not starting even after 15mins still treated as unready
  3) Nodes created long ago that suddenly become unready are counted as unready.

* Improve misleading log

  Signed-off-by: Sylvain Rabot <[email protected]>

* dont proactively decrement azure cache for unregistered nodes

* Cluster Autoscaler: fix unit tests after kubernetes#3924 was backported to 1.20 in kubernetes#4319

  The backport included unit tests using a function that changed signature after 1.20. This was not detected before merging because CI is not running correctly on 1.20.

* Cluster Autoscaler: backport Github Actions CI to 1.20 (kubernetes#4366)

* annotate fakeNodes so that cloudprovider implementations can identify them if needed

* move annotations to cloudprovider package

* fix 1.19 test

* remove flaky test that's removed in master

* Cluster Autoscaler 1.20.1

* Make arch-specific releases use separate images instead of tags on the same image

  This seems to be the current convention in k8s.

* Cluster Autoscaler: add arch-specific build targets to .gitignore

* CA - AWS - Instance List Update 03-10-21 - 1.20 release branch

* CA - AWS - Instance List Update 29-10-21 - 1.20 release branch

* Cluster-Autoscaler update AWS EC2 instance types with g5, m6 and r6

* CA - AWS Instance List Update - 13/12/21 - 1.20

* Merge pull request kubernetes#4497 from marwanad/add-more-azure-instance-types

  add more azure instance types

* Cluster Autoscaler 1.20.2

* Add `--feature-gates` flag to support scale up on volume limits (CSI migration enabled)

  Signed-off-by: ialidzhikov <[email protected]>

* CA - AWS Cloud Provider - 1.20 Static Instance List Update 02-06-2022

* Cluster Autoscaler - 1.20.3 release

* sync_file updates & other changes

* Updating vendor against [email protected]:kubernetes/kubernetes.git:e3de62298a730415c5d2ab72607ef6adadd6304d (e3de622)

* fixed some declaration errors

Co-authored-by: Kubernetes Prow Robot <[email protected]>
Co-authored-by: Hidekazu Nakamura <[email protected]>
Co-authored-by: atul <[email protected]>
Co-authored-by: Benjamin Pineau <[email protected]>
Co-authored-by: Adrian Lai <[email protected]>
Co-authored-by: darkpssngr <[email protected]>
Co-authored-by: Guy Templeton <[email protected]>
Co-authored-by: Vivek Bagade <[email protected]>
Co-authored-by: Sylvain Rabot <[email protected]>
Co-authored-by: Marwan Ahmed <[email protected]>
Co-authored-by: Jakub Tużnik <[email protected]>
Co-authored-by: GuyTempleton <[email protected]>
Co-authored-by: sturman <[email protected]>
Co-authored-by: Maciek Pytel <[email protected]>
Co-authored-by: ialidzhikov <[email protected]>
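The `maxAsgNamesPerDescribe` entry in the commit message above states that raising the per-call limit from 50 to 100 halves the number of `DescribeAutoScalingGroups` calls. A minimal sketch of that batching arithmetic follows; `chunkNames` is a hypothetical helper written for this illustration, not the autoscaler's own code, and no AWS SDK call is made.

```go
package main

import "fmt"

// maxAsgNamesPerDescribe mirrors the constant named in the commit message.
const maxAsgNamesPerDescribe = 100

// chunkNames splits names into consecutive batches of at most size entries.
// It is a hypothetical helper for illustration only.
func chunkNames(names []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var chunks [][]string
	for start := 0; start < len(names); start += size {
		end := start + size
		if end > len(names) {
			end = len(names)
		}
		chunks = append(chunks, names[start:end])
	}
	return chunks
}

func main() {
	// 400 made-up ASG names stand in for a large cluster.
	names := make([]string, 400)
	for i := range names {
		names[i] = fmt.Sprintf("asg-%d", i)
	}
	fmt.Println("calls at 50 per request: ", len(chunkNames(names, 50)))
	fmt.Println("calls at 100 per request:", len(chunkNames(names, maxAsgNamesPerDescribe)))
}
```

Each batch would correspond to one describe request, so fewer batches means fewer API calls and less exposure to throttling.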

* Set maxAsgNamesPerDescribe to the new maximum value

  While this was previously effectively limited to 50, `DescribeAutoScalingGroups` now supports fetching 100 ASG per calls on all regions, matching what's documented:
  https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_DescribeAutoScalingGroups.html

  ```
  AutoScalingGroupNames.member.N
    The names of the Auto Scaling groups. By default, you can only specify up to 50 names. You can optionally increase this limit using the MaxRecords parameter.
  MaxRecords
    The maximum number of items to return with this call. The default value is 50 and the maximum value is 100.
  ```

  Doubling this halves API calls on large clusters, which should help to prevent throttling.

* Break out unmarshal from GenerateEC2InstanceTypes

  Refactor to allow for optimisation

* Optimise GenerateEC2InstanceTypes unmarshal memory usage

  The pricing json for us-east-1 is currently 129MB. Currently fetching this into memory and parsing results in a large memory footprint on startup, and can lead to the autoscaler being OOMKilled.

  Change the ReadAll/Unmarshal logic to a stream decoder to significantly reduce the memory use.

* use aws sdk to find region

* Merge pull request kubernetes#4274 from kinvolk/imran/cloud-provider-packet-fix

  Cloud provider[Packet] fixes

* Fix templated nodeinfo names collisions in BinpackingNodeEstimator

  Both upscale's `getUpcomingNodeInfos` and the binpacking estimator now uses the same shared DeepCopyTemplateNode function and inherits its naming pattern, which is great as that fixes a long standing bug.

  Due to that, `getUpcomingNodeInfos` will enrich the cluster snapshots with generated nodeinfos and nodes having predictable names (using template name + an incremental ordinal starting at 0) for upcoming nodes.

  Later, when it looks for fitting nodes for unschedulable pods (when upcoming nodes don't satisfy those (FitsAnyNodeMatching failing due to nodes capacity, or pods antiaffinity, ...), the binpacking estimator will also build virtual nodes and place them in a snapshot fork to evaluate scheduler predicates.

  Those temporary virtual nodes are built using the same pattern (template name and an index ordinal also starting at 0) as the one previously used by `getUpcomingNodeInfos`, which means it will generate the same nodeinfos/nodes names for nodegroups having upcoming nodes.

  But adding nodes by the same name in an existing cluster snapshot isn't allowed, and the evaluation attempt will fail.

  Practically this blocks re-upscales for nodegroups having upcoming nodes, which can cause a significant delay.

* Improve misleading log

  Signed-off-by: Sylvain Rabot <[email protected]>

* dont proactively decrement azure cache for unregistered nodes

* annotate fakeNodes so that cloudprovider implementations can identify them if needed

* move annotations to cloudprovider package

* Cluster Autoscaler 1.21.1

* CA - AWS - Instance List Update 03-10-21 - 1.21 release branch

* CA - AWS - Instance List Update 29-10-21 - 1.21 release branch

* Cluster-Autoscaler update AWS EC2 instance types with g5, m6 and r6

* CA - AWS Instance List Update - 13/12/21 - 1.21

* Merge pull request kubernetes#4497 from marwanad/add-more-azure-instance-types

  add more azure instance types

* Cluster Autoscaler 1.21.2

* Add `--feature-gates` flag to support scale up on volume limits (CSI migration enabled)

  Signed-off-by: ialidzhikov <[email protected]>

* [Cherry pick 1.21] Remove TestDeleteBlob UT

  Signed-off-by: Zhecheng Li <[email protected]>

* cherry-pick kubernetes#4022

  [cluster-autoscaler] Publish node group min/max metrics

* Skipping metrics tests added in kubernetes#4022

  Each test works in isolation, but they cause panic when the entire suite is run (ex. make test-in-docker), because the underlying metrics library panics when the same metric is registered twice.

  (cherry picked from commit 52392b3)

* cherry-pick kubernetes#4162 and kubernetes#4172

  [cluster-autoscaler]Add flag to control DaemonSet eviction on non-empty nodes & Allow DaemonSet pods to opt in/out from eviction.

* CA - AWS Cloud Provider - 1.21 Static Instance List Update 02-06-2022

* fix instance type fallback

  Instead of logging a fatal error, log a standard error and fall back to loading instance types from the static list.

* Cluster Autoscaler - 1.21.3 release

* FAQ updated

* Sync_changes file updated

Co-authored-by: Benjamin Pineau <[email protected]>
Co-authored-by: Adrian Lai <[email protected]>
Co-authored-by: darkpssngr <[email protected]>
Co-authored-by: Kubernetes Prow Robot <[email protected]>
Co-authored-by: Sylvain Rabot <[email protected]>
Co-authored-by: Marwan Ahmed <[email protected]>
Co-authored-by: Jakub Tużnik <[email protected]>
Co-authored-by: GuyTempleton <[email protected]>
Co-authored-by: sturman <[email protected]>
Co-authored-by: Maciek Pytel <[email protected]>
Co-authored-by: ialidzhikov <[email protected]>
Co-authored-by: Zhecheng Li <[email protected]>
Co-authored-by: Shubham Kuchhal <[email protected]>
Co-authored-by: Todd Neal <[email protected]>
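The "Optimise GenerateEC2InstanceTypes unmarshal memory usage" entry above describes replacing ReadAll/Unmarshal with a stream decoder. The sketch below shows that general technique with Go's `encoding/json` `Decoder`. The JSON shape (a top-level `products` object keyed by SKU, each with `attributes.instanceType`), the `product` struct, and the function names are assumptions made for this illustration, not the autoscaler's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// product holds only the field we care about; the shape is an assumption.
type product struct {
	Attributes struct {
		InstanceType string `json:"instanceType"`
	} `json:"attributes"`
}

// streamInstanceTypes walks the document token by token and decodes one
// product at a time, so the whole "products" object is never held in memory.
func streamInstanceTypes(r io.Reader) ([]string, error) {
	dec := json.NewDecoder(r)
	if _, err := dec.Token(); err != nil { // opening '{' of the document
		return nil, err
	}
	var types []string
	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return nil, err
		}
		if key, _ := keyTok.(string); key != "products" {
			// Skip values of other keys (each skipped value is buffered once).
			var skip json.RawMessage
			if err := dec.Decode(&skip); err != nil {
				return nil, err
			}
			continue
		}
		if _, err := dec.Token(); err != nil { // opening '{' of "products"
			return nil, err
		}
		for dec.More() {
			if _, err := dec.Token(); err != nil { // SKU key, unused here
				return nil, err
			}
			var p product
			if err := dec.Decode(&p); err != nil {
				return nil, err
			}
			if p.Attributes.InstanceType != "" {
				types = append(types, p.Attributes.InstanceType)
			}
		}
		if _, err := dec.Token(); err != nil { // closing '}' of "products"
			return nil, err
		}
	}
	return types, nil
}

// instanceTypesFromJSON is a convenience wrapper for in-memory input.
func instanceTypesFromJSON(doc string) ([]string, error) {
	return streamInstanceTypes(strings.NewReader(doc))
}

func main() {
	const sample = `{"formatVersion":"v1.0","products":{"SKU1":{"attributes":{"instanceType":"m5.large"}},"SKU2":{"attributes":{"instanceType":"c5.xlarge"}}}}`
	types, err := instanceTypesFromJSON(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(types)
}
```

In real use the reader would be the HTTP response body rather than a string, which is where the memory saving comes from: only one product entry is materialised at a time.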

* Set maxAsgNamesPerDescribe to the new maximum value

  While this was previously effectively limited to 50, `DescribeAutoScalingGroups` now supports fetching 100 ASG per calls on all regions, matching what's documented:
  https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_DescribeAutoScalingGroups.html

  ```
  AutoScalingGroupNames.member.N
    The names of the Auto Scaling groups. By default, you can only specify up to 50 names. You can optionally increase this limit using the MaxRecords parameter.
  MaxRecords
    The maximum number of items to return with this call. The default value is 50 and the maximum value is 100.
  ```

  Doubling this halves API calls on large clusters, which should help to prevent throttling.

* Break out unmarshal from GenerateEC2InstanceTypes

  Refactor to allow for optimisation

* Optimise GenerateEC2InstanceTypes unmarshal memory usage

  The pricing json for us-east-1 is currently 129MB. Currently fetching this into memory and parsing results in a large memory footprint on startup, and can lead to the autoscaler being OOMKilled.

  Change the ReadAll/Unmarshal logic to a stream decoder to significantly reduce the memory use.

* Use highest available magnum microversion

  Magnum allows using the microversion string "latest", and it will replace it internally with the highest microversion that it supports.

  This will let the autoscaler use microversion 1.10 which allows scaling groups to 0 nodes, if it is available.

  The autoscaler will still be able to use microversion 1.9 on older versions of magnum.

* Merge pull request kubernetes#4274 from kinvolk/imran/cloud-provider-packet-fix

  Cloud provider[Packet] fixes

* Improve misleading log

  Signed-off-by: Sylvain Rabot <[email protected]>

* Cluster Autoscaler 1.22.1

* CA - AWS - Instance List Update 03-10-21 - 1.22 release branch

* CA - AWS - Instance List Update 29-10-21 - 1.22 release branch

* Cluster-Autoscaler update AWS EC2 instance types with g5, m6 and r6

* Merge pull request kubernetes#4497 from marwanad/add-more-azure-instance-types

  add more azure instance types

* CA - AWS Instance List Update - 13/12/21 - 1.22

* Cluster Autoscaler 1.22.2

* Add `--feature-gates` flag to support scale up on volume limits (CSI migration enabled)

  Signed-off-by: ialidzhikov <[email protected]>

* Fix Azure IMDS Url in InstanceMetadataService initialization

* Remove variables not used in azure_util_test

  Signed-off-by: Zhecheng Li <[email protected]>

* add recent AKS agentpool label to ignore for similarity checks

* ignore azure csi topology label for similarity checks and populate it for scale from zero

* fix autoscaling due to VMSS tag prefix issue

  corrected the azure_kubernetes_ercice_pool_test unit test cases involving the changed tag prefix
  added const aksManagedPoolName attribute to the top of the code and fixed file name sercice -> service
  added logic for old clusters that still have poolName
  added legacy tag for poolName
  Fixed Autoscaling due to VMSS tag prefix issue, added tags for legacy poolName and aksManagedPoolName, and corrected file name sercice->service

* CA - AWS Cloud Provider - 1.22 Static Instance List Update 02-06-2022

* fix instance type fallback

  Instead of logging a fatal error, log a standard error and fall back to loading instance types from the static list.

* Cluster Autoscaler - 1.22.3 release

* Sync_changes file updated

Co-authored-by: Benjamin Pineau <[email protected]>
Co-authored-by: Adrian Lai <[email protected]>
Co-authored-by: Kubernetes Prow Robot <[email protected]>
Co-authored-by: Thomas Hartland <[email protected]>
Co-authored-by: Sylvain Rabot <[email protected]>
Co-authored-by: Jakub Tużnik <[email protected]>
Co-authored-by: GuyTempleton <[email protected]>
Co-authored-by: sturman <[email protected]>
Co-authored-by: Maciek Pytel <[email protected]>
Co-authored-by: ialidzhikov <[email protected]>
Co-authored-by: Christian Bianchi <[email protected]>
Co-authored-by: Zhecheng Li <[email protected]>
Co-authored-by: Marwan Ahmed <[email protected]>
Co-authored-by: mirandacraghead <[email protected]>
Co-authored-by: Todd Neal <[email protected]>
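The "fix instance type fallback" entry above describes logging a standard error and falling back to the static list instead of exiting on a fatal error. A minimal sketch of that pattern is shown below; `loadInstanceTypes`, the fetch functions, and the vCPU-only map are hypothetical and exist only to illustrate the control flow.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// staticInstanceTypes maps instance type name to vCPU count (illustrative subset).
var staticInstanceTypes = map[string]int64{
	"m5.large":  2,
	"c5.xlarge": 4,
}

// loadInstanceTypes tries the dynamic source first. On failure it logs a
// non-fatal error and returns the static list instead of terminating.
func loadInstanceTypes(fetch func() (map[string]int64, error)) map[string]int64 {
	types, err := fetch()
	if err != nil {
		log.Printf("failed to fetch instance types, falling back to static list: %v", err)
		return staticInstanceTypes
	}
	return types
}

// failingFetch simulates an unreachable dynamic source (hypothetical).
func failingFetch() (map[string]int64, error) {
	return nil, errors.New("dynamic source unreachable")
}

// workingFetch simulates a successful dynamic lookup (hypothetical).
func workingFetch() (map[string]int64, error) {
	return map[string]int64{"m6i.large": 2}, nil
}

func main() {
	fmt.Println("after failure:", len(loadInstanceTypes(failingFetch)), "types from static list")
	fmt.Println("after success:", len(loadInstanceTypes(workingFetch)), "types from dynamic source")
}
```

The design trade-off is that the process keeps running with possibly stale data rather than crashing at startup, which is also why keeping the static list current matters.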

k8s-ci-robot added the area/provider/azure (Issues or PRs related to azure provider), cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.), and size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.) labels Dec 6, 2021

k8s-ci-robot requested review from feiskyer and nilo19 December 6, 2021 14:48

k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Dec 6, 2021

add more azure instance types

bda85b3

marwanad force-pushed the add-more-azure-instance-types branch from 37860c8 to bda85b3 December 6, 2021 15:06

k8s-ci-robot assigned nilo19 Dec 7, 2021

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged.) Dec 7, 2021

k8s-ci-robot merged commit 9e2ecfd into kubernetes:master Dec 7, 2021

gandhipr pushed a commit to gandhipr/autoscaler that referenced this pull request Dec 8, 2021

Merge pull request kubernetes#4497 from marwanad/add-more-azure-instance-types

add more azure instance types

4abe389

gandhipr mentioned this pull request Dec 8, 2021

Cherry-pick #4497 onto 1.22 - add more azure instance types #4509

Merged

k8s-ci-robot added a commit that referenced this pull request Dec 9, 2021

Merge pull request #4509 from gandhipr/cluster-autoscaler-release-1.22

f17533b

Cherry-pick #4497 onto 1.22 - add more azure instance types

gandhipr pushed a commit to gandhipr/autoscaler that referenced this pull request Dec 14, 2021

Merge pull request kubernetes#4497 from marwanad/add-more-azure-instance-types

add more azure instance types

e94dcdd

gandhipr mentioned this pull request Dec 14, 2021

Cherry-pick #4497 onto 1.21 - add more azure instance types #4525

Merged

gandhipr pushed a commit to gandhipr/autoscaler that referenced this pull request Dec 14, 2021

Merge pull request kubernetes#4497 from marwanad/add-more-azure-instance-types

add more azure instance types

2bc2337

gandhipr mentioned this pull request Dec 14, 2021

Cherry-pick #4497 onto 1.20 - add more azure instance types #4526

Merged

k8s-ci-robot added a commit that referenced this pull request Dec 14, 2021

Merge pull request #4526 from gandhipr/cluster-autoscaler-release-1.20

a5c77a0

Cherry-pick #4497 onto 1.20 - add more azure instance types

k8s-ci-robot added a commit that referenced this pull request Dec 14, 2021

Merge pull request #4525 from gandhipr/cluster-autoscaler-release-1.21

b45adf1

Cherry-pick #4497 onto 1.21 - add more azure instance types
