Sync with upstream v1.22.3 (#131)
* Set maxAsgNamesPerDescribe to the new maximum value

While this was previously effectively limited to 50, `DescribeAutoScalingGroups` now supports
fetching 100 ASGs per call in all regions, matching what is documented:
https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_DescribeAutoScalingGroups.html
```
     AutoScalingGroupNames.member.N
       The names of the Auto Scaling groups.
       By default, you can only specify up to 50 names.
       You can optionally increase this limit using the MaxRecords parameter.
     MaxRecords
       The maximum number of items to return with this call.
       The default value is 50 and the maximum value is 100.
```

Doubling this halves API calls on large clusters, which should help to prevent throttling.
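As a rough illustration of the call-count arithmetic, the following Go sketch splits ASG names into batches of at most 100, one batch per `DescribeAutoScalingGroups` request. The helper `batchNames` is hypothetical and is not the autoscaler's own code. With 101 names it produces two batches, the second holding only `asg-100` as in the updated `TestMoreThen100Groups`; with the old limit of 50 the same names would need three calls.

```go
package main

import "fmt"

// maxNamesPerDescribe mirrors the new maxAsgNamesPerDescribe value.
const maxNamesPerDescribe = 100

// batchNames is a hypothetical helper that splits ASG names into groups of at
// most size elements; each group would be sent in one describe call.
func batchNames(names []string, size int) [][]string {
	if size <= 0 {
		panic("batch size must be positive")
	}
	var batches [][]string
	for start := 0; start < len(names); start += size {
		end := start + size
		if end > len(names) {
			end = len(names)
		}
		batches = append(batches, names[start:end])
	}
	return batches
}

func main() {
	names := make([]string, 101)
	for i := range names {
		names[i] = fmt.Sprintf("asg-%d", i)
	}
	for _, b := range batchNames(names, maxNamesPerDescribe) {
		// Print the batch size and its first ASG name.
		fmt.Println(len(b), b[0])
	}
}
```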

* Break out unmarshal from GenerateEC2InstanceTypes

Refactor to allow for optimisation

* Optimise GenerateEC2InstanceTypes unmarshal memory usage

The pricing JSON for us-east-1 is currently 129MB. Reading all of it
into memory before parsing results in a large memory footprint on
startup, which can lead to the autoscaler being OOMKilled.

Replace the ReadAll/Unmarshal logic with a streaming JSON decoder to
significantly reduce memory use.

* Use highest available magnum microversion

Magnum allows using the microversion string "latest",
and it will replace it internally with the highest
microversion that it supports.

This will let the autoscaler use microversion 1.10 which
allows scaling groups to 0 nodes, if it is available.

The autoscaler will still be able to use microversion 1.9
on older versions of Magnum.
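The microversion request can be sketched as below. This assumes Magnum reads the standard OpenStack `OpenStack-API-Version` header with the `container-infra` service type; the endpoint URL and the helpers `newMagnumRequest`, `atLeast` and `minNodeCount` are hypothetical. It also shows why microversions have to be compared numerically rather than as strings: `1.10` is newer than `1.9`.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// newMagnumRequest builds a request asking Magnum to use the highest
// microversion it supports. The header format is an assumption based on the
// OpenStack microversion convention; the URL passed in is a placeholder.
func newMagnumRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("OpenStack-API-Version", "container-infra latest")
	return req, nil
}

// atLeast reports whether microversion v (for example "1.10") is >= major.minor.
// Parts are compared as integers, so "1.10" is newer than "1.9" even though a
// plain string comparison would order them the other way.
func atLeast(v string, major, minor int) (bool, error) {
	parts := strings.SplitN(v, ".", 2)
	if len(parts) != 2 {
		return false, fmt.Errorf("invalid microversion %q", v)
	}
	vMajor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	vMinor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	if vMajor != major {
		return vMajor > major, nil
	}
	return vMinor >= minor, nil
}

// minNodeCount is a hypothetical helper: a node group may only be scaled to 0
// when the microversion in use is at least 1.10.
func minNodeCount(version string) int {
	if ok, err := atLeast(version, 1, 10); err == nil && ok {
		return 0
	}
	return 1
}

func main() {
	req, err := newMagnumRequest("https://magnum.example.com/v1/clusters")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("OpenStack-API-Version"))
	fmt.Println(minNodeCount("1.10"), minNodeCount("1.9"))
}
```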

* Merge pull request kubernetes#4274 from kinvolk/imran/cloud-provider-packet-fix

Cloud provider[Packet] fixes

* Improve misleading log

Signed-off-by: Sylvain Rabot

* Cluster Autoscaler 1.22.1

* CA - AWS - Instance List Update 03-10-21 - 1.22 release branch

* CA - AWS - Instance List Update 29-10-21 - 1.22 release branch

* Cluster-Autoscaler update AWS EC2 instance types with g5, m6 and r6

* Merge pull request kubernetes#4497 from marwanad/add-more-azure-instance-types

add more azure instance types

* CA - AWS Instance List Update - 13/12/21 - 1.22

* Cluster Autoscaler 1.22.2

* Add `--feature-gates` flag to support scale up on volume limits (CSI migration enabled)

Signed-off-by: ialidzhikov

* Fix Azure IMDS Url in InstanceMetadataService initialization

* Remove variables not used in azure_util_test

Signed-off-by: Zhecheng Li

* add recent AKS agentpool label to ignore for similarity checks

* ignore azure csi topology label for similarity checks and populate it for scale from zero
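A minimal sketch of ignoring selected labels during a node-group similarity check. The two label keys are assumptions chosen for illustration, and `labelsSimilar` is a hypothetical helper rather than the autoscaler's own comparator.

```go
package main

import "fmt"

// ignoredLabels lists label keys excluded from node-group similarity checks.
// Both keys are assumptions for illustration: an AKS agent-pool label and the
// Azure disk CSI topology label.
var ignoredLabels = map[string]bool{
	"kubernetes.azure.com/agentpool":   true,
	"topology.disk.csi.azure.com/zone": true,
}

// labelsSimilar is a hypothetical helper that reports whether two label sets
// are identical once the ignored keys are dropped from both.
func labelsSimilar(a, b map[string]string) bool {
	remaining := 0 // non-ignored keys of a, all of which matched b
	for k, v := range a {
		if ignoredLabels[k] {
			continue
		}
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
		remaining++
	}
	for k := range b {
		if !ignoredLabels[k] {
			remaining--
		}
	}
	// Zero means b has no non-ignored keys beyond those already matched in a.
	return remaining == 0
}

func main() {
	a := map[string]string{"node.kubernetes.io/instance-type": "Standard_D4s_v3", "kubernetes.azure.com/agentpool": "pool1"}
	b := map[string]string{"node.kubernetes.io/instance-type": "Standard_D4s_v3", "kubernetes.azure.com/agentpool": "pool2"}
	fmt.Println(labelsSimilar(a, b))
}
```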

* fix autoscaling due to VMSS tag prefix issue

corrected the azure_kubernetes_service_pool_test unit test cases involving the changed tag prefix

added const aksManagedPoolName attribute to the top of the code and fixed file name sercice -> service

added logic for old clusters that still have poolName

added legacy tag for poolName

Fixed Autoscaling due to VMSS tag prefix issue, added tags for legacy poolName and aksManagedPoolName, and corrected file name sercice->service
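The legacy-tag fallback described above can be sketched like this. The tag keys `aks-managed-poolName` and `poolName` and the helper `poolNameFromTags` are assumptions for illustration; tags are modeled as `map[string]*string`, the shape used for resource tags in the Azure SDK for Go.

```go
package main

import "fmt"

// Tag keys assumed for illustration: newer AKS clusters carry the prefixed key,
// older ones only the legacy key.
const (
	aksManagedPoolNameTag = "aks-managed-poolName"
	legacyPoolNameTag     = "poolName"
)

// poolNameFromTags is a hypothetical helper that prefers the prefixed tag and
// falls back to the legacy tag so that older clusters keep working.
func poolNameFromTags(tags map[string]*string) (string, bool) {
	for _, key := range []string{aksManagedPoolNameTag, legacyPoolNameTag} {
		if v, ok := tags[key]; ok && v != nil {
			return *v, true
		}
	}
	return "", false
}

func main() {
	name := "nodepool1"
	legacy := map[string]*string{legacyPoolNameTag: &name}
	fmt.Println(poolNameFromTags(legacy))
}
```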

* CA - AWS Cloud Provider - 1.22 Static Instance List Update 02-06-2022

* fix instance type fallback

Instead of logging a fatal error, log a standard error and fall back to
loading instance types from the static list.

* Cluster Autoscaler - 1.22.3 release

* Sync_changes file updated

Co-authored-by: Benjamin Pineau
Co-authored-by: Adrian Lai
Co-authored-by: Kubernetes Prow Robot
Co-authored-by: Thomas Hartland
Co-authored-by: Sylvain Rabot
Co-authored-by: Jakub Tużnik
Co-authored-by: GuyTempleton
Co-authored-by: sturman
Co-authored-by: Maciek Pytel
Co-authored-by: ialidzhikov
Co-authored-by: Christian Bianchi
Co-authored-by: Zhecheng Li
Co-authored-by: Marwan Ahmed
Co-authored-by: mirandacraghead
Co-authored-by: Todd Neal
16 people authored Jun 25, 2022
1 parent ddbef23 commit 4ee83bf
Showing 25 changed files with 3,282 additions and 714 deletions.
536 changes: 148 additions & 388 deletions cluster-autoscaler/FAQ.md

Large diffs are not rendered by default.

31 changes: 31 additions & 0 deletions cluster-autoscaler/SYNC-CHANGES/SYNC-CHANGES-1.22.md
@@ -8,6 +8,14 @@
- [During vendoring k8s](#during-vendoring-k8s)
- [Others](#others)

- [v1.22.1](#v1221)
- [Synced with which upstream CA](#synced-with-which-upstream-ca-1)
- [Changes made](#changes-made-1)
- [To FAQ](#to-faq-1)
- [During merging](#during-merging-1)
- [During vendoring k8s](#during-vendoring-k8s-1)
- [Others](#others-1)


# v1.22.0

@@ -35,3 +43,26 @@
### Others
- [Release matrix](../README.md#releases-gardenerautoscaler) of Gardener Autoscaler updated.
- GO111MODULE=off in .ci/build and .ci/test.


# v1.22.1


## Synced with which upstream CA

[v1.22.3](https://github.com/kubernetes/autoscaler/tree/cluster-autoscaler-1.22.3/cluster-autoscaler)

## Changes made

### To FAQ

- updated with new question and answers

### During merging
_None_

### During vendoring k8s
- didn't vendor in k8s 1.25 which is vendored in upstream CA 1.22.3

### Others

14 changes: 7 additions & 7 deletions cluster-autoscaler/cloudprovider/aws/auto_scaling_test.go
@@ -27,22 +27,22 @@ import (
"github.com/stretchr/testify/require"
)

func TestMoreThen50Groups(t *testing.T) {
func TestMoreThen100Groups(t *testing.T) {
service := &AutoScalingMock{}
autoScalingWrapper := &autoScalingWrapper{
autoScaling: service,
}

// Generate 51 ASG names
names := make([]string, 51)
// Generate 101 ASG names
names := make([]string, 101)
for i := 0; i < len(names); i++ {
names[i] = fmt.Sprintf("asg-%d", i)
}

// First batch, first 50 elements
// First batch, first 100 elements
service.On("DescribeAutoScalingGroupsPages",
&autoscaling.DescribeAutoScalingGroupsInput{
AutoScalingGroupNames: aws.StringSlice(names[:50]),
AutoScalingGroupNames: aws.StringSlice(names[:100]),
MaxRecords: aws.Int64(maxRecordsReturnedByAPI),
},
mock.AnythingOfType("func(*autoscaling.DescribeAutoScalingGroupsOutput, bool) bool"),
@@ -51,10 +51,10 @@ func TestMoreThen50Groups(t *testing.T) {
fn(testNamedDescribeAutoScalingGroupsOutput("asg-1", 1, "test-instance-id"), false)
}).Return(nil)

// Second batch, element 51
// Second batch, element 101
service.On("DescribeAutoScalingGroupsPages",
&autoscaling.DescribeAutoScalingGroupsInput{
AutoScalingGroupNames: aws.StringSlice([]string{"asg-50"}),
AutoScalingGroupNames: aws.StringSlice([]string{"asg-100"}),
MaxRecords: aws.Int64(maxRecordsReturnedByAPI),
},
mock.AnythingOfType("func(*autoscaling.DescribeAutoScalingGroupsOutput, bool) bool"),
5 changes: 4 additions & 1 deletion cluster-autoscaler/cloudprovider/aws/aws_cloud_provider.go
@@ -362,7 +362,10 @@ func BuildAWS(opts config.AutoscalingOptions, do cloudprovider.NodeGroupDiscover

generatedInstanceTypes, err := GenerateEC2InstanceTypes(region)
if err != nil {
klog.Fatalf("Failed to generate AWS EC2 Instance Types: %v", err)
klog.Errorf("Failed to generate AWS EC2 Instance Types: %v, falling back to static list with last update time: %s", err, lastUpdateTime)
}
if generatedInstanceTypes == nil {
generatedInstanceTypes = map[string]*InstanceType{}
}
// fallback on the static list if we miss any instance types in the generated output
// credits to: https://github.com/lyft/cni-ipvlan-vpc-k8s/pull/80
19 changes: 19 additions & 0 deletions cluster-autoscaler/cloudprovider/aws/aws_cloud_provider_test.go
@@ -17,6 +17,7 @@ limitations under the License.
package aws

import (
"os"
"testing"

"github.com/aws/aws-sdk-go/aws"
@@ -26,6 +27,7 @@ import (
"github.com/stretchr/testify/mock"
apiv1 "k8s.io/api/core/v1"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
"k8s.io/autoscaler/cluster-autoscaler/config"
)

type AutoScalingMock struct {
@@ -148,6 +150,23 @@ func TestBuildAwsCloudProvider(t *testing.T) {
assert.NoError(t, err)
}

func TestInstanceTypeFallback(t *testing.T) {
resourceLimiter := cloudprovider.NewResourceLimiter(
map[string]int64{cloudprovider.ResourceNameCores: 1, cloudprovider.ResourceNameMemory: 10000000},
map[string]int64{cloudprovider.ResourceNameCores: 10, cloudprovider.ResourceNameMemory: 100000000})

do := cloudprovider.NodeGroupDiscoveryOptions{}
opts := config.AutoscalingOptions{}

os.Setenv("AWS_REGION", "non-existent-region")
defer os.Unsetenv("AWS_REGION")

// This test ensures that no klog.Fatalf calls occur when constructing the AWS cloud provider. Specifically it is
// intended to ensure that instance type fallback works correctly in the event of an error enumerating instance
// types.
_ = BuildAWS(opts, do, resourceLimiter)
}

func TestName(t *testing.T) {
provider := testProvider(t, testAwsManager)
assert.Equal(t, provider.Name(), cloudprovider.AwsProviderName)
4 changes: 2 additions & 2 deletions cluster-autoscaler/cloudprovider/aws/aws_manager.go
@@ -48,7 +48,7 @@ const (
operationWaitTimeout = 5 * time.Second
operationPollInterval = 100 * time.Millisecond
maxRecordsReturnedByAPI = 100
maxAsgNamesPerDescribe = 50
maxAsgNamesPerDescribe = 100
refreshInterval = 1 * time.Minute
autoDiscovererTypeASG = "asg"
asgAutoDiscovererKeyTag = "tag"
@@ -312,7 +312,7 @@ func (m *AwsManager) getAsgTemplate(asg *asg) (*asgTemplate, error) {
region := az[0 : len(az)-1]

if len(asg.AvailabilityZones) > 1 {
klog.Warningf("Found multiple availability zones for ASG %q; using %s\n", asg.Name, az)
klog.V(4).Infof("Found multiple availability zones for ASG %q; using %s for %s label\n", asg.Name, az, apiv1.LabelFailureDomainBetaZone)
}

instanceTypeName, err := m.buildInstanceType(asg)
79 changes: 63 additions & 16 deletions cluster-autoscaler/cloudprovider/aws/aws_util.go
@@ -20,24 +20,26 @@ import (
"encoding/json"
"errors"
"fmt"
"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/ec2metadata"
"github.com/aws/aws-sdk-go/aws/endpoints"
"github.com/aws/aws-sdk-go/aws/session"
"io/ioutil"
klog "k8s.io/klog/v2"
"io"
"net/http"
"os"
"regexp"
"strconv"
"strings"

"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/ec2metadata"
"github.com/aws/aws-sdk-go/aws/endpoints"
"github.com/aws/aws-sdk-go/aws/session"

klog "k8s.io/klog/v2"
)

var (
ec2MetaDataServiceUrl = "http://169.254.169.254"
ec2PricingServiceUrlTemplate = "https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/AmazonEC2/current/%s/index.json"
ec2PricingServiceUrlTemplateCN = "https://pricing.cn-north-1.amazonaws.com.cn/offers/v1.0/cn/AmazonEC2/current/%s/index.json"
staticListLastUpdateTime = "2020-12-07"
staticListLastUpdateTime = "2022-06-02"
ec2Arm64Processors = []string{"AWS Graviton Processor", "AWS Graviton2 Processor"}
)

@@ -87,16 +89,9 @@ func GenerateEC2InstanceTypes(region string) (map[string]*InstanceType, error) {

defer res.Body.Close()

body, err := ioutil.ReadAll(res.Body)
if err != nil {
klog.Warningf("Error parsing %s skipping...\n", url)
continue
}

var unmarshalled = response{}
err = json.Unmarshal(body, &unmarshalled)
unmarshalled, err := unmarshalProductsResponse(res.Body)
if err != nil {
klog.Warningf("Error unmarshalling %s, skip...\n", url)
klog.Warningf("Error parsing %s skipping...\n%s\n", url, err)
continue
}

@@ -135,6 +130,58 @@ func GetStaticEC2InstanceTypes() (map[string]*InstanceType, string) {
return InstanceTypes, staticListLastUpdateTime
}

func unmarshalProductsResponse(r io.Reader) (*response, error) {
dec := json.NewDecoder(r)
t, err := dec.Token()
if err != nil {
return nil, err
}
if delim, ok := t.(json.Delim); !ok || delim.String() != "{" {
return nil, errors.New("Invalid products json")
}

unmarshalled := response{map[string]product{}}

for dec.More() {
t, err = dec.Token()
if err != nil {
return nil, err
}

if t == "products" {
tt, err := dec.Token()
if err != nil {
return nil, err
}
if delim, ok := tt.(json.Delim); !ok || delim.String() != "{" {
return nil, errors.New("Invalid products json")
}
for dec.More() {
productCode, err := dec.Token()
if err != nil {
return nil, err
}

prod := product{}
if err = dec.Decode(&prod); err != nil {
return nil, err
}
unmarshalled.Products[productCode.(string)] = prod
}
}
}

t, err = dec.Token()
if err != nil {
return nil, err
}
if delim, ok := t.(json.Delim); !ok || delim.String() != "}" {
return nil, errors.New("Invalid products json")
}

return &unmarshalled, nil
}

func parseMemory(memory string) int64 {
reg, err := regexp.Compile("[^0-9\\.]+")
if err != nil {
119 changes: 118 additions & 1 deletion cluster-autoscaler/cloudprovider/aws/aws_util_test.go
@@ -17,12 +17,14 @@ limitations under the License.
package aws

import (
"github.com/stretchr/testify/assert"
"net/http"
"net/http/httptest"
"os"
"strconv"
"strings"
"testing"

"github.com/stretchr/testify/assert"
)

func TestGetStaticEC2InstanceTypes(t *testing.T) {
@@ -136,3 +138,118 @@ func TestGetCurrentAwsRegionWithRegionEnv(t *testing.T) {
assert.Nil(t, err)
assert.Equal(t, region, result)
}

func TestUnmarshalProductsResponse(t *testing.T) {
body := `
{
"products": {
"VVD8BG8WWFD3DAZN" : {
"sku" : "VVD8BG8WWFD3DAZN",
"productFamily" : "Compute Instance",
"attributes" : {
"servicecode" : "AmazonEC2",
"location" : "US East (N. Virginia)",
"locationType" : "AWS Region",
"instanceType" : "r5b.4xlarge",
"currentGeneration" : "Yes",
"instanceFamily" : "Memory optimized",
"vcpu" : "16",
"physicalProcessor" : "Intel Xeon Platinum 8259 (Cascade Lake)",
"clockSpeed" : "3.1 GHz",
"memory" : "128 GiB",
"storage" : "EBS only",
"networkPerformance" : "Up to 10 Gigabit",
"processorArchitecture" : "64-bit",
"tenancy" : "Shared",
"operatingSystem" : "Linux",
"licenseModel" : "No License required",
"usagetype" : "UnusedBox:r5b.4xlarge",
"operation" : "RunInstances:0004",
"availabilityzone" : "NA",
"capacitystatus" : "UnusedCapacityReservation",
"classicnetworkingsupport" : "false",
"dedicatedEbsThroughput" : "10 Gbps",
"ecu" : "NA",
"enhancedNetworkingSupported" : "Yes",
"instancesku" : "G4NFAXD9TGJM3RY8",
"intelAvxAvailable" : "Yes",
"intelAvx2Available" : "No",
"intelTurboAvailable" : "No",
"marketoption" : "OnDemand",
"normalizationSizeFactor" : "32",
"preInstalledSw" : "SQL Std",
"servicename" : "Amazon Elastic Compute Cloud",
"vpcnetworkingsupport" : "true"
}
},
"C36QEQQQJ8ZR7N32" : {
"sku" : "C36QEQQQJ8ZR7N32",
"productFamily" : "Compute Instance",
"attributes" : {
"servicecode" : "AmazonEC2",
"location" : "US East (N. Virginia)",
"locationType" : "AWS Region",
"instanceType" : "d3en.8xlarge",
"currentGeneration" : "Yes",
"instanceFamily" : "Storage optimized",
"vcpu" : "32",
"physicalProcessor" : "Intel Xeon Platinum 8259 (Cascade Lake)",
"clockSpeed" : "3.1 GHz",
"memory" : "128 GiB",
"storage" : "16 x 14000 HDD",
"networkPerformance" : "50 Gigabit",
"processorArchitecture" : "64-bit",
"tenancy" : "Dedicated",
"operatingSystem" : "SUSE",
"licenseModel" : "No License required",
"usagetype" : "DedicatedRes:d3en.8xlarge",
"operation" : "RunInstances:000g",
"availabilityzone" : "NA",
"capacitystatus" : "AllocatedCapacityReservation",
"classicnetworkingsupport" : "false",
"dedicatedEbsThroughput" : "5000 Mbps",
"ecu" : "NA",
"enhancedNetworkingSupported" : "Yes",
"instancesku" : "2XW3BCEZ83WMGFJY",
"intelAvxAvailable" : "Yes",
"intelAvx2Available" : "Yes",
"intelTurboAvailable" : "Yes",
"marketoption" : "OnDemand",
"normalizationSizeFactor" : "64",
"preInstalledSw" : "NA",
"processorFeatures" : "AVX; AVX2; Intel AVX; Intel AVX2; Intel AVX512; Intel Turbo",
"servicename" : "Amazon Elastic Compute Cloud",
"vpcnetworkingsupport" : "true"
}
}
}
}
`
r := strings.NewReader(body)
resp, err := unmarshalProductsResponse(r)
assert.Nil(t, err)
assert.Len(t, resp.Products, 2)
assert.NotNil(t, resp.Products["VVD8BG8WWFD3DAZN"])
assert.NotNil(t, resp.Products["C36QEQQQJ8ZR7N32"])
assert.Equal(t, resp.Products["VVD8BG8WWFD3DAZN"].Attributes.InstanceType, "r5b.4xlarge")
assert.Equal(t, resp.Products["C36QEQQQJ8ZR7N32"].Attributes.InstanceType, "d3en.8xlarge")

invalidJsonTests := map[string]string{
"[": "[",
"]": "]",
"}": "}",
"{": "{",
"Plain text": "invalid",
"List": "[]",
"Invalid products ([])": `{"products":[]}`,
"Invalid product ([])": `{"products":{"zz":[]}}`,
}
for name, body := range invalidJsonTests {
t.Run(name, func(t *testing.T) {
r := strings.NewReader(body)
resp, err := unmarshalProductsResponse(r)
assert.NotNil(t, err)
assert.Nil(t, resp)
})
}
}