From 09a08c220faa56156a8ad47226075dd1d852d148 Mon Sep 17 00:00:00 2001
From: Benjamin Pineau
Date: Fri, 21 Aug 2020 17:55:36 +0200
Subject: [PATCH] Azure cloud provider: backoff needs retries

When `cloudProviderBackoff` is configured, `cloudProviderBackoffRetries`
must also be set to a value > 0, otherwise the cluster-autoscaler
will instantiate a vmssclient with 0 Steps retries, which will
cause `doBackoffRetry()` to return a nil response and nil error on
requests. ARM client can't cope with those and will then segfault.
See https://github.com/kubernetes/kubernetes/pull/94078

The README.md needed a small update, because the documented defaults
are a bit misleading: they don't apply when the cluster-autoscaler
is provided a config file, due to:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/azure_manager.go#L299-L308
... which is also causing all environment variables to be ignored
when a configuration file is provided.
---
 cluster-autoscaler/cloudprovider/azure/README.md     | 2 +-
 cluster-autoscaler/cloudprovider/azure/azure_util.go | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/cluster-autoscaler/cloudprovider/azure/README.md b/cluster-autoscaler/cloudprovider/azure/README.md
index 9cab61ee767a..c7aca5cfdaa6 100644
--- a/cluster-autoscaler/cloudprovider/azure/README.md
+++ b/cluster-autoscaler/cloudprovider/azure/README.md
@@ -250,7 +250,7 @@ Please see the [AKS autoscaler documentation][] for details.
 
 ## Rate limit and back-off retries
 
-The new version of [Azure client][] supports rate limit and back-off retries when the cluster hits the throttling issue. These can be set by environment variables or cloud config file.
+The new version of [Azure client][] supports rate limit and back-off retries when the cluster hits the throttling issue. These can be set by either environment variables, or cloud config file. With config file, defaults values are false or 0.
 
 | Config Name | Default | Environment Variable | Cloud Config File |
 | ----------- | ------- | -------------------- | ----------------- |
diff --git a/cluster-autoscaler/cloudprovider/azure/azure_util.go b/cluster-autoscaler/cloudprovider/azure/azure_util.go
index 8ff90451bdac..21ae63dae808 100644
--- a/cluster-autoscaler/cloudprovider/azure/azure_util.go
+++ b/cluster-autoscaler/cloudprovider/azure/azure_util.go
@@ -559,6 +559,10 @@ func validateConfig(cfg *Config) error {
 		return fmt.Errorf("ARM Client ID not set")
 	}
 
+	if cfg.CloudProviderBackoff && cfg.CloudProviderBackoffRetries == 0 {
+		return fmt.Errorf("Cloud provider backoff is enabled but retries are not set")
+	}
+
 	return nil
 }
 
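
The check this patch adds to `validateConfig` can be tried in isolation. What follows is a minimal sketch under stated assumptions, not the autoscaler's code: the `Config` struct here is a hypothetical stand-in trimmed to the two fields involved, and `validateBackoff` is an illustrative name that does not exist in the repository.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical, trimmed-down stand-in for the provider's
// configuration; only the two fields involved in the check are kept.
type Config struct {
	CloudProviderBackoff        bool
	CloudProviderBackoffRetries int
}

// errBackoffWithoutRetries is an illustrative sentinel error.
var errBackoffWithoutRetries = errors.New("cloud provider backoff is enabled but retries are not set")

// validateBackoff (illustrative name) mirrors the condition added by the
// patch: enabling backoff with zero retries is rejected up front, instead
// of letting a client be built with zero retry steps.
func validateBackoff(cfg *Config) error {
	if cfg.CloudProviderBackoff && cfg.CloudProviderBackoffRetries == 0 {
		return errBackoffWithoutRetries
	}
	return nil
}

func main() {
	cases := []Config{
		{CloudProviderBackoff: false, CloudProviderBackoffRetries: 0}, // backoff off: accepted
		{CloudProviderBackoff: true, CloudProviderBackoffRetries: 0},  // backoff on, no retries: rejected
		{CloudProviderBackoff: true, CloudProviderBackoffRetries: 6},  // backoff on with retries: accepted
	}
	for _, c := range cases {
		fmt.Printf("%+v -> %v\n", c, validateBackoff(&c))
	}
}
```

Because a config file leaves unset fields at their zero values, a file that sets only `cloudProviderBackoff` to true falls into the second case and is rejected at startup.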