Azure cloud provider: backoff needs retries #3449
Merged
+5
−1
When `cloudProviderBackoff` is enabled, `cloudProviderBackoffRetries` must also be set to a value `> 0`. Otherwise cluster-autoscaler will instantiate a vmss client with 0 Steps retries, which causes the doBackoffRetry() decorator to return a nil response and a nil error on requests. The ARM client can't cope with those: it dereferences the nil response and segfaults. A PR to prevent the segfault is being discussed with the ARM client's upstream; here we only try to reject wrong configs early and avoid providing bogus values in the first place.
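To illustrate the failure mode and the kind of early rejection described above, here is a minimal sketch. It is not the code from this PR or from the ARM client: `response`, `retry`, `Config`, and `validateBackoff` are hypothetical names, and `retry` is a deliberately simplified stand-in for a backoff decorator (no sleeping, no backoff factor).

```go
package main

import (
	"errors"
	"fmt"
)

// response is a hypothetical stand-in for an HTTP response type.
type response struct{ StatusCode int }

// retry is a simplified model of a retry decorator: it calls send up to
// steps times and returns the first successful result.
func retry(steps int, send func() (*response, error)) (*response, error) {
	var resp *response
	var err error
	for i := 0; i < steps; i++ {
		resp, err = send()
		if err == nil {
			return resp, nil
		}
	}
	// With steps == 0 the loop body never runs, so both values are nil.
	return resp, err
}

// Config holds only the two settings discussed here (hypothetical struct).
type Config struct {
	CloudProviderBackoff        bool
	CloudProviderBackoffRetries int
}

var errBackoffRetries = errors.New(
	"cloudProviderBackoffRetries must be > 0 when cloudProviderBackoff is enabled")

// validateBackoff rejects a config that enables backoff without retries.
func validateBackoff(cfg Config) error {
	if cfg.CloudProviderBackoff && cfg.CloudProviderBackoffRetries <= 0 {
		return errBackoffRetries
	}
	return nil
}

func main() {
	send := func() (*response, error) { return &response{StatusCode: 200}, nil }

	// Zero steps: the request is never sent, and the caller gets nil, nil.
	resp, err := retry(0, send)
	fmt.Println(resp == nil, err == nil)

	// Rejecting the config up front avoids ever building such a client.
	fmt.Println(validateBackoff(Config{CloudProviderBackoff: true}))
	fmt.Println(validateBackoff(Config{CloudProviderBackoff: true, CloudProviderBackoffRetries: 6}))
}
```

A caller that assumes a nil error implies a non-nil response would dereference `resp` here, which is why failing at config validation time is the safer place to catch this.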
`README.md` needed a small update, because the documentation of the default values can be slightly misleading. Defaults don't apply (and all env variables are ignored) when the cluster-autoscaler is provided a config file, because env and defaults parsing is silently skipped in that case. That is, we're using a config file, and shouldn't have counted on the `cloudProviderBackoffRetries: 6` default.

Semi-related: would you consider a follow-up PR that sets the defaults in both cases (env and config file) and deep-merges the parsed config file over any provided env variables? I think that's the least-surprise behaviour, but it would be a change for those used to the documented defaults and env variables being ignored when they provide a config file.
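The layering proposed for that follow-up could look roughly like the sketch below: start from defaults, overlay env variables, then overlay whatever keys the config file actually sets. This is only an illustration of the idea, not existing cluster-autoscaler code. `loadConfig`, `Config`, and `fileConfig` are hypothetical, the env variable names `ENABLE_BACKOFF` and `BACKOFF_RETRIES` and the JSON file format are assumptions, and so is the `false` default for `cloudProviderBackoff`; only the default of 6 retries comes from the text above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Config is the merged result (hypothetical, reduced to two fields).
type Config struct {
	CloudProviderBackoff        bool
	CloudProviderBackoffRetries int
}

// fileConfig uses pointers so that a key missing from the file (nil) can be
// told apart from a key explicitly set to its zero value.
type fileConfig struct {
	CloudProviderBackoff        *bool `json:"cloudProviderBackoff"`
	CloudProviderBackoffRetries *int  `json:"cloudProviderBackoffRetries"`
}

// loadConfig layers defaults < env variables < config file.
// getenv has the same shape as os.LookupEnv, so that can be passed directly.
func loadConfig(getenv func(string) (string, bool), file []byte) (Config, error) {
	// 1. Defaults (6 retries as documented; false is an assumption).
	cfg := Config{CloudProviderBackoff: false, CloudProviderBackoffRetries: 6}

	// 2. Env variables override defaults (names are assumptions).
	if v, ok := getenv("ENABLE_BACKOFF"); ok {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return cfg, fmt.Errorf("ENABLE_BACKOFF: %w", err)
		}
		cfg.CloudProviderBackoff = b
	}
	if v, ok := getenv("BACKOFF_RETRIES"); ok {
		n, err := strconv.Atoi(v)
		if err != nil {
			return cfg, fmt.Errorf("BACKOFF_RETRIES: %w", err)
		}
		cfg.CloudProviderBackoffRetries = n
	}

	// 3. Keys present in the config file override everything else.
	if len(file) > 0 {
		var fc fileConfig
		if err := json.Unmarshal(file, &fc); err != nil {
			return cfg, fmt.Errorf("config file: %w", err)
		}
		if fc.CloudProviderBackoff != nil {
			cfg.CloudProviderBackoff = *fc.CloudProviderBackoff
		}
		if fc.CloudProviderBackoffRetries != nil {
			cfg.CloudProviderBackoffRetries = *fc.CloudProviderBackoffRetries
		}
	}
	return cfg, nil
}

func main() {
	env := map[string]string{"BACKOFF_RETRIES": "3"}
	getenv := func(k string) (string, bool) { v, ok := env[k]; return v, ok }

	// The file enables backoff but says nothing about retries, so the env
	// value is kept instead of silently falling to 0.
	cfg, err := loadConfig(getenv, []byte(`{"cloudProviderBackoff": true}`))
	fmt.Println(cfg, err)
}
```

With this layering, a config file that only sets `cloudProviderBackoff: true` would still end up with a non-zero retry count, either from the environment or from the default.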