Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure cloud provider: backoff needs retries #3449

Merged
merged 1 commit into from
Aug 22, 2020

Conversation

bpineau
Copy link
Contributor

@bpineau bpineau commented Aug 22, 2020

When cloudProviderBackoff is enabled, cloudProviderBackoffRetries must also be set to a value > 0.

Otherwise cluster-autoscaler will instantiate a vmss client with 0 Steps retries, which will cause the doBackoffRetry() decorator to return a nil response and nil error on requests. ARM client can't cope with those; it will dereference the nil response and segfault. A PR to prevent the segfault is discussed with ARM client's upstream: here we only try to reject wrong configs early and avoid providing bogus values in the first place.

README.md needed a small update, because the defaults values' documentation can be slightly misleading. Defaults don't apply (and all env variables are ignored) when the cluster-autoscaler is provided a config file, due to env+defaults parsing being silently ignored in that case. ie. we're using a config file, and shouldn't have counted on cloudProviderBackoffRetries: 6 default.

Semi-related: would a follow-up PR setting the defaults in both cases (env and config file), and deep merging config file parsing over potentially provided env variables something you'd consider? I think that's the least surprise behaviour, but that's a change for those used to documented default and env being ignored when they provide a conf.

When `cloudProviderBackoff` is configured, `cloudProviderBackoffRetries`
must also be set to a value > 0, otherwise the cluster-autoscaler
will instanciate a vmssclient with 0 Steps retries, which will
cause `doBackoffRetry()` to return a nil response and nil error on
requests. ARM client can't cope with those and will then segfault.
See kubernetes/kubernetes#94078

The README.md needed a small update, because the documented defaults
are a bit misleading: they don't apply when the cluster-autoscaler
is provided a config file, due to:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/azure_manager.go#L299-L308
... which is also causing all environment variables to be ignored
when a configuration file is provided.
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 22, 2020
@nilo19
Copy link
Member

nilo19 commented Aug 22, 2020

Thanks for the fix.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 22, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nilo19

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 22, 2020
@k8s-ci-robot k8s-ci-robot merged commit 8457dce into kubernetes:master Aug 22, 2020
@feiskyer
Copy link
Member

@bpineau would you like to cherry pick this to 1.18 and 1.19 branch?

@bpineau
Copy link
Contributor Author

bpineau commented Aug 23, 2020

@feiskyer sure. Just did so in #3450 and #3451

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants