Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid unwanted VMSS VMs caches invalidations #3437

Merged

Conversation

bpineau
Copy link
Contributor

@bpineau bpineau commented Aug 18, 2020

fetchAutoAsgs() is called at regular intervals, fetches a list of VMSS, then call Register() to cache each of those. That registration function will tell the caller wether that vmss' cache is outdated (when the provided VMSS, supposedly fresh, is different than the one held in cache) and will replace existing cache entry by the provided VMSS (which in effect will require a forced refresh since that ScaleSet struct is passed by fetchAutoAsgs with a nil lastRefresh time and an empty instanceCache).

To detect changes, Register() uses an reflect.DeepEqual() between the provided and the cached VMSS. Which does always find them different: cached VMSS were enriched with instances lists (while the provided one is blank, fresh from a simple vmss.list call). That DeepEqual is also fragile due to the compared structs containing mutexes (that may be held or not) and refresh timestamps, attributes that shouldn't be relevant to the comparison.

As a consequence, all Register() calls causes indirect cache invalidations and a costly refresh (VMSS VMS List). The number of Register() calls is directly proportional to the number of VMSS attached to the cluster, and can easily triggers ARM API throttling.

With a large number of VMSS, that throttling prevents fetchAutoAsgs to ever succeed (and cluster-autoscaler to start). ie.:

I0807 16:55:25.875907     153 azure_scale_set.go:344] GetScaleSetVms: starts
I0807 16:55:25.875915     153 azure_scale_set.go:350] GetScaleSetVms: scaleSet.Name: a-testvmss-10, vmList: []
E0807 16:55:25.875919     153 azure_scale_set.go:352] VirtualMachineScaleSetVMsClient.List failed for a-testvmss-10: &{true 0 2020-08-07 17:10:25.875447854 +0000 UTC m=+913.985215807 azure cloud provider throttled for operation VMSSVMList with reason "client throttled"}
E0807 16:55:25.875928     153 azure_manager.go:538] Failed to regenerate ASG cache: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled"
F0807 16:55:25.875934     153 azure_cloud_provider.go:167] Failed to create Azure Manager: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled"

From ScaleSet struct attributes (manager, sizes, mutexes, refreshes timestamps) only sizes are relevant to that comparison. curSize is not strictly necessary, but comparing it will provide early instance caches refreshes (please tell me if you'd rather not have this here).

Example on a cluster with 219 attached VMSS (prev version was unthrottled at 12:29, modified CA rolled out at 12:38):
image
image

`fetchAutoAsgs()` is called at regular intervals, fetches a list of VMSS,
then call `Register()` to cache each of those. That registration function
will tell the caller wether that vmss' cache is outdated (when the provided
VMSS, supposedly fresh, is different than the one held in cache) and will
replace existing cache entry by the provided VMSS (which in effect will
require a forced refresh since that ScaleSet struct is passed by
fetchAutoAsgs with a nil lastRefresh time and an empty instanceCache).

To detect changes, `Register()` uses an `reflect.DeepEqual()` between the
provided and the cached VMSS. Which does always find them different: cached
VMSS were enriched with instances lists (while the provided one is blank,
fresh from a simple vmss.list call). That DeepEqual is also fragile due to
the compared structs containing mutexes (that may be held or not) and
refresh timestamps, attributes that shoudln't be relevant to the comparison.

As a consequence, all Register() calls causes indirect cache invalidations
and a costly refresh (VMSS VMS List). The number of Register() calls is
directly proportional to the number of VMSS attached to the cluster, and
can easily triggers ARM API throttling.

With a large number of VMSS, that throttling prevents `fetchAutoAsgs` to
ever succeed (and cluster-autoscaler to start). ie.:

```
I0807 16:55:25.875907     153 azure_scale_set.go:344] GetScaleSetVms: starts
I0807 16:55:25.875915     153 azure_scale_set.go:350] GetScaleSetVms: scaleSet.Name: a-testvmss-10, vmList: []
E0807 16:55:25.875919     153 azure_scale_set.go:352] VirtualMachineScaleSetVMsClient.List failed for a-testvmss-10: &{true 0 2020-08-07 17:10:25.875447854 +0000 UTC m=+913.985215807 azure cloud provider throttled for operation VMSSVMList with reason "client throttled"}
E0807 16:55:25.875928     153 azure_manager.go:538] Failed to regenerate ASG cache: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled"
F0807 16:55:25.875934     153 azure_cloud_provider.go:167] Failed to create Azure Manager: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled"
goroutine 28 [running]:
```

From [`ScaleSet` struct attributes](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/azure_scale_set.go#L74-L89)
(manager, sizes, mutexes, refreshes timestamps) only sizes are relevant
to that comparison. `curSize` is not strictly necessary, but comparing it
will provide early instance caches refreshs.
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 18, 2020
@k8s-ci-robot
Copy link
Contributor

Welcome @bpineau!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Aug 18, 2020
@nilo19
Copy link
Member

nilo19 commented Aug 18, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 18, 2020
@marwanad
Copy link
Member

@bpineau Thanks for this fix! On that note, the implementation of the caches within azure_scale_set.go isn't ideal because essentially every scaling group will have its own cache of the size, which makes the batched List VMSS calls that retrieve the size redundant. I'm planning on redesigning this flow .

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marwanad

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 18, 2020
@k8s-ci-robot k8s-ci-robot merged commit 5b69017 into kubernetes:master Aug 18, 2020
@bpineau
Copy link
Contributor Author

bpineau commented Aug 19, 2020

Thanks @marwanad !
Very true, that makes two sources of truth for vmss sizes.

One good aspect of the VMSS List (as op. to the many VMSS VMs Lists) is it's constant time (one call every vmssCacheTTL seconds), while we're suffering from the many VMSS VM List calls, O(number of vmss).
In that sense, it's a cheaper way to check for VMSSes freshness. Is that something informing the redesign?

I'm working on adding an optional jitter to spread those vmssvm.list calls, should help us survive until the redesign lands in.

@marwanad
Copy link
Member

@bpineau

Yup, so the redesign will address VMSS LIST and have the size source of truth in a single global cache. This is going to be O(1) every vmssCacheTTL. Today, unfortunately it's O(numberOfVMSS) every vmssCacheTTL.

This acts as guard to the VMSS VM cache in a sense as well because if the sizes are the same, there's no need to fetch new VM instances.

@bpineau
Copy link
Contributor Author

bpineau commented Aug 19, 2020

@marwanad Amazing! That redesign will be very useful for us, and a great relieve!

k8s-ci-robot added a commit that referenced this pull request Sep 17, 2020
Cherry pick #3437 onto 1.18 - Avoid unwanted VMSS VMs caches invalidation
k8s-ci-robot added a commit that referenced this pull request Sep 17, 2020
Cherry pick #3437 onto 1.19 - Avoid unwanted VMSS VMs caches invalidations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants