Azure: keep refreshes spread over time
When a `vmssVmsCacheJitter` is provided, API calls (after start)
are randomly spread over the provided time range, then happen
at regular intervals (for a given VMSS). This prevents API call
spikes.

But we noticed that the refreshes of the various VMSSes progressively
converge and cluster over time (in particular after a few large
throttling windows have affected the autoscaler), which defeats the
purpose.

Re-randomizing the next refresh deadline every time (rather than
just at autoscaler start) keeps the calls properly spread.
Configuring `vmssVmsCacheJitter` and `vmssVmsCacheTTL` lets users
control the average and worst-case refresh intervals (and the average
API call rate). We can also count on VMSS size change detection
to trigger early refreshes when needed.

This is a small behaviour change, but now is probably still a good
time for it, as `vmssVmsCacheJitter` was introduced recently and
has not been part of any release yet.
bpineau committed Oct 19, 2020
1 parent 0e8e609 commit ec2e477
Showing 1 changed file with 3 additions and 8 deletions.
11 changes: 3 additions & 8 deletions cluster-autoscaler/cloudprovider/azure/azure_scale_set.go
@@ -562,18 +562,13 @@ func (scaleSet *ScaleSet) Nodes() ([]cloudprovider.Instance, error) {
 	}
 
 	klog.V(4).Infof("Nodes: starts to get VMSS VMs")
-
-	lastRefresh := time.Now()
-	if scaleSet.lastInstanceRefresh.IsZero() && scaleSet.instancesRefreshJitter > 0 {
-		// new VMSS: spread future refreshs
-		splay := rand.New(rand.NewSource(time.Now().UnixNano())).Intn(scaleSet.instancesRefreshJitter + 1)
-		lastRefresh = time.Now().Add(-time.Second * time.Duration(splay))
-	}
+	splay := rand.New(rand.NewSource(time.Now().UnixNano())).Intn(scaleSet.instancesRefreshJitter + 1)
+	lastRefresh := time.Now().Add(-time.Second * time.Duration(splay))
 
 	vms, rerr := scaleSet.GetScaleSetVms()
 	if rerr != nil {
 		if isAzureRequestsThrottled(rerr) {
-			// Log a warning and update the instance refresh time so that it would retry after next scaleSet.instanceRefreshPeriod.
+			// Log a warning and update the instance refresh time so that it would retry after cache expiration
 			klog.Warningf("GetScaleSetVms() is throttled with message %v, would return the cached instances", rerr)
 			scaleSet.lastInstanceRefresh = lastRefresh
 			return scaleSet.instanceCache, nil