Azure: keep refreshes spread over time

When a `vmssVmsCacheJitter` is provided, API calls (after start) will be randomly spread over the provided time range, then happens at regular interval (for a given VMSS). This prevents API calls spikes. But we noticed that the various VMSS' refreshes will progressively converge and agglomerate over time (in particular after a few large throttling windows affected the autoscaler), which defeats the purpose. Re-randomizing the next refresh deadline every time (rather than just at autoscaler start) keeps the calls properly spread. Configuring `vmssVmsCacheJitter` and `vmssVmsCacheTTL` allows users to control the average and worst case refresh interval (and avg API call rate). And we can count on VMSS size change detection to kick early refreshes when needed. That's a small behaviour change, but possibly still a good time for that, as `vmssVmsCacheJitter` was introduced recently and wasn't part of any release yet.
kubernetes · Oct 19, 2020 · ec2e477 · ec2e477
1 parent 0e8e609
commit ec2e477
Showing 1 changed file with 3 additions and 8 deletions.
diff --git a/cluster-autoscaler/cloudprovider/azure/azure_scale_set.go b/cluster-autoscaler/cloudprovider/azure/azure_scale_set.go
@@ -562,18 +562,13 @@ func (scaleSet *ScaleSet) Nodes() ([]cloudprovider.Instance, error) {
 	}
 
 	klog.V(4).Infof("Nodes: starts to get VMSS VMs")
-
-	lastRefresh := time.Now()
-	if scaleSet.lastInstanceRefresh.IsZero() && scaleSet.instancesRefreshJitter > 0 {
-		// new VMSS: spread future refreshs
-		splay := rand.New(rand.NewSource(time.Now().UnixNano())).Intn(scaleSet.instancesRefreshJitter + 1)
-		lastRefresh = time.Now().Add(-time.Second * time.Duration(splay))
-	}
+	splay := rand.New(rand.NewSource(time.Now().UnixNano())).Intn(scaleSet.instancesRefreshJitter + 1)
+	lastRefresh := time.Now().Add(-time.Second * time.Duration(splay))
 
 	vms, rerr := scaleSet.GetScaleSetVms()
 	if rerr != nil {
 		if isAzureRequestsThrottled(rerr) {
-			// Log a warning and update the instance refresh time so that it would retry after next scaleSet.instanceRefreshPeriod.
+			// Log a warning and update the instance refresh time so that it would retry after cache expiration
 			klog.Warningf("GetScaleSetVms() is throttled with message %v, would return the cached instances", rerr)
 			scaleSet.lastInstanceRefresh = lastRefresh
 			return scaleSet.instanceCache, nil