Azure: keep refreshes spread over time #3631
Merged
When a `vmssVmsCacheJitter` is provided, API calls are randomly spread (at startup time) over the provided time range, then happen at regular intervals (for a given VMSS). This is meant to prevent API call spikes.

But we noticed that the refreshes of the various VMSS progressively converge and agglomerate over time (in particular after a few large throttling windows affected the autoscaler), which defeats the purpose.
Re-randomizing the next refresh deadline every time (rather than just at autoscaler start) keeps the calls properly spread.
Configuring `vmssVmsCacheJitter` and `vmssVmsCacheTTL` allows users to control the average and worst-case refresh interval (and the average API call rate). And we can count on VMSS size change detection to kick off early refreshes when needed.

That's a small behaviour change, but this is possibly still a good time for it, as `vmssVmsCacheJitter` was introduced recently and isn't part of any release yet.

This also slightly simplifies the code: there is no need to check whether scaleSet.instancesRefreshJitter is > 0, since when it is 0 lastRefresh will be time.Now().Add(0), so the current time anyway.
An autoscaler instance running with `"vmssVmsCacheJitter": 3540`, a few days after start:

[graph not included]

Once patched:

[graph not included]