Provider/Azure: Add VM cache. #2683

Merged
merged 1 commit into from
Dec 24, 2019
3 changes: 2 additions & 1 deletion cluster-autoscaler/cloudprovider/azure/README.md
@@ -151,7 +151,8 @@ Make a copy of [cluster-autoscaler-standard-master.yaml](examples/cluster-autosc

In the `cluster-autoscaler` spec, find the `image:` field and replace `{{ ca_version }}` with a specific cluster autoscaler release.

-Below that, in the `command:` section, update the `--nodes=` arguments to reference your node limits and node pool name. For example, if node pool "k8s-nodepool-1" should scale from 1 to 10 nodes:
+Below that, in the `command:` section, update the `--nodes=` arguments to reference your node limits and node pool name (tips: node pool name is NOT availability set name, e.g., the corresponding node pool name of the availability set
+`agentpool1-availabilitySet-xxxxxxxx` would be `agentpool1`). For example, if node pool "k8s-nodepool-1" should scale from 1 to 10 nodes:

```yaml
- --nodes=1:10:k8s-nodepool-1
```
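The README tip above can be illustrated with a small sketch. It assumes the `--nodes` value has the shape `<min>:<max>:<name>` as in the example, and that availability sets follow the `<pool>-availabilitySet-<suffix>` naming shown in the tip. The names `nodeGroupSpec`, `parseNodesFlag`, and `poolNameFromAvailabilitySet` are hypothetical and are not part of cluster-autoscaler.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nodeGroupSpec is a hypothetical holder for one --nodes=<min>:<max>:<name> value.
type nodeGroupSpec struct {
	Min, Max int
	Name     string
}

// parseNodesFlag splits "min:max:name" into its parts and validates the limits.
// Illustrative only; this is not the autoscaler's own parser.
func parseNodesFlag(value string) (nodeGroupSpec, error) {
	parts := strings.SplitN(value, ":", 3)
	if len(parts) != 3 || parts[2] == "" {
		return nodeGroupSpec{}, fmt.Errorf("expected <min>:<max>:<name>, got %q", value)
	}
	lo, err := strconv.Atoi(parts[0])
	if err != nil {
		return nodeGroupSpec{}, fmt.Errorf("invalid min in %q: %v", value, err)
	}
	hi, err := strconv.Atoi(parts[1])
	if err != nil {
		return nodeGroupSpec{}, fmt.Errorf("invalid max in %q: %v", value, err)
	}
	if lo < 0 || hi < lo {
		return nodeGroupSpec{}, fmt.Errorf("invalid limits in %q: need 0 <= min <= max", value)
	}
	return nodeGroupSpec{Min: lo, Max: hi, Name: parts[2]}, nil
}

// poolNameFromAvailabilitySet assumes the "<pool>-availabilitySet-<suffix>"
// naming from the README tip and returns the "<pool>" part. Names that do not
// match the pattern are returned unchanged.
func poolNameFromAvailabilitySet(asName string) string {
	if i := strings.Index(asName, "-availabilitySet-"); i > 0 {
		return asName[:i]
	}
	return asName
}

func main() {
	spec, err := parseNodesFlag("1:10:k8s-nodepool-1")
	fmt.Println(spec, err)
	fmt.Println(poolNameFromAvailabilitySet("agentpool1-availabilitySet-xxxxxxxx"))
}
```

The point of the second helper is only to make the distinction concrete: the value passed to `--nodes=` is the pool name, not the full availability set name.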
39 changes: 36 additions & 3 deletions cluster-autoscaler/cloudprovider/azure/azure_agent_pool.go
@@ -36,6 +36,16 @@ import (
	schedulernodeinfo "k8s.io/kubernetes/pkg/scheduler/nodeinfo"
)

var (
	vmInstancesRefreshPeriod = 5 * time.Minute
)

var virtualMachinesStatusCache struct {
	lastRefresh     time.Time
	mutex           sync.Mutex
	virtualMachines []compute.VirtualMachine
}

// AgentPool implements NodeGroup interface for agent pools deployed by aks-engine.
type AgentPool struct {
	azureRef
@@ -117,9 +127,32 @@ func (as *AgentPool) MaxSize() int {
	return as.maxSize
}

func (as *AgentPool) getVirtualMachinesFromCache() ([]compute.VirtualMachine, error) {
	virtualMachinesStatusCache.mutex.Lock()
	defer virtualMachinesStatusCache.mutex.Unlock()

	if virtualMachinesStatusCache.lastRefresh.Add(vmInstancesRefreshPeriod).After(time.Now()) {
Review comment (Member): There are three places with this code. Let's extract a new func that gets the VMs from the cache and refreshes the cache if it is outdated.

Reply (Member Author): Finished.

		return virtualMachinesStatusCache.virtualMachines, nil
	}

	vms, err := as.GetVirtualMachines()
Review comment (Member): We should check err before updating the cache; otherwise the cache may be cleared on errors.

	if err != nil {
		if isAzureRequestsThrottled(err) {
			klog.Warningf("getAllVirtualMachines: throttling with message %v, would return the cached vms", err)
			return virtualMachinesStatusCache.virtualMachines, nil
		}

		return []compute.VirtualMachine{}, err
	}
	virtualMachinesStatusCache.virtualMachines = vms
	virtualMachinesStatusCache.lastRefresh = time.Now()

	return vms, err
}

// GetVMIndexes gets indexes of all virtual machines belonging to the agent pool.
func (as *AgentPool) GetVMIndexes() ([]int, map[int]string, error) {
-	instances, err := as.GetVirtualMachines()
+	instances, err := as.getVirtualMachinesFromCache()
	if err != nil {
		return nil, nil, err
	}
@@ -266,7 +299,7 @@ func (as *AgentPool) DecreaseTargetSize(delta int) error {
	as.mutex.Lock()
	defer as.mutex.Unlock()

-	nodes, err := as.GetVirtualMachines()
+	nodes, err := as.getVirtualMachinesFromCache()
	if err != nil {
		return err
	}
@@ -391,7 +424,7 @@ func (as *AgentPool) TemplateNodeInfo() (*schedulernodeinfo.NodeInfo, error) {

// Nodes returns a list of all nodes that belong to this node group.
func (as *AgentPool) Nodes() ([]cloudprovider.Instance, error) {
-	instances, err := as.GetVirtualMachines()
+	instances, err := as.getVirtualMachinesFromCache()
	if err != nil {
		return nil, err
	}