
Fix availability set cache in vmss cache #537

Merged
merged 2 commits on Mar 10, 2021

Conversation

CecileRobertMichon
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it: This fixes a bug affecting clusters that contain standalone virtual machines when vmType is set to "vmss". When the controller manager comes online, it queries the power status of the Azure machines. Since the machines are not available yet, they are not in the cache. When the load balancer request comes in, the cache is queried and reports that the node does not exist as a VMAS, so the provider attempts to run the VMSS code path, hence the following error message: failed: not a vmss instance. The same error also occurs when trying to expose a load balancer service. When the node is found in the cache, it goes down the correct code path.

This adds logic to the vmss cache to refresh the availability set cache if the node it is querying for has joined the cluster since the last cache refresh, or if the cache has expired. Note that this approach minimizes cache refreshes by not refreshing unless necessary, but it does add extra GET API calls whenever a new node joins the cluster. This can be skipped altogether with the DisableAvailabilitySetNodes flag that is already implemented. It's not perfect, but it will fix most scenarios for CAPZ. In the future, I'd like to propose a new "mixed" or "hybrid" VMSet implementation that is truly aware of both VMs and VMSS nodes as a more long-term fix.
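A minimal sketch of that refresh policy, for illustration only; the type and helper names below (availabilitySetNodeCache, its refresh function) are hypothetical stand-ins rather than the identifiers used in pkg/provider:

```go
package main

import (
	"fmt"
	"time"
)

// availabilitySetNodeCache is an illustrative stand-in for the cache of node
// names that are backed by standalone (availability set) VMs.
type availabilitySetNodeCache struct {
	nodes       map[string]bool                 // node names backed by availability set VMs
	lastRefresh time.Time                       // when the set was last rebuilt
	ttl         time.Duration                   // normal cache expiry
	refresh     func() (map[string]bool, error) // re-lists standalone VMs; an ARM GET/LIST in the real provider
}

// isAvailabilitySetNode refreshes the cached set when it has expired, or when
// the queried node joined the cluster after the last refresh and therefore
// cannot be in the cache yet. Otherwise the possibly-stale set is served as-is,
// which keeps the number of extra API calls down.
func (c *availabilitySetNodeCache) isAvailabilitySetNode(nodeName string, nodeJoined time.Time) (bool, error) {
	if time.Since(c.lastRefresh) > c.ttl || nodeJoined.After(c.lastRefresh) {
		fresh, err := c.refresh()
		if err != nil {
			return false, err
		}
		c.nodes = fresh
		c.lastRefresh = time.Now()
	}
	return c.nodes[nodeName], nil
}

func main() {
	cache := &availabilitySetNodeCache{
		nodes:       map[string]bool{"vmas-node-0": true},
		lastRefresh: time.Now().Add(-time.Minute),
		ttl:         10 * time.Minute,
		refresh: func() (map[string]bool, error) {
			// Simulates re-listing standalone VMs after vmas-node-1 joined the cluster.
			return map[string]bool{"vmas-node-0": true, "vmas-node-1": true}, nil
		},
	}
	joined := time.Now() // vmas-node-1 joined after the last refresh, so this forces a refresh
	ok, _ := cache.isAvailabilitySetNode("vmas-node-1", joined)
	fmt.Println(ok) // true
}
```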

Which issue(s) this PR fixes:

Fixes #363

Special notes for your reviewer: @feiskyer, if I want this fix to make it back into the in-tree provider, what's the best way to proceed?

Release note:

Fix availability set cache in vmss cache

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 5, 2021
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 5, 2021
@CecileRobertMichon
Contributor Author

/assign @aramase @jsturtevant @alexeldeib @feiskyer

@CecileRobertMichon
Contributor Author

Will fix/add unit tests shortly. Please feel free to review the code in the meantime.

@nilo19
Contributor

nilo19 commented Mar 7, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 7, 2021
@feiskyer
Member

feiskyer commented Mar 8, 2021

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 8, 2021
@@ -210,7 +211,7 @@ func (ss *scaleSet) getVmssVM(nodeName string, crt azcache.AzureCacheReadType) (

// GetPowerStatusByNodeName returns the power state of the specified node.
func (ss *scaleSet) GetPowerStatusByNodeName(name string) (powerState string, err error) {
managedByAS, err := ss.isNodeManagedByAvailabilitySet(name, azcache.CacheReadTypeUnsafe)
Member

This change would cause every not-found node to trigger a cache update and may cause issues when a large number of nodes are deleted or newly created. Have you done any validation of these scenarios?

Contributor Author

Correct. See my note in PR description:

Note that this approach minimizes cache refreshes by not refreshing unless necessary, but it does add extra GET API calls whenever a new node joins the cluster. This can be skipped altogether with the DisableAvailabilitySetNodes flag that is already implemented. It's not perfect but it will fix most scenarios for CAPZ. In the future, I'd like to propose a new "mixed" or "hybrid" VMSet implementation that is truly aware of both VMs and VMSS nodes as a more long-term fix.

We already do the same thing for VMSS: https://github.com/kubernetes-sigs/cloud-provider-azure/blob/master/pkg/provider/azure_vmss.go#L182-L188

There isn't a way to avoid this, unfortunately, if the node just joined the cluster and we've never fetched its corresponding VM/VMSS.
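For context, the linked VMSS lookup follows a "serve from cache, force a refresh on miss" shape. A standalone sketch of that pattern, with illustrative names standing in for azcache.AzureCacheReadType and the real cache lookup:

```go
package main

import (
	"errors"
	"fmt"
)

// readType mirrors the idea of azcache.AzureCacheReadType; these constants are
// illustrative, not the provider's actual identifiers.
type readType int

const (
	readUnsafe       readType = iota // serve possibly-stale cached data
	readForceRefresh                 // re-list from the Azure API before answering
)

var errNotFound = errors.New("instance not found")

// getVMSSVM tries a cheap, possibly-stale lookup first and only pays for a
// forced refresh when the node is missing, e.g. because it just joined the
// cluster and its VM has never been fetched.
func getVMSSVM(lookup func(name string, crt readType) (string, error), name string) (string, error) {
	vm, err := lookup(name, readUnsafe)
	if errors.Is(err, errNotFound) {
		return lookup(name, readForceRefresh)
	}
	return vm, err
}

func main() {
	// Fake lookup: the cache only knows about vmss-node-0 until it is refreshed.
	cached := map[string]string{"vmss-node-0": "vmss/0"}
	lookup := func(name string, crt readType) (string, error) {
		if crt == readForceRefresh {
			cached["vmss-node-1"] = "vmss/1" // simulate re-listing the scale set
		}
		if vm, ok := cached[name]; ok {
			return vm, nil
		}
		return "", errNotFound
	}
	fmt.Println(getVMSSVM(lookup, "vmss-node-1")) // vmss/1 <nil> after the forced refresh
}
```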

Contributor

Since we are doing the refresh below based on the diff between new nodes and known VMs, I am not sure that this change is required? If it isn't in the cache but the node is new, then a refresh is forced by https://github.com/kubernetes-sigs/cloud-provider-azure/pull/537/files#diff-e6715e02b5aa5f8b693bf3e14dd99dd9a160f0f0f1481df96891af24fc3a5fffR333

Contributor

may cause issues when a large number of nodes are deleted or newly created

This will only be an issue if the nodes come up in sequence, right? I.e. if a node is added every 5 minutes, we would have an API call every five minutes.

Do we have any data on how often that type of scenario occurs? Maybe during node autoscaler operations?

Member

Yeah, I think we can keep the cache read type unsafe as we are doing a force refresh if not found in the cache.

Contributor Author

Yeah, I think we can keep the cache read type unsafe as we are doing a force refresh if not found in the cache.

Correct, reverted that change

Member

Thanks for the discussion and for addressing the issues.

@jsturtevant
Contributor

I think this does solve the 15-minute delay on initial boot, or when the nodes are fetched right before a scale event, and it mitigates the extra API calls by only refreshing when the node is new to the cluster.

A couple of thoughts for the future:

  • Do we have any tests or data that measure the number of API calls, so we can tell whether a code change significantly increases them?
  • Is there a label on the node that allows us to determine whether it is VMSS or VMAS? Is that something worth considering?

@aramase
Member

aramase commented Mar 9, 2021

Is there a label on the node that allows us to determine whether it is VMSS or VMAS? Is that something worth considering?

Typically this is determined from the ProviderID field on the node. But if the node has not yet been registered, then the field is empty.
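For reference, a node's ProviderID can be inspected to distinguish the two VM kinds. The sketch below is an illustrative helper (not part of the provider) that relies on the standard Azure provider ID formats and reports "unknown" for an unregistered node with an empty ProviderID:

```go
package main

import (
	"fmt"
	"strings"
)

// isVMSSProviderID is an illustrative check based on the Azure provider ID
// format: VMSS instances carry a ".../virtualMachineScaleSets/<vmss>/virtualMachines/<instance>"
// resource path, while standalone (availability set) VMs point directly at
// ".../virtualMachines/<name>". An empty ProviderID (node not yet registered)
// cannot be classified, as noted above.
func isVMSSProviderID(providerID string) (isVMSS, known bool) {
	if providerID == "" {
		return false, false
	}
	return strings.Contains(strings.ToLower(providerID), "/virtualmachinescalesets/"), true
}

func main() {
	vmss := "azure:///subscriptions/sub/resourceGroups/rg/providers/Microsoft.Compute/virtualMachineScaleSets/agentpool/virtualMachines/0"
	vmas := "azure:///subscriptions/sub/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/control-plane-0"
	fmt.Println(isVMSSProviderID(vmss)) // true true
	fmt.Println(isVMSSProviderID(vmas)) // false true
	fmt.Println(isVMSSProviderID(""))   // false false
}
```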

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 9, 2021
@feiskyer (Member) left a comment

/hold cancel

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Mar 10, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon, feiskyer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2021
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.
