Make VM extension reconcile async and move VMSS extension into scaleset service #2177

Merged
merged 1 commit into from
Apr 21, 2022


Jont828
Contributor

@Jont828 Jont828 commented Mar 17, 2022

What type of PR is this?

/kind feature
What this PR does / why we need it: Implementation of an async service for VM extensions as a follow-up to #1610 and #1541.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2141

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Make VM extension reconcile async and move VMSS extension into scaleset service

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 17, 2022
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Mar 17, 2022
@Jont828
Contributor Author

Jont828 commented Mar 21, 2022

/assign @CecileRobertMichon

@Jont828 Jont828 force-pushed the async-vmextensions branch from d51ffa9 to b489ad5 Compare March 28, 2022 20:00
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 28, 2022
@Jont828 Jont828 force-pushed the async-vmextensions branch from b489ad5 to 179e48f Compare March 28, 2022 20:02
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 28, 2022
@Jont828 Jont828 force-pushed the async-vmextensions branch 3 times, most recently from f1281fe to 9891e80 Compare March 28, 2022 23:54
Settings: nil,
ProtectedSettings: s.ProtectedSettings,
},
// TODO: should we include location since it's used in VMExtensions too?
Contributor

Not sure why one has a location and the other doesn't. Is location required for VM extensions?

Contributor Author

The VMSS code created the parameters without a location, like this, while the VM extension code included one. I kept the original behavior since this is a refactor, but I wasn't sure whether it was a bug.

extensions[i] = compute.VirtualMachineScaleSetExtension{
	Name: &extensionSpec.Name,
	VirtualMachineScaleSetExtensionProperties: &compute.VirtualMachineScaleSetExtensionProperties{
		Publisher:          to.StringPtr(extensionSpec.Publisher),
		Type:               to.StringPtr(extensionSpec.Name),
		TypeHandlerVersion: to.StringPtr(extensionSpec.Version),
		Settings:           nil,
		ProtectedSettings:  extensionSpec.ProtectedSettings,
	},
}

Contributor

I'm not sure either. You could try either (1) removing Location from VM extensions or (2) adding Location to VMSS extensions, and see whether either breaks.

I agree it doesn't seem right that they aren't consistent, but it's possible that this is due to the Azure API between VM and VMSS not being consistent (if it's not, let's make it consistent).

Contributor Author

Sure, I can give that a try. Are VM/VMSS extensions covered in the e2e tests?

Contributor

Are VM/VMSS extensions covered in the e2e tests?

Yes, they are: we create an extension for every single VM and every single VMSS to check for kubeadm completion.

Contributor

From your other comment, it sounds like there is a discrepancy in the API, so we can just get rid of this comment?


// VMSSExtensionSpec defines the specification for a VM or VMScaleSet extension.
type VMSSExtensionSpec struct {
Name string
Contributor

Instead of duplicating fields here and in VMExtensionSpec (since they are the same), consider nesting the Azure extension spec in both:

type VMSSExtensionSpec struct {
    azure.ExtensionSpec
}
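A minimal sketch of the nesting suggested above, using simplified stand-in types rather than the real CAPZ definitions (`Describe` is a hypothetical helper added only for illustration): embedding declares the shared fields once, and both wrapper types then expose those fields, plus any methods declared on the shared type, as their own.

```go
package main

import "fmt"

// ExtensionSpec is a simplified stand-in for azure.ExtensionSpec; only a few
// representative fields are shown.
type ExtensionSpec struct {
	Name      string
	VMName    string
	Publisher string
	Version   string
}

// Describe is a hypothetical helper. Because it is declared on the shared
// type, it is promoted to every type that embeds ExtensionSpec.
func (s ExtensionSpec) Describe() string {
	return fmt.Sprintf("%s/%s@%s", s.Publisher, s.Name, s.Version)
}

// VMExtensionSpec embeds the shared spec instead of redeclaring its fields.
type VMExtensionSpec struct {
	ExtensionSpec
}

// VMSSExtensionSpec embeds the same shared spec.
type VMSSExtensionSpec struct {
	ExtensionSpec
}

func main() {
	shared := ExtensionSpec{Name: "example-ext", VMName: "vm-0", Publisher: "example.publisher", Version: "1.0"}

	vm := VMExtensionSpec{ExtensionSpec: shared}
	vmss := VMSSExtensionSpec{ExtensionSpec: shared}

	// Fields and methods of ExtensionSpec are reachable directly on both wrappers.
	fmt.Println(vm.Describe())
	fmt.Println(vmss.Name, vmss.Version)
}
```

Either wrapper can still add its own service-specific fields next to the embedded spec without touching the shared definition.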

azure/types.go Outdated
// ExtensionSpec defines the specification for a VM or VMScaleSet extension.
type ExtensionSpec struct {
// BootstrapingExtensionSpec defines the specification for a CAPZ Bootstrapping VM or VMSS extension.
type BootstrapingExtensionSpec struct {
Contributor

Why the rename? The fields here could be reused later on for any other type of extension; nothing in here is bootstrapping-specific AFAIK, so I think the generic naming is appropriate.

Contributor Author

@Jont828 Jont828 Mar 29, 2022

I renamed it because that struct is populated only by the GetBootstrappingVMExtension() function here, and I thought the name would make it clearer where it comes from. I also wanted to make it clear that the type comes from types.go and defaults.go rather than from the services. But if we want to use it more generically, we can change it back.

func GetBootstrappingVMExtension(osType string, cloud string, vmName string) *ExtensionSpec {

azure/types.go Outdated
@@ -96,7 +96,7 @@ type PrivateDNSLinkSpec struct {
LinkName string
}

// ExtensionSpec defines the specification for a VM or VMScaleSet extension.
// ExtensionSpec defines the specification for a CAPZ Bootstrapping VM or VMSS extension.
Contributor

nit: revert the comment about CAPZ Bootstrapping (the VMSS change is good to keep).


// VMExtensionSpec defines the specification for a VM or VMScaleSet extension.
type VMExtensionSpec struct {
Spec azure.ExtensionSpec
Contributor

This could just be:

Suggested change
Spec azure.ExtensionSpec
azure.ExtensionSpec

so you can refer to the fields directly, for example VMExtensionSpec.Publisher instead of VMExtensionSpec.Spec.Publisher.
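To make the difference concrete, here is a small sketch contrasting the two declarations from the suggested change, with a simplified stand-in for azure.ExtensionSpec (the `NamedSpec` and `EmbeddedSpec` names are hypothetical). With a named field every access goes through `.Spec`; with an embedded field the inner fields are promoted, while the explicit path through the type name still works.

```go
package main

import "fmt"

// ExtensionSpec is a simplified stand-in for azure.ExtensionSpec.
type ExtensionSpec struct {
	Name      string
	Publisher string
}

// NamedSpec mirrors the original diff: the spec is a named field.
type NamedSpec struct {
	Spec ExtensionSpec
}

// EmbeddedSpec mirrors the suggested change: the spec is embedded, so its
// fields are promoted onto the outer type.
type EmbeddedSpec struct {
	ExtensionSpec
}

func main() {
	base := ExtensionSpec{Name: "example-ext", Publisher: "example.publisher"}

	named := NamedSpec{Spec: base}
	embedded := EmbeddedSpec{ExtensionSpec: base}

	// The named field requires the explicit .Spec path.
	fmt.Println(named.Spec.Publisher)
	// The embedded field's members are promoted.
	fmt.Println(embedded.Publisher)
	// The embedded value can still be reached by its type name.
	fmt.Println(embedded.ExtensionSpec.Publisher)
}
```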

@Jont828
Contributor Author

Jont828 commented Apr 1, 2022

@CecileRobertMichon It looks like the reason we don't set a location in compute.VirtualMachineScaleSetExtension is that the type simply has no Location field, while compute.VirtualMachineExtension does. So the difference is built into the Azure SDK for Go types themselves.
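A minimal sketch of that asymmetry, using heavily reduced stand-in structs rather than the real compute package types (all names here are hypothetical): a single shared spec can feed both parameter builders, but only the VM builder has a Location field to populate.

```go
package main

import "fmt"

// vmExtension and vmssExtension are simplified stand-ins for
// compute.VirtualMachineExtension and compute.VirtualMachineScaleSetExtension,
// reduced to the fields relevant to the Location question.
type vmExtension struct {
	Name     *string
	Location *string // the VM extension type carries its own location
}

type vmssExtension struct {
	Name *string // no Location field to set
}

// extensionSpec is a hypothetical shared spec read by both builders.
type extensionSpec struct {
	Name     string
	Location string
}

func strPtr(s string) *string { return &s }

// vmParameters sets Location because the VM extension type has the field.
func vmParameters(s extensionSpec) vmExtension {
	return vmExtension{Name: strPtr(s.Name), Location: strPtr(s.Location)}
}

// vmssParameters leaves the location out: the scale set extension type has
// nowhere to put it, so the spec's Location is simply unused here.
func vmssParameters(s extensionSpec) vmssExtension {
	return vmssExtension{Name: strPtr(s.Name)}
}

func main() {
	spec := extensionSpec{Name: "example-ext", Location: "eastus"}
	fmt.Println(*vmParameters(spec).Location, *vmssParameters(spec).Name)
}
```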

@Jont828 Jont828 changed the title [WIP] Make VM extension reconcile async Make VM extension reconcile async Apr 4, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 4, 2022
@Jont828 Jont828 force-pushed the async-vmextensions branch from 6dac4e3 to 05653ab Compare April 4, 2022 21:03
@Jont828 Jont828 changed the title Make VM extension reconcile async Make VM extension reconcile async and refactor VMSS extension spec Apr 4, 2022
// check the extension status and set the associated conditions.
if retErr := s.Scope.SetBootstrapConditions(ctx, to.String(existing.ProvisioningState), extensionSpec.Name); retErr != nil {
if retErr := s.Scope.SetBootstrapConditions(ctx, to.String(vmextension.ProvisioningState), extensionSpec.ResourceName()); retErr != nil {
// TODO: what precedence should this error have?
Contributor

SetBootstrapConditions might need a bit of refactoring. Instead of taking the error and using it to set the condition, as we do for other resources, it takes the provisioning state of the extension, uses that to set the condition, and then returns an error if the extension is done creating. Can we change it to always run (even if resultErr is not nil) and use the error to determine what the condition should be set to instead, similar to what we're doing in UpdatePutStatus?

Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Overall looking good. There are a few outstanding TODOs that need to be resolved before this can merge, and it looks like you need to rebase due to a conflict.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 5, 2022
@Jont828 Jont828 force-pushed the async-vmextensions branch from e309bbb to 5d34df0 Compare April 6, 2022 22:16
@Jont828 Jont828 force-pushed the async-vmextensions branch from 5d34df0 to 0f35de5 Compare April 12, 2022 20:24
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 12, 2022
s.Scope.DeleteLongRunningOperationState(s.Scope.ScaleSetSpec().Name, serviceName)
// This also means that the VMSS extensions were successfully installed
s.Scope.UpdatePutStatus(infrav1.BootstrapSucceededCondition, serviceName, nil)
Contributor Author

@CecileRobertMichon I went ahead and added the UpdatePutStatus() call in the scaleset service. However, I'm not sure how we want to set the error message for VMSSExtensions. Do we want to just set it with whatever error comes from s.createVMSS(ctx), s.patchVMSSIfNeeded(ctx, fetchedVMSS), and s.getVirtualMachineScaleSetIfDone(ctx, future)? Or should we try to set specific errors caused by the VMSSExtension (and not necessarily the scaleset as a whole)?

Contributor

That's a good question... looks like right now we're not setting conditions at all for the scale set itself because it hasn't yet been refactored (it was implemented as async before the async proposal landed so it's not fully following the same patterns). I would say let's leave this for now and add a note to handle both VMSS and VMSS extension conditions when an error occurs.

@@ -43,7 +43,7 @@ const (
// WaitingForBootstrapDataReason used when machine is waiting for bootstrap data to be ready before proceeding.
WaitingForBootstrapDataReason = "WaitingForBootstrapData"
// BootstrapSucceededCondition reports the result of the execution of the boostrap data on the machine.
BootstrapSucceededCondition = "BoostrapSucceeded"
BootstrapSucceededCondition clusterv1.ConditionType = "BootstrapSucceeded"
Contributor

👍
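A short sketch of why the typed declaration helps, using a local stand-in for clusterv1.ConditionType (a named string type) and a hypothetical `markTrue` helper. Go already converts an untyped string constant implicitly where a ConditionType is expected, so the gain shows up with variables: one initialized from a typed constant is a ConditionType, while a plain string variable needs an explicit conversion. The diff also corrects the value's spelling from "BoostrapSucceeded" to "BootstrapSucceeded".

```go
package main

import "fmt"

// ConditionType mirrors the shape of clusterv1.ConditionType, which is a
// named string type.
type ConditionType string

// BootstrapSucceededCondition is declared with an explicit type, as in the
// new line of the diff.
const BootstrapSucceededCondition ConditionType = "BootstrapSucceeded"

// markTrue is a hypothetical helper that only accepts ConditionType values.
func markTrue(conditions map[ConditionType]bool, t ConditionType) {
	conditions[t] = true
}

func main() {
	conditions := map[ConditionType]bool{}

	// A variable initialized from the typed constant is a ConditionType, so it
	// can be passed along without conversion.
	cond := BootstrapSucceededCondition
	markTrue(conditions, cond)

	// A plain string variable needs an explicit conversion;
	// markTrue(conditions, s) would not compile.
	s := "BootstrapSucceeded"
	markTrue(conditions, ConditionType(s))

	fmt.Println(len(conditions), conditions[BootstrapSucceededCondition])
}
```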

Settings: nil,
ProtectedSettings: s.ProtectedSettings,
},
// TODO: should we include location since it's used in VMExtensions too?
Contributor

remove the TODO since you've determined location is not included in the API?

@CecileRobertMichon
Contributor

Build is failing with:

exp/controllers/azuremachinepool_reconciler.go:28:2: no required module provides package sigs.k8s.io/cluster-api-provider-azure/azure/services/vmssextensions; to add it:
	go get sigs.k8s.io/cluster-api-provider-azure/azure/services/vmssextensions

Contributor

@CecileRobertMichon CecileRobertMichon left a comment

/lgtm
/assign @mboersma

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 19, 2022
@jackfrancis
Contributor

Thanks @Jont828!

@Jont828 Jont828 force-pushed the async-vmextensions branch from faaed62 to bf7e23e Compare April 19, 2022 21:21
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 19, 2022
@Jont828 Jont828 force-pushed the async-vmextensions branch from bf7e23e to d7ad4a8 Compare April 20, 2022 22:13
@k8s-ci-robot
Contributor

@Jont828: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-cluster-api-provider-azure-apidiff | d7ad4a8 | link | false | /test pull-cluster-api-provider-azure-apidiff

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Jont828
Contributor Author

Jont828 commented Apr 21, 2022

@CecileRobertMichon Squashed and pushed, should be good to merge!

@Jont828 Jont828 changed the title Make VM extension reconcile async and refactor VMSS extension spec Make VM extension reconcile async and move VMSS extension into scaleset service Apr 21, 2022
@CecileRobertMichon
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 21, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 21, 2022
@k8s-ci-robot k8s-ci-robot merged commit 40d18ef into kubernetes-sigs:main Apr 21, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.3 milestone Apr 21, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

async vmextensions
5 participants