Add DisableExtensionOperations field #4792

Merged

Conversation

RadekManak
Contributor

@RadekManak RadekManak commented Apr 30, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:
This PR adds a new machine spec field, DisableExtensionOperations. When set to true, it ensures that no VMExtensions are configured on the machine by preventing users from adding values to the VMExtensions field in the machine template and by setting AllowExtensionOperations to false in the instance osProfile.
Setting DisableExtensionOperations also disables the bootstrap failure detection VMExtension.

The intent of this change is to provide a way to create VMs that use images without the capability of running VMExtensions. Currently, these machines get stuck waiting for the bootstrapping VMExtension to get installed.
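
To illustrate the behavior described above, here is a minimal sketch of how such a toggle could shape the resulting VM configuration. The types `MachineSpec` and `VMConfig`, the function `buildVMConfig`, and the extension name are hypothetical stand-ins, not the actual CAPZ types or code. The sketch also assumes that the bootstrapping extension is listed before user-supplied extensions and that AllowExtensionOperations is left unset when the toggle is off.

```go
package main

import "fmt"

// MachineSpec is a hypothetical, trimmed-down stand-in for the machine spec.
type MachineSpec struct {
	DisableExtensionOperations *bool
	VMExtensions               []string
}

// VMConfig is a hypothetical summary of what would be requested from Azure.
type VMConfig struct {
	// AllowExtensionOperations mirrors the osProfile setting; nil means
	// "not set" (an assumption of this sketch).
	AllowExtensionOperations *bool
	Extensions               []string
}

// bootstrapExtensionName is a made-up name for the bootstrap failure
// detection extension.
const bootstrapExtensionName = "bootstrap-failure-detection"

// buildVMConfig returns a configuration with no extensions at all when the
// toggle is set; otherwise it adds the bootstrapping extension followed by
// any user-supplied extensions.
func buildVMConfig(spec MachineSpec) VMConfig {
	if spec.DisableExtensionOperations != nil && *spec.DisableExtensionOperations {
		allow := false
		return VMConfig{AllowExtensionOperations: &allow}
	}
	exts := append([]string{bootstrapExtensionName}, spec.VMExtensions...)
	return VMConfig{Extensions: exts}
}

func main() {
	disabled := true
	off := buildVMConfig(MachineSpec{DisableExtensionOperations: &disabled})
	fmt.Println("extensions when disabled:", len(off.Extensions))

	on := buildVMConfig(MachineSpec{VMExtensions: []string{"custom"}})
	fmt.Println("extensions by default:", on.Extensions)
}
```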

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

  • cherry-pick candidate

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Add option DisableExtensionOperations to disable VMExtensions

/cc @damdo @JoelSpeed

@k8s-ci-robot k8s-ci-robot requested review from damdo and JoelSpeed April 30, 2024 13:42
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 30, 2024
@k8s-ci-robot
Contributor

Hi @RadekManak. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@JoelSpeed JoelSpeed left a comment

I'm wondering if we should make this about disabling the bootstrapping extension rather than disabling all extensions, since we can already disable extensions by just adding an empty list?

Going even further, is this actually a three-option case: no bootstrapping extension, the default bootstrapping extension, or a custom bootstrapping extension? (That is, if there's a difference between a bootstrapping extension and a regular extension, though I'm not convinced there is.)

Comment on lines 135 to 140
// Specifies whether extension operations should be disabled on the virtual machine. This may only be set to True when no
// extensions are configured on the virtual machine.
// +optional
DisableExtensionOperations *bool `json:"disableExtensionOperations,omitempty"`

Contributor

When there are VMExtensions provided, does the default extension still get added? I wonder if it would be better to disable the default extension rather than disable all so you don't have the awkward interaction between this field and the VMExtensions field

Contributor

Yes the default extension is always added even if no extensions are specified. I'm wondering if there is a specific use-case for just disabling extension operations temporarily while VMExtensions is specified. If that's not something you need, adding the option to disable will probably be enough.

@jackfrancis
Contributor

/assign @willie-yao


codecov bot commented Apr 30, 2024

Codecov Report

Attention: Patch coverage is 95.23810% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 62.06%. Comparing base (43e72a9) to head (087217a).
Report is 41 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| azure/scope/machine.go | 66.66% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4792      +/-   ##
==========================================
+ Coverage   62.01%   62.06%   +0.04%     
==========================================
  Files         201      201              
  Lines       16858    16922      +64     
==========================================
+ Hits        10455    10503      +48     
- Misses       5620     5634      +14     
- Partials      783      785       +2     


@willie-yao
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 1, 2024
allErrs := field.ErrorList{}

if ptr.Deref(disableExtensionOperations, false) && len(vmExtensions) > 0 {
	allErrs = append(allErrs, field.Forbidden(field.NewPath("AzureMachineTemplate", "spec", "template", "spec", "VMExtensions"), "VMExtensions must be empty when DisableExtensionOperations is true"))
Contributor

Judging by the logic here, it seems like this check is unnecessary. If the goal is to disable the default extension, a flag to explicitly do that would be better. That way, it's still possible to specify additional extensions without using the default one, and avoid confusion about the two conflicting fields.

Contributor Author

The intent of disableExtensionOperations is to disable all VMExtensions for the machine. This check is run in the webhook to reject the machine template. Setting disableExtensionOperations to true while also configuring VMExtensions would at best result in no VMExtensions being deployed (unexpected behavior) and at worst in the machine failing to provision.
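
As a rough illustration of the rejection rule discussed in this thread, the following standalone sketch mirrors the condition from the diff using only the standard library. The `VMExtension` type and `validateVMExtensions` function are hypothetical simplifications; the PR itself builds a `field.ErrorList` with `field.Forbidden`, which is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// VMExtension is a hypothetical, minimal stand-in for an extension entry.
type VMExtension struct {
	Name string
}

// validateVMExtensions rejects a template that both disables extension
// operations and lists extensions. A nil flag is treated as false, which is
// what ptr.Deref(disableExtensionOperations, false) does in the diff.
func validateVMExtensions(disableExtensionOperations *bool, vmExtensions []VMExtension) error {
	disabled := disableExtensionOperations != nil && *disableExtensionOperations
	if disabled && len(vmExtensions) > 0 {
		return errors.New("VMExtensions must be empty when DisableExtensionOperations is true")
	}
	return nil
}

func main() {
	disabled := true
	exts := []VMExtension{{Name: "custom"}}

	fmt.Println("conflicting settings:", validateVMExtensions(&disabled, exts))
	fmt.Println("flag only:", validateVMExtensions(&disabled, nil))
	fmt.Println("extensions only:", validateVMExtensions(nil, exts))
}
```

Rejecting the combination at admission time surfaces the conflict to the user immediately, rather than letting it show up later as a machine that fails to provision.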

@jackfrancis
Contributor

/test pull-cluster-api-provider-azure-e2e-optional

@RadekManak RadekManak force-pushed the disableExtensionOperations branch from 05357d1 to 266cb60 Compare May 3, 2024 11:36
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 3, 2024
@RadekManak RadekManak changed the title Add DisableExtensionOperations field Add DisableBootstrappingVMExtension field May 3, 2024
@RadekManak
Contributor Author

My original idea with DisableExtensionOperations was to have an explicit toggle, visible in both the machine spec and the Azure portal, indicating that the machine won't be running any VMExtensions. I can see that what we actually care about is being able to disable the bootstrapping extension, and that having this be just a side effect of that toggle can be confusing.

I have changed the PR to add DisableBootstrappingVMExtension instead. It controls only the creation of the bootstrapping VMExtension. We can address the problem of users adding VMExtensions that will fail by documenting that our images don't support them.

@RadekManak RadekManak force-pushed the disableExtensionOperations branch from 266cb60 to 8d4749a Compare May 3, 2024 12:11
@willie-yao
Contributor

My original idea with DisableExtensionOperations was to have an explicit toggle that the machine won't be running any VMExtensions that is visible in both the machine spec and Azure portal.

Ah, that makes sense. It might be good to log a warning in the webhook if users do specify VMExtensions while also disabling the bootstrapping extension, since disabling it probably means the VM does not support extensions.

@patrickdillon

I was investigating this issue independently; so was happy to discover this PR. As is, the PR solves the problem we have with RHCOS being unable to run extensions. LGTM.

At risk of bikeshedding, if there is a chance that future extensions could be added by default, it might be better to relabel this DisableDefaultVMExtensions so that this represents disabling all default VM extensions rather than is specific to the bootstrap extension, which is the only default extension. That said, it seems unlikely to me that more extensions would be added.

@RadekManak
Contributor Author

I don't think there will be more default VMExtensions in the future. I've kept it as is, unless maintainers disagree, then I can change it.

@patrickdillon

I don't think there will be more default VMExtensions in the future. I've kept it as is, unless maintainers disagree, then I can change it.

+1. This LGTM.

// Its role is to detect and report Kubernetes bootstrap failure or success.
// Use this setting only if VMExtensions are not supported by your image.
// +optional
DisableBootstrappingVMExtension *bool `json:"disableBootstrappingVMExtension,omitempty"`
Contributor

@RadekManak sorry for the confusion, I was following this effort from the sidelines but didn't make the effort to comment. I actually preferred the previous "disable all VM extensions" option. I think it addresses your problem more clearly, and it would help other users in a similar scenario where VM extensions are not supported.

Disabling this particular extension doesn't make a lot of sense, as you lose an explicit part of the CAPI bootstrapping contract guarantee. But if your CAPZ VM scenario doesn't allow extensions, it does make sense that we would want to ship a VM configuration with no extensions in it.

IMO

We'll prioritize merging this change w/ the previous implementation, thank you so much for this and for your patience!

@RadekManak RadekManak force-pushed the disableExtensionOperations branch from 1860d2f to 50d4e21 Compare May 10, 2024 11:39
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 10, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 10, 2024
@RadekManak RadekManak changed the title Add DisableBootstrappingVMExtension field Add DisableExtensionOperations field May 10, 2024
@RadekManak
Contributor Author

Changed it back to DisableExtensionOperations and added a note explaining that it should only be used when VMExtensions are not supported by the image and that it disables the bootstrapping extension.

@willie-yao
Contributor

/retest

Contributor

@willie-yao willie-yao left a comment

Looks good pending one comment!

// Use this setting only if VMExtensions are not supported by your image, as it disables CAPZ bootstrapping extension used for detecting Kubernetes bootstrap failure.
// This may only be set to True when no extensions are configured on the virtual machine.
// +optional
DisableExtensionOperations *bool `json:"disableExtensionOperations,omitempty"`
Contributor

I'm thinking this field should be immutable since it doesn't make too much sense to disable extensions after the VM is already provisioned. Can you add an immutability check in ValidateUpdate? https://github.com/willie-yao/cluster-api-provider-azure/blob/5243ae7d5dae9483280e73da2904a3568fbf8839/api/v1beta1/azuremachine_webhook.go#L77

Contributor Author

Thanks for pointing this out. I thought that the entire spec was immutable and did not check. Fixed.
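
For readers unfamiliar with this kind of check, here is a minimal sketch of an immutability comparison for an optional boolean field on update. The helpers `boolPtrEqual` and `validateImmutable` are hypothetical and use only the standard library; the PR uses `webhookutils.ValidateImmutable`, whose internals are not shown in this conversation. Note that this sketch treats a change from unset (nil) to an explicit false as a modification, which is an assumption rather than confirmed behavior.

```go
package main

import "fmt"

// boolPtrEqual reports whether two optional booleans are identical,
// distinguishing "unset" (nil) from an explicit value.
func boolPtrEqual(a, b *bool) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// validateImmutable returns an error when the value at the given JSON path
// differs between the old and the new object.
func validateImmutable(path string, oldVal, newVal *bool) error {
	if !boolPtrEqual(oldVal, newVal) {
		return fmt.Errorf("%s: field is immutable", path)
	}
	return nil
}

func main() {
	const path = "spec.disableExtensionOperations"
	before, after := false, true

	fmt.Println("unchanged:", validateImmutable(path, &before, &before))
	fmt.Println("changed:", validateImmutable(path, &before, &after))
}
```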

@RadekManak RadekManak force-pushed the disableExtensionOperations branch 2 times, most recently from 32674b2 to ca5e9c1 Compare May 14, 2024 10:05
This option, when set to true, ensures that no VMExtensions are configured on the
machine by preventing users from adding values to the VMExtensions field in the machine template,
disabling the bootstrapping VMExtension, and setting AllowExtensionOperations to false
in the instance osProfile.
@RadekManak RadekManak force-pushed the disableExtensionOperations branch from ca5e9c1 to 087217a Compare May 14, 2024 10:09
@@ -213,6 +213,13 @@ func (mw *azureMachineWebhook) ValidateUpdate(ctx context.Context, oldObj, newOb
allErrs = append(allErrs, err)
}

if err := webhookutils.ValidateImmutable(
field.NewPath("spec", "disableExtensionOperations"),
Contributor

Suggested change
field.NewPath("spec", "disableExtensionOperations"),
field.NewPath("spec", "DisableExtensionOperations"),

This should also be fixed for capacityReservationGroupID on line 210. Sorry for not catching this in the previous PR!

Contributor Author

Changed to use the capitalized variant. capacityReservationGroupID should be fixed in its own PR.

Contributor

Why with a capital? This breaks Kube API conventions. The path is supposed to use the JSON serialisation of the field name, so camelCase is correct.

Adding DisableExtensionOperations with a capital means the error that comes back prints an incorrect JSON path

The capacityReservationGroupID is correct and should not be changed

Contributor

Hmm you're definitely correct. I'm seeing some inconsistencies in the use of field.NewPath() in CAPZ and that should definitely be fixed. Apologies for the confusion there!

Contributor Author

This PR is changed back to camelCase.
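
To make the naming point in this thread concrete, the following sketch shows that the serialized key comes from the struct's JSON tag, so a path reported back to the user should use the same camelCase spelling. The `Machine` and `Spec` types and the `fieldPath` helper are hypothetical; `fieldPath` only imitates a dotted path and is not the `field.NewPath` implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Spec is a hypothetical, minimal spec carrying only the field in question.
type Spec struct {
	DisableExtensionOperations *bool `json:"disableExtensionOperations,omitempty"`
}

// Machine is a hypothetical wrapper so the output contains a "spec" key.
type Machine struct {
	Spec Spec `json:"spec"`
}

// serialize returns the JSON form of the object that a user would see.
func serialize(m Machine) (string, error) {
	out, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

// fieldPath joins path elements with dots, imitating how a validation error
// might point at a field.
func fieldPath(elems ...string) string {
	return strings.Join(elems, ".")
}

func main() {
	enabled := true
	doc, err := serialize(Machine{Spec: Spec{DisableExtensionOperations: &enabled}})
	if err != nil {
		panic(err)
	}
	fmt.Println("serialized object:", doc)
	fmt.Println("matching error path:", fieldPath("spec", "disableExtensionOperations"))
}
```

A path spelled with the Go field name (capitalized) would not match any key in the serialized object, which is why the camelCase form is the one users can act on.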

@RadekManak RadekManak force-pushed the disableExtensionOperations branch from 087217a to 20e053a Compare May 16, 2024 08:17
@RadekManak RadekManak force-pushed the disableExtensionOperations branch from 20e053a to 087217a Compare May 21, 2024 08:10
@willie-yao
Contributor

/retest

Contributor

@willie-yao willie-yao left a comment

/lgtm
cc @jackfrancis for approval

@k8s-ci-robot
Contributor

@willie-yao: GitHub didn't allow me to request PR reviews from the following users: for, approval.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/lgtm
/cc @jackfrancis for approval

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 21, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: a04d0a356a205765a7281ea2b4b83cb2bf421bc8

@jackfrancis
Contributor

/approve

Thank you @RadekManak @JoelSpeed!

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2024
@k8s-ci-robot k8s-ci-robot merged commit 3c61283 into kubernetes-sigs:main May 21, 2024
23 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Archived in project

7 participants