Fix AzureMachineTemplate roleAssignmentName validation #2672

Merged

Conversation

majimenez-stratio
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

Validation of roleAssignmentName is broken for AzureMachineTemplate when using SystemAssigned identity. The webhook rejects the template when the field is empty, even though an empty value is what a template is expected to have.

Special notes for your reviewer:

N/A

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Fix AzureMachineTemplate roleAssignmentName validation when SystemAssigned identity is used

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Sep 27, 2022
@linux-foundation-easycla

linux-foundation-easycla bot commented Sep 27, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: majimenez-stratio / name: Miguel Angel Jimenez (1b6d14c)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Sep 27, 2022
@k8s-ci-robot
Contributor

Welcome @majimenez-stratio!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-azure 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-azure has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 27, 2022
@k8s-ci-robot
Contributor

Hi @majimenez-stratio. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Sep 27, 2022
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Sep 27, 2022
@mboersma
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 27, 2022
@@ -56,11 +56,31 @@ func (r *AzureMachineTemplate) ValidateCreate(ctx context.Context, obj runtime.O
t := obj.(*AzureMachineTemplate)
spec := t.Spec.Template.Spec

allErrs := ValidateAzureMachineSpec(spec)
Contributor

Why are we replacing the existing pattern? Is it because we don't want to validate system-assigned identity similarly for AzureMachine and AzureMachineTemplate resources?

Contributor Author

@majimenez-stratio majimenez-stratio Sep 27, 2022

I'm not that familiar with the code, but it seems so, since AzureMachineTemplate enforces an empty value for roleAssignmentName. However, the validator for AzureMachine does exactly the opposite, since it tries to parse the value as a UUID in:

if _, err := uuid.Parse(newIdentity); err != nil {

An empty string is not valid, and as a result, the validation for AzureMachineTemplate can't pass if system-assigned identity is enabled.
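To make the failure mode concrete, here is a minimal stdlib-only sketch. `looksLikeGUID` is a hypothetical stand-in for the `uuid.Parse` call quoted above (it only accepts the canonical 8-4-4-4-12 hex form, which is an assumption made for illustration); the point is simply that an empty string can never pass a GUID-format check.

```go
package main

import (
	"fmt"
	"regexp"
)

// guidPattern matches the canonical 8-4-4-4-12 hexadecimal form only.
var guidPattern = regexp.MustCompile(`^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$`)

// looksLikeGUID is a hypothetical stand-in for uuid.Parse (assumption):
// it reports whether s has the canonical GUID shape.
func looksLikeGUID(s string) bool {
	return guidPattern.MatchString(s)
}

func main() {
	names := []string{"", "not-a-guid", "c158f21e-3b51-4e5c-8f4a-0d4f6f0e7a11"}
	for _, name := range names {
		fmt.Printf("%q valid=%v\n", name, looksLikeGUID(name))
	}
}
```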

Contributor

the validation for AzureMachineTemplate can't pass if system-assigned identity is enabled.

Can you please expand on why that's the case? The roleAssignmentName should be set to "" for templates as it's a GUID value that needs to be unique (so it can't be shared between all machines of a template). CAPZ will generate it for you on each AzureMachine so you shouldn't need to set it.

To enable system-assigned identity, simply set identity to SystemAssigned on the template, no need to set a roleAssignmentName.

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  name: ${CLUSTER_NAME}-md-0
  namespace: default
spec:
  template:
    spec:
      identity: SystemAssigned
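The per-machine generation described above could look roughly like the following sketch. It is not the CAPZ defaulting code: `machineSpec`, `newGUID`, and `defaultRoleAssignmentName` are hypothetical names, and the GUID is built from `crypto/rand` only to keep the example free of external dependencies.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// machineSpec is a simplified, hypothetical stand-in for the real spec type.
type machineSpec struct {
	Identity           string
	RoleAssignmentName string
}

// newGUID returns a random version-4 UUID string built from crypto/rand.
func newGUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// defaultRoleAssignmentName fills in a fresh GUID only when the identity is
// SystemAssigned and the user did not provide a name.
func defaultRoleAssignmentName(spec *machineSpec) error {
	if spec.Identity != "SystemAssigned" || spec.RoleAssignmentName != "" {
		return nil
	}
	name, err := newGUID()
	if err != nil {
		return err
	}
	spec.RoleAssignmentName = name
	return nil
}

func main() {
	// Two machines stamped from the same template each get their own name.
	first := machineSpec{Identity: "SystemAssigned"}
	second := machineSpec{Identity: "SystemAssigned"}
	for _, spec := range []*machineSpec{&first, &second} {
		if err := defaultRoleAssignmentName(spec); err != nil {
			panic(err)
		}
	}
	fmt.Println(first.RoleAssignmentName)
	fmt.Println(second.RoleAssignmentName)
}
```

Because the name is generated per machine, a template has nothing useful to put in the field, which is why it is expected to stay empty there.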

Contributor Author

@majimenez-stratio majimenez-stratio Sep 28, 2022

Sure, I'll try to walk through the case step by step by analyzing the current code (without this PR).

  1. As you point out, for AzureMachineTemplate, the roleAssignmentName should be left empty, as it should be generated for each AzureMachine. In fact, it's validated as such here:

if spec.RoleAssignmentName != "" {
	allErrs = append(allErrs,
		field.Invalid(field.NewPath("AzureMachineTemplate", "spec", "template", "spec", "roleAssignmentName"), t, AzureMachineTemplateRoleAssignmentNameMsg),
	)
}

  2. In addition to that, a call is made to validate the template spec itself, using the validation code for AzureMachine:

allErrs := ValidateAzureMachineSpec(spec)

  3. That call, in turn, calls ValidateSystemAssignedIdentity, which tries to parse the roleAssignmentName inside the template spec as a UUID. If the field is left empty, the validation fails here:

if _, err := uuid.Parse(newIdentity); err != nil {
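The three steps above can be modelled in a few lines. This is a simplified sketch, not the webhook code: the type, the function names, and the regular expression standing in for `uuid.Parse` are all assumptions made for illustration. It shows that, with SystemAssigned identity, no value of roleAssignmentName can satisfy both rules at once.

```go
package main

import (
	"fmt"
	"regexp"
)

// machineSpec is a simplified, hypothetical stand-in for the real spec type.
type machineSpec struct {
	Identity           string
	RoleAssignmentName string
}

// guidPattern stands in for uuid.Parse (assumption): canonical form only.
var guidPattern = regexp.MustCompile(`^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$`)

// validateSharedSpec mimics step 3: with SystemAssigned identity the name
// must parse as a GUID.
func validateSharedSpec(spec machineSpec) []string {
	var errs []string
	if spec.Identity == "SystemAssigned" && !guidPattern.MatchString(spec.RoleAssignmentName) {
		errs = append(errs, "roleAssignmentName must be a valid GUID")
	}
	return errs
}

// validateTemplate mimics steps 1 and 2: it reuses the shared validation
// and additionally requires the name to be empty.
func validateTemplate(spec machineSpec) []string {
	errs := validateSharedSpec(spec)
	if spec.RoleAssignmentName != "" {
		errs = append(errs, "roleAssignmentName must be empty on a template")
	}
	return errs
}

func main() {
	for _, name := range []string{"", "c158f21e-3b51-4e5c-8f4a-0d4f6f0e7a11"} {
		errs := validateTemplate(machineSpec{Identity: "SystemAssigned", RoleAssignmentName: name})
		fmt.Printf("roleAssignmentName=%q -> %d error(s): %v\n", name, len(errs), errs)
	}
}
```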

Contributor Author

To add more information on the issue, here is an example AzureMachineTemplate that uses SystemAssigned identity and does not specify roleAssignmentName:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  name: testma-control-plane
  namespace: default
spec:
  template:
    spec:
      identity: SystemAssigned
      dataDisks:
      - diskSizeGB: 256
        lun: 0
        nameSuffix: etcddisk
      osDisk:
        diskSizeGB: 128
        osType: Linux
      vmSize: Standard_D2s_v5

This is the error I got when applying it to the management cluster:

Error from server (Invalid): error when creating "testma.yaml": admission webhook "validation.azuremachinetemplate.infrastructure.cluster.x-k8s.io" denied the request: AzureMachineTemplate.infrastructure.cluster.x-k8s.io "testma-control-plane" is invalid: roleAssignmentName: Invalid value: "": Role assignment name must be a valid GUID. It is optional and will be auto-generated when not specified.

Contributor

Understood, thank you for providing more details. I'm looking at #2111, which last changed this logic, to see if this is a regression or if it never worked.

Contributor

Okay, so it looks like this is not a recent regression: the commit above has the same issue. I was able to reproduce the validation error on my end.

I'm not a huge fan of dropping ValidateAzureMachineSpec from the template validation altogether. The main issue with forking is that the code is now duplicated, so we risk forgetting to add new validations in both places.

There are a few alternatives we could go with to fix the issue:

  1. Don't make ValidateSystemAssignedIdentity fail when roleAssignmentName is "". Pro: this is a quick and easy fix. Con: AzureMachines are required to have a roleAssignmentName when identity is SystemAssigned, so the validation would no longer catch a missing one. This is mitigated by the fact that our defaulting webhook always defaults roleAssignmentName when it's empty and the identity type is SystemAssigned.
  2. Extract ValidateSystemAssignedIdentity from ValidateAzureMachineSpec and add it separately to AzureMachine's ValidateCreate, keeping the logic in azuremachinetemplate_webhook.go the same. This is equivalent to the change you are making now, but reversed (opt out of the one func that doesn't apply instead of explicitly opting in to all the others).
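A rough sketch of what alternative 2 amounts to, under stated assumptions: the types and function names below are simplified stand-ins, the `vmSize` check is only a placeholder for whatever checks both resources share, and the regular expression replaces `uuid.Parse`. It is not the commit that was eventually merged.

```go
package main

import (
	"fmt"
	"regexp"
)

// machineSpec is a simplified, hypothetical stand-in for the real spec type.
type machineSpec struct {
	VMSize             string
	Identity           string
	RoleAssignmentName string
}

// guidPattern stands in for uuid.Parse (assumption): canonical form only.
var guidPattern = regexp.MustCompile(`^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$`)

// validateSharedSpec holds the checks that apply to machines and templates
// alike. The vmSize check is only a placeholder for those checks.
func validateSharedSpec(spec machineSpec) []string {
	var errs []string
	if spec.VMSize == "" {
		errs = append(errs, "vmSize must be set")
	}
	return errs
}

// validateSystemAssignedIdentity is the check pulled out of the shared path.
func validateSystemAssignedIdentity(spec machineSpec) []string {
	if spec.Identity == "SystemAssigned" && !guidPattern.MatchString(spec.RoleAssignmentName) {
		return []string{"roleAssignmentName must be a valid GUID"}
	}
	return nil
}

// validateMachineCreate runs the shared checks plus the GUID check.
func validateMachineCreate(spec machineSpec) []string {
	return append(validateSharedSpec(spec), validateSystemAssignedIdentity(spec)...)
}

// validateTemplateCreate keeps its "must be empty" rule and never runs the
// GUID check, so the two rules can no longer contradict each other.
func validateTemplateCreate(spec machineSpec) []string {
	errs := validateSharedSpec(spec)
	if spec.RoleAssignmentName != "" {
		errs = append(errs, "roleAssignmentName must be empty on a template")
	}
	return errs
}

func main() {
	spec := machineSpec{VMSize: "Standard_D2s_v5", Identity: "SystemAssigned"}
	fmt.Println("template errors:", validateTemplateCreate(spec))
	fmt.Println("machine errors: ", validateMachineCreate(spec))
}
```

New shared checks only need to be added in one place, and the machine-only check stays independent of whether the defaulter has already run.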

Thoughts?

Contributor

If I understand it correctly, I vote for option 2. Let's optimize for code maintenance instead of webhook order-of-precedence tricks.

Contributor Author

Sure. I opted for this approach because I'm not familiar with the code and it looked like the safest solution. However, I see why you don’t like it, and I'm fine with either of your suggestions. Personally, I would go with option 2, since the code will be easier to understand when it doesn't depend on the defaulter.

Contributor Author

I pushed a new commit implementing option 2. Any suggestions?

@CecileRobertMichon
Contributor

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 28, 2022
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 29, 2022
@CecileRobertMichon
Contributor

lgtm

Could we add a unit test that validates the scenario this is fixing in https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/e022e4c2a50c97a7121e4054e8c2cb96a122b825/api/v1beta1/azuremachinetemplate_webhook_test.go? Thanks!

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 30, 2022
@majimenez-stratio
Contributor Author

majimenez-stratio commented Sep 30, 2022

New test added. Let me know if there's something else to do.

@CecileRobertMichon
Contributor

Looks great, thank you @majimenez-stratio!

Can you please just squash the commits? Thanks for finding and fixing this. I'm going to queue it up for cherry-pick.

/cherry-pick release-1.4
/cherry-pick release-1.5

@k8s-infra-cherrypick-robot

@CecileRobertMichon: once the present PR merges, I will cherry-pick it on top of release-1.4 in a new PR and assign it to you.


@majimenez-stratio
Contributor Author

Done!

@CecileRobertMichon
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 30, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 30, 2022
@k8s-ci-robot k8s-ci-robot merged commit f6f3a08 into kubernetes-sigs:main Sep 30, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.6 milestone Sep 30, 2022
@k8s-infra-cherrypick-robot

@CecileRobertMichon: new pull request created: #2690


@k8s-infra-cherrypick-robot

@CecileRobertMichon: new pull request created: #2691

