Errors for azure-wi-webhook-controller-manager pods on install: User \"system:serviceaccount:azure-workload-identity-system:azure-wi-webhook-admin\" cannot update resource \"mutatingwebhookconfigurations\" in API group \"admissionregistration.k8s.io\" at the cluster scope: Azure does not have opinion for this user. #856

Closed
VioletHynes opened this issue Apr 19, 2023 · 6 comments
Labels
bug Something isn't working

VioletHynes commented Apr 19, 2023

Describe the bug

Hi there! I'm trying to set up a WIF enabled cluster. Here are all of the steps I've done so far:

  • Created my AKS cluster
  • Enabled WIF using az feature show --namespace "Microsoft.ContainerService" --name "EnableWorkloadIdentityPreview" and az provider register --namespace Microsoft.ContainerService (once it's registered)
  • Enabled WIF for my cluster: az aks update -g wif-test_group -n wif-test --enable-workload-identity
  • Enabled an OIDC issuer: az aks update -g wif-test_group -n wif-test --enable-oidc-issuer
  • Created an identity, created a service account with the right client_id, and ran az identity federated-credential create for it
  • Created a pod in K8S with the above identity as its service account (and the WIF annotation on both)

The above completed without issue (though the documentation was fairly scattered). The only step I think I'm missing is installing the webhook.

When I install the webhook through either of the approaches indicated here: https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html - the two pods in the azure-workload-identity-system namespace fail with the error in the title and do not seem to be injecting anything into my annotated pods.

This has been a clean install each time, and I've made sure to clean up each time.

I found a similar issue here: #777 but reinstallation doesn't fix it for me. I've tried many times to reinstall and always get this issue.

There could be something I'm missing, but this is a fresh workload-identity-webhook install on a fairly fresh (created last week) AKS cluster, so I kind of would expect this to 'just work' since this is meant to be the new way to do things. If there is something I'm missing, do let me know!

Steps To Reproduce

Install WIF admissions webhook using either of the approaches outlined here: https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html

I'm using the latest Helm chart (I've done helm repo update) on an AKS cluster I made last week.

Expected behavior

I shouldn't get errors when installing WIF into an AKS environment.

Logs

{"level":"error","timestamp":"2023-04-19T18:57:18.292265Z","caller":"/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:326$controller.(*Controller).reconcileHandler","message":"Reconciler error","controller":"cert-rotator","object":{"name":"azure-wi-webhook-server-cert","namespace":"azure-workload-identity-system"},"namespace":"azure-workload-identity-system","name":"azure-wi-webhook-server-cert","reconcileID":"b1c518a5-15f6-4e17-a16c-89809d2645ac","error":"mutatingwebhookconfigurations.admissionregistration.k8s.io \"azure-wi-webhook-mutating-webhook-configuration\" is forbidden: User \"system:serviceaccount:azure-workload-identity-system:azure-wi-webhook-admin\" cannot update resource \"mutatingwebhookconfigurations\" in API group \"admissionregistration.k8s.io\" at the cluster scope: Azure does not have opinion for this user."}
{"level":"info","timestamp":"2023-04-19T19:00:02.133676Z","logger":"cert-rotation","caller":"/go/pkg/mod/github.com/open-policy-agent/[email protected]/pkg/rotator/rotator.go:722$rotator.(*ReconcileWH).ensureCerts","message":"Ensuring CA cert","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration"}
{"level":"error","timestamp":"2023-04-19T19:00:02.209346Z","logger":"cert-rotation","caller":"/go/pkg/mod/github.com/open-policy-agent/[email protected]/pkg/rotator/rotator.go:729$rotator.(*ReconcileWH).ensureCerts","message":"Error updating webhook with certificate","name":"azure-wi-webhook-mutating-webhook-configuration","gvk":"admissionregistration.k8s.io/v1, Kind=MutatingWebhookConfiguration","error":"mutatingwebhookconfigurations.admissionregistration.k8s.io \"azure-wi-webhook-mutating-webhook-configuration\" is forbidden: User \"system:serviceaccount:azure-workload-identity-system:azure-wi-webhook-admin\" cannot update resource \"mutatingwebhookconfigurations\" in API group \"admissionregistration.k8s.io\" at the cluster scope: Azure does not have opinion for this user."}
{"level":"error","timestamp":"2023-04-19T19:00:02.209888Z","caller":"/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:326$controller.(*Controller).reconcileHandler","message":"Reconciler error","controller":"cert-rotator","object":{"name":"azure-wi-webhook-server-cert","namespace":"azure-workload-identity-system"},"namespace":"azure-workload-identity-system","name":"azure-wi-webhook-server-cert","reconcileID":"ba47bcab-db12-484a-8021-4a81848eaa09","error":"mutatingwebhookconfigurations.admissionregistration.k8s.io \"azure-wi-webhook-mutating-webhook-configuration\" is forbidden: User \"system:serviceaccount:azure-workload-identity-system:azure-wi-webhook-admin\" cannot update resource \"mutatingwebhookconfigurations\" in API group \"admissionregistration.k8s.io\" at the cluster scope: Azure does not have opinion for this user."}

Environment

AKS

  • Kubernetes version (use kubectl version): Client Version: v1.26.3, Kustomize Version: v4.5.7, Server Version: v1.24.10
  • Cloud provider or hardware configuration: AKS
  • OS (e.g: cat /etc/os-release): Mac
  • Kernel (e.g. uname -a): Darwin
  • Install tools: Helm/kubectl (recommended ways)
  • Network plugin and version (if this is a network-related bug): N/A
  • Others:

Additional context

I'm looking to get my environment working so I can test a change that adds WIF authentication support to Vault Agent.

VioletHynes added the bug label Apr 19, 2023
Member

aramase commented Apr 19, 2023

> There could be something I'm missing, but this is a fresh workload-identity-webhook install on a fairly fresh (created last week) AKS cluster, so I kind of would expect this to 'just work' since this is meant to be the new way to do things. If there is something I'm missing, do let me know!

If you're enabling the add-on with --enable-workload-identity, you don't have to install the webhook again from this repo. The add-on is a managed version of this project, and when you pass --enable-workload-identity, AKS deploys the webhook in the kube-system namespace.
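A quick way to see where webhook pods are actually running is to search every namespace for them. The sketch below is only an illustration: the helper names are hypothetical, it assumes kubectl is already pointed at the cluster, and it assumes the managed add-on uses the same azure-wi-webhook pod name prefix that appears in this issue's title.

```shell
# List every webhook pod in the cluster as "namespace/pod-name".
# Assumption: both the add-on and the Helm chart name their pods
# with the "azure-wi-webhook" prefix seen in this issue.
list_wi_webhook_pods() {
  kubectl get pods --all-namespaces --no-headers 2>/dev/null \
    | awk '$2 ~ /^azure-wi-webhook/ { print $1 "/" $2 }'
}

# Print the distinct namespaces that contain a webhook pod.
# More than one line of output suggests two installs coexist.
wi_webhook_namespaces() {
  list_wi_webhook_pods | cut -d/ -f1 | sort -u
}
```

Calling wi_webhook_namespaces on a cluster in the state described here would be expected to list both kube-system and azure-workload-identity-system.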

Author

VioletHynes commented Apr 19, 2023

Ah. The WIF troubleshooting documentation suggested I debug in the azure-workload-identity-system namespace, which wasn't populated at all: https://azure.github.io/azure-workload-identity/docs/troubleshooting.html - the other documentation (e.g. https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster) doesn't really mention troubleshooting steps or that it installs these resources or where.

I'm not entirely convinced that the one that was installed by default is working right now either, but now that I know where it is, I can at the very least look at the logs a bit and understand why.

Would you suggest that I uninstall the helm chart, and should the one installed by --enable-workload-identity be good enough?

Member

aramase commented Apr 19, 2023

> Ah. The WIF troubleshooting documentation suggested I debug in the azure-workload-identity-system namespace, which wasn't populated at all: https://azure.github.io/azure-workload-identity/docs/troubleshooting.html - the other documentation (e.g. https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster) doesn't really mention troubleshooting steps or that it installs these resources or where.

The troubleshooting docs in this repo are specific to the helm chart installation. Thanks for pointing out the missing section in the AKS docs. There is room for improvement here.

> Would you suggest that I uninstall the helm chart, and should the one installed by --enable-workload-identity be good enough?

Yes, you can uninstall the helm chart. Only a single instance of the webhook is required.
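For anyone following along, removing the chart could look like the sketch below. The function name is hypothetical, and the default release name and namespace are assumptions taken from the project's Helm install instructions; pass your own values if you installed under different names.

```shell
# Remove the self-installed (open source) webhook so only the add-on remains.
# Assumed defaults: release "workload-identity-webhook" in the
# "azure-workload-identity-system" namespace.
uninstall_oss_webhook() {
  local release="${1:-workload-identity-webhook}"
  local namespace="${2:-azure-workload-identity-system}"
  helm uninstall "$release" --namespace "$namespace"
}
```

Running helm list --all-namespaces first shows the actual release name if you are unsure which one was used.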

Author

VioletHynes commented Apr 19, 2023

It has been quite confusing to troubleshoot WIF, and those were the only troubleshooting docs I could find (e.g. they're the top Google result for "workload identity federation troubleshooting azure"). It might be helpful for those docs to also point to the troubleshooting docs for WIF installs that don't use the helm chart, or to include some information that isn't specific to the helm chart installation. Nothing in the docs that I can see indicates they don't apply to AKS.

The only other docs I've seen (as part of the error message I get when the request to http://169.254.169.254/metadata/identity/oauth2/token fails) is this one, which doesn't mention WIF at all: https://aka.ms/azsdk/go/identity/troubleshoot#managed-id

Thanks for the help, though! I appreciate the help and information greatly. Feel free to close this. I'm still surprised the resource errored in the way it did on a fresh install, but ultimately I'm not blocked by it any more.

Member

aramase commented Apr 19, 2023

> It has been quite confusing to troubleshoot WIF and those were the only troubleshooting docs I could find (e.g. they're the top google result for "workload identity federation troubleshooting azure") - it might be helpful to also note in the troubleshooting docs where to find the troubleshooting docs for WIF that doesn't use the helm chart, or some information that's not specific to the helm chart information.

There should be a separate troubleshooting guide in the AKS docs. @miwithro @karataliu could you track this?

> The only other docs I've seen (as part of the error message I get when the request to http://169.254.169.254/metadata/identity/oauth2/token fails) is this one, which doesn't mention WIF at all: https://aka.ms/azsdk/go/identity/troubleshoot#managed-id

This means the workload is using an old SDK version that still relies on IMDS to get a managed identity token. The minimum SDK versions required for workload identity are listed here: https://azure.github.io/azure-workload-identity/docs/topics/language-specific-examples/azure-identity-sdk.html.
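Before upgrading the SDK, it can also be worth confirming that the webhook mutated the pod at all. The sketch below assumes the webhook injects the four environment variables named in the loop (this is how the project documents its mutation, but verify against your version) and that the workload image ships a shell with printenv. The function name is hypothetical.

```shell
# Report which of the variables normally injected by the workload identity
# webhook are missing from the current environment.
# Meant to be run inside the workload container.
missing_wi_env() {
  local name missing=""
  for name in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_FEDERATED_TOKEN_FILE AZURE_AUTHORITY_HOST; do
    # printenv exits non-zero when the variable is not set
    if ! printenv "$name" >/dev/null; then
      missing="$missing $name"
    fi
  done
  # Strip the leading space; empty output means every variable is present.
  printf '%s\n' "${missing# }"
}
```

If all four names are reported missing, the pod was probably never mutated, which points at the webhook or the pod's labels and annotations rather than at the SDK.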

> I'm still surprised the resource errored in the way it did on a fresh install

The service account permission error could be caused by running multiple instances of the webhook. Just enabling the add-on with --enable-workload-identity shouldn't produce any errors.

aramase closed this as not planned Apr 21, 2023
Contributor

To clarify, there are two ways you can enable workload identity on AKS:

  1. Using the AKS integration (https://aka.ms/aks/wi): this installs the webhook deployment in the kube-system namespace.
  2. Using the open source version (see https://github.com/Azure/azure-workload-identity/tree/main/charts/workload-identity-webhook): by default this installs the webhook deployment in the azure-workload-identity-system namespace.

Using both together will result in a conflict.

The cause of the issue here is a non-namespaced resource, the clusterrolebinding. After you install the AKS version, it points to the serviceaccount in the kube-system namespace. When you then install the open source version, the binding is temporarily changed to point to the azure-workload-identity-system namespace. But the AKS integration keeps reverting it to the kube-system namespace. Thus the pods in the azure-workload-identity-system namespace report errors.

The suggestion here is to choose only one of the solutions (AKS integration or open source).
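One way to observe the conflict described above is to ask the API server which copy of the service account currently holds the permission. This is a sketch with hypothetical helper names: it assumes the add-on uses the same azure-wi-webhook-admin account name as the one in the error message, and that your own user is allowed to impersonate service accounts.

```shell
# Ask the API server whether the webhook service account in the given
# namespace may update MutatingWebhookConfigurations. Prints "yes" or "no".
# The account name azure-wi-webhook-admin comes from the error in this issue.
can_update_webhook_config() {
  local namespace="$1"
  kubectl auth can-i update mutatingwebhookconfigurations \
    --as "system:serviceaccount:${namespace}:azure-wi-webhook-admin"
}

# Show the cluster role bindings that mention the webhook, including the
# service accounts they point at (the wide output lists namespace/name).
show_wi_bindings() {
  kubectl get clusterrolebindings -o wide | grep azure-wi-webhook
}
```

With both installs present, can_update_webhook_config kube-system would be expected to print yes while can_update_webhook_config azure-workload-identity-system prints no, matching the forbidden error in the logs.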
