-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(azure): add support for workload identity using azidentity #3906
feat(azure): add support for workload identity using azidentity #3906
Conversation
Signed-off-by: Igor Beliakov <[email protected]>
Hi @weisdd. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
{ | ||
"subscriptionId": "$(az account show --query id -o tsv)", | ||
"resourceGroup": "$AZURE_DNS_ZONE_RESOURCE_GROUP", | ||
"useWorkloadIdentityExtension": true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a pro at Azure, but wonder if it is possible to avoid this mount..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the PR creator, but as long as Azure-workload-identity is installed in the Cluster, the Mutating Admission Webhook will take care of the projection based on just the podLabels
& the service Account annotations.
So, perhaps the secretConfiguration
section in the helm values can be done away with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, this is what I am thinking about
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the mutating webhook inserts environment variables that help to get an access token:
Whereas Subscription ID (subscriptionId
) and Resource Group (resourceGroup
) are needed for Azure provider (in-tree plugin in external-dns) to understand where a DNS zone is maintained. It's used to construct proper API calls against ARM.
As for the last option called useWorkloadIdentityExtension
, it's needed to instruct provider which authentication method to use. Strictly speaking, azidentity offers DefaultAzureCredential function that can automatically detect authentication method:
https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#section-readme
But I don't think it should get enabled under this PR. So, this option is also required, at least for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I think DefaultAzureCredential should always be preferred and if it had been in most projects azwi would've been easier to get going. But I don't know enough about this project to judge if that change should or should not be made in this pr or as it's own change so if you think that's not appropriate you probably know better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weisdd I don't have a strong opinion here as I haven't used DefaultAzureCredential
in the past and I'm not familiar with the tradeoffs that come with it. Can you please clarify pros and cons of the two approaches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Raffo Here's what I can think of:
- Pros:
- Easy-to-use;
- No need to explicitly maintain configuration settings related to any of the authentication methods (e.g. when I looked into how to add support for Workload Identity to FluxCD, it was a matter of bumping deps in those controllers that relied on
DefaultAzureCredential
);
- Cons:
- If your environment satisfies multiple authentication methods, you can't directly override the order of preference;
- You can't use it in a complex multi-tenant scenario when a token is not mounted, but referenced (multiple MSIs delegate access to different k8s service-accounts, so the app needs to have a more complex flow for getting tokens. It's used it external-secrets operator, docs).
I think none of the cons are applicable to external-dns at this point:
- A container environment is unlikely to satisfy multiple authentication methods (why would anyone configure all of them?);
- external-dns currently works with just one resource group (so, you don't have complex multi-tenant setups in the same instance).
Order of preference
DefaultAzureCredential
has this order of preference when it come to authentication methods:
- Environment -
DefaultAzureCredential
will read account information specified via environment variables and use it to authenticate. - Workload Identity - If the app is deployed on Kubernetes with environment variables set by the workload identity webhook,
DefaultAzureCredential
will authenticate the configured identity. - Managed Identity - If the app is deployed to an Azure host with managed identity enabled,
DefaultAzureCredential
will authenticate with it. - Azure CLI - If a user or service principal has authenticated via the Azure CLI
az login
command,DefaultAzureCredential
will authenticate that identity.
Source: https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#section-readme
I think this order is well thought-out and is unlikely to cause any issues:
- Managed Identity will soon be gone (or is already gone) in most clusters;
- Azure CLI is quite heavy for a container image and will only be pre-installed in local development environments. Devs can make the library use an SPN through environment variables if there's any need for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the context, this is very helpful. If the switch to DefaultAzureCredential
is trivial, then I wouldn't be opposed in having this in this PR. It would align us with other controllers and make this feature easier to adopt. At the end of the day, we are adding a new functionality, so it is not bad to make now the change that we need to make.
The only real consideration that I want to make is that I want to cut a release soon and I'm waiting for this one to merge. It's fine to do so, but if adding DefaultAzureCredential
means waiting a lot of time or extensive testing due to lack of confidence in the change, then I'd prefer having this one in and handle the change in a follow up PR. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Raffo At a first glance, code-wise, migration to DefaultAzureCredentialt
would be simple, something along the lines:
func getCredentials(cfg config) (azcore.TokenCredential, error) {
cloudCfg, err := getCloudConfiguration(cfg.Cloud)
if err != nil {
return nil, fmt.Errorf("failed to get cloud configuration: %w", err)
}
opts := azidentity.DefaultAzureCredentialOptions{
ClientOptions: azcore.ClientOptions{
Cloud: cloudCfg,
},
}
cred, err := azidentity.NewDefaultAzureCredential(&opts)
if err != nil {
return nil, fmt.Errorf("failed to obtain a token: %w", err)
}
return cred, nil
}
(+ remove old configuration fields here and there)
Testing is unlikely to be complex.
When it comes to documentation, it'll be more time-consuming:
- We'll need to add some references to the docs for the new authentication logic and highlight important pieces there;
- I think there are plenty of things to be simplified in the docs (including the part that I prepared in this PR), and we should probably share some code snippets (e.g. for terraform).
So, I'd prefer to go with the second option you suggested - to merge the current PR and then handle other changes in a follow-up PR. If I don't get busy at work, I'll do it this week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me, thank you for your careful explanation and to be mindful to separate work appropriately, that is very appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR looks good to me and I am very thankful for the amount of documentation that is included.
What I request the community is to provide feedback on the instructions as I don't have an immediate way to test (I don't have an azure cluster ready, although I should be able to create one if someone can't provide information).
I'm also gonna ask the Azure folks what they think about this so that we can get some direct cloud provider feedback.
I'll keep an eye on this so it won't be delayed long.
{ | ||
"subscriptionId": "$(az account show --query id -o tsv)", | ||
"resourceGroup": "$AZURE_DNS_ZONE_RESOURCE_GROUP", | ||
"useWorkloadIdentityExtension": true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weisdd I don't have a strong opinion here as I haven't used DefaultAzureCredential
in the past and I'm not familiar with the tradeoffs that come with it. Can you please clarify pros and cons of the two approaches?
@Raffo Thanks for your time and the review! Important context for the documentation:
I'll be happy to improve docs should any suggestions arise from community or Microsoft :) |
/ok-to-test |
As one Azure community member, what I know is that the Azure has finally hard-deprecated the pod identity mechanism, so that's not an option for use with external-dns. With this not yet landed, the only option is to create, manage, and inject your own special purpose service principal. Not impossible, but not very good either. I don't speak for MSFT/Azure but usage wise, it's pretty clear they created this functionality specifically so you wouldn't have to do that (and not just for external-dns). Personally, I appreciate the constant consideration of backward compatibility, but external factors have broken this use case pretty thoroughly, and a release with perhaps some special handling/mention in changelog would be in order. Only one opinion. |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Raffo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
'{"spec": {"template": {"metadata": {"labels": {\"azure.workload.identity/use\": \"true\"}}}}}' | ||
``` | ||
|
||
NOTE: it's also possible to specify (or override) ClientID through `UserAssignedIdentityID` field in `azure.json`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: according this line, I kinda understand that "if I do not set the ENV variables right through annotations, I can override/specify the clientId through the value in azure.json"
However, here, it feels it is the opposite: if the azure.json does not have the values, we default to the ENV variables.
I tried setting all the values via the azure.json and not using annotations but, the clientId is not fetching properly. May there be a problem in the config somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the code, it seems the property to set is ClientID
instead of UserAssignedIdentityID
:
external-dns/provider/azure/config.go
Line 110 in 859892f
ClientID: cfg.ClientID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, my bad, we should correct either docs or code. Sorry for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reply, I think this provides quite some insights on why it didn't work. So... which of both options do you prefer? Updating docs or the code?
I don't mind opening a PR, but I don't want to make a decision without an agreement on what we want.
From my side I wouldn't mind opening a PR to update the docs. Also I can test that ClientID
and TenantID
work as expected before reflecting that in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do encounter the same issue. For me it seems like the default way with annotation of client-id and pod label is not working. I get the error message that the client id is not present.
To set the clientId in the json file you'll have to set it in the key aadClientId
and not in UserAssignedIdentityID
like in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Sturgelose sorry for the delay in replies, intense period at work and some personal matters.
To be honest, not sure what's the best approach here. On one hand, changing code to UserAssignedIdentityID
would make more sense in terms of namings. I think I used it in the initial implementation on the old library, but when preparing the new PR, somehow lost the fact that there are two ways to pass client ID in external-dns.
On the other, aadClientId
is an "easier" change as it doesn't change behaviour.
At the same time, as the documentation is not fully accurate, it's unlikely that many people used overrides. By the way, could you expand on why you need to use explicit override through config instead of k8s annotations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my point of view, it's better to stick with clientID
as it's the environment variable the mutating webhook will present it in the pod through AZURE_CLIENT_ID
as described in https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html#mutating-admission-webhook
More other, UserAssignedIdentityID
was just used for the old pod identity feature wich is deprecated so once it will be removed, it will make a weird sitution where UserAssignedIdentityID
is used to populate AZURE_CLIENT_ID
. I think it makes more sense to use the same vocabulary everywhere.
Hey @weisdd, I've got problems with getting this to work with Azure China. I've linked the issue I've created in the azure-sdk-for-go above because I think the problem is maybe there but could you have a quick look into this, too to identity if maybe there is a problem in the external-dns implementation? Thank you very much! |
Hi @janpfischer it's been fix with this PR #3942 but it's not yet in a release. You'll have to use an image compiled from master. |
Thank you very much. I will test this! |
Gr8 work with this (long) one @weisdd, waiting for it... :-) |
It's already released in https://github.com/kubernetes-sigs/external-dns/releases/tag/v0.13.6 @jonasnorlund (even if a small bug impacts sovereign tenants) |
Hi, yes I saw it later, I just looked at Azure Private DNS and the docs wasn't updated there. |
@jonasnorlund I thought of merging docs for public and private dns (the difference in external-dns settings is tiny), improve some of the sections, and experiment with migration from manual selection of authentication methods to an automated one, but some personal matters got into my way (load at work and some construction defects in a newly bought property). It's still in my plans, I just haven't had mental capacity for that yet. :/ |
Description
In Release 2022-10-10, AKS has brought support for Workload Identity Federation as a replacement for Pod-Managed Identities. The latter now is considered deprecated and is about to reach its end of life.
This PR adds support for Workload Identity using azidentity (which replaced adal in #3040).
Fixes #2724
NOTE: This PR comes to replace #3111 where I implemented workload identity using adal in a bit hacky way.
Checklist