Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Authentication: Support Azure Federated Identity environment variables directly #23965

Merged
merged 7 commits into from
Dec 14, 2023

Conversation

mrsheepuk
Copy link
Contributor

@mrsheepuk mrsheepuk commented Nov 20, 2023

While using an Azure AKS Workload Identity to authenticate the provider is supported (as noted in issues #21635 and #18612), it isn't obvious from the documentation that all that is needed is to map the environment variables over (and indeed I didn't find those issues when searching for how to do this and ended up trying many different things before ending up with the working combination).

As these environment variables are standardised by Azure, this PR proposes supporting them natively to automatically detect this configuration and work as expected when executed inside an AKS cluster with the federated identity presented.

With this, a provider that is authenticated and working inside an AKS pod with workload identity configured becomes simply:

provider "azurerm" {
  features {}
  subscription_id = "12345678-abcd-458d-aa01-0123abcd1234"
  use_aks_workload_identity = true
}

... with the federated identity, tenant ID and client ID all auto-populated from the environment.

  • make pr-check passing

@mrsheepuk
Copy link
Contributor Author

mrsheepuk commented Nov 21, 2023

Thanks for taking a look at this @manicminer - I've tested this by building the provider locally and transferring it into a pod with a workload identity configured using kubectl cp as follows:

apiVersion: v1
kind: Secret
metadata:
  name: config-mark
  namespace: terraform-test
stringData:
  backend.tf: |
    terraform {
      backend "kubernetes" {
        in_cluster_config = true
        namespace         = "terraform-system"
        secret_suffix     = "marktest"
      }
    }
  provider.tf: |
    provider "azurerm" {
      features {}
      subscription_id = "REDACTED"
      use_aks_workload_identity = true
    }
  main.tf: |
    resource "azurerm_storage_account" "sa" {
      name                     = "testbob2023111702"
      resource_group_name      = "mark-aks-2-infra-uksouth"
      location                 = "uksouth"
      account_tier             = "Standard"
      account_replication_type = "LRS"
    }
  tfrc: |
    provider_installation {
      dev_overrides {
        "hashicorp/azurerm" = "/data/bin/"
      }
      direct {}
    }
type: Opaque
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    azure.workload.identity/use: "true"
  name: marktest
  namespace: terraform-test
spec:
  containers:
  - args:
    - -F
    - /data/logcat
    command:
    - tail
    env:
    - name: HOME
      value: /data
    - name: KUBE_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    image: hashicorp/terraform:1.6.4
    imagePullPolicy: IfNotPresent
    name: terraform
    volumeMounts:
    - mountPath: /data
      name: source
    - mountPath: /data/bin
      name: bin
    - mountPath: /data/config
      name: config
    workingDir: /data
  securityContext:
    fsGroup: 65534
    runAsGroup: 65534
    runAsUser: 65534
  serviceAccount: terraform-executor
  serviceAccountName: terraform-executor
  volumes:
  - emptyDir: {}
    name: source
  - emptyDir: {}
    name: bin
  - name: config
    secret:
      defaultMode: 420
      items:
      - key: backend.tf
        path: backend.tf
      - key: provider.tf
        path: provider.tf
      - key: main.tf
        path: main.tf
      - key: tfrc
        path: ".terraformrc"
      optional: false
      secretName: config-mark

Copy the locally-built provider in to the pod:

kubectl cp ./bin/terraform-provider-azurerm terraform-test/marktest:/data/bin/

Then exec'ing into the pod (kubectl -n terraform-test exec marktest -it -- /bin/sh) and running the following:

cp ./config/* ./
cp ./config/.terraformrc ./
terraform init
terraform plan -out tfplan
terraform apply tfplan
terraform destroy -auto-approve

All works as expected and self-configures from the Azure-provided environment variables injected via the azure.workload.identity/use: "true" label.

Copy link
Contributor

@manicminer manicminer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mrsheepuk, thanks for suggesting this. I agree that whilst it's relatively straightforward to set up the necessary environment variables based on those set when using ASK Workload Identity, there are valid points to be made about documenting this or potentially supporting it directly as per your proposal.

We've chatted internally about this and since this is now a mature and stable implementation in AKS, this is something we'd like to support. However, we think it's important to mitigate any risk of disrupting existing configurations (whether using workload identity or not, or even OIDC or not), and also to ensure reliable testing in isolation, so that we have confidence in this being stable across future releases.

With that in mind, would you be able to rework this as follows:

  1. Add a new provider-level boolean configuration option use_aks_workload_identity, with a corresponding environment variable ARM_USE_AKS_WORKLOAD_IDENTITY
  2. When this provider property is true, we can specifically consume the AKS-native environment variables in the provider's configure function.
  3. In the providerConfigure() function, if the client_id / client_id_file_path properties are set and produce different value(s) to that specified in the AZURE_CLIENT_ID environment variable, we should raise an error and warn the user to either set one of these, or ensure they are set to the same value. This check should happen in the getClientId() function.
  4. Perform the same validation for tenant_id / AZURE_TENANT_ID respectively. A new getTenantId() function will be needed in provider.go for this (based on getClientId()).
  5. The use_aks_workload_identity feature should fall back gracefully - that is, if it's set to true and the environment variables are not all set, we should make a best effort to configure the provider with the existing provider properties / environment variables.
    • If the AZURE_FEDERATED_TOKEN_FILE env var is not set, we should skip over it. If it is set, but points to an invalid file, or the contents cannot be read, we should raise an error.
    • If the AZURE_TENANT_ID env var is not set, we should use the value of tenant_id.
    • If the AZURE_CLIENT_ID env var is not set, we should use the value of client_id / client_id_file_path.
    • If any of the above env vars are set, appear to be valid, but are different to any of their existing corresponding properties, we should raise an error.
  6. For testing, we'd prefer to create a new test function to test this in isolation, for example TestAccProvider_aksWorkloadIdentity().

This approach would on balance be preferable to specifying the environment variables for client_id, tenant_id and oidc_token_file_path in the provider schema, as it means they will only be consumed conditionally when enabled by the practitioner. It will also establish an order of precedence and prevent users from inadvertently supplying the wrong authentication primitives.

Please do let me know what you think, and whether you'd be keen to work on this? Thanks!

@mrsheepuk
Copy link
Contributor Author

mrsheepuk commented Nov 21, 2023

Thanks for discussing it and for the feedback @manicminer - I like the approach you have suggested, and most of it sounds reasonably straightforward. I might need some help with the new test function, but let me start from the top and work down your list and see how far I get.

I have to admit I was a bit worried about implicitly picking up the wrong tenant ID / client ID if my approach was run in an existing flow that happened to have AZURE_TENANT_ID and/or AZURE_CLIENT_ID set (less so the AZURE_FEDERATED_TOKEN_FILE as that is unlikely to be set in other flows), so I prefer the intentionality of a new provider property.

@github-actions github-actions bot added size/M and removed size/XS labels Nov 22, 2023
@mrsheepuk
Copy link
Contributor Author

mrsheepuk commented Nov 22, 2023

I think I've covered all the points in dc0932d

I'm not sure how you typically run the acceptance tests, so I, ahem, borrowed a token file from an AKS cluster into a local file named 'auth' so I could run the new acceptance test:

$ ARM_SUBSCRIPTION_ID=(redacted) AZURE_FEDERATED_TOKEN_FILE=${PWD}/auth AZURE_CLIENT_ID=(redacted) AZURE_TENANT_ID=(redacted) TF_ACC=1 go test ./internal/provider -v -run=TestAccProvider_aksWorkloadIdentityAuth -timeout 1m -ldflags="-X=github.com/hashicorp/terraform-provider-azurerm/version.ProviderVersion=acc"
=== RUN   TestAccProvider_aksWorkloadIdentityAuth
--- PASS: TestAccProvider_aksWorkloadIdentityAuth (2.28s)
PASS
ok  	github.com/hashicorp/terraform-provider-azurerm/internal/provider	4.132s

I also ran it with incorrect values etc to check that it does indeed fail when the values are not correct.

I repeated my above manual test inside an AKS pod as well, using the new 'use_aks_workload_identity = true' in my provider.tf instead of use_oidc and it worked perfectly 🎉

I think it could use more documentation, but writing a whole document on how to set up AKS Workload Identity feels a bit out of scope, so not sure how far we'd need to take that?

@manicminer
Copy link
Contributor

@mrsheepuk Thanks for making the changes, and for the typo fixes along the way - that looks great at first pass. I'll play a bit locally and come back with some thoughts on how we might want to test this. We may elect to do the same as with GitHub OIDC and rely on the execution environment being set up out of band, or possibly add some extra tooling to spin up a cluster as needed, deploy to a pod, and run it there. Don't want to delay this unnecessarily but I think it warrants some thought.

For the docs, I was thinking it would be fine to add this as a new section in the existing OIDC Guide. If you want to have a go at this, please feel free (what you've written up so far is great) but I'm also happy to help with that prior to merging.

@github-actions github-actions bot added size/L and removed size/M labels Nov 22, 2023
@mrsheepuk
Copy link
Contributor Author

I tried adding it to the existing OIDC guide but couldn't quite make it fit, so I've pushed a first draft of doing it as a separate doc. Feel free to make changes / amend / delete the whole thing 😆 but it's a start.

@mrsheepuk
Copy link
Contributor Author

Just noticed the lint failure - will fix up. @manicminer any further thoughts on how we can test this? anything I can do to help with that?

@manicminer manicminer added this to the v3.85.0 milestone Dec 14, 2023
Copy link
Contributor

@manicminer manicminer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrsheepuk Given that we currently do not perform complete automated end-to-end testing for all the authentication methods we support, I'm happy to approve this based on the manual testing that you and I have conducted - I can confirm that this works great in the two setups that I have tested with.

Thanks again for this contribution. This is a great quality-of-life improvement and I appreciate the time taken to document this, as well as tidying up some typos in general. I pushed a couple of wording tweaks but aside from those, this is good to merge!


Testing with client ID annotation:

SCR-20231213-uy4

Screenshot 2023-12-13 at 22 15 22

@manicminer manicminer merged commit f60ef7c into hashicorp:main Dec 14, 2023
24 checks passed
manicminer added a commit that referenced this pull request Dec 14, 2023
@mrsheepuk
Copy link
Contributor Author

Ah brilliant, thanks for taking the time to review and test it out @manicminer :)

dduportal pushed a commit to jenkins-infra/azure that referenced this pull request Dec 15, 2023
<Actions>
<action
id="f410411e63aff4bb73a81c2aec1d373cf8a903e63b30dee2006b0030d8a94cc8">
        <h3>Bump Terraform `azurerm` provider version</h3>
<details
id="1d9343c012f5434ac9fe8a98135bae3667b399259be16d9b14302ea3bd424a24">
            <summary>Update Terraform lock file</summary>
<p>&#34;hashicorp/azurerm&#34; updated from &#34;3.84.0&#34; to
&#34;3.85.0&#34; in file &#34;.terraform.lock.hcl&#34;</p>
            <details>
                <summary>3.85.0</summary>
<pre>Changelog retrieved
from:&#xA;&#x9;https://github.com/hashicorp/terraform-provider-azurerm/releases/tag/v3.85.0&#xA;FEATURES:&#xA;&#xA;*
New Data Source: `azurerm_locations`
([#23324](hashicorp/terraform-provider-azurerm#23324
New Resource: `azurerm_iotcentral_organization`
([#23132](https://github.com/hashicorp/terraform-provider-azurerm/issues/23132))&#xA;&#xA;ENHANCEMENTS:&#xA;&#xA;*
provider: support for authenticating using Azure Kubernetes Service
Workload Identity
([#23965](hashicorp/terraform-provider-azurerm#23965
dependencies: updating to `v0.65.0` of
`github.com/hashicorp/go-azure-helpers`
([#24222](hashicorp/terraform-provider-azurerm#24222
dependencies: updating to `v0.20231214.1220802` of
`github.com/hashicorp/go-azure-sdk`
([#24246](hashicorp/terraform-provider-azurerm#24246
dependencies: updating to version `v0.20231214.1160726` of
`github.com/hashicorp/go-azure-sdk`
([#24241](hashicorp/terraform-provider-azurerm#24241
dependencies: update `security/automation` to use
`hashicorp/go-azure-sdk`
([#24156](hashicorp/terraform-provider-azurerm#24156
`dataprotection`: updating to API Version `2023-05-01`
([#24143](hashicorp/terraform-provider-azurerm#24143
`kusto`: removing the remnants of the old Resource ID Parsers now this
uses `hashicorp/go-azure-sdk`
([#24238](hashicorp/terraform-provider-azurerm#24238
Data Source: `azurerm_cognitive_account` - export the `identity` block
([#24214](hashicorp/terraform-provider-azurerm#24214
Data Source: `azurerm_monitor_workspace` - add support for the
`default_data_collection_endpoint_id` and
`default_data_collection_rule_id` properties
([#24153](hashicorp/terraform-provider-azurerm#24153
Data Source: `azurerm_shared_image_gallery` - add support for the
`image_names` property
([#24176](hashicorp/terraform-provider-azurerm#24176
`azurerm_dns_txt_record` - allow up to `4096` characters for the
property `record.value`
([#24169](hashicorp/terraform-provider-azurerm#24169
`azurerm_container_app` - support for the `workload_profile_name`
property
([#24219](hashicorp/terraform-provider-azurerm#24219
`azurerm_container_app` - suppot for the `init_container` block
([#23955](hashicorp/terraform-provider-azurerm#23955
`azurerm_hpc_cache_blob_nfs_target` - support for the
`verification_timer_in_seconds` and `write_back_timer_in_seconds`
properties
([#24207](hashicorp/terraform-provider-azurerm#24207
`azurerm_hpc_cache_nfs_target` - support for the
`verification_timer_in_seconds` and `write_back_timer_in_seconds`
properties
([#24208](hashicorp/terraform-provider-azurerm#24208
`azurerm_linux_web_app` - make `client_secret_setting_name` optional and
conflict with `client_secret_certificate_thumbprint`
([#21834](hashicorp/terraform-provider-azurerm#21834
`azurerm_linux_web_app_slot` - make `client_secret_setting_name`
optional and conflict with `client_secret_certificate_thumbprint`
([#21834](hashicorp/terraform-provider-azurerm#21834
`azurerm_linux_web_app` - fix a bug in `app_settings` where settings
could be lost
([#24221](hashicorp/terraform-provider-azurerm#24221
`azurerm_linux_web_app_slot` - fix a bug in `app_settings` where
settings could be lost
([#24221](hashicorp/terraform-provider-azurerm#24221
`azurerm_log_analytics_workspace` - add support for the
`immediate_data_purge_on_30_days_enabled` property
([#24015](hashicorp/terraform-provider-azurerm#24015
`azurerm_mssql_server` - support for other identity types for the key
vault key
([#24236](hashicorp/terraform-provider-azurerm#24236
`azurerm_machine_learning_datastore_blobstorage` - resource now skips
validation when being created
([#24078](hashicorp/terraform-provider-azurerm#24078
`azurerm_machine_learning_datastore_datalake_gen2` - resource now skips
validation when being created
([#24078](hashicorp/terraform-provider-azurerm#24078
`azurerm_machine_learning_datastore_fileshare` - resource now skips
validation when being created
([#24078](hashicorp/terraform-provider-azurerm#24078
`azurerm_monitor_workspace` - support for the
`default_data_collection_endpoint_id` and
`default_data_collection_rule_id` properties
([#24153](hashicorp/terraform-provider-azurerm#24153
`azurerm_redis_cache` - support for the
`storage_account_subscription_id` property
([#24101](hashicorp/terraform-provider-azurerm#24101
`azurerm_storage_blob` - support for the `source_content` type `Page`
([#24177](hashicorp/terraform-provider-azurerm#24177
`azurerm_web_application_firewall_policy` - support new values to the
`rule_group_name` property
([#24194](hashicorp/terraform-provider-azurerm#24194
`azurerm_windows_web_app` - make the `client_secret_setting_name`
property optional and conflicts with the
`client_secret_certificate_thumbprint` property
([#21834](hashicorp/terraform-provider-azurerm#21834
`azurerm_windows_web_app_slot` - make the `client_secret_setting_name`
property optional and conflicts with the
`client_secret_certificate_thumbprint` property
([#21834](hashicorp/terraform-provider-azurerm#21834
`azurerm_windows_web_app` - fix a bug in `app_settings` where settings
could be lost
([#24221](hashicorp/terraform-provider-azurerm#24221
`azurerm_windows_web_app_slot` - fix a bug in `app_settings` where
settings could be lost
([#24221](hashicorp/terraform-provider-azurerm#24221
`azurerm_cognitive_account` - add `ContentSafety` to the `kind` property
validation
([#24205](https://github.com/hashicorp/terraform-provider-azurerm/issues/24205))&#xA;&#xA;BUG
FIXES:&#xA;&#xA;* provider: fix an authentication issue with Azure
Storage when running in Azure China cloud
([#24246](hashicorp/terraform-provider-azurerm#24246
Data Source: `azurerm_role_definition` - fix bug where
`role_definition_id` and `scope` were being incorrectly set
([#24211](hashicorp/terraform-provider-azurerm#24211
`azurerm_batch_account` - fix bug where `UserAssigned, SystemAssigned`
could be passed to the resource even though it isn&#39;t supported
([#24204](hashicorp/terraform-provider-azurerm#24204
`azurerm_batch_pool` - fix bug where `settings_json` and
`protected_settings` were not being unmarshaled
([#24075](hashicorp/terraform-provider-azurerm#24075
`azurerm_bot_service_azure_bot` - fix bug where
`public_network_access_enabled` was being set as the value for `LuisKey`
([#24164](hashicorp/terraform-provider-azurerm#24164
`azurerm_cognitive_account_customer_managed_key` - `identity_client_id`
is no longer passed to the api when it is empty
([#24231](hashicorp/terraform-provider-azurerm#24231
`azurerm_linux_web_app_slot` - error when `service_plan_id` is identical
to the parent `service_plan_id`
([#23403](hashicorp/terraform-provider-azurerm#23403
`azurerm_management_group_template_deployment` - fixing a bug where
`template_spec_version_id` couldn&#39;t be updated
([#24072](hashicorp/terraform-provider-azurerm#24072
`azurerm_pim_active_role_assignment` - fix an importing issue by
filtering available role assignments based on the provided `scope`
([#24077](hashicorp/terraform-provider-azurerm#24077
`azurerm_pim_eligible_role_assignment` - fix an importing issue by
filtering available role assignments based on the provided `scope`
([#24077](hashicorp/terraform-provider-azurerm#24077
`azurerm_resource_group_template_deployment` - fixing a bug where
`template_spec_version_id` couldn&#39;t be updated
([#24072](hashicorp/terraform-provider-azurerm#24072
`azurerm_security_center_setting` - fix the casing for the
`setting_name` `Sentinel`
([#24210](hashicorp/terraform-provider-azurerm#24210
`azurerm_storage_account` - Fix crash when checking for
`routingInputs.PublishInternetEndpoints` and
`routingInputs.PublishMicrosoftEndpoints`
([#24228](hashicorp/terraform-provider-azurerm#24228
`azurerm_storage_share_file` - prevent panic when the file specified by
`source` is empty
([#24179](hashicorp/terraform-provider-azurerm#24179
`azurerm_subscription_template_deployment` - fixing a bug where
`template_spec_version_id` couldn&#39;t be updated
([#24072](hashicorp/terraform-provider-azurerm#24072
`azurerm_tenant_template_deployment` - fixing a bug where
`template_spec_version_id` couldn&#39;t be updated
([#24072](hashicorp/terraform-provider-azurerm#24072
`azurerm_virtual_machine` - prevent a panic by nil checking the first
element of `additional_capabilities`
([#24159](hashicorp/terraform-provider-azurerm#24159
`azurerm_windows_web_app_slot` - error when `service_plan_id` is
identical to the parent `service_plan_id`
([#23403](https://github.com/hashicorp/terraform-provider-azurerm/issues/23403))&#xA;&#xA;&#xA;</pre>
            </details>
        </details>
<a
href="https://infra.ci.jenkins.io/job/terraform-jobs/job/azure/job/main/942/">Jenkins
pipeline link</a>
    </action>
</Actions>

---

<table>
  <tr>
    <td width="77">
<img src="https://www.updatecli.io/images/updatecli.png" alt="Updatecli
logo" width="50" height="50">
    </td>
    <td>
      <p>
Created automatically by <a
href="https://www.updatecli.io/">Updatecli</a>
      </p>
      <details><summary>Options:</summary>
        <br />
<p>Most of Updatecli configuration is done via <a
href="https://www.updatecli.io/docs/prologue/quick-start/">its
manifest(s)</a>.</p>
        <ul>
<li>If you close this pull request, Updatecli will automatically reopen
it, the next time it runs.</li>
<li>If you close this pull request and delete the base branch, Updatecli
will automatically recreate it, erasing all previous commits made.</li>
        </ul>
        <p>
Feel free to report any issues at <a
href="https://github.com/updatecli/updatecli/issues">github.com/updatecli/updatecli</a>.<br
/>
If you find this tool useful, do not hesitate to star <a
href="https://github.com/updatecli/updatecli/stargazers">our GitHub
repository</a> as a sign of appreciation, and/or to tell us directly on
our <a
href="https://matrix.to/#/#Updatecli_community:gitter.im">chat</a>!
        </p>
      </details>
    </td>
  </tr>
</table>

Co-authored-by: Jenkins Infra Bot (updatecli) <[email protected]>
Copy link

github-actions bot commented May 3, 2024

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants