
Service principal with working Azure roles as tf context is unable to authenticate via the kubernetes provider block, but az aks get-credentials and kubectl get pods -n xy work #1964

Closed
slzmruepp opened this issue Jan 20, 2023 · 7 comments

Comments


slzmruepp commented Jan 20, 2023

Terraform Version, Provider Version and Kubernetes Version

Terraform version: 1.3.7
Kubernetes provider version: 2.16.1
Kubernetes version: 1.24.6

Affected Resource(s)

  • data.azurerm_kubernetes_cluster
  • all other kubernetes resources

Terraform Configuration Files

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.39.1"
    }
    azuread = {
      source  = "hashicorp/azuread"
      version = ">= 2.33.0"
    }
    kubernetes = {
      source = "hashicorp/kubernetes"
      version = ">= 2.16.1"
    }
  }
  required_version = ">= 1.3.7"
  backend "azurerm" {
  }
}

data "azurerm_kubernetes_cluster" "aks_provider_config" {
  name                = var.env_config[var.ENV]["aks_cluster_name"]
  resource_group_name = var.env_config[var.ENV]["aks_rg_name"]
}

provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.host
  username               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.username
  password               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.password
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.cluster_ca_certificate)
}

Steps to Reproduce

  1. We create an AKS cluster with service principal sp-1.
  2. We create a project service principal, sp-2.
  3. sp-2 is a project-scoped SP that has no IAM roles except the following:
  4. We grant sp-2 the Azure Kubernetes Service Cluster User Role (this should allow it to fetch the kubeconfig; see the sketch after the verification note below).
  5. We grant sp-2 RBAC admin on the project namespace:
resource "azurerm_role_assignment" "role_cluster_rbac_admin" {
  scope                = "${var.aks_cluster_id}/namespaces/${var.aks_proj_ns}"
  role_definition_name = "Azure Kubernetes Service RBAC Admin"
  principal_id         = azuread_group.sg.id
  depends_on           = [azuread_group.sg]
}

(This allows sp-2 to do everything in its namespace: kubectl get all -n var.aks_proj_ns works, while a cluster-wide kubectl get all does not.)
This was tested by logging in as sp-2 with az login and executing kubectl commands in Azure Pipelines; it works.
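For reference, the Cluster User assignment from step 4 looks roughly like the following. This is only a sketch: the resource name is invented, and it assumes the same var.aks_cluster_id and azuread_group.sg used elsewhere in this issue.

resource "azurerm_role_assignment" "role_cluster_user" {
  # Cluster-scoped assignment: lets group members fetch the user kubeconfig
  scope                = var.aks_cluster_id
  role_definition_name = "Azure Kubernetes Service Cluster User Role"
  principal_id         = azuread_group.sg.id
}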

  6. If we try to set up a second Terraform project, authenticate as sp-2, and use the config provided above, we get the following error:
{"error":{"code":"AuthorizationFailed","message":"The client '<<sp-2 objectid>>' with object id '<<sp-2 objectid>>' does not have authorization to perform action 'Microsoft.ContainerService/managedClusters/accessProfiles/listCredential/action' over scope '/subscriptions/<<subscriptionid>>/resourceGroups/<<aks-resource-group-name>>/providers/Microsoft.ContainerService/managedClusters/<<aks-cluster-name>>/accessProfiles/clusterUser' or the scope is invalid. If access was recently granted, please refresh your credentials."}}: timestamp=2023-01-19T15:56:02.124Z
2023-01-19T15:56:02.125Z [ERROR] provider.terraform-provider-azurerm_v3.39.1_x5: Response contains error diagnostic: diagnostic_detail= diagnostic_severity=ERROR tf_rpc=ReadDataSource tf_req_id=1e60db36-ddcc-1dd4-386c-a0cd68dc1a86 @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:55 @module=sdk.proto diagnostic_summary="retrieving Access Profile for Managed Cluster (Subscription: "<<subscriptionid>>"
Resource Group Name: "<<aks-resource-group-name>>"
Resource Name: "<<aks-cluster-name>>"): managedclusters.ManagedClustersClient#GetAccessProfile: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '<<sp-2 objectid>>' with object id '<<sp-2 objectid>>' does not have authorization to perform action 'Microsoft.ContainerService/managedClusters/accessProfiles/listCredential/action' over scope '/subscriptions/<<subscriptionid>>/resourceGroups/<<aks-resource-group-name>>/providers/Microsoft.ContainerService/managedClusters/<<aks-cluster-name>>/accessProfiles/clusterUser' or the scope is invalid. If access was recently granted, please refresh your credentials."" tf_data_source_type=azurerm_kubernetes_cluster tf_proto_version=5.3 tf_provider_addr=provider timestamp=2023-01-19T15:56:02.124Z
2023-01-19T15:56:02.125Z [ERROR] vertex "data.azurerm_kubernetes_cluster.aks_provider_config" error: retrieving Access Profile for Managed Cluster (Subscription: "<<subscriptionid>>"
Resource Group Name: "<<aks-resource-group-name>>"
Resource Name: "<<aks-cluster-name>>"): managedclusters.ManagedClustersClient#GetAccessProfile: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '<<sp-2 objectid>>' with object id '<<sp-2 objectid>>' does not have authorization to perform action 'Microsoft.ContainerService/managedClusters/accessProfiles/listCredential/action' over scope '/subscriptions/<<subscriptionid>>/resourceGroups/<<aks-resource-group-name>>/providers/Microsoft.ContainerService/managedClusters/<<aks-cluster-name>>/accessProfiles/clusterUser' or the scope is invalid. If access was recently granted, please refresh your credentials."
2023-01-19T15:56:02.125Z [ERROR] vertex "data.azurerm_kubernetes_cluster.aks_provider_config (expand)" error: retrieving Access Profile for Managed Cluster (Subscription: "<<subscriptionid>>"
Resource Group Name: "<<aks-resource-group-name>>"
Resource Name: "<<aks-cluster-name>>"): managedclusters.ManagedClustersClient#GetAccessProfile: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '<<sp-2 objectid>>' with object id '<<sp-2 objectid>>' does not have authorization to perform action 'Microsoft.ContainerService/managedClusters/accessProfiles/listCredential/action' over scope '/subscriptions/<<subscriptionid>>/resourceGroups/<<aks-resource-group-name>>/providers/Microsoft.ContainerService/managedClusters/<<aks-cluster-name>>/accessProfiles/clusterUser' or the scope is invalid. If access was recently granted, please refresh your credentials."
2023-01-19T15:56:02.126Z [INFO]  backend/local: plan operation completed
╷
│ Error: retrieving Access Profile for Managed Cluster (Subscription: "<<subscriptionid>>"
│ Resource Group Name: "<<aks-resource-group-name>>"
│ Resource Name: "<<aks-cluster-name>>"): managedclusters.ManagedClustersClient#GetAccessProfile: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '<<sp-2 objectid>>' with object id '<<sp-2 objectid>>' does not have authorization to perform action 'Microsoft.ContainerService/managedClusters/accessProfiles/listCredential/action' over scope '/subscriptions/<<subscriptionid>>/resourceGroups/<<aks-resource-group-name>>/providers/Microsoft.ContainerService/managedClusters/<<aks-cluster-name>>/accessProfiles/clusterUser' or the scope is invalid. If access was recently granted, please refresh your credentials."
│ 
│   with data.azurerm_kubernetes_cluster.aks_provider_config,
│   on var-proj.tf line 12, in data "azurerm_kubernetes_cluster" "aks_provider_config":
│   12: data "azurerm_kubernetes_cluster" "aks_provider_config" {

If I grant sp-2 the Contributor role on the AKS resource group, the data source works without error, but if we then do:

data "kubernetes_namespace" "example" {
  metadata {
    name = "var.aks_proj_ns"
  }
}

we get an error that kubernetes_namespace.example is unauthenticated (or similar).

Only if we then change the provider setup to the following:

provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config.0.host
  username               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config.0.username
  password               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config.0.password
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config.0.client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config.0.cluster_ca_certificate)
}

everything works as expected. But then sp-2, which is supposed to have limited permissions, holds Contributor rights on the AKS resource group (which is a no-go) and, apparently, RBAC admin on the whole cluster. I don't even know where the latter comes from; I can only suspect it is inherited from the Contributor role on the resource group.
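(Side note: on AAD-enabled clusters, one alternative that avoids admin credentials entirely would be exec-based authentication via kubelogin. The following is only a sketch, not our actual setup; it assumes kubelogin is on the PATH, and the var.sp2_* variable names are invented for illustration.)

provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.host
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.cluster_ca_certificate)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args = [
      "get-token",
      "--login", "spn",
      # Well-known AAD server application ID for AKS
      "--server-id", "6dae42f8-4368-4678-94ff-3960e28e3630",
      "--client-id", var.sp2_client_id,          # invented variable name
      "--client-secret", var.sp2_client_secret,  # invented variable name
      "--tenant-id", var.tenant_id,              # invented variable name
    ]
  }
}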

Expected Behavior

What should have happened?
We want sp-2, with its limited permissions, to be able to see and manage only the project namespace for which it already holds RBAC Admin rights, and to deploy Kubernetes objects to that namespace only, through Terraform.
We want the provider configuration to work as documented (sp-2, the Terraform context, has the Azure Kubernetes Service Cluster User Role, which should allow it to download the certs and auth for acting on the specific namespace).

Actual Behavior

What actually happened?
Although sp-2 has the appropriate roles, which we verified by using az aks and kubectl commands to download the kubeconfig and act on the specific namespace it holds the RBAC Admin role for, the kubernetes provider fails with a 403 error.
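The verification referred to above is essentially the following (with placeholders instead of real names):

az login --service-principal -u <sp-2 clientId> -p <sp-2 clientSecret> --tenant <tenantId>
az aks get-credentials --resource-group <aks-resource-group-name> --name <aks-cluster-name>
kubectl get pods -n <aks_proj_ns>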

Important Factoids

References

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@slzmruepp slzmruepp added the bug label Jan 20, 2023
@github-actions github-actions bot removed the bug label Jan 20, 2023

mruepp commented Jan 20, 2023

One addition I forgot to mention: the assignee of the Azure Kubernetes Service Cluster User and namespace RBAC Admin roles is not the service principal itself, but a security group (azuread_group.sg). The app registration/service principal is a member of this group.
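In Terraform terms the membership looks roughly like this (a sketch; the azuread_service_principal.sp2 resource name is invented):

resource "azuread_group_member" "sp2_in_sg" {
  group_object_id  = azuread_group.sg.object_id
  member_object_id = azuread_service_principal.sp2.object_id
}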


mruepp commented Feb 2, 2023

Can anyone verify this issue? It's quite a blocker for us. Thanks.


browley86 commented Feb 17, 2023

I just wanted to shed a bit more light on the issue. The TL;DR is that Terraform is calling a soon-to-be-deprecated API. More specifically, based on the error message, the provider is calling https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ContainerService/managedClusters/{resourceName}/accessProfiles/{roleName}/listCredential, which the API documentation calls out as soon to be deprecated, recommending the ListClusterUserCredentials or ListClusterAdminCredentials APIs instead.

Normally "soon to be deprecated" would imply there is time to update the underlying calls, but the issue is that newer Azure service principal permissions are scoped against the non-deprecated API. That means newer workflows with newly created service principals, such as calling the Azure az CLI, will work, while Terraform will fail unless the service principal was created with permissions scoped to the old API.

I wanted to post the steps to reproduce, but I really struggled to get curl running. I finally found a post that illustrates using az rest to hit these endpoints, so I did the following:

# [OPTIONAL] Pull and run the latest azure-cli Docker container
docker run -it mcr.microsoft.com/azure-cli /bin/bash

# Set ENV for below commands 
export AZURE_CLIENT_ID=<replace w/your Service Principal's clientId>
export AZURE_CLIENT_SECRET=<replace w/your Service Principal's clientSecret>
export AZURE_SUBSCRIPTION_ID=<replace w/your Service Principal's subscriptionId>
export AZURE_TENANT_ID=<replace w/your Service Principal's tenantId>
export AZURE_RESOURCE_GROUP=<replace w/the Resource Group of the AKS cluster>
export AZURE_RESOURCE_NAME=<replace w/name of target AKS cluster>

# Get SP token
az login --service-principal -u $AZURE_CLIENT_ID -p $AZURE_CLIENT_SECRET --tenant $AZURE_TENANT_ID

# Hit accessProfiles endpoint (double quotes so the shell expands the variables)
az rest -m post --header "Accept=application/json" -u "https://management.azure.com/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${AZURE_RESOURCE_GROUP}/providers/Microsoft.ContainerService/managedClusters/${AZURE_RESOURCE_NAME}/accessProfiles/clusterUser/listCredential?api-version=2022-11-01"
## This results in a 403: 
## Forbidden({"error":{"code":"AuthorizationFailed","message":"The client '<CLIENT>' with object id '<SP_OBJECT_ID>' does not have authorization to perform action 'Microsoft.ContainerService/managedClusters/accessProfiles/listCredential/action' over scope '/subscriptions/<SUBSCRIPTION>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ContainerService/managedClusters/<AZURE_RESOURCE_NAME>/accessProfiles/clusterUser' or the scope is invalid. If access was recently granted, please refresh your credentials."}})

# Hit listClusterUserCredential endpoint
az rest -m post --header "Accept=application/json" -u "https://management.azure.com/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${AZURE_RESOURCE_GROUP}/providers/Microsoft.ContainerService/managedClusters/${AZURE_RESOURCE_NAME}/listClusterUserCredential?api-version=2022-11-01"
## Returns 200 w/JSON {"kubeconfigs":[{"name":"clusterUser","value": "<BASE64 ENCODED KUBECONFIG STRING>"}]}
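For completeness, the base64 kubeconfig in that response can be extracted and decoded like so (assuming jq is installed; this is just a convenience step, not part of the original repro):

# Extract and decode the clusterUser kubeconfig from the JSON response
az rest -m post --header "Accept=application/json" -u "https://management.azure.com/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${AZURE_RESOURCE_GROUP}/providers/Microsoft.ContainerService/managedClusters/${AZURE_RESOURCE_NAME}/listClusterUserCredential?api-version=2022-11-01" \
  | jq -r '.kubeconfigs[0].value' | base64 -d > kubeconfig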

Here's the thing: I don't think the kubernetes provider, where this issue is filed, is the right place for this. The k8s provider is just a victim; this is probably better suited under the azurerm provider. I'm going to take the above and bring it there.

@slzmruepp (Author)

@browley86 Thank you for the analysis. Should I file the issue with the azurerm provider?

@browley86

@slzmruepp - Yes, please, if you have time. I'm buried in other things this week and probably won't get around to it. Apologies and thanks for the assist.

@slzmruepp (Author)

@browley86 It seems I cannot transfer the thread because of permissions; I am not a HashiCorp member. Do you want me to copy-paste the content over? Thx

iBrandyJackson (Member) commented Apr 12, 2023

This is an issue for the azurerm Terraform provider and has been created there; closing this issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 12, 2024