
Fetching AKS Cluster credentials unfortunately does still not work. PR #20927 did NOT solve the issue #21183

Closed
1 task done
slzmruepp opened this issue Mar 29, 2023 · 14 comments · Fixed by #21229

@slzmruepp

slzmruepp commented Mar 29, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

1.4.2

AzureRM Provider Version

3.49.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster

Terraform Configuration Files

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.49.0"
    }
    azuread = {
      source  = "hashicorp/azuread"
      version = ">= 2.36.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.19.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "2.9.0"
    }
  }

  required_version = ">= 0.14.9"

  backend "azurerm" {
  }
}

# Configure the Microsoft Azure Provider
provider "azurerm" {
  features {}
}

# Configure the Kubernetes Provider
provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.host
  username               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.username
  password               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.password
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.cluster_ca_certificate)
}
data "azurerm_kubernetes_cluster" "aks_provider_config" {
  name                = var.env_config[var.ENV]["aks_cluster_name"]
  resource_group_name = var.env_config[var.ENV]["aks_rg_name"]
}

Debug Output/Panic Output

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: retrieving Admin Credentials for Managed Cluster (Subscription: "XXX"
│ Resource Group Name: "rg-k8s"
│ Managed Cluster Name: "aks"): managedclusters.ManagedClustersClient#ListClusterAdminCredentials: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client 'XXX' with object id 'XXX' does not have authorization to perform action 'Microsoft.ContainerService/managedClusters/listClusterAdminCredential/action' over scope '/subscriptions/XXX/resourceGroups/rg-k8s/providers/Microsoft.ContainerService/managedClusters/aks' or the scope is invalid. If access was recently granted, please refresh your credentials."
│ 
│   with data.azurerm_kubernetes_cluster.aks_provider_config,
│   on var-proj.tf line 12, in data "azurerm_kubernetes_cluster" "aks_provider_config":
│   12: data "azurerm_kubernetes_cluster" "aks_provider_config" {
│ 
╵
##[error]Terraform command 'plan' failed with exit code '1'.
##[error]╷
│ Error: retrieving Admin Credentials for Managed Cluster (Subscription: "XXX"
│ Resource Group Name: "rg-k8s"
│ Managed Cluster Name: "aks"): managedclusters.ManagedClustersClient#ListClusterAdminCredentials: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client 'XXX' with object id 'XXX' does not have authorization to perform action 'Microsoft.ContainerService/managedClusters/listClusterAdminCredential/action' over scope '/subscriptions/XXX/resourceGroups/rg-k8s/providers/Microsoft.ContainerService/managedClusters/aks' or the scope is invalid. If access was recently granted, please refresh your credentials."
│ 
│   with data.azurerm_kubernetes_cluster.aks_provider_config,
│   on var-proj.tf line 12, in data "azurerm_kubernetes_cluster" "aks_provider_config":
│   12: data "azurerm_kubernetes_cluster" "aks_provider_config" {
│ 
╵

Expected Behaviour

The implementation should work as documented in the provider:
https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/guides/getting-started

Actual Behaviour

Despite the merge of PR #20927, the issue #20843 is not solved. The whole problem is described here: #20843

We now have the following problem: we are not trying to fetch the admin credentials but the USER credentials, as you can see here.
We use data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.host, as OPPOSED to fetching the admin config, which is not protected by RBAC:
data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config.0.host

  host                   = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.host
  username               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.username
  password               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.password
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_certificate)

The implementation in #20927 is buggy because it seems to ALWAYS fetch the Admin credentials, even when the USER credentials are requested (kube_config versus kube_admin_config). This use case was not implemented.
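For reference, a sketch of the equivalent wiring against the admin credentials (same attributes, read from kube_admin_config instead of kube_config, which is what the provider currently insists on fetching):

provider "kubernetes" {
  # kube_admin_config is populated from the listClusterAdminCredential endpoint
  host                   = data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config.0.host
  username               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config.0.username
  password               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config.0.password
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config.0.client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config.0.cluster_ca_certificate)
}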

Steps to Reproduce

Create a service principal.

resource "azuread_application" "app" {
  display_name = var.sp_name
  owners       = distinct(concat([data.azuread_client_config.current.object_id],var.sp_owners))
}

resource "azuread_service_principal" "sp" {
  application_id               = azuread_application.app.application_id
  app_role_assignment_required = var.sp_role_assignment_required
  owners                       = distinct(concat([data.azuread_client_config.current.object_id],var.sp_owners))
}

Create a security group and add the service principal:

resource "azuread_group" "sg" {
  description      = "This group grants access to the specific namespace in the aks environment"
  display_name     = var.sg_name
  owners           = distinct(concat([data.azuread_client_config.current.object_id],var.sg_owners))
  security_enabled = var.sg_security_enabled
  members          = distinct(concat([azuread_service_principal.sp.object_id],var.sg_members))
}

Create a Kubernetes Namespace.

resource "kubernetes_namespace_v1" "cluster_ns" {
  metadata {
    annotations = {
      project = var.project_data["proj_name"]
      devops  = var.project_data["devops_name"]
    }

    labels = {
      env = var.project_data["env"]
    }

    name = var.project_data["proj_name"]
  }
}

Give the security group containing the service principal the following roles (a contrasting sketch of the role that is deliberately NOT assigned follows after the code):

resource "azurerm_role_assignment" "role_cluster_user" {
  scope                = var.aks_cluster_id
  role_definition_name = "Azure Kubernetes Service Cluster User Role"
  principal_id         = azuread_group.sg.id
  depends_on           = [azuread_group.sg]
}

resource "azurerm_role_assignment" "role_cluster_rbac_admin" {
  scope                = "${var.aks_cluster_id}/namespaces/${var.aks_proj_ns}"
  role_definition_name = "Azure Kubernetes Service RBAC Admin"
  principal_id         = azuread_group.sg.id
  depends_on           = [azuread_group.sg]
}
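For contrast, this is a sketch of the role assignment that is deliberately NOT created in this setup (illustrative only, not part of the reproduction): the Cluster Admin role, which grants the listClusterAdminCredential action that the 403 above complains about.

# Illustrative only - NOT created in the reproduction above.
resource "azurerm_role_assignment" "role_cluster_admin" {
  scope                = var.aks_cluster_id
  role_definition_name = "Azure Kubernetes Service Cluster Admin Role"
  principal_id         = azuread_group.sg.id
  depends_on           = [azuread_group.sg]
}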

Test if the service principal is able to fetch the credentials via:

        az aks get-credentials -n ${{ parameters.kubernetesCluster }} -g ${{ parameters.azureResourceGroup }} --overwrite-existing
        kubelogin convert-kubeconfig -l azurecli

        kubectl -n ${{ parameters.namespace }} ${{ parameters.command }} ${{ parameters.arguments }}

Which WORKS! The SP gets the proper credentials and is only allowed to manage the specific Namespace it has RBAC Admin permissions on.

It can NOT, however, fetch the cluster credentials from the proper endpoint via the Kubernetes provider. The issue was not solved by the code change here:
#20927

Important Factoids

No response

References

No response

@slzmruepp slzmruepp added the bug label Mar 29, 2023
@github-actions github-actions bot removed the bug label Mar 29, 2023
@slzmruepp
Author

Tagging here @lonegunmanb and @browley86, referring also to the issue in the k8s provider repo: hashicorp/terraform-provider-kubernetes#1964

@slzmruepp slzmruepp changed the title from "Fetching AKS Cluster credentials unfortunately does still not work." to "Fetching AKS Cluster credentials unfortunately does still not work. PR #20927 did NOT solve the issue" Mar 30, 2023
@lonegunmanb

This comment was marked as off-topic.

@tombuildsstuff
Contributor

@lonegunmanb FYI I'm marking your comment as off-topic since this isn't an approach we'd recommend, the Data Source should support limited permissions, whereas the Resource requires that we have CRUD permissions (including the ability to List the Admin and User Credentials here).

@lonegunmanb
Contributor

@tombuildsstuff thanks for the correction, I'll submit a new PR to solve this issue

@slzmruepp
Author

slzmruepp commented Mar 30, 2023

In my opinion, the data source:

data.azurerm_kubernetes_cluster.aks_provider_config.kube_config

should use the https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ContainerService/managedClusters/{resourceName}/listClusterUserCredential?api-version=2023-01-01
API documented here:
https://learn.microsoft.com/en-us/rest/api/aks/managed-clusters/list-cluster-user-credentials?tabs=HTTP

(The provider should assume the authenticated role of the service principal context that Terraform is running in.)
If the sp/user context does not have Cluster User permissions
(role_definition_name = "Azure Kubernetes Service Cluster User Role"),
the data source should fail with a permission error.

And the data source:

data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config

should use the
https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ContainerService/managedClusters/{resourceName}/listClusterAdminCredential?api-version=2023-01-01
API documented here:
https://learn.microsoft.com/en-us/rest/api/aks/managed-clusters/list-cluster-admin-credentials?tabs=HTTP

(The provider should assume the authenticated role of the service principal context that Terraform is running in.)
If the sp/user context does not have Cluster Admin permissions
(role_definition_name = "Azure Kubernetes Service Cluster Admin Role"),
the data source should fail with a permission error.

That would be consistent.
Thanks everyone for your effort!

@browley86

browley86 commented Mar 30, 2023

There's a lot flying around here, so I think it's worth a summary: there are 3 API endpoints one can call to get cluster credentials:

  • getAccessProfile (deprecated)
  • listClusterAdminCredential
  • listClusterUserCredential

The change here moved from the deprecated getAccessProfile API to the listClusterAdminCredential API, which, for what it's worth, is a win. However, there is still a valid use case for using the non-admin listClusterUserCredential API to get credentials; this is what is broken/not implemented. What I was suggesting earlier was to allow an option, in both the resource and the data source, to use a different backend:

# Currently no option
data "azurerm_kubernetes_cluster" "my_cluster" {
  name                              = "my_cluster"
  resource_group_name = "my_cluster_az_resource_group"
}

vs

# Adding the option
data "azurerm_kubernetes_cluster" "my_cluster" {
  name                                 = "my_cluster"
  resource_group_name    = "my_cluster_az_resource_group"
  credential_api_backend = "listClusterUserCredential"
}

This would default to listClusterAdminCredential so as not to change/break existing behavior (aka the "happy path"). At the same time, it would allow the non-admin API call AND provide an escape route if Microsoft decides to deprecate these APIs in the future and/or switch to other API endpoints. It's hard to say whether this is a bug or a feature, and I won't get involved in that, but I do believe this should be an easy/safe change.

@mruepp

mruepp commented Mar 30, 2023

I am available anytime to test the implementation. Just write me a PM, @lonegunmanb

@slzmruepp
Author

slzmruepp commented Mar 31, 2023

I am really not sure this is going in the right direction. Is the problem clearly understood? There need to be two ways for the data source to get credentials: 1. admin mode, 2. user mode. So the solution proposed by @browley86 would be the preferred one. See here:

# Adding the option
data "azurerm_kubernetes_cluster" "my_cluster" {
  name                                 = "my_cluster"
  resource_group_name    = "my_cluster_az_resource_group"
  credential_api_backend = "listClusterUserCredential"
}

But suppressing the error message when admin mode is not working because the Terraform user context has no Cluster Admin Role is not the right approach. The whole K8s part later on, and also the Helm chart provider, etc., will all fail because the credentials have not been obtained. See PR here: #21209
Maybe you can chime in, @tombuildsstuff. Thx!

@tombuildsstuff
Contributor

@slzmruepp @browley86

As mentioned in this comment when using the Data Source we should support limited permissions (that is, access to the AKS Cluster itself, but not necessarily the credentials endpoint) - however the Resource requires that we have CRUD access to the relevant APIs.

In this instance whilst both the Data Source and the Resource retrieve the Admin and User Credentials, at present the Data Source isn't correctly handling the limited permissions for either the Admin or User Credential endpoints - which #21129 will fix.

Since the provider needs to call both of these APIs to be able to expose these credentials, unfortunately we aren't planning to add a feature-toggle to the provider to select which API to use - however #21129 will solve this issue by supporting limited permissions to both the ListClusterAdminCredentials and ListClusterUserCredentials API endpoints when using the Data Source.

Thanks!

@mruepp

mruepp commented Mar 31, 2023


Thanks. How will this data then be accessed? Will the limited permissions be provided by calling:
data.azurerm_kubernetes_cluster.aks_provider_config.kube_config

and the admin permissions by:
data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config ?

How do you handle if the tf user context does not have the "Azure Kubernetes Service Cluster Admin Role"? Will the data structure be empty? Will the tf run fail or will it just ignore silently?

@tombuildsstuff
Contributor

@mruepp

How will this data then be accessed? Will the limited permissions be provided by calling:
data.azurerm_kubernetes_cluster.aks_provider_config.kube_config

and the admin permissions by:
data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config?

Correct.

How do you handle if the tf user context does not have the "Azure Kubernetes Service Cluster Admin Role"? Will the data structure be empty? Will the tf run fail or will it just ignore silently?

These fields will be empty if there are limited permissions, as in other data sources. So, if the user has access to the User Credentials endpoint, then that'll be populated, else it'll be empty - ditto with the Admin Credentials endpoint.

Hope that helps.

@mruepp

mruepp commented Mar 31, 2023


Great, that sounds like a viable solution. It basically reflects all options in one data source depending on the combination of permissions. Thanks!
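For anyone following along, here is a rough sketch of how this could be consumed once the change lands: prefer the admin kubeconfig and fall back to the user one. This assumes kube_admin_config comes back as an empty list when the Cluster Admin role is missing, and the local name aks_kubeconfig is just illustrative:

locals {
  # Use the admin kubeconfig when available, otherwise fall back to the user kubeconfig.
  # try() moves on to the second expression when indexing the empty admin list fails.
  aks_kubeconfig = try(
    data.azurerm_kubernetes_cluster.aks_provider_config.kube_admin_config[0],
    data.azurerm_kubernetes_cluster.aks_provider_config.kube_config[0]
  )
}

provider "kubernetes" {
  host                   = local.aks_kubeconfig.host
  client_certificate     = base64decode(local.aks_kubeconfig.client_certificate)
  client_key             = base64decode(local.aks_kubeconfig.client_key)
  cluster_ca_certificate = base64decode(local.aks_kubeconfig.cluster_ca_certificate)
}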

@github-actions github-actions bot added this to the v3.51.0 milestone Apr 4, 2023
@github-actions

This functionality has been released in v3.51.0 of the Terraform Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!


I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 19, 2024