Updates to azurerm_kubernetes_cluster fail when cluster uses managed AAD integration #7325

Closed
pindey opened this issue Jun 15, 2020 · 35 comments · Fixed by #7874

@pindey

pindey commented Jun 15, 2020

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.12.26

  • provider.azurerm v2.14.0

Affected Resource(s)

  • azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_resource_group" "aks" {
  name     = "aks-service-rg"
  location = "northeurope"
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                            = "aks-service"
  location                        = azurerm_resource_group.aks.location
  resource_group_name             = azurerm_resource_group.aks.name
  node_resource_group             = "aks-infra-rg"
  dns_prefix                      = "aks-dev"
  enable_pod_security_policy      = false
  private_cluster_enabled         = false
  api_server_authorized_ip_ranges = null
 
  default_node_pool {
    name            = "default"
    node_count      = 4
    vm_size         = "Standard_B2ms"
    os_disk_size_gb = 30
    vnet_subnet_id  = var.virtual_network.subnets.aks.id
    max_pods        = 60
    type            = "VirtualMachineScaleSets"
  }

  linux_profile {
    admin_username = var.admin_username

    ssh_key {
      key_data = tls_private_key.aks.public_key_openssh
    }
  }
  
  role_based_access_control {
    enabled = true

    azure_active_directory {
      managed                 = true
      admin_group_object_ids  = [for key, value in local.cluster_admins : value.object_id] 
    }
  }

  identity {
    type    = "SystemAssigned"
  }

  addon_profile {

    azure_policy {
      enabled = true
    }
    
    oms_agent {
      enabled                    = true
      log_analytics_workspace_id = var.log_analytics_workspace.id
    }

    kube_dashboard {
      enabled = true
    }

    http_application_routing {
      enabled = false
    }

  }

  network_profile {
    network_plugin     = "azure"
    network_policy     = "azure"
    load_balancer_sku  = "Basic"
    service_cidr       = var.kubernetes_service_cidr
    docker_bridge_cidr = var.docker_bridge_cidr
    dns_service_ip     = cidrhost(var.kubernetes_service_cidr, 2)
  }

  tags = local.tags

}

Debug Output

Panic Output

Expected Behavior

  • Enable feature 'Microsoft.ContainerService/AAD-V2' on subscription
  • Apply plan to create cluster with managed Azure Active Directory integration
  • Change value of tags - or any other argument that doesn't necessitate a replacement of the resource
  • Run terraform plan
  • Apply plan
  • Tags are updated to reflect changes

Actual Behavior

  • Enable feature 'Microsoft.ContainerService/AAD-V2' on subscription
  • Apply plan to create cluster with managed Azure Active Directory integration
  • Change value of tags - or any other argument that doesn't necessitate a replacement of the resource
  • Run terraform plan
  • Apply plan
  • Apply fails with the following error:

Error: updating Managed Kubernetes Cluster AAD Profile in cluster "aks-service" (Resource Group "aks-service-rg"): containerservice.ManagedClustersClient#ResetAADProfile: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="Operation 'resetAADProfile' is not allowed for managed AAD enabled cluster."

Steps to Reproduce

  1. Register feature 'Microsoft.ContainerService/AAD-V2' on subscription as per https://docs.microsoft.com/en-us/azure/aks/managed-aad
  2. terraform plan
  3. terraform apply
  4. Make changes to resource
  5. terraform plan
  6. terraform apply

Important Factoids

References

  • #0000

@jhawthorn22

jhawthorn22 commented Jun 24, 2020

A week late on this, but a colleague and I hit the same error yesterday. We noticed you can update the RBAC details via the CLI, so for anyone who wants a workaround while this is being looked at: we deleted the AKS cluster, set the role_based_access_control block to

role_based_access_control {
    enabled = true
    azure_active_directory {
      managed = true
    }
}

then created a null resource that updates the managed AAD admin group IDs

resource "null_resource" "update_admin_group_ids" {
  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]
  provisioner "local-exec" {
    command = <<EOT
      # --update ids
      az aks update -g <resource_group> -n <name> --aad-tenant-id <tenant_id> --aad-admin-group-object-ids <admin_group_ids>
   EOT
  }
}

However, you'll also need an ignore_changes on the AKS RBAC block:

lifecycle {
    ignore_changes = [
      role_based_access_control
    ]
  }

az version: 2.8
azurerm_provider version: 2.15

EDIT: if tags change, it still raises the resetAADProfile error. You can add tags to the ignore list if that works for you, but obviously you then can't update tags from Terraform (a big disadvantage). Unfortunately, there is no az aks update option for tags either. Investigating using az resource tag instead.
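
For reference, a minimal sketch of the extended lifecycle block, assuming you are willing to stop managing the cluster's tags from Terraform as well:

resource "azurerm_kubernetes_cluster" "aks" {
  # ... existing cluster arguments ...

  lifecycle {
    # Ignoring both attributes avoids the in-place updates that trigger the
    # unsupported resetAADProfile call, at the cost of no longer managing
    # them from Terraform.
    ignore_changes = [
      role_based_access_control,
      tags,
    ]
  }
}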

@jhawthorn22

I really don't like this approach, but it's working for us. I created two provisioners: one for the AAD admin group IDs, one for updating the tags.

Admin groups provisioner:

resource "null_resource" "update_admin_group_ids" {
  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]

  triggers = {
    admin_group_changed = var.default_config.aks_admin_group_id
  }

  provisioner "local-exec" {
    command ="${path.module}/scripts/update_admin_group_ids.sh -g ${azurerm_kubernetes_cluster.aks.resource_group_name} -n ${azurerm_kubernetes_cluster.aks.name} -t ${data.azurerm_client_config.current.tenant_id} -a ${var.default_config.aks_admin_group_id}"
    interpreter = ["bash", "-c"]
  }
}

AKS tags update provisioner:

resource "null_resource" "update_aks_tags" {
  depends_on = [
    azurerm_kubernetes_cluster.aks
  ]

  triggers = {
    always_run = timestamp()
  }

  provisioner "local-exec" {
    command = "${path.module}/scripts/update_tags.sh -n ${azurerm_kubernetes_cluster.aks.name} -g ${azurerm_kubernetes_cluster.aks.resource_group_name} -t ${jsonencode(var.tags)}"
    interpreter = ["bash", "-c"]
  }
}

Provisioner scripts (update_admin_group_ids.sh, then update_tags.sh):

#!/bin/bash

# options
while getopts g:n:t:a: option
do
    # shellcheck disable=SC2220
    case "${option}"
    in
        g) resource_group=${OPTARG};;
        n) aks_name=${OPTARG};;
        t) tenant_id=${OPTARG};;
        a) admin_group_id=${OPTARG};;
    esac
done

# --get extension
echo "--get aks-preview extension"
az extension add --name aks-preview
az extension list

# --register feature
echo "--register feature"
az feature register --name AAD-V2 --namespace Microsoft.ContainerService
az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/AAD-V2')].{Name:name,State:properties.state}"
az provider register --namespace Microsoft.ContainerService

# --update admin id
az aks update --resource-group "$resource_group" --name "$aks_name" --aad-tenant-id "$tenant_id" --aad-admin-group-object-ids "$admin_group_id"

#!/bin/bash

## options
while getopts g:n:t: option
do
    # shellcheck disable=SC2220
    case "${option}"
    in
        g) resource_group=${OPTARG};;
        n) aks_name=${OPTARG};;
        t) tags=${OPTARG};;
    esac
done

## reformat tags
# shellcheck disable=SC2207
tags_arr=($(echo "$tags" | jq . | jq -r 'to_entries[] | "\(.key)=\(.value)"' | tr '\n' ' '))

## update tags
for tag in "${tags_arr[@]}";
do
    echo "tag: $tag"
    az resource tag --resource-group "$resource_group" --name "$aks_name" --resource-type "Microsoft.ContainerService/ManagedClusters" -i --tags "$tag"
done

@id27182

id27182 commented Jun 30, 2020

I also faced this issue. I got the same error message when I tried to update the AAD settings manually through the API; however, I managed to update the settings with the Azure CLI.

@patpicos

patpicos commented Jul 7, 2020

This error also occurs when modifying other properties of the cluster, such as the max node count on a node pool:

      ~ default_node_pool {
            availability_zones    = []
            enable_auto_scaling   = true
            enable_node_public_ip = false
          ~ max_count             = 3 -> 4
            max_pods              = 30
            min_count             = 3
            name                  = "default"
            node_count            = 3
            node_labels           = {}
            node_taints           = []
            orchestrator_version  = "1.17.7"
            os_disk_size_gb       = 30
            tags                  = {}
            type                  = "VirtualMachineScaleSets"
            vm_size               = "Standard_DS3_v2"
            vnet_subnet_id        = "/subscriptions/xxxxxxxxxxxxxxxxx/resourceGroups/rg-pegaplatform-network-sbox-canadacentral-persistent/providers/Microsoft.Network/virtualNetworks/vnet-pegaplatform-network-sbox-canadacentral/subnets/Private"
        }

....


Plan: 0 to add, 1 to change, 0 to destroy.

error:

Error: updating Managed Kubernetes Cluster AAD Profile in cluster "aks-pegaplatform-sbox-canadacentral" (Resource Group "rg-pegaplatform-sbox-canadacentral-persistent"): containerservice.ManagedClustersClient#ResetAADProfile: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="Operation 'resetAADProfile' is not allowed for managed AAD enabled cluster."

  on main.tf line 141, in resource "azurerm_kubernetes_cluster" "aks_cluster":
 141: resource "azurerm_kubernetes_cluster" "aks_cluster" {

@hieumoscow
Contributor

Updating the Kubernetes version also caused this issue:
'resetAADProfile' is not allowed for managed AAD enabled cluster.

@bohlenc

bohlenc commented Jul 22, 2020

I can reproduce the same error while updating the autoscaler configuration (e.g. update max_count 3 -> 4).
Executing the same configuration update via the Azure CLI works without issues.

Versions:
Terraform 0.12.28
terraform-provider-azurerm 2.18.0

@patpicos

Short of it: AAD v2 is a preview feature and it was enabled in the provider. resetAADProfile is not supported for AAD v2 clusters (on Microsoft's side). Therefore the call to reset it should be omitted when managed = true, until Microsoft starts supporting the call.

@PSanetra

resetAADProfile with API version 2020-06-01 seems to support enableAzureRBAC:
https://docs.microsoft.com/en-us/rest/api/aks/managedclusters/resetaadprofile#request-body

So I guess this could be fixed by using the new API version.

@sam-cogan
Contributor

Yeah, I am seeing this when amending pool size, doing Kubernetes upgrades, or changing autoscale settings, so it's unusable currently.

@patpicos I don't believe it is in preview any more; all the preview markers have been removed from the docs and the old version is now referred to as legacy - https://docs.microsoft.com/en-us/azure/aks/managed-aad

@patpicos

Yeah, I am seeing this when amending pool size, doing Kubernetes upgrades, or changing autoscale settings, so it's unusable currently.

@patpicos I don't believe it is in preview any more; all the preview markers have been removed from the docs and the old version is now referred to as legacy - https://docs.microsoft.com/en-us/azure/aks/managed-aad

@sam-cogan that is very interesting news. This commit for the documentation confirms what you are saying. MicrosoftDocs/azure-docs@96ab8c1#diff-90a9850acdb4834ff96cc6562e19144e

I didn't see a notice in the AKS release notes. Perhaps one is imminent. @PSanetra might be on the right path with updating the API version, to see if it makes these errors go away.


@aristosvo
Collaborator

We are creating a new cluster with AAD v2 support today, will let you know how it goes! I'll look into it if it is not working.

@sam-cogan
Contributor

I created a new cluster yesterday and can confirm the issue is present. You do not see it at cluster creation (at least I didn't), but when you try to modify the cluster to change the number of nodes, update the version, etc., you will see the issue.

@aristosvo
Collaborator

The feature is not GA anymore, due to a delayed rollout: Azure/AKS#1489 (comment).

Also, when I deploy it with a custom-built azurerm provider using API version 2020-06-01, it still doesn't work and still complains if the preview feature is not enabled:

az feature register --name AAD-V2 --namespace Microsoft.ContainerService
az provider register -n Microsoft.ContainerService

I'm working on a PR at the moment; it seems to work, but acceptance testing takes a little while.

@aristosvo
Collaborator

I've implemented a fix and added acceptance tests to cover the scenarios in this issue.

If nothing goes wrong it will make the next release! 🎉

@mbfrahry mbfrahry added this to the v2.21.0 milestone Jul 28, 2020
@ghost

ghost commented Jul 31, 2020

This has been released in version 2.21.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 2.21.0"
}
# ... other configuration ...
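
For Terraform 0.13 (which appears later in this thread), the equivalent pin goes in a required_providers block — a minimal sketch, assuming the hashicorp/azurerm source address:

terraform {
  required_providers {
    azurerm = {
      # Pin to the 2.21.x series, which contains the fix for this issue
      source  = "hashicorp/azurerm"
      version = "~> 2.21.0"
    }
  }
}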

@c4m4

c4m4 commented Aug 6, 2020

Upgrading the provider to version 2.21.0 works :)

@tkinz27

tkinz27 commented Aug 6, 2020

With managed AAD, how do we attach an ACR instance to the AKS cluster? Before, with a manually set up service principal, you would just propagate the ACR role to that principal, but as far as I can see there is no way to get access to the underlying service principal that gets set up automatically.

@sam-cogan
Contributor

@tkinz27 you're talking about two different things here. The managed AAD integration this issue refers to is about being able to log in to the cluster for admin work as an AAD user; it has nothing to do with the cluster's access to other resources.

Using a managed identity for the cluster identity creates a user-assigned managed identity, whose name you can retrieve via the "user_assigned_identity_id" attribute of the "kubelet_identity" block. You would then grant this managed identity access to ACR.
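
A minimal sketch of that role assignment in Terraform, assuming an existing azurerm_container_registry.acr resource and the cluster resource from earlier in this thread (the kubelet identity's object_id is what the role assignment needs as its principal):

# Grant the kubelet's managed identity pull access to the registry
resource "azurerm_role_assignment" "aks_acr_pull" {
  scope                = azurerm_container_registry.acr.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
}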

@tkinz27

tkinz27 commented Aug 6, 2020

Ohhh... sorry, I'm new to Azure (coming from AWS) and all the auth has definitely been confusing. Thank you for clearing that up for me so quickly.

@sutinse1

sutinse1 commented Aug 13, 2020

EDIT: This is working fine now; it was my faulty configuration. Thanks aristosvo!

So I added the following to my main.tf as instructed:

  managed                 = true
  // optional:
  admin_group_object_ids  = ["myAksAdminId_NOT_group_name"]
  # these had to be commented out
  #client_app_id     = var.aad_client_app_id
  #server_app_id     = var.aad_server_app_id
  #server_app_secret = var.aad_server_app_secret
  tenant_id         = var.aad_tenant_id

WORKED!

I still get the ResetAADProfile error although I'm using the v2.21.0 azurerm provider.

Error: updating Managed Kubernetes Cluster AAD Profile in cluster "sutinenseaks-aks" (Resource Group "sutinenseaks-rg"): containerservice.ManagedClustersClient#ResetAADProfile: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="Operation 'resetAADProfile' is not allowed for managed AAD enabled cluster."

on main.tf line 45, in resource "azurerm_kubernetes_cluster" "demo":
45: resource "azurerm_kubernetes_cluster" "demo" {

I upgraded the azurerm provider to 2.21.0 (terraform.zip attached):
terraform init -upgrade

Also upgraded the kubernetes provider 1.11.1 -> 1.12.0; still not working.

terraform version
Terraform v0.13.0

  • provider registry.terraform.io/hashicorp/azurerm v2.21.0
  • provider registry.terraform.io/hashicorp/github v2.4.1
  • provider registry.terraform.io/hashicorp/kubernetes v1.12.0
  • provider registry.terraform.io/hashicorp/tls v2.1.0

My attempt followed this tutorial:
https://github.com/Azure/sg-aks-workshop

@aristosvo
Collaborator

@sutinse1 Can you provide the configuration you are using?

@aristosvo
Collaborator

aristosvo commented Aug 14, 2020

What I see is a cluster set up with AAD-v1 integration. Apparently either backward compatibility here is a problem or you're mixing things up in your setup. I think the first is the issue; I'll run a few tests when I have time at hand.

For now I'd recommend restructuring/simplifying your Terraform file for the AAD integration:

resource "azurerm_kubernetes_cluster" "demo" {
...
  role_based_access_control {
    enabled = true

    azure_active_directory {
      managed = true
      // optional:
      // admin_group_object_ids  = [<AAD group object ids which you want to make cluster admin via AAD login>] 
    }
  }
...
}

@aristosvo
Collaborator

@sutinse1 Can you explain in short what you did before you ended up with the mentioned error?

What I think you did was as follows:

  • Create the azurerm_kubernetes_cluster with the setup from the course:
resource "azurerm_kubernetes_cluster" "demo" {
...
  role_based_access_control {
    enabled = true

    azure_active_directory {
      client_app_id     = var.aad_client_app_id
      server_app_id     = var.aad_server_app_id
      server_app_secret = var.aad_server_app_secret
      tenant_id         = var.aad_tenant_id
    }
  }
...
}
  • You probably upgraded it to AAD-v2 via commandline az aks update -g myResourceGroup -n myManagedCluster --enable-aad or similar.
  • You reapplied the old configuration with Terraform.

If not, I'm very curious how your configuration ended up in the state with the error 😄

@sutinse1

@aristosvo I did just as you wrote: I upgraded to AAD-v2 by registering the feature.

# I registered the AAD-V2 feature
az feature register --name AAD-V2 --namespace Microsoft.ContainerService
# created an AD group for AKS
az ad group create --display-name myAKSAdmin --mail-nickname myAKSAdmin
# added myself to the group
az ad group member add --group myAKSAdmin --member-id $id

# Updated the cluster
groupid=$(az ad group show --group myAKSAdmin --query objectId --output tsv)
tenantid=$(az account show --query tenantId --output tsv)
az aks update -g myaks-rg -n myaks-aks --aad-tenant-id $tenantid --aad-admin-group-object-ids $groupid

I somehow thought that Terraform could query whether AAD was already in use :) My mistake.

My configuration now (with the SP settings commented out):

role_based_access_control {
  enabled = true

  azure_active_directory {
    managed = true
    // optional:
    admin_group_object_ids = ["myAKSAdmin_groupID_not_text"]
    #client_app_id     = var.aad_client_app_id
    #server_app_id     = var.aad_server_app_id
    #server_app_secret = var.aad_server_app_secret
    tenant_id = var.aad_tenant_id
  }
}

@ghost

ghost commented Aug 28, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Aug 28, 2020