
Terraform replacing AKS nodepool cluster when changing VM count #3835

Closed
local-master opened this issue Jul 12, 2019 · 8 comments · Fixed by #4898
Comments

@local-master

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.12.4
provider.azurerm v1.31.0

Affected Resource(s)

azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_resource_group" "test" {
  location = "${var.location}"
  name = "${var.resource_group_name}"
}
resource "azurerm_kubernetes_cluster" "test" {
  name                = "${var.resource_group_name}"
  location            = "${var.location}"
  resource_group_name = "${azurerm_resource_group.test.name}"
  dns_prefix          = "test"
  kubernetes_version  = "${var.kubernetes_version}"

  linux_profile {
    admin_username = "${var.admin_user}"

    ssh_key {
      key_data = "${var.ssh_key}"
    }
  }

  agent_pool_profile {
    name              = "${var.agent_pool1_name}"
    count             = "${var.agent_pool1_count}"
    vm_size           = "${var.agent_pool1_vm_size}"
    os_type           = "Linux"
    os_disk_size_gb   = "${var.agent_pool1_os_disk_size}"
    vnet_subnet_id    = "${var.subnet_id}"
    max_pods          = "${var.max_pods_per_node}"
    type              = "${var.agent_pool_scaling}"
  }

  agent_pool_profile {
    name              = "${var.agent_pool2_name}"
    count             = "${var.agent_pool2_count}"
    vm_size           = "${var.agent_pool2_vm_size}"
    os_type           = "Linux"
    os_disk_size_gb   = "${var.agent_pool2_os_disk_size}"
    vnet_subnet_id    = "${var.subnet_id}"
    max_pods          = "${var.max_pods_per_node}"
    type              = "${var.agent_pool_scaling}"
  }

  service_principal {
    client_id         = "is somewhere"
    client_secret     = "is somewhere"
  }

  network_profile {
    network_plugin    = "${var.network_plugin}"
  }

  role_based_access_control {
    enabled = true
  }
}

Expected Behavior

Changing "count" in one of the "agent_pool_profile" and running "terraform apply" should add one more node to cluster.

Actual Behavior

Terraform replaces the whole cluster and adds a new one with the new number of nodes in the given nodepool. Looking at the plan, it also seems to be swapping the nodepool names around.

terraform plan output:

  # azurerm_kubernetes_cluster.cloudbees-jenkins-dev must be replaced

      ~ agent_pool_profile {
          ~ count           = 1 -> 2
          + dns_prefix      = (known after apply)
          ~ fqdn            = "test" -> (known after apply)
            max_pods        = 30
          ~ name            = "medium" -> "performance" # forces replacement
            os_disk_size_gb = 50
            os_type         = "Linux"
            type            = "VirtualMachineScaleSets"
          ~ vm_size         = "Standard_D4s_v3" -> "Standard_D8s_v3" # forces replacement
        }

      ~ agent_pool_profile {
            count           = 1
          + dns_prefix      = (known after apply)
          ~ fqdn            = "test" -> (known after apply)
            max_pods        = 30
          ~ name            = "performance" -> "medium" # forces replacement
            os_disk_size_gb = 50
            os_type         = "Linux"
            type            = "VirtualMachineScaleSets"
          ~ vm_size         = "Standard_D8s_v3" -> "Standard_D4s_v3" # forces replacement
        }

      - service_principal {
          - client_id = "acutal_client_id" -> null
        }

Steps to Reproduce

1. Change the nodepool count from 1 to 2 (for example, via a tfvars change as sketched below)
2. Run `terraform plan` or `terraform apply`
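
A minimal sketch of that change, assuming the counts are supplied through a terraform.tfvars file using the variable names from the configuration above (values are hypothetical):

# terraform.tfvars
agent_pool1_count = 2   # previously 1; on its own this should only add a node to the pool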

@rmb938

rmb938 commented Jul 23, 2019

I would also like the ability to modify agent pools without having the cluster be recreated.

All of this can be done via the command line without having to delete the cluster: https://docs.microsoft.com/en-us/cli/azure/ext/aks-preview/aks/nodepool?view=azure-cli-latest

So it should be simple to modify the provider to do something similar.
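
For example, an existing node pool can be scaled in place from the CLI; the resource group, cluster, and pool names below are placeholders:

az aks nodepool scale \
  --resource-group my-rg \
  --cluster-name my-aks-cluster \
  --name medium \
  --node-count 2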

@djsly
Contributor

djsly commented Aug 1, 2019

@titilambert here's another one that we could tackle at the same time

@davidack

Modifications to node pools causing the cluster to be destroyed and recreated are definitely a problem with the current version of the azurerm provider. That problem still needs to be fixed.

However, in this case I believe you are running into the same problem I ran into last week: the provider is not sorting the agent_pool_profile blocks from your code before comparing them to the node pools in the current state (which appear to be returned in alphabetical order by name). Two of the four agent_pool_profile blocks in my code were not in alphabetical order by name, and running terraform plan or terraform apply would result in exactly the kind of behavior you are seeing: a plan that wanted to destroy and then recreate the two node pools in question along with the cluster, while swapping all the differing parameters of the two node pools (name, vm_size, etc.), even if no changes had been made to the code since the last apply.

It seems to me that the provider should be sorting both the elements from the code and the elements from the query of the current state in the same way, so they can be compared properly. Should this be considered another aspect of this issue, or should I open a separate issue for it?

The workaround, until this sorting bug is fixed, is to make sure the agent_pool_profile blocks in your code are listed in alphabetical order by name, as in the sketch below.
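
As a minimal sketch of that ordering, using the two pool names and sizes from the plan output above (other arguments omitted for brevity):

  # agent_pool_profile blocks listed alphabetically by name, matching the order
  # in which the API appears to return node pools
  agent_pool_profile {
    name    = "medium"
    count   = 2
    vm_size = "Standard_D4s_v3"
    os_type = "Linux"
  }

  agent_pool_profile {
    name    = "performance"
    count   = 1
    vm_size = "Standard_D8s_v3"
    os_type = "Linux"
  }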

@artburkart

@davidack, someone ultimately made an issue documenting what you reported: #4560

@davidack

Thanks Art, for both the heads up and for the fix in #4676.

@ghost

ghost commented Nov 26, 2019

This has been released in version 1.37.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 1.37.0"
}
# ... other configuration ...

@nidhi5885

I am facing the same kind of issue:

Problem statement: Terraform is causing the Kubernetes cluster to be recreated every time I execute the command below:
az aks get-credentials -n K8clustername -g resourcegroupname

In particular, this command replaces my .kube/config file.

I do not understand how executing the above command changes the Terraform state.

Provider Versions I am using:
Terraform v0.12.2

  • provider.azurerm v1.39.0
  • provider.helm v0.10.4
  • provider.kubernetes v1.10.0
  • provider.local v1.4.0

@ghost

ghost commented Jan 17, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Jan 17, 2020