
Changing "azurerm_kubernetes_cluster" default_node_pool.vm_size forces replacement of the whole kubernetes cluster #7093

Closed
thomas-riccardi opened this issue May 26, 2020 · 26 comments · Fixed by #20628
Labels
enhancement · sdk/requires-newer-api-version (requires upgrading the API version in use) · service/kubernetes-cluster · upstream/microsoft (an upstream issue is blocking this issue/PR)
Comments

@thomas-riccardi

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.12.25
+ provider.azuread v0.9.0
+ provider.azurerm v2.10.0

Affected Resource(s)

  • azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_kubernetes_cluster" "k8s" {
  name                = var.aks_name
  location            = var.location
  dns_prefix          = var.aks_dns_prefix
  resource_group_name = var.resource_group_app
  kubernetes_version  = "1.16.7"

  linux_profile {
    admin_username = var.vm_user_name
    ssh_key {
      key_data = file(var.public_ssh_key_path)
    }
  }

  addon_profile {
    http_application_routing {
      enabled = false
    }
  }

  default_node_pool {
    name           = "default"
    vm_size        = "Standard_D3_v2"
    vnet_subnet_id = data.azurerm_subnet.app_aks.id

    type = "VirtualMachineScaleSets"

    enable_auto_scaling = true
    min_count           = 2
    max_count           = 4
  }

  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = "standard"
  }

  service_principal {
    client_id     = data.azuread_service_principal.aks.application_id
    client_secret = var.aks_service_principal_client_secret
  }
}

Expected Behavior

Changing the default node pool's vm_size should update or replace only that node pool with the new vm_size, ideally with a seamless rolling replacement of its nodes.

Actual Behavior

Changing the default node pool's vm_size actually forces replacement of the whole resource, i.e. the entire cluster.

Steps to Reproduce

  1. terraform apply
  2. change default_node_pool.vm_size: I want bigger machines on my default node pool
  3. terraform apply

result:

# azurerm_kubernetes_cluster.k8s must be replaced
-/+ resource "azurerm_kubernetes_cluster" "k8s" {
...
      ~ default_node_pool {
...
          ~ vm_size               = "Standard_D3_v2" -> "Standard_D4_v2" # forces replacement
...

Important Factoids

  • I initially created my cluster with the 2.10.0 azurerm Terraform provider, with the new default_node_pool property, so it's not the old transition/breaking-change issue from agent pools.
  • the AKS documentation explicitly says we can delete the system node pool as long as we already have a second one. https://docs.microsoft.com/en-us/azure/aks/use-multiple-node-pools
  • I tried to manually do the operation:
    • create a new node pool: OK
    • delete the old default: the portal says I must create another "system node pool" first, but the portal doesn't provide a way to specify that.
    • az aks nodepool list doesn't mention "system" for any node pool of the cluster; all have "mode": "User", even the default... there may be a strange edge case here.
    • az aks nodepool update -g myResourceGroup --cluster-name myAKSCluster -n default2 --mode system sets my new node pool as system. Result: now I have "mode": "System", on that node pool
    • finally deleting the old default node pool worked!
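
For reference, the manual steps above could also be expressed in Terraform as a separate system node pool. This is only a sketch (the pool name and VM size are hypothetical, and it assumes the mode argument is available on azurerm_kubernetes_cluster_node_pool in your provider version); the old default pool still has to be removed outside of Terraform:

resource "azurerm_kubernetes_cluster_node_pool" "default2" {
  # Hypothetical replacement system pool, mirroring the
  # "az aks nodepool add" + "--mode system" steps above.
  name                  = "default2"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.k8s.id
  mode                  = "System"
  vm_size               = "Standard_D4_v2"
  vnet_subnet_id        = data.azurerm_subnet.app_aks.id

  enable_auto_scaling = true
  min_count           = 2
  max_count           = 4
}
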
@thomas-riccardi
Author

thomas-riccardi commented May 28, 2020 via email

@thomas-riccardi
Author

So this is in fact probably a duplicate of #6058, which has been fixed in #7233, which should be shipped in v2.14.0 if I understand correctly.

If my analysis is correct, don't hesitate to close the issue in favor of #6058.

Maybe let's wait for v2.14.0 to test whether it works.

@tombuildsstuff
Contributor

@thomas-riccardi this isn’t a duplicate of #6058 (although that allows for new node pools, the default one isn’t cycled) - but is something we’re looking into.

@jochenrichter

It would be great if we could do the same as on GKE. There you have the option to remove the default node pool during cluster creation.

From the terraform GKE docs:
remove_default_node_pool - (Optional) If true, deletes the default node pool upon cluster creation. If you're using google_container_node_pool resources with no default node pool, this should be set to true, alongside setting initial_node_count to at least 1.
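
For comparison, a minimal sketch of that GKE pattern (the resource names, location, and machine sizes here are illustrative, not taken from this issue):

resource "google_container_cluster" "primary" {
  name     = "example"
  location = "europe-west1"

  # Delete the bootstrap default node pool right after cluster creation
  # and manage node pools exclusively as separate resources.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "primary"
  cluster    = google_container_cluster.primary.name
  location   = google_container_cluster.primary.location
  node_count = 2

  node_config {
    machine_type = "e2-standard-4"
  }
}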

@jemag

jemag commented Aug 12, 2020

Just to be sure, right now this is only doable through the Azure CLI? So if I were to create a new node pool with Terraform, set it as mode = "System", and then try to change the VM size or OS disk size of the default_node_pool, would this still be detected as requiring recreation of the cluster?

@PPACI

PPACI commented Sep 25, 2020

Report from my experience.

I was in the following situation:

  • One AKS created with a default node pool called default
  • One additional node pool called foo in system mode
  • One additional node pool called bar in system mode

Those three objects were created with terraform.

I've manually deleted (using Azure portal) default and foo. Only bar remained.

What I've done in terraform:

  • terraform state rm azurerm_kubernetes_cluster_node_pool.foo.
  • terraform state rm azurerm_kubernetes_cluster_node_pool.bar.
    • We must also remove this node pool from the state. Otherwise, Terraform will want to destroy it when we move it into the default node pool of the cluster object.
  • Update the config of aks_we.azurerm_kubernetes_cluster.main to have the default_node_pool block matching the configuration of bar.

In the end, Terraform is happy with my manual migration and does not report any further changes needed 👍
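
For illustration, after that state surgery the cluster's default_node_pool block simply describes bar; the values below are hypothetical:

default_node_pool {
  name           = "bar"
  vm_size        = "Standard_D4_v2"
  vnet_subnet_id = data.azurerm_subnet.app_aks.id

  enable_auto_scaling = true
  min_count           = 2
  max_count           = 4
}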

@brannondorsey

This sounds like a great temporary solution. We need to make a few changes to our default node pool but can't afford to have the cluster recreated in the process. Are there any adverse consequences to performing this manual migration via terraform state that could come back to bite us in the future, that anyone knows of? For instance, if we made a cluster or node pool k8s version upgrade later on, or something like that.

@PPACI

PPACI commented Oct 7, 2020

If you mean using terraform state rm as I did, then it should be OK.
Terraform will not distinguish a "migrated" cluster from a native one. Updating or scaling the node pool will be fine.

@brannondorsey

Gotcha, thanks for that! I was mostly wondering whether whatever design decision led to the azurerm_kubernetes_cluster resource behaving the way it currently does, where recreating the default_node_pool triggers a cluster recreation, might somehow be undermined by this manual state change via terraform state rm, or whether this is simply an implementation detail that can largely be ignored.

Perhaps someone from the azurerm provider team or the AKS team could shed some light on whether this is safe to do to a production cluster (after testing in a staging cluster, of course) and confirm there won't be long-term negative effects or incompatibilities with the cluster, or with how the azurerm provider treats it in the future.

@evenh
Contributor

evenh commented Sep 27, 2021

Are there any changes planned here for the near future? I would also like (if possible) for someone with low-level expertise to say something about the safety of doing this to a production cluster.

@marcoboffi

We deleted a production cluster by changing os_disk_type on the default node pool.
I think the default node pool concept should be removed, because it is no longer present in AKS.

@adityasundaramurthy-maersk

Are there any plans to change this? AKS (via CLI) only expects one of the nodepools to be a System nodepool. Why does Terraform not have parity with the CLI/ARM API?

@alicyn

alicyn commented May 25, 2022

We needed to increase VM size due to higher load on our AKS cluster. As others have mentioned, AKS documentation clearly walks through the procedure to create a new System nodepool, drain the old node pool, and then simply delete it. Now our terraform state can no longer be used, because it demands to destroy the entire cluster. In a production environment, it seems like a bizarre workaround to be forced to create a new nodepool that simply has the correct "default" nodepool name just so that Terraform will accept it.

@zerodayyy

At the moment we have to manually create a second system node pool, delete the "default" one, and let Terraform recreate it in order to change its settings without recreating the whole cluster. Since not having a "default" node pool is legal in AKS, and this concept no longer even applies, why does this provider still enforce it?

@tombuildsstuff
Contributor

@zerodayyy because unfortunately the API still requires one to create the cluster at this point in time, and the AKS Team have stated they don't plan to change that behaviour/requirement, meaning the default_node_pool is required.

We should look to support cycling the default node pool in the future, but unfortunately that's currently blocked on this issue, where the API doesn't provide a means of reconciling clusters/node pools: Azure/AKS#1972

@dwrusse

dwrusse commented Oct 18, 2022

Our team mistakenly replaced one of our AKS clusters due to this as well. The most damaging part for us is that this also replaces the node pool's resource group, which contains other Azure resources provisioned by AKS, most notably the Azure Storage Account that our PVs were provisioned in. So not only is your AKS cluster at risk, but external resources that you never intended to be ephemeral may be as well.

@melzayet

Hi @tombuildsstuff, will the cycling of the default node pool you mentioned, currently blocked by Azure/AKS#1972, fix the issue in this thread?

In other words, will cycling help with changing the node size or disk size in a node pool? Is it part of the fix? My understanding is that it will help Terraform capture the correct state, but not change the default node pool configuration.

@alex0ptr

Maybe I'm missing something, but I think we can keep the default node pool and just need to allow overriding/implementing lifecycle create_before_destroy for this sub-resource.

The code then needs to handle this: if it is create-before-destroy, do exactly that. This way we satisfy the current API and support changing e.g. the node size.

@alex0ptr

We looked into this a bit more deeply. I think we can implement this without API changes. We would consider changing the state to reference the node pool ID, and then have the implementation check which properties would change and either run createOrUpgrade or create a new node pool with a hash of the config as a suffix and then delete the old node pool.

Of course, the schema of the default_node_pool would no longer be the same as that of the node pool resource, since the properties that force recreation would not be the same anymore. This could lead to slightly increased maintenance effort.

If we provide a PR, is it likely you would approve the solution? @tombuildsstuff

@tombuildsstuff
Contributor

@alex0ptr unfortunately we can't do that until this issue is resolved, since at this point in time it's not possible to know when the new node pool is reliably available in an automated manner (without polling the cluster itself, which the provider may not have access to).

Once the above upstream issue is resolved we can look into solving this via introducing a temporary node pool, deleting the old default node pool and then reintroducing a new default node pool. At this time, given the bug above, we can't reliably know when the new node pool becomes available such that we could tear down the old one.

The alternative here would be for the AKS API to not require using a default node pool. Unfortunately the AKS Team have previously said that's not something that's planned, as such we're kind of in limbo until the above issue is resolved (which will also let us poll more reliably during creation, until the cluster becomes available).

Apologies that there's not much more we can do to support this until the upstream issue is resolved. As such unfortunately we wouldn't accept a PR for this at this time due to the limitations described above which could lead to larger issues. Once the upstream issue's resolved we can take a look into this - so I'd recommend reaching out to the AKS Team about this if you'd like to see this functionality.

Thanks!

@tombuildsstuff added the upstream/microsoft and sdk/requires-newer-api-version labels on Nov 18, 2022
@alex0ptr

via introducing a temporary node pool, deleting the old default node pool and then reintroducing a new default node pool.

I think we can go with just a new pool? To state my proposal more clearly: we string-append all values that would normally force a new node pool, hash them, and add the hash as a suffix to the node pool name. That way we have no name/ID conflict and can simply use the new pool as the default node pool.
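
For illustration, that suffixing scheme could be expressed with Terraform's built-in functions roughly like this (the set of hashed attributes and the name prefix are assumptions; note that AKS limits Linux node pool names to 12 lowercase alphanumeric characters):

locals {
  # Hash the attributes that would normally force a new node pool and
  # keep a short suffix, so every distinct configuration gets a new name.
  pool_hash = substr(sha1(jsonencode({
    vm_size         = "Standard_D4_v2"
    os_disk_size_gb = 128
  })), 0, 6)

  default_pool_name = "sys${local.pool_hash}"
}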

unfortunately we can't do that until Azure/AKS#1972 is resolved

AHA! Yes, we see this regularly too. I'll try to get some attention to this...

@tombuildsstuff
Contributor

@alex0ptr that would still encounter the issues outlined above (and would then be subject to other issues, like failing deployments where Azure Policy is configured for various naming conventions, etc.) - it's unfortunate, but ultimately there's not much we can do here until Azure/AKS#1972 is resolved.

@marcindulak

@stephybun the issue is closed now, should Azure/AKS#1972 be looked at again, and closed?

@stephybun
Member

@marcindulak updates on that issue suggest that we're still waiting for the manual reconciliation feature to go GA. I think that issue should remain open until it does and we've had a chance to take a look and see how/if this can be incorporated into the resource to prevent a state mismatch when something goes wrong.

@github-actions

This functionality has been released in v3.47.0 of the Terraform Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!
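
For anyone landing here: if the released change works as described, the default node pool can now be cycled in place by giving the provider a temporary pool name to rotate through. A minimal sketch, assuming the temporary_name_for_rotation argument introduced for this (verify the exact argument name and the properties it covers against the provider docs for your version):

default_node_pool {
  name                        = "default"
  vm_size                     = "Standard_D4_v2" # changing the size no longer replaces the cluster
  temporary_name_for_rotation = "defaulttemp"

  enable_auto_scaling = true
  min_count           = 2
  max_count           = 4
}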

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators Apr 10, 2023