
node_pools within same subnet are created sequentially #26933

Closed
aescrob opened this issue Aug 5, 2024 · 4 comments · Fixed by #27583
Labels
enhancement service/kubernetes-cluster upstream/microsoft/waiting-on-service-team This label is applicable when waiting on the Microsoft Service Team v/3.x

Comments

@aescrob

aescrob commented Aug 5, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave comments along the lines of "+1", "me too" or "any updates", they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment and review the contribution guide to help.

Terraform Version

1.5.7

AzureRM Provider Version

3.114.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster_node_pool

Terraform Configuration Files

resource "azurerm_kubernetes_cluster_node_pool" "nodepool1" {
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks_cluster.id
  name                  = "az1${local.nodepool_name_suffix}"
  vm_size               = local.nodepool_immutable_config.vm_size
  vnet_subnet_id        = local.nodepool_immutable_config.subnet_id
  zones                 = ["1"]
  enable_auto_scaling   = true
  min_count             = var.agents_min_count
  max_count             = var.agents_max_count
  node_count            = var.agents_node_count
  orchestrator_version  = var.cluster_version
  snapshot_id           = local.nodepool_immutable_config.nodepool_snapshot_id
  kubelet_config {
    container_log_max_size_mb = local.kubelet_config.container_log_max_size_mb
  }

  upgrade_settings {
    max_surge = var.upgrade_settings_max_surge
  }

  lifecycle {
    create_before_destroy = true
    ignore_changes = [
      node_count, # managed by cluster autoscaler
    ]

  }
  depends_on = [azurerm_kubernetes_cluster.aks_cluster]
}

resource "azurerm_kubernetes_cluster_node_pool" "nodepool2" {
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks_cluster.id
  name                  = "az2${local.nodepool_name_suffix}"
  vm_size               = local.nodepool_immutable_config.vm_size
  vnet_subnet_id        = local.nodepool_immutable_config.subnet_id
  zones                 = ["2"]
  enable_auto_scaling   = true
  min_count             = var.agents_min_count
  max_count             = var.agents_max_count
  node_count            = var.agents_node_count
  orchestrator_version  = var.cluster_version
  snapshot_id           = local.nodepool_immutable_config.nodepool_snapshot_id
  kubelet_config {
    container_log_max_size_mb = local.kubelet_config.container_log_max_size_mb
  }
  upgrade_settings {
    max_surge = var.upgrade_settings_max_surge
  }

  lifecycle {
    create_before_destroy = true
    ignore_changes = [
      node_count, # managed by cluster autoscaler
    ]

  }
  depends_on = [azurerm_kubernetes_cluster.aks_cluster]
}

resource "azurerm_kubernetes_cluster_node_pool" "nodepool3" {
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks_cluster.id
  name                  = "az3${local.nodepool_name_suffix}"
  vm_size               = local.nodepool_immutable_config.vm_size
  vnet_subnet_id        = local.nodepool_immutable_config.subnet_id
  zones                 = ["3"]
  enable_auto_scaling   = true
  min_count             = var.agents_min_count
  max_count             = var.agents_max_count
  node_count            = var.agents_node_count
  orchestrator_version  = var.cluster_version
  snapshot_id           = local.nodepool_immutable_config.nodepool_snapshot_id
  kubelet_config {
    container_log_max_size_mb = local.kubelet_config.container_log_max_size_mb
  }
  upgrade_settings {
    max_surge = var.upgrade_settings_max_surge
  }

  lifecycle {
    create_before_destroy = true
    ignore_changes = [
      node_count, # managed by cluster autoscaler
    ]

  }
  depends_on = [azurerm_kubernetes_cluster.aks_cluster]
}

#----------------------------
# Dedicated node pool for ingress nginx
#----------------------------
resource "azurerm_kubernetes_cluster_node_pool" "nodepool_ingress_nginx" {
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks_cluster.id
  name                  = "ingress${substr(local.nodepool_name_suffix, 0, 5)}"
  vm_size               = local.nodepool_immutable_config.vm_size
  vnet_subnet_id        = local.nodepool_immutable_config.subnet_id
  zones                 = ["1", "2", "3"]
  enable_auto_scaling   = false
  node_count            = local.stages[local.cluster_stage].ingress_count
  orchestrator_version  = var.cluster_version
  snapshot_id           = local.nodepool_immutable_config.nodepool_snapshot_id
  kubelet_config {
    container_log_max_size_mb = local.kubelet_config.container_log_max_size_mb
  }

  # Label (for node selector) and taint to only schedule the ingress on this node pool
  node_labels = {
    "kubernetes.post.ch/ingress-node" = "true"
  }
  node_taints = [
    "kubernetes.post.ch/ingress-node=true:NoSchedule"
  ]

  upgrade_settings {
    max_surge = "50%"
  }

  lifecycle {
    create_before_destroy = true
  }
  depends_on = [azurerm_kubernetes_cluster.aks_cluster]
}
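Editor's note: the three zonal pools above differ only in their zone and name. A sketch (not part of the original report) of how they could be collapsed with `for_each`, assuming the same locals and variables as above; note that `for_each` does not change creation ordering, since the provider's per-subnet lock still serializes the pools:

```hcl
# Sketch: collapse the three identical zonal pools with for_each.
# The per-subnet lock in the provider still serializes their creation.
resource "azurerm_kubernetes_cluster_node_pool" "zonal" {
  for_each = toset(["1", "2", "3"])

  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks_cluster.id
  name                  = "az${each.key}${local.nodepool_name_suffix}"
  vm_size               = local.nodepool_immutable_config.vm_size
  vnet_subnet_id        = local.nodepool_immutable_config.subnet_id
  zones                 = [each.key]
  enable_auto_scaling   = true
  min_count             = var.agents_min_count
  max_count             = var.agents_max_count
  node_count            = var.agents_node_count
  orchestrator_version  = var.cluster_version
  snapshot_id           = local.nodepool_immutable_config.nodepool_snapshot_id

  kubelet_config {
    container_log_max_size_mb = local.kubelet_config.container_log_max_size_mb
  }

  upgrade_settings {
    max_surge = var.upgrade_settings_max_surge
  }

  lifecycle {
    create_before_destroy = true
    ignore_changes        = [node_count] # managed by cluster autoscaler
  }
}
```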

Debug Output/Panic Output

TestDeploy/Terraform_Init&Apply 2024-08-05T08:56:05+02:00 logger.go:66: module.aks_post.azurerm_kubernetes_cluster_node_pool.nodepool3: Creating...
TestDeploy/Terraform_Init&Apply 2024-08-05T08:56:05+02:00 logger.go:66: module.aks_post.azurerm_kubernetes_cluster_node_pool.nodepool1: Creating...
TestDeploy/Terraform_Init&Apply 2024-08-05T08:56:05+02:00 logger.go:66: module.aks_post.azurerm_kubernetes_cluster_node_pool.nodepool2: Creating...
TestDeploy/Terraform_Init&Apply 2024-08-05T08:56:05+02:00 logger.go:66: module.aks_post.azurerm_kubernetes_cluster_node_pool.nodepool_ingress_nginx: Creating...

:

TestDeploy/Terraform_Init&Apply 2024-08-05T09:03:15+02:00 logger.go:66: module.aks_post.azurerm_kubernetes_cluster_node_pool.nodepool1: Creation complete after 7m10s [id=/subscriptions/****/resourceGroups/rg-aks-ci-m98ln0012-j6l58k/providers/Microsoft.ContainerService/managedClusters/aks-ci-m98ln0012-j6l58k/agentPools/az1e7897d581]

TestDeploy/Terraform_Init&Apply 2024-08-05T09:10:05+02:00 logger.go:66: module.aks_post.azurerm_kubernetes_cluster_node_pool.nodepool3: Creation complete after 14m0s [id=/subscriptions/****/resourceGroups/rg-aks-ci-m98ln0012-j6l58k/providers/Microsoft.ContainerService/managedClusters/aks-ci-m98ln0012-j6l58k/agentPools/az3e7897d581]

TestDeploy/Terraform_Init&Apply 2024-08-05T09:17:18+02:00 logger.go:66: module.aks_post.azurerm_kubernetes_cluster_node_pool.nodepool_ingress_nginx: Creation complete after 21m13s [id=/subscriptions/****/resourceGroups/rg-aks-ci-m98ln0012-j6l58k/providers/Microsoft.ContainerService/managedClusters/aks-ci-m98ln0012-j6l58k/agentPools/ingresse7897]

TestDeploy/Terraform_Init&Apply 2024-08-05T09:24:30+02:00 logger.go:66: module.aks_post.azurerm_kubernetes_cluster_node_pool.nodepool2: Creation complete after 28m24s [id=/subscriptions/****/resourceGroups/rg-aks-ci-m98ln0012-j6l58k/providers/Microsoft.ContainerService/managedClusters/aks-ci-m98ln0012-j6l58k/agentPools/az2e7897d581]

Expected Behaviour

Node pools are created in parallel, finishing with nearly the same 'Creation complete after …' timestamp/duration.

Actual Behaviour

All three node pools start 'Creating...' at the same time but are processed sequentially, causing a longer execution time for our various tests.

Steps to Reproduce

na

Important Factoids

No response

References

Possibly caused by the locks on the subnet in kubernetes_cluster_node_pool_resource.go, introduced with kubernetes_cluster_node_pool: Fix race condition with virtual network status when creating node pool #25888

@github-actions github-actions bot added the v/3.x label Aug 5, 2024
@zioproto
Contributor

zioproto commented Aug 5, 2024

Cc: @lonegunmanb @ms-henglu @stephybun

@aescrob
Author

aescrob commented Aug 6, 2024

Hi @ms-henglu - thank you for your PR #26939 - @zioproto fyi

We use the same vnet_subnet_id for all our node pools.
I doubt that this change will allow them to be built in parallel, since it still uses locks.ByID(subnetID.ID()), which is obviously identical for all node pools.
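Editor's note: since the provider lock is keyed by the subnet ID, one possible workaround is to give each node pool its own subnet so the lock keys differ. A sketch only; the subnet resources, names, and `var.aks_cidr` below are hypothetical and not part of this issue:

```hcl
# Sketch: per-zone subnets so each node pool acquires a different
# locks.ByID key. All resource names and variables here are hypothetical.
resource "azurerm_subnet" "nodepool" {
  for_each             = toset(["1", "2", "3"])
  name                 = "snet-aks-az${each.key}"
  resource_group_name  = azurerm_resource_group.aks.name
  virtual_network_name = azurerm_virtual_network.aks.name
  address_prefixes     = [cidrsubnet(var.aks_cidr, 2, tonumber(each.key) - 1)]
}

# Each pool then references its own subnet, e.g.:
#   vnet_subnet_id = azurerm_subnet.nodepool["1"].id
```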

@rcskosir
Contributor

Adding the link to the upstream issue here: Azure/AKS#4522

@rcskosir rcskosir added upstream/microsoft/waiting-on-service-team This label is applicable when waiting on the Microsoft Service Team and removed question labels Sep 25, 2024
@stephybun stephybun linked a pull request Oct 15, 2024 that will close this issue
14 tasks

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 16, 2024