
max_surge is required, but not a valid parameter for spot node pools #26465

Closed

stockmaj opened this issue Jun 25, 2024 · 7 comments · Fixed by #26541

Comments

@stockmaj

Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave comments along the lines of "+1", "me too" or "any updates", they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment and review the contribution guide to help.

Terraform Version

1.8.5

AzureRM Provider Version

v3.109.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster_node_pool

Terraform Configuration Files

resource "azurerm_kubernetes_cluster_node_pool" "test" {
  name                   = "test"
  kubernetes_cluster_id  = local.id
  enable_host_encryption = true
  eviction_policy        = "Delete"
  vm_size                = "Standard_D16s_v3"
  priority               = "Spot"
  node_taints            = ["kubernetes.azure.com/scalesetpriority=spot:NoSchedule"]
  node_labels            = {
    "kubernetes.azure.com/scalesetpriority" = "spot"
  }
  orchestrator_version = local.kubernetes_version
  os_disk_type         = "Ephemeral"
  enable_auto_scaling  = true
  max_pods             = 110
  min_count            = 0
  max_count            = 30
  upgrade_settings {
    max_surge = "50%"
  }
  vnet_subnet_id       = local.subnet.id
  zones                = ["1", "2", "3"]
}

Debug Output/Panic Output

This is straightforward to reproduce without debug output.

Expected Behaviour

If I omit upgrade_settings entirely, the resource is marked as changed on every apply:

   ~ upgrade_settings {}

If I supply upgrade_settings with an empty object (upgrade_settings {}) then it tells me that max_surge is a required parameter

If I supply upgrade_settings with a max_surge, as in the config above, I get the error shown under Actual Behaviour.

I would expect one of two behaviours (both sketched below):

  1. Upgrade settings can be omitted and the resource is not marked as changed on every apply, so I don't have to explain to everyone who runs the module and asks "Did you change this?"
  2. Upgrade settings can be provided with an empty object.
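
For illustration, the two shapes described above would look roughly like this (resource names and values are placeholders, not taken from the thread):

# Behaviour 1: omit the block entirely on a Spot pool, with no perpetual diff
resource "azurerm_kubernetes_cluster_node_pool" "spot_omitted" {
  name                  = "spot1"
  kubernetes_cluster_id = local.id
  vm_size               = "Standard_D16s_v3"
  priority              = "Spot"
  # no upgrade_settings block at all
}

# Behaviour 2: an empty block is accepted instead of requiring max_surge
resource "azurerm_kubernetes_cluster_node_pool" "spot_empty" {
  name                  = "spot2"
  kubernetes_cluster_id = local.id
  vm_size               = "Standard_D16s_v3"
  priority              = "Spot"

  upgrade_settings {}
}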

Actual Behaviour

With no upgrade_settings:

  # azurerm_kubernetes_cluster_node_pool.test will be updated in-place
  ~ resource "azurerm_kubernetes_cluster_node_pool" "test" {
        id                            = "xxx"
        name                          = "test"
        tags                          = {}
        # (33 unchanged attributes hidden)

      - upgrade_settings {}
    }

With upgrade settings supplied, but empty like this:

  upgrade_settings {}

I get this error:

╷
│ Error: Missing required argument
│ 
│   on testtf line 23, in resource "azurerm_kubernetes_cluster_node_pool" "test":
│   23:   upgrade_settings {
│ 
│ The argument "max_surge" is required, but no definition was found.
╵

With the config above, the plan says:

  # azurerm_kubernetes_cluster_node_pool.test will be updated in-place
  ~ resource "azurerm_kubernetes_cluster_node_pool" "test" {
        id                            = "xxx"
        name                          = "test"
        tags                          = {}
        # (33 unchanged attributes hidden)

      ~ upgrade_settings {
          + max_surge = "50%"
        }
    }

but on apply I get this error:

│ Error: updating Node Pool Agent Pool (Subscription: "xxxxx"
│ Resource Group Name: "rg-test"
│ Managed Cluster Name: "test"
│ Agent Pool Name: "test"): performing CreateOrUpdate: unexpected status 400 (400 Bad Request) with response: {
│   "code": "InvalidParameter",
│   "details": null,
│   "message": "The value of parameter agentPoolProfile.upgrade.maxSurge is invalid. Error details: Spot pools can't set max surge. Please see https://aka.ms/aks-naming-rules for more details.",
│   "subcode": "",
│   "target": "agentPoolProfile.upgrade.maxSurge"
│  }
│ 
│   with azurerm_kubernetes_cluster_node_pool.test,
│   on testl.tf line 1, in resource "azurerm_kubernetes_cluster_node_pool" "test":
│    1: resource "azurerm_kubernetes_cluster_node_pool" "test" {
│ 
╵

Steps to Reproduce

Run a terraform apply with a node pool that either omits upgrade_settings entirely, supplies upgrade_settings without max_surge, or sets max_surge on a Spot node pool, as described above. A trimmed-down Spot variant is sketched below.
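
The following is a minimal sketch, based on the config above, of the Spot variant that hits the API rejection; the name and cluster ID are placeholders:

resource "azurerm_kubernetes_cluster_node_pool" "spot_repro" {
  name                  = "sptrepro"
  kubernetes_cluster_id = local.id
  vm_size               = "Standard_D16s_v3"
  priority              = "Spot"
  eviction_policy       = "Delete"

  upgrade_settings {
    max_surge = "50%" # rejected by the API: "Spot pools can't set max surge"
  }
}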

Important Factoids

No response

References

No response

@ms-henglu
Contributor

Hi @stockmaj ,

Thank you for taking time to report this issue!

A workaround is to set lifecycle.ignore_changes = [upgrade_settings] to ignore the changes.
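
Applied to the resource above, that workaround would look roughly like this (a sketch, not from the thread):

resource "azurerm_kubernetes_cluster_node_pool" "test" {
  # ... existing arguments, with the upgrade_settings block omitted ...

  lifecycle {
    ignore_changes = [upgrade_settings]
  }
}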

@stephybun
Member

Thanks for opening this issue @stockmaj.

We've had issues with the upgrade_settings block and max_surge property since the AKS team changed the default value for cluster versions greater than 1.28, but I am unable to reproduce this behaviour using the config provided.

I was able to successfully provision a spot node pool with the upgrade_settings block omitted with no plan diff afterwards. The only variables I can think of that could be different and causing the inconsistent behaviour we're seeing are the k8s version as well as the location you're using to provision the cluster.

Could you let me know which k8s versions you're using and where you're trying to create the cluster?

@fheinonen

The plan claiming changes every time is happening on two AKS clusters running Kubernetes 1.28.9 when I omit upgrade_settings in Terraform.

@stockmaj
Author

@stephybun thanks for looking.

  • terraform version 1.9.0
  • azurerm provider 3.109.0
  • AKS k8s version 1.29.2

@stephybun
Member

@stockmaj thanks, would you mind providing the Azure location you're trying to provision to as well?

@stockmaj
Author

> @stockmaj thanks, would you mind providing the Azure location you're trying to provision to as well?

eastus and canadacentral


I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 16, 2024