Supporting update_config in EKS managed node groups #1554

Closed
loupgaroublond opened this issue Aug 30, 2021 · 7 comments · Fixed by #1560

Comments

@loupgaroublond

What am I really asking for?

We need to use an AWS feature, and I'd like to suggest a change to this Terraform module to enable it. I'm looking for the go-ahead to share an implementation, after discussing the API and any other concerns with you fine folks.

Is your request related to a new offering from AWS?

Setting update_config to either a maximum number of unavailable instances or a percentage of instances has been supported since the latest release of the provider, as of the time of writing.

Is your request related to a problem? Please describe.

Rolling node groups with the current settings goes slowly, as EKS only rolls one node at a time. We can sometimes tolerate more disruption, and want to ensure this process goes faster.

Describe the solution you'd like.

Some kind of option, when passing node group parameters into the module, to set and update the update_config block settings.

Describe alternatives you've considered.

I've considered a tracheotomy, but that won't solve the issue.

Additional context

I have a proof-of-concept implementation in a fork that we can start using internally at $dayjob. Here's a rough sense of how it works:

I want to set either a percentage or a number iff the user of the module has passed in one of those parameters. I am using two parameters: one to indicate the type, "number" or "percent", and the other to represent the amount. This API could be rewritten; I just set it up this way to prove the concept.

The code adds two dynamic blocks to ./modules/node_groups/node_group.tf:

  # Emit an update_config block with max_unavailable only when the caller
  # selected the "number" type; otherwise for_each is empty and no block renders.
  dynamic "update_config" {
    for_each = each.value["max_unavailable_type"] == "number" ? toset([each.value["max_unavailable"]]) : toset([])
    content {
      max_unavailable = update_config.value
    }
  }

  # Same pattern for the "percent" type, mapping to max_unavailable_percentage.
  dynamic "update_config" {
    for_each = each.value["max_unavailable_type"] == "percent" ? toset([each.value["max_unavailable"]]) : toset([])
    content {
      max_unavailable_percentage = update_config.value
    }
  }
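For illustration, a caller might then pass the proof-of-concept parameters in roughly like this. The module source and node group name are hypothetical placeholders; only max_unavailable_type and max_unavailable come from the sketch above:

  module "eks" {
    source = "./terraform-aws-eks" # hypothetical local fork carrying the PoC patch

    # ...cluster settings elided...

    node_groups = {
      workers = {
        desired_capacity = 10
        max_capacity     = 15
        min_capacity     = 10

        # PoC parameters: allow 25% of the group to be replaced in parallel.
        max_unavailable_type = "percent"
        max_unavailable      = 25
      }
    }
  }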
@daroga0002
Contributor

If you are able, please open a PR for this enhancement.

@daroga0002
Contributor

Leaving a note here that this will require updating the AWS provider to version 3.56.0 (August 26, 2021), as per https://github.com/hashicorp/terraform-provider-aws/blob/main/CHANGELOG.md
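For anyone picking this up, the version constraint in the calling configuration would look something like this (a minimal sketch, assuming the standard hashicorp/aws provider source):

  terraform {
    required_providers {
      aws = {
        source  = "hashicorp/aws"
        version = ">= 3.56.0" # first release supporting update_config on aws_eks_node_group
      }
    }
  }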

@marianobilli
Contributor

@daroga0002 @loupgaroublond guys, I did the PR myself because I need it for my project:
feat: Supporting update_config in EKS managed node groups

marianobilli pushed a commit to marianobilli/terraform-aws-eks that referenced this issue Sep 2, 2021
…aws-modules#1554)

Setting the default as max_unavailable=1, the same as the default for the tf aws_eks_node_group resource.

Using a 2-level map in order to be able to set node_group_defaults and overwrite them with node-group-specific config.
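A minimal sketch of what that two-level map could look like, with a default that specific node groups override (the update_config key follows the PR; the group name and values are illustrative):

  node_groups_defaults = {
    update_config = {
      max_unavailable = 1 # matches the aws_eks_node_group default
    }
  }

  node_groups = {
    big_pool = {
      # Override the module-wide default for this group only.
      update_config = {
        max_unavailable_percentage = 50
      }
    }
  }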
@jaimehrubiks
Contributor

May I hijack the PR to ask how this actually works? I've read the AWS documentation on updating managed nodes, but didn't really understand the effect of these two options. Why would there be any disruption (given that the option is called max unavailable nodes or %) if they first create the new nodes (more at once), and then taint the old ones one by one, waiting for pods to migrate to the new nodes before tainting the next old node?

@daroga0002
Contributor

@jaimehrubiks here are the docs:
https://docs.aws.amazon.com/eks/latest/userguide/managed-node-update-behavior.html

In general, without this feature, node groups were updated by creating the same number of new nodes and moving the workload to them.

So let's assume you had 10 nodes which required an update: it created an additional 10 nodes with the new AMI, joined them to the cluster, and then moved the workload under K8s. This was a problem when you had big clusters with many nodes under a single node group. Now you can specify:

  • Number – Select and specify the number of nodes in your node group that can be updated in parallel. These nodes will be unavailable during the update.

  • Percentage – Select and specify the percentage of nodes in your node group that can be updated in parallel. These nodes will be unavailable during the update. This is useful if you have a large number of nodes in your node group.

This allows updating node groups in a more rolling fashion, as in the sketch below.
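Expressed directly against the provider, the two options map onto the update_config block of the aws_eks_node_group resource, which takes exactly one of max_unavailable or max_unavailable_percentage. A minimal sketch; the cluster name, role, and subnets are assumed placeholders:

  resource "aws_eks_node_group" "workers" {
    cluster_name    = "my-cluster"          # hypothetical cluster
    node_group_name = "workers"
    node_role_arn   = aws_iam_role.node.arn # assumes a node role defined elsewhere
    subnet_ids      = var.subnet_ids

    scaling_config {
      desired_size = 10
      max_size     = 12
      min_size     = 10
    }

    # Let 2 of the 10 nodes go unavailable in parallel during an update.
    update_config {
      max_unavailable = 2
      # or: max_unavailable_percentage = 20
    }
  }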

marianobilli pushed a commit to marianobilli/terraform-aws-eks that referenced this issue Sep 2, 2021
…aws-modules#1554)

Add documentation for update_config fields in node_groups module README.md
@daroga0002 daroga0002 added the wip label Sep 3, 2021
@antonbabenko
Member

v17.10.0 has just been released with this feature supported.

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2022