
How to perform rolling updates via terraform for managed node groups when using custom AMI #1238

Closed
1 of 4 tasks
krishnapmv opened this issue Feb 12, 2021 · 7 comments

@krishnapmv

I have issues

I'm using managed node groups in my setup with a custom launch template and a custom AMI, based on the launch template from the example here. When rolling out a new image ID, Terraform destroys all worker nodes in one go, causing downtime. Is there a way to perform rolling updates via Terraform?
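For illustration, a minimal sketch of this kind of setup (the resource and argument names come from the Terraform AWS provider; the variable names, sizes, and user-data path are placeholder assumptions, not the actual module code, which is in the gist linked in this issue):

```hcl
# Hypothetical sketch: a custom launch template carrying a custom AMI,
# consumed by a managed node group. Changing image_id bumps the launch
# template version, which triggers the node replacement described above.
resource "aws_launch_template" "custom" {
  name_prefix = "eks-custom-"
  image_id    = var.custom_ami_id                  # custom AMI (placeholder variable)
  user_data   = base64encode(file("userdata.sh"))  # placeholder path

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_eks_node_group" "workers" {
  cluster_name    = var.cluster_name
  node_group_name = "workers"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  scaling_config {
    desired_size = 3
    max_size     = 5
    min_size     = 3
  }

  launch_template {
    id      = aws_launch_template.custom.id
    version = aws_launch_template.custom.latest_version
  }
}
```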

I'm submitting a...

  • bug report
  • feature request
  • support request - read the FAQ first!
  • kudos, thank you, warm fuzzy

What is the current behavior?

All worker nodes are drained at once during custom AMI rollouts, causing instability in the Kubernetes workloads.

If this is a bug, how to reproduce? Please include a code sample if relevant.

Here is the TF module code snippet I'm using: https://gist.github.com/krishnapmv/d175a0e1fb404d6fcfe92b3beeb52fa8 (in the module consumer, I set use_custom_ami = true)
and
Here is the terraform plan: https://gist.github.com/krishnapmv/4e15b27b932fe9ee4bb8c55a3fc10ed2

What's the expected behavior?

I'd expect a rolling restart of the worker nodes. Note that this is not a problem when using the official AMI.

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version: 13.1.0
  • Terraform version: v0.13.5

Any other relevant info

@krishnapmv krishnapmv changed the title How to perform rolling updates for managed node groups when using custom AMI How to perform rolling updates via terraform for managed node groups when using custom AMI Feb 12, 2021
@krishna-pp

Hello, I'm able to reproduce this issue with the latest Terraform code from the example as well.

Steps to reproduce:

  1. Set up an EKS cluster using the example here (make sure to uncomment image_id and user_data).
  2. Once the EKS cluster is ready, change image_id to a different AMI in your account.
  3. Run terraform again.

The Terraform plan looks similar to the one I mentioned in my initial comment. All the worker nodes are drained at the same time, and I don't see a way to perform a rolling update. This means we suffer downtime every time an AMI is rolled out.

Please let me know if you have any questions or need clarification. Thanks!

@siku4

siku4 commented Mar 25, 2021

Hi @krishnapmv, this seems to be a general issue when changing a launch template. Take a look at #1109

@lgg42

lgg42 commented May 27, 2021

@krishnapmv it used to work in a rolling-update fashion: it first created a new ASG, waited for all nodes to join the cluster and reach the Ready state, and then drained the old ASG by tainting the nodes and deleting them.

I've found that with version 15.2.0 this no longer works; basically it is not honoring create_before_destroy.

BTW, have you found a solution?

@barryib
Member

barryib commented May 27, 2021

Some of this behavior was introduced by the random_pet resources used to simulate name_prefix for MNG. This introduced a lot of problems. Furthermore, eks_node_group now supports name_prefix, so we're about to drop random_pet.

You can track #1372. Could you please help us test and review that PR?
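For readers landing here, a hedged sketch of the pattern this points toward: a prefixed node group name plus create_before_destroy, so Terraform can stand up the replacement group before destroying the old one. The node_group_name_prefix argument exists in the Terraform AWS provider's aws_eks_node_group resource; the variable names and sizes below are placeholder assumptions, not the module's actual implementation (see the linked PR for that):

```hcl
resource "aws_eks_node_group" "workers" {
  cluster_name           = var.cluster_name
  node_group_name_prefix = "workers-"   # a replacement group gets a fresh generated name
  node_role_arn          = var.node_role_arn
  subnet_ids             = var.subnet_ids

  scaling_config {
    desired_size = 3
    max_size     = 5
    min_size     = 3
  }

  lifecycle {
    # Bring the new node group up (and let its nodes join) before
    # the old group is destroyed, instead of destroy-then-create.
    create_before_destroy = true
  }
}
```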

@stale

stale bot commented Aug 25, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 25, 2021
@stale

stale bot commented Sep 14, 2021

This issue has been automatically closed because it has not had recent activity since being marked as stale.

@stale stale bot closed this as completed Sep 14, 2021
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2022