
How to perform rolling updates via terraform for managed node groups when using custom AMI #1238

Closed
1 of 4 tasks
krishnapmv opened this issue Feb 12, 2021 · 7 comments

@krishnapmv

I have issues

I'm using managed node groups in my setup with a custom launch template and a custom AMI, based on the launch template from the example here. When rolling out a new image ID, Terraform destroys all worker nodes in one go, causing downtime. Is there a way to perform rolling updates via Terraform?
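For illustration, a minimal sketch of this kind of setup (the resource and argument names come from the Terraform AWS provider; the variable names, sizes, and user-data path are placeholder assumptions, not the actual module code, which is in the gist linked in this issue):

```hcl
# Hypothetical sketch: a custom launch template carrying a custom AMI,
# consumed by a managed node group. Changing image_id bumps the launch
# template version, which triggers the node replacement described above.
resource "aws_launch_template" "custom" {
  name_prefix = "eks-custom-"
  image_id    = var.custom_ami_id                  # custom AMI (placeholder variable)
  user_data   = base64encode(file("userdata.sh"))  # placeholder path

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_eks_node_group" "workers" {
  cluster_name    = var.cluster_name
  node_group_name = "workers"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  scaling_config {
    desired_size = 3
    max_size     = 5
    min_size     = 3
  }

  launch_template {
    id      = aws_launch_template.custom.id
    version = aws_launch_template.custom.latest_version
  }
}
```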

I'm submitting a...

  • bug report
  • feature request
  • support request - read the FAQ first!
  • kudos, thank you, warm fuzzy

What is the current behavior?

All worker nodes are drained at once during custom AMI rollouts, causing instability in the Kubernetes workloads.

If this is a bug, how to reproduce? Please include a code sample if relevant.

Here is the TF module code snippet I'm using: https://gist.github.com/krishnapmv/d175a0e1fb404d6fcfe92b3beeb52fa8 (in the module consumer, I set use_custom_ami = true)
and
Here is the terraform plan: https://gist.github.com/krishnapmv/4e15b27b932fe9ee4bb8c55a3fc10ed2

What's the expected behavior?

I'd expect a rolling restart of the worker nodes. Note that this is not a problem when using the official AMI.

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version: 13.1.0
  • Terraform version: v0.13.5

Any other relevant info

@krishnapmv krishnapmv changed the title How to perform rolling updates for managed node groups when using custom AMI How to perform rolling updates via terraform for managed node groups when using custom AMI Feb 12, 2021
@krishna-pp

Hello, I'm able to reproduce this issue with the latest Terraform code from the example as well.

Steps to reproduce:

  1. Set up an EKS cluster using the example here (make sure to uncomment image_id and user_data).
  2. Once the EKS cluster is ready, change image_id to a different AMI in your account.
  3. Run terraform again.

The Terraform plan looks similar to the one I mentioned in my initial comment. All the worker nodes are drained at the same time, and I don't see a way to perform a rolling update. This means we suffer downtime every time an AMI is rolled out.

Please let me know if you have any questions or need clarification. Thanks!

@siku4

siku4 commented Mar 25, 2021

Hi @krishnapmv, this seems to be a general issue when changing a launch template. Take a look at #1109

@lgg42

lgg42 commented May 27, 2021

@krishnapmv it used to work in a rolling-update fashion: it first created a new ASG, waited for all nodes to join the cluster and reach the Ready state, and then drained the old ASG by tainting the nodes and deleting them.

I've found that with version 15.2.0 this no longer works; basically it is not honoring create_before_destroy.

BTW, have you found a solution?

@barryib
Member

barryib commented May 27, 2021

Some of this behavior was introduced by the random_pet resources used to simulate name_prefix for MNG. This introduced a lot of problems. Furthermore, eks_node_group now supports name_prefix, so we're about to drop random_pet.

You can track #1372. Could you please help us test and review that PR?
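For readers landing here, a hedged sketch of the pattern this points toward: a prefixed node group name plus create_before_destroy, so Terraform can stand up the replacement group before destroying the old one. The node_group_name_prefix argument exists in the Terraform AWS provider's aws_eks_node_group resource; the variable names and sizes below are placeholder assumptions, not the module's actual implementation (see the linked PR for that):

```hcl
resource "aws_eks_node_group" "workers" {
  cluster_name           = var.cluster_name
  node_group_name_prefix = "workers-"   # a replacement group gets a fresh generated name
  node_role_arn          = var.node_role_arn
  subnet_ids             = var.subnet_ids

  scaling_config {
    desired_size = 3
    max_size     = 5
    min_size     = 3
  }

  lifecycle {
    # Bring the new node group up (and let its nodes join) before
    # the old group is destroyed, instead of destroy-then-create.
    create_before_destroy = true
  }
}
```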

@stale

stale bot commented Aug 25, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 25, 2021
@stale

stale bot commented Sep 14, 2021

This issue has been automatically closed because it has not had recent activity since being marked as stale.

@stale stale bot closed this as completed Sep 14, 2021
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2022