ASG node rotation after launch configuration / template change #333

Closed
1 of 4 tasks
sc250024 opened this issue Mar 31, 2019 · 22 comments
Comments

@sc250024
Contributor

sc250024 commented Mar 31, 2019

I have issues

I'm posting this issue before I open a PR in order to have a discussion about how best to handle this problem in the context of this module. Currently, the module updates launch configurations, and the Auto Scaling group is updated with the new configuration. However, the old nodes aren't rotated out, and it's left up to the user as to how to rotate out their nodes with the old configuration.

I'm submitting a...

  • bug report
  • feature request
  • support request
  • kudos, thank you, warm fuzzy

What is the current behavior?

My example comes from changing my worker group from t3.medium to t3.large. While the launch configuration and ASG were updated, the old nodes persisted without any change.

What's the expected behavior?

The nodes should be rotated out by the terraform apply command using the create_before_destroy lifecycle hook to ensure that new nodes are brought up before the old ones are terminated.

There's a great Medium article (Using Terraform for zero downtime updates of an Auto Scaling group in AWS) that takes the approach I think we should use. Essentially, the launch configuration's name is used in the name of the aws_autoscaling_group resource, so changing the configuration forces the ASG to be recreated.

resource "aws_autoscaling_group" "worker" {
  # Force a redeployment when launch configuration changes.
  # This will reset the desired capacity if it was changed due to
  # autoscaling events.
  name = "${aws_launch_configuration.worker.name}-asg"

  min_size             = 10
  desired_capacity     = 15
  max_size             = 25
  health_check_type    = "EC2"
  launch_configuration = "${aws_launch_configuration.worker.name}"
  vpc_zone_identifier  = ["${aws_subnet.public.*.id}"]

  # Required to redeploy without an outage.
  lifecycle {
    create_before_destroy = true
  }
}

I would update the module to use this approach for launch configurations / launch templates in order to handle node rotation.
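
For launch templates, a roughly equivalent sketch (resource names, sizes, and variables here are placeholders, not anything the module actually exposes) could tie the ASG name to the template's latest version so that a template change forces replacement:

resource "aws_launch_template" "worker" {
  name_prefix   = "worker-"
  image_id      = "${var.ami_id}"
  instance_type = "t3.large"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "worker" {
  # Changing the launch template bumps latest_version, which changes the
  # ASG name and forces the ASG (and its instances) to be recreated.
  name = "worker-${aws_launch_template.worker.latest_version}"

  min_size            = 10
  desired_capacity    = 15
  max_size            = 25
  health_check_type   = "EC2"
  vpc_zone_identifier = ["${aws_subnet.public.*.id}"]

  launch_template {
    id      = "${aws_launch_template.worker.id}"
    version = "${aws_launch_template.worker.latest_version}"
  }

  # Required to redeploy without an outage.
  lifecycle {
    create_before_destroy = true
  }
}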

Are you able to fix this problem and submit a PR? Link here if you have already.

Yes, but I am curious as to how other people currently solve this issue, or if there's an approach already in the module I haven't considered.

Environment details

  • Affected module version: 2.3.1
  • OS: macOS
  • Terraform version:
Terraform v0.11.13
+ provider.aws v2.4.0
+ provider.local v1.2.0
+ provider.null v2.1.0
+ provider.template v2.1.0
@sc250024 changed the title from "ASG node rotation after launch template change" to "ASG node rotation after launch configuration / template change" on Mar 31, 2019
@RothAndrew
Contributor

I'm not going to want my entire cluster to redeploy at the same time. The way I've always done it, and the way I believe it should be done, is to manually drain and terminate one node at a time. With the approach you have up there, you will still get an outage, because Kubernetes doesn't get the opportunity to gracefully drain the nodes: once the new Auto Scaling group comes up, the old one is terminated with no grace period.

@dpiddockcmp
Contributor

There is also a long-standing bug in Terraform that means the suggested workflow no longer works.

The ASG resource does not get correctly tagged for recreation if the launch config changes name. This results in the "cannot delete launch_configuration, resource still in use" error and then a messy terraform state. hashicorp/terraform#17855

@max-rocket-internet
Contributor

The nodes should be rotated out by the terraform apply command using the create_before_destroy lifecycle hook

1000% no 😅

I'm not going to want my entire cluster to redeploy at the same time

This is correct.

the old nodes aren't rotated out, and it's left up to the user as to how to rotate out their nodes

Exactly. This is desired.

When an ASG is deleted and all the instances are terminated, there's no graceful shutdown of pods. And some pods have complicated or time-consuming shutdown processes that need to happen. These are only handled correctly during draining or when a pod is gracefully stopped. Some info here, but specifically think about StatefulSets with Container Lifecycle Hooks. There are notable examples of this in common Helm charts.

Yes, but I am curious as to how other people currently solve this issue, or if there's an approach already in the module I haven't considered.

Personally, for me, changes to the LC are quite a big deal, e.g. a new AMI. In this scenario I will manually drain nodes, wait for the pods to be pending, wait for the cluster-autoscaler to scale up, wait for the pods to be started on the new nodes, and then repeat. If your k8s workload is completely stateless then you could be more aggressive here, but it really depends.

So perhaps we could offer the more aggressive ASG replacement as an option? Currently I think not, as you can't use interpolation in lifecycle rules, unfortunately.

@sc250024 perhaps we could solve your problem a different way? Perhaps you could define your ASG outside of this module? Perhaps there are some missing outputs that would make this easy to implement?

@max-rocket-internet
Contributor

but specifically think about StatefulSets with Container Lifecycle Hooks

An even simpler example: just a database with an EBS volume mounted in the pod might need some time to shut down.

@joseph-wortmann

This is a bigger issue when the module deletes and recreates a launch template. The ASGs do not get updated with the new launch template, leaving ASGs that are basically zombies.

@sc250024
Contributor Author

sc250024 commented Apr 5, 2019

Thanks everyone for your feedback! I guess this is a controversial feature to have, and it seems now is not the time to think about it. Personally, I'm coming from Kops, and one feature it has (which @RothAndrew and @max-rocket-internet mentioned) is the ability to rotate nodes out, one at a time, until all nodes with the old configuration are gone.

Does anyone have a link to an open-source project to do what Kops does, i.e. rotate nodes out one at a time, safely?

@barryib
Member

barryib commented Aug 1, 2019

What if we use CloudFormation (aws_cloudformation_stack) to do a rolling upgrade (node by node)?

@sc250024
Contributor Author

sc250024 commented Aug 1, 2019

@barryib I think that's more in line with what eksctl does, right? I think what I originally suggested is out of scope for this module. Frankly, I'm starting to come around to the point of view in the replies: rotating nodes out should be something manual, or, if it's done in a pipeline, something done with care and supervision. Closing this.

@sc250024 closed this as completed on Aug 1, 2019
@mlafeldt

mlafeldt commented Aug 1, 2019

What if we use CloudFormation (aws_cloudformation_stack) to do a rolling upgrade (node by node)?

While CloudFormation gives you a lot of control over the update process, e.g. replacing nodes 1 by 1, you'd still need to cordon and drain nodes. Unfortunately, that's not possible using vanilla CloudFormation.

@barryib
Member

barryib commented Aug 6, 2019

@sc250024 @mlafeldt any thoughts on this #462 ?

@mlafeldt

mlafeldt commented Aug 6, 2019

https://github.com/aws-samples/amazon-k8s-node-drainer sounds very interesting. Going to give it a try. 👍

At this point, I'm using eksctl to deploy EKS clusters and update worker node groups, mainly because the upgrade path is too cumbersome with Terraform (which I prefer otherwise).

@mlafeldt

mlafeldt commented Aug 13, 2019

https://github.com/aws-samples/amazon-k8s-node-drainer sounds very interesting. Going to give it a try. 👍

After fixing a bug, I can report that the node drainer works as expected.

  • I tested it with plain Terraform by forcing the recreation of an ASG after changing the AMI ID. That, as expected, didn't work well as the replacement of the ASG will terminate all instances at once. That, in turn, will cause the node drainer to drain all nodes at the same time. Bad.

  • By using CloudFormation (via aws_cloudformation_stack) to manage the ASG, I was able to replace nodes one by one, so the node drainer could safely drain nodes one by one.

@sc250024
Contributor Author

@mlafeldt I'll give it a try as well

@barryib
Member

barryib commented Aug 19, 2019

@mlafeldt Thanks for your feedback.

I tested it with plain Terraform by forcing the recreation of an ASG after changing the AMI ID. That, as expected, didn't work well as the replacement of the ASG will terminate all instances at once. That, in turn, will cause the node drainer to drain all nodes at the same time. Bad.

Yes, this is the expected behavior. I was thinking of using SQS for notifications, with a batch size of 1 for the Lambda function, to drain nodes one by one.
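
A minimal sketch of that wiring, assuming the worker ASG, the drain Lambda function, and an IAM role for the hook already exist under these hypothetical names:

# Queue that receives the ASG lifecycle hook events.
resource "aws_sqs_queue" "node_drainer" {
  name = "eks-node-drainer"
}

# Fire a hook into the queue whenever an instance is about to terminate.
resource "aws_autoscaling_lifecycle_hook" "drain" {
  name                    = "drain-on-terminate"
  autoscaling_group_name  = "${aws_autoscaling_group.worker.name}"
  lifecycle_transition    = "autoscaling:EC2_INSTANCE_TERMINATING"
  default_result          = "CONTINUE"
  heartbeat_timeout       = 900
  notification_target_arn = "${aws_sqs_queue.node_drainer.arn}"
  role_arn                = "${aws_iam_role.lifecycle_hook.arn}"
}

# Feed the drain Lambda one message (i.e. one node) at a time.
resource "aws_lambda_event_source_mapping" "drainer" {
  event_source_arn = "${aws_sqs_queue.node_drainer.arn}"
  function_name    = "${aws_lambda_function.node_drainer.arn}"
  batch_size       = 1
}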

By using CloudFormation (via aws_cloudformation_stack) to manage the ASG, I was able to replace nodes one by one, so the node drainer could safely drain nodes one by one.

Great. CloudFormation's AutoScalingRollingUpdate is awesome for this use case. Unfortunately, the terraform-aws-eks module doesn't support CloudFormation, and it sounds like it won't in the future either.
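
For reference, a rough sketch of driving the ASG through aws_cloudformation_stack with an AutoScalingRollingUpdate policy (the launch configuration and subnet references here are hypothetical, not module outputs):

resource "aws_cloudformation_stack" "workers" {
  name = "eks-workers-asg"

  parameters = {
    LaunchConfigurationName = "${aws_launch_configuration.workers.name}"
    Subnets                 = "${join(",", aws_subnet.private.*.id)}"
  }

  template_body = <<EOT
Parameters:
  LaunchConfigurationName:
    Type: String
  Subnets:
    Type: CommaDelimitedList
Resources:
  Workers:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "3"
      MaxSize: "6"
      DesiredCapacity: "3"
      LaunchConfigurationName: !Ref LaunchConfigurationName
      VPCZoneIdentifier: !Ref Subnets
    UpdatePolicy:
      AutoScalingRollingUpdate:
        # Replace instances one at a time, pausing between batches.
        MinInstancesInService: "3"
        MaxBatchSize: "1"
        PauseTime: PT5M
EOT
}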

@mlafeldt

One thing that's also possible with Terraform is to add a provisioner script to the launch configuration that will drain nodes of the old ASG before deleting it.

  # Creation-time provisioner on the launch configuration resource:
  # runs the drain script, passing it the new AMI ID.
  provisioner "local-exec" {
    command = "./provisioner.sh ${local.image_id}"
  }

The script might use node labels added via user data (e.g. --kubelet-extra-args '--node-labels=ami_id=${local.image_id}') to know what nodes to drain.

Here's an example script I used for testing:

#!/bin/bash

set -e
set -x

new_ami_id="$1"

# Cordon every node that is not running the new AMI.
kubectl cordon --selector "ami_id!=${new_ami_id}"

# Collect the names of the nodes still on the old AMI.
nodes=$(kubectl get node --selector "ami_id!=${new_ami_id}" -o json | jq -r '.items[].metadata.name')

# Drain the old nodes one at a time, pausing between them.
for node in $nodes; do
    kubectl drain "$node" --ignore-daemonsets=true --delete-local-data --force
    sleep 60
done

However, unless this script is super robust (which it clearly isn't at the moment), I consider this a hack more than anything else.

@sc250024
Contributor Author

@mlafeldt That's pretty much what I do locally; let this module update the ASGs, and run something like that script.

For the CloudFormation / SQS setup, maybe that could be documented in this repository? Whether or not it's actually implemented in the module is up for debate, but it'd be super cool if there were a document describing how to do it in general.

@max-rocket-internet
Contributor

The script might use node labels added via user data

You can also just filter by kubeletVersion:

kubectl get nodes -o json  | jq -r '.items[] | select(.status.nodeInfo.kubeletVersion == "v1.11.9") | .metadata.name'

@mlafeldt

@mlafeldt That's pretty much what I do locally; let this module update the ASGs, and run something like that script.

The advantage of calling the script via local-exec is that you'd never have to run it manually outside of Terraform. At least in theory. 😁 I'm still not sure it's a viable path.

@barryib
Member

barryib commented Aug 23, 2019

FWIW, I tested the Lambda function with SQS (the Auto Scaling group can send lifecycle hook events into a queue). I was able to spawn new nodes and drain old nodes in batches (one by one in my case). For now, everything looks good. I'll try to test more in the next couple of days.

PS: I found another Lambda for EKS draining, https://github.com/awslabs/amazon-eks-serverless-drainer, which is written in a few lines of Bash.

@manfredlift

@barryib, do you have an example setup for this?

@Vermyndax

We've put some thought into this too, but what we're leaning toward is creating ASG scheduled actions in Terraform to handle this rotation for us.
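
A rough sketch of that idea, assuming a hypothetical worker ASG whose termination policy prefers instances on the old launch configuration (the old nodes would still need to be drained before the scale-in fires):

# Temporarily scale out so new instances come up on the new configuration.
resource "aws_autoscaling_schedule" "rotate_out" {
  scheduled_action_name  = "rotate-scale-out"
  autoscaling_group_name = "${aws_autoscaling_group.worker.name}"
  min_size               = 6
  max_size               = 12
  desired_capacity       = 6
  start_time             = "2019-09-01T02:00:00Z"
}

# Scale back in later; with an OldestLaunchConfiguration termination policy
# the instances on the old configuration are removed first.
resource "aws_autoscaling_schedule" "rotate_in" {
  scheduled_action_name  = "rotate-scale-in"
  autoscaling_group_name = "${aws_autoscaling_group.worker.name}"
  min_size               = 3
  max_size               = 6
  desired_capacity       = 3
  start_time             = "2019-09-01T04:00:00Z"
}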
