aws: Allow rolling updates for ASGs #1552
Comments
👍 Our use case pretty much depends on this being crystal clear. Otherwise rolling updates require some minimal external intervention, which I am working on making obsolete. Huge +1 on this
👍
👍 Likewise. Using a python script to handle this, which makes it a bit clunky to keep things simple with terraform.
👍
@radeksimko totally agreed that this is desirable behavior. It's worth noting, though, that this is CloudFormation-specific behavior that's not exposed at the AutoScaling API. I think the way we'd be able to achieve something similar would be to build some resources based on CodeDeploy: https://docs.aws.amazon.com/codedeploy/latest/APIReference/Welcome.html
My experience with CodeDeploy is that it's limited to installing software on running instances, so it can roll out updates to ASG instances, but it doesn't know how to do a roll-out by terminating instances like CloudFormation does. So something like:
These actions then could be a whole separate concept in Terraform.
👍 Just realized this. I did not realize that this was implemented by AWS as a CloudFormation primitive rather than an ASG primitive we could hook into.
Is anyone experimenting with using other tools to hack around this terraform limitation? Even if we were to combine terraform with external tooling like http://docs.ansible.com/ec2_asg_module.html I'm not sure where we'd hook in. A local-exec provisioner seems like the right thing, but those are only triggered on creation, not modification. Maybe, as a placeholder before implementing a native rolling update solution, terraform could offer some sort of hook for launch configuration changes that we could use to trigger an external process? Otherwise, I think we'll need to manage AMI version updates externally via ansible or some homegrown tool and then use terraform refresh to pull them in before doing plan or apply runs.

It's all starting to drift away from the single-command infrastructure creation and mutation dream we had when starting to use terraform. As part of our migration to terraform/packer away from a set of legacy tools, we had been planning a deployment model based on updating pre-baked AMIs created with packer on a rolling basis. Any other ideas for workarounds that we could use until terraform is able to do this sort of thing out of the box?
I've been using bash around the AWS cli. It would be awesome to implement tasks in Go against the awslabs library and then just call them from terraform though.
Over the weekend I wrote some code to handle blue/green deploys for our particular use case in ruby, adding to our ever-growing wrapper of custom code needed to actually make terraform useful. Rather than hooking into terraform events, it's ended up as a huge hack: it dynamically modifies the terraform plan to create the new ASG with the new AMI alongside the existing ASG. Then it uses ruby to make sure the new ASG is up and working correctly before removing the old ASG, regenerates the terraform plan file to match the new state, and finally calls terraform refresh so that the tfstate matches the new reality we've created.

Would be great if this sort of workflow for mutating a running app when we've updated an AMI were built in, or if there were at least easy ways to hook into terraform to add custom behavior like this beyond learning Go and Terraform internals. In our case, even if terraform could handle the ASG operations for us, we'd still like to be able to run our quick, custom sanity-check script to make sure everything is working properly on the new instances before removing the old ASG from the pool.
This feature might be the most straightforward (although awkwardly round-about) way to get rolling deploys working inside terraform: #1083
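For reference, the workaround usually associated with that issue interpolates the launch configuration name into the ASG name, so that any LC change forces a replacement ASG, and relies on `create_before_destroy` to stand the new group up before tearing the old one down. A minimal sketch; the resource names, sizing, and the `aws_elb.app` reference are illustrative, not from this thread:

```hcl
resource "aws_launch_configuration" "app" {
  # Omitting `name` lets Terraform generate a unique one, so the new
  # LC can be created before the old one is destroyed.
  image_id      = "${var.ami_id}"
  instance_type = "t2.micro"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "app" {
  # Interpolating the LC name forces a replacement ASG whenever the LC changes.
  name                 = "app-${aws_launch_configuration.app.name}"
  launch_configuration = "${aws_launch_configuration.app.name}"
  availability_zones   = ["us-east-1a", "us-east-1b"]
  min_size             = 2
  max_size             = 4

  # Wait for the new instances to pass ELB health checks before the new
  # ASG counts as created (and the old one is destroyed).
  min_elb_capacity = 2
  load_balancers   = ["${aws_elb.app.name}"]

  lifecycle {
    create_before_destroy = true
  }
}
```

The trade-off is that this replaces the whole group at once rather than rolling instances one at a time.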
+1. This would make a huge difference in some of my current workflows.
+1 |
+1 |
+1 |
@woodhull that wrapper wouldn't be around somewhere we could take a looksee, would it? 😄
@nathanielks I copy/pasted some fragments of the code here: https://gist.github.com/woodhull/c56cbd0a68cb9b3fd1f4 It wasn't designed for sharing or reusability, sorry about that! I hope it's enough of a code fragment to help.
@woodhull you're a gentleman, thanks for sharing!
+1
+100
If anyone is trying to achieve blue-green deployments with AWS Auto Scaling and Terraform, I've made a note about how I did it here. Not sure if it's an ideal solution, but it's working for me and proving to be very reliable at the moment :)
Thanks @ryandjurovich!
👍
+1, just about to look at how to do this. Would love to avoid writing a script to down and up instances to use the new Launch Configuration; will take a look at @ryandjurovich's script also :)
👍
@rvangundy Ah, I have mine inlined with a heredoc, which is probably why I didn't hit that issue. Good to know, thanks!
Regarding "Imagine a 10-node db cluster with 10TB of data: spinning up a complete new cluster will cause a full resync of the 10TB of data from the old cluster to the new cluster all at once. This might saturate the network link and cause denial of service, where maybe all you wanted was to increase the number of connections." @BrunoBonacci, we are facing the same situation with rolling updates. Imagine we want to bump one version of the software running on a data node; we need this kind of "in-place" rolling update. It looks like rolling updates with TF are not going to get you there. Maybe we should consider something like Ansible to deal with that?
@shuoy Certainly Ansible is a way to "script" this behaviour, but the idea of using terraform (at least for me) is to replace previous scripting tools such as Chef, Puppet, Ansible and so on. I've seen different approaches to rolling updates around. Kubernetes allows you to set a grace time to wait between the update of one machine and the next. This certainly could solve some of the issues; however, it would work only for quite short grace times. I think the right approach would be to provide an attribute which can be set to choose whether the rolling update has to be performed automatically (based on a grace period) or in stages, something like:
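A hypothetical shape for the grace-period variant of this proposal (the `update_policy` block does not exist in Terraform; the attribute names are purely illustrative):

```hcl
resource "aws_autoscaling_group" "db" {
  name                 = "db-cluster"
  launch_configuration = "${aws_launch_configuration.db.name}"
  min_size             = 10
  max_size             = 10

  # Hypothetical: replace instances one at a time, waiting a grace
  # period between one successful update and the next.
  update_policy {
    mode         = "rolling"
    grace_period = "3m"
  }
}
```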
This would wait 3 minutes between one successful update and the next. The staged rolling update could work as follows:
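A hypothetical sketch of the staged variant, where each `terraform apply` advances the rollout by only one batch (again, not real Terraform syntax):

```hcl
resource "aws_autoscaling_group" "db" {
  name                 = "db-cluster"
  launch_configuration = "${aws_launch_configuration.db.name}"
  min_size             = 10
  max_size             = 10

  # Hypothetical: each apply replaces only `batch_size` instances,
  # leaving the operator to trigger the next stage manually.
  update_policy {
    mode       = "staged"
    batch_size = 1
  }
}
```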
In this case terraform would look at an ASG of, for example, 10 machines and just update one. Again, this is just a suggestion to allow a declarative (non-scripted) approach to rolling updates.
@BrunoBonacci "If you have 10TB to replicate in a cloud environment it could take a while, especially if you throttle the speed to avoid network saturation. Having a grace time of 10-20 hours wouldn't be reasonable." Do we have scenarios where a large volume of data resides on ephemeral disk? Yes: for example, people using EC2 as their Hadoop cluster capacity save data on ephemeral disk from a cost-saving perspective (EBS costs extra). So, in short, I think Terraform is great in certain scenarios, but it's not accurate that Chef/Ansible can be totally replaced, particularly in the update use case for stateful nodes.
@brikis98 and others that are using the output example posted here: it seems to not work if you upgrade to 0.7.0-rc2. Here is the error that it kicks out:
I am still trying to get outputs working, but if anyone has any advice on how to get this working again with 0.7.x, that would be awesome.
@jdoss on first glance, it looks like that reference is using the pre-0.7 map dot-index notation. In 0.7, maps are indexed with square brackets. So try changing this line to use square-bracket indexing and see if that fixes it for you!
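For illustration, with a hypothetical map variable named `amis` (the variable and output names are made up, not from the original comment), the syntax change looks like this:

```hcl
# Pre-0.7 dot-index notation (no longer valid in 0.7):
output "ami" {
  value = "${var.amis.us-east-1}"
}

# 0.7+ square-bracket indexing:
output "ami" {
  value = "${var.amis["us-east-1"]}"
}
```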
If that doesn't work I'd welcome a fresh bug report and we can dig in. 👍
@phinze You da best! That was the issue. 😄
@rvangundy -- did you keep the lifecycle hook
@moofish32 I have
@brikis98 - Thank you for the piece of CloudFormation code which does rolling deployments. It does not work for me, because I use spot instances in my launch configuration and when I specify
I wonder if anyone has ideas for how to implement ASG rolling updates for spot instances using fewer moving parts? My initial idea was to make a shell script which would work similarly to aws-ha-release. I prefer to use just Terraform and CloudFormation and to avoid implementing orchestration magic. UPD: Having
+1
Is there any plan to implement this in the near/far future?
+1
+1
+1
For automatic rollout of new launch configurations, see: hashicorp/terraform#1552 (comment)
I immediately regret this suggestion. I don't mind implementing this feature, but I'm kind of at a loss on how to implement it in HCL. As the comment above would suggest, I thought about using the data source format, but linking it back to the ASG resource (something like a
Please advise on the most idiomatic way this could be represented in HCL.
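Purely as a strawman for that discussion, one hypothetical shape for a companion resource that drives the rollout (none of this exists; every name here is invented for illustration):

```hcl
# Hypothetical: a resource that watches an ASG's launch configuration
# and performs a rolling replacement of instances when it changes.
resource "aws_autoscaling_rolling_update" "app" {
  autoscaling_group_name   = "${aws_autoscaling_group.app.name}"
  min_instances_in_service = 2
  batch_size               = 1
  pause_time               = "5m"
}
```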
+1 |
+1 |
+1 |
+1 |
Hi folks, it's more helpful for everyone to use reactions, as we can then sort issues by the number of 👍. The 👍 reactions do count and we're more than happy for people to use those; we prefer them over "+1" comments for the mentioned reasons. Thanks.
👍
Hi everyone,

While this sort of sophisticated behavior would definitely be useful, we (the Terraform team) don't have any short-term plans to implement this ourselves. We're generally recommending that people with these needs consider other solutions such as EC2 Container Service and Nomad, which either have, or are more likely to grow, sophisticated mechanisms for safe rollout, and are in a better position to do so given their ability to manage such multi-step state transitions.

We're trying to prune stale feature requests (that aren't likely to be addressed soon) by closing them. In this case we're currently leaning towards not implementing significant additional behaviors on top of what the EC2 API natively supports, so I'm going to close this.
After chatting with @apparentlymart privately I just want to add a few more things. We do not suggest everyone should use containers (nor that containers solve the problem entirely), and for those who prefer not to, there's a workaround - you can use
Also, I'm tracking this issue in my own TODO list, so as not to forget about it. I personally want to get this done, but it's currently pretty low on my list of priorities; PRs are certainly welcomed.
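The workaround commonly cited in this thread's era (see the CloudFormation code mentioned above) wraps the ASG in an `aws_cloudformation_stack` resource, so that CloudFormation's `UpdatePolicy` performs the rolling update while Terraform manages everything else. A sketch, with illustrative sizing and an assumed `aws_launch_configuration.app` defined elsewhere:

```hcl
resource "aws_cloudformation_stack" "asg" {
  name = "app-asg"

  template_body = <<EOF
{
  "Resources": {
    "ASG": {
      "Type": "AWS::AutoScaling::AutoScalingGroup",
      "Properties": {
        "AvailabilityZones": ["us-east-1a", "us-east-1b"],
        "LaunchConfigurationName": "${aws_launch_configuration.app.name}",
        "MinSize": "2",
        "MaxSize": "4"
      },
      "UpdatePolicy": {
        "AutoScalingRollingUpdate": {
          "MinInstancesInService": "2",
          "MaxBatchSize": "1",
          "PauseTime": "PT5M"
        }
      }
    }
  }
}
EOF
}
```

When the launch configuration changes, Terraform updates the stack and CloudFormation rolls the instances one batch at a time, keeping `MinInstancesInService` healthy throughout.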
Once #1109 is fixed, I'd like to be able to use Terraform to actually roll out the updated launch configuration and do it carefully.
Whoever decides not to roll out the update, and to change only the LC association, should still be allowed to do so.
Here's an example from CloudFormation:
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html
How about something like this?
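A hypothetical reconstruction of the kind of policy block being proposed, modelled on CloudFormation's `AutoScalingRollingUpdate` (this is not real Terraform syntax, just an illustration of the idea):

```hcl
resource "aws_autoscaling_group" "app" {
  launch_configuration = "${aws_launch_configuration.app.name}"
  min_size             = 2
  max_size             = 4

  # Hypothetical block mirroring CloudFormation's UpdatePolicy.
  update_policy {
    min_instances_in_service = 1
    max_batch_size           = 1
    pause_time               = "PT5M"
  }
}
```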
Then, if there's such a policy defined, TF can use the autoscaling API to shut down each EC2 instance separately and let the ASG spin up a new one with the updated LC.