
Destroy Action #649

Closed · Nalum opened this issue Dec 11, 2014 · 46 comments

Comments

@Nalum

Nalum commented Dec 11, 2014

Is there a way to have a resource run a command before it is destroyed?

Example: I have an aws_instance that, when provisioned, was added to a Chef server's node and client lists; when this resource is destroyed, I want to remove it from those lists.

@armon
Member

armon commented Dec 11, 2014

Not currently, but we want to extend the provisioners to support running them at different points in the lifecycle, e.g. post-create, pre-destroy, etc.

@Nalum
Author

Nalum commented Dec 11, 2014

To contribute code, do I just fork the repo, write the code, make sure it doesn't break anything and does what I expect, and then create a pull request, or are there other things to be done?

@sethvargo
Contributor

@Nalum that and add some tests 😄. You can read more about it in the Contributing.md file at the root of the repo. Feel free to ping us if you have any questions!

@armon
Member

armon commented Dec 11, 2014

Something like this is a pretty major change, both code wise and in terms of UX and core interaction. So I would definitely start with a design doc to open up the discussion. I'd hate for you to do a bunch of work and have us disagree on it.

/cc: @mitchellh

@knuckolls
Contributor

Just a +1 to all of this. Was thinking yesterday that perhaps the cleanest way to implement the logical_mutate resource that was referenced in #580 would just be to run a provisioner later in the dependency graph based upon post-create, post-update, or post-destroy events.

Along these lines, I'd also love to be able to run a provisioner based on the fact that multiple resources fired created, updated, or deleted events. I haven't dug down into the internals far enough to suggest how this might work but it's certainly something I'm hoping Terraform will support.

@Nalum
Author

Nalum commented Dec 11, 2014

Is there an example or a previous design doc I can look at, so I can follow your existing process?

@Nalum
Author

Nalum commented Dec 15, 2014

I've not really done any design docs before, so I'm sure I'm missing quite a bit; let me know. Here it is: https://docs.google.com/document/d/1fg5cxNDqWMDFioDXIpNbcSYiaOD7EGrnfhspfWvdkVg/edit

@armon
Member

armon commented Dec 16, 2014

@Nalum I think we need to address a few major concerns with the lifecycle. One is partial failures. With the current post-create provisioner, if it fails, we mark the resource as tainted, and re-create it on the next run. With pre-create I assume we could simply abort the apply of that instance. I think the delete ones are more complex. What does it mean if a pre-destroy fails? Do we taint the resource? Do we destroy anyways? I think post-destroy is the most confusing, as the resource is already gone. It's not clear at first thought what is best to do.

The other is remote provisioning. Those clearly only make sense for post-create and pre-destroy. I guess it is probably sensible to have a validation pass that is dependent on lifecycle so that declaring a "remote-exec" in the other life cycles is aborted.

@mitchellh
Contributor

Isn't there another issue that asks for this exact same thing? /cc @armon

@Nalum
Author

Nalum commented Dec 16, 2014

@mitchellh If this is better discussed elsewhere, I'll go there if I can.

@armon I had thought of implementing it in the following way:

provisioner "local-exec" {
    command = "..."
    pre-create = true
}

Instead of a single lifecycle attribute, each pre or post option would be its own attribute; this would allow you to have pre-create on local-exec and not on remote-exec.

Would you only allow for one value, so if you have pre-create you cannot have post-create, pre-destroy or post-destroy?

Maybe post-destroy isn't a valid option as the resource is gone at this point. The only reason I put it as an option is if you had some generated data locally that you wanted to remove with the resource, but I don't know how common or rare that would be.

For me, if pre-destroy were to fail I would want the resource to be tainted, but doing so with local-exec could be problematic, since command accepts a string value and you can run multiple commands in that string. So, if this were implemented, would it be best to have command accept a list, so that you can track which command failed and the next run can continue from that point?
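
For illustration, here is a rough sketch combining the two ideas above (the per-lifecycle attribute and a list-valued command). Neither pre-destroy nor the list form of command exists in Terraform today; this only reflects the proposal being discussed, and the knife commands are placeholders:

resource "aws_instance" "web" {
    # ... instance arguments ...

    provisioner "local-exec" {
        # Hypothetical attribute from the proposal; not implemented yet.
        pre-destroy = true

        # Hypothetical list form, so a failed step can be recorded and the
        # next run can resume from that point.
        command = [
            "knife node delete -y ${self.tags.Name}",
            "knife client delete -y ${self.tags.Name}",
        ]
    }
}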

Also should I open that document up so anyone can edit it or are you happy with it being in suggestion mode? I can add anyone to it who should be able to edit it.

@mitchellh
Contributor

There is definitely an issue tracking this somewhere, but this looks about along the lines of what I was thinking. I think our point of discussion revolved around "what does tainting mean" in this scenario.

@armon
Member

armon commented Dec 17, 2014

@Nalum I would just open the doc up for edits as well.

I think you bring up an interesting point: can you run a provisioner at multiple lifecycle points? I can't think of a case where you would want to, but maybe there is one...

I think the partial-failure behaviors, at least for pre-create (abort), post-create (taint), and pre-destroy (taint), make sense. Maybe by avoiding post-destroy we can avoid that complication.

@Nalum
Author

Nalum commented Dec 17, 2014

I've updated the doc and opened it up for anyone to edit.

@Nalum
Author

Nalum commented Dec 23, 2014

@armon @mitchellh Are you happy for me to go at this as defined in the doc?

It will probably take me some time as I'm only getting into Go now.

@knuckolls
Contributor

@Nalum FWIW, I just read through your Google doc and I like the format defined there.

@bitglue

bitglue commented Jan 29, 2015

I would think a much cleaner way to address this problem would be to create a Terraform resource that represents the entry in the Chef client list. Then the Chef client entry gets created and destroyed through the normal Terraform lifecycle.
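
As a rough sketch of that shape (the chef_node / chef_client resource types and all values here are made up for illustration, not an existing provider):

resource "aws_instance" "app" {
    ami           = "ami-123456"    # placeholder
    instance_type = "m3.medium"
}

# Hypothetical resources representing the node and client entries on the
# Chef server. Because they reference aws_instance.app, Terraform creates
# them after the instance and destroys them before it, without any
# provisioner hooks.
resource "chef_node" "app" {
    name = "${aws_instance.app.id}"
}

resource "chef_client" "app" {
    name = "${aws_instance.app.id}"
}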

Provisioners are a hack at best, and this highlights the reason why. If you think of your infrastructure as a state machine, then Terraform is based on modelling the desired state, and it then selects the right CRUD operations from the providers to traverse the edges to that state. Provisioners don't model desired state, they model edges. Consequently, Terraform doesn't know anything about how that provisioner altered state, and you have a bad time.

If you want a tool that models edges and not state, that tool already exists. It's called Ansible. Or Chef. Or Puppet. There's no reason you can't do both. I don't see any reason to dilute Terraform's model to support a use case that's already well-addressed by other tools.

@knuckolls
Contributor

I agree with you about the possibility of using custom Terraform resources for things like the "clean up the Chef client on instance destruction" case. However, I do respectfully disagree with you about the need for a before-destroy provisioner specifically. I have a blog post largely written up about this idea as part of a grander scheme of other things we've been working on, but here's the part that's pertinent to this issue specifically.

Assume you have a cluster of stateful machines. An elasticsearch cluster for example. Anything that is clustered by default with a >1 replication factor fits this model quite nicely actually. Now, assume that we want to cut a new server image with packer and roll the cluster so that all the machines are on the new image. There are a variety of good reasons to do this. Right now Terraform does not have a safe way to drain old AMIs one by one. It can either delete the whole cluster and re-create it, or stand up the new nodes and then delete the old nodes. I believe we require stronger guarantees about how our stateful instances get destroyed. For example, we should never delete all the nodes that contain a given block worth of data until we have confirmed that the data has been replicated to the new set of instances. Create before destroy doesn't give us that assurance. It merely waits for the new instance to be up before terminating the old instance.

Terraform is very close to providing the correct primitives for such graceful draining of stateful instances. It requires two things. First, the parallelism semaphore deep within terraform/context.go should be exposed up to the command-line UI as a "--parallelism" flag, so you can enforce that a specific apply is done one node at a time. Right now parallelism is hardcoded to a default of 10 items at a time if it isn't overridden, and the current implementation of what sets up the ContextOpts doesn't override it as far as I can tell. Second, stateful instances controlled by Terraform need to gracefully decommission themselves and then spin until their blocks have been replicated to the other nodes before the next destroy starts. This decommission-and-blocking dance could be done within before-destroy provisioners on the instance resource. Either I or someone on our team will be PRing some of this work as we get to it, and I'd like to come to some sort of agreement on how it could be done that the community and HashiCorp are comfortable with.
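
A rough sketch of the drain-and-block idea, written against the hypothetical before-destroy provisioner discussed in this thread (the pre-destroy attribute does not exist, and the Elasticsearch calls and AMI are only illustrative):

resource "aws_instance" "es" {
    count         = 3
    ami           = "ami-123456"    # placeholder
    instance_type = "m3.large"

    provisioner "remote-exec" {
        # Hypothetical lifecycle attribute from the proposal in this issue.
        pre-destroy = true

        # First ask the cluster to relocate shards away from this node,
        # then block until no shards remain here, so Terraform does not
        # proceed with the destroy until the data has moved.
        inline = [
            "curl -s -XPUT localhost:9200/_cluster/settings -d '{\"transient\":{\"cluster.routing.allocation.exclude._ip\":\"${self.private_ip}\"}}'",
            "while curl -s localhost:9200/_cat/shards | grep -q ${self.private_ip}; do sleep 10; done",
        ]
    }
}

Combined with the "--parallelism" flag proposed above (so only one node is replaced at a time), this would give the ordering guarantee described here.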

Furthermore, tainting a stateful node may not be a strong enough assurance when it comes to the automated destruction of stateful instances. I believe a new "decommission-on-delete" flag may need to be created, so that we can provide the worst-case assurance of flagging specific instances as ones we want humans to follow up on and destroy, just in case. Decommissioning a resource would merely take it out of the statefile, or move it into a decommissioned section and whine about how the user needs to clean these up on every terraform plan. This is because we don't want Terraform to accidentally cause a "we need to recover from cold backups" scenario because too many nodes were deleted too quickly. Decommission-on-delete would put the onus on the operator to be quite certain that the state was replicated off that node before initiating its destruction.

I'll eventually publish my thoughts about this and related things much more extensively, in a more general sort of way. It did seem prudent to defend my stance that adding an on-destroy provisioner is not a dilution of Terraform's model and is instead an integral piece of the graceful rolling of stateful instances. As always, I'd be willing to make time to discuss these sorts of things on a hangout, or move this discussion to its own dedicated issue if needed.

Upon re-reading this I realize this diatribe likely falls under the umbrella of "but someone on the internet is wrong". My apologies if it comes across as gruff in any way. This sort of thing has been a primary concern of mine for a while now. While this may not be the final solution for graceful rolling of stateful nodes with Terraform, I believe it's one possible way that we just haven't gotten around to yet. :)

@bitglue

bitglue commented Jan 29, 2015

Second, stateful instances controlled by Terraform need to gracefully decommission themselves and then spin until their blocks have been replicated to the other nodes before the next destroy starts. This decommission-and-blocking dance could be done within before-destroy provisioners on the instance resource.

But it can also be done by making a resource that represents whatever it is you did when you provisioned the thing. This "provisioning" resource, which might represent some bit of configuration in a cluster (like a Chef client list, or whatever), gets created after the EC2 instance is created; on destroy, the provisioning resource is destroyed before the instance is destroyed.
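
A minimal sketch of that ordering, using a made-up resource type purely for illustration:

resource "aws_instance" "db" {
    ami           = "ami-123456"    # placeholder
    instance_type = "m3.large"
}

# Made-up resource type standing in for "whatever you did when you
# provisioned the thing" (a Chef client entry, a cluster membership, ...).
# Because it depends on aws_instance.db, it is created after the instance
# and destroyed before it; its provider's Delete function is where the
# deregistration or decommission logic would live.
resource "mycluster_membership" "db" {
    instance_id = "${aws_instance.db.id}"
}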

Furthermore, tainting a stateful node may not be strong enough of an assurance when it comes to the automated destruction of stateful instances.

You are absolutely right. "Tainting" is just a half-baked, ambiguously defined state. Since it's also an unknown state, it's impossible to do anything automatically from this state without first resetting to a known state (for example by destroying the resource) or without making restrictions on how provisioners can work (for example requiring that they are idempotent).

I believe a new "decommission-on-delete" flag may need to be created

But you already have that: it's the Delete function of a resource provider. If your provisioning is done with resources, then you don't need the half-baked "tainted" state: you have the real deal, actual state, defined exactly as you need it by your provider implementation.

What you are describing with a "deprovisioning" mechanism is CRUD without the R or U, plus an incomplete description of the known state (tainting). It's just a half-baked implementation of what Terraform already does, but one that can't recover from failures of any of the operations unless you make all operations idempotent; and if you do that, what you've done is implement Ansible.

@dalehamel

Not currently, but we want to extend the provisioners to support running them at different points in the lifecycle, e.g. post-create, pre-destroy, etc.

I agree this is a decent approach.

This is definitely related to #386

One thing that @thegedge and I discussed is that provisioners almost seem like they should just be their own resource, rather than a nested attribute of some types of resources.

@thegedge
Contributor

thegedge commented Jul 7, 2015

To explain why I see them as a resource: they add data to the chef server (environment, run list, etc) that follows CRUD. To me this doesn't seem terribly different from, for example, an aws_ecs_task_definition or consul_keys resource.

It really is a key component of our infrastructure at the moment, which fits with the definition of a resource. The actual bootstrapping run could be part of the provisioning block, but the client/node setup in the Chef server is better modeled as a resource. Would it make sense to have a chef-server provider?

@samdunne
Contributor

Has there been any further movement on this? It's becoming a serious issue when coupled with #2720

@davidblewett

+1 on this feature (mainly for ensuring Chef stays current). In the meantime, does someone have some kind of script to clean out expired data from Chef?

@icebourg
Contributor

icebourg commented Nov 9, 2015

I've taken to using a local provisioner to delete the Chef node and client. This is a horrible hack but it at least lets me create new instances without having to manually delete the old ones. Unfortunately, it only runs when an instance is provisioned, so if you de-provision an instance, you'll still have stale instances in Chef's database. But at least I can recreate instances without worrying what's in Chef.

  # Remove the Chef node since a terraform destroy won't. Return true no matter what.
  provisioner "local-exec" {
    command = "knife node delete -y ${var.instance_name}; knife client delete -y ${var.instance_name}; true"
  }

@calvn

calvn commented Dec 9, 2015

👍 for this feature/enhancement

@Tails

Tails commented Dec 28, 2015

+1.

This would come in handy so Terraform could have DigitalOcean make a snapshot of a server image instead of deleting it and building from scratch.

@eidge

eidge commented Jan 14, 2016

+1.

Our use case is unregistering app services from consul so that they are taken out of the load balancer before killing the server.

@davehodgson

+1. My use case is that some of the servers are joined to an Active Directory domain, and it would be very handy to have them remove themselves from the domain pre-destroy. Without it, if I change the server in a way that results in a destroy and recreate, the bootstrap step that joins the domain fails because the account name already exists.

@dhoer

dhoer commented Feb 23, 2016

+1 See #3605

@avdhoot

avdhoot commented Mar 2, 2016

+1 desperately need this 😀

@jseaidou

Has there been any activity on this?

@tjlee0909

Is this issue still blocked on a philosophical debate regarding whether this feature is appropriate? Or is it a matter of developing a refined product spec for how the feature should behave, and having the time to implement it? (I'm not taking sides or passing judgment, I'm just honestly asking).

@dmourati

@DigitalRoadi3s

+1

@tjlee0909

tjlee0909 commented May 24, 2016

Hey guys, we (Box) are about to start design work on implementing a wrapper around "terraform destroy" for internal use. Are there any updates on this feature request?

edit: whoops, just saw apparentlymart's proposal

@apparentlymart
Contributor

Don't let my complex proposal discourage a simple solution here... I was trying to collect together a number of use-cases, but my main target was the cases where we use null_resource to trigger actions on updates, rather than destroy actions.

There are a few different ways to attack destroy actions in particular so it would be nice to have a few different approaches to think about.

There are some great ideas in this thread that take a different approach to what I suggested. Trying out some of these ideas with real examples is probably the best way to converge on a solution.

@eyalzek

eyalzek commented Jun 9, 2016

+1

@PaulusTM

I mainly have two use cases where this would be really helpful.

  1. Windows Server 2012 R2 instances that are domain members. Unjoining them before destroy with a PowerShell command is quite easy, but I need a trigger somehow (see the sketch after this list).
  2. As in "provisioner/chef: unregister the client from chef server when replacing an instance" (#3605), which mentions that a Chef client needs to unregister itself.
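
For the first case, a rough sketch of what the requested hook might look like (the pre-destroy attribute is still hypothetical, the AMI is a placeholder, and the credential handling for the unjoin is omitted):

resource "aws_instance" "win" {
    ami           = "ami-123456"    # placeholder Windows AMI
    instance_type = "m4.large"

    provisioner "remote-exec" {
        # Hypothetical pre-destroy hook requested in this issue.
        pre-destroy = true

        # Unjoin from the AD domain before the instance is terminated;
        # real usage would also pass -UnjoinDomainCredential.
        inline = [
            "powershell.exe -Command \"Remove-Computer -Force\"",
        ]
    }
}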

@dmourati

How about /sbin/halt.local or equivalent?

https://www.redhat.com/archives/rhl-list/2007-August/msg00314.html

@ashald
Contributor

ashald commented Oct 15, 2016

We have similar requirements: we need to manually de-register Consul servers when we destroy the ASG containing them.

@saurabh-hirani

+1 - this is a must-have, even if it's just rudimentary script execution hooks at various points in the infra lifecycle.

@saurabh-hirani

saurabh-hirani commented Oct 25, 2016

I am relatively new to Terraform, but building on @icebourg's example and #2831, one could do the following:

terraform apply -var="deregister=1" -target=null_resource.deregister_my_resource

with

variable "deregister" {
  description = "Deregister from chef - 1 or 0"
  default = 0
}

and

resource "null_resource" "deregister_my_resource" {
  count = "${var.deregister}"
  provisioner "local-exec" {
    command = "./bin/deregister-chef.sh my_resource; true"
  }
}

with ./bin/deregister-chef.sh containing

#!/bin/bash
knife node delete -y $1
knife client delete -y $1

That way, this can be invoked after a destroy.

@5tefan

5tefan commented Nov 3, 2016

Another use case: when removing a Kubernetes worker, it would be nice to move pods off the node and remove it before the resource is destroyed:

kubectl drain <nodename>
kubectl delete node <nodename>

Thanks @DocBradfordSoftware hermanjunge/kubernetes-digitalocean-terraform#21
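
Wired into the hypothetical destroy-time provisioner requested in this issue, that might look roughly like the following (the droplet arguments are placeholders):

resource "digitalocean_droplet" "k8s_worker" {
    image  = "coreos-stable"    # placeholder
    name   = "k8s-worker-1"
    region = "nyc3"
    size   = "2gb"

    provisioner "local-exec" {
        # Hypothetical pre-destroy hook: drain pods off the node and
        # deregister it before the droplet is destroyed.
        pre-destroy = true
        command     = "kubectl drain ${self.name} --ignore-daemonsets && kubectl delete node ${self.name}"
    }
}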

@amclain

amclain commented Nov 3, 2016

Another use case: when removing a Kubernetes worker

I had the same issue except with Nomad. When Terraform would recreate the instance(s), the containers wouldn't migrate to the live instances until nomad node-drain was run, which I ended up having to do manually.

@ajbouh

ajbouh commented Nov 3, 2016

I also need this for Nomad and Kubernetes.


@mitchellh
Contributor

I know there is a lot of discussion here, but this is identical to #386, and there is an equally large amount of discussion there, so I'm going to close this and centralize there.

@ghost

ghost commented Apr 20, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 20, 2020