This repository has been archived by the owner on Mar 25, 2022. It is now read-only.

Removing a specific OPC compute instance is causing the security list association to be removed from other instances #42

Closed
ubersol opened this issue Jul 20, 2017 · 19 comments

Comments

@ubersol

ubersol commented Jul 20, 2017

Hello, I was hoping that you guys might be able to shed some light on the following issue I am running into.

Currently I am working on developing a deployment infrastructure with terraform-provider-opc. One of the problems we are seeing with our current TF config is that when we want to remove a specific resource, in this case a compute instance, some dependencies of other instances also get removed. In this case it is the opc_compute_security_association of the other instances as well.

For example, I have two instances, and I try to remove the second one with the following:

$ terraform plan -destroy -target=opc_compute_instance.test[1]

The plan execution then shows me the following:

$ terraform plan -destroy -target=opc_compute_instance.test[1]
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.template_file.userdata: Refreshing state...
opc_compute_ssh_key.test: Refreshing state... (ID: test-key)
opc_compute_ip_reservation.ipreservation.0: Refreshing state... (ID: 9725578a-3484-46f9-885a-675e1f62daec)
opc_compute_ip_reservation.ipreservation.1: Refreshing state... (ID: 0eb1407e-bc6e-49e8-86ef-d5a4cd10ad93)
opc_compute_storage_volume.data.0: Refreshing state... (ID: data-0)
opc_compute_storage_volume.data.1: Refreshing state... (ID: data-1)
opc_compute_instance.test.1: Refreshing state... (ID: d49a2366-7b00-4c86-84b4-fc72d21841f5)
The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed. Cyan entries are data sources to be read.

Note: You didn't specify an "-out" parameter to save this plan, so when
"apply" is called, Terraform can't guarantee this is what will execute.

- opc_compute_instance.test.1

- opc_compute_security_association.associate_SSH.0

- opc_compute_security_association.associate_SSH.1

As the output shows, TF is going to attempt to remove opc_compute_security_association.associate_SSH.0.

How do I prevent opc_compute_security_association.associate_SSH.0 from being removed? It has to stay with the first instance (opc_compute_instance.test.0) or any other undeleted instances. If I execute this, opc_compute_security_association.associate_SSH.0 does indeed get deleted:

$ terraform  destroy -target=opc_compute_instance.test[1]
Do you really want to destroy?
  Terraform will delete the following infrastructure:
  	opc_compute_instance.test[1]
  There is no undo. Only 'yes' will be accepted to confirm

  Enter a value: yes

data.template_file.userdata: Refreshing state...
opc_compute_storage_volume.data.1: Refreshing state... (ID: data-1)
opc_compute_storage_volume.data.0: Refreshing state... (ID: data-0)
opc_compute_ssh_key.test: Refreshing state... (ID: test-key)
opc_compute_ip_reservation.ipreservation.1: Refreshing state... (ID: 0eb1407e-bc6e-49e8-86ef-d5a4cd10ad93)
opc_compute_ip_reservation.ipreservation.0: Refreshing state... (ID: 9725578a-3484-46f9-885a-675e1f62daec)
opc_compute_instance.test.1: Refreshing state... (ID: d49a2366-7b00-4c86-84b4-fc72d21841f5)
opc_compute_security_association.associate_SSH.1: Destroying... (ID: associate_SSH2)
opc_compute_security_association.associate_SSH.0: Destroying... (ID: associate_SSH1)
opc_compute_security_association.associate_SSH.1: Destruction complete
opc_compute_security_association.associate_SSH.0: Destruction complete
opc_compute_instance.test.1: Destroying... (ID: d49a2366-7b00-4c86-84b4-fc72d21841f5)
opc_compute_instance.test.1: Still destroying... (ID: d49a2366-7b00-4c86-84b4-fc72d21841f5, 10s elapsed)
opc_compute_instance.test.1: Still destroying... (ID: d49a2366-7b00-4c86-84b4-fc72d21841f5, 20s elapsed)
opc_compute_instance.test.1: Still destroying... (ID: d49a2366-7b00-4c86-84b4-fc72d21841f5, 30s elapsed)
opc_compute_instance.test.1: Still destroying... (ID: d49a2366-7b00-4c86-84b4-fc72d21841f5, 40s elapsed)
opc_compute_instance.test.1: Still destroying... (ID: d49a2366-7b00-4c86-84b4-fc72d21841f5, 50s elapsed)
opc_compute_instance.test.1: Still destroying... (ID: d49a2366-7b00-4c86-84b4-fc72d21841f5, 1m0s elapsed)
opc_compute_instance.test.1: Still destroying... (ID: d49a2366-7b00-4c86-84b4-fc72d21841f5, 1m10s elapsed)
opc_compute_instance.test.1: Still destroying... (ID: d49a2366-7b00-4c86-84b4-fc72d21841f5, 1m20s elapsed)
opc_compute_instance.test.1: Destruction complete

Destroy complete! Resources: 3 destroyed.

Since opc_compute_security_association.associate_SSH.0 gets removed, this breaks my SSH access to the first instance.

Moreover, when I go back to the compute UI, I also see from the Storage tab that the "data-1" storage volume is not deleted; only its association with the instance is removed. This is unlike the default terraform destroy behaviour, which really deletes every single resource and removes all of the associations.

Terraform Version

terraform -v
Terraform v0.9.11

Affected Resource(s)

Please list the resources as a list, for example:

  • opc_compute_instance
  • opc_compute_storage_volume
  • opc_compute_security_association

Terraform Configuration Files

main.tf 
# The user data here defines what opc-init should be doing after the images are installed
# In this case first we are creating/formatting the storage volume and then we are installing chef-client/chef-solo into
# each image that is being created.
data "template_file" "userdata" {
  template = <<JSON
{
  "userdata": {
    "pre-bootstrap": {
      "script": [
   ...skipped as not needed
      ]
    }
  }
}
JSON
}

# Reserve a public IP
resource "opc_compute_ip_reservation" "ipreservation" {
  count       = "${var.instance_count}"
  parent_pool = "/oracle/public/ippool"
  permanent   = true
}

# Creates a storage volume.  It turns out that the name must be unique so adding
# the index in the name
resource "opc_compute_storage_volume" "data" {
  count = "${var.instance_count}"
  name  = "data-${count.index}"
  size  = 10
}

# Creates an instance, attaches a storage volume, associates a public IP
# The storage volume and chef client is installed through instance_attributes where it uses opc-init in "userdata" defined in
# the data section above
resource "opc_compute_instance" "test" {
  count      = "${var.instance_count}"
  name       = "deniz-test-instance-${count.index}"
  label      = "Terraform Provisioned Instance"
  shape      = "oc3"
  image_list = "/oracle/public/OL_6.8_UEKR3_x86_64"
  ssh_keys            = ["${opc_compute_ssh_key.test.name}"]
  instance_attributes = "${data.template_file.userdata.rendered}"
# Attach the previously created storage volume
  storage {
    #volume = "${element(opc_compute_storage_volume.data.*.id, count.index)}"
    volume = "${opc_compute_storage_volume.data.*.id[count.index]}"
    index  = 1
  }
# Sets up the network and attaches the previously created IP
  networking_info {
    index          = 0
    shared_network = true
    #nat            = ["${element(opc_compute_ip_reservation.ipreservation.*.id, count.index)}"]
    nat            = ["${opc_compute_ip_reservation.ipreservation.*.id[count.index]}"]
  }
}
# outputs are printed after an apply is finished.  It is useful to pass on certain
# details to the operator. In this case Public IP of the instances are being printed out
output "public_ip" {
  value = "${opc_compute_ip_reservation.ipreservation.*.ip}"
}
secrule.tf
#secrules, seclists, sec associations

resource "opc_compute_sec_rule" "ssh-vm-secrule" {
  name             = "ssh-vm-secrule"
  source_list      = "seciplist:${opc_compute_security_ip_list.public_internet.name}"
  destination_list = "seclist:${opc_compute_security_list.allow-ssh-access.name}"
  action      = "permit"
  application = "${opc_compute_security_application.all.name}"
}

resource "opc_compute_security_application" "all" {
  name     = "all"
  protocol = "tcp"
  dport    = "22"
}

resource "opc_compute_security_ip_list" "public_internet" {
  name       = "public_internet"
  ip_entries = ["0.0.0.0/0"]
}

resource "opc_compute_security_list" "allow-ssh-access" {
  name                 = "allow-ssh-access"
  policy               = "DENY"
  outbound_cidr_policy = "PERMIT"
}

resource "opc_compute_security_association" "associate_SSH" {
  name = "${format("associate_SSH%1d", count.index + 1)}"
  count   = "${var.instance_count}"
  #vcable  = "${element(opc_compute_instance.test.*.vcable,count.index)}"
  vcable  = "${opc_compute_instance.test.*.vcable[count.index]}"
  seclist = "${element(opc_compute_security_list.allow-ssh-access.*.name,count.index)}"
  #seclist = "${opc_compute_security_list.allow-ssh-access.*.name[count.index]}"
}
variables.tf
variable instance_count {
  description = "Number of instances you want"
  default     = "3"
}

Debug Output

https://gist.github.com/ubersol/7b338ddf00457e2b3b04fbfc8603d7a8

Panic Output

No panic

Expected Behavior

The second instance should have been deleted and its security association removed, without touching the other instances' associations. The storage volume associated with this instance should also be destroyed.

Actual Behavior

The second instance indeed gets deleted, but in the process the security associations for the other, unrelated instances are also removed, which breaks SSH access. The storage volume does not get deleted and is shown as online. However, its instance association gets removed.

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

First create a couple of instances

  1. terraform apply -var instance_count=2

Then remove the second instance

  1. terraform destroy -target=opc_compute_instance.test[1]

Important Factoids

Are there anything atypical about your accounts that we should know? For example: Running in EC2 Classic? Custom version of OpenStack? Tight ACLs?

References

There are a couple of similar bugs, listed below, but honestly the solution for instance deletion is not clear to me:
hashicorp/terraform#10952
The issue above was then merged into the following one:
hashicorp/terraform#3449

@grubernaut
Contributor

Hey @ubersol, thanks for the issue!

Just to clarify though, this happens when you have your config written with the element() interpolation function or with the [...] syntax?

@ubersol
Author

ubersol commented Jul 20, 2017

Hi @grubernaut, thank you for your quick reply. The only element function I have in this whole configuration is in this line in secrule.tf:

  seclist = "${element(opc_compute_security_list.allow-ssh-access.*.name,count.index)}"

and as you can see above, I did try to change this to the following (commented out above):

#seclist = "${opc_compute_security_list.allow-ssh-access.*.name[count.index]}"

which resulted in some kind of index out of range error when I was testing yesterday. I'll go ahead and change that line and report the exact error I saw shortly.

@ubersol
Author

ubersol commented Jul 20, 2017

Hi @grubernaut, if I change the seclist line in the config file to

seclist = "${opc_compute_security_list.allow-ssh-access.*.name[count.index]}"

then I see the following error during plan:

$ terraform plan -var instance_count=2
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.template_file.userdata: Refreshing state...
Error running plan: 1 error(s) occurred:

* opc_compute_security_association.associate_SSH: 1 error(s) occurred:

* opc_compute_security_association.associate_SSH[1]: index 1 out of range for list opc_compute_security_list.allow-ssh-access.*.name (max 1) in:

${opc_compute_security_list.allow-ssh-access.*.name[count.index]}

@grubernaut
Contributor

Ah interesting. Thanks for the additional information!
I believe that @apparentlymart is the most knowledgeable about this section of Terraform Core, and would be able to help you out further. But I'll try to see what I can get done as well. Thank you for your patience!

@apparentlymart

Hi @ubersol! Sorry things aren't working well here.

The root cause of your original problem is that referring to opc_compute_instance.test.*.vcable creates a dependency on all instances of opc_compute_instance.test, since the dependency is resolved based on just the variable used rather than the entire expression. Therefore as far as Terraform can tell all of your opc_compute_security_association instances depend on all the opc_compute_instance instances, rather than just the one indicated by count.index.

In future we intend to extend Terraform with a new way of creating related collections of resources that doesn't depend on the "splat variable" mechanism and thus doesn't run into this problem. Unfortunately there's a lot of other work to do before we can make that happen, so this won't help address your issue in the short term.

My advice for now would be to use terraform taint opc_compute_instance.test.1 to explicitly taint the instance you want to destroy, and then on the next plan Terraform will attempt to replace that resource. In order for this to work as expected it's necessary to be using the square bracket index syntax, since that allows Terraform to understand that it only needs to update the other resources that refer to attributes of that instance, rather than all of them as it would if using element.

In your later comment I see that you had some trouble with the list bracket syntax. Based on your config, it looks like neither element nor the list bracket syntax are needed in that case because opc_compute_security_list.allow-ssh-access doesn't have count set:

seclist = "${opc_compute_security_list.allow-ssh-access.name}"
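Combining the two points above (bracket indexing for the per-instance vcable and a plain attribute reference for the single security list), the association resource from secrule.tf might be written roughly as follows. This is only a sketch against the configuration posted earlier in this thread, using the Terraform 0.9-era interpolation syntax; it has not been run against the OPC provider. Per the explanation above, the splat reference still makes every association depend on every instance, so this change on its own would not be expected to alter the targeted-destroy behavior.

```hcl
resource "opc_compute_security_association" "associate_SSH" {
  count = "${var.instance_count}"
  name  = "${format("associate_SSH%1d", count.index + 1)}"

  # One vcable per instance: index into the splat list with count.index.
  vcable = "${opc_compute_instance.test.*.vcable[count.index]}"

  # The security list has no count set, so it is referenced directly.
  seclist = "${opc_compute_security_list.allow-ssh-access.name}"
}
```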

Sorry for all the confusing interactions here. We have started planning some configuration language features that should make this sort of thing easier and less error-prone to express in future.

(In a future version we intend to merge #12289, which will change the syntax of the above taint command to terraform taint opc_compute_instance.test[1]. This is not important right now but I'm just noting this here in case someone finds this comment in future, after that PR has been merged and released, and wonders why the example above doesn't work anymore.)

@ubersol
Author

ubersol commented Jul 20, 2017

Hi @apparentlymart, thank you so much for the detailed and quick response! I really appreciate all of this. I am afraid the workflow is not very clear to me. Could you be so kind as to verify the following steps for me? Thank you for your patience!

So first create the instances as usual:

terraform apply -var instance_count=2

Then taint the instance I want to destroy, as you mentioned. In this case I am going to remove the second instance:

terraform taint opc_compute_instance.test.1

and when I do this step, I get this output:

$ terraform taint opc_compute_instance.test.1
The resource opc_compute_instance.test.1 in the module root has been marked as tainted!

so is the next step then:

$ terraform  destroy -target=opc_compute_instance.test[1]

and will this remove the second instance and leave the SSH association for the first instance intact? Is this right?

@grubernaut
Contributor

Hi @ubersol!

The terraform taint command marks that specific resource to be deleted on the next apply of Terraform. Terraform then attempts to destroy and replace the tainted resource on the next run of terraform apply.
You can read more about taint here: https://www.terraform.io/docs/commands/taint.html

Hope that helps! Thanks again!

@apparentlymart

apparentlymart commented Jul 21, 2017

This was all correct until the final step. Instead of running destroy, you would instead run a normal plan, like this:

$ terraform plan -out=tfplan

Since the resource is tainted, Terraform will plan to destroy it and make a new one in its place. After verifying that Terraform did indeed produce a plan like that, and doesn't intend to replace or destroy anything else along with it, you can apply the plan:

$ terraform apply tfplan

The -target flag is provided as a way to work around tricky problems that can't be resolved any other way. It's not intended for routine use and has various limitations. terraform taint is the recommended way to tell Terraform that a particular resource needs to be replaced, and is safer to use because it can be undone using terraform untaint if the resulting plan isn't what you expected.

@ubersol
Author

ubersol commented Jul 21, 2017

Hi @grubernaut & @apparentlymart, thank you for the information and detailed explanations! As I understand it, using the taint option will recreate the instance, but this is not going to solve my original problem, where I want to get rid of one instance among many. Is that correct?

@grubernaut
Contributor

@ubersol correct, if you only wish to remove the instance fully, and not replace the instance on a subsequent Terraform apply, taint will not work for that use-case.

@apparentlymart

apparentlymart commented Jul 21, 2017

Indeed, to remove an instance and not recreate it, you would remove it from configuration. That then tells Terraform that you don't want that resource anymore.

This is trickier when count is in play, since of course you can't just remove one instance out of the middle of the sequence of counted resources... only the ones at the end of the sequence can be removed. But if you just want to remove one or more instances from the end of the sequence, this can work:

  • Decrease the count value in config. (In your case, it looks like this would be to lower the value of var.instance_count.)
  • Run terraform plan and verify that Terraform is indeed planning to destroy the tainted resources and not replace them.
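As a rough illustration of the first bullet, lowering the default in the variables.tf posted earlier in this thread (or passing -var instance_count=2 on the command line, as done in the reproduction steps) shrinks every counted resource from the end of the sequence. The comments below are an editorial sketch of which indexes would remain, assuming, purely for illustration, that five instances existed beforehand.

```hcl
variable "instance_count" {
  description = "Number of instances you want"

  # Illustrative assumption: this was previously "5", so indexes 0
  # through 4 existed. With "2", only indexes 0 and 1 remain in config,
  # so the next plan should propose destroying indexes 2, 3 and 4 of
  # each counted resource.
  default = "2"
}
```

Only the highest indexes can be dropped this way; there is no value of instance_count that removes an index from the middle of the sequence.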

To expand on my previous statement, the reason we recommend this approach of tainting and/or updating the config, rather than just destroying with target, is that it makes your intent clearer. If you run terraform destroy -target=... then this obscures what happened here since there's no trace recorded in the state or config of why that instance was destroyed. Whereas if you run terraform taint, it is clear in the state that the resource was explicitly tainted, and if you remove it from the config (presumably then checking the result into a VCS system) then you've left a record of your intent to remove it.

Doing these operations in these multiple steps also gives you the ability to check that the correct behavior is planned before applying, and to undo/revert these changes if things don't work out, before making any changes to real infrastructure. (By untaint on the resource, or reverting the configuration from VCS.)

@ubersol
Author

ubersol commented Jul 21, 2017

Ok, This is great information! Thank you so much!

I'll give this a try! In the meantime, could you guys be kind enough to point me to the right place for what "tfplan" is supposed to do, or give some more information? It looks like I am creating an out file called tfplan with the following:

terraform plan -out=tfplan

When I look at the contents of the file, I see some data ( some encrypted ? ) that contains something similar to the output of terraform plan

I will post the steps of what I did here when I am done.

@grubernaut
Contributor

Hey @ubersol,

terraform plan -out=<file> will create a plan file that can be used for subsequent applies.

https://www.terraform.io/docs/commands/plan.html#out-path

The plan can be saved using -out, and then provided to terraform apply to ensure only the pre-planned actions are executed.

Then you can use the pre-generated planfile during a Terraform apply to ensure that the actions listed inside the generated planfile are the only actions taken by Terraform.

https://www.terraform.io/docs/commands/apply.html

By default, apply scans the current directory for the configuration and applies the changes appropriately. However, a path to another configuration or an execution plan can be provided. Execution plans can be used to only execute a pre-determined set of actions.

Hope this helps, thanks!

@grubernaut
Contributor

grubernaut commented Jul 21, 2017

@ubersol for now, I'm going to close this issue, as the initial problem is now solved.

In the future, though, it might be more beneficial to direct questions to either the Mailing List, Stack Exchange, or the IRC channel. The Terraform community has been amazing when it comes to either usability questions, or configuration questions; while we would prefer to keep the Issue Tracker mainly for bugs and enhancement requests. This is mainly because general questions have a tendency to be overlooked or lost, whereas the Mailing List and IRC can often yield very detailed and accurate answers to questions.

See https://www.terraform.io/community.html, for more details.

Thanks again!

@ubersol
Author

ubersol commented Jul 21, 2017

Hi again!, thank you so much for all of this. I'd be happy to move this conversation to email lists etc, but before I do that, I think I am still having issues with this approach, and it does not seem to be doing what I am expecting, and it is very very possible that I am doing something wrong! The following is just for trying to remove/destroy three instances out of five:

Plan out for creating five instances:

terraform plan -out=tfplan -var instance_count=5

Apply this plan

terraform apply tfplan

These will create my 5 instances as expected, no problem.

Now I am going to taint three instances between instance 0 and instance 5:

terraform taint opc_compute_instance.test.1
terraform taint opc_compute_instance.test.2
terraform taint opc_compute_instance.test.3

This is also good. Then I reduce the count from 5 to 2, since I am getting rid of three instances:

terraform plan -out=tfplan -var instance_count=2

and apply this plan:

terraform apply tfplan

At this point what happens is that TF will stop and destroy

opc_compute_instance.test.1
opc_compute_instance.test.2
opc_compute_instance.test.3

and, unexpectedly,

opc_compute_instance.test.4

and it will then recreate opc_compute_instance.test.4. So if you had data in this instance, for example, it would get destroyed before the instance gets recreated. Am I missing something? What am I doing wrong? Again, apologies, I am totally new to this, but there must be an easier way to remove x instances out of y. Is this expected behaviour?

Thank you again with your patience with me here!

@apparentlymart

Hi @ubersol,

This is what I was trying to say above, but didn't explain clearly: since the instances are identified by index, it's not possible to destroy ones that aren't at the end of the sequence.

That is, in your case you could start off with five instances and reduce to two, losing the ones numbered 2, 3 and 4, but you can't specifically destroy instance 2 on its own, because that would then cause instance 3 to become instance 2, and so on.

In many cases this is acceptable because all of the instances in the set are "equivalent", and thus it doesn't matter which ones get deleted. If you have the need to specify a specific one to delete, a different design would be better: either specify each instance as a separate resource block, or move the set of resources related to an instance into a module and have a separate module block for each instance. That way you can remove the specific item you wish to remove, without any impact on others.
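A minimal sketch of the module-based layout described above, assuming a hypothetical local module at ./instance that wraps the instance together with its storage volume, IP reservation, and security association, and that accepts a hypothetical name variable. Neither the module path nor the variable comes from the original configuration.

```hcl
# Each module block manages one complete instance "stack".
# "./instance" and the "name" variable are hypothetical.
module "instance_a" {
  source = "./instance"
  name   = "deniz-test-instance-a"
}

module "instance_b" {
  source = "./instance"
  name   = "deniz-test-instance-b"
}

# To retire only instance_b, delete its module block and run a normal
# terraform plan. instance_a is addressed independently and is not
# renumbered.
```

With this layout there is no shared index, so removing one block should not shift or otherwise affect the others.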

@ubersol
Author

ubersol commented Jul 21, 2017

Hi @apparentlymart, OK, I do understand that, but if I have 5 instances and I am attempting to remove 3 of them, why does TF attempt to recreate the last instance (the fifth instance in this case)?

Note, it is not touching the first instance at all. I am only tainting three, and TF is destroying four and recreating the last instance it destroyed.

@ubersol
Author

ubersol commented Jul 24, 2017

Hi guys, does the problem I described above merit a new bug report?

@apparentlymart

Hi @ubersol,

I'm not sure I entirely follow here. But note that in my earlier advice I wasn't meaning to suggest that you should both taint the instances and remove them from config, but rather that these are two separate things you can do:

  • If you want to replace a specific instance, you can taint it.
  • If you want to destroy a specific instance, you can remove it from config. (In your case, by decreasing the count)

Tainting something that is also removed from config is not very useful, since it's redundant... both of these things cause Terraform to produce a destroy diff on the next plan, albeit in slightly different ways.

It's possible that there's a strange interaction here if you try to do both of these things at once... possibly the index shifting caused by removing some of the items is causing Terraform to get a bit confused. Whether or not Terraform is confused, I will say that I'm confused 😀 . If you'd like to pursue this as a core bug (in the main terraform repo) then I'm happy to dig into it with you further; ideally to understand better what's going on I'd like to see a terminal transcript showing the steps you followed in order and in detail, since it's otherwise a bit hard to keep track of what's changing at which point.
