Recreating lost resources does not work when resources are referenced by other resources using array element syntax #14536

Closed
sigmunau opened this issue May 16, 2017 · 14 comments

Comments

@sigmunau

sigmunau commented May 16, 2017

Terraform Version

Terraform v0.9.6-dev
Terraform v0.9.5

Affected Resource(s)


  • openstack_compute_instance_v2
  • template_file
    (probably a core issue)

Terraform Configuration Files

variable "auth_url" {}
variable "domain_name" {}
variable "tenant_name" {}
variable "region" {}
variable "node_flavor" {}
variable "worker_node_flavor" {}
variable "coreos_image" {}
variable "user_name" {}
variable "password" {}
variable "network" {}

variable "worker_count" { default = 4 }

provider "openstack" {
    auth_url = "${var.auth_url}"
    domain_name = "${var.domain_name}"
    tenant_name = "${var.tenant_name}"
    user_name = "${var.user_name}"
    password = "${var.password}"
}

resource "openstack_compute_instance_v2" "worker" {
    count = "${var.worker_count}"
    name = "worker-${count.index}"
    region = "${var.region}"
    image_id = "${var.coreos_image}"
    flavor_name = "${var.worker_node_flavor}"
    network {
        uuid = "${var.network}"
    }
}

data "template_file" "workers_ansible" {
    template = "$${name} ansible_host=$${ip}"
    count = "${var.worker_count}"
    vars {
        name  = "${openstack_compute_instance_v2.worker.*.name[count.index]}"
        ip = "${openstack_compute_instance_v2.worker.*.access_ip_v4[count.index]}"
#        name  = "${element(openstack_compute_instance_v2.worker.*.name,count.index)}"
#        ip = "${element(openstack_compute_instance_v2.worker.*.access_ip_v4,count.index)}"
    }
}

output "inventory" {
    value = "${join("\n", data.template_file.workers_ansible.*.rendered)}"
}

Debug Output

https://gist.github.com/sigmunau/0c3c698bb26ec7835f146b5e49b34c3b

Expected Behavior

terraform refresh should work.
terraform plan should indicate that the missing node will be recreated

Actual Behavior

Both terraform refresh and terraform plan fail with the following error:

openstack_compute_instance_v2.worker.1: Refreshing state... (ID: 73810b14-2041-4ff3-b3bb-5166ad870e51)
openstack_compute_instance_v2.worker.3: Refreshing state... (ID: 4b326211-0892-4783-9820-435fdc8eb749)
openstack_compute_instance_v2.worker.2: Refreshing state... (ID: 596dafd4-2641-4067-b15e-eb03900d82f4)
openstack_compute_instance_v2.worker.0: Refreshing state... (ID: be4a57ac-1cf1-4b59-864c-c80a5a29e67f)
data.template_file.workers_ansible.0: Refreshing state...
data.template_file.workers_ansible.1: Refreshing state...
data.template_file.workers_ansible.2: Refreshing state...
Error refreshing state: 1 error(s) occurred:

* data.template_file.workers_ansible: 1 error(s) occurred:

* data.template_file.workers_ansible[3]: index 3 out of range for list openstack_compute_instance_v2.worker.*.access_ip_v4 (max 3) in:

${openstack_compute_instance_v2.worker.*.access_ip_v4[count.index]}

Steps to Reproduce

  1. terraform apply
  2. delete one of the instances using the OpenStack portal
  3. run terraform refresh or terraform plan

Important Factoids

Replacing the array-index syntax with the element() function (as shown in the commented-out lines in the configuration above) gives the desired behaviour. Changing worker_count does not trigger the problem.
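
For reference, this is the workaround applied to the data source above, simply activating the commented-out lines; element() wraps the index around the list length instead of failing on an out-of-range index:

data "template_file" "workers_ansible" {
    template = "$${name} ansible_host=$${ip}"
    count = "${var.worker_count}"
    vars {
        name = "${element(openstack_compute_instance_v2.worker.*.name, count.index)}"
        ip   = "${element(openstack_compute_instance_v2.worker.*.access_ip_v4, count.index)}"
    }
}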

References

@apparentlymart
Contributor

Hi @sigmunau! Sorry for the problems here and thanks for reporting this.

This seems similar to #14521, so for the moment I'm going to proceed under the assumption that they are the same root cause, though I'll definitely circle back here once I have a theory over there and see if it holds up.

@apparentlymart
Contributor

Hi again @sigmunau! Sorry for the delay in getting back to you here.

There were some fixes in this area included in 0.9.6, but looking at this again with the existing fixes in mind I'm suspecting that this is something different from what we fixed already. If you're able, it'd be useful if you could retry this with the official 0.9.6 release and let me know if it's still broken and, if it is, whether there are any differences in the error messages produced. (It's possible that changes may have affected exactly how this manifests, even if they didn't fix it.)

@rlees85

rlees85 commented Jun 5, 2017

Hi,

This appears to be affecting me on the official Terraform 0.9.6 release.

* module.test_alb_target_alternative.aws_alb_target_group_attachment.scope: 2 error(s) occurred:

* module.test_alb_target_alternative.aws_alb_target_group_attachment.scope[1]: index 1 out of range for list var.target_instances (max 1) in:

${var.target_instances[count.index]}
* module.test_alb_target_alternative.aws_alb_target_group_attachment.scope[2]: index 2 out of range for list var.target_instances (max 1) in:

${var.target_instances[count.index]}

Changing to an element()-style lookup works fine, and DOES select the correct items rather than repeating the first one.

Also, the same code using the array-index syntax worked fine in Terraform 0.9.3.
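
For anyone hitting the same error, the fix is a one-line swap from index syntax to element(). A minimal sketch, assuming the expression from the error message feeds the attachment's target_id (the other arguments here are placeholders, not taken from the original module):

resource "aws_alb_target_group_attachment" "scope" {
  count            = "${length(var.target_instances)}"
  target_group_arn = "${var.target_group_arn}"
  # was: target_id = "${var.target_instances[count.index]}"
  target_id        = "${element(var.target_instances, count.index)}"
}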

@apparentlymart
Contributor

Thanks for the confirmation, @rlees85! I'll see if I can repro this and get it fixed.

@rlees85

rlees85 commented Jul 19, 2017

Thanks for the reply. I see this still seems to be an issue in Terraform 0.9.11. Below is a really cut-down example of how to reproduce the issue:

###################################################################################################

variable "ami_id"    { default = "ami-af455dc9"    }
variable "az_name"   { default = "eu-west-1c"      }
variable "ssh_key"   { default = "yoursshkeyhere"  }
variable "subnet_id" { default = "subnet-00000000" }

###################################################################################################

variable "instance_type"                     { default = "t2.micro"  }
variable "instance_count"                    { default = 3           }
variable "instance_data_volume_name"         { default = "/dev/xvdb" }
variable "instance_data_volume_size"         { default = 1           }
variable "instance_data_volume_force_detach" { default = true        }
variable "instance_data_volume_skip_destroy" { default = false       }

###################################################################################################

resource "aws_instance" "instance" {
  count         = "${var.instance_count}"
  ami           = "${var.ami_id        }"
  key_name      = "${var.ssh_key       }"
  subnet_id     = "${var.subnet_id     }"
  instance_type = "${var.instance_type }"
}

###################################################################################################

resource "aws_ebs_volume" "volume_1" {
  count             = "${var.instance_count           }"
  size              = "${var.instance_data_volume_size}"
  availability_zone = "${var.az_name                  }"
}

###################################################################################################

resource "aws_volume_attachment" "scope" {
  count        = "${var.instance_count                                }"
  volume_id    = "${element(aws_ebs_volume.volume_1.*.id, count.index)}"
  device_name  = "${var.instance_data_volume_name                     }"
  instance_id  = "${element(aws_instance.instance.*.id, count.index)  }"
  force_detach = "${var.instance_data_volume_force_detach             }"
  skip_destroy = "${var.instance_data_volume_skip_destroy             }"
}

###################################################################################################

Steps:

  • Change the first four lines to match your AWS environment.
  • Run Terraform Apply against this script
  • Log into AWS, manually destroy ONE instance from the 3 created
  • Run Terraform Plan

Observe that it wants to re-create all 3 volume attachments (for all 3 instances). This would cause a disruption in service: if one server got terminated, every other production server would have a filesystem ripped out and re-attached.

Running a Terraform Apply does indeed rip out the attachments and re-create them for ALL instances.

Terraform 0.9.3 worked fine...

Output from plan:

+ aws_instance.instance.1
    ami:                          "ami-af455dc9"
    associate_public_ip_address:  "<computed>"
    availability_zone:            "<computed>"
    ebs_block_device.#:           "<computed>"
    ephemeral_block_device.#:     "<computed>"
    instance_state:               "<computed>"
    instance_type:                "t2.micro"
    ipv6_address_count:           "<computed>"
    ipv6_addresses.#:             "<computed>"
    key_name:                     "<<removed by me>>"
    network_interface.#:          "<computed>"
    network_interface_id:         "<computed>"
    placement_group:              "<computed>"
    primary_network_interface_id: "<computed>"
    private_dns:                  "<computed>"
    private_ip:                   "<computed>"
    public_dns:                   "<computed>"
    public_ip:                    "<computed>"
    root_block_device.#:          "<computed>"
    security_groups.#:            "<computed>"
    source_dest_check:            "true"
    subnet_id:                    "<<removed by me>>"
    tenancy:                      "<computed>"
    volume_tags.%:                "<computed>"
    vpc_security_group_ids.#:     "<computed>"

-/+ aws_volume_attachment.scope.0
    device_name:  "/dev/xvdb" => "/dev/xvdb"
    force_detach: "true" => "true"
    instance_id:  "i-08d60cd9a9ef5889f" => "${element(aws_instance.instance.*.id, count.index)  }" (forces new resource)
    skip_destroy: "false" => "false"
    volume_id:    "vol-0d0555687a55fc584" => "vol-0d0555687a55fc584"

+ aws_volume_attachment.scope.1
    device_name:  "/dev/xvdb"
    force_detach: "true"
    instance_id:  "${element(aws_instance.instance.*.id, count.index)  }"
    skip_destroy: "false"
    volume_id:    "vol-09e95c5fa89a0650a"

-/+ aws_volume_attachment.scope.2
    device_name:  "/dev/xvdb" => "/dev/xvdb"
    force_detach: "true" => "true"
    instance_id:  "i-0ded09a87a5047c23" => "${element(aws_instance.instance.*.id, count.index)  }" (forces new resource)
    skip_destroy: "false" => "false"
    volume_id:    "vol-027145aa2fb5e1555" => "vol-027145aa2fb5e1555"


Plan: 4 to add, 0 to change, 2 to destroy.

@elliotweiser

I've encountered similar behavior using Terraform 0.9.11 and openstack_compute_volume_attach_v2. Simply tainting a node or changing the count of the compute resource forces every volume attachment to be recreated, so every node is impacted by the change. I think the ideal behavior would be to confine the changes to only the nodes that are "supposed" to change.
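
For context, a minimal sketch of the pattern being described (the resource and variable names here are hypothetical, since the reporter's configuration is not shown):

resource "openstack_compute_volume_attach_v2" "attach" {
  count       = "${var.node_count}"
  instance_id = "${element(openstack_compute_instance_v2.node.*.id, count.index)}"
  volume_id   = "${element(openstack_blockstorage_volume_v2.volume.*.id, count.index)}"
}

When one instance is tainted or recreated, the openstack_compute_instance_v2.node.*.id list becomes partially computed, which appears to be what forces every attachment to be recreated.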

@dannytrigo
Contributor

This is impacting me quite badly when creation of a resource fails. Subsequent applies hit this index out of range error. Still seems to be an issue in TF 0.10.

@rlees85

rlees85 commented Dec 18, 2017

#16110

Also related.

Hitting me quite hard again now too, even with resources that don't use a count. Data sources run against the current state, not the target state, EVEN if there is a direct input into the module from the prerequisite module.

i.e.

module "my_security_group" {
  source = "git::blah"
  name  = "my_sg"
..
}

module "my_security_group_rule" {
  source = "git::blah-blah"
  security_group_name = "${module.my_security_group.name}"
}

my_security_group_rule has a data source that resolves the security group ID by name. This is so the two modules can stay decoupled.

The first run works; if you then change the input name = "my_sg", it breaks on the second apply, because the data source in my_security_group_rule searches against the current state, not the target state.
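
For illustration, the lookup inside my_security_group_rule presumably looks something like this (a hypothetical sketch; the actual module source is not shown in this thread, and the rule values are placeholders):

data "aws_security_group" "selected" {
  name = "${var.security_group_name}"
}

resource "aws_security_group_rule" "this" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = "${data.aws_security_group.selected.id}"
}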

@mrubin

mrubin commented Jan 4, 2018

Are there any workarounds to this? I am on Terraform v0.11.1. I do not use modules at all.

@OneBadSanta

Halp! I can't figure this out. I removed the .terraform folder by accident and started receiving the same error. I've never had this issue before and have not modified any code related to the VPC, and now I can't update simple stuff in my production account. Any help would be greatly appreciated. Thanks!

@OneBadSanta

OneBadSanta commented Jun 21, 2018

To give more detail: I am outputting this from the VPC module

value = "${module.vpc_platform.private_route_table_ids}"

and ingesting it like this:

route_table_id = "${module.platform_virginia_v1.private_route_table_ids[0]}"
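
Given the workaround others in this thread have reported, swapping the index syntax on the module output for element() may help (a sketch, assuming private_route_table_ids is a list output):

route_table_id = "${element(module.platform_virginia_v1.private_route_table_ids, 0)}"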

@lmayorga1980

The element() function fixed my issue.

@hashibot
Contributor

Hello! 🤖

This issue relates to an older version of Terraform that is no longer in active development, and because the area of Terraform it relates to has changed significantly since the issue was opened we suspect that the issue is either fixed or that the circumstances around it have changed enough that we'd need an updated issue report in order to reproduce and address it.

If you're still seeing this or a similar issue in the latest version of Terraform, please do feel free to open a new bug report! Please be sure to include all of the information requested in the template, even if it might seem redundant with the information already shared in this issue, because the internal details relating to this problem are likely to be different in the current version of Terraform.

Thanks!

@ghost

ghost commented Sep 27, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Sep 27, 2019