Unexpected nil variable on terraform destroy #7993

kglee79 · 2016-08-04T23:04:02Z

Terraform Version

0.7.0

Affected Resource(s)

aws_security_group

Terraform Configuration Files

# ---------
# Variables
# ---------
variable "name" { default = "nat" }
variable "vpc_id" {}
variable "vpc_name" {}
variable "vpc_cidr" {}

variable "ami_id" {}
variable "public_subnet_ids" { type = "list" }
variable "instance_type" {}
variable "region" {}
variable "key_name" {}
variable "key_path" {}
variable "user" {}
variable "custom_instance_count" { description = "The number of pre-baked AMI NAT instances to create"}
variable "ami_instance_count" { description = "The number of custom NAT instances to create"}

variable "bastion_host" {}
variable "bastion_user" {}

# ---------
# Resources
# ---------
resource "aws_security_group" "nat" {
  name = "${var.name}"
  vpc_id = "${var.vpc_id}"
  description = "NAT security group"

  ingress {
    protocol = -1
    from_port = 0
    to_port = 0
    cidr_blocks = ["${var.vpc_cidr}"]
  }

  egress {
    protocol = -1
    from_port = 0
    to_port = 0
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags {
    Name = "${var.name}"
    Vpc = "${var.vpc_name}"
    Application = "NAT"
  }
}

# Build NAT manually
module "nat-custom" {
  source = "custom"

  name = "${var.name}"
  public_subnet_ids = "${var.public_subnet_ids}"
  instance_type = "${var.instance_type}"
  region = "${var.region}"
  key_name = "${var.key_name}"
  key_path = "${var.key_path}"
  user = "${var.user}"
  vpc_id = "${var.vpc_id}"
  vpc_name = "${var.vpc_name}"
  vpc_cidr = "${var.vpc_cidr}"
  bastion_host = "${var.bastion_host}"
  bastion_user = "${var.bastion_user}"
  security_group_id = "${aws_security_group.nat.id}"
  public_subnet_start_index = "0"

  instance_count = "${var.custom_instance_count}"
}

# Build NAT from pre-canned NAT AMI
module "nat-ami" {
  source = "ami"

  name = "${var.name}"
  ami_id = "${var.ami_id}"
  public_subnet_ids = "${var.public_subnet_ids}"
  instance_type = "${var.instance_type}"
  key_name = "${var.key_name}"
  vpc_id = "${var.vpc_id}"
  vpc_name = "${var.vpc_name}"
  security_group_id = "${aws_security_group.nat.id}"
  public_subnet_start_index = "${var.custom_instance_count}"

  instance_count = "${var.ami_instance_count}"
}

# ---------
# Outputs
# ---------
output "security_group_id" { value = "${aws_security_group.nat.id}" }
output "instance_ids" { value = "${concat(module.nat-ami.instance_ids, module.nat-custom.instance_ids)}" }

Debug Output

https://gist.github.com/kglee79/111d61658e700ad40d1ad367497f4d66

Expected Behavior

Expected aws_security_group nat-sg to not be nil during terraform destroy. Prior to 0.7.0 there was no issue with the configuration above. After installing 0.7.0, only on destroy the issue occurs.

Actual Behavior

nat-sg is nil on destroy.

The error pasted below:

Error creating plan: 2 error(s) occurred:

* variable "nat-sg" is nil, but no error was reported
* variable "nat-sg" is nil, but no error was reported

Steps to Reproduce

Could not reproduce in a simplified example.

The text was updated successfully, but these errors were encountered:

kglee79 · 2016-08-05T00:34:43Z

I also tried adding a null_resource right after the security group (see below) and got the same error. So it doesn't look like it is related to aws_security_group specifically, but perhaps the order in which dependencies are being "destroyed"?

resource "null_resource" "test" {
  triggers = {
    test = "aws_security_group.nat-sg.id"
  }

  depends_on = ["aws_security_group.nat-sg"]
}

Error creating plan: 4 error(s) occurred:

* variable "nat-sg" is nil, but no error was reported
* variable "nat-sg" is nil, but no error was reported
* variable "test" is nil, but no error was reported
* variable "test" is nil, but no error was reported

danielnbarbosa · 2016-08-05T02:32:31Z

I'm also running into this issue on destroy. Could be security group related as well.

* variable "elasticache" is nil, but no error was reported
* variable "mod" is nil, but no error was reported
* variable "backup" is nil, but no error was reported
* variable "rds" is nil, but no error was reported
* variable "mod" is nil, but no error was reported
* variable "mod" is nil, but no error was reported

Bowbaq · 2016-08-05T15:59:01Z

Running into this as well, can provide logs if needed

dmikalova · 2016-08-11T22:04:15Z

I just ran into this issue. I did a destroy, it waited ~4 minutes for the igw to destroy, and then moved on. Later on it said it couldn't find the igw, crashed, and after that I get the nil variable error.

* aws_internet_gateway.mod: Error waiting for internet gateway (igw-289a574c) to detach: couldn't find resource

Bowbaq · 2016-08-17T09:56:48Z

Has anybody found a workaround for this? I seem to have trouble with internet gateways like @dmikalova, and it is making it impossible to cleanly destroy infrastructure.

danielnbarbosa · 2016-08-17T14:26:14Z

Yah, its often an internet gateway issue for me too. FWIW it looks like the internet gateway issue will be addressed in 0.7.1

Bowbaq · 2016-08-17T14:28:52Z

@danielnbarbosa I think the fix in 0.7.1 is related to igw creation, not deletion.

I've done the following change locally, we'll see if things improve:

diff --git a/builtin/providers/aws/resource_aws_internet_gateway.go b/builtin/providers/aws/resource_aws_internet_gateway.go
index fe9ed3e..d4505a1 100644
--- a/builtin/providers/aws/resource_aws_internet_gateway.go
+++ b/builtin/providers/aws/resource_aws_internet_gateway.go
@@ -237,24 +237,28 @@ func resourceAwsInternetGatewayDetach(d *schema.ResourceData, meta interface{})

 // InstanceStateRefreshFunc returns a resource.StateRefreshFunc that is used to watch
 // an EC2 instance.
-func detachIGStateRefreshFunc(conn *ec2.EC2, instanceID, vpcID string) resource.StateRefreshFunc {
+func detachIGStateRefreshFunc(conn *ec2.EC2, gatewayID, vpcID string) resource.StateRefreshFunc {
        return func() (interface{}, string, error) {
                _, err := conn.DetachInternetGateway(&ec2.DetachInternetGatewayInput{
-                       InternetGatewayId: aws.String(instanceID),
+                       InternetGatewayId: aws.String(gatewayID),
                        VpcId:             aws.String(vpcID),
                })
                if err != nil {
-                       ec2err, ok := err.(awserr.Error)
-                       if ok {
-                               if ec2err.Code() == "InvalidInternetGatewayID.NotFound" {
-                                       return nil, "Not Found", err
-                               } else if ec2err.Code() == "Gateway.NotAttached" {
+                       if ec2err, ok := err.(awserr.Error); ok {
+                               switch ec2err.Code() {
+                               case "InvalidInternetGatewayID.NotFound":
+                                       log.Printf("[TRACE] Error detaching Internet Gateway '%s' from VPC '%s': %s", gatewayID, vpcID, err)
+                                       return nil, "Not Found", nil
+
+                               case "Gateway.NotAttached":
                                        return "detached", "detached", nil
-                               } else if ec2err.Code() == "DependencyViolation" {
+
+                               case "DependencyViolation":
                                        return nil, "detaching", nil
                                }
                        }
                }
+
                // DetachInternetGateway only returns an error, so if it's nil, assume we're
                // detached
                return "detached", "detached", nil

danielnbarbosa · 2016-08-17T14:46:15Z

dang. you're right. thanks for the code snippet. i'm surprised that more people aren't having this issue. i wonder what's so unique about our setup. maybe most people don't do destroy very often. i'm currently in the middle of testing a new setup so i'm often doing a full create/destroy cycle.

Bowbaq · 2016-08-17T15:47:55Z

I'm running a scaled down version of my setup in CI, so it gets built and torn down very frequently, which may be why I run into this more often

danielnbarbosa · 2016-08-18T14:31:29Z

I've confirmed that this is related to aws_security_group and aws_vpc resources. And not specific ones, all of them have this issue. Furthermore it is not related to anything in the state file. I can reproduce this error with no pre-existing state file and just doing a terraform get followed by terraform plan -destroy.

iverberk · 2016-08-22T07:17:43Z

We are running into this issue as well. Indeed related in some way to aws_security_group and/or aws_vpc. Tried to reproduce this with a small testcase but haven't been able to trigger it yet and don't have the time right now. Given the previous descriptions I don't think we are seeing a different issue. Also running TF 0.7.0. Let me know if I can help debug this issue.

Bowbaq · 2016-08-25T21:38:06Z

I've noticed that this gets triggered a lot when the internet gateway fails to delete. I dug into the logs, and the gateway was failing to delete because there were still some public IPs mapped in the VPC. I added some dependencies between the NAT gateways and the internet gateway (which causes the NAT gateways to be deleted first) and that seems to have improved the issue

brikis98 · 2016-08-29T17:10:22Z

Same issue here, but not with security groups. Instead, as far as I can tell, it is with Network ACLs:

Error creating plan: 4 error(s) occurred:

* variable "public_subnets" is nil, but no error was reported
* variable "private_app_subnets" is nil, but no error was reported
* variable "private_persistence_subnets" is nil, but no error was reported
* variable "private_persistence_subnets" is nil, but no error was reported

The only resources or variables in my templates with those IDs are Network ACLs, such as:

resource "aws_network_acl" "private_persistence_subnets" {
  vpc_id = "${var.vpc_id}"
  subnet_ids = ["${var.private_persistence_subnet_ids}"]

  tags {
    Name = "${var.vpc_name}-private-persistence-subnets"
  }
}

NeckBeardPrince · 2016-08-29T23:48:04Z

Getting this as well on TF 0.7.2

Error creating plan: 2 error(s) occurred:

* variable "elasticsearch_cluster" is nil, but no error was reported
* variable "rethinkdb_cluster" is nil, but no error was reported

chiefy · 2016-09-02T13:36:37Z

Can anyone on the team comment on this? We're thinking of reverting to 0.6.16 because this is blocking us from destroying our envs.

mxlan · 2016-09-04T18:49:14Z

Getting this similar issue during destroy and on subsiquent destroy plans.

jdblack · 2016-09-05T21:18:38Z

Still seeing this on 0.7.2

esword · 2016-09-19T15:00:54Z

Reproduced this issue with much simpler config file. Submitted as #8918.

Bowbaq · 2016-09-26T22:44:53Z

Still seeing this almost every day. I've noticed that this happens almost 100% of the time when terraform exits with an error due to a transient AWS failure. I think that the problem is the following:

the terraform configuration has a variable that's derived from a resource (id or something like that) OR a module that takes input derived from a resource
terraform partially destroys the infrastructure. the resource referenced by the variable is destroyed successfully, but then terraform exits with an error due to flakiness
on the next run of terraform destroy, the referenced resource no longer exists, causing the variable * is nil, but no error was reported error.

I've managed to get past the errors by replacing the problematic references with hardcoded values & calling destroy again

alexsomesan · 2016-10-06T11:16:10Z

I'm also seeing this when calling destroy a second time after the first destroy succeeds in removing the resources.
TF ver. 0.7.4

fgimian · 2016-10-19T21:16:44Z

I also encountered this one today while performing a terraform destroy.

The following code appeared to be the cause in my case but I haven't managed to find a workaround or similar:

resource "aws_key_pair" "keypair" {
  key_name = "my-keypair"
  public_key = "${file("key.pub")}"
}

module "shared_services_compute" {
  ...
  key_pair_name = "${aws_key_pair.keypair.key_name}"
  ...
}

Any help would be greatly appreciated
Fotis

devops-senpai · 2016-10-25T17:31:32Z

👍

mitchellh · 2016-10-25T18:02:59Z

Just tested this with the new destroy graph that we have coming in Terraform 0.8 and this is fixed. You can get this starting in the next version of Terraform by running destroy with the -Xnew-destroy flag. This will be on by default in Terraform 0.8.

Bowbaq · 2016-10-25T18:06:53Z

@mitchellh when is that new version coming out?

mitchellh · 2016-10-25T18:09:24Z

@Bowbaq we average a release every 2 weeks or so so I'd imagine within a couple weeks

alexsomesan · 2016-10-25T18:23:32Z

Mitchell, that sounds great!
Thanks a lot for all the work on the new apply graph. Read about it and am
really looking forward to it.

Could you please drop a line or two about why this thing was happening in
the old version? Just for my better understanding of terraform internals.

Thanks a lot!

On Tuesday, 25 October 2016, Mitchell Hashimoto [email protected]
wrote:

Closed #7993 #7993.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#7993 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABC-mxpD01IPImMB-N68UBFDjdZ8ZYB3ks5q3kRkgaJpZM4JdNKl
.

fgimian · 2016-10-25T19:46:44Z

Huge huge thanks for this guys, you are awesome! 😄

chiefy · 2016-10-28T14:47:26Z

@mitchellh I am still having issues - even with -Xnew-destroy flag...

terraform destroy -Xnew-destroy \
        -state=./envs/minimal/minimal.tfstate \
        -var-file=./envs/minimal/minimal.tfvars \
        -target=module.base.module.couchbase_data -target=module.base.module.couchbase_index -target=module.base.module.couchbase_query -target=module.base.module.consul -target=module.base.module.gateway -target=module.base.module.infra_router -target=module.base.module.infra_router.module.router_asg -target=module.base.module.kafka -target=module.base.module.nfs -target=module.base.module.router -target=module.base.module.router.module.router_asg -target=module.base.module.search -target=module.base.module.search.module.search_store_client -target=module.base.module.search.module.search_store_client.module.asg -target=module.base.module.search.module.search_store_data -target=module.base.module.search.module.search_store_master -target=module.base.module.security_groups -target=module.base.module.ucp_controller -target=module.base.module.ucp_standalone_node -target=module.base.module.ucp_standalone_node.module.ucp_node_asg -target=module.base.module.ucp_persistence_node -target=module.base.module.ucp_persistence_node.module.ucp_node_asg -target=module.base.module.zookeeper
Experimental feature enabled: new destroy graph. Continue?
  Experimental new destroy graph has been enabled. This may still
  have bugs, and should be used with care. If you'd like to continue,
  you must enter exactly 'yes' as a response.

  Enter a value: yes
.............
Error refreshing state: 1 error(s) occurred:

* variable "router_sg" is nil, but no error was reported
make[1]: *** [destroy] Error 1
make: *** [__main] Error 2

~/GitHub/go/src/x/x/tf-infra master* 32s
❯ terraform --version
Terraform v0.7.8-dev (9665a0f8de7a9d6f26a132cad039c215d3cb5ae8+CHANGES)

mitchellh · 2016-10-28T14:53:32Z

@chiefy Huh, I don't have a repro anymore that can get this interestingly. Can you please post a repro if you get a chance? Please open a new issue when you do so so it gets more visibility. This may be a case where some cases are fixed but some aren't.

chiefy · 2016-10-28T15:05:06Z

@mitchellh - interesting, we're doing a lot of specification with -target=module.x.y.z and if I remove that it seems to get past that. I will try to come up with a repro case.

mitchellh · 2016-10-28T15:07:31Z

@chiefy Yeah that looks like the key, the targeting. All the cases I had before didn't have targeting (but did contain modules). Any repro would help immensely and I can look into getting it fixed ASAP.

akranga · 2016-11-04T14:27:44Z

Building terraform from latest git Terraform v0.7.9-dev (24e5ea89521c8af3930c5a87a76c0bdef0037627) and running terraform destroy -refresh=true -force -Xnew-destroy -var-file ... gives

module.ami.data.aws_ami.coreos: Refreshing state...
Destroy complete! Resources: 0 destroyed.

backnight · 2016-11-09T22:09:37Z

Same behavior as @akranga with -Xnew-destroy flag (Terraform v0.7.10).

mitchellh · 2016-11-09T22:14:23Z

Hey, can one of you get me a reproduction? Happy to fix with that, hard to without. :)

chiefy · 2016-11-10T16:49:57Z

@mitchellh I tried to create a test-case, but could not get it to reproduce - is it possible it only happens with .tfstate created in older version of terraform?

mitchellh · 2016-11-10T16:51:07Z

@chiefy It is possible, but the state version last changed in 0.7.0. Between then and now the format hasn't changed.

akranga · 2016-11-10T17:18:28Z

How may I help? Do you need the state? Tfplan?

jdblack · 2016-11-10T22:57:22Z

@mitchellh Sorry for the delay. It took a few tries, but I was able to get it to reproduce on 0.7.10 while removing a redis cluster.

terraform.log.txt

By the way, are you the Hashi in Hashicorp?

mitchellh · 2016-11-11T05:58:21Z

@jdblack Thanks! I took a look and unfortunately don't see anything obvious in that log. I know you had some trouble getting that to reproduce so I apologize but if you can get that log with TF_LOG=trace env var set, that'd be GREAT. WARNING though that this might contain sensitive information so if you want to redact it or email it to me directly or something, feel free to do that.

And yep, I am the H in HashiCorp. :P

jdblack · 2016-11-11T17:16:56Z

Good news! I was able to get it with a trace. I replaced the sensitive information that I could find.

I would recommend breaking it up over more than one sitting if you intend to read the whole thing, as it's 838,498 lines. =)

I'll have a chat with our powers that be to get permissio nto share our terraform repo with you. =)

mitchellh · 2016-11-11T17:37:39Z

@jdblack Thanks, feel free to email me privately. I'm surprisingly nimble at parsing that 800,000 line log with my eyes. :)

sriraja53 · 2019-04-29T10:38:20Z

I still see this issue in Terraform v0.11.10

work around.

##Fix : #16681
##Ref : https://www.terraform.io/upgrade-guides/0-11.html#referencing-attributes-from-resources-with-count-0

Code inside submodule
output "vpc_id" {
description = "The ID of the VPC"
value = "${aws_vpc.vpc.id}"
}

Fix

output "vpc_id" {
description = "The ID of the VPC"
value = "${element(concat(aws_vpc.vpc.*.id, list("")), 0)}"
}

After the above change I ran terraform destroy it will without any issue.

ghost · 2019-07-26T02:07:12Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

stack72 added bug core labels Aug 4, 2016

Bowbaq mentioned this issue Aug 25, 2016

Terraform destroy fail #8229

Closed

mitchellh closed this as completed Oct 25, 2016

metmajer mentioned this issue Jul 10, 2018

terraform destroy error when referencing a random_id resource in a module's locals block #18424

Closed

ghost locked and limited conversation to collaborators Jul 26, 2019

Unexpected nil variable on terraform destroy #7993

Unexpected nil variable on terraform destroy #7993

Comments

kglee79 commented Aug 4, 2016

Terraform Version

Affected Resource(s)

Terraform Configuration Files

Debug Output

Expected Behavior

Actual Behavior

Steps to Reproduce

kglee79 commented Aug 5, 2016

danielnbarbosa commented Aug 5, 2016 • edited Loading

Bowbaq commented Aug 5, 2016

dmikalova commented Aug 11, 2016

Bowbaq commented Aug 17, 2016 • edited Loading

danielnbarbosa commented Aug 17, 2016

Bowbaq commented Aug 17, 2016

danielnbarbosa commented Aug 17, 2016

Bowbaq commented Aug 17, 2016

danielnbarbosa commented Aug 18, 2016

iverberk commented Aug 22, 2016

Bowbaq commented Aug 25, 2016

brikis98 commented Aug 29, 2016

NeckBeardPrince commented Aug 29, 2016

chiefy commented Sep 2, 2016

mxlan commented Sep 4, 2016

jdblack commented Sep 5, 2016

esword commented Sep 19, 2016

Bowbaq commented Sep 26, 2016

alexsomesan commented Oct 6, 2016

fgimian commented Oct 19, 2016 • edited Loading

devops-senpai commented Oct 25, 2016

mitchellh commented Oct 25, 2016

Bowbaq commented Oct 25, 2016

mitchellh commented Oct 25, 2016

alexsomesan commented Oct 25, 2016

fgimian commented Oct 25, 2016

chiefy commented Oct 28, 2016

mitchellh commented Oct 28, 2016

chiefy commented Oct 28, 2016

mitchellh commented Oct 28, 2016

akranga commented Nov 4, 2016

backnight commented Nov 9, 2016

mitchellh commented Nov 9, 2016

chiefy commented Nov 10, 2016 • edited Loading

mitchellh commented Nov 10, 2016

akranga commented Nov 10, 2016

jdblack commented Nov 10, 2016

mitchellh commented Nov 11, 2016

jdblack commented Nov 11, 2016

mitchellh commented Nov 11, 2016

sriraja53 commented Apr 29, 2019

ghost commented Jul 26, 2019

danielnbarbosa commented Aug 5, 2016 •

edited

Loading

Bowbaq commented Aug 17, 2016 •

edited

Loading

fgimian commented Oct 19, 2016 •

edited

Loading

chiefy commented Nov 10, 2016 •

edited

Loading