
aws: aws_elasticache_cluster doesn't wait till completed #2732

Closed
mzupan opened this issue Jul 15, 2015 · 11 comments · Fixed by #2842

@mzupan (Contributor) commented Jul 15, 2015

I haven't looked at the AWS library to see whether this is possible, but aws_elasticache_cluster creation (at least for Redis) doesn't wait until a cache node is created, so anything referencing the cache_nodes attribute will fail.

For example:

resource "aws_elasticache_cluster" "logstash-redis" {
  cluster_id = "web-logstash-redis"
  engine = "redis"
  node_type = "cache.m3.medium"
  num_cache_nodes = 1
  parameter_group_name = "default.redis2.8"
  port = 6379

  subnet_group_name = "${aws_elasticache_subnet_group.logstash-redis.name}"
  security_group_ids = ["${aws_security_group.internal-redis.id}"]
}

resource "aws_route53_record" "logstash-redis" {
   zone_id = "${aws_route53_zone.private-zone.id}"
   name = "redis-logstash.${var.env}.urthecast.com"
   type = "CNAME"
   ttl = "30"
   records = ["${aws_elasticache_cluster.logstash-redis.cache_nodes.0.address}"]
}

It fails on the Route 53 record creation:

* Resource 'aws_elasticache_cluster.logstash-redis' does not have attribute 'cache_nodes.0.address' for variable 'aws_elasticache_cluster.logstash-redis.cache_nodes.0.address'

Once the cache node is created, it works fine.

@catsby (Contributor) commented Jul 15, 2015

Hey @mzupan – I'll check this out; however, we've been down this road in #2051 and neither @phinze nor I have been able to reproduce it after several attempts. There is code in the resource that waits for the nodes to become available, which should populate the attribute before you get to this point.
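
Roughly, that wait looks like the sketch below. This is a simplified illustration built on Terraform's helper/resource state waiter and the AWS SDK for Go; the names, timeouts, and package layout are illustrative, not the exact provider source.

package main

import (
    "fmt"
    "log"
    "time"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/elasticache"
    "github.com/hashicorp/terraform/helper/resource"
)

// cacheClusterStateRefreshFunc polls DescribeCacheClusters and reports the
// cluster's current status string back to the state waiter.
func cacheClusterStateRefreshFunc(conn *elasticache.ElastiCache, clusterID string) resource.StateRefreshFunc {
    return func() (interface{}, string, error) {
        resp, err := conn.DescribeCacheClusters(&elasticache.DescribeCacheClustersInput{
            CacheClusterId:    aws.String(clusterID),
            ShowCacheNodeInfo: aws.Bool(true), // needed so the cache node addresses get populated
        })
        if err != nil {
            return nil, "", err
        }
        if len(resp.CacheClusters) == 0 {
            return nil, "", fmt.Errorf("cache cluster %q not found", clusterID)
        }
        c := resp.CacheClusters[0]
        log.Printf("[DEBUG] ElastiCache Cluster status: %s", aws.StringValue(c.CacheClusterStatus))
        return c, aws.StringValue(c.CacheClusterStatus), nil
    }
}

// waitForClusterAvailable blocks until the cluster leaves "creating" and
// reaches "available", or the timeout expires.
func waitForClusterAvailable(conn *elasticache.ElastiCache, clusterID string) error {
    stateConf := &resource.StateChangeConf{
        Pending: []string{"creating"},
        Target:  []string{"available"}, // older helper/resource versions took a plain string here
        Refresh: cacheClusterStateRefreshFunc(conn, clusterID),
        Timeout: 15 * time.Minute,
        Delay:   30 * time.Second,
    }
    _, err := stateConf.WaitForState()
    return err
}

func main() {
    conn := elasticache.New(session.Must(session.NewSession()))
    if err := waitForClusterAvailable(conn, "web-logstash-redis"); err != nil {
        log.Fatal(err)
    }
}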

Maybe you can spot something in the logic that we're missing?

@catsby added the bug, waiting-response, and provider/aws labels on Jul 15, 2015
@munhitsu

+1 - I'm having exactly the same issue with Terraform 0.6.0.

@catsby (Contributor) commented Jul 22, 2015

@munhitsu do you have a configuration that reproduces this reliably? We've been unable to reproduce this so far (see #2051)

@munhitsu

Yes, I'm getting it most of the time (just tested on 0.6.1). @catsby, ping me directly for the stack/logs.
Stack cleanup just hides the issue.

@catsby (Contributor) commented Jul 23, 2015

I'm looking for a configuration that reproduces this. Can you reproduce this with the configuration above, or the one I shared in this gist?

I believe this is happening... I just can't track down where; I'm not able to reproduce it even a single time 😦

@catsby (Contributor) commented Jul 24, 2015

> Stack cleanup just hides issue

I'm not sure I follow your meaning here.

I've tried both the config mentioned above and the config I shared in this gist; has anyone managed to reproduce this issue with either?

I've tried several regions as well. I've seen logs that demonstrate the error(s), so I believe this to be happening.

@munhitsu – thanks for the info you shared privately. I tried paring your example back to just the cluster-related things, but still had no success reproducing it. There is an important observation from your logs, though...

[DEBUG] status: available
[DEBUG] status: available

The string [DEBUG] status: only appears in one place in the code, in the ElastiCache Cluster resource, here:

In your log, those are the only two lines of that status output. When I create the cluster, I get this output:

That is about 5 minutes of [DEBUG] status: creating before I reach available. Has anyone noticed that when this fails for them? (This requires running Terraform with TF_LOG=1 to get the verbose output.)

That seems to be a mistake on AWS's side... I'm not sure what else could be happening here; we're reading that value directly from what the API returns.

I pushed a branch, https://github.com/hashicorp/terraform/tree/aws-elasticache-debug, that has extra debugging for the cluster creation and the checking of the nodes. If anyone can reproduce this, please try with that branch if possible and examine the logs. I've changed the output to [DEBUG] ElastiCache Cluster status: to make it clearer what we're checking for.

I'm trying to re-run this with that branch.

@catsby (Contributor) commented Jul 24, 2015

Specific debug additions: 6469c32

@catsby (Contributor) commented Jul 24, 2015

At long (long) last, I think I have some insight here. I'm working on testing my theory and a fix... Thank you all for your help and patience here.

@catsby removed the waiting-response label on Jul 24, 2015
@catsby (Contributor) commented Jul 24, 2015

I think #2842 fixes this. The only way I could reproduce it was when I was creating (or already had) another cluster in the same region. There were bugs in the code: it wasn't searching correctly and wasn't comparing the right cluster information.

Sorry for dragging this on; I really appreciate all the help and patience here. Please let me know if you can check out #2842.
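
For context, the shape of the fix: since DescribeCacheClusters can return more than one cluster in a region, the resource has to select the cluster whose ID actually matches instead of assuming the first result. Below is a minimal illustrative sketch, not the actual #2842 diff; it assumes the same aws-sdk-go imports as the earlier sketch.

// findCluster picks the cluster whose CacheClusterId matches, rather than
// assuming the first element of the DescribeCacheClusters response.
// Illustrative only; see #2842 for the real change.
func findCluster(resp *elasticache.DescribeCacheClustersOutput, clusterID string) (*elasticache.CacheCluster, error) {
    for _, c := range resp.CacheClusters {
        if aws.StringValue(c.CacheClusterId) == clusterID {
            return c, nil
        }
    }
    return nil, fmt.Errorf("cache cluster %q not found in response", clusterID)
}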

@ozbillwang commented Apr 15, 2017

I hit this issue today with Terraform 0.9.1:

Error running plan: 1 error(s) occurred:

* module.redis.aws_route53_record.redis: 1 error(s) occurred:

* module.redis.aws_route53_record.redis: Resource 'aws_elasticache_cluster.redis' not found for variable 'aws_elasticache_cluster.redis.cache_nodes.0.address'

Update:

I found another issue: my cluster_id was more than 20 characters. After I fixed that, the problem above went away. I can reproduce the error above by increasing the length of cluster_id. That's interesting.
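
A simplified, hypothetical sketch of the kind of up-front check that would surface this clearly; the 20-character limit is taken from the observation above, the function name is illustrative, and the current ElastiCache documentation is the authoritative source for the real constraints.

// validateClusterID (assumes "fmt" is imported) rejects an over-long
// cluster_id at plan time instead of letting it fail later with the
// confusing "not found for variable" error.
func validateClusterID(id string) error {
    if len(id) > 20 {
        return fmt.Errorf("cluster_id %q is %d characters; ElastiCache cluster IDs are limited to 20", id, len(id))
    }
    return nil
}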

@ghost commented Apr 13, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost locked and limited conversation to collaborators on Apr 13, 2020