Issue with Consul Auto-Join with Cloud Metadata tutorial #16956

Closed
pavlo opened this issue Dec 19, 2017 · 4 comments

@pavlo

pavlo commented Dec 19, 2017

Hey!

I was following the tutorial on Consul's Auto-Join feature - https://www.hashicorp.com/blog/consul-auto-join-with-cloud-metadata. It worked just fine: I managed to get 3 servers running and was going to try scaling to 5.

So, I changed the managers_count variable in terraform.tfvars from 3 to 5 and ran terraform plan.

In the tutorial, the plan command produced this output:

Plan: 2 to add, 0 to change, 0 to destroy

But in my environment it yielded this:

Plan: 5 to add, 0 to change, 3 to destroy.

So instead of just adding two more servers, it planned to remove the three existing ones and create five new ones. I would not expect that, as in a real deployment it would purge the cluster instead of just scaling it.

Changing the managers_count variable also changes the template_file's count, which in turn changes the aws_instance's user_data, so Terraform deems it necessary to recreate the instances.

Is there anything wrong with what I was doing? I assume so, given that the plan in the tutorial looks just fine!

Here's an excerpt of the plan output for an instance it scheduled to replace; the id and user_data attributes are the culprits.

aws_instance.consul_server[0] (new resource required)
  id:                                "i-02924e5317c4603ab" => <computed> (forces new resource)
  ami:                               "ami-8fd760f6" => "ami-8fd760f6"
  associate_public_ip_address:       "false" => <computed>
  availability_zone:                 "eu-west-1a" => <computed>
  ebs_block_device.#:                "0" => <computed>
  ephemeral_block_device.#:          "0" => <computed>
  iam_instance_profile:              "MyTestProject-consul-join" => "MyTestProject-consul-join"
  instance_state:                    "running" => <computed>
  instance_type:                     "t2.nano" => "t2.nano"
  ipv6_address_count:                "" => <computed>
  ipv6_addresses.#:                  "0" => <computed>
  network_interface.#:               "0" => <computed>
  network_interface_id:              "eni-28281e07" => <computed>
  placement_group:                   "" => <computed>
  primary_network_interface_id:      "eni-28281e07" => <computed>
  private_dns:                       "ip-172-33-2-154.eu-west-1.compute.internal" => <computed>
  private_ip:                        "172.33.2.154" => <computed>
  public_dns:                        "" => <computed>
  public_ip:                         "" => <computed>
  root_block_device.#:               "1" => <computed>
  security_groups.#:                 "0" => <computed>
  source_dest_check:                 "true" => "true"
  subnet_id:                         "subnet-3fdcda58" => "subnet-3fdcda58"
  tags.%:                            "2" => "2"
  tags.Name:                         "manager-0" => "manager-0"
  tags.consul_join:                  "MyTestProject-consul_join" => "MyTestProject-consul_join"
  tenancy:                           "default" => <computed>
  user_data:                         "df716badde7501abbd5a40383535cf55209fa621" => "a499190dfab3f8c28b8ee3e9931f0a53ea748bf0" (forces new resource)
  volume_tags.%:                     "0" => <computed>
  vpc_security_group_ids.#:          "1" => "1"
  vpc_security_group_ids.2478132388: "sg-72a7f709" => "sg-72a7f709"  

Terraform Configuration Files

data "template_file" "manager" {
  count    = "${var.managers_count}"
  template = "${file("${path.module}/templates/manager_provision.sh.tpl")}"

  vars {
    consul_version = "${var.consul_version}"
    consul_config = <<EOF
     "bootstrap_expect": ${var.managers_count},
     "node_name": "manager-${count.index}",
     "retry_join": ["provider=aws tag_key=consul_join tag_value=${var.project_name}-consul_join"],
     "server": true
    EOF
  }
}

resource "aws_instance" "consul_server" {
  count = "${var.managers_count}"
  ami = "${var.ami}"
  instance_type = "${var.manager_instance_type}"
  key_name = "${var.ssh_keypair_name}"

  subnet_id = "${aws_subnet.private_1_subnet_eu_west_1a.id}"
  iam_instance_profile  = "${aws_iam_instance_profile.consul-join.name}"
  vpc_security_group_ids = [
      "${aws_security_group.private_network_host.id}"
  ]

  tags = "${map(
    "Name", "manager-${count.index}",
    "consul_join", "${var.project_name}-consul_join"
  )}"

  user_data = "${element(data.template_file.manager.*.rendered, count.index)}"
}

Consul Version

Note that, unlike the tutorial, I configured it to use Consul 1.0.2.

Terraform Version

Terraform v0.11.1
+ provider.aws v1.5.0
+ provider.template v1.0.0
@pavlo
Author

pavlo commented Dec 22, 2017

Was able to apply this workaround: #3449 (comment)

@apparentlymart
Contributor

Hi @pavlo! Sorry this didn't work as expected.

It seems like the problem here stems from the dynamic setting of bootstrap_expect, which is hard-coded to 3 in the published example. Because it is set via ${var.managers_count}, the configuration changes whenever the managers count changes, and that causes Terraform to re-create all of the servers, since user_data is only honored on first boot.

This is, unfortunately, a situation where Terraform's model of the world doesn't quite fit reality: Consul only cares about bootstrap_expect when the cluster is being initially bootstrapped, so there's no reason to ever update it once the cluster is already bootstrapped. To explain this to Terraform it's possible to use the ignore_changes lifecycle setting, which causes Terraform to ignore changes to a particular argument when creating a diff:

resource "aws_instance" "consul_server" {
  # ...

  user_data = "${element(data.template_file.manager.*.rendered, count.index)}"

  lifecycle {
    ignore_changes = ["user_data"]
  }
}

A common pattern for safely deploying Consul with Terraform in production is to emulate a blue/green deployment model using Terraform modules.

To do this, you can put the necessary resources for a set of Consul servers in a module and instantiate it from your root module. When making any changes to the cluster, a new instance of the same module is created alongside, unbootstrapped, and then joined to the existing servers so that there are temporarily two times the number of servers present. Once cluster replication is complete, you can then remove the original module to destroy the old servers.

This does, of course, add some complexity compared to simply using count on a single resource, but it keeps each "generation" of servers isolated, at the expense of some additional steps when changes are made. Each generation of servers is, in some sense, immutable. This can be useful, for example, for upgrading to a newer version of Consul without any downtime.
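
A minimal sketch of that root-module wiring, assuming a hypothetical local module at ./modules/consul-cluster that wraps the template_file and aws_instance resources shown earlier in this issue:

# Hypothetical blue/green wiring in the root module, written in the same
# 0.11-style syntax used elsewhere in this issue. The module is assumed to
# accept managers_count and generation variables and to tag its instances so
# that retry_join can discover servers from every generation.

module "consul_blue" {
  source         = "./modules/consul-cluster"
  generation     = "blue"
  managers_count = 3
}

# To roll out a change, add a second instance of the same module alongside the
# first; its servers join the existing cluster via cloud auto-join. Once
# replication has caught up, remove the module "consul_blue" block and apply
# again to destroy the old generation.
module "consul_green" {
  source         = "./modules/consul-cluster"
  generation     = "green"
  managers_count = 3
}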

@hashibot
Contributor

Hello again!

We didn't hear back from you, so I'm going to close this in the hope that a previous response gave you the information you needed. If not, please do feel free to re-open this and leave another comment with the information my human friends requested above. Thanks!

@ghost

ghost commented Sep 13, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Sep 13, 2019