3 node cluster running on ECS using awsvpc networking #6802

Closed
juliengrondin opened this issue Nov 15, 2019 · 15 comments · Fixed by hashicorp/go-discover#197
Assignees
Labels
theme/ecs Related to the AWS Elastic Container Service runtime type/enhancement Proposed improvement or new feature

Comments

@juliengrondin

Is there a way for the nodes to discover each other within ECS using awsvpc networking?
Currently I get errors:
[WARN] agent: Join LAN failed: No servers to join, retrying in 30s
[ERR] agent: failed to sync remote state: No cluster leader
[ERR] agent: Cannot discover LAN provider=aws tag_key=consul tag_value=member: discover-aws: GetInstanceIdentityDocument failed: EC2MetadataRequestError: failed to get EC2 instance identity document
It seems the nodes are unable to discover each other to join a cluster.

@crhino crhino added the needs-investigation The issue described is detailed and complex. label Nov 19, 2019
@crhino
Contributor

crhino commented Nov 19, 2019

This should work, although I have not personally tried it on AWS. What values are you passing to the retry-join flag/configuration value? That would help us in understanding the full context of this.

@juliengrondin
Author

-retry-join=provider=aws tag_key=consul tag_value=member
Resources are tagged with key "consul" and value "member".
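
To make the shape of that flag value easier to see, here is a minimal sketch that splits a space-separated key=value string into its fields. The function name `parse_discover_string` is hypothetical, and the parsing is deliberately simplified (no quoting or escaping); it is not how go-discover itself parses the string.

```python
from typing import Dict


def parse_discover_string(value: str) -> Dict[str, str]:
    """Split 'k1=v1 k2=v2' into a dict (simplified: no quoting or escaping)."""
    pairs: Dict[str, str] = {}
    for field in value.split():
        key, sep, val = field.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {field!r}")
        pairs[key] = val
    return pairs


if __name__ == "__main__":
    # The value passed to -retry-join in this comment.
    print(parse_discover_string("provider=aws tag_key=consul tag_value=member"))
```

With the value above, the provider is `aws` and the remaining fields describe the tag that discovery filters on.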

Are there any plans to use the ECS api as opposed to relying on EC2 describe-instance?

@stale

stale bot commented Jan 19, 2020

Hey there,
We wanted to check in on this request since it has been inactive for at least 60 days.
If you think this is still an important issue in the latest version of Consul
or its documentation please reply with a comment here which will cause it to stay open for investigation.
If there is still no activity on this issue for 30 more days, we will go ahead and close it.

Feel free to check out the community forum as well!
Thank you!

@stale stale bot added the waiting-reply Waiting on response from Original Poster or another individual in the thread label Jan 19, 2020
@juliengrondin
Author

This is still an issue. Are there any workarounds?

@stale stale bot removed the waiting-reply Waiting on response from Original Poster or another individual in the thread label Jan 20, 2020
@alkalinecoffee

alkalinecoffee commented Apr 15, 2020

Might help to add a little more background on this:

2020-04-15T16:12:09.326Z [INFO] agent: discover-aws: Region not provided. Looking up region in metadata...: cluster=LAN
2020-04-15T16:12:09.872Z [ERROR] agent: Cannot discover address: cluster=LAN address="provider=aws tag_key=consul-datacenter tag_value=us-east-1-staging" error="discover-aws: GetInstanceIdentityDocument failed: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed
caused by: Get http://169.254.169.254/latest/dynamic/instance-identity/document: dial tcp 169.254.169.254:80: connect: invalid argument"  
2020-04-15T16:12:09.872Z [WARN] agent: Join cluster failed, will retry: cluster=LAN retry_interval=30s error="No servers to join"

According to the docs, the instance-identity path is not available in ECS (it's only available in EC2).

It seems that we either need to add some logic to determine whether we are running in ECS and disregard this identity document call, or perhaps add a flag to disable it (but I'm unsure of what all it is used for).
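
A minimal sketch of the "are we running in ECS?" check suggested above, assuming the environment variables that ECS injects into task containers are a good enough signal. The function names are hypothetical and this is not go-discover's actual logic; note also that `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` is only set when the task has a task role.

```python
import os
from typing import Mapping

# Environment variables that ECS sets inside task containers.
ECS_ENV_MARKERS = (
    "ECS_CONTAINER_METADATA_URI_V4",
    "ECS_CONTAINER_METADATA_URI",
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
)


def running_in_ecs(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if any ECS-specific variable is set and non-empty."""
    return any(env.get(name) for name in ECS_ENV_MARKERS)


def should_query_instance_identity(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical guard: only request the EC2 identity document outside ECS."""
    return not running_in_ecs(env)


if __name__ == "__main__":
    print("running in ECS:", running_in_ecs())
```

A discovery provider could consult a guard like this before calling GetInstanceIdentityDocument and fall back to an explicitly configured region when it returns False.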

As it stands now, I don't believe we can use autodiscovery in ECS using tags in this fashion.

This might be of interest: hashicorp/go-discover#61

@cocakohler

Any updates on this issue? @juliengrondin, did you find a workaround for ECS? I want to run Consul on ECS Fargate and I'm having the same problem.

@devarshishah3

@cocakohler, there is a Learn guide to help you get started with Consul on ECS/EC2 for HCP. The workflow for deploying Consul clients would be the same for self-managed. Are you looking to deploy Consul servers on ECS/Fargate, or just clients? Fargate requires the awsvpc network mode and does not support daemon scheduling. One way to deploy Consul clients would be to run a Consul agent as a container within the task definition. I would love to understand your requirements and desired workflow for ECS/Fargate.

@iandelahorne

x-posting from hashicorp/go-discover#61:

We're seeing this too on the 1.10.2 container image on Fargate. Our consul servers are on EC2 and discovery works great for clients on EC2 using -retry-join "provider=aws tag_key=role tag_value=consul-server"

However, if we try to deploy a client sidecar on Fargate, it does not work. Here's a snippet of the task definition:

  "containerDefinitions": [
    {
      "name": "consul",
      "image": "public.ecr.aws/hashicorp/consul:1.10.2",
      "essential": true,
      "entryPoint": ["/bin/sh", "-ec"],
      "command": [
        "ECS_IPV4=$(curl -s $ECS_CONTAINER_METADATA_URI_V4 | jq -r '.Networks[0].IPv4Addresses[0]')\n exec consul agent -advertise \"$ECS_IPV4\" -datacenter development -retry-join \"provider=aws tag_key=role tag_value=consul-server\" -data-dir /consul/data"
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-create-group": "true",
          "awslogs-group": "/ecs/consultest",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "consul"
        }
      },
      "portMappings": [
        {
          "containerPort": 8300,
          "hostPort": 8300,
          "protocol": "tcp"
        },
        {
          "containerPort": 8300,
          "hostPort": 8300,
          "protocol": "udp"
        }
      ]
    }
  ],
  "placementConstraints": [],
  "requiresCompatibilities": [
    "FARGATE"
  ],
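
For readers unfamiliar with the jq expression in the command above, here is a rough Python equivalent of `.Networks[0].IPv4Addresses[0]`. The sample document is made up for illustration and contains only the fields the expression touches; the real metadata response has many more.

```python
import json
from typing import Any, Dict


def first_ipv4(container_metadata: Dict[str, Any]) -> str:
    """Return the first IPv4 address of the first network, like the jq filter."""
    networks = container_metadata.get("Networks") or []
    if not networks:
        raise ValueError("metadata has no Networks entry")
    addresses = networks[0].get("IPv4Addresses") or []
    if not addresses:
        raise ValueError("first network has no IPv4Addresses")
    return addresses[0]


# Illustrative (made-up) response body, not captured from a real task.
SAMPLE = json.loads(
    '{"Networks": [{"NetworkMode": "awsvpc", "IPv4Addresses": ["10.0.2.106"]}]}'
)

if __name__ == "__main__":
    print(first_ipv4(SAMPLE))
```

Unlike the jq one-liner, this version fails loudly when the address is missing, rather than passing an empty or "null" value to -advertise.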

In the logs, we see:

[ERROR] agent: Cannot discover address: cluster=LAN address="provider=aws tag_key=role tag_value=development" error="discover-aws: GetInstanceIdentityDocument failed: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed
caused by: Get "http://169.254.169.254/latest/dynamic/instance-identity/document": dial tcp 169.254.169.254:80: connect: invalid argument"

According to the ECS task IAM role docs, inside ECS the container IAM role should be fetched from http://169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
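
As a small illustration of that URL, the sketch below builds it from the environment. The helper name `container_credentials_url` is hypothetical; it only constructs the address and does not fetch or parse credentials.

```python
import os
from typing import Mapping, Optional

# Link-local host that serves task role credentials inside ECS.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def container_credentials_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the task credentials URL, or None when the variable is not set."""
    relative = env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if not relative:
        return None
    return ECS_CREDENTIALS_HOST + relative


if __name__ == "__main__":
    print(container_credentials_url())
```

Outside ECS, or in a task without a task role, the variable is absent and the helper returns None.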

@jkirschner-hashicorp jkirschner-hashicorp added the theme/ecs Related to the AWS Elastic Container Service runtime label Nov 10, 2021
@aeshaynes

aeshaynes commented Feb 18, 2022

The only workaround I could come up with for now is to register each Fargate container with AWS Service Discovery; then you can reach them with --retry-join.
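
A rough sketch of what that workaround could look like when assembling the agent command, assuming the servers are registered under a service discovery DNS name. The name `consul-server.example.internal` and the helper `agent_args` are hypothetical; the flags are the ones already used earlier in this thread.

```python
from typing import List


def agent_args(join_address: str, datacenter: str) -> List[str]:
    """Build a consul agent command that joins via a DNS name instead of tags."""
    return [
        "consul", "agent",
        "-datacenter", datacenter,
        "-retry-join", join_address,
        "-data-dir", "/consul/data",
    ]


if __name__ == "__main__":
    # Hypothetical DNS name created by service discovery for the servers.
    print(" ".join(agent_args("consul-server.example.internal", "development")))
```

Because the join target is a plain DNS name, the agent never needs the EC2 instance identity document.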

It's annoying that this has been open since 2019 with seemingly no progress. We can't be the only ones wanting to run this on Fargate.

@Amier3
Contributor

Amier3 commented Feb 28, 2022

Hey @aeshaynes

It seems like this issue fell through the cracks, and we apologize for that. I'll revive this issue internally and reach out to some of our engineers who work on the consul-ecs integration. We'll make an effort to get this resolved (and documented) soon 👍

@Amier3 Amier3 self-assigned this Feb 28, 2022
@fdr2

This comment was marked as outdated.

@fdr2
Contributor

fdr2 commented Jul 3, 2022

I submitted PR #197 to go-discover in order to provide ECS-Tag auto-discovery support

@Amier3
Contributor

Amier3 commented Jul 7, 2022

Cross-posting a comment from the PR above:

The team did want to caution, though, that we haven't evaluated the effects of running Consul servers inside of ECS (aside from a single-node cluster for dev/testing). So for the moment we only officially support using servers outside of ECS. There were a couple of factors that went into that decision, one of them being some performance concerns around the EFS that ECS uses under the hood.

So we'll continue working to resolve this issue to allow for experimentation with Consul servers on ECS.

@Amier3 Amier3 added type/enhancement Proposed improvement or new feature and removed needs-investigation The issue described is detailed and complex. labels Jul 7, 2022
@pglass pglass reopened this Jul 14, 2022
@pglass

pglass commented Jul 14, 2022

The ECS support is merged into go-discover. Next, Consul needs to be updated with the new version of the library.

(triggers auto-closed the issue on a merge, so reopened this)

@pglass

pglass commented Feb 3, 2023

The support for ECS discovery in Cloud-Auto Join strings was included in the 1.12.9, 1.13.6, and 1.14.4 patch releases of Consul, so closing this issue.
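
To check whether a given agent version includes the change, a sketch like the following may help. The minimum patch levels come from the comment above; treating every release line after 1.14 as supported is an assumption, and the function names are hypothetical.

```python
from typing import Tuple

# First patch release per minor line that includes ECS discovery (per the comment above).
MIN_PATCH = {(1, 12): 9, (1, 13): 6, (1, 14): 4}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'v1.14.4', '1.14.4-rc1' or '1.14.4+ent' into (1, 14, 4)."""
    core = version.lstrip("v").split("-", 1)[0].split("+", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def supports_ecs_discovery(version: str) -> bool:
    major, minor, patch = parse_version(version)
    if (major, minor) in MIN_PATCH:
        return patch >= MIN_PATCH[(major, minor)]
    # Assumption: release lines newer than 1.14 all include the change.
    return (major, minor) > (1, 14)


if __name__ == "__main__":
    for candidate in ("1.10.2", "1.12.9", "1.14.3"):
        print(candidate, supports_ecs_discovery(candidate))
```

For example, the 1.10.2 image mentioned earlier in the thread predates the fix, so it would need an upgrade before ECS discovery strings work.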

@pglass pglass closed this as completed Feb 3, 2023