-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aws: Add support for ECS (Container Service) #1803
Conversation
Excellent work! Do you have an idea on what items remain before you feel this is something we should merge in, or do you think it's read as is. It's unfortunate to read the shortcomings of the API at present, but exciting to get ECS in. To be honest, I was supposed to start work on this today 😄 I'm going to familiarize myself with the service and API and I'll be ready to help out where needed, if you have items you'd like someone else to chip away at. Thanks again! |
Except the caveats mentioned above, it should work well, give it a try and deploy a sample cluster+service+TD using the simple demo module I provided above.
it's really mainly about the caveats, as mentioned. I left some |
Maybe just use the |
@gjohnson I was thinking about that as well, I just wasn't sure how that will work in the future, when we may want to deploy TDs cross-region... but in that case we'd just have The other issue with this is that referencing will become a bit more "verbose", e.g. resource "aws_ecs_service" "main" {
name = "tf-ecs-service"
task_definition = "${aws_ecs_task_definition.main.family}.${aws_ecs_task_definition.main.revision}"
desired_count = "${var.service_desired_count}"
} |
70b55b0
to
108a135
Compare
I fixed most major issues (the ClientExceptions are not happening anymore, most likely because of explicit dependencies), but I discovered another one, which happens rarely and is actually a known ECS bug. To quote an AWS support representative:
This may cause some services to be dead-locked (remain in "DRAINING" state and not reacting to delete requests), which means you cannot create another ECS service with the same name. It's also worth mentioning, that none of the two AWS' issues (inability of removing task definitions & dead-locked services) are costing the customer money. All ECS usage is billed based on consumption of EC2 resources, so as long as EC2 instances are terminated, it's just mess for free. @catsby Will you give it a full review? |
I may actually be able to work around that AWS bug simply by running I will have a look. |
7a2a01c
to
b73cb87
Compare
I can review this today if you think it's ready. Any new caveats / limitations ? |
There's one I discovered while testing this during the weekend - I was not able to reference I don't remember having this issue before... but I admit I was rebasing all commits from master regularly every couple days since this PR exists... I also just realised that the AWS known bug w/ ELBs/services cannot be fixed via In other words, we would be deregistering the whole instance from a cluster, not a service from ELB. |
There's a lot here, but overall looks very good. I did not run into the |
Trying to recreate the cluster from your example module (which is great, btw), I'm getting this error:
Have you ran into that? |
Interesting... did you use the module provided or you just built something custom?
I have not ran into this one... but it may be just because I was blocked by the |
I used the module
I did, and it seemed stuck. I went to lunch and now it destroyed fine. |
Hey @radeksimko – sorry that this has been hanging out here so long. I did review and the code checks out. There were some bumps as I mentioned, but it's not clear what we can do about those from our side, would you agree? Do you think this is ready for merging? |
@catsby I'd like to see it working again in my environment before it goes in 😃 Last time I tried I ended up with the "ecs_cluster.name missing" issue and I also remember from the past having some troubles building the IAM instance profile + role via IAM TF resources -> I ended up creating IAM via the AWS Console "ECS - getting started" and taking ARNs from there. Otherwise if you feel it's safe to merge it right now, I won't prevent you from doing so. 😄 Either way, I'm ok with eventually ignoring the race conditions ( |
So I quickly tested and I'm still getting the same error:
and I don't understand why you're not seeing that same error. |
Have you tried The service acceptance tests fail for me:
But |
@catsby I plan to have a look at ECS during the upcoming weekend and ideally make it ready for final review & merge. It has been waiting for almost a month now, so I think it's time 😃 |
@radeksimko I apologize for the delays here, I really appreciate the hard work you have and continue to put in here |
I can confirm I'm now getting the following error as well:
looking into it now. |
1514588
to
d3f35fb
Compare
Bug found + fixed - acceptance tests for Now it's creating its own cluster, which is much better for isolation of those tests anyway. |
@catsby The demo module now contains IAM resources as well. The whole functionality is tested and works for creating & updating ECS resources, but deleting is kinda "broken" due to the mentioned annoying ELB bug. I feel we need to figure out how to delete the I may or may not have time this week to work on it. I will see. I spent most of my time this weekend on #2157 so I had less time for this. |
I did get a response from AWS, that this bug is now fixed, so I went ahead and tested everything all over again. I can confirm that it does not stuck in dead-lock anymore. 🎉 I've been also told that the task-definition de-registration is on the way. 😃 In other words, I think this is finally ready for a full review & merge. |
This is fantastic work @radeksimko! I'll update the CHANGELOG afterwards. Again, thanks! 🎆 |
aws: Add support for ECS (Container Service)
Done 😄 🎉 |
I am playing with the ECS functionality and I ran into an issue with specifying volumes in a task definition. It should be possible to pass an empty host in which case the volume is not made persistent - the docker daemon manages its location and garbage collection: http://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_data_volumes.html |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
This is first working concept, includes a basic set of acceptance tests + all docs.
Example
https://github.com/TimeIncOSS/tf_aws_ecs
Caveats
Tests will fail because aws: Refactor credentials according to latest aws-sdk-go refactoring #1802 , I tested everything w/ aws/aws-sdk-go@876a0d5ClientException: The Cluster cannot be deleted while Container Instances are active.
ClientException: The Cluster cannot be deleted while Services are active.
I'm using- usingid
as a place to save the Task Definition ARN ( =arn:aws:ecs:us-west-2:01234567890:task-definition/mongodb:3
where3
is revision bumped each time you update it), which makes updates two-phased=>
need to runapply
twice, first to update the TD, second to update the reference in the Service. I'm not sure what's the best way to approach this problem.family
instead + exportingarn
to keep referencing easyAPI allows- solved w/ ARN builder similar to the RDS onetask_definition
to be in two compatible formats -family:revision
or full ARN, but full ARN is always returned as response from the API, which makes it confusing for updates. Thereforefamily:revision
format doesn't really work well (it causes update-all-the-time).Test plan