Policies for ecs_service, ecs_cluster, ecs_taskdefinition, ecs_task #210
Conversation
Force-pushed from 888e6ac to def7593
Force-pushed from ede308c to b6526ad
aws/terminator/compute.py
Outdated
self.client.delete_service(cluster=self.name, service=name['serviceName'], force=True)

self.client.delete_cluster(cluster=self.name)
Do we need a waiter here? https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Waiter.ServicesInactive
Otherwise delete_cluster might fail. Or does it not matter, because the next run of the terminator will delete the cluster once the service is gone?
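For reference, a minimal sketch of what using that waiter would look like with plain boto3 calls; the cluster and service names here are illustrative only, not taken from the terminator code:

```python
import boto3

ecs = boto3.client('ecs')
cluster = 'example-cluster'      # illustrative names only
service = 'example-service'

ecs.delete_service(cluster=cluster, service=service, force=True)

# Block until ECS reports the service as INACTIVE before removing the cluster.
waiter = ecs.get_waiter('services_inactive')
waiter.wait(cluster=cluster, services=[service])

ecs.delete_cluster(cluster=cluster)
```

A waiter like this polls describe_services and can take several minutes, which is exactly the lambda-timeout concern raised in the reply below.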
We would typically avoid using waiters on the lambda to prevent timeouts, especially for anything that might take more than a minute or two. The ideal way to handle a scenario like this would be to create terminators for the cluster and any dependent resources, and increase the age limit on the cluster terminator. This is complicated by the fact that doing anything with services or container instances also requires the cluster name.
aws/terminator/compute.py
Outdated
tasks = _paginate_task_results(name['containerInstanceArn'])
for task in tasks:
    self.client.stop_task(cluster=self.name, task=task['taskArn'])
Unless the desired count of the owning service is set to 0, the service might spawn new tasks to replace the stopped ones.
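A small sketch of scaling the owning services down before stopping tasks, using plain boto3 (the cluster name is illustrative and pagination is omitted for brevity):

```python
import boto3

ecs = boto3.client('ecs')
cluster = 'example-cluster'      # illustrative name only

# Set desiredCount to 0 so the services stop replacing the tasks we are about to stop.
for service_arn in ecs.list_services(cluster=cluster)['serviceArns']:
    ecs.update_service(cluster=cluster, service=service_arn, desiredCount=0)
```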
Maybe flipping the order solves this issue in a lazy way?
- delete the service
- delete all remaining running tasks (like `killall -9`)
- delete the cluster
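A rough sketch of that order with plain boto3 calls (names are illustrative, pagination omitted):

```python
import boto3

ecs = boto3.client('ecs')
cluster = 'example-cluster'      # illustrative name only

# 1. Delete the services first so nothing respawns tasks.
for service_arn in ecs.list_services(cluster=cluster)['serviceArns']:
    ecs.delete_service(cluster=cluster, service=service_arn, force=True)

# 2. Stop whatever tasks are still running (the "killall -9" step).
for task_arn in ecs.list_tasks(cluster=cluster)['taskArns']:
    ecs.stop_task(cluster=cluster, task=task_arn)

# 3. Delete the cluster; this may still fail while tasks are draining.
ecs.delete_cluster(cluster=cluster)
```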
Do the tasks need to be individually deleted at all? Won't deregistering the container instance also remove the tasks?
If the ECS tasks run on EC2, yes: deleting the instance also kills the ECS tasks.
But if it's serverless (Fargate), there is no other way to stop/kill the tasks.
Basically you already do it that way if you don't wait for the service to become inactive.
IMO we've got two choices:
a) delete the service and wait until it's gone, then delete the cluster
b) delete the service, kill the pending tasks, then delete the cluster
I feel like the first option is less work.
We wouldn't want to ignore the exception forever, though. So perhaps one class with the default `age_limit` that catches and ignores `ClusterContainsServicesException` and `ClusterContainsTasksException` on `delete_cluster()`. Then a second class, with an `age_limit` 10 minutes longer than the default, that only tries to delete clusters (and that one doesn't ignore those exceptions, so that errors still surface to us in the monitoring).
I'm assuming 10 minutes should be more than long enough, even on a bad day, for all tasks and services to be 100% gone.
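A minimal sketch of the tolerant delete step, written against plain boto3 rather than the repo's actual terminator classes (the function name and flag are assumptions for illustration):

```python
import boto3

ecs = boto3.client('ecs')

def delete_cluster_tolerant(cluster_name, ignore_dependency_errors=True):
    """Delete a cluster; optionally ignore 'still has services/tasks' errors
    so a later terminator run can retry once the dependents are gone."""
    try:
        ecs.delete_cluster(cluster=cluster_name)
    except (ecs.exceptions.ClusterContainsServicesException,
            ecs.exceptions.ClusterContainsTasksException):
        if not ignore_dependency_errors:
            raise
```

The second class, running at the longer `age_limit`, would call `delete_cluster` without swallowing these errors (equivalently, pass `ignore_dependency_errors=False`) so lingering failures still show up in monitoring.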
That sounds good!
LGTM.
What about the failing CI? I cannot find any useful error message.
@markuman it needs to be refactored in some way because the file's size exceeds the limit. I'll see what I can do about it.
Force-pushed from 090593e to f1af250
Since we've exceeded the policy size on compute, I would suggest moving the ECS policies to the paas policy.
Force-pushed from 951959f to ee500ed
Done, thank you @gravesm!
Force-pushed from ee500ed to 4acd9dc
aws/terminator/paas.py
Outdated
self.client.stop_task(cluster=self.name, task=task['taskArn'])

# If there are running services, delete them first
services = _paginate_service_results()
Services should be the first things deleted; otherwise, any tasks you stop may just be restarted by their service.
# Deregister container instances
for name in container_instances:
    self.client.deregister_container_instance(containerInstance=name['containerInstanceArn'])
Before deleting the cluster, you will also need to delete all the tasks. In the case of Fargate, there are no container instances, and you may have tasks running that aren't managed by a service. I'm also not clear on whether using force=True when deleting a service also deletes the tasks or just orphans them. Either way, there should be a step that stops all tasks.
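For illustration, a sketch of stopping every task in the cluster regardless of how it was launched, which covers the Fargate case where there are no container instances (the cluster name is illustrative):

```python
import boto3

ecs = boto3.client('ecs')
cluster = 'example-cluster'      # illustrative name only

# Stop every task in the cluster, including Fargate tasks with no container
# instance and tasks that are not managed by any service.
paginator = ecs.get_paginator('list_tasks')
for page in paginator.paginate(cluster=cluster):
    for task_arn in page['taskArns']:
        ecs.stop_task(cluster=cluster, task=task_arn)
```

As noted above, stop_task is the only way to get rid of Fargate tasks that no service manages.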
A couple small fixes, but otherwise this LGTM.
aws/terminator/paas.py
Outdated
if not names:
    return []

return client.describe_clusters(name=names)['clusters']
Suggested change:
- return client.describe_clusters(name=names)['clusters']
+ return client.describe_clusters(clusters=names)['clusters']
Done! Thanks.
@alinabuzachis @gravesm Where do we stand on getting this merged?
Signed-off-by: Alina Buzachis <[email protected]>
Force-pushed from a1c21c0 to 29e9112
Force-pushed from e6d006d to eff6b5b
Hey, I'll try to get to this early next week. There are a lot of parts covered by this PR and I'd like to set aside some time to test it.
Just a note, the
@tremble I'm running into a lot of problems with the ecs_cluster tests. There may be other things that are broken, as @markuman referred to, but the biggest problem I have so far is that the tests seem to be intended to provision containers on EC2 instances, but then the service definitions are using Fargate provisioning. What I need in order to get this merged is a working test suite. I'm willing to work on getting the permissions correct, but I need a PR I can test against that is otherwise working.
I can try to target it this week.
Closing this as it was superseded by #264
ecs: integration test and new purge parameters

SUMMARY
- Make the ecs_cluster integration test work again.
- ecs_service: new parameters purge_placement_constraints and purge_placement_strategy. Otherwise it is impossible to remove those placements without breaking backwards compatibility.
- purge_placement_constraints in the integration test
- purge_placement_strategy in the integration test
- required by mattclay/aws-terminator#210 (comment)

ISSUE TYPE
- Bugfix Pull Request
- Docs Pull Request
- Feature Pull Request

COMPONENT NAME
ecs_service

ADDITIONAL INFORMATION
Works for me again:
ansible-test integration --python 3.10 ecs_cluster --docker --allow-unsupported
...
PLAY RECAP *********************************************************************
testhost : ok=143 changed=69 unreachable=0 failed=0 skipped=1 rescued=0 ignored=6

Reviewed-by: Mark Chappell
Reviewed-by: Markus Bergholz <[email protected]>
Reviewed-by: Alina Buzachis
Reviewed-by: Mike Graves <[email protected]>
(cherry picked from commit 86c60b4)

[PR #1716/86c60b49 backport][stable-5] ecs: integration test and new purge parameters
This is a backport of PR #1716 as merged into main (86c60b4).
Reviewed-by: Mark Chappell
Policies for ecs_service, ecs_cluster, ecs_taskdefinition, ecs_task
Needed for ansible-collections/community.aws#1145