[ECS/Fargate] [request]: Allow stopTimeout to be configured for ondemand tasks #1020

Open
ghost opened this issue Aug 7, 2020 · 13 comments
Labels: ECS (Amazon Elastic Container Service), Fargate (AWS Fargate), Proposed (Community submitted issue)

Comments

@ghost

ghost commented Aug 7, 2020

Based on this comment: spring-projects/spring-boot#4657 (comment), if we were to implement graceful termination on the application side, it would really help if stopTimeout were allowed to be configured, at least for on-demand Fargate tasks.

Currently the max value is 120s, which is not sufficient for all use cases.
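
For readers unfamiliar with the setting: `stopTimeout` is a per-container field in the task definition. The following is only a minimal sketch, in Python, of checking a container definition against the 120-second ceiling described above. The helper name `check_stop_timeout`, the constant name, and the sample container values are assumptions made for illustration and are not part of any AWS SDK.

```python
# Ceiling reported in this issue for Fargate tasks, in seconds.
FARGATE_MAX_STOP_TIMEOUT = 120


def check_stop_timeout(container_def: dict, limit: int = FARGATE_MAX_STOP_TIMEOUT) -> bool:
    """Return True if the container's stopTimeout fits within the limit."""
    timeout = container_def.get("stopTimeout")
    # A missing value means no explicit timeout was requested, so there is nothing to reject.
    return timeout is None or 0 <= timeout <= limit


# Hypothetical container definition fragment asking for a one-hour drain window.
container = {"name": "worker", "image": "example/worker:latest", "stopTimeout": 3600}
```

With the limit as it stands, `check_stop_timeout(container)` is `False` for the one-hour request, which is the situation this issue asks to change.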

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
What do you want us to build?

Which service(s) is this request for?
This could be Fargate, ECS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

Are you currently working around this issue?
How are you currently solving this problem?

Additional context
Anything else we should know?

Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

@ghost ghost added the Proposed Community submitted issue label Aug 7, 2020
@akshayram-wolverine akshayram-wolverine added the Fargate AWS Fargate label Aug 11, 2020
@matteomazza91

We are facing the same issue while implementing graceful termination on the application side.
We have an ECS service with an ALB, Fargate tasks, and autoscaling that increases and decreases the desiredCount.

When an ECS task receives SIGTERM (docker stop), it should get a chance to complete its ongoing work before being forcibly killed.
Unfortunately, for our use case a maximum value of 120s for stopTimeout is not enough.
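
A minimal Python sketch of the application-side pattern described here: on SIGTERM the worker stops picking up new work but lets the item in progress finish. The class name `GracefulWorker` and its methods are illustrative assumptions, not taken from any commenter's code.

```python
import signal
import threading


class GracefulWorker:
    """Processes items until a stop is requested, always finishing the current item."""

    def __init__(self):
        self._stop = threading.Event()
        self.completed = []

    def request_stop(self, signum=None, frame=None):
        # Signal handler: only record the request; the loop decides when to exit.
        self._stop.set()

    def install(self):
        # Register the handler for SIGTERM. Must be called from the main thread.
        signal.signal(signal.SIGTERM, self.request_stop)

    def run(self, items, handle):
        for item in items:
            if self._stop.is_set():
                break  # stop picking up new work
            handle(item)  # work already started runs to completion
            self.completed.append(item)
        return self.completed
```

Note that this only helps if the item in progress finishes before the stop timeout elapses; once the timeout expires the container is killed regardless, which is the limitation raised in this thread.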

@bryanculbertson

I run long-running tasks on ECS and would like to run them on Fargate. I would like them to have a chance to finish, which would mean a multi-hour stop timeout.

@vineetraja

The 2-minute limit is too short for many use cases.
With this limit in place, AWS Fargate auto scaling is rendered useless for any system that cares about graceful shutdown.

@RicePatrick

Agreed with the other posters. We have long-running Fargate tasks that listen to a queue to pick up jobs, and jobs have a potentially unbounded processing time. Yes, the message will be picked up from the queue again if the task dies (once visibility expires), but I'd prefer that the task be allowed to finish and shut down gracefully.

With a 2-minute limit, we've been forced to abandon AWS's default auto scaling and create a Lambda that checks for scaling every minute and sends an HTTP request to containers to shut down instead of using SIGTERM.
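
The container side of a workaround like this could look roughly like the Python sketch below: a small HTTP endpoint that an external scaler calls to ask the container to drain. This is an assumption-laden illustration, not the poster's implementation; the `/drain` path, the 202 response, and the names `DrainHandler`, `drain_requested`, and `start_drain_server` are all hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once an external scaler asks this container to drain.
drain_requested = threading.Event()


class DrainHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path == "/drain":  # hypothetical path
            drain_requested.set()
            self.send_response(202)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_drain_server(port: int = 0) -> HTTPServer:
    """Start the endpoint in a background thread; port 0 picks a free port."""
    # Bound to loopback for the sketch; a real container would bind an
    # address reachable by whatever sends the drain request.
    server = HTTPServer(("127.0.0.1", port), DrainHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The worker loop would then check `drain_requested.is_set()` before taking the next message from the queue and exit on its own once its current job is done, so the stop timeout never comes into play.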

@marc-guenther

Same problem here. This renders Fargate useless for our application. Seems we need to use EC2 instances, where this limit does not exist?

We are paying for running containers even when they are shutting down, so why can't we set the stopTimeout to whatever value we like?

@GytisZ

GytisZ commented Aug 19, 2021

Encountering a similar issue. We have tasks that we'd like to drain and shut down over more than two minutes. Currently that means we're migrating away from ECS Services and Autoscaling and will need to manage the tasks manually during both deployments and autoscaling, using RunTask and some scripts to hold it all together.

Not being able to extend the StopTimeout to 1-2 hours is the only reason why the current setup doesn't work for us.

@keirw2022

Bumping this! - Such a needed feature.

@satya-500

Users should be able to set StopTimeout as high as they want.

@maddipati-srinivas

StopTimeout can be at most 2 minutes after SIGTERM. For more details, see this link. But this solution will not work for stateful operations, as it depends on our business logic.

May I know when we can expect a full-fledged solution from AWS?

@mdomsch-seczetta

We moved our (long-running) batch processing application from ECS on Fargate to ECS on EC2 so that we could manage the termination behavior and extend it as long as necessary to let our batch jobs complete and drain safely without loss of work. 2 minutes is woefully insufficient. However, this has led to significantly increased DataDog monitoring costs (from ~$1.40/task to $56/task), which our budget cannot bear. We'd be happy to keep the tasks on Fargate if the StopTimeout could be extended as long as necessary.

@omieomye omieomye added the ECS Amazon Elastic Container Service label Sep 23, 2022
@craigify

Yes we have the exact same problem. I am not able to use ECS for one of our major applications because I need to allow a Fargate instance much more time than 2 minutes to shut down.

@matt-domsch-sp

matt-domsch-sp commented May 16, 2023

#256 (comment) notes that ECS Task Scale-in Protection can now be set. However, that does not solve the problem. It prevents SIGTERM from reaching a running task, so the 2-minute SIGKILL timer never starts. But it also removes the signal (SIGTERM) that tells the task to stop picking up new jobs. Many task servers, such as sidekiq, can work on multiple jobs simultaneously. If one job is running (and scale-in protection is therefore set) and another job is waiting in the queue, the task could pick up that job too, when we only want it to finish the first job and not start any new ones. Now that we're allowed to use ECS Task Scale-in Protection to keep a task alive indefinitely, we should similarly be allowed to prevent ECS from sending SIGKILL after 2 minutes.
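
To make the gap concrete, here is a small Python sketch of a multi-job worker's bookkeeping. It is a simplified, single-threaded illustration under stated assumptions: `JobSlots` is a made-up name, and `set_protection` is a hypothetical callback standing in for whatever call toggles ECS Task Scale-in Protection for the task.

```python
class JobSlots:
    """Tracks in-flight jobs and decides whether new ones may start."""

    def __init__(self, set_protection):
        # set_protection(bool) is a hypothetical callback that would toggle
        # scale-in protection for this task.
        self._set_protection = set_protection
        self.in_flight = 0
        self.draining = False

    def begin_drain(self):
        # What SIGTERM would normally tell the worker: accept no new jobs.
        self.draining = True

    def try_start(self) -> bool:
        if self.draining:
            return False
        self.in_flight += 1
        if self.in_flight == 1:
            self._set_protection(True)  # first job: protect the task
        return True

    def finish(self):
        self.in_flight -= 1
        if self.in_flight == 0:
            self._set_protection(False)  # last job done: release protection
```

With protection alone, nothing ever calls `begin_drain()`, so `try_start()` keeps succeeding and the task may never become idle. The worker needs some other channel to learn that it should drain, which is the role SIGTERM plays when the stop timeout is long enough.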

@ADrejta

ADrejta commented Aug 21, 2024

+1 on this. The 2-minute maximum timeout is quite low for graceful shutdowns of longer-running jobs. We either have to implement a retry mechanism for jobs that might be lost or switch completely to running ECS in EC2 mode, where that number can be set much higher.
