[ECS/Fargate] [request]: Allow stopTimeout to be configured for ondemand tasks #1020
Comments
Implementing graceful termination on the application side, we are facing the same issue. When an ECS task receives SIGTERM (docker stop), it should get a chance to complete its ongoing work before being forcibly killed.
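For readers landing here: graceful termination on the application side generally means trapping SIGTERM and draining in-flight work before exiting, and stopTimeout is the window ECS allows between that SIGTERM and the SIGKILL. A minimal Python sketch, where process_next_job is a hypothetical placeholder for the actual worker loop:

```python
import signal
import sys
import time

shutting_down = False

def handle_sigterm(signum, frame):
    # ECS (docker stop) delivers SIGTERM first; SIGKILL follows after stopTimeout.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

def process_next_job():
    # Hypothetical unit of work; stands in for pulling a message from a queue, etc.
    time.sleep(1)

def main():
    while not shutting_down:
        process_next_job()
    # Everything that happens after SIGTERM must fit inside stopTimeout
    # (at most 120 seconds on Fargate), which is what this issue is about.
    sys.exit(0)

if __name__ == "__main__":
    main()
```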
I run long-running tasks on ECS and would like to run them on Fargate. I would like them to have a chance to finish, which would mean a multi-hour stop timeout.
The 2-minute limit is too short for many use cases.
Agreed with other posters. We have long-running Fargate tasks that listen to a queue for picking up work, and that work has a potentially unbounded processing time. Yes, the message will be picked up again from the queue if the task dies (once visibility expires), but I'd prefer that the task be allowed to finish and gracefully shut down. With a 2-minute limit, we've been forced to abandon AWS's default auto scaling and create a Lambda that checks for scaling every minute and sends an HTTP request to containers to shut down, instead of relying on SIGTERM.
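For the record, a rough sketch of the workaround described above, assuming a scheduled Lambda, tasks running in awsvpc mode, and a hypothetical application-level /shutdown endpoint; the cluster and service names, the port, and the scale-in check are all placeholders, not anything AWS provides:

```python
import urllib.request
import boto3

ecs = boto3.client("ecs")
CLUSTER = "my-cluster"          # placeholder
SERVICE = "my-worker-service"   # placeholder

def should_scale_in():
    # Placeholder for the real decision, e.g. comparing queue depth to task count.
    return False

def handler(event, context):
    if not should_scale_in():
        return
    # Instead of letting ECS send SIGTERM (and SIGKILL two minutes later),
    # ask one task to drain itself via an application-level endpoint.
    task_arns = ecs.list_tasks(cluster=CLUSTER, serviceName=SERVICE)["taskArns"]
    if not task_arns:
        return
    task = ecs.describe_tasks(cluster=CLUSTER, tasks=task_arns[:1])["tasks"][0]
    ip = task["containers"][0]["networkInterfaces"][0]["privateIpv4Address"]
    urllib.request.urlopen(f"http://{ip}:8080/shutdown", data=b"")
```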
Same problem here. This renders Fargate useless for our application. It seems we need to use EC2 instances, where this limit does not exist. We are paying for running containers even when they are shutting down, so why can't we set the stopTimeout to whatever value we like?
Encountering a similar issue. We have tasks that we'd like to drain and shut down more slowly than in two minutes. Currently that means we're migrating away from ECS Services and Auto Scaling and will need to manually manage the tasks during both deployments and scaling, using RunTask and some scripts to hold it all together. Being able to extend the StopTimeout to 1-2 hours is the only reason the current setup doesn't work for us.
Bumping this! Such a needed feature.
StopTimeout should be configurable by users to whatever value they want.
StopTimeout can be at most 2 minutes for SIGTERM; for more details, go through this link. But this solution will not work for stateful operations, as it depends on our business logic. May I know when we can expect a full-fledged solution from AWS?
We moved our (long-running) batch processing application from ECS on Fargate to ECS on EC2 so that we could manage the termination behavior and extend it as long as necessary to let our batch jobs complete and drain safely without loss of work. 2 minutes is woefully insufficient. However, this has led to significantly increased Datadog monitoring costs (from ~$1.40/task to $56/task), which cannot be borne in our budget. We'd be happy to keep the tasks on Fargate if the StopTimeout could be extended as long as necessary.
Yes, we have the exact same problem. I am not able to use ECS for one of our major applications because I need to allow a Fargate task much more time than 2 minutes to shut down.
#256 (comment) notes that ECS Task Scale-in Protection can now be set. However, that does not solve the problem. It prevents ECS from stopping a running task, so SIGTERM is never sent and the 2-minute SIGKILL timer never starts. But it also removes the signal (SIGTERM) telling the task that it should stop picking up new jobs. Many task servers, such as Sidekiq, can work on multiple jobs simultaneously. If one job is running (and scale-in protection is therefore set) and another job is waiting in the queue, the task could pick up that job too, when we only want to wait for the first job to complete and not start any new jobs on this task. Now that we're allowed to use ECS Task Scale-in Protection to keep a task alive indefinitely, we should similarly be allowed to prevent ECS from sending SIGKILL after 2 minutes.
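For anyone trying the scale-in protection route anyway, a rough sketch of a worker toggling protection around a job, assuming the task role has ecs:UpdateTaskProtection and the v4 task metadata endpoint is available. As the comment above notes, this keeps ECS from stopping the task in the first place, but it does not extend the 2-minute SIGTERM-to-SIGKILL window once a stop actually happens:

```python
import json
import os
import urllib.request
import boto3

def set_task_protection(enabled: bool, minutes: int = 60) -> None:
    # The v4 task metadata endpoint tells a container which task/cluster it belongs to.
    meta_uri = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
    with urllib.request.urlopen(f"{meta_uri}/task") as resp:
        meta = json.load(resp)

    kwargs = dict(
        cluster=meta["Cluster"],
        tasks=[meta["TaskARN"]],
        protectionEnabled=enabled,
    )
    if enabled:
        kwargs["expiresInMinutes"] = minutes
    boto3.client("ecs").update_task_protection(**kwargs)

# Typical use in a worker loop: protect before starting a job, unprotect after.
# set_task_protection(True, minutes=120)
# run_job()                      # hypothetical job runner
# set_task_protection(False)
```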
+1 on this. The 2-minute maximum timeout is quite low for graceful shutdowns of longer-running jobs. We either have to implement a retry mechanism for jobs that might be lost, or completely switch to running ECS on EC2, where that number can be set much higher.
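For reference, stopTimeout is the per-container field in question: on the Fargate launch type values above 120 seconds are rejected, while on EC2 it can be set higher. A minimal boto3 sketch of where it lives (family, image, and sizes are placeholders):

```python
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="my-worker",                      # placeholder
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",
    memory="512",
    containerDefinitions=[
        {
            "name": "worker",
            "image": "public.ecr.aws/docker/library/busybox:latest",  # placeholder
            "essential": True,
            # Seconds between SIGTERM and SIGKILL. On Fargate this is capped
            # at 120, which is the limit this issue asks to raise.
            "stopTimeout": 120,
        }
    ],
)
```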
Based on this comment: spring-projects/spring-boot#4657 (comment), if we were to implement graceful termination on the application side, it would really help if stopTimeout were allowed to be configurable - at least for on-demand Fargate tasks. Currently the maximum value is 120s, which is not sufficient for all use cases.

Community Note
Tell us about your request
What do you want us to build?
Which service(s) is this request for?
This could be Fargate, ECS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.
Are you currently working around this issue?
How are you currently solving this problem?
Additional context
Anything else we should know?
Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)