Progressive traffic increase for new Pods (slow start mode) #11050
Comments
This is something that I have wanted to add for quite some time. I think the easiest implementation would be to keep track of host addition time and ramp time and, if this option is enabled, scale the host picks for RR and LR by some amount during the warm-up period. cc @snowp @tonya11en
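To make the idea above concrete, here is a minimal sketch of scaling a host's pick weight by its age during a warm-up window. It is not Envoy's implementation: the function names, the linear ramp, the `min_factor` floor, and the use of weighted random selection (rather than true round robin or least request) are all assumptions made for illustration.

```python
import random


def slow_start_factor(now, added_at, window, min_factor=0.1):
    """Return a multiplier in [min_factor, 1.0] for a host added at `added_at`.

    Assumption: the ramp is linear over `window` seconds, and `min_factor`
    keeps a brand-new host from being starved of traffic entirely.
    """
    if window <= 0:
        return 1.0  # slow start disabled
    age = now - added_at
    if age >= window:
        return 1.0  # host is fully warmed up
    return max(min_factor, age / window)


def effective_weights(hosts, now, window):
    """Scale each host's configured weight by its slow start factor.

    `hosts` maps a host name to a (weight, added_at) tuple (hypothetical shape).
    """
    return {
        name: weight * slow_start_factor(now, added_at, window)
        for name, (weight, added_at) in hosts.items()
    }


def pick_host(hosts, now, window, rng=random):
    """Pick one host with probability proportional to its effective weight."""
    weights = effective_weights(hosts, now, window)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

With a 60 second window, a host added 30 seconds ago would receive roughly half the share of an equally weighted host that has been up for a long time.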
@mattklein123 I would like to help with this.
@nezdolik awesome, sounds great. Do you want to put together a short design doc on an implementation proposal?
@mattklein123 will do
Thanks @nezdolik for working on this! Overall looks great. There are a few comment threads to work through in the doc, but I'm very excited to see this being worked on.
Is there any progress on this work? I very much hope that it will be completed soon. If there is a need, perhaps I can help as well.
As an organisation that is 75% Java, we'd love this.
+1
This would be a great feature! In the meantime, what we've done for some apps is run a mini load test from within the pod to warm it up. Even with slow start, a per-pod warm-up would still be useful when all Pods / VMs behind Envoy are restarted at the same time: it ensures they can properly start serving traffic and won't be instantly overwhelmed when they are put into service.
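A per-pod warm-up of the kind described above could look roughly like the sketch below. Everything here is an assumption for illustration: the `warm_up` helper, its parameters, and the idea of passing in a `send_request` callable are hypothetical, not the commenter's actual tooling.

```python
from concurrent.futures import ThreadPoolExecutor


def warm_up(send_request, total_requests=200, concurrency=4):
    """Call `send_request` `total_requests` times and return (successes, failures).

    `send_request` is any zero-argument callable that issues one request
    against the local app and raises on failure (hypothetical contract).
    """
    def attempt(_):
        try:
            send_request()
            return True
        except Exception:
            # Failures are expected while the app is still cold; count them.
            return False

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(attempt, range(total_requests)))

    successes = sum(results)
    return successes, len(results) - successes
```

In practice `send_request` might wrap something like `urllib.request.urlopen` against a local endpoint of the app, run before the pod is marked ready; the endpoint and the request count would depend on the application.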
The RFC looks great, @nezdolik. Is there anything that prevents us from implementing it?
@costimuraru there is an in-progress PR: #13176. "Slow start" is slowly moving forward.
Title: Support for progressive traffic increase for new Pods (slow start mode)
Description:
TL;DR: It would be useful to have a slow start mode that allows us to add new pods without overwhelming them with a flood of requests, similar to this feature from AWS: https://aws.amazon.com/about-aws/whats-new/2018/05/application-load-balancer-announces-slow-start-support/
We have a JVM-based web app behind Contour/Envoy/NLB, with horizontal pod auto scaling in place.
When a new pod gets created due to auto scaling, Contour/Envoy directs a proportional amount of traffic to that new pod. However, when the app that has just started is overwhelmed with a flood of requests, we see consistent timeouts until it warms up (a couple of minutes). Because of this, whenever we scale out our app, we lose data. While discussing this with other teams inside Adobe, we noticed that this is a common problem with JVM-based apps.
(as you can see in the graph above, whenever a new pod gets created, requests start failing for a couple of minutes)
We tried the same scenario using a Service of type LoadBalancer in EKS (with an Elastic Load Balancer in front), and we don't see the issue. The ELB does a progressive traffic increase on the new pod, as seen in the graph below.
(in the graph above, you can see the number of requests received by the new pod from the ELB, which is gradually increasing)