
Progressive traffic increase for new Pods (slow start mode) #11050

Closed
costimuraru opened this issue May 4, 2020 · 12 comments · Fixed by #13176
@costimuraru

Title: Support for progressive traffic increase for new Pods (slow start mode)

Description:
TL;DR: It would be useful to have a slow start mode that allows us to add new pods without overwhelming them with a flood of requests. Similar to this feature from AWS: https://aws.amazon.com/about-aws/whats-new/2018/05/application-load-balancer-announces-slow-start-support/

We have a JVM-based web app behind Contour/Envoy/NLB, with horizontal pod autoscaling in place.
When a new pod gets created due to autoscaling, Contour/Envoy directs a proportional amount of traffic to that new pod. However, the app that has just started is overwhelmed by the flood of requests, and we see consistent timeouts until it warms up (a couple of minutes). Because of this, whenever we scale out our app, we lose data. While discussing this with other teams inside Adobe, we've noticed this is a common problem with JVM-based apps.


[graph omitted]

(as you can see in the graph above, whenever a new pod gets created, requests start failing for a couple of minutes)

We tried the same scenario using a Service of type LoadBalancer in EKS (with an Elastic Load Balancer in front), and we don't see the issue. The ELB progressively increases traffic to the new pod, as seen in the graph below.

[graph omitted]

(in the graph above, you can see the number of requests received by the new pod from the ELB, which is gradually increasing)
@mattklein123
Member

This is something that I have wanted to add for quite some time. I think the easiest implementation would be to keep track of host addition and ramp time, and if this option is enabled, scale the host picks for round robin (RR) and least request (LR) by some amount during the warm-up period. cc @snowp @tonya11en
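The idea above can be sketched in a few lines: scale each host's effective weight by how long it has been in service, so the load balancer picks freshly added hosts less often until a warm-up window elapses. This is only an illustration of the approach, not Envoy's actual implementation; the function name, the `aggression` knob, and the exact ramp formula are assumptions for the sketch.

```python
def slow_start_weight(base_weight, host_added_at, now,
                      window_s=60.0, aggression=1.0):
    """Return a host's effective load-balancing weight, ramped by the
    time it has been in service.

    Hosts newer than `window_s` seconds get a proportionally reduced
    weight, so RR/least-request picks them less often. `aggression > 1`
    front-loads the ramp (traffic grows faster early on). All names and
    the formula are illustrative, not Envoy's exact code.
    """
    in_service = max(now - host_added_at, 0.0)
    if in_service >= window_s:
        return base_weight  # warm-up finished: full weight
    # Ramp factor in (0, 1]; floor it so a brand-new host is never at 0.
    factor = max(in_service / window_s, 0.01) ** (1.0 / aggression)
    return base_weight * factor
```

With a linear ramp (`aggression=1.0`), a host halfway through a 60-second window would receive half its configured weight.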

@nezdolik
Member

@mattklein123 I would like to help with this.

@mattklein123
Member

@nezdolik awesome, sounds great. Do you want to put together a short design doc on an implementation proposal?

@nezdolik
Member

@mattklein123 will do

@nezdolik
Member

@costimuraru @mattklein123 @snowp @tonya11en please take a look at RFC: https://docs.google.com/document/d/1NiG1X0gbfFChjl1aL-EE1hdfYxKErjJ2688wJZaj5a0/edit?usp=sharing

@mattklein123
Member

Thanks @nezdolik for working on this! Overall looks great. There are a few comment threads to work through in the doc but very excited to see this being worked on.

@wbpcode
Member

wbpcode commented Aug 21, 2020

Is there any progress on this work? I very much hope it will be completed soon. If needed, perhaps I can help as well.

@Stono

Stono commented Nov 4, 2020

As an organisation that is 75% Java, we'd love this.

@nightmareze1

+1

@ejc3

ejc3 commented Nov 6, 2020

This would be a great feature!

In the meantime, what we've done for some apps is run a small load test from within the pod to warm it up. Even with slow start, a per-pod warm-up would still be useful when all Pods/VMs behind Envoy are restarted at the same time, to ensure they can properly serve traffic and won't be instantly overwhelmed when put into service.
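A per-pod warm-up like the one described above can be as simple as a script that fires a burst of requests at the app on localhost before it is added to the load balancer, forcing the JVM to JIT-compile its hot paths. The URL, endpoint, and request count below are placeholders for your own service, not anything from this thread.

```python
import urllib.request

def warm_up(url="http://127.0.0.1:8080/healthz", requests_to_send=200):
    """Send a burst of local requests to warm the app up.

    Returns the number of requests that came back with HTTP 200.
    Connection errors are tolerated because the app may still be
    booting when the warm-up starts.
    """
    ok = 0
    for _ in range(requests_to_send):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                ok += resp.status == 200
        except OSError:
            pass  # app not ready yet (refused/timeout); keep going
    return ok
```

In Kubernetes this could run as a startup probe or in an init step, so the pod only passes readiness once the warm-up burst succeeds.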

@costimuraru
Author

costimuraru commented Feb 26, 2021

The RFC looks great, @nezdolik. Is there anything preventing us from implementing it?

@nezdolik
Member

nezdolik commented Mar 8, 2021

@costimuraru there is an in-progress PR (#13176); "slow start" is slowly moving forward.
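For readers finding this thread later: the work referenced above eventually landed as a per-cluster slow start option in Envoy. A cluster enabling it might look roughly like this; the field names follow the Envoy documentation at time of writing, but verify them against your Envoy version, and the cluster name and window values here are just examples:

```yaml
clusters:
- name: my_jvm_service        # example name, not from this thread
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  round_robin_lb_config:
    slow_start_config:
      slow_start_window: 60s  # ramp traffic to new hosts over one minute
      aggression:
        default_value: 1.0    # 1.0 = linear ramp; >1 front-loads traffic
        runtime_key: slow_start.aggression
```

Least-request load balancing supports the same `slow_start_config` block under `least_request_lb_config`.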


7 participants