
Progressive traffic increase for new Pods #2296

Closed
costimuraru opened this issue Feb 28, 2020 · 17 comments · Fixed by #4772
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature.
Milestone

Comments

@costimuraru

costimuraru commented Feb 28, 2020

We have a JVM-based web app behind Contour/Envoy/NLB, with horizontal pod auto scaling in place.
When a new pod gets created due to auto scaling, Contour/Envoy directs a proportional amount of traffic to that new pod. However, because the app is cold, we're seeing consistent timeouts until it warms up.

Screenshot 2020-02-28 17 36 47

We tried the same scenario using a Service of type LoadBalancer in EKS (with an Elastic Load Balancer in front), and we don't see the same issue there. This seems to be because the ELB does a progressive traffic increase on the new pod, as seen in the graph below.
Screenshot 2020-02-28 17 34 56

Is there any plan to support something similar in Contour? I see we have the possibility to set weights for different services in an IngressRoute. Would it be something to consider to set weights at the pod level for a given service, based on pod age? (Or is something like this available today?)

@youngnick
Member

Thanks for logging this issue.

This sounds like a case where health checks from Contour or readiness checks from Kubernetes would help.

Kubernetes supports pod readiness checks, and Contour supports endpoint health checks, both of which can keep traffic away from an instance until it has warmed up, as long as your application can somehow indicate that it's ready.

Contour's endpoint health checks are only available in the HTTPProxy object (and the now-deprecated IngressRoute), however. Pod readiness checks are available in any recent version of Kubernetes.
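
As a rough sketch, a Kubernetes readiness probe along these lines could hold traffic back until the app reports ready. Everything specific below (image name, `/ready` path, port, and timings) is an illustrative assumption, not taken from this thread:

```yaml
# Hypothetical Deployment snippet: the probe only marks the pod Ready
# once the app answers 200 on /ready, so Contour/Envoy won't route to it before then.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jvm-web-app
spec:
  template:
    spec:
      containers:
        - name: app
          image: example/jvm-web-app:latest
          readinessProbe:
            httpGet:
              path: /ready        # app should return 200 only once warmed
              port: 8080
            initialDelaySeconds: 30  # skip probing while the JVM boots
            periodSeconds: 5
            failureThreshold: 3
```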

@costimuraru
Author

Thanks, @youngnick.
This sounds like we need to warm up the new pods ourselves. The issue was asking whether this could be handled by Contour/Envoy itself, by doing a progressive traffic increase on the new pod(s), hence warming up the instance.

@stevesloka
Member

I agree with what @youngnick suggested. You could have your readiness probe call an endpoint that triggers the app to warm up, and set an initial delay that matches the time your app needs to spin up.

Additionally, you could look at adding a retry policy to the requests, so if a request does fail, it gets retried by Envoy.
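
A retry policy of that kind can be expressed on an HTTPProxy route. The following is a sketch only; the FQDN, service name, port, and retry values are assumptions:

```yaml
# Sketch: retry failed requests up to 3 times, with a per-attempt timeout,
# so a cold pod's transient failures are retried (possibly against another endpoint).
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: jvm-web-app
spec:
  virtualhost:
    fqdn: app.example.com
  routes:
    - services:
        - name: jvm-web-app
          port: 80
      retryPolicy:
        count: 3
        perTryTimeout: 500ms
```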

I'm going to close this out, but please re-open if you have further questions on this @costimuraru !

@costimuraru
Author

costimuraru commented Mar 2, 2020

Thanks for the response, @stevesloka

have your readiness probe call an endpoint which would trigger the app to warm up

I think we might not be on the same page regarding the warm up. The warm up is not related to the application being slow to start or anything like that. This is about the app warming up by processing (real) HTTP requests.

The scenario right now with Contour is:

  1. The app starts on the new pod and is ready to handle requests (this happens quite fast).
  2. Contour throws a lot of requests at the new pod.
  3. The app can't handle that many requests at once while still in a cold state, and crashes.

This problem is known and other load balancers have implemented algorithms to mitigate it. For example see this from the Application Load Balancer from AWS: https://aws.amazon.com/about-aws/whats-new/2018/05/application-load-balancer-announces-slow-start-support/

Application Load Balancers now support a slow start mode that allows you to add new targets without overwhelming them with a flood of requests. With the slow start mode, targets warm up before accepting their fair share of requests based on a ramp-up period that you specify.

This issue is related exactly to this kind of behavior, where Contour would be able to support a slow start mode and not overwhelm new pods with requests.

@costimuraru
Author

Hey, @youngnick, @stevesloka,

Any thoughts on the above?

Appreciate the feedback.

@youngnick
Member

Hi @costimuraru, currently, Contour does minimal configuration of Envoy aside from what it's directed to do by Kubernetes objects.

If I understand what you're asking for - having Contour detect new endpoint pods and gradually shift traffic to them - that would be a very large departure from Contour's current model of using Envoy: Contour would have to track the health of every endpoint of the service and gradually adjust each endpoint's weight over a given period.

I will speak to the team about this idea; we'll need to double-check whether Envoy has any feature that would make adding this to Contour easier.

@youngnick
Member

In addition, I think what @stevesloka and I were trying to suggest earlier is having the readiness check do some common requests to the app itself to warm the caches before marking the pod as ready for traffic.

@costimuraru
Author

costimuraru commented Mar 16, 2020

Thanks for the detailed answer, @youngnick!

In addition, I think what @stevesloka and I were trying to suggest earlier is having the readiness check do some common requests to the app itself to warm the caches before marking the pod as ready for traffic.

We tried this, but the number of requests is just too low to do any real warming (we're trying to warm up from 0 to ~4000 requests per second, for each pod). We also tried adding a PostStart lifecycle hook on the Pod, where we'd run an HTTP generator process to send requests to the app (via localhost), but this is also problematic. The warm-up takes quite a bit of time (e.g. ~2 minutes), during which the pod is not actually receiving any external traffic. Even if we add tens of pods due to a spike, we are not able to process the extra requests, because we have to wait for this warm-up period to finish (so we're back in the VM world, where it takes minutes to spin up a new machine).
It's also quite hard to generate requests that map to real-life use cases, as these are frequently updated. All in all, these warm-up workarounds add quite a lot of work and don't yield the best results.
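
For reference, the PostStart workaround described above might be wired up roughly like this. The load-generator tool (`hey`), its flags, the port, and the URL are assumptions; any tool that can generate localhost traffic would do:

```yaml
# Hypothetical container snippet: run a local load generator for 2 minutes
# after the container starts, to warm the app over localhost.
# Note the drawback described above: the pod serves no external traffic meanwhile.
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        - "hey -z 2m -q 100 http://localhost:8080/ || true"
```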

@lrouquette

@costimuraru - this is more an Envoy issue in my mind (Contour could leverage that feature of course, once implemented in Envoy). Have you considered filing the issue in the Envoy project instead?

@costimuraru
Author

Thanks, @lrouquette. Created the issue in Envoy: envoyproxy/envoy#11050

@stevesloka stevesloka reopened this Dec 13, 2021
@stevesloka
Member

This is available in Envoy now so Contour could adopt the feature!

From slack convo:

We'd just need to plan out the API for how to implement it. We'd probably need to add the slow-start configuration to the services struct: https://github.com/projectcontour/contour/blob/main/apis/projectcontour/v1/httpproxy.go#L627

@skriss
Member

skriss commented Dec 16, 2021

cc @CrossingTheRiverPeole

@skriss skriss added kind/feature Categorizes issue or PR as related to a new feature. lifecycle/needs-triage Indicates that an issue needs to be triaged by a project contributor. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Dec 16, 2021
@skriss
Member

skriss commented Dec 16, 2021

Added the help wanted label here if anyone is interested in picking up this issue!

@costimuraru
Author

It would be very useful for us to have support for this new Envoy feature in Contour.

@tsaarni tsaarni self-assigned this Sep 20, 2022
@skriss skriss removed the lifecycle/needs-triage Indicates that an issue needs to be triaged by a project contributor. label Oct 4, 2022
@skriss skriss added this to Contour Oct 4, 2022
@skriss skriss added this to the 1.23.0 milestone Oct 4, 2022
@skriss skriss moved this to In Progress in Contour Oct 4, 2022
Repository owner moved this from In Progress to Done in Contour Oct 6, 2022
@tailrecur

Thanks a lot for this !!

@tailrecur

@skriss If I understand the compatibility matrix correctly, this means this change would get rolled into the next release (1.23.0?), and the minimum supported K8s version for that release will be 1.23. Is this correct?

@sunjayBhatia
Member

yes that is correct
