Progressive traffic increase for new Pods #2296
Comments
Thanks for logging this issue. This sounds like a case where health checks from Contour or readiness checks from Kubernetes would help. Kubernetes supports pod readiness checks, and Contour supports endpoint health checks; either could ensure that traffic does not reach an instance until it is warmed, as long as your application can indicate that it's ready somehow. Contour's endpoint health checks are only available in the HTTPProxy object (and the now-deprecated IngressRoute), however. Pod readiness checks are available in any recent version of Kubernetes.
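For illustration, a Contour endpoint health check is configured per route in HTTPProxy; a minimal sketch is below. The service name, port, and `/healthz` path are assumptions for the example, not taken from this thread:

```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: webapp
spec:
  virtualhost:
    fqdn: webapp.example.com
  routes:
    - services:
        - name: webapp            # hypothetical Service name
          port: 80
      healthCheckPolicy:
        path: /healthz            # assumes the app exposes a health endpoint here
        intervalSeconds: 5
        timeoutSeconds: 2
        unhealthyThresholdCount: 3
        healthyThresholdCount: 1
```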
Thanks, @youngnick.
I agree with what @youngnick suggested. You could have your readiness probe call an endpoint that triggers the app to warm up, with an initial delay that matches the time your app needs to spin up. Additionally, you could look at adding a retry policy to the route, so that if a request does fail, it gets retried by Envoy. I'm going to close this out, but please re-open if you have further questions, @costimuraru!
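A rough sketch of what that could look like, assuming a hypothetical `/warmup` endpoint that triggers cache warming and returns 200 once done; the port, delays, and retry values are placeholders:

```yaml
# Container spec fragment: readiness probe that calls a warm-up endpoint,
# delayed until the app has had time to start.
readinessProbe:
  httpGet:
    path: /warmup            # hypothetical warm-up endpoint
    port: 8080
  initialDelaySeconds: 60    # roughly the app's startup time
  periodSeconds: 10
  failureThreshold: 6
---
# HTTPProxy route fragment: have Envoy retry failed requests.
routes:
  - services:
      - name: webapp         # hypothetical Service name
        port: 80
    retryPolicy:
      count: 3
      perTryTimeout: 500ms
```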
Thanks for the response, @stevesloka
I think we might not be on the same page regarding the warm-up. The warm-up is not related to the application being slow to start or anything like that. This is about the app warming up by processing (real) HTTP requests. The scenario right now with Contour is: a new pod comes up via autoscaling, Envoy immediately sends it a proportional share of traffic while the app is still cold, and we see timeouts until it warms up.
This problem is known, and other load balancers have implemented algorithms to mitigate it. For example, see this slow-start support announcement for the AWS Application Load Balancer: https://aws.amazon.com/about-aws/whats-new/2018/05/application-load-balancer-announces-slow-start-support/
This issue is about exactly that kind of behavior: Contour supporting a slow-start mode so that new pods are not overwhelmed with requests.
Hey @youngnick, @stevesloka, any thoughts on the above? Appreciate the feedback.
Hi @costimuraru, currently Contour does minimal configuration of Envoy beyond what it's directed to do by Kubernetes objects. If I understand what you're asking for - having Contour detect new endpoint pods and gradually shift traffic to them - this is a very large departure from Contour's current model of using Envoy, as it would involve Contour tracking the health of all the endpoints of the service and gradually changing the weight of each endpoint over a given period. I will speak to the team about this idea; we will need to double-check whether Envoy has any feature that would make adding this to Contour easier.
In addition, I think what @stevesloka and I were trying to suggest earlier is having the readiness check issue some typical requests to the app itself to warm the caches before marking the pod as ready for traffic.
Thanks for the detailed answer, @youngnick!
We tried this, but the number of requests is just too low to do any real warming (we're trying to warm up from 0 to ~4000 requests per second for each pod). We also tried adding a PostStart lifecycle hook on the Pod, where we'd run an HTTP generator process to send requests to the app (via localhost), but this is also problematic. The warm-up takes quite a bit of time (e.g. ~2 minutes), during which the Pod is not actually receiving any external traffic. Even if we add tens of pods due to a spike, we are not able to process the extra requests, because we need to wait for this warm-up period to finish (so we're back to the VM world, where it takes minutes to spin up a new machine).
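For context, a PostStart hook of that kind looks roughly like the sketch below; the port, request count, and the use of wget are placeholders rather than the exact setup described above:

```yaml
# Container spec fragment: run a local load generator against the app on startup.
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          # hypothetical warm-up loop; a dedicated HTTP generator could be used instead
          for i in $(seq 1 10000); do
            wget -q -O /dev/null http://localhost:8080/ || true
          done
```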
@costimuraru - this is more an Envoy issue in my mind (Contour could leverage that feature of course, once implemented in Envoy). Have you considered filing the issue in the Envoy project instead?
Thanks, @lrouquette. Created the issue in Envoy: envoyproxy/envoy#11050
This is available in Envoy now, so Contour could adopt the feature! (From a Slack convo.)
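For reference, the Envoy feature in question is the cluster-level slow start configuration; a rough sketch in raw Envoy config is below. The cluster name and timings are illustrative, and the exact shape may vary between Envoy versions:

```yaml
# Envoy cluster fragment: new endpoints ramp up over a 60s window
# instead of immediately receiving their full share of traffic.
clusters:
  - name: webapp
    type: EDS
    lb_policy: ROUND_ROBIN
    round_robin_lb_config:
      slow_start_config:
        slow_start_window: 60s
        aggression:
          default_value: 1.0
          runtime_key: slow_start.webapp.aggression
```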
cc @CrossingTheRiverPeole |
Added the
It would be very useful for us to have support for this new Envoy feature in Contour.
Thanks a lot for this!!
@skriss If I understand the compatibility matrix correctly, this means that this change would get rolled into the next major release (1.23.0?) and that the minimum supported K8s version for that release will be 1.23. Is this correct?
Yes, that is correct.
We have a JVM-based web app behind Contour/Envoy/NLB, with horizontal pod autoscaling in place.
When a new pod gets created due to autoscaling, Contour/Envoy directs a proportional amount of traffic to that new pod. However, because the app is cold, we're seeing consistent timeouts until it warms up.
We tried the same scenario using a Service of type LoadBalancer in EKS (with an Elastic Load Balancer in front) and we don't see the same issue there. This seems to be because the ELB does a progressive traffic increase on the new pod, as seen in the graph below.
Is there any plan to support something similar in Contour? I see we have the possibility to set weights for different services in an IngressRoute. Would it be worth considering weights at the pod level for a given service, based on their age? Or is something like this available today?
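For reference, the existing service-level weighting mentioned above looks roughly like this in HTTPProxy (the successor to IngressRoute); the service names, ports, and weights are illustrative:

```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: webapp
spec:
  virtualhost:
    fqdn: webapp.example.com
  routes:
    - services:
        - name: webapp-v1    # hypothetical Service names
          port: 80
          weight: 90
        - name: webapp-v2
          port: 80
          weight: 10
```

Note that these weights apply per Kubernetes Service, not per pod, which is why they don't directly address the cold-start case described above.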