Envoy Gateway performance at scale #1365
Comments
cc @AliceProxy @haq204 |
Good first target: 1GB RAM usage at 1000 HTTPRoutes. (We'd probably be OK at 2GB, but let's go for 1GB.) |
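A 1000-HTTPRoute fixture for this target could be generated with a small script. The following is only a sketch: the Gateway name `eg`, the backend Service `backend` on port 3000, the `scale-route-N` naming, and the `example.com` hostnames are all hypothetical, and the `gateway.networking.k8s.io/v1` API version is an assumption that may need to be `v1beta1` on older Gateway API installs.

```python
import json


def make_httproute(index, gateway="eg", namespace="default",
                   backend="backend", port=3000):
    """Build one HTTPRoute manifest as a plain dict.

    All names here (gateway, backend, hostnames) are hypothetical placeholders.
    """
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",  # assumed API version
        "kind": "HTTPRoute",
        "metadata": {"name": f"scale-route-{index}", "namespace": namespace},
        "spec": {
            "parentRefs": [{"name": gateway}],
            # A unique hostname per route keeps the routes from conflicting.
            "hostnames": [f"route-{index}.example.com"],
            "rules": [
                {
                    "matches": [{"path": {"type": "PathPrefix", "value": "/"}}],
                    "backendRefs": [{"name": backend, "port": port}],
                }
            ],
        },
    }


def make_route_list(count):
    """Wrap `count` routes in a v1 List so they can be applied in one call."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [make_httproute(i) for i in range(count)],
    }
```

Printing `json.dumps(make_route_list(1000))` and piping the result to `kubectl apply -f -` would create all routes at once, after which the control plane's memory usage can be compared against the 1GB target.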
Do we run performance tests in GitHub CI? I do not know whether GitHub CI provides enough resources to run large-scale EG tests. |
It provides 7GB RAM (https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources), so we might be able to do without a self-hosted runner, since we are targeting a usage of 1GB. |
Current statistics Notes:
|
Looks like Envoy Proxy is building a framework / test suite for perf testing. |
Update roadmap for v0.5.0
* Moves #24 into v0.5.0 since it carries over from v0.4.0
* Adds #1365 since it tracks the work items of the Scale theme
* Removed other items not tied directly to the roadmap theme
* Added a placeholder roadmap theme for v0.6.0
* rm unused link
* fix roadmap
Signed-off-by: Arko Dasgupta
Hey @arkodg, I am not sure this one can be done in v0.5.0, since I have been short on bandwidth recently. I will focus more on observability of the Envoy Gateway control plane. If I still have bandwidth after resolving EG observability, I can work on this one, but I am not sure about v0.5.0. So if any other maintainers want to take this one, feel free to take it from me. Thanks. |
np @Xunzhuo, thanks for the heads up, please unassign yourself, hoping someone from the community will pick this one up |
Moving this to the 0.6.0-rc1 milestone since it is still unassigned and unlikely to be finished within the v0.5.0 timeline. |
@qicz checking in to see if you have any cycles to help with this one |
cc @gyohuangxin |
Thanks @soulxu & @gyohuangxin for picking this up! |
@arkodg @Xunzhuo Here is the doc "Propose to add Performance Benchmarking at Scale in EnvoyGateway CI Pipeline", which outlines some plans and options based on my personal thoughts. I am looking for your feedback. If my ideas are not correct or do not meet the original intention of this issue, please correct me. Thanks in advance! cc @soulxu |
Thanks @gyohuangxin. I have looked through the doc, and I think most of it covers data plane performance tests. I would like to see more control plane perf tests, such as observing CP status at different scales of Gateway/xRoute/xPolicy/Service/Endpoint/EndpointSlice counts. |
@Xunzhuo Thanks for your comments. My thought is to send load requests to the data plane at different scales, and then use Prometheus to collect metrics from both the control plane and the data plane. What do you mean by "observing CP status"? Is it "observing CPU status"? Yes, we can monitor the control plane's CPU status to see how many EndpointSlices a single EG instance can support. What do you think about it? |
The data-plane and control-plane performance tests can be separate things. Do you mean control-plane performance is more important for now? We can adjust the priorities. |
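Collecting control plane metrics and checking them against the 1GB target could look roughly like the sketch below. It parses Prometheus text exposition output for an unlabelled gauge. It assumes the control plane exposes the standard `process_resident_memory_bytes` process metric, which this thread does not confirm, and the helper names are hypothetical.

```python
ONE_GIB = 1024 ** 3


def read_gauge(exposition_text, metric_name):
    """Return the value of an unlabelled sample from Prometheus text output.

    Returns None if the metric is absent. Labelled samples are not handled.
    """
    for line in exposition_text.splitlines():
        parts = line.split()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not parts or parts[0].startswith("#"):
            continue
        if parts[0] == metric_name and len(parts) >= 2:
            return float(parts[1])
    return None


def within_budget(rss_bytes, budget_bytes=ONE_GIB):
    """True when resident memory is at or below the budget (1 GiB by default)."""
    return rss_bytes <= budget_bytes


sample = """\
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 5.24288e+08
"""
rss = read_gauge(sample, "process_resident_memory_bytes")
print(rss, within_budget(rss))
```

In a CI job, the `sample` text would be replaced by the body scraped from the control plane's metrics endpoint at each scale step, with the job failing when `within_budget` returns false.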
I think we should test control-plane first. |
@zirain Thanks for your comments, we will consider control-plane first. But I think testing frameworks can be universal. |
The doc looks good @gyohuangxin, I left some comments! I agree with everyone here: we should focus on the control plane first. Something to also keep in mind while designing this, @gyohuangxin, since you're also active in Gateway API: it would be great if the framework could be reused for other Gateway API implementations in the future, since performance comparisons across implementations would really benefit end users. From the document, it looks like it can be EG agnostic. |
@arkodg Thanks for your helpful comments.
It’s a great idea to use this framework in other gateway API implementations and compare their performance. You’re correct that this framework is inherently universal, and we should always take its versatility into account. |
@Xunzhuo is it possible to calculate the time to program the data plane via CP metrics today? That would be handy in perf benchmarking. |
@arkodg by exposing some xds metrics? |
@Xunzhuo we can calculate it in the CP as the difference between the provider reconcile time and the xDS server push time (but that may not be entirely accurate). |
Basically, we want to see the time spent across all abstraction layers inside Envoy Gateway by measuring the interval from the provider reconcile stage to the xDS push stage. However, the moment at which Envoy Proxy actually applies the xDS update is outside the control of the control plane, which is Envoy Gateway in our case. |
This issue has been automatically marked as stale because it has not had activity in the last 30 days. |
Description:
This issue tracks the performance (throughput, latency) of the Envoy Gateway control plane and data plane at scale (Services, xRoutes, Gateways, client connections).
Relevant Links:
Emissary: https://www.getambassador.io/docs/emissary/latest/topics/running/scaling
Contour: https://github.com/projectcontour/contour-perf / https://projectcontour.io/guides/resource-limits/
Istio: https://istio.io/v1.16/docs/ops/deployment/performance-and-scalability/