Load balance long-lived connections upon auto scale-out or scale-in of Envoy #15283
Comments
+1 for the admin interface. A listener or network filter probably doesn't know how many connections should be sent GOAWAY. However, I feel it is reasonable to expose this as a one-time action and let some external tool trigger the rebalance.
See https://www.envoyproxy.io/docs/envoy/latest/operations/admin#post--drain_listeners. I think it would be reasonable to allow this API to take parameters that affect the drain speed. I think we originally discussed it when it was added. cc @auni53
I've been reviewing this. @mattklein123, are there notes or a proposal relating to "I think we originally discussed it when it was added"?
The suggestions in this thread sound reasonable to me. I'm not clear on where you're seeing that. The original draining behaviour, i.e. drain-strategy=immediate, uses an algorithm where the probability of a request being drained scales with progress into the drain period; I left a comment to describe this. That would be the simplest place to add a new API mode or parameters.
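For illustration, a minimal C++ sketch of the gradual-drain behaviour described above (the function name and signature are mine, not Envoy's): the probability that any given request is told to drain rises linearly from 0 to 1 over the drain window.

```cpp
#include <chrono>
#include <random>

// Sketch only: P(drain) grows linearly with progress into the drain period.
bool should_drain_request(std::chrono::steady_clock::time_point drain_start,
                          std::chrono::milliseconds drain_window) {
  const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - drain_start);
  if (elapsed >= drain_window) {
    return true;  // Past the deadline: drain everything.
  }
  const double p = static_cast<double>(elapsed.count()) /
                   static_cast<double>(drain_window.count());
  static thread_local std::mt19937 rng{std::random_device{}()};
  return std::bernoulli_distribution(p)(rng);
}
```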
IIRC the current DrainManager controls the drain hint, but ends up with a non-deterministic action. A partial drain requires a hint that is both controllable and deterministic; e.g. it is not ideal if it answers "should drain" and "should not drain" on the same connection.
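One way to make the hint deterministic, sketched below under assumed names (should_drain and the connection-id parameter are hypothetical, not Envoy APIs), is to hash a stable per-connection identifier, so repeated queries about the same connection always agree for a given drain fraction.

```cpp
#include <cstdint>
#include <functional>

// Deterministic drain hint: the same connection_id always maps to the same
// bucket, so the answer never flips between calls.
bool should_drain(uint64_t connection_id, double drain_fraction) {
  const uint64_t h = std::hash<uint64_t>{}(connection_id);
  // Map the id into [0, 1) deterministically; drain the lowest fraction.
  const double bucket = static_cast<double>(h % 10000) / 10000.0;
  return bucket < drain_fraction;
}
```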
@auni53 If we are going to drain a percentage of connections, ideally we should be able to know, at the time of the drain call, how many connections there are. There are two problems that I see with this:
There is numConnections() on ListenerManager, to wit:
This cycles over the workers, which, while draining is active, could lead to unacceptable overhead. Is there a preferred method of obtaining (with the least overhead) the number of active downstream connections that is not going to impact performance?
We don't have this today, but we could add it. A couple of quick ideas: 1) Give access to connections on the current worker only (simple, might be good enough). 2) Do some periodic calculation that puts the total connection count into an atomic variable that can be read, potentially specific to the listener, server, etc. We already basically do this here (lines 228 to 229 in 38f6738).
So we could add that information on the stats flush interval or some other interval. I doubt your use case requires it to be exact?
Hey @mattklein123, I like that suggestion: adding the information on the stats flush interval or some other interval works, and my use case does not require it to be exact. Thanks.
@twghu @lambdai @mattklein123 will the solution discussed here impact both TCP and HTTP connections, or only HTTP connections?
The new feature (max_requests_per_connection for downstream connections) to be introduced in v1.20 will hopefully solve my issue of rebalancing long-lived connections. When this parameter is set, once the limit on the maximum number of requests per connection is reached, the connection is closed and the external load balancer is forced to open a new one. Are there any other alternatives to be discussed here? If not, I can close this issue.
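If it helps, a sketch of what that could look like in a listener's HTTP connection manager config; the field placement follows the v3 API (common_http_protocol_options.max_requests_per_connection), and the limit of 1000 is illustrative, not a recommendation.

```yaml
# Sketch only: once a downstream connection has served 1000 requests,
# Envoy closes it, forcing the external LB to open a fresh connection
# that can land on a newly scaled-out instance.
- name: envoy.filters.network.http_connection_manager
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
    stat_prefix: ingress_http
    common_http_protocol_options:
      max_requests_per_connection: 1000
```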
sha-rath, I recommend keeping it open, because the TCP part is still not fixed, and some of the discussions here still have no conclusion.
To revive this conversation and add my two cents:
Proposal
Please let me know your thoughts! Thanks!
Consider a scenario where there is an auto scale-out of Envoy. In such a situation, I would like to drain a few connections from an old instance and transfer that load to the newly scaled-out instance. Currently, I cannot find a way to drain a certain number of the connections an Envoy instance handles.
Would it be possible to have some sort of configuration where we could set the percentage of connections to drain, based on parameters like the number of connections per Envoy instance or the total number of connections?
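To make the request concrete, a hypothetical sketch of such a knob (Connection and drain_percentage are illustrative stand-ins, not Envoy APIs): close a caller-chosen percentage of the open downstream connections, picked at random, so the external load balancer re-opens them against the scaled-out fleet.

```cpp
#include <random>
#include <vector>

// Stand-in for a downstream connection handle.
struct Connection {
  void close_gracefully() { /* flush pending writes, then close */ }
};

// Gracefully close roughly `percent`% of the given connections.
void drain_percentage(std::vector<Connection*>& connections, double percent) {
  std::mt19937 rng{std::random_device{}()};
  std::bernoulli_distribution pick(percent / 100.0);
  for (Connection* connection : connections) {
    if (pick(rng)) {
      connection->close_gracefully();
    }
  }
}
```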