Listener drain, next step? #8564

Closed
lambdai opened this issue Oct 10, 2019 · 4 comments
Labels
design proposal (Needs design doc/proposal before implementation), stale (stalebot believes this issue/PR has not been touched recently)

Comments

lambdai (Contributor) commented Oct 10, 2019

The listener drain model worked well in the old world:

  1. Force-close connections at a fixed drain time (roughly sketched below).
  2. Use a number of bind_to_port = false listeners and expect that only a few of them are updated at any time.
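For context, here is a rough sketch of the current time-window behavior in C++; the class and method names are invented for illustration and are not Envoy's actual internals. Once a listener starts draining, a deadline is armed, and whatever connections remain when it fires are force-closed.

```cpp
// Illustrative sketch only: invented names, not Envoy's real drain manager.
#include <chrono>
#include <memory>
#include <vector>

struct Connection {
  void close() { /* tear down the transport socket */ }
};

class TimedDrainingListener {
public:
  explicit TimedDrainingListener(std::chrono::seconds drain_time)
      : drain_time_(drain_time) {}

  // Assume an event loop invokes this drain_time_ after draining started.
  void onDrainDeadline() {
    // Force-close everything still open, regardless of protocol state.
    for (auto& conn : connections_) {
      conn->close();
    }
    connections_.clear();  // the listener can now be destroyed
  }

private:
  std::chrono::seconds drain_time_;
  std::vector<std::unique_ptr<Connection>> connections_;
};
```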

However, I am seeing a tendency toward the following:

  1. bind_to_port is being deprecated, and the small listeners are being migrated into a single listener with a huge filter chain collection. We lose the ability to update a small listener; instead, the singleton listener is updated and drained in its entirety.
  2. In Istio, a new service coming up or going down leads to a listener update. Per 1), every Envoy in the cluster could experience the listener update, and all of their connections are drained.
  3. For non-HTTP connections there may be no way to close gracefully, e.g. a gRPC streaming channel or a long-running MySQL transaction.
  4. A malicious or buggy xDS implementation (e.g. non-deterministically hashed listener config) could update a listener frequently and create endless draining listeners. Each draining listener corpse stays on the heap until the drain timeout.
  5. Various legacy services treat connections as pets.

I propose the following changes to the listener drain model:

  1. Allow an unlimited drain timeout as long as connections are alive: connections would ref-count the listener (see the sketch after this list).
  2. Instead of maintaining a drain time window, announce drain completion early as soon as no connections are left (a minor change).
  3. Audit the drainage: if the draining listeners/connections become an overload, force-close the connections and remove the listener.
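To make the proposal above concrete, here is a minimal sketch of what the ref-counted model could look like. All names are invented (RefCountedDrainingListener is not an existing Envoy class), and the audit/overload decision is assumed to live in some external component:

```cpp
// Illustrative sketch of the proposed model: invented names, not Envoy's API.
#include <atomic>
#include <cstddef>
#include <functional>
#include <utility>

class RefCountedDrainingListener {
public:
  explicit RefCountedDrainingListener(std::function<void()> on_drain_complete)
      : on_drain_complete_(std::move(on_drain_complete)) {}

  // Every accepted connection pins the listener while it is alive.
  void onConnectionCreated() { ++active_connections_; }

  // Proposal (2): announce completion as soon as the last connection goes
  // away, instead of waiting for a fixed drain window to elapse.
  void onConnectionClosed() {
    if (--active_connections_ == 0 && draining_) {
      on_drain_complete_();
    }
  }

  // Proposal (1): no drain timer at all; the listener lingers for as long as
  // any connection still references it.
  void startDrain() {
    draining_ = true;
    if (active_connections_ == 0) {
      on_drain_complete_();
    }
  }

  // Proposal (3): an external audit (e.g. an overload policy) decides when
  // lingering drains cost too much and asks for the remaining connections to
  // be closed; completion then follows through onConnectionClosed().
  void forceDrain(const std::function<void()>& close_remaining_connections) {
    close_remaining_connections();
  }

  std::size_t activeConnections() const { return active_connections_; }

private:
  std::atomic<std::size_t> active_connections_{0};
  bool draining_{false};
  std::function<void()> on_drain_complete_;
};
```

With this shape, a listener that still has long-lived gRPC streams or MySQL transactions simply lingers instead of killing them at a timeout, and force-closing only ever happens on the audit path.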
mattklein123 added the design proposal (Needs design doc/proposal before implementation) label Oct 10, 2019
mattklein123 (Member) commented:

@lambdai the existing listener drain model works very well for the vast majority of deployments. Before we add a bunch of complexity, let's make sure that we are solving the right problems. Can you please put together a more complete statement of the problem and potential solutions in a gdoc and then share it with the community? It would be great if you could pre-review this on the Istio side to make sure everyone is on the same page, so that we are focusing on the right problems on the Envoy side. Thank you!

lambdai (Contributor, Author) commented Oct 11, 2019


stale bot commented Nov 10, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

stale bot added the stale (stalebot believes this issue/PR has not been touched recently) label Nov 10, 2019

stale bot commented Nov 17, 2019

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.

stale bot closed this as completed Nov 17, 2019