Listener drain, next step? #8564

Closed
lambdai opened this issue Oct 10, 2019 · 4 comments
Labels
design proposal (Needs design doc/proposal before implementation), stale (stalebot believes this issue/PR has not been touched recently)

Comments

lambdai (Contributor) commented Oct 10, 2019

The listener drain model worked well in the old world:

  1. Force-close connections at a fixed drain time (roughly sketched below).
  2. Use a number of bind_to_port = false listeners and expect that only a few of them are updated at any time.
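For context, here is a rough sketch of the current time-window behavior in C++; the class and method names are invented for illustration and are not Envoy's actual internals. Once a listener starts draining, a deadline is armed, and whatever connections remain when it fires are force-closed.

```cpp
// Illustrative sketch only: invented names, not Envoy's real drain manager.
#include <chrono>
#include <memory>
#include <vector>

struct Connection {
  void close() { /* tear down the transport socket */ }
};

class TimedDrainingListener {
public:
  explicit TimedDrainingListener(std::chrono::seconds drain_time)
      : drain_time_(drain_time) {}

  // Assume an event loop invokes this drain_time_ after draining started.
  void onDrainDeadline() {
    // Force-close everything still open, regardless of protocol state.
    for (auto& conn : connections_) {
      conn->close();
    }
    connections_.clear();  // the listener can now be destroyed
  }

private:
  std::chrono::seconds drain_time_;
  std::vector<std::unique_ptr<Connection>> connections_;
};
```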

However, I am seeing a tendency toward the following:

  1. bind_to_port is being deprecated, and the small listeners are being migrated into a single listener with a huge filter chain collection. We lose the ability to update a small listener; instead, the singleton listener is updated and drained in its entirety.
  2. In Istio, a new service coming up or going down leads to a listener update. Per 1), every Envoy in the cluster could experience the listener update, and all of their connections are drained.
  3. For non-HTTP connections there may be no way to close gracefully, e.g. a gRPC streaming channel or a long-running MySQL transaction.
  4. A malicious or buggy xDS implementation (e.g. non-deterministically hashed listener config) could update a listener frequently and create endless draining listeners. Each draining listener corpse stays on the heap until the drain timeout.
  5. Various legacy services treat connections as pets.

I propose the following changes to the listener drain model:

  1. Allow an unlimited drain timeout as long as connections are alive: connections would ref-count the listener (see the sketch after this list).
  2. Instead of maintaining a drain time window, announce drain completion early as soon as no connections are left (a minor change).
  3. Audit the drainage: if the draining listeners/connections become an overload, force-close the connections and remove the listener.
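To make the proposal above concrete, here is a minimal sketch of what the ref-counted model could look like. All names are invented (RefCountedDrainingListener is not an existing Envoy class), and the audit/overload decision is assumed to live in some external component:

```cpp
// Illustrative sketch of the proposed model: invented names, not Envoy's API.
#include <atomic>
#include <cstddef>
#include <functional>
#include <utility>

class RefCountedDrainingListener {
public:
  explicit RefCountedDrainingListener(std::function<void()> on_drain_complete)
      : on_drain_complete_(std::move(on_drain_complete)) {}

  // Every accepted connection pins the listener while it is alive.
  void onConnectionCreated() { ++active_connections_; }

  // Proposal (2): announce completion as soon as the last connection goes
  // away, instead of waiting for a fixed drain window to elapse.
  void onConnectionClosed() {
    if (--active_connections_ == 0 && draining_) {
      on_drain_complete_();
    }
  }

  // Proposal (1): no drain timer at all; the listener lingers for as long as
  // any connection still references it.
  void startDrain() {
    draining_ = true;
    if (active_connections_ == 0) {
      on_drain_complete_();
    }
  }

  // Proposal (3): an external audit (e.g. an overload policy) decides when
  // lingering drains cost too much and asks for the remaining connections to
  // be closed; completion then follows through onConnectionClosed().
  void forceDrain(const std::function<void()>& close_remaining_connections) {
    close_remaining_connections();
  }

  std::size_t activeConnections() const { return active_connections_; }

private:
  std::atomic<std::size_t> active_connections_{0};
  bool draining_{false};
  std::function<void()> on_drain_complete_;
};
```

With this shape, a listener that still has long-lived gRPC streams or MySQL transactions simply lingers instead of killing them at a timeout, and force-closing only ever happens on the audit path.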
mattklein123 added the design proposal (Needs design doc/proposal before implementation) label Oct 10, 2019
mattklein123 (Member) commented:

@lambdai the existing listener drain model works very well for the vast majority of deployments. Before we add a bunch of complexity, let's make sure that we are solving the right problems. Can you please put together a more complete statement of the problem and potential solutions in a gdoc and then share it with the community? It would be great if you could pre-review this on the Istio side to make sure everyone is on the same page, so that we are focusing on the right problems on the Envoy side. Thank you!

lambdai (Contributor, Author) commented Oct 11, 2019


stale bot commented Nov 10, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

stale bot added the stale (stalebot believes this issue/PR has not been touched recently) label Nov 10, 2019

stale bot commented Nov 17, 2019

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.

stale bot closed this as completed Nov 17, 2019