Support connection termination for hash-based load balancers #6730
Comments
This is an interesting feature request. I think this can definitely be added as a LB option for the hashing LBs. Marking help wanted.
@mattklein123 I would like to give it a try.
@nezdolik sounds great. I think for this it would be good to put together a short design doc before coding. Do you want to take a stab at that and we can discuss?
@mattklein123 sounds good to me. I need a few days to get familiar with the relevant part of the codebase. Will get back with a design doc afterwards.
RFC and relevant discussions here: https://docs.google.com/document/d/1yX8qRDXfbcOqNwSpaptFk1ru-bY_yHDLLOVBuLaclR4/edit?usp=sharing
Support connection termination for hash based load-balancers
When using long-lived connections (websockets / gRPC) with a hash-based load balancer for the purpose of session affinity between a connection and upstream host, there is a need to be able to kill existing connections during rehashing (host added or removed).
Currently, if a long-lived connection is active and affinitized to a specific host, when rehashing occurs there is a 1/(N hosts) chance of that long-lived connection experiencing a split-brain issue, where the current connection stays active but all new requests are routed to a new/different host.
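To make the rehashing odds concrete, here is a minimal sketch of a toy consistent-hash ring that measures how many keys change owner when a host is added. Everything in it (`HashRing`, `moved_fraction`, the host and key names, the virtual-node count) is hypothetical and only illustrates the general technique; it is not Envoy's ring hash or Maglev implementation. With N hosts before the change, the moved share should come out at roughly 1/(N+1) when a host is added, which is close to the 1/N estimate above.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 64-bit hash; md5 is used only for distribution, not for security.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, hosts, vnodes=200):
        # Each host is placed on the ring `vnodes` times to even out its share.
        self._points = sorted(
            (_hash(f"{host}#{i}"), host) for host in hosts for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._points]

    def pick(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect(self._positions, _hash(key)) % len(self._positions)
        return self._points[idx][1]


def moved_fraction(old_hosts, new_hosts, keys):
    """Fraction of keys whose owning host differs between the two host sets."""
    old, new = HashRing(old_hosts), HashRing(new_hosts)
    moved = sum(1 for key in keys if old.pick(key) != new.pick(key))
    return moved / len(keys)


if __name__ == "__main__":
    hosts = [f"host-{i}" for i in range(4)]
    keys = [f"user-{i}" for i in range(10_000)]
    # Expect a value in the neighbourhood of 1/5 when growing from 4 to 5 hosts.
    print(moved_fraction(hosts, hosts + ["host-4"], keys))
```

Any long-lived connection whose key falls in the moved share is the one that ends up split-brained: it stays on its old host while new requests for the same key go elsewhere.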
A simple use case for this is state management on a user / connection basis through the use of a `user-id` header. If we affinitize all requests (long-lived streams & unary) to a specific upstream host based on the `user-id` header, we're able to send unary requests directly to the host that also holds the active long-lived connection. This allows us to process the new incoming request and stream data through the long-lived connection. This all works well until we change the host count and rehashing occurs. When rehashing occurs, the long-lived connection stays active on its initial host but new requests can be affinitized to a different host, causing the split-brain issue described above. New requests would be forwarded to a different host, and said host will not be the "owner" of the long-lived connection.
Supporting connection termination on rehash ensures that the split-brained long-lived connections would be terminated and affinitized to the appropriate host on reconnect.
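The requested behaviour could look roughly like the sketch below: on every host-set change, re-evaluate each tracked connection's key and close the connections whose owning host is no longer the one they are attached to. This is only an illustration under stated assumptions. `Balancer`, `Connection`, `pick_host`, and `update_hosts` are hypothetical names, not Envoy APIs, and rendezvous (highest-random-weight) hashing is used as a short stand-in for ring hash or Maglev.

```python
import hashlib
from dataclasses import dataclass, field


def _score(host: str, key: str) -> int:
    # Stable per-(host, key) score; md5 is used only for distribution.
    return int.from_bytes(hashlib.md5(f"{host}|{key}".encode()).digest()[:8], "big")


def pick_host(hosts, key: str) -> str:
    # Rendezvous hashing: the host with the highest score for this key wins.
    return max(hosts, key=lambda host: _score(host, key))


@dataclass
class Connection:
    key: str             # affinity key, e.g. the user-id header value
    host: str            # host chosen when the connection was established
    is_open: bool = True

    def close(self) -> None:
        self.is_open = False


@dataclass
class Balancer:
    hosts: list
    connections: list = field(default_factory=list)

    def connect(self, key: str) -> Connection:
        conn = Connection(key, pick_host(self.hosts, key))
        self.connections.append(conn)
        return conn

    def update_hosts(self, new_hosts) -> list:
        """Swap in the new host set and close connections whose owner changed."""
        self.hosts = list(new_hosts)
        closed = []
        for conn in self.connections:
            if pick_host(self.hosts, conn.key) != conn.host:
                conn.close()
                closed.append(conn)
        # Keep tracking only the connections that are still on their owner.
        self.connections = [conn for conn in self.connections if conn.is_open]
        return closed
```

A client whose connection is closed this way would reconnect with the same `user-id`, and `connect` would then place it on the host that new unary requests for that key already reach.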
Relevant Links:
#2819