
Long-lived connection is not closed with close_connections_on_host_set_change = true option #26459

Closed
krapie opened this issue Mar 30, 2023 · 3 comments

Comments

krapie commented Mar 30, 2023

Title: Long-lived connection is not closed with close_connections_on_host_set_change = true option

Description:
Hi, I'm currently using Envoy proxy (in fact, Istio) with hash-based load balancing to distribute workloads across servers. I'm using both unary and streaming gRPC to communicate with clients.

Everything is working just fine, but I've run into a "split-brain" of connections. The diagram below shows how this split-brain occurs.

[Screenshot: diagram of how connection split-brain occurs]
(Image from: https://docs.google.com/document/d/1yX8qRDXfbcOqNwSpaptFk1ru-bY_yHDLLOVBuLaclR4/edit#heading=h.cvkycohhpnph)

To solve this problem, I've set the close_connections_on_host_set_change = true option via an EnvoyFilter in Istio, so that all connections are drained/closed whenever there is a change in cluster membership (a host set change).
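
For reference, the EnvoyFilter I'm applying looks roughly like the sketch below; the service name and namespace are placeholders for illustration, not my actual values:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: close-connections-on-host-set-change
  namespace: istio-system
spec:
  configPatches:
  - applyTo: CLUSTER
    match:
      context: ANY
      cluster:
        # Placeholder upstream service; substitute the real one.
        service: my-grpc-service.my-namespace.svc.cluster.local
    patch:
      operation: MERGE
      value:
        common_lb_config:
          close_connections_on_host_set_change: true
```

The patch simply merges close_connections_on_host_set_change into the matched cluster's common_lb_config (Envoy's Cluster.CommonLbConfig).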

But the connections would not close even though Envoy was configured with close_connections_on_host_set_change = true. I've searched for a whole day and found that close_connections_on_host_set_change will not work for gRPC streams, because a gRPC stream cannot be gracefully drained.

So my questions are:

  1. Is it possible to force the drain/closure of long-lived connections in current Envoy?
  2. I'm still confused about why a long-lived connection (gRPC stream) is not closed. What does it mean that a gRPC stream cannot be gracefully drained?

I'm currently looking for an alternative approach, where my gRPC server subscribes to xDS (CDS) changes and manually closes its gRPC streams. But I cannot find any references for retrieving xDS subscriptions, or any other API/RPC for observing upstream cluster changes...

This is a duplicate of #8867, but that issue is closed, so I've opened a new one.

Relevant links:
These are the references I found:

@krapie krapie added the triage (Issue requires triage) label on Mar 30, 2023
@lizan lizan added the question (Questions that are neither investigations, bugs, nor enhancements) label and removed the triage (Issue requires triage) label on Mar 30, 2023
@github-actions

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale (stalebot believes this issue/PR has not been touched recently) label on Apr 29, 2023
github-actions bot commented May 6, 2023

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on May 6, 2023
@chromakode

I'm experiencing this same issue with a similar use case: consistent-hashing long-lived websocket connections for a collaborative document store. When hosts are added to my RingHash load balancer and the destination for a hash key changes, existing websocket connections never terminate (even with close_connections_on_host_set_change = true). This causes split-brain in my application. Ideally, I'd like to be able to force these connections to be closed over my configured drain time period.
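
For illustration, the relevant part of my setup corresponds to a plain-Envoy cluster configuration roughly like the sketch below; the cluster name and endpoint address are placeholders, not my actual values:

```yaml
clusters:
- name: doc_store_backends            # placeholder cluster name
  type: STRICT_DNS
  connect_timeout: 1s
  lb_policy: RING_HASH
  common_lb_config:
    # Expected to close existing connections when the host set changes,
    # but long-lived websocket connections stay open in practice.
    close_connections_on_host_set_change: true
  load_assignment:
    cluster_name: doc_store_backends
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: backend.internal   # placeholder backend host
              port_value: 8080
```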

@lizan Would you please consider keeping this issue open to track the feature request?
