Consider topology updates for default Cluster connections #1317
Let me rephrase your report to see whether I understood it correctly. With no failover you mean that sending PUBLISH keeps failing with timeouts? There should be no difference between StatefulRedisClusterConnection and StatefulRedisClusterPubSubConnection here. The Cluster Pub/Sub connection does not consider key routing at all and therefore it sends PUBLISH over the default connection.

When establishing cluster connections, we attempt to connect to one of the bootstrap nodes configured through RedisURI. The client refreshes its topology view according to the configuration. Once this happens, any connections to nodes that are no longer part of the cluster are closed, independent from whether the node connection was healthy or not. So in the case of key routing, the client might be reconfigured with a new target server, which is then used for subsequent commands.

The default connection always remains active until the client sees a connection reset of the underlying transport. So in terms of resiliency, the default connection is a bottleneck, as it does not participate in topology refresh until the connection sees a disconnect. I'm not sure we can really do something about it. While we can enable key routing for PUBLISH, subscriptions would still need to go through the default connection.

Happy to discuss ideas.
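For reference, the topology refresh mentioned above is configured through ClusterTopologyRefreshOptions; a minimal sketch (host, port, and intervals are placeholder values):

```java
import java.time.Duration;

import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;

public class TopologyRefreshConfig {

    public static void main(String[] args) {
        // Bootstrap node; placeholder host and port.
        RedisClusterClient client = RedisClusterClient.create(RedisURI.create("redis-node-1", 6379));

        ClusterTopologyRefreshOptions refresh = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(Duration.ofSeconds(30)) // re-read CLUSTER NODES periodically
                .enableAllAdaptiveRefreshTriggers()            // also refresh on triggers such as reconnect attempts
                .build();

        client.setOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(refresh)
                .build());

        // ... use the client, then shut it down.
        client.shutdown();
    }
}
```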
Yes.
After I reported the issue, I thoroughly read the lettuce code and understood this part. Hmm. But I'm still having trouble with the case where there is a connection to a node that is in the FAIL state when the cluster topology is refreshed:
Right now, the default connection isn't aware of its nodeId. I'm considering changing that so that the default connection knows which cluster node it is connected to. Once the cluster topology is refreshed and nodes get updated, the default connection checks whether it is still part of the topology. If not, we would issue a reconnect and the default connection would connect to a random node.
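Roughly, that check could look like this (a hypothetical sketch; the names below are illustrative, not the actual lettuce internals):

```java
// Hypothetical sketch of the proposed membership check; illustrative names only.
void onPartitionsUpdated(Partitions partitions) {
    // nodeId of the cluster node the default connection is currently pinned to
    String nodeId = defaultConnectionNodeId;

    // If that node is no longer part of the topology, reset the connection
    // so it reconnects to a random node that is still a cluster member.
    if (partitions.getPartitionByNodeId(nodeId) == null) {
        defaultConnection.reset();
    }
}
```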
Oh, that sounds good. I hope it works, and that it helps other people who have similar problems. I also tried testing by shutting down the Redis node, but I could not reproduce this situation because Redis could not return a response at all.
… of the cluster #1317
If the default cluster connection points to a node that is no longer part of the cluster, the connection is reset so that it points to a cluster member again. Cluster connection facades are therefore aware of their node Id, and once the Partitions get updated, the facade verifies cluster membership. The check does not consider failure flags, only cluster membership. The connection reset is tied to ClusterClientOptions.isCloseStaleConnections, which can be disabled on demand.
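For anyone who wants to opt out of that reset, a sketch of disabling it via the ClusterClientOptions builder (assuming an existing RedisClusterClient named client):

```java
// Keep connections to nodes that dropped out of the topology instead of resetting them.
client.setOptions(ClusterClientOptions.builder()
        .closeStaleConnections(false)
        .build());
```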
Bug Report
※ Similar to the issue I shared before: #1245
Current Behavior & Input Code
The other day, a Redis cluster node (VM) became unable to return responses due to a hypervisor failure, and timeout errors began to occur. I expected lettuce to fail over; however, the failover did not happen. Only when the VM of the Redis cluster node had shut down completely did the failover succeed.
The Redis cluster is used exclusively for Pub/Sub.
Just like the issue I shared before (#1245), I was able to reproduce it with code like this (see the sketch below): the Redis Cluster client manages the nodes it connects to using the cluster nodes command, so it was difficult to use toxiproxy like before. Instead, I reproduced the situation by running a heavy Lua script so that the node times out. In this situation, there is evidence that topology refresh is being performed.
However, the publish command keeps returning a timeout error; failover is not performed.
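A minimal sketch of the reproduction described above, assuming a lettuce RedisClusterClient (host, port, channel, and message are placeholders):

```java
import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.pubsub.StatefulRedisClusterPubSubConnection;

public class PubSubFailoverRepro {

    public static void main(String[] args) {
        RedisClusterClient client = RedisClusterClient.create(RedisURI.create("redis-node-1", 6379));
        StatefulRedisClusterPubSubConnection<String, String> pubSub = client.connectPubSub();

        // While one cluster node is kept busy by a heavy Lua script (so it cannot
        // answer in time), PUBLISH over the Pub/Sub connection keeps timing out
        // instead of failing over to a healthy node.
        pubSub.sync().publish("channel", "message");

        pubSub.close();
        client.shutdown();
    }
}
```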
Expected behavior/code
Proper failover is performed even in such a situation.
Or please tell me a good workaround; pingBeforeActivateConnection had no effect.
...By the way, if I execute the publish command using StatefulRedisClusterConnection instead of StatefulRedisClusterPubSubConnection, the failover is done properly (a sketch of this workaround follows below). Is it wrong to publish using StatefulRedisClusterPubSubConnection? If so, I'm sorry.
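A sketch of that workaround, assuming the same RedisClusterClient as in the reproduction above:

```java
// Publishing over the regular cluster connection recovers after topology
// refresh, as described above, while the Pub/Sub connection does not.
StatefulRedisClusterConnection<String, String> connection = client.connect();
connection.sync().publish("channel", "message");
connection.close();
```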
Environment
Possible Solution
...
Additional context
...