Connection leak when there is a master/slave node fail over #342
You're tackling a few issues here. Let's untangle these and see what we have:
Thanks for your quick turnaround on this.
1+3. I think these things are related to each other. Removing the node from the topology might fit your case, but not others. Adding further configuration possibilities comes at a cost in complexity, and I'm not convinced yet that this is a common requirement.
Is there anything I can assist you with within the scope of this ticket, or can I close it?
I encountered an issue when there is a change in topology
Setup: 20 master nodes, each with 2 slaves, running lettuce 4.1.1.Final. ReadFrom is configured to read from slaves.
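For reference, here is a minimal sketch of how such a setup is wired up with lettuce 4.x; the seed host/port and the key are placeholders, not our actual configuration:

```java
import com.lambdaworks.redis.ReadFrom;
import com.lambdaworks.redis.RedisURI;
import com.lambdaworks.redis.cluster.RedisClusterClient;
import com.lambdaworks.redis.cluster.api.StatefulRedisClusterConnection;

public class SlaveReadSetup {

    public static void main(String[] args) {

        // Seed node of the cluster; the host name/port are placeholders.
        RedisClusterClient client = RedisClusterClient.create(
                RedisURI.create("redis://cluster-seed-host:7000"));

        StatefulRedisClusterConnection<String, String> connection = client.connect();

        // Route read commands to slave nodes, as in the setup described above.
        connection.setReadFrom(ReadFrom.SLAVE);

        String value = connection.sync().get("some-key");
        System.out.println(value);

        connection.close();
        client.shutdown();
    }
}
```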
Sequence of events:
During the time window when S2 is syncing from S1, many connections are opened from a host to S2, eventually causing Redis to hit its client count limit and reject other connections. I see a bunch of IOException "Connection reset by peer" entries in the log.
Looking into the code (https://github.com/mp911de/lettuce/blob/4.2.x/src/main/java/com/lambdaworks/redis/cluster/PooledClusterConnectionProvider.java#L520), here is what I think is happening:
Is the fix here to close the connection when readonly() fails?
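To make that concrete, here is a rough sketch of the proposed behavior. This is not the actual PooledClusterConnectionProvider code; the helper below is just a simplified stand-in for the spot where a freshly opened node connection is switched to read-only before being handed out:

```java
import com.lambdaworks.redis.RedisException;
import com.lambdaworks.redis.api.StatefulRedisConnection;

// Simplified stand-in for the place where the provider issues READONLY on a
// freshly opened node connection before handing it out for slave reads.
final class ReadOnlyConnections {

    static <K, V> StatefulRedisConnection<K, V> markReadOnly(StatefulRedisConnection<K, V> connection) {
        try {
            // Can fail e.g. with a LOADING error while the slave is still reading the dataset into RAM.
            connection.sync().readOnly();
            return connection;
        } catch (RedisException e) {
            // Proposed fix: release the socket instead of leaking it, so repeated
            // attempts cannot pile up and exhaust the node's client limit.
            connection.close();
            throw e;
        }
    }

    private ReadOnlyConnections() {
    }
}
```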
It would be nice if the Redis "cluster nodes" command provided a flag for this state while Redis is loading the dataset into RAM, so we could exclude the node from the topology. An alternative is to somehow trigger an event when this happens (the first time a Redis loading error is seen on a connection) and have the topology refresher exclude the node from the topology. These are just wish-list items, though. I would really appreciate feedback on the bug and the proposed fix.
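Not the same as the wish-list items above, but as a partial mitigation that already exists in 4.2.x, the topology refresh can be made more aggressive so a changed cluster view is picked up sooner after a failover. A sketch, assuming the 4.2.x options API; the 30-second period is just an example value:

```java
import java.util.concurrent.TimeUnit;

import com.lambdaworks.redis.RedisURI;
import com.lambdaworks.redis.cluster.ClusterClientOptions;
import com.lambdaworks.redis.cluster.ClusterTopologyRefreshOptions;
import com.lambdaworks.redis.cluster.RedisClusterClient;

public class TopologyRefreshSetup {

    public static void main(String[] args) {

        RedisClusterClient client = RedisClusterClient.create(
                RedisURI.create("redis://cluster-seed-host:7000"));

        // Re-fetch the topology periodically and on suspicious events
        // (persistent reconnects, MOVED/ASK redirects).
        ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh()
                .refreshPeriod(30, TimeUnit.SECONDS)
                .enableAllAdaptiveRefreshTriggers()
                .build();

        client.setOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(refreshOptions)
                .build());
    }
}
```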
Note that when S2 is saturated, we also see very weird behavior in another set of servers that purely perform writes and should never establish any connection to slaves except for topology refresh. These hosts have ConnectionWatchdog throwing exceptions like NullPointerException, cannot initialize channel ... Unfortunately, we don't have debug/trace logging enabled, so I can't debug more than that. I am assuming this issue is related to https://github.com/mp911de/lettuce/issues/278? After a while, both the writer hosts and the reader hosts got into a state where this exception is seen everywhere.
Thanks,
Make sure that:
You have read the contribution guidelines.
You specify the lettuce version and environment so it's obvious which version is affected.
You provide a reproducible test case (either descriptive or as a JUnit test) if it's a bug or the expected behavior differs from the actual behavior.