Lingering topology refresh connections when using dynamic refresh sources #1342

Closed
tpf1994 opened this issue Jul 14, 2020 · 16 comments
Labels
type: bug A general bug
Milestone

Comments

@tpf1994

tpf1994 commented Jul 14, 2020

Bug Report

Current Behavior

Our Redis Cluster nodes hold more than one topology refresh connection each, and the number increases over time; eventually a single node had over 10,000 connections from our apps.

This is different from what the wiki describes:
Apart of connection objects, RedisClusterClient uses additional connections for topology refresh. These are created on topology refresh and closed after obtaining the topology:
Set of connections for cluster topology refresh (a connection to each cluster node)

exec "client list" on one redis server among 6 node,the connection with my app is over 10 in one day;
id=394808 addr=172.25.2.23:10751 fd=71 name=lettuce#ClusterTopologyRefresh age=7410 idle=7350 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=394645 addr=172.25.2.23:10468 fd=24 name=lettuce#ClusterTopologyRefresh age=7534 idle=7470 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=402712 addr=172.25.2.23:5076 fd=83 name=lettuce#ClusterTopologyRefresh age=1282 idle=1222 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=397139 addr=172.25.2.23:13189 fd=25 name=lettuce#ClusterTopologyRefresh age=5609 idle=5548 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=395582 addr=172.25.2.23:11557 fd=22 name=lettuce#ClusterTopologyRefresh age=6809 idle=6749 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=400244 addr=172.25.2.23:2644 fd=72 name=lettuce#ClusterTopologyRefresh age=3205 idle=3145 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=400088 addr=172.25.2.23:2430 fd=74 name=lettuce#ClusterTopologyRefresh age=3325 idle=3265 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=403560 addr=172.25.2.23:5897 fd=86 name=lettuce#ClusterTopologyRefresh age=621 idle=561 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=403794 addr=172.25.2.23:6117 fd=59 name=lettuce#ClusterTopologyRefresh age=441 idle=321 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=396361 addr=172.25.2.23:12345 fd=45 name=lettuce#ClusterTopologyRefresh age=6209 idle=6149 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client

Configuration

private LettuceClientConfiguration lettuceClientConfiguration() {
    return LettucePoolingClientConfiguration.builder()
            .poolConfig(genericObjectPoolConfig())
            .clientOptions(ClusterClientOptions.builder()
                    .topologyRefreshOptions(refreshOptions())
                    .socketOptions(SocketOptions.builder()
                            .keepAlive(true)
                            .tcpNoDelay(true)
                            .build())
                    .build())
            .build();
}

private ClusterTopologyRefreshOptions refreshOptions() {
    return ClusterTopologyRefreshOptions.builder()
            .enablePeriodicRefresh(Duration.ofSeconds(60))
            .enableAllAdaptiveRefreshTriggers()
            .build();
}
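
For reference, here is a sketch (assuming Spring Data Redis, with illustrative seed node addresses) of how a LettuceClientConfiguration like the one above is typically wired into a LettuceConnectionFactory; it assumes the lettuceClientConfiguration() method shown above lives in the same configuration class:

// Sketch, assuming Spring Data Redis; seed node addresses are examples only.
import java.util.Arrays;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
public class RedisClusterConfig {

    @Bean
    public LettuceConnectionFactory redisConnectionFactory() {
        RedisClusterConfiguration clusterConfiguration = new RedisClusterConfiguration(
                Arrays.asList("172.25.2.23:6379", "172.25.2.24:6379", "172.25.2.25:6379"));
        // lettuceClientConfiguration() is the method shown above in this report
        return new LettuceConnectionFactory(clusterConfiguration, lettuceClientConfiguration());
    }
}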

Environment

  • Lettuce version(s): [5.0.4.RELEASE]
  • Redis version: [3.2.12]
@tpf1994 tpf1994 added the type: bug A general bug label Jul 14, 2020
@mp911de
Collaborator

mp911de commented Jul 14, 2020

Which Lettuce version are you using?

@tpf1994
Author

tpf1994 commented Jul 14, 2020

Which Lettuce version are you using?

5.0.4.RELEASE

@mp911de
Collaborator

mp911de commented Jul 14, 2020

Please upgrade to a newer one as we had issues with lingering connections during topology refresh.

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Jul 14, 2020
@arpangupta81

@mp911de Can you please suggest the versions in which the lingering connections issue was resolved?
Since we can't upgrade our Netty version, 5.3.1.RELEASE and 5.3.0.RELEASE are not compatible with our older Netty (4.1.36.Final).
Could you let us know whether the lingering connections issue was resolved by 5.2.2.RELEASE? I went through the release notes but couldn't find anything about it.

@mp911de
Collaborator

mp911de commented Jul 15, 2020

The issue was #721, which was actually shipped with 5.0.3.RELEASE. In that case, it seems there could be another leak.

@mp911de mp911de removed the status: waiting-for-feedback We need additional information before we can continue label Jul 15, 2020
@mp911de
Collaborator

mp911de commented Jul 15, 2020

Looking at idle vs. age times, they are roughly 60 seconds apart. 60 seconds is the default command timeout, which is also used during topology refresh. Can you check your logs for any exceptions that could help find the root cause?
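
For context, a small sketch (host and values are illustrative) showing Lettuce's default command timeout on RedisURI, which matches the roughly 60-second gap noted above, and how it can be set explicitly:

// Sketch: the 60-second figure matches Lettuce's default command timeout on
// RedisURI. Host/port and the 10-second value are examples only.
import java.time.Duration;
import io.lettuce.core.RedisURI;

public class TimeoutSketch {

    public static void main(String[] args) {
        RedisURI defaultUri = RedisURI.create("redis://172.25.2.23:6379");
        System.out.println(defaultUri.getTimeout()); // PT60S by default

        RedisURI tunedUri = RedisURI.builder()
                .withHost("172.25.2.23")
                .withPort(6379)
                .withTimeout(Duration.ofSeconds(10)) // explicit command timeout
                .build();
        System.out.println(tunedUri.getTimeout()); // PT10S
    }
}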

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Jul 28, 2020
@mp911de mp911de changed the title Connections too much for Redis Cluster topology refresh Lingering topology refresh connections after connect failure Jul 29, 2020
@mp911de mp911de removed the status: waiting-for-feedback We need additional information before we can continue label Jul 29, 2020
@mp911de mp911de changed the title Lingering topology refresh connections after connect failure Lingering topology refresh connections when using dynamic refresh sources Jul 29, 2020
@mp911de
Collaborator

mp911de commented Jul 29, 2020

I can confirm that I can reproduce the issue. It is related to dynamic refresh sources and happens when a connection to a previous seed node is attempted again because the node was wrongly identified as a new node. Connections are stored in a map, so the existing connection gets overwritten by the new one and the old connection is never closed.
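
To make the failure mode concrete, here is a minimal, simplified sketch of the pattern described above; it is an illustration only, not the actual Lettuce internals:

// Simplified illustration of the leak (not the actual Lettuce code):
// connections are keyed by node in a map, and re-connecting to a node that
// was wrongly classified as "new" replaces the map entry without closing
// the previous connection, which then lingers on the server.
import java.util.HashMap;
import java.util.Map;

public class RefreshConnectionLeakSketch {

    static class NodeConnection implements AutoCloseable {
        final String node;
        NodeConnection(String node) { this.node = node; }
        @Override public void close() { System.out.println("closed " + node); }
    }

    public static void main(String[] args) {
        Map<String, NodeConnection> connections = new HashMap<>();

        // First refresh: connect to the seed node.
        connections.put("172.25.2.23:6379", new NodeConnection("172.25.2.23:6379"));

        // Later refresh: the same seed node is mis-identified as a new node,
        // a second connection is opened, and put() drops the first entry
        // without ever calling close() on it -> lingering connection.
        connections.put("172.25.2.23:6379", new NodeConnection("172.25.2.23:6379"));

        // The guard the fix ensures, conceptually: only connect to nodes
        // that are not already connected.
        connections.computeIfAbsent("172.25.2.23:6379", NodeConnection::new);
    }
}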

@mp911de mp911de added this to the 5.3.3 milestone Jul 29, 2020
mp911de added a commit that referenced this issue Jul 29, 2020
We now ensure that we don't create duplicate connections during topology refresh. Previously, the set difference algorithm reported differences for items that weren't different and so the dynamic refresh sources setting caused duplicate connections to nodes that were already connected. Since connections are held in a map, the new connection object overwrote the previous one which was left open.
mp911de added a commit that referenced this issue Jul 29, 2020
We now ensure that we don't create duplicate connections during topology refresh. Previously, the set difference algorithm reported differences for items that weren't different and so the dynamic refresh sources setting caused duplicate connections to nodes that were already connected. Since connections are held in a map, the new connection object overwrote the previous one which was left open.
@mp911de
Collaborator

mp911de commented Jul 29, 2020

That's fixed now.

@mp911de mp911de closed this as completed Jul 29, 2020
@mindong789

When will 5.3.3 be released?

@mindong789

I pulled the code you pushed yesterday, but the problem of increasing the number of connections has not been resolved.

@mindong789

connected_clients:47 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0
connected_clients:48 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0
connected_clients:50 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0
connected_clients:53 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0
connected_clients:72 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0

@mp911de
Collaborator

mp911de commented Jul 30, 2020

Can you check the connected clients via CLIENT LIST to see how many of them are named lettuce#ClusterTopologyRefresh?
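
For example, a sketch (node address is illustrative) that counts them programmatically via Lettuce's CLIENT LIST support; it requires Java 11+ for String.lines():

// Sketch: count connections named lettuce#ClusterTopologyRefresh on one node.
// The node address is an example; adjust to your cluster.
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.api.StatefulRedisConnection;

public class TopologyRefreshConnectionCount {

    public static void main(String[] args) {
        RedisClient client = RedisClient.create(RedisURI.create("redis://172.25.2.23:6379"));
        try (StatefulRedisConnection<String, String> connection = client.connect()) {
            long refreshConnections = connection.sync().clientList().lines()
                    .filter(line -> line.contains("name=lettuce#ClusterTopologyRefresh"))
                    .count();
            System.out.println("topology refresh connections: " + refreshConnections);
        } finally {
            client.shutdown();
        }
    }
}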

@mindong789

I ran CLIENT LIST and the name= field is empty for those connections, but the IP is my application server's address.

@mp911de
Collaborator

mp911de commented Jul 30, 2020

Connections without a name are regular connections and not topology refresh connections. Please assign client names in your application to trace where they come from. If you feel there's an issue with the driver and you can make sure the issue isn't caused by your application then please file a new issue.
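
A minimal sketch of assigning a client name with plain Lettuce (host, port, and the name "my-app" are examples); newer Spring Data Redis versions also expose a clientName(...) option on the LettuceClientConfiguration builder for the same purpose:

// Sketch: give application connections a recognizable name so they can be
// traced in CLIENT LIST output. Host, port, and "my-app" are examples only.
import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.RedisClusterClient;

public class ClientNameSketch {

    public static void main(String[] args) {
        RedisURI uri = RedisURI.builder()
                .withHost("172.25.2.23")
                .withPort(6379)
                .withClientName("my-app") // appears as name=my-app in CLIENT LIST
                .build();

        RedisClusterClient client = RedisClusterClient.create(uri);
        // ... use the client; its regular connections carry the client name
        client.shutdown();
    }
}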

@tpf1994
Author

tpf1994 commented Aug 12, 2020

That's fixed now.

Hi, I upgraded my Lettuce client to 6.0.0 RC and it really works: no more duplicate topology refresh connections after more than a day.
One thing I want to confirm: does this problem (#1342) only occur when the cluster topology changes (such as a node failing or being added), as your "difference algorithm" explanation suggests?

@mp911de
Collaborator

mp911de commented Aug 12, 2020

Thanks for verifying the fix. The issue was caused by how the discovered set of Redis nodes was compared against the seed nodes: the seed node also appeared in the set of discovered nodes, which caused yet another connection to be opened. Since all connections are held in a map, the previously existing connection was overwritten in the map without being closed.
