Lingering topology refresh connections when using dynamic refresh sources #1342

Closed
tpf1994 opened this issue Jul 14, 2020 · 16 comments
Labels
type: bug A general bug
Milestone

Comments

@tpf1994

tpf1994 commented Jul 14, 2020

Bug Report

Current Behavior

Our Redis Cluster nodes hold more than one topology refresh connection each, and the number increases over time; eventually a single node had over 10,000 connections from our apps.

This is different from what the wiki describes:
Apart of connection objects, RedisClusterClient uses additional connections for topology refresh. These are created on topology refresh and closed after obtaining the topology:
Set of connections for cluster topology refresh (a connection to each cluster node)

exec "client list" on one redis server among 6 node,the connection with my app is over 10 in one day;
id=394808 addr=172.25.2.23:10751 fd=71 name=lettuce#ClusterTopologyRefresh age=7410 idle=7350 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=394645 addr=172.25.2.23:10468 fd=24 name=lettuce#ClusterTopologyRefresh age=7534 idle=7470 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=402712 addr=172.25.2.23:5076 fd=83 name=lettuce#ClusterTopologyRefresh age=1282 idle=1222 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=397139 addr=172.25.2.23:13189 fd=25 name=lettuce#ClusterTopologyRefresh age=5609 idle=5548 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=395582 addr=172.25.2.23:11557 fd=22 name=lettuce#ClusterTopologyRefresh age=6809 idle=6749 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=400244 addr=172.25.2.23:2644 fd=72 name=lettuce#ClusterTopologyRefresh age=3205 idle=3145 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=400088 addr=172.25.2.23:2430 fd=74 name=lettuce#ClusterTopologyRefresh age=3325 idle=3265 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=403560 addr=172.25.2.23:5897 fd=86 name=lettuce#ClusterTopologyRefresh age=621 idle=561 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=403794 addr=172.25.2.23:6117 fd=59 name=lettuce#ClusterTopologyRefresh age=441 idle=321 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client
id=396361 addr=172.25.2.23:12345 fd=45 name=lettuce#ClusterTopologyRefresh age=6209 idle=6149 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=client

Configuration

private LettuceClientConfiguration lettuceClientConfiguration() {
    return LettucePoolingClientConfiguration.builder()
            .poolConfig(genericObjectPoolConfig())
            .clientOptions(ClusterClientOptions.builder()
                    .topologyRefreshOptions(refreshOptions())
                    .socketOptions(SocketOptions.builder()
                            .keepAlive(true)
                            .tcpNoDelay(true)
                            .build())
                    .build())
            .build();
}

private ClusterTopologyRefreshOptions refreshOptions() {
    return ClusterTopologyRefreshOptions.builder()
            .enablePeriodicRefresh(Duration.ofSeconds(60))
            .enableAllAdaptiveRefreshTriggers()
            .build();
}
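
For reference, here is a sketch (assuming Spring Data Redis, with illustrative seed node addresses) of how a LettuceClientConfiguration like the one above is typically wired into a LettuceConnectionFactory; it assumes the lettuceClientConfiguration() method shown above lives in the same configuration class:

// Sketch, assuming Spring Data Redis; seed node addresses are examples only.
import java.util.Arrays;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisClusterConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@Configuration
public class RedisClusterConfig {

    @Bean
    public LettuceConnectionFactory redisConnectionFactory() {
        RedisClusterConfiguration clusterConfiguration = new RedisClusterConfiguration(
                Arrays.asList("172.25.2.23:6379", "172.25.2.24:6379", "172.25.2.25:6379"));
        // lettuceClientConfiguration() is the method shown above in this report
        return new LettuceConnectionFactory(clusterConfiguration, lettuceClientConfiguration());
    }
}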

Environment

  • Lettuce version(s): [5.0.4.RELEASE]
  • Redis version: [3.2.12]
@tpf1994 tpf1994 added the type: bug A general bug label Jul 14, 2020
@mp911de
Collaborator

mp911de commented Jul 14, 2020

Which Lettuce version are you using?

@tpf1994
Author

tpf1994 commented Jul 14, 2020

Which Lettuce version are you using?

5.0.4.RELEASE

@mp911de
Collaborator

mp911de commented Jul 14, 2020

Please upgrade to a newer one as we had issues with lingering connections during topology refresh.

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Jul 14, 2020
@arpangupta81

@mp911de Can you please suggest the versions in which the lingering connections issue was resolved?
Since we can't upgrade our Netty version, 5.3.1.RELEASE and 5.3.0.RELEASE are not compatible with our older Netty (4.1.36.Final).
Could you let us know whether the lingering connections issue was resolved by 5.2.2.RELEASE? I went through the release notes but couldn't find anything about it.

@mp911de
Collaborator

mp911de commented Jul 15, 2020

The issue was #721, which was actually shipped with 5.0.3.RELEASE. In that case, it seems there could be another leak.

@mp911de mp911de removed the status: waiting-for-feedback We need additional information before we can continue label Jul 15, 2020
@mp911de
Collaborator

mp911de commented Jul 15, 2020

Looking at idle vs. age times, they are roughly 60 seconds apart. 60 seconds is the default command timeout, which is also used during topology refresh. Can you check your logs for any exceptions that could help find the root cause?
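
For context, a small sketch (host and values are illustrative) showing Lettuce's default command timeout on RedisURI, which matches the roughly 60-second gap noted above, and how it can be set explicitly:

// Sketch: the 60-second figure matches Lettuce's default command timeout on
// RedisURI. Host/port and the 10-second value are examples only.
import java.time.Duration;
import io.lettuce.core.RedisURI;

public class TimeoutSketch {

    public static void main(String[] args) {
        RedisURI defaultUri = RedisURI.create("redis://172.25.2.23:6379");
        System.out.println(defaultUri.getTimeout()); // PT60S by default

        RedisURI tunedUri = RedisURI.builder()
                .withHost("172.25.2.23")
                .withPort(6379)
                .withTimeout(Duration.ofSeconds(10)) // explicit command timeout
                .build();
        System.out.println(tunedUri.getTimeout()); // PT10S
    }
}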

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Jul 28, 2020
@mp911de mp911de changed the title Connections too much for Redis Cluster topology refresh Lingering topology refresh connections after connect failure Jul 29, 2020
@mp911de mp911de removed the status: waiting-for-feedback We need additional information before we can continue label Jul 29, 2020
@mp911de mp911de changed the title Lingering topology refresh connections after connect failure Lingering topology refresh connections when using dynamic refresh sources Jul 29, 2020
@mp911de
Collaborator

mp911de commented Jul 29, 2020

I can confirm that I can reproduce the issue. It is related to dynamic refresh sources and happens when a connection to a previous seed node is attempted again because the node was wrongly identified as a new node. Connections are stored in a map, so the existing connection gets overwritten by the new one and the old connection is never closed.
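
To make the failure mode concrete, here is a minimal, simplified sketch of the pattern described above; it is an illustration only, not the actual Lettuce internals:

// Simplified illustration of the leak (not the actual Lettuce code):
// connections are keyed by node in a map, and re-connecting to a node that
// was wrongly classified as "new" replaces the map entry without closing
// the previous connection, which then lingers on the server.
import java.util.HashMap;
import java.util.Map;

public class RefreshConnectionLeakSketch {

    static class NodeConnection implements AutoCloseable {
        final String node;
        NodeConnection(String node) { this.node = node; }
        @Override public void close() { System.out.println("closed " + node); }
    }

    public static void main(String[] args) {
        Map<String, NodeConnection> connections = new HashMap<>();

        // First refresh: connect to the seed node.
        connections.put("172.25.2.23:6379", new NodeConnection("172.25.2.23:6379"));

        // Later refresh: the same seed node is mis-identified as a new node,
        // a second connection is opened, and put() drops the first entry
        // without ever calling close() on it -> lingering connection.
        connections.put("172.25.2.23:6379", new NodeConnection("172.25.2.23:6379"));

        // The guard the fix ensures, conceptually: only connect to nodes
        // that are not already connected.
        connections.computeIfAbsent("172.25.2.23:6379", NodeConnection::new);
    }
}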

@mp911de mp911de added this to the 5.3.3 milestone Jul 29, 2020
mp911de added a commit that referenced this issue Jul 29, 2020
We now ensure that we don't create duplicate connections during topology refresh. Previously, the set difference algorithm reported differences for items that weren't different and so the dynamic refresh sources setting caused duplicate connections to nodes that were already connected. Since connections are held in a map, the new connection object overwrote the previous one which was left open.
mp911de added a commit that referenced this issue Jul 29, 2020
We now ensure that we don't create duplicate connections during topology refresh. Previously, the set difference algorithm reported differences for items that weren't different and so the dynamic refresh sources setting caused duplicate connections to nodes that were already connected. Since connections are held in a map, the new connection object overwrote the previous one which was left open.
@mp911de
Collaborator

mp911de commented Jul 29, 2020

That's fixed now.

@mp911de mp911de closed this as completed Jul 29, 2020
@mindong789

When will 5.3.3 be released?

@mindong789

I pulled the code you pushed yesterday, but the problem of increasing the number of connections has not been resolved.

@mindong789

connected_clients:47 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0
connected_clients:48 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0
connected_clients:50 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0
connected_clients:53 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0
connected_clients:72 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0

@mp911de
Collaborator

mp911de commented Jul 30, 2020

Can you check the connected clients via CLIENT LIST to see how many of them are named lettuce#ClusterTopologyRefresh?
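
For example, a sketch (node address is illustrative) that counts them programmatically via Lettuce's CLIENT LIST support; it requires Java 11+ for String.lines():

// Sketch: count connections named lettuce#ClusterTopologyRefresh on one node.
// The node address is an example; adjust to your cluster.
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.api.StatefulRedisConnection;

public class TopologyRefreshConnectionCount {

    public static void main(String[] args) {
        RedisClient client = RedisClient.create(RedisURI.create("redis://172.25.2.23:6379"));
        try (StatefulRedisConnection<String, String> connection = client.connect()) {
            long refreshConnections = connection.sync().clientList().lines()
                    .filter(line -> line.contains("name=lettuce#ClusterTopologyRefresh"))
                    .count();
            System.out.println("topology refresh connections: " + refreshConnections);
        } finally {
            client.shutdown();
        }
    }
}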

@mindong789

I ran CLIENT LIST and the name= field is empty for those connections, but the IP is my application server's address.

@mp911de
Collaborator

mp911de commented Jul 30, 2020

Connections without a name are regular connections and not topology refresh connections. Please assign client names in your application to trace where they come from. If you feel there's an issue with the driver and you can make sure the issue isn't caused by your application then please file a new issue.
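
A minimal sketch of assigning a client name with plain Lettuce (host, port, and the name "my-app" are examples); newer Spring Data Redis versions also expose a clientName(...) option on the LettuceClientConfiguration builder for the same purpose:

// Sketch: give application connections a recognizable name so they can be
// traced in CLIENT LIST output. Host, port, and "my-app" are examples only.
import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.RedisClusterClient;

public class ClientNameSketch {

    public static void main(String[] args) {
        RedisURI uri = RedisURI.builder()
                .withHost("172.25.2.23")
                .withPort(6379)
                .withClientName("my-app") // appears as name=my-app in CLIENT LIST
                .build();

        RedisClusterClient client = RedisClusterClient.create(uri);
        // ... use the client; its regular connections carry the client name
        client.shutdown();
    }
}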

@tpf1994
Author

tpf1994 commented Aug 12, 2020

That's fixed now.

Hi, I upgraded my Lettuce client to 6.0.0 RC and it really works: no more duplicate topology refresh connections after more than a day.
One thing I want to confirm: does this problem (#1342) only occur when the cluster topology changes (such as a node failing or being added), as your "difference algorithm" explanation suggests?

@mp911de
Collaborator

mp911de commented Aug 12, 2020

Thanks for verifying the fix. The issue was caused by how the discovered set of Redis nodes was compared against the seed nodes: the seed node also appeared in the set of discovered nodes, which caused yet another connection to be opened. Since all connections are held in a map, the previously existing connection was overwritten in the map without being closed.
