Connection pool deadlock #531

Closed
DarkSeraphim opened this issue Apr 30, 2017 · 6 comments
Labels
type: bug A general bug
Milestone
Lettuce 4.3.3
Comments

@DarkSeraphim

Once the GenericObjectPool is drained, it blocks and waits for a connection to become available (the default behaviour, which is also something I'd like to keep). The problem is that the wait never ends: the borrow happens inside a synchronized method you override, which also blocks any returnObject calls that would normally free a resource and end the wait.

To reproduce: gist
Version: current 5.x branch
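
The linked gist isn't reproduced here, but a minimal sketch of the reported pattern looks roughly like this, assuming Apache Commons Pool 2; the class and variable names are illustrative, not the actual Lettuce code:

import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.PooledObjectFactory;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

// Hypothetical subclass mirroring the problematic pattern: borrowObject and
// returnObject are overridden as synchronized, so both compete for the pool
// instance's monitor.
class SynchronizedPool<T> extends GenericObjectPool<T> {

    SynchronizedPool(PooledObjectFactory<T> factory, GenericObjectPoolConfig config) {
        super(factory, config);
    }

    @Override
    public synchronized T borrowObject() throws Exception {
        // With blockWhenExhausted (the default) this waits for a returned object
        // while still holding the instance monitor.
        return super.borrowObject();
    }

    @Override
    public synchronized void returnObject(T obj) {
        // Cannot be entered while another thread is parked in borrowObject above.
        super.returnObject(obj);
    }
}

public class DeadlockDemo {

    public static void main(String[] args) throws Exception {
        GenericObjectPoolConfig config = new GenericObjectPoolConfig();
        config.setMaxTotal(1); // a single "connection", so one borrow drains the pool

        SynchronizedPool<String> pool = new SynchronizedPool<>(new BasePooledObjectFactory<String>() {
            @Override
            public String create() {
                return "connection";
            }

            @Override
            public PooledObject<String> wrap(String value) {
                return new DefaultPooledObject<>(value);
            }
        }, config);

        String connection = pool.borrowObject(); // drains the pool

        Thread borrower = new Thread(() -> {
            try {
                pool.borrowObject(); // grabs the monitor, then waits for a returned object
            } catch (Exception ignored) {
            }
        });
        borrower.start();

        Thread.sleep(500);             // give the borrower time to acquire the monitor
        pool.returnObject(connection); // blocks on the monitor held by the borrower: deadlock
        System.out.println("never reached");
    }
}

With maxTotal = 1 and the default blockWhenExhausted behaviour, the second borrower parks inside borrowObject while owning the pool's monitor, so the main thread's returnObject can never run and both threads wait forever.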

@mp911de mp911de added the type: bug A general bug label Apr 30, 2017
@mp911de mp911de added this to the Lettuce 4.3.3 milestone Apr 30, 2017
mp911de added a commit that referenced this issue Apr 30, 2017
The borrowObject and returnObject methods are no longer synchronized; they rely on the pool's own synchronization. This resolves a deadlock in which the pool is exhausted, a new connection is requested, and a connection cannot be returned to the pool because of instance-level locking.

Previously, if a client got stuck in the borrowObject method (pool exhausted, long connect duration) and a concurrent thread tried to return the connection via returnObject, the second thread was blocked by the first. Synchronized methods use the object instance as the synchronization lock; the instance was locked by the first thread, so the second thread had to wait indefinitely for a release that would likely never happen.
@mp911de
Collaborator

mp911de commented Apr 30, 2017

There's no need to have borrowObject and returnObject synchronized; the underlying pool implementations handle synchronization properly. Thanks for the bug report, that's fixed now.
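
For reference, the fixed shape is roughly the following sketch, again with illustrative names and the same Commons Pool 2 types as above rather than the actual Lettuce code: any overrides stay unsynchronized and delegate to the pool's own internal locking.

import org.apache.commons.pool2.PooledObjectFactory;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

// Without the synchronized keyword, a thread parked in borrowObject no longer owns
// the pool instance's monitor, so a concurrent returnObject can complete and wake it up.
class PlainPool<T> extends GenericObjectPool<T> {

    PlainPool(PooledObjectFactory<T> factory, GenericObjectPoolConfig config) {
        super(factory, config);
    }

    @Override
    public T borrowObject() throws Exception {
        // Decorate or wrap the borrowed object here if needed, but keep the
        // blocking call free of instance-level locking.
        return super.borrowObject();
    }

    @Override
    public void returnObject(T obj) {
        super.returnObject(obj);
    }
}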

@mp911de mp911de closed this as completed Apr 30, 2017
@qwqmobile

Hi, could you please confirm whether the following bug is the same as #531.
There are about 100 Redis clusters. After upgrading Lettuce from 4.2.2.Final to 4.4.3.Final, we got RedisCommandTimeoutException and the server got stuck waiting on currentChannel.pipeline().fireUserEventTriggered(new ConnectionEvents.PrepareClose()) and currentChannel.pipeline().fireUserEventTriggered(new ConnectionEvents.Close()).

The dumped stack is as follows:
"ClusterRebalancer-0" #71 prio=5 os_prio=0 tid=0x00007fdf7fc36800 nid=0x3ea1 in Object.wait() [0x00007fdf241c0000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:274)
- locked <0x00000007606b00f8> (a io.netty.channel.DefaultChannelPromise)
at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:137)
at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:30)
at io.netty.util.concurrent.DefaultPromise.syncUninterruptibly(DefaultPromise.java:410)
at io.netty.channel.DefaultChannelPromise.syncUninterruptibly(DefaultChannelPromise.java:125)
at io.netty.channel.DefaultChannelPromise.syncUninterruptibly(DefaultChannelPromise.java:30)
at com.lambdaworks.redis.protocol.CommandHandler.close(CommandHandler.java:1110)
at com.lambdaworks.redis.cluster.ClusterNodeCommandHandler.close(ClusterNodeCommandHandler.java:101)
at com.lambdaworks.redis.RedisChannelHandler.close(RedisChannelHandler.java:106)
- locked <0x00000007606b0120> (a com.lambdaworks.redis.StatefulRedisConnectionImpl)
at com.lambdaworks.redis.cluster.topology.Connections.close(Connections.java:117)
at com.lambdaworks.redis.cluster.topology.ClusterTopologyRefresh.loadViews(ClusterTopologyRefresh.java:102)
at com.lambdaworks.redis.cluster.RedisClusterClient.loadPartitions(RedisClusterClient.java:850)
at com.lambdaworks.redis.cluster.RedisClusterClient.initializePartitions(RedisClusterClient.java:824)
at com.lambdaworks.redis.cluster.RedisClusterClient.connectClusterImpl(RedisClusterClient.java:570)
at com.lambdaworks.redis.cluster.RedisClusterClient.connect(RedisClusterClient.java:356)
at com.lambdaworks.redis.cluster.RedisClusterClient.connect(RedisClusterClient.java:333)
at com.amazon.redisnanny.admin.ClusterRebalancer.rebalanceAndCleanUpCluster(ClusterRebalancer.java:179)
at com.amazon.redisnanny.admin.ClusterRebalancer.rebalanceCluster(ClusterRebalancer.java:117)
at com.amazon.redisnanny.admin.ClusterRebalancer$$Lambda$19/221033168.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Could you give me some advice?
Thanks in advance.

@mp911de
Collaborator

mp911de commented Dec 6, 2019

The 4.x line is no longer maintained. Please upgrade to 5.2.1.RELEASE.

@qwqmobile

OK, thanks.

@qwqmobile

Now I have upgraded to 5.0.0.RELEASE and deployed it to three services, each with six clusters, running for about a week. Two of the services got the RedisCommandTimeoutException; the dumped stack is as follows:
"ClusterRebalancer-0" #70 prio=5 os_prio=0 tid=0x00007f22bb280000 nid=0x5762 in Object.wait() [0x00007f22577f5000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:274)
- locked <0x0000000736c20fe0> (a io.netty.channel.DefaultChannelPromise)
at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:137)
at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:30)
at io.netty.util.concurrent.DefaultPromise.syncUninterruptibly(DefaultPromise.java:410)
at io.netty.channel.DefaultChannelPromise.syncUninterruptibly(DefaultChannelPromise.java:125)
at io.netty.channel.DefaultChannelPromise.syncUninterruptibly(DefaultChannelPromise.java:30)
at io.lettuce.core.protocol.DefaultEndpoint.close(DefaultEndpoint.java:484)
at io.lettuce.core.cluster.ClusterNodeEndpoint.close(ClusterNodeEndpoint.java:70)
at io.lettuce.core.RedisChannelHandler.close(RedisChannelHandler.java:111)
- locked <0x0000000736c21058> (a io.lettuce.core.StatefulRedisConnectionImpl)
at io.lettuce.core.cluster.topology.Connections.close(Connections.java:117)
at io.lettuce.core.cluster.topology.ClusterTopologyRefresh.loadViews(ClusterTopologyRefresh.java:101)
at io.lettuce.core.cluster.RedisClusterClient.loadPartitions(RedisClusterClient.java:787)
at io.lettuce.core.cluster.RedisClusterClient.initializePartitions(RedisClusterClient.java:761)
at io.lettuce.core.cluster.RedisClusterClient.connectClusterImpl(RedisClusterClient.java:500)
at io.lettuce.core.cluster.RedisClusterClient.connect(RedisClusterClient.java:339)
at io.lettuce.core.cluster.RedisClusterClient.connect(RedisClusterClient.java:316)
at com.amazon.redisnanny.admin.ClusterRebalancer.rebalanceAndCleanUpCluster(ClusterRebalancer.java:179)
at com.amazon.redisnanny.admin.ClusterRebalancer.rebalanceCluster(ClusterRebalancer.java:117)
at com.amazon.redisnanny.admin.ClusterRebalancer$$Lambda$19/385041270.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

"RedisHealthMonitor-0" #69 prio=5 os_prio=0 tid=0x00007f22bb283800 nid=0x5761 in Object.wait() [0x00007f22578f7000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:274)
- locked <0x0000000736c16e30> (a io.netty.channel.DefaultChannelPromise)
at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:137)
at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:30)
at io.netty.util.concurrent.DefaultPromise.syncUninterruptibly(DefaultPromise.java:410)
at io.netty.channel.DefaultChannelPromise.syncUninterruptibly(DefaultChannelPromise.java:125)
at io.netty.channel.DefaultChannelPromise.syncUninterruptibly(DefaultChannelPromise.java:30)
at io.lettuce.core.protocol.DefaultEndpoint.close(DefaultEndpoint.java:484)
at io.lettuce.core.RedisChannelHandler.close(RedisChannelHandler.java:111)
- locked <0x0000000736c16e58> (a io.lettuce.core.StatefulRedisConnectionImpl)
at com.amazon.redisnanny.admin.RedisHealthMonitor.metricsCollect(RedisHealthMonitor.java:163)
at com.amazon.redisnanny.admin.RedisHealthMonitor$$Lambda$18/265644189.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

"RedisConsistencyMonitor-0" #68 prio=5 os_prio=0 tid=0x00007f22bb281000 nid=0x5760 in Object.wait() [0x00007f22579f8000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:274)
- locked <0x0000000736c17030> (a io.netty.channel.DefaultChannelPromise)
at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:137)
at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:30)
at io.netty.util.concurrent.DefaultPromise.syncUninterruptibly(DefaultPromise.java:410)
at io.netty.channel.DefaultChannelPromise.syncUninterruptibly(DefaultChannelPromise.java:125)
at io.netty.channel.DefaultChannelPromise.syncUninterruptibly(DefaultChannelPromise.java:30)
at io.lettuce.core.protocol.DefaultEndpoint.close(DefaultEndpoint.java:484)
at io.lettuce.core.RedisChannelHandler.close(RedisChannelHandler.java:111)
- locked <0x0000000736c17058> (a io.lettuce.core.StatefulRedisConnectionImpl)
at com.amazon.redisnanny.utils.RedisCommandsUtils.closeConnections(RedisCommandsUtils.java:76)
at com.amazon.redisnanny.provider.LettuceClient.getClusterTopology(LettuceClient.java:70)
at com.amazon.redisnanny.provider.ReplicationInfoCollector.getReplicationInfo(ReplicationInfoCollector.java:35)
at com.amazon.redisnanny.admin.RedisConsistencyMonitor.metricsCollect(RedisConsistencyMonitor.java:136)
at com.amazon.redisnanny.admin.RedisConsistencyMonitor$$Lambda$17/972335044.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Could you please give me some advice? Thanks in advance.

@qwqmobile

Could you please tell me whether Lettuce 4.4 or 5.0 is compatible with Redis 3.2.8?
Thank you in advance.


3 participants