
Connection leak using pingBeforeActivateConnection when PING fails #1262

Closed
johnny-costanzo opened this issue Apr 7, 2020 · 2 comments
Labels: type: bug (A general bug)
Milestone: 5.3.0

Comments

@johnny-costanzo

Bug Report

I recently enabled pingBeforeActivateConnection to avoid an issue where, when one of our Redis nodes restarts, Lettuce connects to the newly started node before it has finished loading the dataset from its master, resulting in LOADING exceptions. Since PING commands also fail in this scenario, I figured it would be a good solution.
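For reference, enabling it looks roughly like this (a minimal sketch; the URI is a placeholder and the rest of our client setup is omitted):

import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

RedisClusterClient client = RedisClusterClient.create(RedisURI.create("redis://10.0.0.1:6379"));
// Require a successful PING before the connection is handed out;
// a LOADING reply fails the activation instead of surfacing later.
client.setOptions(ClusterClientOptions.builder()
        .pingBeforeActivateConnection(true)
        .build());
StatefulRedisClusterConnection<String, String> connection = client.connect();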

I noticed today that immediately after one of our Redis nodes restarted (more specifically, the k8s pod was killed and came back up elsewhere with a new IP), a number of these stack traces appeared in my logs. That coincided with a higher-than-normal connection count to the node that had just come up, as reported by my Redis cluster's Prometheus metrics. Restarting my service brought the connection count back down to normal levels.

Is it possible that Lettuce does not properly handle the case where a connection to a Redis node is established successfully but then fails the "ping before activation" step? It should either close the connection then and there, or try to reuse it next time.

FWIW, I'm also doing something equivalent to this to get individual node connections:

myStatefulRedisClusterConnection.async().readonly(predicate).asMap()
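Spelled out a bit more, that amounts to the following (a sketch; the predicate here is hypothetical, ours selects nodes differently):

import java.util.function.Predicate;
import io.lettuce.core.cluster.models.partitions.RedisClusterNode;

// Hypothetical predicate: pick nodes by host prefix.
Predicate<RedisClusterNode> predicate = node -> node.getUri().getHost().startsWith("10.0.");

// readonly(predicate) binds a connection per matching node; asMap() exposes
// the per-node async command objects keyed by cluster node.
myStatefulRedisClusterConnection.async().readonly(predicate).asMap()
        .forEach((node, commands) -> System.out.println(node.getNodeId() + " -> " + commands));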

Current Behavior

Stack trace
java.util.concurrent.CompletionException: io.lettuce.core.RedisLoadingException: LOADING Redis is loading the dataset in memory
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
        at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
        at io.lettuce.core.AbstractRedisClient.lambda$null$3(AbstractRedisClient.java:356)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
        at io.lettuce.core.PlainChannelInitializer.lambda$pingBeforeActivate$3(PlainChannelInitializer.java:163)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
        at io.lettuce.core.protocol.AsyncCommand.doCompleteExceptionally(AsyncCommand.java:139)
        at io.lettuce.core.protocol.AsyncCommand.completeResult(AsyncCommand.java:120)
        at io.lettuce.core.protocol.AsyncCommand.complete(AsyncCommand.java:111)
        at io.lettuce.core.protocol.CommandHandler.complete(CommandHandler.java:652)
        at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:612)
        at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:563)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
        at io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:154)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:483)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:383)
        at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: io.lettuce.core.RedisLoadingException: LOADING Redis is loading the dataset in memory
        at io.lettuce.core.ExceptionFactory.createExecutionException(ExceptionFactory.java:132)
        at io.lettuce.core.ExceptionFactory.createExecutionException(ExceptionFactory.java:108)
        ... 23 more

Environment

  • Lettuce version(s): 5.2.0.RELEASE
johnny-costanzo added the type: bug (A general bug) label on Apr 7, 2020
mp911de added this to the 5.3.0 milestone on Apr 14, 2020
@mp911de (Collaborator) commented Apr 14, 2020

Thanks for the report. Indeed, Netty channels aren't closed when the connection handshake fails. Reconnects are not affected.

mp911de added a commit that referenced this issue Apr 14, 2020
Lettuce now closes the channel when the connection handshake fails.
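Illustratively, the fix amounts to this pattern (a minimal sketch built on the Netty and CompletableFuture APIs, not the actual patch; activateAfterHandshake is a hypothetical helper):

import java.util.concurrent.CompletableFuture;
import io.netty.channel.Channel;

// If the PING-before-activation handshake fails, close the Netty channel
// instead of leaving it open, then propagate the failure to the caller.
static <T> CompletableFuture<T> activateAfterHandshake(Channel channel,
        CompletableFuture<T> handshake) {
    CompletableFuture<T> activation = new CompletableFuture<>();
    handshake.whenComplete((result, error) -> {
        if (error != null) {
            channel.close();                        // don't leak the connection
            activation.completeExceptionally(error);
        } else {
            activation.complete(result);            // PING succeeded: activate
        }
    });
    return activation;
}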
@mp911de (Collaborator) commented Apr 14, 2020

That's fixed now.

mp911de closed this as completed on Apr 14, 2020
mp911de changed the title from "Connection leak using pingBeforeActivateConnection when PING fails?" to "Connection leak using pingBeforeActivateConnection when PING fails" on Apr 27, 2020