
Request queue size is not cleared on reconnect #616

Closed
nikolayk812 opened this issue Oct 3, 2017 · 12 comments
Labels
type: bug A general bug
Milestone

Comments

@nikolayk812

Lettuce 4.4.1 is used. After reconnecting to a restarted Redis, the request queue size is not cleared and all incoming requests are rejected.

java.util.concurrent.CompletionException: com.lambdaworks.redis.RedisException: Request queue size exceeded: 1000. Commands are not accepted until the queue size drops.
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
        at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:659)
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
        at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:800)
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
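
For context, the limit in the exception above corresponds to the requestQueueSize client option. Below is a minimal sketch of how that option is typically set on a Lettuce 4.x RedisClient; the URI and the value 1000 are placeholders, not values taken from this report.

import com.lambdaworks.redis.ClientOptions;
import com.lambdaworks.redis.RedisClient;

public class RequestQueueSizeConfig {

    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");

        // Commands issued while the connection is down are buffered up to
        // requestQueueSize; once that limit is reached, further commands are
        // rejected with "Request queue size exceeded".
        client.setOptions(ClientOptions.builder()
                .requestQueueSize(1000)
                .autoReconnect(true)
                .build());

        client.shutdown();
    }
}
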
@nikolayk812
Author

I was wrong, sorry. I just set the limit too low (1000), which is not enough for my case.

@mp911de
Collaborator

mp911de commented Oct 3, 2017

Let's keep that one open to make sure Lettuce behaves as it should and does not break in reconnect scenarios.

@mp911de mp911de reopened this Oct 3, 2017
@nikolayk812
Copy link
Author

@mp911de

com.lambdaworks.redis.RedisException: Request queue size exceeded: 500. Commands are not accepted until the queue size drops.

I get this exception not inside the CompletableFuture but in the caller thread, which is also strange behaviour.

@mp911de
Collaborator

mp911de commented Oct 5, 2017

That's the desired behavior to fail fast.
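
To illustrate that fail-fast behaviour, here is a sketch of how a caller can handle both failure paths; the helper name and key/value are made up for illustration and this is not a definitive description of the API.

import com.lambdaworks.redis.RedisException;
import com.lambdaworks.redis.RedisFuture;
import com.lambdaworks.redis.api.async.RedisAsyncCommands;

class FailFastSketch {

    // A full request queue surfaces as a synchronous RedisException in the
    // calling thread, before any future is returned; failures of the command
    // itself are reported through the returned future instead.
    static void setSafely(RedisAsyncCommands<String, String> commands, String key, String value) {
        try {
            RedisFuture<String> future = commands.set(key, value);
            future.exceptionally(throwable -> {
                // command-level failures (timeouts, server errors) end up here
                return null;
            });
        } catch (RedisException e) {
            // "Request queue size exceeded: ..." is thrown here when the queue is full
        }
    }
}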

@nikolayk812
Author

I will retest this issue with the released Lettuce 4.4.2 version and different requestQueue sizes.

@mp911de mp911de added this to the Lettuce 4.4.2 milestone Oct 6, 2017
mp911de added a commit that referenced this issue Oct 6, 2017
Lettuce now cancels overcommitted commands that exceed the queue size. Overcommitted commands can happen if commands were written to the protocol stack and a disconnect attempts to copy these commands to a smaller disconnected buffer.
@mp911de
Collaborator

mp911de commented Oct 6, 2017

I added a guard to cancel overcommitted commands – commands are cancelled if more are queued across the different buffers than the request queue size allows.
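
A rough sketch of that guard's idea follows – this is not the actual Lettuce code, and the class and method names are made up for illustration.

import java.util.Queue;

import com.lambdaworks.redis.protocol.RedisCommand;

class OvercommitGuardSketch {

    // If more commands end up queued across the buffers than the configured
    // request queue size allows, the surplus ("overcommitted") commands are
    // cancelled instead of being carried over.
    static void enforceLimit(Queue<RedisCommand<?, ?, ?>> disconnectedBuffer, int requestQueueSize) {
        while (disconnectedBuffer.size() > requestQueueSize) {
            RedisCommand<?, ?, ?> overcommitted = disconnectedBuffer.poll();
            if (overcommitted != null) {
                overcommitted.cancel();
            }
        }
    }
}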

@mp911de mp911de closed this as completed Oct 6, 2017
@nikolayk812
Author

nikolayk812 commented Oct 9, 2017

@mp911de
Looks like it is not fixed. I am using the Lettuce 4.4.2 snapshot from October 6th.
The request queue size is set to 50000. It takes Lettuce 6 minutes to clear the disconnected buffer, and it starts accepting new commands only after 14 minutes.

@nikolayk812
Author

nikolayk812 commented Oct 9, 2017

This stack trace basically consumes 100% CPU.

Stack Trace	Sample Count	Percentage(%)
com.lambdaworks.redis.protocol.AsyncCommand.equals(Object)	2 130	53,815
   java.util.concurrent.ArrayBlockingQueue.remove(Object)	2 130	53,815
      com.lambdaworks.redis.protocol.CommandHandler$ListenerSupport.dequeue(boolean)	2 130	53,815
         com.lambdaworks.redis.protocol.CommandHandler$RetryListener.operationComplete(Future)	2 130	53,815
            io.netty.util.concurrent.DefaultPromise.notifyListener0(Future, GenericFutureListener)	2 130	53,815
               io.netty.util.concurrent.DefaultPromise.notifyListenersNow()	2 130	53,815
                  io.netty.util.concurrent.DefaultPromise.notifyListeners()	2 130	53,815
                     io.netty.util.concurrent.DefaultPromise.tryFailure(Throwable)	2 130	53,815
                        io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(ChannelPromise, Throwable)	2 130	53,815
                           io.netty.channel.AbstractChannel$AbstractUnsafe.write(Object, ChannelPromise)	2 130	53,815
                              io.netty.channel.DefaultChannelPipeline$HeadContext.write(ChannelHandlerContext, Object, ChannelPromise)	2 130	53,815
                                 io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(Object, ChannelPromise)	2 130	53,815
                                    io.netty.channel.AbstractChannelHandlerContext.invokeWrite(Object, ChannelPromise)	2 130	53,815
                                       io.netty.channel.AbstractChannelHandlerContext.write(Object, boolean, ChannelPromise)	2 130	53,815
                                          io.netty.channel.AbstractChannelHandlerContext.write(Object, ChannelPromise)	2 130	53,815
                                             io.netty.channel.AbstractChannelHandlerContext.invokeWrite(Object, ChannelPromise)	2 130	53,815
                                                io.netty.channel.AbstractChannelHandlerContext.write(Object, boolean, ChannelPromise)	2 130	53,815
                                                   io.netty.channel.AbstractChannelHandlerContext.write(Object, ChannelPromise)	2 130	53,815
                                                      io.netty.channel.AbstractChannelHandlerContext.invokeWrite(Object, ChannelPromise)	2 130	53,815
                                                         io.netty.channel.AbstractChannelHandlerContext.write(Object, boolean, ChannelPromise)	2 130	53,815
                                                            io.netty.channel.AbstractChannelHandlerContext.write(Object, ChannelPromise)	2 130	53,815
                                                               io.netty.channel.AbstractChannelHandlerContext.invokeWrite(Object, ChannelPromise)	2 130	53,815
                                                                  io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext, Object, ChannelPromise)	2 130	53,815
                                                                     io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext, Object, ChannelPromise)	2 130	53,815
                                                                        io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext, Object, ChannelPromise)	2 130	53,815
                                                                           io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run()	2 130	53,815
                                                                              io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(long)	2 130	53,815
                                                                                 io.netty.channel.epoll.EpollEventLoop.run()	2 130	53,815
                                                                                    io.netty.util.concurrent.SingleThreadEventExecutor$2.run()	2 130	53,815
                                                                                       io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run()	2 130	53,815
                                                                                          java.lang.Thread.run()	2 130	53,815

@mp911de mp911de reopened this Oct 9, 2017
@mp911de
Collaborator

mp911de commented Oct 9, 2017

Thanks for reporting the issue. I pushed another change that does not require removal of the command from the queues but rather only enqueues the command to the stack if the write was successful. I deployed a snapshot of 4.4.2-SNAPSHOT (lettuce-4.4.2-20171010.063534-9.jar) to https://oss.sonatype.org/content/repositories/snapshots/. Care to give it a spin?

mp911de added a commit that referenced this issue Oct 9, 2017
Lettuce now appends written commands to the command stack using the write promise. Appending the command with the promise does not require command dequeueing on write failures but increases GC pressure.
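
A simplified sketch of that approach follows – attaching a listener to the write promise so the command is only put on the protocol stack after a successful write. The names are illustrative and this is not the actual CommandHandler code.

import java.util.ArrayDeque;

import com.lambdaworks.redis.protocol.RedisCommand;
import io.netty.channel.ChannelPromise;

class WritePromiseSketch {

    private final ArrayDeque<RedisCommand<?, ?, ?>> stack = new ArrayDeque<>();

    // The command is added to the stack only once the channel write has
    // succeeded, so there is nothing to dequeue when a write fails.
    void writeSingleCommand(RedisCommand<?, ?, ?> command, ChannelPromise promise) {
        promise.addListener(future -> {
            if (future.isSuccess()) {
                stack.add(command);
            } else {
                command.completeExceptionally(future.cause());
            }
        });
        // ... the actual channel write completes the promise ...
    }
}
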
@nikolayk812
Author

@mp911de, thanks for the quick fix. However, I cannot confirm that it solves the problem. The application gets stuck for 6 minutes before it can store to Redis again. I see these two similar stack traces in JMC.

Looks like CommandWrapper.unwrap is time-consuming and should be avoided.

Stack Trace	Sample Count	Percentage(%)
java.util.ArrayDeque.contains(Object)	736	48,042
   com.lambdaworks.redis.protocol.CommandHandler.addToStack(RedisCommand, ChannelPromise)	736	48,042
      com.lambdaworks.redis.protocol.CommandHandler.writeSingleCommand(ChannelHandlerContext, RedisCommand, ChannelPromise)	736	48,042
         com.lambdaworks.redis.protocol.CommandHandler.write(ChannelHandlerContext, Object, ChannelPromise)	736	48,042
            io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(Object, ChannelPromise)	736	48,042
               io.netty.channel.AbstractChannelHandlerContext.invokeWrite(Object, ChannelPromise)	645	42,102
                  io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext, Object, ChannelPromise)	645	42,102
                     io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext, Object, ChannelPromise)	645	42,102
                        io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext, Object, ChannelPromise)	645	42,102
                           io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run()	645	42,102
                              io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(long)	645	42,102
                                 io.netty.channel.epoll.EpollEventLoop.run()	645	42,102
                                    io.netty.util.concurrent.SingleThreadEventExecutor$2.run()	645	42,102
                                       io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run()	645	42,102
                                          java.lang.Thread.run()	645	42,102

Stack Trace	Sample Count	Percentage(%)
com.lambdaworks.redis.protocol.CommandWrapper.unwrap(RedisCommand)	710	46,345
   com.lambdaworks.redis.protocol.AsyncCommand.equals(Object)	710	46,345
      java.util.ArrayDeque.contains(Object)	710	46,345
         com.lambdaworks.redis.protocol.CommandHandler.addToStack(RedisCommand, ChannelPromise)	710	46,345
            com.lambdaworks.redis.protocol.CommandHandler.writeSingleCommand(ChannelHandlerContext, RedisCommand, ChannelPromise)	710	46,345
               com.lambdaworks.redis.protocol.CommandHandler.write(ChannelHandlerContext, Object, ChannelPromise)	710	46,345
                  io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(Object, ChannelPromise)	710	46,345
                     io.netty.channel.AbstractChannelHandlerContext.invokeWrite(Object, ChannelPromise)	649	42,363
                        io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext, Object, ChannelPromise)	649	42,363
                           io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext, Object, ChannelPromise)	649	42,363
                              io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext, Object, ChannelPromise)	649	42,363
                                 io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run()	649	42,363
                                    io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(long)	649	42,363
                                       io.netty.channel.epoll.EpollEventLoop.run()	649	42,363
                                          io.netty.util.concurrent.SingleThreadEventExecutor$2.run()	649	42,363
                                             io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run()	649	42,363
                                                java.lang.Thread.run()	649	42,363
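
What these profiles show, roughly: ArrayDeque.contains – like ArrayBlockingQueue.remove in the earlier trace – is a linear scan, and each element comparison goes through AsyncCommand.equals and CommandWrapper.unwrap. With around 50000 buffered commands, every write can pay up to 50000 comparisons, which would explain why draining the buffer takes minutes. A back-of-the-envelope sketch of that cost, using plain integers instead of commands:

import java.util.ArrayDeque;

class DuplicateCheckCost {

    public static void main(String[] args) {
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        long comparisons = 0;

        // One contains() per write scans the whole deque when no match exists,
        // so N writes cost about N^2 / 2 element comparisons in total.
        for (int i = 0; i < 50_000; i++) {
            comparisons += stack.size();
            if (!stack.contains(i)) {
                stack.add(i);
            }
        }

        System.out.println("element comparisons: " + comparisons); // ~1.25 billion
    }
}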

@nikolayk812
Author

Btw, did commit 5463f78 make it into lettuce-4.4.2-20171010.063534-9.jar?

@mp911de
Collaborator

mp911de commented Oct 10, 2017

Yes, that particular commit was part of the mentioned snapshot build – I did the deployment locally and have not cherry-picked the change to the affected branches yet.

CommandWrapper.unwrap(…) is the last segment in a hot code path and therefore gets all the blame.

The duplicate command check is in place for a good reason. I'm not quite sure how to proceed from here. The stacks from above relate to single-command dispatching and are not related to the original rebuildQueue/activation issue anymore.

Again, without a reproducible test case it's not possible to comprehend what's going on and where the actual issue happens. The original issue is solved and I'm going to merge https://github.com/lettuce-io/lettuce-core/tree/issue/616 to 4.4.2 and 4.5.

mp911de added a commit that referenced this issue Oct 11, 2017
Lettuce now appends written commands to the command stack using the write promise. Appending the command with the promise does not require command dequeueing on write failures but increases GC pressure.
@mp911de mp911de closed this as completed Oct 11, 2017