DefaultEndpoint.QUEUE_SIZE becomes out of sync, preventing command queueing #764
Thanks a lot for your extensive bug report. I fixed the println issue via #765. I need to have a look at what happens with the queue size.
Just a quick note that the
This issue arises when n commands are written to the transport (where n is the queue limit) and an additional n commands are written while the previous commands have not yet completed. This fits the description, where a long-running action causes commands to pile up.
After further analysis, this bug is a consequence of the retry mechanism combined with the queue limit protection. Writing a command to a queue that is full completes the command exceptionally and throws an exception, yet the command still gets into the event loop. There are two things to do:
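The failure mode described above can be sketched with a toy counter. This is a hypothetical simplification, not the actual lettuce internals: every accepted write increments a shared queue-size counter, but a write whose completion never fires skips the matching decrement, so the counter drifts upward until the limit check rejects everything.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy sketch (hypothetical names, not the lettuce code) of a bounded
// queue guard whose counter leaks: every accepted write increments the
// counter, but a write whose completion never fires skips the decrement.
class QueueGuardSketch {

    private final AtomicInteger queueSize = new AtomicInteger();
    private final int requestQueueSize;

    QueueGuardSketch(int requestQueueSize) {
        this.requestQueueSize = requestQueueSize;
    }

    /** Returns true if the command was accepted for writing. */
    boolean write(boolean completionFires) {
        if (queueSize.get() >= requestQueueSize) {
            return false; // guard rejects: queue considered full
        }
        queueSize.incrementAndGet();
        if (completionFires) {
            queueSize.decrementAndGet(); // normal path: completion decrements
        }
        // leaked path: completion never fires, so the counter stays
        // inflated even though nothing is actually queued anymore
        return true;
    }

    int size() {
        return queueSize.get();
    }
}
```

With a limit of 5, five leaked writes pin the counter at 5, after which even well-behaved writes are rejected forever, matching the "stuck at 5" state in the report.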
This lines up with what I had been seeing. I had thought about not using
CommandHandler now completes a write promise if the write completes without actually writing a message. Previously, no-op writes did not complete and rendered an invalid state (e.g. queue size was not decremented).
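The idea behind the fix note above can be illustrated with a small stand-in (using `CompletableFuture` in place of Netty's write promise; the names here are hypothetical): the queue-size decrement is attached to the write promise as a completion listener, so completing the promise even for no-op writes is what restores the counter.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in sketch (not the actual lettuce code): the queue-size decrement
// hangs off the write promise, so the promise must be completed on every
// path, including no-op writes that send nothing to the channel.
class FixSketch {

    final AtomicInteger queueSize = new AtomicInteger();

    CompletableFuture<Void> write(boolean messageActuallyWritten,
                                  boolean completeNoOpWrites) {
        queueSize.incrementAndGet();
        CompletableFuture<Void> promise = new CompletableFuture<>();
        // the decrement only runs when the promise completes
        promise.whenComplete((v, t) -> queueSize.decrementAndGet());
        if (messageActuallyWritten || completeNoOpWrites) {
            promise.complete(null);
        }
        // without the fix (completeNoOpWrites == false), a no-op write
        // leaves the promise pending and the counter never comes back down
        return promise;
    }
}
```

The design point is that correctness of the counter depends entirely on the promise completing exactly once per write, which is why a no-op write that silently skips completion corrupts the accounting.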
That's fixed now. Snapshots are available. Care to give
This looks great on my end, at least in my cobbled-together reproduction environment. I'll see if I can get this tested on production hardware later today, but I expect it to pass there as well. Thanks!
Cool, thanks a lot. Closing this one as resolved. Feel free to reopen the issue if the problem persists.
Do you have an ETA on when you'll be releasing 5.0.4 with this fix?
I added release dates to the milestones.
Awesome, thanks.
Observed Version(s): 5.0.3.RELEASE
Introduced in Version(s): 4.4.0.Final
Still visible in master? Unknown but likely
Expected:
When the request queue size limit is hit, submitted commands are terminated early. When the request queue drains, new commands are once again accepted.
Actual:
I'm still in the process of determining exactly what is happening here, but what I'm observing is that when a Redis instance is performing a task that blocks the foreground thread for a substantial amount of time (seconds up to minutes; details on how to do this below), the `DefaultEndpoint` can become wedged in a state where `QUEUE_SIZE` is stuck at a non-zero value. If this value is greater than `clientOptions.getRequestQueueSize() - command`, `validateWrite` will never again validate any writes submitted to it.

To Reproduce:
Using the setup shown below, connect to Redis and verify that commands are processed correctly. Then submit a Redis `save` command, and while that save is running (that's why we use a large list, but there are other ways to replicate this), submit more than `requestQueueSize` requests.

Performing the above, and then waiting for the `save` command to complete, results in the log file:

After the `save` operation has completed, submitting a single followup request results in the log file:

As shown in the log file, the value of `QUEUE_SIZE` is now stuck at 5 and above. Given a second cycle, this connection would become entirely unresponsive. By using a debugger, one can manually set this value to zero and verify that everything once again works correctly.

Speculation:
I believe the `dequeue` command is never called due, in some part, to the following error, but I currently haven't tracked down the exact flow that results in this case:

Setup:
Redis:
Client:
P.S. This `println` also seems removable:
also seems removable:The text was updated successfully, but these errors were encountered: