Unnecessary copying of byteBuf in CommandHandler.decode() #725
Thanks a lot for digging into that. We have a JMH test suite. Care to include a JMH benchmark along with your PR #726?
Sure, will add it soon.
I dug a bit deeper into this case and figured out that I had not fully understood the cause of the issue in the first place. More precisely, it is not true that there is one big reply which is fully loaded into a buffer before being decoded. Here is what really happens:
There is just one thing that bothers me: is it possible that all …
Not exactly sure about your analysis.
Netty stores outbound writes in … A read and a write cannot happen concurrently (that was different for Netty 3). The channel's bound thread can handle only a single task at a time: it is either reading or writing. Since Redis does not (should not?) reply to commands that are currently being sent, we're on the safe side. Yet it bothers me, because a freeze must not happen. Interesting point, however.
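To illustrate the serialization point, here is a minimal, hypothetical handler (not part of lettuce; all names are made up) that logs which thread handles inbound and outbound events. In Netty 4, both end up on the channel's single event-loop thread, so a read and a write for the same channel cannot interleave:

```java
import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelPromise;

// Hypothetical helper, not part of lettuce: logs which thread runs reads and writes.
public class EventLoopThreadLogger extends ChannelDuplexHandler {

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        // Inbound events are dispatched on the channel's event loop.
        System.out.println("read  on " + Thread.currentThread().getName()
                + ", inEventLoop=" + ctx.executor().inEventLoop());
        super.channelRead(ctx, msg);
    }

    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) throws Exception {
        // Writes issued from other threads are handed over to the same event loop
        // before this handler sees them, so reads and writes never run concurrently.
        System.out.println("write on " + Thread.currentThread().getName()
                + ", inEventLoop=" + ctx.executor().inEventLoop());
        super.write(ctx, msg, promise);
    }
}
```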
Just to clarify: I'm not sure I got the whole flow right (I'm not a Netty expert :)), but I'm sure the growth of the buffer is caused because …
Regarding 2.: Sorry, I forgot to mention I was talking about batch writes. My point was that before the whole batch is successfully written, Redis can already reply to the commands from the beginning of the batch. Here is another hypothesis of mine (after glancing over …
WDYT, is this scenario possible?
You're absolutely right; I didn't consider this aspect, and Redis can indeed start answering while the last command isn't yet written. The last command in the write will trigger another response, which then initiates decoding of the whole batch. I haven't looked into the event-loop implementation, but your description sounds reasonable. Right now I see the following possibilities to address that loophole:
…dler's buffer Fixes redis#725.
I ran the benchmark (command creation overhead et al. removed):
Calling …
Calling …
Calling …
Calling …
Based on the benchmark results above I'd propose:
WDYT? I've polished up the code already, so no need to change the pull request.
I'm a little surprised that … So, should I leave #726 as is? I don't see any changes on the …
I haven't pushed the changes yet; I wanted to get your feedback first.
…dler's buffer #725 Benchmarks for channelRead added to the CommandHandler JMH test suite. CommandHandlerBenchmark now tests the whole flow, both writes and reads. Also, creation of commands was moved into the benchmark methods: after one use they are no longer writable, which causes the benchmark to give incorrect results. Original pull request: #726
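For readers who want to reproduce the effect without the lettuce internals, here is a hedged JMH-style sketch. The class, method, and constant names are invented and this is not the actual CommandHandlerBenchmark; it only illustrates the pattern described above, where state that is consumed by a single run (here a ByteBuf, in lettuce the commands that become non-writable after one use) is created inside the benchmark method rather than in a shared setup.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

// Illustrative only; not the real CommandHandlerBenchmark from the lettuce JMH suite.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class DecodeLoopBenchmark {

    private static final byte[] REPLY = "+OK\r\n".getBytes(StandardCharsets.US_ASCII);
    private static final int REPLIES = 100_000;

    @Benchmark
    public int discardPerReply() {
        ByteBuf buffer = newFilledBuffer();
        int decoded = 0;
        while (buffer.isReadable()) {
            skipOneReply(buffer);
            buffer.discardReadBytes();   // compaction after every decoded reply
            decoded++;
        }
        buffer.release();
        return decoded;
    }

    @Benchmark
    public int discardOnceAfterLoop() {
        ByteBuf buffer = newFilledBuffer();
        int decoded = 0;
        while (buffer.isReadable()) {
            skipOneReply(buffer);
            decoded++;
        }
        buffer.discardReadBytes();       // single compaction after the loop
        buffer.release();
        return decoded;
    }

    // The consumable state is created per invocation, inside the measured method.
    private static ByteBuf newFilledBuffer() {
        ByteBuf buffer = Unpooled.buffer(REPLY.length * REPLIES);
        for (int i = 0; i < REPLIES; i++) {
            buffer.writeBytes(REPLY);
        }
        return buffer;
    }

    // Stand-in for decoding one reply: consume up to and including the next '\n'.
    private static void skipOneReply(ByteBuf buffer) {
        int end = buffer.indexOf(buffer.readerIndex(), buffer.writerIndex(), (byte) '\n');
        buffer.readerIndex(end + 1);
    }
}
```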
Move back to discardReadBytes(), but discard bytes outside the decoding loop so that cleanup is not enforced upon each decoded command. Tweak the JMH benchmarks to exclude the command creation overhead caused by IntStream and element collection. Tweak the commands in the test to never reach the done state, so that they can be reused across all benchmark runs. Original pull request: #726
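A simplified sketch of the resulting shape of the read path (hypothetical names, not the actual CommandHandler code): incoming bytes are aggregated, as many complete replies as possible are decoded, and the consumed bytes are discarded once per channelRead() call instead of once per decoded command.

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Hypothetical, simplified handler; decodeOneReply() stands in for the real RESP decoding.
public class BatchDiscardHandler extends ChannelInboundHandlerAdapter {

    private ByteBuf aggregate;

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ByteBuf input = (ByteBuf) msg;
        try {
            if (aggregate == null) {
                aggregate = ctx.alloc().buffer(input.readableBytes());
            }
            aggregate.writeBytes(input);

            // Decode every complete reply currently in the buffer; each successful
            // decode only advances readerIndex.
            while (decodeOneReply(aggregate)) {
                // complete the corresponding command here
            }
        } finally {
            if (aggregate != null) {
                // One compaction per channelRead() instead of one per decoded command.
                aggregate.discardReadBytes();
            }
            input.release();
        }
    }

    // Placeholder: consume up to the next '\n' if a full line is available.
    private boolean decodeOneReply(ByteBuf buffer) {
        int end = buffer.indexOf(buffer.readerIndex(), buffer.writerIndex(), (byte) '\n');
        if (end < 0) {
            return false;
        }
        buffer.readerIndex(end + 1);
        return true;
    }
}
```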
That's fixed now. Thanks a lot!
I am using snapshots of 5.0.x.
I noticed that reading a reply to a large pipeline (hundreds of thousands of commands) takes a lot of time.
It is caused by calling discardReadBytes() on every properly decoded command in CommandHandler.decode(). More precisely, I am talking about this line: https://github.com/lettuce-io/lettuce-core/blob/e3641b465d7ad2e7fd5be09006f1087e5df9e919/src/main/java/io/lettuce/core/protocol/CommandHandler.java#L565

I wrote a simple benchmark to verify my hypothesis: https://gist.github.com/gszpak/c7c782df696922f3660352bfa22cedfd
Here are the results:
As you can see, querying 500k keys took 28s. Here is the output of JvmTop for this run:
Here is what happens:
When you send a very large pipeline, Redis sends a BULK reply. Such a reply is fully loaded into commandHandler.byteBuf before being decoded. Lettuce then starts decoding commands one by one, and after every command byteBuf.discardReadBytes() is called. This shifts the buffer's readerIndex to 0, which actually involves copying the part of the buffer that has not been read yet. Hence the io.netty.util.internal.PlatformDependent0.copyMemory() call taking 95% of the run time.
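For reference, a tiny self-contained demo (made-up reply data, not lettuce code) of what discardReadBytes() does: the unread bytes are copied to the front of the buffer and readerIndex is reset to 0, which is exactly the copyMemory() work that shows up in the profile when it happens once per decoded command.

```java
import java.nio.charset.StandardCharsets;

import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

public class DiscardSemantics {

    public static void main(String[] args) {
        ByteBuf buf = Unpooled.copiedBuffer("+OK\r\n+QUEUED\r\n", StandardCharsets.US_ASCII);

        buf.skipBytes(5);                       // consume the first reply ("+OK\r\n")
        System.out.println(buf.readerIndex());  // 5

        buf.discardReadBytes();                 // copies the remaining 9 bytes to index 0
        System.out.println(buf.readerIndex());  // 0
        System.out.println(buf.toString(StandardCharsets.US_ASCII)); // "+QUEUED\r\n"

        buf.release();
    }
}
```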