Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CommandHandler.write() is O(N^2) #709

Closed
gszpak opened this issue Feb 21, 2018 · 10 comments
Closed

CommandHandler.write() is O(N^2) #709

gszpak opened this issue Feb 21, 2018 · 10 comments
Labels
type: feature A new feature
Milestone

Comments

@gszpak
Copy link
Contributor

gszpak commented Feb 21, 2018

I am using version 5.0.1.RELEASE.

It looks like CommandHandler.write() method is of O(N^2) complexity, where N is number of written commands. It's caused by this check:
https://github.com/lettuce-io/lettuce-core/blob/efa80382fa28570cbaac44681a5dc47768291b92/src/main/java/io/lettuce/core/protocol/CommandHandler.java#L413

Usually, it won't be such a big problem as channel thread will decrease the size of the stack quickly enough. But for some reasons, we want to have only one thread, i.e. we want the main thread to also handle Lettuce's Netty connection. We know that doing so is dangerous, but we want to keep it this way. In such a case, writing large batches becomes really slow.

Let me show you a simple benchmark (I am aware of some simplifications, e.g. EventLoopGroupProvider not being shut down properly, but this code is just to demonstrate the issue):
https://gist.github.com/gszpak/e9af789cc69718524b28d9ee9adfac16

The results of the benchmark are:

Num of iterations: 10000, running time: 6251 ms
Num of iterations: 20000, running time: 23101 ms
Num of iterations: 40000, running time: 80290 ms

Is it possible to get rid of the contains check I mentioned or use more efficient data structures for command stack?

Thank you!

@mp911de
Copy link
Collaborator

mp911de commented Feb 22, 2018

Isn't that a duplicate of #693? Could you upgrade to Lettuce 5.0.2.RELEASE and retry?

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Feb 22, 2018
@gszpak
Copy link
Contributor Author

gszpak commented Feb 22, 2018

I still have this problem in 5.0.2.RELEASE. It's a different issue, but it's similar: in #693 there were unnecessary contains checks for commandBuffer and disconnectedBuffer in DefaultEndpoint, this time the problem is caused by such a check before adding the command to stack in CommandHandler.

@mp911de
Copy link
Collaborator

mp911de commented Feb 22, 2018

The check for stack will stay there. If your batch size creates issues because of being too large, then it's nothing the driver will solve for you. Maintaining state comes at a certain cost.

@gszpak
Copy link
Contributor Author

gszpak commented Feb 22, 2018

The problem is not caused by the size of the batch. As you could see in the ticket's description, sending 40000 commands took 80 seconds, which is very slow - and the time clearly increased quadratically.

I'm not saying we should get rid of the check, but having it shouldn't decrease client's performance.
I would suggest two solutions:

The second solution seems to be the best for me. Could I create a PR with it?

@mp911de
Copy link
Collaborator

mp911de commented Feb 23, 2018

ArrayDeque does not produce any garbage when adding/removing items and that has quite some impact on GC pressure. LinkedHashSet creates/removes nodes upon add/removal.

Duplicates can occur upon batch add, not so much when individual commands get added. I thought about changing writeBatch(…) by moving the duplicate verification to there. writeBatch(…) is mostly used upon reconnect and user-controlled batches. Invoking individual commands are not using that code path.

@mp911de mp911de added the type: feature A new feature label Feb 23, 2018
@gszpak
Copy link
Contributor Author

gszpak commented Feb 23, 2018

Still, it's better to create a node on add() than to scan the whole collection, which is happening now:) And like I said, imho it's still better to just have a HashSet next to the queue to have efficient contains check.

As you said - after moving the check to writeBatch, issue will still occur when controlling flush() manually - which is what will usually happen for large batches. So it won't really solve the problem.

@mp911de
Copy link
Collaborator

mp911de commented Feb 25, 2018

As you said - after moving the check to writeBatch, issue will still occur when controlling flush() manually - which is what will usually happen for large batches. So it won't really solve the problem.

I was more heading towards using LinkedHashSet in writeBatch() and let the hash set do the de-duplication instead of ArrayDeque.contains(…).

mp911de added a commit that referenced this issue Feb 25, 2018
Lettuce now checks for command stack duplicates in CommandHandler.writeBatch(…) by using LinkedHashSet. Duplicates occur usually in batch submissions and the check in the single command path causes additional cost that isn't necessary in the majority of cases.

Using LinkedHashSet as intermediate collection reduces contains cost from O(N^2) to constant time.
mp911de added a commit that referenced this issue Feb 25, 2018
Lettuce now checks for command stack duplicates in CommandHandler.writeBatch(…) by using LinkedHashSet. Duplicates occur usually in batch submissions and the check in the single command path causes additional cost that isn't necessary in the majority of cases.

Using LinkedHashSet as intermediate collection reduces contains cost from O(N^2) to constant time.
@mp911de mp911de added this to the Lettuce 5.0.3 milestone Feb 25, 2018
@mp911de
Copy link
Collaborator

mp911de commented Feb 25, 2018

I pushed a change, and a new snapshot 5.0.3.BUILD-SNAPSHOT is available. Care to give it a spin?

@gszpak
Copy link
Contributor Author

gszpak commented Feb 26, 2018

Works like a charm:

Num of iterations: 10000, running time: 1143 ms
Num of iterations: 20000, running time: 1780 ms
Num of iterations: 40000, running time: 2202 ms

Thank you so much for fixing it! When can we expect version 5.0.3 to be released?

@mp911de
Copy link
Collaborator

mp911de commented Feb 26, 2018

Cool, thanks for giving it a try. I don't know when 5.0.3 will be released. As an intermediate solution, feel free to use snapshots of 5.0.x. I don't have a release date yet, I expect some feedback from Spring 2.0 GA in the next weeks.

mp911de added a commit that referenced this issue Feb 27, 2018
Lettuce now checks for command stack duplicates in CommandHandler.writeBatch(…) by using LinkedHashSet. Duplicates occur usually in batch submissions and the check in the single command path causes additional cost that isn't necessary in the majority of cases.

Using LinkedHashSet as intermediate collection reduces contains cost from O(N^2) to constant time.
mp911de added a commit that referenced this issue Feb 27, 2018
Lettuce now checks for command stack duplicates in CommandHandler.writeBatch(…) by using LinkedHashSet. Duplicates occur usually in batch submissions and the check in the single command path causes additional cost that isn't necessary in the majority of cases.

Using LinkedHashSet as intermediate collection reduces contains cost from O(N^2) to constant time.
@mp911de mp911de removed the status: waiting-for-feedback We need additional information before we can continue label Feb 27, 2018
@mp911de mp911de modified the milestones: Lettuce 5.0.3, Lettuce 4.4.4 Feb 27, 2018
@mp911de mp911de closed this as completed Feb 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature A new feature
Projects
None yet
Development

No branches or pull requests

2 participants