Adding HNSW, updating SW-graph, increasing the number of queries #18
adding a good set of parameters for the old sw-graph
Some relatively minor edits. Also, currently NMSLIB uses the branch pserv. We will try to merge it with the master branch ASAP and make a release. Then, we will update the installation script.
Updating the number of queries and the seed (also README)
Leo (@searchivarius) and I => Yury @yurymalkov and Leo. Perhaps we could also add to the README that the tests are based on the pserv branch. We will merge it into master soon. Afterwards, the README would have to be updated again :-)
]

algos['SW-graph(nmslib)'] = [
probably could have put these in a loop but no big deal
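For illustration only, the loop suggested in this comment might look like the sketch below. The parameter names (`index_param`, `query_param`), their values, and the `build_configs` helper are hypothetical placeholders, not the settings or code from this PR.

```python
# Hypothetical sketch: parameter names and values are placeholders,
# not the SW-graph settings added in this PR.
def build_configs(index_values, query_values):
    """Return one (name, params) entry per combination of the two parameters."""
    configs = []
    for index_param in index_values:
        for query_param in query_values:
            params = {"index_param": index_param, "query_param": query_param}
            name = f"sw-graph-i{index_param}-q{query_param}"
            configs.append((name, params))
    return configs

algos = {}
# One loop call replaces a hand-written list of near-identical entries.
algos["SW-graph(nmslib)"] = build_configs([8, 16], [100, 400])
```

Each entry here is a plain tuple; in the benchmark itself the entries would be whatever wrapper objects the script constructs.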
nice – very impressive results!
will rerun this week
thank you @erikbern !
i deleted the results for annoy, kgraph, SW-graph, and falconn. re-running now with a larger number of queries
thanks! when the results are ready, could you postpone the announcement a little bit? perhaps, for a couple of days.
was just going to post some preliminary results here but sure I'll hold off a few days :)
It's fine to update GitHub, just please don't post the results on Twitter or blog about them. I want to update the docs and propagate all the changes to the master branch.
np i'm not in a rush :)
i noticed nmslib now stores all indices in a subdirectory, which messed up my benchmarks a bit (hard drive filled up and caused a bunch of issues). is that necessary for the purpose of these benchmarks?
@erikbern yes, indexing times are long. Because we use far fewer unique indices for each test, saving an index saves a lot of time. I did mention this in the README. Sorry if this wasn't sufficiently clear. The indices are not huge, so I am surprised you ran out of disk space. Another option would be to modify the benchmark so that you can create an index once and then vary the query-time parameters. However, other methods do not seem to support changing parameters at run time, in particular because most of their parameters are index-time.
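The save-and-reuse idea described here can be sketched as a small build-or-load cache. This is a generic illustration under stated assumptions: `load_or_build` is a hypothetical helper, `pickle` stands in for whatever save/load mechanism the library provides, and the "index" is just a sorted list.

```python
import os
import pickle
import tempfile

def load_or_build(path, build_fn):
    """Return the object stored at `path`, building and saving it on a cache miss."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    index = build_fn()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(index, f)
    return index

# Stand-in for an expensive index build; records how often it runs.
calls = []
def build():
    calls.append(1)
    return sorted([3, 1, 2])

cache_dir = tempfile.mkdtemp()
path = os.path.join(cache_dir, "indices", "demo.pkl")
first = load_or_build(path, build)   # builds and writes to disk
second = load_or_build(path, build)  # loads from disk, no rebuild
```

The trade-off is disk usage: every distinct set of index-time parameters leaves a file behind, so the cache directory needs enough free space for all of them.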
the c2.4xlarge machines just don't have a lot of disk space, that's the problem. unfortunately, since the machine ran out of disk, it wiped out the results of the last 48h... not a big problem really (i'm paying $0.12/h for the spot instance iirc). re-running it with saving disabled now.
Sorry about the trouble, I should have mentioned that I added extra disk space. You can do this when you set up a machine. With 5K queries, re-running took about 32 hours on c2.4xlarge.
PS: it can still be at the spot-instance price. I also paid about $0.1 per hour.
32h is fine, i'm not in a rush. i'll keep it running over the weekend!
@erikbern I must warn you that without saving/loading the index, the tests will take much longer. The parameters are now tuned so that each point of HNSW for glove may take several hours to build, and there are ~50 of these points. The same mostly holds for the SW-graph, but with fewer points.
i don't really have time to solve the disk space issue, and i'm going to be afk for a few days anyway, so it's not a big deal. will just get the results on monday
i also think it's useful to measure index building time (although i haven't included that in the analysis yet)
In terms of indexing time, the results aren't so great, so there's room for improvement. However, the current build times (for this benchmark) are a bit pessimistic. It is possible to make indexing 2-5x faster at a relatively small (10-20%) loss in accuracy/speed. Also, if you have 4x more cores, you get 4x shorter indexing times :-)
The current build times (for this benchmark) are very pessimistic, at least for glove. The build parameters and the number of points were selected quite carelessly, assuming the index would be saved and loaded.
I wouldn't say carelessly. Rather, they are deliberately optimized for faster retrieval at the expense of longer indexing time.
Leo (@searchivarius) and I have added a new algorithm (Hierarchical NSW, HNSW) and updated the parameters and performance of the SW-graph algorithm (both from nmslib).
Results of the comparison to FALCONN and Annoy on the same Amazon instance are attached below.
Comparison on a Xeon E5-4650 v2 machine can be found in the HNSW preprint http://arxiv.org/abs/1603.09320
Note that SW-graph and HNSW indexes are now saved and reused later, which greatly reduces the testing time.
In addition, we have increased the number of queries to 10K (5K might also be OK); see Leo's comments below: