
Adding HNSW, updating SW-graph, increasing the number of queries #18

Merged
merged 10 commits into erikbern:master on Apr 25, 2016

Conversation

yurymalkov
Contributor

Leo (@searchivarius) and I have added a new algorithm (Hierarchical NSW, HNSW) and updated the parameters and performance of the SW-graph algorithm (both from nmslib).
Results of a comparison against FALCONN and Annoy on the same Amazon instance are attached below.
A comparison on a Xeon E5-4650 v2 machine can be found in the HNSW preprint: http://arxiv.org/abs/1603.09320
Note that the SW-graph and HNSW indexes are now saved and reused later, which greatly reduces the testing time.

In addition, we have increased the number of queries to 10K (5K might also be OK); see Leo's comments below:

Erik is most welcome to use any other random_state.

Last time, Leo was testing using random_state==1. However, it is best to use a new random seed for each major re-evaluation, so that we test on truly blind data.

Increasing the number of queries or changing the seed won't make truly weak methods much faster. However, in my experience, this can easily result in 10-30% changes in performance, which may affect the relative ranking of methods with similar performance.
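As a hedged illustration of this change (not the benchmark's actual code), the larger, freshly seeded query split could look roughly like this; the function name, seed value, and synthetic data are placeholders, with only the 10K query count and the idea of a new random_state coming from the discussion above.

```python
# Sketch only: hold out 10K query vectors using a fresh random_state
# instead of the previously used random_state==1.
import numpy
from sklearn.model_selection import train_test_split  # older sklearn: sklearn.cross_validation

def split_queries(X, n_queries=10000, seed=3):
    """Return (train, queries), carving out n_queries vectors for testing."""
    return train_test_split(X, test_size=n_queries, random_state=seed)

# Synthetic stand-in for the glove/sift vectors:
X = numpy.random.RandomState(0).rand(100000, 100).astype(numpy.float32)
X_train, X_queries = split_queries(X)
```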

[Attached plots: comparison results on glove and sift]

Yuri Malkov and others added 10 commits April 21, 2016 17:34
adding a good set of parameters for the old sw-graph
Some relatively minor edits. Also, currently NMSLIB uses the branch pserv. We will try to merge it with the master branch ASAP and make a release. Then, we will update the installation script.
Updating the number of queries and the seed (also README)
@searchivarius
Contributor

"Leo (@searchivarius) and I" => "Yury (@yurymalkov) and Leo."

Perhaps we could also add to the README that the tests are based on the pserv branch. We will merge it into master soon; afterwards, the README would have to be updated again :-)

]

algos['SW-graph(nmslib)'] = [
Owner

probably could have put these in a loop but no big deal
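For illustration only, the hand-written configuration lists could be generated in a loop along these lines; NmslibConfig and the efSearch values are hypothetical stand-ins, not the benchmark's actual wrapper class or settings.

```python
# Sketch of building the nmslib configurations in a loop instead of
# listing each entry by hand. NmslibConfig is a placeholder type.
from collections import namedtuple

NmslibConfig = namedtuple('NmslibConfig', ['method', 'query_params'])

algos = {}
for method in ('SW-graph(nmslib)', 'hnsw(nmslib)'):
    # One entry per query-time ef value; only this value differs between entries.
    algos[method] = [NmslibConfig(method, {'efSearch': ef})
                     for ef in (10, 20, 40, 80, 120, 200, 400, 800)]
```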

@erikbern
Owner

nice – very impressive results!

@erikbern erikbern merged commit c42833f into erikbern:master Apr 25, 2016
@erikbern
Owner

will rerun this week

@searchivarius
Contributor

thank you @erikbern !

@erikbern
Owner

i deleted the results for annoy, kgraph, SW-graph, and falconn. re-running now with a larger number of queries

@searchivarius
Contributor

thanks! when the results are ready, could you postpone the announcement a little bit? perhaps for a couple of days.

@erikbern
Owner

was just going to post some preliminary results here but sure I'll hold off a few days :)

@searchivarius
Contributor

It's fine to update GitHub, just please don't post the results on Twitter or blog about them yet. I want to update the docs and propagate all the changes to the master branch.

@erikbern
Owner

np i'm not in a rush :)

@erikbern
Owner

i noticed nmslib now stores all indices in a subdirectory, which messed up my benchmarks a bit (hard drive filled up and caused a bunch of issues). is that necessary for the purpose of these benchmarks?

@searchivarius
Contributor

searchivarius commented Apr 29, 2016

@erikbern yes, indexing times are long. Because we now use far fewer unique indices for each test, saving an index saves a lot of time. I did mention this in the README; sorry if this wasn't sufficiently clear. The indices are not huge, so I am surprised you ran out of disk space.

Another option would be to modify the benchmark so that an index is created once and only the query-time parameters are changed. However, the other methods do not seem to be capable of changing parameters at run time, in particular because most of their parameters are index-time parameters.
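A minimal sketch of that build-once/query-many-times idea, using the current nmslib Python bindings (which postdate this thread, so names like saveIndex, loadIndex, and setQueryTimeParams may not match the version used here); the filename and parameter values are illustrative only.

```python
# Build (or load) one index, then vary only the cheap query-time parameter.
import os
import numpy
import nmslib

data = numpy.random.RandomState(0).rand(10000, 100).astype(numpy.float32)
queries = data[:100]

index = nmslib.init(method='hnsw', space='l2')
index.addDataPointBatch(data)

INDEX_FILE = 'hnsw_l2.bin'                     # illustrative path
if os.path.exists(INDEX_FILE):
    index.loadIndex(INDEX_FILE)                # reuse the expensive index
else:
    index.createIndex({'M': 16, 'efConstruction': 200})
    index.saveIndex(INDEX_FILE)

for ef in (10, 50, 200):                       # query-time accuracy/speed knob
    index.setQueryTimeParams({'efSearch': ef})
    results = index.knnQueryBatch(queries, k=10)
```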

@erikbern
Owner

the c2.4xlarge machines just don't have a lot of disk space, that's the problem

unfortunately, since the machine ran out of disk, it wiped out the results of the last 48h... not a big problem really (i'm paying $0.12/h for the spot instance iirc). re-running it with saving disabled now.

@searchivarius
Contributor

Sorry about the trouble, I should have mentioned that I had added additional space. You can do this when you set up a machine. With 5K queries, re-running took about 32 hours on c2.4xlarge.

@searchivarius
Contributor

PS: it can still be at the spot-instance price. I also paid about $0.10 per hour.

@erikbern
Owner

32h is fine, i'm not in a rush. i'll keep it running over the weekend!

@yurymalkov
Contributor Author

@erikbern I must warn that without saving/loading of the index the tests will take much longer. The parameters are now tuned such that each HNSW point for glove may take several hours to build, and there are ~50 of these points. The same mostly holds for SW-graph, but with fewer points.
Btw, the indexes should take around 5.5 GB in total.

@erikbern
Owner

i don't really have time to solve the disk space issue, and i'm going to be afk for a few days anyway, so it's not a big deal. will just get the results on monday

@erikbern
Owner

i also think it's useful to measure index building time (although i haven't included that in the analysis yet)
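For instance, build time could be captured with a small wrapper like the one below; build_index is a stand-in for whatever constructs the index under test, not an existing benchmark function.

```python
# Minimal sketch: time an index build alongside the query measurements.
import time

def timed_build(build_index, *args, **kwargs):
    start = time.time()
    index = build_index(*args, **kwargs)
    return index, time.time() - start
```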

@searchivarius
Contributor

searchivarius commented Apr 29, 2016

In terms of index time, the results aren't so great, so there's room for improvement. However, the current building times (for this benchmark) are a bit pessimistic. It is possible to make indexing 2-5x faster at a relatively small (10-20%) loss in accuracy/speed. Also, if you have 4x more cores, you get 4x shorter indexing times :-)
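As a rough illustration of those knobs (parameter names follow the current nmslib HNSW bindings; the values are hypothetical, not the benchmark's settings):

```python
# Lower M/efConstruction -> faster construction at some loss in accuracy/speed;
# indexThreadQty lets the build use more cores.
faster_build_params  = {'M': 12, 'efConstruction': 100, 'indexThreadQty': 8}
slower_more_accurate = {'M': 20, 'efConstruction': 400, 'indexThreadQty': 8}
```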

@yurymalkov
Contributor Author

The current building times (for this benchmark) are very pessimistic, at least for glove. The build parameters and the number of points were selected quite carelessly, assuming the use of save/load of the index.

@searchivarius
Contributor

I wouldn't say carelessly. Rather, they are deliberately optimized for faster retrieval at the expense of longer indexing time.
