Adding benchmarks for some of the Non-Metric Space Library Methods #6

Merged
merged 4 commits into from
Jun 14, 2015

Conversation

searchivarius
Contributor

Hi, please consider the following pull request, which adds:

  • SW-graph is a proximity graph implementation.
  • MPLSH (multiprobe LSH). It is actually a port of the implementation from LSHKit, which was written by Wei Dong (the kgraph guy :). Yet, it "sits" inside the Non-Metric Space Library and can be invoked like any other supported method. It works only for L2. I tried to use it for cosine similarity/angular distance k-NN search, but performance wasn't good.
  • Ball-Tree (VP-tree) is a ball tree that can adjust its pruning algorithm. This works decently for many metric and non-metric spaces. It keeps its own copy of each bucket (see my comments below on cache misses).
  • Two variants of the brute-force search: the variant bruteforce1 copies all vectors to store them in a contiguous chunk of memory.
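For context on the MPLSH item above: one common way to run an L2-only method on angular data is to normalize every vector to unit length, because for unit vectors the squared L2 distance equals 2 - 2·cos. The thread does not say how the cosine experiment was actually set up, so the NumPy sketch below is only an illustration on synthetic data, and `normalize_rows` is a hypothetical helper.

```python
import numpy as np

def normalize_rows(x):
    """Scale every row to unit L2 norm (rows are assumed to be non-zero)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Synthetic data, used only to illustrate the identity.
rng = np.random.default_rng(0)
data = normalize_rows(rng.standard_normal((1000, 32)))
query = normalize_rows(rng.standard_normal((1, 32)))[0]

cosine = data @ query                        # cosine similarity to the query
sq_l2 = np.sum((data - query) ** 2, axis=1)  # squared L2 distance to the query

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
identity_holds = np.allclose(sq_l2, 2.0 - 2.0 * cosine)

# Hence the L2 nearest neighbour is also the most cosine-similar point.
nearest_by_l2 = int(np.argmin(sq_l2))
nearest_by_cosine = int(np.argmax(cosine))
```

The identity only guarantees that the rankings agree. It says nothing about how well a particular hashing scheme performs on normalized data.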

Should you decide to benchmark computationally intensive distances, we can add a couple of other methods.

The recent changes are in the ann-benchmark. We are going to propagate them to the master soon (and make a mini-release).

L2/Cosine implementations use SSE2, but not AVX (which is slightly faster).

One reason why brute-force performance may suck is that Python doesn't store vectors contiguously, so accessing them incurs a lot of cache misses. One cache miss costs roughly 500 CPU cycles, about as much as 4 L2 distance computations. For L2, I suspect memory bandwidth is becoming the bottleneck.
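A minimal sketch of the contiguous-storage idea on the Python side, assuming NumPy and equal-length vectors. The helper names `to_contiguous` and `brute_force_l2` are hypothetical and are not part of the library or of the benchmark code.

```python
import numpy as np

def to_contiguous(vectors, dtype=np.float32):
    """Copy a sequence of equal-length vectors into one C-contiguous 2-D array."""
    return np.ascontiguousarray(np.stack(vectors), dtype=dtype)

def brute_force_l2(matrix, query, k):
    """Return the indices of the k rows of `matrix` closest to `query` in L2."""
    diff = matrix - query
    sq_dist = np.einsum("ij,ij->i", diff, diff)  # row-wise squared L2 distance
    k = min(k, len(sq_dist))
    idx = np.argpartition(sq_dist, k - 1)[:k]    # k smallest, in arbitrary order
    return idx[np.argsort(sq_dist[idx])]         # sort those k by distance

# Vectors that start out as separate Python-level objects.
scattered = [np.array([0.0, 0.0]), np.array([3.0, 4.0]), np.array([1.0, 1.0])]
matrix = to_contiguous(scattered)
neighbours = brute_force_l2(matrix, np.array([0.9, 1.2], dtype=np.float32), k=2)
```

The copy is made once at index time. After that, a scan reads one sequential block of memory instead of following a pointer per vector.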

For SIFT signatures, you can store vectors as byte vectors and use Wei Dong's efficient implementation of L2 that relies on SIMD. Apparently, this can boost performance, at least in the multithreaded mode (due to bandwidth savings, or maybe it also uses fewer CPU cycles). However, this is not a generic solution. In fact, I recently learned that RootSIFT performs better than raw SIFT, and you apparently can't use the byte-storage trick with RootSIFT.
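To make the storage trade-off concrete, here is a small NumPy sketch on synthetic descriptors. It assumes the usual RootSIFT definition (L1-normalize each descriptor, then take the element-wise square root). `root_sift` is a hypothetical helper, and the random data stands in for real SIFT values.

```python
import numpy as np

def root_sift(descriptors, eps=1e-12):
    """RootSIFT: L1-normalize each descriptor, then take the element-wise square root."""
    d = descriptors.astype(np.float32)
    d /= np.abs(d).sum(axis=1, keepdims=True) + eps
    return np.sqrt(d)

# Synthetic stand-in for 128-dimensional SIFT descriptors with byte-valued entries.
rng = np.random.default_rng(0)
sift_bytes = rng.integers(0, 256, size=(1000, 128), dtype=np.uint8)

sift_floats = sift_bytes.astype(np.float32)  # same values, four bytes per entry
rooted = root_sift(sift_bytes)               # fractional values in [0, 1]

bytes_per_vector_uint8 = sift_bytes.nbytes // len(sift_bytes)
bytes_per_vector_float = sift_floats.nbytes // len(sift_floats)
```

Byte storage moves a quarter of the data per vector compared with 32-bit floats, which is where the bandwidth saving comes from. The RootSIFT values are fractional, so they no longer fit the raw byte representation without some additional quantization step.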

I made a test run on c4.4xlarge for some methods (results are below). However, I didn't re-run FLANN and only imported Erik's results (FLANN takes really long to build an index):

[Plot: Glove/angular]

[Plot: SIFT/L2]

@aaalgo
Contributor

aaalgo commented Jun 13, 2015

SW-graph is good!
RootSIFT: we also found that taking the logarithm of the original SIFT values improves performance (http://www.cs.princeton.edu/cass/papers/icmr12.pdf). SIFT doesn't use the space of 256 values very efficiently.

Memory bandwidth: I also think it's the bottleneck. I don't suggest using byte values for SIFT either. So long as everyone has the same disadvantage, the benchmark is still meaningful. I didn't observe much improvement from my manual SSE vectorization over GCC-generated code when I was comparing against FLANN.

@searchivarius
Contributor Author

Hi Wei, thank you!

Regarding the difference in performance: GCC auto-vectorizes in simple cases and uses AVX instructions, so it produces slightly faster code when you read from L1/L2. However, when I benchmark sequential search, there is no difference on my PC between the faster and the slower distance function. So, I suspect that sequential search is memory-bound. It also seems to me that a single core cannot use the full memory bandwidth.

@searchivarius
Contributor Author

PS: However, Clang doesn't auto-vectorize yet, so there is a difference when you use a custom SSE implementation (I triple-checked this). The Intel compiler vectorizes as well. However, it isn't freely available any more, so I am not sure it's worth supporting.

erikbern added a commit that referenced this pull request Jun 14, 2015
Adding benchmarks for some of the Non-Metric Space Library Methods
@erikbern erikbern merged commit edbc72b into erikbern:master Jun 14, 2015
erikbern pushed a commit that referenced this pull request May 16, 2018
erikbern pushed a commit that referenced this pull request Dec 6, 2018
tinkerlin pushed a commit to tinkerlin/ann-benchmarks that referenced this pull request Jul 1, 2020
erikbern pushed a commit that referenced this pull request Apr 14, 2023
erikbern pushed a commit that referenced this pull request Apr 14, 2023
erikbern pushed a commit that referenced this pull request Jun 9, 2023
make image ann-benchmarks not ann-benchmarks-base in install.py
jgpruitt added a commit to timescale/ann-benchmarks that referenced this pull request Jun 10, 2024
3 participants