Rerun benchmark with elasticsearch 7.5 or above #2
Hello @jtibshirani, thanks for reaching out. I'm actually working on an evaluation of 7.6. Why do you recommend 7.5? And yes, 7.6 is much faster than 7.4 (about 2x I think, so well done!). I'm also switching to the standard ann-benchmarks data instead of random data, see also my comments on elastic/elasticsearch#51243. From my tests, it seems like those brute force latency numbers must have been produced with the number of shards set to 2.
My statement was a bit confusing -- any version 7.5 or above will contain the improvements.
Great that you're standardizing on the ann-benchmarks data!
Yes, coming first are gist-960-euclidean and sift-128-euclidean.
@jtibshirani would you be able to help with the following questions?
The results are for gist-960-euclidean, 1M vectors:
- single shard with Elastic and threads-per-search equal to one with Vespa
- two shards for Elastic and two threads-per-search with Vespa

Are the results with Elastic comparable with your setup? Same HW as before. Vespa is implementing a variant of the HNSW algorithm for ANN (currently an experimental feature), so we will eventually publish some results with that enabled as well.
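For reference, the single-shard configuration being compared could look like the following index-creation request body. This is a sketch, not the benchmark's actual config: the index name, field name, and replica setting are illustrative.

```python
import json

# Hypothetical index name; gist-960-euclidean vectors have 960 dimensions.
INDEX_NAME = "gist-960"
DIMS = 960

# Body for PUT /gist-960 -- one primary shard means ES searches the
# index with a single thread, matching Vespa's threads-per-search=1 run.
create_index_body = {
    "settings": {
        "number_of_shards": 1,
    },
    "mappings": {
        "properties": {
            "vector": {"type": "dense_vector", "dims": DIMS},
        }
    },
}

print(json.dumps(create_index_body, indent=2))
```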
I would recommend using the bulk API. You can't feed the list of JSON documents directly, some minimal wrapping is still needed to create the request. The ES Python client has some nice bulk helpers to make the process easier.
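The "minimal wrapping" is the newline-delimited action/document format that the `_bulk` endpoint expects: an action line followed by the document source, one pair per document. A sketch of building such a body by hand (index and field names are illustrative):

```python
import json

def to_bulk_body(index_name, docs):
    """Wrap plain JSON documents in the action/source NDJSON format
    required by the Elasticsearch _bulk API."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action line
        lines.append(json.dumps(doc))                                # source line
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline

docs = [{"vector": [0.1, 0.2]}, {"vector": [0.3, 0.4]}]
body = to_bulk_body("vectors", docs)
print(body)
```

The resulting string is what gets POSTed to `/_bulk` with content type `application/x-ndjson`; the Python client's bulk helpers build this for you.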
I don't have deep knowledge of Vespa's architecture, but from the ES side this seems like a reasonable comparison -- with only one shard, ES will use a single thread to perform the search. There are a few other pieces of set-up that are important:
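Based on the follow-up discussion (which mentions the `number_of_replicas` recommendation and force merging), two commonly recommended pieces for single-client benchmarks are disabling replicas and force-merging to one segment. A sketch, not necessarily the exact list given here:

```python
import json

# Benchmark-oriented index settings: no replicas when a single client
# drives the load (assumption based on the reply in this thread).
benchmark_settings = {"index": {"number_of_replicas": 0}}

def forcemerge_path(index_name, max_num_segments=1):
    """Path for POST /<index>/_forcemerge -- merging down to one segment
    makes search latency more predictable during benchmarking."""
    return f"/{index_name}/_forcemerge?max_num_segments={max_num_segments}"

print(json.dumps(benchmark_settings), forcemerge_path("gist-960"))
```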
Thanks a lot for your input @jtibshirani. Yes, all numbers are reported using a single client and no concurrency. I also want to evaluate with higher concurrency, so thanks for the recommendation on number_of_replicas. Some of the libraries out there for ANN scale pretty badly with increased concurrency, but that does not show in any of the ann-benchmarks.

I've given Elastic an 8G heap (ES_JAVA_OPTS="-Xms8g -Xmx8g") and I don't see any signs of GC pressure, and I've used force merge of segments. Vespa has a similar mechanism for flushing the memory index; the difference is that Vespa has a memory index (a B+ tree implementation) which can be updated without any merging, unlike Lucene-based engines. Once the memory index has reached a threshold it's flushed and merged with the index (similar to Lucene segment merging).

I'm able to reproduce the brute force numbers here elastic/elasticsearch#51243 (comment), but in my setup I need 2 shards to get 0.78 QPS.
@jtibshirani I've updated the master branch using 7.6. |
@jobergum I'm sorry for the late reply. I'm not sure why your benchmarking results aren't lining up with @mayya-sharipova's. The only other difference that comes to mind is that we always make sure to omit the returning the full document source in results by setting
@jobergum I'm sorry for the late reply. I'm not sure why your benchmarking results aren't lining up with @mayya-sharipova's. The only other difference that comes to mind is that we always make sure to omit returning the full document source in the results.

Thanks! The 'Ivy Bridge' numbers make sense to me, based on the previous results and the performance improvements in ES. However, the Haswell numbers are more surprising -- do you know why Vespa shows a latency improvement of ~2x between the Ivy Bridge and Haswell processors?
@jtibshirani the vector is not returned with the result; if that was the case, yes, I would have spotted it. Sample response from ES:
On CPU architectures, yes, it's explained by us using AVX-512 instructions.
Will soon update with results using our HNSW implementation for approximate nearest neighbor search; some sample data with the gist data set:
Thanks for the explanation + links on AVX. The HNSW implementation looks really promising. If it's not too much work, it would be great to report the sift-128-euclidean results against Ivy Bridge as well. I'd be curious to see how consistent the latency differences are. Other than that I don't have anything else to add, happy if you'd like to close out this issue.
Thanks, yes, I just did. And thanks for the input on ES benchmarking.
I'm resolving this -- hoping to have more time to introduce the ANN Vespa version later on.
@jtibshirani does ES ship with an async feed client which makes it easier to feed documents with high throughput? I'm using the synchronous HTTP POST API but would like to move away from it. Vespa has this utility to feed a JSON file https://docs.vespa.ai/en/reference/vespa-cmdline-tools.html#vespa-feeder so I'm looking for an ES equivalent.
There's an Elasticsearch Python client, which adds convenient 'bulk helpers' for indexing a large set of documents: https://elasticsearch-py.readthedocs.io/en/v7.10.1/helpers.html#bulk-helpers. Here's an example from one of my colleagues: https://github.com/elastic/examples/blob/master/Machine%20Learning/Online%20Search%20Relevance%20Metrics/bin/index#L34. You could ignore everything related to 'pipeline', this is an optional piece of configuration for transforming documents before indexing them.
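The bulk helpers consume an iterable of action dicts, one per document. A minimal sketch of what those actions look like (index name, `_id` scheme, and field name are hypothetical; with a live cluster you would pass this generator to `elasticsearch.helpers.bulk` together with a client instance):

```python
def generate_actions(index_name, vectors):
    """Yield action dicts in the shape the Python client's bulk helpers
    expect: _index and _id metadata plus the document's source fields."""
    for i, vec in enumerate(vectors):
        yield {
            "_index": index_name,
            "_id": str(i),
            "vector": vec,
        }

# Materialize a couple of actions to show their shape.
actions = list(generate_actions("gist-960", [[0.1, 0.2], [0.3, 0.4]]))
print(actions[0])
```

Because the helper consumes a generator, the whole dataset never needs to be held in memory at once, which matters when feeding a million 960-dimensional vectors.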
In ES 7.5, we made some improvements to the performance of Elasticsearch dense_vector operations (elastic/elasticsearch#46294). Although I still expect the QPS to be significantly worse than Vespa's, it would be helpful to rerun the benchmarks against ES 7.5 to get an up-to-date comparison.
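For context, brute-force kNN over a dense_vector field is typically expressed as a script_score query. A sketch (field name and query vector are illustrative), using l2norm since both datasets in the thread are euclidean; the distance is shifted so smaller distances score higher and scores stay non-negative:

```python
import json

def knn_query(query_vector, k=10, field="vector"):
    """Brute-force kNN as an Elasticsearch script_score query.
    1 / (1 + l2norm(...)) maps distance 0 to the top score of 1.0."""
    return {
        "size": k,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": f"1 / (1 + l2norm(params.query_vector, '{field}'))",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }

q = knn_query([0.1, 0.2, 0.3])
print(json.dumps(q, indent=2))
```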