
Rerun benchmark with elasticsearch 7.5 or above #2

Closed
jtibshirani opened this issue Mar 9, 2020 · 14 comments

Comments

@jtibshirani

In ES 7.5, we made some improvements to the performance of Elasticsearch dense_vector operations (elastic/elasticsearch#46294). Although I still expect the QPS to be significantly worse than Vespa's, it would be helpful to rerun the benchmarks against ES 7.5 to get an up-to-date comparison.

@jobergum
Owner

jobergum commented Mar 9, 2020

Hello @jtibshirani, thanks for reaching out. I'm actually working on an evaluation of 7.6. Why do you recommend 7.5? And yes, 7.6 is much faster than 7.4 (about 2x, I think, so well done!)

I'm also switching to the standard ann-benchmarks data instead of random data; see also my comments on elastic/elasticsearch#51243. From my tests, it seems those brute-force latency numbers must have been produced with the number of shards set to 2.

@jtibshirani
Author

> Why do you recommend 7.5?

My statement was a bit confusing -- any version 7.5 or above will contain the improvements.

> I'm also switching to standard ann benchmark data instead of random data, see also my comments on elastic/elasticsearch#51243.

Great that you're standardizing on the ann-benchmarks data!

@jobergum
Owner

jobergum commented Mar 9, 2020

Yes, coming first are gist-960-euclidean and sift-128-euclidean.

@jobergum
Owner

@jtibshirani would you be able to help with the following questions?

  • Is there a simple batch-oriented API for feeding a JSON-formatted file with the Elastic Docker distribution? Feeding one document at a time through the REST HTTP API is painfully slow with the 1M gist-960 dataset.
  • For comparisons it's best to use 1 shard with ES and 1 thread per search with Vespa; do you agree?
  • Could you look at the ES numbers below and tell me whether they are within your expectations?

The results are

gist-960-euclidean 1M vectors

single shard with Elastic and threads-per-search equal to one with Vespa

| Engine | QPS | Average Latency (ms) | 95P Latency (ms) | Recall@10 |
|---|---|---|---|---|
| Elastic 7.6 | 0.40 | 2502.82 | 2520.21 | 1.0000 |
| Vespa 7.184.8 | 0.63 | 1579.29 | 1787.40 | 1.0000 |

two shards for Elastic and two threads-per-search with Vespa

| Engine | QPS | Average Latency (ms) | 95P Latency (ms) | Recall@10 |
|---|---|---|---|---|
| Elastic 7.6 | 0.78 | 1276.69 | 1333.61 | 1.0000 |
| Vespa 7.184.8 | 1.26 | 794.28 | 892.23 | 1.0000 |

Are the results with Elastic comparable with your setup? Same HW as before. Vespa is implementing a variant of the HNSW algorithm for ANN (currently an experimental feature), so I will eventually publish some results with that enabled as well.

@jtibshirani
Author

jtibshirani commented Mar 11, 2020

> Is there a simple batch oriented api for feeding a json formatted file with the elastic docker distribution?

I would recommend using the bulk API. You can't feed a list of JSON documents directly; some minimal wrapping is still needed to create the request. The ES Python client has some nice bulk helpers to make the process easier.
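A minimal sketch of what that wrapping looks like with the Python client's bulk helper. The index name `doc`, the field name `vector`, and a local node on `localhost:9200` are illustrative assumptions, not details from this thread:

```python
def generate_actions(vectors, index="doc"):
    # Wrap each raw vector in the minimal action format the bulk
    # helper expects: target index, document id, and the source body.
    for i, vec in enumerate(vectors):
        yield {"_index": index, "_id": str(i), "_source": {"vector": vec}}

if __name__ == "__main__":
    # Requires a running node and `pip install elasticsearch`.
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    es = Elasticsearch("http://localhost:9200")
    vectors = [[0.1] * 960, [0.2] * 960]  # stand-in for the gist-960 data
    success, _ = bulk(es, generate_actions(vectors), chunk_size=500)
    print(f"indexed {success} docs")
```

The helper batches the actions into `_bulk` requests under the hood, which is far faster than one HTTP POST per document.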

> For comparisons it's best to use 1 shard with ES and 1 thread per search with Vespa, you agree?

I don't have deep knowledge of Vespa's architecture, but from the ES side this seems like a reasonable comparison -- with only one shard ES will use a single thread to perform the search.

In addition to setting number_of_shards: 1, it'd be good to set number_of_replicas: 0 so that there is only one shard copy serving searches. (I mention this as a best practice; I assume you've configured vespa-fbench to run each search serially rather than submitting multiple searches at once? If so, setting number_of_replicas shouldn't change the performance numbers much.)
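A sketch of that single-copy index setup; the index name, field name, and dimensionality are illustrative, not from this thread:

```python
def benchmark_index_body(dims=960):
    # One primary shard and no replicas: exactly one shard copy exists,
    # and each search runs single-threaded against it.
    return {
        "settings": {"number_of_shards": 1, "number_of_replicas": 0},
        "mappings": {
            "properties": {"vector": {"type": "dense_vector", "dims": dims}}
        },
    }

if __name__ == "__main__":
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local node
    es.indices.create(index="doc", body=benchmark_index_body())
```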

There are a few other pieces of set-up that are important:

  • After indexing all the vectors, you should 'force merge' all of the segments in a shard. Otherwise ES will have to search many small segments serially, then merge together the results. An example of force merging an index to one segment can be found here.
  • In @mayya-sharipova's latest benchmarks under the section 'Bruteforce benchmarks', she set the heap size to 7GB. The default heap size is quite low, so it would be good to raise it -- instructions on how to set the heap size can be found here.
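Put together, the post-indexing steps above might look like this with the Python client (the index name and local node are assumptions; the heap is set in the environment before the node starts):

```python
# The heap is raised via the environment before starting the node, e.g.
#   ES_JAVA_OPTS="-Xms8g -Xmx8g" ./bin/elasticsearch
FORCE_MERGE_PARAMS = {"max_num_segments": 1}

if __name__ == "__main__":
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")
    es.indices.refresh(index="doc")
    # Collapse the shard to a single segment so a search doesn't have
    # to visit many small segments serially and merge their results.
    es.indices.forcemerge(index="doc", **FORCE_MERGE_PARAMS)
```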

@jobergum
Owner

Thanks a lot for your input @jtibshirani,

Yes, all numbers are reported using a single client and no concurrency. I also want to evaluate with higher concurrency, so thanks for the recommendation on number_of_replicas. Some of the ANN libraries out there scale pretty badly with increased concurrency, but that doesn't show up in any of the ann-benchmarks results.

I've given Elastic an 8G heap (ES_JAVA_OPTS="-Xms8g -Xmx8g"), I don't see any signs of GC pressure, and I've used force merge. Vespa has a similar mechanism for flushing the memory index; the difference is that Vespa's memory index (a B+ tree implementation) can be updated without any merging, unlike Lucene-based engines. Once the memory index reaches a threshold, it's flushed and merged with the disk index (similar to Lucene segment merging).

I'm able to reproduce the brute-force numbers in elastic/elasticsearch#51243 (comment), but in my setup I need 2 shards to get 0.78 QPS.

@jobergum
Owner

@jtibshirani I've updated the master branch to use 7.6.

@jtibshirani
Author

jtibshirani commented Mar 23, 2020

@jobergum I'm sorry for the late reply. I'm not sure why your benchmarking results aren't lining up with @mayya-sharipova's. The only other difference that comes to mind is that we always omit the full document source from results by setting _source: false in the search request body: https://www.elastic.co/guide/en/elasticsearch/reference/7.6/search-request-body.html#request-body-search-source-filtering. Otherwise ES will load and return the whole stored vector for each of the top 10 results, whereas we are just interested in the document IDs.
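As a sketch, a 7.x brute-force query with the source disabled might look like the following. The `l2norm` script and the field name `vector` are assumptions based on the script_score vector functions, not details from this thread:

```python
def knn_query(query_vector, k=10):
    # _source: False keeps the stored 960-dim vectors out of the
    # response; only ids and scores come back for the top k hits.
    return {
        "size": k,
        "_source": False,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # Euclidean similarity via the painless vector helpers
                    "source": "1 / (1 + l2norm(params.query_vector, 'vector'))",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }

if __name__ == "__main__":
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")
    res = es.search(index="doc", body=knn_query([0.1] * 960))
    print([hit["_id"] for hit in res["hits"]["hits"]])
```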

> @jtibshirani I've updated the master branch using 7.6.

Thanks! The 'Ivy Bridge' numbers make sense to me, based on the previous results and the performance improvements in ES. However the Haswell numbers are more surprising -- do you know why Vespa shows a latency improvement of ~2x between the Ivy Bridge and Haswell processors?

@jobergum
Owner

jobergum commented Mar 23, 2020

@jtibshirani the vector is not returned with the result; if it were, yes, I would have spotted it.

Sample response from ES

{
  "took": 604,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 10000, "relation": "gte"},
    "max_score": 0.005666477,
    "hits": [
      {"_index": "doc", "_type": "_doc", "_id": "669835", "_score": 0.005666477},
      {"_index": "doc", "_type": "_doc", "_id": "408764", "_score": 0.0056393184},
      {"_index": "doc", "_type": "_doc", "_id": "408462", "_score": 0.0054252045},
      {"_index": "doc", "_type": "_doc", "_id": "408855", "_score": 0.0053858217},
      {"_index": "doc", "_type": "_doc", "_id": "551661", "_score": 0.0053397696},
      {"_index": "doc", "_type": "_doc", "_id": "861882", "_score": 0.005264404},
      {"_index": "doc", "_type": "_doc", "_id": "406273", "_score": 0.0052393572},
      {"_index": "doc", "_type": "_doc", "_id": "406324", "_score": 0.0052266084},
      {"_index": "doc", "_type": "_doc", "_id": "551743", "_score": 0.005219447},
      {"_index": "doc", "_type": "_doc", "_id": "861530", "_score": 0.0052178036}
    ]
  }
}
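Since only the ids matter for recall, a response shaped like the sample above reduces to the ranked id list, e.g.:

```python
import json

def hit_ids(response):
    # Pull the ranked document ids out of an ES search response.
    return [hit["_id"] for hit in response["hits"]["hits"]]

sample = json.loads(
    '{"hits": {"hits": [{"_id": "669835"}, {"_id": "408764"}]}}'
)
print(hit_ids(sample))  # ['669835', '408764']
```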

On CPU architectures: yes, it's explained by our use of AVX-512 instructions.

I will soon update with results using our HNSW implementation for approximate nearest neighbor search; some sample data with the gist dataset:

[image: sample HNSW results for the gist dataset]

@jtibshirani
Author

jtibshirani commented Mar 23, 2020

Thanks for the explanation + links on AVX. The HNSW implementation looks really promising.

If it's not too much work, it would be great to report the sift-128-euclidean results against Ivy Bridge as well. I'd be curious to see how consistent the latency differences are. Other than that I don't have anything else to add, happy if you'd like to close out this issue.

@jobergum
Owner

Thanks, yes, I just did. And thanks for the input on ES benchmarking.

@jobergum
Owner

I'm resolving this; hoping to have more time to introduce the ANN Vespa version later on.

@jobergum
Owner

@jtibshirani does ES ship with an async feed client that makes it easier to feed documents with high throughput? I'm using the synchronous HTTP POST API but would like to move away from it. Vespa has this utility to feed a JSON file: https://docs.vespa.ai/en/reference/vespa-cmdline-tools.html#vespa-feeder, so I'm looking for an ES equivalent.

@jtibshirani
Author

There's an Elasticsearch Python client, which adds convenient 'bulk helpers' for indexing a large set of documents: https://elasticsearch-py.readthedocs.io/en/v7.10.1/helpers.html#bulk-helpers.

Here's an example from one of my colleagues: https://github.com/elastic/examples/blob/master/Machine%20Learning/Online%20Search%20Relevance%20Metrics/bin/index#L34. You can ignore everything related to 'pipeline'; it is an optional piece of configuration for transforming documents before indexing them.
