Integrate ANN search #78473

jtibshirani · 2021-09-29T17:17:10Z

elasticmachine · 2021-09-29T17:18:30Z

Pinging @elastic/es-search (Team:Search)

MLnick · 2021-09-30T17:19:30Z

Very excited to see this!

> Support cosine similarity instead of dot product (?)

IMO both should be supported. Similar items via cosine sim is a very common use case, as is dot product (for e.g. recommendations).

This PR extends the `dense_vector` type to allow vectors to be added to an ANN index:

```
"mappings": {
  "properties": {
    "my_vector": {
      "type": "dense_vector",
      "dims": 128,
      "index": true,
      "similarity": "l2_norm"
    }
  }
}
```

A description of the parameters:

* `index`. Setting this to `true` indicates the field should be added to the ANN index. The values will be parsed as a `KnnVectorField` instead of a doc values field. By default `index: false` to provide a smooth transition from 7.x, where vectors are not indexed.
* `similarity`. When `index: true`, it's required to specify what similarity to use when indexing the vectors. Right now the accepted values are `l2_norm` and `dot_product`, which matches the Lucene options. (We decided to require `similarity` to be set since there's no default choice that works in general, and it's easy to overlook and accidentally get poor results.)

Indexed vectors still support the same functionality as vectors based on doc values -- they work with vector script functions and `exists` queries.

Relates to #78473.
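The rule that `similarity` is required whenever `index: true` can be illustrated with a small client-side sketch. `dense_vector_mapping` and `ALLOWED_SIMILARITIES` are hypothetical names introduced only for this illustration; they are not part of Elasticsearch or any client library, and the real validation happens server-side in the field mapper.

```python
# Accepted values as listed in the PR description above.
ALLOWED_SIMILARITIES = {"l2_norm", "dot_product"}


def dense_vector_mapping(dims, index=False, similarity=None):
    """Build a `dense_vector` field mapping as a plain dict (hypothetical helper).

    Mirrors the rule from the PR description: when `index` is true,
    `similarity` must be set to one of the accepted values.
    """
    field = {"type": "dense_vector", "dims": dims}
    if index:
        if similarity not in ALLOWED_SIMILARITIES:
            raise ValueError(
                "similarity is required when index is true; expected one of "
                + ", ".join(sorted(ALLOWED_SIMILARITIES))
            )
        field["index"] = True
        field["similarity"] = similarity
    # Assumption: when index is false, this sketch leaves `similarity` out.
    return field


mapping = {
    "mappings": {
        "properties": {
            "my_vector": dense_vector_mapping(128, index=True, similarity="l2_norm")
        }
    }
}
```

Calling `dense_vector_mapping(128, index=True)` without a similarity raises a `ValueError`, which is the "easy to overlook" case the PR description wants to surface early.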

jtibshirani · 2021-10-05T16:13:56Z

@MLnick thanks for the feedback, I updated the plan to make sure we cover both. Dot product may need special handling as it's not a true metric (for example doesn't satisfy the triangle inequality). I've also seen dot product used as an optimized cosine similarity, by normalizing all vectors to unit length beforehand -- this is more straightforward to support.
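The "dot product as an optimized cosine similarity" point can be checked with a short sketch: once both vectors are scaled to unit length, their dot product equals the cosine similarity of the originals. The helper names below are illustrative only.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(vec):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity computed directly, norms included."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [1.0, 2.0]

# Normalizing once up front means each comparison is a dot product only.
fast = dot(normalize(a), normalize(b))
slow = cosine(a, b)
```

Here `fast` and `slow` agree up to floating-point error, so the norms only need to be computed once per vector rather than once per comparison.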

This PR extends the `dense_vector` type to allow configuring HNSW parameters in `index_options`:

* `m` – max number of connections for each node
* `ef_construction` – number of candidate neighbors to track while searching the graph for each newly inserted node

```
"mappings": {
  "properties": {
    "my_vector": {
      "type": "dense_vector",
      "dims": 128,
      "index": true,
      "similarity": "l2_norm",
      "index_options": {
        "type": "hnsw",
        "m": 15,
        "ef_construction": 50
      }
    }
  }
}
```

`index_options` is an object, and all parameters underneath it are optional. If `m` or `ef_construction` is not provided, the default value from the current codec will be used.

Relates to elastic#78473

This PR extends the `dense_vector` type to allow configuring HNSW parameters in `index_options`:

* `m` – max number of connections for each node
* `ef_construction` – number of candidate neighbours to track while searching the graph for each newly inserted node

```
"mappings": {
  "properties": {
    "my_vector": {
      "type": "dense_vector",
      "dims": 128,
      "index": true,
      "similarity": "l2_norm",
      "index_options": {
        "type": "hnsw",
        "m": 15,
        "ef_construction": 50
      }
    }
  }
}
```

`index_options` as an object is optional. If it is not provided, the default values from the current codec will be used. If `index_options` is provided, then all parameters related to the specific type must be provided.

Relates to #78473

The new kNN endpoint currently doesn't support searches on nested fields. This PR updates the validation logic to detect this case and throw a clear error. It also adds tests for kNN search when there are nested documents. Relates to #78473.

This PR throws an exception for kNN searches on filtered aliases. We don't allow kNN searches on filtered aliases because filters are currently applied only after the kNN search is done, which may lead to returning fewer than k results. In the future, we want to apply filters while doing a kNN search. Once that is implemented, we will allow kNN searches on filtered aliases. Relates to elastic#78473
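The "fewer than k results" problem can be reproduced with a toy example. This is a brute-force stand-in written for illustration, not the HNSW search Elasticsearch actually runs; the documents, the `color` field, and the helper names are all made up.

```python
def knn(docs, query, k):
    """Return the k documents closest to `query` by squared L2 distance."""
    def dist(doc):
        return sum((x - y) ** 2 for x, y in zip(doc["vector"], query))
    return sorted(docs, key=dist)[:k]


def knn_then_filter(docs, query, k, predicate):
    """Post-filtering: run kNN first, then drop hits that fail the filter."""
    return [doc for doc in knn(docs, query, k) if predicate(doc)]


docs = [
    {"id": 1, "vector": [0.0, 0.0], "color": "red"},
    {"id": 2, "vector": [0.1, 0.0], "color": "blue"},
    {"id": 3, "vector": [0.2, 0.0], "color": "red"},
    {"id": 4, "vector": [5.0, 5.0], "color": "blue"},
]

# The three nearest neighbors are ids 1, 2 and 3; only id 2 is blue.
hits = knn_then_filter(docs, [0.0, 0.0], k=3, predicate=lambda d: d["color"] == "blue")
```

Two blue documents exist, but `hits` contains only one of them, because the filter runs after the top 3 have already been chosen.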

This PR fixes some issues in `KnnVectorQueryBuilderTests`:

* Improve the check on the Lucene query
* Remove an unused field mapping

Relates to #78473.

This PR ensures the `_knn_search` endpoint handles both FLS and DLS:

* Updates `FieldSubsetReader` to handle FLS for the vectors format
* Adds tests to check both DLS and FLS work

Relates to #78473.

This commit updates the `dense_vector` docs to include information on the new `index`, `similarity`, and `index_options` parameters. It also tries to clarify the difference between `similarity` and `index_options` with the existing parameters that have the same name. Relates to #78473.

This commit adds docs for the new `_knn_search` endpoint. It focuses on being an API reference and is light on details in terms of how exactly the kNN search works, and how the endpoint contrasts with `script_score` queries. We plan to add a high-level guide on kNN search that will explain this in depth. Relates to #78473.

coreation · 2021-11-23T18:54:29Z

@mayya-sharipova great to see this being put to work, I was going through the documentation WIP (https://elasticsearch_80857.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/knn-search.html#exact-knn) and was a bit confused. There's a lot of work being done around ANN's, which is experimental, and there's "exact kNN", but correct me if I'm wrong, there's nothing new being done with regards to exact kNN's right? The function score there is already possible in 7.11 for example.

I was wondering if the effort being done will help speed up exact kNN searches; is that something that will be improved by this issue? If I read the issue correctly it doesn't sound like it, but I wanted to make sure I wasn't mistaken.

mayya-sharipova · 2021-11-26T13:17:00Z

@coreation

> there's nothing new being done with regards to exact kNN's right?

You are right, this issue and the work done are concerned only with approximate NN, and don't bring improvements to exact kNN search.

I'm wondering what your use case for exact kNN is. Would it be possible for your use case to use ANN tuned for high accuracy/recall (using a large value for `num_candidates`)?

Also, what kind of speed-ups are you thinking of for exact kNN search?
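As a rough illustration of "ANN tuned for high accuracy/recall", the sketch below builds a `_knn_search` request body with a `num_candidates` much larger than `k`. The body shape (`knn` with `field`, `query_vector`, `k`, `num_candidates`) and the `num_candidates >= k` check are assumptions based on the endpoint's API reference and should be verified there; `knn_search_body` and the field name `my_vector` are hypothetical.

```python
import json


def knn_search_body(field, query_vector, k, num_candidates):
    """Build a request body for the `_knn_search` endpoint (hypothetical helper).

    Assumption: the endpoint expects a top-level `knn` object and
    requires num_candidates to be at least k.
    """
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "knn": {
            "field": field,
            "query_vector": list(query_vector),
            "k": k,
            # More candidates per shard trades speed for higher recall.
            "num_candidates": num_candidates,
        }
    }


body = knn_search_body("my_vector", [0.1, 0.2, 0.3], k=10, num_candidates=1000)
print(json.dumps(body, indent=2))
```

How large `num_candidates` needs to be for a given recall target depends on the data and the HNSW parameters, so it is best found by measuring against an exact search on a sample.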

coreation · 2021-11-26T14:19:33Z

@mayya-sharipova Our use case is: given a set of vectors, find the N best-fitting other documents that also comply with a set of filters. Currently we use a query in combination with a script score function for this, where the script score function can have 1-30 cosine similarity calculations, since we don't have one vector to match against, but a set of vectors.

This takes quite a bit of time, which is understandable given that there are sometimes 30 cosine similarity computations per score. I think ANN with filters will help speed up this process dramatically, if I understand it correctly, because we don't need an exact score per se, just an idea of how well the documents match the given vectors.

So your proposal of using ANN tuned for high accuracy/recall will suffice - and likely return results in a much faster manner.
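The exact approach described in this comment can be sketched as a brute-force scan: filter first, then score every remaining document against each query vector. How the per-vector similarities are combined is not stated in the thread; summing them is an assumption made here, and all names and sample data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_multi_vector_search(docs, query_vectors, n, predicate):
    """Filter, then rank by the summed cosine similarity to all query vectors."""
    scored = [
        (sum(cosine(doc["vector"], q) for q in query_vectors), doc["id"])
        for doc in docs
        if predicate(doc)
    ]
    # Highest combined score first; cost grows with len(docs) * len(query_vectors).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


docs = [
    {"id": "a", "vector": [1.0, 0.0], "lang": "en"},
    {"id": "b", "vector": [0.0, 1.0], "lang": "en"},
    {"id": "c", "vector": [1.0, 0.0], "lang": "fr"},
]
query_vectors = [[1.0, 0.0], [1.0, 1.0]]

top = exact_multi_vector_search(docs, query_vectors, n=2, predicate=lambda d: d["lang"] == "en")
```

With up to 30 query vectors, every filtered document costs up to 30 cosine computations, which is the overhead the comment describes.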

tholor · 2021-11-26T15:09:28Z

+1 for the combination of ANN and filters!

From what I understand from this current draft, the combination of ANN and filtering won't be supported yet and will only be explored in the distant future?
Our use case is also heavily relying on running KNN on a filtered subset of documents. As these subsets are growing into the millions, we've reached the limits of KNN and hoped for switching to ANN with the 8.0 release. However, if ANN doesn't support filtering, we will run into accuracy problems when running this on the whole index.

jtibshirani · 2021-11-29T21:46:23Z

Thanks for the feedback! I was too ambitious in listing all these extensions (like filtering) under "Phase 2". I changed the heading name to "Future Plans". We'll tackle them in their own dedicated GitHub issues.

coreation · 2021-11-30T08:13:10Z

Thanks for the update @jtibshirani, will that issue (filtering) be linked as well in the main post when available?

Adds a high-level guide for running an approximate or exact kNN search in Elasticsearch. Relates to #78473.

msahamed · 2021-12-04T15:49:02Z

Thank you for the great work with ANN support. I could not agree more with @tholor's view on ANN and filters. Filtering with ANN is among the powerful options that other databases lack. In my role as a data scientist, I feel this is a necessity every day. Hence, it would be more beneficial if ANN + filter were a higher priority.

In order to perform a kNN search on a `dense_vector` field, it must have `index: true` in its mapping. This commit clarifies the error message. Previously the message was confusing, because the user likely didn't touch the `index` parameter and might not even be aware of it. It also adds a note to the docs clarifying that when coming from 7.x, you must explicitly set `index: true` and reindex the vectors. Relates to #78473.

jtibshirani · 2021-12-15T21:39:06Z

I opened #81788 to track work on supporting ANN with filtering (also linked under "Future Plans" in the description). From your comments, it sounds like filtering would be really useful and a high priority for you.

I'm going to close out this issue, since we've merged the work required for basic ANN support. This is just a beginning -- we expect to iterate on and improve the feature through other GitHub issues.

Adds a release highlight for the kNN search API. Relates to #78473 and #79013

### Preview

https://elasticsearch_83755.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/8.0/release-highlights.html#_knn_search_api

jtibshirani · 2022-02-24T03:53:54Z

I opened a new meta issue to track our follow-up work: #84324.

jtibshirani added :Search/Search Search-related issues that do not fall into other categories >feature Meta labels Sep 29, 2021

elasticmachine added the Team:Search Meta label for search team label Sep 29, 2021

This was referenced Sep 29, 2021

Investigate various implementations of ann search for vector fields #42326

Closed

Extend dense_vector to support indexing vectors #78491

Merged

jtibshirani mentioned this issue Oct 5, 2021

Load knn vectors format with mmapfs #78724

Merged

mayya-sharipova mentioned this issue Oct 14, 2021

Add support for configuring HNSW parameters #79193

Merged

This was referenced Oct 18, 2021

Add new kNN search endpoint #79013

Merged

Disallow kNN searches on nested vector fields #79403

Merged

mayya-sharipova mentioned this issue Oct 21, 2021

Disallow kNN searches with index alias filters #79654

Closed

jtibshirani mentioned this issue Oct 21, 2021

Small fixes in KnnVectorQueryBuilderTests #79667

Merged

jtibshirani added a commit that referenced this issue Oct 23, 2021

Small fixes in KnnVectorQueryBuilderTests (#79667)

fa7b6ee

This PR fixes some issues in `KnnVectorQueryBuilderTests`:

* Improve the check on the Lucene query
* Remove an unused field mapping

Relates to #78473.

jtibshirani mentioned this issue Oct 25, 2021

Ensure kNN search respects authorization #79693

Merged

jtibshirani mentioned this issue Nov 3, 2021

Update dense_vector docs with kNN indexing options #80306

Merged

jtibshirani mentioned this issue Nov 4, 2021

Add docs for kNN search endpoint #80378

Merged

jrodewig self-assigned this Nov 5, 2021

tholor mentioned this issue Nov 9, 2021

Multiple filters not supported in MilvusDocumentStore? deepset-ai/haystack#1712

Closed

jtibshirani assigned mayya-sharipova and jtibshirani Nov 10, 2021

jrodewig added a commit that referenced this issue Nov 30, 2021

[DOCS] Add high-level guide for kNN search (#80857)

229d2d7

Adds a high-level guide for running an approximate or exact kNN search in Elasticsearch. Relates to #78473.

elasticsearchmachine pushed a commit that referenced this issue Nov 30, 2021

[DOCS] Add high-level guide for kNN search (#80857) (#81172)

cd53ff6

Adds a high-level guide for running an approximate or exact kNN search in Elasticsearch. Relates to #78473.

jrodewig removed their assignment Dec 2, 2021

rjurney mentioned this issue Dec 13, 2021

Add vectors search track elastic/rally-tracks#217

Merged

jtibshirani closed this as completed Dec 15, 2021

jtibshirani added the release highlight label Jan 27, 2022

jrodewig mentioned this issue Feb 9, 2022

[DOCS] Add 8.0 release highlight for kNN search API #83755

Merged

jtibshirani added :Search Relevance/Vectors Vector search and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 21, 2022

javanna added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 12, 2024

jtibshirani commented Sep 29, 2021

Background

Implementation Plan

Phase 0: Help prepare Lucene's HNSW implementation

Phase 1: Basic ANN support

Future Plans: Improvements to functionality and performance
