-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support to build vector data structures greedily and perform exact search when there are no engine files #2188
Conversation
b124156
to
76f8a41
Compare
In this comment we will be looking at some of the options on how to expose a setting “approximate_threshold” which controls when to build vector data structure during flush and merge operation. If no value is provided during index creation, we will use default value to decide whether to build vector data structure or not during flush and merge operations. The possible values for this parameters are:
Proposal1. Keep “approximate_threshold” as dynamic index settingIn this approach, users can set this value per index. They can use “Update Index Setting” API to update this value on live index at any time. Example:
Pros:
Cons
2. Keep “approximate_threshold” as mapping parameterIn this approach, users can set this value as mapping parameter for knn vector field type. They can use “Update Mapping Setting” API to update this value. Example:
Pros
Cons
But in order to update a mapping parameter, users has to pass other non-updatable parameters as part of update mapping api request. In above example, to update “ 3. Combine 1 & 2In this approach we will provide this as both index setting and mapping parameter, where index setting will be considered as global setting for every field, and, users can override this value for any specific field by updating as mapping parameter. Pros
Cons
Finalized approachSince there is no work around to use as mapping parameter and provide better user experience at same time, we can first introduce this as “index setting”, (Proposal 1 ) and, if we receive substantial feedback on having it as mapping parameter from community, we can introduce mapping parameter in next iteration**(effectively Proposal 3).** |
Link to Documentation issue opensearch-project/documentation-website#8482 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think this looks pretty good
src/main/java/org/opensearch/knn/index/codec/BasePerFieldKnnVectorsFormat.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/BasePerFieldKnnVectorsFormat.java
Outdated
Show resolved
Hide resolved
76f8a41
to
2501ad2
Compare
Test failures are not due to this PR. |
… creation (opensearch-project#2007) Added new updatable index setting "build_vector_data_structure_threshold", which will be considered when to build braph or not for native engines. This is noop for lucene. This depends on use lucene format as prerequisite. We don't need to add flag since it is only enable if lucene format is already enabled. Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
…ject#2175) Previosuly we only added support to build greedily for non quantization scenario. In this commit, we can remove that constraint, however, we cannot skip writing quanitization state since it is required irrespective of type of search is executed later. Signed-off-by: Vijayan Balasubramanian <[email protected]>
…project#2136) * Add exact search if no engine files are in segments When graph is not available, plugin will return empty results. With this change, exact search will be performed when only no engine file is available in segment. We also don't need version check or feature flag because, option to not build vector data structure will only be available post 2.17. If an index is created using pre 2.17 version, segment will always have engine files and this feature will never be called during search. --------- Signed-off-by: Vijayan Balasubramanian <[email protected]>
* Add support for radial search in exact search When threshold value is set, knn plugin will not be creating graph. Hence, when search request is trigged during that time, exact search will return valid results. However, radial search was never included as part of exact search. This will break radial search when threshold is added and radial search is requested. In this commit, new method is introduced to accept min score and return documents that are greater than min score, similar to how radial search is performed by native engines. This search is independent of engine, but, radial search is supported only for FAISS engine out of all native engines. Signed-off-by: Vijayan Balasubramanian <[email protected]> --------- Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
2501ad2
to
00c0ed1
Compare
add this info on the gh issue too. |
…t search when there are no engine files (#2188) * Introduce new setting to configure when to build graph during segment creation (#2007) Added new updatable index setting "build_vector_data_structure_threshold", which will be considered when to build braph or not for native engines. This is noop for lucene. This depends on use lucene format as prerequisite. We don't need to add flag since it is only enable if lucene format is already enabled. Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add integration test for binary vector values (#2142) Signed-off-by: Vijayan Balasubramanian <[email protected]> * Allow build graph greedily for quantization scenarios (#2175) Previosuly we only added support to build greedily for non quantization scenario. In this commit, we can remove that constraint, however, we cannot skip writing quanitization state since it is required irrespective of type of search is executed later. Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add exact search if no native engine files are available (#2136) * Add exact search if no engine files are in segments When graph is not available, plugin will return empty results. With this change, exact search will be performed when only no engine file is available in segment. We also don't need version check or feature flag because, option to not build vector data structure will only be available post 2.17. If an index is created using pre 2.17 version, segment will always have engine files and this feature will never be called during search. --------- Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add support for radial search in exact search (#2174) * Add support for radial search in exact search When threshold value is set, knn plugin will not be creating graph. Hence, when search request is trigged during that time, exact search will return valid results. However, radial search was never included as part of exact search. This will break radial search when threshold is added and radial search is requested. In this commit, new method is introduced to accept min score and return documents that are greater than min score, similar to how radial search is performed by native engines. This search is independent of engine, but, radial search is supported only for FAISS engine out of all native engines. Signed-off-by: Vijayan Balasubramanian <[email protected]> --------- Signed-off-by: Vijayan Balasubramanian <[email protected]> (cherry picked from commit 5a56829)
…nd perform exact search when there are no engine files (#2201) * Add support to build vector data structures greedily and perform exact search when there are no engine files (#2188) * Introduce new setting to configure when to build graph during segment creation (#2007) Added new updatable index setting "build_vector_data_structure_threshold", which will be considered when to build braph or not for native engines. This is noop for lucene. This depends on use lucene format as prerequisite. We don't need to add flag since it is only enable if lucene format is already enabled. Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add integration test for binary vector values (#2142) Signed-off-by: Vijayan Balasubramanian <[email protected]> * Allow build graph greedily for quantization scenarios (#2175) Previosuly we only added support to build greedily for non quantization scenario. In this commit, we can remove that constraint, however, we cannot skip writing quanitization state since it is required irrespective of type of search is executed later. Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add exact search if no native engine files are available (#2136) * Add exact search if no engine files are in segments When graph is not available, plugin will return empty results. With this change, exact search will be performed when only no engine file is available in segment. We also don't need version check or feature flag because, option to not build vector data structure will only be available post 2.17. If an index is created using pre 2.17 version, segment will always have engine files and this feature will never be called during search. --------- Signed-off-by: Vijayan Balasubramanian <[email protected]> * Add support for radial search in exact search (#2174) * Add support for radial search in exact search When threshold value is set, knn plugin will not be creating graph. Hence, when search request is trigged during that time, exact search will return valid results. However, radial search was never included as part of exact search. This will break radial search when threshold is added and radial search is requested. In this commit, new method is introduced to accept min score and return documents that are greater than min score, similar to how radial search is performed by native engines. This search is independent of engine, but, radial search is supported only for FAISS engine out of all native engines. Signed-off-by: Vijayan Balasubramanian <[email protected]> --------- Signed-off-by: Vijayan Balasubramanian <[email protected]> (cherry picked from commit 5a56829) * Fix compilation issue due to package error Signed-off-by: Vijayan Balasubramanian <[email protected]> --------- Signed-off-by: Vijayan Balasubramanian <[email protected]> Co-authored-by: Vijayan Balasubramanian <[email protected]>
Description
Added new updatable index setting "approximate_threshold", which will be
considered when to build graph or not for native engines. This is noop for lucene.
When graph is not available, plugin will return empty results. With this change, exact search will be performed when only no engine file is available in segment.
Radial search was never included as part of exact search. This will break radial search when threshold
is added and radial search is requested. Hence, introduce exact search during radial search similar to Approx search.
Related Issues
#1942
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.