[FEATURE] Support Radius Search in k-NN #814
Comments
Assigning it to @vamshin |
Please +1 if you are looking for this feature to help prioritize |
By leveraging both the Lucene and Faiss libraries, we can unify the radius search API for both engines in OpenSearch k-NN.
Our initial release goal is to support this feature on both the Lucene and Faiss engines. |
Lucene 9.10 branch was cut yesterday and the release process will begin tomorrow. Given that OpenSearch release is starting today, it's unlikely this would make 2.12 w/o a delay. 9.10 upgrade is a good reason to suggest postponing 2.12 as there's several nice updates and features coming in Lucene, but it looks like OpenSearch core's main branch still hasn't updated to any snapshot of 9.10. Perplexing why that never happened since (at minimum) the main branch should be updated to the next minor snapshot after a release of Lucene to ensure we stay on track w/ upstream releases. ¯\_(ツ)_/¯ |
Thanks @nknize for sharing the timely information on Lucene 9.10! |
Radial search benchmark
Cluster configuration
Cluster created by opensearch-build and opensearch-cluster-cdk
Feature branch: https://github.com/junqiu-lei/k-NN/tree/2.x-radial-traversal
Benchmark tool
OSB
Dataset
We need to update the datasets to include threshold values for top-k, and the true neighbors for each radial threshold. For example, a top-k 100 query threshold means we use the 100th nearest doc to capture the min_score threshold and the max_distance threshold.
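As a rough illustration of the dataset step described above, the sketch below derives a max_distance threshold from the k-th nearest document and then collects the true neighbors inside that radius. It assumes plain Euclidean distance and an exact brute-force scan; the function name `radial_thresholds` and the toy documents are hypothetical and are not taken from the benchmark code. The matching min_score threshold depends on how the chosen space type converts distance to score, so it is left out here.

```python
import math


def radial_thresholds(query, docs, k):
    """Return (max_distance, true_neighbors) derived from the k-th nearest doc.

    Hypothetical helper: assumes Euclidean distance and an exact scan.
    """
    # Rank every document by its exact distance to the query vector.
    ranked = sorted((math.dist(query, vec), doc_id) for doc_id, vec in docs.items())
    # The k-th nearest document defines the radius for the radial query.
    max_distance = ranked[k - 1][0]
    # Ground truth: every document at or inside that radius.
    true_neighbors = [doc_id for dist, doc_id in ranked if dist <= max_distance]
    return max_distance, true_neighbors


# Toy data, purely illustrative.
docs = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [6.0, 8.0], "d": [1.0, 0.0]}
max_distance, neighbors = radial_thresholds([0.0, 0.0], docs, k=3)
```

With ties at the threshold distance, the true-neighbor list can be longer than k, which is why the radial ground truth is stored separately from the top-k ground truth.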
Algorithm
Results
Observations
|
@junqiu-lei thanks, I think the 0.95*min_score makes sense to start. We can make this configurable in future if need be. |
Closing this issue, as this feature is going to be released in 2.14. |
Is your feature request related to a problem?
I would be interested in the ability to provide a maximum radius filter to find all (approximate) nearest neighbors that lie within a certain distance from the query vector. For our use case we need to aggregate over this result. If I am not mistaken, this is currently not possible.
What solution would you like?
Something like:
This would allow me to aggregate over all results for the given radius.
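For illustration only, a request of this shape could be built as in the sketch below. It assumes the radius is passed as a `max_distance` parameter inside the `knn` clause, borrowing the name used in the benchmark comment; the field names `my_vector` and `category`, the aggregation name `by_category`, and the helper `radial_query` are hypothetical.

```python
import json


def radial_query(vector_field, vector, max_distance, agg_field):
    """Build a search body: radius-limited k-NN query plus a terms aggregation.

    Hypothetical helper; the `max_distance` placement is an assumption.
    """
    return {
        "query": {
            "knn": {
                vector_field: {
                    "vector": vector,
                    # Assumed parameter: keep only docs within this distance.
                    "max_distance": max_distance,
                }
            }
        },
        # Aggregate over every document that falls inside the radius.
        "aggs": {"by_category": {"terms": {"field": agg_field}}},
    }


body = radial_query("my_vector", [0.1, 0.2, 0.3], 2.0, "category")
payload = json.dumps(body)
```

The resulting `payload` string would be sent as the body of a search request against the index holding the vectors.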
What alternatives have you considered?
We are considering Elasticsearch as an alternative. They are working on such a feature: elastic/elasticsearch#84929