-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support score type threshold in radial search #1589
Support score type threshold in radial search #1589
Conversation
Signed-off-by: Junqiu Lei <[email protected]>
7de0298
to
42e9fa0
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## feature/radius-search #1589 +/- ##
===========================================================
+ Coverage 84.82% 84.85% +0.02%
- Complexity 1400 1429 +29
===========================================================
Files 174 174
Lines 5713 5802 +89
Branches 569 588 +19
===========================================================
+ Hits 4846 4923 +77
- Misses 621 630 +9
- Partials 246 249 +3 ☔ View full report in Codecov by Sentry. |
SpaceType, | ||
Function<Float, Float>>builder() | ||
.put(SpaceType.INNER_PRODUCT, score -> score > 1 ? 1 - score : 1 / score - 1) | ||
.put(SpaceType.L2, score -> 1 / score - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for L2 spacetype can we put this translation in SpaceType enum class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes updated.
private final static Map<SpaceType, Function<Float, Float>> SCORE_TO_DISTANCE_TRANSFORMATIONS = ImmutableMap.< | ||
SpaceType, | ||
Function<Float, Float>>builder() | ||
.put(SpaceType.INNER_PRODUCT, score -> score > 1 ? 1 - score : 1 / score - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add some java doc here why this conversion make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added knn space doc here for more details: https://opensearch.org/docs/latest/search-plugins/knn/approximate-knn/#spaces
Minor comments overall code looks good to me. |
Signed-off-by: Junqiu Lei <[email protected]>
In general, Im not in favor of "score" naming. Users are going to expect it to match the final score produced by the search results, regardless of the query type. So, they will be confused and cut issues when a result is returned from a more complex query with a hit score that is below the score they passed in the filter. I prefer the name "similarity" - this is what lucene refers to this as - https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/FloatVectorSimilarityQuery.java - it will disambiguate this case from the user experience. |
I also thought about |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per sync offline, I think "score" param is okay if we detail explanation in the documentation. Key is that score in this context is scoped to this query context.
@@ -499,4 +541,24 @@ protected int doHashCode() { | |||
public String getWriteableName() { | |||
return NAME; | |||
} | |||
|
|||
private static void validSingleQueryType(Integer k, Float distance, Float score) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: valid -> validate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Signed-off-by: Junqiu Lei <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks @junqiu-lei
@@ -36,6 +36,14 @@ public float scoreTranslation(float rawScore) { | |||
public VectorSimilarityFunction getVectorSimilarityFunction() { | |||
return VectorSimilarityFunction.EUCLIDEAN; | |||
} | |||
|
|||
@Override | |||
public float scoreToDistanceTranslation(float score) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we adding the score to distance translation for L2 only? not for cosine and dot product?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@navneet1v IMO no need for cosine and dot product here. The SpaceType Enum class is used for common translation for different engines:
- Cosine's score to distance function doesn't needed because Lucene's radial search api just need score parameter, in k-NN faiss, it doesn't support Cosine's type.
- Dot product's score to distance function is only need in Faiss engine and in Faiss it has different translation from Lucene, so I just put the function inside Lucene.java where it needed.
d334fa4
into
opensearch-project:feature/radius-search
* Support score type threshold in radial search Signed-off-by: Junqiu Lei <[email protected]>
* Support score type threshold in radial search Signed-off-by: Junqiu Lei <[email protected]>
* Support score type threshold in radial search Signed-off-by: Junqiu Lei <[email protected]>
Description
This PR allows user to use
score
as threshold to search vectors, the search result contains all docs which score higher than thescore
for Lucene and Faiss engines ANN search.Usage
During index mapping and indexing stages no behavior change. Some query examples:
Issues Resolved
Part of #814
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.