-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support distance type radius search for Lucene engine #1498
Support distance type radius search for Lucene engine #1498
Conversation
ca85877
to
3c049d7
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## feature/radius-search #1498 +/- ##
===========================================================
- Coverage 85.07% 85.00% -0.08%
- Complexity 1278 1311 +33
===========================================================
Files 167 170 +3
Lines 5207 5342 +135
Branches 493 512 +19
===========================================================
+ Hits 4430 4541 +111
- Misses 570 584 +14
- Partials 207 217 +10 ☔ View full report in Codecov by Sentry. |
src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java
Outdated
Show resolved
Hide resolved
this.radius = 0.0f; | ||
} | ||
|
||
public KNNQueryBuilder(String fieldName, float[] vector, float radius, QueryBuilder filter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be time to refactor to Builder pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, let me check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say make radius last parameter for consistency. In terms of using Builder pattern for optional patterns (filter and radius I believe) that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after introducing radius, neither k or radius will be required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But cant we switch to builder pattern still?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use builder pattern in new code.
src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java
Outdated
Show resolved
Hide resolved
5aa4389
to
f19437d
Compare
… filter (opensearch-project#1402) * Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator Signed-off-by: luyuncheng <[email protected]> * Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator. Use Bitmap And Batch to do id filter. and you sparse or fixed bitset do exact ANN search Signed-off-by: luyuncheng <[email protected]> * Using int64_t instead of long type for GetLongArrayElements Signed-off-by: luyuncheng <[email protected]> * Add IDSelectorJlongBitmap Signed-off-by: luyuncheng <[email protected]> * 1. Add IDSelectorJlongBitmap and UT for it 2. Move FilterIdsSelectorType to a util class Signed-off-by: luyuncheng <[email protected]> * 1. Add IDSelectorJlongBitmap and UT for it 2. Move FilterIdsSelectorType to a util class 3. Spotless apply Signed-off-by: luyuncheng <[email protected]> * Rebase remote-tracking branch 'origin/main' into Filter Signed-off-by: luyuncheng <[email protected]> * tidy Signed-off-by: luyuncheng <[email protected]> * Add Changelog Signed-off-by: luyuncheng <[email protected]> * fix javadoc tasks Signed-off-by: luyuncheng <[email protected]> * fix bwc javadoc Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector Signed-off-by: luyuncheng <[email protected]> * Rebase faiss_wrapper.cpp Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector as Byte.SIZE Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector For comments Signed-off-by: luyuncheng <[email protected]> --------- Signed-off-by: luyuncheng <[email protected]>
…rch-project#1505) Signed-off-by: Varun Jain <[email protected]>
src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java
Outdated
Show resolved
Hide resolved
if ((k > 0 && distance > 0) || (k == 0 && distance == 0)) { | ||
throw new ParsingException( | ||
parser.getTokenLocation(), | ||
"[" + NAME + "] query requires exactly one of 'k' or 'distance' to be set and valid." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But for setting max results returned, couldnt k be used? How are we limiting outside of this?
src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java
Outdated
Show resolved
Hide resolved
this.radius = 0.0f; | ||
} | ||
|
||
public KNNQueryBuilder(String fieldName, float[] vector, float radius, QueryBuilder filter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But cant we switch to builder pattern still?
src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java
Outdated
Show resolved
Hide resolved
766b657
to
514f8a8
Compare
src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java
Outdated
Show resolved
Hide resolved
@@ -53,6 +55,7 @@ public static Query create( | |||
String fieldName, | |||
float[] vector, | |||
int k, | |||
Float radius, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, we should have a separate factory that is called for Radial Queries. Putting k and radius in same create method is confusing. There should be a RadialQueryFactory that gets called if we need a radial nearest neighbor query and KNNQueryFactory gets called if we need a KNNQuery. @VijayanB what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all current files and class name are originally from KNN, I'm not sure if we want a more generic name and pattern from "KNN" to "VectorQuery" in whole codebase, but this is a long term refactoring I think. For now I prefer reuse the existing factory without much change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmazanec15 Good point. I like your suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmazanec15 @VijayanB Added BaseQueryFactory as parent class and added RNNQueryFactory for radials search.
if (distance == null || distance < 0) { | ||
throw new IllegalArgumentException("[" + NAME + "] requires distance >= 0"); | ||
} | ||
if (k != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmazanec15 Do you see any potential backward incompatibility if we change k from int to integer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're still using int for k in KNNQueryBuilder
} | ||
if (radius != null) { | ||
if (!KNNEngine.getEnginesThatSupportsRadialSearch().contains(knnEngine)) { | ||
throw new UnsupportedOperationException(String.format("Engine [%s] does not support radial search yet", knnEngine)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need remove? This is used to validate engines if support radius search in knn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, i meant word "yet"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, updated
* | ||
* @return Set of all engines that support radial search. | ||
*/ | ||
public static Set<KNNEngine> getEnginesThatSupportsRadialSearch() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to expose this? Why not pass engine and return true if supported else otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used same pattern from other knn features.
k-NN/src/main/java/org/opensearch/knn/index/util/KNNEngine.java
Lines 82 to 107 in dad43c6
/** | |
* Get the engine from the path. | |
* | |
* @param path to be checked | |
* @return KNNEngine corresponding to path | |
*/ | |
public static KNNEngine getEngineNameFromPath(String path) { | |
if (path.endsWith(KNNEngine.NMSLIB.getExtension()) || path.endsWith(KNNEngine.NMSLIB.getCompoundExtension())) { | |
return KNNEngine.NMSLIB; | |
} | |
if (path.endsWith(KNNEngine.FAISS.getExtension()) || path.endsWith(KNNEngine.FAISS.getCompoundExtension())) { | |
return KNNEngine.FAISS; | |
} | |
throw new IllegalArgumentException("No engine matches the path's suffix"); | |
} | |
/** | |
* Returns all engines that create custom segment files. | |
* | |
* @return Set of all engines that create custom segment files. | |
*/ | |
public static Set<KNNEngine> getEnginesThatCreateCustomSegmentFiles() { | |
return CUSTOM_SEGMENT_FILE_ENGINES; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see need to expose , can we update it to accept engine as parameter and decide whether is it supported or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to directly use set ENGINES_SUPPORTING_RADIAL_SEARCH
@junqiu-lei Your description says it supports only for lucene engine. Can you add additional description on what happens if it is used for nmslib or fais? |
* @return Lucene Query | ||
*/ | ||
protected static Query getFilterQuery(BaseQueryFactory.CreateQueryRequest createQueryRequest) { | ||
if (createQueryRequest.getFilter().isPresent()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flip if condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
throw new RuntimeException("Cannot create query with filter", e); | ||
} | ||
if (queryShardContext.getParentFilter() != null) { | ||
if (new NestedHelper(queryShardContext.getMapperService()).mightMatchNestedDocs(filterQuery)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you create local instance? chaining makes it hard to debug and read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
if (new NestedHelper(queryShardContext.getMapperService()).mightMatchNestedDocs(filterQuery)) { | ||
return filterQuery; | ||
} | ||
return new org.apache.lucene.search.join.ToChildBlockJoinQuery(filterQuery, queryShardContext.getParentFilter()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use import static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
|
||
KNNQueryBuilder knnQueryBuilder; | ||
if (k != null) { | ||
knnQueryBuilder = new KNNQueryBuilder(fieldName, ObjectsToFloats(vector)).k(k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move common fields to line 276 and set either k or distance inside if block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
Sure |
* Class to create radius nearest neighbor queries | ||
*/ | ||
@Log4j2 | ||
public class RNNQueryFactory extends BaseQueryFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmazanec15 is it correct to call it as Radius nearest neighbor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be VectorSearchRadialQuery ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ref for the nearest neighbors search names: https://en.wikipedia.org/wiki/Nearest_neighbor_search#Variants
Signed-off-by: Junqiu Lei <[email protected]>
Signed-off-by: Junqiu Lei <[email protected]>
.queryName(queryName); | ||
|
||
if (k != null) { | ||
knnQueryBuilder = knnQueryBuilder.k(k); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think you need to update . You can do like 285
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
* Base class for creating vector search queries. | ||
*/ | ||
@Log4j2 | ||
public abstract class BaseQueryFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think what you need is a Single Factory class or method that will return KNNQuery ( instance of Radial or KNN based on parameter )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The motivation of this refactor is come from #1498 (comment)
Signed-off-by: Junqiu Lei <[email protected]>
Approving it since pending questions are not blocker and can be refactored in later PR. |
61318ae
into
opensearch-project:feature/radius-search
…ject#1498) * Optimize Faiss Query With Filters: Reduce iteration and memory for id filter (opensearch-project#1402) * Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator Signed-off-by: luyuncheng <[email protected]> * Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator. Use Bitmap And Batch to do id filter. and you sparse or fixed bitset do exact ANN search Signed-off-by: luyuncheng <[email protected]> * Using int64_t instead of long type for GetLongArrayElements Signed-off-by: luyuncheng <[email protected]> * Add IDSelectorJlongBitmap Signed-off-by: luyuncheng <[email protected]> * 1. Add IDSelectorJlongBitmap and UT for it 2. Move FilterIdsSelectorType to a util class Signed-off-by: luyuncheng <[email protected]> * 1. Add IDSelectorJlongBitmap and UT for it 2. Move FilterIdsSelectorType to a util class 3. Spotless apply Signed-off-by: luyuncheng <[email protected]> * Rebase remote-tracking branch 'origin/main' into Filter Signed-off-by: luyuncheng <[email protected]> * tidy Signed-off-by: luyuncheng <[email protected]> * Add Changelog Signed-off-by: luyuncheng <[email protected]> * fix javadoc tasks Signed-off-by: luyuncheng <[email protected]> * fix bwc javadoc Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector Signed-off-by: luyuncheng <[email protected]> * Rebase faiss_wrapper.cpp Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector as Byte.SIZE Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector For comments Signed-off-by: luyuncheng <[email protected]> --------- Signed-off-by: luyuncheng <[email protected]> * Increment 2.12.0-SNAPSHOT to 2.13.0-SNAPSHOT in BWC workflow (opensearch-project#1505) Signed-off-by: Varun Jain <[email protected]> * Manually install zlib for win CI (opensearch-project#1513) Signed-off-by: John Mazanec <[email protected]> * Upgrade faiss to 12b92e9 (opensearch-project#1509) Upgrades faiss to facebookresearch/faiss@12b92e9. Cleanup outdated patches. Signed-off-by: John Mazanec <[email protected]> * Disable sdc table for HNSWPQ read-only indices (opensearch-project#1518) Passes flag to disable sdc table for the HNSWPQ indices. This table is only used by HNSWPQ during graph creation to compare nodes already present in graph. When we call load index, the graph is read only. Hence, we wont be doing any ingestion and so the table can be disabled to save some memory. Along with this, added a unit test and a couple test helper methods for generating random data. Signed-off-by: John Mazanec <[email protected]> * Support distance type radius search for Lucene engine Signed-off-by: Junqiu Lei <[email protected]> * Resolve feedback Signed-off-by: Junqiu Lei <[email protected]> * Resolve feedback Signed-off-by: Junqiu Lei <[email protected]> * Resolve comments Signed-off-by: Junqiu Lei <[email protected]> * Resolve comments Signed-off-by: Junqiu Lei <[email protected]> * Add RNNQueryFactory class Signed-off-by: Junqiu Lei <[email protected]> * Add javadoc Signed-off-by: Junqiu Lei <[email protected]> * Resolve feedback Signed-off-by: Junqiu Lei <[email protected]> * Resolve feedback Signed-off-by: Junqiu Lei <[email protected]> * Resolve feedback Signed-off-by: Junqiu Lei <[email protected]> --------- Signed-off-by: luyuncheng <[email protected]> Signed-off-by: Varun Jain <[email protected]> Signed-off-by: John Mazanec <[email protected]> Signed-off-by: Junqiu Lei <[email protected]> Co-authored-by: luyuncheng <[email protected]> Co-authored-by: Varun Jain <[email protected]> Co-authored-by: John Mazanec <[email protected]>
Signed-off-by: Junqiu Lei <[email protected]>
Signed-off-by: Junqiu Lei <[email protected]>
…ject#1498) * Optimize Faiss Query With Filters: Reduce iteration and memory for id filter (opensearch-project#1402) * Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator Signed-off-by: luyuncheng <[email protected]> * Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator. Use Bitmap And Batch to do id filter. and you sparse or fixed bitset do exact ANN search Signed-off-by: luyuncheng <[email protected]> * Using int64_t instead of long type for GetLongArrayElements Signed-off-by: luyuncheng <[email protected]> * Add IDSelectorJlongBitmap Signed-off-by: luyuncheng <[email protected]> * 1. Add IDSelectorJlongBitmap and UT for it 2. Move FilterIdsSelectorType to a util class Signed-off-by: luyuncheng <[email protected]> * 1. Add IDSelectorJlongBitmap and UT for it 2. Move FilterIdsSelectorType to a util class 3. Spotless apply Signed-off-by: luyuncheng <[email protected]> * Rebase remote-tracking branch 'origin/main' into Filter Signed-off-by: luyuncheng <[email protected]> * tidy Signed-off-by: luyuncheng <[email protected]> * Add Changelog Signed-off-by: luyuncheng <[email protected]> * fix javadoc tasks Signed-off-by: luyuncheng <[email protected]> * fix bwc javadoc Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector Signed-off-by: luyuncheng <[email protected]> * Rebase faiss_wrapper.cpp Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector as Byte.SIZE Signed-off-by: luyuncheng <[email protected]> * UpdatedFilterIdsSelector For comments Signed-off-by: luyuncheng <[email protected]> --------- Signed-off-by: luyuncheng <[email protected]> * Increment 2.12.0-SNAPSHOT to 2.13.0-SNAPSHOT in BWC workflow (opensearch-project#1505) Signed-off-by: Varun Jain <[email protected]> * Manually install zlib for win CI (opensearch-project#1513) Signed-off-by: John Mazanec <[email protected]> * Upgrade faiss to 12b92e9 (opensearch-project#1509) Upgrades faiss to facebookresearch/faiss@12b92e9. Cleanup outdated patches. Signed-off-by: John Mazanec <[email protected]> * Disable sdc table for HNSWPQ read-only indices (opensearch-project#1518) Passes flag to disable sdc table for the HNSWPQ indices. This table is only used by HNSWPQ during graph creation to compare nodes already present in graph. When we call load index, the graph is read only. Hence, we wont be doing any ingestion and so the table can be disabled to save some memory. Along with this, added a unit test and a couple test helper methods for generating random data. Signed-off-by: John Mazanec <[email protected]> * Support distance type radius search for Lucene engine Signed-off-by: Junqiu Lei <[email protected]> * Resolve feedback Signed-off-by: Junqiu Lei <[email protected]> * Resolve feedback Signed-off-by: Junqiu Lei <[email protected]> * Resolve comments Signed-off-by: Junqiu Lei <[email protected]> * Resolve comments Signed-off-by: Junqiu Lei <[email protected]> * Add RNNQueryFactory class Signed-off-by: Junqiu Lei <[email protected]> * Add javadoc Signed-off-by: Junqiu Lei <[email protected]> * Resolve feedback Signed-off-by: Junqiu Lei <[email protected]> * Resolve feedback Signed-off-by: Junqiu Lei <[email protected]> * Resolve feedback Signed-off-by: Junqiu Lei <[email protected]> --------- Signed-off-by: luyuncheng <[email protected]> Signed-off-by: Varun Jain <[email protected]> Signed-off-by: John Mazanec <[email protected]> Signed-off-by: Junqiu Lei <[email protected]> Co-authored-by: luyuncheng <[email protected]> Co-authored-by: Varun Jain <[email protected]> Co-authored-by: John Mazanec <[email protected]>
Signed-off-by: Junqiu Lei <[email protected]>
Signed-off-by: Junqiu Lei <[email protected]>
Signed-off-by: Junqiu Lei <[email protected]>
Signed-off-by: Junqiu Lei <[email protected]>
…ject#1498) Signed-off-by: Junqiu Lei <[email protected]>
Description
This PR introduces radius search capability to the Lucene engine, leveraging a new distance parameter. This feature allows users to perform searches within a specified radius, supporting L2 Square and Cosine Similarity metrics, similar to the existing top-k search functionality.
For now for other unsupported engines, the query request will return 400 response with unsupported information.
For detailed distance metric explanations, see OpenSearch k-NN documentation.
Example radius search workflow:
Issues Resolved
#814
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.