Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KNNIterators should support with and without filters #2155

Merged
merged 6 commits into from
Sep 28, 2024

Conversation

VijayanB
Copy link
Member

@VijayanB VijayanB commented Sep 27, 2024

Description

Update VectorIterator, ByteIterator, NestedVectorIterator, NestedByteIterator to iterate even if there are no filters provided.
Currently this is used by exact search to score either topK docs or all docs when filter is provided by users.
However, in future we will be allowing exact search even if there are no filters. Hence, decouple filter
and make it option to support both cases.

Related Issues

Pre-requisite for #1942

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Comment on lines 30 to 31
public ByteVectorIdsKNNIterator(
final BitSet filterIdsBitSet,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make filterIdsBitSet as Optional? or atleast annotate as @nullable to provide a signal that this parameter can be null.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Update VectorIterator and NesterVector Iterator to
iterate even if there is no filters provided to iterator.
Currently this is used by exact search to score either topk
docs or all docs when filter is provided by users.
However, in future we will be allowing exact search
even if there are no filters. Hence, decouple filter
and make it option to support both cases.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Copy link
Collaborator

@navneet1v navneet1v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactoring.

@VijayanB VijayanB force-pushed the refactor-iterator-new branch from 7b53e0c to 71dba07 Compare September 27, 2024 07:19
@VijayanB VijayanB requested a review from navneet1v September 27, 2024 17:55
this.docId = getNextDocId();
}

ByteVectorIdsKNNIterator(final byte[] queryVector, final KNNBinaryVectorValues binaryVectorValues, final SpaceType spaceType)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ByteVectorIdsKNNIterator(final byte[] queryVector, final KNNBinaryVectorValues binaryVectorValues, final SpaceType spaceType)
public ByteVectorIdsKNNIterator(final byte[] queryVector, final KNNBinaryVectorValues binaryVectorValues, final SpaceType spaceType)

super(filterIdsArray, queryVector, binaryVectorValues, spaceType);
this.parentBitSet = parentBitSet;
}

NestedByteVectorIdsKNNIterator(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NestedByteVectorIdsKNNIterator(
public NestedByteVectorIdsKNNIterator(

private final BitSet parentBitSet;

NestedFilteredIdsKNNIterator(
final BitSet filterIdsArray,
NestedVectorIdsKNNIterator(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NestedVectorIdsKNNIterator(
public NestedVectorIdsKNNIterator(

this(filterIdsArray, queryVector, knnFloatVectorValues, spaceType, parentBitSet, null, null);
}

public NestedFilteredIdsKNNIterator(
NestedVectorIdsKNNIterator(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NestedVectorIdsKNNIterator(
public NestedVectorIdsKNNIterator(

@@ -55,10 +58,13 @@ public int nextDoc() throws IOException {
if (docId == DocIdSetIterator.NO_MORE_DOCS) {
return DocIdSetIterator.NO_MORE_DOCS;
}
int doc = binaryVectorValues.advance(docId);
if (bitSetIterator != null) {
binaryVectorValues.advance(docId);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this line inside getNextDocId()? See line#84 for example.


protected int getNextDocId() throws IOException {
if (bitSetIterator != null) {
return this.bitSetIterator.nextDoc();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this for consistency. this is mostly needed only inside constructor to distinguish the variable from parameter.

Suggested change
return this.bitSetIterator.nextDoc();
int docId = bitSetIterator.nextDoc();
binaryVectorValues.advance(docId);
return docId;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heemin32 It is missing NO_DOCUMENTS check. Please check my response here #2155 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am good with having NO_DOCUMENTS check here if needed.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
@VijayanB VijayanB force-pushed the refactor-iterator-new branch from 71dba07 to a9ab6cd Compare September 27, 2024 21:06
@VijayanB VijayanB force-pushed the refactor-iterator-new branch from a9ab6cd to 71adc4f Compare September 27, 2024 23:25
Signed-off-by: Vijayan Balasubramanian <[email protected]>
@VijayanB VijayanB requested a review from heemin32 September 28, 2024 00:02
@VijayanB VijayanB merged commit 6f6dd56 into opensearch-project:main Sep 28, 2024
34 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 28, 2024
* Rename class names to represent both and filter and non filter use cases
* Iterator should support with filters

Update VectorIterator and NesterVector Iterator to
iterate even if there is no filters provided to iterator.
Currently this is used by exact search to score either topk
docs or all docs when filter is provided by users.
However, in future we will be allowing exact search
even if there are no filters. Hence, decouple filter
and make it option to support both cases.

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
(cherry picked from commit 6f6dd56)
VijayanB added a commit that referenced this pull request Sep 28, 2024
* Rename class names to represent both and filter and non filter use cases
* Iterator should support with filters

Update VectorIterator and NesterVector Iterator to
iterate even if there is no filters provided to iterator.
Currently this is used by exact search to score either topk
docs or all docs when filter is provided by users.
However, in future we will be allowing exact search
even if there are no filters. Hence, decouple filter
and make it option to support both cases.

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
(cherry picked from commit 6f6dd56)

Co-authored-by: Vijayan Balasubramanian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants