Add binary format support with IVF method in Faiss Engine #1784
private boolean isBinaryField(FieldInfo field) {
    if (field.attributes().containsKey(MODEL_ID)) {
        Model model = ModelCache.getInstance().get(field.attributes().get(MODEL_ID));
The Model object can be quite big because it contains the binary blob. Can we read from the model metadata via ModelDao instead?
Updated code to use model only once in the addKNNBinaryField function
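The concern in this thread can be illustrated with a minimal sketch. Every type below (`ModelMetadataSketch`, `MetadataStore`, `BinaryFieldCheck`) is a hypothetical stand-in and not the plugin's actual `ModelDao` or `ModelCache` API, and the `"binary"` value and `model_id` key are assumptions. The point is only that a data-type check needs a lightweight metadata record, not the serialized model blob.

```java
import java.util.Map;

// Hypothetical stand-in for a lightweight metadata record (carries no binary blob).
final class ModelMetadataSketch {
    final String dataType; // e.g. "binary" or "float"; values assumed for illustration

    ModelMetadataSketch(String dataType) {
        this.dataType = dataType;
    }
}

// Hypothetical metadata lookup; stands in for whatever accessor the plugin exposes.
interface MetadataStore {
    ModelMetadataSketch getMetadata(String modelId);
}

final class BinaryFieldCheck {
    static final String MODEL_ID = "model_id"; // attribute key assumed for illustration

    // Decides whether a field is binary from its attributes, consulting only metadata.
    static boolean isBinaryField(Map<String, String> fieldAttributes, MetadataStore store) {
        String modelId = fieldAttributes.get(MODEL_ID);
        if (modelId == null) {
            return false; // simplification: fields without a model count as non-binary here
        }
        ModelMetadataSketch metadata = store.getMetadata(modelId);
        return metadata != null && "binary".equals(metadata.dataType);
    }
}
```

Because `MetadataStore` is a single-method interface, a caller or test can supply it as a lambda without loading any model.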
src/main/java/org/opensearch/knn/training/ByteTrainingDataConsumer.java
}

@Override
public void accept(List<byte[]> byteVectors) {
Why public?
The class previously implemented Consumer, whose methods are public. I removed the Consumer parent interface and changed the method to protected.
It can be even private. I don't see it being used outside of this class.
The accept method is used in some tests, but I removed the other methods.
Thanks for the update, but could you try to remove the accept method as well? It is not great to keep it just because a test uses it.
The accept method is also used in processTrainingVectors.
Yes. But we can change it to private for that use case.
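A minimal sketch of the arrangement suggested here, assuming a simplified consumer that only buffers vectors. `ByteConsumerSketch` and `bufferedCount` are hypothetical names, not the plugin's `ByteTrainingDataConsumer`; the sketch shows `accept` kept private and reached only through the public `processTrainingVectors` entry point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified consumer; not the plugin's ByteTrainingDataConsumer.
final class ByteConsumerSketch {
    private final List<byte[]> buffered = new ArrayList<>();

    // Public entry point: the only method callers and tests need.
    public void processTrainingVectors(List<byte[]> batch, int vectorsToAdd) {
        int count = Math.max(0, Math.min(vectorsToAdd, batch.size()));
        accept(batch.subList(0, count));
    }

    // Private helper: reachable only through processTrainingVectors.
    private void accept(List<byte[]> byteVectors) {
        buffered.addAll(byteVectors);
    }

    // Accessor added for this sketch only, so the effect is observable.
    public int bufferedCount() {
        return buffered.size();
    }
}
```

Tests then exercise the class through `processTrainingVectors`, so no method has to stay visible solely for testing.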
Signed-off-by: Junqiu Lei <[email protected]>
List<Float[]> vectorsConsumed;
// create test float training data consumer class extending FloatTrainingDataConsumer
private static class TestFloatTrainingDataConsumer extends FloatTrainingDataConsumer {
Why do we need this class? I think we can just use FloatTrainingDataConsumer?
The processTrainingVectors method in TestFloatTrainingDataConsumer additionally counts the total vectors; that count is used only by the test.
As I mentioned in another comment, I don't think we need to validate the total vector count. All we need is to verify that JNIService is called with the expected method and parameters.
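The interaction-based testing style suggested above can be sketched without any mocking library by using a hand-written recording fake. `VectorTransferSketch`, `RecordingTransfer`, and `FloatConsumerSketch` are hypothetical names; the interface stands in for the native service and does not reproduce JNIService's real signature. A real test in the project would more likely use a mocking framework.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator interface standing in for the native service call.
interface VectorTransferSketch {
    void transferVectors(long memoryAddress, float[][] vectors);
}

// Recording fake: remembers each call so a test can assert on method and parameters.
final class RecordingTransfer implements VectorTransferSketch {
    final List<String> calls = new ArrayList<>();

    @Override
    public void transferVectors(long memoryAddress, float[][] vectors) {
        calls.add("transferVectors(" + memoryAddress + ", " + vectors.length + " vectors)");
    }
}

// Hypothetical consumer under test; it forwards each batch to the collaborator.
final class FloatConsumerSketch {
    private final VectorTransferSketch transfer;
    private final long memoryAddress;

    FloatConsumerSketch(VectorTransferSketch transfer, long memoryAddress) {
        this.transfer = transfer;
        this.memoryAddress = memoryAddress;
    }

    void consume(List<float[]> vectors) {
        transfer.transferVectors(memoryAddress, vectors.toArray(new float[0][]));
    }
}
```

The test asserts on the recorded call rather than on a counter maintained by a test-only subclass, which is what makes a class like TestFloatTrainingDataConsumer unnecessary.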
}

@Override
public void processTrainingVectors(SearchResponse searchResponse, int vectorsToAdd, String fieldName) {
Is there a unit test for this method?
Yes, in VectorReaderTests.java by TestFloatTrainingDataConsumer.
I think we should name it FloatTrainingDataConsumerTests.
There is an existing test class named FloatTrainingDataConsumerTests, but I think VectorReaderTests can cover the testing of processTrainingVectors. I will open another PR for the refactor.
Approving. We need to refactor TrainingDataConsumer though.
Merged abae5d5 into opensearch-project:feature/binary-format
Resolved the comments from PR heemin32#2, which made it easier to review the file diffs while #1781 was not yet merged. Now that #1781 is merged, I have rebased junqiu-lei:binary-ivf against opensearch-project:feature/binary-format.
Description
This PR adds support for the binary format with the Faiss IVF method. Its main change concerns the data_type field used when training a model.
The related JNI-layer refactoring will be completed in a separate PR, tracked by #1846.
Example workflow
1. Create binary format train index
2. Ingest train index
3. Create train model
4. Create IVF binary format target index
5. Bulk target index
6. Query target index
7. Query result
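The request bodies for the steps above are not reproduced here. As background for what the query step compares, the following is a minimal sketch that assumes binary vectors are packed eight dimensions per byte and compared by Hamming distance. `BinaryVectorSketch`, its method names, and the most-significant-bit-first packing order are assumptions for illustration, not the plugin's or Faiss's implementation.

```java
// Hypothetical helper illustrating packed binary vectors and Hamming distance.
final class BinaryVectorSketch {

    // Packs 0/1 values into bytes, most significant bit first (ordering assumed).
    static byte[] pack(int[] bits) {
        if (bits.length % 8 != 0) {
            throw new IllegalArgumentException("dimension must be a multiple of 8");
        }
        byte[] packed = new byte[bits.length / 8];
        for (int i = 0; i < bits.length; i++) {
            if (bits[i] != 0) {
                packed[i / 8] |= (byte) (1 << (7 - (i % 8)));
            }
        }
        return packed;
    }

    // Counts the bit positions at which two packed vectors differ.
    static int hammingDistance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }
}
```

Under these assumptions an 8-dimensional binary vector occupies a single byte, and a smaller distance means a closer match.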
Issues Resolved
part of #1767
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.