Add binary format support for Faiss IVF #2
404056a
to
735ccda
Compare
@@ -29,6 +29,12 @@ namespace knn_jni {
jlong vectorsAddressJ, jint dimJ, jstring indexPathJ, jbyteArray templateIndexJ,
jobject parametersJ);

// Create an index with ids and vectors. Instead of creating a new index, this function creates the index
// based off of the template index passed in. The index is serialized to indexPathJ.
void CreateBinaryIndexFromTemplate(knn_jni::JNIUtilInterface * jniUtil, JNIEnv * env, jintArray idsJ,
Why introduce separate Binary<...> methods for creating an index, while Free takes a parameter instead?
It is to reflect the underlying C++ methods.
Faiss has two different methods for creating an index, one for non-binary and one for binary. However, there is only one way, `delete`, to free the memory.
throw std::runtime_error("Template index cannot be null");
}

// Set thread count if it is passed in as a parameter. Setting this variable will only impact the current thread
Is it possible to avoid code duplication?
I tried to add it into faiss_index_service, but it failed with a C++ null pointer error. I will try again with a workable solution.
I'll raise another PR to try to add it into faiss_index_service
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java
@@ -596,8 +597,13 @@ protected List<Field> getFieldsForByteVector(final byte[] array, final FieldType
return fields;
}

protected void parseCreateField(ParseContext context, int dimension, SpaceType spaceType, MethodComponentContext methodComponentContext)
throws IOException {
protected void parseCreateField(
Why does this need to be changed?
When we create a model-based index, we don't specify data_type, so vectorDataType defaults to float in KNNVectorFieldMapper. I updated this function so that the correct vector data type for a model-based index can be passed in from modelMetaData.
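The fallback described in this reply can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: the `VectorDataType` enum and the `VectorDataTypeResolver` class here are hypothetical stand-ins, and the assumption is that the type recorded with the model takes precedence over the mapping's `data_type`, with float as the default.

```java
import java.util.Optional;

// Hypothetical stand-in for the plugin's data type enum (illustration only).
enum VectorDataType { FLOAT, BYTE, BINARY }

// Hypothetical helper: not part of the k-NN plugin.
final class VectorDataTypeResolver {
    private VectorDataTypeResolver() {}

    /**
     * Picks the data type to use when parsing a field.
     * A model-based field uses the type recorded with the model;
     * otherwise the mapping's data_type is used, defaulting to FLOAT.
     */
    static VectorDataType resolve(Optional<VectorDataType> modelDataType,
                                  Optional<VectorDataType> mappingDataType) {
        if (modelDataType.isPresent()) {
            return modelDataType.get();
        }
        return mappingDataType.orElse(VectorDataType.FLOAT);
    }
}
```

With this shape, a model trained on binary vectors would yield `BINARY` even though the target index mapping leaves `data_type` unset.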
src/main/java/org/opensearch/knn/index/util/ModelInfoExtractor.java
612a612
to
29815af
Compare
@junqiu-lei In this PR I can see both heemin's change and your change. Is that intentional?
@navneet1v It might be because Heemin updated his branch, which I targeted with this PR. Let me rebase and update it.
Rebased; now only my commit is on this PR.
src/test/java/org/opensearch/knn/index/codec/KNNCodecTestCase.java
b3fc615
to
b8b127d
Compare
@@ -128,6 +137,7 @@ public ModelMetadata(
this.error = Objects.requireNonNull(error, "error must not be null");
this.trainingNodeAssignment = Objects.requireNonNull(trainingNodeAssignment, "node assignment must not be null");
this.methodComponentContext = Objects.requireNonNull(methodComponentContext, "method context must not be null");
this.vectorDataType = Objects.requireNonNull(vectorDataType, "vector data type must not be null");
Can you update the Javadoc of the constructor?
ack
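For reference, a constructor Javadoc covering the new argument might look like the sketch below. `ModelMetadataSketch` is a hypothetical, trimmed-down class written for illustration; only the two `requireNonNull` messages are taken from the diff above, and the field types are assumed to be strings for simplicity.

```java
import java.util.Objects;

// Hypothetical, trimmed-down stand-in for a model metadata class (illustration only).
final class ModelMetadataSketch {
    private final String error;
    private final String vectorDataType;

    /**
     * Constructor
     *
     * @param error error message recorded for the model; must not be null
     * @param vectorDataType data type of the vectors used with the model; must not be null
     * @throws NullPointerException if any argument is null
     */
    ModelMetadataSketch(String error, String vectorDataType) {
        this.error = Objects.requireNonNull(error, "error must not be null");
        this.vectorDataType = Objects.requireNonNull(vectorDataType, "vector data type must not be null");
    }

    String getError() {
        return error;
    }

    String getVectorDataType() {
        return vectorDataType;
    }
}
```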
src/test/java/org/opensearch/knn/plugin/action/RestKNNStatsHandlerIT.java
f5529f2
to
90479ed
Compare
@junqiu-lei @heemin32 Are you adding a validation check to prevent using an encoder with the binary data type? I believe none of the encoders are currently supported with the binary data type.
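A check of the kind asked about here could look roughly like this. It is only a sketch under assumptions: `FieldDataType`, `EncoderValidationSketch`, and the error message are hypothetical names invented for illustration, and the encoder is represented by its name, with `null` meaning that no encoder was configured.

```java
// Hypothetical stand-in for the plugin's data type enum (illustration only).
enum FieldDataType { FLOAT, BYTE, BINARY }

// Hypothetical validator: not part of the k-NN plugin.
final class EncoderValidationSketch {
    private EncoderValidationSketch() {}

    /**
     * Rejects any encoder when the field uses the binary data type.
     *
     * @param dataType data type of the vector field
     * @param encoderName name of the configured encoder, or null if none is configured
     * @throws IllegalArgumentException if an encoder is combined with the binary data type
     */
    static void validate(FieldDataType dataType, String encoderName) {
        if (dataType == FieldDataType.BINARY && encoderName != null) {
            throw new IllegalArgumentException(
                "encoder [" + encoderName + "] is not supported with the binary data type"
            );
        }
    }
}
```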
c22e654
to
2fce648
Compare
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java
src/main/java/org/opensearch/knn/index/codec/transfer/VectorTransfer.java
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java
3a5491b
to
7037a6e
Compare
800c7c5
to
7294db4
Compare
Signed-off-by: Junqiu Lei <[email protected]>
Closing this PR. The comments on this PR are resolved, and the review continues on opensearch-project#1784.
Because opensearch-project#1784 has been merged, I rebased junqiu-lei:binary-ivf against opensearch-project:feature/binary-format.
Description
Raising this PR in Heemin's forked k-NN repository to facilitate the review of the code changes from opensearch-project#1784.
It contains code changes in the Java and JNI layers to support:
the data_type field when training a model
Will add tests in another PR.
Example workflow
1. Create binary format train index
2. Ingest train index
3. Create train model
4. Create IVF binary format target index
5. Bulk target index
6. Query target index
7. Query result
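The request bodies for these steps are not included here. As background for the query steps, the sketch below shows how two binary vectors can be compared, assuming they are packed eight dimensions per byte and scored by Hamming distance, which is the metric Faiss binary indexes use. `HammingSketch` is a hypothetical helper written for illustration, not code from this PR.

```java
// Hypothetical helper for illustration; not part of this PR.
final class HammingSketch {
    private HammingSketch() {}

    /**
     * Counts the number of differing bits between two packed binary vectors.
     * Each byte holds eight dimensions, so a 16-dimension vector is two bytes long.
     */
    static int hammingDistance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            // XOR marks the differing bits; the mask drops sign-extension bits.
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }
}
```

For example, the one-byte vectors `0b00001111` and `0b11110000` differ in all eight positions, so their distance is 8.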