-
Notifications
You must be signed in to change notification settings - Fork 56
Adding support for FAISS JNI #285
Adding support for FAISS JNI #285
Conversation
2. Right Now Only Support JNI
modified make without clang
modified OPENMP build in CMakeLists.txt
2. When Write index and query index, we use knnEngine to verify search in faiss or nmslib
Add index.knn.knnEngine in the ES Settings. When Write a Segment OR Query Index. It would use this settings to find the knn lib
2. almost integTest and unitest PASSED
Gradle build jni and jniFaiss
Hi @luyuncheng , thanks for making this PR! This is a very big change, so it will take some time to review. I will start looking at it in January. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we will review the PR first and then check into a develop branch for further testing.
Maintaining the same interface makes sense.
I added a preliminary review. I will continue to review as I think more about it.
A couple comments:
- For FAISS submodule, we should align it with a release. It appears to be aligned to an arbitrary commit
- In general, I do not think it is a good idea to check a binary in (libfaiss.a). What is the reason for doing this?
I need to think more on best way to support 2 libraries from the user perspective. I will get back on this.
@@ -0,0 +1,144 @@ | |||
cmake_minimum_required(VERSION 2.8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Design question: Should we have separate JNI libraries for FAISS and nmslib, or should they be contained in one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not sure which is better.
At this, i separate libraries just to elaborate this faiss engine
can work with knn-plugin.
may be one jni interface can make jni code more simple to maintain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel having seperate JNI would be more cleaner and easy to abstract out the underlying business logic to dedicated files. I like the current approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having 2 separate libraries is okay.
@@ -151,12 +153,22 @@ private void onRemoval(RemovalNotification<String, KNNIndexCacheEntry> removalNo | |||
*/ | |||
public KNNIndex getIndex(String key, final String indexName) { | |||
try { | |||
//TODO if Type Not consistent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when cached an Index, should we verify which engine for this index, and then load the exactly library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I think we should add a different file extension for faiss graphs as opposed to nmslib graphs. Maybe .faiss_hnsw.
@@ -83,6 +85,7 @@ | |||
/** | |||
* Default setting values | |||
*/ | |||
public static final String INDEX_KNN_DEFAULT_ENGINE = "Faiss"; // nmslib, faiss |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep nmslib as the default engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. this is just for integration test can cover the faiss
engine code.
upstream master merge into local master
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few questions
-
Does faiss support all the library functions nmslib supports. We are primarily focusing on
L2, cosine, L1, negative dot product. Accordingly we may need to have relevant validations and throw the exceptions with proper message. -
Can we confirm From Elasticsearch point of view following work? Manual test is fine for now and ideally we would prefer to have integration tests.
- A nmslib index should be possible to reindex into faiss index and vice versa
- Forcemerge calls work on faiss index (recall should be about same)
- snapshot/restore work
build.gradle
Outdated
task buildJniLib() { | ||
dependsOn cmakeJniLib | ||
dependsOn buildJniNmsLib | ||
dependsOn buildJniFaissLib | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could remove dependsOn cmakeJniLib
and just rely on dependsOn buildJniNmsLib
dependsOn buildJniFaissLib
as they internally depends on cmakeJniNmsLib
and cmakeJniFaissLib
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I fixed it
* Method: saveIndex | ||
* Signature: ([I[[FLjava/lang/String;[Ljava/lang/String;Ljava/lang/String;)V | ||
*/ | ||
JNIEXPORT void JNICALL Java_com_amazon_opendistroforelasticsearch_knn_index_faiss_KNNFIndex_saveIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is faiss library backward compatible?
We should have faiss library version in the function names to support backward compatibility issues if any arises in the future? You could refer nmslib apis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, your are right. i will add the version number into it.
* When autoclose class do close, then delete the pointer | ||
* Method GC pointer | ||
*/ | ||
JNIEXPORT void JNICALL Java_com_amazon_opendistroforelasticsearch_knn_index_faiss_KNNFIndex_gc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should run memory leak tests for faiss jni code. @jmazanec15 ran some memory leak tests for nmslib, we could probably ran it on the faiss branch once we create. Lets see if we can test it as part of this PR or atleast we can create an issue to track all TODOs later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any tools to detect memory leak in JNI code?
when i submit the faiss jni code, i have been writed a demo with valgrind
to detect memory leak. but i do not know how to detect it from java test code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We used valgrind
to detect memory leaks. Last time I remember we build C++ class to call the functions directly and then build the artifact for running valgrind
. We could probably help here.
@@ -25,9 +28,17 @@ | |||
public String indexLibraryVersion() { | |||
return "KNNIndexV2_0_6"; | |||
} | |||
}, | |||
VFaiss("Faiss") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having Faiss part of NmsLibVersion
class would add confusion. We should probably have dedicated enum class for Faiss versions? Intension of this class is to hold different versions of same library and refer them from this class incase we happen to maintain more than one version of the library because of compatability issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice idea! i'll do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about make
nmslib: KNNIndex_NMSLIB_V2_0_6,
faiss: KNNIndex_FAISS_V1_6_4
like this:
public enum NmsLibVersion {
VNMSLIB_208("NMSLIB_208"){
@Override
public String indexLibraryVersion() {
return "KNNIndex_NMSLIB_V2_0_8";
}
},
VFAISS_164("FAISS_164") {
@Override
public String indexLibraryVersion() {
return "KNNIndex_FAISS_V1_6_4";
}
};
}
or do you have any idea to distinguish the name between nmslib and faiss library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have new enum class for FAISS that just holds FAISS versions. Example
public enum FAISSLibVersion {
/**
* Latest available faiss version
*/
V_164("164"){
@Override
public String indexLibraryVersion() {
return "KNNIndexV2_0_11";
}
};
public static final FAISSLibVersion LATEST = V164;
public String buildVersion;
FAISSLibVersion(String buildVersion) {
this.buildVersion = buildVersion;
}
/**
* FAISS library version used by the KNN codec
* @return name
*/
public abstract String indexLibraryVersion();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have new enum class for FAISS that just holds FAISS versions. Example
yes, it is great. i added FAISSLibVersion in the latest commits
src/test/java/com/amazon/opendistroforelasticsearch/knn/index/KNNJNIFaissTests.java
Show resolved
Hide resolved
@@ -0,0 +1,138 @@ | |||
package com.amazon.opendistroforelasticsearch.knn.index.faiss; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package name should have version information similar to nmslib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the latest commits, i add faiss
version into package and function names.
|
||
if(knnEngine.contains(NmsLibVersion.VFaiss.getBuildVersion())) { | ||
AccessController.doPrivileged( | ||
new PrivilegedAction<Void>() { | ||
public Void run() { | ||
KNNFIndex.saveIndex(pair.docs, pair.vectors, tempIndexPath, algoParams, spaceType); | ||
return null; | ||
} | ||
} | ||
); | ||
} else { | ||
AccessController.doPrivileged( | ||
new PrivilegedAction<Void>() { | ||
public Void run() { | ||
KNNIndex.saveIndex(pair.docs, pair.vectors, tempIndexPath, algoParams, spaceType); | ||
return null; | ||
} | ||
} | ||
} | ||
); | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we move if else check inside the AccessController.doPrivileged
. Something like below example
AccessController.doPrivileged(
new PrivilegedAction<Void>() {
public Void run() {
if(knnindex)
KNNIndex.saveIndex(pair.docs, pair.vectors, tempIndexPath, algoParams, spaceType);
if(faissindex)
KNNFIndex.saveIndex(pair.docs, pair.vectors, tempIndexPath, algoParams, spaceType);
return null;
}
```
); | ||
final KNNQueryResult[] results; | ||
|
||
if (fieldAttributes.containsValue(NmsLibVersion.V206.getBuildVersion())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we pull out the methods in KNNFIndex and KNNIndex to interface so that we could refer with one object to keep it more cleaner? Something like
KNNIndex index;
if (fieldAttributes.containsValue(NmsLibVersion.V206.getBuildVersion())) {
index = knnIndexCache.getIndex(indexPath.toString(), knnQuery.getIndexName());
} else {
index = knnIndexCache.getFIndex(indexPath.toString(), knnQuery.getIndexName());
}
results = index.queryIndex(knnQuery.getQueryVector(), knnQuery.getK());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is may make the log of changes in KNNIndex. In KNNIndex class, it would load the specific library, if we want to pull out the methods getIndex
, we need an abstraction of KNNIndex and verify the engine we are using.
Let me think about it how to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we pull out the methods in KNNFIndex and KNNIndex to interface so that we could refer with one object to keep it more cleaner? Something like
in the latest commits, I used KNNIndex as abstract class. KNNFaissIndex
and KNNNmsLibIndex
extends the KNNINdex.
KNNIndexCacheEntry
only save the KNNIndex.
if you have any other idea, pls let me know
@@ -0,0 +1,272 @@ | |||
#include "com_amazon_opendistroforelasticsearch_knn_index_faiss_KNNFIndex.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class has core logic and would need some insights into way faiss library operates to review the code. I plan to visit this code once I have understanding of faiss interface.
@@ -0,0 +1,144 @@ | |||
cmake_minimum_required(VERSION 2.8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel having seperate JNI would be more cleaner and easy to abstract out the underlying business logic to dedicated files. I like the current approach.
update CMakeLists.txt
Fixed depends On in gradle
Add Faiss and nmslib LibVersion into knn plugin
1. Add NmsLib Version And Faiss Version into KNN Plugin
Update Upstream Master
2. Make Faiss Version into code 3. Update Hnswlib to 2.0.10 # Conflicts: # jni/CMakeLists.txt # jni/include/com_amazon_opendistroforelasticsearch_knn_index_nmslib_v208_KNNIndex.h # jni/include/com_amazon_opendistroforelasticsearch_knn_index_v2011_KNNIndex.h # jni/include/com_amazon_opendistroforelasticsearch_knn_index_v208_KNNIndex.h # jni/src/com_amazon_opendistroforelasticsearch_knn_index_nmslib_v208_KNNIndex.cpp # jni/src/com_amazon_opendistroforelasticsearch_knn_index_v2011_KNNIndex.cpp # jni/src/com_amazon_opendistroforelasticsearch_knn_index_v208_KNNIndex.cpp # src/main/java/com/amazon/opendistroforelasticsearch/knn/index/KNNIndexCache.java # src/main/java/com/amazon/opendistroforelasticsearch/knn/index/KNNIndexShard.java # src/main/java/com/amazon/opendistroforelasticsearch/knn/index/KNNWeight.java # src/main/java/com/amazon/opendistroforelasticsearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java # src/main/java/com/amazon/opendistroforelasticsearch/knn/index/nmslib/v208/KNNIndex.java # src/main/java/com/amazon/opendistroforelasticsearch/knn/index/util/NmsLibVersion.java # src/main/java/com/amazon/opendistroforelasticsearch/knn/index/v2011/KNNIndex.java # src/main/java/com/amazon/opendistroforelasticsearch/knn/index/v208/KNNIndex.java # src/main/plugin-metadata/plugin-security.policy # src/test/java/com/amazon/opendistroforelasticsearch/knn/index/KNNIndexCacheTests.java # src/test/java/com/amazon/opendistroforelasticsearch/knn/index/KNNJNITests.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @luyuncheng, could you remove the jniFaiss/external/libfaiss.a? We don't need to ship binaries.
Also, can you update FAISS library commit to their latest release? v1.6.5. Right now, it points to a more recent commit.
.github/workflows/CI.yml
Outdated
@@ -4,6 +4,7 @@ on: | |||
branches: | |||
- master | |||
- opendistro-* | |||
- faiss* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think we need to add this branch to CI action, given that it is a development branch. Please remove. Once review finishes, we will check into faiss branch and develop/test from there. Then, once we are ready, we can create another PR to master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok,i'll fix it
@@ -0,0 +1,144 @@ | |||
cmake_minimum_required(VERSION 2.8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having 2 separate libraries is okay.
|
||
std::unordered_map<string, faiss::MetricType> mapMetric = { | ||
{"l2", faiss::METRIC_L2}, | ||
{"innerproduct", faiss::METRIC_INNER_PRODUCT} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't FAISS also have cosine space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Doc Faiss-indexes , i added inner product, which means If we have normalized vectors, the inner product becomes cosine similarity.
And there is some issues talk about this: facebookresearch/faiss#593
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting. That makes sense.
paramsList.push_back(rawString); | ||
|
||
int M = 32; | ||
if (sscanf(rawString, "M=%d", &M) == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to sscanf here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to sscanf here?
In the KNN plugin we would make algoParam like: M=32, efSearch=20
, As faiss index factory docs shows, if M=32
we need use string: HNSW32
for index Description. so i use sscanf to get M.
i think we need an adapter to match different knnEngine algorithms params. if you have good idea, pls let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it. I think makes sense.
//---- Do Index | ||
//----- 1. Train | ||
if(!indexWriter->is_trained) { | ||
//TODO if we use like PQ, we have to train dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was thinking about this a little bit. It seems like it might make sense to fall back to a flat index if we do not have enough points to train. But we can think about this later.
{ | ||
//set thread 1 cause ES has Search thread | ||
//TODO make it different at search and write | ||
// omp_set_num_threads(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am no sure is it appropriate to set threads as 1 or reuse the settings:
public static final String KNN_ALGO_PARAM_INDEX_THREAD_QTY = "knn.algo_param.index_thread_qty";
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it probably makes sense to reuse KNN_ALGO_PARAM_INDEX_THREAD_QTY.
@@ -151,12 +153,22 @@ private void onRemoval(RemovalNotification<String, KNNIndexCacheEntry> removalNo | |||
*/ | |||
public KNNIndex getIndex(String key, final String indexName) { | |||
try { | |||
//TODO if Type Not consistent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I think we should add a different file extension for faiss graphs as opposed to nmslib graphs. Maybe .faiss_hnsw.
private final String indexPathUrl; | ||
private final String esIndexName; | ||
private final WatcherHandle<FileWatcher> fileWatcherHandle; | ||
|
||
private KNNIndexCacheEntry(final KNNIndex knnIndex, final String indexPathUrl, final String esIndexName, | ||
final WatcherHandle<FileWatcher> fileWatcherHandle) { | ||
this.knnIndex = knnIndex; | ||
this.knnFindex = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me that knnIndex and knnFindex should inherit from some kind of abstract class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I think we should add a different file extension for faiss graphs as opposed to nmslib graphs. Maybe .faiss_hnsw.
in the latest commit, i use knnEngine
name in the index file names like:
String hnswFileName = String.format("%s_%s_%s%s",
state.segmentInfo.name,
knnEngine.getLatestBuildVersion(),
field.name,
KNNCodecUtil.HNSW_EXTENSION);
Seems to me that knnIndex and knnFindex should inherit from some kind of abstract class.
yes, you are right. let me think about how to abstract this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me that knnIndex and knnFindex should inherit from some kind of abstract class.
I used KNNIndex as abstract class. KNNFaissIndex
and KNNNmsLibIndex
extends the KNNINdex.
KNNIndexCacheEntry
only save the KNNIndex.
if you have any other idea, pls let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes sense. Thanks for making the change!
2. Remove no useful libfaiss.a 3. Update JniFaiss CmakeLists.txt
2. Remove no use comments
Add NmsLib Version And Faiss Version into KNN Plugin
2. Add KNNNmsLibIndex and KNNFaissIndex inherit the KNNIndex
2. Add KNNNmsLibIndex and KNNFaissIndex inherit the KNNIndex 3. Move JNITests into JNINmsLibTests 4. Add New JNITests which would mixed knnEngine, and test with exception
Make KNNIndex as abstract class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making this change @luyuncheng!
I think at this point we can check it into a development branch "faiss-support" and I will start working on it/testing it as well.
I created "faiss-support". Could you merge in all the changes to that branch so that we can merge this change in?
i have changed the base branch from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! to push to dedicate branch. Once again thanks for the great work @luyuncheng.
We can continue development/testing on top of this code in dedicated branch.
*As Issue #225 mentioned. It is possible to support faiss as future.
But i do not implement the codec services, because i want to ask for your opinions about this part.
I prefer to use the same interface(
saveIndex
,init
,queryIndex
) as same as nmslib's code with the same input parameter, this would i have more compatibility for knn plugin. so i PR this using the same JNI interface.i think with the same interface we can support nmslib and faiss in knn plugin at the same time.
Description of changes:
Faiss
index and Query with the same interface in jni code.index.knn.knnEngine
to distinguish faiss and nmslib.index.knn.knnEngine
to load the explicit Library.index.knn.knnEngine
to load the explicit Library.KNNJNIFaissTests
tests, it can do the same logic asKNNIndex
andKNNJNITests
integTest
when using Faiss as Librarybuild faiss
firstly which depends on some external Libs like OpenMP,BLAS etc.I think this PR, there is a lot of details need to be discussion.
PS. Is there any possible to open a faiss branch?
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.