-
Notifications
You must be signed in to change notification settings - Fork 56
FEAT: support cosine similarity #90
FEAT: support cosine similarity #90
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Added a couple comments. Additionally, nmslib submodule should not be altered.
jni/src/v1736/com_amazon_opendistroforelasticsearch_knn_index_v1736_KNNIndex.cpp
Show resolved
Hide resolved
jni/src/v1736/com_amazon_opendistroforelasticsearch_knn_index_v1736_KNNIndex.cpp
Outdated
Show resolved
Hide resolved
@@ -31,8 +31,8 @@ | |||
import java.util.Map; | |||
import java.util.stream.Collectors; | |||
|
|||
public class KNNJNITests extends ESTestCase { | |||
private static final Logger logger = LogManager.getLogger(KNNJNITests.class); | |||
public class KNNJNITestsIT extends ESTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why switch this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. When I ran the test locally on IntelliJ, it fails to detect the integration tests therein due to lack of suffix "IT" in the class name. So I switched that for easier debugging. Will change it back but I think it might worth doublecheck if CI also neglects the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests as opposed to IT signals that it is a unit test. These tests run during ./gradlew build
, which is what we use for the PR testing workflow. You can also run them individually with this command:
./gradlew ':test' --tests "com.amazon.opendistroforelasticsearch.knn.index.KNNJNITests.testCreateHnswIndex"
We consider these unit tests because they are testing an isolated portion of code. Interesting that IntelliJ does not pick them up. Will need to look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Sorry, I misidentified it as IT.
private Set<String> types = SpaceTypes.getValues(); | ||
|
||
@Override public void validate(String value) { | ||
if (!types.contains(value.toLowerCase())){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this case insensitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intended to set it case insensitive, e.g. L2, Cosinesimil is also valid string. But I can remove this if not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think case insensitive is good
@@ -165,7 +214,7 @@ public Void run() { | |||
float[] queryVector = {1.0f, 1.0f, 1.0f, 1.0f}; | |||
String[] algoQueryParams = {"efSearch=200"}; | |||
|
|||
final KNNIndex index = KNNIndex.loadIndex(indexPath, algoQueryParams); | |||
final KNNIndex index = KNNIndex.loadIndex(indexPath, "l2", algoQueryParams); | |||
final KNNQueryResult[] results = index.queryIndex(queryVector, 30); | |||
|
|||
Map<Integer, Float> scores = Arrays.stream(results).collect( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add test cases to ensure that invalid space names throw the expected exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. My initial thought was to add such tests at KNNSettings module b/c that is where I want the input value to be validated and fail fast. Do you think it is also necessary to prevent invalid spaceType into JNI functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an integration test in KNNESSettingsIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense thanks
@@ -31,8 +31,8 @@ | |||
import java.util.Map; | |||
import java.util.stream.Collectors; | |||
|
|||
public class KNNJNITests extends ESTestCase { | |||
private static final Logger logger = LogManager.getLogger(KNNJNITests.class); | |||
public class KNNJNITestsIT extends ESTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests as opposed to IT signals that it is a unit test. These tests run during ./gradlew build
, which is what we use for the PR testing workflow. You can also run them individually with this command:
./gradlew ':test' --tests "com.amazon.opendistroforelasticsearch.knn.index.KNNJNITests.testCreateHnswIndex"
We consider these unit tests because they are testing an isolated portion of code. Interesting that IntelliJ does not pick them up. Will need to look into it.
…N into feat/28-cosine-similarity * 'feat/28-cosine-similarity' of github.com:chenqi0805/k-NN: Compile .so on linux
@jmazanec15 I have added one more test coverage on invalid spaceType setting and also updated README. The only concern left is whether spaceType setting is allowed to be dynamic or not. I think this depends on whether the KNN index setting parameters are embedded into fields and docs when stored, which I am not sure. Do you have idea here? Besides that, this work is now complete. |
spaceType setting should be static for an index to lead consistent results otherwise segments in the shards would be scattered with different spaces and queries would lead inconsistent results |
@@ -85,15 +85,19 @@ void catch_cpp_exception_and_throw_java(JNIEnv* env) | |||
} | |||
} | |||
|
|||
JNIEXPORT void JNICALL Java_com_amazon_opendistroforelasticsearch_knn_index_v1736_KNNIndex_saveIndex(JNIEnv* env, jclass cls, jintArray ids, jobjectArray vectors, jstring indexPath, jobjectArray algoParams) | |||
JNIEXPORT void JNICALL Java_com_amazon_opendistroforelasticsearch_knn_index_v1736_KNNIndex_saveIndex(JNIEnv* env, jclass cls, jintArray ids, jobjectArray vectors, jstring indexPath, jstring spaceType, jobjectArray algoParams) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Please add the new function arguments to the end of the existing parameters list. Please take care in other places as well.
|
||
private Set<String> types = SpaceTypes.getValues(); | ||
|
||
@Override public void validate(String value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
null check on value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vamshin I put a null check here, but it still fails to guard against null value, i.e. for some reason, if index.knn.space_type == null, it will not go through the validate. Similar bug happens to other parameters, e.g.
PUT /myindex
{
"settings" : {
"index": {
"knn": true,
"knn.algo_param.m": null
}
}
}
PUT /myindex/_doc/2?refresh=true
{
"my_vector1" : [1.5, 2.5],
"price":10
}
POST /myindex/_search
{
"size" : 10,
"query": {
"knn": {
"my_vector1": {
"vector": [15, 25],
"k": 2
}
}
}
}
yields
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
How about we open a separate issue to track this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it. I think we can probably ignore this case.
/** | ||
* Enum contains space types for k-NN similarity search | ||
*/ | ||
public enum SpaceTypes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Move this enum class to a dedicated file
* @param algoParams hnsw algorithm parameters | ||
* @return knn index that can be queried for k nearest neighbours | ||
*/ | ||
public static KNNIndex loadIndex(String indexPath, final String[] algoParams) { | ||
public static KNNIndex loadIndex(String indexPath, String spaceType, final String[] algoParams) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: final String spaceType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for this contribution!
@@ -165,7 +214,7 @@ public Void run() { | |||
float[] queryVector = {1.0f, 1.0f, 1.0f, 1.0f}; | |||
String[] algoQueryParams = {"efSearch=200"}; | |||
|
|||
final KNNIndex index = KNNIndex.loadIndex(indexPath, algoQueryParams); | |||
final KNNIndex index = KNNIndex.loadIndex(indexPath, "l2", algoQueryParams); | |||
final KNNQueryResult[] results = index.queryIndex(queryVector, 30); | |||
|
|||
Map<Integer, Float> scores = Arrays.stream(results).collect( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense thanks
@vamshin Thanks for the comments. Besides the comment on null value check, it is good for another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the contribution. Much required feature for our community. Adding new space types supported by nmslib should be much simpler going forward and we could support it as per the community request.
As this change involves JNI level code changes and new functionality, we would evaluate this commit with performance tests and recall tests and then consider for merge. We are in the process of building automated performance tests for the commits but till then we may need to do some manual evaluation.
We will update this thread with the results and you could merge it then. Thank you.
|
||
private Set<String> types = SpaceTypes.getValues(); | ||
|
||
@Override public void validate(String value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it. I think we can probably ignore this case.
Hi @chenqi0805, I think we are going to check this in as an experimental feature, until we can test it more thoroughly. Could you add a section in the readme under this section for Cosine Similarity usage with an example and mark it in the header as experimental? Like this: |
@jmazanec15 Will do in free time. |
@jmazanec15 updated. |
Great thank you @chenqi0805! This is a great feature. Going to check it in. |
Issue #, if available:
#28
Description of changes:
TODO
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.