Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

FEAT: support cosine similarity #90

Conversation

chenqi0805
Copy link
Contributor

@chenqi0805 chenqi0805 commented Apr 12, 2020

Issue #, if available:
#28

Description of changes:

  • Add "index.knn.space_type" as string param into KNNSettings and use validator for fast fail
  • Pass in spaceType as String into jni functions (java/c++)
  • Update vector field mapper to correctly parse the spaceType setting param
  • Adapt KNNIndexCache
  • Adapt existing test cases

TODO

  • check if "index.knn.space_type" should be reset or not once a index is created
  • More test cases to cover cosine similarity
  • update README

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Added a couple comments. Additionally, nmslib submodule should not be altered.

@@ -31,8 +31,8 @@
import java.util.Map;
import java.util.stream.Collectors;

public class KNNJNITests extends ESTestCase {
private static final Logger logger = LogManager.getLogger(KNNJNITests.class);
public class KNNJNITestsIT extends ESTestCase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why switch this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. When I ran the test locally on IntelliJ, it fails to detect the integration tests therein due to lack of suffix "IT" in the class name. So I switched that for easier debugging. Will change it back but I think it might worth doublecheck if CI also neglects the tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests as opposed to IT signals that it is a unit test. These tests run during ./gradlew build, which is what we use for the PR testing workflow. You can also run them individually with this command:

./gradlew ':test' --tests "com.amazon.opendistroforelasticsearch.knn.index.KNNJNITests.testCreateHnswIndex"

We consider these unit tests because they are testing an isolated portion of code. Interesting that IntelliJ does not pick them up. Will need to look into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Sorry, I misidentified it as IT.

private Set<String> types = SpaceTypes.getValues();

@Override public void validate(String value) {
if (!types.contains(value.toLowerCase())){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this case insensitive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intended to set it case insensitive, e.g. L2, Cosinesimil is also valid string. But I can remove this if not necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think case insensitive is good

@@ -165,7 +214,7 @@ public Void run() {
float[] queryVector = {1.0f, 1.0f, 1.0f, 1.0f};
String[] algoQueryParams = {"efSearch=200"};

final KNNIndex index = KNNIndex.loadIndex(indexPath, algoQueryParams);
final KNNIndex index = KNNIndex.loadIndex(indexPath, "l2", algoQueryParams);
final KNNQueryResult[] results = index.queryIndex(queryVector, 30);

Map<Integer, Float> scores = Arrays.stream(results).collect(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add test cases to ensure that invalid space names throw the expected exceptions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. My initial thought was to add such tests at KNNSettings module b/c that is where I want the input value to be validated and fail fast. Do you think it is also necessary to prevent invalid spaceType into JNI functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an integration test in KNNESSettingsIT.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense thanks

@@ -31,8 +31,8 @@
import java.util.Map;
import java.util.stream.Collectors;

public class KNNJNITests extends ESTestCase {
private static final Logger logger = LogManager.getLogger(KNNJNITests.class);
public class KNNJNITestsIT extends ESTestCase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests as opposed to IT signals that it is a unit test. These tests run during ./gradlew build, which is what we use for the PR testing workflow. You can also run them individually with this command:

./gradlew ':test' --tests "com.amazon.opendistroforelasticsearch.knn.index.KNNJNITests.testCreateHnswIndex"

We consider these unit tests because they are testing an isolated portion of code. Interesting that IntelliJ does not pick them up. Will need to look into it.

…N into feat/28-cosine-similarity

* 'feat/28-cosine-similarity' of github.com:chenqi0805/k-NN:
  Compile .so on linux
@chenqi0805 chenqi0805 changed the title WIP: FEAT: support cosine similarity FEAT: support cosine similarity Apr 15, 2020
@chenqi0805 chenqi0805 changed the title FEAT: support cosine similarity [WIP]FEAT: support cosine similarity Apr 15, 2020
@chenqi0805 chenqi0805 changed the title [WIP]FEAT: support cosine similarity FEAT: support cosine similarity Apr 15, 2020
@chenqi0805
Copy link
Contributor Author

chenqi0805 commented Apr 15, 2020

@jmazanec15 I have added one more test coverage on invalid spaceType setting and also updated README. The only concern left is whether spaceType setting is allowed to be dynamic or not. I think this depends on whether the KNN index setting parameters are embedded into fields and docs when stored, which I am not sure. Do you have idea here? Besides that, this work is now complete.

@vamshin
Copy link
Member

vamshin commented Apr 15, 2020

@jmazanec15 I have added one more test coverage on invalid spaceType setting and also updated README. The only concern left is whether spaceType setting is allowed to be dynamic or not. I think this depends on whether the KNN index setting parameters are embedded into fields and docs when stored, which I am not sure. Do you have idea here? Besides that, this work is now complete.

spaceType setting should be static for an index to lead consistent results otherwise segments in the shards would be scattered with different spaces and queries would lead inconsistent results

@@ -85,15 +85,19 @@ void catch_cpp_exception_and_throw_java(JNIEnv* env)
}
}

JNIEXPORT void JNICALL Java_com_amazon_opendistroforelasticsearch_knn_index_v1736_KNNIndex_saveIndex(JNIEnv* env, jclass cls, jintArray ids, jobjectArray vectors, jstring indexPath, jobjectArray algoParams)
JNIEXPORT void JNICALL Java_com_amazon_opendistroforelasticsearch_knn_index_v1736_KNNIndex_saveIndex(JNIEnv* env, jclass cls, jintArray ids, jobjectArray vectors, jstring indexPath, jstring spaceType, jobjectArray algoParams)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please add the new function arguments to the end of the existing parameters list. Please take care in other places as well.


private Set<String> types = SpaceTypes.getValues();

@Override public void validate(String value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null check on value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vamshin I put a null check here, but it still fails to guard against null value, i.e. for some reason, if index.knn.space_type == null, it will not go through the validate. Similar bug happens to other parameters, e.g.

PUT /myindex
{
    "settings" : {
        "index": {
            "knn": true,
            "knn.algo_param.m": null
        }
    }
}
PUT /myindex/_doc/2?refresh=true
{
    "my_vector1" : [1.5, 2.5],
    "price":10
}
POST /myindex/_search
{
    "size" : 10,
    "query": {
        "knn": {
            "my_vector1": {
                "vector": [15, 25],
                "k": 2
            }
        }
    }
}

yields

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

How about we open a separate issue to track this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it. I think we can probably ignore this case.

/**
* Enum contains space types for k-NN similarity search
*/
public enum SpaceTypes {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Move this enum class to a dedicated file

* @param algoParams hnsw algorithm parameters
* @return knn index that can be queried for k nearest neighbours
*/
public static KNNIndex loadIndex(String indexPath, final String[] algoParams) {
public static KNNIndex loadIndex(String indexPath, String spaceType, final String[] algoParams) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: final String spaceType?

Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for this contribution!

@@ -165,7 +214,7 @@ public Void run() {
float[] queryVector = {1.0f, 1.0f, 1.0f, 1.0f};
String[] algoQueryParams = {"efSearch=200"};

final KNNIndex index = KNNIndex.loadIndex(indexPath, algoQueryParams);
final KNNIndex index = KNNIndex.loadIndex(indexPath, "l2", algoQueryParams);
final KNNQueryResult[] results = index.queryIndex(queryVector, 30);

Map<Integer, Float> scores = Arrays.stream(results).collect(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense thanks

@chenqi0805
Copy link
Contributor Author

@vamshin Thanks for the comments. Besides the comment on null value check, it is good for another look.

@chenqi0805 chenqi0805 requested a review from vamshin April 17, 2020 17:52
Copy link
Member

@vamshin vamshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the contribution. Much required feature for our community. Adding new space types supported by nmslib should be much simpler going forward and we could support it as per the community request.

As this change involves JNI level code changes and new functionality, we would evaluate this commit with performance tests and recall tests and then consider for merge. We are in the process of building automated performance tests for the commits but till then we may need to do some manual evaluation.

We will update this thread with the results and you could merge it then. Thank you.


private Set<String> types = SpaceTypes.getValues();

@Override public void validate(String value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it. I think we can probably ignore this case.

@jmazanec15
Copy link
Member

Hi @chenqi0805, I think we are going to check this in as an experimental feature, until we can test it more thoroughly. Could you add a section in the readme under this section for Cosine Similarity usage with an example and mark it in the header as experimental? Like this: Cosine Similarity Usage (experimental)

@chenqi0805
Copy link
Contributor Author

@jmazanec15 Will do in free time.

@chenqi0805
Copy link
Contributor Author

@jmazanec15 updated.

@jmazanec15
Copy link
Member

Great thank you @chenqi0805! This is a great feature. Going to check it in.

@jmazanec15 jmazanec15 merged commit 1c80b62 into opendistro-for-elasticsearch:master Apr 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants