Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Upgrade nmslib to v2.0.6 #160

Conversation

chenqi0805
Copy link
Contributor

@chenqi0805 chenqi0805 commented Jul 7, 2020

Issue #, if available:
#128

Description of changes:
This PR upgrades the nmslib to latest and change the impacted file/folder names, import paths, APIs and security settings.

Note:

  1. I verified backward compatibility with the examples on README.md, where I stored indices and docs in V1736 and retrieve/search in V206. It is possible that there will be corner cases missed and we should be prepared to roll back. Not sure if it is necessary to formulate a test suite on backward compatibility.
  2. It is still cumbersome to update everything impacted by the nmslib version in the codebase. If it is going to be done on a regular basis, we should probably improve the infrastructure.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@jmazanec15
Copy link
Member

Hi @chenqi0805 , thanks for working on this.

I think that backwards compatibility is very important. I think the following cases need to be tested to confirm that they do not impact performance:

  1. Document set is indexed with v1.7.3.6, and then loaded/searched with v2.0.6.
  2. Document set is index with v1.7.3.6, and then more documents are added to the index with v2.0.6. Then, search/merge operations are tested.

I can work on testing these.

@vamshin and @chenqi0805 , if you can think of any other test cases, please add them here.

Additionally, with respect to library naming, I believe that NMSLIB version was added to our naming conventions in case backwards compatibility was broken. However, if the library versions are backwards compatible, I see no reason to include NMSLIB version in our naming conventions. This possibly could be changed in another PR.

Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I was able to run through both test cases

  1. Document set is indexed with v1.7.3.6, and then loaded/searched with v2.0.6.
    I setup an ODFE 1.9 cluster on an EC2 instance with RPM artifact, ingested the sift and glove-100 datasets (l2 and cosine similarity spaces, respectively) from Ann-Benchmarks. Then, I stopped Elasticsearch and I built the plugin and library RPM from @chenqi0805 's branch and installed the versions with 2.0.6 on the cluster and started it up again. Then, I ran a recall test for each and noted that performance was not degraded.

  2. Document set is index with v1.7.3.6, and then more documents are added to the index with v2.0.6. Then, search/merge operations are tested.
    I repeated the experiment from above, except I only indexed half the documents with v1.7.3.6 and the other half with v2.0.6. Then, I force merged to 1 segment. Then, I again ran the recall test and noted that there were no performance degradation.

So, from those 2 experiments, it looks like the library is backwards compatible from 2.0.6 to 1.7.3.6. That being said, I think this change can be approved.

@chenqi0805
Copy link
Contributor Author

@jmazanec15 Thanks a lot for running and elaborating the backward compatibility test! Just curious about how you monitored performance degradation, i.e. what metrics did you inspect maybe on cloud watch dashboard? I hope to carry out similar tests w.r.t dot product space #161 .

@vamshin If you do not have further comments, I will merge the PR.

@vamshin
Copy link
Member

vamshin commented Jul 15, 2020

@jmazanec15 Thanks a lot for running and elaborating the backward compatibility test! Just curious about how you monitored performance degradation, i.e. what metrics did you inspect maybe on cloud watch dashboard? I hope to carry out similar tests w.r.t dot product space #161 .

@vamshin If you do not have further comments, I will merge the PR.

@chenqi0805 will take a quick look today and get back on this. So far looks good from Jack's experiments

@jmazanec15
Copy link
Member

Hi @chenqi0805, I have a few recall tests I run and ES Rally scripts I used based off ANN-Benchmark datasets. Eventually, I will submit PR and add them to the k-NN repo.

Copy link
Member

@vamshin vamshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for upgrading library to pull in new similarity spaces.

@vamshin
Copy link
Member

vamshin commented Jul 15, 2020

@jmazanec15 Thanks a lot for running and elaborating the backward compatibility test! Just curious about how you monitored performance degradation, i.e. what metrics did you inspect maybe on cloud watch dashboard? I hope to carry out similar tests w.r.t dot product space #161 .
@vamshin If you do not have further comments, I will merge the PR.

@chenqi0805 will take a quick look today and get back on this. So far looks good from Jack's experiments

@chenqi0805 Done. Thanks for the work. Feel free to merge the PR. @jmazanec15 thanks for thorough testing.

@chenqi0805 chenqi0805 merged commit 3e79fe5 into opendistro-for-elasticsearch:master Jul 15, 2020
@chenqi0805 chenqi0805 deleted the enh/128-update-nmslib branch July 15, 2020 22:12
@jmazanec15 jmazanec15 changed the title ENH: upgrade external library nmslib to latest Upgrade external library nmslib to latest Jul 16, 2020
@jmazanec15 jmazanec15 added the Enhancements Improvement on existing component label Jul 16, 2020
jmazanec15 pushed a commit to jmazanec15/k-NN that referenced this pull request Aug 12, 2020
…nh/128-update-nmslib

ENH: upgrade external library nmslib to latest
jmazanec15 pushed a commit that referenced this pull request Aug 12, 2020
ENH: upgrade external library nmslib to latest
@jmazanec15 jmazanec15 changed the title Upgrade external library nmslib to latest Upgrade external library nmslib to v2.0.6 Aug 25, 2020
@jmazanec15 jmazanec15 changed the title Upgrade external library nmslib to v2.0.6 Upgrade nmslib to v2.0.6 Aug 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Enhancements Improvement on existing component
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants