Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrappers and a customized HNSW search #3926

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alexanderguzhva
Copy link
Contributor

@alexanderguzhva alexanderguzhva commented Oct 8, 2024

This PR introduces the following:

  • cmake/link_to_faiss_lib.cmake, which exposes a useful and reusable CMake link_to_faiss_lib() function
  • IndexWrapper class, which allows to override functionality of an existing Faiss index using a wrapper programming pattern
  • IndexHNSWWrapper overrides search() and range_search() methods for IndexHNSW with a search way it is done and used in Milvus (https://github.com/zilliztech/knowhere). Compared to a baseline Faiss version, this search() implementation tries to maintain a stable recall rate with the change of the % of filtered samples. The Faiss baseline range_search() implementation for HNSW raises questions...
  • IndexBruteForceWrapper overrides search() and range_search() methods to invoke a brute-force search
  • Bitset utility class
  • CountSizeIOWriter utility class, which is used for quick-and-dirty way of evaluating the size of a data blob, produced by index_write()

As of the PR as it is, all code changes are located in cppcontrib/knowhere directory. I'm ready for discussion whether certain components need to be moved out of cppcontrib/knowhere and moved to a baseline Faiss code.
Also, I'm ready to write unit tests and refactoring, just let me know which ones are needed.

There is a quick-and-dirty benchmark included in the code, but it operates on a random data. And there is no easy way to grab a real bench data for C++ code (you need contrib/dataset.py for BigANN, etc), so I'm ready to discuss python integrations, if needed.

@asadoughi
Copy link
Contributor

Thanks, @alexanderguzhva for the PR! Could you split out the CMake re-factoring as a separate PR to review that separately? We would be able to move forward with that at a faster speed than the rest of the current PR.

For IndexBruteForceWrapper I am curious what benefits you are seeing from that implementation compared to the current IndexFlat implementations, for example?

I haven't had a chance to dig through the HNSW implementation, but will try to find time for that over the next week.

@alexanderguzhva
Copy link
Contributor Author

@asadoughi #3939

@alexanderguzhva
Copy link
Contributor Author

wrappers are useful in a situation if you need to grab a index and make it work as it is was a brute-search one. For example, you have HNSWPQ index.

@mdouze
Copy link
Contributor

mdouze commented Oct 11, 2024

@alexanderguzhva is there a reason not to replace the Faiss HNSW implementation altogether ? There seems to be a lot of duplication in the HNSW wrapper.

@asadoughi
Copy link
Contributor

  1. Building on top of @mdouze's comment, we should get the equivalent behavior as FAISS if we initialize kAlpha to 1.0f, correct? More generally, are there any backwards incompatible behavior changes?
  2. Could you elaborate on your insights regarding range_search, please? Perhaps share a particular dataset where the proposed HNSW implementation performed significantly better than the current implementation. Thanks!

@alexanderguzhva
Copy link
Contributor Author

@mdouze @asadoughi I was expecting you guys to maybe conduct some performance testing for your internal use cases. As of now, it is a candidate code, and it is placed as a separate one, because it allows to test both baseline and candidate HNSW search in a single application.
I will provide updates for kAlpha and range_search

@asadoughi
Copy link
Contributor

How about benchmarks using the filter dataset from the NeurIPS '23 workshop?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants