-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrappers and a customized HNSW search #3926
base: main
Are you sure you want to change the base?
Wrappers and a customized HNSW search #3926
Conversation
1d05380
to
f4b7439
Compare
f4b7439
to
59e2eca
Compare
Thanks, @alexanderguzhva for the PR! Could you split out the CMake re-factoring as a separate PR to review that separately? We would be able to move forward with that at a faster speed than the rest of the current PR. For IndexBruteForceWrapper I am curious what benefits you are seeing from that implementation compared to the current IndexFlat implementations, for example? I haven't had a chance to dig through the HNSW implementation, but will try to find time for that over the next week. |
wrappers are useful in a situation if you need to grab a index and make it work as it is was a brute-search one. For example, you have HNSWPQ index. |
@alexanderguzhva is there a reason not to replace the Faiss HNSW implementation altogether ? There seems to be a lot of duplication in the HNSW wrapper. |
|
@mdouze @asadoughi I was expecting you guys to maybe conduct some performance testing for your internal use cases. As of now, it is a candidate code, and it is placed as a separate one, because it allows to test both baseline and candidate HNSW search in a single application. |
Signed-off-by: Alexandr Guzhva <[email protected]>
59e2eca
to
b42ef4a
Compare
How about benchmarks using the filter dataset from the NeurIPS '23 workshop? |
This PR introduces the following:
cmake/link_to_faiss_lib.cmake
, which exposes a useful and reusable CMakelink_to_faiss_lib()
functionIndexWrapper
class, which allows to override functionality of an existing Faiss index using a wrapper programming patternIndexHNSWWrapper
overridessearch()
andrange_search()
methods for IndexHNSW with a search way it is done and used in Milvus (https://github.com/zilliztech/knowhere). Compared to a baseline Faiss version, thissearch()
implementation tries to maintain a stable recall rate with the change of the % of filtered samples. The Faiss baselinerange_search()
implementation for HNSW raises questions...IndexBruteForceWrapper
overridessearch()
andrange_search()
methods to invoke a brute-force searchBitset
utility classCountSizeIOWriter
utility class, which is used for quick-and-dirty way of evaluating the size of a data blob, produced byindex_write()
As of the PR as it is, all code changes are located in
cppcontrib/knowhere
directory. I'm ready for discussion whether certain components need to be moved out ofcppcontrib/knowhere
and moved to a baseline Faiss code.Also, I'm ready to write unit tests and refactoring, just let me know which ones are needed.
There is a quick-and-dirty benchmark included in the code, but it operates on a random data. And there is no easy way to grab a real bench data for C++ code (you need
contrib/dataset.py
for BigANN, etc), so I'm ready to discuss python integrations, if needed.