-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[META] Support Hybrid Disk/Memory Algorithms #1134
Comments
@jmazanec15 Thanks for creating this issue. I kind of align on the high level task, but would love to see more breakdown of these tasks with time bound investigation and scope. Is there any specific feedback you are looking here? |
Please +1 if you wish to have this feature prioritized |
Thanks @navneet1v
Sure, will evolve this as we go. Mainly put out the issue for tracking purposes.
Not specific feedback - would like to see if anyone has used a disk based approach and wants to share their opinion. This is a tracking issue, not an RFC. There will be more issues to come seeking more direct feedback. |
+1 LGTM |
adding the below resources, looks like a similar issue came out in Lucene: credit to @reta for bringing those to my attention |
Overview
One common problem around dense vector search is the amount of memory required to implement a solution that delivers high performance and recall. The problem is grounded in the fact that many ANN algorithms require fast, random access to the full floating representations of the vectors as well as auxiliary data structures (in-memory algorithms). HNSW is one example of such an algorithm. While there are techniques, like Product Quantization, that allow the memory footprint of the algorithm to be greatly reduced, it can be difficult to achieve high recall (refer to a blog I wrote awhile ago around IVFPQ vs. HNSW for billion-scale search: https://aws.amazon.com/blogs/big-data/choose-the-k-nn-algorithm-for-your-billion-scale-use-case-with-opensearch/). Anecdotally, users typically want a recall above 0.9, where recall is defined as the ratio of neighbors returned that are ground truth nearest neighbors. Similarly, techniques such as scalar quantization and dimensionality reduction can also be employed, but some users may want a more significant reduction in memory.
Another approach that has gained a lot of interest is utilizing fast access disks (i.e. SSDs) in order to extend memory. I will refer to these algorithms as hybrid algorithms. hybrid algorithms will keep some kind of a small representation of the index in memory and the rest on disk. Then, during search, the algorithms will intelligently select a few items to read from disk with the goal of optimizing query latency/working set size/recall tradeoff. A few popular examples are:
For OpenSearch, given the potential benefit, we should investigate adding support. There are several approaches that we can take to add support:
For each approach, there are also a lot of factors to consider - (integration feasibility into OpenSearch, support for aux features such as filtering, etc.). At the moment, we do not have enough data to figure out what route to go. Therefore, we will need to conduct an investigation into existing approaches. This issue is a general META issue that tracks the project as a whole. All feedback/collaboration is welcome. If there are approaches that you have tried that worked well (or not so well), please share your experiences!
Tasks
(Task list is subject to evolve)
Related Issues
related issue: #758
The text was updated successfully, but these errors were encountered: