You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
At Meilisearch, we must ensure documents or vectors are hidden from some queries. The first version of the vector store feature was to iterate on the best results in the order found in the HNSW and conditionally ignore them. The complexity was O(n), which is not great when the user filters the results on a couple, e.g., 5, 10, 100.
We can do better than that now that we control the whole source code. There are two main solutions to implement:
When there is a fairly small number of selected items, compute the distance without going through the whole tree/graph. However, we need to define a correct threshold.
The algorithm must be more clever when many more items are selected. We will filter the items from the Descendant variant when "iterating" over the tree nodes in the nns_by_item/by_vector. We can barely not even touch the build phase. However, it would also be great to store bitmaps instead of lists of u32s in the Descendant nodes to be able to perform faster intersections.
We will also probably need to provide a nss_builder to reduce the number of conditional parameters to specify. For example, we can provide the RoaringBitmap to filter with, the number of trees to explore, and maybe two methods to query the database: with and without the distances (is it useful?).
TODO
Update the README's feature list
The text was updated successfully, but these errors were encountered:
At Meilisearch, we must ensure documents or vectors are hidden from some queries. The first version of the vector store feature was to iterate on the best results in the order found in the HNSW and conditionally ignore them. The complexity was
O(n)
, which is not great when the user filters the results on a couple, e.g., 5, 10, 100.We can do better than that now that we control the whole source code. There are two main solutions to implement:
Descendant
variant when "iterating" over the tree nodes in thenns_by_item/by_vector
. We can barely not even touch the build phase. However, it would also be great to store bitmaps instead of lists ofu32
s in theDescendant
nodes to be able to perform faster intersections.We will also probably need to provide a
nss_builder
to reduce the number of conditional parameters to specify. For example, we can provide theRoaringBitmap
to filter with, the number of trees to explore, and maybe two methods to query the database: with and without the distances (is it useful?).TODO
The text was updated successfully, but these errors were encountered: