
[Feature] [META] Add Efficient filtering support for Faiss Engine #903

Closed
12 tasks done

navneet1v opened this issue May 19, 2023 · 13 comments
Assignees: navneet1v
Labels: enhancement, Features (Introduces a new unit of functionality that satisfies a requirement), v2.9.0

Comments

@navneet1v
Collaborator

navneet1v commented May 19, 2023

Introduction

This issue provides a high-level overview of efficient filtering support for K-NN native engines. Because filtering support in the native engines is limited, we will be enabling efficient filtering in Faiss only.

Background

The OpenSearch K-NN plugin supports 3 different types of engines to perform Approximate Nearest Neighbor (ANN) search. An engine is just an abstraction provided by the plugin over the downstream library used to do the nearest neighbor search. Currently we have Lucene (Java implementation), Faiss (C++ implementation), and Nmslib (C++ implementation) as the 3 engines.
Every engine supports various algorithms to do the search. At a high level we support:

  1. Lucene: HNSW algorithm
  2. Nmslib: HNSW algorithm
  3. Faiss: HNSW and IVF.

For more details you can read this documentation: https://opensearch.org/docs/latest/search-plugins/knn/knn-index/

K-NN Architecture For Native Engines (Indexing and Search)

On a very high level, an OpenSearch index's data is stored in shards, and shards are nothing but Lucene indexes. Each shard contains segments, which are immutable once created. For indices that have K-NN fields the architecture is the same, and the K-NN plugin uses it to support ANN search. At a very high level during segment creation, apart from creating all the different data structures (like FSTs, BKDs, DocValues, etc.) needed for other fields, for a K-NN field using the HNSW algorithm the plugin creates an HNSW graph per K-NN vector field. These graphs are written down as segment files.
While performing ANN search, we load these HNSW files from disk into memory (not JVM heap) if they are not already present, and then perform the search using the respective library.
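For example, the graph files for an index can be loaded into off-heap memory ahead of the first query using the k-NN warmup API (a minimal sketch; the index name is illustrative):

GET /_plugins/_knn/warmup/hotels-index

If the graphs are already in the cache this is cheap; otherwise it avoids paying the load cost on the first search.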

(Figure: FaissEfficientFiltering - Architecture HLD)

Filtering

As of OpenSearch 2.8, the K-NN plugin supports various types of filters (ref doc). But the type of filtering that we are proposing as part of this document is only present in the Lucene engine; neither of the native engines (Faiss and Nmslib) supports it.

What is filtering and why is it important?

Filtering is a way that lets users restrict their search to a certain part of their data. In terms of vector search, the goal is to return the nearest neighbors for a query (containing a query vector and a filter) among only the data points that satisfy the filter specified with the query. Let's try to understand this with an example specific to vector search:
Let's say we have an index storing a product catalog, where images are represented as vectors, and in the same index we store the rating, the date the product was uploaded, the total reviews, etc. A customer of that application searches for similar products (provided as a vector) but wants only products with rating >= 4.
So, to honor such queries we need filtering along with vector search; a sketch of such a query follows.
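A query for the product-catalog scenario might look like the following sketch (the index name, field names, and vector values are illustrative, not from this proposal):

POST /products-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "product_image_vector": {
        "vector": [5.0, 4.0],
        "k": 10,
        "filter": {
          "range": {
            "rating": {
              "gte": 4
            }
          }
        }
      }
    }
  }
}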

What is Efficient Filtering?

In the filtering world, there are basically 2 types of filtering; let's try to understand them in terms of vector search:

  1. Pre-Filtering: For a given query with a filter, you first apply the filter on your complete corpus to get the filtered documents, and then you do the vector search on those filtered documents. In general, the vector search on the filtered documents can be done either via exact search or by creating a new HNSW graph from the filtered IDs at runtime and then searching it.
  2. Post-Filtering: For a given query with a filter, you first do the vector search and then apply the filter on whatever documents you get back. This has an obvious problem: after applying the filter, the total number of results can be < k.

Efficient Filtering: Efficient filtering is an improvement over pre-filtering, where the main ideas are:

  1. We apply the filter first to get the filter IDs, and while doing ANN on the whole corpus we only consider the doc IDs that are present in the filter ID set.
  2. We intelligently figure out when to do ANN search on the whole corpus with filter IDs and when to do exact search. The main idea is that in the worst case you need to look at at least k docs while doing ANN, so if the number of filter IDs is < k, doing an exact search is more efficient in terms of both latency and recall (see the settings sketch after the note below).

Note: The ideas discussed in this section are not the only things we are considering for efficient filtering. Detailed explanations, specific to the OpenSearch K-NN plugin, are provided in the doc.
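As a sketch of the second idea above, recent versions of the plugin expose an advanced index setting controlling the filtered-ID count below which exact search is preferred (the setting name and value here should be verified against your plugin version; the value 1000 is illustrative):

PUT /hotels-index/_settings
{
  "index.knn.advanced.filtered_exact_search_threshold": 1000
}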

Why do we need Filtering in Native Engines?

There are a couple of limitations with Lucene because of which we want to add filtering in the native engines:

  1. Lucene has a limit of 1024 on the number of dimensions. With the current boom in the LLM space, vector dimensions often exceed 1024 (OpenAI embeddings have 1536 dimensions, for example). So customers who want to use the OpenSearch K-NN plugin won't be able to take full advantage of it. Issues have been cut to Lucene to increase the dimension limit to 2048, but the Lucene community seems a bit divided there. Currently the native engines support 16K dimensions.
  2. Lucene doesn't allow us to change the ef_search parameter while doing the search. This parameter has a huge impact on both latency and recall, and for filtering this is even more visible; Exp5 shows that. (See the sketch after this list.)
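For contrast, with the native engines ef_search is an index setting that can be updated dynamically (a minimal sketch; the value 512 is illustrative):

PUT /hotels-index/_settings
{
  "index": {
    "knn.algo_param.ef_search": 512
  }
}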

Requirement

Implement efficient filtering for the native engines (along with their different algorithms) of the OpenSearch K-NN plugin.

Out Of Scope

  1. Based on the deep-dive and POCs done (refer to the POC section of this document), Nmslib in its current form doesn't support filtering with ANN search. Hence filtering will be added only for the Faiss engine. Please refer to the Future Enhancements section for how we can support filtering on the Nmslib engine.

Success Criteria

  1. The native engine should support all the different types of filter clauses currently supported in OpenSearch and by the Lucene engine.
  2. The recall and latency of filtered queries (with both relaxed and restrictive filters) should be in an acceptable range.

High Level Flow

Below is a simplified flow/logic of how filtered search will work.

(Figure: FaissEfficientFiltering - HLD for Efficient Filtering)

Filtered Search Flow: A user creates a k-NN query with a filter and calls the OpenSearch _search API. The query follows the standard query route at the coordinator node and is routed to the different shards (which can be on the same or different data nodes). On the data node, after validation checks (like whether the filter is supported by this KNN engine), the query is rewritten into the Lucene-based query interface for every shard. On every shard this query gets executed via IndexSearchers. The IndexSearcher runs the K-NN query on each shard, where each K-NN query builds a filter query (a Lucene Constant Score Query) and runs it to obtain the query's weight. This filter weight is passed to KNNWeight, where:

  1. It first resolves the FilterWeight to get the filter doc IDs.
  2. It checks whether exact search needs to be done. If not, it loads the graphs that are not yet in the cache and then calls the JNI layer, which in turn calls the Faiss library to do the search.
  3. If exact search is required, we do exact search on the filter IDs and then return the results back to the IndexSearcher.

Once the results are returned, the standard OpenSearch query flow happens: a list of doc IDs of length "size" is sent from each shard to the coordinator, which then compiles the top doc IDs (of length == the size parameter) and runs the fetch phase to get more information on those doc IDs.

Example

Create Index

PUT /hotels-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100,
      "number_of_shards": 1,
      "number_of_replicas": 0,
            "refresh_interval": "1s"
    }
  },
  "mappings": {
    "properties": {
      "location": {
        "type": "knn_vector",
        "dimension": 2,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss"
        }
      }
    }
  }
}

Ingest Documents

POST /_bulk?refresh
{ "index": { "_index": "hotels-index", "_id": "1" } }
{ "location": [5.2, 4.4], "parking" : "true", "rating" : 5 }
{ "index": { "_index": "hotels-index", "_id": "2" } }
{ "location": [5.2, 3.9], "parking" : "false", "rating" : 4 }
{ "index": { "_index": "hotels-index", "_id": "3" } }
{ "location": [4.9, 3.4], "parking" : "true", "rating" : 9 }
{ "index": { "_index": "hotels-index", "_id": "4" } }
{ "location": [4.2, 4.6], "parking" : "false", "rating" : 6}
{ "index": { "_index": "hotels-index", "_id": "5" } }
{ "location": [3.3, 4.5], "parking" : "true", "rating" : 8 }
{ "index": { "_index": "hotels-index", "_id": "6" } }
{ "location": [6.4, 3.4], "parking" : "true", "rating" : 9 }
{ "index": { "_index": "hotels-index", "_id": "7" } }
{ "location": [4.2, 6.2], "parking" : "true", "rating" : 5 }
{ "index": { "_index": "hotels-index", "_id": "8" } }
{ "location": [2.4, 4.0], "parking" : "true", "rating" : 8 }
{ "index": { "_index": "hotels-index", "_id": "9" } }
{ "location": [1.4, 3.2], "parking" : "false", "rating" : 5 }
{ "index": { "_index": "hotels-index", "_id": "10" } }
{ "location": [7.0, 9.9], "parking" : "true", "rating" : 9 }
{ "index": { "_index": "hotels-index", "_id": "11" } }
{ "location": [3.0, 2.3], "parking" : "false", "rating" : 6 }
{ "index": { "_index": "hotels-index", "_id": "12" } }
{ "location": [5.0, 1.0], "parking" : "true", "rating" : 3 }

Query With Filters

POST /hotels-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "location": {
        "vector": [5,4],
        "k": 10,
        "filter": {
          "bool": {
            "must": [
              {
                "range": {
                  "rating": {
                    "gte": 7,
                    "lte": 10
                  }
                }
              },
              {
                "term": {
                  "parking": "true"
                }
              }
            ]
          }
        }
      }
    }
  }
}
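Given the documents ingested above, only five of them (IDs 3, 5, 6, 8, and 10) have a rating between 7 and 10 and parking set to "true". Since that is fewer than k = 10, per the logic described earlier the plugin would be expected to fall back to exact search over the filtered IDs for this query.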

Query Without Filters

POST /hotels-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "location": {
        "vector": [
          5,
          4
        ],
        "k": 3
      }
    }
  }
}

Task:

@navneet1v navneet1v self-assigned this May 19, 2023
@navneet1v navneet1v added the Features (Introduces a new unit of functionality that satisfies a requirement) and enhancement labels and removed the untriaged label May 20, 2023
@navneet1v navneet1v changed the title [Feature] Add Efficient filtering support for Faiss Engine [Feature] [META] Add Efficient filtering support for Faiss Engine Jun 13, 2023
@TrungBui59

Hi, is this issue still open for contribution?

@navneet1v
Collaborator Author

@TrungBui59 all the tasks for this issue have been completed. I am waiting for the OpenSearch 2.10 release to happen; once that is done, this issue will be closed. But there are several other issues related to this one that you can contribute to.

#1050

You can also view issues with the tags good first issue and help wanted.

@savanbthakkar

@navneet1v, does this work in a similar way for the neural query as well? Or is it just for the "kNN" query?

@navneet1v
Collaborator Author

Yes, this works the same for the neural query clause too.

@savanbthakkar

savanbthakkar commented Dec 6, 2023

Thank you @navneet1v.
I have a single OpenSearch index with knn set to true for my multi-tenant system.
I indexed documents for only one tenant (tenant_id 1), then ran the query below and got 55 results.
Then I indexed documents for another tenant (tenant_id 2), ran the same query below, and got 28 results.
My index was created using the faiss engine. Am I missing anything?

{
  "query": {
    "neural": {
      "my_content_vector": {
        "query_text": "how do I pay taxes?",
        "model_id": "Wz1nO4wBAQOb2BRMu2G9",
        "k": 1,
        "filter": {
          "bool": {
            "must": [
              {
                "term": {
                  "tenant_id": {
                    "value": "1"
                  }
                }
              }
            ]
          }
        }
      }
    }
  },
  "_source": false
}

@navneet1v
Collaborator Author

@savanbthakkar I see that your query does not provide the size parameter, whose default value is 10. So I am wondering whether you forgot to include size when pasting the query, because otherwise getting 28 or 55 results is not possible.

To understand more, can you provide the details below:

  1. How many shards does the index have?
  2. How many segments are present in the index?

This can help me understand what is happening in the backend.
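For reference, a version of the query above with size set explicitly might look like this (a sketch; the index name and the value 100 are illustrative, and k is typically raised to at least the desired size):

POST /my-index/_search
{
  "size": 100,
  "query": {
    "neural": {
      "my_content_vector": {
        "query_text": "how do I pay taxes?",
        "model_id": "Wz1nO4wBAQOb2BRMu2G9",
        "k": 100,
        "filter": {
          "term": {
            "tenant_id": "1"
          }
        }
      }
    }
  },
  "_source": false
}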

@igniting

Is there a limit (or a practical limitation) on maximum number of IDs which can be extracted for a filter from lucene and passed to FAISS - both from lucene side and FAISS side?

@vamshin
Member

vamshin commented Jun 16, 2024

@igniting there is no such limit in either engine

@navneet1v
Collaborator Author

Is there a limit (or a practical limitation) on maximum number of IDs which can be extracted for a filter from lucene and passed to FAISS - both from lucene side and FAISS side?

@igniting are you facing any issues related to the number of IDs that are getting extracted for filters?

@igniting

Is there a limit (or a practical limitation) on maximum number of IDs which can be extracted for a filter from lucene and passed to FAISS - both from lucene side and FAISS side?

@igniting are you facing any issues related to the number of IDs that are getting extracted for filters?

No, I've not tested it yet, but we have scenarios where the matched filter set can result in millions of IDs.

@heemin32
Collaborator

There is no limit, but the query will be slower with more filtered IDs. We don't have benchmark data relating request latency to the number of filtered IDs.

@JonNorman

Thanks for such a clear write-up in the PR description!

While performing ANN search, we load these HNSW files from disk into memory (not JVM heap) if they are not already present, and then perform the search using the respective library.

What determines which segments get loaded? Is it all segments? Presumably for pre-filtering there could be an optimisation to not load the vector segments that are filtered out, but for efficient filtering presumably all of them need to be loaded into memory, since by definition we can't know ahead of time which ones we might need to load without first having performed a KNN search?

If there are any docs on this that you can point me to, I'd really appreciate it.

@heemin32
Collaborator

heemin32 commented Jul 3, 2024

Pre-filtering and efficient filtering are the same in this respect. Every segment needs to be loaded into memory for KNN search, even with post-filtering, because filtering happens at the document level, not the segment level.
