
[Feature] [META] Add Efficient filtering support for Faiss Engine #903

Closed
12 tasks done

navneet1v opened this issue May 19, 2023 · 13 comments
Assignees: navneet1v
Labels: enhancement, Features (Introduces a new unit of functionality that satisfies a requirement), v2.9.0

Comments

@navneet1v
Collaborator

navneet1v commented May 19, 2023

Introduction

This issue provides a high-level overview of efficient filtering support for K-NN native engines. Because filtering support in the native engines is limited, we will be enabling efficient filtering in Faiss only.

Background

The OpenSearch K-NN plugin supports 3 different types of engines to perform Approximate Nearest Neighbor (ANN) search. An engine is just an abstraction provided by the plugin over the downstream library used to do the nearest neighbor search. Currently we have Lucene (Java implementation), Faiss (C++ implementation), and Nmslib (C++ implementation) as the 3 engines.
Every engine supports various algorithms to do the search. At a high level we support:

  1. Lucene: HNSW algorithm
  2. Nmslib: HNSW algorithm
  3. Faiss: HNSW and IVF.

For more details you can read this documentation: https://opensearch.org/docs/latest/search-plugins/knn/knn-index/

K-NN Architecture For Native Engines (Indexing and Search)

On a very high level, an OpenSearch index's data is stored in shards, and shards are nothing but Lucene indexes. Each shard contains segments, which are immutable once created. For indices that have K-NN fields the architecture is the same, and the K-NN plugin uses it to support ANN search. At a very high level during segment creation, apart from creating all the different data structures (like FSTs, BKDs, DocValues, etc.) needed for other fields, for a K-NN field using the HNSW algorithm the plugin creates an HNSW graph per K-NN vector field. These graphs are written down as segment files.
While performing ANN search, we load these HNSW files from disk into memory (not JVM heap) if they are not already present, and then perform the search using the respective library.
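For example, the graph files for an index can be loaded into off-heap memory ahead of the first query using the k-NN warmup API (a minimal sketch; the index name is illustrative):

GET /_plugins/_knn/warmup/hotels-index

If the graphs are already in the cache this is cheap; otherwise it avoids paying the load cost on the first search.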

(Figure: FaissEfficientFiltering - Architecture HLD)

Filtering

As of OpenSearch 2.8, the K-NN plugin supports various types of filters (ref doc). But the type of filtering that we are proposing as part of this document is only present in the Lucene engine; neither of the native engines (Faiss and Nmslib) supports it.

What is filtering and why is it important?

Filtering is a way that lets users restrict their search to a certain part of their data. In terms of vector search, the goal is to return the nearest neighbors for a query (containing a query vector and a filter) among only the data points that satisfy the filter specified with the query. Let's try to understand this with an example specific to vector search:
Let's say we have an index storing a product catalog, where images are represented as vectors, and in the same index we store the rating, the date the product was uploaded, the total reviews, etc. A customer of that application searches for similar products (provided as a vector) but wants only products with rating >= 4.
So, to honor such queries we need filtering along with vector search; a sketch of such a query follows.
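A query for the product-catalog scenario might look like the following sketch (the index name, field names, and vector values are illustrative, not from this proposal):

POST /products-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "product_image_vector": {
        "vector": [5.0, 4.0],
        "k": 10,
        "filter": {
          "range": {
            "rating": {
              "gte": 4
            }
          }
        }
      }
    }
  }
}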

What is Efficient Filtering?

In the filtering world, there are basically 2 types of filtering; let's try to understand them in terms of vector search:

  1. Pre-Filtering: For a given query with a filter, you first apply the filter on your complete corpus to get the filtered documents, and then you do the vector search on those filtered documents. In general, the vector search on the filtered documents can be done either via exact search or by creating a new HNSW graph from the filtered IDs at runtime and then searching it.
  2. Post-Filtering: For a given query with a filter, you first do the vector search and then apply the filter on whatever documents you get back. This has an obvious problem: after applying the filter, the total number of results can be < k.

Efficient Filtering: Efficient filtering is an improvement over pre-filtering, where the main ideas are:

  1. We apply the filter first to get the filter IDs, and while doing ANN on the whole corpus we only consider the doc IDs that are present in the filter ID set.
  2. We intelligently figure out when to do ANN search on the whole corpus with filter IDs and when to do exact search. The main idea is that in the worst case you need to look at at least k docs while doing ANN, so if the number of filter IDs is < k, doing an exact search is more efficient in terms of both latency and recall (see the settings sketch after the note below).

Note: The ideas discussed in this section are not the only things we are considering for efficient filtering. Detailed explanations, specific to the OpenSearch K-NN plugin, are provided in the doc.
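As a sketch of the second idea above, recent versions of the plugin expose an advanced index setting controlling the filtered-ID count below which exact search is preferred (the setting name and value here should be verified against your plugin version; the value 1000 is illustrative):

PUT /hotels-index/_settings
{
  "index.knn.advanced.filtered_exact_search_threshold": 1000
}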

Why do we need Filtering in Native Engines?

There are a couple of limitations with Lucene because of which we want to add filtering in the native engines:

  1. Lucene has a limit of 1024 on the number of dimensions. With the current boom in the LLM space, vector dimensions often exceed 1024 (OpenAI embeddings have 1536 dimensions, for example). So customers who want to use the OpenSearch K-NN plugin won't be able to take full advantage of it. Issues have been cut to Lucene to increase the dimension limit to 2048, but the Lucene community seems a bit divided there. Currently the native engines support 16K dimensions.
  2. Lucene doesn't allow us to change the ef_search parameter while doing the search. This parameter has a huge impact on both latency and recall, and for filtering this is even more visible; Exp5 shows that. (See the sketch after this list.)
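For contrast, with the native engines ef_search is an index setting that can be updated dynamically (a minimal sketch; the value 512 is illustrative):

PUT /hotels-index/_settings
{
  "index": {
    "knn.algo_param.ef_search": 512
  }
}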

Requirement

Implement efficient filtering for the native engines (along with their different algorithms) of the OpenSearch K-NN plugin.

Out Of Scope

  1. Based on the deep-dive and POCs done (refer to the POC section of this document), Nmslib in its current form doesn't support filtering with ANN search. Hence filtering will be added only for the Faiss engine. Please refer to the Future Enhancements section for how we can support filtering on the Nmslib engine.

Success Criteria

  1. The native engine should support all the different types of filter clauses currently supported in OpenSearch and by the Lucene engine.
  2. The recall and latency of filtered queries (with both relaxed and restrictive filters) should be in an acceptable range.

High Level Flow

Below is a simplified flow/logic of how filtered search will work.

(Figure: FaissEfficientFiltering - HLD for Efficient Filtering)

Filtered Search Flow: A user creates a k-NN query with a filter and calls the OpenSearch _search API. The query follows the standard query route at the coordinator node and is routed to the different shards (which can be on the same or different data nodes). On the data node, after validation checks (like whether the filter is supported by this KNN engine), the query is rewritten into the Lucene-based query interface for every shard. On every shard this query gets executed via IndexSearchers. The IndexSearcher runs the K-NN query on each shard, where each K-NN query builds a filter query (a Lucene Constant Score Query) and runs it to obtain the query's weight. This filter weight is passed to KNNWeight, where:

  1. It first resolves the FilterWeight to get the filter doc IDs.
  2. It checks whether exact search needs to be done. If not, it loads the graphs that are not yet in the cache and then calls the JNI layer, which in turn calls the Faiss library to do the search.
  3. If exact search is required, we do exact search on the filter IDs and then return the results back to the IndexSearcher.

Once the results are returned, the standard OpenSearch query flow happens: a list of doc IDs of length "size" is sent from each shard to the coordinator, which then compiles the top doc IDs (of length == the size parameter) and runs the fetch phase to get more information on those doc IDs.

Example

Create Index

PUT /hotels-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100,
      "number_of_shards": 1,
      "number_of_replicas": 0,
            "refresh_interval": "1s"
    }
  },
  "mappings": {
    "properties": {
      "location": {
        "type": "knn_vector",
        "dimension": 2,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss"
        }
      }
    }
  }
}

Ingest Documents

POST /_bulk?refresh
{ "index": { "_index": "hotels-index", "_id": "1" } }
{ "location": [5.2, 4.4], "parking" : "true", "rating" : 5 }
{ "index": { "_index": "hotels-index", "_id": "2" } }
{ "location": [5.2, 3.9], "parking" : "false", "rating" : 4 }
{ "index": { "_index": "hotels-index", "_id": "3" } }
{ "location": [4.9, 3.4], "parking" : "true", "rating" : 9 }
{ "index": { "_index": "hotels-index", "_id": "4" } }
{ "location": [4.2, 4.6], "parking" : "false", "rating" : 6}
{ "index": { "_index": "hotels-index", "_id": "5" } }
{ "location": [3.3, 4.5], "parking" : "true", "rating" : 8 }
{ "index": { "_index": "hotels-index", "_id": "6" } }
{ "location": [6.4, 3.4], "parking" : "true", "rating" : 9 }
{ "index": { "_index": "hotels-index", "_id": "7" } }
{ "location": [4.2, 6.2], "parking" : "true", "rating" : 5 }
{ "index": { "_index": "hotels-index", "_id": "8" } }
{ "location": [2.4, 4.0], "parking" : "true", "rating" : 8 }
{ "index": { "_index": "hotels-index", "_id": "9" } }
{ "location": [1.4, 3.2], "parking" : "false", "rating" : 5 }
{ "index": { "_index": "hotels-index", "_id": "10" } }
{ "location": [7.0, 9.9], "parking" : "true", "rating" : 9 }
{ "index": { "_index": "hotels-index", "_id": "11" } }
{ "location": [3.0, 2.3], "parking" : "false", "rating" : 6 }
{ "index": { "_index": "hotels-index", "_id": "12" } }
{ "location": [5.0, 1.0], "parking" : "true", "rating" : 3 }

Query With Filters

POST /hotels-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "location": {
        "vector": [5,4],
        "k": 10,
        "filter": {
          "bool": {
            "must": [
              {
                "range": {
                  "rating": {
                    "gte": 7,
                    "lte": 10
                  }
                }
              },
              {
                "term": {
                  "parking": "true"
                }
              }
            ]
          }
        }
      }
    }
  }
}
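Given the documents ingested above, only five of them (IDs 3, 5, 6, 8, and 10) have a rating between 7 and 10 and parking set to "true". Since that is fewer than k = 10, per the logic described earlier the plugin would be expected to fall back to exact search over the filtered IDs for this query.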

Query Without Filters

POST /hotels-index/_search
{
  "size": 10,
  "query": {
    "knn": {
      "location": {
        "vector": [
          5,
          4
        ],
        "k": 3
      }
    }
  }
}

Task:

@navneet1v navneet1v self-assigned this May 19, 2023
@navneet1v navneet1v added the Features (Introduces a new unit of functionality that satisfies a requirement) and enhancement labels and removed the untriaged label May 20, 2023
@navneet1v navneet1v changed the title [Feature] Add Efficient filtering support for Faiss Engine [Feature] [META] Add Efficient filtering support for Faiss Engine Jun 13, 2023
@TrungBui59

Hi, is this issue still open for contribution?

@navneet1v
Collaborator Author

@TrungBui59 all the tasks for this issue have been completed. I am waiting for the OpenSearch 2.10 release to happen; once that is done, this issue will be closed. But there are several other issues related to this one that you can contribute to.

#1050

You can also view issues with the tags good first issue and help wanted.

@savanbthakkar

@navneet1v, does this work in a similar way for the neural query as well? Or is it just for the "kNN" query?

@navneet1v
Collaborator Author

Yes, this works the same for the neural query clause too.

@savanbthakkar

savanbthakkar commented Dec 6, 2023

Thank you @navneet1v.
I have a single OpenSearch index with knn set to true for my multi-tenant system.
I indexed documents for only one tenant (tenant_id 1), then ran the query below and got 55 results.
Then I indexed documents for another tenant (tenant_id 2), ran the same query below, and got 28 results.
My index was created using the faiss engine. Am I missing anything?

{
  "query": {
    "neural": {
      "my_content_vector": {
        "query_text": "how do I pay taxes?",
        "model_id": "Wz1nO4wBAQOb2BRMu2G9",
        "k": 1,
        "filter": {
          "bool": {
            "must": [
              {
                "term": {
                  "tenant_id": {
                    "value": "1"
                  }
                }
              }
            ]
          }
        }
      }
    }
  },
  "_source": false
}

@navneet1v
Collaborator Author

@savanbthakkar I see that your query does not provide the size parameter, whose default value is 10. So I am wondering whether you forgot to include size when pasting the query, because otherwise getting 28 or 55 results is not possible.

To understand more, can you provide the details below:

  1. How many shards does the index have?
  2. How many segments are present in the index?

This can help me understand what is happening in the backend.
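For reference, a version of the query above with size set explicitly might look like this (a sketch; the index name and the value 100 are illustrative, and k is typically raised to at least the desired size):

POST /my-index/_search
{
  "size": 100,
  "query": {
    "neural": {
      "my_content_vector": {
        "query_text": "how do I pay taxes?",
        "model_id": "Wz1nO4wBAQOb2BRMu2G9",
        "k": 100,
        "filter": {
          "term": {
            "tenant_id": "1"
          }
        }
      }
    }
  },
  "_source": false
}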

@igniting

Is there a limit (or a practical limitation) on maximum number of IDs which can be extracted for a filter from lucene and passed to FAISS - both from lucene side and FAISS side?

@vamshin
Member

vamshin commented Jun 16, 2024

@igniting there is no such limit in either engine

@navneet1v
Collaborator Author

Is there a limit (or a practical limitation) on maximum number of IDs which can be extracted for a filter from lucene and passed to FAISS - both from lucene side and FAISS side?

@igniting are you facing any issues related to the number of IDs that are getting extracted for filters?

@igniting

Is there a limit (or a practical limitation) on maximum number of IDs which can be extracted for a filter from lucene and passed to FAISS - both from lucene side and FAISS side?

@igniting are you facing any issues related to the number of IDs that are getting extracted for filters?

No, I've not tested it yet, but we have scenarios where the matched filter set can result in millions of IDs.

@heemin32
Collaborator

There is no limit, but the query will be slower with more filtered IDs. We don't have benchmark data relating request latency to the number of filtered IDs.

@JonNorman

Thanks for such a clear write-up in the PR description!

While performing ANN search, we load these HNSW files from disk into memory (not JVM heap) if they are not already present, and then perform the search using the respective library.

What determines which segments get loaded? Is it all segments? Presumably for pre-filtering there could be an optimisation to not load the vector segments that are filtered out, but for efficient filtering presumably all of them need to be loaded into memory, since by definition we can't know ahead of time which ones we might need to load without first having performed a KNN search?

If there are any docs on this that you can point me to, I'd really appreciate it.

@heemin32
Collaborator

heemin32 commented Jul 3, 2024

Pre-filtering and efficient filtering are the same in this respect. Every segment needs to be loaded into memory for KNN search, even with post-filtering, because filtering happens at the document level, not the segment level.
