
[RFC] Two-phased Search Re-score Design #1861

Closed
jmazanec15 opened this issue Jul 22, 2024 · 0 comments
Labels: Features, Roadmap:Vector Database/GenAI, search-improvements, v2.17.0


Introduction

This document provides a design for the re-score portion of the disk-based two-phased ANN search project (see #1779).

Problem Statement

Quantization provides a significant reduction in memory consumption for vector search. This is vital to customers in order to control costs for their neural search systems. However, this comes at the cost of k-NN search accuracy, because the distances being computed are approximate. From a user perspective, this results in degraded search relevance.

In order to improve search accuracy while still reaping the benefits of the reduced memory consumption provided by quantization, a two-phased disk-based approach can be taken, with a predictable increase in latency (the exact increase depends heavily on system and workload specifics, but we have seen increases of only tens of milliseconds in some cases).

From preliminary experiments, we have seen this approach work very well (see #1779 (comment)).

With that, we need to provide a way for users to execute this two-phased search approach in OpenSearch in an easy-to-use yet efficient manner.

Requirements

Functional

  1. Allow users to over-sample results from a quantized index
  2. Allow users to specify that results from the quantized index should be re-scored with full precision
  3. Re-scoring should work on quantized index types produced by native engines (i.e. faiss PQ or custom BQ). It should not re-score if the index already uses full-precision scoring for ANN.
  4. The full-precision data set should not need to be fully resident in memory for strong performance
  5. Works with the neural search query type as well

Non-functional

  1. Keep implementation as engine/algorithm agnostic as possible
  2. Behavior for radial search should be well-defined
  3. API interface needs to be easy to use. It should have a simple syntax that is flexible enough to support all functionality.

Out of scope/Future scope

  1. Hierarchical-quantization based re-scoring. One optimization we could make would be to do multiple layers of quantization and then do re-scoring on fewer and fewer results with higher and higher precision. While it may be useful in the future, there is not enough evidence to justify this complexity.
  2. Lucene quantized indices. Lucene quantized indices are structured a bit differently from the native indices. One major challenge will be pinning the quantized index in memory so that it does not get paged in and out frequently. We will support Lucene quantized indices, but that work will be tracked separately from this (see this issue we opened). Leaving it out of this doc for now; the implementation may be able to re-use components in the future.

High Level Design

Proposed

Architecture
We are going to augment our existing k-NN query to support re-scoring logic. A user would specify whether re-scoring should be done and, if so, the factor at which to over-sample the quantized ANN index. Results from the ANN search will be merged across the segments, then re-scored and reduced to k overall.
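The shard-level flow can be sketched in Python. This is a minimal illustration only: `Segment`, its 1-bit quantization scheme, and the helper names are hypothetical stand-ins, not the plugin's actual classes.

```python
import heapq
import math

def l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class Segment:
    """Hypothetical segment holding full-precision vectors by doc id."""
    def __init__(self, vectors):
        self.vectors = vectors  # doc_id -> full-precision vector

    def approx_search(self, query, n):
        # Phase 1: distances computed on quantized (here: 1-bit) vectors
        # are only approximate.
        quantize = lambda v: [1.0 if x > 0 else 0.0 for x in v]
        q = quantize(query)
        scored = [(l2(q, quantize(v)), doc_id) for doc_id, v in self.vectors.items()]
        return heapq.nsmallest(n, scored)

def two_phase_search(query, segments, k, oversample_factor=1.0):
    num_candidates = math.ceil(k * oversample_factor)
    # Over-sample approximate results from every segment, then merge.
    candidates = []
    for seg in segments:
        candidates.extend((seg, doc_id)
                          for _, doc_id in seg.approx_search(query, num_candidates))
    # Phase 2: re-score the merged candidates with full-precision vectors
    # and reduce to the k best overall.
    rescored = [(l2(query, seg.vectors[doc_id]), doc_id) for seg, doc_id in candidates]
    return heapq.nsmallest(k, rescored)
```

The key property is that the reduction to k happens once per shard, after merging all segments' over-sampled candidates, rather than once per segment.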

Shard Two-Phased Search Flow

Alternative 1: Use script-score functionality already available in query dsl to implement re-scoring.

Currently, it is already possible to achieve re-score functionality via script-scoring and/or re-scoring with k-NN scripts. We could simply rely on these to support the functionality.

Pros

  • Easy — it's already implemented
  • Flexible — more complex scoring functionality can be used because scripting allows for it

Cons

  • Scripting interface is verbose — Requires a lot of characters to execute the query
  • Scripting interface would require user to repeat vector (inefficient)
  • Script can be more expensive — scripts will need to compile

Overall, this option was not selected, mainly because the resulting user experience is complex.

Alternative 2: Implement re-scoring at the segment level by modifying KNNScorer

Re-scoring could be applied to a segment's quantized ANN search results before they are merged with the shard's top results. The major implication is that performance would vary depending on how many segments there are.

Pros

  • Easy — pretty minor changes

Cons

  • Segment count will have strong influence on performance
  • Not easy to reuse for Lucene-based quantization

Because segment count would have a heavy impact on performance, we are not going with this approach.

Access Pattern Optimization Strategy

The re-score process reads vectors from secondary storage into memory in order to recompute the distances. Reads from secondary storage are typically slow and can be a bottleneck. Thus, for optimal performance, secondary storage IOPS should be minimized.

In general, there are two ways that IOPS can be minimized for re-scoring:

  1. Fetch multiple vectors to be re-scored per I/O operation
  2. Cache frequently used vectors

Initially, we will not explicitly implement any optimization strategy, for the following reasons:

  1. Because the vector files are mmap’d, we will automatically get some benefits from the page cache. It may not be optimal, but it is something
  2. From our experiments, we have seen acceptable latencies with good throughputs without any optimization
  3. Implementing either strategy will require a large effort.

In the future, we will explore this. However, given the successful performance from tests, it does not need to be done as a P0.
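For illustration, the two strategies above could look roughly like the following. This is a hypothetical sketch: the offsets, page size, and cache API are illustrative, not anything the plugin implements.

```python
from collections import OrderedDict

def coalesced_read_order(doc_ids, offset_of, page_size=4096):
    # Strategy 1: sort candidates by file offset so that vectors sharing
    # a page are read together, turning several I/O operations into one.
    return sorted(doc_ids, key=lambda d: offset_of(d) // page_size)

class LRUVectorCache:
    # Strategy 2: keep frequently re-scored vectors resident in memory.
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, doc_id, load):
        if doc_id in self.data:
            self.data.move_to_end(doc_id)  # mark as recently used
        else:
            self.data[doc_id] = load(doc_id)  # read from secondary storage
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)  # evict least recently used
        return self.data[doc_id]
```

In practice the mmap'd page cache already provides a coarse version of both behaviors for free, which is part of why neither is a P0.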

API Design
Important: the completely out-of-the-box, one-parameter API design is not covered here (see the mode parameter in the RFC); it will be covered in the future. This section focuses on the lower-level mechanisms for controlling re-scoring.

The API should:

  1. allow users to signal that re-scoring should be done
  2. allow the plugin to transparently configure the oversampling factor of the quantized ANN index when we develop the out-of-the-box mode interface
  3. maintain the logical meaning of the k parameter as the number of competitive results to retrieve per segment

With that, we propose the following API:

# Simple - just re-score top k
POST my-vector-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector_field": {
        "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5],
        "k": 10,
        "rescore": true
      }
    }
  }
}

# Configure oversample
POST my-vector-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector_field": {
        "vector": [1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5, 1.5, 5.5],
        "k": 10,
        "rescore": {
            "oversample_factor": 1.5
        }
      }
    }
  }
}

We are introducing two new optional parameters:

  • rescore (bool or dict; default=false) — indicates whether the results should be re-scored with full-precision vectors. If it is a dict, the oversample factor can be set
  • oversample_factor (float; default=1.0) — multiplier of k used to over-sample the results from the quantized index

With this, users can still logically reason that k means the number of competitive results per segment. Additionally, the oversample_factor parameter is necessary because it will allow us to set a default for the disk-based out-of-the-box experience, abstracting the underlying two-phase search algorithm from the user.
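As a sketch of the parameter semantics, each segment's quantized search returns roughly oversample_factor * k candidates (whether the plugin rounds with ceil is an assumption here), and the query body can be built in either of the two rescore forms. The helper names are hypothetical:

```python
import math

def candidates_per_segment(k, oversample_factor=1.0):
    # Each segment's quantized ANN search returns ceil(oversample_factor * k)
    # candidates; the shard-level re-score later reduces them back to k.
    # (ceil is an assumption about the rounding behavior.)
    return math.ceil(oversample_factor * k)

def knn_query(field, vector, k, oversample_factor=None):
    # rescore is either a bool (re-score the top k with full precision)
    # or a dict carrying oversample_factor, matching the proposed API.
    if oversample_factor is None:
        rescore = True
    else:
        rescore = {"oversample_factor": oversample_factor}
    return {"query": {"knn": {field: {"vector": vector, "k": k, "rescore": rescore}}}}
```

Note that size still controls how many results the user receives; k and oversample_factor only shape the candidate set that feeds the re-score.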

Radial Search Behavior

Re-score functionality will be partially available with radial search. There are a couple of problems with supporting re-scoring for the radial use case:

  1. Binary quantization will take a user’s vectors in a linear space and quantize them into a binary space. It is very difficult to translate the distance threshold in a linear space to a binary space that would preserve the meaning.
  2. Accuracy issues at high compression levels could lead to inconsistent performance. Because the distances for the ANN search will be approximated, when the results are re-scored, the results may no longer fall within the threshold provided by the user (or results may be missed).

For (1), we will disable radial support for binary quantization. We may come back and try to find a way to solve this, but it will be a large effort that may not have a great return. We could provide functionality where we set the radial distance to the maximum and then apply the threshold during re-scoring. However, this would mean that ANN search performance would always be at its slowest when setting the radius. That would be a bad experience.

For (2), we will support radial search, and will filter on threshold during re-scoring. This could mean that if the distance approximations are significantly underestimated, we get 0 results. While this is not optimal, it will at least preserve the contract on the radial search problem — i.e. we will not be returning a result we know falls outside of the radius. We will give users a way to avoid this by using the oversample_factor parameter to expand the radius on the ANN index search.
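The radial contract described above can be sketched as follows. This is a hypothetical illustration; the distance function and data layout are stand-ins for whatever the engine actually uses.

```python
def l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rescore_radial(candidate_ids, query, vectors, max_distance):
    # Re-apply the user's threshold on exact, full-precision distances.
    # If the approximate distances were badly underestimated, this may
    # legitimately return zero results, but it never returns a result
    # known to fall outside the radius.
    exact = [(l2(query, vectors[d]), d) for d in candidate_ids]
    return sorted((score, d) for (score, d) in exact if score <= max_distance)
```

The over-sampling knob is what lets users trade extra ANN work for a lower chance of the candidate set being emptied by this final filter.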

Full-precision ANN Index Behavior

Re-score functionality for full-precision ANN indices will be a no-op. Because the results are already returned with full-precision scores, it does not make sense to re-score them only to get the same scores. That said, if oversampling is applied, each ANN search will return oversample_factor * k results, which will then be reduced to k results. This is mainly to ensure that logical consistency is maintained within the query API.

Metrics

From a metrics perspective, in order to monitor performance, we will rely on existing metric providers/functionality.

Re-scoring is very I/O intensive. It will require a significant number of IOPS in order to load the full-precision vectors into memory for each query. Thus, to monitor the health of the system, users will need to watch existing disk/storage metrics. For example, if the backing storage device is EBS, it would be important to monitor the throughput and IOPS published by EBS. Additionally, node stats can be used to monitor fs and disk metrics via the OpenSearch API. We will not add any explicit new resource-monitoring functionality; this will be fairly system specific and needs to be handled in the monitoring done for the deployment environment.

In addition, for breakdowns of latency per shard, users can use the profile API to see where time is being spent in the query and troubleshoot slow queries. See the profile API for more details. Specifically, the ANN search will happen during the rewrite stage of the query, so large profile.shards.searches.rewrite_time values will be indicative of poor behavior.

[Feedback requested] Lastly, we could implement a debug flag for the k-NN query indicating that fine-grained query debugging/profiling should take place. This could help an operator get fine-grained performance specifics on the k-NN query without impacting the whole production system. It would also not add as much overhead as the profile API. With this flag enabled, we would log debugging statements per shard on the time taken for each operation. This would help zero in on what is causing the performance issue.

Low Level Design

For the plugin, the k-NN search happens when the per-segment scorer is created (see KNNWeight.scorer). In order to implement the re-score functionality, we are going to modify this process so that the ANN search happens at the rewrite stage (similar to Lucene k-NN); we will then reduce the per-segment results and re-score them. The main benefits of this approach are:

  1. Shard-level re-scoring, as opposed to segment-level, will make performance independent of segment count
  2. We can migrate to Lucene k-NN queries more easily (they perform the ANN search at rewrite time).

See #1845 for more details. With this, the scorer that is built will only have k results and full precision distances.

IndexSearcher.search for k-NN with re-score (what search looks like at the shard level)
Note — the result reduction phase is oversimplified for the sake of brevity. In this phase, Lucene will iterate through the scorers, collect the scores by calling scorer.score(), and then reduce. See the IndexSearcher code for details.

While we could have returned a scorer that lazily computes the scores when score() is called during the reduction step, we chose to recompute scores before the scorer is created so that:

  1. Disk reads happen in close proximity to each other, which could potentially improve performance
  2. We can reduce the matches per segment from oversample_factor * k to k. Otherwise, the scorer would need to report all of them as matched, and more than k results would be iterated over at the scoring level.

With this in mind, we will need to ensure that we only re-score live docs so that we do not unnecessarily re-score documents that are deleted.
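For example, the live-docs check could be sketched like this, where live_docs mimics Lucene's liveDocs bitset (None meaning the segment has no deletions); the helper itself is hypothetical:

```python
def filter_live(candidate_ids, live_docs):
    # Drop deleted documents before the expensive full-precision re-score.
    # live_docs is a per-segment bitset; None means no deletions, so every
    # candidate is live.
    if live_docs is None:
        return list(candidate_ids)
    return [d for d in candidate_ids if live_docs[d]]
```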

Future Work

  1. Investigate access pattern optimization strategies to improve performance.
  2. Support Lucene quantized indices with re-scoring
@jmazanec15 jmazanec15 added the Features Introduces a new unit of functionality that satisfies a requirement label Jul 22, 2024
@navneet1v navneet1v moved this to 2.17.0 in Vector Search RoadMap Jul 22, 2024
@vamshin vamshin added the Roadmap:Vector Database/GenAI Project-wide roadmap label label Aug 26, 2024
@github-project-automation github-project-automation bot moved this from 2.17.0 to ✅ Done in Vector Search RoadMap Sep 18, 2024