[RFC] Design for Incorporating Reciprocal Rank Fusion into Neural Search #865
Comments
IMO, the customer should provide the weights for different subqueries. Just out of curiosity, how would the RRF formula be adapted to user-determined weights?
Intuitively, alternative 2 makes more sense to me. Can you elaborate more on this con?
We can definitely add weights; it's a good addition.
I understand that RRF can only be used with RRF. We can require users to use them together when validating parameters.
Hi @martin-gaievski! Once the PR is nearly code-complete, could you please showcase the impact of the RRF processor on search latency, as Zhichao and Cong did in the two-phase blog post: https://opensearch.org/blog/Introducing-a-neural-sparse-two-phase-algorithm/
We're planning to benchmark relevancy and performance after the implementation reaches a code-complete state in the feature branch. My forecast is that it shouldn't be much different from the normalization processor: we're using the same approach but with less complex computation logic.
Fundamentally, score normalization and rank-based combination are different. What's the benefit of adding a normalization technique parameter now? We can add normalization later if it's needed; it will be a backward-compatible change.
Agree that this is similar to the normalization processor. Please also take search latency into account.
Introduction
This document outlines the design for implementing Reciprocal Rank Fusion (RRF) within the neural search plugin. RRF, first described in a 2009 paper by Cormack, Clarke, and Büttcher, is a method for combining rankings from several subqueries; the paper reports that it outperforms Condorcet fusion and individual rank-learning methods. Risk-reward tradeoffs for different fusion techniques are analyzed in this paper. The general formula for RRF, where k is the rank constant and query_j_rank is the rank of a document in the results of subquery j of the hybrid query, is as follows:
rankScore(document) = sum over all subqueries j of 1 / (k + query_j_rank)
Background
The OpenSearch neural search plugin provides semantic search to users by using ML models to embed text. It supports hybrid queries, which combine text-based search with neural vector search. This design document describes an implementation of the RRF technique that combines results from the subqueries of a hybrid query to improve search relevance.
One advantage of RRF is that it is simple and provides a standard way of processing results. It is scalable and requires no tuning other than adjusting the rank constant. Its simplicity makes it intuitive for users. The algorithm has also been used successfully by several other organizations. Moreover, incorporating RRF into hybrid query will ease customer migration to OpenSearch from platforms where they already use RRF, and some customers have already requested it as a technique. For these reasons, we believe that RRF will benefit users performing hybrid queries.
Functional Requirements
Non Functional Requirements
Document Scope
In this document we propose a solution for questions below:
Out of Document Scope
Solution Overview
Implement RRF within the neural search plugin by adding two classes, RRFProcessor and RRFProcessorFactory, and calling them from the NeuralSearch class. The rank constant can be set by the user during query but has a default value. A value of 60 is common and was used in the original paper describing the technique. The rank constant must be greater than or equal to 1. We can include guidance for customers showing how the rank scores shrink as the rank constant grows, and vice versa, to help inform their choice.
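The kind of guidance mentioned above could be generated with a few lines of Python. This is only an illustrative sketch: the function names `reciprocal_rank_score` and `score_table` are hypothetical and are not part of the plugin, and the rank constants shown are arbitrary sample values.

```python
def reciprocal_rank_score(rank: int, rank_constant: int = 60) -> float:
    """Rank score one subquery contributes for a document at 1-based `rank`."""
    if rank_constant < 1:
        raise ValueError("rank_constant must be greater than or equal to 1")
    if rank < 1:
        raise ValueError("rank must be greater than or equal to 1")
    return 1.0 / (rank_constant + rank)


def score_table(rank_constants, ranks):
    """Map each rank constant to the rank scores it yields for the given ranks."""
    return {
        k: [reciprocal_rank_score(rank, k) for rank in ranks]
        for k in rank_constants
    }


# Print rank scores for ranks 1, 2 and 10 under a few sample rank constants.
for k, scores in score_table([1, 20, 60, 100], [1, 2, 10]).items():
    print(f"k={k:>3}: " + ", ".join(f"{score:.4f}" for score in scores))
```

A larger rank constant lowers every rank score and also flattens the gap between ranks: with k = 1 the top document scores 5.5 times as much as the tenth (1/2 versus 1/11), while with k = 60 the ratio is roughly 1.15 (1/61 versus 1/70).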
Solution HLD: Architectural and Component Design
Proposed Solution
Pros:
Cons:
Potential Issues
Risks / Known limitations and Future extensions
The following diagram shows the high-level approach for implementing RRF in hybrid query:
Solution LLD
For more information about hybrid search, see https://opensearch.org/blog/hybrid-search/
The following example uses the data from this tutorial: https://opensearch.org/docs/latest/search-plugins/neural-search-tutorial/
I’ve copied over the scores from the match query and neural query into the table below and used the resulting numbers to calculate the ranks and rank scores.
Setting up the search pipeline with RRF for hybrid query:
Setting up the search pipeline for hybrid query as it works currently (no changes from the current setup):
The scores and ranks are based on the example data:
Implementation Details
Benefits
Alternatives Considered
Alternative 1:
Implementing RRF as a NormalizationTechnique and a CombinationTechnique (RRFNormalizationTechnique and RRFCombinationTechnique classes) that would be called by NormalizationProcessor the same way it currently works when configured with, for example, L2NormalizationTechnique and ArithmeticMeanScoreCombinationTechnique. The RRFNormalizationTechnique class would perform the subquery-level reciprocal rank score calculation and pass the rank scores of all documents in each subquery to the RRFCombinationTechnique class. That class would combine each document’s rank scores, then merge and sort the results, returning a list of documents sorted by combined RRF rank score.
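The split between the two stages can be sketched in Python. This is a simplified illustration and not the plugin's Java code: `rrf_normalize` and `rrf_combine` are hypothetical stand-ins for the two technique classes, and the hit lists use made-up scores rather than the tutorial data.

```python
from collections import defaultdict


def rrf_normalize(subquery_results, rank_constant=60):
    """Stage 1: replace each raw score with the rank score 1 / (k + rank).

    `subquery_results` holds one list of (doc_id, raw_score) pairs per subquery.
    Ranks are 1-based and assigned by descending raw score within a subquery.
    """
    normalized = []
    for hits in subquery_results:
        ordered = sorted(hits, key=lambda hit: hit[1], reverse=True)
        normalized.append({
            doc_id: 1.0 / (rank_constant + rank)
            for rank, (doc_id, _raw_score) in enumerate(ordered, start=1)
        })
    return normalized


def rrf_combine(normalized):
    """Stage 2: sum each document's rank scores across subqueries, then sort."""
    totals = defaultdict(float)
    for rank_scores in normalized:
        for doc_id, score in rank_scores.items():
            totals[doc_id] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for illustration only (not taken from the tutorial data).
match_hits = [("a", 2.1), ("b", 1.4), ("c", 0.3)]
neural_hits = [("b", 0.92), ("c", 0.88), ("d", 0.50)]

fused = rrf_combine(rrf_normalize([match_hits, neural_hits]))
for doc_id, score in fused:
    print(doc_id, round(score, 6))
```

With these inputs the fused order is b, c, a, d. Documents returned by both subqueries rise above "a", even though "a" had the highest raw match score, because only ranks enter the calculation.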
Pros:
Cons:
Example pipeline configuration call
Alternative 2:
Implementing RRF as a processor, RRFProcessor, at the same level as NormalizationProcessor and registered by an RRFProcessorFactory, with the RRFProcessor calling an RRFNormalizationTechnique and an RRFCombinationTechnique as described above.
Pros:
Cons:
Example search pipeline configuration call
Potential Issues
Risks / Known limitations and Future extensions
Backward Compatibility
The change will be backward compatible: we will ensure that omitting RRF-specific parameters in request configurations does not cause problems.
Testability
I will be writing unit tests and integration tests for this implementation.
Benchmarking
Benchmarking will involve testing across several datasets used in the initial normalization implementation testing (NFCorpus, Trec-Covid, ArguAna, FiQA, Scifact, DBPedia, Quora, Scidocs, CQADupStack, Amazon ESCI) and comparing the average nDCG@10 score against that of the L2 and Min-Max normalization techniques combined with the Arithmetic, Geometric, and Harmonic combination techniques, using BM25 nDCG@10 scores on these datasets as the baseline. We will also update the nightly benchmark runs to capture performance.
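For readers unfamiliar with the metric, nDCG@10 for a single query can be computed as in the sketch below. It assumes the linear-gain form of DCG (gain equal to the relevance label); the benchmark tooling may use a different variant, and the function names here are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance labels."""
    return sum(rel / math.log2(position + 2)
               for position, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, judged_relevances, k=10):
    """nDCG@k for one query.

    `ranked_relevances` are the labels of the returned documents, in the order
    the system returned them. `judged_relevances` are the labels of all judged
    documents for the query, used to build the ideal ordering.
    """
    ideal = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal


# A system that returns the only relevant document in second place.
print(ndcg_at_k([0, 1], [1, 0]))
```

The dataset-level figure used for comparison would then be the mean of this per-query value over all queries in the dataset.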
Feedback Required
Should we consider adding weights for combining rank scores from different subqueries? If so, how would weights be determined?
Are there concerns about or objections to incorporating Reciprocal Rank Fusion (RRF) as a new processor, instead of integrating it into the existing NormalizationProcessor? If there are foreseeable problems with this approach, what would a better alternative look like?
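On the first question, one possible shape for weighted RRF is to scale each subquery's contribution by a user-supplied weight, so that a document's score becomes the sum over subqueries of weight_j / (k + query_j_rank). The sketch below is an assumption for discussion, not a decided design; `weighted_rrf` is a hypothetical name, and omitting the weights reduces it to the unweighted formula.

```python
def weighted_rrf(rankings, weights=None, rank_constant=60):
    """Fuse rankings with optional per-subquery weights.

    `rankings` holds one list of doc_ids per subquery, ordered best first.
    `weights` holds one non-negative weight per subquery; None means 1.0 each.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("expected exactly one weight per subquery")
    scores = {}
    for weight, ranking in zip(weights, rankings):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rank_constant + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Two subqueries that disagree on the order of documents "a" and "b".
rankings = [["a", "b"], ["b", "a"]]
print(weighted_rrf(rankings, weights=[0.3, 0.7]))
print(weighted_rrf(rankings, weights=[0.7, 0.3]))
```

In this toy case the two subqueries disagree, so the weights decide the outcome: favoring the second subquery puts "b" first, and favoring the first puts "a" first.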