Hybrid search & normalization #64

ShravanSunder · 2024-04-25T17:47:05Z

Hello! I see many articles (like Pinecone's) that combine the hybrid search results from dense vectors and SPLADE using the approach shown below.

However, I'm a bit confused about how this would work if the dense vectors are normalized to length 1 but SPLADE's output is not. Any thoughts? What is the best way to conduct hybrid search with both vectors?

I understand that the ANN search is done with dot product, so would we just take the highest score and not try to normalize?

def hybrid_scale(dense, sparse, alpha: float):
    # check alpha value is in range
    if alpha < 0 or alpha > 1:
        raise ValueError("Alpha must be between 0 and 1")
    # scale sparse and dense vectors to create hybrid search vecs
    hsparse = {
        'indices': sparse['indices'],
        'values':  [v * (1 - alpha) for v in sparse['values']]
    }
    hdense = [v * alpha for v in dense]
    return hdense, hsparse

I see the prior issue #34, but it seemed inconclusive.
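
To make the scale concern concrete, here is a minimal sketch. It assumes the index scores a document as the sum of the dense dot product and the sparse dot product (an assumption about the backend, not something confirmed in this thread). Under that assumption, scaling the query with `alpha` as above is equivalent to a convex combination of the two raw scores, so an unnormalized sparse score can dominate a dense score that is bounded by 1. The helper names `dot_dense`, `dot_sparse`, and `hybrid_score` are hypothetical.

```python
def dot_dense(a, b):
    """Dot product of two equal-length dense vectors (lists of floats)."""
    return sum(x * y for x, y in zip(a, b))


def dot_sparse(a, b):
    """Dot product of two sparse vectors given as {'indices', 'values'} dicts."""
    b_map = dict(zip(b['indices'], b['values']))
    return sum(v * b_map.get(i, 0.0) for i, v in zip(a['indices'], a['values']))


def hybrid_score(q_dense, q_sparse, d_dense, d_sparse, alpha):
    """Assumed scoring: alpha * dense score + (1 - alpha) * sparse score."""
    dense_part = alpha * dot_dense(q_dense, d_dense)
    sparse_part = (1 - alpha) * dot_sparse(q_sparse, d_sparse)
    return dense_part + sparse_part


# Illustrative values only: unit-length dense vectors, unnormalized sparse weights.
q_dense, d_dense = [0.6, 0.8], [0.8, 0.6]
q_sparse = {'indices': [3, 7], 'values': [2.0, 1.5]}
d_sparse = {'indices': [7, 9], 'values': [2.0, 0.5]}

dense_score = dot_dense(q_dense, d_dense)      # bounded by 1 for unit vectors
sparse_score = dot_sparse(q_sparse, d_sparse)  # not bounded
score = hybrid_score(q_dense, q_sparse, d_dense, d_sparse, alpha=0.5)
```

With these made-up numbers the sparse score is roughly three times the dense score, so at `alpha = 0.5` the sparse side contributes most of the total. Whether that is a problem depends on the score distributions in your own data.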


h4gen · 2024-06-07T13:17:18Z

If the scaling is unclear, you can just use Reciprocal Rank Fusion (RRF). It only takes the generated rankings into account, so there is no need for any kind of normalization. Convex scaling as in your example can outperform RRF because it takes the score distributions into account. However, to do so you also have to know your distributions quite well. This paper is quite interesting for understanding the problem better.
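
For reference, a minimal sketch of Reciprocal Rank Fusion, assuming you run the dense and sparse searches separately and get back two ranked lists of document IDs. The function name `reciprocal_rank_fusion` and the sample IDs are hypothetical; `k = 60` is the constant commonly used with RRF, not something tuned for this use case.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Only ranks are used, never the raw scores.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative rankings only.
dense_ranking = ["a", "b", "c"]
sparse_ranking = ["b", "d", "a"]
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

Because only rank positions enter the sum, the magnitude of the SPLADE scores relative to the dense scores does not matter here.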
