Normalizing SPLADE embeddings - a bad idea? #34

Closed
adri1wald opened this issue Mar 30, 2023 · 3 comments


adri1wald commented Mar 30, 2023

Hi!

I'm using SPLADE together with the sentence-transformers/multi-qa-mpnet-base-cos-v1 SentenceTransformer to create hybrid embeddings for use in Pinecone's sparse-dense indexes.

The sparse-dense indexes only support dot product similarity, which is why I chose a dense model trained with cosine similarity. This means I get back dense embeddings with an L2 norm of 1 and dot product similarity in the range [-1, 1], which I can easily rescale to the unit interval. Based on my somewhat limited understanding, this seems like a relatively sound approach to getting scores that our users can understand as % similarity (assuming in-distribution inputs).
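For concreteness, the rescaling I have in mind looks like this (a minimal sketch; the function name is mine):

# For unit-norm embeddings, the dot product equals cosine similarity
# and lies in [-1, 1], so an affine map takes it onto [0, 1].
def percent_similarity(query_vec, doc_vec):
    score = sum(q * d for q, d in zip(query_vec, doc_vec))  # in [-1, 1]
    return (score + 1.0) / 2.0                              # in [0, 1]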

After transitioning to sparse-dense vectors, I noticed that SPLADE does not produce normalized embeddings, which means this approach no longer works. I thought about normalizing the SPLADE embeddings, but I'm not sure how this would affect performance.
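To be explicit, by normalizing I mean rescaling the non-zero values to unit L2 norm, along these lines (a minimal sketch; the values are made up):

import math

# Hypothetical SPLADE output: the non-zero values of the sparse vector.
sparse_values = [1.2, 0.4, 3.1]
norm = math.sqrt(sum(v * v for v in sparse_values))  # L2 norm
sparse_values = [v / norm for v in sparse_values]    # now unit-norm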

On a separate note, I'm using Pinecone's convex combination

# alpha in [0, 1]: alpha = 0 keeps only the sparse score,
# alpha = 1 keeps only the dense score.
embedding.sparse.values = [
    value * (1 - alpha) for value in embedding.sparse.values
]
embedding.dense = [value * alpha for value in embedding.dense]

I am struggling to reason about how all of this interacts and what effect it has on ranking. See here for info on how Pinecone's score is calculated and here for more details about their convex combination logic.
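For what it's worth, my current mental model is the sketch below, assuming the convex combination is applied to the query side only and that the index sums the sparse and dense dot products (the dict-based sparse representation is mine):

def hybrid_score(query, doc, alpha):
    # Sparse dot product over shared indices (dicts mapping index -> value).
    sparse = sum(v * doc["sparse"].get(i, 0.0)
                 for i, v in query["sparse"].items())
    # Dense dot product.
    dense = sum(a * b for a, b in zip(query["dense"], doc["dense"]))
    # Scaling the query as above makes the combined score equivalent to:
    return (1 - alpha) * sparse + alpha * dense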

Any help understanding this stuff would be hugely appreciated 🙌

Cheers!

@mu4farooqi

Although it's usually recommended to use the same similarity metric as was used in training, if you look at SPLADE's transformers wrapper, you'll see it deliberately supports cosine similarity.

@thibault-formal
Contributor

Hi @adri1wald,
If you try to normalize SPLADE embeddings after training, this won't work (as pointed out by @mu4farooqi).

We do indeed support cosine similarity, but this is more a legacy of our initial experiments with dense models. I remember trying some normalization schemes for SPLADE at some point (as part of training), and the results were not so good.

Hope it helps!

@thibault-formal
Contributor

Closing the issue, feel free to re-open!
