Issue: Pinecone Vector Store incorrectly assigns score in similarity_search_with_relevance_scores #12697

Closed
dangoldbj opened this issue Nov 1, 2023 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: pinecone Primarily related to Pinecone vector store integration Ɑ: vector store Related to vector store module

Comments

@dangoldbj

Issue you'd like to raise.

The Pinecone index is set up with the cosine metric, so the scores returned by the vector database are already cosine similarities.

However, the _select_relevance_score_fn of langchain pinecone vector store returns the base vector store's _cosine_relevance_score_fn for the cosine distance strategy.

def _select_relevance_score_fn(self) -> Callable[[float], float]:

In the base vector store implementation of _cosine_relevance_score_fn, the returned score is the complement of the input (1 - distance).

https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/schema/vectorstore.py#L187

So a cosine score of 0.81 returned by the Pinecone vector database, which indicates high similarity, is turned into 0.19 in LangChain and treated as very low similarity.
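To make the arithmetic concrete, here is a minimal standalone sketch of the two interpretations. The function names `complement_score` and `passthrough_score` are hypothetical and are not part of LangChain; the first only mirrors the complement behaviour described above, and the second shows what treating the value as a similarity would give.

```python
def complement_score(raw: float) -> float:
    """Treat the raw value as a cosine distance: relevance = 1 - distance."""
    return 1.0 - raw


def passthrough_score(raw: float) -> float:
    """Treat the raw value as a cosine similarity: use it unchanged."""
    return raw


# Score reported by a cosine-metric index for a close match (value from this issue).
pinecone_score = 0.81

as_distance = complement_score(pinecone_score)
as_similarity = passthrough_score(pinecone_score)

print(round(as_distance, 2), round(as_similarity, 2))  # prints: 0.19 0.81
```

The same raw value ends up at opposite ends of the relevance scale depending on whether it is read as a distance or as a similarity, which is the mismatch reported here.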

Suggestion:

No response

@dosubot dosubot bot added Ɑ: vector store Related to vector store module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: pinecone Primarily related to Pinecone vector store integration labels Nov 1, 2023

dosubot bot commented Nov 1, 2023

🤖

Hello @dangoldbj,

I'm here to assist you with your issue. I'm an AI bot designed to help solve bugs, answer questions, and guide you in becoming a contributor while you're waiting for a human maintainer.

From your description, it seems like the issue is with the _cosine_relevance_score_fn method in the Pinecone vector store of LangChain. This method is used to calculate the relevance score of a document in relation to a query when the distance strategy is COSINE. However, as you've pointed out, the score is being complemented, which is causing a higher similarity score to be processed as a very low similarity.

There have been similar issues raised in the past. For instance, Issue #11587 was about the similarity_search_with_relevance_scores not working properly with Pinecone. The suggested solution was to provide a custom relevance score function for the vector store class that transforms the raw scores in a way that suits your needs. Here's an example of how you can do this:

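from langchain.vectorstores import Pinecone

class PineconeCustom(Pinecone):
    @staticmethod  # needed: without it, the instance would be passed in as `distance`
    def _cosine_relevance_score_fn(distance: float) -> float:
        """Normalize the raw cosine score from [-1, 1] to a score on a scale [0, 1]."""
        return (1.0 + distance) / 2

# index, embed, and text_field are assumed to be defined earlier
vectorstore = PineconeCustom(index, embed.embed_query, text_field)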
....

This code creates a new class PineconeCustom that inherits from the Pinecone class. It overrides the _cosine_relevance_score_fn method to return a score on a scale from 0 to 1. This score is calculated by adding 1.0 to the raw value and dividing by 2, which maps a cosine similarity in [-1, 1] onto [0, 1] without reversing the order, so a higher raw score from Pinecone stays a higher relevance score.
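As a quick sanity check of that mapping, the following sketch applies the same formula outside of any vector store. The helper name `normalize_cosine_similarity` and the sample values are illustrative assumptions; only the formula `(1 + s) / 2` comes from the snippet above.

```python
def normalize_cosine_similarity(similarity: float) -> float:
    """Map a cosine similarity in [-1, 1] linearly onto [0, 1]."""
    return (1.0 + similarity) / 2.0


# Illustrative raw similarities, from opposite vectors (-1) to identical ones (1).
samples = [-1.0, 0.0, 0.81, 1.0]
normalized = [normalize_cosine_similarity(s) for s in samples]

for raw, score in zip(samples, normalized):
    print(f"raw={raw:+.2f} -> relevance={score:.3f}")
```

Because the mapping is increasing, ranking by the normalized score gives the same order as ranking by the raw similarity. Note that any score threshold applied afterwards would be compared against the rescaled values rather than the raw ones.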

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Best,
Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.


dosubot bot commented Feb 3, 2024

Hi, @dangoldbj,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you raised an issue regarding the incorrect assignment of scores in the similarity_search_with_relevance_scores function in the Pinecone Vector Store. There was a suggestion for a potential solution involving a custom relevance score function for the vector store class. However, it seems that the issue remains unresolved.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, kindly let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Feb 3, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Feb 6, 2024