
feat: Add SimilarityRanker to Haystack 2.0 #5923

Merged. 6 commits merged into main from similarity_ranker_new on Oct 6, 2023.

Conversation

vblagoje (Member)

Why:

Ranking documents by their similarity to a query is an essential retrieval step; SimilarityRanker adds this capability.

What:

Added SimilarityRanker to the rankers package. It scores documents by their similarity to the query and returns them ordered by relevance.

How can it be used:

Here's a quick dive into its usage:

from haystack.preview import Document
from haystack.preview.components.rankers import SimilarityRanker

# Rank two documents by their similarity to the query.
ranker = SimilarityRanker()
docs = [Document(text="Sarajevo"), Document(text="Berlin")]
query = "City in Bosnia and Herzegovina"
output = ranker.run(query=query, documents=docs)
docs = output["documents"]

# The most relevant document is returned first.
assert len(docs) == 2
assert docs[0].text == "Sarajevo"
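
For context, here is a rough, self-contained sketch of how a cross-encoder ranker like the one backing this component typically scores documents. It uses plain transformers for illustration only and is not the component's actual implementation:

# Rough illustration of cross-encoder similarity scoring; NOT the component's code.
from typing import List

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def rank(query: str, texts: List[str]) -> List[str]:
    # Each (query, document) pair is scored jointly by the cross-encoder.
    features = tokenizer([query] * len(texts), texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**features).logits.squeeze(-1)  # one relevance score per pair
    # Return the documents sorted from most to least relevant.
    order = torch.argsort(scores, descending=True).tolist()
    return [texts[i] for i in order]

print(rank("City in Bosnia and Herzegovina", ["Berlin", "Sarajevo"]))
# With this model, the expected order is ['Sarajevo', 'Berlin'].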

How did you test it:

Unit tests are in place. I also ran the snippet above, and it worked as expected.

Notes For Reviewer:

Reviewer, please do a deep dive into the similarity metrics and integration. Let's ensure this fits seamlessly into Haystack's architecture.

@vblagoje vblagoje requested review from a team as code owners September 29, 2023 13:04
@vblagoje vblagoje requested review from dfokina and silvanocerza and removed request for a team September 29, 2023 13:04
@vblagoje vblagoje added the 2.x Related to Haystack v2.0 label Sep 29, 2023
@github-actions github-actions bot added topic:tests type:documentation Improvements on the docs labels Sep 29, 2023
@ZanSara (Contributor) left a comment:

Just some comments, nothing worrying 😊

        :param device: torch device (for example, cuda:0, cpu, mps) to limit model inference to a specific device.
        """
        torch_and_transformers_import.check()
        super().__init__()
Contributor: No need to call super()

Two review comments on test/preview/components/rankers/test_similarity.py were marked as outdated and resolved.
@vblagoje (Member, Author)

cc @sjrl


def __init__(
    self,
    model_name_or_path: Union[str, Path] = "cross-encoder/ms-marco-MiniLM-L-6-v2",
Contributor: I would just go with model, to be fair; type hints and documentation can do the rest.

@sjrl (Contributor) commented on Oct 4, 2023:

Thanks for the work on this! I have a few more comments about potential additional features.

  1. The addition of embed_meta_fields, as shown here from V1:

         docs_with_meta_fields = self._add_meta_fields_to_docs(
             documents=documents, embed_meta_fields=self.embed_meta_fields
         )

     Sol has found this to be immensely useful, just as we have for EmbeddingRetrievers in Haystack V1, and I think it would be a great addition here as well; we actively use this feature in client projects. You can see how this is implemented for embedding models in V2 here:

         metadata_fields_to_embed: Optional[List[str]] = None,

  2. Being able to optionally specify a top_k here, like we do in V1, would also be helpful, since we will often pipe the output of this node directly into PromptNode. Keeping it optional would be good in case we use nodes like the TopPSampler instead. (A rough sketch of both ideas follows this list.)
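
A rough, hypothetical sketch of what both suggestions could look like around this component; the names meta_fields_to_embed, texts_with_meta, keep_top_k, and top_k are illustrative rather than a final API, and doc.text / doc.metadata follow this PR's preview Document:

# Illustrative sketch only: meta_fields_to_embed and top_k are hypothetical names;
# doc.text and doc.metadata follow this PR's preview Document snapshot.
from typing import List, Optional

from haystack.preview import Document

def texts_with_meta(documents: List[Document], meta_fields_to_embed: Optional[List[str]] = None) -> List[str]:
    # Prepend selected metadata values (for example a title) to each document's
    # text so the cross-encoder can match the query against them too.
    texts = []
    for doc in documents:
        fields = [str(doc.metadata[name]) for name in (meta_fields_to_embed or []) if name in doc.metadata]
        texts.append("\n".join(fields + [doc.text or ""]))
    return texts

def keep_top_k(ranked: List[Document], top_k: Optional[int] = None) -> List[Document]:
    # An optional top_k keeps only the k best-ranked documents, which is handy
    # when the ranker's output is piped straight into a prompt.
    return ranked if top_k is None else ranked[:top_k]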

@vblagoje (Member, Author) commented on Oct 4, 2023:

@sjrl let's keep track of these enhancement suggestions (actually existing V1 features :-) ), but for now, I propose we integrate the simplest implementation we currently have so we can experiment and build demos.

@vblagoje (Member, Author) commented on Oct 4, 2023:

The PR is awaiting #5945, more specifically Document unfreezing. The current test failures are expected.

@vblagoje vblagoje force-pushed the similarity_ranker_new branch from 6a410f9 to f278786 Compare October 5, 2023 15:57
@vblagoje (Member, Author) commented on Oct 5, 2023:

@sjrl and @silvanocerza would you give it one last pass please?

@sjrl (Contributor) left a comment:

Looks great! Only one minor comment about an additional test.

@vblagoje (Member, Author) commented on Oct 6, 2023:

Should be good to go now; please have another look, @sjrl.

@sjrl (Contributor) left a comment:

I'm glad that helped catch a bug!

@vblagoje vblagoje merged commit 1cdff64 into main Oct 6, 2023
20 checks passed
@vblagoje vblagoje deleted the similarity_ranker_new branch October 6, 2023 14:01
Labels: 2.x (Related to Haystack v2.0), topic:tests, type:documentation (Improvements on the docs)
4 participants