feat: Fastembed - add FastembedRanker #1178

paulmartrencharpro · 2024-11-12T14:21:26Z

Proposed Changes:

New feature: integrate with the new ranker from fastembed. (https://qdrant.tech/articles/cross-encoder-integration-gsoc/, https://github.com/qdrant/fastembed/tree/main?tab=readme-ov-file#-rerankers)

I based my code on CohereRanker and Fastembed existing code

How did you test it?

I wrote all the tests in test_fastembed_ranker.py, based on the other tests for fastembed.

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.

anakin87

Hey! Thank you for this PR.

I've had a first look and it seems good.

I have an idea: in general, we have an embedding backend for local Embedders because we expect users to use both Text Embedder and Doc Embedder based on the same model; this saves resources (the model is loaded only once).

For rankers, we don't have this problem.
So I would ask you to remove the backend and do everything in the component (Cohere can be taken as example).

Let me know and thanks again!

integrations/fastembed/src/haystack_integrations/components/rankers/fastembed/ranker.py

anakin87

In general, this PR is almost ready to be merged.

TODO:

modify this line in README to mention ranker:

haystack-core-integrations/README.md

Line 37 in c343776

    
           | [fastembed-haystack](integrations/fastembed/)                                                                  | Embedder            | [![PyPI - Version](https://img.shields.io/pypi/v/fastembed-haystack.svg)](https://pypi.org/project/fastembed-haystack/)                                  | [![Test / fastembed](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/fastembed.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/fastembed.yml)                                  |

add more tests if you can/want (uncovered lines: 66-67, 109, 115->exit, 162-163, 166, 170-171, 174-175)

Then, if you also want to open a PR on haystack-integrations to update this page ( https://github.com/deepset-ai/haystack-integrations/blob/main/integrations/fastembed.md), it would be appreciated - otherwise, I'll take care of it.

integrations/fastembed/src/haystack_integrations/components/rankers/fastembed/ranker.py

integrations/fastembed/tests/test_fastembed_ranker.py

paulmartrencharpro · 2024-11-13T14:29:42Z

In general, this PR is almost ready to be merged.

TODO:
* modify this line in README to mention ranker: https://github.com/deepset-ai/haystack-core-integrations/blob/c3437763e7c80e11e9ed8678f1b235353a316781/README.md?plain=1#L37

* add more tests if you can/want (uncovered lines: 66-67, 109, 115->exit, 162-163, 166, 170-171, 174-175)
Then, if you also want to open a PR on haystack-integrations to update this page ( https://github.com/deepset-ai/haystack-integrations/blob/main/integrations/fastembed.md), it would be appreciated - otherwise, I'll take care of it.

I updated the Readme and added tests.
The only test I cannot write is the one that check that we don't create 2 TextCrossEncoder when we call warm_up multiple times. I don't understand the patch/mockup code to replicate it. I wrote this but it's not working:

@patch('fastembed.rerank.cross_encoder.TextCrossEncoder')
def test_warmup_twice(self, mock_encoder):
    """
    Test for checking that warm up only creates one TextCrossEncoder
    """
    ranker = FastembedRanker(model_name="Xenova/ms-marco-MiniLM-L-12-v2")
    ranker.warm_up()
    ranker.warm_up()

    # Assert that TextCrossEncoder is called only once
    mock_encoder.assert_called_once()

I'll make the PR on the other repo

anakin87

Everything looks good now!

Thanks for the good work. 💟

(I'll track the related stuff in #1188)

anakin87 · 2024-11-13T17:32:31Z

something is failing with merging this PR.
I will try to close it and reopen it.

paulmartrencharpro requested a review from a team as a code owner November 12, 2024 14:21

paulmartrencharpro requested review from anakin87 and removed request for a team November 12, 2024 14:21

github-actions bot added type:documentation Improvements or additions to documentation integration:fastembed and removed type:documentation Improvements or additions to documentation labels Nov 12, 2024

anakin87 requested changes Nov 12, 2024

View reviewed changes

anakin87 reviewed Nov 13, 2024

View reviewed changes

integrations/fastembed/tests/test_fastembed_ranker.py Outdated Show resolved Hide resolved

anakin87 changed the title ~~feat: Fastembed: Add a new ranker based on the new rerankers from fastembed v0.4.1+~~ feat: Fastembed - add FastembedRanker Nov 13, 2024

anakin87 self-requested a review November 13, 2024 17:17

anakin87 approved these changes Nov 13, 2024

View reviewed changes

anakin87 closed this Nov 13, 2024

anakin87 reopened this Nov 13, 2024

squashing

c1d6d48

anakin87 force-pushed the main branch from b7493f1 to c1d6d48 Compare November 13, 2024 17:38

anakin87 self-requested a review November 13, 2024 17:39

anakin87 approved these changes Nov 13, 2024

View reviewed changes

anakin87 changed the base branch from main to vertexgenerator-rm-tools November 13, 2024 17:48

anakin87 changed the base branch from vertexgenerator-rm-tools to main November 13, 2024 17:48

anakin87 merged commit 55a1cbf into deepset-ai:main Nov 13, 2024
8 checks passed

anakin87 mentioned this pull request Nov 13, 2024

Docs for FastEmbed Ranker #1189

Closed

This was referenced Nov 14, 2024

Add ranker to the fastembed deepset-ai/haystack-integrations#287

Merged

docs: Fastembed Ranker - update readme and fix example #1198

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Fastembed - add FastembedRanker #1178

feat: Fastembed - add FastembedRanker #1178

paulmartrencharpro commented Nov 12, 2024

anakin87 left a comment

anakin87 left a comment

paulmartrencharpro commented Nov 13, 2024

anakin87 left a comment

anakin87 commented Nov 13, 2024

feat: Fastembed - add FastembedRanker #1178

feat: Fastembed - add FastembedRanker #1178

Conversation

paulmartrencharpro commented Nov 12, 2024

Proposed Changes:

How did you test it?

Checklist

anakin87 left a comment

Choose a reason for hiding this comment

anakin87 left a comment

Choose a reason for hiding this comment

paulmartrencharpro commented Nov 13, 2024

anakin87 left a comment

Choose a reason for hiding this comment

anakin87 commented Nov 13, 2024