support for intfloat/multilingual-e5-small request #153

ashiskumarnaik · 2024-03-14T16:14:52Z

support for intfloat/multilingual-e5-small request
This PR resolves issue #123

NirantK · 2024-03-15T02:19:27Z

Failing tests. Please verify that the ONNX values do map to the Torch implementation first? You can try with some sample text and compare.

ashiskumarnaik · 2024-03-15T16:10:42Z

updated

Signed-off-by: Ashis Kumar Naik <[email protected]>

fastembed/text/onnx_embedding.py

NirantK · 2024-03-16T04:17:04Z

tests/test_text_onnx_embeddings.py

@@ -16,6 +16,7 @@
    "sentence-transformers/all-MiniLM-L6-v2": np.array([0.0259, 0.0058, 0.0114, 0.0380, -0.0233]),
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2": np.array([0.0094, 0.0184, 0.0328, 0.0072, -0.0351]),
    "intfloat/multilingual-e5-large": np.array([0.0098, 0.0045, 0.0066, -0.0354, 0.0070]),
+    "intfloat/multilingual-e5-small": np.array([0.04931236, 0.02415175, -0.0384715, -0.08884481, 0.08710264]),


This is not the vector you get after normalization: https://colab.research.google.com/drive/1tNdV3DsiwsJzu2AXnUnoeF5av1Hp8HF1?usp=sharing

This looks like you've copied the vector from the ONNX test case and pasted here, please don't do that!

This is NOT Resolved. The test vector needs to come from a source implementation which we can trust. For instance, I've used Sentence Transformers — which takes care of the proper normalization as recommended by the model creators.

You've used the output from your own FastEmbed implementation — that's not the point of the test? How do you know that your implementation is correct — you don't.

In fact, the Colab notebook I've shared proved that this implementation is wrong.

ashiskumarnaik · 2024-03-16T17:37:49Z

Hi I am adding this vector values , I ran the test with this first five values and tests are passed.

vector values: [0.0493123643, 0.0241517499, -0.0384715013, -0.0888448060, 0.087102644] , got after running locally

Signed-off-by: Ashis Kumar Naik <[email protected]>

NirantK · 2024-03-18T01:16:28Z

fastembed/text/onnx_embedding.py

@@ -130,6 +130,15 @@
            "hf": "qdrant/gte-large-onnx",
        },
    },
+    {


Move this to the e5_onnx file please?

NirantK · 2024-03-18T01:17:01Z

tests/test_text_onnx_embeddings.py

@@ -16,6 +16,7 @@
    "sentence-transformers/all-MiniLM-L6-v2": np.array([0.0259, 0.0058, 0.0114, 0.0380, -0.0233]),
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2": np.array([0.0094, 0.0184, 0.0328, 0.0072, -0.0351]),
    "intfloat/multilingual-e5-large": np.array([0.0098, 0.0045, 0.0066, -0.0354, 0.0070]),
+    "intfloat/multilingual-e5-small": np.array([0.0493123643, 0.0241517499, -0.0384715013, -0.0888448060, 0.087102644]),


Please see previous review comments

NirantK · 2024-03-26T04:01:21Z

Hello!

Thanks for taking the time for raising this PR. We'll try to support the model in case this PR doesn't go through.

In the meanwhile, I'm closing this pull request for now because of inactivity. Please feel free to open it once you've resolved conflicts and made the requested changes.

ashiskumarnaik force-pushed the compile/dev branch from 32ad3e3 to 009fc66 Compare March 15, 2024 16:08

ashiskumarnaik added 2 commits March 15, 2024 21:52

feat:support for intfloat/multilingual-e5-small request

e0c5136

Signed-off-by: Ashis Kumar Naik <[email protected]>

docs:updated list of supported model

490415e

Signed-off-by: Ashis Kumar Naik <[email protected]>

ashiskumarnaik force-pushed the compile/dev branch from 009fc66 to 351c749 Compare March 15, 2024 16:27

NirantK suggested changes Mar 16, 2024

View reviewed changes

test:include intfloat/multilingual-e5-small request

7c0d04c

Signed-off-by: Ashis Kumar Naik <[email protected]>

ashiskumarnaik force-pushed the compile/dev branch from 351c749 to 7c0d04c Compare March 16, 2024 17:40

ashiskumarnaik requested a review from NirantK March 16, 2024 17:42

NirantK marked this pull request as draft March 18, 2024 01:15

NirantK suggested changes Mar 18, 2024

View reviewed changes

NirantK mentioned this pull request Mar 18, 2024

Move CONTRIBUTING.md + Add Test for Adding New Models #154

Merged

NirantK closed this Mar 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support for intfloat/multilingual-e5-small request #153

support for intfloat/multilingual-e5-small request #153

ashiskumarnaik commented Mar 14, 2024

NirantK commented Mar 15, 2024

ashiskumarnaik commented Mar 15, 2024

NirantK Mar 16, 2024

NirantK Mar 18, 2024 •

edited

Loading

ashiskumarnaik commented Mar 16, 2024 •

edited

Loading

NirantK Mar 18, 2024

NirantK Mar 18, 2024

NirantK commented Mar 26, 2024

support for intfloat/multilingual-e5-small request #153

support for intfloat/multilingual-e5-small request #153

Conversation

ashiskumarnaik commented Mar 14, 2024

NirantK commented Mar 15, 2024

ashiskumarnaik commented Mar 15, 2024

NirantK Mar 16, 2024

Choose a reason for hiding this comment

NirantK Mar 18, 2024 • edited Loading

Choose a reason for hiding this comment

ashiskumarnaik commented Mar 16, 2024 • edited Loading

NirantK Mar 18, 2024

Choose a reason for hiding this comment

NirantK Mar 18, 2024

Choose a reason for hiding this comment

NirantK commented Mar 26, 2024

NirantK Mar 18, 2024 •

edited

Loading

ashiskumarnaik commented Mar 16, 2024 •

edited

Loading