Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for intfloat/multilingual-e5-small request #153

Closed
wants to merge 3 commits into from

Conversation

ashiskumarnaik
Copy link

support for intfloat/multilingual-e5-small request
This PR resolves issue #123

@NirantK
Copy link
Contributor

NirantK commented Mar 15, 2024

Failing tests. Please verify that the ONNX values do map to the Torch implementation first? You can try with some sample text and compare.

@ashiskumarnaik
Copy link
Author

updated

fastembed/text/onnx_embedding.py Show resolved Hide resolved
@@ -16,6 +16,7 @@
"sentence-transformers/all-MiniLM-L6-v2": np.array([0.0259, 0.0058, 0.0114, 0.0380, -0.0233]),
"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2": np.array([0.0094, 0.0184, 0.0328, 0.0072, -0.0351]),
"intfloat/multilingual-e5-large": np.array([0.0098, 0.0045, 0.0066, -0.0354, 0.0070]),
"intfloat/multilingual-e5-small": np.array([0.04931236, 0.02415175, -0.0384715, -0.08884481, 0.08710264]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the vector you get after normalization: https://colab.research.google.com/drive/1tNdV3DsiwsJzu2AXnUnoeF5av1Hp8HF1?usp=sharing

This looks like you've copied the vector from the ONNX test case and pasted here, please don't do that!

Copy link
Contributor

@NirantK NirantK Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is NOT Resolved. The test vector needs to come from a source implementation which we can trust. For instance, I've used Sentence Transformers — which takes care of the proper normalization as recommended by the model creators.

You've used the output from your own FastEmbed implementation — that's not the point of the test? How do you know that your implementation is correct — you don't.

In fact, the Colab notebook I've shared proved that this implementation is wrong.

@ashiskumarnaik
Copy link
Author

ashiskumarnaik commented Mar 16, 2024

Screenshot 2024-03-16 at 8 27 12 PM Hi I am adding this vector values , I ran the test with this first five values and tests are passed. Screenshot 2024-03-16 at 11 00 51 PM vector values: [0.0493123643, 0.0241517499, -0.0384715013, -0.0888448060, 0.087102644] , got after running locally

@@ -130,6 +130,15 @@
"hf": "qdrant/gte-large-onnx",
},
},
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to the e5_onnx file please?

@@ -16,6 +16,7 @@
"sentence-transformers/all-MiniLM-L6-v2": np.array([0.0259, 0.0058, 0.0114, 0.0380, -0.0233]),
"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2": np.array([0.0094, 0.0184, 0.0328, 0.0072, -0.0351]),
"intfloat/multilingual-e5-large": np.array([0.0098, 0.0045, 0.0066, -0.0354, 0.0070]),
"intfloat/multilingual-e5-small": np.array([0.0493123643, 0.0241517499, -0.0384715013, -0.0888448060, 0.087102644]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see previous review comments

@NirantK
Copy link
Contributor

NirantK commented Mar 26, 2024

Hello!

Thanks for taking the time for raising this PR. We'll try to support the model in case this PR doesn't go through.

In the meanwhile, I'm closing this pull request for now because of inactivity. Please feel free to open it once you've resolved conflicts and made the requested changes.

@NirantK NirantK closed this Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants