-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support for intfloat/multilingual-e5-small request #153
Conversation
Failing tests. Please verify that the ONNX values do map to the Torch implementation first? You can try with some sample text and compare. |
32ad3e3
to
009fc66
Compare
updated |
Signed-off-by: Ashis Kumar Naik <[email protected]>
Signed-off-by: Ashis Kumar Naik <[email protected]>
009fc66
to
351c749
Compare
tests/test_text_onnx_embeddings.py
Outdated
@@ -16,6 +16,7 @@ | |||
"sentence-transformers/all-MiniLM-L6-v2": np.array([0.0259, 0.0058, 0.0114, 0.0380, -0.0233]), | |||
"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2": np.array([0.0094, 0.0184, 0.0328, 0.0072, -0.0351]), | |||
"intfloat/multilingual-e5-large": np.array([0.0098, 0.0045, 0.0066, -0.0354, 0.0070]), | |||
"intfloat/multilingual-e5-small": np.array([0.04931236, 0.02415175, -0.0384715, -0.08884481, 0.08710264]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the vector you get after normalization: https://colab.research.google.com/drive/1tNdV3DsiwsJzu2AXnUnoeF5av1Hp8HF1?usp=sharing
This looks like you've copied the vector from the ONNX test case and pasted here, please don't do that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is NOT Resolved. The test vector needs to come from a source implementation which we can trust. For instance, I've used Sentence Transformers — which takes care of the proper normalization as recommended by the model creators.
You've used the output from your own FastEmbed implementation — that's not the point of the test? How do you know that your implementation is correct — you don't.
In fact, the Colab notebook I've shared proved that this implementation is wrong.
Signed-off-by: Ashis Kumar Naik <[email protected]>
351c749
to
7c0d04c
Compare
@@ -130,6 +130,15 @@ | |||
"hf": "qdrant/gte-large-onnx", | |||
}, | |||
}, | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to the e5_onnx file please?
@@ -16,6 +16,7 @@ | |||
"sentence-transformers/all-MiniLM-L6-v2": np.array([0.0259, 0.0058, 0.0114, 0.0380, -0.0233]), | |||
"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2": np.array([0.0094, 0.0184, 0.0328, 0.0072, -0.0351]), | |||
"intfloat/multilingual-e5-large": np.array([0.0098, 0.0045, 0.0066, -0.0354, 0.0070]), | |||
"intfloat/multilingual-e5-small": np.array([0.0493123643, 0.0241517499, -0.0384715013, -0.0888448060, 0.087102644]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see previous review comments
Hello! Thanks for taking the time for raising this PR. We'll try to support the model in case this PR doesn't go through. In the meanwhile, I'm closing this pull request for now because of inactivity. Please feel free to open it once you've resolved conflicts and made the requested changes. |
support for intfloat/multilingual-e5-small request
This PR resolves issue #123