Replies: 1 comment 4 replies
-
How do you specify the model that you pass to MTEB? This might help us track down a potential cause. You might want to check out the "Using a custom model" subsection in the documentation.
-
I am running retrieval evaluation on the HotpotQA dataset with the nq-distilbert-base-v1 sentence transformer model. I have created my own custom distance metric, which takes token_embeddings (multivector) from the queries and sentence_embeddings (single-vector) from the documents and computes similarity scores. I am encountering an issue where the evaluation takes an extremely long time (more than 36 hours and still running on an NVIDIA Tesla V100-PCIE-16GB GPU).
When I run a test script to measure the encoding and similarity-score computation times with 16 queries and 16 documents from the HotpotQA test set, here are the results. For reference, model2 is the nq-distilbert-base-v1 model fine-tuned using the custom distance metric as the similarity score, and model1 is the original nq-distilbert-base-v1 model that uses cosine similarity.
- Model2 multivector queries encoding took: 0.014986991882324219 seconds
- Model2 single-vector queries encoding took: 0.006419658660888672 seconds
- Model1 multivector queries encoding took: 0.015572547912597656 seconds
- Model1 single-vector queries encoding took: 0.006383657455444336 seconds
- Model1 cosine similarity over the batch (single-vector queries, single-vector docs) took: 0.0003333091735839844 seconds
- Model2 custom similarity over the batch (multivector queries, single-vector docs) took: 0.05283713340759277 seconds
Observing these results, the latency issue appears to come from the custom distance metric used for the similarity-score calculation.
I think the problem is that, for a batch of multivector queries, the number of token embeddings varies per query. I therefore cannot stack the list of queries into a single tensor, unlike the cosine-similarity implementation where each query is a single vector. This prevents me from leveraging optimized matrix operations and forces a nested loop that computes pairwise distances manually for each query-document pair. Any feedback on this would be much appreciated.
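One common way around the nested loop is to pad all queries in the batch to the longest token count, run a single batched matrix multiplication against the document vectors, and mask out the padded positions before reducing over tokens. The question doesn't show the actual custom metric, so the sketch below is a minimal NumPy example that assumes a MaxSim-style reduction (max similarity over query tokens); the function name and the reduction are illustrative and you would swap in your own scoring rule.

```python
import numpy as np

def padded_multivector_scores(query_token_embs, doc_embs):
    """Score variable-length multivector queries against single-vector docs.

    query_token_embs: list of (L_i, d) arrays, one per query, L_i varies.
    doc_embs: (n_docs, d) array.
    Returns an (n_queries, n_docs) score matrix using a MaxSim-style
    reduction: dot each query token with each doc vector, then take the
    max over tokens. Swap the final reduction for your own metric.
    """
    n_q = len(query_token_embs)
    d = doc_embs.shape[1]
    max_len = max(q.shape[0] for q in query_token_embs)

    # Pad all queries into one (n_q, max_len, d) tensor and track which
    # positions are real tokens with a boolean mask.
    padded = np.zeros((n_q, max_len, d), dtype=doc_embs.dtype)
    mask = np.zeros((n_q, max_len), dtype=bool)
    for i, q in enumerate(query_token_embs):
        padded[i, : q.shape[0]] = q
        mask[i, : q.shape[0]] = True

    # One batched matmul replaces the Python double loop:
    # (n_q, max_len, d) @ (d, n_docs) -> (n_q, max_len, n_docs)
    sims = padded @ doc_embs.T

    # Padded token positions must never influence the reduction.
    sims[~mask] = -np.inf
    return sims.max(axis=1)  # (n_q, n_docs)
```

The same padding-plus-mask pattern works with `torch.nn.utils.rnn.pad_sequence` and `torch.bmm` if you want the computation to stay on the GPU; the key point is that one large padded matmul is usually far faster than per-pair Python loops, even though it wastes some compute on padding.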