Replies: 1 comment 4 replies
-
How do you specify the model that you pass to MTEB? This might help us track down a potential cause. You might want to check out the "Using a custom model" subsection in the documentation.
-
I am running retrieval evaluation on the HotpotQA dataset with the nq-distilbert-base-v1 sentence transformer model. I have created my own custom distance metric, which takes token_embeddings (multivector) from the queries and sentence_embeddings (single-vector) from the documents and computes similarity scores. I am encountering an issue where the evaluation takes an extremely long time (more than 36 hours and still running on an NVIDIA Tesla V100-PCIE-16GB GPU).
When I run a test script to measure the encoding and similarity-score computation times with 16 queries and 16 documents from the HotpotQA test set, here are the results. For reference, model2 is the nq-distilbert-base-v1 model fine-tuned using the custom distance metric as the similarity score, and model1 is the original nq-distilbert-base-v1 model that uses cosine similarity.
- Model2 multivector queries encoding took: 0.014986991882324219 seconds
- Model2 single-vector queries encoding took: 0.006419658660888672 seconds
- Model1 multivector queries encoding took: 0.015572547912597656 seconds
- Model1 single-vector queries encoding took: 0.006383657455444336 seconds
- Model1 cosine similarity over the batch (single-vector queries, single-vector docs) took: 0.0003333091735839844 seconds
- Model2 custom similarity over the batch (multivector queries, single-vector docs) took: 0.05283713340759277 seconds
Observing these results, the latency issue appears to come from the custom distance metric used for the similarity-score calculation.
I think the problem is that, for a batch of multivector queries, the number of token embeddings varies per query. I therefore cannot stack the list of queries into a single tensor, unlike the cosine-similarity implementation where each query is a single vector. This prevents me from leveraging optimized matrix operations and forces a nested loop that computes pairwise distances manually for each query-document pair. Any feedback on this would be much appreciated.
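One common way around the nested loop is to pad all queries in the batch to the longest token count, run a single batched matrix multiplication against the document vectors, and mask out the padded positions before reducing over tokens. The question doesn't show the actual custom metric, so the sketch below is a minimal NumPy example that assumes a MaxSim-style reduction (max similarity over query tokens); the function name and the reduction are illustrative and you would swap in your own scoring rule.

```python
import numpy as np

def padded_multivector_scores(query_token_embs, doc_embs):
    """Score variable-length multivector queries against single-vector docs.

    query_token_embs: list of (L_i, d) arrays, one per query, L_i varies.
    doc_embs: (n_docs, d) array.
    Returns an (n_queries, n_docs) score matrix using a MaxSim-style
    reduction: dot each query token with each doc vector, then take the
    max over tokens. Swap the final reduction for your own metric.
    """
    n_q = len(query_token_embs)
    d = doc_embs.shape[1]
    max_len = max(q.shape[0] for q in query_token_embs)

    # Pad all queries into one (n_q, max_len, d) tensor and track which
    # positions are real tokens with a boolean mask.
    padded = np.zeros((n_q, max_len, d), dtype=doc_embs.dtype)
    mask = np.zeros((n_q, max_len), dtype=bool)
    for i, q in enumerate(query_token_embs):
        padded[i, : q.shape[0]] = q
        mask[i, : q.shape[0]] = True

    # One batched matmul replaces the Python double loop:
    # (n_q, max_len, d) @ (d, n_docs) -> (n_q, max_len, n_docs)
    sims = padded @ doc_embs.T

    # Padded token positions must never influence the reduction.
    sims[~mask] = -np.inf
    return sims.max(axis=1)  # (n_q, n_docs)
```

The same padding-plus-mask pattern works with `torch.nn.utils.rnn.pad_sequence` and `torch.bmm` if you want the computation to stay on the GPU; the key point is that one large padded matmul is usually far faster than per-pair Python loops, even though it wastes some compute on padding.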