
Add truncation support in evaluators #2582

Merged 3 commits into UKPLab:master on Apr 11, 2024

Conversation

@kddubey (Contributor) commented Apr 9, 2024

Hello,

This PR is a follow-up to #2573. As you suggested in this comment, a truncate_dim parameter makes it easy to construct a sequential evaluator.
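For illustration, a minimal sketch of what this enables (the sentences, gold scores, and model choice are toy placeholders):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import (
    EmbeddingSimilarityEvaluator,
    SequentialEvaluator,
)

# Toy STS-style data: sentence pairs with gold similarity scores in [0, 1].
sentences1 = ["A cat sits on the mat", "The sky is blue"]
sentences2 = ["A feline rests on a rug", "Grass is green"]
gold_scores = [0.9, 0.1]

# One evaluator per embedding dimensionality, run back to back.
evaluators = [
    EmbeddingSimilarityEvaluator(
        sentences1, sentences2, gold_scores, name=f"sts-toy-{dim}", truncate_dim=dim
    )
    for dim in (256, 64)
]
dev_evaluator = SequentialEvaluator(evaluators)

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
dev_evaluator(model)  # each evaluator truncates embeddings to its own dim
```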

How has this been tested?

It hasn't 🥴
I tested EmbeddingSimilarityEvaluator in #2573 by running a notebook offline. I'll think about how to test this change soon. Lmk what you think would make for a sufficient test, e.g., offline runs or actual tests in tests/test_evaluator.py.
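For concreteness, one shape such a test could take (a hypothetical sketch: the model and data are placeholders, and it leans on the truncate_dim model argument from #2573):

```python
import pytest
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator


def test_embedding_similarity_evaluator_truncate_dim():
    sentences1 = ["A cat sits on the mat", "The sky is blue", "He plays guitar"]
    sentences2 = ["A feline rests on a rug", "Grass is green", "She reads a book"]
    gold_scores = [0.9, 0.1, 0.2]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    # A model that truncates every embedding to 64 dimensions at encode time.
    truncated_model = SentenceTransformer("all-MiniLM-L6-v2", truncate_dim=64)

    # Evaluating the full model with truncate_dim=64 should match evaluating
    # the already-truncated model with no truncate_dim at all.
    truncating_evaluator = EmbeddingSimilarityEvaluator(
        sentences1, sentences2, gold_scores, truncate_dim=64
    )
    plain_evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores)
    assert truncating_evaluator(model) == pytest.approx(plain_evaluator(truncated_model))
```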

@tomaarsen (Collaborator)

Hello!

Thanks a bunch for this PR! I've extended it in the following two ways:

  1. Add truncate_dim to ParaphraseMiningEvaluator. Although that evaluator doesn't use model.encode directly, it calls paraphrase_mining, which uses model.encode under the hood, so we can still support it (sketched below).
  2. Add (truncated to ...) to the logging for the evaluators. This way you can create two otherwise-identical evaluators, one truncated to a specific dimensionality, and it will be clear in the logs which results are which. In practice, however, I would recommend using the optional name argument to encode the dimensionality, as that name will eventually be included in the logs via #2449 ([v3] Training refactor - MultiGPU, loss logging, bf16, etc.).
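For point 1, a minimal sketch with toy data (the sentence map, duplicate pairs, and model choice are placeholders):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import ParaphraseMiningEvaluator

# Map of sentence id -> sentence, plus the known paraphrase pairs among them.
sentences = {
    "0": "A cat sits on the mat",
    "1": "A feline is resting on a rug",
    "2": "The weather is sunny today",
}
duplicates = [("0", "1")]

# Mines paraphrases via paraphrase_mining (and therefore model.encode), with
# embeddings truncated to 128 dimensions; the name encodes the dimensionality.
evaluator = ParaphraseMiningEvaluator(
    sentences, duplicates, name="paraphrases-128", truncate_dim=128
)
model = SentenceTransformer("all-MiniLM-L6-v2")
evaluator(model)  # returns the evaluator's main score (average precision)
```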
  • Tom Aarsen

@tomaarsen (Collaborator)

Additionally, I've extended the existing matryoshka NLI training scripts to show how to use the EmbeddingSimilarityEvaluator with truncate_dim to evaluate the model at different embedding dimensionalities. The training logs will now include, e.g.:

```
2024-04-10 12:37:25 - EmbeddingSimilarityEvaluator: Evaluating the model on the sts-dev-768 dataset in epoch 0 after 440 steps (truncated to 768):
2024-04-10 12:37:27 - Cosine-Similarity :       Pearson: 0.8425 Spearman: 0.8500
2024-04-10 12:37:27 - Manhattan-Distance:       Pearson: 0.8302 Spearman: 0.8286
2024-04-10 12:37:27 - Euclidean-Distance:       Pearson: 0.8317 Spearman: 0.8301
2024-04-10 12:37:27 - Dot-Product-Similarity:   Pearson: 0.7044 Spearman: 0.7021
2024-04-10 12:37:27 - EmbeddingSimilarityEvaluator: Evaluating the model on the sts-dev-512 dataset in epoch 0 after 440 steps (truncated to 512):
2024-04-10 12:37:29 - Cosine-Similarity :       Pearson: 0.8437 Spearman: 0.8511
2024-04-10 12:37:29 - Manhattan-Distance:       Pearson: 0.8302 Spearman: 0.8285
2024-04-10 12:37:29 - Euclidean-Distance:       Pearson: 0.8318 Spearman: 0.8299
2024-04-10 12:37:29 - Dot-Product-Similarity:   Pearson: 0.7209 Spearman: 0.7172
2024-04-10 12:37:29 - EmbeddingSimilarityEvaluator: Evaluating the model on the sts-dev-256 dataset in epoch 0 after 440 steps (truncated to 256):
2024-04-10 12:37:30 - Cosine-Similarity :       Pearson: 0.8393 Spearman: 0.8496
2024-04-10 12:37:30 - Manhattan-Distance:       Pearson: 0.8276 Spearman: 0.8263
2024-04-10 12:37:30 - Euclidean-Distance:       Pearson: 0.8287 Spearman: 0.8278
2024-04-10 12:37:30 - Dot-Product-Similarity:   Pearson: 0.7090 Spearman: 0.7086
2024-04-10 12:37:30 - EmbeddingSimilarityEvaluator: Evaluating the model on the sts-dev-128 dataset in epoch 0 after 440 steps (truncated to 128):
2024-04-10 12:37:32 - Cosine-Similarity :       Pearson: 0.8268 Spearman: 0.8398
2024-04-10 12:37:32 - Manhattan-Distance:       Pearson: 0.8227 Spearman: 0.8229
2024-04-10 12:37:32 - Euclidean-Distance:       Pearson: 0.8226 Spearman: 0.8232
2024-04-10 12:37:32 - Dot-Product-Similarity:   Pearson: 0.6628 Spearman: 0.6718
2024-04-10 12:37:32 - EmbeddingSimilarityEvaluator: Evaluating the model on the sts-dev-64 dataset in epoch 0 after 440 steps (truncated to 64):
2024-04-10 12:37:34 - Cosine-Similarity :       Pearson: 0.8168 Spearman: 0.8338
2024-04-10 12:37:34 - Manhattan-Distance:       Pearson: 0.8085 Spearman: 0.8110
2024-04-10 12:37:34 - Euclidean-Distance:       Pearson: 0.8101 Spearman: 0.8126
2024-04-10 12:37:34 - Dot-Product-Similarity:   Pearson: 0.6047 Spearman: 0.6165
```

(I considered using a separate PR for that, but I think this fits as well)
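For reference, a sketch of the kind of setup that produces logs like the above (the dataset id and model choice are assumptions; any 768-dim model and STS-style dev set would work the same way):

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import (
    EmbeddingSimilarityEvaluator,
    SequentialEvaluator,
)

# STS Benchmark dev split: sentence pairs with 0-5 gold similarity scores.
stsb_dev = load_dataset("mteb/stsbenchmark-sts", split="validation")

matryoshka_dims = [768, 512, 256, 128, 64]
evaluators = [
    EmbeddingSimilarityEvaluator(
        stsb_dev["sentence1"],
        stsb_dev["sentence2"],
        [score / 5 for score in stsb_dev["score"]],  # rescale gold scores to [0, 1]
        name=f"sts-dev-{dim}",  # shows up in the log lines above
        truncate_dim=dim,
    )
    for dim in matryoshka_dims
]
dev_evaluator = SequentialEvaluator(evaluators)

model = SentenceTransformer("all-mpnet-base-v2")  # 768-dim embeddings
dev_evaluator(model)
```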

I'm planning on merging this today or early tomorrow, with the intention of using your Matryoshka-focused improvements as the headliner for a new v2.7.0 release. I've manually tested each of the evaluators by running an example training script for each of them, but in the long term we definitely want to add proper tests for them. For this PR, I think we're okay without.
I also want to note that #2449 will update the evaluators as well: they will start returning dictionaries with metric names as keys.

  • Tom Aarsen

@kddubey (Contributor, Author) commented Apr 10, 2024

Wow, great improvements, and thank you for testing the evaluators! Also very excited to see #2449 land.

@tomaarsen (Collaborator)

Awesome! Also, as for the v2.7.0 release that I mentioned: I think I will postpone that for now.

I think this PR should be good to go now though! Thanks a bunch for setting this up :)

  • Tom Aarsen

tomaarsen merged commit 99674c7 into UKPLab:master on Apr 11, 2024; 9 checks passed.
kddubey deleted the truncate-dim-evaluators branch on Apr 11, 2024 at 07:04.