
Add truncation support in evaluators #2582

Merged 3 commits into UKPLab:master on Apr 11, 2024

Conversation

@kddubey (Contributor) commented Apr 9, 2024

Hello,

This PR is a follow-up to #2573. As you suggested in this comment, a truncate_dim parameter makes it easy to construct a sequential evaluator.
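For illustration, a minimal sketch of what this enables (the sentences, gold scores, and model choice are toy placeholders):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import (
    EmbeddingSimilarityEvaluator,
    SequentialEvaluator,
)

# Toy STS-style data: sentence pairs with gold similarity scores in [0, 1].
sentences1 = ["A cat sits on the mat", "The sky is blue"]
sentences2 = ["A feline rests on a rug", "Grass is green"]
gold_scores = [0.9, 0.1]

# One evaluator per embedding dimensionality, run back to back.
evaluators = [
    EmbeddingSimilarityEvaluator(
        sentences1, sentences2, gold_scores, name=f"sts-toy-{dim}", truncate_dim=dim
    )
    for dim in (256, 64)
]
dev_evaluator = SequentialEvaluator(evaluators)

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
dev_evaluator(model)  # each evaluator truncates embeddings to its own dim
```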

How has this been tested?

It hasn't 🥴
I tested EmbeddingSimilarityEvaluator in #2573 by running a notebook offline. I'll think about how to test this change soon. Lmk what you think would make for a sufficient test, e.g., offline runs or actual tests in tests/test_evaluator.py.
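For concreteness, one shape such a test could take (a hypothetical sketch: the model and data are placeholders, and it leans on the truncate_dim model argument from #2573):

```python
import pytest
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator


def test_embedding_similarity_evaluator_truncate_dim():
    sentences1 = ["A cat sits on the mat", "The sky is blue", "He plays guitar"]
    sentences2 = ["A feline rests on a rug", "Grass is green", "She reads a book"]
    gold_scores = [0.9, 0.1, 0.2]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    # A model that truncates every embedding to 64 dimensions at encode time.
    truncated_model = SentenceTransformer("all-MiniLM-L6-v2", truncate_dim=64)

    # Evaluating the full model with truncate_dim=64 should match evaluating
    # the already-truncated model with no truncate_dim at all.
    truncating_evaluator = EmbeddingSimilarityEvaluator(
        sentences1, sentences2, gold_scores, truncate_dim=64
    )
    plain_evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores)
    assert truncating_evaluator(model) == pytest.approx(plain_evaluator(truncated_model))
```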

@tomaarsen (Collaborator)

Hello!

Thanks a bunch for this PR! I've extended it in the following two ways:

  1. Add truncate_dim to ParaphraseMiningEvaluator. Although that evaluator doesn't use model.encode directly, it calls paraphrase_mining, which uses model.encode under the hood, so we can still support it (sketched below).
  2. Add (truncated to ...) to the logging for the evaluators. This way you can create two otherwise-identical evaluators, one truncated to a specific dimensionality, and it will be clear in the logs which results are which. In practice, however, I would recommend using the optional name argument to encode the dimensionality, as that name will eventually be included in the logs via #2449 ([v3] Training refactor - MultiGPU, loss logging, bf16, etc.).
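For point 1, a minimal sketch with toy data (the sentence map, duplicate pairs, and model choice are placeholders):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import ParaphraseMiningEvaluator

# Map of sentence id -> sentence, plus the known paraphrase pairs among them.
sentences = {
    "0": "A cat sits on the mat",
    "1": "A feline is resting on a rug",
    "2": "The weather is sunny today",
}
duplicates = [("0", "1")]

# Mines paraphrases via paraphrase_mining (and therefore model.encode), with
# embeddings truncated to 128 dimensions; the name encodes the dimensionality.
evaluator = ParaphraseMiningEvaluator(
    sentences, duplicates, name="paraphrases-128", truncate_dim=128
)
model = SentenceTransformer("all-MiniLM-L6-v2")
evaluator(model)  # returns the evaluator's main score (average precision)
```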
  • Tom Aarsen

@tomaarsen (Collaborator)

Additionally, I've extended the existing matryoshka NLI training scripts to show how to use the EmbeddingSimilarityEvaluator with truncate_dim to evaluate the model at different embedding dimensionalities. The training logs will now include, e.g.:

```
2024-04-10 12:37:25 - EmbeddingSimilarityEvaluator: Evaluating the model on the sts-dev-768 dataset in epoch 0 after 440 steps (truncated to 768):
2024-04-10 12:37:27 - Cosine-Similarity :       Pearson: 0.8425 Spearman: 0.8500
2024-04-10 12:37:27 - Manhattan-Distance:       Pearson: 0.8302 Spearman: 0.8286
2024-04-10 12:37:27 - Euclidean-Distance:       Pearson: 0.8317 Spearman: 0.8301
2024-04-10 12:37:27 - Dot-Product-Similarity:   Pearson: 0.7044 Spearman: 0.7021
2024-04-10 12:37:27 - EmbeddingSimilarityEvaluator: Evaluating the model on the sts-dev-512 dataset in epoch 0 after 440 steps (truncated to 512):
2024-04-10 12:37:29 - Cosine-Similarity :       Pearson: 0.8437 Spearman: 0.8511
2024-04-10 12:37:29 - Manhattan-Distance:       Pearson: 0.8302 Spearman: 0.8285
2024-04-10 12:37:29 - Euclidean-Distance:       Pearson: 0.8318 Spearman: 0.8299
2024-04-10 12:37:29 - Dot-Product-Similarity:   Pearson: 0.7209 Spearman: 0.7172
2024-04-10 12:37:29 - EmbeddingSimilarityEvaluator: Evaluating the model on the sts-dev-256 dataset in epoch 0 after 440 steps (truncated to 256):
2024-04-10 12:37:30 - Cosine-Similarity :       Pearson: 0.8393 Spearman: 0.8496
2024-04-10 12:37:30 - Manhattan-Distance:       Pearson: 0.8276 Spearman: 0.8263
2024-04-10 12:37:30 - Euclidean-Distance:       Pearson: 0.8287 Spearman: 0.8278
2024-04-10 12:37:30 - Dot-Product-Similarity:   Pearson: 0.7090 Spearman: 0.7086
2024-04-10 12:37:30 - EmbeddingSimilarityEvaluator: Evaluating the model on the sts-dev-128 dataset in epoch 0 after 440 steps (truncated to 128):
2024-04-10 12:37:32 - Cosine-Similarity :       Pearson: 0.8268 Spearman: 0.8398
2024-04-10 12:37:32 - Manhattan-Distance:       Pearson: 0.8227 Spearman: 0.8229
2024-04-10 12:37:32 - Euclidean-Distance:       Pearson: 0.8226 Spearman: 0.8232
2024-04-10 12:37:32 - Dot-Product-Similarity:   Pearson: 0.6628 Spearman: 0.6718
2024-04-10 12:37:32 - EmbeddingSimilarityEvaluator: Evaluating the model on the sts-dev-64 dataset in epoch 0 after 440 steps (truncated to 64):
2024-04-10 12:37:34 - Cosine-Similarity :       Pearson: 0.8168 Spearman: 0.8338
2024-04-10 12:37:34 - Manhattan-Distance:       Pearson: 0.8085 Spearman: 0.8110
2024-04-10 12:37:34 - Euclidean-Distance:       Pearson: 0.8101 Spearman: 0.8126
2024-04-10 12:37:34 - Dot-Product-Similarity:   Pearson: 0.6047 Spearman: 0.6165
```

(I considered using a separate PR for that, but I think this fits as well)
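For reference, a sketch of the kind of setup that produces logs like the above (the dataset id and model choice are assumptions; any 768-dim model and STS-style dev set would work the same way):

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import (
    EmbeddingSimilarityEvaluator,
    SequentialEvaluator,
)

# STS Benchmark dev split: sentence pairs with 0-5 gold similarity scores.
stsb_dev = load_dataset("mteb/stsbenchmark-sts", split="validation")

matryoshka_dims = [768, 512, 256, 128, 64]
evaluators = [
    EmbeddingSimilarityEvaluator(
        stsb_dev["sentence1"],
        stsb_dev["sentence2"],
        [score / 5 for score in stsb_dev["score"]],  # rescale gold scores to [0, 1]
        name=f"sts-dev-{dim}",  # shows up in the log lines above
        truncate_dim=dim,
    )
    for dim in matryoshka_dims
]
dev_evaluator = SequentialEvaluator(evaluators)

model = SentenceTransformer("all-mpnet-base-v2")  # 768-dim embeddings
dev_evaluator(model)
```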

I'm planning on merging this today or early tomorrow, with the intention of using your Matryoshka-focused improvements as the headliner for a new v2.7.0 release. I've manually tested each of the evaluators by running an example training script for each of them, but in the long term we definitely want to add proper tests for them. For this PR, I think we're okay without.
I also want to note that #2449 will update the evaluators as well: they will start returning dictionaries with metric names as keys.

  • Tom Aarsen

@kddubey (Contributor, Author) commented Apr 10, 2024

Wow, great improvements, and thank you for testing the evaluators! Also very excited to see #2449 land.

@tomaarsen (Collaborator)

Awesome! Also, as for the v2.7.0 release that I mentioned: I think I will postpone that for now.

I think this PR should be good to go now though! Thanks a bunch for setting this up :)

  • Tom Aarsen

tomaarsen merged commit 99674c7 into UKPLab:master on Apr 11, 2024; 9 checks passed.
kddubey deleted the truncate-dim-evaluators branch on Apr 11, 2024 at 07:04.