[feat] Integrate NanoBeIR datasets; use model.similarity by default in evaluators #2966

Merged on Oct 29, 2024 (22 commits)

ArthurCamara
Contributor

As discussed in #2848 (comment), this PR adds a new evaluator based on the NanoBEIR collection of datasets.

It creates one InformationRetrievalEvaluator per dataset and averages their results into aggregate scores.

Example:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

# Load a model
model = SentenceTransformer('all-mpnet-base-v2')

datasets = ["QuoraRetrieval", "MSMARCO"]
query_prompts = {
    "QuoraRetrieval": "Instruct: Given a question, retrieve questions that are semantically equivalent to the given question\nQuery: ",
    "MSMARCO": "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: ",
}

evaluator = NanoBEIREvaluator(
    dataset_names=datasets,
    name="NanoBEIR",
    query_prompts=query_prompts,
)

results = evaluator(model)
'''
NanoBEeIR Evaluation of the model on ['QuoraRetrieval', 'MSMARCO'] dataset:
Evaluating NanoBeIRNanoQuoraRetrieval
Evaluating NanoBeIRNanoMSMARCO

Average Queries: 50.0
Average Corpus: 5044.5

Aggregated for Score Function: cosine
Accuracy@1: 39.00%
Accuracy@3: 57.00%
Accuracy@5: 66.00%
Accuracy@10: 77.00%
Precision@1: 39.00%
Recall@1: 34.03%
Precision@3: 20.67%
Recall@3: 54.07%
Precision@5: 15.00%
Recall@5: 64.27%
Precision@10: 8.90%
Recall@10: 75.97%
MRR@10: 0.5004
NDCG@10: 0.5513
Aggregated for Score Function: dot
Accuracy@1: 39.00%
Accuracy@3: 57.00%
Accuracy@5: 66.00%
Accuracy@10: 77.00%
Precision@1: 39.00%
Recall@1: 34.03%
Precision@3: 20.67%
Recall@3: 54.07%
Precision@5: 15.00%
Recall@5: 64.27%
Precision@10: 8.90%
Recall@10: 75.97%
MRR@10: 0.5004
NDCG@10: 0.5513
'''
# print() is used here because no logger is defined in this snippet
print(evaluator.primary_metric)
# => "cosine_ndcg@10"
print(results["mean"][evaluator.primary_metric])
# => 0.5512516989358924

(Note that this depends on #2951)
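
To make the "one evaluator per dataset, then aggregate" idea concrete, here is a minimal standalone sketch of the averaging step. It does not use the library; the function name `aggregate_mean` and the scores are hypothetical placeholders, and the real evaluator may aggregate differently.

```python
from statistics import mean


def aggregate_mean(per_dataset: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each metric over every dataset that reports it."""
    metric_names = sorted({name for scores in per_dataset.values() for name in scores})
    return {
        name: mean(scores[name] for scores in per_dataset.values() if name in scores)
        for name in metric_names
    }


# Made-up placeholder scores for illustration, not real evaluation results.
per_dataset = {
    "QuoraRetrieval": {"cosine_ndcg@10": 0.5, "cosine_mrr@10": 0.25},
    "MSMARCO": {"cosine_ndcg@10": 1.0, "cosine_mrr@10": 0.75},
}

aggregated = aggregate_mean(per_dataset)
print(aggregated)
```

Each dataset contributes equally to the mean in this sketch, regardless of how many queries it has.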

@tomaarsen
Collaborator

Although the Be portion obviously stands for Benchmark, I think the abbreviated "BEIR" is usually fully capitalized, so I'd like to propagate that in this PR as well.

@tomaarsen
Collaborator

tomaarsen commented Oct 17, 2024

I'm experimenting with having all outputs in the final dict, rather than a nested dict. This way, people can use any value from the evaluator to guide e.g. early stopping. It should also match the SequentialEvaluator performance, even though the results from the NanoBEIR are now a bit hectic (i.e., one massive dict).

I hope it's okay if I push into this PR!
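
A rough sketch of what a flat result dict buys you: every score becomes addressable by a single key, so any of them can drive early stopping. The key format, the helper names `flatten_results` and `should_stop`, and all numbers below are assumptions for illustration, not the keys or logic the merged code uses.

```python
def flatten_results(nested: dict[str, dict[str, float]], prefix: str = "NanoBEIR") -> dict[str, float]:
    """Turn {dataset: {metric: value}} into {"<prefix>_<dataset>_<metric>": value}."""
    flat = {}
    for dataset, scores in nested.items():
        for metric, value in scores.items():
            flat[f"{prefix}_{dataset}_{metric}"] = value
    return flat


def should_stop(history: list[float], patience: int = 2) -> bool:
    """True when none of the last `patience` scores beats the best earlier score."""
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])
    return all(score <= best_before for score in history[-patience:])


# Made-up placeholder scores, not real results.
nested = {
    "QuoraRetrieval": {"cosine_ndcg@10": 0.5},
    "mean": {"cosine_ndcg@10": 0.75},
}
flat = flatten_results(nested)
monitored = flat["NanoBEIR_mean_cosine_ndcg@10"]  # any single key can be monitored
print(flat, monitored)
print(should_stop([0.5, 0.6, 0.55, 0.58]))
```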

- Fix 'tokens' typo -> 'dimension' in model card
- Group multiple evaluators with the same output keys together.
- Fix edge case where datasets without languages are excluded in model card
- Truncate really really long texts in model card
- Make default similarity_fn_name "cosine" rather than None
@tomaarsen
Collaborator

tomaarsen commented Oct 28, 2024

I've used this PR to address various other issues that I've had with evaluators:

Pull Request overview

  • Use model similarity function by default in the evaluators
  • Fix 'tokens' typo -> 'dimension' in model card
  • Group multiple evaluators with the same output keys together.
  • Fix edge case where datasets without languages are excluded in model card
  • Truncate really really long texts in model card
  • Make default similarity_fn_name "cosine" rather than None

  • Tom Aarsen
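
The first bullet ("use model similarity function by default") can be pictured as a fallback rule: if the caller does not name score functions, the evaluator uses whichever one the model declares. The sketch below is a toy stand-in, assuming a hypothetical `ToyModel` class and `resolve_score_functions` helper; how the merged evaluators actually resolve this is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyModel:
    """Hypothetical stand-in for a model that declares its similarity function."""
    similarity_fn_name: str = "cosine"


def resolve_score_functions(model: ToyModel, requested: Optional[list[str]] = None) -> list[str]:
    """Use the explicitly requested score functions, else fall back to the model's own."""
    if requested:
        return list(requested)
    return [model.similarity_fn_name]


print(resolve_score_functions(ToyModel("dot")))                 # falls back to the model's choice
print(resolve_score_functions(ToyModel(), ["cosine", "dot"]))   # explicit request wins
```

With a rule like this, a model trained for dot-product similarity is scored with dot product unless the caller explicitly asks for something else.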

@tomaarsen changed the title from "[feat] Integrate NanoBeIR datasets" to "[feat] Integrate NanoBeIR datasets; use model.similarity by default in evaluators" on Oct 28, 2024
@ArthurCamara
Contributor Author

You are the best, @tomaarsen

@tomaarsen tomaarsen merged commit 210ea8b into UKPLab:master Oct 29, 2024
11 checks passed