fix: Add namaa MrTydi reranking dataset #1573
Did your dataset add new data to the original
I might be mistaken, but the MrTydi dataset was included there for retrieval, not reranking. We took the test dataset from MrTydi and added 4-5 negatives to each query and positive (which the original doesn't have). For the formatting, I relied on the structure of similar reranking datasets. The results of the two models are also in the PR description.
Ah, yes. You are right.
The metadata seems to be lacking a bit; I have suggested some updates.
7b9b3c9 into embeddings-benchmark:main
Why this dataset:
1 - Add to the reranking tasks exclusively for Arabic.
2 - Utilize the test dataset for MrTydi with generated and human-evaluated negatives.
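As a rough illustration of the second point, the sketch below shows one way a reranking record could be assembled from a query, its positive passage(s), and a handful of added negatives. The field names `query`, `positive`, and `negative`, the `RerankingExample` class, and the `build_example` helper are assumptions made for illustration; they are not taken from this PR's code, and the strings are placeholders rather than real MrTydi data.

```python
from dataclasses import dataclass


@dataclass
class RerankingExample:
    """Hypothetical record layout: one query with positive and negative passages."""
    query: str
    positive: list[str]
    negative: list[str]


def build_example(query: str, positives: list[str], negatives: list[str]) -> RerankingExample:
    """Assemble one record, removing duplicates and negatives that match a positive."""
    # dict.fromkeys keeps the first occurrence of each passage and preserves order.
    unique_positives = list(dict.fromkeys(positives))
    positive_set = set(unique_positives)
    unique_negatives = [n for n in dict.fromkeys(negatives) if n not in positive_set]
    if not unique_positives or not unique_negatives:
        raise ValueError("a reranking example needs at least one positive and one negative")
    return RerankingExample(query, unique_positives, unique_negatives)


# Placeholder strings only; a real record would hold Arabic query and passage text.
example = build_example(
    "placeholder query",
    ["placeholder positive passage"],
    ["negative 1", "negative 2", "negative 3", "negative 4"],
)
```

Filtering out negatives that coincide with a positive is a defensive choice in this sketch: a passage that appears on both sides would make the ranking label ambiguous for that query.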
Checklist
- make test
- make lint

Adding datasets checklist
- Reason for dataset addition: ...
- Models run on the task using the mteb -m {model_name} -t {task_name} command:
  - cross-encoder/ms-marco-MiniLM-L-12-v2
  - cross-encoder/stsb-TinyBERT-L-4
- self.stratified_subsampling() under dataset_transform()
- make test
- make lint