-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RAR-b results #7
Conversation
hi @orionw can you review this? 🙌 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Responding to the PR comments:
Is there any issue with duplicates? I am curious how this will work when there are duplicate results with different model revisions, but that may be a PR another time. It looks like you crossed it out, so it's okay now?
- Dragon-plus is asymmetric with dragon-plus-query-encoder and dragon-plus-context-encoder as different models (so different model revisions), so I have just named the folder dragon-plus for now with no model revisions. Any ideas for putting results of asymmetric models in general?
This seems great, DPR is similar.
Thanks for starting on the instruction-naming process! It's a good reminder that we're gonna need a big PR to rename all the instruction models, I'll make an issue |
Thanks for the review and opening the issue!
No duplicates at the moment. I think as long as the result files are under different model revision folders, it'll be fine? |
perfect, thanks!! |
Why do we need to rename them all? I thought the naming would be Default way the model is intended to be used On that one note - Why does this PR have |
Ah thought we were on the same page, sorry. Let me make a comment on the other issue to keep things in one place. |
I think as we discussed in the issue, BGE implementation uses the same instruction for all retrieval tasks, so it is more a non-instruction version for tasks that need to take in actual task-specific and example-specific instructions? |
Oh yes makes sense sorry for missing that! |
uploading RAR-b results (10 models for now)
All opensourced models were rerun with MTEB. Can confirm match exactly with own results. API models were from own implementation reformatted into MTEB formats.
Have currently differentiated w/o and w/ instruction setting on the task level, such asRARbMath
andRARbMathInstruct
but am happy to reformat them soon to differentiate on the model level following discussions in issues#1066 after we decide it!Note:
1. Results ofall-mpnet-base-v2
all-MiniLM-L6-v2
andGritLM-7B
were already in the repo. For the first two, I find that the results are the same with my run for w/o instruct, so I only addedf{Task}Instruct.json
(won't be conflicts). ForGritLM-7B
, the currentf{Task}.json
matches with myf{Task}Instruct.json
and will bring conflicts. So I added a temp folderresults/GritLM-7B/13f00a0e36500c80ce12870ea513846a066004af-temp
to upload the results for now just for the leaderboard.2. Have put a 0.0 in evaluation time for API models as don't have the numbers.
3. Dragon-plus is asymmetric with
dragon-plus-query-encoder
anddragon-plus-context-encoder
as different models (so different model revisions), so I have just named the folderdragon-plus
for now with no model revisions. Any ideas for putting results of asymmetric models in general?Updates:
Have reformatted into model-level differentiation according to issues#1066, while keeping the same task name for both.
e.g.,
all-mpnet-base-v2/revision_id/RARbCode
andall-mpnet-base-v2-instruct/revision_id/RARbCode
;GritLM-7B/revision_id/RARbCode
andGritLM-7B-noinstruct/revision_id/RARbCode
;