RAR-b results #7

gowitheflow-1998 · 2024-07-10T05:21:06Z

uploading RAR-b results (10 models for now)

All opensourced models were rerun with MTEB. Can confirm match exactly with own results. API models were from own implementation reformatted into MTEB formats.

Have currently differentiated w/o and w/ instruction setting on the task level, such as RARbMath and RARbMathInstruct but am happy to reformat them soon to differentiate on the model level following discussions in issues#1066 after we decide it!

Note:
1. Results of all-mpnet-base-v2 all-MiniLM-L6-v2 and GritLM-7B were already in the repo. For the first two, I find that the results are the same with my run for w/o instruct, so I only added f{Task}Instruct.json (won't be conflicts). For GritLM-7B, the current f{Task}.json matches with my f{Task}Instruct.json and will bring conflicts. So I added a temp folder results/GritLM-7B/13f00a0e36500c80ce12870ea513846a066004af-temp to upload the results for now just for the leaderboard.
2. Have put a 0.0 in evaluation time for API models as don't have the numbers.
3. Dragon-plus is asymmetric with dragon-plus-query-encoder and dragon-plus-context-encoder as different models (so different model revisions), so I have just named the folder dragon-plus for now with no model revisions. Any ideas for putting results of asymmetric models in general?

Updates:

Have reformatted into model-level differentiation according to issues#1066, while keeping the same task name for both.

e.g., all-mpnet-base-v2/revision_id/RARbCode and all-mpnet-base-v2-instruct/revision_id/RARbCode; GritLM-7B/revision_id/RARbCode and GritLM-7B-noinstruct/revision_id/RARbCode;

gowitheflow-1998 · 2024-07-10T22:44:01Z

hi @orionw can you review this? 🙌

orionw

LGTM! Responding to the PR comments:

Is there any issue with duplicates? I am curious how this will work when there are duplicate results with different model revisions, but that may be a PR another time. It looks like you crossed it out, so it's okay now?

Dragon-plus is asymmetric with dragon-plus-query-encoder and dragon-plus-context-encoder as different models (so different model revisions), so I have just named the folder dragon-plus for now with no model revisions. Any ideas for putting results of asymmetric models in general?

This seems great, DPR is similar.

orionw · 2024-07-11T02:04:16Z

Thanks for starting on the instruction-naming process! It's a good reminder that we're gonna need a big PR to rename all the instruction models, I'll make an issue

gowitheflow-1998 · 2024-07-11T16:33:06Z

Thanks for the review and opening the issue!

Is there any issue with duplicates? I am curious how this will work when there are duplicate results with different model revisions, but that may be a PR another time. It looks like you crossed it out, so it's okay now?

No duplicates at the moment. I think as long as the result files are under different model revision folders, it'll be fine?

orionw · 2024-07-11T16:59:36Z

perfect, thanks!!

Muennighoff · 2024-07-12T04:14:46Z

It's a good reminder that we're gonna need a big PR to rename all the instruction models, I'll make an issue

Why do we need to rename them all? I thought the naming would be

Default way the model is intended to be used model_name - If model is used in non-default way with instruction add -instruct, w/o instruction add -noinstruct? But if you prefer not leaving the default names as is, we can also consider renaming.

On that one note - Why does this PR have bge-base-en-v1.5-instruct? Isn't using the retrieval instruction the default way for bge base and thus the existing results in bge-base-en-v1.5 are with instruction? Sorry maybe I am missing something.

orionw · 2024-07-12T04:22:22Z

Ah thought we were on the same page, sorry. Let me make a comment on the other issue to keep things in one place.

gowitheflow-1998 · 2024-07-12T14:25:21Z

On that one note - Why does this PR have bge-base-en-v1.5-instruct? Isn't using the retrieval instruction the default way for bge base and thus the existing results in bge-base-en-v1.5 are with instruction? Sorry maybe I am missing something.

I think as we discussed in the issue, BGE implementation uses the same instruction for all retrieval tasks, so it is more a non-instruction version for tasks that need to take in actual task-specific and example-specific instructions?

Muennighoff · 2024-07-12T15:32:37Z

On that one note - Why does this PR have bge-base-en-v1.5-instruct? Isn't using the retrieval instruction the default way for bge base and thus the existing results in bge-base-en-v1.5 are with instruction? Sorry maybe I am missing something.

I think as we discussed in the issue, BGE implementation uses the same instruction for all retrieval tasks, so it is more a non-instruction version for tasks that need to take in actual task-specific and example-specific instructions?

Oh yes makes sense sorry for missing that!

gowitheflow-1998 added 4 commits July 10, 2024 05:53

rarb results

6676018

rarb results

99e288e

rarb results model-level diff

16dc091

rarb results model-level diff

a239c56

orionw approved these changes Jul 11, 2024

View reviewed changes

orionw mentioned this pull request Jul 11, 2024

Re-name models to clearly separate instruction vs non-instruction models #8

Closed

orionw merged commit 9b6b7ba into main Jul 11, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RAR-b results #7

RAR-b results #7

gowitheflow-1998 commented Jul 10, 2024 •

edited

Loading

gowitheflow-1998 commented Jul 10, 2024

orionw left a comment

orionw commented Jul 11, 2024

gowitheflow-1998 commented Jul 11, 2024

orionw commented Jul 11, 2024

Muennighoff commented Jul 12, 2024

orionw commented Jul 12, 2024

gowitheflow-1998 commented Jul 12, 2024

Muennighoff commented Jul 12, 2024

RAR-b results #7

RAR-b results #7

Conversation

gowitheflow-1998 commented Jul 10, 2024 • edited Loading

gowitheflow-1998 commented Jul 10, 2024

orionw left a comment

Choose a reason for hiding this comment

orionw commented Jul 11, 2024

gowitheflow-1998 commented Jul 11, 2024

orionw commented Jul 11, 2024

Muennighoff commented Jul 12, 2024

orionw commented Jul 12, 2024

gowitheflow-1998 commented Jul 12, 2024

Muennighoff commented Jul 12, 2024

gowitheflow-1998 commented Jul 10, 2024 •

edited

Loading