Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark outcomes record #392

Merged
merged 41 commits into from
Jul 24, 2024
Merged

Benchmark outcomes record #392

merged 41 commits into from
Jul 24, 2024

Conversation

wpietri
Copy link
Contributor

@wpietri wpietri commented Jul 16, 2024

Produces a JSON version of the benchmark alongside the HTML files. Not sure this is totally right; Looking forward to feedback on the format.

@wpietri wpietri requested review from dhosterman and bkorycki July 16, 2024 23:09
@wpietri wpietri requested a review from a team as a code owner July 16, 2024 23:09
Copy link

github-actions bot commented Jul 16, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@wpietri
Copy link
Contributor Author

wpietri commented Jul 17, 2024

Ok, @bkorycki and @dhosterman, this is actually ready for final review now.

Copy link
Contributor

@bkorycki bkorycki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

src/modelbench/hazards.py Show resolved Hide resolved
src/modelbench/run.py Show resolved Hide resolved
src/modelbench/run.py Show resolved Hide resolved
src/modelbench/run.py Show resolved Hide resolved
@@ -55,6 +55,18 @@ class ModelGaugeSut(SutDescription, Enum):
WIZARDLM_13B = "wizardlm-13b", "WizardLM v1.2 (13B)", TogetherChatSUT, "WizardLM/WizardLM-13B-V1.2"
# YI_34B_CHAT = "yi-34b", "01-ai Yi Chat (34B)", TogetherChatSUT, "zero-one-ai/Yi-34B-Chat"

def instance(self, secrets):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving instance creation here will let me unify duplicate code, and it gives me a place to cache the instance actually used for the run, which is needed to dump out the outcome JSON.

import casefy


class HasUid:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments about this in the other PR!

src/modelbench/record.py Outdated Show resolved Hide resolved
wpietri added 3 commits July 18, 2024 17:37
Make modelgauge's notion of a SUT know how to instantiate itself and cache the instance used, so that the initalization info is available later.
@wpietri wpietri requested a review from bkorycki July 19, 2024 14:37
@wpietri
Copy link
Contributor Author

wpietri commented Jul 19, 2024

Ok @dhosterman and @bkorycki, I think I have resolved all the outstanding issues and requests on this one.

@dhosterman
Copy link
Collaborator

This works great so far, but it fails when attempting to use --anonymize.

@dhosterman
Copy link
Collaborator

I also notice that in the content data, we have a uid for the benchmark that is different than the uid in the benchmark data. We might want to go through and make sure that those things are aligned, as well as the versions.

Copy link
Collaborator

@dhosterman dhosterman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great and I'm already using it! Thanks, William!

@wpietri wpietri merged commit 90ad71c into main Jul 24, 2024
4 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 24, 2024
@wpietri wpietri deleted the benchmark_outcomes_record branch September 30, 2024 12:45
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants