Benchmark outcomes record #392
…benchmark_outcomes_record
# Conflicts:
#	tests/test_record.py
Make modelgauge's notion of a SUT know how to instantiate itself and cache the instance used, so that the initialization info is available later.
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
…re accurate. Add `run_uid`. Remove some duplication in the JSON. Add JSON output to normal benchmark run.
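The commit above adds `run_uid`, but the conversation does not show what a run UID looks like. As a minimal sketch, assuming a run UID only needs to be unique and roughly sortable by start time, one option is to combine the benchmark UID, a UTC timestamp, and a short random suffix. `make_run_uid` and the `run-<benchmark>-<timestamp>-<suffix>` layout are hypothetical, not modelbench's actual scheme.

```python
import secrets
from datetime import datetime, timezone
from typing import Optional


def make_run_uid(benchmark_uid: str, now: Optional[datetime] = None) -> str:
    # Hypothetical format: "run-<benchmark uid>-<UTC timestamp>-<random hex>".
    # The timestamp makes run UIDs sort by start time; the random suffix
    # separates runs that start within the same second.
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%d-%H%M%S")
    suffix = secrets.token_hex(3)  # 3 random bytes -> 6 hex characters
    return f"run-{benchmark_uid}-{stamp}-{suffix}"
```

Passing `now` explicitly keeps the function easy to test; in a real run it would normally be left as the default.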
Ok, @bkorycki and @dhosterman, this is actually ready for final review now.
Nice work!
@@ -55,6 +55,18 @@ class ModelGaugeSut(SutDescription, Enum):
    WIZARDLM_13B = "wizardlm-13b", "WizardLM v1.2 (13B)", TogetherChatSUT, "WizardLM/WizardLM-13B-V1.2"
    # YI_34B_CHAT = "yi-34b", "01-ai Yi Chat (34B)", TogetherChatSUT, "zero-one-ai/Yi-34B-Chat"


    def instance(self, secrets):
Why do we need these methods?
Moving instance creation here will let me unify duplicate code, and it gives me a place to cache the instance actually used for the run, which is needed to dump out the outcome JSON.
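The caching idea described in this reply can be sketched as follows. This is a minimal illustration, assuming SUT descriptions are Enum members whose four-field value tuples carry a key, a display name, a SUT class, and a provider model name, which is one reading of the `ModelGaugeSut` members in the diff. `FakeChatSUT`, `ExampleSut`, `initialization_record`, and `instance_initialization` are hypothetical stand-ins; only the `instance(self, secrets)` signature comes from the diff.

```python
from enum import Enum
from typing import Any, Dict, Optional


class FakeChatSUT:
    """Hypothetical stand-in for a real SUT class such as TogetherChatSUT."""

    def __init__(self, uid: str, model: str, api_key: str):
        self.uid = uid
        self.model = model
        self.api_key = api_key

    def initialization_record(self) -> Dict[str, Any]:
        # Settings worth recording in the outcome JSON; the API key is left out.
        return {"class": type(self).__name__, "uid": self.uid, "model": self.model}


class ExampleSut(Enum):
    # Assumed value layout: (key, display name, SUT class, provider model name).
    SMALL_CHAT = "small-chat", "Small Chat Model", FakeChatSUT, "example-org/small-chat"

    def __init__(self, key, display_name, sut_class, model_name):
        # Enum unpacks the value tuple into these arguments.
        self.key = key
        self.display_name = display_name
        self.sut_class = sut_class
        self.model_name = model_name
        self._instance = None  # set by the first instance() call

    def instance(self, secrets: Dict[str, str]):
        # Create the SUT once and reuse it, so the object used for the run
        # is still available when the outcome record is written.
        if self._instance is None:
            self._instance = self.sut_class(self.key, self.model_name, secrets["api_key"])
        return self._instance

    def instance_initialization(self) -> Optional[Dict[str, Any]]:
        # None until the SUT has actually been instantiated.
        if self._instance is None:
            return None
        return self._instance.initialization_record()


sut = ExampleSut.SMALL_CHAT
first = sut.instance({"api_key": "not-a-real-key"})
second = sut.instance({"api_key": "not-a-real-key"})
record = sut.instance_initialization()
```

Because the cached object is the one the run actually used, the outcome record can describe exactly how each SUT was initialized while keeping secrets out of it.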
src/modelbench/uid.py (outdated)
import casefy


class HasUid:
I left a few comments about this in the other PR!
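The review thread does not show the body of `HasUid`. As a hypothetical sketch of what a UID mixin can look like, assume each subclass declares its UID parts in a `_uid_definition` dict whose values are either literals or callables that receive the object. The real `uid.py` imports `casefy`, presumably for case conversion; to stay dependency-free, this sketch uses a small regex-based `snake_case` helper instead. Every name here other than `HasUid` is illustrative.

```python
import re
from typing import Any, Dict


def snake_case(text: str) -> str:
    # Dependency-free stand-in for a case-conversion library such as casefy.
    text = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", text)
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", text)
    return text.replace("-", "_").lower()


class HasUid:
    """Hypothetical mixin: subclasses describe their UID parts in _uid_definition.

    Each value is either a literal or a callable that receives the object.
    """

    _uid_definition: Dict[str, Any] = {}

    @property
    def uid(self) -> str:
        parts = []
        for part in self._uid_definition.values():
            value = part(self) if callable(part) else part
            parts.append(str(value).lower())
        return "-".join(parts)


class ExampleBenchmark(HasUid):
    _uid_definition = {
        "class": lambda obj: snake_case(type(obj).__name__),
        "version": "0.5",  # hypothetical version string
        "locale": lambda obj: obj.locale,
    }

    def __init__(self, locale: str):
        self.locale = locale


print(ExampleBenchmark("en_us").uid)  # example_benchmark-0.5-en_us
```

Keeping the UID parts in a declarative dict lets each class state what makes it unique in one place, while the mixin handles formatting consistently.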
Ok @dhosterman and @bkorycki, I think I have resolved all the outstanding issues and requests on this one.
This works great so far, but it fails when attempting to use `--anonymize`.
I also notice that in the …
Looks great and I'm already using it! Thanks, William!
Produces a JSON version of the benchmark alongside the HTML files. Not sure this is totally right; looking forward to feedback on the format.
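A minimal sketch of writing a per-run JSON record next to the generated HTML, assuming the outcome has already been collected into a plain dict. The `write_outcome_json` helper, the `benchmark_record-<run_uid>.json` file name, and the record's keys are hypothetical; the PR description itself says the format is still open for feedback.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_outcome_json(output_dir: Path, run_uid: str, outcome: Dict[str, Any]) -> Path:
    """Write one JSON record for a benchmark run next to the HTML output."""
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"benchmark_record-{run_uid}.json"  # hypothetical naming
    record = {"run_uid": run_uid, **outcome}
    # Fixed indent and sorted keys give stable, diff-friendly output.
    path.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return path


# Usage: write into a throwaway directory and read the record back.
with tempfile.TemporaryDirectory() as tmp:
    written = write_outcome_json(
        Path(tmp), "run-example", {"benchmark": "example_benchmark", "suts": ["small-chat"]}
    )
    loaded = json.loads(written.read_text(encoding="utf-8"))
```

Sorting keys and fixing the indent keeps the output stable between runs, which makes records easier to diff while the format is still under review.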