Switch reference SUT from GPT2 to Pythia 70m #114

Merged
merged 1 commit into from
Feb 23, 2024

Conversation

Contributor

@wpietri wpietri commented Feb 22, 2024

In service of #113.

@wpietri wpietri requested a review from dhosterman February 22, 2024 01:26

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

  harms = GeneralChatBotBenchmarkDefinition().harms()
- harm_scores = run_tests(harms, reference_sut, 45)
+ harm_scores = run_tests(harms, reference_sut, 100)
Collaborator

Is the increase in number of items here specific to pythia-70m or is there some other reason for this change?

Contributor Author

I could raise the count because Pythia causes fewer blowups, presumably because it generates fewer empty responses. Long term, I'll shoot for something like 1000 examples per test, in the hope that this gives us more stable and reliable calibrations.
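For a rough sense of why more examples should stabilize things, here is a minimal sketch. It is not the project's scoring code: it assumes each item is scored pass/fail independently and uses a worst-case pass rate of 0.5. `standard_error` and `ITEM_COUNTS` are hypothetical names used only for illustration.

```python
import math

# Item counts discussed in this PR: the old value, the new value,
# and the long-term target mentioned above.
ITEM_COUNTS = (45, 100, 1000)


def standard_error(pass_rate: float, n_items: int) -> float:
    """Standard error of an observed pass rate over n_items items.

    Assumes items are independent pass/fail trials (a simplification,
    not necessarily how the benchmark scores responses).
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    return math.sqrt(pass_rate * (1.0 - pass_rate) / n_items)


if __name__ == "__main__":
    for n in ITEM_COUNTS:
        # 0.5 is the pass rate with the largest possible standard error.
        print(f"{n:>5} items: +/- {standard_error(0.5, n):.3f}")
```

Under those assumptions the uncertainty shrinks with the square root of the item count, so going from 100 to 1000 items cuts it by a factor of about 3.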

Collaborator

@dhosterman dhosterman left a comment

Looks good to me! I just added a question to satisfy my curiosity.

@wpietri wpietri merged commit 1efe3ab into main Feb 23, 2024
2 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 23, 2024
@wpietri wpietri deleted the switch-reference-sut branch March 4, 2024 14:56