
Discrepancy in accuracy on minitest set for gpt-3.5-turbo #8

Open
jameszhou-gl opened this issue Oct 8, 2023 · 0 comments
Hi @lupantech, thank you for your excellent work.

I observed inconsistent accuracies on the minitest set. Specifically, I obtained acc_average values of 49.29 for gpt-3.5-turbo and 46.93 for Llama-2-7b, whereas the reported test-set accuracy for gpt-3.5 is 79.93.

However, when I analyzed the `true_false` values in chameleon_chatgpt_test_cache.jsonl for the pids that appear in the minitest set, I calculated an accuracy of 0.7948, which is close to the reported number (see the sketch below).

Could you help clarify this discrepancy, or share your minitest evaluation results if available?

jameszhou-gl changed the title from "Discrepancy in Accuracy on Minitest Set for GPT-3.5" to "Discrepancy in Accuracy on minitest set for GPT-3.5" on Oct 8, 2023
jameszhou-gl changed the title from "Discrepancy in Accuracy on minitest set for GPT-3.5" to "Discrepancy in accuracy on minitest set for gpt-3.5-turbo" on Oct 8, 2023