Discrepancy in accuracy on minitest set for gpt-3.5-turbo
Hi @lupantech, thank you for your excellent work.
I observed inconsistent accuracies on the minitest set. Specifically, I obtained acc_average values of 49.29 for gpt-3.5-turbo and 46.93 for Llama-2-7b, whereas the reported test-set accuracy for gpt-3.5 is 79.93.
However, when I looked at the "true_false" values in chameleon_chatgpt_test_cache.jsonl for the entries whose pids match the minitest set, I calculated an accuracy of 0.7948, which is much closer to the reported number.
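For reference, here is roughly how I computed that 0.7948 (a minimal sketch; chameleon_chatgpt_test_cache.jsonl and the "pid"/"true_false" fields are taken from the cache file, while minitest_pids.json is a hypothetical stand-in for wherever the minitest pid list is stored):

```python
import json

# Sketch of the check described above. Adjust the paths to your local layout;
# the minitest pid list is assumed to be a JSON array of pids.
CACHE_PATH = "chameleon_chatgpt_test_cache.jsonl"
MINITEST_PIDS_PATH = "minitest_pids.json"  # hypothetical file name

with open(MINITEST_PIDS_PATH) as f:
    minitest_pids = {str(pid) for pid in json.load(f)}

# Scan the cache and count correct predictions ("true_false") for matching pids.
matched = 0
correct = 0
with open(CACHE_PATH) as f:
    for line in f:
        record = json.loads(line)
        if str(record.get("pid")) in minitest_pids:
            matched += 1
            correct += int(bool(record.get("true_false")))

accuracy = correct / matched if matched else 0.0
print(f"matched pids: {matched}, accuracy: {accuracy:.4f}")
```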
Could you help clarify this discrepancy, or share your minitest evaluation results if available?