Discrepancy in accuracy on minitest set for gpt-3.5-turbo
Hi @lupantech, thank you for your excellent work.
I observed inconsistent accuracies on the minitest set. Specifically, I obtained acc_average values of 49.29 for gpt-3.5-turbo and 46.93 for Llama-2-7b, whereas the reported test-set accuracy for gpt-3.5 is 79.93.
However, when I looked at the "true_false" values in chameleon_chatgpt_test_cache.jsonl for the entries whose pids match the minitest set, I calculated an accuracy of 0.7948, which is much closer to the reported number.
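For reference, here is roughly how I computed that 0.7948 (a minimal sketch; chameleon_chatgpt_test_cache.jsonl and the "pid"/"true_false" fields are taken from the cache file, while minitest_pids.json is a hypothetical stand-in for wherever the minitest pid list is stored):

```python
import json

# Sketch of the check described above. Adjust the paths to your local layout;
# the minitest pid list is assumed to be a JSON array of pids.
CACHE_PATH = "chameleon_chatgpt_test_cache.jsonl"
MINITEST_PIDS_PATH = "minitest_pids.json"  # hypothetical file name

with open(MINITEST_PIDS_PATH) as f:
    minitest_pids = {str(pid) for pid in json.load(f)}

# Scan the cache and count correct predictions ("true_false") for matching pids.
matched = 0
correct = 0
with open(CACHE_PATH) as f:
    for line in f:
        record = json.loads(line)
        if str(record.get("pid")) in minitest_pids:
            matched += 1
            correct += int(bool(record.get("true_false")))

accuracy = correct / matched if matched else 0.0
print(f"matched pids: {matched}, accuracy: {accuracy:.4f}")
```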
Could you help clarify this discrepancy, or share your minitest evaluation results if available?