| Model | MMLU-Redux | ZebraLogic | CRUX | MATH-L5 | Average |
|-------|-----------|-----------|------|---------|---------|
| o1-preview-2024-09-12 | 92.84 | 71.40 | 95.88 | 84.47 | 86.15 |
| o1-mini-2024-09-12 | 86.72 | 52.60 | 93.75 | 89.32 | 80.60 |
| claude-3-5-sonnet-20241022 | 88.91 | 36.20 | 83.88 | 59.36 | 67.09 |
| gemini-1.5-pro-exp-0827 | 86.14 | 30.50 | 79.62 | 68.10 | 66.09 |
| gpt-4o-2024-08-06 | 88.26 | 31.70 | 87.00 | 55.34 | 65.58 |
| chatgpt-4o-latest-24-09-07 | 88.88 | 29.90 | 86.50 | 53.12 | 64.60 |
| gpt-4o-2024-05-13 | 88.01 | 28.20 | 86.12 | 54.79 | 64.28 |
| claude-3-5-sonnet-20240620 | 86.00 | 33.40 | 80.75 | 51.87 | 63.01 |
| Qwen2.5-72B-Instruct | 85.57 | 26.60 | 73.88 | 60.19 | 61.56 |
| Llama-3.1-405B-Inst@sambanova | 86.21 | 30.10 | 73.00 | 49.79 | 59.77 |
| gpt-4-turbo-2024-04-09 | 85.31 | 28.40 | 78.88 | 46.46 | 59.76 |
| gemini-1.5-flash-exp-0827 | 82.11 | 25.00 | 74.50 | 54.51 | 59.03 |
| Mistral-Large-2 | 82.97 | 29.00 | 75.12 | 48.54 | 58.91 |
| gpt-4o-mini-2024-07-18 | 81.50 | 20.10 | 75.88 | 52.15 | 57.41 |
| deepseek-v2.5-0908 | 80.35 | 22.10 | 70.00 | 44.66 | 54.28 |
| claude-3-opus-20240229 | 82.54 | 27.00 | 70.38 | 36.89 | 54.20 |
| Meta-Llama-3.1-70B-Instruct | 82.97 | 24.90 | 64.25 | 43.13 | 53.81 |
| claude-3-5-haiku-20241022 | 79.63 | 18.70 | 68.75 | 46.46 | 53.38 |
| gemini-1.5-pro | 82.76 | 19.40 | 68.00 | 39.81 | 52.49 |
| gpt-4-0314 | 81.64 | 27.10 | 74.50 | 26.07 | 52.33 |
| Qwen2-72B-Instruct | 81.61 | 21.40 | 59.13 | 38.28 | 50.10 |
| gemini-1.5-flash | 77.36 | 19.40 | 63.75 | 34.81 | 48.83 |
| Qwen2.5-7B-Instruct | 75.13 | 12.00 | 52.75 | 51.46 | 47.84 |
| Meta-Llama-3-70B-Instruct | 78.01 | 16.80 | 58.88 | 25.10 | 44.70 |
| gemma-2-27b-it | 75.67 | 16.30 | 57.25 | 26.63 | 43.96 |
| Athene-70B | 76.64 | 16.70 | 50.62 | 20.67 | 41.16 |
| reka-core-20240501 | 76.42 | 13.00 | 46.25 | 21.91 | 39.40 |
| claude-3-haiku-20240307 | 72.32 | 14.30 | 54.75 | 15.12 | 39.12 |
| gemma-2-9b-it | 72.82 | 12.80 | 46.00 | 19.42 | 37.76 |
| gpt-3.5-turbo-0125 | 68.36 | 10.10 | 54.75 | 13.73 | 36.73 |
| Yi-1.5-34B-Chat | 72.79 | 11.50 | 44.12 | 18.17 | 36.64 |
| Phi-3-mini-4k-instruct | 70.34 | 11.60 | 44.75 | 16.23 | 35.73 |
| Meta-Llama-3.1-8B-Instruct | 67.24 | 12.80 | 39.88 | 22.19 | 35.53 |
| Qwen2-7B-Instruct | 66.92 | 8.40 | 37.88 | 23.86 | 34.27 |
| Phi-3.5-mini-instruct | 67.67 | 6.40 | 42.12 | 18.72 | 33.73 |
| Yi-1.5-9B-Chat | 65.05 | 2.30 | 44.75 | 19.97 | 33.02 |
| Qwen2.5-3B-Instruct | 64.25 | 4.80 | 33.12 | 25.52 | 31.92 |
| Meta-Llama-3-8B-Instruct | 61.66 | 11.90 | 37.75 | 7.91 | 29.80 |
| gemma-2-2b-it | 51.94 | 4.20 | 21.50 | 4.30 | 20.48 |
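
The Average column matches the unweighted mean of the four benchmark scores (e.g. for o1-preview: (92.84 + 71.40 + 95.88 + 84.47) / 4 = 86.15). A minimal sketch reproducing it, assuming equal weighting across MMLU-Redux, ZebraLogic, CRUX, and MATH-L5 (the variable names below are illustrative, not from the repo):

```python
# Reproduce the Average column as the unweighted mean of the four benchmarks.
# Scores taken verbatim from the first two rows of the table above.
scores = {
    "o1-preview-2024-09-12": [92.84, 71.40, 95.88, 84.47],
    "o1-mini-2024-09-12": [86.72, 52.60, 93.75, 89.32],
}

for model, vals in scores.items():
    avg = sum(vals) / len(vals)
    print(f"{model}: {avg:.2f}")  # 86.15 and 80.60, matching the table
```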