Model | MMLU-Redux | ZebraLogic | CRUX | MATH-L5 | Average |
---|---|---|---|---|---|
o1-preview-2024-09-12 | 92.84 | 71.40 | 95.88 | 84.47 | 86.15 |
o1-mini-2024-09-12 | 86.72 | 52.60 | 93.75 | 89.32 | 80.60 |
claude-3-5-sonnet-20241022 | 88.91 | 36.20 | 83.88 | 59.36 | 67.09 |
gemini-1.5-pro-exp-0827 | 86.14 | 30.50 | 79.62 | 68.10 | 66.09 |
gpt-4o-2024-08-06 | 88.26 | 31.70 | 87.00 | 55.34 | 65.58 |
chatgpt-4o-latest-24-09-07 | 88.88 | 29.90 | 86.50 | 53.12 | 64.60 |
gpt-4o-2024-05-13 | 88.01 | 28.20 | 86.12 | 54.79 | 64.28 |
claude-3-5-sonnet-20240620 | 86.00 | 33.40 | 80.75 | 51.87 | 63.01 |
Qwen2.5-72B-Instruct | 85.57 | 26.60 | 73.88 | 60.19 | 61.56 |
Llama-3.1-405B-Inst@sambanova | 86.21 | 30.10 | 73.00 | 49.79 | 59.77 |
gpt-4-turbo-2024-04-09 | 85.31 | 28.40 | 78.88 | 46.46 | 59.76 |
gemini-1.5-flash-exp-0827 | 82.11 | 25.00 | 74.50 | 54.51 | 59.03 |
Mistral-Large-2 | 82.97 | 29.00 | 75.12 | 48.54 | 58.91 |
gpt-4o-mini-2024-07-18 | 81.50 | 20.10 | 75.88 | 52.15 | 57.41 |
deepseek-v2.5-0908 | 80.35 | 22.10 | 70.00 | 44.66 | 54.28 |
claude-3-opus-20240229 | 82.54 | 27.00 | 70.38 | 36.89 | 54.20 |
Meta-Llama-3.1-70B-Instruct | 82.97 | 24.90 | 64.25 | 43.13 | 53.81 |
claude-3-5-haiku-20241022 | 79.63 | 18.70 | 68.75 | 46.46 | 53.38 |
gemini-1.5-pro | 82.76 | 19.40 | 68.00 | 39.81 | 52.49 |
gpt-4-0314 | 81.64 | 27.10 | 74.50 | 26.07 | 52.33 |
Qwen2-72B-Instruct | 81.61 | 21.40 | 59.13 | 38.28 | 50.10 |
gemini-1.5-flash | 77.36 | 19.40 | 63.75 | 34.81 | 48.83 |
Qwen2.5-7B-Instruct | 75.13 | 12.00 | 52.75 | 51.46 | 47.84 |
Meta-Llama-3-70B-Instruct | 78.01 | 16.80 | 58.88 | 25.10 | 44.70 |
gemma-2-27b-it | 75.67 | 16.30 | 57.25 | 26.63 | 43.96 |
Athene-70B | 76.64 | 16.70 | 50.62 | 20.67 | 41.16 |
reka-core-20240501 | 76.42 | 13.00 | 46.25 | 21.91 | 39.40 |
claude-3-haiku-20240307 | 72.32 | 14.30 | 54.75 | 15.12 | 39.12 |
gemma-2-9b-it | 72.82 | 12.80 | 46.00 | 19.42 | 37.76 |
gpt-3.5-turbo-0125 | 68.36 | 10.10 | 54.75 | 13.73 | 36.73 |
Yi-1.5-34B-Chat | 72.79 | 11.50 | 44.12 | 18.17 | 36.64 |
Phi-3-mini-4k-instruct | 70.34 | 11.60 | 44.75 | 16.23 | 35.73 |
Meta-Llama-3.1-8B-Instruct | 67.24 | 12.80 | 39.88 | 22.19 | 35.53 |
Qwen2-7B-Instruct | 66.92 | 8.40 | 37.88 | 23.86 | 34.27 |
Phi-3.5-mini-instruct | 67.67 | 6.40 | 42.12 | 18.72 | 33.73 |
Yi-1.5-9B-Chat | 65.05 | 2.30 | 44.75 | 19.97 | 33.02 |
Qwen2.5-3B-Instruct | 64.25 | 4.80 | 33.12 | 25.52 | 31.92 |
Meta-Llama-3-8B-Instruct | 61.66 | 11.90 | 37.75 | 7.91 | 29.80 |
gemma-2-2b-it | 51.94 | 4.20 | 21.50 | 4.30 | 20.48 |
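The table does not state how the Average column is derived, but it is consistent with an unweighted mean of the four benchmark scores. A minimal sketch checking that assumption against a few rows (model names and scores copied from the table above):

```python
# Sketch: verify that "Average" matches the unweighted mean of the four
# benchmark scores. This equal-weighting is an assumption, not stated in
# the table itself.
rows = {
    # model: (MMLU-Redux, ZebraLogic, CRUX, MATH-L5, reported Average)
    "o1-preview-2024-09-12": (92.84, 71.40, 95.88, 84.47, 86.15),
    "gpt-4o-2024-08-06": (88.26, 31.70, 87.00, 55.34, 65.58),
    "gemma-2-2b-it": (51.94, 4.20, 21.50, 4.30, 20.48),
}

for model, (*scores, reported_avg) in rows.items():
    mean = sum(scores) / len(scores)
    # Each computed mean agrees with the reported average to within
    # rounding (two decimal places).
    assert abs(mean - reported_avg) <= 0.011, model
    print(f"{model}: computed {mean:.4f}, reported {reported_avg:.2f}")
```

For every row spot-checked, the computed mean agrees with the reported average to within two-decimal rounding.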