# eval_mmlu

## Dataset

The dataset is located at `datasets/MMLU/`. The test set contains 57 subtasks.

In each text, the content before `[SEP]` is the question, and the content after `[SEP]` is the reference answer to that question.
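
To make the layout concrete, here is a minimal Python sketch that splits one test file into questions and answers. It assumes one `question [SEP] answer` pair per line; the file name is a hypothetical example, not a path confirmed by this repository:

```python
# Minimal sketch of the "question [SEP] answer" layout, assuming one pair per
# line; the file name below is a hypothetical example.
with open("datasets/MMLU/abstract_algebra_test.txt") as f:
    for line in f:
        question, _, answer = line.partition("[SEP]")
        print("Q:", question.strip())
        print("A:", answer.strip())
```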

## Evaluation

### Introduction

The evaluation results for MMLU can be obtained by running `examples/eval_mmlu_2x32B.sh`.

The variables in the script should be set as follows (a hypothetical configuration example follows the table):

| Variable name | Description |
| --- | --- |
| `CHECKPOINT_PATH` | Path of the checkpoint to be evaluated. |
| `TOKENIZER_MODEL_PATH` | Path of the tokenizer. |
| `MATH_DATA` | Path of the evaluation set. |
| `OUTPUT_PATH` | Path where the evaluation results are saved. |
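
As a concrete starting point, the sketch below sets hypothetical values for these variables and launches the script. It assumes `eval_mmlu_2x32B.sh` picks the variables up from the environment; if the script defines them inline instead, edit those lines in the script directly. All paths are placeholders:

```python
# Launch the evaluation with hypothetical paths, assuming the script reads
# these variables from the environment; otherwise edit them in the script.
import os
import subprocess

env = dict(
    os.environ,
    CHECKPOINT_PATH="/data/checkpoints/mmlu_2x32B",          # hypothetical
    TOKENIZER_MODEL_PATH="/data/tokenizer/tokenizer.model",  # hypothetical
    MATH_DATA="datasets/MMLU",                               # hypothetical
    OUTPUT_PATH="results/mmlu",                              # hypothetical
)
subprocess.run(["bash", "-x", "examples/eval_mmlu_2x32B.sh"], env=env, check=True)
```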

### Usage

Run the following command to evaluate the model's performance on the test set:

```bash
bash -x examples/eval_mmlu_2x32B.sh
```

### Result

The evaluation results are saved under `OUTPUT_PATH`. In each output text, the content before `[SEP]` is the question, and the content after `[SEP]` is our model's answer to that question.

## Accuracy

### Introduction

The accuracy of the MMLU evaluation results can be obtained by running `tasks/MMLU/score_mmlu.py`.

The path variables in the script should be set as follows (a sketch of the scoring step follows the table):

| Variable name | Description |
| --- | --- |
| `origin_file_path` | Path of the evaluation set file. |
| `eval_file_path` | Path of the saved evaluation result file. |
| `txt_eval_res_dir` | Directory where the results are split by outcome: files ending in `_true` contain the correct answers, while files ending in `_false` contain the incorrect ones. The CSV and JSONL files in this directory store accuracy statistics for each subtask. |
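
The sketch below illustrates the core of the scoring step: pair each reference answer with the model's answer and count the matches. It assumes both files hold one `question [SEP] answer` pair per line and are aligned one-to-one; the real `score_mmlu.py` additionally writes the `_true`/`_false` files and the per-subtask CSV/JSONL statistics. All paths are hypothetical placeholders:

```python
# Minimal accuracy sketch, assuming one "question [SEP] answer" pair per line
# in both files, aligned one-to-one. Paths are hypothetical placeholders.
origin_file_path = "datasets/MMLU/test.txt"
eval_file_path = "results/mmlu/eval_output.txt"

correct = incorrect = 0
with open(origin_file_path) as ref_f, open(eval_file_path) as hyp_f:
    for ref_line, hyp_line in zip(ref_f, hyp_f):
        gold = ref_line.partition("[SEP]")[2].strip()  # reference answer
        pred = hyp_line.partition("[SEP]")[2].strip()  # model answer
        if pred == gold:
            correct += 1
        else:
            incorrect += 1

print("Number of correct answers:", correct)
print("Number of incorrect answers:", incorrect)
print("accuracy:", correct / (correct + incorrect))
```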

### Usage

Run the following command to compute the accuracy of the evaluation results:

```bash
python score_mmlu.py
```

### Result

"Number of correct answers" and "Number of incorrect answers" respectively represent the total count of correct and incorrect answers for all tasks, while "accuracy" represents the overall accuracy.