To evaluate the models' performance on the class-level code generation task, we provide `evaluation.py`. Navigate to the `classeval_evaluation` directory and run:

```bash
python evaluation.py --source_file_name model_output --eval_data ClassEval_data --greedy 1
```
- `--source_file_name`: Specifies the filename of the model outputs to evaluate.
- `--greedy`: Specifies the sampling method used for generation (see the example after this list). Permissible values:
  - 0: Nucleus sampling.
  - 1: Greedy sampling.
- `--eval_data`: Refers to the benchmark data file, named `ClassEval_data` in the current release.
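
For example, to score outputs produced with nucleus sampling instead of greedy decoding, the same script can be invoked with `--greedy 0`; here `model_output` is a placeholder for whatever filename your generation step produced:

```bash
# Evaluate nucleus-sampling outputs; "model_output" is a placeholder filename.
python evaluation.py --source_file_name model_output --eval_data ClassEval_data --greedy 0
```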
Our nucleus sampling uses 5 samples per task. During evaluation, we report only the pass@1, pass@3, and pass@5 results. If you need a different sample size, adjust the call to `cal_metrics_pass_at_k(model_list, k, n)` in `evaluation.py`, where k represents the 'k' in pass@k and n stands for the sample count.
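
For reference, pass@k is conventionally computed with the unbiased estimator from the HumanEval/Codex paper. The sketch below only illustrates that formula; it is not a copy of this project's `cal_metrics_pass_at_k`, whose exact signature and internals may differ.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n generated samples, c of them correct."""
    if n - c < k:
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product.
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 5 nucleus samples per task, 2 of which pass all tests.
print(round(pass_at_k(n=5, c=2, k=1), 4))  # 0.4
print(round(pass_at_k(n=5, c=2, k=3), 4))  # 0.9
```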
The evaluation results are stored in the `output/result` directory (a short loading example follows this list):

- `pass_at_k_result.json`: Aggregates the pass@k metrics across all models' outputs.
- `detailed_result.json`: Provides a detailed, per-test-case breakdown of each model's outputs.
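
Since the exact JSON schema of these files is not described here, the following is only a minimal sketch for inspecting the aggregated metrics; it assumes you run it from the `classeval_evaluation` directory:

```python
import json
from pathlib import Path

# Assumes the current working directory is classeval_evaluation.
result_path = Path("output/result/pass_at_k_result.json")

with result_path.open() as f:
    results = json.load(f)

# Pretty-print whatever pass@k metrics the evaluation produced.
print(json.dumps(results, indent=2))
```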
The runtime logs for every test case are in the `log` directory.