Language Model Evaluation Harness

Basic Usage

Note: When reporting results from the eval harness, please include the task versions (shown in results["versions"]) for reproducibility. Versioning lets tasks receive bug fixes while keeping previously reported scores reproducible. See the Task Versioning section for more info.
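As a minimal sketch of reporting versions alongside scores, the snippet below pairs each metric with its task version. It assumes the results object is a dict with "results" and "versions" keys, each mapping task names to metrics and version numbers respectively. The helper name `format_report` and the sample values are hypothetical and are not output from a real run.

```python
def format_report(results):
    """Return one line per metric, tagging each task with its version."""
    lines = []
    for task, metrics in sorted(results["results"].items()):
        # Fall back to "unknown" if a task has no recorded version.
        version = results["versions"].get(task, "unknown")
        for metric, value in sorted(metrics.items()):
            lines.append(f"{task} (v{version}) {metric}: {value:.4f}")
    return lines


# Hypothetical example values, for illustration only.
example = {
    "results": {"hellaswag": {"acc": 0.5, "acc_norm": 0.625}},
    "versions": {"hellaswag": 0},
}

for line in format_report(example):
    print(line)
```

Keeping the version next to every reported number makes it clear which revision of a task a score belongs to.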

Hugging Face `transformers`

To evaluate a model hosted on the Hugging Face Hub (e.g. GPT-J-6B) on hellaswag, use the following command:

python main.py \
    --model hf-causal \
    --model_args pretrained=EleutherAI/gpt-j-6B \
    --tasks hellaswag \
    --device cuda:0

Additional arguments can be passed to the model constructor with the --model_args flag. Most notably, this lets you select a partially trained checkpoint stored as a revision on the Hub, or specify the datatype used to run the model:

python main.py \
    --model hf-causal \
    --model_args pretrained=EleutherAI/pythia-160m,revision=step100000,dtype="float" \
    --tasks lambada_openai,hellaswag \
    --device cuda:0
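The value passed to --model_args in the commands above is a comma-separated list of key=value pairs. The sketch below illustrates how such a string breaks down into constructor arguments. `parse_model_args` is a hypothetical helper written for this example; it is not claimed to be the harness's own parser, and the quote stripping is an assumption added so the example works on the string exactly as typed.

```python
def parse_model_args(arg_string):
    """Split 'k1=v1,k2=v2' into a dict, stripping optional quotes around values."""
    parsed = {}
    for pair in arg_string.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty input or trailing commas
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        parsed[key.strip()] = value.strip().strip("\"'")
    return parsed


args = parse_model_args(
    'pretrained=EleutherAI/pythia-160m,revision=step100000,dtype="float"'
)
print(args)
```

Because pairs are split on commas, a value that itself contains a comma cannot be expressed in this simple form.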

To evaluate models that are loaded via AutoModelForSeq2SeqLM in Hugging Face transformers, use hf-seq2seq instead. To evaluate (causal) models across multiple GPUs, use --model hf-causal-experimental.

Warning: Choosing the wrong model type may silently produce erroneous results without raising an error.

Modified from

@software{eval-harness,
  author       = {Gao, Leo and
                  Tow, Jonathan and
                  Biderman, Stella and
                  Black, Sid and
                  DiPofi, Anthony and
                  Foster, Charles and
                  Golding, Laurence and
                  Hsu, Jeffrey and
                  McDonell, Kyle and
                  Muennighoff, Niklas and
                  Phang, Jason and
                  Reynolds, Laria and
                  Tang, Eric and
                  Thite, Anish and
                  Wang, Ben and
                  Wang, Kevin and
                  Zou, Andy},
  title        = {A framework for few-shot language model evaluation},
  month        = sep,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {v0.0.1},
  doi          = {10.5281/zenodo.5371628},
  url          = {https://doi.org/10.5281/zenodo.5371628}
}
