# Evaluating the pre-quantized LLM models

The table below lists the checkpoint for each model, along with its WikiText perplexity (lower is better) and accuracy (%) on ARC-Challenge, HellaSwag, and MMLU:

| Model | Quantization | CKPT | WikiText (ppl) | ARC-C (%) | HellaSwag (%) | MMLU (%) |
|-------|--------------|------|----------------|-----------|---------------|----------|
| TinyLlaMA-1.1B-v1.0-Chat | W8A8 | ckpt | 15.5 | 31.9 | 59.2 | 25.0 |
| TinyLlaMA-1.1B-v1.0-Chat | W4A8 | ckpt | 17.1 | 32.3 | 57.0 | 25.5 |
| StableLM-2-1.6B | W8A8 | ckpt | 29.7 | 37.1 | 63.6 | 30.0 |
| StableLM-2-1.6B | W4A8 | ckpt | 33.6 | 35.6 | 60.5 | 24.1 |
| Gemma-2B | W8A8 | ckpt | 20.3 | 21.8 | 40.9 | 25.8 |
| Gemma-2B | W4A8 | ckpt | 21.4 | 23.0 | 38.9 | 25.6 |

## Running the evaluation

- Download the checkpoint from the `ckpt` link in the table above (a download sketch follows the command below).
- Run the evaluation harness:

```bash
# ${CKPT} is the path to the downloaded checkpoint; results are written to ${OUTPUT_DIR}.
CUDA_VISIBLE_DEVICES=0 python eval/harness_eval.py \
    --tasks "wikitext,arc_challenge,hellaswag,hendrycksTest*" \
    --mode custom --hf_path ${CKPT} --output_dir ${OUTPUT_DIR}
```
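
For the download step, here is a minimal sketch using the `huggingface_hub` Python API, assuming the `ckpt` links point to repositories on the Hugging Face Hub; the repo ID below is hypothetical, so substitute the one behind the matching link:

```python
from huggingface_hub import snapshot_download

# Hypothetical repo ID -- replace with the repository behind the matching "ckpt" link.
ckpt_dir = snapshot_download(repo_id="your-org/TinyLlaMA-1.1B-v1.0-Chat-W8A8")
print(ckpt_dir)  # pass this local path to harness_eval.py as ${CKPT}
```

Note that `hendrycksTest*` is quoted in the command above so the shell passes the pattern through unexpanded, letting the harness match it against the per-subject MMLU tasks.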