
feat: update readme for ppl test #11865

Merged · 12 commits · Aug 20, 2024
20 changes: 11 additions & 9 deletions python/llm/dev/benchmark/perplexity/README.md
# Perplexity
Perplexity (PPL) is one of the most common metrics for evaluating language models. This benchmark implementation is adapted from [transformers/perplexity](https://huggingface.co/docs/transformers/perplexity#perplexity-of-fixed-length-models) and [benchmark_patch_llm.py](https://github.com/insuhan/hyper-attn/blob/main/benchmark_patch_llm.py).
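Concretely, for a tokenized sequence $X=(x_1,\dots,x_t)$, perplexity is the exponentiated average negative log-likelihood of the tokens under the model:

$$
\mathrm{PPL}(X) = \exp\!\left(-\frac{1}{t}\sum_{i=1}^{t}\log p_\theta\!\left(x_i \mid x_{<i}\right)\right)
$$

Lower is better; a model that assigns probability 1 to every next token has a perplexity of 1.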

## Environment Preparation
```bash
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install datasets
```
This is a required step on Linux for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
```bash
source /opt/intel/oneapi/setvars.sh
```

## PPL Evaluation
### 1. Run on Wikitext
An example to run perplexity on [wikitext](https://paperswithcode.com/dataset/wikitext-2):
```bash
pip install datasets
python run_wikitext.py --model_path meta-llama/Meta-Llama-3-8B --dataset path=wikitext,name=wikitext-2-raw-v1 --precision sym_int4 --device xpu --stride 512 --max_length 4096
```
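The `--stride` and `--max_length` options above control a sliding-window evaluation, as described in the transformers perplexity guide: the model scores windows of at most `max_length` tokens, the window advances `stride` tokens at a time, and only tokens not already scored by an earlier window contribute to the loss. A minimal sketch of that accounting (the function and variable names here are illustrative, not the benchmark script's actual internals, and per-token NLLs are assumed to be precomputed):

```python
import math

def sliding_window_nll(token_nlls, max_length, stride):
    """Average negative log-likelihood over a token stream, scored in
    windows of at most `max_length` tokens advancing `stride` tokens at
    a time. token_nlls[i] is assumed to already hold the NLL of token i
    given its window context."""
    total_nll = 0.0
    total_tokens = 0
    prev_end = 0
    for begin in range(0, len(token_nlls), stride):
        end = min(begin + max_length, len(token_nlls))
        # Only tokens not covered by a previous window contribute,
        # so no token is counted twice.
        window = token_nlls[prev_end:end]
        total_nll += sum(window)
        total_tokens += len(window)
        prev_end = end
        if end == len(token_nlls):
            break
    return total_nll / total_tokens

def perplexity(token_nlls, max_length=4096, stride=512):
    # Perplexity is the exponentiated average NLL.
    return math.exp(sliding_window_nll(token_nlls, max_length, stride))
```

For example, a stream whose per-token NLL is uniformly ln 2 yields a perplexity of exactly 2, independent of the stride chosen.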
### 2. Run on [THUDM/LongBench](https://github.com/THUDM/LongBench) dataset

An example to run perplexity on chatglm3-6b using the default Chinese datasets ("multifieldqa_zh", "dureader", "vcsum", "lsht", "passage_retrieval_zh"):
```bash
python run_longbench.py --model_path THUDM/chatglm3-6b --precisions float16 sym_int4 --device xpu --language zh
```


Notes:
- If you want to test model perplexity on only a few selected datasets from the `LongBench` dataset, pass the names of those datasets to `run_longbench.py` directly instead of a whole language group.