update benchmark readme (#12323)
* update benchmark readme

update new comment with memory usage included

* Update README.md
lzivan authored Nov 5, 2024
1 parent e2adc97 commit 45b0d37
24 changes: 17 additions & 7 deletions python/llm/dev/benchmark/README.md
@@ -59,6 +59,23 @@ with torch.inference_mode():
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
```
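For intuition, here is a minimal, self-contained sketch of the kind of per-call timing a wrapper in this style performs. All names below are illustrative stand-ins, not the actual `BenchmarkWrapper` implementation (which also tracks GPU memory):

```python
import time

class TimingWrapper:
    """Hypothetical stand-in for BenchmarkWrapper: times each call to the
    wrapped function and reports first-call vs. rest-call latency."""

    def __init__(self, fn, do_print=False):
        self.fn = fn
        self.do_print = do_print
        self.token_times = []  # one entry per call, in seconds

    def __call__(self, *args, **kwargs):
        start = time.perf_counter()
        out = self.fn(*args, **kwargs)
        self.token_times.append(time.perf_counter() - start)
        return out

    def report(self):
        first, rest = self.token_times[0], self.token_times[1:]
        avg_rest = sum(rest) / len(rest) if rest else 0.0
        if self.do_print:
            print(f"=========First token cost {first:.4f}s=========")
            print(f"=========Rest tokens cost average {avg_rest:.4f}s "
                  f"({len(rest)} tokens in all)=========")
        return first, avg_rest
```

The real wrapper hooks into the model's forward pass so that each generated token is timed individually; this sketch only shows the first-versus-rest bookkeeping behind the printed summary.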

### Sample Output
```bash
=========First token cost xx.xxxxs and 3.595703125 GB=========
=========Rest tokens cost average xx.xxxxs (31 tokens in all) and 3.595703125 GB=========
```

You can also set `verbose=True` to print the peak memory for every generated token:
```python
model = BenchmarkWrapper(model, do_print=True, verbose=True)
```

```bash
=========First token cost xx.xxxxs and 3.595703125 GB=========
=========Rest tokens cost average xx.xxxxs (31 tokens in all) and 3.595703125 GB=========
Peak memory for every token: [3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125]
```
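If you prefer a summary of that per-token peak-memory list to reading it raw, a few lines of plain Python (not part of the wrapper itself) are enough:

```python
# Per-token peak memory in GB, as printed in verbose mode above.
peak_mem = [3.595703125] * 32

# Summarize instead of eyeballing the full list.
print(f"tokens: {len(peak_mem)}, "
      f"max: {max(peak_mem):.2f} GB, "
      f"mean: {sum(peak_mem) / len(peak_mem):.2f} GB")
# → tokens: 32, max: 3.60 GB, mean: 3.60 GB
```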

### Inference on multi GPUs
Similarly, put this file into your benchmark directory and wrap your optimized model with `BenchmarkWrapper` (`model = BenchmarkWrapper(model)`).
For example, you only need to apply the following code patch to the [DeepSpeed AutoTP example code](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/Deepspeed-AutoTP/deepspeed_autotp.py) to measure first-token and rest-token performance:
@@ -79,10 +96,3 @@
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```

### Sample Output
The output will look like:
```bash
=========First token cost xx.xxxxs=========
=========Last token cost average xx.xxxxs (31 tokens in all)=========
```
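The per-token averages in these reports convert directly to throughput. A quick helper for that conversion (hypothetical, not part of the benchmark script):

```python
def tokens_per_second(avg_token_cost_s):
    """Convert an average per-token latency (seconds) into throughput."""
    return 1.0 / avg_token_cost_s

# e.g. an average rest-token cost of 0.05 s corresponds to 20 tokens/s
print(f"{tokens_per_second(0.05):.1f} tokens/s")  # → 20.0 tokens/s
```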
