Commit 90320f4
add cpp Readme, ensured correct batch processing, add PerfMetrics to Readme
1 parent: 0a8f0d9
Showing 17 changed files with 278 additions and 115 deletions.
@@ -0,0 +1,47 @@
# Benchmarking Vanilla GenAI

This sample script demonstrates how to benchmark an LLMPipeline in OpenVINO GenAI. The script includes functionality for warm-up iterations, text generation, and calculating various performance metrics.

## Download and convert the model and tokenizers

The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

It's not required to install [../../requirements.txt](../../requirements.txt) for deployment if the model has already been exported.

```sh
pip install --upgrade-strategy eager -r ../../requirements.txt
optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-Chat-v1.0
```

## Usage

```sh
benchmark_vanilla_genai [OPTIONS]
```

### Options

- `-m, --model`: Path to the model and tokenizers base directory.
- `-p, --prompt` (default: `"The Sky is blue because"`): The prompt used to generate text.
- `-nw, --num_warmup` (default: `1`): Number of warmup iterations.
- `-mt, --max_new_tokens` (default: `20`): Maximal number of new tokens to generate.
- `-n, --num_iter` (default: `3`): Number of iterations.
- `-d, --device` (default: `"CPU"`): Device to run the model on.

### Output

```
benchmark_vanilla_genai -m TinyLlama-1.1B-Chat-v1.0 -n 10
```

```
Load time: 3405.69 ms
Generate time: 1430.77 ± 3.04 ms
Tokenization time: 0.51 ± 0.02 ms
Detokenization time: 0.37 ± 0.01 ms
TTFT: 81.60 ± 0.54 ms
TPOT: 71.52 ± 2.72 ms
Throughput tokens/s: 13.98 ± 0.53
```
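As a quick sanity check on the sample output above (not part of the sample itself): since one output token is produced roughly every TPOT milliseconds, the token throughput should be close to the inverse of the mean TPOT.

```python
# Sanity-check the sample output: throughput should be roughly the inverse
# of the mean time per output token (TPOT), converted from ms to tokens/s.
tpot_ms = 71.52                 # mean TPOT from the sample run above
throughput = 1000.0 / tpot_ms   # tokens per second
print(round(throughput, 2))     # → 13.98, matching the reported value
```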

For more information on how the performance metrics are calculated, see the [performance-metrics tutorial](../../../src/README.md#performance-metrics).
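Each `value ± deviation` figure above is a mean with a standard deviation taken over the measured iterations. A minimal sketch of that aggregation, using only the standard library (the helper name and the example durations are illustrative, not part of the sample):

```python
import statistics

def mean_std(samples):
    """Return (mean, sample standard deviation) for a list of timings in ms."""
    mean = statistics.fmean(samples)
    # stdev needs at least two samples; report 0.0 for a single measurement
    std = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, std

# e.g. per-iteration generate durations in ms (made-up values)
durations = [1428.0, 1430.0, 1434.3]
m, s = mean_std(durations)
print(f"Generate time: {m:.2f} ± {s:.2f} ms")  # → Generate time: 1430.77 ± 3.22 ms
```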
62 changes: 62 additions & 0 deletions
samples/python/benchmark_genai/benchmark_genai_automatic.py

@@ -0,0 +1,62 @@
# Copyright (C) 2023-2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import argparse

import openvino_genai as ov_genai
import pandas as pd


def main():
    parser = argparse.ArgumentParser(description="Help command")
    parser.add_argument("-m", "--model", type=str, help="Path to model and tokenizers base directory")
    parser.add_argument("-p", "--prompt", type=str, default="The Sky is blue because", help="Prompt")
    parser.add_argument("-nw", "--num_warmup", type=int, default=1, help="Number of warmup iterations")
    parser.add_argument("-n", "--num_iter", type=int, default=5, help="Number of iterations")
    parser.add_argument("-mt", "--max_new_tokens", type=int, default=20, help="Maximal number of new tokens")
    parser.add_argument("-d", "--device", type=str, default="CPU", help="Device")

    args = parser.parse_args()

    # Performance metrics are stored in DecodedResults.
    # To get DecodedResults instead of a string, the input should be a list.

    model_path = args.model
    device = args.device
    num_warmup = args.num_warmup
    num_iter = args.num_iter

    config = ov_genai.GenerationConfig()
    config.max_new_tokens = args.max_new_tokens

    pipe = ov_genai.LLMPipeline(model_path, device)

    rows = []
    batch_sizes = [1, 2, 4, 16, 32, 64, 256]
    for batch_size in batch_sizes:
        prompt = [args.prompt] * batch_size
        # Warm-up runs are excluded from the measured metrics.
        for _ in range(num_warmup):
            pipe.generate(prompt, config)

        res = pipe.generate(prompt, config)
        metrics = res.metrics
        for _ in range(num_iter - 1):
            res = pipe.generate(prompt, config)
            metrics += res.metrics
        rows.append({
            'batch_size': batch_size,
            'throughput': metrics.mean_throughput,
            'ttft': metrics.mean_ttft,
            'tpot': metrics.mean_tpot,
            'std_throughput': metrics.std_throughput,
            'std_ttft': metrics.std_ttft,
            'std_tpot': metrics.std_tpot,
        })

    # DataFrame.append is deprecated (and _append is private API), so collect
    # the rows in a list and build the frame once at the end.
    metrics_df = pd.DataFrame(rows, columns=['batch_size', 'throughput', 'ttft', 'tpot', 'std_throughput', 'std_ttft', 'std_tpot'])
    metrics_df.to_csv('metrics.csv', index=False)


if __name__ == "__main__":
    main()
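The script above writes its per-batch-size table with pandas. If pandas is not available, the same `metrics.csv` can be produced with only the standard library; a sketch, where `write_metrics_csv` and the example row values are illustrative and not part of the sample:

```python
import csv

FIELDS = ['batch_size', 'throughput', 'ttft', 'tpot',
          'std_throughput', 'std_ttft', 'std_tpot']

def write_metrics_csv(rows, path):
    """Write a list of per-batch-size metric dicts to a CSV file."""
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

# Example row with made-up numbers in the shape the script collects.
rows = [{'batch_size': 1, 'throughput': 14.0, 'ttft': 81.6, 'tpot': 71.5,
         'std_throughput': 0.5, 'std_ttft': 0.5, 'std_tpot': 2.7}]
write_metrics_csv(rows, 'metrics.csv')
```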