
add generation time metrics #613

Merged

Conversation

@pavel-esir (Contributor) commented Jul 12, 2024

  • Added performance metrics and updated the README with a description of how to use them
  • Added C++ and Python samples for benchmarking

Sample to calculate and visualize performance metrics.

```python
import openvino_genai as ov_genai
import pandas as pd
import tqdm
import matplotlib.pyplot as plt

pipe = ov_genai.LLMPipeline('TinyLlama-1.1B-Chat-v1.0/')
config = ov_genai.GenerationConfig(max_new_tokens=15)

num_iter = 3
rows = []
for batch_size in tqdm.tqdm([1, 2, 4, 16, 32, 64, 128]):
    prompts = ["The Sky is blue because"] * batch_size
    res = pipe.generate(prompts, config)
    metrics = res.perf_metrics

    # Accumulate metrics over the remaining iterations; PerfMetrics supports +=.
    for _ in range(num_iter - 1):
        res = pipe.generate(prompts, config)
        metrics += res.perf_metrics
    rows.append({
        'batch_size': batch_size,
        'throughput': metrics.get_throughput().mean, 'ttft': metrics.get_ttft().mean, 'tpot': metrics.get_tpot().mean,
        'std_throughput': metrics.get_throughput().std, 'std_ttft': metrics.get_ttft().std, 'std_tpot': metrics.get_tpot().std,
    })
# Build the DataFrame once at the end instead of via the private DataFrame._append.
metrics_df = pd.DataFrame(rows)

fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(6, 8), sharex=True)

axes[0].plot(metrics_df['batch_size'], metrics_df['throughput'], '-o')
axes[1].plot(metrics_df['batch_size'], metrics_df['ttft'], '-o')
axes[2].plot(metrics_df['batch_size'], metrics_df['tpot'], '-o')

for ax, label in zip(axes, ['Throughput', 'TTFT', 'TPOT']):
    ax.set_ylabel(label)
    ax.grid(True)
axes[2].set_xlabel('Batch Size')
plt.tight_layout()
```

[figure: throughput, TTFT and TPOT plotted against batch size]

ticket: CVS-132859

src/cpp/src/tokenizer.cpp — outdated review thread, resolved
@pavel-esir pavel-esir changed the base branch from master to releases/2024/3 July 22, 2024 11:06
@pavel-esir pavel-esir force-pushed the add_perf_counters branch 2 times, most recently from 4d4942e to c680bb2 Compare July 22, 2024 11:10
mzegla and others added 3 commits July 22, 2024 13:17
…oop for greedy sampling (openvinotoolkit#607)

Searching for the max element in a custom loop gives better performance than
using std::max_element
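The change can be sketched as follows (an illustrative sketch under assumed names, not the actual PR code):

```cpp
#include <cstddef>
#include <vector>

// A plain index loop over the logits. Commit openvinotoolkit#607 reports this
// is faster for greedy sampling than calling std::max_element, which returns
// an iterator and goes through the generic algorithm machinery.
std::size_t argmax(const std::vector<float>& logits) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i) {
        if (logits[i] > logits[best]) best = i;
    }
    return best;
}
```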
namespace genai {

float PerfMetrics::get_duration_ms(std::chrono::steady_clock::duration duration) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(duration).count();
@Wovchena (Collaborator) commented Jul 22, 2024

Don't cast the duration until you really need the value (for printing). Return the duration itself; that ensures the best accuracy. When you divide a duration, change its representation to float or double. For example https://github.com/openvinotoolkit/openvino/blob/ffc135cb1240831411799bdb82ecac352c956f22/samples/cpp/benchmark/throughput_benchmark/main.cpp#L19. But your implementation needs an extra step: when the mean is computed, keep using the source units (most likely nanoseconds, but that's unspecified, so you can't rely on it) with a float or double representation. Cast the duration to ms or any other suitable unit only to call count() and print.

@pavel-esir (Contributor, Author)
Done. Durations are now stored as microsecond chrono::duration<float, std::ratio<1, 1000000>> for better accuracy. If I stored them in milliseconds, tokenization/detokenization times could sometimes be 0; with microseconds they are not.

I convert them only when mean/std are calculated.

@Wovchena (Collaborator)

You shouldn't store the duration in any specific unit. Store whatever time_point_one - time_point_zero returns. There's also a cast to an int representation in the constructors. For example

    auto start_time = std::chrono::steady_clock::now();
    m_pimpl = std::make_unique<StatefulLLMPipeline>(request, tokenizer, generation_config);
    auto stop_time = std::chrono::steady_clock::now();
    m_pimpl->m_load_time_ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop_time - start_time).count();   

This is bad in two ways: the first is described above; the second is the int representation used in the cast. That divides with integer truncation, because the default representation is usually nanoseconds.
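The pattern suggested in this thread — keep the clock's native duration, average with a floating-point representation in the source units, and convert to milliseconds only when printing — can be sketched like this (illustrative names, not the actual PerfMetrics API):

```cpp
#include <chrono>
#include <vector>

using Clock = std::chrono::steady_clock;

// Store the raw clock duration; no unit is baked in at this point.
Clock::duration elapsed(Clock::time_point start, Clock::time_point stop) {
    return stop - start;
}

// Mean kept in the clock's native units, with a double representation
// so the division does not truncate.
using FloatDuration = std::chrono::duration<double, Clock::duration::period>;

FloatDuration mean_duration(const std::vector<Clock::duration>& samples) {
    FloatDuration sum{0};
    for (auto d : samples)
        sum += d;
    return sum / static_cast<double>(samples.size());
}

// Convert to milliseconds only at the reporting boundary, so
// sub-millisecond intervals are not truncated to 0.
template <class Rep, class Period>
double to_ms(std::chrono::duration<Rep, Period> d) {
    return std::chrono::duration<double, std::milli>(d).count();
}
```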

src/cpp/src/utils.hpp — outdated review thread, resolved
@pavel-esir pavel-esir marked this pull request as ready for review July 23, 2024 20:53
@pavel-esir pavel-esir requested a review from Wovchena July 23, 2024 20:53
@pavel-esir pavel-esir marked this pull request as draft July 23, 2024 20:54
@pavel-esir pavel-esir marked this pull request as ready for review July 23, 2024 21:00
@pavel-esir (Contributor, Author)

The PR is now final. @Wovchena please take a look.

@Wovchena (Collaborator)

It got a conflict

@pavel-esir (Contributor, Author)

> It got a conflict

Resolved. The metrics match llm_bench's numbers. I will open a separate PR to switch to native counters.
[screenshot: metrics compared against llm_bench]

src/README.md — several outdated review threads, resolved
@@ -196,6 +196,55 @@ int main(int argc, char* argv[]) {
}
```

### Performance Metrics
@Wovchena (Collaborator)
When it gets merged, please, open another PR adding it to C++ and Python docstrings.

@pavel-esir (Contributor, Author)

Done #713

src/README.md — outdated review thread, resolved
src/cpp/src/greedy_decoding.cpp — outdated review thread, resolved
    res.num_generated_tokens = num_generated_tokens + right.num_generated_tokens;
    res.num_input_tokens = num_input_tokens + right.num_input_tokens;
    res.load_time = load_time;
    res.evaluate_statistics();
@Wovchena (Collaborator)
evaluate_statistics() is called on every +. Given that this happens in the benchmarking loop and most of the results are thrown away, it's worth providing a getter or a standalone function to do that job.

@pavel-esir (Contributor, Author)

Added getters. To get a value, the user now calls e.g. perf_metrics.get_tokenization_duration().mean. If the statistics are fresh and already evaluated, the getter returns the cached values; if they are not fresh, it calls evaluate_statistics().
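The lazy-evaluation getter described above can be sketched like this (a hypothetical sketch; field and method names are illustrative, not the actual PerfMetrics implementation):

```cpp
#include <cmath>
#include <numeric>
#include <vector>

struct MeanStdPair { float mean; float std; };

class Metrics {
public:
    // Adding a sample marks the cached statistics as stale.
    void add_sample(float ms) { m_samples.push_back(ms); m_fresh = false; }

    // The getter re-evaluates statistics only when they are stale,
    // so repeated reads in a benchmarking loop cost nothing extra.
    MeanStdPair get_duration() {
        if (!m_fresh) evaluate_statistics();
        return m_duration;
    }

private:
    void evaluate_statistics() {
        float mean = std::accumulate(m_samples.begin(), m_samples.end(), 0.0f)
                     / m_samples.size();
        float var = 0.0f;
        for (float s : m_samples) var += (s - mean) * (s - mean);
        m_duration = {mean, std::sqrt(var / m_samples.size())};
        m_fresh = true;
    }

    std::vector<float> m_samples;
    MeanStdPair m_duration{0.0f, 0.0f};
    bool m_fresh = true;
};
```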

src/README.md — outdated review thread, resolved
@pavel-esir pavel-esir requested a review from Wovchena July 26, 2024 13:04
@Wovchena Wovchena enabled auto-merge July 26, 2024 13:56
@andrei-kochin andrei-kochin dismissed Wovchena’s stale review July 26, 2024 14:45

Comments were applied

@Wovchena Wovchena added this pull request to the merge queue Jul 26, 2024
@andrei-kochin andrei-kochin removed this pull request from the merge queue due to a manual request Jul 26, 2024
@andrei-kochin andrei-kochin merged commit 102f00a into openvinotoolkit:releases/2024/3 Jul 26, 2024
26 of 27 checks passed
@pavel-esir pavel-esir deleted the add_perf_counters branch July 29, 2024 07:10
@ilya-lavrenov ilya-lavrenov self-assigned this Jul 31, 2024