add generation time metrics #613
Conversation
…loop for greedy sampling (openvinotoolkit#607): searching for the max element in a custom loop gives better performance than using std::max_element.
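For context, a hedged sketch of the technique that commit refers to (the function name and types are illustrative, not taken from the PR):

```cpp
#include <cstddef>
#include <vector>

// Illustrative argmax over logits: a plain indexed loop like this often beats
// std::max_element for large float arrays, since it avoids iterator overhead
// and is easy for compilers to vectorize.
size_t argmax(const std::vector<float>& logits) {
    size_t best_index = 0;
    float best_value = logits[0];  // assumes logits is non-empty
    for (size_t i = 1; i < logits.size(); ++i) {
        if (logits[i] > best_value) {
            best_value = logits[i];
            best_index = i;
        }
    }
    return best_index;
}
```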
src/cpp/src/perf_metrics.cpp
Outdated
namespace genai {

float PerfMetrics::get_duration_ms(std::chrono::steady_clock::duration duration) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(duration).count();
}
Don't cast the duration until you really need the value (for printing); return the duration itself to preserve the best accuracy. When you divide a duration, change its representation to float or double, as in https://github.com/openvinotoolkit/openvino/blob/ffc135cb1240831411799bdb82ecac352c956f22/samples/cpp/benchmark/throughput_benchmark/main.cpp#L19. But your implementation needs an extra step: when the mean is computed, keep using the source units (most likely nanoseconds, but that's unspecified, so you can't rely on it) with a float or double representation. Cast the duration to ms or any other suitable unit only to call count() and print.
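A minimal sketch of that advice (the Ms alias and the timed workload are illustrative, not the PR's code): keep the clock's native duration, switch to a double representation for the division, and cast to a printable unit only at the end.

```cpp
#include <chrono>
#include <iostream>
#include <thread>

using Ms = std::chrono::duration<double, std::ratio<1, 1000>>;

int main() {
    auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(5));  // work being timed
    auto total = std::chrono::steady_clock::now() - start;      // native units, no cast yet

    const int iterations = 10;
    // Divide in the clock's source units but with a double representation,
    // so nothing is lost to integer truncation.
    std::chrono::duration<double, std::chrono::steady_clock::duration::period> mean = total;
    mean /= iterations;

    std::cout << "mean: " << Ms(mean).count() << " ms\n";  // cast only to print
}
```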
Done. Durations are now stored in microseconds (std::chrono::duration<float, std::ratio<1, 1000000>>) for better accuracy. If I store them in milliseconds, tokenization/detokenization times can sometimes round to 0; with microseconds they don't. I convert them only when the mean/std are calculated.
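Roughly what that interim storage looks like (the alias and function are illustrative; the next comment revises this approach):

```cpp
#include <chrono>

// Float representation with microsecond ticks: short tokenization times keep
// a nonzero fractional value instead of rounding down to 0 ms.
using MicroSeconds = std::chrono::duration<float, std::ratio<1, 1000000>>;

MicroSeconds time_tokenization() {
    auto start = std::chrono::steady_clock::now();
    // ... tokenize ...
    return std::chrono::steady_clock::now() - start;  // implicit cast to float microseconds
}
```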
You shouldn't store the duration in any of the specified units. Store whatever time_point_one - time_point_zero returns. There's also a cast to an int representation in the constructors. For example:

auto start_time = std::chrono::steady_clock::now();
m_pimpl = std::make_unique<StatefulLLMPipeline>(request, tokenizer, generation_config);
auto stop_time = std::chrono::steady_clock::now();
m_pimpl->m_load_time_ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop_time - start_time).count();

This is bad in two ways: the first is described above; the second is the int representation used in the cast, which performs a truncating integer division (by 1,000,000 when the source is the usual nanoseconds, though that default is unspecified).
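A hedged sketch of the fix this suggests (the m_load_time field and the print site are assumptions, not the PR's final code, and the fragment is not self-contained): store the raw duration and convert only when printing.

```cpp
// Store steady_clock's native duration; nothing is truncated here.
auto start_time = std::chrono::steady_clock::now();
m_pimpl = std::make_unique<StatefulLLMPipeline>(request, tokenizer, generation_config);
auto stop_time = std::chrono::steady_clock::now();
m_pimpl->m_load_time = stop_time - start_time;  // std::chrono::steady_clock::duration

// Only at print time, cast with a double representation:
using Ms = std::chrono::duration<double, std::ratio<1, 1000>>;
std::cout << "load time: " << Ms(m_pimpl->m_load_time).count() << " ms\n";
```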
Now the PR is final. @Wovchena, please take a look.

It got a conflict.
@@ -196,6 +196,55 @@ int main(int argc, char* argv[]) {
}
```

### Performance Metrics
When it gets merged, please open another PR adding it to the C++ and Python docstrings.
Done #713
src/cpp/src/perf_metrics.cpp
Outdated
res.num_generated_tokens = num_generated_tokens + right.num_generated_tokens;
res.num_input_tokens = num_generated_tokens + right.num_input_tokens;
res.load_time = load_time;
res.evaluate_statistics();
evaluate_statistics() is called on every +. Given that this happens inside the benchmarking loop and most of the results are thrown away, it's worth providing a getter or a standalone function to do that job.
Added getters. Now, to get a value, the user calls perf_metrics.get_tokenization_duration().mean. If the statistics are fresh and already evaluated, the getter returns the cached values; if they are stale, it calls evaluate_statistics() first.
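A minimal sketch of that lazy-getter pattern (member names are illustrative, not the exact PR code): operator+ only merges raw samples and marks the statistics stale; the heavy computation runs at most once per change, on first access.

```cpp
struct MeanStdPair { float mean = 0.f; float std = 0.f; };

class PerfMetrics {
public:
    // Recomputes statistics only if raw samples changed since the last call.
    MeanStdPair get_tokenization_duration() {
        if (!m_evaluated) {
            evaluate_statistics();  // heavy work happens at most once per change
            m_evaluated = true;
        }
        return m_tokenization_duration;
    }

    PerfMetrics operator+(const PerfMetrics& right) const {
        PerfMetrics res = *this;
        // ... merge raw durations and token counts from `right` ...
        res.m_evaluated = false;  // defer evaluation to the next getter call
        return res;
    }

private:
    void evaluate_statistics() { /* compute m_tokenization_duration, etc. */ }

    MeanStdPair m_tokenization_duration;
    bool m_evaluated = false;
};
```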
Merged 102f00a into openvinotoolkit:releases/2024/3.
Sample to calculate and visualize performance metrics.
ticket: CVS-132859