
Log model properties #677

Closed

Conversation

rasapala (Contributor)

Adds the possibility to get and log the compiled model properties.
Very useful for benchmarking and for comparing settings on different hardware.

Example output:
Loaded model configuration:
AFFINITY: CORE
CPU_DENORMALS_OPTIMIZATION: NO
CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1
DYNAMIC_QUANTIZATION_GROUP_SIZE: 32
ENABLE_CPU_PINNING: YES
ENABLE_HYPER_THREADING: NO
EXECUTION_DEVICES: CPU
EXECUTION_MODE_HINT: PERFORMANCE
INFERENCE_NUM_THREADS: 24
INFERENCE_PRECISION_HINT: f32
KV_CACHE_PRECISION: f16
LOG_LEVEL: LOG_NONE
MODEL_DISTRIBUTION_POLICY:
NETWORK_NAME: Model0
NUM_STREAMS: 1
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1
PERFORMANCE_HINT: LATENCY
PERFORMANCE_HINT_NUM_REQUESTS: 0
PERF_COUNT: YES
SCHEDULING_CORE_TYPE: ANY_CORE
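
For reference, output like the listing above can typically be produced by enumerating a compiled model's supported properties through the OpenVINO runtime API. A minimal standalone sketch (not the code from this PR) could look like this:

#include <iostream>
#include <openvino/openvino.hpp>

// Sketch: print every property reported by a compiled model as "NAME: value".
void print_compiled_model_properties(const ov::CompiledModel& compiled_model) {
    std::cout << "Loaded model configuration:" << std::endl;
    for (const auto& name : compiled_model.get_property(ov::supported_properties)) {
        if (name == ov::supported_properties.name())
            continue;  // skip the property list itself
        // get_property(name) returns an ov::Any; most values convert to string.
        std::cout << name << ": " << compiled_model.get_property(name).as<std::string>() << std::endl;
    }
}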

@andrei-kochin requested a review from Wovchena July 24, 2024 21:03
src/cpp/src/debug_utils.hpp (review thread, outdated, resolved)
src/cpp/src/continuous_batching_pipeline.cpp (review thread, outdated, resolved)
src/cpp/src/continuous_batching_pipeline.cpp (review thread, outdated, resolved)
@ilya-lavrenov self-assigned this Jul 31, 2024
@@ -56,6 +56,10 @@ class OPENVINO_GENAI_EXPORTS ContinuousBatchingPipeline {

PipelineMetrics get_metrics() const;

std::vector<std::string> get_model_configuration();

void print_model_configuration();

Contributor

I think that instead of adding a debug API to the public API, we can either:

  • Enable a compile-time CMake option to print some debug information.
  • Or do it via an environment variable. In this case we need to ensure that handling the debug info does not affect performance.

This could also be useful for the timers that currently profile each iteration / step. I vote for the CMake option with debug information.
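
For illustration, a rough sketch of the environment-variable variant discussed here; the variable name OPENVINO_GENAI_DEBUG is hypothetical, and the check is a single getenv() call, so the default path does no extra work:

#include <cstdlib>
#include <iostream>
#include <openvino/openvino.hpp>

// Hypothetical gate: dump the compiled model properties only when the
// (made-up) OPENVINO_GENAI_DEBUG environment variable is set.
void maybe_log_model_properties(const ov::CompiledModel& compiled_model) {
    if (std::getenv("OPENVINO_GENAI_DEBUG") == nullptr)
        return;
    for (const auto& name : compiled_model.get_property(ov::supported_properties)) {
        if (name == ov::supported_properties.name())
            continue;
        std::cout << name << ": " << compiled_model.get_property(name).as<std::string>() << "\n";
    }
}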

Contributor Author

I have moved the functionality to utils, since we want it every time we use cb_pipeline.

@ilya-lavrenov marked this pull request as draft August 7, 2024 07:39
@rasapala force-pushed the log_model_properties2 branch from 951d09e to 9a1b7a9 August 19, 2024 12:34
@rasapala marked this pull request as ready for review August 19, 2024 13:27
@@ -61,6 +61,8 @@ class OPENVINO_GENAI_EXPORTS ContinuousBatchingPipeline {
GenerationHandle add_request(uint64_t request_id, const ov::Tensor& input_ids, const ov::genai::GenerationConfig& sampling_params);
GenerationHandle add_request(uint64_t request_id, const std::string& prompt, const ov::genai::GenerationConfig& sampling_params);

std::string get_model_configuration_string();

Contributor

I'm against adding an API method for debug purposes.

Can we enable it via an env var?

Contributor Author

An env var would be problematic. Maybe we can add a flag in the constructor that is set to false by default?

Contributor

Why is it problematic? Once the model is compiled, we can dump this information based on the env var.

Contributor Author

Because we do not handle ENV variables in the OVMS runtime; that is an extra capability reserved only for cloud services, for example secrets or endpoints. We can add a graph LLM options parameter and then pass it to the constructor.

Collaborator

I think we should store the loaded model configuration and provide a getter for it.
In its current shape it indeed looks like a debug method, as the string it returns is not very useful except for printing.
I would change it to return a map with properties represented as {key, value}. This way the user would be able to conveniently check the model configuration and possibly make decisions in their own application based on the values of certain properties. It might also be useful for UX and monitoring purposes.
The full_log flag in the pipeline constructor is unnecessary in my opinion; I think we should store the configuration either way.
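
A minimal sketch of the map-based getter suggested here, written as a free helper function for illustration (the name collect_model_configuration is not the proposed API):

#include <map>
#include <string>
#include <openvino/openvino.hpp>

// Illustrative helper: collect the compiled model's properties as a
// {name -> stringified value} map so client code can check individual
// settings instead of parsing printed text.
std::map<std::string, std::string> collect_model_configuration(const ov::CompiledModel& compiled_model) {
    std::map<std::string, std::string> config;
    for (const auto& name : compiled_model.get_property(ov::supported_properties)) {
        if (name == ov::supported_properties.name())
            continue;
        config[name] = compiled_model.get_property(name).as<std::string>();
    }
    return config;
}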

Contributor

I would consider this method informative rather than debug-only. Its output, in the form of a vector of strings, can be used for various purposes on the client application side, just like the equivalent API in OV Runtime.
Introducing an extra env variable OV_CB_FULL_LOG to mute the output of the method is, in my opinion, totally useless. I suggest keeping the same output from get_model_configuration(); it is up to the client application to use it or not.
It will be used in OVMS to track the model configuration for monitoring and benchmarking purposes.
