OV Performance Hints (CPU and GPU logic for selecting the actual configs), while AUTO/MULTI are passing them thru) #6993

Merged: 43 commits, Sep 13, 2021

Commits
6fa8ba2
rebasing the perf-modes-2021.3 to the 2021.4
myshevts Jul 1, 2021
279897a
overriding streams (to force the TPUT mode to the DLBenchnark)
myshevts Jul 1, 2021
c491513
disabling reducing #streams to fully mimic baseline c4df94d42d90a2bc3…
myshevts Jul 1, 2021
291c7be
clang/identation
myshevts Jul 1, 2021
f87a06a
splitting the Transformation to general and CPU specific.
myshevts Jul 5, 2021
4bb9e3a
disabling GRU/LSTM/TI + reducing of streams + 5D considered compute-l…
myshevts Jul 5, 2021
77c8d2d
refactored to avoid compute_limited_ratio, reverted the reducing #str…
myshevts Jul 7, 2021
6740211
isa-based threshold logic
myshevts Jul 13, 2021
2e53ea2
mode->hint
myshevts Jul 14, 2021
19fb1a3
optional PERFORMANCE_HINT_NUM_REQUESTS
myshevts Jul 15, 2021
e39a5d0
moving the perfHints to the common OV config class + initial tests (C…
myshevts Jul 16, 2021
b682c2c
AUTO support for PerfHints
myshevts Aug 9, 2021
af2a649
MULTI support for PerfHints
myshevts Aug 9, 2021
3192d88
Enabling Perf hints for the GPU plugin
myshevts Aug 9, 2021
32dba28
brushing settings output a bit
myshevts Aug 9, 2021
0444c63
disabling "throughput" perf hint being default (until OV 2.0)
myshevts Aug 9, 2021
b35be3d
uncommenting the logic which was disabled to force the DLBenchmark to…
myshevts Aug 9, 2021
8326811
removing dead and experimental code, and debug printfs
myshevts Aug 9, 2021
d54e768
clang/code-style
myshevts Aug 10, 2021
b8fcac3
code-review remarks
myshevts Aug 13, 2021
9c38dd7
Moved the output of the actual params that the hint produced to the r…
myshevts Aug 13, 2021
0ab4c41
aligning MULTI's GetConfig beh to HETERO's as captured in the preso (…
myshevts Aug 13, 2021
b668094
clang
myshevts Aug 13, 2021
b164db1
benchmark_app brushing
myshevts Aug 18, 2021
6f97186
Update inference-engine/samples/benchmark_app/README.md
myshevts Aug 18, 2021
1b12d6c
Update inference-engine/samples/benchmark_app/README.md
myshevts Aug 18, 2021
da50156
Update inference-engine/samples/benchmark_app/README.md
myshevts Aug 18, 2021
40d87a2
Update inference-engine/samples/benchmark_app/README.md
myshevts Aug 18, 2021
6f4892d
Update inference-engine/samples/benchmark_app/README.md
myshevts Aug 18, 2021
91e725c
Update inference-engine/samples/benchmark_app/README.md
myshevts Aug 18, 2021
2487cdb
Update inference-engine/samples/benchmark_app/README.md
myshevts Aug 18, 2021
bd87e13
Update inference-engine/samples/benchmark_app/benchmark_app.hpp
myshevts Aug 18, 2021
48bded4
Update inference-engine/samples/benchmark_app/benchmark_app.hpp
myshevts Aug 18, 2021
faed6b5
Update inference-engine/samples/benchmark_app/README.md
myshevts Aug 18, 2021
936ab38
propagating the perf hints thru one more scenario in the merged AUTO-…
myshevts Aug 18, 2021
0ad4908
Merge remote-tracking branch 'github/master' into perf-hints-master
myshevts Sep 3, 2021
cdc0655
Merge remote-tracking branch 'github/master' into perf-hints-master
myshevts Sep 6, 2021
299768b
fixed mispint
myshevts Sep 6, 2021
3796c94
Python benchmark_app update for perf hints
myshevts Sep 7, 2021
f02f9bf
addresssing reviewers comments on the python benchmark_app
myshevts Sep 8, 2021
11be133
simplifying/brushing logic a bit
myshevts Sep 8, 2021
6af6358
refactor the heuristic to the separate file (to be shared with iGPU s…
myshevts Sep 10, 2021
6dd369b
refactor conversion of modes to the specific GPU config per feedback …
myshevts Sep 10, 2021
38 changes: 23 additions & 15 deletions inference-engine/samples/benchmark_app/README.md
@@ -1,6 +1,7 @@
# Benchmark C++ Tool {#openvino_inference_engine_samples_benchmark_app_README}

This topic demonstrates how to use the Benchmark C++ Tool to estimate deep learning inference performance on supported devices. Performance can be measured for two inference modes: synchronous (latency-oriented) and asynchronous (throughput-oriented).
This topic demonstrates how to use the Benchmark C++ Tool to estimate deep learning inference performance on supported devices.
Performance can be measured for two inference modes: latency- and throughput-oriented.

> **NOTE:** This topic describes usage of C++ implementation of the Benchmark Tool. For the Python* implementation, refer to [Benchmark Python* Tool](../../../tools/benchmark_tool/README.md).

@@ -13,27 +14,29 @@ This topic demonstrates how to use the Benchmark C++ Tool to estimate deep learn

## How It Works

Upon start-up, the application reads command-line parameters and loads a network and images/binary files to the Inference Engine plugin, which is chosen depending on a specified device. The number of infer requests and execution approach depend on the mode defined with the `-api` command-line parameter.
Upon start-up, the application reads command-line parameters and loads a network and inputs (images/binary files) to the specified device.

> **NOTE**: By default, Inference Engine samples, tools and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md).
**NOTE**: By default, Inference Engine samples, tools and demos expect input with BGR channels order.
If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application
or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified.
For more information about the argument, refer to **When to Reverse Input Channels** section of
[Converting a Model Using General Conversion Parameters](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md).

If you run the application in the synchronous mode, it creates one infer request and executes the `Infer` method.
If you run the application in the asynchronous mode, it creates as many infer requests as specified in the `-nireq` command-line parameter and executes the `StartAsync` method for each of them. If `-nireq` is not set, the application will use the default value for specified device.
Device-specific execution parameters (number of streams, threads, and so on) can be either explicitly specified through the command line
or left default. In the latter case, the sample logic selects the values for optimal throughput.
While experimenting with individual parameters lets you find the performance sweet spot, the parameters are usually not performance-portable,
so the values from one machine or device are not necessarily optimal for another.
From this perspective, the most portable approach is to experiment only with the performance hints. To learn more, refer to the section on the command-line parameters below.

A number of execution steps is defined by one of the following parameters:
* Number of iterations specified with the `-niter` command-line argument
* Time duration specified with the `-t` command-line argument
* Both of them (execution will continue until both conditions are met)
* Predefined duration if `-niter` and `-t` are not specified. Predefined duration value depends on a device.

During the execution, the application collects latency for each executed infer request.

Reported latency value is calculated as a median value of all collected latencies. Reported throughput value is reported
in frames per second (FPS) and calculated as a derivative from:
* Reported latency in the Sync mode
* The total execution time in the Async mode

Throughput value also depends on batch size.
During the execution, the application calculates latency (if applicable) and overall throughput:
* By default, the median latency value is reported
* Throughput is calculated as number_of_processed_requests/overall_inference_time. Note that the throughput value also depends on batch size.
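
The latency and throughput reporting described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names `median_latency_ms` and `throughput_fps` are hypothetical, and the sketch assumes that throughput in FPS is the number of processed frames (requests times batch size) divided by the total execution time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of the collected per-request latencies, in milliseconds.
// The vector is taken by value because std::nth_element reorders it.
double median_latency_ms(std::vector<double> latencies) {
    assert(!latencies.empty());
    const std::size_t mid = latencies.size() / 2;
    std::nth_element(latencies.begin(), latencies.begin() + mid, latencies.end());
    const double upper = latencies[mid];
    if (latencies.size() % 2 == 1)
        return upper;
    // Even count: average the two middle elements.
    const double lower = *std::max_element(latencies.begin(), latencies.begin() + mid);
    return (lower + upper) / 2.0;
}

// Throughput in frames per second (hypothetical helper):
// processed frames divided by wall-clock time.
double throughput_fps(std::size_t processed_requests, std::size_t batch_size, double total_time_ms) {
    assert(total_time_ms > 0.0);
    return 1000.0 * static_cast<double>(processed_requests * batch_size) / total_time_ms;
}
```

For example, 100 requests with batch size 2 completed in 1000 ms give 200 FPS under this definition, which is why the reported throughput scales with the batch size.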

The application also collects per-layer Performance Measurement (PM) counters for each executed infer request if you
enable statistics dumping by setting the `-report_type` parameter to one of the possible values:
@@ -57,7 +60,7 @@ Note that the benchmark_app usually produces optimal performance for any device
./benchmark_app -m <model> -i <input> -d CPU
```

But it is still may be non-optimal for some cases, especially for very small networks. More details can read in [Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md).
But it is still may be sub-optimal for some cases, especially for very small networks. More details can read in [Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md).

Suggested change
But it is still may be sub-optimal for some cases, especially for very small networks. More details can read in [Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md).
But it may still be sub-optimal for some cases, especially for very small networks. For more details, refer to [Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md).


As explained in the [Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md) section, for all devices, including new [MULTI device](../../../docs/IE_DG/supported_plugins/MULTI.md) it is preferable to use the FP16 IR for the model.
Also if latency of the CPU inference on the multi-socket machines is of concern, please refer to the same
@@ -84,7 +87,12 @@ Options:
-l "<absolute_path>" Required for CPU custom layers. Absolute path to a shared library with the kernels implementations.
Or
-c "<absolute_path>" Required for GPU custom kernels. Absolute path to an .xml file with the kernels description.
-api "<sync/async>" Optional. Enable Sync/Async API. Default value is "async".
-hint "<throughput(or just 'tput')/latency>"
Optional. Performance hint (optimize for latency or throughput).
The hint allows the OpenVINO device to select the right network-specific settings,
as opposed to just accepting specific values from the sample command line.
So you can specify only the hint without setting explicit 'nstreams' or other device-specific options.
-api "<sync/async>" Optional (deprecated). Enable Sync/Async API. Default value is "async".
-niter "<integer>" Optional. Number of iterations. If not specified, the number of iterations is calculated depending on a device.
-nireq "<integer>" Optional. Number of infer requests. Default value is determined automatically for a device.
-b "<integer>" Optional. Batch size value. If not specified, the batch size value is determined from Intermediate Representation.
12 changes: 11 additions & 1 deletion inference-engine/samples/benchmark_app/benchmark_app.hpp
@@ -22,8 +22,15 @@ static const char model_message[] =
"Required. Path to an .xml/.onnx file with a trained model or to a .blob files with "
"a trained compiled model.";

/// @brief message for performance hint
static const char hint_message[] =
"Optional. Performance hint (optimize for latency or throughput). "
"The hint allows the OpenVINO device to select the right network-specific settings, "
"as opposed to just accepting specific values from the sample command line. "
"So you can specify only the hint without setting explicit 'nstreams' or other device-specific options";

/// @brief message for execution mode
static const char api_message[] = "Optional. Enable Sync/Async API. Default value is \"async\".";
static const char api_message[] = "Optional (deprecated). Enable Sync/Async API. Default value is \"async\".";

/// @brief message for assigning cnn calculation to device
static const char target_device_message[] =
@@ -193,6 +200,9 @@ DEFINE_string(i, "", input_message);
/// It is a required parameter
DEFINE_string(m, "", model_message);

/// @brief Define performance hint
DEFINE_string(hint, "", hint_message);

/// @brief Define execution mode
DEFINE_string(api, "async", api_message);

37 changes: 34 additions & 3 deletions inference-engine/samples/benchmark_app/main.cpp
@@ -59,7 +59,10 @@ bool ParseAndCheckCommandLine(int argc, char* argv[]) {
if (FLAGS_api != "async" && FLAGS_api != "sync") {
throw std::logic_error("Incorrect API. Please set -api option to `sync` or `async` value.");
}

if (!FLAGS_hint.empty() && FLAGS_hint != "throughput" && FLAGS_hint != "tput" && FLAGS_hint != "latency") {
throw std::logic_error("Incorrect performance hint. Please set -hint option to "
"either `throughput`(tput) or `latency` value.");
}
if (!FLAGS_report_type.empty() && FLAGS_report_type != noCntReport && FLAGS_report_type != averageCntReport &&
FLAGS_report_type != detailedCntReport) {
std::string err = "only " + std::string(noCntReport) + "/" + std::string(averageCntReport) + "/" +
@@ -208,6 +211,11 @@ int main(int argc, char* argv[]) {
// ----------------- 3. Setting device configuration
// -----------------------------------------------------------
next_step();
std::string ov_perf_hint;
if (FLAGS_hint == "throughput" || FLAGS_hint == "tput")
ov_perf_hint = CONFIG_VALUE(THROUGHPUT);
else if (FLAGS_hint == "latency")
ov_perf_hint = CONFIG_VALUE(LATENCY);

bool perf_counts = false;
// Update config per device according to command line parameters
@@ -219,6 +227,13 @@ int main(int argc, char* argv[]) {
config[device] = {};
std::map<std::string, std::string>& device_config = config.at(device);

// high-level performance modes
if (!ov_perf_hint.empty()) {
device_config[CONFIG_KEY(PERFORMANCE_HINT)] = ov_perf_hint;
if (FLAGS_nireq != 0)
device_config[CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS)] = std::to_string(FLAGS_nireq);
}
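
The flag handling in this file (validate `-hint`, map it to a config value, then add it and the optional request count to the per-device config) can be sketched without the Inference Engine headers. This is a standalone illustration under stated assumptions: `parse_hint` and `make_device_config` are hypothetical helpers, and the string literals are assumed to match what `CONFIG_KEY(...)` and `CONFIG_VALUE(...)` expand to.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Assumed string values behind CONFIG_KEY(PERFORMANCE_HINT) and
// CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS).
const std::string kPerformanceHint = "PERFORMANCE_HINT";
const std::string kPerformanceHintNumRequests = "PERFORMANCE_HINT_NUM_REQUESTS";

// Maps the -hint flag to a config value; an empty flag means "no hint".
std::string parse_hint(const std::string& flag) {
    if (flag.empty())
        return "";
    if (flag == "throughput" || flag == "tput")
        return "THROUGHPUT";
    if (flag == "latency")
        return "LATENCY";
    throw std::logic_error("Incorrect performance hint: " + flag);
}

// Builds the hint-related part of a device config (hypothetical helper).
// nireq == 0 means the user did not request a specific number of requests.
std::map<std::string, std::string> make_device_config(const std::string& hint_flag, unsigned nireq) {
    std::map<std::string, std::string> device_config;
    const std::string hint = parse_hint(hint_flag);
    assert(hint.empty() || hint == "THROUGHPUT" || hint == "LATENCY");
    if (!hint.empty()) {
        device_config[kPerformanceHint] = hint;
        if (nireq != 0)
            device_config[kPerformanceHintNumRequests] = std::to_string(nireq);
    }
    return device_config;
}
```

Note that the request count is only forwarded when a hint is present, which mirrors the nesting of the two `if` statements above.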

// Set performance counter
if (isFlagSetInCommandLine("pc")) {
// set to user defined value
@@ -241,6 +256,7 @@ int main(int argc, char* argv[]) {
}
perf_counts = (device_config.at(CONFIG_KEY(PERF_COUNT)) == CONFIG_VALUE(YES)) ? true : perf_counts;

// the rest are individual per-device settings (overriding the values set with perf modes)
auto setThroughputStreams = [&]() {
const std::string key = device + "_THROUGHPUT_STREAMS";
if (device_nstreams.count(device)) {
@@ -255,7 +271,7 @@ int main(int argc, char* argv[]) {
" or via configuration file.");
}
device_config[key] = device_nstreams.at(device);
} else if (!device_config.count(key) && (FLAGS_api == "async")) {
} else if (ov_perf_hint.empty() && !device_config.count(key) && (FLAGS_api == "async")) {
slog::warn << "-nstreams default value is determined automatically for " << device
<< " device. "
"Although the automatic selection usually provides a "
@@ -484,9 +500,24 @@ int main(int argc, char* argv[]) {
batchSize = 1;
}
}
// ----------------- 8. Setting optimal runtime parameters
// ----------------- 8. Querying optimal runtime parameters
// -----------------------------------------------------
next_step();
// output of the actual settings that the device selected based on the hint
if (!ov_perf_hint.empty()) {
for (const auto& device : devices) {
std::vector<std::string> supported_config_keys =
ie.GetMetric(device, METRIC_KEY(SUPPORTED_CONFIG_KEYS));
slog::info << "Device: " << device << slog::endl;
for (const auto& cfg : supported_config_keys) {
try {
slog::info << " {" << cfg << " , " << exeNetwork.GetConfig(cfg).as<std::string>();
} catch (...) {
};
slog::info << " }" << slog::endl;
}
}
}

// Update number of streams
for (auto&& ds : device_nstreams) {
9 changes: 7 additions & 2 deletions inference-engine/src/cldnn_engine/cldnn_config.cpp
@@ -46,8 +46,10 @@ void Config::UpdateFromMap(const std::map<std::string, std::string>& configMap)
for (auto& kvp : configMap) {
std::string key = kvp.first;
std::string val = kvp.second;

if (key.compare(PluginConfigParams::KEY_PERF_COUNT) == 0) {
const auto hints = perfHintsConfig.SupportedKeys();
if (hints.end() != std::find(hints.begin(), hints.end(), key)) {
perfHintsConfig.SetConfig(key, val);
} else if (key.compare(PluginConfigParams::KEY_PERF_COUNT) == 0) {
if (val.compare(PluginConfigParams::YES) == 0) {
useProfiling = true;
} else if (val.compare(PluginConfigParams::NO) == 0) {
@@ -341,6 +343,9 @@ void Config::adjustKeyMapValues() {
key_config_map[GPUConfigParams::KEY_GPU_ENABLE_LOOP_UNROLLING] = PluginConfigParams::YES;
else
key_config_map[GPUConfigParams::KEY_GPU_ENABLE_LOOP_UNROLLING] = PluginConfigParams::NO;
key_config_map.insert({ PluginConfigParams::KEY_PERFORMANCE_HINT, perfHintsConfig.ovPerfHint });
key_config_map.insert({ PluginConfigParams::KEY_PERFORMANCE_HINT_NUM_REQUESTS,
std::to_string(perfHintsConfig.ovPerfHintNumRequests) });
}
IE_SUPPRESS_DEPRECATED_END
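
The routing added to `UpdateFromMap` (hint keys go to a dedicated hints object, everything else falls through to the plugin's own key handling) and the later export through the key map can be sketched in isolation. Everything here is an assumption for illustration: `HintStore` is a hypothetical stand-in for `InferenceEngine::PerfHintsConfig` whose real validation may differ, `PluginConfig` is a reduced stand-in for the GPU `Config`, and the `PERF_COUNT` handling is simplified to a single boolean.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the shared performance-hints config class.
struct HintStore {
    std::string hint;      // "" when no hint was set
    int num_requests = 0;  // 0 when not limited

    static std::vector<std::string> supported_keys() {
        return {"PERFORMANCE_HINT", "PERFORMANCE_HINT_NUM_REQUESTS"};
    }

    void set(const std::string& key, const std::string& value) {
        if (key == "PERFORMANCE_HINT") {
            if (value != "LATENCY" && value != "THROUGHPUT")
                throw std::invalid_argument("Unsupported hint value: " + value);
            hint = value;
        } else if (key == "PERFORMANCE_HINT_NUM_REQUESTS") {
            num_requests = std::stoi(value);
        }
    }
};

// Reduced stand-in for the plugin config: hints plus one ordinary key.
struct PluginConfig {
    HintStore hints;
    bool use_profiling = false;

    void update_from_map(const std::map<std::string, std::string>& config_map) {
        // Queried once here; the diff above queries it per key.
        const auto hint_keys = HintStore::supported_keys();
        for (const auto& kv : config_map) {
            if (std::find(hint_keys.begin(), hint_keys.end(), kv.first) != hint_keys.end()) {
                hints.set(kv.first, kv.second);
            } else if (kv.first == "PERF_COUNT") {
                use_profiling = (kv.second == "YES");
            } else {
                throw std::invalid_argument("Unsupported key: " + kv.first);
            }
        }
    }

    // Exports the hint values so they can be read back via GetConfig-style queries.
    std::map<std::string, std::string> key_config_map() const {
        return {{"PERFORMANCE_HINT", hints.hint},
                {"PERFORMANCE_HINT_NUM_REQUESTS", std::to_string(hints.num_requests)}};
    }
};
```

Checking the hint keys first keeps the hint handling in one shared place, so each plugin only needs the dispatch branch rather than its own parsing.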

3 changes: 2 additions & 1 deletion inference-engine/src/cldnn_engine/cldnn_config.h
@@ -8,7 +8,7 @@
#include <string>

#include "cldnn_custom_layer.h"

#include <ie_performance_hints.hpp>
#include <cldnn/graph/network.hpp>

namespace CLDNNPlugin {
@@ -62,6 +62,7 @@ struct Config {
bool enable_loop_unrolling;

std::map<std::string, std::string> key_config_map;
InferenceEngine::PerfHintsConfig perfHintsConfig;
};

} // namespace CLDNNPlugin
32 changes: 30 additions & 2 deletions inference-engine/src/cldnn_engine/cldnn_engine.cpp
@@ -565,14 +565,40 @@ void clDNNEngine::UpdateConfig(CLDNNPlugin::Config& conf, const InferenceEngine:
}
}

std::map<std::string, std::string> clDNNEngine::ConvertPerfHintsToConfig(
const std::map<std::string, std::string>& network_config,
const CLDNNPlugin::Config& plugin_config) const {
// deduces the actual settings from the performance hints and returns fully-defined config
auto config = network_config;
const auto &mode = config.find(PluginConfigParams::KEY_PERFORMANCE_HINT);
// the mode may have just arrived to the LoadNetwork, or was set with the plugins' SetConfig
if (mode != config.end() || !plugin_config.perfHintsConfig.ovPerfHint.empty()) {
const auto mode_name = (mode != config.end())
? PerfHintsConfig::CheckPerformanceHintValue(mode->second)
: plugin_config.perfHintsConfig.ovPerfHint;
//checking streams (to avoid overriding what user might explicitly set in the incoming config or previously via SetConfig)
const auto streams = config.find(PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS);
if (streams == config.end() && !streamsSet) {
if (mode_name == CONFIG_VALUE(LATENCY)) {
config[PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS] = std::to_string(1);
} else if (mode_name == CONFIG_VALUE(THROUGHPUT)) {
config[PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS] = CONFIG_VALUE(GPU_THROUGHPUT_AUTO);
config[GPUConfigParams::KEY_GPU_PLUGIN_THROTTLE] = std::to_string(1);
}
}
}
return config;
}
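
The precedence rules in `ConvertPerfHintsToConfig` can be exercised in a standalone sketch. The function below is a hypothetical re-statement using plain strings: the literals are assumed to match the values behind the `PluginConfigParams` and `GPUConfigParams` constants, and the hint-value validation done by `CheckPerformanceHintValue` is omitted.

```cpp
#include <cassert>
#include <map>
#include <string>

using ConfigMap = std::map<std::string, std::string>;

// plugin_hint: hint previously stored via SetConfig ("" if none).
// streams_set_by_user: true if SetConfig already received an explicit stream count.
ConfigMap convert_hint_to_gpu_config(const ConfigMap& network_config,
                                     const std::string& plugin_hint,
                                     bool streams_set_by_user) {
    ConfigMap config = network_config;
    const auto mode = config.find("PERFORMANCE_HINT");
    if (mode == config.end() && plugin_hint.empty())
        return config;  // no hint anywhere: leave the config untouched

    // A hint passed to LoadNetwork takes priority over one set on the plugin.
    const std::string mode_name = (mode != config.end()) ? mode->second : plugin_hint;

    // Never override streams the user set explicitly, now or earlier.
    if (config.count("GPU_THROUGHPUT_STREAMS") != 0 || streams_set_by_user)
        return config;

    if (mode_name == "LATENCY") {
        config["GPU_THROUGHPUT_STREAMS"] = "1";
    } else if (mode_name == "THROUGHPUT") {
        config["GPU_THROUGHPUT_STREAMS"] = "GPU_THROUGHPUT_AUTO";
        config["GPU_PLUGIN_THROTTLE"] = "1";
    }
    return config;
}
```

The key design point is that the hint only fills in values the user left unspecified, so an explicit `GPU_THROUGHPUT_STREAMS` setting always wins over whatever the hint would select.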

IExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEngine::CNNNetwork &network,
const std::map<std::string, std::string> &config) {
const std::map<std::string, std::string> &orig_config) {
OV_ITT_SCOPED_TASK(itt::domains::CLDNNPlugin, "clDNNEngine::LoadExeNetworkImpl");
// verification of supported input
InferenceEngine::InputsDataMap _networkInputs = network.getInputsInfo();
check_inputs(_networkInputs);

CLDNNPlugin::Config conf = _impl->m_config;
auto config = ConvertPerfHintsToConfig(orig_config, conf);
UpdateConfig(conf, network, config);

CLDNNRemoteCLContext::Ptr context;
@@ -618,7 +644,7 @@ IExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceE

IExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEngine::CNNNetwork &network,
const IRemoteContext::Ptr &context,
const std::map<std::string, std::string> &config) {
const std::map<std::string, std::string> &orig_config) {
InferenceEngine::InputsDataMap _networkInputs = network.getInputsInfo();
check_inputs(_networkInputs);

@@ -628,6 +654,7 @@ IExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceE
}

CLDNNPlugin::Config conf = getContextImpl(casted)->GetConfig();
auto config = ConvertPerfHintsToConfig(orig_config, conf);
UpdateConfig(conf, network, config);

auto transformedNetwork = CloneAndTransformNetwork(network, conf);
@@ -659,6 +686,7 @@ IRemoteContext::Ptr clDNNEngine::GetDefaultContext(const ParamMap& params) {
}

void clDNNEngine::SetConfig(const std::map<std::string, std::string> &config) {
streamsSet = (config.find(PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS) != config.end());
_impl->m_config.UpdateFromMap(config);
}

4 changes: 4 additions & 0 deletions inference-engine/src/cldnn_engine/cldnn_engine.h
@@ -20,6 +20,7 @@ class clDNNEngine : public InferenceEngine::IInferencePlugin,
public InferenceEngine::gpu::details::param_map_obj_getter {
struct impl;
std::shared_ptr<impl> _impl;
bool streamsSet = false;

// key: device_id, value: cldnn device
std::map<std::string, cldnn::device::ptr> device_map;
@@ -31,6 +32,9 @@ class clDNNEngine : public InferenceEngine::IInferencePlugin,
InferenceEngine::CNNNetwork CloneAndTransformNetwork(const InferenceEngine::CNNNetwork& network,
const CLDNNPlugin::Config& config) const;

std::map<std::string, std::string> ConvertPerfHintsToConfig(const std::map<std::string, std::string>& network_config,
const CLDNNPlugin::Config& plugin_config) const;

void RegisterPrimitives();
void UpdateConfig(Config& conf, const InferenceEngine::CNNNetwork &network, const std::map<std::string, std::string> &params) const;
public:
@@ -34,11 +34,12 @@ namespace CLDNNPlugin {

CLDNNExecNetwork::CLDNNExecNetwork(InferenceEngine::CNNNetwork &network, std::shared_ptr<IRemoteContext> context, Config config) :
InferenceEngine::ExecutableNetworkThreadSafeDefault{[&]()->InferenceEngine::ITaskExecutor::Ptr {
if (config.throughput_streams > 1) {
if (config.exclusiveAsyncRequests) {
//exclusiveAsyncRequests essentially disables the streams (and hence should be checked first) => aligned with the CPU behavior
return ExecutorManager::getInstance()->getExecutor("GPU");
} else if (config.throughput_streams > 1) {
return std::make_shared<InferenceEngine::CPUStreamsExecutor>(
IStreamsExecutor::Config{"CLDNNPlugin executor", config.throughput_streams});
} else if (config.exclusiveAsyncRequests) {
return ExecutorManager::getInstance()->getExecutor("GPU");
} else {
return std::make_shared<InferenceEngine::CPUStreamsExecutor>(
IStreamsExecutor::Config{"CLDNNPlugin executor", 1});
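
The reordered executor selection in this hunk can be summarized as a small decision function. This is a sketch only: `ExecutorKind` and `select_executor` are hypothetical names, and the real code constructs executor objects rather than returning an enum.

```cpp
#include <cassert>

// Hypothetical labels for the three executors chosen in the constructor above.
enum class ExecutorKind {
    SharedExclusive,  // the shared "GPU" executor from ExecutorManager
    MultiStream,      // streams executor with throughput_streams streams
    SingleStream      // streams executor with one stream
};

// After this change, exclusiveAsyncRequests is checked before the stream count,
// so it effectively disables multiple streams.
ExecutorKind select_executor(bool exclusive_async_requests, unsigned throughput_streams) {
    if (exclusive_async_requests)
        return ExecutorKind::SharedExclusive;
    if (throughput_streams > 1)
        return ExecutorKind::MultiStream;
    return ExecutorKind::SingleStream;
}
```

With the previous ordering, a config with both `exclusiveAsyncRequests` set and more than one stream would have produced the multi-stream executor; with the new ordering it gets the shared exclusive one.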