OV Performance Hints (CPU and GPU logic for selecting the actual configs, while AUTO/MULTI are passing them thru) #6993

Merged: 43 commits, Sep 13, 2021.
Changes from 40 commits.

Commits (43)
6fa8ba2: rebasing the perf-modes-2021.3 to the 2021.4 (myshevts, Jul 1, 2021)
279897a: overriding streams (to force the TPUT mode to the DLBenchnark) (myshevts, Jul 1, 2021)
c491513: disabling reducing #streams to fully mimic baseline c4df94d42d90a2bc3… (myshevts, Jul 1, 2021)
291c7be: clang/identation (myshevts, Jul 1, 2021)
f87a06a: splitting the Transformation to general and CPU specific. (myshevts, Jul 5, 2021)
4bb9e3a: disabling GRU/LSTM/TI + reducing of streams + 5D considered compute-l… (myshevts, Jul 5, 2021)
77c8d2d: refactored to avoid compute_limited_ratio, reverted the reducing #str… (myshevts, Jul 7, 2021)
6740211: isa-based threshold logic (myshevts, Jul 13, 2021)
2e53ea2: mode->hint (myshevts, Jul 14, 2021)
19fb1a3: optional PERFORMANCE_HINT_NUM_REQUESTS (myshevts, Jul 15, 2021)
e39a5d0: moving the perfHints to the common OV config class + initial tests (C… (myshevts, Jul 16, 2021)
b682c2c: AUTO support for PerfHints (myshevts, Aug 9, 2021)
af2a649: MULTI support for PerfHints (myshevts, Aug 9, 2021)
3192d88: Enabling Perf hints for the GPU plugin (myshevts, Aug 9, 2021)
32dba28: brushing settings output a bit (myshevts, Aug 9, 2021)
0444c63: disabling "throughput" perf hint being default (until OV 2.0) (myshevts, Aug 9, 2021)
b35be3d: uncommenting the logic which was disabled to force the DLBenchmark to… (myshevts, Aug 9, 2021)
8326811: removing dead and experimental code, and debug printfs (myshevts, Aug 9, 2021)
d54e768: clang/code-style (myshevts, Aug 10, 2021)
b8fcac3: code-review remarks (myshevts, Aug 13, 2021)
9c38dd7: Moved the output of the actual params that the hint produced to the r… (myshevts, Aug 13, 2021)
0ab4c41: aligning MULTI's GetConfig beh to HETERO's as captured in the preso (… (myshevts, Aug 13, 2021)
b668094: clang (myshevts, Aug 13, 2021)
b164db1: benchmark_app brushing (myshevts, Aug 18, 2021)
6f97186: Update inference-engine/samples/benchmark_app/README.md (myshevts, Aug 18, 2021)
1b12d6c: Update inference-engine/samples/benchmark_app/README.md (myshevts, Aug 18, 2021)
da50156: Update inference-engine/samples/benchmark_app/README.md (myshevts, Aug 18, 2021)
40d87a2: Update inference-engine/samples/benchmark_app/README.md (myshevts, Aug 18, 2021)
6f4892d: Update inference-engine/samples/benchmark_app/README.md (myshevts, Aug 18, 2021)
91e725c: Update inference-engine/samples/benchmark_app/README.md (myshevts, Aug 18, 2021)
2487cdb: Update inference-engine/samples/benchmark_app/README.md (myshevts, Aug 18, 2021)
bd87e13: Update inference-engine/samples/benchmark_app/benchmark_app.hpp (myshevts, Aug 18, 2021)
48bded4: Update inference-engine/samples/benchmark_app/benchmark_app.hpp (myshevts, Aug 18, 2021)
faed6b5: Update inference-engine/samples/benchmark_app/README.md (myshevts, Aug 18, 2021)
936ab38: propagating the perf hints thru one more scenario in the merged AUTO-… (myshevts, Aug 18, 2021)
0ad4908: Merge remote-tracking branch 'github/master' into perf-hints-master (myshevts, Sep 3, 2021)
cdc0655: Merge remote-tracking branch 'github/master' into perf-hints-master (myshevts, Sep 6, 2021)
299768b: fixed mispint (myshevts, Sep 6, 2021)
3796c94: Python benchmark_app update for perf hints (myshevts, Sep 7, 2021)
f02f9bf: addresssing reviewers comments on the python benchmark_app (myshevts, Sep 8, 2021)
11be133: simplifying/brushing logic a bit (myshevts, Sep 8, 2021)
6af6358: refactor the heuristic to the separate file (to be shared with iGPU s… (myshevts, Sep 10, 2021)
6dd369b: refactor conversion of modes to the specific GPU config per feedback … (myshevts, Sep 10, 2021)
38 changes: 23 additions & 15 deletions inference-engine/samples/benchmark_app/README.md
@@ -1,6 +1,7 @@
# Benchmark C++ Tool {#openvino_inference_engine_samples_benchmark_app_README}

This topic demonstrates how to use the Benchmark C++ Tool to estimate deep learning inference performance on supported devices. Performance can be measured for two inference modes: synchronous (latency-oriented) and asynchronous (throughput-oriented).
This topic demonstrates how to use the Benchmark C++ Tool to estimate deep learning inference performance on supported devices.
Performance can be measured for two inference modes: latency- and throughput-oriented.

> **NOTE:** This topic describes usage of C++ implementation of the Benchmark Tool. For the Python* implementation, refer to [Benchmark Python* Tool](../../../tools/benchmark_tool/README.md).

@@ -13,27 +14,29 @@ This topic demonstrates how to use the Benchmark C++ Tool to estimate deep learn

## How It Works

Upon start-up, the application reads command-line parameters and loads a network and images/binary files to the Inference Engine plugin, which is chosen depending on a specified device. The number of infer requests and execution approach depend on the mode defined with the `-api` command-line parameter.
Upon start-up, the application reads command-line parameters and loads a network and inputs (images/binary files) to the specified device.

> **NOTE**: By default, Inference Engine samples, tools and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md).
**NOTE**: By default, Inference Engine samples, tools and demos expect input with BGR channels order.
If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application
or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified.
For more information about the argument, refer to **When to Reverse Input Channels** section of
[Converting a Model Using General Conversion Parameters](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md).

If you run the application in the synchronous mode, it creates one infer request and executes the `Infer` method.
If you run the application in the asynchronous mode, it creates as many infer requests as specified in the `-nireq` command-line parameter and executes the `StartAsync` method for each of them. If `-nireq` is not set, the application will use the default value for specified device.
Device-specific execution parameters (number of streams, threads, and so on) can be either explicitly specified through the command line
or left default. In the last case, the sample logic will select the values for the optimal throughput.
While experimenting with individual parameters allows to find the performance sweet spot, usually, the parameters are not very performance-portable,
so the values from one machine or device are not necessarily optimal for another.
From this perspective, the most portable way is experimenting only with the performance hints. To learn more, refer to the section on the command-line parameters below.

A number of execution steps is defined by one of the following parameters:
* Number of iterations specified with the `-niter` command-line argument
* Time duration specified with the `-t` command-line argument
* Both of them (execution will continue until both conditions are met)
* Predefined duration if `-niter` and `-t` are not specified. Predefined duration value depends on a device.

During the execution, the application collects latency for each executed infer request.

Reported latency value is calculated as a median value of all collected latencies. Reported throughput value is reported
in frames per second (FPS) and calculated as a derivative from:
* Reported latency in the Sync mode
* The total execution time in the Async mode

Throughput value also depends on batch size.
During the execution, the application calculates latency (if applicable) and overall throughput:
* By default, the median latency value is reported
* Throughput is calculated as overall_inference_time/number_of_processed_requests. Note that the throughput value also depends on batch size.

The application also collects per-layer Performance Measurement (PM) counters for each executed infer request if you
enable statistics dumping by setting the `-report_type` parameter to one of the possible values:
@@ -57,7 +60,7 @@ Note that the benchmark_app usually produces optimal performance for any device
./benchmark_app -m <model> -i <input> -d CPU
```

But it is still may be non-optimal for some cases, especially for very small networks. More details can read in [Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md).
But it is still may be sub-optimal for some cases, especially for very small networks. More details can read in [Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md).
Review comment (Contributor), suggested change:
from: But it is still may be sub-optimal for some cases, especially for very small networks. More details can read in [Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md).
to: But it is still may be sub-optimal for some cases, especially for very small networks. For more details, refer to [Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md).

As explained in the [Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md) section, for all devices, including new [MULTI device](../../../docs/IE_DG/supported_plugins/MULTI.md) it is preferable to use the FP16 IR for the model.
Also if latency of the CPU inference on the multi-socket machines is of concern, please refer to the same
Expand All @@ -84,7 +87,12 @@ Options:
-l "<absolute_path>" Required for CPU custom layers. Absolute path to a shared library with the kernels implementations.
Or
-c "<absolute_path>" Required for GPU custom kernels. Absolute path to an .xml file with the kernels description.
-api "<sync/async>" Optional. Enable Sync/Async API. Default value is "async".
-hint "<throughput(or just 'tput')/latency">
Optional. Performance hint (optimize for latency or throughput).
The hint allows the OpenVINO device to select the right network-specific settings,
as opposite to just accepting specific values from the sample command line.
So you can specify only the hint without setting explicit 'nstreams' or other device-specific options.
-api "<sync/async>" Optional (deprecated). Enable Sync/Async API. Default value is "async".
-niter "<integer>" Optional. Number of iterations. If not specified, the number of iterations is calculated depending on a device.
-nireq "<integer>" Optional. Number of infer requests. Default value is determined automatically for a device.
-b "<integer>" Optional. Batch size value. If not specified, the batch size value is determined from Intermediate Representation.
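For context on what the new `-hint` flag maps to: the same behavior is available to any application through the regular Inference Engine configuration keys this PR introduces. Below is a minimal sketch, assuming the 2021.4-style `InferenceEngine::Core` C++ API; the model path and the request count of 4 are placeholders, and the `OPTIMAL_NUMBER_OF_INFER_REQUESTS` metric is used here only to size the request pool the way benchmark_app does.

```cpp
#include <inference_engine.hpp>

#include <map>
#include <string>
#include <vector>

int main() {
    InferenceEngine::Core ie;
    auto network = ie.ReadNetwork("model.xml");  // placeholder model path

    // Ask the device to pick its own throughput-oriented settings; optionally tell it
    // how many requests the app will keep in flight (this is what -nireq feeds into).
    std::map<std::string, std::string> config = {
        {CONFIG_KEY(PERFORMANCE_HINT), CONFIG_VALUE(THROUGHPUT)},
        {CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS), "4"}};

    auto exeNetwork = ie.LoadNetwork(network, "GPU", config);

    // The device translates the hint into streams/threads internally; query how many
    // requests it suggests running in parallel and size the request pool accordingly.
    const auto nireq =
        exeNetwork.GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
    std::vector<InferenceEngine::InferRequest> requests;
    for (unsigned int i = 0; i < nireq; ++i)
        requests.push_back(exeNetwork.CreateInferRequest());
    return 0;
}
```

With the hint in place there is no need to pass device-specific options such as `-nstreams`, which is the point made in the README text above.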
12 changes: 11 additions & 1 deletion inference-engine/samples/benchmark_app/benchmark_app.hpp
@@ -22,8 +22,15 @@ static const char model_message[] =
"Required. Path to an .xml/.onnx file with a trained model or to a .blob files with "
"a trained compiled model.";

/// @brief message for performance hint
static const char hint_message[] =
"Optional. Performance hint (optimize for latency or throughput). "
"The hint allows the OpenVINO device to select the right network-specific settings,"
"as opposite to just accepting specific values from the sample command line."
"So you can specify only the hint without setting explicit 'nstreams' or other device-specific options";

/// @brief message for execution mode
static const char api_message[] = "Optional. Enable Sync/Async API. Default value is \"async\".";
static const char api_message[] = "Optional (deprecated). Enable Sync/Async API. Default value is \"async\".";

/// @brief message for assigning cnn calculation to device
static const char target_device_message[] =
@@ -193,6 +200,9 @@ DEFINE_string(i, "", input_message);
/// It is a required parameter
DEFINE_string(m, "", model_message);

/// @brief Define execution mode
DEFINE_string(hint, "", hint_message);

/// @brief Define execution mode
DEFINE_string(api, "async", api_message);

37 changes: 34 additions & 3 deletions inference-engine/samples/benchmark_app/main.cpp
@@ -59,7 +59,10 @@ bool ParseAndCheckCommandLine(int argc, char* argv[]) {
if (FLAGS_api != "async" && FLAGS_api != "sync") {
throw std::logic_error("Incorrect API. Please set -api option to `sync` or `async` value.");
}

if (!FLAGS_hint.empty() && FLAGS_hint != "throughput" && FLAGS_hint != "tput" && FLAGS_hint != "latency") {
throw std::logic_error("Incorrect performance hint. Please set -hint option to"
"either `throughput`(tput) or `latency' value.");
}
if (!FLAGS_report_type.empty() && FLAGS_report_type != noCntReport && FLAGS_report_type != averageCntReport &&
FLAGS_report_type != detailedCntReport) {
std::string err = "only " + std::string(noCntReport) + "/" + std::string(averageCntReport) + "/" +
@@ -208,6 +211,11 @@ int main(int argc, char* argv[]) {
// ----------------- 3. Setting device configuration
// -----------------------------------------------------------
next_step();
std::string ov_perf_hint;
if (FLAGS_hint == "throughput" || FLAGS_hint == "tput")
ov_perf_hint = CONFIG_VALUE(THROUGHPUT);
else if (FLAGS_hint == "latency")
ov_perf_hint = CONFIG_VALUE(LATENCY);

bool perf_counts = false;
// Update config per device according to command line parameters
@@ -219,6 +227,13 @@ int main(int argc, char* argv[]) {
config[device] = {};
std::map<std::string, std::string>& device_config = config.at(device);

// high-level performance modes
if (!ov_perf_hint.empty()) {
device_config[CONFIG_KEY(PERFORMANCE_HINT)] = ov_perf_hint;
if (FLAGS_nireq != 0)
device_config[CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS)] = std::to_string(FLAGS_nireq);
}

// Set performance counter
if (isFlagSetInCommandLine("pc")) {
// set to user defined value
@@ -241,6 +256,7 @@ int main(int argc, char* argv[]) {
}
perf_counts = (device_config.at(CONFIG_KEY(PERF_COUNT)) == CONFIG_VALUE(YES)) ? true : perf_counts;

// the rest are individual per-device settings (overriding the values set with perf modes)
auto setThroughputStreams = [&]() {
const std::string key = device + "_THROUGHPUT_STREAMS";
if (device_nstreams.count(device)) {
@@ -255,7 +271,7 @@ int main(int argc, char* argv[]) {
" or via configuration file.");
}
device_config[key] = device_nstreams.at(device);
} else if (!device_config.count(key) && (FLAGS_api == "async")) {
} else if (ov_perf_hint.empty() && !device_config.count(key) && (FLAGS_api == "async")) {
slog::warn << "-nstreams default value is determined automatically for " << device
<< " device. "
"Although the automatic selection usually provides a "
@@ -484,9 +500,24 @@ int main(int argc, char* argv[]) {
batchSize = 1;
}
}
// ----------------- 8. Setting optimal runtime parameters
// ----------------- 8. Querying optimal runtime parameters
// -----------------------------------------------------
next_step();
// output of the actual settings that the device selected based on the hint
if (!ov_perf_hint.empty()) {
for (const auto& device : devices) {
std::vector<std::string> supported_config_keys =
ie.GetMetric(device, METRIC_KEY(SUPPORTED_CONFIG_KEYS));
slog::info << "Device: " << device << slog::endl;
for (const auto& cfg : supported_config_keys) {
try {
slog::info << " {" << cfg << " , " << exeNetwork.GetConfig(cfg).as<std::string>();
} catch (...) {
};
slog::info << " }" << slog::endl;
}
}
}

// Update number of streams
for (auto&& ds : device_nstreams) {
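The "Querying optimal runtime parameters" step added above is not specific to benchmark_app; any application can read back what a hint was translated into. A sketch of the same pattern factored into a helper (the function name is illustrative; the `GetMetric`/`GetConfig` calls are the ones used in the diff):

```cpp
#include <inference_engine.hpp>

#include <iostream>
#include <string>
#include <vector>

// Print the configuration the device actually selected for a loaded network:
// enumerate the device's supported config keys and read each value back from
// the executable network, skipping values that cannot be converted to a string.
void PrintEffectiveConfig(InferenceEngine::Core& ie,
                          InferenceEngine::ExecutableNetwork& exeNetwork,
                          const std::string& device) {
    std::vector<std::string> keys = ie.GetMetric(device, METRIC_KEY(SUPPORTED_CONFIG_KEYS));
    std::cout << "Device: " << device << std::endl;
    for (const auto& key : keys) {
        try {
            std::cout << "  { " << key << " , "
                      << exeNetwork.GetConfig(key).as<std::string>() << " }" << std::endl;
        } catch (...) {
            // some values are not representable as strings; ignore them like the sample does
        }
    }
}
```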
9 changes: 7 additions & 2 deletions inference-engine/src/cldnn_engine/cldnn_config.cpp
@@ -46,8 +46,10 @@ void Config::UpdateFromMap(const std::map<std::string, std::string>& configMap)
for (auto& kvp : configMap) {
std::string key = kvp.first;
std::string val = kvp.second;

if (key.compare(PluginConfigParams::KEY_PERF_COUNT) == 0) {
const auto hints = perfHintsConfig.SupportedKeys();
if (hints.end() != std::find(hints.begin(), hints.end(), key)) {
perfHintsConfig.SetConfig(key, val);
} else if (key.compare(PluginConfigParams::KEY_PERF_COUNT) == 0) {
if (val.compare(PluginConfigParams::YES) == 0) {
useProfiling = true;
} else if (val.compare(PluginConfigParams::NO) == 0) {
@@ -341,6 +343,9 @@ void Config::adjustKeyMapValues() {
key_config_map[GPUConfigParams::KEY_GPU_ENABLE_LOOP_UNROLLING] = PluginConfigParams::YES;
else
key_config_map[GPUConfigParams::KEY_GPU_ENABLE_LOOP_UNROLLING] = PluginConfigParams::NO;
key_config_map.insert({ PluginConfigParams::KEY_PERFORMANCE_HINT, perfHintsConfig.ovPerfHint });
key_config_map.insert({ PluginConfigParams::KEY_PERFORMANCE_HINT_NUM_REQUESTS,
std::to_string(perfHintsConfig.ovPerfHintNumRequests) });
}
IE_SUPPRESS_DEPRECATED_END

3 changes: 2 additions & 1 deletion inference-engine/src/cldnn_engine/cldnn_config.h
@@ -8,7 +8,7 @@
#include <string>

#include "cldnn_custom_layer.h"

#include <ie_performance_hints.hpp>
#include <cldnn/graph/network.hpp>

namespace CLDNNPlugin {
@@ -62,6 +62,7 @@ struct Config {
bool enable_loop_unrolling;

std::map<std::string, std::string> key_config_map;
InferenceEngine::PerfHintsConfig perfHintsConfig;
};

} // namespace CLDNNPlugin
22 changes: 21 additions & 1 deletion inference-engine/src/cldnn_engine/cldnn_engine.cpp
@@ -566,13 +566,32 @@ void clDNNEngine::UpdateConfig(CLDNNPlugin::Config& conf, const InferenceEngine:
}

IExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEngine::CNNNetwork &network,
const std::map<std::string, std::string> &config) {
const std::map<std::string, std::string> &orig_config) {
OV_ITT_SCOPED_TASK(itt::domains::CLDNNPlugin, "clDNNEngine::LoadExeNetworkImpl");
// verification of supported input
InferenceEngine::InputsDataMap _networkInputs = network.getInputsInfo();
check_inputs(_networkInputs);

CLDNNPlugin::Config conf = _impl->m_config;
auto config = orig_config;
const auto& mode = config.find(PluginConfigParams::KEY_PERFORMANCE_HINT);
// the mode may have just arrived to the LoadNetwork, or was set with the plugins' SetConfig
if (mode != config.end() || !conf.perfHintsConfig.ovPerfHint.empty()) {
const auto mode_name = (mode != config.end())
? PerfHintsConfig::CheckPerformanceHintValue(mode->second)
: conf.perfHintsConfig.ovPerfHint;
//checking streams (to avoid overriding what user might explicitly set in the incoming config or previously via SetConfig)
const auto streams = config.find(PluginConfigParams::KEY_CPU_THROUGHPUT_STREAMS);
if (streams == config.end() && !streamsSet) {
if (mode_name == CONFIG_VALUE(LATENCY)) {
config[PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS] = std::to_string(1);
} else if (mode_name == CONFIG_VALUE(THROUGHPUT)) {
config[PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS] = CONFIG_VALUE(GPU_THROUGHPUT_AUTO);
config[GPUConfigParams::KEY_GPU_PLUGIN_THROTTLE] = std::to_string(1);
}
}
}

UpdateConfig(conf, network, config);

CLDNNRemoteCLContext::Ptr context;
@@ -659,6 +678,7 @@ IRemoteContext::Ptr clDNNEngine::GetDefaultContext(const ParamMap& params) {
}

void clDNNEngine::SetConfig(const std::map<std::string, std::string> &config) {
streamsSet = (config.find(PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS) != config.end());
_impl->m_config.UpdateFromMap(config);
}

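To summarize the GPU branch of the hint handling above: when neither the incoming config nor a previous `SetConfig` call has set the number of streams explicitly, the plugin derives a streams setting from the hint. A hedged sketch of that mapping as a free function (illustrative only, not part of the plugin API; `gpu/gpu_config.hpp` is assumed as the public header for `GPUConfigParams`), using the keys shown in the diff:

```cpp
#include <inference_engine.hpp>
#include <gpu/gpu_config.hpp>

#include <map>
#include <string>

// Roughly what clDNNEngine::LoadExeNetworkImpl derives from the performance hint
// when the user has not set the GPU streams key explicitly.
std::map<std::string, std::string> GpuConfigFromHint(const std::string& hint) {
    using namespace InferenceEngine;
    std::map<std::string, std::string> config;
    if (hint == CONFIG_VALUE(LATENCY)) {
        // a single stream keeps per-request latency low
        config[PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS] = "1";
    } else if (hint == CONFIG_VALUE(THROUGHPUT)) {
        // let the plugin choose the stream count, and set the queue throttle hint
        // the same way the throughput branch above does
        config[PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS] = CONFIG_VALUE(GPU_THROUGHPUT_AUTO);
        config[GPUConfigParams::KEY_GPU_PLUGIN_THROTTLE] = "1";
    }
    return config;
}
```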
1 change: 1 addition & 0 deletions inference-engine/src/cldnn_engine/cldnn_engine.h
@@ -20,6 +20,7 @@ class clDNNEngine : public InferenceEngine::IInferencePlugin,
public InferenceEngine::gpu::details::param_map_obj_getter {
struct impl;
std::shared_ptr<impl> _impl;
bool streamsSet = false;

// key: device_id, value: cldnn device
std::map<std::string, cldnn::device::ptr> device_map;
@@ -34,11 +34,12 @@ namespace CLDNNPlugin {

CLDNNExecNetwork::CLDNNExecNetwork(InferenceEngine::CNNNetwork &network, std::shared_ptr<IRemoteContext> context, Config config) :
InferenceEngine::ExecutableNetworkThreadSafeDefault{[&]()->InferenceEngine::ITaskExecutor::Ptr {
if (config.throughput_streams > 1) {
if (config.exclusiveAsyncRequests) {
//exclusiveAsyncRequests essentially disables the streams (and hence should be checked first) => aligned with the CPU behavior
return ExecutorManager::getInstance()->getExecutor("GPU");
} else if (config.throughput_streams > 1) {
return std::make_shared<InferenceEngine::CPUStreamsExecutor>(
IStreamsExecutor::Config{"CLDNNPlugin executor", config.throughput_streams});
} else if (config.exclusiveAsyncRequests) {
return ExecutorManager::getInstance()->getExecutor("GPU");
} else {
return std::make_shared<InferenceEngine::CPUStreamsExecutor>(
IStreamsExecutor::Config{"CLDNNPlugin executor", 1});
@@ -229,6 +229,21 @@ namespace PluginConfigParams {
#define CONFIG_VALUE(name) InferenceEngine::PluginConfigParams::name
#define DECLARE_CONFIG_VALUE(name) static constexpr auto name = #name

/**
* @brief High-level OpenVINO Performance Hints
* unlike low-level config keys that are individual (per-device), the hints are smth that every device accepts
* and turns into device-specific settings
*/
DECLARE_CONFIG_KEY(PERFORMANCE_HINT);
DECLARE_CONFIG_VALUE(LATENCY);
DECLARE_CONFIG_VALUE(THROUGHPUT);
/**
* @brief (Optional) config key that backs the (above) Performance Hints
* by giving additional information on how many inference requests the application will be keeping in flight
* usually this value comes from the actual use-case (e.g. number of video-cameras, or other sources of inputs)
*/
DECLARE_CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS);

/**
* @brief generic boolean values
*/
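As the GPU plugin change above notes, the hint "may have just arrived to the LoadNetwork, or was set with the plugins' SetConfig". A minimal sketch of the second route, assuming the 2021.4 `InferenceEngine::Core` API and a placeholder model path:

```cpp
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core ie;

    // Set the hint on the device up front; subsequent LoadNetwork calls for that
    // device inherit it unless an explicit per-network config overrides it.
    ie.SetConfig({{CONFIG_KEY(PERFORMANCE_HINT), CONFIG_VALUE(LATENCY)}}, "GPU");

    auto network = ie.ReadNetwork("model.xml");        // placeholder model path
    auto exeNetwork = ie.LoadNetwork(network, "GPU");   // picks up the LATENCY hint
    auto request = exeNetwork.CreateInferRequest();
    return 0;
}
```

This is also the path AUTO and MULTI rely on when they pass the hint through to the actual hardware plugins instead of interpreting it themselves.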