diff --git a/.gitattributes b/.gitattributes index 4d1315ede4efec..de9c0b51d763cd 100644 --- a/.gitattributes +++ b/.gitattributes @@ -63,3 +63,9 @@ #*.PDF diff=astextplain #*.rtf diff=astextplain #*.RTF diff=astextplain + +*.PNG filter=lfs diff=lfs merge=lfs -text +*.png filter=lfs diff=lfs merge=lfs -text +*.jpg filter=lfs diff=lfs merge=lfs -text +*.gif filter=lfs diff=lfs merge=lfs -text +*.vsdx filter=lfs diff=lfs merge=lfs -text diff --git a/README.md b/README.md index 869616f3ac8fe9..7e79b24c16934d 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@ # [OpenVINO™ Toolkit](https://01.org/openvinotoolkit) - Deep Learning Deployment Toolkit repository -[![Stable release](https://img.shields.io/badge/version-2020.3-green.svg)](https://github.com/openvinotoolkit/openvino/releases/tag/2020.3.0) +[![Stable release](https://img.shields.io/badge/version-2020.4-green.svg)](https://github.com/openvinotoolkit/openvino/releases/tag/2020.4.0) [![Apache License Version 2.0](https://img.shields.io/badge/license-Apache_2.0-green.svg)](LICENSE) This toolkit allows developers to deploy pre-trained deep learning models diff --git a/build-instruction.md b/build-instruction.md index 12103ce9875004..6c55924cceeb8f 100644 --- a/build-instruction.md +++ b/build-instruction.md @@ -52,14 +52,15 @@ as a part of [Intel® Distribution of OpenVINO™]. ## Build on Linux\* Systems The software was validated on: +- Ubuntu\* 18.04 (64-bit) with default GCC\* 7.5.0 - Ubuntu\* 16.04 (64-bit) with default GCC\* 5.4.0 - CentOS\* 7.4 (64-bit) with default GCC\* 4.8.5 ### Software Requirements - [CMake]\* 3.11 or higher - GCC\* 4.8 or higher to build the Inference Engine -- Python 2.7 or higher for Inference Engine Python API wrapper -- (Optional) [Install Intel® Graphics Compute Runtime for OpenCL™ Driver package 20.13.16352]. +- Python 3.5 or higher for Inference Engine Python API wrapper +- (Optional) [Install Intel® Graphics Compute Runtime for OpenCL™ Driver package 19.41.14441]. ### Build Steps 1. Clone submodules: @@ -77,7 +78,7 @@ The software was validated on: ``` 3. By default, the build enables the Inference Engine GPU plugin to infer models on your Intel® Processor Graphics. This requires you to - [Install Intel® Graphics Compute Runtime for OpenCL™ Driver package 20.13.16352] + [Install Intel® Graphics Compute Runtime for OpenCL™ Driver package 19.41.14441] before running the build. If you don't want to use the GPU plugin, use the `-DENABLE_CLDNN=OFF` CMake build option and skip the installation of the Intel® Graphics Compute Runtime for OpenCL™ Driver. @@ -202,7 +203,7 @@ Native compilation of the Inference Engine is the most straightforward solution. This compilation was tested on the following configuration: - * Host: Ubuntu\* 16.04 (64-bit, Intel® Core™ i7-6700K CPU @ 4.00GHz × 8) + * Host: Ubuntu\* 18.04 (64-bit, Intel® Core™ i7-6700K CPU @ 4.00GHz × 8) * Target: Raspbian\* Stretch (32-bit, ARMv7, Raspberry Pi\* 3) 1. Install Docker\*: @@ -337,7 +338,7 @@ The software was validated on: - [CMake]\*3.11 or higher - Microsoft\* Visual Studio 2017, 2019 or [Intel® C++ Compiler] 18.0 - (Optional) Intel® Graphics Driver for Windows* (26.20) [driver package]. 
-- Python 3.4 or higher for Inference Engine Python API wrapper +- Python 3.5 or higher for Inference Engine Python API wrapper ### Build Steps @@ -454,7 +455,7 @@ The software was validated on: - [CMake]\* 3.11 or higher - Clang\* compiler from Xcode\* 10.1 or higher -- Python\* 3.4 or higher for the Inference Engine Python API wrapper +- Python\* 3.5 or higher for the Inference Engine Python API wrapper ### Build Steps @@ -574,8 +575,7 @@ This section describes how to build Inference Engine for Android x86 (64-bit) op ## Use Custom OpenCV Builds for Inference Engine -> **NOTE**: The recommended and tested version of OpenCV is 4.3. The minimum -supported version is 3.4.0. +> **NOTE**: The recommended and tested version of OpenCV is 4.4.0. Required versions of OpenCV packages are downloaded automatically during the building Inference Engine library. If the build script can not find and download @@ -691,7 +691,7 @@ This target collects all dependencies, prepares the nGraph package and copies it [Intel® Distribution of OpenVINO™]:https://software.intel.com/en-us/openvino-toolkit [CMake]:https://cmake.org/download/ -[Install Intel® Graphics Compute Runtime for OpenCL™ Driver package 20.13.16352]:https://github.com/intel/compute-runtime/releases/tag/20.13.16352 +[Install Intel® Graphics Compute Runtime for OpenCL™ Driver package 19.41.14441]:https://github.com/intel/compute-runtime/releases/tag/19.41.14441 [MKL-DNN repository]:https://github.com/intel/mkl-dnn/releases/download/v0.19/mklml_lnx_2019.0.5.20190502.tgz [MKL-DNN repository for Windows]:(https://github.com/intel/mkl-dnn/releases/download/v0.19/mklml_win_2019.0.5.20190502.zip) [OpenBLAS]:https://sourceforge.net/projects/openblas/files/v0.2.14/OpenBLAS-v0.2.14-Win64-int64.zip/download diff --git a/docs/HOWTO/Custom_Layers_Guide.md b/docs/HOWTO/Custom_Layers_Guide.md new file mode 100644 index 00000000000000..ddbd8126798aaa --- /dev/null +++ b/docs/HOWTO/Custom_Layers_Guide.md @@ -0,0 +1,212 @@ +# Custom Layers Guide {#openvino_docs_HOWTO_Custom_Layers_Guide} + +The Intel® Distribution of OpenVINO™ toolkit supports neural network model layers in multiple frameworks including TensorFlow*, Caffe*, MXNet*, Kaldi* and ONYX*. The list of known layers is different for each of the supported frameworks. To see the layers supported by your framework, refer to [supported frameworks](../MO_DG/prepare_model/Supported_Frameworks_Layers.md). + +Custom layers are layers that are not included in the list of known layers. If your topology contains any layers that are not in the list of known layers, the Model Optimizer classifies them as custom. + +This guide illustrates the workflow for running inference on topologies featuring custom layers, allowing you to plug in your own implementation for existing or completely new layers. +For a step-by-step example of creating and executing a custom layer, see the [Custom Layer Implementation Tutorials for Linux and Windows.](https://github.com/david-drew/OpenVINO-Custom-Layers/tree/master/2019.r2.0) + +## Terms used in this guide + +- *Layer* — The abstract concept of a math function that is selected for a specific purpose (relu, sigmoid, tanh, convolutional). This is one of a sequential series of building blocks within the neural network. +- *Kernel* — The implementation of a layer function, in this case, the math programmed (in C++ and Python) to perform the layer operation for target hardware (CPU or GPU). 
+- *Intermediate Representation (IR)* — Neural Network used only by the Inference Engine in OpenVINO abstracting the different frameworks and describing topology, layer parameters and weights. +The original format will be a supported framework such as TensorFlow, Caffe, or MXNet. + +- *Model Extension Generator* — Generates template source code files for each of the extensions needed by the Model Optimizer and the Inference Engine. + +- *Inference Engine Extension* — Device-specific module implementing custom layers (a set of kernels). + + +## Custom Layer Overview + +The [Model Optimizer](https://docs.openvinotoolkit.org/2019_R1.1/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) searches the list of known layers for each layer contained in the input model topology before building the model's internal representation, optimizing the model, and producing the Intermediate Representation files. + +The [Inference Engine](https://docs.openvinotoolkit.org/2019_R1.1/_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide.html) loads the layers from the input model IR files into the specified device plugin, which will search a list of known layer implementations for the device. If your topology contains layers that are not in the list of known layers for the device, the Inference Engine considers the layer to be unsupported and reports an error. To see the layers that are supported by each device plugin for the Inference Engine, refer to the [Supported Devices](https://docs.openvinotoolkit.org/2019_R1.1/_docs_IE_DG_supported_plugins_Supported_Devices.html) documentation. +
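+One way to see which layers of a specific model fall into the custom category for a given device is to query the device plugin before loading the network. The sketch below is a minimal illustration rather than part of the tutorial code: the `model.xml` path is a placeholder, and the device name is assumed to be `CPU`. It relies on `InferenceEngine::Core::QueryNetwork`, which reports the layers the plugin can execute; any layer of the model missing from that report is a custom (unsupported) layer for the device.
+
+```cpp
+#include <inference_engine.hpp>
+#include <iostream>
+
+int main() {
+    InferenceEngine::Core core;
+    // Placeholder path to an IR produced by the Model Optimizer.
+    auto network = core.ReadNetwork("model.xml");
+
+    // Ask the CPU plugin which layers of this network it can execute.
+    auto queryResult = core.QueryNetwork(network, "CPU");
+
+    // supportedLayersMap maps each supported layer name to the device that can run it.
+    for (const auto& layer : queryResult.supportedLayersMap) {
+        std::cout << layer.first << " -> " << layer.second << std::endl;
+    }
+    return 0;
+}
+```
+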
+**Note:** If a device doesn't support a particular layer, an alternative to creating a new custom layer is to target an additional device using the HETERO plugin. The [Heterogeneous Plugin](https://docs.openvinotoolkit.org/2019_R1.1/_docs_IE_DG_supported_plugins_HETERO.html) may be used to run an inference model on multiple devices allowing the unsupported layers on one device to "fallback" to run on another device (e.g., CPU) that does support those layers. + +## Custom Layer Implementation Workflow + +When implementing a custom layer for your pre-trained model in the Intel® Distribution of OpenVINO™ toolkit, you will need to add extensions to both the Model Optimizer and the Inference Engine. + +## Custom Layer Extensions for the Model Optimizer + +The following figure shows the basic processing steps for the Model Optimizer highlighting the two necessary custom layer extensions, the Custom Layer Extractor and the Custom Layer Operation. + +![](img/MO_extensions_flow.png) + + +The Model Optimizer first extracts information from the input model which includes the topology of the model layers along with parameters, input and output format, etc., for each layer. The model is then optimized from the various known characteristics of the layers, interconnects, and data flow which partly comes from the layer operation providing details including the shape of the output for each layer. Finally, the optimized model is output to the model IR files needed by the Inference Engine to run the model. + +The Model Optimizer starts with a library of known extractors and operations for each [supported model framework](https://docs.openvinotoolkit.org/2019_R1.1/_docs_MO_DG_prepare_model_Supported_Frameworks_Layers.html) which must be extended to use each unknown custom layer. The custom layer extensions needed by the Model Optimizer are: + +- Custom Layer Extractor + - Responsible for identifying the custom layer operation and extracting the parameters for each instance of the custom layer. The layer parameters are stored per instance and used by the layer operation before finally appearing in the output IR. Typically the input layer parameters are unchanged, which is the case covered by this tutorial. +- Custom Layer Operation + - Responsible for specifying the attributes that are supported by the custom layer and computing the output shape for each instance of the custom layer from its parameters.
The `--mo-op` command-line argument shown in the examples below generates a custom layer operation for the Model Optimizer. + +## Custom Layer Extensions for the Inference Engine + +The following figure shows the basic flow for the Inference Engine highlighting two custom layer extensions for the CPU and GPU Plugins, the Custom Layer CPU extension and the Custom Layer GPU Extension. + +![](img/IE_extensions_flow.png) + +Each device plugin includes a library of optimized implementations to execute known layer operations which must be extended to execute a custom layer. The custom layer extension is implemented according to the target device: + +- Custom Layer CPU Extension + - A compiled shared library (.so or .dll binary) needed by the CPU Plugin for executing the custom layer on the CPU. +- Custom Layer GPU Extension + - OpenCL source code (.cl) for the custom layer kernel that will be compiled to execute on the GPU along with a layer description file (.xml) needed by the GPU Plugin for the custom layer kernel. + +## Model Extension Generator + +Using answers to interactive questions or a *.json* configuration file, the Model Extension Generator tool generates template source code files for each of the extensions needed by the Model Optimizer and the Inference Engine. To complete the implementation of each extension, the template functions may need to be edited to fill-in details specific to the custom layer or the actual custom layer functionality itself. + +### Command-line + +The Model Extension Generator is included in the Intel® Distribution of OpenVINO™ toolkit installation and is run using the command (here with the "--help" option): + +```bash +python3 /opt/intel/openvino/deployment_tools/tools/extension_generator/extgen.py new --help +``` + +where the output will appear similar to: + +``` +usage: You can use any combination of the following arguments: + +Arguments to configure extension generation in the interactive mode: + +optional arguments: + -h, --help show this help message and exit + --mo-caffe-ext generate a Model Optimizer Caffe* extractor + --mo-mxnet-ext generate a Model Optimizer MXNet* extractor + --mo-tf-ext generate a Model Optimizer TensorFlow* extractor + --mo-op generate a Model Optimizer operation + --ie-cpu-ext generate an Inference Engine CPU extension + --ie-gpu-ext generate an Inference Engine GPU extension + --output_dir OUTPUT_DIR + set an output directory. If not specified, the current + directory is used by default. +``` + +The available command-line arguments are used to specify which extension(s) to generate templates for the Model Optimizer or Inference Engine. The generated extension files for each argument will appear starting from the top of the output directory as follows: + +Command-line Argument | Output Directory Location | +--------------------- | ------------------------------ | +`--mo-caffe-ext` | user_mo_extensions/front/caffe | +`--mo-mxnet-ext` | user_mo_extensions/front/mxnet | +`--mo-tf-ext` | user_mo_extensions/front/tf | +`--mo-op` | user_mo_extensions/ops | +`--ie-cpu-ext` | user_ie_extensions/cpu | +`--ie-gpu-ext` | user_ie_extensions/gpu | + +### Extension Workflow + +The workflow for each generated extension follows the same basic steps: + +![](img/MEG_generic_flow.png) + +**Step 1: Generate:** Use the Model Extension Generator to generate the Custom Layer Template Files. + +**Step 2: Edit:** Edit the Custom Layer Template Files as necessary to create the specialized Custom Layer Extension Source Code. 
+ +**Step 3: Specify:** Specify the custom layer extension locations to be used by the Model Optimizer or Inference Engine. + +## Caffe\* Models with Custom Layers + +If your Caffe\* model has custom layers: + +**Register the custom layers as extensions to the Model Optimizer**. For instructions, see [Extending Model Optimizer with New Primitives](../MO_DG/prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md). When your custom layers are registered as extensions, the Model Optimizer generates a valid and optimized Intermediate Representation. You will need a bit of Python\* code that lets the Model Optimizer; + +- Generate a valid Intermediate Representation according to the rules you specified. +- Be independent from the availability of Caffe on your computer. + +If your model contains Custom Layers, it is important to understand the internal workflow of the Model Optimizer. Consider the following example. + +**Example**: + +The network has: + +* One input layer (#1) +* One output Layer (#5) +* Three internal layers (#2, 3, 4) + +The custom and standard layer types are: + +* Layers #2 and #5 are implemented as Model Optimizer extensions. +* Layers #1 and #4 are supported in Model Optimizer out-of-the box. +* Layer #3 is neither in the list of supported layers nor in extensions, but is specified in CustomLayersMapping.xml. + +> **NOTE**: If any of the layers are not in one of three categories described above, the Model Optimizer fails with an appropriate message and a link to the corresponding question in [Model Optimizer FAQ](../MO_DG/prepare_model/Model_Optimizer_FAQ.md). + +The general process is as shown: + +![Example custom layer network](img/mo_caffe_priorities.png) +
+**Step 1:** The example model is fed to the Model Optimizer, which **loads the model** with the special parser built on top of the `caffe.proto` file. In case of failure, the Model Optimizer asks you to prepare a parser that can read the model. For more information, refer to Model Optimizer FAQ #1. + +**Step 2:** The Model Optimizer **extracts the attributes of all layers** by going through the list of layers and attempting to find the appropriate extractor. In order of priority, the Model Optimizer checks whether the layer is: + +* A. Registered as a Model Optimizer extension +* B. Registered as a standard Model Optimizer layer + +When the Model Optimizer finds a matching condition from the list above, it extracts the attributes according to the following rules: + +* For A. - it takes only the parameters specified in the extension +* For B. - it takes only the parameters specified in the standard extractor +
+**Step 3:** The Model Optimizer **calculates the output shape of all layers**. The logic is the same as for the priorities in Step 2. **Important:** the Model Optimizer always takes the first available option. + +**Step 4:** The Model Optimizer **optimizes the original model and produces the two Intermediate Representation (IR) files, .xml and .bin**. +
+ +## TensorFlow\* Models with Custom Layers + +You have two options for TensorFlow\* models with custom layers: +
+* **Register those layers as extensions to the Model Optimizer.** In this case, the Model Optimizer generates a valid and optimized Intermediate Representation. +* **If you have sub-graphs that should not be expressed with the analogous sub-graph in the Intermediate Representation, but should be replaced with a different sub-graph, the Model Optimizer provides such an option.** This feature is helpful for many TensorFlow models. To read more, see [Sub-graph Replacement in the Model Optimizer](../MO_DG/prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md). + +## MXNet\* Models with Custom Layers + +There are two options to convert your MXNet* model that contains custom layers: + +1. Register the custom layers as extensions to the Model Optimizer. For instructions, see [Extending MXNet Model Optimizer with New Primitives](../MO_DG/prepare_model/customize_model_optimizer/Extending_MXNet_Model_Optimizer_with_New_Primitives.md). When your custom layers are registered as extensions, the Model Optimizer generates a valid and optimized Intermediate Representation. You can create Model Optimizer extensions both for MXNet layers with the op `Custom` and for layers that are not standard MXNet layers. + +2. If you have sub-graphs that should not be expressed with the analogous sub-graph in the Intermediate Representation, but should be replaced with a different sub-graph, the Model Optimizer provides such an option. In MXNet, this feature is actively used for SSD models: it makes it possible to find the necessary subgraph sequences and replace them. To read more, see [Sub-graph Replacement in the Model Optimizer](../MO_DG/prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md). + +## Kaldi\* Models with Custom Layers +For information on converting your Kaldi* model containing custom layers, see [Converting a Kaldi Model in the Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Kaldi.html). + +## ONNX\* Models with Custom Layers +For information on converting your ONNX* model containing custom layers, see [Converting an ONNX Model in the Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX.html). 
+ +## Step-by-Step Custom Layers Tutorial +For a step-by-step walk-through of creating and executing a custom layer, see the [Custom Layer Implementation Tutorial for Linux and Windows](https://github.com/david-drew/OpenVINO-Custom-Layers/tree/master/2019.r2.0). + +## Additional Resources + +- Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) +- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) +- [Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) +- [Kernel Extensibility in the Inference Engine Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Integrate_your_kernels_into_IE.html) +- [Inference Engine Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html) +- [Overview of OpenVINO™ Toolkit Pre-Trained Models](https://docs.openvinotoolkit.org/latest/_intel_models_index.html) +- [Inference Engine Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic) +- For IoT libraries and code samples, see the [Intel® IoT Developer Kit](https://github.com/intel-iot-devkit). + +## Converting Models: + +- [Convert Your Caffe* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md) +- [Convert Your TensorFlow* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md) +- [Convert Your MXNet* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md) +- [Convert Your ONNX* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md) + + + diff --git a/docs/HOWTO/add_regression_test_vpu.md b/docs/HOWTO/add_regression_test_vpu.md new file mode 100644 index 00000000000000..e48a34cb7fed85 --- /dev/null +++ b/docs/HOWTO/add_regression_test_vpu.md @@ -0,0 +1,83 @@ +# Regression tests howto {#openvino_docs_HOWTO_add_regression_test_vpu} + +## Purpose + +This document contains instructions for correctly modifying the set of regression tests. + +## Common + +Regression tests for the Myriad and HDDL plugins are located at: +`inference-engine/tests/functional/vpu/regression_tests/` + +The tests are divided into the following groups: +* Classification +* Detection +* Raw-results +* Compilation +* VPU hetero + +Testing framework – [Google Test](https://github.com/google/googletest/). +Each group contains [parameterized](https://github.com/google/googletest/blob/master/googletest/docs/advanced.md) tests. The main idea is that to add a new test, you only need to add a new parameter, except for scenarios that differ from the generalized case. + +## Classification and Detection tests + +These groups contain two cases: + +* For the generalized scenario (`VpuNoClassificationRegression`, `VpuNoDetectionRegression`) +* For the specific scenario (`VpuNoClassificationRegressionSpecific`, `VpuNoDetectionRegressionSpecific`) + +### Generalized scenario + +If you want to test a new parameter (batch, precision, model, etc.), edit the existing initialization of parameterized tests or create a new one. 
+Example of initialization of parameterized tests: + +``` c++ +INSTANTIATE_TEST_CASE_P( + VPURegTestWithResources_nightly, + VpuNoClassificationRegression, + Combine(ValuesIn(VpuTestParamsContainer::testingPlugin()), + Values(Precision::FP16), + Values(1), // batches + Values(true), //IsHwAdaptiveMode + Values(false), //DoReshape + Values(3, 5, 7), //Resources + Values(false), //IsIgnoreStatistic + Values(ClassificationSrcParam{ModelName::GoogleNetV1, SourceImages::kCat3, 0.01, Regression::EMean::eValues})), + VpuNoClassificationRegression::getTestCaseName); +``` + +### Specific scenario + +If You need a test to perform some actions that are not provided in the generalized scenario, then add a specific test case. As with the generalized scenario You can change parameters for these tests. +Example of specific test case: + +``` c++ +TEST_P(VpuNoClassificationRegressionSpecific, onAlexNetWithNetworkConfig) { + DISABLE_ON_WINDOWS_IF(HDDL_PLUGIN); + DISABLE_IF(do_reshape_); + + if (!hw_adaptive_mode_) { + config_[VPU_CONFIG_KEY(NETWORK_CONFIG)] = "data=data,scale=1"; + } + + assertThat().classificationResultsForInferRequestAPI() + .on(SourceImages::kDog2) + .withInputPrecision(in_precision_) + .times(batch_) + .withBatch(batch_) + .onModel(ModelName::AlexNet) + .setMean(Regression::EMean::eImage) + .onFP16() + .withTopK(1) + .withPluginConfig(config_) + .equalToReferenceWithDelta(0.04); +} +``` + +## Raw-results tests + +There is no generalized scenario and recommendations are the same as for specific test cases for Classification/Detection groups. + +## Compilation tests + +The tests are in the `vpu_classification_regression.cpp` file and contains only one scenario ` VpuNoRegressionWithCompilation `. To add a new test just update parameters just as in generalized scenarion of Classification/Detection test groups. diff --git a/docs/HOWTO/fuzzing-HOWTO.md b/docs/HOWTO/fuzzing-HOWTO.md new file mode 100644 index 00000000000000..614e94eec4e603 --- /dev/null +++ b/docs/HOWTO/fuzzing-HOWTO.md @@ -0,0 +1,94 @@ +# Fuzzing howto {#openvino_docs_HOWTO_fuzzing_HOWTO} + +## Intended Audience + +This document is for a developer who wants to contribute fuzz tests. + +## Purpose + +This document walks you through creating your first fuzzer, running it and evaluating its quality. + +## Prerequisites + +- Linux OS or Mac OS. + +- [American Fuzzy Loop](http://lcamtuf.coredump.cx/afl/) if building with GCC. + +## Steps + +1. Create a fuzz test in the existing project at `./tests/fuzz`. Fuzz test must + follow `-fuzzer.cc` naming scheme and implement a + `LLVMFuzzerTestOneInput` entry point. + +``` bash +cat << EOF > ./tests/fuzz/test_name-fuzzer.cc +#include +#include + +extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { + // put your fuzzing code here and use data+size as input. + return 0; // always return 0 +} +EOF +``` + +2. Implement test logic under `LLVMFuzzerTestOneInput`. + +See example fuzz test at `tests/fuzz/read_network-fuzzer.cc`. + +3. Build fuzz tests with `-DENABLE_FUZZING=ON` flag for cmake. + +``` bash + mkdir -p build && \ + (cd build && \ + CXX=afl-g++ CC=afl-gcc cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_FUZZING=ON -DENABLE_TESTS=ON .. && \ + make fuzz --jobs=$(getconf _NPROCESSORS_ONLN)) +``` + +4. Prepare sample inputs for your fuzz test to teach fuzzer engine on input + structure + +``` bash +(cd bin/intel64/Debug && \ +mkdir test_name-corpus && \ +echo sample input > test_name-corpus/in1.txt) +``` + +5. 
Evaluate fuzz test with `afl-fuzz` fuzzing engine + +Run fuzz test: + +``` bash +(cd bin/intel64/Debug && \ +afl-fuzz -i test_name-corpus -o test_name-out -- ./test_name-fuzzer @@ +``` + +While fuzz test is running it prints out statistics. Besides just crashes `uniq +crashes` and hangs `uniq hangs` you should care about fuzz test quality: + +- Fuzz test should be fast - speed of execution `exec speed` should be at least + 100 exec/s. Speed less than 20 exec/s is not acceptable. + +- Fuzz test should be able to explore new code paths `map coverage` and + `findings in depth`. Confirm it is increasing while fuzz test is running. + +6. Reproduce fuzz test findings + +All issues found by fuzz test are stored as a file in output folder specified +earlier via `-o` afl-fuzz option. To reproduce an issue run fuzz test executable +with an issue file as an argument. + +## Summary + +We have created a simple fuzz test, run it and asses its results. + +## Extension + +Try run parallel fuzzing with the help of +[afl-utils](https://gitlab.com/rc0r/afl-utils). + +## Tips or FAQs + +GCC 7 in Ubuntu 18.04 LTS has a +[defect](https://bugs.launchpad.net/ubuntu/+source/afl/+bug/1774816). Upgrade +GCC 7 for AFL to work. GCC version `Ubuntu 7.3.0-27ubuntu1~18.04` works OK. diff --git a/docs/HOWTO/img/IE_extensions_flow.png b/docs/HOWTO/img/IE_extensions_flow.png new file mode 100644 index 00000000000000..ca665ca3298bbb --- /dev/null +++ b/docs/HOWTO/img/IE_extensions_flow.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c2f362a39ae6c2af080e4f055b6fdba4954f918f85731545d1df3d687d9213d5 +size 421056 diff --git a/docs/HOWTO/img/MEG_generic_flow.png b/docs/HOWTO/img/MEG_generic_flow.png new file mode 100644 index 00000000000000..a492c3fff5026b --- /dev/null +++ b/docs/HOWTO/img/MEG_generic_flow.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cb5c700d003936779455353bfa4ed9432410c0975c46e2dfd30c6a1abccd1727 +size 23320 diff --git a/docs/HOWTO/img/MO_extensions_flow.png b/docs/HOWTO/img/MO_extensions_flow.png new file mode 100644 index 00000000000000..5009c0ce2604ad --- /dev/null +++ b/docs/HOWTO/img/MO_extensions_flow.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:99d6b5146be85fa408dc5432883c3e2745cffe890133854a97dcf22f5c5962d4 +size 47564 diff --git a/docs/HOWTO/img/mo_caffe_priorities.png b/docs/HOWTO/img/mo_caffe_priorities.png new file mode 100644 index 00000000000000..665892316c17fc --- /dev/null +++ b/docs/HOWTO/img/mo_caffe_priorities.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0a4de6e502cae7542f1f311bcdbea6bb145f960f0d27d86a03160d1a60133778 +size 301310 diff --git a/docs/IE_DG/API_Changes.md b/docs/IE_DG/API_Changes.md new file mode 100644 index 00000000000000..5a82cfd19ba7d7 --- /dev/null +++ b/docs/IE_DG/API_Changes.md @@ -0,0 +1,563 @@ +# Inference Engine API Changes History {#openvino_docs_IE_DG_API_Changes} + +The sections below contain detailed list of changes made to the Inference Engine API in recent releases. + +## Deprecation Notice + + + + + + + + + + +
+| Deprecation Begins | Removal Date     |
+|--------------------|------------------|
+| June 1, 2020       | December 1, 2020 |
+ +Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit. + +Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware. + +## 2020.4 + +### New API + + **CPU Plugin API:** + + * InferenceEngine::PluginConfigParams::KEY_ENFORCE_BF16 config key + + **Metrics and values for Query API:** + + * METRIC_KEY(OPTIMIZATION_CAPABILITIES) + * METRIC_VALUE(BF16) + +### Deprecated API + + **Myriad Plugin API:** + + * VPU_CONFIG_KEY(IGNORE_IR_STATISTIC) + +### Removed API + + **Inference Engine NN Builder API:** + + * InferenceEngine::Builder::EltwiseLayer + * InferenceEngine::Builder::MemoryLayer + * InferenceEngine::Builder::ROIPoolingLayer + * InferenceEngine::Builder::DeconvolutionLayer + * InferenceEngine::Builder::ReLULayer + * InferenceEngine::Builder::TanHLayer + * InferenceEngine::Builder::InputLayer + * InferenceEngine::Builder::PoolingLayer + * InferenceEngine::Builder::CropLayer + * InferenceEngine::Builder::GRUSequenceLayer + * InferenceEngine::Builder::NormLayer + * InferenceEngine::Builder::LSTMSequenceLayer + * InferenceEngine::Builder::ClampLayer + * InferenceEngine::Builder::PSROIPoolingLayer + * InferenceEngine::Builder::Layer + * InferenceEngine::Builder::RNNSequenceLayer + * InferenceEngine::Builder::ReorgYoloLayer + * InferenceEngine::Builder::NormalizeLayer + * InferenceEngine::Builder::PriorBoxClusteredLayer + * InferenceEngine::Builder::MVNLayer + * InferenceEngine::Builder::PermuteLayer + * InferenceEngine::Builder::SimplerNMSLayer + * InferenceEngine::Builder::ConstLayer + * InferenceEngine::Builder::DeformableConvolutionLayer + * InferenceEngine::Builder::FullyConnectedLayer + * InferenceEngine::Builder::PriorBoxLayer + * InferenceEngine::Builder::SoftMaxLayer + * InferenceEngine::Builder::OutputLayer + * InferenceEngine::Builder::TileLayer + * InferenceEngine::Builder::SplitLayer + * InferenceEngine::Builder::PReLULayer + * InferenceEngine::Builder::RegionYoloLayer + * InferenceEngine::Builder::ReshapeLayer + * InferenceEngine::Builder::ConvolutionLayer + * InferenceEngine::Builder::DetectionOutputLayer + * InferenceEngine::Builder::ConcatLayer + * InferenceEngine::Builder::ELULayer + * InferenceEngine::Builder::GRNLayer + * InferenceEngine::Builder::LRNLayer + * InferenceEngine::Builder::ArgMaxLayer + * InferenceEngine::Builder::ReLU6Layer + * InferenceEngine::Builder::ScaleShiftLayer + * InferenceEngine::Builder::ProposalLayer + * InferenceEngine::Builder::SigmoidLayer + * InferenceEngine::Builder::ResampleLayer + * InferenceEngine::Builder::CTCGreedyDecoderLayer + * InferenceEngine::Builder::BatchNormalizationLayer + * InferenceEngine::Builder::LayerDecorator + * InferenceEngine::Builder::PowerLayer + * InferenceEngine::Builder::Network + * InferenceEngine::Builder::PortInfo + * InferenceEngine::Builder::Connection + * InferenceEngine::Builder::PortData + * InferenceEngine::Builder::Port + * InferenceEngine::Builder::ILayer + * InferenceEngine::Builder::INetworkIterator + * InferenceEngine::Builder::INetwork + * InferenceEngine::Builder::ILayer + +## 2020.2 + +### New API + + 
**Extensibility API:** + + * InferenceEngine::IExtension::getImplTypes(const std::shared_ptr& node) method + * InferenceEngine::IExtension::getImplementation(const std::shared_ptr& node, const std::string& implType) method + +### Deprecated API + + **Extensibility API:** + + * InferenceEngine::ILayerImplFactory class + * InferenceEngine::IShapeInferImpl class + * InferenceEngine::IShapeInferImpl class + * InferenceEngine::IShapeInferExtension class + * InferenceEngine::IExtension::getFactoryFor(ILayerImplFactory\*& factory, const CNNLayer\* cnnLayer, ResponseDesc\* resp) noexcept method + * InferenceEngine::IExtension::getPrimitiveTypes(char\*\*& types, unsigned int& size, ResponseDesc\* resp) noexcept method + * InferenceEngine::ShapeInferImpl class + * InferenceEngine::Extension::getFactoryFor(ILayerImplFactory\*& factory, const CNNLayer\* cnnLayer, ResponseDesc\* resp) noexcept method + * InferenceEngine::Extension::getPrimitiveTypes(char\*\*& types, unsigned int& size, ResponseDesc\* resp) noexcept method + + **Network API:** + + * InferenceEngine::details::CNNNetworkIterator class + * InferenceEngine::CNNNetwork::getPrecision() const method + * InferenceEngine::CNNNetwork::getLayerByName(const char\* layerName) const method + * InferenceEngine::CNNNetwork::size() const method + * InferenceEngine::CNNNetwork::begin() const method + * InferenceEngine::CNNNetwork::end() const method + * InferenceEngine::CNNNetwork::AddExtension(const IShapeInferExtensionPtr& extension) method + * InferenceEngine::ICNNNetwork::getPrecision() const noexcept method + * InferenceEngine::ICNNNetwork::getName(char\* pName, size_t len) const noexcept method + * InferenceEngine::ICNNNetwork::getData(const char\* dname) noexcept method + * InferenceEngine::ICNNNetwork::addLayer(const CNNLayerPtr& layer) noexcept method + * InferenceEngine::ICNNNetwork::getLayerByName(const char\* layerName, CNNLayerPtr& out, ResponseDesc\* resp) const noexcept method + * InferenceEngine::ICNNNetwork::AddExtension(const IShapeInferExtensionPtr& extension, ResponseDesc\* resp) noexcept method + * InferenceEngine::ICNNNetwork::getStats(ICNNNetworkStats\*\* stats, ResponseDesc\* resp) const noexcept method + * InferenceEngine::ICNNNetworkStats class + * InferenceEngine::NetworkNodeStats class + * InferenceEngine::Data::getCreatorLayer() method + * InferenceEngine::Data::getInputTo() method + * InferenceEngine::LayerParams class + + **Layer API:** + + * InferenceEngine::CNNLayer class + * InferenceEngine::WeightableLayer class + * InferenceEngine::BatchNormalizationLayer class + * InferenceEngine::BatchToSpaceLayer class + * InferenceEngine::BinaryConvolutionLayer class + * InferenceEngine::BroadcastLayer class + * InferenceEngine::BucketizeLayer class + * InferenceEngine::ClampLayer class + * InferenceEngine::ConcatLayer class + * InferenceEngine::ConvolutionLayer class + * InferenceEngine::CropLayer class + * InferenceEngine::DeconvolutionLayer class + * InferenceEngine::DeformableConvolutionLayer class + * InferenceEngine::DepthToSpaceLayer class + * InferenceEngine::EltwiseLayer class + * InferenceEngine::ExperimentalDetectronPriorGridGenerator class + * InferenceEngine::ExperimentalDetectronPriorGridGeneratorLayer class + * InferenceEngine::ExperimentalSparseWeightedReduceLayer class + * InferenceEngine::FillLayer class + * InferenceEngine::FullyConnectedLayer class + * InferenceEngine::GRNLayer class + * InferenceEngine::GRUCell class + * InferenceEngine::GatherLayer class + * InferenceEngine::GemmLayer class + * 
InferenceEngine::LSTMCell class + * InferenceEngine::MVNLayer class + * InferenceEngine::MathLayer class + * InferenceEngine::NonMaxSuppression class + * InferenceEngine::NormLayer class + * InferenceEngine::OneHotLayer class + * InferenceEngine::PReLULayer class + * InferenceEngine::PadLayer class + * InferenceEngine::PoolingLayer class + * InferenceEngine::PowerLayer class + * InferenceEngine::QuantizeLayer class + * InferenceEngine::RNNCell class + * InferenceEngine::RNNCellBase class + * InferenceEngine::RNNSequenceLayer class + * InferenceEngine::RangeLayer class + * InferenceEngine::ReLU6Layer class + * InferenceEngine::ReLULayer class + * InferenceEngine::ReduceLayer class + * InferenceEngine::ReshapeLayer class + * InferenceEngine::ReverseSequenceLayer class + * InferenceEngine::ScaleShiftLayer class + * InferenceEngine::ScatterLayer class + * InferenceEngine::SelectLayer class + * InferenceEngine::ShuffleChannelsLayer class + * InferenceEngine::SoftMaxLayer class + * InferenceEngine::SpaceToBatchLayer class + * InferenceEngine::SpaceToDepthLayer class + * InferenceEngine::SparseFillEmptyRowsLayer class + * InferenceEngine::SparseSegmentReduceLayer class + * InferenceEngine::SparseToDenseLayer class + * InferenceEngine::SplitLayer class + * InferenceEngine::StridedSliceLayer class + * InferenceEngine::TensorIterator class + * InferenceEngine::TileLayer class + * InferenceEngine::TopKLayer class + * InferenceEngine::UniqueLayer class + +## 2020.1 + +### New API + + **Integration with ngraph API:** + + * InferenceEngine::CNNNetwork(const std::shared_ptr& network) ctor from ngraph::Function + * InferenceEngine::CNNNetwork::getFunction() const noexcept method + * InferenceEngine::ICNNNetwork::getFunction() const noexcept method + * InferenceEngine::Parameter(const std::shared_ptr& var) ctor + * InferenceEngine::Parameter::asVariant() const method + * InferenceEngine::Parameter::operator std::shared_ptr() const operator + * InferenceEngine::Core::ReadNetwork(const std::wstring& modelPath, const std::wstring& binPath) method + * InferenceEngine::Core::ReadNetwork(const std::string& modelPath, const std::string& binPath = "") method + * InferenceEngine::Core::ReadNetwork(const std::string& model, const Blob::CPtr& weights) method + * InferenceEngine::Code::AddExtension(const IExtensionPtr& extension) method + * InferenceEngine::IExtension::getOpSets() method + + + **Offline compilation: import / export to std::stream:** + + * InferenceEngine::ExecutableNetwork::Export(std::ostream& networkModel) method + * InferenceEngine::Core::ImportNetwork(std::istream& networkModel, const std::string& deviceName = {}, const std::map& config = {}) method + * InferenceEngine::IExecutableNetwork::Export(std::ostream& networkModel, ResponseDesc \*resp) noexcept method + + + **RemoteBlob accelerator memory sharing API:** + + * InferenceEngine::RemoteContext class + * InferenceEngine::RemoteBlob class + * InferenceEngine::Core::CreateContext(const std::string& deviceName, const ParamMap& params) method + * InferenceEngine::Core::GetDefaultContext(const std::string& deviceName) method + * InferenceEngine::Core::LoadNetwork(CNNNetwork network, RemoteContext::Ptr context, const std::map& config = std::map()) method + + + **GNA firmware model image generation:** + + * GNA_CONFIG_KEY(FIRMWARE_MODEL_IMAGE_GENERATION) config key + * GNA_CONFIG_VALUE(GEN) value + * GNA_CONFIG_VALUE(GEN_EXACT) value + * GNA_CONFIG_VALUE(SSE) value + * GNA_CONFIG_VALUE(SSE_EXACT) value + * GNA_CONFIG_VALUE(AVX1) value + * 
GNA_CONFIG_VALUE(AVX1_EXACT) value + * GNA_CONFIG_VALUE(AVX2) value + * GNA_CONFIG_VALUE(AVX2_EXACT) value + + **MemoryBlob mapping of memory to the user space:** + + * InferenceEngine::MemoryBlob::rwmap() noexcept method + * InferenceEngine::MemoryBlob::rmap() noexcept method + * InferenceEngine::MemoryBlob::wmap() noexcept method + + **Memory interoperability on acceleration devices. General classes and GPU helper functions** + * InferenceEngine::RemoteBlob class + * InferenceEngine::RemoteContext class + * InferenceEngine::Core::CreateContext(const std::string& deviceName, const ParamMap& params) method + * InferenceEngine::Core::GetDefaultContext(const std::string& deviceName) method + * InferenceEngine::make_shared_blob(const TensorDesc& desc, RemoteContext::Ptr ctx) function + * InferenceEngine::gpu::make_shared_blob_nv12(size_t height, size_t width, RemoteContext::Ptr ctx, VASurfaceID nv12_surf) function + * InferenceEngine::gpu::make_shared_context(Core& core, std::string deviceName, VADisplay device) function + * InferenceEngine::gpu::make_shared_blob(const TensorDesc& desc, RemoteContext::Ptr ctx, VASurfaceID surface, uint32_t plane = 0) function + * InferenceEngine::gpu::make_shared_blob_nv12(RemoteContext::Ptr ctx, cl::Image2D& nv12_image_plane_y, cl::Image2D& nv12_image_plane_uv) function + * InferenceEngine::gpu::make_shared_context(Core& core, std::string deviceName, cl_context ctx) function + * InferenceEngine::gpu::make_shared_blob(const TensorDesc& desc, ClContext::Ptr ctx) function + * InferenceEngine::gpu::make_shared_blob(const TensorDesc& desc, RemoteContext::Ptr ctx, cl::Buffer& buffer) function + * InferenceEngine::gpu::make_shared_blob(const TensorDesc& desc, RemoteContext::Ptr ctx, cl_mem buffer) function + * InferenceEngine::gpu::make_shared_blob(const TensorDesc& desc, RemoteContext::Ptr ctx, cl::Image2D& image) function + +### Deprecated API + + **Inference Engine NN Builder API:** + + * InferenceEngine::Builder::EltwiseLayer + * InferenceEngine::Builder::MemoryLayer + * InferenceEngine::Builder::ROIPoolingLayer + * InferenceEngine::Builder::DeconvolutionLayer + * InferenceEngine::Builder::ReLULayer + * InferenceEngine::Builder::TanHLayer + * InferenceEngine::Builder::InputLayer + * InferenceEngine::Builder::PoolingLayer + * InferenceEngine::Builder::CropLayer + * InferenceEngine::Builder::GRUSequenceLayer + * InferenceEngine::Builder::NormLayer + * InferenceEngine::Builder::LSTMSequenceLayer + * InferenceEngine::Builder::ClampLayer + * InferenceEngine::Builder::PSROIPoolingLayer + * InferenceEngine::Builder::Layer + * InferenceEngine::Builder::RNNSequenceLayer + * InferenceEngine::Builder::ReorgYoloLayer + * InferenceEngine::Builder::NormalizeLayer + * InferenceEngine::Builder::PriorBoxClusteredLayer + * InferenceEngine::Builder::MVNLayer + * InferenceEngine::Builder::PermuteLayer + * InferenceEngine::Builder::SimplerNMSLayer + * InferenceEngine::Builder::ConstLayer + * InferenceEngine::Builder::DeformableConvolutionLayer + * InferenceEngine::Builder::FullyConnectedLayer + * InferenceEngine::Builder::PriorBoxLayer + * InferenceEngine::Builder::SoftMaxLayer + * InferenceEngine::Builder::OutputLayer + * InferenceEngine::Builder::TileLayer + * InferenceEngine::Builder::SplitLayer + * InferenceEngine::Builder::PReLULayer + * InferenceEngine::Builder::RegionYoloLayer + * InferenceEngine::Builder::ReshapeLayer + * InferenceEngine::Builder::ConvolutionLayer + * InferenceEngine::Builder::DetectionOutputLayer + * InferenceEngine::Builder::ConcatLayer + * 
InferenceEngine::Builder::ELULayer + * InferenceEngine::Builder::GRNLayer + * InferenceEngine::Builder::LRNLayer + * InferenceEngine::Builder::ArgMaxLayer + * InferenceEngine::Builder::ReLU6Layer + * InferenceEngine::Builder::ScaleShiftLayer + * InferenceEngine::Builder::ProposalLayer + * InferenceEngine::Builder::SigmoidLayer + * InferenceEngine::Builder::ResampleLayer + * InferenceEngine::Builder::CTCGreedyDecoderLayer + * InferenceEngine::Builder::BatchNormalizationLayer + * InferenceEngine::Builder::LayerDecorator + * InferenceEngine::Builder::PowerLayer + * InferenceEngine::Builder::Network + * InferenceEngine::Builder::PortInfo + * InferenceEngine::Builder::Connection + * InferenceEngine::Builder::PortData + * InferenceEngine::Builder::Port + * InferenceEngine::Builder::ILayer + * InferenceEngine::Builder::INetworkIterator + * InferenceEngine::Builder::INetwork + * InferenceEngine::Builder::ILayer + + **Plugin API:** + + * InferenceEngine::InferencePlugin C++ plugin wrapper class + * InferenceEngine::IInferencePlugin plugin interface + * InferenceEngine::PluginDispatcher class + * InferenceEngine::InferenceEnginePluginPtr typedef + * InferenceEngine::ICNNNetReader reader interface + * InferenceEngine::CNNNetReader class + + **Blob API:** + + * Blob::element_size() const noexcept method + * Blob::buffer() noexcept method + * Blob::cbuffer() noexcept method + * MemoryBlob::buffer() noexcept method + * MemoryBlob::cbuffer() noexcept method + + +### Removed API + + Removed all [Inference Engine API which deprecated in 2019'R2](https://docs.openvinotoolkit.org/2019_R3/_docs_IE_DG_API_Changes.html#deprecated_api) + +## 2019 R3 + +### New API + + **New supported layers:** + + * InferenceEngine::SparseFillEmptyRowsLayer new class + * InferenceEngine::UniqueLayer new class + * InferenceEngine::NonMaxSuppressionLayer new class + * InferenceEngine::ScatterLayer new class + + **FPGA plugin streaming support:** + + * DLIA_METRIC_VALUE(INPUT_STREAMING) value to METRIC_KEY(OPTIMIZATION_CAPABILITIES) + * DLIA_CONFIG_KEY(ENABLE_STREAMING) config key + +### Removed API + + * InferenceEngine::EltwiseLayer::Select from InferenceEngine::EltwiseLayer::eOperation enumeration + +## 2019 R2 + +### New API + + **Inference Engine Core API:** + + * Introduced InferenceEngine::Core high level class to manage devices + + **Query API extensions to InferenceEngine::ExecutableNetwork and InferenceEngine::IExecutableNetwork:** + + * InferenceEngine::ExecutableNetwork::SetConfig method + * InferenceEngine::ExecutableNetwork::GetConfig method + * InferenceEngine::ExecutableNetwork::GetMetric method + * InferenceEngine::IExecutableNetwork::SetConfig method + * InferenceEngine::IExecutableNetwork::GetConfig method + * InferenceEngine::IExecutableNetwork::GetMetric method + + **Metrics and values for Query API:** + + * METRIC_KEY(AVAILABLE_DEVICES) + * METRIC_KEY(SUPPORTED_METRICS) + * METRIC_KEY(SUPPORTED_CONFIG_KEYS) + * METRIC_KEY(FULL_DEVICE_NAME) + * METRIC_KEY(OPTIMIZATION_CAPABILITIES) + * METRIC_VALUE(FP32) + * METRIC_VALUE(FP16) + * METRIC_VALUE(INT8) + * METRIC_VALUE(BIN) + * METRIC_VALUE(WINOGRAD) + * DLIA_METRIC_VALUE(FP11) + * METRIC_KEY(RANGE_FOR_STREAMS) + * METRIC_KEY(NUMBER_OF_WAITING_INFER_REQUESTS) + * METRIC_KEY(NUMBER_OF_EXEC_INFER_REQUESTS) + * METRIC_KEY(DEVICE_THERMAL) + * METRIC_KEY(RANGE_FOR_ASYNC_INFER_REQUESTS) + * EXEC_NETWORK_METRIC_KEY(NETWORK_NAME) + * EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) + + **Common API:** + + * CLDNN_CONFIG_KEY(INT8_ENABLED) config key + * 
CONFIG_KEY(GPU_THROUGHPUT_AUTO) + * CONFIG_KEY(GPU_THROUGHPUT_STREAMS) + * DLIA_CONFIG_KEY(IO_TRANSFORMATIONS_NATIVE) config key + * DLIA_CONFIG_KEY(DUMP_SUPPORTED_LAYERS_INFORMATION) config key + * GNA_CONFIG_VALUE(SW_FP32) config value for GNA_CONFIG_KEY(DEVICE_MODE) key + * MULTI_CONFIG_KEY(DEVICE_PRIORITIES) config key for `MULTI` device + * InferenceEngine::CNNNetReader::ReadNetwork(const std::wstring &filepath) new method + * InferenceEngine::CNNNetReader::ReadWeights(const std::wstring &filepath) new method + * InferenceEngine::ExecutableNetwork::ExecutableNetwork(IExecutableNetwork::Ptr actual, InferenceEnginePluginPtr plg) constructor with additional `plg` parameter + * InferenceEngine::InferRequest::InferRequest(IInferRequest::Ptr request, InferenceEnginePluginPtr plg) constructor with additional `plg` parameter + * InferenceEngine::Data::setName method + * InferenceEngine::QueryNetworkResult::supportedLayersMap + * InferenceEngine::Precision::I64 extension to InferenceEngine::Precision::ePrecision enumeration + + **New supported primitives:** + + * InferenceEngine::Builder::DeformableConvolutionLayer new class + * InferenceEngine::DeformableConvolutionLayer new class + * InferenceEngine::EltwiseLayer::Logical_NOT, InferenceEngine::EltwiseLayer::Mean, InferenceEngine::EltwiseLayer::Select extensions to InferenceEngine::EltwiseLayer::eOperation enumeration + * InferenceEngine::OneHotLayer new class + * InferenceEngine::SelectLayer new class + * InferenceEngine::BroadcastLayer new class + * InferenceEngine::MathLayer new class + * InferenceEngine::ReduceLayer new class + * InferenceEngine::TopKLayer new class + + **Extensions to Blob creation API:** + + * InferenceEngine::Blob::is method + * InferenceEngine::Blob::is const method + * InferenceEngine::Blob::as method + * InferenceEngine::Blob::as const method + * InferenceEngine::Blob::getAllocator abstract method + * InferenceEngine::Blob::getHandle abstract method + * InferenceEngine::MemoryBlob class + * InferenceEngine::ColorFormat enumeration + * InferenceEngine::PreProcessInfo::setColorFormat method + * InferenceEngine::PreProcessInfo::getColorFormat method + * InferenceEngine::CompoundBlob class to work with blobs consisting of several planes + * InferenceEngine::NV12Blob class representing NV12 blob with two planes + +### Deprecated API + +The methods listed below are deprecated and will be removed in 2019 R4 release: + + **Common API:** + + * InferenceEngine::InputInfo::getInputPrecision method + * InferenceEngine::InputInfo::setInputPrecision method + * InferenceEngine::InputInfo::getDims method + * InferenceEngine::CNNLayer::GetParamsAsBool method + * InferenceEngine::CNNNetwork::CNNNetwork(ICNNNetwork* actual) constructor + * InferenceEngine::CNNNetwork::setTargetDevice method + * HETERO_CONFIG_KEY(DUMP_DLA_MESSAGES) config key + * InferenceEngine::ILayerImplFactory::getShapes method + * InferenceEngine::IShapeInferImpl::inferShapes(const std::vector&, const std::map& , const std::map&, std::vector&, ResponseDesc\*) method + * InferenceEngine::Data::setBatchSize method + * InferenceEngine::QueryNetworkResult::supportedLayers field + * InferenceEngine::ICNNNetwork::setBatchSize(const size_t size) method + * InferenceEngine::Blob::Resize method + * InferenceEngine::Blob::Reshape method + * InferenceEngine::TBlob::set method + + **InferenceEngine::IInferencePlugin and InferenceEngine:InferencePlugin obsolete methods:** + + * InferenceEngine::InferencePlugin::LoadNetwork(ICNNNetwork &network) method + * 
InferenceEngine::InferencePlugin::Infer method + * InferenceEngine::InferencePlugin::GetPerformanceCounts method + * InferenceEngine::InferencePlugin::QueryNetwork(const ICNNNetwork &network, QueryNetworkResult &res) const method + * InferenceEngine::IInferencePlugin::LoadNetwork(ICNNNetwork &network, ResponseDesc \*resp) method + * InferenceEngine::IInferencePlugin::Infer(const Blob &input, Blob &result, ResponseDesc \*resp) method + * InferenceEngine::IInferencePlugin::Infer(const BlobMap &input, BlobMap &result, ResponseDesc \*resp) method + * InferenceEngine::IInferencePlugin::GetPerformanceCounts method + * InferenceEngine::IInferencePlugin::QueryNetwork(const ICNNNetwork& network, QueryNetworkResult& res) const method + + + **Fields in InferenceEngine::Data class are replaced with appropriate methods:** + + * InferenceEngine::Data::precision field + * InferenceEngine::Data::layout field + * InferenceEngine::Data::dims field + * InferenceEngine::Data::creatorLayer field + * InferenceEngine::Data::name field + * InferenceEngine::Data::inputTo field + * InferenceEngine::Data::userObject field + + **Heterogeneous plugin:** + + * InferenceEngine::IHeteroDeviceLoader class + * InferenceEngine::IHeteroInferencePlugin class + * InferenceEngine::HeteroPluginPtr class + * operator InferenceEngine::InferencePlugin::HeteroPluginPtr operator + + **Blob creation API with dimensions in reverse order:** + + * InferenceEngine::Blob::Blob(Precision p) constructor + * InferenceEngine::Blob::Blob(Precision p, Layout l) constructor + * InferenceEngine::Blob::Blob(Precision p, const SizeVector &dims) constructor + * InferenceEngine::Blob::Blob(Precision p, Layout l, const SizeVector &dims) constructor + * InferenceEngine::TBlob::TBlob(Precision p, Layout l) constructor + * InferenceEngine::TBlob::TBlob(Precision p, Layout l, const SizeVector& dims) constructor + * InferenceEngine::TBlob::TBlob(Precision p, Layout l, const SizeVector& dims, T* ptr, size_t data_size) constructor + * InferenceEngine::TBlob::TBlob(Precision p, Layout l, const SizeVector &dims, std::shared_ptr alloc) constructor + * InferenceEngine::Blob::type() method + * InferenceEngine::Blob::precision() method + * InferenceEngine::Blob::layout() method + * InferenceEngine::Blob::dims() method + * InferenceEngine::make_shared_blob(Precision p, Layout l, const SizeVector &dims) function + * InferenceEngine::make_shared_blob(Precision p, const SizeVector &dims) function + * InferenceEngine::make_shared_blob(Precision p, Layout l, const TArg &arg) function + * InferenceEngine::make_shared_blob(Precision p, const TArg &arg) function + * InferenceEngine::make_shared_blob(TBlob &&arg) function + * InferenceEngine::make_shared_blob(Precision p, Layout l) function + * InferenceEngine::make_shared_blob(Precision p, Layout l, SizeVector dims, const std::vector &arg) function + * InferenceEngine::make_shared_blob(Precision p, Layout l, const std::vector &arg) function + * InferenceEngine::make_shared_blob(Precision p, const std::vector &arg) function + * InferenceEngine::make_shared_blob(Precision p, Layout l, const SizeVector &dims, TypeTo * ptr, size_t size) function + * InferenceEngine::make_shared_blob(Precision p, const SizeVector &dims, TypeTo * ptr, size_t size) function + * InferenceEngine::I_N variable + * InferenceEngine::I_C variable + * InferenceEngine::I_H variable + * InferenceEngine::I_W variable + * InferenceEngine::LayoutOffsetCounter class + * InferenceEngine::ConvertLayout function + + **API working with device enumeration:** + + * 
InferenceEngine::TargetDevice enumeration + * InferenceEngine::TargetDeviceInfo class + * InferenceEngine::getDeviceName function + * InferenceEngine::FindPluginRequest class + * InferenceEngine::FindPluginResponse class + * InferenceEngine::findPlugin(const FindPluginRequest &req, FindPluginResponse &result, ResponseDesc *resp) function + * InferenceEngine::ICNNNetwork::setTargetDevice method + * InferenceEngine::ICNNNetwork::getTargetDevice method + * InferenceEngine::PluginDispatcher::getPluginByDevice method + * InferenceEngine::PluginDispatcher::getSuitablePlugin method diff --git a/docs/IE_DG/Bfloat16Inference.md b/docs/IE_DG/Bfloat16Inference.md new file mode 100644 index 00000000000000..dcc48409cf2b71 --- /dev/null +++ b/docs/IE_DG/Bfloat16Inference.md @@ -0,0 +1,90 @@ +# Bfloat16 Inference {#openvino_docs_IE_DG_Bfloat16Inference} + +## Disclaimer + +Inference Engine with the bfloat16 inference implemented on CPU must support the `avx512_bf16` instruction and therefore the bfloat16 data format. + +## Introduction + +Bfloat16 computations (referred to as BF16) is the Brain Floating-Point format with 16 bits. This is a truncated 16-bit version of the 32-bit IEEE 754 single-precision floating-point format FP32. BF16 preserves 8 exponent bits as FP32 but reduces precision of the sign and mantissa from 24 bits to 8 bits. + +![bf16_format] + +Preserving the exponent bits keeps BF16 to the same range as the FP32 (~1e-38 to ~3e38). This simplifies conversion between two data types: you just need to skip or flush to zero 16 low bits. +Truncated mantissa leads to occasionally less precision, but according to [investigations](https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus), neural networks are more sensitive to the size of the exponent than the mantissa size. Also, in lots of models, precision is needed close to zero but not so much at the maximum range. +Another useful feature of BF16 is possibility to encode an INT8 in BF16 without loss of accuracy, because INT8 range completely fits in BF16 mantissa field. It reduces data flow in conversion from INT8 input image data to BF16 directly without intermediate representation in FP32, or in combination of [INT8 inference](Int8Inference.md) and BF16 layers. + +See the [Intel's site](https://software.intel.com/sites/default/files/managed/40/8b/bf16-hardware-numerics-definition-white-paper.pdf) for more bfloat16 format details. + +There are two ways to check if CPU device can support bfloat16 computations for models: +1. Query the instruction set via system `lscpu | grep avx512_bf16` or `cat /proc/cpuinfo | grep avx512_bf16`. +2. Use [Query API](InferenceEngine_QueryAPI.md) with `METRIC_KEY(OPTIMIZATION_CAPABILITIES)`, which should return `BF16` in the list of CPU optimization options: + +```cpp +InferenceEngine::Core core; +auto cpuOptimizationCapabilities = core.GetMetric("CPU", METRIC_KEY(OPTIMIZATION_CAPABILITIES)).as>(); +``` + +Current Inference Engine solution for bfloat16 inference uses Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) and supports inference of the following layers in BF16 computation mode: +* Convolution +* FullyConnected +* InnerProduct +* LRN +* Pooling + +This means that BF16 inference can only be performed with the CPU plugin on the layers listed above. All other layers are executed in FP32. 
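+
+As a small illustration of the Query API check above, the following sketch (an assumption-level example, not a toolkit sample) tests whether the `BF16` capability is reported before relying on native bfloat16 execution:
+
+```cpp
+#include <inference_engine.hpp>
+#include <algorithm>
+#include <string>
+#include <vector>
+
+bool cpuSupportsNativeBF16() {
+    InferenceEngine::Core core;
+    // OPTIMIZATION_CAPABILITIES returns strings such as "FP32", "INT8", "BF16".
+    auto capabilities = core.GetMetric("CPU", METRIC_KEY(OPTIMIZATION_CAPABILITIES))
+                            .as<std::vector<std::string>>();
+    return std::find(capabilities.begin(), capabilities.end(),
+                     std::string(METRIC_VALUE(BF16))) != capabilities.end();
+}
+```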
+ +## Lowering Inference Precision + +Lowering precision to increase performance is [widely used](https://software.intel.com/content/www/us/en/develop/articles/lower-numerical-precision-deep-learning-inference-and-training.html) for optimization of inference. The bfloat16 data type usage on CPU for the first time opens the possibility of default optimization approach. +The embodiment of this approach is to use the optimization capabilities of the current platform to achieve maximum performance while maintaining the accuracy of calculations within the acceptable range. + +Bfloat16 data usage provides the following benefits that increase performance: +1. Faster multiplication of two BF16 numbers because of shorter mantissa of bfloat16 data. +2. No need to support denormals and handling exceptions as this is a performance optimization. +3. Fast conversion of float32 to bfloat16 and vice versa. +4. Reduced size of data in memory, as a result, larger models fit in the same memory bounds. +5. Reduced amount of data that must be transferred, as a result, reduced data transition time. + +For default optimization on CPU, source model converts from FP32 or FP16 to BF16 and executes internally on platforms with native BF16 support. In that case, `KEY_ENFORCE_BF16` is set to `YES`. +The code below demonstrates how to check if the key is set: + +```cpp +InferenceEngine::Core core; +auto exeNetwork = core.LoadNetwork(network, "CPU"); +auto enforceBF16 = exeNetwork.GetConfig(PluginConfigParams::KEY_ENFORCE_BF16).as(); +``` + +To disable BF16 internal transformations, set the `KEY_ENFORCE_BF16` to `NO`. In this case, the model infers AS IS without modifications with precisions that were set on each layer edge. + +```cpp +InferenceEngine::Core core; +core.SetConfig({ { CONFIG_KEY(ENFORCE_BF16), CONFIG_VALUE(NO) } }, "CPU"); +``` + +An exception with message `Platform doesn't support BF16 format` is formed in case of setting `KEY_ENFORCE_BF16` to `YES` on CPU without native BF16 support. + +Low-Precision 8-bit integer models do not convert to BF16, even if bfloat16 optimization is set by default. + +## Performance Counters + +Information about layer precision is stored in the performance counters that are +available from the Inference Engine API. The layers have the following marks: +* Suffix `BF16` for layers that had bfloat16 data type input and were computed in BF16 precision +* Suffix `FP32` for layers computed in 32-bit precision + +For example, the performance counters table for the Inception model can look as follows: + +``` +pool5 EXECUTED layerType: Pooling realTime: 143 cpu: 143 execType: jit_avx512_BF16 +fc6 EXECUTED layerType: FullyConnected realTime: 47723 cpu: 47723 execType: jit_gemm_BF16 +relu6 NOT_RUN layerType: ReLU realTime: 0 cpu: 0 execType: undef +fc7 EXECUTED layerType: FullyConnected realTime: 7558 cpu: 7558 execType: jit_gemm_BF16 +relu7 NOT_RUN layerType: ReLU realTime: 0 cpu: 0 execType: undef +fc8 EXECUTED layerType: FullyConnected realTime: 2193 cpu: 2193 execType: jit_gemm_BF16 +prob EXECUTED layerType: SoftMax realTime: 68 cpu: 68 execType: jit_avx512_FP32 +``` + +The `execType` column of the table includes inference primitives with specific suffixes. 
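A sketch of reading these counters programmatically, assuming performance counting was enabled with `PluginConfigParams::KEY_PERF_COUNT` and `infer_request` is an already created `InferenceEngine::InferRequest`:

```cpp
// Per-layer counters are available after Infer() completes
auto perfCounts = infer_request.GetPerformanceCounts();
for (const auto &layer : perfCounts) {
    const InferenceEngine::InferenceEngineProfileInfo &info = layer.second;
    if (info.status == InferenceEngine::InferenceEngineProfileInfo::EXECUTED) {
        // exec_type carries the precision suffix, for example "jit_gemm_BF16" or "jit_avx512_FP32"
        std::cout << layer.first << "  " << info.exec_type
                  << "  realTime(us): " << info.realTime_uSec << std::endl;
    }
}
```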
+ +[bf16_format]: img/bf16_format.png \ No newline at end of file diff --git a/docs/IE_DG/Cross_Check_Tool.md b/docs/IE_DG/Cross_Check_Tool.md new file mode 100644 index 00000000000000..495afa790fcccc --- /dev/null +++ b/docs/IE_DG/Cross_Check_Tool.md @@ -0,0 +1,298 @@ +Cross Check Tool {#openvino_docs_IE_DG_Cross_Check_Tool} +================ + +Cross Check Tool is a console application that enables comparing accuracy and performance metrics for two successive +model inferences that are performed +on two different supported Intel® devices or with different precisions. +The Cross Check Tool can compare metrics per layer or all over the model. + +On Linux* OS, before running the Cross Check Tool binary, make sure your application can find the +Deep Learning Inference Engine libraries. +Navigate to the `/deployment_tools/inference_engine/bin` folder and run the `setvars.sh` script to +set all necessary environment variables: + +```sh +source setvars.sh +``` + +## Running the Cross Check Tool + +Cross Check Tool is distributed as a binary file and there is no need to build it. To run the Cross Check Tool, +execute the tool's binary file with necessary parameters. Please note that the Inference Engine assumes that weights +are in the same folder as the _.xml_ file. + +You can get the list of all available options using the -h option: + +```sh +$./cross_check_tool -h +InferenceEngine: + API version ............ 1.0 + Build .................. ### +[ INFO ] Parsing input parameters + +./cross_check_tool [OPTION] +Options: + + -h Prints a usage message. + -i "" Optional. Path to an input image file or multi-input file to infer. Generates input(s) from normal distribution if empty + -m "" Required. Path to an .xml file that represents the first IR of the trained model to infer. + -l "" Required for MKLDNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernels implementation. + Or + -c "" Required for clDNN (GPU)-targeted custom kernels. Absolute path to the xml file with the kernels description. + -conf "" Optional. Path to config file for -d device plugin + -ref_conf "" Optional. Path to config file for -ref_d device plugin + -pp "" Optional. Path to a plugin folder. + -d "" Required. The first target device to infer the model specified with the -m option. CPU, GPU, HDDL or MYRIAD is acceptable. + -ref_m "" Optional. Path to an .xml file that represents the second IR in different precision to compare the metrics. + -ref_d "" Required. The second target device to infer the model and compare the metrics. CPU, GPU, HDDL or MYRIAD is acceptable. + -layers "" Defines layers to check. Options: all, None - for output layers check, list of comma-separated layer names to check. Default value is None. + -eps "" Optional. Threshold for filtering out those blob statistics that do not statify the condition: max_abs_diff < eps. + -dump Enables blobs statistics dumping + -load "" Path to a file to load blobs from +``` +### Examples + +1. To check per-layer accuracy and performance of inference in FP32 precision on the CPU against the GPU, run: +```sh +./cross_check_tool -i \ + -m \ + -d CPU \ + -ref_d GPU \ + -layers all +``` +The output looks as follows: +``` +InferenceEngine: + API version ............ 1.0 + Build .................. ### +[ INFO ] Parsing input parameters + The same IR on both devices: + +[ INFO ] No extensions provided + + API version ............ 1.0 + Build .................. lnx_20180510 + Description ....... MKLDNNPlugin + + API version ............ 
0.1 + Build .................. ci-main-03659 + Description ....... clDNNPlugin +[ INFO ] Inputs detected: Placeholder +[ INFO ] Statistics will be dumped for X layers: , , ... , +[ INFO ] Layer statistics + Max absolute difference: 1.52588e-05 + Min absolute difference: 0 + Max relative difference: 0.000288028% + Min relative difference: 0% + Blob size: 1000 + + Devices: CPU_FP32 GPU_FP32 + Status: EXECUTED EXECUTED + Layer type: Reshape Reshape + Real time, microsec: 20 154 + Execution type: unknown GPU + Number of NAN: 0 0 + Number of INF: 0 0 + Number of ZERO: 0 0 +... + +... + +[ INFO ] Overall max absolute difference 2.81334e-05 was reached by layer +[ INFO ] Overall min absolute difference 0 was reached by layer +[ INFO ] Overall max relative difference 0.744893% was reached by layer +[ INFO ] Overall min relative difference -2.47948% was reached by layer +[ INFO ] Execution successful +``` + +2. To check the overall accuracy and performance of inference on the CPU in FP32 precision against the +Intel® Movidius™ Myriad™ device in FP16 precision, run: +```sh +./cross_check_tool -i \ + -m \ + -ref_d CPU \ + -ref_m \ + -d MYRIAD \ +``` +The output looks as follows: +``` +InferenceEngine: + API version ............ 1.0 + Build .................. ### + +[ INFO ] Parsing input parameters +[ INFO ] MYRIAD vs CPU + IR for MYRIAD : + IR for CPU : + +[ INFO ] No extensions provided +[ INFO ] Loading plugins + + API version ............ 0.1 + Build .................. ### + Description ....... myriadPlugin + + + API version ............ 1.0 + Build .................. ### + Description ....... MKLDNNPlugin + +[ INFO ] Inputs detected: +[ INFO ] Statistics will be dumped for 1 layers: +[ INFO ] Layer statistics + Max absolute difference: 0.003889 + Min absolute difference: 2.49778e-12 + Max relative difference: 290.98% + Min relative difference: 0.0327804% + Devices: MYRIAD_FP16 CPU_FP32 + Real time, microsec: 69213.978946 4149.904940 +[ INFO ] Execution successful +``` + +3. To dump layer statistics from specific list of layers, run: +```sh +./cross_check_tool -i \ + -m \ + -d MYRIAD \ + -dump \ + -layers +``` +The output looks as follows: +``` +InferenceEngine: + API version ............ 1.0 + Build .................. ### +[ INFO ] Blob and statistics dumping enabled +[ INFO ] No extensions provided + + API version ............ 0.1 + Build .................. custom_releases/cvsdk-2018-r2_e28ec0278fb749d6b999c688a8e90a8a25c0f2b5 + Description ....... myriadPlugin + +[ INFO ] Inputs detected: +[ INFO ] Statistics will be dumped for X layers: +[ INFO ] Dump path: +[ INFO ] layer processing +... +[ INFO ] layer processing +[ INFO ] Execution successful +``` +If you do not provide the `-i` key, the Cross Check Tool generates an input from normal distributed noise and saves +it in a multi-input file format with the filename `_input_layers_dump.txt` in the same folder as the IR. +4. To check the overall accuracy and performance of inference on the CPU in FP32 precision against dumped results, run: +```sh +./cross_check_tool -i \ + -m \ + -d CPU \ + -load \ + -layers all +``` +The output looks as follows: +``` +InferenceEngine: + API version ............ 1.0 + Build .................. ### +[ INFO ] Blob and statistics loading enabled. File /localdisk/models/FP16/icv_squeezenet_v1.0_MYRIAD_FP16_dump.txt + The same IR on both devices: + +[ INFO ] No extensions provided + + API version ............ 0.1 + Build .................. ### + Description ....... 
myriadPlugin + +[ INFO ] Inputs detected: +[ INFO ] Statistics will be dumped for X layers: , , ... , +[ INFO ] layer processing +[ INFO ] Layer statistics + Max absolute difference: 0 + Min absolute difference: 0 + Max relative difference: 0% + Min relative difference: 0% + Blob size: 1000 + + Devices: MYRIAD_FP16 MYRIAD_FP16_loaded + Status: EXECUTED EXECUTED + Layer type: SoftMax SoftMax + Real time, microsec: 43 43 + Execution type: SoftMax SoftMax + Number of NAN: 0 0 + Number of INF: 0 0 + Number of ZERO: 0 0 +... + +... +[ INFO ] Overall max absolute difference 0 +[ INFO ] Overall min absolute difference 0 was reached by layer +[ INFO ] Overall max relative difference 0% +[ INFO ] Overall min relative difference 0% was reached by layer +[ INFO ] Execution successful +``` + +### Multi-input and dump file experimental format + +Text file contains description of each layer in structure like this: +* 1st line is layer name (required) +* 2nd line is shape like "(1,224,224,3)" (required) +* 3rd line is a device and precision information like "CPU_FP32" (optional for multi-input file) +* 4th line is execution status Options are: EXECUTED, OPTIMIZED_OUT (optional for multi-input file) +* 5th line is type of layer (optional for multi-input file) +* 6th line is execution time in microseconds (optional for multi-input file) +* 7th line is type of execution (optional for multi-input file) +* 8th line is word "CONTENT" which means that the next line or lines are consisted of blob elements +* Next line or lines are for blob elements. They may be separated with one or several spaces, tabs and new lines. + + +#### Multi-input file example + +``` +Input_1 +(1,10) +CONTENT +0 0.000628471375 0.00185108185 +0.000580787659 +0.00137138367 +0.000561237335 0.0040473938 0 0 0 +Input_2 +(1,8) +CONTENT +0 0 0.00194549561 0.0017490387 7.73072243e-05 0.000135779381 0.000186920166 0 7.52806664e-05 +``` + +#### Dump file example + +``` +Softmax +(1,10) +MYRIAD_FP16 +EXECUTED +SoftMax +43 +SoftMax +CONTENT +7.44462013e-05 +0 +0.000810623169 +0.000361680984 +0 +9.14335251e-05 +0 +0 +8.15987587e-05 +0 +``` + + +### Configuration file + +There is an option to pass configuration file to plugin by providing +`-conf` and/or `--ref_conf` keys. + +Configuration file is a text file with content of pairs of keys and values. + +Structure of configuration file: + +```sh +KEY VALUE +ANOTHER_KEY ANOTHER_VALUE,VALUE_1 +``` diff --git a/docs/IE_DG/Deep_Learning_Inference_Engine_DevGuide.md b/docs/IE_DG/Deep_Learning_Inference_Engine_DevGuide.md new file mode 100644 index 00000000000000..2e6e033069a83c --- /dev/null +++ b/docs/IE_DG/Deep_Learning_Inference_Engine_DevGuide.md @@ -0,0 +1,93 @@ +# Inference Engine Developer Guide {#openvino_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide} + +## Introduction to the OpenVINO™ Toolkit + +The OpenVINO™ toolkit is a comprehensive toolkit that you can use to develop and deploy vision-oriented solutions on +Intel® platforms. Vision-oriented means the solutions use images or videos to perform specific tasks. +A few of the solutions use cases include autonomous navigation, digital surveillance cameras, robotics, +and mixed-reality headsets. 
+ +The OpenVINO™ toolkit: + +* Enables CNN-based deep learning inference on the edge +* Supports heterogeneous execution across an Intel® CPU, Intel® Integrated Graphics, Intel® Movidius™ Neural Compute Stick and Intel® Neural Compute Stick 2 +* Speeds time-to-market via an easy-to-use library of computer vision functions and pre-optimized kernels +* Includes optimized calls for computer vision standards including OpenCV\*, OpenCL™, and OpenVX\* + +The OpenVINO™ toolkit includes the following components: + +* Intel® Deep Learning Deployment Toolkit (Intel® DLDT) + - [Deep Learning Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) — A cross-platform command-line tool for importing models and + preparing them for optimal execution with the Deep Learning Inference Engine. The Model Optimizer supports converting Caffe*, + TensorFlow*, MXNet*, Kaldi*, ONNX* models. + - [Deep Learning Inference Engine](inference_engine_intro.md) — A unified API to allow high performance inference on many hardware types + including Intel® CPU, Intel® Processor Graphics, Intel® FPGA, Intel® Neural Compute Stick 2. + - [nGraph](nGraph_Flow.md) — graph representation and manipulation engine which is used to represent a model inside Inference Engine and allows the run-time model construction without using Model Optimizer. +* [OpenCV](https://docs.opencv.org/) — OpenCV* community version compiled for Intel® hardware. +Includes PVL libraries for computer vision. +* Drivers and runtimes for OpenCL™ version 2.1 +* [Intel® Media SDK](https://software.intel.com/en-us/media-sdk) +* [OpenVX*](https://software.intel.com/en-us/cvsdk-ovx-guide) — Intel's implementation of OpenVX* +optimized for running on Intel® hardware (CPU, GPU, IPU). +* [Demos and samples](Samples_Overview.md). + + +This Guide provides overview of the Inference Engine describing the typical workflow for performing +inference of a pre-trained and optimized deep learning model and a set of sample applications. + +> **NOTES:** +> - Before you perform inference with the Inference Engine, your models should be converted to the Inference Engine format using the Model Optimizer or built directly in run-time using nGraph API. To learn about how to use Model Optimizer, refer to the [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). To learn about the pre-trained and optimized models delivered with the OpenVINO™ toolkit, refer to [Pre-Trained Models](@ref omz_models_intel_index). +> - [Intel® System Studio](https://software.intel.com/en-us/system-studio) is an all-in-one, cross-platform tool suite, purpose-built to simplify system bring-up and improve system and IoT device application performance on Intel® platforms. If you are using the Intel® Distribution of OpenVINO™ with Intel® System Studio, go to [Get Started with Intel® System Studio](https://software.intel.com/en-us/articles/get-started-with-openvino-and-intel-system-studio-2019). 
+ + +## Table of Contents + +* [Introduction to Intel® Deep Learning Deployment Toolkit](Introduction.md) + +* [Inference Engine API Changes History](API_Changes.md) + +* [Introduction to Inference Engine](inference_engine_intro.md) + +* [Introduction to nGraph Flow](nGraph_Flow.md) + +* [Understanding Inference Engine Memory Primitives](Memory_primitives.md) + +* [Introduction to Inference Engine Device Query API](InferenceEngine_QueryAPI.md) + +* [Adding Your Own Layers to the Inference Engine](Extensibility_DG/Intro.md) + +* [Integrating Inference Engine in Your Application](Integrate_with_customer_application_new_API.md) + +* [Migration from Inference Engine Plugin API to Core API](Migration_CoreAPI.md) + +* [Introduction to Performance Topics](Intro_to_Performance.md) + +* [Inference Engine Python API Overview](../../inference-engine/ie_bridges/python/docs/api_overview.md) + +* [Using Dynamic Batching feature](DynamicBatching.md) + +* [Using Static Shape Infer feature](ShapeInference.md) + +* [Using Low-Precision 8-bit Integer Inference](Int8Inference.md) + +* [Using Bfloat16 Inference](Bfloat16Inference.md) + +* Utilities to Validate Your Converted Model + * [Using Cross Check Tool for Per-Layer Comparison Between Plugins](../../inference-engine/tools/cross_check_tool/README.md) + +* [Supported Devices](supported_plugins/Supported_Devices.md) + * [GPU](supported_plugins/CL_DNN.md) + * [CPU](supported_plugins/CPU.md) + * [FPGA](supported_plugins/FPGA.md) + * [VPU](supported_plugins/VPU.md) + * [MYRIAD](supported_plugins/MYRIAD.md) + * [HDDL](supported_plugins/HDDL.md) + * [Heterogeneous execution](supported_plugins/HETERO.md) + * [GNA](supported_plugins/GNA.md) + * **NEW!** [MULTI](supported_plugins/MULTI.md) + +* [Pre-Trained Models](@ref omz_models_intel_index) + +* [Known Issues](Known_Issues_Limitations.md) + +**Typical Next Step:** [Introduction to Intel® Deep Learning Deployment Toolkit](Introduction.md) diff --git a/docs/IE_DG/DynamicBatching.md b/docs/IE_DG/DynamicBatching.md new file mode 100644 index 00000000000000..696b245d45c07e --- /dev/null +++ b/docs/IE_DG/DynamicBatching.md @@ -0,0 +1,83 @@ +Using Dynamic Batching {#openvino_docs_IE_DG_DynamicBatching} +====================== + +Dynamic Batching feature allows you+ to dynamically change batch size for inference calls +within preset batch size limit. +This feature might be useful when batch size is unknown beforehand, and using extra large batch size is +undesired or impossible due to resource limitations. +For example, face detection with person age, gender, or mood recognition is a typical usage scenario. + + +## Usage + +You can activate Dynamic Batching by setting KEY_DYN_BATCH_ENABLED flag to YES in a configuration map that is +passed to the plugin while loading a network. +This configuration creates an ExecutableNetwork object that will allow setting batch size +dynamically in all of its infer requests using SetBatch() method. +The batch size that was set in passed CNNNetwork object will be used as a maximum batch size limit. 
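The configuration is passed as a plain `std::map<std::string, std::string>` using the standard `InferenceEngine::PluginConfigParams` constants; a minimal sketch of the key/value pair involved:

```cpp
// Enable Dynamic Batching for the network being loaded
const std::map<std::string, std::string> dyn_config = {
    { InferenceEngine::PluginConfigParams::KEY_DYN_BATCH_ENABLED,
      InferenceEngine::PluginConfigParams::YES }
};
// Passed as the third argument of Core::LoadNetwork, as shown in the example below
```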
+ +Here is a code example: +```cpp +int dynBatchLimit = FLAGS_bl; //take dynamic batch limit from command line option + +// Read network model +Core core; +CNNNetwork network = core.ReadNetwork(modelFileName, weightFileName); + +// enable dynamic batching and prepare for setting max batch limit +const std::map dyn_config = +{ { PluginConfigParams::KEY_DYN_BATCH_ENABLED, PluginConfigParams::YES } }; +network.setBatchSize(dynBatchLimit); + +// create executable network and infer request +auto executable_network = core.LoadNetwork(network, "CPU", dyn_config); +auto infer_request = executable_network.CreateInferRequest(); + + +... + + +// process a set of images +// dynamically set batch size for subsequent Infer() calls of this request +size_t batchSize = imagesData.size(); +infer_request.SetBatch(batchSize); +infer_request.Infer(); + +... + +// process another set of images +batchSize = imagesData2.size(); +infer_request.SetBatch(batchSize); +infer_request.Infer(); +``` + + +## Limitations + +Currently, certain limitations for using Dynamic Batching exist: + +* Use Dynamic Batching with CPU and GPU plugins only. + +* Use Dynamic Batching on topologies that consist of certain layers only: + + * Convolution + * Deconvolution + * Activation + * LRN + * Pooling + * FullyConnected + * SoftMax + * Split + * Concatenation + * Power + * Eltwise + * Crop + * BatchNormalization + * Copy + +Do not use layers that might arbitrary change tensor shape (such as Flatten, Permute, Reshape), +layers specific to object detection topologies (ROIPooling, ProirBox, DetectionOutput), and +custom layers. +Topology analysis is performed during the process of loading a network into plugin, and if topology is +not applicable, an exception is generated. + diff --git a/docs/IE_DG/Extensibility_DG/AddingNGraphOps.md b/docs/IE_DG/Extensibility_DG/AddingNGraphOps.md new file mode 100644 index 00000000000000..8c181062cb60fd --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/AddingNGraphOps.md @@ -0,0 +1,89 @@ +# Add Custom nGraph Operations {#openvino_docs_IE_DG_Extensibility_DG_AddingNGraphOps} + +Inference Engine Extension API allows to register operation sets (opsets) with custom nGraph operations, it allows to support Networks with unknown operations. + +## Operation Class + +To add your custom nGraph operation, create a new class that extends `ngraph::Op`, which is in turn derived from `ngraph::Node`, the base class for all graph operations in nGraph. Follow the steps below: + +1. Define a `NodeTypeInfo` object that identifies the type of the operation to the graph users and helps with dynamic type resolution. The type info of an nGraph operation currently consists of a string identifier and a version number, but this may change in the future. + +2. Implement constructors that can optionally take the operation inputs and attributes as parameters. + +3. Override the shape inference method `validate_and_infer_types`. This method is called multiple times during graph manipulations to determine the shapes and element types of the outputs of the operations. You can access the input shapes through the `get_input_partial_shape()` method and input element types through the `get_input_element_type()` method of `ngraph::Node`. Set the inferred shape and element type of the output using `set_output_type`. + +4. Override the `copy_with_new_args` method, which allows graph manipulation routines to create copies of this operation and connect it to different nodes during optimization. + +5. 
Override the `visit_attributes` method, which allows serialization and deserialization of attributes. An `AttributeVisitor` is passed to the method, and the implementation is expected to walk over all the attributes in the op using the type-aware `on_attribute` helper. Helpers are already implemented for standard C++ types like `int64_t`, `float`, `bool`, `vector` and for existing nGraph defined types. + +Based on that, declaration of a operation class can look as follows: + +@snippet op.hpp op:header + +### Class Fields + +The provided implementation has several fields: + + * `add` of type `int64_t` is an attribute of custom operation + * `type_info` of type `ngraph::NodeTypeInfo` defines the type and version of operation + +### Operation Constructors + +nGraph operation contains two constructors: a default constructor, which allows to create operation without attributes and a constructor that creates and validates operation with specified inputs and attributes. + +@snippet op.cpp op:ctor + +### `validate_and_infer_types()` + +`ngraph::Node::validate_and_infer_types` method validates operation attributes and calculates output shapes using attributes of operation. + +@snippet op.cpp op:validate + +### `copy_with_new_args()` + +`ngraph::Node::copy_with_new_args` method creates a copy of nGraph operation with new inputs. + +@snippet op.cpp op:copy + +### `visit_attributes()` + +`ngraph::Node::visit_attributes` method allows to visit all operation attributes. + +@snippet op.cpp op:visit_attributes + +## Register Custom Operations in Extension Class + +To add custom operations to the [Extension](Extension.md) class, create an operation set with custom operations and implement the `InferenceEngine::IExtension::getOpSets` method: + +@snippet extension.cpp extension:getOpSets + +This method returns a map of opsets that exist in the extension library. + +nGraph provides opsets mechanism for operation versioning. Different opsets distinguish between different versions of one operation. + +When specifying opset names, follow the rules below: +* Use unique opset names. +* Do not use the following built-in opset names: `extension`, `experimental`, `opset1`, `opest2`. +* Make sure that the Model Optimizer and your extension use the same opset names. +* IR v10 layers have the mandatory `version` attribute specifying the opset. +* `opset1` is the name of default operations set. +Operations from the default opset cannot be redefined. + +Use a custom opset to create a new operation or extend functionality of an existing operation from another opset. + +## Deprecation Notice + + + + + + + + + + +
| Deprecation Begins | Removal Date |
|---|---|
| June 1, 2020 | December 1, 2020 |
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* diff --git a/docs/IE_DG/Extensibility_DG/Building.md b/docs/IE_DG/Extensibility_DG/Building.md new file mode 100644 index 00000000000000..8d33678da50897 --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/Building.md @@ -0,0 +1,19 @@ +# Build Extension Library Using CMake* {#openvino_docs_IE_DG_Extensibility_DG_Building} + +Inference Engine build infrastructure provides the Inference Engine Package for application development. + +To build an extension library, use the following CMake script: + +@snippet CMakeLists.txt cmake:extension + +This CMake script finds the Inference Engine and nGraph using the `find_package` CMake command. + +To build an extension library, run the commands below: + +```sh +$ cd template_extension +$ mkdir build +$ cd build +$ cmake -DInferenceEngine_DIR=[IE_DIR] -Dngraph_DIR=[NGRAPH_DIR] ../ +$ cmake --build . +``` diff --git a/docs/IE_DG/Extensibility_DG/CPU_Kernel.md b/docs/IE_DG/Extensibility_DG/CPU_Kernel.md new file mode 100644 index 00000000000000..22fd0d062dea2e --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/CPU_Kernel.md @@ -0,0 +1,74 @@ +# How to Implement Custom CPU Layers {#openvino_docs_IE_DG_Extensibility_DG_CPU_Kernel} + +The primary vehicle for the performance of the CPU codepath in the Inference Engine is the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), and new CPU kernels extend the Inference Engine plugin for the Intel MKL-DNN. Implementing the InferenceEngine::ILayerExecImpl defines a general CPU-side extension. There are no Intel MKL-DNN specifics in the way you need to implement a kernel. + +## Implementation Class + +All custom kernels for the CPU plugin should be inherited from the InferenceEngine::ILayerExecImpl interface. +Based on that, declaration of a kernel implementation class can look as follows: + +@snippet cpu_kernel.hpp cpu_implementation:header + +### Class Fields + +The provided implementation has several fields: + + * `add` of the type `int64_t` is an attribute of a custom operation + * `inShape` of the type `ngraph::Shape` is an input shape + * `outShape` of the type `ngraph::Shape` is an output shape + * `error` of the type `std::string` is a field to handle errors from a constructor + +### Constructor of Implementation + +An implementation constructor checks parameters of nGraph operation, stores needed attributes, and stores an error message in the case of an error. + +@snippet cpu_kernel.cpp cpu_implementation:ctor + +### `getSupportedConfigurations` + +InferenceEngine::ILayerExecImpl::getSupportedConfigurations method returns all supported configuration formats (input/output tensor layouts) for your implementation. To specify formats of data, use InferenceEngine::TensorDesc. Refer to the [Memory Primitives](../Memory_primitives.md) section for instructions on how to do it. 
+ +@snippet cpu_kernel.cpp cpu_implementation:getSupportedConfigurations + +### `init` + +InferenceEngine::ILayerExecImpl::init method gets a runtime-selected configuration from a vector that is populated from the `getSupportedConfigurations` method and checks the parameters: + +@snippet cpu_kernel.cpp cpu_implementation:init + +### `execute` + +InferenceEngine::ILayerExecImpl::execute method accepts and processes the actual tenors as input/output blobs: + +@snippet cpu_kernel.cpp cpu_implementation:execute + +## Register Implementation in `Extension` Class + +To register custom kernel implementation in the [Extension](Extension.md) class, implement the following methods: +* getImplTypes +* getImplementation + +### getImplTypes + +InferenceEngine::IExtension::getImplTypes returns a vector of implementation types for an operation. + +@snippet extension.cpp extension:getImplTypes + +### getImplementation + +InferenceEngine::IExtension::getImplementation returns the kernel implementation with a specified type for an operation. + +@snippet extension.cpp extension:getImplementation + + +## Load Extension with Executable Kernels to Plugin + +Use the `AddExtension` method of the general plugin interface to load your primitives: +```cpp +InferenceEngine::Core core; +// Load CPU extension as a shared library +auto extension_ptr = make_so_pointer(""); +// Add extension to the CPU device +core.AddExtension(extension_ptr, "CPU"); +``` + diff --git a/docs/IE_DG/Extensibility_DG/Extension.md b/docs/IE_DG/Extensibility_DG/Extension.md new file mode 100644 index 00000000000000..1eb84bb5c694d9 --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/Extension.md @@ -0,0 +1,25 @@ +# Extension Library {#openvino_docs_IE_DG_Extensibility_DG_Extension} + +Inference Engine provides an InferenceEngine::IExtension interface, which defines the interface for Inference Engine Extension libraries. +All extension libraries should be inherited from this interface. + +Based on that, declaration of an extension class can look as follows: + +@snippet extension.hpp extension:header + +The extension library should contain and export the method InferenceEngine::CreateExtension, which creates an `Extension` class: + +@snippet extension.cpp extension:CreateExtension + +Also, an `Extension` object should implement the following methods: + +* InferenceEngine::IExtension::Release deletes an extension object + +* InferenceEngine::IExtension::GetVersion returns information about version of the library + +@snippet extension.cpp extension:GetVersion + +Implement the InferenceEngine::IExtension::getOpSets method if the extension contains custom layers. +Read the [guide about custom operations](AddingNGraphOps.md) for more information. + +To understand how integrate execution kernels to the extension library, read the [guide about development of custom CPU kernels](CPU_Kernel.md). diff --git a/docs/IE_DG/Extensibility_DG/GPU_Kernel.md b/docs/IE_DG/Extensibility_DG/GPU_Kernel.md new file mode 100644 index 00000000000000..24c7599d8baad0 --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/GPU_Kernel.md @@ -0,0 +1,250 @@ +# How to Implement Custom GPU Layers {#openvino_docs_IE_DG_Extensibility_DG_GPU_Kernel} + +The GPU codepath abstracts many details about OpenCL™. You need to provide the kernel code in OpenCL C and the configuration file that connects the kernel and its parameters to the parameters of the layer. 
+ +There are two options of using custom layer configuration file: + +* Include a section with your kernels into the global automatically-loaded `cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml` file, which is hosted in the `/deployment_tools/inference_engine/bin/intel64/{Debug/Release}` folder +* Call the `InferenceEngine::Core::SetConfig()` method from your application with the `InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE` key and the configuration file name as a value before loading the network that uses custom layers to the plugin: +```cpp +InferenceEngine::Core core; +// Load GPU Extensions +core.SetConfig({ { InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE, "" } }, "GPU"); +``` + +All Inference Engine samples, except trivial `hello_classification`, +feature a dedicated command-line option `-c` to load custom kernels. For example, to load custom layers for the classification sample, run the command below: +```sh +$ ./classification_sample -m /bvlc_alexnet_fp16.xml -i ./validation_set/daily/227x227/apron.bmp -d GPU + -c /custom_layer_example.xml +``` + +## Configuration File Format + +The configuration file is expected to follow the `.xml` file structure +with a node of the type `CustomLayer` for every custom layer you provide. + +The definitions described in the sections below use the following notations: + +Notation | Description +---|--- +(0/1) | Can have 0 or 1 instances of this node/attribute +(1) | Must have only 1 instance of this node/attribute +(0+) | Can have any number of instances of this node/attribute +(1+) | Can have 1 or more instances of this node/attribute + +### CustomLayer Node and Sub-node Structure + +`CustomLayer` node contains the entire configuration for a single custom +layer. + +| Attribute Name |\# | Description | +|-----|-----|-----| +| `name` | (1) | The name of the layer type to be used. This name should be identical to the type used in the IR.| +| `type` | (1) | Must be `SimpleGPU`. | +| `version` | (1) | Must be `1`. | + +**Sub-nodes**: `Kernel` (1), `Buffers` (1), `CompilerOptions` (0+), +`WorkSizes` (0/1) + +### Kernel Node and Sub-node Structure + +`Kernel` node contains all kernel source code configuration. No kernel +node structure exists. + +**Sub-nodes**: `Source` (1+), `Define` (0+) + +### Source Node and Sub-node Structure + +`Source` node points to a single OpenCL source file. + +| Attribute Name | \# || +|-----|-----|-----| +| `filename` | (1) | Name of the file containing OpenCL source code. Notice that path is relative to your executable. Multiple source nodes will have their sources concatenated in order. | + +**Sub-nodes**: None + +### Define Node and Sub-node Structure + +`Define` node configures a single `#‍define` instruction to be added to +the sources during compilation (JIT). + +| Attribute Name | \# | Description | +|------|-------|------| +| `name` | (1) | The name of the defined JIT. For static constants, this can include the value as well (taken as a string). | +| `param` | (0/1) | This parameter value is used as the value of this JIT definition. | +| `type` | (0/1) | The parameter type. Accepted values: `int`, `float`, and `int[]`, `float[]` for arrays. | +| `default` | (0/1) | The default value to be used if the specified parameters is missing from the layer in the IR. | + +**Sub-nodes:** None + +The resulting JIT has the following form: +`#‍define [name] [type] [value/default]`. 
+ +### Buffers Node and Sub-node Structure + +`Buffers` node configures all input/output buffers for the OpenCL entry +function. No buffers node structure exists. + +**Sub-nodes:** `Data` (0+), `Tensor` (1+) + +### Data Node and Sub-node Structure + +`Data` node configures a single input with static data (for example, +weights or biases). + +| Attribute Name | \# | Description | +|----|-----|------| +| `name` | (1) | Name of a blob attached to a layer in the IR | +| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to | + +**Sub-nodes**: None + +### Tensor Node and Sub-node Structure + +`Tensor` node configures a single input or output tensor. + +| Attribute Name | \# | Description | +|------|-------|-------| +| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to. | +| `type` | (1) | `input` or `output` | +| `port-index` | (1) | 0-based index in the layer’s input/output ports in the IR | +| `format` | (0/1) | Data layout declaration for the tensor. Accepted values: `BFYX`, `BYXF`, `YXFB`, `FYXB` (also in all lowercase). Default value: `BFYX` | + +### CompilerOptions Node and Sub-node Structure + +`CompilerOptions` node configures the compilation flags for the OpenCL +sources. + +| Attribute Name | \# | Description | +|--------|-----|------| +| `options` | (1) | Options string to be passed to the OpenCL compiler | + +**Sub-nodes**: None + +### WorkSizes Node and Sub-node Structure + +`WorkSizes` node configures the global/local work sizes to be used when +queuing the OpenCL program for execution. + +| Attribute Name | \# | Description | +|-----|------|-----| +| `global`
`local` | (0/1)<br>(0/1) | An array of up to 3 integers (or formulas) for defining the OpenCL work-sizes to be used during execution.<br>The formulas can use the values of the B,F,Y,X dimensions and contain the operators: +,-,/,\*,% (all evaluated in integer arithmetic).<br>
Default value: `global=”B*F*Y*X” local=””` | +| `dim` | (0/1) | A tensor to take the work size from. Accepted values: `input N`, `output`, where `N` is an index of input tensor starting with 0. Default value: `output` | + +**Sub-nodes**: None + +## Example Configuration File + +The following code sample provides an example configuration file (in the +`.xml` format). For information on configuration file structure, see +[Configuration File Format](#config-file-format). +```xml + + + + + + + + + + + + +``` + +## Built-In Defines for Custom Layers + +The following table includes definitions that are attached before +the user sources, where `` is the actual input and output, for +example, `INPUT0` or `OUTPUT0`. + +For an example, see [Example Kernel](#example-kernel). + +| Name | Value | +|---|---| +| `NUM_INPUTS` | Number of the input tensors bound to this kernel | +| `GLOBAL_WORKSIZE` | An array of global work sizes used to execute this kernel | +| `GLOBAL_WORKSIZE_SIZE` | The size of the `GLOBAL_WORKSIZE` array | +| `LOCAL_WORKSIZE` | An array of local work sizes used to execute this kernel | +| `LOCAL_WORKSIZE_SIZE` | The size of the `LOCAL_WORKSIZE` array | +| `_DIMS`| An array of the tensor dimension sizes. Always ordered as `BFYX` | +| `_DIMS_SIZE`| The size of the `_DIMS` array.| +| `_TYPE`| The datatype of the tensor: `float`, `half`, or `char`| +| `_FORMAT_` | The format of the tensor, BFYX, BYXF, YXFB , FYXB, or ANY. The format is concatenated to the defined name. You can use the tensor format to define codepaths in your code with `#‍ifdef/#‍endif`. | +| `_LOWER_PADDING` | An array of padding elements used for the tensor dimensions before they start. Always ordered as BFYX.| +| `_ LOWER_PADDING_SIZE` | The size of the `_LOWER_PADDING` array | +| `_UPPER_PADDING` | An array of padding elements used for the tensor dimensions after they end. Always ordered as BFYX. | +| `_UPPER_PADDING_SIZE` | The size of the `_UPPER_PADDING` array | +| `_PITCHES` | The number of elements between adjacent elements in each dimension. Always ordered as BFYX.| +| `_PITCHES_SIZE`| The size of the `_PITCHES` array | +| `_OFFSET`| The number of elements from the start of the tensor to the first valid element (bypassing the lower padding) | +All `` values are automatically defined for every tensor +bound to this layer (`INPUT0`, `INPUT1`, `OUTPUT0`, and so on), as shown +in the following for example: + +```sh +#define INPUT0_DIMS_SIZE 4 +#define INPUT0_DIMS (int []){ 1,96,55,55, } +``` + +## Example Kernel + +```c +#pragma OPENCL EXTENSION cl_khr_fp16 : enable +__kernel void example_relu_kernel( + const __global INPUT0_TYPE* input0, + __global OUTPUT0_TYPE* output) +{ + const uint idx = get_global_id(0); + const uint idy = get_global_id(1); + const uint idbf = get_global_id(2);//batches*features, as OpenCL supports 3D nd-ranges only + const uint feature = idbf%OUTPUT0_DIMS[1]; + const uint batch = idbf/OUTPUT0_DIMS[1]; + //notice that pitches are in elements, not in bytes! + const uint in_id = batch*INPUT0_PITCHES[0] + feature*INPUT0_PITCHES[1] + idy*INPUT0_PITCHES[2] + idx*INPUT0_PITCHES[3] + INPUT0_OFFSET; + const uint out_id = batch*OUTPUT0_PITCHES[0] + feature*OUTPUT0_PITCHES[1] + idy*OUTPUT0_PITCHES[2] + idx*OUTPUT0_PITCHES[3] + OUTPUT0_OFFSET; + + INPUT0_TYPE value = input0[in_id]; + //neg_slope (which is non-zero for leaky ReLU) is put automatically as #define, refer to the config xml + output[out_id] = value < 0 ? 
value * neg_slope : value; +} +``` + +> **NOTE:** As described in the previous section, all the things like +> `INPUT0_TYPE` are actually defined as OpenCL (pre-)compiler inputs by +> the Inference Engine for efficiency reasons. See [Debugging +> Tips](#debugging-tips) for information on debugging the results. + +> **NOTE**: Several GPU-targeted kernels are also added to the binaries upon samples compilation +> so that the sample application can easy load them. +> Refer to the `cldnn_global_custom_kernels` folder in the GPU plugin installation directory. + +## Debugging Tips + +* **Dumping the Resulting Kernels**. +It is recommended to get a dump of the kernel with all of +the values set by the Inference Engine, such as tensor sizes, +floating-point, and integer kernel parameters. To get the dump, add the +following line to your code that configures the GPU plugin to output the +custom kernels: +```cpp +core.SetConfig({ { PluginConfigParams::KEY_DUMP_KERNELS, PluginConfigParams::YES } }, "GPU"); +``` +When the Inference Engine compiles the kernels for the specific network, +it also outputs the resulting code for the custom kernels. In the +directory of your executable, find files like +`clDNN_program0.cl`, `clDNN_program1.cl`. There are as many files as +distinct sets of parameters for your custom kernel: different input +tensor sizes and kernel parameters. + +* **Using `printf` in the OpenCL™ Kernels**. +To debug the specific values, you can use `printf` in your kernels. +However, be careful: for instance, do not output excessively +as it would generate too much data. The `printf` output is typical, so +your output can be truncated to fit the buffer. Also, because of +buffering, you actually get an entire buffer of output when the +execution ends.
+For more information, refer to the [printf +Function](https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/printfFunction.html). diff --git a/docs/IE_DG/Extensibility_DG/Intro.md b/docs/IE_DG/Extensibility_DG/Intro.md new file mode 100644 index 00000000000000..d63e333b946c32 --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/Intro.md @@ -0,0 +1,56 @@ +# Inference Engine Extensibility Mechanism {#openvino_docs_IE_DG_Extensibility_DG_Intro} + +Inference Engine Extensibility API allows to add support of custom operations to the Inference Engine. +Extension should contain operation sets with custom operations and execution kernels for custom operations. +Physically, an extension library can be represented as a dynamic library exporting the single `CreateExtension` function that allows to create a new extension instance. + +Extensibility library can be loaded to the InferenceEngine::Core object using the InferenceEngine::Core::AddExtension method. + +## Inference Engine Extension Library + +Inference Engine Extension dynamic library contains several main components: + + * [Extension class](Extension.md): + - Contains custom operation sets + - Provides CPU implementations for custom operations + * [Custom operations](Intro.md): + - Allows to use InferenceEngine::Core::ReadNetwork to read Intermediate Representation (IR) with unsupported operations + - Allows to create `ngraph::Function` with unsupported operations + - Provides shape inference mechanism for custom operations + +> **NOTE**: This documentation is written based on the `Template extension`, which demonstrates extension +development details. Find the complete code of the `Template extension`, which is fully compilable and up-to-date, +at `/docs/template_extension`. + +## Execution Kernels + +The Inference Engine workflow involves the creation of custom kernels and either custom or existing operations. + +An _Operation_ is a Network building block implemented in the training framework, for example, `Convolution` in Caffe*. +A _Kernel_ is defined as the corresponding implementation in the Inference Engine. + +Refer to the [Custom Layers in the Model Optimizer](../../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) section for details on how +mapping between framework layers and Inference Engine kernels is registered. + +In short, you can plug your own kernel implementations into the Inference Engine and map them to the layers in the original framework. + +The following pages describe how to integrate custom _kernels_ into the Inference Engine: + + * [Introduction to development of custom CPU kernels](CPU_Kernel.md) + * [Introduction to development of custom GPU kernels](GPU_Kernel.md) + * [Introduction to development of custom VPU kernels](VPU_Kernel.md) + +## Deprecated Extensibility API + +Shape Inference API and some methods of extensibility mechanism was deprecated and will be removed soon. +Old Extensibility mechanism contains two parts shape inference and execution kernel. 
+ * [Shape Inference](deprecated/ShapeInfer.md) + * [Execution Kernel](deprecated/Factory.md) + +## Additional Resources + +* [Build an extension library using CMake*](Building.md) + +## See Also +* [Using Inference Engine Samples](../Samples_Overview.md) +* [Hello Shape Infer SSD sample](../../../inference-engine/samples/hello_reshape_ssd/README.md) diff --git a/docs/IE_DG/Extensibility_DG/VPU_Kernel.md b/docs/IE_DG/Extensibility_DG/VPU_Kernel.md new file mode 100644 index 00000000000000..a3c97d0a8533cd --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/VPU_Kernel.md @@ -0,0 +1,679 @@ +# How to Implement Custom Layers for VPU (Intel® Neural Compute Stick 2) {#openvino_docs_IE_DG_Extensibility_DG_VPU_Kernel} + +> **NOTE:** OpenCL™ custom layer support is available in the preview mode. + +> **NOTE:** This section assumes you are familiar with developing kernels using OpenCL™. + +To customize your topology with an OpenCL™ layer, follow the steps below: + +1. Write and compile you OpenCL™ code with the standalone offline OpenCL™ compiler (`clc`). +2. Write a configuration file to bind the OpenCL™ kernel to the topology file (`.xml`) of the model IR. +3. Pass the configuration file to Inference engine with the model IR. + +## Compile OpenCL™ code for VPU (Intel® Neural Compute Stick 2) + +> **NOTE:** OpenCL compiler, targeting Intel® Neural Compute Stick 2 for the SHAVE* processor only, is redistributed with OpenVINO. +OpenCL support is provided by ComputeAorta*, and is distributed under a license agreement between Intel® and Codeplay* Software Ltd. + +The OpenCL™ toolchain for the Intel® Neural Compute Stick 2 supports offline compilation only, so first compile OpenCL C code using the standalone `clc` compiler. You can find the compiler binary at `/deployment_tools/tools/cl_compiler`. + +> **NOTE:** By design, custom OpenCL layers support any OpenCL kernels written with 1.2 version assumed. It also supports half float +extension and is optimized for this type, because it is a native type for Intel® Movidius™ VPUs. + +1. Prior to running a compilation, make sure that the following variables are set: + * `SHAVE_MA2X8XLIBS_DIR=/deployment_tools/tools/cl_compiler/lib/` + * `SHAVE_LDSCRIPT_DIR=/deployment_tools/tools/cl_compiler/ldscripts/` + * `SHAVE_MYRIAD_LD_DIR=/deployment_tools/tools/cl_compiler/bin/` + * `SHAVE_MOVIASM_DIR=/deployment_tools/tools/cl_compiler/bin/` +2. Run the compilation with the command below. You should use `--strip-binary-header` to make an OpenCL runtime-agnostic binary runnable with the Inference Engine. +```bash +cd /deployment_tools/tools/cl_compiler/bin +./clc --strip-binary-header custom_layer.cl -o custom_layer.bin +``` + +## Write a Configuration File + +To tie the topology IR for a layer you customize, prepare a configuration file, so that the Inference Engine can find parameters for your kernel and the execution work grid is described. +For example, given the following OpenCL kernel signature: +```cpp +__kernel void reorg_nhwc(__global const half *src, __global half *out, int w, int h, int c, int stride); +``` +Configuration file for this kernel might be the following: +```xml + + + + + + + + + + + + + + +``` +Each custom layer is described with the `CustomLayer` node. It has the following nodes and attributes: + - Root node `CustomLayer` contains the following attributes: + - `name` – (Required) A name of the Inference Engine layer to bind the kernel with. + - `type` and `version` – (Required) Reserved for future use. Set them to `MVCL` and `1` respectively. 
+ - `max-shaves` – (Optional) The maximum number of SHAVE cores that should be dedicated for the layer. It is useful for debugging concurrency issues or for resource saving if memory bound kernel does not scale well with the number of cores, so more resources can be left for the rest of a topology. + - Sub-node `Kernel` must contain the following attributes: + - `entry` – A name of your kernel function as you defined it in a source file (in the example above, it is `reorg_nhwc`). + - Node `Source` must contain the following attributes: + - `filename` – A path to a compiled binary relative to the `.xml` binding file. + - Sub-node `Parameters` – Describes parameters bindings. For more information, see the description below. + - Sub-node `WorkSizes` – Describes local and global work group sizes and the source for dimension deduction as a pair `direction,port`. In the example above, the work group is described relatively to the dimension of the input tensor that comes through port 0 in the IR. `global` and `local` work group configurations support any simple math expressions with +,-,\*,/, and () from `B`(batch), `Y`(height), `X`(width) and `F`(channels). + - Sub-node `Where` – Allows to customize bindings with the `key="value"` attribute. For example, to substitute only 3x3 convolutions, write `` in the binging xml. + + Parameter description supports `Tensor` of one of tensor types such as `input`, `output`, `input_buffer`, `output_buffer` or `data`, `Scalar`, or `Data` nodes and has the following format: + - Each `Tensor` node of `input` or `output` type must contain the following attributes: + - `arg-name` – A name of a kernel parameter in the kernel signature. + - `type` – Node type: `input` or `output` as in the IR. + - `port-index` – A number of input/output ports as in the IR. + - `format` – The channel order in the tensor. Optional conversion layers are generated if the custom layer format is not compatible with formats of neighboring layers. `BFXY`, `BYXF`, and `ANY` formats are supported currently. + - Each `Tensor` node of `input_buffer` or `output_buffer` type must contain the following attributes: + - `arg-name` – A name of a kernel parameter in the kernel signature. + - `type` – Node type: `input_buffer` or `output_buffer`. Use the appropriate type to bind multiple kernels that correspond to different stages of the same layer. + - `port-index` – The unique identifier to bind by. + - `dim` – The dim source with the same `direction,port` format used for `WorkSizes` bindings. + - `size` – Amount of bytes needed. Current expression syntax supports only expression over dimensions of over selected input/output tensor or constants and might be expended in the future. + + Here is an example of multi-stage MVN layer binding: + ```xml + + + + + + + + + + + + + + + + + + + + + + + + + + ``` + - Each `Tensor` node that has the type `data` must contain the following attributes: + - `source` – A name of the blob as it is in the IR (typical example is `weights` for convolution + - `format` – Specifies the channel order in the tensor. Optional conversion layers are generated if the custom layer format is not. + ```xml + + + + + + + + + + + + + ``` + - Each `Scalar` node must contain the following attributes: + - `arg-name` – A name of a kernel parameter in the kernel signature. + - `type` – `int` or `float` value. It is used for correct argument extraction from IR parameters. 
+ - `source` – Contains the name of the parameter in the IR file or input/output (`I`/`O`, `In`/`On`, where `n` is a port number) + followed by dimension `B`(batch), `Y`(height), `X`(width), or `F`(channels). + + - Each `Data` node must contain the following attributes: + - `arg-name` – A name of a kernel parameter in the kernel signature. + - `type` – Node type. Currently, `local_data` is the only supported value, which defines buffer allocated in fast local on-chip memory. It is limited to 100K for all `__local` and + `__private` arrays defined inside the kernel as well as all `__local` parameters passed to the kernel. Please, consider that a manual-DMA extension requires double buffering. + If the custom layer is detected to run out of local memory, the inference fails. + - `dim` – The dim source with the same `direction,port` format used for `WorkSizes` bindings. + - `size` – Amount of bytes needed. The current expression syntax supports only expression over dimensions of over selected input/output tensor or constants and may be extended in the future. + The example binding below illustrates a kernel with two local buffers passed to the kernel. + ```xml + + + + + + + + + + + + + + +``` + +## Pass Configuration File to Inference Runtime + +> **NOTE**: If both native and custom layer implementations are present, the custom kernel has a priority over the native one. + +Before loading the network that features the custom layers, provide a separate configuration file and load it using the InferenceEngine::Core::SetConfig() method with the PluginConfigParams::KEY_CONFIG_FILE key and the configuration file name as a value: +```cpp +InferenceEngine::Core core; +// Load custom layers +core.SetConfig({ { InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE, "" } }, "MYRIAD"); +``` +Optionally, set a path to a custom layers description with a pair of `VPU_CUSTOM_LAYERS` and `/path/to/your/customLayers.xml` +as a network configuration: +```cpp +InferenceEngine::Core core; +std::map networkConfig; +config["VPU_CUSTOM_LAYERS"] = "/path/to/your/customLayers.xml"; +// Load custom layers in network config +auto exeNetwork = core.LoadNetwork(cnnNetwork, "MYRIAD", networkConfig); +``` + +## Optimizing Kernels with OpenCL™ for VPU (Intel® Neural Compute Stick 2) + +This section provides optimization guidelines on writing custom layers with OpenCL for VPU devices. Knowledge about general OpenCL +programming model and OpenCL kernel language is assumed and not a subject of this section. The OpenCL model mapping to VPU is described in the table below. + +| OpenCL Model | VPU Mapping| +|-----|----| +| Device code | Executed on SHAVE cores | +| Private memory | Mapped to CMX internal memory, limited to 100KB per work group, valid only while the work group is executed | +| Local memory | Mapped to CMX internal memory, limited to 100KB per work group, valid only while the work group is executed | +| Global memory | Mapped to DDR, used to pass execution preserved parameters for inputs, outputs, and blobs | +| Work group | Executed on a single SHAVE core iterating over multiple work items | + +Note that by the OpenCL specification, the work group execution order is not specified. This means that it is your +responsibility to ensure that race conditions among work groups are not introduced. Custom layer runtime spits evenly +work grid among available compute resources and executes them in an arbitrary order. 
This static scheduling approach works best if the load is evenly spread out across work groups, which is a typical case for Deep Learning kernels. The following guidelines are recommended to use for work group partitioning: + +1. Split work evenly across work groups. +2. Adjust work group granularity to maintain equal workload for all compute codes. +3. Set the maximum number of cores (using the `max-shaves` attribute for the `CustomLayer` node). This keeps more resources for the rest of topology. It is also useful if the kernel scalability reached its limits, which may happen while optimizing memory bound kernels or kernels with poor parallelization. +4. Try an alternate data layout (`BFXY`/`BYXF`) for the kernel if it improves work group partitioning or data access patterns. +Consider full topology performance (not just specific layer boost) since data conversion layers would be automatically inserted +as appropriate. + +Offline OpenCL compiler (`clc`) features automatic vectorization over `get_global_id(0)` usage, if uniform access is detected. +For example, the kernel below could be automatically vectorized: +```cpp +__kernel void cvtf32f16(__global float* restrict inImage, __global half* restrict outImage, + float scale, float bais) +{ + int idx = get_global_id(0) + get_global_id(1) * get_global_size(0) + get_global_id(2) * get_global_size(0) * get_global_size(1); + outImage[idx] = convert_half(inImage[idx]*scale+bais); +} +``` +However, this work-group based vectorizer (WGV) conflicts with the default LLVM vectorizer based on superword level parallelism +(SLP) for the current compiler version. Manual vectorization is recommended to provide the best performance for non-uniform code +patterns. WGV works if and only if vector types are not used in the code. + +Here is a short list of optimization tips: + +1. Help auto-vectorizer ensure non-aliasing pointers for kernel parameters by putting `restrict` where possible. + - This may give a performance boost, especially for kernels with unrolling, like `ocl_grn` from the example below. + - Place `restrict` markers for kernels with manually vectorized codes. In the `ocl_grn` kernel below, the unrolled version without `restrict` is up to 20% slower than the most optimal one, which combines unrolling and `restrict`. +2. Put `#‍pragma unroll N` to your loop header. Since the compiler does not trigger unrolling by default, it is your responsibility to +annotate the code with pragmas as appropriate. The `ocl_grn` version with `#‍pragma unroll 4` is up to 50% faster, most of which comes from unrolling the first loop, because LLVM, in general, is better in scheduling 3-stage loops (load-compute-store), while the fist loop + `variance += (float)(src_data[c*H*W + y*W + x] * src_data[c*H*W + y*W + x]);` is only 2-stage (load-compute). Please, pay +attention to unrolling such cases first. Unrolling factor is loop-dependent. Choose the smallest number that +still improves performance as an optimum between the kernel size and execution speed. For this specific kernel, changing the unroll factor from `4`to `6` results in the same performance, so unrolling factor equal to 4 is an optimum. 
For Intel® Neural Compute Stick 2, unrolling is conjugated with the automatic software pipelining for load, store, and compute stages: +```cpp +__kernel void ocl_grn(__global const half* restrict src_data, __global half* restrict dst_data, int C, float bias) +{ + int x = get_global_id(0); + int W = get_global_size(0); + int y = get_global_id(1); + int H = get_global_size(1); + + float variance = bias + 1e-9f; + + #pragma unroll 4 + for (int c = 0; c < C; c++) + variance += (float)(src_data[c*H*W + y*W + x] * src_data[c*H*W + y*W + x]); + + variance = 1.f / native_sqrt(variance); + + #pragma unroll 4 + for (int c = 0; c < C; c++) + dst_data[c*H*W + y*W + x] = (half)((float)src_data[c*H*W + y*W + x] * variance); +} +``` +To check the efficiency of WGV, you can compare performance of the kernel above with the kernel below, which is manually vectorized over width: +```cpp +__kernel void ocl_grn_line(__global const half* restrict src_data, __global half* restrict dst_data, int C, int W, float bias) +{ + int y = get_global_id(1); + int H = get_global_size(1); + + for (int x = 0; x < W/8; x++) + { + float8 variance = (float8)(bias+1e-9f); + + #pragma unroll 4 + for (int c = 0; c < C; c++) + { + __global const half8* restrict src_line = ((__global const half8 * restrict)(src_data + c*H*W + y*W)); + half8 sh = src_line[x]; + variance += convert_float8(sh*sh); + } + + variance = 1.f/native_sqrt(variance); + + #pragma unroll 4 + for (int c = 0; c < C; c++) + { + __global const half8* restrict src_line = ((__global const half8 * restrict)(src_data + c*H*W + y*W)); + __global half8* restrict dst_line = ((__global half8 * restrict)(dst_data + c*H*W + y*W)); + + dst_line[x] = convert_half8(convert_float8(src_line[x])*variance); + } + } + for (int x = W/8*8; x < W; x++) + { + float variance = bias+1e-9f; + #pragma unroll 4 + for (int c = 0; c < C; c++) + variance += (float)(src_data[c*H*W + y*W + x]*src_data[c*H*W + y*W + x]); + + variance = 1.f/native_sqrt(variance); + + #pragma unroll 4 + for (int c = 0; c < C; c++) + dst_data[c*H*W + y*W + x] = (float)src_data[c*H*W + y*W + x]*variance; + } +} +``` +Both versions perform the same, but the second one has more complex code. + +3. If it is easy to predict the work group size, you can also use the `reqd_work_group_size` kernel attribute to ask the compiler +to unroll the code up to local size of the work group. Please note that if the kernel is actually executed with the +different work group configuration, the result is undefined. + +4. Prefer to use the `half` compute, if it keeps reasonable accuracy. 16-bit float is a native type for Intel® Neural Compute Stick 2, most of the functions `half_*` are mapped to a single hardware instruction. +Use the standard `native_*` function for the rest of types. + +5. Prefer to use the `convert_half` function over `vstore_half` if conversion to 32-bit float is required. `convert_half` is mapped to a single hardware instruction. For the `cvtf32f16` kernel above, the line `outImage[idx] = convert_half(inImage[idx]*scale+bais);` is 8 times slower than the code with `vstore_half`. + +6. Mind early exits. Early exit may be extremely costly for the current version of the `clc` compiler due to conflicts with the +auto-vectorizer. The generic advice would be to setup local size by `x` dimension equal to inputs or/and outputs width. 
+If it is impossible to define the work grid that exactly matches inputs or/and outputs to eliminate checks, for example, +`if (get_global_id(0) >= width) return`, use line-wise kernel variant with manual vectorization. +The kernel example below demonstrates the impact of early exits on kernel performance. + ```cpp + // Initial version + __kernel void reorg(const __global half* restrict src, __global half* restrict out, int stride) + { + int w = get_global_id(0); + int W = get_global_size(0); + + int h = get_global_id(1); + int H = get_global_size(1); + + int c = get_global_id(2); + int C = get_global_size(2); + + int C2 = C/(stride*stride); + int offset = c / C2; + int c2 = c - C2 * offset; + + int H2 = H*stride; + int W2 = W*stride; + + int h2 = h*stride + offset / stride; + int w2 = w*stride + offset - stride * (offset / stride); + + out[W*H*c + W*h + w] = src[W2*H2*c2 + W2*h2 + w2]; + } + ``` +This `reorg` kernel is auto-vectorizable, but an input for YOLO v2 topology is `NCHW=<1,64,26,26>` and it is not multiple of vector width (which is `8` for `half` data type). As a result, the Inference Engine does not select the auto-vectorized kernel. +To compare performance of auto-vectorized and scalar version of the kernel, change the input size to`NCHW=<1,64,26,32>`. This allows the auto-vectorized version to be selected by the Inference Engine and can give you about 30% uplift. +Since the auto-vectorized version is faster, it makes sense to enable it for the YOLO v2 topology input size by setting the local size multiple of vector (e.g. 32) and adjust global sizes accordingly. As a result, the execution work grid exceeds actual input dimension, so out-of-bound checks should be inserted. See the updated kernel version below: + ```cpp + // Version with out-of-bound checks added + __kernel void reorg(const __global half* restrict src, __global half* restrict out, int W, int stride) + { + int w = get_global_id(0); + w = min(w, W-1); + + int h = get_global_id(1); + int H = get_global_size(1); + + int c = get_global_id(2); + int C = get_global_size(2); + + int C2 = C/(stride*stride); + int offset = c / C2; + int c2 = c - C2 * offset; + + int H2 = H*stride; + int W2 = W*stride; + + int h2 = h*stride + offset / stride; + int w2 = w*stride + offset - stride * (offset / stride); + + out[W*H*c + W*h + w] = src[W2*H2*c2 + W2*h2 + w2]; + } + ``` +This code performs the same as the initial kernel above (scalar) due to branching overhead. If you replace min/max expression `w = min(w, W-1);` with `if (w >= W) return;`, runtime increases up to 2x against to code without branching (initial version).
+If branching is inevitable for your element-based kernel, it is recommended to change the scheme to line-based. See the kernel variant below:
+```cpp
+// Line-wise version
+__kernel void reorg(const __global half* restrict src, __global half* restrict out, int H, int W, int stride)
+{
+    int h = min((int)get_global_id(0), H-1);
+
+    int c = get_global_id(1);
+    int C = get_global_size(1);
+    int C2 = C/(stride*stride);
+    int offset = c / C2;
+    int c2 = c - C2 * offset;
+
+    int H2 = H*stride;
+    int W2 = W*stride;
+
+    for (int w = 0; w < W; ++w)
+    {
+        int h2 = h*stride + offset / stride;
+        int w2 = w*stride + offset - stride * (offset / stride);
+
+        out[W*H*c + W*h + w] = src[W2*H2*c2 + W2*h2 + w2];
+    }
+}
+```
+This decreases the execution time by up to 40% against the best-performing vectorized kernel without early exits (the initial version).
+7. Reuse computations among work items by using line-based kernels or by sharing values through `__local` memory.
+8. Improve data access locality. Most custom kernels are memory-bound, while convolution and fully connected layers are hardware-implemented. The code below demonstrates a further optimized version of the `reorg` kernel unrolled by `stride`:
+  ```cpp
+  // Unrolled line-wise version
+  __kernel void reorg_unrolled_by_stride(const __global half* restrict src, __global half* restrict dst,
+                                         int H, int W, int stride)
+  {
+      int h = min((int)get_global_id(0), H-1);
+
+      int c2 = get_global_id(1);
+      int C2 = get_global_size(1);
+      int C = C2*stride*stride;
+
+      int H2 = H*stride;
+      int W2 = W*stride;
+
+      for (int stride_y = 0; stride_y < stride; stride_y++)
+          for (int stride_x = 0; stride_x < stride; stride_x++)
+              for (int w2 = 0, w = 0; w < W; w2 += stride, w++)
+                  dst[W*H*C2*(stride_y*stride+stride_x) + W*H*c2 + W*h + w] = src[W2*H2*c2 + W2*h*stride + W2*stride_y + w2 + stride_x];
+  }
+  ```
+The `src` data in this case is loaded only once. As a result, the cycle count drops by up to 45% against the line-wise version.
+
+9. Copy data from `__global` to `__local` or `__private` memory if the data is accessed more than once. Access to
+`__global` memory is orders of magnitude slower than access to `__local`/`__private` memory due to the statically scheduled pipeline, which
+stalls completely on memory access without any prefetch. The same recommendation applies to scalar load/store
+from/to a `__global` pointer, since work-group copying can be done in a vector fashion.
+
+10. Use a manual DMA extension. Local (on-chip) memory throughput is up to 24x higher than DDR throughput. Starting from OpenVINO™ 2020.1, VPU OpenCL features a manual-DMA kernel extension to copy a sub-tensor used by a work group into local memory and perform compute without involving DDR. Here is a simple GRN kernel implementation that runs over DDR.
Local size is equal to (width of the input tensor, 1, 1) to define a large enough work group to get code automatically vectorized and unrolled, while global size is (width of the input tensor, height of the input tensor, 1): + ```cpp + __kernel void grn_NCHW( + __global const half* restrict src_data, + __global half* restrict dst_data, + int C, + float bias) + { + float variance = bias + 1e-9f; + + #pragma unroll 4 + for (int c = 0; c < C; c++) + { + float val = (float) src_data[c*get_global_size(1)*get_global_size(0) + get_global_id(1)*get_global_size(0) + get_global_id(0)]; + variance += val*val; + } + + half hvariance = (half)(native_rsqrt((half)(variance/16.f))*0.25f); + + #pragma unroll 4 + for (int c = 0; c < C; c++) + { + dst_data[c*get_global_size(1)*get_global_size(0) + get_global_id(1)*get_global_size(0) + get_global_id(0)] + = src_data[c*get_global_size(1)*get_global_size(0) + get_global_id(1)*get_global_size(0) + get_global_id(0)] * hvariance; + } + } + ``` +This kernel can be rewritten to introduce special data binding `__dma_preload` and `__dma_postwrite intrinsics`. This means that instead of one kernel, a group of three kernels should be implemented: `kernelName`, `__dma_preload_kernelName` and `__dma_postwrite_kernelName`. `__dma_preload_kernelName` for a particular work group `n` is guaranteed to be executed before `n`-th work group itself, while `__dma_postwrite_kernelName` is guarantied to be executed after a corresponding work group. You can define one of those functions that are intended to be used to copy data from-to `__global` and `__local` memory. The syntactics requires exact functional signature match. The example below illustrates how to prepare your kernel for manual-DMA. + ```cpp + __kernel void __dma_preload_grn_NCHW( + __global const half* restrict src, + __global half* restrict dst, + __local half* restrict local_src, + __local half* restrict local_dst, + int C, + float bias) + { + // ToDO: copy required piece of src tensor into local_src + } + + __kernel void __dma_postwrite_grn_NCHW( + __global const half* restrict src, + __global half* restrict dst, + __local const half* restrict local_src, + __local half* restrict local_dst, + int C, + float bias) + { + // ToDO: copy back computed piece of local_dst into dst + } + + __kernel void grn_NCHW( + __global const half* restrict src_data, + __global half* restrict dst_data, + __local half* restrict src, + __local half* restrict dst, + int C, + float bias) + { + // same as the example above + } + ``` +GRN kernel operates on channel-major tensors to compute average over full channel range and then normalizes input elements to produce the output. +As a part of manual DMA extension, a group of work group copy functions are introduced in addition to `async_work_group_copy`, which is also mapped to DMA call. 
+ +Here is the list of supported functions: +```cpp +// 2D sub-tensor copy +event_t WorkGroupDmaCreateStrideTransaction( + const local T *src, + global T *dst, + size_t src_width, // width of the line of source in bytes + size_t dst_width, // width of the line of destination in bytes + size_t src_stride, // stride between corresponding 2 consecutive lines of source in bytes + size_t dst_stride, // stride between corresponding 2 consecutive lines of destination in bytes + size_t size, // total number of bytes loaded for all lines from source to destination + event_t event) __OVERLOAD; + + +event_t WorkGroupDmaCreateStrideTransaction( + const global T *src, + local T *dst, + size_t src_width, // width of the line of source in bytes + size_t dst_width, // width of the line of destination in bytes + size_t src_stride, // stride between corresponding 2 consecutive lines of source in bytes + size_t dst_stride, // stride between corresponding 2 consecutive lines of destination in bytes + size_t size, // total number of bytes loaded for all lines from source to destination + event_t event) __OVERLOAD; + +// 3D sub-tensor copy +event_t WorkGroupDmaCreate3DTransaction( + const local T *src, + global T *dst, + size_t src_width, // width of the line of source in bytes + size_t dst_width, // width of the line of destination in bytes + size_t src_stride, // stride between corresponding 2 consecutive lines of source in bytes + size_t dst_stride, // stride between corresponding 2 consecutive lines of destination in bytes + size_t num_planes, // number of planes to be copied + size_t src_plane_stride, // stride between corresponding 2 consecutive planes of source in bytes + size_t dst_plane_stride, // stride between corresponding 2 consecutive planes of destination in bytes + size_t size, // size of the loaded plane in bytes, analogues to the size in 2D case + event_t event) __OVERLOAD; + +event_t WorkGroupDmaCreate3DTransaction( + const global T *src, + local T *dst, + size_t src_width, // width of the line of source in bytes + size_t dst_width, // width of the line of destination in bytes + size_t src_stride, // stride between corresponding 2 consecutive lines of source in bytes + size_t dst_stride, // stride between corresponding 2 consecutive lines of destination in bytes + size_t num_planes, // number of planes to be copied + size_t src_plane_stride, // stride between corresponding 2 consecutive planes of source in bytes + size_t dst_plane_stride, // stride between corresponding 2 consecutive planes of destination in bytes + size_t size, // size of the loaded plane in bytes, analogues to the size in 2D case + event_t event) __OVERLOAD; +``` +where `T` can be `uchar`, `char`, `short`, `ushort`, `int`, `uint`, `long`, `ulong`, `half` or `float`. 
+ +Modified version of the GRN kernel could be the following: +```cpp +__kernel void __dma_preload_grn_NCHW( + __global const half* restrict src, + __global half* restrict dst, + __local half* restrict local_src, + __local half* restrict local_dst, + int C, + float bias) +{ + WorkGroupDmaCreate3DTransaction( + src + get_group_id(0)*get_local_size(0) + + get_group_id(1)*get_local_size(1)*get_global_size(0), // src + local_src, // dst + get_local_size(0) * sizeof(half), // src width + get_local_size(0) * sizeof(half), // dst width + get_global_size(0) * sizeof(half), // src stride + get_local_size(0) * sizeof(half), // dst stride + C, // num planes + get_global_size(0) * get_global_size(1) * sizeof(half), // src plane stride + get_local_size(0) * get_local_size(1) * sizeof(half), // dst plane stride + get_local_size(0) * get_local_size(1) * sizeof(half), // plane size + 0); +} + +__kernel void __dma_postwrite_grn_NCHW( + __global const half* restrict src, + __global half* restrict dst, + __local const half* restrict local_src, + __local half* restrict local_dst, + int C, + float bias) +{ + WorkGroupDmaCreate3DTransaction( + local_dst, // src + dst + get_group_id(0)*get_local_size(0) + + get_group_id(1)*get_local_size(1)*get_global_size(0), // dst + get_local_size(0) * sizeof(half), // src width + get_local_size(0) * sizeof(half), // dst width + get_local_size(0) * sizeof(half), // src stride + get_global_size(0) * sizeof(half), // dst stride + C, // num planes + get_local_size(0) * get_local_size(1) * sizeof(half), // src plane stride + get_global_size(0) * get_global_size(1) * sizeof(half), // dst plane stride + get_local_size(0) * get_local_size(1) * sizeof(half), // plane size + 0); +} + +__kernel void grn_NCHW( + __global const half* restrict src_data, + __global half* restrict dst_data, + __local half* restrict src, + __local half* restrict dst, + int C, + float bias) +{ + float variance = bias + 1e-9f; + + #pragma unroll 8 + for (int c = 0; c < C; c++) + { + float val = (float) src[c*get_local_size(1)*get_local_size(0) + get_local_id(1)*get_local_size(0) + get_local_id(0)]; + variance += val*val; + } + + half hvariance = (half)(native_rsqrt((half)(variance/16.f))*0.25f); + + #pragma unroll 8 + for (int c = 0; c < C; c++) + { + dst[c*get_local_size(1)*get_local_size(0) + get_local_id(1)*get_local_size(0) + get_local_id(0)] + = src[c*get_local_size(1)*get_local_size(0) + get_local_id(1)*get_local_size(0) + get_local_id(0)] * hvariance; + } +} +``` + +Please note `get_local_size` and `get_local_id` usage inside the kernel. 21x speedup is expected for a kernel on enet-curbs setup since it was completely limited by memory usage. + +An alternative method of using DMA is to use work item copy extension. Those functions are executed inside a kernel and requires work groups equal to single work item. 
+ +Here is the list of supported work item functions: +```cpp +item_dma_event_t WorkItemDmaCreateTransaction( + const global T *src, + private T *dst, + size_t size, + item_dma_event_t event) __OVERLOAD; + +item_dma_event_t WorkItemDmaCreateTransaction( + const private T *src, + global T *dst, + size_t size, + item_dma_event_t event) __OVERLOAD; + +item_dma_event_t WorkItemDmaCreateStrideTransaction( + const global T *src, + private T *dst, + size_t src_width, + size_t dst_width, + size_t src_stride, + size_t dst_stride, + size_t size, + item_dma_event_t event) __OVERLOAD; + +item_dma_event_t WorkItemDmaCreateStrideTransaction( + const private T *src, + global T *dst, + size_t src_width, + size_t dst_width, + size_t src_stride, + size_t dst_stride, + size_t size, + item_dma_event_t event) __OVERLOAD; + +item_dma_event_t WorkItemDmaCreate3DTransaction( + const global T *src, + private T *dst, + size_t src_width, + size_t dst_width, + size_t src_stride, + size_t dst_stride, + size_t num_planes, + size_t src_plane_stride, + size_t dst_plane_stride, + size_t size, + item_dma_event_t event) __OVERLOAD; + +item_dma_event_t WorkItemDmaCreate3DTransaction( + const private T *src, + global T *dst, + size_t src_width, + size_t dst_width, + size_t src_stride, + size_t dst_stride, + size_t num_planes, + size_t src_plane_stride, + size_t dst_plane_stride, + size_t size, + item_dma_event_t event) __OVERLOAD; +``` +where `T` can be `uchar`, `char`, `short`, `ushort`, `int`, `uint`, `long`, `ulong`, `half` or `float`. diff --git a/docs/IE_DG/Extensibility_DG/deprecated/Factory.md b/docs/IE_DG/Extensibility_DG/deprecated/Factory.md new file mode 100644 index 00000000000000..82370cbfc80dab --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/deprecated/Factory.md @@ -0,0 +1,96 @@ +# Deprecated API for CPU kernels creation {#openvino_docs_IE_DG_Extensibility_DG_deprecated_Factory} + +List of deprecated API for kernels development: + * `InferenceEngine::IExtension::getPrimitiveTypes(char**& types, unsigned int& size, ResponseDesc* resp)` method + * `InferenceEngine::IExtension::getFactoryFor(ILayerImplFactory *&factory, const CNNLayer *cnnLayer, ResponseDesc *resp)` method + * `InferenceEngine::ILayerImplFactory` class + +>**NOTE**: This guide demonstrates how to use deprecated API for kernels creation. However, keep in mind that this API will be deleted soon. + +1. Create your custom layer factory `CustomLayerFactory` class: +```cpp +// custom_layer.h +// A CustomLayerFactory class is an example layer, which makes exponentiation by 2 for the input and does not change dimensions +class CustomLayerFactory { + +}; +``` +2. Inherit it from the abstract `InferenceEngine::ILayerImplFactory` class: +```cpp +// custom_layer.h +class CustomLayerFactory: public InferenceEngine::ILayerImplFactory { + +}; +``` + +3. Create a constructor, a virtual destructor, and a data member to keep the layer info: +```cpp +// custom_layer.h +class CustomLayerFactory: public InferenceEngine::ILayerImplFactory { +public: + explicit CustomLayerFactory(const CNNLayer *layer): cnnLayer(*layer) {} +private: + CNNLayer cnnLayer; +}; +``` + +4. Overload and implement the abstract methods `getShapes` and `getImplementations` of the `InferenceEngine::ILayerImplFactory` class: +```cpp +// custom_layer.h +class CustomLayerFactory: public InferenceEngine::ILayerImplFactory { +public: + // ... 
constructor and destructor + + StatusCode getShapes(const std::vector& inShapes, std::vector& outShapes, ResponseDesc *resp) noexcept override { + if (cnnLayer == nullptr) { + std::string errorMsg = "Cannot get cnn layer!"; + errorMsg.copy(resp->msg, sizeof(resp->msg) - 1); + return GENERAL_ERROR; + } + if (inShapes.size() != 1) { + std::string errorMsg = "Incorrect input shapes!"; + errorMsg.copy(resp->msg, sizeof(resp->msg) - 1); + return GENERAL_ERROR; + } + outShapes.clear(); + outShapes.emplace_back(inShapes[0]); + return OK; + } + + StatusCode getImplementations(std::vector& impls, ResponseDesc *resp) noexcept override { + // You can add cnnLayer to implementation if it is necessary + impls.push_back(ILayerImpl::Ptr(new CustomLayerImpl())); + return OK; + } +}; +``` +5. Create your custom layer implementation `CustomLayerImpl` class using the [instruction](../CPU_Kernel.md). + +6. Implement methods in the `Extension` class: +```cpp +// custom_extension.h +class CustomExtention : public InferenceEngine::IExtension { +public: + // ... utility methods + // Retruns the list of supported kernels/layers + StatusCode getPrimitiveTypes(char**& types, unsigned int& size, ResponseDesc* resp) noexcept override { + std::string type_name = "CustomLayer"; + types = new char *[1]; + size = 1; + types[0] = new char[type_name.size() + 1]; + std::copy(type_name.begin(), type_name.end(), types[0]); + types[0][type_name.size()] = '\0'; + return OK; + } + // Main function + StatusCode getFactoryFor(ILayerImplFactory *&factory, const CNNLayer *cnnLayer, ResponseDesc *resp) noexcept override { + if (cnnLayer->type != "CustomLayer") { + std::string errorMsg = std::string("Factory for ") + cnnLayer->type + " wasn't found!"; + errorMsg.copy(resp->msg, sizeof(resp->msg) - 1); + return NOT_FOUND; + } + factory = new CustomLayerFactory(cnnLayer); + return OK; + } +}; +``` diff --git a/docs/IE_DG/Extensibility_DG/deprecated/ShapeInfer.md b/docs/IE_DG/Extensibility_DG/deprecated/ShapeInfer.md new file mode 100644 index 00000000000000..5e7101c9d269cf --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/deprecated/ShapeInfer.md @@ -0,0 +1,18 @@ +# Old ShapeInference Extensibility API {#openvino_docs_IE_DG_Extensibility_DG_deprecated_ShapeInfer} + +The new approach to shape inference suggests a creation of a custom nGraph operation that contains a special method for shape inference. +The following classes and methods were deprecated: + + * `InferenceEngine::IShapeInferExtension` class + * `InferenceEngine::IShapeInferExtension::getShapeInferTypes(char**&, unsigned int&, ResponseDesc*)` method + * `InferenceEngine::IShapeInferExtension::getShapeInferImpl(IShapeInferImpl::Ptr&, const char*, ResponseDesc*)` method + +However, the old approach with the `InferenceEngine::IShapeInferExtension` method still works for already existing custom layers. +Custom Shape Inference functions are registered by calling `InferenceEngine::ICNNNetwork::AddExtension` with the implemented `InferenceEngine::IShapeInferExtension` method, which is a holder of custom implementations. +The holder requires to implement two key methods: +* `InferenceEngine::IShapeInferExtension::getShapeInferImpl` - Returns custom shape inference implementation for the given type. +* `InferenceEngine::IShapeInferExtension::getShapeInferTypes` - Provides all custom types. + +Custom shape inference implementation is represented by the `InferenceEngine::IShapeInferImpl::inferShapes` method. + +It is impossible to overwrite built-in shape inference functions. 
The custom type must be different from the supported ones.
diff --git a/docs/IE_DG/GPU_Kernels_Tuning.md b/docs/IE_DG/GPU_Kernels_Tuning.md
new file mode 100644
index 00000000000000..0b308682f40ff4
--- /dev/null
+++ b/docs/IE_DG/GPU_Kernels_Tuning.md
@@ -0,0 +1,43 @@
+Using GPU Kernels Tuning {#openvino_docs_IE_DG_GPU_Kernels_Tuning}
+======================
+
+GPU Kernels Tuning allows you to tune models so that their heavy computational layers are configured to better fit the
+hardware on which the tuning was done. It is required to achieve the best performance on GPU.
+> **NOTE** Currently, only convolution and fully connected layers undergo the tuning process. This means that the performance boost depends on the number of such layers in the model.
+
+OpenVINO™ releases include the `/inference_engine/bin/intel64/Release/cache.json` file with pretuned data for current state-of-the-art models. It is highly recommended to do the
+tuning for new kinds of models, hardware, or drivers.
+
+## Tuned data
+
+GPU tuning data is saved in JSON format.
+The file content is composed of two types of attributes and one type of value:
+1. Execution units number - this attribute splits the content into different EU sections.
+2. Hash - hashed tuned kernel data. The key is an array with the kernel name and the kernel's mode index.
+
+## Usage
+
+---
+
+You can activate the kernels tuning process by setting the `KEY_TUNING_MODE` flag to `TUNING_CREATE` and `KEY_TUNING_FILE` to `<"filename">` in a configuration map that is
+passed to the plugin while loading a network.
+This configuration modifies the behavior of the `ExecutableNetwork` object. Instead of standard network compilation, it will run the tuning process.
+Please keep in mind that the tuning can be very time-consuming. The bigger the network, the longer it will take.
+The file with tuned data is the result of this step.
+
+> **NOTE** If a filename passed to `KEY_TUNING_FILE` points to existing tuned data and you are tuning a new model, then this file will be extended with new data. This allows you to extend the existing `cache.json` provided in the OpenVINO™ release package.
+
+The example below shows how to set and use the tuning keys:
+```cpp
+Core ie;
+ie.SetConfig({{ CONFIG_KEY(TUNING_MODE), CONFIG_VALUE(TUNING_CREATE) }}, "GPU");
+ie.SetConfig({{ CONFIG_KEY(TUNING_FILE), "/path/to/tuning/file.json" }}, "GPU");
+// Further LoadNetwork calls will use the specified tuning parameters
+```
+---
+
+You can activate the inference with tuned data by setting the `KEY_TUNING_MODE` flag to `TUNING_USE_EXISTING` and
+the `KEY_TUNING_FILE` flag to `<"filename">`.
+
+The GPU backend will process the content of the file during network compilation to configure the OpenCL kernels for the best performance.
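+
+The snippet below is a minimal sketch of this mode. It assumes the same tuning file as in the example above and a `network` object that has already been read from IR:
+```cpp
+Core ie;
+// Reuse the previously created tuning data instead of running the tuning process
+ie.SetConfig({{ CONFIG_KEY(TUNING_MODE), CONFIG_VALUE(TUNING_USE_EXISTING) }}, "GPU");
+ie.SetConfig({{ CONFIG_KEY(TUNING_FILE), "/path/to/tuning/file.json" }}, "GPU");
+// Network compilation now picks up the tuned kernel configurations from the file
+auto executableNetwork = ie.LoadNetwork(network, "GPU");
+```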
diff --git a/docs/IE_DG/Glossary.md b/docs/IE_DG/Glossary.md new file mode 100644 index 00000000000000..139a35bb84e11a --- /dev/null +++ b/docs/IE_DG/Glossary.md @@ -0,0 +1,89 @@ +Glossary {#openvino_docs_IE_DG_Glossary} +======= + +## Acronyms and Abbreviations + +| Abbreviation | Description | +| :--- | :--- | +| API | Application Programming Interface | +| AVX | Advanced Vector Extensions | +| clDNN | Compute Library for Deep Neural Networks | +| CLI | Command Line Interface | +| CNN | Convolutional Neural Network | +| CPU | Central Processing Unit | +| CV | Computer Vision | +| DL | Deep Learning | +| DLDT | Intel(R) Deep Learning Deployment Toolkit | +| DLL | Dynamic Link Library | +| DNN | Deep Neural Networks | +| ELU | Exponential Linear rectification Unit | +| FCN | Fully Convolutional Network | +| FP | Floating Point | +| FPGA | Field-Programmable Gate Array | +| GCC | GNU Compiler Collection | +| GPU | Graphics Processing Unit | +| HD | High Definition | +| IE | Inference Engine | +| IR | Intermediate Representation | +| JIT | Just In Time | +| JTAG | Joint Test Action Group | +| LPR | License-Plate Recognition | +| LRN | Local Response Normalization | +| mAP | Mean Average Precision | +| Intel(R) MKL-DNN | Intel(R) Math Kernel Library Deep Neural Networks | +| MO | Model Optimizer | +| MVN | Mean Variance Normalization | +| NCDHW | Number of images, Channels, Depth, Height, Width | +| NCHW | Number of images, Channels, Height, Width | +| NHWC | Number of images, Height, Width, Channels | +| NMS | Non-Maximum Suppression | +| NN | Neural Network | +| NST | Neural Style Transfer | +| OD | Object Detection | +| OS | Operating System | +| PCI | Peripheral Component Interconnect | +| PReLU | Parametric Rectified Linear Unit | +| PSROI | Position Sensitive Region Of Interest | +| RCNN, R-CNN | Region-based Convolutional Neural Network | +| ReLU | Rectified Linear Unit | +| ROI | Region Of Interest | +| SDK | Software Development Kit | +| SSD | Single Shot multibox Detector | +| SSE | Streaming SIMD Extensions | +| USB | Universal Serial Bus | +| VGG | Visual Geometry Group | +| VOC | Visual Object Classes | +| WINAPI | Windows Application Programming Interface | + +## Terms + +Glossary of terms used in the Inference Engine + + +| Term | Description | +| :--- | :--- | +| Batch | Number of images to analyze during one call of infer. Maximum batch size is a property of the network and it is set before loading of the network to the plugin. In NHWC, NCHW and NCDHW image data layout representation, the N refers to the number of images in the batch | +| Blob | Memory container used for storing inputs, outputs of the network, weights and biases of the layers | +| Device (Affinitity) | A preferred Intel(R) hardware device to run the inference (CPU, GPU, FPGA, etc.) | +| Extensibility mechanism, Custom layers | The mechanism that provides you with capabilities to extend the Inference Engine and Model Optimizer so that they can work with topologies containing layers that are not yet supported | +| ICNNNetwork | An Interface of the Convolutional Neural Network that Inference Engine reads from IR. 
Consists of topology, weights and biases | +| IExecutableNetwork | An instance of the loaded network which allows the Inference Engine to request (several) infer requests and perform inference synchronously or asynchronously | +| IHeteroInferencePlugin | Interface that is implemented by the heterogeneity plugin to allow the Inference Engine to set the default affinities for layers by devices before loading the network to the heterogeneous plugin. You can modify affinities manually before loading to the plugin. | +| IInferencePlugin | Interface provided by each plugin to allow the Inference Engine to load ICNNNetwork to the plugin, create Executable network and set special dedicated options for the plugin | +| IInferRequest | Interface that represents the end point of inference on the model loaded to the plugin and represented by executable network. Inputs are set here, outputs should be requested from this interface as well | +| InferenceEngineProfileInfo | Represents basic inference profiling information per layer | +| Inference Engine | A C++ library with a set of classes that you can use in your application to infer input data (images) and get the result | +| Inference Engine API | The basic default API for all supported devices, which allows you to load a model from Intermediate Representation, set input and output formats and execute the model on various devices | +| Inference Engine Plugin | Inference Engine plugin is a software component that contains complete implementation for inference on a certain Intel(R) hardware device: CPU, GPU, VPU, FPGA, etc. Each plugin implements the unified API and provides additional hardware-specific APIs. | +| Layer catalog or Operations specification | A list of supported layers or operations and its parameters. Sets of supported layers are different for different plugins, please check the documentation on plugins to verify if the Inference Engine supports certain layer on the dedicated hardware | +| Layout | Image data layout refers to the representation of images batch. Layout shows a sequence of 4D or 5D tensor data in memory. A typical NCHW format represents pixel in horizontal direction, rows by vertical dimension, planes by channel and images into batch | +| OutputsDataMap | Structure which contains information about output precisions and layouts | +| Precision | Represents data precision. For example, FP32 is 32-bit floating point, FP16 is 16-bit floating point. Precision can be changed before loading the network to the plugin | +| PreProcessInfo | Class that represents input data for the network. It contains information about input precision, its layout, and pre-processing | +| ResponseDesc | Represents debug information for an error | + + +## See Also +* [Deep Learning Model Optimizer IR Operations Catalog](../ops/opset.md) +* [Inference Engine Memory primitives](Memory_primitives.md) +* [Terminology](supported_plugins/Supported_Devices.md) diff --git a/docs/IE_DG/Graph_debug_capabilities.md b/docs/IE_DG/Graph_debug_capabilities.md new file mode 100644 index 00000000000000..856bbeb49eb463 --- /dev/null +++ b/docs/IE_DG/Graph_debug_capabilities.md @@ -0,0 +1,64 @@ +# Graph Debug Capabilities {#openvino_docs_IE_DG_Graph_debug_capabilities} + +Inference Engine supports two different objects for a graph representation: the nGraph function and +CNNNetwork. Both representations provide an API to get detailed information about the graph structure. 
+ +## nGraph Function + +To receive additional messages about applied graph modifications, rebuild the nGraph library with +the `-DNGRAPH_DEBUG_ENABLE=ON` option. + +To enable serialization and deserialization of the nGraph function to a JSON file, rebuild the +nGraph library with the `-DNGRAPH_JSON_ENABLE=ON` option. To serialize or deserialize the nGraph +function, call the nGraph function as follows: + +```cpp +#include + +std::shared_ptr nGraph; +... +ngraph::serialize("test_json.json", nGraph); // For graph serialization +std::ifstream file("test_json.json"); // Open a JSON file +nGraph = ngraph::deserialize(file); // For graph deserialization +``` + +To visualize the nGraph function to the xDot format or to an image file, use the +`ngraph::pass::VisualizeTree` graph transformation pass: +```cpp +#include + +std::shared_ptr nGraph; +... +std::vector> g2{nGraph}; +ngraph::pass::VisualizeTree("after.png").run_on_module(g2); // Visualize the nGraph function to an image +``` + +## CNNNetwork + +To serialize the CNNNetwork to the Inference Engine Intermediate Representation (IR) format, use the +`CNNNetwork::serialize(...)` method: +```cpp +std::shared_ptr nGraph; +... +CNNNetwork network(nGraph); +network.serialize("test_ir.xml", "test_ir.bin"); +``` +> **NOTE**: CNNNetwork created from the nGraph function might differ from the original nGraph +> function because the Inference Engine applies some graph transformation. + +## Deprecation Notice + + + + + + + + + + +
Deprecation Begins: June 1, 2020
Removal Date: December 1, 2020
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* diff --git a/docs/IE_DG/InferenceEngine_QueryAPI.md b/docs/IE_DG/InferenceEngine_QueryAPI.md new file mode 100644 index 00000000000000..bed82bca12b32c --- /dev/null +++ b/docs/IE_DG/InferenceEngine_QueryAPI.md @@ -0,0 +1,102 @@ +Introduction to Inference Engine Device Query API {#openvino_docs_IE_DG_InferenceEngine_QueryAPI} +=============================== + +This section provides a high-level description of the process of querying of different device properties and configuration values. +Refer to the [Hello Query Device Sample](../../inference-engine/samples/hello_query_device/README.md) sources and [Multi-Device Plugin guide](supported_plugins/MULTI.md) for example of using the Inference Engine Query API in user applications. + +## Using the Inference Engine Query API in Your Code + +The Inference Engine `Core` class provides the following API to query device information, set or get different device configuration properties: + +* InferenceEngine::Core::GetAvailableDevices - Provides a list of available devices. If there are more than one instance of a specific device, the devices are enumerated with `.suffix` where `suffix` is a unique string identifier. The device name can be passed to all methods of the `InferenceEngine::Core` class that work with devices, for example `InferenceEngine::Core::LoadNetwork`. +* InferenceEngine::Core::GetMetric - Provides information about specific device. + InferenceEngine::Core::GetConfig - Gets the current value of a specific configuration key. +* InferenceEngine::Core::SetConfig - Sets a new value for the configuration key. + +The `InferenceEngine::ExecutableNetwork` class is also extended to support the Query API: + +* InferenceEngine::ExecutableNetwork::GetMetric +* InferenceEngine::ExecutableNetwork::GetConfig +* InferenceEngine::ExecutableNetwork::SetConfig + +## Query API in the Core Class + +### GetAvailableDevices + +```cpp +InferenceEngine::Core core; +std::vector availableDevices = ie.GetAvailableDevices(); +``` + +The function returns list of available devices, for example: +``` +MYRIAD.1.2-ma2480 +MYRIAD.1.4-ma2480 +FPGA.0 +FPGA.1 +CPU +GPU +... +``` + +Each device name can then be passed to: + +* `InferenceEngine::Core::LoadNetwork` to load the network to a specific device. +* `InferenceEngine::Core::GetMetric` to get common or device specific metrics. +* All other methods of the `Core` class that accept `deviceName`. + +### GetConfig() + +The code below demonstrates how to understand whether `HETERO` device dumps `.dot` files with split graphs during the split stage: + +```cpp +InferenceEngine::Core core; +bool dumpDotFile = core.GetConfig("HETERO", HETERO_CONFIG_KEY(DUMP_GRAPH_DOT)).as(); +``` + +For documentation about common configuration keys, refer to `ie_plugin_config.hpp`. Device specific configuration keys can be found in corresponding plugin folders. 
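+
+A configuration key can also be set with `InferenceEngine::Core::SetConfig` and read back with `GetConfig` to verify it. The code below is a minimal sketch that assumes the CPU plugin accepts the `KEY_CPU_THREADS_NUM` key, which is also used later in this document:
+```cpp
+InferenceEngine::Core core;
+// Limit the number of inference threads used by the CPU plugin
+core.SetConfig({{ InferenceEngine::PluginConfigParams::KEY_CPU_THREADS_NUM, "4" }}, "CPU");
+// Read the value back
+auto nthreads = core.GetConfig("CPU", InferenceEngine::PluginConfigParams::KEY_CPU_THREADS_NUM).as<std::string>();
+```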
+ +### GetMetric() + +* To extract device properties such as available device, device name, supported configuration keys, and others, use the `InferenceEngine::Core::GetMetric` method: + +```cpp +InferenceEngine::Core core; +std::string cpuDeviceName = core.GetMetric("GPU", METRIC_KEY(FULL_DEVICE_NAME)).as(); +``` + +A returned value looks as follows: `Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz`. + +> **NOTE**: All metrics have specific type, which is specified during metric instantiation. The list of common device-agnostic metrics can be found in `ie_plugin_config.hpp`. Device specific metrics (for example, for `HDDL`, `MYRIAD` devices) can be found in corresponding plugin folders. + +## Query API in the ExecutableNetwork Class + +### GetMetric() + +The method is used to get executable network specific metric such as `METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)`: +```cpp +InferenceEngine::Core core; +auto exeNetwork = core.LoadNetwork(network, "CPU"); +auto nireq = exeNetwork.GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as(); +``` + +Or the current temperature of `MYRIAD` device: +```cpp +InferenceEngine::Core core; +auto exeNetwork = core.LoadNetwork(network, "MYRIAD"); +float temperature = exeNetwork.GetMetric(METRIC_KEY(DEVICE_THERMAL)).as(); +``` + +### GetConfig() + +The method is used to get information about configuration values the executable network has been created with: + +```cpp +InferenceEngine::Core core; +auto exeNetwork = core.LoadNetwork(network, "CPU"); +auto ncores = exeNetwork.GetConfig(PluginConfigParams::KEY_CPU_THREADS_NUM).as(); +``` + +### SetConfig() + +The only device that supports this method is [Multi-Device](supported_plugins/MULTI.md). diff --git a/docs/IE_DG/Int8Inference.md b/docs/IE_DG/Int8Inference.md new file mode 100644 index 00000000000000..b815f0b15fd031 --- /dev/null +++ b/docs/IE_DG/Int8Inference.md @@ -0,0 +1,127 @@ +# Low-Precision 8-bit Integer Inference {#openvino_docs_IE_DG_Int8Inference} + +## Disclaimer + +Inference Engine with low-precision 8-bit integer inference requires the following prerequisites to be satisfied: +- Inference Engine [CPU Plugin](supported_plugins/CPU.md) must be built with the Intel® Math Kernel Library (Intel® MKL) dependency. In the Intel® Distribution of OpenVINO™ it is + satisfied by default, this is mostly the requirement if you are using OpenVINO™ available in open source, because [open source version of OpenVINO™](https://github.com/openvinotoolkit/openvino) can be built with OpenBLAS* that is unacceptable if you want to use 8-bit integer inference. +- Intel® platforms that support at least one extension to x86 instruction set from the following list: + - Intel® Advanced Vector Extensions 512 (Intel® AVX-512) + - Intel® Advanced Vector Extensions 2.0 (Intel® AVX2) + - Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2) +- A model must be quantized. To quantize the model, you can use the [Post-Training Optimization Tool](@ref pot_README) delivered with the Intel® Distribution of OpenVINO™ toolkit release package. 
+ +The 8-bit inference feature was validated on the following topologies: +* **Classification models:** + * Caffe\* DenseNet-121, DenseNet-161, DenseNet-169, DenseNet-201 + * Caffe Inception v1, Inception v2, Inception v3, Inception v4 + * Caffe YOLO v1 tiny, YOLO v3 + * Caffe ResNet-50 v1, ResNet-101 v1, ResNet-152 v1, ResNet-269 v1 + * Caffe ResNet-18 + * Caffe MobileNet, MobileNet v2 + * Caffe SE ResNeXt-50 + * Caffe SqueezeNet v1.0, SqueezeNet v1.1 + * Caffe VGG16, VGG19 + * TensorFlow\* DenseNet-121, DenseNet-169 + * TensorFlow Inception v1, Inception v2, Inception v3, Inception v4, Inception ResNet v2 + * TensorFlow Lite Inception v1, Inception v2, Inception v3, Inception v4, Inception ResNet v2 + * TensorFlow Lite MobileNet v1, MobileNet v2 + * TensorFlow MobileNet v1, MobileNet v2 + * TensorFlow ResNet-50 v1.5, ResNet-50 v1, ResNet-101 v1, ResNet-152 v1, ResNet-50 v2, ResNet-101 v2, ResNet-152 v2 + * TensorFlow VGG16, VGG19 + * TensorFlow YOLO v3 + * MXNet\* CaffeNet + * MXNet DenseNet-121, DenseNet-161, DenseNet-169, DenseNet-201 + * MXNet Inception v3, inception_v4 + * MXNet Mobilenet, Mobilenet v2 + * MXNet ResNet-101 v1, ResNet-152 v1, ResNet-101 v2, ResNet-152 v2 + * MXNet ResNeXt-101 + * MXNet SqueezeNet v1.1 + * MXNet VGG16, VGG19 + + +* **Object detection models:** + * Caffe SSD GoogLeNet + * Caffe SSD MobileNet + * Caffe SSD SqueezeNet + * Caffe SSD VGG16 300, SSD VGG16 512 + * TensorFlow SSD MobileNet v1, SSD MobileNet v2 + * MXNet SSD Inception v3 512 + * MXNet SSD MobileNet 512 + * MXNet SSD ResNet-50 512 + * MXNet SSD VGG16 300 + * ONNX\* SSD ResNet 34 + +* **Semantic segmentation models:** + * Unet2D + +* **Recommendation system models:** + * NCF + +## Introduction + +A lot of investigation was made in the field of deep learning with the idea of using low precision computations during inference in order to boost deep learning pipelines and gather higher performance. For example, one of the popular approaches is to shrink the precision of activations and weights values from `fp32` precision to smaller ones, for example, to `fp11` or `int8`. For more information about this approach, refer to +**Brief History of Lower Precision in Deep Learning** section in [this whitepaper](https://software.intel.com/en-us/articles/lower-numerical-precision-deep-learning-inference-and-training). + +8-bit computations (referred to as `int8`) offer better performance compared to the results of inference in higher precision (for example, `fp32`), because they allow loading more data into a single processor instruction. Usually the cost for significant boost is a reduced accuracy. However, it is proved that an accuracy drop can be negligible and depends on task requirements, so that the application engineer can set up the maximum accuracy drop that is acceptable. + +Current Inference Engine solution for low-precision inference uses Intel MKL-DNN and supports inference of the following layers in 8-bit integer computation mode: +* Convolution +* FullyConnected +* ReLU +* ReLU6 +* Reshape +* Permute +* Pooling +* Squeeze +* Eltwise +* Concat +* Resample +* MVN + +This means that 8-bit inference can only be performed with the CPU plugin on the layers listed above. All other layers are executed in the format supported by the CPU plugin: 32-bit floating point format (`fp32`). + +## Low-Precision 8-bit Integer Inference Workflow + +For 8-bit integer computations, a model must be quantized. 
If the model is not quantized, you can use the [Post-Training Optimization Tool](@ref pot_README) to quantize the model. The quantization process adds `FakeQuantize` layers on activations and weights for most layers. Read more about mathematical computations under the hood in the [white paper](https://intel.github.io/mkl-dnn/ex_int8_simplenet.html).
+
+The 8-bit inference pipeline includes two stages (also refer to the figure below):
+1. *Offline stage*, or *model quantization*. During this stage, `FakeQuantize` layers are added before most layers to have quantized tensors before layers in a way that the low-precision accuracy drop for 8-bit integer inference satisfies the specified threshold. The output of this stage is a quantized model. The quantized model precision is not changed, and quantized tensors stay in the original precision range (`fp32`). The `FakeQuantize` layer has a `Quantization Levels` attribute which defines the number of quantization levels. This number defines the precision which is used during inference. For the `int8` range, the `Quantization Levels` attribute value has to be 255 or 256.
+
+2. *Run-time stage*. This stage is an internal procedure of the [CPU Plugin](supported_plugins/CPU.md). During this stage, the quantized model is loaded to the plugin. The plugin updates each `FakeQuantize` layer on activations and weights to have `FakeQuantize` output tensor values in the low precision range.
+![int8_flow]
+
+### Offline Stage: Model Quantization
+
+To infer a layer in low precision and get maximum performance, the input tensor for the layer has to be quantized and each value has to be in the target low precision range. For this purpose, the `FakeQuantize` layer is used in the OpenVINO™ intermediate representation file (IR). To quantize the model, you can use the [Post-Training Optimization Tool](@ref pot_README) delivered with the Intel® Distribution of OpenVINO™ toolkit release package.
+
+When you pass the quantized IR to the [CPU plugin](supported_plugins/CPU.md), the plugin automatically recognizes it as a quantized model and performs 8-bit inference. Note that if you pass a quantized model to another plugin that does not support 8-bit inference, the model is inferred in a precision that this plugin supports.
+
+### Run-Time Stage: Quantization
+
+This is the second stage of the 8-bit integer inference. After you load the quantized model IR to a plugin, the plugin uses the `Low Precision Transformation` component to update the model to infer it in low precision:
+* Updates `FakeQuantize` layers to have quantized output tensors in the low precision range and adds dequantization layers to compensate for the update. Dequantization layers are pushed through as many layers as possible to have more layers in low precision. After that, most layers have quantized input tensors in the low precision range and can be inferred in low precision. Ideally, dequantization layers should be fused into the next `FakeQuantize` or `ScaleShift` layers.
+* Weights are quantized and stored in `Const` layers.
+* Biases are updated to avoid shifts in dequantization layers.
+
+## Performance Counters
+
+Information about layer precision is stored in the performance counters that are
+available from the Inference Engine API.
The layers have the following marks: +* Suffix `I8` for layers that had 8-bit data type input and were computed in 8-bit precision +* Suffix `FP32` for layers computed in 32-bit precision + +For example, the performance counters table for the Inception model can look as follows: + +``` +inception_5b/5x5_reduce EXECUTED layerType: Convolution realTime: 417 cpu: 417 execType: gemm_blas_I8 +inception_5b/output EXECUTED layerType: Concat realTime: 34 cpu: 34 execType: ref_I8 +inception_5b/output_U8_nhw... EXECUTED layerType: Reorder realTime: 33092 cpu: 33092 execType: reorder_I8 +inception_5b/output_oScale... EXECUTED layerType: ScaleShift realTime: 1390 cpu: 1390 execType: jit_avx2_FP32 +inception_5b/output_oScale... EXECUTED layerType: Reorder realTime: 143 cpu: 143 execType: reorder_FP32 +inception_5b/pool EXECUTED layerType: Pooling realTime: 59301 cpu: 59301 execType: ref_any_I8 +``` + +The `execType` column of the table includes inference primitives with specific suffixes. + +[int8_flow]: img/cpu_int8_flow.png \ No newline at end of file diff --git a/docs/IE_DG/Integrate_with_customer_application_new_API.md b/docs/IE_DG/Integrate_with_customer_application_new_API.md new file mode 100644 index 00000000000000..07618a77a0a9a1 --- /dev/null +++ b/docs/IE_DG/Integrate_with_customer_application_new_API.md @@ -0,0 +1,320 @@ +Integrate the Inference Engine with Your Application {#openvino_docs_IE_DG_Integrate_with_customer_application_new_API} +=============================== + +This section provides a high-level description of the process of integrating the Inference Engine into your application. +Refer to the [Hello Classification Sample](../../inference-engine/samples/hello_classification/README.md) sources +for example of using the Inference Engine in applications. + +> **NOTE**: For 2019 R2 Release, the new Inference Engine Core API is introduced. This guide is updated to reflect the new API approach. +> The Inference Engine Plugin API is still supported, but is going to be deprecated in future releases. Please, refer to [Migration from Inference Engine Plugin API to Core API](Migration_CoreAPI.md) guide to update your application. + +## Use the Inference Engine API in Your Code + +The core `libinference_engine.so` library implements loading and parsing a model Intermediate Representation (IR), and triggers inference using a specified device. The core library has the following API: + +* `InferenceEngine::Core` +* `InferenceEngine::Blob`, `InferenceEngine::TBlob`, + `InferenceEngine::NV12Blob` +* `InferenceEngine::BlobMap` +* `InferenceEngine::InputsDataMap`, `InferenceEngine::InputInfo`, +* `InferenceEngine::OutputsDataMap` + +C++ Inference Engine API wraps the capabilities of core library: + +* `InferenceEngine::CNNNetwork` +* `InferenceEngine::ExecutableNetwork` +* `InferenceEngine::InferRequest` + +## Integration Steps + +Integration process includes the following steps: +![integration_process] + +1) **Create Inference Engine Core** to manage available devices and read network objects: +```cpp +InferenceEngine::Core core; +``` + +2) **Read a model IR** created by the Model Optimizer (.xml is supported format): +```cpp +auto network = core.ReadNetwork("Model.xml"); +``` +**Or read the model from ONNX format** (.onnx and .prototxt are supported formats) +```cpp +auto network = core.ReadNetwork("model.onnx"); +``` + +3) **Configure input and output**. 
Request input and output information using `InferenceEngine::CNNNetwork::getInputsInfo()`, and `InferenceEngine::CNNNetwork::getOutputsInfo()` +methods: +```cpp +/** Take information about all topology inputs **/ +InferenceEngine::InputsDataMap input_info = network.getInputsInfo(); +/** Take information about all topology outputs **/ +InferenceEngine::OutputsDataMap output_info = network.getOutputsInfo(); +``` + Optionally, set the number format (precision) and memory layout for inputs and outputs. Refer to the + [Supported configurations](supported_plugins/Supported_Devices.md) chapter to choose the relevant configuration. + + You can also allow input of any size. To do this, mark each input as resizable by setting a desired resize algorithm (e.g. `BILINEAR`) inside of the appropriate input info. + + Basic color format conversions are supported as well. By default, the Inference Engine assumes + that the input color format is `BGR` and color format conversions are disabled. The Inference + Engine supports the following color format conversions: + * `RGB->BGR` + * `RGBX->BGR` + * `BGRX->BGR` + * `NV12->BGR` + + where `X` is a channel that will be ignored during inference. To enable the conversions, set a + desired color format (for example, `RGB`) for each input inside of the appropriate input info. + + If you want to run inference for multiple images at once, you can use the built-in batch + pre-processing functionality. + +> **NOTE**: Batch pre-processing is not supported if input color format is set to `ColorFormat::NV12`. + + You can use the following code snippet to configure input and output: +```cpp +/** Iterate over all input info**/ +for (auto &item : input_info) { + auto input_data = item.second; + input_data->setPrecision(Precision::U8); + input_data->setLayout(Layout::NCHW); + input_data->getPreProcess().setResizeAlgorithm(RESIZE_BILINEAR); + input_data->getPreProcess().setColorFormat(ColorFormat::RGB); +} +/** Iterate over all output info**/ +for (auto &item : output_info) { + auto output_data = item.second; + output_data->setPrecision(Precision::FP32); + output_data->setLayout(Layout::NC); +} +``` + +> **NOTE**: NV12 input color format pre-processing differs from other color conversions. In case of NV12, +> Inference Engine expects two separate image planes (Y and UV). You must use a specific +> `InferenceEngine::NV12Blob` object instead of default blob object and set this blob to +> the Inference Engine Infer Request using `InferenceEngine::InferRequest::SetBlob()`. +> Refer to [Hello NV12 Input Classification C++ Sample](../../inference-engine/samples/hello_nv12_input_classification/README.md) +> for more details. + + If you skip this step, the default values are set: + + * no resize algorithm is set for inputs + * input color format - `ColorFormat::RAW` meaning that input does not need color + conversions + * input and output precision - `Precision::FP32` + * input layout - `Layout::NCHW` + * output layout depends on number of its dimensions: + +|Number of dimensions | 5 | 4 | 3 | 2 | 1 | +|:--------------------|-------|------|-----|----|----| +|Layout | NCDHW | NCHW | CHW | NC | C | + +4) **Load the model** to the device using `InferenceEngine::Core::LoadNetwork()`: +```cpp +auto executable_network = core.LoadNetwork(network, "CPU"); +``` + It creates an executable network from a network object. The executable network is associated with single hardware device. 
+ It is possible to create as many networks as needed and to use them simultaneously (up to the limitation of the hardware resources). + Third parameter is a configuration for plugin. It is map of pairs: (parameter name, parameter value). Choose device from + [Supported devices](supported_plugins/Supported_Devices.md) page for more details about supported configuration parameters. +```cpp +/** Optional config. E.g. this enables profiling of performance counters. **/ +std::map config = {{ PluginConfigParams::KEY_PERF_COUNT, PluginConfigParams::YES }}; +auto executable_network = core.LoadNetwork(network, "CPU", config); +``` + +5) **Create an infer request**: +```cpp +auto infer_request = executable_network.CreateInferRequest(); +``` + +6) **Prepare input**. You can use one of the following options to prepare input: + * **Optimal way for a single network.** Get blobs allocated by an infer request using `InferenceEngine::InferRequest::GetBlob()` + and feed an image and the input data to the blobs. In this case, input data must be aligned (resized manually) with a + given blob size and have a correct color format. +```cpp +/** Iterate over all input blobs **/ +for (auto & item : inputInfo) { + auto input_name = item->first; + /** Get input blob **/ + auto input = infer_request.GetBlob(input_name); + /** Fill input tensor with planes. First b channel, then g and r channels **/ + ... +} +``` + * **Optimal way for a cascade of networks (output of one network is input for another).** Get output blob from the first + request using `InferenceEngine::InferRequest::GetBlob()` and set it as input for the second request using + `InferenceEngine::InferRequest::SetBlob()`. +```cpp +auto output = infer_request1->GetBlob(output_name); +infer_request2->SetBlob(input_name, output); +``` + * **Optimal way to handle ROI (a ROI object located inside of input of one network is input for another).** It is + possible to re-use shared input by several networks. You do not need to allocate separate input blob for a network if + it processes a ROI object located inside of already allocated input of a previous network. For instance, when first + network detects objects on a video frame (stored as input blob) and second network accepts detected bounding boxes + (ROI inside of the frame) as input. + In this case, it is allowed to re-use pre-allocated input blob (used by first network) by second network and just crop + ROI without allocation of new memory using `InferenceEngine::make_shared_blob()` with passing of + `InferenceEngine::Blob::Ptr` and `InferenceEngine::ROI` as parameters. +```cpp +/** inputBlob points to input of a previous network and + cropROI contains coordinates of output bounding box **/ +InferenceEngine::Blob::Ptr inputBlob; +InferenceEngine::ROI cropRoi; +... + +/** roiBlob uses shared memory of inputBlob and describes cropROI + according to its coordinates **/ +auto roiBlob = InferenceEngine::make_shared_blob(inputBlob, cropRoi); +infer_request2->SetBlob(input_name, roiBlob); +``` + Make sure that shared input is kept valid during execution of each network. Otherwise, ROI blob may be corrupted if the + original input blob (that ROI is cropped from) has already been rewritten. 
+ + * Allocate input blobs of the appropriate types and sizes, feed an image and the input data to the blobs, and call + `InferenceEngine::InferRequest::SetBlob()` to set these blobs for an infer request: +```cpp +/** Iterate over all input blobs **/ +for (auto & item : inputInfo) { + auto input_data = item->second; + /** Create input blob **/ + InferenceEngine::TBlob::Ptr input; + // assuming input precision was asked to be U8 in prev step + input = InferenceEngine::make_shared_blob(InferenceEngine::Precision:U8, input_data->getDims()); + input->allocate(); + infer_request->SetBlob(item.first, input); + + /** Fill input tensor with planes. First b channel, then g and r channels **/ + ... +} +``` + A blob can be filled before and after `SetBlob()`. + +> **NOTE:** +> +> * `SetBlob()` method compares precision and layout of an input blob with ones defined on step 3 and +> throws an exception if they do not match. It also compares a size of the input blob with input +> size of the read network. But if input was configured as resizable, you can set an input blob of +> any size (for example, any ROI blob). Input resize will be invoked automatically using resize +> algorithm configured on step 3. Similarly to the resize, color format conversions allow the color +> format of an input blob to differ from the color format of the read network. Color format +> conversion will be invoked automatically using color format configured on step 3. +> +> * `GetBlob()` logic is the same for pre-processable and not pre-processable input. Even if it is +> called with input configured as resizable or as having specific color format, a blob allocated by +> an infer request is returned. Its size and color format are already consistent with the +> corresponding values of the read network. No pre-processing will happen for this blob. If you +> call `GetBlob()` after `SetBlob()`, you will get the blob you set in `SetBlob()`. + +7) **Do inference** by calling the `InferenceEngine::InferRequest::StartAsync` and `InferenceEngine::InferRequest::Wait` +methods for asynchronous request: +```cpp +infer_request->StartAsync(); +infer_request.Wait(IInferRequest::WaitMode::RESULT_READY); +``` + +or by calling the `InferenceEngine::InferRequest::Infer` method for synchronous request: +```cpp +sync_infer_request->Infer(); +``` +`StartAsync` returns immediately and starts inference without blocking main thread, `Infer` blocks + main thread and returns when inference is completed. +Call `Wait` for waiting result to become available for asynchronous request. + +There are three ways to use it: +* specify maximum duration in milliseconds to block for. The method is blocked until the specified timeout has elapsed, +or the result becomes available, whichever comes first. +* `InferenceEngine::IInferRequest::WaitMode::RESULT_READY` - waits until inference result becomes available +* `InferenceEngine::IInferRequest::WaitMode::STATUS_ONLY` - immediately returns request status.It does not +block or interrupts current thread. + +Both requests are thread-safe: can be called from different threads without fearing corruption and failures. + +Multiple requests for single `ExecutableNetwork` are executed sequentially one by one in FIFO order. + +While request is ongoing, all its methods except `InferenceEngine::InferRequest::Wait` would throw an +exception. + +8) Go over the output blobs and **process the results**. 
+Note that casting `Blob` to `TBlob` via `std::dynamic_pointer_cast` is not recommended way, +better to access data via `buffer()` and `as()` methods as follows: +```cpp + for (auto &item : output_info) { + auto output_name = item.first; + auto output = infer_request.GetBlob(output_name); + { + auto const memLocker = output->cbuffer(); // use const memory locker + // output_buffer is valid as long as the lifetime of memLocker + const float *output_buffer = memLocker.as(); + /** output_buffer[] - accessing output blob data **/ + +``` + +## Build Your Application + +For details about building your application, refer to the CMake files for the sample applications. +All samples source code is located in the `/openvino/inference_engine/samples` directory, where `INSTALL_DIR` is the OpenVINO™ installation directory. + +### CMake project creation + +1. **Create a structure** for the project: +``` sh +project/ + ├── CMakeLists.txt - CMake file to build + ├── ... - Additional folders like includes/ + └── src/ - source folder + └── main.cpp +build/ - build directory + ... +``` + +2. **Include Inference Engine, nGraph and OpenCV libraries** in `project/CMakeLists.txt` +[OpenCV](https://docs.opencv.org/master/db/df5/tutorial_linux_gcc_cmake.html) integration is needed mostly for pre-processing input data and ngraph for more complex applications using [ngraph API](nGraph_Flow.md). +``` cmake +cmake_minimum_required(VERSION 3.0.0) +project(project_name) +find_package(ngraph REQUIRED) +find_package(InferenceEngine REQUIRED) +find_package(OpenCV REQUIRED) +add_executable(${PROJECT_NAME} src/main.cpp) +target_link_libraries(${PROJECT_NAME} PRIVATE ${InferenceEngine_LIBRARIES} ${OpenCV_LIBS} ${NGRAPH_LIBRARIES}) +``` +3. **To build your project** using CMake with the default build tools currently available on your machine, execute the following commands: +> **NOTE**: Make sure **Set the Environment Variables** step in [OpenVINO Installation](../../inference-engine/samples/hello_nv12_input_classification/README.md) document is applied to your terminal, otherwise `InferenceEngine_DIR` and `OpenCV_DIR` variables won't be configured properly to pass `find_package` calls. +```sh +cd build/ +cmake ../project +cmake --build . +``` +It's allowed to specify additional build options (e.g. to build CMake project on Windows with a specific build tools). Please refer to the [CMake page](https://cmake.org/cmake/help/latest/manual/cmake.1.html#manual:cmake(1)) for details. + +### Run Your Application + +> **NOTE**: Before running, make sure you completed **Set the Environment Variables** section in [OpenVINO Installation](../../inference-engine/samples/hello_nv12_input_classification/README.md) document so that the application can find the libraries. + +To run compiled applications on Microsoft* Windows* OS, make sure that Microsoft* Visual C++ 2015 +Redistributable and Intel® C++ Compiler 2017 Redistributable packages are installed and +`/bin/intel64/Release/*.dll` files are placed to the +application folder or accessible via `%PATH%` environment variable. + +[integration_process]: img/integration_process.png + +## Deprecation Notice + + + + + + + + + + +
Deprecation Begins: June 1, 2020
Removal Date: December 1, 2020
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* diff --git a/docs/IE_DG/Intro_to_Performance.md b/docs/IE_DG/Intro_to_Performance.md new file mode 100644 index 00000000000000..2987a3628bab17 --- /dev/null +++ b/docs/IE_DG/Intro_to_Performance.md @@ -0,0 +1,99 @@ +# Introduction to the Performance Topics {#openvino_docs_IE_DG_Intro_to_Performance} + +This section is a shorter version of the +[Optimization Guide](supported_plugins/MULTI.md) for the Intel Deep Learning Deployment Toolkit. + +## Precision +Inference precision directly affects the performance. + +Model Optimizer can produce an IR with different precision. For example, float16 IR initially targets VPU and GPU devices, while, for example, the CPU can also execute regular float32. +Also, further device-specific inference precision settings are available, for example, [8-bit integer](Int8Inference.md) or [bfloat16](Bfloat16Inference.md) inference on the CPU. +Note that for [MULTI device](supported_plugins/MULTI.md) that supports automatic inference on multiple devices in parallel, you can use the FP16 IR. +You can find more information, including preferred data types for specific devices, in the +[Supported Devices](supported_plugins/Supported_Devices.md) section. + +## Lowering Inference Precision +Default optimization is used for CPU and implies that inference is made with lower precision if it is possible on a given platform to reach better performance with acceptable range of accuracy. +This approach is used for CPU device if platform supports the AVX512_BF16 instruction. In this case, a regular float32 model is converted to [bfloat16](Bfloat16Inference.md) internal representation and inference is provided with bfloat16 layers usage. +Below is the example command line to disable this feature on the CPU device with the AVX512_BF16 instruction and execute regular float32. +``` +$ benchmark_app -m -enforcebf16=false + ``` + +## Latency vs. Throughput +One way to increase computational efficiency is batching, which combines many (potentially tens) of +input images to achieve optimal throughput. However, high batch size also comes with a +latency penalty. So, for more real-time oriented usages, lower batch sizes (as low as a single input) are used. +Refer to the [Benchmark App](../../inference-engine/samples/benchmark_app/README.md) sample, which allows latency vs. throughput measuring. + +## Using Async API +To gain better performance on accelerators, such as VPU or FPGA, the Inference Engine uses the asynchronous approach (see +[Integrating Inference Engine in Your Application (current API)](Integrate_with_customer_application_new_API.md)). +The point is amortizing the costs of data transfers, by pipe-lining, see [Async API explained](@ref omz_demos_object_detection_demo_ssd_async_README). +Since the pipe-lining relies on the availability of the parallel slack, running multiple inference requests in parallel is essential. 
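+Below is a minimal sketch of this pattern using the Infer Request API described above; it is an illustration only, and the `model.xml` path and device name are placeholders:
+```cpp
+#include <inference_engine.hpp>
+
+int main() {
+    InferenceEngine::Core core;
+    auto network = core.ReadNetwork("model.xml");           // placeholder IR path
+    auto exec_network = core.LoadNetwork(network, "CPU");
+
+    // Two requests created from the same ExecutableNetwork can run in parallel,
+    // so preparing inputs for one request overlaps with inference of the other.
+    auto request_a = exec_network.CreateInferRequest();
+    auto request_b = exec_network.CreateInferRequest();
+
+    // ... fill the input blobs of both requests here ...
+
+    request_a.StartAsync();   // returns immediately
+    request_b.StartAsync();   // second request is processed in parallel
+
+    request_a.Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
+    request_b.Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
+    return 0;
+}
+```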
+Refer to the [Benchmark App](../../inference-engine/samples/benchmark_app/README.md) sample, which enables running a number of inference requests in parallel. Specifying different number of request produces different throughput measurements. + +## Best Latency on the Multi-Socket CPUs +Note that when latency is of concern, there are additional tips for multi-socket systems. +When input is limited to the single image, the only way to achieve the best latency is to limit execution to the single socket. +The reason is that single image is simply not enough +to saturate more than one socket. Also NUMA overheads might dominate the execution time. +Below is the example command line that limits the execution to the single socket using numactl for the best *latency* value +(assuming the machine with 28 phys cores per socket): +``` +limited to the single socket). +$ numactl -m 0 --physcpubind 0-27 benchmark_app -m -api sync -nthreads 28 + ``` +Note that if you have more than one input, running as many inference requests as you have NUMA nodes (or sockets) +usually gives the same best latency as a single request on the single socket, but much higher throughput. Assuming two NUMA nodes machine: +``` +$ benchmark_app -m -nstreams 2 + ``` +Number of NUMA nodes on the machine can be queried via 'lscpu'. +Please see more on the NUMA support in the [Optimization Guide](supported_plugins/MULTI.md). + +## Throughput Mode for CPU +Unlike most accelerators, CPU is perceived as an inherently latency-oriented device. +Since 2018 R5 release, the Inference Engine introduced the "throughput" mode, which allows the Inference Engine to efficiently run multiple inference requests on the CPU simultaneously, greatly improving the throughput. + +Internally, the execution resources are split/pinned into execution "streams". +Using this feature gains much better performance for the networks that originally are not scaled well with a number of threads (for example, lightweight topologies). This is especially pronounced for the many-core server machines. + +Run the [Benchmark App](../../inference-engine/samples/benchmark_app/README.md) and play with number of infer requests running in parallel, next section. +Try different values of the `-nstreams` argument from `1` to a number of CPU cores and find one that provides the best performance. + +In addition to the number of streams, it is also possible to play with the batch size to find the throughput sweet-spot. + +The throughput mode relaxes the requirement to saturate the CPU by using a large batch: running multiple independent inference requests in parallel often gives much better performance, than using a batch only. +This allows you to simplify the app-logic, as you don't need to combine multiple inputs into a batch to achieve good CPU performance. +Instead, it is possible to keep a separate infer request per camera or another source of input and process the requests in parallel using Async API. + +## Benchmark App +[Benchmark App](../../inference-engine/samples/benchmark_app/README.md) sample is the best performance reference. +It has a lot of device-specific knobs, but the primary usage is as simple as: +```bash +$ ./benchmark_app –d GPU –m -i +``` +to measure the performance of the model on the GPU. +Or +```bash +$ ./benchmark_app –d CPU –m -i +``` +to execute on the CPU instead. + +For example, for the CPU throughput mode from the previous section, you can play with number of streams (`-nstreams` command-line param). 
+Try different values of the `-nstreams` argument from `1` to a number of CPU cores and find one that provides the best performance. For example, on a 8-core CPU, compare the `-nstreams 1` (which is a latency-oriented scenario) to the `2`, `4` and `8` streams. Notice that `benchmark_app` automatically queries/creates/runs number of requests required to saturate the given number of streams. + +Finally, notice that when you don't specify number of streams with `-nstreams`, "AUTO" value for the streams is used, e.g. for the CPU this is [CPU_THROUGHPUT_AUTO](supported_plugins/CPU.md). You can spot the actual value behind "AUTO" for your machine in the application output. +Notice that the "AUTO" number is not necessarily most optimal, so it is generally recommended to play either with the benchmark_app's "-nstreams" as described above, or via [new Workbench tool](@ref workbench_docs_Workbench_DG_Introduction).This allows you to simplify the app-logic, as you don't need to combine multiple inputs into a batch to achieve good CPU performance. +Instead, it is possible to keep a separate infer request per camera or another source of input and process the requests in parallel using Async API. + +## Kernels Tuning for GPU + +GPU backend comes with a feature, that allows models tuning, so the workload is configured to fit better into hardware. + +Tuning is time consuming process, which internally execute every layer several (or even hundreds) times to find most performant configuration. + +This configuration is saved into json-formatted file, whose name can be passed as plugin param to network. GPU backend will process this data to configure kernels for the best performance. + +For more details about Kernels Tuning and How-To please refer to [GPU Kernels Tuning](GPU_Kernels_Tuning.md). diff --git a/docs/IE_DG/Introduction.md b/docs/IE_DG/Introduction.md new file mode 100644 index 00000000000000..27d223c4edcfc6 --- /dev/null +++ b/docs/IE_DG/Introduction.md @@ -0,0 +1,145 @@ +# Introduction to Intel® Deep Learning Deployment Toolkit {#openvino_docs_IE_DG_Introduction} + +## Deployment Challenges + +Deploying deep learning networks from the training environment to embedded platforms for inference +might be a complex task that introduces a number of technical challenges that must be addressed: + +* There are a number of deep learning frameworks widely used in the industry, such as Caffe*, TensorFlow*, MXNet*, Kaldi* etc. + +* Typically the training of the deep learning networks is performed in data centers or server farms while the inference +might take place on embedded platforms, optimized for performance and power consumption. Such platforms are typically +limited both from software perspective (programming languages, third party dependencies, memory consumption, +supported operating systems), and from hardware perspective (different data types, limited power envelope), +so usually it is not recommended (and sometimes just impossible) to use original training framework for inference. +An alternative solution would be to use dedicated inference APIs that are well optimized for specific hardware platforms. + +* Additional complications of the deployment process include supporting various layer types and networks that are getting +more and more complex. Obviously, ensuring the accuracy of the transforms networks is not trivial. + +## Deployment Workflow +The process assumes that you have a network model trained using one of the [supported frameworks](#SupportedFW). 
+The scheme below illustrates the typical workflow for deploying a trained deep learning model: +![scheme] + +The steps are: + +1. [Configure Model Optimizer](../MO_DG/prepare_model/Config_Model_Optimizer.md) for the specific framework (used to train your model). + +2. Run [Model Optimizer](#MO) to produce an optimized [Intermediate Representation (IR)](../MO_DG/IR_and_opsets.md) +of the model based on the trained network topology, weights and biases values, and other optional parameters. + +3. Test the model in the IR format using the [Inference Engine](#IE) in the target environment with provided +[Inference Engine sample applications](Samples_Overview.md). + +4. [Integrate Inference Engine](Integrate_with_customer_application_new_API.md) in your application to deploy the model in the target environment. + + +## Model Optimizer + +Model Optimizer is a cross-platform command line tool that facilitates the transition between the training and +deployment environment, performs static model analysis and automatically adjusts deep learning +models for optimal execution on end-point target devices. + +Model Optimizer is designed to support multiple deep learning [supported frameworks and formats](#SupportedFW). + +While running Model Optimizer you do not need to consider what target device you wish to use, the same output of the MO can be used in all targets. + +### Model Optimizer Workflow + +The process assumes that you have a network model trained using one of the [supported frameworks](#SupportedFW). +The Model Optimizer workflow can be described as following: + +* [Configure Model Optimizer](../MO_DG/prepare_model/Config_Model_Optimizer.md) for one of the supported deep learning framework that was used to train the model. +* Provide as input a trained network that contains a certain network topology, and the adjusted weights and +biases (with some optional parameters). +* [Run Model Optimizer](../MO_DG/prepare_model/convert_model/Converting_Model.md) to perform specific model optimizations (for example, horizontal fusion of certain network layers). Exact optimizations +are framework-specific, refer to appropriate documentation pages: [Converting a Caffe Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md), +[Converting a TensorFlow Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md), [Converting a MXNet Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md), [Converting a Kaldi Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md), +[Converting an ONNX Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md). +* Model Optimizer produces as output an [Intermediate Representation (IR)](../MO_DG/IR_and_opsets.md) of the network which is used as an input for the Inference Engine on all targets. 
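+As the last bullet above notes, the same IR is consumed by the Inference Engine on every target. Below is a minimal sketch of loading one IR to a device; the file names and device names are placeholders:
+```cpp
+#include <inference_engine.hpp>
+
+int main() {
+    InferenceEngine::Core core;
+    // Read the .xml/.bin pair produced by Model Optimizer (placeholder names)
+    auto network = core.ReadNetwork("model.xml", "model.bin");
+    // The same network object can be compiled for any supported device
+    auto cpu_exec = core.LoadNetwork(network, "CPU");
+    // auto gpu_exec = core.LoadNetwork(network, "GPU");  // if a GPU device is available
+    return 0;
+}
+```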
+ + +### Supported Frameworks and Formats +* Caffe* (most public branches) +* TensorFlow* +* MXNet* +* Kaldi* +* ONNX* + +### Supported Models +For the list of supported models refer to the framework or format specific page: +* [Supported Caffe* models](../MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md) +* [Supported TensorFlow* models](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md) +* [Supported MXNet* models](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md) +* [Supported ONNX* models](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md) +* [Supported Kaldi* models](../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) + + +## Intermediate Representation + +Intermediate representation describing a deep learning model plays an important role connecting the OpenVINO™ toolkit components. +The IR is a pair of files: + * `.xml`: The topology file - an XML file that describes the network topology + * `.bin`: The trained data file - a .bin file that contains the weights and biases binary data + +Intermediate Representation (IR) files can be read, loaded and inferred with the [Inference Engine](#IE). +Inference Engine API offers a unified API across a number of [supported Intel® platforms](#SupportedTargets). +IR is also consumed, modified and written by Post-Training Optimization Tool which provides quantization capabilities. + +Refer to a dedicated description about [Intermediate Representation and Operation Sets](../MO_DG/IR_and_opsets.md) for further details. + +## nGraph Integration + +OpenVINO toolkit is powered by nGraph capabilities for Graph construction API, Graph transformation engine and Reshape. +nGraph Function is used as an intermediate representation for a model in the run-time underneath the CNNNetwork API. +The conventional representation for CNNNetwork is still available if requested for backward compatibility when some conventional API methods are used. +Please refer to the [Overview of nGraph Flow](nGraph_Flow.md) describing the details of nGraph integration into the Inference Engine and co-existence with the conventional representation. + +**Deprecation Notice** + + + + + + + + + + +
Deprecation Begins: June 1, 2020
Removal Date: December 1, 2020
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* + +## Inference Engine + +Inference Engine is a runtime that delivers a unified API to integrate the inference with application logic: + +* Takes as input the model. The model presented in the specific form of [Intermediate Representation (IR)](../MO_DG/IR_and_opsets.md) +produced by Model Optimizer. +* Optimizes inference execution for target hardware. +* Delivers inference solution with reduced footprint on embedded inference platforms. + +The Inference Engine supports inference of multiple image classification networks, +including AlexNet, GoogLeNet, VGG and ResNet families of networks, fully convolutional networks like FCN8 used for image + segmentation, and object detection networks like Faster R-CNN. + +For the full list of supported hardware, refer to the +[Supported Devices](supported_plugins/Supported_Devices.md) section. + +For Intel® Distribution of OpenVINO™ toolkit, the Inference Engine package contains [headers](files.html), runtime libraries, and +[sample console applications](Samples_Overview.md) demonstrating how you can use +the Inference Engine in your applications. + +The open source version is available in the [OpenVINO™ toolkit GitHub repository](https://github.com/openvinotoolkit/openvino) and can be built for supported platforms using the Inference Engine Build Instructions. +## See Also +- [Inference Engine Samples](Samples_Overview.md) +- [Intel® Deep Learning Deployment Toolkit Web Page](https://software.intel.com/en-us/computer-vision-sdk) + + +[scheme]: img/workflow_steps.png + +#### Optimization Notice +For complete information about compiler optimizations, see our [Optimization Notice](https://software.intel.com/en-us/articles/optimization-notice#opt-en). diff --git a/docs/IE_DG/Known_Issues_Limitations.md b/docs/IE_DG/Known_Issues_Limitations.md new file mode 100644 index 00000000000000..ec3e4ffd8e2862 --- /dev/null +++ b/docs/IE_DG/Known_Issues_Limitations.md @@ -0,0 +1,58 @@ +# Known Issues and Limitations {#openvino_docs_IE_DG_Known_Issues_Limitations} + +## Multiple OpenMP Loadings + +If the application uses the Inference Engine with third-party components that depend on Intel OpenMP, multiple loadings of the libiomp library may occur and cause OpenMP runtime initialization conflicts. This may happen, for example, if the application uses Intel® Math Kernel Library (Intel® MKL) through the “Single Dynamic Library” (libmkl_rt.so) mechanism and calls Intel MKL after loading the Inference Engine plugin. +The error log looks as follows: +```sh +OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized. +OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. 
by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/. +``` + +Possible workarounds: + +* Preload the OpenMP runtime using the LD_PRELOAD variable: +```sh +LD_PRELOAD= +``` + This eliminates multiple loadings of libiomp, and makes all the components use this specific version of OpenMP. + +* Alternatively, you can set KMP_DUPLICATE_LIB_OK=TRUE. However, performance degradation or results incorrectness may occur in this case. + + +## Old proto compiler breaks protobuf library + +With python protobuf library version 3.5.1 the following incompatibility can happen. +The known case is for Cent OS 7.4 + +The error log looks as follows: + +```sh +File "../lib64/python3.5/site-packages/google/protobuf/descriptor.py", line 829, in _new_ +return _message.default_pool.AddSerializedFile(serialized_pb) +TypeError: expected bytes, str found +``` + +Possible workaround is to upgrade default protobuf compiler (libprotoc 2.5.0) to newer version, for example +libprotoc 2.6.1. + +[protobuf_issue]: https://github.com/google/protobuf/issues/4272 + +## Dynamic batching +Refer to the **Limitations** section of [Dynamic batching page](DynamicBatching.md) + +## Static Shape Infer +Refer to the **Limitations** section of [Static Shape Infer page](ShapeInference.md) + + +## Image Pre-Processing Performance Optimization Issue + +As described in [documentation for new API](Integrate_with_customer_application_new_API.md), you can set an image blob of any size to an +infer request using resizable input. Resize is executed during inference using configured resize algorithm. + +But currently resize algorithms are not completely optimized. So expect performance degradation if resizable input is +specified and an input blob (to be resized) is set (`SetBlob()` is used). Required performance is met for +[CPU](supported_plugins/CPU.md) plugin only (because enabled openMP* provides parallelism). + +Another limitation is that currently, resize algorithms support NCHW layout only. So if you set NHWC layout for an input +blob, NHWC is converted to NCHW before resize and back to NHWC after resize. diff --git a/docs/IE_DG/Legal_Information.md b/docs/IE_DG/Legal_Information.md new file mode 100644 index 00000000000000..3b39dba5810fa4 --- /dev/null +++ b/docs/IE_DG/Legal_Information.md @@ -0,0 +1,12 @@ +# Legal Information {#openvino_docs_IE_DG_Legal_Information} + +No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
+Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
+This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
+The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.
+Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting [www.intel.com/design/literature.htm](http://www.intel.com/design/literature.htm).
+Intel, Intel logo, Intel Core, VTune, Xeon are trademarks of Intel Corporation in the U.S. and other countries.
+\* Other names and brands may be claimed as the property of others.
+Copyright © 2016-2018 Intel Corporation.
+This software and the related documents are Intel copyrighted materials, and your use of them is governed by the express license under which they were provided to you (License). Unless the License provides otherwise, you may not use, modify, copy, publish, distribute, disclose or transmit this software or the related documents without Intel's prior written permission.
+This software and the related documents are provided as is, with no express or implied warranties, other than those that are expressly stated in the License.
diff --git a/docs/IE_DG/Memory_primitives.md b/docs/IE_DG/Memory_primitives.md new file mode 100644 index 00000000000000..a6fed433d3c765 --- /dev/null +++ b/docs/IE_DG/Memory_primitives.md @@ -0,0 +1,55 @@ +Inference Engine Memory primitives {#openvino_docs_IE_DG_Memory_primitives} +===================================================================== + +## Blobs + +InferenceEngine::Blob is the main class intended for working with memory. +Using this class you can read and write memory, get information about the memory structure etc. + +The right way to create Blob objects with a specific layout is to use constructors with InferenceEngine::TensorDesc. +
+InferenceEngine::TensorDesc tdesc(InferenceEngine::Precision::FP32, {1, 3, 227, 227}, InferenceEngine::Layout::NCHW);
+InferenceEngine::Blob::Ptr blob = InferenceEngine::make_shared_blob<float>(tdesc);
+
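+As a short illustration of the read/write access mentioned above, the blob created in the snippet can be allocated and filled through its `buffer()` method (sketch only):
+```cpp
+// Continues from the blob created above
+blob->allocate();
+float* data = blob->buffer().as<float*>();
+for (size_t i = 0; i < blob->size(); ++i) {
+    data[i] = 0.0f;  // write input values here
+}
+```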
+ +## Layouts + +InferenceEngine::TensorDesc is a special class that provides a layout format description. + +This class allows you to create planar layouts using the standard formats (such as InferenceEngine::Layout::NCDHW, InferenceEngine::Layout::NCHW, InferenceEngine::Layout::NC, InferenceEngine::Layout::C, and so on) and also non-planar layouts using InferenceEngine::BlockingDesc. + +To create a complex layout, use InferenceEngine::BlockingDesc, which allows you to define blocked memory with offsets and strides. + +## Examples + +1. You can define a blob with dimensions {N: 1, C: 25, H: 20, W: 20} and format NHWC using the following parameters:
+
+InferenceEngine::BlockingDesc({1, 20, 20, 25}, {0, 2, 3, 1}); // or
+InferenceEngine::BlockingDesc({1, 20, 20, 25}, InferenceEngine::Layout::NHWC);
+
+2. If you have memory with real dimensions {N: 1, C: 25, H: 20, W: 20} but with channels blocked by 8, you can define it using the following parameters:
+
+InferenceEngine::BlockingDesc({1, 4, 20, 20, 8}, {0, 1, 2, 3, 1});
+
+3. You can also set strides and offsets if the layout requires them. +4. If you have a complex blob layout and do not want to calculate the real data offset manually, you can use the methods +InferenceEngine::TensorDesc::offset(size_t l) or InferenceEngine::TensorDesc::offset(SizeVector v).
+For example: +
+InferenceEngine::BlockingDesc blk({1, 4, 20, 20, 8}, {0, 1, 2, 3, 1});
+InferenceEngine::TensorDesc tdesc(FP32, {1, 25, 20, 20}, blk);
+tdesc.offset(0); // = 0
+tdesc.offset(1); // = 8
+tdesc.offset({0, 0, 0, 2}); // = 16
+tdesc.offset({0, 1, 0, 2}); // = 17
+
+5. If you would like to create a TensorDesc with a planar format for N dimensions (where N can be 1, 2, 4, and so on), you can use the method +InferenceEngine::TensorDesc::getLayoutByDims. +
+InferenceEngine::TensorDesc::getLayoutByDims({1}); // InferenceEngine::Layout::C
+InferenceEngine::TensorDesc::getLayoutByDims({1, 2}); // InferenceEngine::Layout::NC
+InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3, 4}); // InferenceEngine::Layout::NCHW
+InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3}); // InferenceEngine::Layout::CHW
+InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3, 4, 5}); // InferenceEngine::Layout::NCDHW
+InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3, 4, 5, ...}); // InferenceEngine::Layout::BLOCKED
+
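+As a usage sketch (not part of the original example list), the result of getLayoutByDims can be passed straight to a TensorDesc when the number of dimensions is only known at run time:
+```cpp
+InferenceEngine::SizeVector dims = {1, 3, 224, 224};
+InferenceEngine::TensorDesc tdesc(InferenceEngine::Precision::FP32,
+                                  dims,
+                                  InferenceEngine::TensorDesc::getLayoutByDims(dims));  // NCHW for 4 dims
+auto blob = InferenceEngine::make_shared_blob<float>(tdesc);
+blob->allocate();
+```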
\ No newline at end of file diff --git a/docs/IE_DG/Migration_CoreAPI.md b/docs/IE_DG/Migration_CoreAPI.md new file mode 100644 index 00000000000000..21a01991b7fb77 --- /dev/null +++ b/docs/IE_DG/Migration_CoreAPI.md @@ -0,0 +1,77 @@ +Migration from Inference Engine Plugin API to Core API {#openvino_docs_IE_DG_Migration_CoreAPI} +=============================== + +For 2019 R2 Release, the new Inference Engine Core API is introduced. This guide is updated to reflect the new API approach. The Inference Engine Plugin API is still supported, but is going to be deprecated in future releases. + +This section provides common steps to migrate your application written using the Inference Engine Plugin API (`InferenceEngine::InferencePlugin`) to the Inference Engine Core API (`InferenceEngine::Core`). + +To learn how to write a new application using the Inference Engine, refer to [Integrate the Inference Engine Request API with Your Application](Integrate_with_customer_application_new_API.md) and [Inference Engine Samples Overview](Samples_Overview.md). + +## Inference Engine Core Class + +The Inference Engine Core class is implemented on top existing Inference Engine Plugin API and handles plugins internally. +The main responsibility of the `InferenceEngine::Core` class is to hide plugin specifics inside and provide a new layer of abstraction that works with devices (`InferenceEngine::Core::GetAvailableDevices`). Almost all methods of this class accept `deviceName` as an additional parameter that denotes an actual device you are working with. Plugins are listed in the `plugins.xml` file, which is loaded during constructing `InferenceEngine::Core` objects: + +```bash + + + + + ... + +``` + +## Migration Steps + +Common migration process includes the following steps: + +1. Migrate from the `InferenceEngine::InferencePlugin` initialization: +```cpp +InferenceEngine::InferencePlugin plugin = InferenceEngine::PluginDispatcher({ FLAGS_pp }).getPluginByDevice(FLAGS_d); +``` +to the `InferenceEngine::Core` class initialization: +```cpp +InferenceEngine::Core core; +``` + +2. Instead of using `InferenceEngine::CNNNetReader` to read IR: +```cpp +CNNNetReader network_reader; +network_reader.ReadNetwork(fileNameToString(input_model)); +network_reader.ReadWeights(fileNameToString(input_model).substr(0, input_model.size() - 4) + ".bin"); +CNNNetwork network = network_reader.getNetwork(); +``` +read networks using the Core class: +```cpp +CNNNetwork network = core.ReadNetwork(input_model); +``` +The Core class also allows reading models from ONNX format: +```cpp +CNNNetwork network = core.ReadNetwork("model.onnx"); +``` + +3. Instead of adding CPU device extensions to the plugin: +```cpp +plugin.AddExtension(std::make_shared()); +``` +add extensions to CPU device using the Core class: +```cpp +core.AddExtension(std::make_shared(), "CPU"); +``` + +4. Instead of setting configuration keys to a particular plugin, set (key, value) pairs via `InferenceEngine::Core::SetConfig` +```cpp +core.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, FLAGS_c}}, "GPU"); +``` +> **NOTE**: If `deviceName` is omitted as the last argument, configuration is set for all Inference Engine devices. + +5. 
Migrate from loading the network to a particular plugin: +```cpp +auto execNetwork = plugin.LoadNetwork(network, { }); +``` +to `InferenceEngine::Core::LoadNetwork` to a particular device: +```cpp +auto execNetwork = core.LoadNetwork(network, deviceName, { }); +``` + +After you have an instance of `InferenceEngine::ExecutableNetwork`, all other steps are as usual. diff --git a/docs/IE_DG/OnnxImporterTutorial.md b/docs/IE_DG/OnnxImporterTutorial.md new file mode 100644 index 00000000000000..a63b0f9f44c4df --- /dev/null +++ b/docs/IE_DG/OnnxImporterTutorial.md @@ -0,0 +1,118 @@ +# ONNX* Importer API Tutorial {#openvino_docs_IE_DG_OnnxImporterTutorial} + +> **NOTE**: This tutorial is deprecated. Since OpenVINO™ 2020.4 version, Inference Engine enables reading ONNX models via the Inference Engine Core API +> and there is no need to use directly the low-level ONNX* Importer API anymore. +> To read ONNX\* models, it's recommended to use the InferenceEngine::Core::ReadNetwork method that provide a uniform way to read models from IR or ONNX format. + +This tutorial demonstrates how to use the ONNX\* Importer API. +This API makes it possible to create an nGraph `Function` object from an imported ONNX model. + +All functions of the ONNX Importer API are in the [onnx.hpp][onnx_header] header file. + +Two categories of API functions: +* Helper functions that check which ONNX ops are supported in a current version of the ONNX Importer +* Functions that read ONNX models from a stream or file and result in an nGraph function, which can be executed using the Inference Engine + +## Check Which ONNX Ops Are Supported + +To list all supported ONNX ops in a specific version and domain, use the `get_supported_operators` +as shown in the example below: +```cpp +const std::int64_t version = 12; +const std::string domain = "ai.onnx"; +const std::set supported_ops = ngraph::onnx_import::get_supported_operators(version, domain); + +for(const auto& op : supported_ops) +{ + std::cout << op << std::endl; +} +``` +The above code produces a list of all the supported operators for the `version` and `domain` you specified and outputs a list similar to this: +```cpp +Abs +Acos +... +Xor +``` + +To determine whether a specific ONNX operator in a particular version and domain is supported by the importer, use the `is_operator_supported` function as shown in the example below: +```cpp +const std::string op_name = "Abs"; +const std::int64_t version = 12; +const std::string domain = "ai.onnx"; +const bool is_abs_op_supported = ngraph::onnx_import::is_operator_supported(op_name, version, domain); + +std::cout << "Abs in version 12, domain `ai.onnx`is supported: " << (is_abs_op_supported ? "true" : "false") << std::endl; +``` + +## Import ONNX Model + +To import an ONNX model, use the `import_onnx_model` function. +The method has two overloads: +* `import_onnx_model` takes a stream as an input, for example, file stream, memory stream +* `import_onnx_model` takes a file path as an input + +Refer to the sections below for details. + +> **NOTE**: The examples below use the ONNX ResNet50 model, which is available at the [ONNX Model Zoo][onnx_model_zoo]: +> ```bash +> $ wget https://s3.amazonaws.com/download.onnx/models/opset_8/resnet50.tar.gz +> $ tar -xzvf resnet50.tar.gz +> ``` + +Once you create the `ng_function`, you can use it to run computation on the Inference Engine. +As it was shown in [Build a Model with nGraph Library](nGraphTutorial.md), `std::shared_ptr` can be transformed into a `CNNNetwork`. 
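+A minimal sketch of that hand-off (the `model_stream` variable is a placeholder for any of the inputs shown below):
+```cpp
+// Wrap the imported nGraph function into a CNNNetwork and load it for inference
+std::shared_ptr<ngraph::Function> ng_function = ngraph::onnx_import::import_onnx_model(model_stream);
+InferenceEngine::CNNNetwork network(ng_function);
+InferenceEngine::Core core;
+auto exec_network = core.LoadNetwork(network, "CPU");
+```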
+ + +### Stream as Input + +The code below shows how to convert the ONNX ResNet50 model to the nGraph function using `import_onnx_model` with the stream as an input: + +```cpp + const std::string resnet50_path = "resnet50/model.onnx"; + std::ifstream resnet50_stream(resnet50_path); + if(resnet50_stream.is_open()) + { + try + { + const std::shared_ptr ng_function = ngraph::onnx_import::import_onnx_model(resnet50_stream); + + // Check shape of the first output, for example + std::cout << ng_function->get_output_shape(0) << std::endl; + // The output is Shape{1, 1000} + } + catch (const ngraph::ngraph_error& error) + { + std::cout << "Error when importing ONNX model: " << error.what() << std::endl; + } + } + resnet50_stream.close(); +``` + +### Filepath as Input + +The code below shows how to convert the ONNX ResNet50 model to the nGraph function using `import_onnx_model` with the filepath as an input: +```cpp +const std::shared_ptr ng_function = ngraph::onnx_import::import_onnx_model(resnet50_path); +``` + +[onnx_header]: https://github.com/NervanaSystems/ngraph/blob/master/src/ngraph/frontend/onnx_import/onnx.hpp +[onnx_model_zoo]: https://github.com/onnx/models + + +## Deprecation Notice + + + + + + + + + + +
Deprecation Begins: June 1, 2020
Removal Date: December 1, 2020
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* \ No newline at end of file diff --git a/docs/IE_DG/Optimization_notice.md b/docs/IE_DG/Optimization_notice.md new file mode 100644 index 00000000000000..3c128d95b6c5bc --- /dev/null +++ b/docs/IE_DG/Optimization_notice.md @@ -0,0 +1,3 @@ +# Optimization Notice {#openvino_docs_IE_DG_Optimization_notice} + +![Optimization_notice](img/opt-notice-en_080411.gif) \ No newline at end of file diff --git a/docs/IE_DG/PythonPackage_Overview.md b/docs/IE_DG/PythonPackage_Overview.md new file mode 100644 index 00000000000000..411f082609f3d8 --- /dev/null +++ b/docs/IE_DG/PythonPackage_Overview.md @@ -0,0 +1,15 @@ +OpenVINO™ Python* package {#openvino_docs_IE_DG_PythonPackage_Overview} +======================== + +OpenVINO™ Python\* package includes types to measure model and calibrate to low precision. + +The OpenVINO™ Python\* package available in the `/python/python3.X` directory. + +The OpenVINO™ Python\* package includes the following sub-packages: + + - [openvino.inference_engine](../../inference-engine/ie_bridges/python/docs/api_overview.md) - Python\* wrapper on OpenVINO™ Inference Engine. + - `openvino.tools.accuracy_checker` - Measure accuracy. + - `openvino.tools.benchmark` - Measure latency and throughput. + +## See Also +* [Introduction to Intel's Deep Learning Inference Engine](Introduction.md) diff --git a/docs/IE_DG/Samples_Overview.md b/docs/IE_DG/Samples_Overview.md new file mode 100644 index 00000000000000..af60575f2aaf2b --- /dev/null +++ b/docs/IE_DG/Samples_Overview.md @@ -0,0 +1,184 @@ +# Inference Engine Samples {#openvino_docs_IE_DG_Samples_Overview} + +The Inference Engine sample applications are simple console applications that show how to utilize specific Inference Engine capabilities within an application, assist developers in executing specific tasks such as loading a model, running inference, querying specific device capabilities and etc. + +After installation of Intel® Distribution of OpenVINO™ toolkit, С, C++ and Python* sample applications are available in the following directories, respectively: +* `/inference_engine/samples/c` +* `/inference_engine/samples/cpp` +* `/inference_engine/samples/python` + +Inference Engine sample applications include the following: +- **[Automatic Speech Recognition C++ Sample](../../inference-engine/samples/speech_sample/README.md)** – Acoustic model inference based on Kaldi neural networks and speech feature vectors. +- **Benchmark Application** – Estimates deep learning inference performance on supported devices for synchronous and asynchronous modes. + - [Benchmark C++ Application](../../inference-engine/samples/benchmark_app/README.md) + - [Benchmark Python Application](../../inference-engine/tools/benchmark_tool/README.md) +- **Hello Classification Sample** – Inference of image classification networks like AlexNet and GoogLeNet using Synchronous Inference Request API. 
Input of any size and layout can be set to an infer request which will be pre-processed automatically during inference (the sample supports only images as inputs and supports Unicode paths). + - [Hello Classification C++ Sample](../../inference-engine/samples/hello_classification/README.md) + - [Hello Classification C Sample](../../inference-engine/ie_bridges/c/samples/hello_classification/README.md) +- **Hello NV12 Input Classification Sample** – Input of any size and layout can be provided to an infer request. The sample transforms the input to the NV12 color format and pre-process it automatically during inference. The sample supports only images as inputs. + - [Hello NV12 Input Classification C++ Sample](../../inference-engine/samples/hello_nv12_input_classification/README.md) + - [Hello NV12 Input Classification C Sample](../../inference-engine/ie_bridges/c/samples/hello_nv12_input_classification/README.md) +- **Hello Query Device Sample** – Query of available Inference Engine devices and their metrics, configuration values. + - [Hello Query Device C++ Sample](../../inference-engine/samples/hello_query_device/README.md) + - [Hello Query Device Python* Sample](../../inference-engine/ie_bridges/python/sample/hello_query_device/README.md) +- **[Hello Reshape SSD C++ Sample**](../../inference-engine/samples/hello_reshape_ssd/README.md)** – Inference of SSD networks resized by ShapeInfer API according to an input size. +- **Image Classification Sample Async** – Inference of image classification networks like AlexNet and GoogLeNet using Asynchronous Inference Request API (the sample supports only images as inputs). + - [Image Classification C++ Sample Async](../../inference-engine/samples/classification_sample_async/README.md) + - [Image Classification Python* Sample Async](../../inference-engine/ie_bridges/python/sample/classification_sample_async/README.md) +- **[Image Classification Python* Sample](../../inference-engine/ie_bridges/python/sample/classification_sample/README.md)** – Inference of image classification networks like AlexNet and GoogLeNet using Synchronous Inference Request API (the sample supports only images as inputs). +- **Neural Style Transfer Sample** – Style Transfer sample (the sample supports only images as inputs). + - [Neural Style Transfer C++ Sample](../../inference-engine/samples/style_transfer_sample/README.md) + - [Neural Style Transfer Python* Sample](../../inference-engine/ie_bridges/python/sample/style_transfer_sample/README.md) +- **[nGraph Function Creation C++ Sample](../../inference-engine/samples/ngraph_function_creation_sample/README.md)** – Construction of the LeNet network using the nGraph function creation sample. +- **Object Detection for SSD Sample** – Inference of object detection networks based on the SSD, this sample is simplified version that supports only images as inputs. + - [Object Detection for SSD C++ Sample](../../inference-engine/samples/object_detection_sample_ssd/README.md) + - [Object Detection for SSD C Sample](../../inference-engine/ie_bridges/c/samples/object_detection_sample_ssd/README.md) + - [Object Detection for SSD Python* Sample](../../inference-engine/ie_bridges/python/sample/object_detection_sample_ssd/README.md) + +## Media Files Available for Samples + +To run the sample applications, you can use images and videos from the media files collection available at https://github.com/intel-iot-devkit/sample-videos. 
+ +## Samples that Support Pre-Trained Models + +You can download the [pre-trained models](@ref omz_models_intel_index) using the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or from [https://download.01.org/opencv/](https://download.01.org/opencv/). + +## Build the Sample Applications + +### Build the Sample Applications on Linux* + +The officially supported Linux* build environment is the following: + +* Ubuntu* 16.04 LTS 64-bit or CentOS* 7.4 64-bit +* GCC* 5.4.0 (for Ubuntu* 16.04) or GCC* 4.8.5 (for CentOS* 7.4) +* CMake* version 2.8 or higher + +To build the C or C++ sample applications for Linux, go to the `/inference_engine/samples/c` or `/inference_engine/samples/cpp` directory, respectively, and run the `build_samples.sh` script: +```sh +build_samples.sh +``` + +Once the build is completed, you can find sample binaries in the following folders: +* C samples: `~/inference_engine_c_samples_build/intel64/Release` +* C++ samples: `~/inference_engine_cpp_samples_build/intel64/Release` + +You can also build the sample applications manually: + +> **NOTE**: If you have installed the product as a root user, switch to root mode before you continue: `sudo -i` + +1. Navigate to a directory that you have write access to and create a samples build directory. This example uses a directory named `build`: +```sh +mkdir build +``` +> **NOTE**: If you ran the Image Classification verification script during the installation, the C++ samples build directory was already created in your home directory: `~/inference_engine_samples_build/` + +2. Go to the created directory: +```sh +cd build +``` + +3. Run CMake to generate the Make files for release or debug configuration. For example, for C++ samples: + - For release configuration: + ```sh + cmake -DCMAKE_BUILD_TYPE=Release /inference_engine/samples/cpp + ``` + - For debug configuration: + ```sh + cmake -DCMAKE_BUILD_TYPE=Debug /inference_engine/samples/cpp + ``` +4. Run `make` to build the samples: +```sh +make +``` + +For the release configuration, the sample application binaries are in `/intel64/Release/`; +for the debug configuration — in `/intel64/Debug/`. + +### Build the Sample Applications on Microsoft Windows* OS + +The recommended Windows* build environment is the following: +* Microsoft Windows* 10 +* Microsoft Visual Studio* 2015, 2017, or 2019 +* CMake* version 2.8 or higher + +> **NOTE**: If you want to use Microsoft Visual Studio 2019, you are required to install CMake 3.14. + +To build the C or C++ sample applications on Windows, go to the `\inference_engine\samples\c` or `\inference_engine\samples\cpp` directory, respectively, and run the `build_samples_msvc.bat` batch file: +```sh +build_samples_msvc.bat +``` + +By default, the script automatically detects the highest Microsoft Visual Studio version installed on the machine and uses it to create and build +a solution for a sample code. Optionally, you can also specify the preferred Microsoft Visual Studio version to be used by the script. Supported +versions are `VS2015`, `VS2017`, and `VS2019`. 
For example, to build the C++ samples using the Microsoft Visual Studio 2017, use the following command: +```sh +\inference_engine\samples\cpp\build_samples_msvc.bat VS2017 +``` + +Once the build is completed, you can find sample binaries in the following folders: +* C samples: `C:\Users\\Documents\Intel\OpenVINO\inference_engine_c_samples_build\intel64\Release` +* C++ samples: `C:\Users\\Documents\Intel\OpenVINO\inference_engine_cpp_samples_build\intel64\Release` + +You can also build a generated solution manually. For example, if you want to build C++ sample binaries in Debug configuration, run the appropriate version of the +Microsoft Visual Studio and open the generated solution file from the `C:\Users\\Documents\Intel\OpenVINO\inference_engine_cpp_samples_build\Samples.sln` +directory. + +## Get Ready for Running the Sample Applications + +### Get Ready for Running the Sample Applications on Linux* + +Before running compiled binary files, make sure your application can find the +Inference Engine and OpenCV libraries. +Run the `setupvars` script to set all necessary environment variables: +```sh +source /bin/setupvars.sh +``` + +**(Optional)**: The OpenVINO environment variables are removed when you close the +shell. As an option, you can permanently set the environment variables as follows: + +1. Open the `.bashrc` file in ``: +```sh +vi /.bashrc +``` + +2. Add this line to the end of the file: +```sh +source /opt/intel/openvino/bin/setupvars.sh +``` + +3. Save and close the file: press the **Esc** key, type `:wq` and press the **Enter** key. +4. To test your change, open a new terminal. You will see `[setupvars.sh] OpenVINO environment initialized`. + +You are ready to run sample applications. To learn about how to run a particular +sample, read the sample documentation by clicking the sample name in the samples +list above. + +### Get Ready for Running the Sample Applications on Windows* + +Before running compiled binary files, make sure your application can find the +Inference Engine and OpenCV libraries. +Use the `setupvars` script, which sets all necessary environment variables: +```sh +\bin\setupvars.bat +``` + +To debug or run the samples on Windows in Microsoft Visual Studio, make sure you +have properly configured **Debugging** environment settings for the **Debug** +and **Release** configurations. Set correct paths to the OpenCV libraries, and +debug and release versions of the Inference Engine libraries. +For example, for the **Debug** configuration, go to the project's +**Configuration Properties** to the **Debugging** category and set the `PATH` +variable in the **Environment** field to the following: + +```sh +PATH=\deployment_tools\inference_engine\bin\intel64\Debug;\opencv\bin;%PATH% +``` +where `` is the directory in which the OpenVINO toolkit is installed. + +You are ready to run sample applications. To learn about how to run a particular +sample, read the sample documentation by clicking the sample name in the samples +list above. 
+ +## See Also +* [Introduction to Intel's Deep Learning Inference Engine](Introduction.md) diff --git a/docs/IE_DG/ShapeInference.md b/docs/IE_DG/ShapeInference.md new file mode 100644 index 00000000000000..58203a3f841ad6 --- /dev/null +++ b/docs/IE_DG/ShapeInference.md @@ -0,0 +1,129 @@ +Using Shape Inference {#openvino_docs_IE_DG_ShapeInference} +========================================== + +Inference Engine takes two kinds of model description as an input: [Intermediate Representation (IR)](../MO_DG/IR_and_opsets.md) and [nGraph::Function](nGraph_Flow.md) objects. +Both should have fixed input shapes to be successfully loaded to the Inference Engine. +To feed input data of a shape that is different from the model input shape, resize the model first. + +Model resizing on the stage of IR generation or [nGraph::Function creation](nGraphTutorial.md) is the recommended approach. +OpenVINO™ provides the following experimental methods for runtime model reshaping: + +1. Setting a new input shape with the `InferenceEngine::CNNNetwork::reshape` method + + `InferenceEngine::CNNNetwork::reshape` method updates input shapes and propagates them down to the outputs of the model through all intermediate layers. + + Shape propagation for `InferenceEngine::CNNNetwork` objects created from `nGraph::Function` or IR of the version 10 works through the `nGraph` shape inference mechanism. + `InferenceEngine::CNNNetwork` objects created from lower IR versions are considered deprecated and may be reshaped incorrectly or give unexpected results. + + To keep the v10 IR resizable by the `InferenceEngine::CNNNetwork::reshape` method, convert the model with the additional Model Optimizer key `--keep_shape_ops`. + +2. Setting a new batch dimension value with the `InferenceEngine::CNNNetwork::setBatchSize` method + + The meaning of a model batch may vary depending on choices you made during the model designing. + The `InferenceEngine::CNNNetwork::setBatchSize` method deduces index of batch dimension relying only on the input rank. + This method does not work for models with a non-zero index batch placement or models with inputs without a batch dimension. + + Batch-setting algorithm does not involve shape inference mechanism. + Batch of input and output shapes for all layers is set to a new batch value without layer validation. + It may cause both positive and negative side effects. + + Due to the limitations described above, the current method is recommended for simple image processing models only. + + +Practically, some models are not ready to be resized. In this case, a new input shape cannot be set with the Model Optimizer or the `InferenceEngine::CNNNetwork::reshape` method. + +## Troubleshooting Resize Errors + +Operation semantics may impose restrictions on input shapes of the operation. +Shape collision during shape propagation may be a sign that a new shape does not satisfy the restrictions. +Changing the model input shape may result in intermediate operations shape collision. + +Examples of such operations: +- `Reshape` operation with a hard-coded output shape value +- `MatMul` operation with the `Const` second input cannot be resized by spatial dimensions due to operation semantics + +Model structure and logic should not change significantly after resizing. +- The Global Pooling operation is commonly used to reduce output feature map of classification models output. +Having the input of the shape [N, C, H, W], Global Pooling returns the output of the shape [N, C, 1, 1]. 
+Model architects usually express Global Pooling with the help of the `Pooling` operation with the fixed kernel size [H, W]. +During spatial reshape, having the input of the shape [N, C, H1, W1], Pooling with the fixed kernel size [H, W] returns the output of the shape [N, C, H2, W2], where H2 and W2 are commonly not equal to `1`. +It breaks the classification model structure. +For example, [publicly available Inception family models from TensorFlow*](https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models) have this issue. + +- Resizing the model input shape may significantly affect its accuracy. +For example, Object Detection models from TensorFlow have resizing restrictions by design. +To keep the model valid after the reshape, choose a new input shape that satisfies conditions listed in the `pipeline.config` file. +For details, refer to the Tensorflow Object Detection API models resizing techniques. + +## Usage of Reshape Method + +The primary method of the feature is `InferenceEngine::CNNNetwork::reshape`. +It gets new input shapes and propagates it from input to output for all intermediates layers of the given network. +The method takes `InferenceEngine::ICNNNetwork::InputShapes` - a map of pairs: name of input data and its dimension. + +The algorithm for resizing network is the following: + +1) **Collect the map of input names and shapes from Intermediate Representation (IR)** using helper method `InferenceEngine::CNNNetwork::getInputShapes` + +2) **Set new input shapes** + +3) **Call reshape** + +Here is a code example: +```cpp + InferenceEngine::Core core; + // ------------- 0. Read IR and image ---------------------------------------------- + CNNNetwork network = core.ReadNetwork("path/to/IR/xml"); + cv::Mat image = cv::imread("path/to/image"); + // --------------------------------------------------------------------------------- + + // ------------- 1. Collect the map of input names and shapes from IR--------------- + auto input_shapes = network.getInputShapes(); + // --------------------------------------------------------------------------------- + + // ------------- 2. Set new input shapes ------------------------------------------- + std::string input_name; + SizeVector input_shape; + std::tie(input_name, input_shape) = *input_shapes.begin(); // let's consider first input only + input_shape[0] = batch_size; // set batch size to the first input dimension + input_shape[2] = image.rows; // changes input height to the image one + input_shape[3] = image.cols; // changes input width to the image one + input_shapes[input_name] = input_shape; + // --------------------------------------------------------------------------------- + + // ------------- 3. Call reshape --------------------------------------------------- + network.reshape(input_shapes); + // --------------------------------------------------------------------------------- + + ... + + // ------------- 4. Loading model to the device ------------------------------------ + std::string device = "CPU"; + ExecutableNetwork executable_network = core.LoadNetwork(network, device); + // --------------------------------------------------------------------------------- + + +``` +Shape Inference feature is used in [Smart classroom sample](@ref omz_demos_smart_classroom_demo_README). + +## Extensibility + +Inference Engine provides a special mechanism that allows to add the support of shape inference for custom operations. 
+This mechanism is described in the [Extensibility documentation](Extensibility_DG/Intro.md) + +## Deprecation Notice + + + + + + + + + + +
| Deprecation Begins | June 1, 2020 |
| Removal Date | December 1, 2020 |
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* diff --git a/docs/IE_DG/Tools_Overview.md b/docs/IE_DG/Tools_Overview.md new file mode 100644 index 00000000000000..6c543c810d0d2f --- /dev/null +++ b/docs/IE_DG/Tools_Overview.md @@ -0,0 +1,17 @@ +# OpenVINO™ Tools {#openvino_docs_IE_DG_Tools_Overview} + +OpenVINO™ tools are C++ and Python\* console command line applications that can be used for models downloading, accuracy measurement, calibration and checking. + +The OpenVINO™ toolkit installation includes the following tools: + +|Tool | Location in the Installation Directory| +|-----------------------------------------------------------------------------|---------------------------------------| +|[Accuracy Checker Tool](@ref omz_tools_accuracy_checker_README) | `/deployment_tools/tools/open_model_zoo/tools/accuracy_checker`| +|[Post-Training Optimization Tool](@ref pot_README) | `/deployment_tools/tools/post_training_optimization_toolkit`| +|[Model Downloader](@ref omz_tools_downloader_README) | `/deployment_tools/tools/model_downloader`| +|[Cross Check Tool](../../inference-engine/tools/cross_check_tool/README.md) | `/deployment_tools/tools/cross_check_tool`| +|[Compile Tool](../../inference-engine/tools/compile_tool/README.md) | `/deployment_tools/inference_engine/lib/intel64/`| + + +## See Also +* [Introduction to Deep Learning Inference Engine](Introduction.md) diff --git a/docs/IE_DG/img/NewAndOldCNNNetworkImpl.png b/docs/IE_DG/img/NewAndOldCNNNetworkImpl.png new file mode 100644 index 00000000000000..b5868b343487f8 --- /dev/null +++ b/docs/IE_DG/img/NewAndOldCNNNetworkImpl.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5389b6d0a25e8356002bd8c68526ceedf39f6c4efa5e7097b5ac0308fd42dee3 +size 48611 diff --git a/docs/IE_DG/img/TopLevelNGraphFlow.png b/docs/IE_DG/img/TopLevelNGraphFlow.png new file mode 100644 index 00000000000000..4359676d20ca52 --- /dev/null +++ b/docs/IE_DG/img/TopLevelNGraphFlow.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c416156d9ed77213ead230fc49c32a3c3918e52128ac2db442f56062e206bc01 +size 708262 diff --git a/docs/IE_DG/img/bf16_format.png b/docs/IE_DG/img/bf16_format.png new file mode 100644 index 00000000000000..bf92086a96faa8 --- /dev/null +++ b/docs/IE_DG/img/bf16_format.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ce6fb1c626ac0858b411c86fa2e3a46c5ca0dc2e88692284ce4ec24edb141e7f +size 9326 diff --git a/docs/IE_DG/img/conv_depth_01.png b/docs/IE_DG/img/conv_depth_01.png new file mode 100644 index 00000000000000..516b01d6d1b0d3 --- /dev/null +++ b/docs/IE_DG/img/conv_depth_01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:80edd1da1c5673d18afa44bc2c0503ba9ecdcc37c2acb94960303b61c602ceee +size 12649 diff --git a/docs/IE_DG/img/conv_simple_01.png b/docs/IE_DG/img/conv_simple_01.png new file mode 100644 index 00000000000000..6de6f46e36e3af --- /dev/null +++ 
b/docs/IE_DG/img/conv_simple_01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d3e8856aa175d6fcf940af57a53f962ff6c58acf0a3838bfccc6a093bff1756d +size 9015 diff --git a/docs/IE_DG/img/conv_sum_relu_01.png b/docs/IE_DG/img/conv_sum_relu_01.png new file mode 100644 index 00000000000000..7007115294fbac --- /dev/null +++ b/docs/IE_DG/img/conv_sum_relu_01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7d53ce33f180cf4d170bbeb69635ee7c49a67d3f6ee8b1c01ec12568fe1cca38 +size 17157 diff --git a/docs/IE_DG/img/cpu_int8_flow.png b/docs/IE_DG/img/cpu_int8_flow.png new file mode 100644 index 00000000000000..130e54ceafa638 --- /dev/null +++ b/docs/IE_DG/img/cpu_int8_flow.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3965f4830c45518ee1dc169c2b1760cae83f8a8819023770a28893c6cef558c2 +size 68441 diff --git a/docs/IE_DG/img/deploy_encrypted_model.png b/docs/IE_DG/img/deploy_encrypted_model.png new file mode 100644 index 00000000000000..9338c59dcf273d --- /dev/null +++ b/docs/IE_DG/img/deploy_encrypted_model.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:25ed719bdd525dc0b606ef17a3fec5303ea032dfe6b2d167e1b19b6100b6fb37 +size 16516 diff --git a/docs/IE_DG/img/deploy_encrypted_model.vsdx b/docs/IE_DG/img/deploy_encrypted_model.vsdx new file mode 100644 index 00000000000000..9d1086462bd0c3 --- /dev/null +++ b/docs/IE_DG/img/deploy_encrypted_model.vsdx @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:55c5fd6517ae9e3639f2214167665ffbb4b641cd2abef155ff816c68478915e2 +size 54233 diff --git a/docs/IE_DG/img/example_sample_output.png b/docs/IE_DG/img/example_sample_output.png new file mode 100644 index 00000000000000..f9299373c97e21 --- /dev/null +++ b/docs/IE_DG/img/example_sample_output.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5fbfb33c1a860978b8b99cf4dfbc04b5f7fbe0e20af03cd3e5ffd1d6a9f2db40 +size 353490 diff --git a/docs/IE_DG/img/fpga_full_workflow.png b/docs/IE_DG/img/fpga_full_workflow.png new file mode 100644 index 00000000000000..754bb37cea7fe0 --- /dev/null +++ b/docs/IE_DG/img/fpga_full_workflow.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f0f329112b9c8227cbba3d394b778a6d219b4f3fc0d02cc5f2f8598c3d4eb51 +size 151678 diff --git a/docs/IE_DG/img/fpga_platform_hub.png b/docs/IE_DG/img/fpga_platform_hub.png new file mode 100644 index 00000000000000..bc5e7e66492611 --- /dev/null +++ b/docs/IE_DG/img/fpga_platform_hub.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0b46a1f89df96410a87f90801c9a86a28a6aacb39fa4677b434d856559f163fe +size 217954 diff --git a/docs/IE_DG/img/fullyconnected_activation_01.png b/docs/IE_DG/img/fullyconnected_activation_01.png new file mode 100644 index 00000000000000..776b14b46feb2a --- /dev/null +++ b/docs/IE_DG/img/fullyconnected_activation_01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:88745fd132531e943d59afe59ed6af8eaae6b62ba1fda2493dfef76080d31a25 +size 7788 diff --git a/docs/IE_DG/img/group_convolutions_01.png b/docs/IE_DG/img/group_convolutions_01.png new file mode 100644 index 00000000000000..237523823c3503 --- /dev/null +++ b/docs/IE_DG/img/group_convolutions_01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9709bc83f903943b4d737d379babf80a391a72ad8eab98e71abcc0de5424fbfc +size 12361 diff --git a/docs/IE_DG/img/hor_fusion_1.png b/docs/IE_DG/img/hor_fusion_1.png new file mode 100644 index 
00000000000000..4fee4887cdb208 --- /dev/null +++ b/docs/IE_DG/img/hor_fusion_1.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f6ff04de33684f00d0d2da8fed6d30b5162c566b35b8894e9e14f7921db70592 +size 8598 diff --git a/docs/IE_DG/img/hor_fusion_2.png b/docs/IE_DG/img/hor_fusion_2.png new file mode 100644 index 00000000000000..937fbafe09b84e --- /dev/null +++ b/docs/IE_DG/img/hor_fusion_2.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a453412cf37f06e1e5a63f5ff629d4e16ed1707fc55b5a63cc03e710807b33e +size 10151 diff --git a/docs/IE_DG/img/hor_fusion_3.png b/docs/IE_DG/img/hor_fusion_3.png new file mode 100644 index 00000000000000..3aacdbd6f00a61 --- /dev/null +++ b/docs/IE_DG/img/hor_fusion_3.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b3be59a71703b640eac6ad99ce3d463141a36e58f5299bf21e4f6aba152d9ed6 +size 9359 diff --git a/docs/IE_DG/img/hor_fusion_4.png b/docs/IE_DG/img/hor_fusion_4.png new file mode 100644 index 00000000000000..0a439dafc18f69 --- /dev/null +++ b/docs/IE_DG/img/hor_fusion_4.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:50f41274758a989c9ef43e558343d420d7e4e288c88ac2d19a2bf396d5ee573c +size 9937 diff --git a/docs/IE_DG/img/integration_process.png b/docs/IE_DG/img/integration_process.png new file mode 100644 index 00000000000000..cb1070821064d7 --- /dev/null +++ b/docs/IE_DG/img/integration_process.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9fff52e5faaf108371db87e53959453216554152b15ca0432b1541f94def297e +size 19145 diff --git a/docs/IE_DG/img/intel_logo.png b/docs/IE_DG/img/intel_logo.png new file mode 100644 index 00000000000000..77a3ff51275b83 --- /dev/null +++ b/docs/IE_DG/img/intel_logo.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2d147adf801535e95d8b627a8a1d23f7b89dea1eabe06218235e756b0a9866fe +size 1636 diff --git a/docs/IE_DG/img/ir_add_n_ref.png b/docs/IE_DG/img/ir_add_n_ref.png new file mode 100644 index 00000000000000..cc21c584f0ed4f --- /dev/null +++ b/docs/IE_DG/img/ir_add_n_ref.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a9aae473dcc469ebdb5c2d9ac8067bf8c7caa11d4cdbc7e0dd0b2006621ce526 +size 4267 diff --git a/docs/IE_DG/img/mkldnn_conv_sum.png b/docs/IE_DG/img/mkldnn_conv_sum.png new file mode 100644 index 00000000000000..d1c56f77128b3f --- /dev/null +++ b/docs/IE_DG/img/mkldnn_conv_sum.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:af2641e8e685b027123681ab542162932b008eff257ef5b7105950bfe8b4ade8 +size 10373 diff --git a/docs/IE_DG/img/mkldnn_conv_sum_result.png b/docs/IE_DG/img/mkldnn_conv_sum_result.png new file mode 100644 index 00000000000000..67dc87cd3263b7 --- /dev/null +++ b/docs/IE_DG/img/mkldnn_conv_sum_result.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02efdda675c16def7c2705e978964ce8bf65d1ec6cedfdb0a5afc837fb57abf0 +size 5660 diff --git a/docs/IE_DG/img/mkldnn_group_conv.png b/docs/IE_DG/img/mkldnn_group_conv.png new file mode 100644 index 00000000000000..c433a6b5484a1b --- /dev/null +++ b/docs/IE_DG/img/mkldnn_group_conv.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e69242d80da7676311e20e5db67c01bd6562008ecf3a53df8fdedaefabb91b70 +size 7226 diff --git a/docs/IE_DG/img/opt-notice-en_080411.gif b/docs/IE_DG/img/opt-notice-en_080411.gif new file mode 100644 index 00000000000000..ceddf9732d7809 --- /dev/null +++ b/docs/IE_DG/img/opt-notice-en_080411.gif @@ 
-0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d4457dbe05630bf90294396c4185b280634a5bf1ac7a6ca1c5186be67eb1cc4a +size 54231 diff --git a/docs/IE_DG/img/optimizations/groups.png b/docs/IE_DG/img/optimizations/groups.png new file mode 100644 index 00000000000000..b497e16547b85c --- /dev/null +++ b/docs/IE_DG/img/optimizations/groups.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3812efef32bd7f1bf40b130d5d522bc3df6aebd406bd1186699d214bca856722 +size 43721 diff --git a/docs/IE_DG/img/optimizations/inception_v4.png b/docs/IE_DG/img/optimizations/inception_v4.png new file mode 100644 index 00000000000000..64058527a5de82 --- /dev/null +++ b/docs/IE_DG/img/optimizations/inception_v4.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e232c47e8500f42bd0e1f2b93f94f58e2d59caee149c687be3cdc3e8a5be59a +size 18417 diff --git a/docs/IE_DG/img/optimizations/resnet_269.png b/docs/IE_DG/img/optimizations/resnet_269.png new file mode 100644 index 00000000000000..4ef638090e9f61 --- /dev/null +++ b/docs/IE_DG/img/optimizations/resnet_269.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:92d36b9527a3e316cd9eb2b6f5054c312466df004e4aa9c3458e165330bc6561 +size 24157 diff --git a/docs/IE_DG/img/optimizations/resnet_optimization.png b/docs/IE_DG/img/optimizations/resnet_optimization.png new file mode 100644 index 00000000000000..b276e81a2dd18e --- /dev/null +++ b/docs/IE_DG/img/optimizations/resnet_optimization.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2adeca1e3512b9fe7b088a5412ce21592977a1f352a013735537ec92e895dc94 +size 15653 diff --git a/docs/IE_DG/img/pooling_fakequant_01.png b/docs/IE_DG/img/pooling_fakequant_01.png new file mode 100644 index 00000000000000..2310488df403a9 --- /dev/null +++ b/docs/IE_DG/img/pooling_fakequant_01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:37c7908d2379cc2ba1909965c58de7bc55d131a330c47e173321c718846d6745 +size 7809 diff --git a/docs/IE_DG/img/workflow_steps.png b/docs/IE_DG/img/workflow_steps.png new file mode 100644 index 00000000000000..6bf780127ad14c --- /dev/null +++ b/docs/IE_DG/img/workflow_steps.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e22bc22d614c7335ae461a8ce449ea8695973d755faca718cf74b95972c94e2 +size 19773 diff --git a/docs/IE_DG/img/yolo_tiny_v1.png b/docs/IE_DG/img/yolo_tiny_v1.png new file mode 100644 index 00000000000000..a92f7ed806adc9 --- /dev/null +++ b/docs/IE_DG/img/yolo_tiny_v1.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f5d909bcaa7f6ec95cb0e3bf1b676b031489e89afa411e6add1aa2faaf90e0b3 +size 101557 diff --git a/docs/IE_DG/inference_engine_intro.md b/docs/IE_DG/inference_engine_intro.md new file mode 100644 index 00000000000000..cb3b43fcab72dc --- /dev/null +++ b/docs/IE_DG/inference_engine_intro.md @@ -0,0 +1,115 @@ +Introduction to Inference Engine {#openvino_docs_IE_DG_inference_engine_intro} +================================ + +After you have used the Model Optimizer to create an Intermediate Representation (IR), use the Inference Engine to infer the result for a given input data. + +Inference Engine is a set of C++ libraries providing a common API to deliver inference solutions on the platform of your choice: CPU, GPU, VPU, or FPGA. Use the Inference Engine API to read the Intermediate Representation, set the input and output formats, and execute the model on devices. 
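A minimal sketch of that flow in C++ might look like the following; the model path, the device name, and the choice of the first output are placeholders for illustration only, not part of any particular sample:

```cpp
#include <inference_engine.hpp>
#include <string>

int main() {
    InferenceEngine::Core core;

    // Read an IR (model.xml + model.bin) produced by the Model Optimizer
    InferenceEngine::CNNNetwork network = core.ReadNetwork("model.xml");

    // Compile and load the network on a device plugin (here: CPU)
    InferenceEngine::ExecutableNetwork executable = core.LoadNetwork(network, "CPU");

    // Create an inference request; in a real application, fill the input blobs first
    InferenceEngine::InferRequest request = executable.CreateInferRequest();
    request.Infer();

    // Read the result from the first output
    std::string output_name = network.getOutputsInfo().begin()->first;
    InferenceEngine::Blob::Ptr output = request.GetBlob(output_name);
    return 0;
}
```

Each of these steps is described in more detail in the Common Workflow section below.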
While the C++ libraries is the primary implementation, C libraries and Python bindings are also available. + +For Intel® Distribution of OpenVINO™ toolkit, Inference Engine binaries are delivered within release packages. + +The open source version is available in the [OpenVINO™ toolkit GitHub repository](https://github.com/openvinotoolkit/openvino) and can be built for supported platforms using the Inference Engine Build Instructions. + +To learn about how to use the Inference Engine API for your application, see the [Integrating Inference Engine in Your Application](Integrate_with_customer_application_new_API.md) documentation. + +For complete API Reference, see the [API Reference](usergroup29.html) section. + +Inference Engine uses a plugin architecture. Inference Engine plugin is a software component that contains complete implementation for inference on a certain Intel® hardware device: CPU, GPU, VPU, FPGA, etc. Each plugin implements the unified API and provides additional hardware-specific APIs. + +Modules in the Inference Engine component +--------------------------------------- + +### Core Inference Engine Libraries ### + +Your application must link to the core Inference Engine libraries: +* Linux* OS: + - `libinference_engine.so`, which depends on `libinference_engine_transformations.so` and `libngraph.so` + - `libinference_engine_legacy.so`, which depends on `libtbb.so` +* Windows* OS: + - `inference_engine.dll`, which depends on `inference_engine_transformations.dll` and `ngraph.dll` + - `inference_engine_legacy.dll`, which depends on `tbb.dll` + +The required C++ header files are located in the `include` directory. + +This library contains the classes to: +* Create Inference Engine Core object to work with devices and read network (InferenceEngine::Core) +* Manipulate network information (InferenceEngine::CNNNetwork) +* Execute and pass inputs and outputs (InferenceEngine::ExecutableNetwork and InferenceEngine::InferRequest) + +### Plugin Libraries to read a network object ### + +Starting from 2020.4 release, Inference Engine introduced a concept of `CNNNetwork` reader plugins. Such plugins can be automatically dynamically loaded by Inference Engine in runtime depending on file format: +* Linux* OS: + - `libinference_engine_ir_reader.so` to read a network from IR + - `libinference_engine_onnx_reader.so` to read a network from ONNX model format +* Windows* OS: + - `inference_engine_ir_reader.dll` to read a network from IR + - `inference_engine_onnx_reader.dll` to read a network from ONNX model format + +### Device-specific Plugin Libraries ### + +For each supported target device, Inference Engine provides a plugin — a DLL/shared library that contains complete implementation for inference on this particular device. 
The following plugins are available: + +| Plugin | Device Type | +| ------------- | ------------- | +|CPU| Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® SSE | +|GPU| Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics +|FPGA| Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA, Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Speed Grade 2) | +|MYRIAD| Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X| +|GNA| Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver J5005 Processor, Intel® Pentium® Silver N5000 Processor, Intel® Celeron® J4005 Processor, Intel® Celeron® J4105 Processor, Intel® Celeron® Processor N4100, Intel® Celeron® Processor N4000, Intel® Core™ i3-8121U Processor, Intel® Core™ i7-1065G7 Processor, Intel® Core™ i7-1060G7 Processor, Intel® Core™ i5-1035G4 Processor, Intel® Core™ i5-1035G7 Processor, Intel® Core™ i5-1035G1 Processor, Intel® Core™ i5-1030G7 Processor, Intel® Core™ i5-1030G4 Processor, Intel® Core™ i3-1005G1 Processor, Intel® Core™ i3-1000G1 Processor, Intel® Core™ i3-1000G4 Processor +|HETERO|Automatic splitting of a network inference between several devices (for example if a device doesn't support certain layers| +|MULTI| Simultaneous inference of the same network on several devices in parallel| + +The table below shows the plugin libraries and additional dependencies for Linux and Windows platforms. + +| Plugin | Library name for Linux | Dependency libraries for Linux | Library name for Windows | Dependency libraries for Windows | +|--------|------------------------|-------------------------------------------------|--------------------------|--------------------------------------------------------------------------------------------------------| +| CPU | `libMKLDNNPlugin.so` | `libinference_engine_lp_transformations.so` | `MKLDNNPlugin.dll` | `inference_engine_lp_transformations.dll` | +| GPU | `libclDNNPlugin.so` | `libinference_engine_lp_transformations.so`, `libOpenCL.so` | `clDNNPlugin.dll` | `OpenCL.dll`, `inference_engine_lp_transformations.dll` | +| FPGA | `libdliaPlugin.so` | `libdla_compiler_core.so`, `libdla_runtime_core.so`, `libcrypto.so`, `libalteracl.so`, `liblpsolve5525.so`, `libprotobuf.so`, `libacl_emulator_kernel_rt.so` | `dliaPlugin.dll` | `dla_compiler_core.dll`, `dla_runtime_core.dll`, `crypto.dll`, `alteracl.dll`, `lpsolve5525.dll`, `protobuf.dll`, `acl_emulator_kernel_rt.dll` +| MYRIAD | `libmyriadPlugin.so` | `libusb.so`, `libinference_engine_lp_transformations.so` | `myriadPlugin.dll` | `usb.dll`, `inference_engine_lp_transformations.dll` | +| HDDL | `libHDDLPlugin.so` | `libbsl.so`, `libhddlapi.so`, `libmvnc-hddl.so`, `libinference_engine_lp_transformations.so`| `HDDLPlugin.dll` | `bsl.dll`, `hddlapi.dll`, `json-c.dll`, `libcrypto-1_1-x64.dll`, `libssl-1_1-x64.dll`, `mvnc-hddl.dll`, `inference_engine_lp_transformations.dll` | +| GNA | `libGNAPlugin.so` | `libgna.so`, `libinference_engine_lp_transformations.so` | `GNAPlugin.dll` | `gna.dll`, `inference_engine_lp_transformations.dll` | +| HETERO | `libHeteroPlugin.so` | Same as for selected plugins | `HeteroPlugin.dll` | Same as for selected plugins | +| MULTI | `libMultiDevicePlugin.so` | Same as for selected plugins | `MultiDevicePlugin.dll` | Same as for selected plugins | + +> **NOTE**: All plugin libraries also depend on core Inference Engine libraries. 
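To check which device plugins the Core object can actually discover and load at run time, you can query it directly; the snippet below is a small illustrative sketch rather than a required step:

```cpp
#include <inference_engine.hpp>
#include <iostream>
#include <string>

int main() {
    InferenceEngine::Core core;

    // Each reported device name (e.g. "CPU", "GPU", "MYRIAD") corresponds to a
    // plugin library from the table above that was found and loaded successfully.
    for (const std::string& device : core.GetAvailableDevices()) {
        std::cout << device << std::endl;
    }
    return 0;
}
```

If an expected device is missing from the list, the corresponding plugin library or one of its dependencies is most likely not discoverable, which is what the environment setup below addresses.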
+ +Make sure those libraries are in your computer's path or in the place you pointed to in the plugin loader. Make sure each plugin's related dependencies are in the: + +* Linux: `LD_LIBRARY_PATH` +* Windows: `PATH` + +On Linux, use the script `bin/setupvars.sh` to set the environment variables. + +On Windows, run the `bin\setupvars.bat` batch file to set the environment variables. + +To learn more about supported devices and corresponding plugins, see the [Supported Devices](supported_plugins/Supported_Devices.md) chapter. + +Common Workflow for Using the Inference Engine API +--------------------------- +The common workflow contains the following steps: + +1. **Create Inference Engine Core object** - Create an `InferenceEngine::Core` object to work with different devices, all device plugins are managed internally by the `Core` object. Register extensions with custom nGraph operations (`InferenceEngine::Core::AddExtension`). + +2. **Read the Intermediate Representation** - Using the `InferenceEngine::Core` class, read an Intermediate Representation file into an object of the `InferenceEngine::CNNNetwork` class. This class represents the network in the host memory. + +3. **Prepare inputs and outputs format** - After loading the network, specify input and output precision and the layout on the network. For these specification, use the `InferenceEngine::CNNNetwork::getInputsInfo()` and `InferenceEngine::CNNNetwork::getOutputsInfo()`. + +4. Pass per device loading configurations specific to this device (`InferenceEngine::Core::SetConfig`), and register extensions to this device (`InferenceEngine::Core::AddExtension`). + +4. **Compile and Load Network to device** - Use the `InferenceEngine::Core::LoadNetwork()` method with specific device (e.g. `CPU`, `GPU`, etc.) to compile and load the network on the device. Pass in the per-target load configuration for this compilation and load operation. + +5. **Set input data** - With the network loaded, you have an `InferenceEngine::ExecutableNetwork` object. Use this object to create an `InferenceEngine::InferRequest` in which you signal the input buffers to use for input and output. Specify a device-allocated memory and copy it into the device memory directly, or tell the device to use your application memory to save a copy. + +6. **Execute** - With the input and output memory now defined, choose your execution mode: + + * Synchronously - `InferenceEngine::InferRequest::Infer()` method. Blocks until inference is completed. + * Asynchronously - `InferenceEngine::InferRequest::StartAsync()` method. Check status with the `InferenceEngine::InferRequest::Wait()` method (0 timeout), wait, or specify a completion callback. + +7. **Get the output** - After inference is completed, get the output memory or read the memory you provided earlier. Do this with the `InferenceEngine::IInferRequest::GetBlob()` method. + + +Further Reading +--------------- + +For more details on the Inference Engine API, refer to the [Integrating Inference Engine in Your Application](Integrate_with_customer_application_new_API.md) documentation. diff --git a/docs/IE_DG/nGraphTutorial.md b/docs/IE_DG/nGraphTutorial.md new file mode 100644 index 00000000000000..41a0a294964d52 --- /dev/null +++ b/docs/IE_DG/nGraphTutorial.md @@ -0,0 +1,81 @@ +# Build a Model with nGraph Library {#openvino_docs_IE_DG_nGraphTutorial} + +This section illustrates how to construct an nGraph function +composed of operations from the `opset3` namespace. 
Once created, +it can wrap into a `CNNNetwork`, creating utility for data scientists +or app developers to define a deep-learning model in a neutral way +that does not depend on existing Deep Learning (DL) frameworks. + +Operation Set `opsetX` integrates a list of nGraph pre-compiled operations that work +for this purpose. In other words, `opsetX` defines a set of operations for building a graph. + +For a complete list of operation sets supported by Inference Engine, see [Available Operations Sets](../ops/opset.md). + +To add custom nGraph operations to an existing `CNNNetwork`, see +the [Add Custom nGraph Operations](Extensibility_DG/Intro.md) document. + +Now that you can build graphs with anything from the `opset3` definition, some +parameters for shape-relevant (or shape-specific) inputs can be added. The +following code prepares a graph for shape-relevant parameters. + +> **NOTE**: `validate_nodes_and_infer_types(ops)` must be included for partial shape inference. + +```cpp +#include "ngraph/opsets/opset.hpp" +#include "ngraph/opsets/opset3.hpp" + +using namespace std; +using namespace ngraph; + +auto arg0 = make_shared(element::f32, Shape{7}); +auto arg1 = make_shared(element::f32, Shape{7}); +// Create an 'Add' operation with two inputs 'arg0' and 'arg1' +auto add0 = make_shared(arg0, arg1); +auto abs0 = make_shared(add0); +// Create a node whose inputs/attributes will be specified later +auto acos0 = make_shared(); +// Create a node using opset factories +auto add1 = shared_ptr(get_opset3().create("Add")); +// Set inputs to nodes explicitly +acos0->set_argument(0, add0); +add1->set_argument(0, acos0); +add1->set_argument(1, abs0); + +// Run shape inference on the nodes +NodeVector ops{arg0, arg1, add0, abs0, acos0, add1}; +validate_nodes_and_infer_types(ops); + +// Create a graph with one output (add1) and four inputs (arg0, arg1) +auto ng_function = make_shared(OutputVector{add1}, ParameterVector{arg0, arg1}); + +``` + +To wrap it into a CNNNetwork, use: +```cpp +CNNNetwork net (ng_function); +``` + +## Deprecation Notice + + + + + + + + + + +
| Deprecation Begins | June 1, 2020 |
| Removal Date | December 1, 2020 |
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* + +## See Also + +* [Available Operation Sets](../ops/opset.md) +* [Operation Set `opset1` Specification](../ops/opset1.md) +* [Operation Set `opset2` Specification](../ops/opset2.md) +* [Operation Set `opset3` Specification](../ops/opset3.md) +* [Inference Engine Extensibility Developer Guide](Extensibility_DG/Intro.md) diff --git a/docs/IE_DG/nGraph_Flow.md b/docs/IE_DG/nGraph_Flow.md new file mode 100644 index 00000000000000..abd4e3db0eeb64 --- /dev/null +++ b/docs/IE_DG/nGraph_Flow.md @@ -0,0 +1,159 @@ +# Introduction to nGraph Flow in Inference Engine {#openvino_docs_IE_DG_nGraph_Flow} + +## New Run-Time Intermediate Representation (IR): nGraph + +Starting from the OpenVINO™ release 2020.1, the Inference Engine integrates the +nGraph Core. +That implies that the Inference Engine uses a new way to represent a model in run time underneath of +the conventional `CNNNetwork` API, which is an instance of `ngraph::Function`. + +Besides the representation update, nGraph integration resulted in the following changes and new features: + +1. New operations sets. When operations from the nGraph Core were combined with conventional layers +from `CNNNetwork`, there were created a [new sets of operations called `opset1`, `opset2` and etc.](../ops/opset.md), +which covered both interfaces except several not very important cases. +Operations from `opset3` are generated by the Model Optimizer and are accepted in the Inference Engine. + +2. New version approach that attaches a version to each operation rather than to the entire IR file format. +IR is still versioned but has a different meaning. For details, see [Deep Learning Network Intermediate Representation and Operation Sets in OpenVINO™](../MO_DG/IR_and_opsets.md). + +3. Creating models in run-time without loading IR from an xml/binary file. You can enable it by creating +`ngraph::Function` passing it to `CNNNetwork`. + +4. Run-time reshape capability and constant folding are implemented through the nGraph code for more operations compared to previous releases. +As a result, more models can be reshaped. For details, see the [dedicated guide about the reshape capability](ShapeInference.md). + +5. Loading model from ONNX format without converting it to the Inference Engine IR. + +The conventional flow that is not based on nGraph is still available. +The complete picture of co-existence of legacy and new flows is presented below. +The rest of the document describes the coexistence of legacy and new flows showed in the picture below: + +![](img/TopLevelNGraphFlow.png) + + +## Read the Intermediate Representation to `CNNNetwork` + +As the new operation set is introduced, the Model Optimizer generates the IR version 10 using the new operations by default. +Each layer generated in the IR has a semantics matching to the corresponding operation from the nGraph namespace `opset3`. 
+The IR version 10 automatically triggers the nGraph flow inside the Inference Engine. +When such IR is read in an application, the Inference Engine IR reader produces `CNNNetwork` that encapsulates the `ngraph::Function` instance underneath. +Thus the OpenVINO IR becomes a new serialization format for the nGraph IR, and it can be deserialized reading the `CNNNetwork`. + +> **IMPORTANT**: Conventional interfaces are used (`CNNNetwork`, the reader), so no changes required in most applications. + +> **NOTE**: While you still can use old APIs, there is an independent process of continuous improvements in the Inference Engine API. +> For example, the Core::Read API is recommended to use instead of `CNNNetworkReader`. +> These changes are independent of nGraph integration and do not enable or disable new features. + +Interpretation of the IR version 10 differs from the old IR version. +Besides having a different operations set, the IR version 10 ignores the shapes and data types assigned to the ports in an XML file. +Both shapes and types are reinferred while loading to the Inference Engine using the nGraph shape and type propagation function that is a part of each nGraph operation. + +### Legacy IR Versions + +You can read old versions of the IR in the Inference Engine. +Each version below or equal to 7 is treated as an old one. +When the Inference Engine reader reads an old version of the IR, it does not use the nGraph representation. +There is no way to activate nGraph flow with an old IR version. +The rest of this document is not applied in this case. + +Model Optimizer generates the IR version 10 by default, and there is the command line key `--generate_deprecated_IR_V7` which switches generation to the legacy IR version 7. +It is useful when the new nGraph flow does not work for some reason. + +## Build a Model in the Application + +Alternative method to feed the Inference Engine with a model is to create the model in the run time. +It is achieved by creation of the `ngraph::Function` construction using nGraph operation classes and optionally user-defined operations. +For details, see [Add Custom nGraph Operations](Extensibility_DG/AddingNGraphOps.md) and [examples](nGraphTutorial.md). +At this stage, the code is completely independent of the rest of the Inference Engine code and can be built separately. +After you construct an instance of `ngraph::Function`, you can use it to create `CNNNetwork` by passing it to the new constructor for this class. + +Initializing `CNNNetwork` from the nGraph Function means encapsulating the object and not converting it to a conventional representation. +Going to low-level details, technically it is achieved by using another class for the `CNNNetwork` internals. +The old representation that is used for former versions of IR before version 10 uses `CNNNetworkImpl`. +The new representation that is built around nGraph uses `CNNNetworkNGraphImpl`. + +![](img/NewAndOldCNNNetworkImpl.png) + +## Automatic Conversion to the Old Representation + +The old representation is still required in the cases listed below. +When old representation is required, the conversion from the `ngraph::Function` to the old representation is called automatically. +The following methods lead to the automatic conversion: + +1. Using the old API, which is expected to produce an old representation. Guaranteed to be read-only. Once you call such a method, the original nGraph representation is preserved and continues to be used in the successive calls. + + 1.1. `CNNNetwork::serialize`. 
Dumps the old representation after automatically called conversion. Cannot be used to dump IR V10. For details, see [Graph Debug Capabilities](Graph_debug_capabilities.md). + +2. Calling `CNNNetwork` methods that modify the model. After that nGraph representation is lost and cannot be used afterwards. + + 1.1. `CNNNetwork::addLayer` + + 1.2. CNNNetwork::setBatchSize. Still implemented through old logic for backward compatibility without using nGraph capabilities. + For details, see [Using Shape Inference](ShapeInference.md). + +3. Using methods that return objects inside an old representation. +Using these methods does not mean modification of the model, but you are not limited by the API to make read-only changes. +These methods should be used in the read-only mode with respect to a model representation. +If the model is changed, for example attribute of some layer is changed or layers are reconnected, the modification is lost whenever any method that uses nGraph is called, including methods inside plugins like CNNNetwork::reshape. +It is hard to predict whether the nGraph function is used in a plugin or other methods of CNNNetworks, so modifying a network using the following methods is *strongly not recommended*. +This is an important limitation that is introduced for the old API calls listed below: + + 1.1. `Data::getInputTo` + + 1.2. `Data::getCreatorLayer` + + 1.3. `CNNNetwork::getLayerByName` + + 1.4. Iterating over `CNNLayer` objects in `CNNNetwork`: `CNNNetwork::begin`, `details::CNNNetworkIterator` class. + +4. Using a conventional plugin that accepts the old representation only. + +Though the conversion is always a one-way process, which means there is no method to convert back, there are important caveats. + +In the cases [1] and [3], both representations are held underneath and you should use the old representation in the read-only mode only from the caller side. +It is hard to track from the Inference Engine side whether the API is used in the read-only mode or for modification of the model. + +That is why when using potentially modifying methods listed in section [3] above, you should not modify the model via those methods. +Use a direct manipulation of the nGraph function instead. + +## Conversion Function + +Inference Engine implements the conversion function that is used when the nGraph function is transformed to the old `CNNNetworkImpl` representation. +This conversion function is hidden and you cannot call it directly from the application. +Nevertheless, it is an important component of the model transformation pipeline in the Inference Engine. +Some issues of models may be caught during the conversion process in this function. +Exceptions are thrown in this function, and you should know what this function does to find a root cause. + +The conversion function performs the following steps: + +1. Convert and decompose some operations as the first step of the nGraph function preparation for optimization. +Reduce operation set to easily optimize it at the next stages. +For example, decomposing of BatchNormInference happens at this stage. + +2. Optimizing transformations that usually happen in the Model Optimizer are called here, because the nGraph function is not always read from an already optimized IR. + +3. Changing operation set from `opsetX` to legacy layer semantics described in the [Legacy Layers Catalog](../MO_DG/prepare_model/convert_model/Legacy_IR_Layers_Catalog_Spec.md). 
+The model is still represented as the nGraph function at this stage, but the operation set is completely different. + +4. One-to-one conversion of nGraph representation to the corresponding `CNNNetworkImpl` without changing its semantics. +You can see the result of the conversion by calling the `CNNNetwork::serialize` method, which produces legacy IR semantics, which is not nGraph-based even if it is applied to `CNNNetwork` constructed from the nGraph Function. +It may help in debugging, see [Graph Debug Capabilities](Graph_debug_capabilities.md) to view all options for dumping new and old IR representations. + +## Deprecation Notice + + + + + + + + + + +
| Deprecation Begins | June 1, 2020 |
| Removal Date | December 1, 2020 |
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* diff --git a/docs/IE_DG/protecting_model_guide.md b/docs/IE_DG/protecting_model_guide.md new file mode 100644 index 00000000000000..75e82ebe2c6b3a --- /dev/null +++ b/docs/IE_DG/protecting_model_guide.md @@ -0,0 +1,71 @@ +# Using Encrypted Models with OpenVINO™ {#openvino_docs_IE_DG_protecting_model_guide} + +Deploying deep-learning capabilities to edge devices can present security +challenges. For example, ensuring inference integrity or providing copyright +protection of your deep-learning models. + +One possible solution is to use cryptography to protect models as they are +deployed and stored on edge devices. Model encryption, decryption and +authentication are not provided by OpenVINO™ but can be implemented with +third-party tools, like OpenSSL\*. While implementing encryption, ensure that +you use the latest versions of tools and follow cryptography best practices. + +This guide demonstrates how to use OpenVINO securely with protected models. + +## Secure Model Deployment + +After a model is optimized by the OpenVINO Model Optimizer, it's then deployed +to target devices in the Intermediate Representation (IR) format. An optimized +model is stored on an edge device and executed by the Inference Engine. + +To protect deep-learning models, you can encrypt an optimized model before +deploying it to the edge device. The edge device should keep the stored model +protected at all times and have the model decrypted **in runtime only** for use +by the Inference Engine. + +![deploy_encrypted_model] + +## Loading Encrypted Models + +The OpenVINO Inference Engine requires model decryption before loading. Allocate +a temporary memory block for model decryption, and use +`InferenceEngine::Core::ReadNetwork` method to load the model from memory buffer. +For more information, see the `InferenceEngine::Core` Class +Reference Documentation. + +```cpp +std::vector model; +std::vector weights; + +// Read model files and decrypt them into temporary memory block +decrypt_file(model_file, password, model); +decrypt_file(weights_file, password, weights); +``` + +Hardware-based protection, such as Intel® Software Guard Extensions +(Intel® SGX), can be utilized to protect decryption operation secrets and +bind them to a device. For more information, go to [Intel® Software Guard +Extensions](https://software.intel.com/en-us/sgx). + +Use `InferenceEngine::Core::ReadNetwork()` to set model representations and +weights respectively. 
+ +```cpp +Core core; +// Load model from temporary memory block +std::string strModel(model.begin(), model.end()); +CNNNetwork network = core.ReadNetwork(strModel, make_shared_blob({Precision::U8, {weights.size()}, C}, weights.data())); +``` + +[deploy_encrypted_model]: img/deploy_encrypted_model.png + +## Additional Resources + +- Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) +- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) +- Model Optimizer Developer Guide: [Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) +- Inference Engine Developer Guide: [Inference Engine Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide.html) +- For more information on Sample Applications, see the [Inference Engine Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html) +- For information on a set of pre-trained models, see the [Overview of OpenVINO™ Toolkit Pre-Trained Models](@ref omz_models_intel_index) +- For information on Inference Engine Tutorials, see the [Inference Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic) +- For IoT Libraries and Code Samples see the [Intel® IoT Developer Kit](https://github.com/intel-iot-devkit). diff --git a/docs/IE_DG/supported_plugins/CL_DNN.md b/docs/IE_DG/supported_plugins/CL_DNN.md new file mode 100644 index 00000000000000..a25012bf0732a0 --- /dev/null +++ b/docs/IE_DG/supported_plugins/CL_DNN.md @@ -0,0 +1,123 @@ +GPU Plugin {#openvino_docs_IE_DG_supported_plugins_CL_DNN} +======= + +The GPU plugin uses the Intel® Compute Library for Deep Neural Networks ([clDNN](https://01.org/cldnn)) to infer deep neural networks. +clDNN is an open source performance library for Deep Learning (DL) applications intended for acceleration of Deep Learning Inference on Intel® Processor Graphics including Intel® HD Graphics and Intel® Iris® Graphics. +For an in-depth description of clDNN, see: [clDNN sources](https://github.com/intel/clDNN) and [Accelerate Deep Learning Inference with Intel® Processor Graphics](https://software.intel.com/en-us/articles/accelerating-deep-learning-inference-with-intel-processor-graphics). + +## Optimizations + +The plugin supports algorithms that fuse several operations into one optimized operation. Refer to the sections below for details. + +> **NOTE**: For operation descriptions, see the [IR Notation Reference](../../ops/opset.md). + +### Fusing Convolution and Simple Layers + +Merge of a Convolution layer and any of the simple layers listed below: +- Activation: ReLU, ELU, Sigmoid, Clamp, and others +- Depthwise: ScaleShift, PReLU +- FakeQuantize + +> **NOTE**: You can have any number and order of simple layers. 
+ +A combination of a Convolution layer and simple layers results in a single fused layer called +*Convolution*: +![conv_simple_01] + + +### Fusing Pooling and FakeQuantize Layers + +A combination of Pooling and FakeQuantize layers results in a single fused layer called *Pooling*: +![pooling_fakequant_01] + +### Fusing Activation Layers + +Given the linear pattern, an Activation layer can be fused into other layers: + +![fullyconnected_activation_01] + + +### Fusing Convolution and Sum Layers + +A combination of Convolution, Simple, and Eltwise layers with the sum operation results in a single layer called *Convolution*: +![conv_sum_relu_01] + +### Fusing a Group of Convolutions + +If a topology contains the following pipeline, a GPU plugin merges Split, Convolution, and Concatenation layers into a single Convolution layer with the group parameter: +> **NOTE**: Parameters of the Convolution layers must coincide. + +![group_convolutions_01] + +### Optimizing Layers Out + +The following layers are optimized out under certain conditions: + * Crop + * Concatenate + * Reshape + * Flatten + * Split + * Copy + +### Load-Time Execution + +Some layers are executed during the load time, not during the inference. One of such layers is PriorBox. + + +## CPU Executed Layers + +The following layers are not accelerated on the GPU and executed on the host CPU instead: +* Proposal +* SimplerNMS +* PriorBox +* DetectionOutput + +## Known Layers Limitations +* ROIPooling is supported for 'max' value of 'method' attribute. + +## Supported Configuration Parameters + +The plugin supports the configuration parameters listed below. +All parameters must be set before calling InferenceEngine::Core::LoadNetwork() in order to take effect. +When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. + +| Parameter Name | Parameter Values | Default | Description | +|---------------------|-----------------------------|-----------------|-----------------------------------------------------------| +| `KEY_PERF_COUNT` | `YES` / `NO` | `NO` | Collect performance counters during inference | +| `KEY_CONFIG_FILE` | `" [ ...]"` | `""` | Load custom layer configuration files | +| `KEY_DUMP_KERNELS` | `YES` / `NO` | `NO` | Dump the final kernels used for custom layers | +| `KEY_TUNING_MODE` | `TUNING_DISABLED`
`TUNING_CREATE`<br>`TUNING_USE_EXISTING` | `TUNING_DISABLED` | Disable inference kernel tuning<br>Create tuning file (expect much longer runtime)<br>Use an existing tuning file |
| `KEY_TUNING_FILE` | `""` | `""` | Tuning file to create / use |
| `KEY_CLDNN_PLUGIN_PRIORITY` | `<0-3>` | `0` | OpenCL queue priority (before usage, make sure your OpenCL driver supports the appropriate extension)<br>A higher value means a higher priority for the clDNN OpenCL queue. 0 disables the setting. |
| `KEY_CLDNN_PLUGIN_THROTTLE` | `<0-3>` | `0` | OpenCL queue throttling (before usage, make sure your OpenCL driver supports the appropriate extension)<br>A lower value means a lower driver thread priority and a longer sleep time for it. 0 disables the setting. |
| `KEY_CLDNN_GRAPH_DUMPS_DIR` | `""` | `""` | clDNN graph optimizer stages dump output directory (in GraphViz format) |
| `KEY_CLDNN_SOURCES_DUMPS_DIR` | `""` | `""` | Final optimized clDNN OpenCL sources dump output directory |
| `KEY_GPU_THROUGHPUT_STREAMS` | `KEY_GPU_THROUGHPUT_AUTO`, or positive integer| 1 | Specifies the number of GPU "execution" streams for the throughput mode (upper bound for the number of inference requests that can be executed simultaneously).<br>This option can be used to decrease GPU stall time by providing a more effective load from several streams. Increasing the number of streams is usually more effective for smaller topologies or smaller input sizes. Note that your application should provide enough parallel slack (e.g. running many inference requests) to leverage full GPU bandwidth. Additional streams consume several times more GPU memory, so make sure the system has enough memory available to suit parallel stream execution. Multiple streams might also put additional load on the CPU. If CPU load increases, it can be regulated by setting an appropriate `KEY_CLDNN_PLUGIN_THROTTLE` option value (see above). If your target system has a relatively weak CPU, keep throttling low.<br>The default value is 1, which implies latency-oriented behaviour.<br>`KEY_GPU_THROUGHPUT_AUTO` creates a bare minimum of streams to improve performance; this is the most portable option if you are not sure how many resources your target machine has (and what would be the optimal number of streams).<br>
A positive integer value creates the requested number of streams. | +| `KEY_EXCLUSIVE_ASYNC_REQUESTS` | `YES` / `NO` | `NO` | Forces async requests (also from different executable networks) to execute serially.| + +## Note on Debug Capabilities of the GPU Plugin + +Inference Engine GPU plugin provides possibility to dump the user custom OpenCL™ kernels to a file to allow you to properly debug compilation issues in your custom kernels. + +The application can use the SetConfig() function with the key PluginConfigParams::KEY_DUMP_KERNELS and value: PluginConfigParams::YES. Then during network loading, all custom layers will print their OpenCL kernels with the JIT instrumentation added by the plugin. +The kernels will be stored in the working directory under files named the following way: clDNN_program0.cl, clDNN_program1.cl. + +This option is disabled by default. Additionally, the application can call the SetConfig() function with the key PluginConfigParams::KEY_DUMP_KERNELS and value: PluginConfigParams::NO before network loading. + +How to verify that this option is disabled: +1. Delete all clDNN_program*.cl files from the current directory +2. Run your application to load a network +3. Examine the working directory for the presence of any kernel file (for example, clDNN_program0.cl) + +## GPU Context and Video Memory Sharing RemoteBlob API + +See [RemoteBlob API of GPU Plugin](GPU_RemoteBlob_API.md) + +## See Also +* [Supported Devices](Supported_Devices.md) + +[conv_simple_01]: ../img/conv_simple_01.png +[pooling_fakequant_01]: ../img/pooling_fakequant_01.png +[fullyconnected_activation_01]: ../img/fullyconnected_activation_01.png +[group_convolutions_01]: ../img/group_convolutions_01.png +[conv_sum_relu_01]: ../img/conv_sum_relu_01.png diff --git a/docs/IE_DG/supported_plugins/CPU.md b/docs/IE_DG/supported_plugins/CPU.md new file mode 100644 index 00000000000000..dec4b850c4d08c --- /dev/null +++ b/docs/IE_DG/supported_plugins/CPU.md @@ -0,0 +1,131 @@ +CPU Plugin {#openvino_docs_IE_DG_supported_plugins_CPU} +======= + +## Introducing CPU Plugin +The CPU plugin was developed in order to provide opportunity for high performance scoring of neural networks on CPU, using the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN). + +Currently, the CPU plugin uses Intel® Threading Building Blocks (Intel® TBB) in order to parallelize calculations. Please refer to the [Optimization Guide](../../optimization_guide/dldt_optimization_guide.md) for associated performance considerations. + +The set of supported layers can be expanded with [the Extensibility mechanism](../Extensibility_DG/Intro.md). + +## Supported Platforms + +OpenVINO™ toolkit is officially supported and validated on the following platforms: + +| Host | OS (64-bit) | +| :--- | :--- | +| Development | Ubuntu* 16.04/CentOS* 7.4/MS Windows* 10 | +| Target | Ubuntu* 16.04/CentOS* 7.4/MS Windows* 10 | + +The CPU Plugin supports inference on Intel® Xeon® with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and AVX512_BF16, Intel® Core™ +Processors with Intel® AVX2, Intel Atom® Processors with Intel® Streaming SIMD Extensions (Intel® SSE). + +You can use `-pc` the flag for samples to know which configuration is used by some layer. +This flag shows execution statistics that you can use to get information about layer name, +execution status, layer type, execution time, and the type of the execution primitive. 
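The same per-layer statistics can also be retrieved programmatically. The sketch below is illustrative: it assumes you already have an `InferRequest` that has run inference with performance counting enabled (for example, via the `KEY_PERF_COUNT` configuration key described later in this document):

```cpp
#include <inference_engine.hpp>
#include <iostream>

void print_layer_statistics(InferenceEngine::InferRequest& request) {
    // Map of layer name -> profiling info (status, times, layer and primitive types)
    auto counts = request.GetPerformanceCounts();
    for (const auto& item : counts) {
        const InferenceEngine::InferenceEngineProfileInfo& info = item.second;
        std::cout << item.first
                  << "  layer type: " << info.layer_type
                  << "  exec type: " << info.exec_type
                  << "  real time: " << info.realTime_uSec << " us" << std::endl;
    }
}
```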
+ +## Internal CPU Plugin Optimizations + +CPU plugin supports several graph optimization algorithms, such as fusing or removing layers. +Refer to the sections below for details. + +> **NOTE**: For layer descriptions, see the [IR Notation Reference](../../ops/opset.md). + +### Lowering Inference Precision + +CPU plugin follows default optimization approach. This approach means that inference is made with lower precision if it is possible on a given platform to reach better performance with acceptable range of accuracy. + +> **NOTE**: For details, see the [Using Bfloat16 Inference](../Bfloat16Inference.md). + +### Fusing Convolution and Simple Layers + +Merge of a Convolution layer and any of the simple layers listed below: +- Activation: ReLU, ELU, Sigmoid, Clamp +- Depthwise: ScaleShift, PReLU +- FakeQuantize + +> **NOTE**: You can have any number and order of simple layers. + +A combination of a Convolution layer and simple layers results in a single fused layer called +*Convolution*: +![conv_simple_01] + + +### Fusing Pooling and FakeQuantize Layers + +A combination of Pooling and FakeQuantize layers results in a single fused layer called *Pooling*: +![pooling_fakequant_01] + +### Fusing FullyConnected and Activation Layers + +A combination of FullyConnected and Activation layers results in a single fused layer called +*FullyConnected*: +![fullyconnected_activation_01] + + +### Fusing Convolution and Depthwise Convolution Layers Grouped with Simple Layers + +> **NOTE**: This pattern is possible only on CPUs with support of Streaming SIMD Extensions 4.2 +> (SSE 4.2) and Intel AVX2 Instruction Set Architecture (ISA). + +A combination of a group of a Convolution (or Binary Convolution) layer and simple layers and a group of a Depthwise Convolution +layer and simple layers results in a single layer called *Convolution* (or *Binary Convolution*): +> **NOTE**: Depthwise convolution layers should have the same values for the `group`, input channels, and output channels parameters. + +![conv_depth_01] + +### Fusing Convolution and Sum Layers + +A combination of Convolution, Simple, and Eltwise layers with the sum operation results in a single layer called *Convolution*: +![conv_sum_relu_01] + +### Fusing a Group of Convolutions + +If a topology contains the following pipeline, a CPU plugin merges Split, Convolution, and Concatenation layers into a single Convolution layer with the group parameter: +> **NOTE**: Parameters of the Convolution layers must coincide. + +![group_convolutions_01] + +### Removing a Power Layer + +CPU plugin removes a Power layer from a topology if it has the following parameters: + - power = 1 + - scale = 1 + - offset = 0 + + +## Supported Configuration Parameters + +The plugin supports the configuration parameters listed below. +All parameters must be set with the InferenceEngine::Core::LoadNetwork() method. +When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. +Refer to the OpenVINO samples for usage examples: [Benchmark App](../../../inference-engine/samples/benchmark_app/README.md). + +These are general options, also supported by other plugins: + +| Parameter name | Parameter values | Default | Description | +| :--- | :--- | :--- | :----------------------------------------------------------------------------------------------------------------------------| +| KEY_EXCLUSIVE_ASYNC_REQUESTS | YES/NO | NO | Forces async requests (also from different executable networks) to execute serially. 
This prevents potential oversubscription| +| KEY_PERF_COUNT | YES/NO | NO | Enables gathering performance counters | + +CPU-specific settings: + +| Parameter name | Parameter values | Default | Description | +| :--- | :--- | :--- | :--- | +| KEY_CPU_THREADS_NUM | positive integer values| 0 | Specifies the number of threads that CPU plugin should use for inference. Zero (default) means using all (logical) cores| +| KEY_CPU_BIND_THREAD | YES/NUMA/NO | YES | Binds inference threads to CPU cores. 'YES' (default) binding option maps threads to cores - this works best for static/synthetic scenarios like benchmarks. The 'NUMA' binding is more relaxed, binding inference threads only to NUMA nodes, leaving further scheduling to specific cores to the OS. This option might perform better in the real-life/contended scenarios. Note that for the latency-oriented cases (single execution stream, see below) both YES and NUMA options limit number of inference threads to the number of hardware cores (ignoring hyper-threading) on the multi-socket machines. | +| KEY_CPU_THROUGHPUT_STREAMS | KEY_CPU_THROUGHPUT_NUMA, KEY_CPU_THROUGHPUT_AUTO, or positive integer values| 1 | Specifies number of CPU "execution" streams for the throughput mode. Upper bound for the number of inference requests that can be executed simultaneously. All available CPU cores are evenly distributed between the streams. The default value is 1, which implies latency-oriented behavior with all available cores processing requests one by one.
KEY_CPU_THROUGHPUT_NUMA creates as many streams as needed to accommodate NUMA and avoid associated penalties.
KEY_CPU_THROUGHPUT_AUTO creates the bare minimum of streams needed to improve performance; this is the most portable option if you do not know how many cores your target machine has (and what the optimal number of streams would be). Note that your application should provide enough parallel slack (for example, run many inference requests) to leverage the throughput mode.
A positive integer value creates the requested number of streams. | +| KEY_ENFORCE_BF16 | YES/NO| YES | The name for setting to execute in bfloat16 precision whenever it is possible. This option lets plugin know to downscale the precision where it sees performance benefits from bfloat16 execution. Such option does not guarantee accuracy of the network, you need to verify the accuracy in this mode separately, based on performance and accuracy results. It should be your decision whether to use this option or not. | + +## See Also +* [Supported Devices](Supported_Devices.md) + +[mkldnn_group_conv]: ../img/mkldnn_group_conv.png +[mkldnn_conv_sum]: ../img/mkldnn_conv_sum.png +[mkldnn_conv_sum_result]: ../img/mkldnn_conv_sum_result.png +[conv_simple_01]: ../img/conv_simple_01.png +[pooling_fakequant_01]: ../img/pooling_fakequant_01.png +[fullyconnected_activation_01]: ../img/fullyconnected_activation_01.png +[conv_depth_01]: ../img/conv_depth_01.png +[group_convolutions_01]: ../img/group_convolutions_01.png +[conv_sum_relu_01]: ../img/conv_sum_relu_01.png diff --git a/docs/IE_DG/supported_plugins/FPGA.md b/docs/IE_DG/supported_plugins/FPGA.md new file mode 100644 index 00000000000000..c7c080bb4cc152 --- /dev/null +++ b/docs/IE_DG/supported_plugins/FPGA.md @@ -0,0 +1,294 @@ +FPGA Plugin {#openvino_docs_IE_DG_supported_plugins_FPGA} +=========== + +## Introducing FPGA Plugin + +The FPGA plugin provides an opportunity for high performance scoring of neural networks on Intel® FPGA devices. + +> **NOTE**: Before using the FPGA plugin, ensure that you have installed and configured either the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) or the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA. For installation and configuration details, see [FPGA installation](Supported_Devices.md). + +## Heterogeneous Execution + +When your topology contains layers that are not supported by the Intel® FPGA plugin, use [Heterogeneous plugin](HETERO.md) with dedicated fallback device. + +If a network has layers that are not supported in the Intel® FPGA plugin or in a fallback plugin, you can implement a custom layer on the CPU/GPU and use the [Extensibility mechanism](../Extensibility_DG/Intro.md). +In addition to adding custom kernels, you must still point to the CPU plugin or the GPU plugin as fallback devices for heterogeneous plugin. + +## Supported Networks + +The following network topologies are supported in heterogeneous mode, running on FPGA with fallback to CPU or GPU devices. + +> **IMPORTANT**: Use only bitstreams from the current version of the OpenVINO toolkit. Bitstreams from older versions of the OpenVINO toolkit are incompatible with later versions of the OpenVINO toolkit. For example, you cannot use the `1-0-1_A10DK_FP16_Generic` bitstream, when the OpenVINO toolkit supports the `2019R2_PL2_FP16_InceptionV1_SqueezeNet_VGG_YoloV3.aocx` bitstream. 
+ + +| Network | Bitstreams (Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2)) | Bitstreams (Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA) | +|:-------------------------------------|:-------------------------------------------------------------------|:---------------------------------------------------------------------------------------------| +| AlexNet | 2020-4_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic, 2020-4_PL2_FP11_AlexNet_GoogleNet_Generic | 2020-4_RC_FP16_AlexNet_GoogleNet_Generic, 2020-4_RC_FP11_AlexNet_GoogleNet_Generic | +| GoogleNet v1 | 2020-4_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic, 2020-4_PL2_FP11_AlexNet_GoogleNet_Generic | 2020-4_RC_FP16_AlexNet_GoogleNet_Generic, 2020-4_RC_FP11_AlexNet_GoogleNet_Generic | +| VGG-16 | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_InceptionV1_SqueezeNet_TinyYolo_VGG, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| VGG-19 | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_InceptionV1_SqueezeNet_TinyYolo_VGG, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| SqueezeNet v 1.0 | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_SqueezeNet | 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3, 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | +| SqueezeNet v 1.1 | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_SqueezeNet | 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3, 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | +| ResNet-18 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| ResNet-50 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| ResNet-101 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| ResNet-152 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| MobileNet (Caffe) | 2020-4_PL2_FP16_MobileNet_Clamp, 2020-4_PL2_FP11_MobileNet_Clamp | 2020-4_RC_FP16_MobileNet_Clamp, 2020-4_RC_FP11_MobileNet_Clamp | +| MobileNet (TensorFlow) | 2020-4_PL2_FP16_MobileNet_Clamp, 2020-4_PL2_FP11_MobileNet_Clamp | 2020-4_RC_FP16_MobileNet_Clamp, 2020-4_RC_FP11_MobileNet_Clamp| +| SqueezeNet-based variant of the SSD* | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_SqueezeNet | 2020-4_RC_FP16_InceptionV1_SqueezeNet_TinyYolo_VGG, 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | +| ResNet-based variant of SSD | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| RMNet | 2020-4_PL2_FP16_RMNet, 2020-4_PL2_FP11_RMNet | 2020-4_RC_FP16_RMNet, 2020-4_RC_FP11_RMNet | +| Yolo v3 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_YoloV3_ELU | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | + + +In addition to the list above, arbitrary topologies having big continues subgraphs consisting of layers supported by FPGA plugin are recommended to be executed on FPGA plugin. + +## Bitstreams that are Optimal to Use with the Intel's Pre-Trained Models + +The table below provides you with a list of Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) bitstreams that are optimal to use for the Intel's pre-trained models. + +
+ Click to expand/collapse the table + +| Model Name | FP11 Bitstreams | FP16 Bitstreams | +| :--- | :--- | :--- | +| action-recognition-0001-decoder | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| action-recognition-0001-encoder | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| age-gender-recognition-retail-0013 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| asl-recognition-0004 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| driver-action-recognition-adas-0002-decoder | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| driver-action-recognition-adas-0002-encoder | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| emotions-recognition-retail-0003 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| face-detection-0100 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| face-detection-0102 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| face-detection-0104 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| face-detection-0105 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| face-detection-0106 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| face-detection-adas-0001 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| face-detection-adas-binary-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| face-detection-retail-0004 | 2020-3_PL2_FP11_TinyYolo_SSD300.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| face-detection-retail-0005 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| face-reidentification-retail-0095 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| facial-landmarks-35-adas-0002 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| faster-rcnn-resnet101-coco-sparse-60-0001 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| gaze-estimation-adas-0002 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| handwritten-japanese-recognition-0001 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| handwritten-score-recognition-0003 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| head-pose-estimation-adas-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| human-pose-estimation-0001 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| icnet-camvid-ava-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| icnet-camvid-ava-sparse-30-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| icnet-camvid-ava-sparse-60-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| image-retrieval-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| instance-segmentation-security-0010 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 
2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| instance-segmentation-security-0050 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| instance-segmentation-security-0083 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| instance-segmentation-security-1025 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| landmarks-regression-retail-0009 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| license-plate-recognition-barrier-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| pedestrian-and-vehicle-detector-adas-0001 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| pedestrian-detection-adas-0002 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| pedestrian-detection-adas-binary-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| person-attributes-recognition-crossroad-0230 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-detection-action-recognition-0005 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-detection-action-recognition-0006 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-detection-action-recognition-teacher-0002 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-detection-asl-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| person-detection-raisinghand-recognition-0001 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-detection-retail-0002 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-detection-retail-0013 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-reidentification-retail-0031 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_ELU.aocx | +| person-reidentification-retail-0248 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-reidentification-retail-0249 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-reidentification-retail-0300 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| person-vehicle-bike-detection-crossroad-0078 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_ELU.aocx | +| person-vehicle-bike-detection-crossroad-1016 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| product-detection-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| resnet18-xnor-binary-onnx-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_RMNet.aocx | +| resnet50-binary-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| road-segmentation-adas-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| semantic-segmentation-adas-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| single-image-super-resolution-1032 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_RMNet.aocx | +| single-image-super-resolution-1033 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 
2020-3_PL2_FP16_RMNet.aocx | +| text-detection-0003 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| text-detection-0004 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| text-image-super-resolution-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_RMNet.aocx | +| text-recognition-0012 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| text-spotting-0002-detector | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| text-spotting-0002-recognizer-decoder | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| text-spotting-0002-recognizer-encoder | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| unet-camvid-onnx-0001 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| vehicle-attributes-recognition-barrier-0039 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| vehicle-detection-adas-0002 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| vehicle-detection-adas-binary-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| vehicle-license-plate-detection-barrier-0106 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| yolo-v2-ava-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| yolo-v2-ava-sparse-35-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| yolo-v2-ava-sparse-70-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| yolo-v2-tiny-ava-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| yolo-v2-tiny-ava-sparse-30-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| yolo-v2-tiny-ava-sparse-60-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | + +
+ +## Translate from Architecture to FPGA Bitstream Files + +Various FPGA bitstreams that support CNN are available in the OpenVINO™ toolkit package for FPGA. + +To select the correct bitstream (`.aocx`) file for an architecture, select a network (for example, Resnet-18) from the table above for either the Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Speed Grade 1), Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Speed Grade 2) or the Intel® Programmable Acceleration Card (PAC) with Intel® Arria® 10 GX FPGA and note the corresponding architecture. + +The following table describes several parameters that might help you to select the proper bitstream for your needs: + +| Name | Board | Precision | LRN Support | Leaky ReLU Support | PReLU Support | Clamp Support | ELU Support | +|:------------------------------------------|:--------------------------------------------------------------------------------|:----------|:------------|:-------------------|:--------------|:--------------|:------------| +| 2020-4_PL2_FP11_AlexNet_GoogleNet_Generic | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | true | true | true | false | false | +| 2020-4_PL2_FP11_SqueezeNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | false | false | +| 2020-4_PL2_FP11_MobileNet_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | true | false | +| 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | false | false | false | false | +| 2020-4_PL2_FP11_RMNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | false | true | +| 2020-4_PL2_FP11_TinyYolo_SSD300 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | true | true | true | false | false | +| 2020-4_PL2_FP11_YoloV3_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | false | true | +| 2020-4_PL2_FP11_Streaming_InternalUseOnly | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | false | false | false | false | +| 2020-4_PL2_FP11_Streaming_Slicing_InternalUseOnly | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | false | false | false | false | +| 2020-4_PL2_FP11_SwishExcitation | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | false | false | false | false | +| 2020-4_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | true | true | true | false | false | +| 2020-4_PL2_FP16_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | true | +| 2020-4_PL2_FP16_MobileNet_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | true | false | +| 2020-4_PL2_FP16_ResNet_YoloV3 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | false | +| 2020-4_PL2_FP16_RMNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | true | +| 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG | Intel® 
Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | false | +| 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | false | false | false | false | +| 2020-4_RC_FP11_AlexNet_GoogleNet_Generic | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | true | true | true | false | false | +| 2020-4_RC_FP11_RMNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | false | true | +| 2020-4_RC_FP11_Streaming_InternalUseOnly | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | true | false | false | false | false | +| 2020-4_RC_FP11_Streaming_Slicing_InternalUseOnly | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | true | false | false | false | false | +| 2020-4_RC_FP11_ELU | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | false | true | +| 2020-4_RC_FP11_SwishExcitation | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | false | false | false | false | +| 2020-4_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_YoloV3 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | false | false | +| 2020-4_RC_FP11_MobileNet_Clamp | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | true | false | +| 2020-4_RC_FP16_AlexNet_GoogleNet_Generic | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | true | true | true | false | false | +| 2020-4_RC_FP16_InceptionV1_SqueezeNet_TinyYolo_VGG | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | false | +| 2020-4_RC_FP16_RMNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | true | +| 2020-4_RC_FP16_SwishExcitation | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | false | false | false | false | +| 2020-4_RC_FP16_MobileNet_Clamp | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | true | false | +| 2020-4_RC_FP16_ResNet_YoloV3 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | false | +| 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | false | + +## Set Environment for Running the FPGA Plugin + +To make the FPGA plugin run directly or through the heterogeneous plugin, set up the environment: +1. Set up environment to access Intel® FPGA RTE for OpenCL: +``` +source /opt/altera/aocl-pro-rte/aclrte-linux64/init_opencl.sh +``` +2. Set the following environment variable and program the board with a DLA bitstream. Programming of the board is not supported during runtime and must be done before running an application. 
+ + | Variable | Setting | + | :----------------------------------| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| + | ACL_PCIE_USE_JTAG_PROGRAMMING | Set this variable to a value of 1 to force FPGA reprogramming using JTAG | + +## Analyzing Heterogeneous Execution + +Besides generation of .dot files, you can use the error listening mechanism: + +```cpp +class FPGA_ErrorListener : public InferenceEngine::IErrorListener +{ +public: + virtual void onError(const char *msg) noexcept override { + std::cout << msg; + } +}; +... +FPGA_ErrorListener err_listener; +core.SetLogCallback(err_listener); // will be used for FPGA device as well +``` +If during network loading some layers are decided to be executed on a fallback plugin, the following message is printed: + +```cpp +Layer (Name: detection_out, Type: DetectionOutput) is not supported: + custom or unknown. + Has (3) sets of inputs, must be 1, or 2. + Input dimensions (2) should be 4. +``` + +## Multiple FPGA Devices Support + +The Inference Engine FPGA plugin provides an ability to load different networks on multiple FPGA devices. For example, to load two networks AlexNet and MobileNet v2 on two different FPGA devices, follow the steps below: + +1. Program each FGPA device with a corresponding bitstream: +```bash +aocl program acl0 2019R3_PV_PL1_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic.aocx +``` +```bash +aocl program acl1 2019R3_PV_PL1_FP16_MobileNet_Clamp.aocx +``` +For more information about bitstream programming instructions, refer to [Installation Guide for Linux* with Support for FPGA](Supported_Devices.md) +2. All FPGA devices are enumerated with unique ID starting from `0`. By default, all networks are loaded to the default +device with ID `0`. If you want to load a network on a particular non-default device, specify the `KEY_DEVICE_ID` +parameter for C++ and `DEVICE_ID` parameter for Python\*. 
+The following code snippets demonstrates how to load the AlexNet network on the FPGA device with ID `0` and the +MobileNet v2 network on the device with ID `1`: + * With C++: +```cpp +InferenceEngine::Core core; + +// Load AlexNet network on the first FPGA device programmed with bitstream supporting AlexNet +auto alexnetNetwork = core.ReadNetwork("alexnet.xml"); +auto exeNetwork1 = core.LoadNetwork(alexnetNetwork, "FPGA.0"); + +// Load MobileNet network on the second FPGA device programmed with MobileNet bitstream +auto mobilenetNetwork = core.ReadNetwork("mobilenet_v2.xml"); +auto exeNetwork2 = core.LoadNetwork(mobilenetNetwork, "FPGA", { { KEY_DEVICE_ID, "1" } }); +``` + * With Python: +```python +# Load AlexNet network on the first FPGA device programmed with bitstream supporting AlexNet +net1 = IENetwork(model="alexnet.xml", weights="alexnet.bin") +plugin.load(network=net1, config={"DEVICE_ID": "0"}) + +# Load MobileNet network on the second FPGA device programmed with MobileNet bitstream +net2 = IENetwork(model="mobilenet_v2.xml", weights="mobilenet_v2.bin") +plugin.load(network=net2, config={"DEVICE_ID": "1"}) +``` +Note that you have to use asynchronous infer requests to utilize several FPGA devices, otherwise the execution on devices is performed sequentially. + +## Import and Export Network Flow + +Since the 2019 R4 release, FPGA and HETERO plugins support the export and import flow, which allows to export a compiled network from a plugin to a binary blob by running the command below: + +```bash +$ ./compile_tool -m resnet.xml -DLA_ARCH_NAME 4x2x16x32_fp16_sb9408_fcd1024_actk4_poolk4_normk1_owk2_image300x300x8192_mbfr -d HETERO:FPGA,CPU +Inference Engine: + API version ............ 2.1 + Build .................. 6db44e09a795cb277a63275ea1395bfcb88e46ac + Description ....... API +Done +``` + +Once the command is executed, the binary blob named `resnet.blob` is created at the working directory. Refer to the [Compile tool](../../../inference-engine/tools/compile_tool/README.md) documentation for more details. + +A compiled binary blob can be later imported via `InferenceEngine::Core::Import`: + +```cpp +InferenceEngine::Core core; +std::ifstream strm("resnet.blob"); +auto execNetwork = core.Import(strm); +``` + +## How to Interpret Performance Counters + +As a result of collecting performance counters using InferenceEngine::InferRequest::GetPerformanceCounts you can find out performance data about execution on FPGA, pre-processing and post-processing data and data transferring from/to FPGA card. + +If network is sliced to two parts that are executed on CPU, you can find performance data about Intel® MKL-DNN kernels, their types, and other useful information. + +## Limitations of the FPGA Support for CNN + +The Inference Engine FPGA plugin has limitations on network topologies, kernel parameters, and batch size. + +* Depending on the bitstream loaded on the target device, the FPGA performs calculations with precision rates ranging from FP11 to FP16. This might have accuracy implications. Use the [Accuracy Checker](@ref omz_tools_accuracy_checker_README) to verify the network accuracy on the validation data set. +* Networks that have many CNN layers that are not supported on FPGA stayed in topologies between supported layers might lead to dividing of graph to many subgraphs that might lead to `CL_OUT_OF_HOST_MEMORY` error. These topologies are not FPGA friendly for this release. 
+* When you use the heterogeneous plugin, the affinity and distribution of nodes by devices depends on the FPGA bitstream that you use. Some layers might not be supported by a bitstream or parameters of the layer are not supported by the bitstream. + +## See Also +* [Supported Devices](Supported_Devices.md) diff --git a/docs/IE_DG/supported_plugins/GNA.md b/docs/IE_DG/supported_plugins/GNA.md new file mode 100644 index 00000000000000..3ddda708a47575 --- /dev/null +++ b/docs/IE_DG/supported_plugins/GNA.md @@ -0,0 +1,166 @@ +# GNA Plugin {#openvino_docs_IE_DG_supported_plugins_GNA} + +## Introducing the GNA Plugin + +Intel® Gaussian & Neural Accelerator is a low-power neural coprocessor for continuous inference at the edge. + +Intel® GNA is not intended to replace classic inference devices such as +CPU, graphics processing unit (GPU), or vision processing unit (VPU) . It is designed for offloading +continuous inference workloads including but not limited to noise reduction or speech recognition +to save power and free CPU resources. + +The GNA plugin provides a way to run inference on Intel® GNA, as well as in the software execution mode on CPU. + +## Devices with Intel® GNA + +Devices with Intel® GNA support: + +* [Intel® Speech Enabling Developer Kit](https://www.intel.com/content/www/us/en/support/articles/000026156/boards-and-kits/smart-home.html) + +* [Amazon Alexa* Premium Far-Field Developer Kit](https://developer.amazon.com/en-US/alexa/alexa-voice-service/dev-kits/amazon-premium-voice) + +* [Gemini Lake](https://ark.intel.com/content/www/us/en/ark/products/codename/83915/gemini-lake.html): + - Intel® Pentium® Silver J5005 Processor + - Intel® Pentium® Silver N5000 Processor + - Intel® Celeron® J4005 Processor + - Intel® Celeron® J4105 Processor + - Intel® Celeron® Processor N4100 + - Intel® Celeron® Processor N4000 + +* [Cannon Lake](https://ark.intel.com/content/www/us/en/ark/products/136863/intel-core-i3-8121u-processor-4m-cache-up-to-3-20-ghz.html): +Intel® Core™ i3-8121U Processor + +* [Ice Lake](https://ark.intel.com/content/www/us/en/ark/products/codename/74979/ice-lake.html): + - Intel® Core™ i7-1065G7 Processor + - Intel® Core™ i7-1060G7 Processor + - Intel® Core™ i5-1035G4 Processor + - Intel® Core™ i5-1035G7 Processor + - Intel® Core™ i5-1035G1 Processor + - Intel® Core™ i5-1030G7 Processor + - Intel® Core™ i5-1030G4 Processor + - Intel® Core™ i3-1005G1 Processor + - Intel® Core™ i3-1000G1 Processor + - Intel® Core™ i3-1000G4 Processor + +> **NOTE**: On platforms where Intel® GNA is not enabled in the BIOS, the driver cannot be installed, so the GNA plugin uses the software emulation mode only. + +## Drivers and Dependencies + +Intel® GNA hardware requires a driver to be installed on the system. + +* Linux\* OS: +[Download Intel® GNA driver for Ubuntu Linux 18.04.3 LTS (with HWE Kernel version 5.0+)](https://download.01.org/opencv/drivers/gna/) + +* Windows\* OS: +Intel® GNA driver for Windows is available through Windows Update\* + +## Models and Layers Limitations + +Because of specifics of hardware architecture, Intel® GNA supports a limited set of layers, their kinds and combinations. +For example, you should not expect the GNA Plugin to be able to run computer vision models, except those specifically adapted for the GNA Plugin, because the plugin does not fully support +2D convolutions. + +The list of supported layers can be found +[here](Supported_Devices.md) (see the GNA column of Supported Layers section). 
+Limitations include: + +- Only 1D convolutions (in the models converted from [Kaldi](../../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) framework) are natively supported +- The number of output channels for convolutions must be a multiple of 4 +- Permute layer support is limited to the cases where no data reordering is needed, or when reordering is happening for 2 dimensions, at least one of which is not greater than 8 +- Power layer only supports the power parameter equal to 1 + +#### Experimental Support for 2D Convolutions + +The Intel® GNA hardware natively supports only 1D convolution. + +However, 2D convolutions can be mapped to 1D when a convolution kernel moves in a single direction. Such a transformation is performed by the GNA Plugin for Kaldi `nnet1` convolution. From this perspective, the Intel® GNA hardware convolution operation accepts a `NHWC` input and produces `NHWC` output. Because OpenVINO™ only supports the `NCHW` layout, it may be necessary to insert `Permute` layers before or after convolutions. + +For example, the Kaldi model optimizer inserts such a permute after convolution for the [rm_cnn4a network](https://download.01.org/openvinotoolkit/models_contrib/speech/kaldi/rm_cnn4a_smbr/). This `Permute` layer is automatically removed by the GNA Plugin, because the Intel® GNA hardware convolution layer already produces the required `NHWC` result. + +## Operation Precision + +Intel® GNA essentially operates in the low-precision mode, which represents a mix of 8-bit (`I8`), 16-bit (`I16`), and 32-bit (`I32`) integer computations, so compared to 32-bit floating point (`FP32`) results – for example, calculated on CPU using Inference Engine [CPU Plugin](CPU.md) – outputs calculated using reduced integer precision are different from the scores calculated using floating point. + +Unlike other plugins supporting low-precision execution, the GNA plugin calculates quantization factors at the model loading time, so a model can run without calibration. + +## Execution Modes + +| Mode | Description | +| :---------------------------------| :---------------------------------------------------------| +| `GNA_AUTO` | Uses Intel® GNA if available, otherwise uses software execution mode on CPU. | +| `GNA_HW` | Uses Intel® GNA if available, otherwise raises an error. | +| `GNA_SW` | *Deprecated*. Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA, but not in the bit-exact mode. | +| `GNA_SW_EXACT` | Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA in the bit-exact mode. | +| `GNA_SW_FP32` | Executes the GNA-compiled graph on CPU but substitutes parameters and calculations from low precision to floating point (`FP32`). | + +## Supported Configuration Parameters + +The plugin supports the configuration parameters listed below. +The parameters are passed as `std::map` on `InferenceEngine::Core::LoadNetwork` or `InferenceEngine::SetConfig`. + +The parameter `KEY_GNA_DEVICE_MODE` can also be changed at run time using `InferenceEngine::ExecutableNetwork::SetConfig` (for any values excluding `GNA_SW_FP32`). This allows switching the +execution between software emulation mode and hardware emulation mode after the model is loaded. + +The parameter names below correspond to their usage through API keys, such as `GNAConfigParams::KEY_GNA_DEVICE_MODE` or `PluginConfigParams::KEY_PERF_COUNT`. 
+When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. + +| Parameter Name | Parameter Values | Default Value | Description | +| :---------------------------------| :---------------------------------------------------------| :-----------| :------------------------------------------------------------------------| +| `KEY_GNA_COMPACT_MODE` | `YES`/`NO` | `YES` | Reuse I/O buffers to save space (makes debugging harder) | +| `KEY_GNA_SCALE_FACTOR` | `FP32` number | 1.0 | Scale factor to use for input quantization | +| `KEY_GNA_DEVICE_MODE` | `GNA_AUTO`/`GNA_HW`/`GNA_SW_EXACT`/`GNA_SW_FP32` | `GNA_AUTO` | One of the modes described Execution Models | +| `KEY_GNA_FIRMWARE_MODEL_IMAGE` | `std::string` | `""` | Name for embedded model binary dump file | +| `KEY_GNA_PRECISION` | `I16`/`I8` | `I16` | Hint to GNA plugin: preferred integer weight resolution for quantization | +| `KEY_PERF_COUNT` | `YES`/`NO` | `NO` | Turn on performance counters reporting | +| `KEY_GNA_LIB_N_THREADS` | 1-127 integer number | 1 | Sets the number of GNA accelerator library worker threads used for inference computation in software modes + +## How to Interpret Performance Counters + +As a result of collecting performance counters using `InferenceEngine::InferRequest::GetPerformanceCounts`, you can find various performance data about execution on GNA. +Returned map stores a counter description as a key, counter value is stored in the `realTime_uSec` field of the `InferenceEngineProfileInfo` structure. Current GNA implementation calculates counters for the whole utterance scoring and does not provide per-layer information. API allows to retrieve counter units in cycles, but they can be converted to seconds as follows: + +``` +seconds = cycles / frequency +``` + +Refer to the table below to learn about the frequency of Intel® GNA inside a particular processor. +Processor | Frequency of Intel® GNA +---|--- +Intel® Ice Lake processors| 400MHz +Intel® Core™ i3-8121U processor| 400MHz +Intel® Gemini Lake processors | 200MHz + +Performance counters provided for the time being: + +* Scoring request performance results + * Number of total cycles spent on scoring in hardware (including compute and memory stall cycles) + * Number of stall cycles spent in hardware + +## Multithreading Support in GNA Plugin + +The GNA plugin supports the following configuration parameters for multithreading management: + +* `KEY_GNA_LIB_N_THREADS` + + By default, the GNA plugin uses one worker thread for inference computations. This parameter allows you to create up to 127 threads for software modes. + +> **NOTE:** Multithreading mode does not guarantee the same computation order as the order of issuing. Additionally, in this case, software modes do not implement any serializations. + +## Network Batch Size + +Intel® GNA plugin supports the processing of context-windowed speech frames in batches of 1-8 frames in one +input blob using `InferenceEngine::ICNNNetwork::setBatchSize`. Increasing batch size only improves efficiency of `Fully Connected` layers. + +> **NOTE**: For networks with `Convolutional`, `LSTM`, or `Memory` layers, the only supported batch size is 1. + +## Compatibility with Heterogeneous Plugin + +Heterogeneous plugin was tested with the Intel® GNA as a primary device and CPU as a secondary device. To run inference of networks with layers unsupported by the GNA plugin (for example, Softmax), use the Heterogeneous plugin with the `HETERO:GNA,CPU` configuration. 
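A minimal sketch of such a configuration (the model path is a placeholder) might look like this:

```cpp
InferenceEngine::Core core;
auto network = core.ReadNetwork("speech_model.xml");   // placeholder model path

// GNA executes the layers it supports; the remaining layers (for example, Softmax) fall back to CPU
auto execNetwork = core.LoadNetwork(network, "HETERO:GNA,CPU");
auto request = execNetwork.CreateInferRequest();
```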
For the list of supported networks, see the [Supported Frameworks](#supported-frameworks). + +> **NOTE:** Due to limitation of the Intel® GNA backend library, heterogenous support is limited to cases where in the resulted sliced graph, only one subgraph is scheduled to run on GNA\_HW or GNA\_SW devices. + +## See Also + +* [Supported Devices](Supported_Devices.md) +* [Converting Model](../../MO_DG/prepare_model/convert_model/Converting_Model.md) +* [Convert model from Kaldi](../../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) diff --git a/docs/IE_DG/supported_plugins/GPU_RemoteBlob_API.md b/docs/IE_DG/supported_plugins/GPU_RemoteBlob_API.md new file mode 100644 index 00000000000000..2518bb80d6b814 --- /dev/null +++ b/docs/IE_DG/supported_plugins/GPU_RemoteBlob_API.md @@ -0,0 +1,227 @@ +Remote Blob API of GPU Plugin {#openvino_docs_IE_DG_supported_plugins_GPU_RemoteBlob_API} +================================ + +The GPU plugin implementation of the `RemoteContext` and `RemoteBlob` interfaces supports GPU +pipeline developers who need video memory sharing and interoperability with existing native APIs +such as OpenCL\*, Microsoft DirectX\*, or VAAPI\*. +Using these interfaces allows to avoid any memory copy overhead when plugging the OpenVINO™ inference +into an existing GPU pipeline. It also enables OpenCL kernels participating in the pipeline to become +native buffer consumers or producers of the OpenVINO™ inference. +Since the GPU plugin works on top of the clDNN library, the functionality above is also implemented +using OpenCL and its sharing extensions provided by Intel®. + +There are two interoperability scenarios that are supported for the Remote Blob API: + +* GPU plugin context and memory objects can be constructed from low-level device, display, or memory +handles and used to create the OpenVINO™ `ExecutableNetwork` or `Blob` class. +* OpenCL context or buffer handles can be obtained from existing GPU plugin objects, and used in OpenCL processing. + +Class and function declarations for the API are defined in the following files: +* Windows\*: `gpu/gpu_context_api_ocl.hpp` and `gpu/gpu_context_api_dx.hpp` +* Linux\*: `gpu/gpu_context_api_ocl.hpp` and `gpu/gpu_context_api_va.hpp` + +The most common way to enable the interaction of your application with the Remote Blob API is to use user-side utility classes +and functions that consume or produce native handles directly. + +## Execution Context User-Side Wrappers + +GPU plugin classes that implement the `RemoteContext` interface are responsible for context sharing. +Obtaining a pointer to a context object is the first step of sharing pipeline objects. +The context object of the GPU plugin directly wraps OpenCL context, setting a scope for sharing +`ExecutableNetwork` and `RemoteBlob` objects. +To create such objects within user context, explicitly provide the context to the plugin using the +`make_shared_context()` overloaded function. Depending on the platform, the function accepts the +`cl_context` handle, the pointer to the `ID3D11Device` interface, or the `VADisplay` handle, and +returns a smart pointer to the `RemoteContext` plugin object. + +If you do not provide any user context, the plugin uses its default internal context. +The plugin attempts to use the same internal context object as long as plugin options are kept the same. +Therefore, all ExecutableNetwork objects created during this time share the same context. +Once the plugin options are changed, the internal context is replaced by the new one. 
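As an illustration, a minimal sketch of explicitly sharing an existing OpenCL context with the plugin could look as follows; the `my_cl_context` handle and the model path are assumptions, and complete interoperability examples are shown in the Examples section below.

```cpp
#include <gpu/gpu_context_api_ocl.hpp>
#include <inference_engine.hpp>

// my_cl_context is assumed to be a cl_context created by your own OpenCL code
void run_in_user_context(cl_context my_cl_context) {
    InferenceEngine::Core ie;
    auto net = ie.ReadNetwork("model.xml");   // placeholder model path

    // Wrap the native handle into a RemoteContext and compile the network against it
    auto remote_context = InferenceEngine::gpu::make_shared_context(ie, "GPU", my_cl_context);
    auto exec_net = ie.LoadNetwork(net, remote_context);
    auto infer_request = exec_net.CreateInferRequest();
}
```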
+ +To request the current default context of the plugin, call the `GetDefaultContext()` method of the core engine. +To request the internal context of the given `ExecutableNetwork`, use the `GetContext()` method. + +## Shared Blob User-Side Wrappers + +The classes that implement the `RemoteBlob` interface both are wrappers for native API +memory handles (which can be obtained from them at any moment) and act just like regular OpenVINO™ +`Blob` objects. + +Once you obtain the context, you can use it to compile a new `ExecutableNetwork` or create `RemoteBlob` +objects. +For network compilation, use a dedicated flavor of `LoadNetwork()`, which accepts the context as an +additional parameter. + +To create a shared blob from a native memory handle, use `make_shared_blob()` overloaded functions +that can accept the `cl::Buffer`, `cl::Image2D`, `cl_mem` handles, and either `ID3D11Buffer`, +`ID3D11Texture2D` pointers or the `VASurfaceID` handle. +All `make_shared_blob()` flavors return a smart pointer to the `Blob` object, which can be directly +passed to the `SetBlob() `method of an inference request object. + +## Direct NV12 video surface input + +To support the direct consumption of a hardware video decoder output, plugin accepts two-plane video +surfaces as arguments for the `make_shared_blob_nv12()` function, which creates an `NV12Blob` object +and returns a smart pointer to it, which is cast to `Blob::Ptr`. + +To ensure that the plugin generates the correct execution graph for the NV12 dual-plane input, set +the `CLDNNConfigParams::KEY_CLDNN_NV12_TWO_INPUTS` plugin configuration flag to `PluginConfigParams::YES`. + +## Low-Level Methods and Their Parameter Description + +The high-level wrappers above bring a direct dependency on native APIs to the user program. +If you want to avoid the dependency, you still can directly use the `CreateContext()`, +`CreateBlob()`, and `getParams()` methods. +On this level, native handles are re-interpreted as void pointers and all arguments are passed +using `std::map` containers that are filled with `std::string, InferenceEngine::Parameter` pairs. +Two types of map entries are possible: descriptor and container. The first map entry is a +descriptor, which sets the expected structure and possible parameter values of the map. + +**Parameter Map Entries** + +| Key Name | Description and Possible Parameter Values | +|----------------|---------------------------------------------------------------------| +| `CONTEXT_TYPE` | Describes the type of the shared context in a map. Can be `OCL` (for pure OpenCL context) or `VA_SHARED` (for context shared with a video decoding device). | +| `OCL_CONTEXT` | Contains the OpenCL context handle. | +| `VA_DEVICE` | Contains the native video decoding device handle. Can be `VADisplay` or `ID3D11Device` (a pointer). | +| `SHARED_MEM_TYPE` | Describes the type of the shared memory buffer in a map. Can be `OCL_BUFFER` (clBuffer), `OCL_IMAGE2D` (clImage2D), `VA_SURFACE()`, or `DX_BUFFER`. | +| `MEM_HANDLE` | Contains the OpenCL memory handle. | +| `DEV_OBJECT_HANDLE` | Contains the native video decoder surface handle. | +| `VA_PLANE` | Contains the NV12 video decoder surface plane index. Can be `0` or `1`. | + +> **NOTE**: To initialize the entry key and value, use the `GPU_PARAM_KEY()` or `GPU_PARAM_VALUE()` macro. + +## Examples + +Refer to the sections below to see pseudo-code of usage examples. 
+ +> **NOTE**: For low-level parameter usage examples, see the source code of user-side wrappers from the include files mentioned above. + +### OpenCL Kernel Execution on a Shared Buffer + +This example uses the OpenCL context obtained from an executable network object. + +```cpp +#define CL_HPP_MINIMUM_OPENCL_VERSION 120 +#define CL_HPP_TARGET_OPENCL_VERSION 120 + +#include +#include + +... + +// initialize the plugin and load the network +InferenceEngine::Core ie; +auto exec_net = ie.LoadNetwork(net, "GPU", config); + +// obtain the RemoteContext pointer from the executable network object +auto cldnn_context = exec_net.GetContext(); +// obtain the OpenCL context handle from the RemoteContext, +// get device info and create a queue +cl::Context ctx = std::dynamic_pointer_cast(cldnn_context); +_device = cl::Device(_context.getInfo()[0].get(), true); +cl::CommandQueue _queue; +cl_command_queue_properties props = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE; +_queue = cl::CommandQueue(_context, _device, props); + +// create the OpenCL buffer within the obtained context +cl::Buffer shared_buffer(ctx, CL_MEM_READ_WRITE, image_size * num_channels, NULL, &err); +// wrap the buffer into RemoteBlob +auto shared_blob = gpu::make_shared_blob(input_info->getTensorDesc(), cldnn_context, shared_buffer); + +... +// execute user kernel +cl::Kernel kernel(program, kernelName.c_str()); +kernel.setArg(0, shared_buffer); +queue.enqueueNDRangeKernel(kernel, + cl::NDRange(0), + cl::NDRange(image_size), + cl::NDRange(1), + 0, // wait events * + &profileEvent); +queue.finish(); +... + +// pass results to the inference +inf_req_shared.SetBlob(input_name, shared_blob); +inf_req_shared.Infer(); + +``` + +### Running GPU Plugin Inference within User-Supplied Shared Context + +```cpp +#define CL_HPP_MINIMUM_OPENCL_VERSION 120 +#define CL_HPP_TARGET_OPENCL_VERSION 120 + +#include +#include + +... + +cl::Context ctx = get_my_OpenCL_context(); + +// share the context with GPU plugin and compile ExecutableNetwork +auto remote_context = gpu::make_shared_context(ie, "GPU", ocl_instance->_context.get()); +auto exec_net_shared = ie.LoadNetwork(net, remote_context); +auto inf_req_shared = exec_net_shared.CreateInferRequest(); + +... +// do OpenCL processing stuff +... + +// run the inference +inf_req_shared.Infer(); + +``` +### Direct Consuming of the NV12 VAAPI Video Decoder Surface on Linux + +```cpp +#include +#include + +... + +// initialize the objects +CNNNetwork network = ie.ReadNetwork(xmlFileName, binFileName); + +... + +auto inputInfoItem = *inputInfo.begin(); +inputInfoItem.second->setPrecision(Precision::U8); +inputInfoItem.second->setLayout(Layout::NCHW); +inputInfoItem.second->getPreProcess().setColorFormat(ColorFormat::NV12); + +VADisplay disp = get_VA_Device(); +// create the shared context object +auto shared_va_context = gpu::make_shared_context(ie, "GPU", disp); +// compile network within a shared context +ExecutableNetwork executable_network = ie.LoadNetwork(network, + shared_va_context, + { { CLDNNConfigParams::KEY_CLDNN_NV12_TWO_INPUTS, + PluginConfigParams::YES } }); + +// decode/inference loop +for (int i = 0; i < nframes; i++) { + ... + // execute decoding and obtain decoded surface handle + decoder.DecodeFrame(); + VASurfaceID va_surface = decoder.get_VA_output_surface(); + ... 
+ //wrap decoder output into RemoteBlobs and set it as inference input + auto nv12_blob = gpu::make_shared_blob_nv12(ieInHeight, + ieInWidth, + shared_va_context, + va_surface + ); + inferRequests[currentFrame].SetBlob(input_name, nv12_blob); + inferRequests[currentFrame].StartAsync(); + inferRequests[prevFrame].Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY); +} +``` + +## See Also + +* InferenceEngine::Core +* InferenceEngine::RemoteBlob diff --git a/docs/IE_DG/supported_plugins/HDDL.md b/docs/IE_DG/supported_plugins/HDDL.md new file mode 100644 index 00000000000000..cc53925558e25e --- /dev/null +++ b/docs/IE_DG/supported_plugins/HDDL.md @@ -0,0 +1,39 @@ +# HDDL Plugin {#openvino_docs_IE_DG_supported_plugins_HDDL} + +## Introducing HDDL Plugin + +The Inference Engine HDDL plugin is developed for inference of neural networks on Intel® Vision Accelerator Design with Intel® Movidius™ VPUs which is designed for use cases those require large throughput of deep learning inference. It provides dozens amount of throughput as MYRIAD Plugin. + +## Installation on Linux* OS + +For installation instructions, refer to the [Installation Guide for Linux\*](VPU.md). + +## Installation on Windows* OS + +For installation instructions, refer to the [Installation Guide for Windows\*](Supported_Devices.md). + +## Supported networks + +For the "Supported Networks", please reference to [MYRIAD Plugin](MYRIAD.md) + +## Supported Configuration Parameters + +See VPU common configuration parameters for the [VPU Plugins](VPU.md). +When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. + +In addition to common parameters for Myriad plugin and HDDL plugin, HDDL plugin accepts the following options: + +| Parameter Name | Parameter Values | Default | Description | +| :--- | :--- | :--- | :--- | +| KEY_PERF_COUNT | YES/NO | NO | Enable performance counter option. | +| KEY_VPU_HDDL_GRAPH_TAG | string | empty string | Allows to execute network on specified count of devices. | +| KEY_VPU_HDDL_STREAM_ID | string | empty string | Allows to execute inference on a specified device. | +| KEY_VPU_HDDL_DEVICE_TAG | string | empty string | Allows to allocate/deallocate networks on specified devices. | +| KEY_VPU_HDDL_BIND_DEVICE | YES/NO | NO | Whether the network should bind to a device. Refer to vpu_plugin_config.hpp. | +| KEY_VPU_HDDL_RUNTIME_PRIORITY | singed int | 0 | Specify the runtime priority of a device among all devices that running a same network Refer to vpu_plugin_config.hpp. | + +## See Also + +* [Supported Devices](Supported_Devices.md) +* [VPU Plugins](VPU.md) +* [MYRIAD Plugin](MYRIAD.md) diff --git a/docs/IE_DG/supported_plugins/HETERO.md b/docs/IE_DG/supported_plugins/HETERO.md new file mode 100644 index 00000000000000..6648150be614a9 --- /dev/null +++ b/docs/IE_DG/supported_plugins/HETERO.md @@ -0,0 +1,126 @@ +Heterogeneous Plugin {#openvino_docs_IE_DG_supported_plugins_HETERO} +======= + +## Introducing Heterogeneous Plugin + +The heterogeneous plugin enables computing for inference on one network on several devices. 
Purposes of executing networks in heterogeneous mode:
* To utilize the power of accelerators by calculating the heaviest parts of the network on the accelerator and executing unsupported layers on fallback devices such as the CPU
* To utilize all available hardware more efficiently during one inference

The execution through the heterogeneous plugin can be divided into two independent steps:
* Setting affinity to layers (binding them to devices in InferenceEngine::ICNNNetwork)
* Loading a network to the Heterogeneous plugin, splitting the network into parts, and executing them through the plugin

These steps are decoupled. Affinity can be set automatically using the fallback policy or manually.

The automatic fallback policy is greedy: it assigns all layers that can be executed on a given device to that device, following the device priorities.

Some topologies are not friendly to heterogeneous execution on certain devices, or cannot be executed in such mode at all. An example is a network whose activation layers are not supported on the primary device. If transferring data from one part of the network to another takes a relatively long time, heterogeneous execution on these devices makes little sense. In this case, you can define the heaviest part manually and set the affinity so that data is not sent back and forth many times during one inference.

## Annotation of Layers per Device and Default Fallback Policy
The default fallback policy decides which layer goes to which device automatically according to the support in dedicated plugins (FPGA, GPU, CPU, MYRIAD).

Another way to annotate a network is to set affinity manually using the CNNLayer::affinity field. This field accepts string values of devices, such as "CPU" or "FPGA".

The fallback policy does not work if even one layer has an initialized affinity. The correct sequence is to apply the automatic affinity settings first and then adjust them manually.
```cpp
InferenceEngine::Core core;
auto network = core.ReadNetwork("Model.xml");

// This example demonstrates how to perform default affinity initialization and then
// correct affinity manually for some layers
const std::string device = "HETERO:FPGA,CPU";

// QueryNetworkResult object contains the layer -> device map
InferenceEngine::QueryNetworkResult res = core.QueryNetwork(network, device, { });

// update default affinities
res.supportedLayersMap["layerName"] = "CPU";

// set affinities to the network
for (auto && layer : res.supportedLayersMap) {
    network.getLayerByName(layer.first.c_str())->affinity = layer.second;
}

// load the network with the affinities set before
auto executable_network = core.LoadNetwork(network, device);
```

If you rely on the default affinity distribution, you can avoid calling InferenceEngine::Core::QueryNetwork and just call InferenceEngine::Core::LoadNetwork instead:
```cpp
InferenceEngine::Core core;
auto network = core.ReadNetwork("Model.xml");
auto executable_network = core.LoadNetwork(network, "HETERO:FPGA,CPU");
```


## Details of Splitting Network and Execution
During loading of the network to the heterogeneous plugin, the network is divided into separate parts and loaded to dedicated plugins.
Intermediate blobs between these subgraphs are allocated automatically in the most efficient way.

## Execution Precision
Precision for inference in the heterogeneous plugin is defined by:
* Precision of the IR
+* Ability of the final plugins to execute in the precision defined in the IR
+
+Examples:
+* If you want to execute on GPU with CPU fallback and FP16 on GPU, use an FP16 IR.
+Weights are converted from FP16 to FP32 by the heterogeneous plugin automatically for execution on the CPU.
+* If you want to execute on FPGA with CPU fallback, you can use any precision for the IR. The execution on FPGA is defined by the bitstream,
+and the execution on CPU happens in FP32.
+
+Samples can be used with the following command:
+
+```sh
+./object_detection_sample_ssd -m /ModelSSD.xml -i /picture.jpg -d HETERO:FPGA,CPU
+```
+where:
+- `HETERO` stands for the heterogeneous plugin
+- `FPGA,CPU` points to the fallback policy with priority on FPGA and fallback to CPU
+
+You can specify more than two devices: `-d HETERO:FPGA,GPU,CPU`
+
+## Analyzing Heterogeneous Execution
+After enabling the KEY_HETERO_DUMP_GRAPH_DOT config key, you can dump GraphViz* `.dot` files with annotations of devices per layer.
+
+The heterogeneous plugin can generate two files:
+* `hetero_affinity_.dot` - annotation of affinities per layer. This file is written to the disk only if the default fallback policy was executed
+* `hetero_subgraphs_.dot` - annotation of affinities per graph. This file is written to the disk during execution of ICNNNetwork::LoadNetwork() for the heterogeneous plugin
+
+```cpp
+#include "ie_plugin_config.hpp"
+#include "hetero/hetero_plugin_config.hpp"
+using namespace InferenceEngine::PluginConfigParams;
+using namespace InferenceEngine::HeteroConfigParams;
+
+...
+InferenceEngine::Core core;
+core.SetConfig({ { KEY_HETERO_DUMP_GRAPH_DOT, YES } }, "HETERO");
+```
+
+You can use the GraphViz* utility or converters to `.png` formats. On the Ubuntu* operating system, you can use the following utilities:
+* `sudo apt-get install xdot`
+* `xdot hetero_subgraphs.dot`
+
+
+You can use performance data (in the samples, it is the `-pc` option) to get the performance of each subgraph.
+
+Here is an example of the output for GoogLeNet v1 running on FPGA with fallback to CPU:
+```sh
+subgraph1: 1. input preprocessing (mean data/FPGA):EXECUTED layerType: realTime: 129 cpu: 129 execType:
+subgraph1: 2. input transfer to DDR:EXECUTED layerType: realTime: 201 cpu: 0 execType:
+subgraph1: 3. FPGA execute time:EXECUTED layerType: realTime: 3808 cpu: 0 execType:
+subgraph1: 4. output transfer from DDR:EXECUTED layerType: realTime: 55 cpu: 0 execType:
+subgraph1: 5. FPGA output postprocessing:EXECUTED layerType: realTime: 7 cpu: 7 execType:
+subgraph1: 6. copy to IE blob:EXECUTED layerType: realTime: 2 cpu: 2 execType:
+subgraph2: out_prob: NOT_RUN layerType: Output realTime: 0 cpu: 0 execType: unknown
+subgraph2: prob: EXECUTED layerType: SoftMax realTime: 10 cpu: 10 execType: ref
+Total time: 4212 microseconds
+```
+## See Also
+* [Supported Devices](Supported_Devices.md)
diff --git a/docs/IE_DG/supported_plugins/MULTI.md b/docs/IE_DG/supported_plugins/MULTI.md
new file mode 100644
index 00000000000000..4d382ecfa64b5b
--- /dev/null
+++ b/docs/IE_DG/supported_plugins/MULTI.md
@@ -0,0 +1,149 @@
+# Multi-Device Plugin {#openvino_docs_IE_DG_supported_plugins_MULTI}
+
+## Introducing Multi-Device Execution
+
+The Multi-Device plugin automatically assigns inference requests to available computational devices to execute the requests in parallel.
+Potential gains are as follows:
+* Improved throughput that multiple devices can deliver (compared to single-device execution)
+* More consistent performance, since the devices can now share the inference burden
+(so that if one device is becoming too busy, another device can take more of the load)
+
+Notice that with multi-device the application logic is left unchanged, so you don't need to explicitly load the network to every device,
+create and balance the inference requests, and so on. From the application's point of view, this is just another device that handles the actual machinery.
+The only thing that is required to leverage performance is to provide the multi-device (and hence the underlying devices) with enough inference requests to crunch.
+For example, if you were processing 4 cameras on the CPU (with 4 inference requests), you may now want to process more cameras (with more requests in flight)
+to keep the CPU+GPU busy via multi-device.
+
+The "setup" of multi-device can be described in three major steps:
+* First is the configuration of each device as usual (for example, via the conventional SetConfig method)
+* Second is loading a network to the Multi-Device plugin created on top of a (prioritized) list of the configured devices. This is the only change that you need in your application.
+* Finally, just like with any other ExecutableNetwork (resulting from LoadNetwork), you just create as many requests as needed to saturate the devices.
+These steps are covered below in detail.
+
+## Defining and Configuring the Multi-Device
+Following the OpenVINO notions of "devices", the multi-device has a "MULTI" name.
+The only configuration option for the multi-device is a prioritized list of devices to use:
+
+| Parameter name | Parameter values | Default | Description |
+| :--- | :--- | :--- | :----------------------------------------------------------------------------------------------------------------------------|
+| "MULTI_DEVICE_PRIORITIES" | comma-separated device names with no spaces| N/A | Prioritized list of devices |
+
+You can use the name of the configuration directly as a string, or use MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES from multi/multi_device_config.hpp, which defines the same string.
+
+Basically, there are three ways to specify the devices to be used by the "MULTI":
+```cpp
+    Core ie;
+    //NEW IE-CENTRIC API, the "MULTI" plugin is (globally) pre-configured with the explicit option:
+    ie.SetConfig({{"MULTI_DEVICE_PRIORITIES", "HDDL,GPU"}}, "MULTI");
+    ExecutableNetwork exec0 = ie.LoadNetwork(network, "MULTI", {});
+
+    //NEW IE-CENTRIC API, configuration of the "MULTI" is part of the network configuration (and hence specific to the network):
+    ExecutableNetwork exec1 = ie.LoadNetwork(network, "MULTI", {{"MULTI_DEVICE_PRIORITIES", "HDDL,GPU"}});
+    //NEW IE-CENTRIC API, same as previous, but configuration of the "MULTI" is part of the name (so config is empty), also network-specific:
+    ExecutableNetwork exec2 = ie.LoadNetwork(network, "MULTI:HDDL,GPU", {});
+```
+Notice that the priorities of the devices can be changed in real time for the executable network:
+```cpp
+    Core ie;
+    ExecutableNetwork exec = ie.LoadNetwork(network, "MULTI:HDDL,GPU", {});
+    //...
+    exec.SetConfig({{"MULTI_DEVICE_PRIORITIES", "GPU,HDDL"}});
+    // you can even exclude some devices
+    exec.SetConfig({{"MULTI_DEVICE_PRIORITIES", "GPU"}});
+    //...
+    // and then return it back
+    exec.SetConfig({{"MULTI_DEVICE_PRIORITIES", "GPU,HDDL"}});
+    //but you cannot add new devices on the fly, the next line will trigger the following exception:
+    //[ ERROR ] [NOT_FOUND] You can only change device priorities but not add new devices with the Network's SetConfig(MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES.
+    //CPU device was not in the original device list!
+    exec.SetConfig({{"MULTI_DEVICE_PRIORITIES", "CPU,GPU,HDDL"}});
+```
+Finally, there is a way to specify the number of requests that the multi-device will internally keep for each device.
+Say your original app was running 4 cameras with 4 inference requests; now you would probably want to share these 4 requests between the 2 devices used in the MULTI. The easiest way is to specify a number of requests for each device using parentheses: "MULTI:CPU(2),GPU(2)" and use the same 4 requests in your app. However, such an explicit configuration is not performance-portable and hence not recommended. Instead, the better way is to configure the individual devices and query the resulting number of requests to be used at the application level (see [Configuring the Individual Devices and Creating the Multi-Device On Top](#configuring-the-individual-devices-and-creating-the-multi-device-on-top)).
+
+## Enumerating Available Devices
+The Inference Engine now features a dedicated API to enumerate devices and their capabilities. See [Hello Query Device C++ Sample](../../../inference-engine/samples/hello_query_device/README.md). This is an example output of the sample (truncated to the device names only):
+
+```sh
+./hello_query_device
+Available devices:
+    Device: CPU
+...
+    Device: GPU
+...
+    Device: HDDL
+```
+A simple programmatic way to enumerate the devices and use them with the multi-device is as follows:
+```cpp
+    Core ie;
+    std::string allDevices = "MULTI:";
+    std::vector<std::string> availableDevices = ie.GetAvailableDevices();
+    for (auto && device : availableDevices) {
+        allDevices += device;
+        allDevices += ((device == availableDevices[availableDevices.size()-1]) ? "" : ",");
+    }
+    ExecutableNetwork exeNetwork = ie.LoadNetwork(cnnNetwork, allDevices, {});
+```
+Beyond the trivial "CPU", "GPU", "HDDL", and so on, when multiple instances of a device are available, the names are more qualified.
+For example, this is how two Intel® Movidius™ Myriad™ X sticks are listed by the hello_query_device sample:
+```
+...
+    Device: MYRIAD.1.2-ma2480
+...
+    Device: MYRIAD.1.4-ma2480
+```
+So the explicit configuration to use both would be "MULTI:MYRIAD.1.2-ma2480,MYRIAD.1.4-ma2480".
+Accordingly, the code that loops over all available devices of the "MYRIAD" type only is below:
+```cpp
+    Core ie;
+    std::string allDevices = "MULTI:";
+    std::vector<std::string> myriadDevices = ie.GetMetric("MYRIAD", METRIC_KEY(AVAILABLE_DEVICES));
+    for (size_t i = 0; i < myriadDevices.size(); ++i) {
+        allDevices += std::string("MYRIAD.")
+                                + myriadDevices[i]
+                                + std::string(i < (myriadDevices.size() -1) ? "," : "");
+    }
+    ExecutableNetwork exeNetwork = ie.LoadNetwork(cnnNetwork, allDevices, {});
+```
+
+
+## Configuring the Individual Devices and Creating the Multi-Device On Top
+As discussed in the first section, you should configure each individual device as usual and then just create the "MULTI" device on top:
+```cpp
+#include <multi/multi_device_config.hpp>
+// configure the HDDL device first
+Core ie;
+ie.SetConfig(hddl_config, "HDDL");
+// configure the GPU device
+ie.SetConfig(gpu_config, "GPU");
+// load the network to the multi-device, while specifying the configuration (devices along with priorities):
+ExecutableNetwork exeNetwork = ie.LoadNetwork(cnnNetwork, "MULTI", {{MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES, "HDDL,GPU"}});
+// new metric allows to query the optimal number of requests:
+uint32_t nireq = exeNetwork.GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
+```
+Alternatively, you can combine all the individual device settings into a single config and load that, allowing the multi-device plugin to parse and apply the settings to the right devices. See the code example in the next section.
+
+Notice that while the performance of accelerators combines really well with multi-device, the CPU+GPU execution poses some performance caveats, as these devices share the power, bandwidth, and other resources. For example, it is recommended to enable the GPU throttling hint (which saves another CPU thread for the CPU inference).
+See the [Using the Multi-Device with OpenVINO Samples and Benchmarking the Performance](#using-the-multi-device-with-openvino-samples-and-benchmarking-the-performance) section below.
+
+## Querying the Optimal Number of Inference Requests
+Notice that until R2 you had to calculate the number of requests in your application for any device, e.g. you had to know that Intel® Vision Accelerator Design with Intel® Movidius™ VPUs required at least 32 inference requests to perform well. Now you can use the new GetMetric API to query the optimal number of requests. Similarly, when using the multi-device you don't need to sum over the included devices yourself, you can query the metric directly:
+```cpp
+// 'device_name' can be "MULTI:HDDL,GPU" to configure the multi-device to use HDDL and GPU
+ExecutableNetwork exeNetwork = ie.LoadNetwork(cnnNetwork, device_name, full_config);
+// new metric allows to query the optimal number of requests:
+uint32_t nireq = exeNetwork.GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
+```
+
+## Using the Multi-Device with OpenVINO Samples and Benchmarking the Performance
+Notice that every OpenVINO sample that supports the "-d" (which stands for "device") command-line option transparently accepts the multi-device.
+The [Benchmark Application](../../../inference-engine/samples/benchmark_app/README.md) is the best reference to the optimal usage of the multi-device. As discussed multiple times earlier, you don't need to set up the number of requests, CPU streams, or threads, as the application provides optimal out-of-the-box performance.
+Below is an example command line to evaluate HDDL+GPU performance with it:
+```bash
+$ ./benchmark_app -d MULTI:HDDL,GPU -m  -i  -niter 1000
+```
+Notice that you can use the FP16 IR to work with the multi-device (as the CPU automatically upconverts it to FP32) and the rest of the devices support it naturally.
+Also notice that no demos are (yet) fully optimized for the multi-device, by means of supporting the OPTIMAL_NUMBER_OF_INFER_REQUESTS metric, using the GPU streams/throttling, and so on.
+ +## See Also +* [Supported Devices](Supported_Devices.md) diff --git a/docs/IE_DG/supported_plugins/MYRIAD.md b/docs/IE_DG/supported_plugins/MYRIAD.md new file mode 100644 index 00000000000000..5fbee431ee1c92 --- /dev/null +++ b/docs/IE_DG/supported_plugins/MYRIAD.md @@ -0,0 +1,89 @@ +# MYRIAD Plugin {#openvino_docs_IE_DG_supported_plugins_MYRIAD} + +## Introducing MYRIAD Plugin + +The Inference Engine MYRIAD plugin is developed for inference of neural networks on Intel® Movidius™ Neural Compute Stick and Intel® Neural Compute Stick 2. + +## Installation on Linux* OS + +For installation instructions, refer to the [Installation Guide for Linux*](../../../inference-engine/samples/benchmark_app/README.md). + +## Installation on Windows* OS + +For installation instructions, refer to the [Installation Guide for Windows*](../../../inference-engine/samples/benchmark_app/README.md). + +## Supported networks + +The Inference Engine MYRIAD plugin supports the following networks: + +**Caffe\***: +* AlexNet +* CaffeNet +* GoogleNet (Inception) v1, v2, v4 +* VGG family (VGG16, VGG19) +* SqueezeNet v1.0, v1.1 +* ResNet v1 family (18\*\* \*\*\*, 50, 101, 152) +* MobileNet (mobilenet-v1-1.0-224, mobilenet-v2) +* Inception ResNet v2 +* DenseNet family\*\* (121,161,169,201) +* SSD-300, SSD-512, SSD-MobileNet, SSD-GoogleNet, SSD-SqueezeNet + +**TensorFlow\***: +* AlexNet +* Inception v1, v2, v3, v4 +* Inception ResNet v2 +* MobileNet v1, v2 +* ResNet v1 family (50, 101, 152) +* ResNet v2 family (50, 101, 152) +* SqueezeNet v1.0, v1.1 +* VGG family (VGG16, VGG19) +* Yolo family (yolo-v2, yolo-v3, tiny-yolo-v1, tiny-yolo-v2, tiny-yolo-v3) +* faster_rcnn_inception_v2, faster_rcnn_resnet101 +* ssd_mobilenet_v1 +* DeepLab-v3+ + +**MXNet\***: +* AlexNet and CaffeNet +* DenseNet family\*\* (121,161,169,201) +* SqueezeNet v1.1 +* MobileNet v1, v2 +* NiN +* ResNet v1 (101, 152) +* ResNet v2 (101) +* SqueezeNet v1.1 +* VGG family (VGG16, VGG19) +* SSD-Inception-v3, SSD-MobileNet, SSD-ResNet-50, SSD-300 + +\*\* Network is tested on Intel® Movidius™ Neural Compute Stick with BatchNormalization fusion optimization disabled during Model Optimizer import + +\*\*\* Network is tested on Intel® Neural Compute Stick 2 with BatchNormalization fusion optimization disabled during Model Optimizer import + +## Supported Configuration Parameters + +See VPU common configuration parameters for the [VPU Plugins](VPU.md). +When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. + +In addition to common parameters, the MYRIAD plugin accepts the following options: + +| Parameter Name | Parameter Values | Default | Description | +| :--- | :--- | :--- | :--- | +| `KEY_VPU_MYRIAD_PLATFORM` | empty string/`VPU_MYRIAD_2450`/`VPU_MYRIAD_2480` | empty string | If set, the plugin will use a device with specific platform to allocate a network. | +| `KEY_VPU_MYRIAD_PROTOCOL` | empty string/`VPU_MYRIAD_USB`/`VPU_MYRIAD_PCIE` | empty string | If set, the plugin will use a device with specific protocol to allocate a network. | +| `KEY_VPU_MYRIAD_FORCE_RESET` | `YES`/`NO` | `NO` | Enables force reset of all booted devices when new ExecutableNetwork is created.
This is a plugin scope option and must be used with the plugin's SetConfig method only.
See Device allocation section for details. | +| `KEY_VPU_PLATFORM` | empty string/`VPU_2450`/`VPU_2480` | empty string | **Deprecated** Use `KEY_VPU_MYRIAD_PLATFORM` instead.
If set, the plugin will use a device with specific platform to allocate a network. | +| `KEY_VPU_FORCE_RESET` | `YES`/`NO` | `NO` | **Deprecated** Use `KEY_VPU_MYRIAD_FORCE_RESET` instead.
Enables force reset of all booted devices when new ExecutableNetwork is created.
This is a plugin scope option and must be used with the plugin's SetConfig method only.
See Device allocation section for details. | + +## Device allocation   + +Each `IExecutableNetwork` instance tries to allocate new device on `InferenceEngine::Core::LoadNetwork`, but if all available devices are already allocated it will use the one with the minimal number of uploaded networks. +The maximum number of networks single device can handle depends on device memory capacity and the size of the networks. + +If `KEY_VPU_MYRIAD_FORCE_RESET` option is set to `YES` the plugin will reset all VPU devices in the system. + +Single device cannot be shared across multiple processes. + +## See Also + +* [Supported Devices](Supported_Devices.md) +* [VPU Plugins](VPU.md) +* [Intel® Neural Compute Stick 2 Get Started](https://software.intel.com/en-us/neural-compute-stick/get-started) diff --git a/docs/IE_DG/supported_plugins/Supported_Devices.md b/docs/IE_DG/supported_plugins/Supported_Devices.md new file mode 100644 index 00000000000000..7e4111837a14bb --- /dev/null +++ b/docs/IE_DG/supported_plugins/Supported_Devices.md @@ -0,0 +1,263 @@ +Supported Devices {#openvino_docs_IE_DG_supported_plugins_Supported_Devices} +================== + +The Inference Engine can infer models in different formats with various input and output formats. This section provides supported and optimal configurations per device. + +> **NOTE**: With OpenVINO™ 2020.4 release, Intel® Movidius™ Neural Compute Stick is no longer supported. + +The Inference Engine provides unique capabilities to infer deep learning models on the following device types with corresponding plugins: + +| Plugin | Device types | +|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------| +|[GPU plugin](CL_DNN.md) |Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics | +|[CPU plugin](CPU.md) |Intel® Xeon® with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and AVX512_BF16, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® Streaming SIMD Extensions (Intel® SSE) | +|[FPGA plugin](FPGA.md) (available in the Intel® Distribution of OpenVINO™ toolkit) |Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Speed Grade 2), Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | +|[VPU plugins](VPU.md) (available in the Intel® Distribution of OpenVINO™ toolkit) |Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X, Intel® Vision Accelerator Design with Intel® Movidius™ VPUs | +|[GNA plugin](GNA.md) (available in the Intel® Distribution of OpenVINO™ toolkit) |Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver J5005 Processor, Intel® Pentium® Silver N5000 Processor, Intel® Celeron® J4005 Processor, Intel® Celeron® J4105 Processor, Intel® Celeron® Processor N4100, Intel® Celeron® Processor N4000, Intel® Core™ i3-8121U Processor, Intel® Core™ i7-1065G7 Processor, Intel® Core™ i7-1060G7 Processor, Intel® Core™ i5-1035G4 Processor, Intel® Core™ i5-1035G7 Processor, Intel® Core™ i5-1035G1 Processor, Intel® Core™ i5-1030G7 Processor, Intel® Core™ i5-1030G4 Processor, Intel® Core™ i3-1005G1 Processor, Intel® Core™ i3-1000G1 Processor, Intel® Core™ i3-1000G4 Processor| +|[Multi-Device plugin](MULTI.md) |Multi-Device plugin enables simultaneous inference of the same network on several Intel® devices in parallel | 
+|[Heterogeneous plugin](HETERO.md) |Heterogeneous plugin enables automatic inference splitting between several Intel® devices (for example, if a device doesn't [support certain layers](#supported-layers)). |
+
+## Supported Configurations
+
+The Inference Engine can infer models in different formats with various input and output formats.
+This chapter provides supported and optimal configurations for each plugin.
+
+### Terminology
+
+| Acronym/Term | Description |
+| :-----------------| :---------------------------------------------|
+| DL | Deep Learning |
+| FP32 format | Single-precision floating-point format |
+| BF16 format | Brain floating-point format |
+| FP16 format | Half-precision floating-point format |
+| I16 format | 2-byte signed integer format |
+| I8 format | 1-byte signed integer format |
+| U16 format | 2-byte unsigned integer format |
+| U8 format | 1-byte unsigned integer format |
+
+NHWC, NCHW - Image data layout. Refers to the representation of batches of images.
+NCDHW - Images sequence data layout.
+
+* N - Number of images in a batch
+* D - Depth. Depending on the model, it can be a spatial or time dimension
+* H - Number of pixels in the vertical dimension
+* W - Number of pixels in the horizontal dimension
+* C - Number of channels
+
+CHW, NC, C - Tensor memory layout.
+For example, the CHW value at index (c,h,w) is physically located at index (c\*H+h)\*W+w; the other layouts follow by analogy.
+
+### Supported Model Formats
+
+|Plugin |FP32 |FP16 |I8 |
+|:-------------|:----------------------:|:----------------------:|:----------------------:|
+|CPU plugin |Supported and preferred |Supported |Supported |
+|GPU plugin |Supported |Supported and preferred |Supported\* |
+|FPGA plugin |Supported |Supported |Not supported |
+|VPU plugins |Not supported |Supported |Not supported |
+|GNA plugin |Supported |Supported |Not supported |
+\* - Currently, only a limited set of topologies might benefit from enabling I8 models on the GPU
+For [Multi-Device](MULTI.md) and [Heterogeneous](HETERO.md) execution +the supported models formats depends on the actual underlying devices. _Generally, FP16 is preferable as it is most ubiquitous and performant_. + +### Supported Input Precision + +|Plugin |FP32 |FP16 |U8 |U16 |I8 |I16 | +|:-------------|:--------:|:-------------:|:-------------:|:-------------:|:------------:|:-------------:| +|CPU plugin |Supported |Not supported |Supported |Supported |Not supported |Supported | +|GPU plugin |Supported |Supported\* |Supported\* |Supported\* |Not supported |Supported\* | +|FPGA plugin |Supported |Supported\* |Supported |Supported |Not supported |Supported | +|VPU plugins |Supported |Supported |Supported |Not supported |Not supported |Not supported | +|GNA plugin |Supported |Not supported |Supported |Not supported |Supported |Supported | + +
\* - Supported via `SetBlob` only, `GetBlob` returns FP32
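+For example, a minimal sketch (not from the original documentation; the `core` and `network` objects and the device name are assumptions for illustration) of requesting U8 input precision before loading a network, so that the plugin consumes 1-byte image data directly:
+
+```cpp
+// Assumed setup for this sketch: InferenceEngine::Core core; and a CNNNetwork `network`
+// obtained earlier via core.ReadNetwork().
+InferenceEngine::InputsDataMap inputsInfo = network.getInputsInfo();
+for (auto & item : inputsInfo) {
+    // Request U8 input precision; the conversion to the plugin's internal precision
+    // is handled by the Inference Engine.
+    item.second->setPrecision(InferenceEngine::Precision::U8);
+}
+auto executableNetwork = core.LoadNetwork(network, "GPU");  // device name is an assumption
+```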
+For [Multi-Device](MULTI.md) and [Heterogeneous](HETERO.md) execution +the supported input precision depends on the actual underlying devices. _Generally, U8 is preferable as it is most ubiquitous_. + +### Supported Output Precision + +|Plugin |FP32 |FP16 | +|:-------------|:--------:|:------------:| +|CPU plugin |Supported |Not supported | +|GPU plugin |Supported |Supported | +|FPGA plugin |Supported |Supported | +|VPU plugins |Supported |Supported | +|GNA plugin |Supported |Not supported | +For [Multi-Device](MULTI.md) and [Heterogeneous](HETERO.md) execution +the supported output precision depends on the actual underlying devices. _Generally, FP32 is preferable as it is most ubiquitous_. + +### Supported Input Layout + +|Plugin |NCDHW |NCHW |NHWC |NC | +|:-------------|:------------:|:------------:|:------------:|:------------:| +|CPU plugin |Supported |Supported |Supported |Supported | +|GPU plugin |Supported |Supported |Supported |Supported | +|FPGA plugin |Not supported |Supported |Supported |Not supported | +|VPU plugins |Not supported |Supported |Supported |Supported | +|GNA plugin |Not supported |Not supported |Not supported |Supported | + +### Supported Output Layout + +|Number of dimensions|5 |4 |3 |2 |1 | +|:-------------------|:---:|:---:|:---:|:---:|:---:| +|Layout |NCDHW|NCHW |CHW |NC |C | + +For setting relevant configuration, refer to the +[Integrate with Customer Application New Request API](../Integrate_with_customer_application_new_API.md) topic +(step 3 "Configure input and output"). + +### Supported Layers +The following layers are supported by the plugins and by [Shape Inference feature](../ShapeInference.md): + +| Layers | GPU | CPU | VPU | GNA | FPGA | ShapeInfer | +|:-------------------------------|:-------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------------:| +| Abs | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Acos | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Acosh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Activation-Clamp | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Activation-ELU | Supported |Supported\*\*\*| Supported | Not Supported | Supported | Supported | +| Activation-Exp | Supported |Supported\*\*\*| Not Supported | Supported | Not Supported | Supported | +| Activation-Leaky ReLU | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Activation-Not | Supported |Supported\*\*\*| Not Supported | Not Supported | Not Supported | Supported | +| Activation-PReLU | Supported |Supported\*\*\*| Supported | Not Supported | Supported | Supported | +| Activation-ReLU | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Activation-ReLU6 | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Activation-Sigmoid/Logistic | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported | +| Activation-TanH | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported | +| ArgMax | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Asin | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Asinh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Atan | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | 
Supported | +| Atanh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| BatchNormalization | Supported | Supported | Supported | Not Supported | Supported\* | Supported | +| BinaryConvolution | Supported | Supported | Not Supported | Not Supported | Not Supported | Supported | +| Broadcast | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Ceil | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Concat | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Const | Supported | Supported | Supported | Supported | Not Supported | Not Supported | +| Convolution-Dilated | Supported | Supported | Supported | Not Supported | Supported | Supported | +| Convolution-Dilated 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| Convolution-Grouped | Supported | Supported | Supported | Not Supported | Supported | Supported | +| Convolution-Grouped 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| Convolution-Ordinary | Supported | Supported | Supported | Supported\* | Supported | Supported | +| Convolution-Ordinary 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| Cos | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Cosh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Crop | Supported | Supported | Supported | Supported | Not Supported | Supported | +| CTCGreedyDecoder | Supported\*\* | Supported\*\* | Supported\* | Not Supported | Not Supported | Supported | +| Deconvolution | Supported | Supported | Supported | Not Supported | Supported\* | Supported | +| Deconvolution 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| DeformableConvolution | Supported | Supported | Not Supported | Not Supported | Not Supported | Supported | +| DepthToSpace | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| DetectionOutput | Supported | Supported\*\* | Supported\* | Not Supported | Not Supported | Supported | +| Eltwise-And | Supported |Supported\*\*\*| Not Supported | Not Supported | Not Supported | Supported | +| Eltwise-Add | Supported |Supported\*\*\*| Not Supported | Not Supported | Supported | Supported | +| Eltwise-Div | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Equal | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-FloorMod | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Greater | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-GreaterEqual | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Less | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-LessEqual | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-LogicalAnd | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-LogicalOr | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-LogicalXor | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Max | 
Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Min | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Mul | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported | +| Eltwise-NotEqual | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Pow | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Prod | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported | +| Eltwise-SquaredDiff | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Sub | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Eltwise-Sum | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Erf | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Exp | Supported | Supported | Not Supported | Supported | Not Supported | Supported | +| FakeQuantize | Not Supported | Supported | Not Supported | Not Supported | Not Supported | Supported | +| Fill | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Flatten | Supported | Supported | Supported | Not Supported | Not Supported | Supported | +| Floor | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| FullyConnected (Inner Product) | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Gather | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| GatherTree | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Gemm | Supported | Supported | Supported | Not Supported | Not Supported | Supported | +| GRN | Supported\*\* | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| HardSigmoid | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Interp | Supported\*\* | Supported\*\* | Supported | Not Supported | Not Supported | Supported\* | +| Log | Supported | Supported\*\* | Supported | Supported | Not Supported | Supported | +| LRN (Norm) | Supported | Supported | Supported | Not Supported | Supported | Supported | +| LSTMCell | Supported | Supported | Supported | Supported | Not Supported | Not Supported | +| GRUCell | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| RNNCell | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| LSTMSequence | Supported | Supported | Supported | Not Supported | Not Supported | Not Supported | +| GRUSequence | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| RNNSequence | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| LogSoftmax | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Not Supported | +| Memory | Not Supported | Supported | Not Supported | Supported | Not Supported | Supported | +| MVN | Supported | Supported\*\* | Supported\* | Not Supported | Not Supported | Supported | +| Neg | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| NonMaxSuppression | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Normalize | Supported | Supported\*\* | Supported\* | Not Supported | Not 
Supported | Supported | +| OneHot | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Pad | Supported | Supported\*\* | Supported\* | Not Supported | Not Supported | Supported | +| Permute | Supported | Supported | Supported | Supported\* | Not Supported | Supported | +| Pooling(AVG,MAX) | Supported | Supported | Supported | Supported | Supported | Supported | +| Pooling(AVG,MAX) 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| Power | Supported | Supported\*\* | Supported | Supported\* | Supported\* | Supported | +| PowerFile | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Not Supported | +| PriorBox | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| PriorBoxClustered | Supported\*\* | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Proposal | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| PSROIPooling | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Range | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Reciprocal | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceAnd | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceL1 | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceL2 | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceLogSum | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceLogSumExp | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceMax | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceMean | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceMin | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceOr | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceProd | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceSum | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceSumSquare | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| RegionYolo | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| ReorgYolo | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Resample | Supported | Supported\*\* | Supported | Not Supported | Supported\* | Supported | +| Reshape | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported\* | +| ReverseSequence | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| RNN | Not Supported | Supported | Supported | Not Supported | Not Supported | Not Supported | +| ROIPooling | Supported\* | Supported | Supported | Not Supported | Not Supported | Supported | +| ScaleShift | Supported |Supported\*\*\*| Supported\* | Supported | Supported | Supported | +| ScatterUpdate | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Select | Supported | Supported | Supported | Not Supported | Not 
Supported | Supported | +| Selu | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ShuffleChannels | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Sign | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Sin | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Sinh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| SimplerNMS | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Slice | Supported |Supported\*\*\*| Supported | Supported | Supported\* | Supported | +| SoftMax | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Softplus | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Softsign | Supported | Supported\*\* | Not Supported | Supported | Not Supported | Supported | +| SpaceToDepth | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| SpatialTransformer | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Split | Supported |Supported\*\*\*| Supported | Supported | Supported\* | Supported | +| Squeeze | Supported | Supported\*\* | Supported | Supported | Not Supported | Supported | +| StridedSlice | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Tan | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| TensorIterator | Not Supported | Supported | Supported | Supported | Not Supported | Not Supported | +| Tile | Supported\*\* |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| TopK | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Unpooling | Supported | Not Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| Unsqueeze | Supported | Supported\*\* | Supported | Supported | Not Supported | Supported | +| Upsampling | Supported | Not Supported | Not Supported | Not Supported | Not Supported | Not Supported | + +\*- support is limited to the specific parameters. Refer to "Known Layers Limitation" section for the device [from the list of supported](Supported_Devices.md). + +\*\*- support is implemented via [Extensibility mechanism](../Extensibility_DG/Intro.md). + +\*\*\*- supports NCDHW layout. diff --git a/docs/IE_DG/supported_plugins/VPU.md b/docs/IE_DG/supported_plugins/VPU.md new file mode 100644 index 00000000000000..7c04290f7dd16d --- /dev/null +++ b/docs/IE_DG/supported_plugins/VPU.md @@ -0,0 +1,104 @@ +# VPU Plugins {#openvino_docs_IE_DG_supported_plugins_VPU} + +This chapter provides information on the Inference Engine plugins that enable inference of deep learning models on the supported VPU devices: + +* Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X — Supported by the [MYRIAD Plugin](MYRIAD.md) +* Intel® Vision Accelerator Design with Intel® Movidius™ VPUs — Supported by the [HDDL Plugin](HDDL.md) + +> **NOTE**: With OpenVINO™ 2020.4 release, Intel® Movidius™ Neural Compute Stick powered by the Intel® Movidius™ Myriad™ 2 is no longer supported. + +## Known Layers Limitations + +* `'ScaleShift'` layer is supported for zero value of `'broadcast'` attribute only. +* `'CTCGreedyDecoder'` layer works with `'ctc_merge_repeated'` attribute equal 1. 
+* `'DetectionOutput'` layer works with zero values of the `'interpolate_orientation'` and `'num_orient_classes'` parameters only.
+* `'MVN'` layer uses a fixed value for the `'eps'` parameter (1e-9).
+* `'Normalize'` layer uses a fixed value for the `'eps'` parameter (1e-9) and is supported for zero value of `'across_spatial'` only.
+* `'Pad'` layer works only with 4D tensors.
+
+## Optimizations
+
+VPU plugins support layer fusion and decomposition.
+
+### Layer Fusion
+
+#### Fusing Rules
+
+Certain layers can be merged into Convolution, ReLU, and Eltwise layers according to the patterns below:
+
+- Convolution
+  - Convolution + ReLU → Convolution
+  - Convolution + Clamp → Convolution
+  - Convolution + LeakyReLU → Convolution
+  - Convolution (3x3, stride=1, padding=1) + Pooling (2x2, stride=2, padding=0) → Convolution
+
+- Pooling + ReLU → Pooling
+
+- FullyConnected + ReLU → FullyConnected
+
+- Eltwise
+  - Eltwise + ReLU → Eltwise
+  - Eltwise + LeakyReLU → Eltwise
+  - Eltwise + Clamp → Eltwise
+
+#### Joining Rules
+
+> **NOTE**: Application of these rules depends on tensor sizes and available resources.
+
+Layers can be joined when the two conditions below are met:
+- Layers are located on topologically independent branches.
+- Layers can be executed simultaneously on the same hardware units.
+
+### Decomposition Rules
+
+- Convolution and Pooling layers are tiled, resulting in the following pattern:
+    - A Split layer that splits tensors into tiles
+    - A set of tiles, optionally with service layers like Copy
+    - Depending on the tiling scheme, a Concatenation or Sum layer that joins all resulting tensors into one and restores the full blob that contains the result of the tiled operation
+
+  Names of tiled layers contain the `@soc=M/N` part, where `M` is the tile number and `N` is the number of tiles:
+  ![](../img/yolo_tiny_v1.png)
+
+> **NOTE**: Nominal layers, such as Shrink and Expand, are not executed.
+
+> **NOTE**: VPU plugins can add extra layers like Copy.
+
+
+## VPU Common Configuration Parameters
+
+The VPU plugins support the configuration parameters listed below.
+The parameters are passed as `std::map<std::string, std::string>` on `InferenceEngine::Core::LoadNetwork`
+or `InferenceEngine::Core::SetConfig`.
+When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix.
+
+| Parameter Name | Parameter Values | Default | Description |
+| :--- | :--- | :--- | :--- |
+| `KEY_VPU_HW_STAGES_OPTIMIZATION` | `YES`/`NO` | `YES` | Turn on HW stages usage
Applicable for Intel Movidius Myriad X and Intel Vision Accelerator Design devices only. | +| `KEY_VPU_COMPUTE_LAYOUT` | `VPU_AUTO`, `VPU_NCHW`, `VPU_NHWC` | `VPU_AUTO` | Specify internal input and output layouts for network layers. | +| `KEY_VPU_PRINT_RECEIVE_TENSOR_TIME` | `YES`/`NO` | `NO` | Add device-side time spent waiting for input to PerformanceCounts.
See the Data Transfer Pipelining section for details. |
+| `KEY_VPU_IGNORE_IR_STATISTIC` | `YES`/`NO` | `NO` | The VPU plugin can use statistics present in the IR to try to improve calculation precision.
If you do not want the statistics to be used, enable this option. |
+| `KEY_VPU_CUSTOM_LAYERS` | path to XML file | empty string | This option allows you to pass an XML file with bindings of custom layers. If a layer is present in such a file, it is used during inference even if the layer is natively supported. |
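+For example, a minimal sketch (not from the original documentation; the model path and the chosen key are assumptions for illustration) of passing one of these parameters as a raw string when loading a network to a VPU device:
+
+```cpp
+#include <inference_engine.hpp>
+
+// Assumed setup for this sketch: a Core object and a model named "Model.xml".
+InferenceEngine::Core core;
+auto network = core.ReadNetwork("Model.xml");
+// Pass the key as a raw string (the KEY_ prefix is omitted, as described above);
+// the same map can also be passed to InferenceEngine::Core::SetConfig.
+auto executableNetwork = core.LoadNetwork(network, "MYRIAD",
+    {{"VPU_PRINT_RECEIVE_TENSOR_TIME", "YES"}});
+```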
+
+## Data Transfer Pipelining
+
+The MYRIAD plugin tries to pipeline data transfers to/from the device with computations.
+While one infer request is being executed, the data for the next infer request can be uploaded to the device in parallel.
+The same applies to downloading results.
+
+The `KEY_VPU_PRINT_RECEIVE_TENSOR_TIME` configuration parameter can be used to check the efficiency of the current pipelining.
+The new record in the performance counters shows the time the device spent waiting for input before starting the inference.
+In a perfect pipeline, this time should be near zero, which means that the data was already transferred when the new inference started.
+
+## Troubleshooting
+
+**Get the following message when running inference with the VPU plugin: "[VPU] Cannot convert layer  due to unsupported layer type "**
+
+This means that your topology has a layer that is unsupported by your target VPU plugin. To resolve this issue, you can implement the custom layer for the target device using the [Inference Engine Extensibility mechanism](../Extensibility_DG/Intro.md). Or, to quickly get a working prototype, you can use the heterogeneous scenario with the default fallback policy (see the [HETERO Plugin](HETERO.md) section). Use the HETERO plugin with a fallback device that supports this layer, for example, CPU: `HETERO:MYRIAD,CPU`.
+For a list of VPU-supported layers, see the Supported Layers section of the [Supported Devices](Supported_Devices.md) topic.
+
+
+## See Also
+
+* [Supported Devices](Supported_Devices.md)
+* [Intel® Neural Compute Stick 2 Get Started](https://software.intel.com/en-us/neural-compute-stick/get-started)
diff --git a/docs/IE_PLUGIN_DG/AsyncInferRequest.md b/docs/IE_PLUGIN_DG/AsyncInferRequest.md
new file mode 100644
index 00000000000000..8250c10b7dd60d
--- /dev/null
+++ b/docs/IE_PLUGIN_DG/AsyncInferRequest.md
@@ -0,0 +1,49 @@
+# Asynchronous Inference Request {#async_infer_request}
+
+Asynchronous Inference Request runs an inference pipeline asynchronously in one or several task executors, depending on the device pipeline structure.
+Inference Engine Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class:
+
+- The class has the `_pipeline` field, a `std::vector` that contains pairs of an executor and an executed task.
+- All executors are passed as arguments to the class constructor; they are in the running state and ready to run tasks.
+- The class has the InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait method, which waits for `_pipeline` to finish in the class destructor. The method does not stop the task executors; they remain in the running state because they belong to the executable network instance and are not destroyed.
+
+`AsyncInferRequest` Class
+------------------------
+
+Inference Engine Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class for a custom asynchronous inference request implementation:
+
+@snippet src/template_async_infer_request.hpp async_infer_request:header
+
+#### Class Fields
+
+- `_inferRequest` - a reference to the [synchronous inference request](@ref infer_request) implementation. Its methods are reused in the `AsyncInferRequest` constructor to define a device pipeline.
+- `_waitExecutor` - a task executor that waits for a response from a device about device tasks completion + +> **NOTE**: If a plugin can work with several instances of a device, `_waitExecutor` must be device-specific. Otherwise, having a single task executor for several devices does not allow them to work in parallel. + +### `AsyncInferRequest()` + +The main goal of the `AsyncInferRequest` constructor is to define a device pipeline `_pipeline`. The example below demonstrates `_pipeline` creation with the following stages: + +- `inferPreprocess` is a CPU compute task. +- `startPipeline` is a CPU ligthweight task to submit tasks to a remote device. +- `waitPipeline` is a CPU non-compute task that waits for a response from a remote device. +- `inferPostprocess` is a CPU compute task. + +@snippet src/template_async_infer_request.cpp async_infer_request:ctor + +The stages are distributed among two task executors in the following way: + +- `inferPreprocess` and `startPipeline` are combined into a single task and run on `_requestExecutor`, which computes CPU tasks. +- You need at least two executors to overlap compute tasks of a CPU and a remote device the plugin works with. Otherwise, CPU and device tasks are executed serially one by one. +- `waitPipeline` is sent to `_waitExecutor`, which works with the device. + +> **NOTE**: `callbackExecutor` is also passed to the constructor and it is used in the base InferenceEngine::AsyncInferRequestThreadSafeDefault class, which adds a pair of `callbackExecutor` and a callback function set by the user to the end of the pipeline. + +Inference request stages are also profiled using IE_PROFILING_AUTO_SCOPE, which shows how pipelines of multiple asynchronous inference requests are run in parallel via the [Intel® VTune™ Profiler](https://software.intel.com/en-us/vtune) tool. + +### `~AsyncInferRequest()` + +In the asynchronous request destructor, it is necessary to wait for a pipeline to finish. It can be done using the InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait method of the base class. + +@snippet src/template_async_infer_request.cpp async_infer_request:dtor diff --git a/docs/IE_PLUGIN_DG/Building.md b/docs/IE_PLUGIN_DG/Building.md new file mode 100644 index 00000000000000..87b66ad983c460 --- /dev/null +++ b/docs/IE_PLUGIN_DG/Building.md @@ -0,0 +1,99 @@ +# Build Plugin Using CMake* {#plugin_build} + +Inference Engine build infrastructure provides the Inference Engine Developer Package for plugin development. + +Inference Engine Developer Package +------------------------ + +To automatically generate the Inference Engine Developer Package, run the `cmake` tool during a DLDT build: + +```bash +$ mkdir dldt-release-build +$ cd dldt-release-build +$ cmake -DCMAKE_BUILD_TYPE=Release ../dldt +``` + +Once the commands above are executed, the Inference Engine Developer Package is generated in the `dldt-release-build` folder. It consists of several files: + - `InferenceEngineDeveloperPackageConfig.cmake` - the main CMake script which imports targets and provides compilation flags and CMake options. + - `InferenceEngineDeveloperPackageConfig-version.cmake` - a file with a package version. + - `targets_developer.cmake` - an automatically generated file which contains all targets exported from the Deep Learning Deployment Toolkit (DLDT) build tree. 
This file is included by `InferenceEngineDeveloperPackageConfig.cmake` to import the following targets: + - Libraries for plugin development: + * `IE::ngraph` - shared nGraph library + * `IE::inference_engine` - shared Inference Engine library + * `IE::inference_engine_preproc` - shared library with Inference Engine preprocessing plugin + * `IE::inference_engine_plugin_api` - interface library with Inference Engine Plugin API headers + * `IE::inference_engine_lp_transformations` - shared library with low-precision transformations + * `IE::pugixml` - static Pugixml library + * `IE::xbyak` - interface library with Xbyak headers + - Libraries for tests development: + * `IE::gtest`, `IE::gtest_main`, `IE::gmock` - Google Tests framework libraries + * `IE::commonTestUtils` - static library with common tests utilities + * `IE::funcTestUtils` - static library with functional tests utilities + * `IE::unitTestUtils` - static library with unit tests utilities + * `IE::ngraphFunctions` - static library with the set of Ngraph Functions builders + * `IE::funcSharedTests` - static library with common functional tests + +> **Note:** it's enough just to run `cmake --build . --target ie_dev_targets` command to build only targets from the +> Inference Engine Developer package. + +Build Plugin using Inference Engine Developer Package +------------------------ + +To build a plugin source tree using the Inference Engine Developer Package, run the commands below: + +```cmake +$ mkdir template-plugin-release-build +$ cd template-plugin-release-build +$ cmake -DInferenceEngineDeveloperPackage_DIR=../dldt-release-build ../template-plugin +``` + +A common plugin consists of the following components: + +1. Plugin code in the `src` folder +2. Code of tests in the `tests` folder + +To build a plugin and its tests, run the following CMake scripts: + +- Root `CMakeLists.txt`, which finds the Inference Engine Developer Package using the `find_package` CMake command and adds the `src` and `tests` subdirectories with plugin sources and their tests respectively: + +```cmake +cmake_minimum_required(VERSION 3.13.3) + +project(InferenceEngineTemplatePlugin) + +set(IE_MAIN_TEMPLATE_PLUGIN_SOURCE_DIR ${InferenceEngineTemplatePlugin_SOURCE_DIR}) + +find_package(InferenceEngineDeveloperPackage REQUIRED) + +add_subdirectory(src) + +if(ENABLE_TESTS) + include(CTest) + enable_testing() + + if(ENABLE_FUNCTIONAL_TESTS) + add_subdirectory(tests/functional) + endif() + + if(ENABLE_BEH_TESTS) + add_subdirectory(tests/behavior) + endif() +endif() +``` + +> **NOTE**: The default values of the `ENABLE_TESTS`, `ENABLE_FUNCTIONAL_TESTS`, `ENABLE_BEH_TESTS` options are shared via the Inference Engine Developer Package and they are the same as for the main DLDT build tree. You can override them during plugin build using the command below: + ```bash + $ cmake -DENABLE_FUNCTIONAL_TESTS=OFF -DInferenceEngineDeveloperPackage_DIR=../dldt-release-build ../template-plugin + ``` + +- `src/CMakeLists.txt` to build a plugin shared library from sources: + +@snippet src/CMakeLists.txt cmake:plugin + +> **NOTE**: `IE::inference_engine` target is imported from the Inference Engine Developer Package. + +- `tests/functional/CMakeLists.txt` to build a set of functional plugin tests: + +@snippet tests/functional/CMakeLists.txt cmake:functional_tests + +> **NOTE**: The `IE::funcSharedTests` static library with common functional Inference Engine Plugin tests is imported via the Inference Engine Developer Package. 
diff --git a/docs/IE_PLUGIN_DG/Doxyfile b/docs/IE_PLUGIN_DG/Doxyfile new file mode 100644 index 00000000000000..8b2d8c7f18ddfc --- /dev/null +++ b/docs/IE_PLUGIN_DG/Doxyfile @@ -0,0 +1,2439 @@ +# Doxyfile 1.8.12 + +# This file describes the settings to be used by the documentation system +# doxygen (www.doxygen.org) for a project. +# +# All text after a double hash (##) is considered a comment and is placed in +# front of the TAG it is preceding. +# +# All text after a single hash (#) is considered a comment and will be ignored. +# The format is: +# TAG = value [value, ...] +# For lists, items can also be appended using: +# TAG += value [value, ...] +# Values that contain spaces should be placed between quotes (\" \"). + +#--------------------------------------------------------------------------- +# Project related configuration options +#--------------------------------------------------------------------------- + +# This tag specifies the encoding used for all characters in the config file +# that follow. The default is UTF-8 which is also the encoding used for all text +# before the first occurrence of this tag. Doxygen uses libiconv (or the iconv +# built into libc) for the transcoding. See http://www.gnu.org/software/libiconv +# for the list of possible encodings. +# The default value is: UTF-8. + +DOXYFILE_ENCODING = UTF-8 + +# The PROJECT_NAME tag is a single word (or a sequence of words surrounded by +# double-quotes, unless you are using Doxywizard) that should identify the +# project for which the documentation is generated. This name is used in the +# title of most generated pages and in a few other places. +# The default value is: My Project. + +PROJECT_NAME = "OpenVINO™ Toolkit" + +# The PROJECT_NUMBER tag can be used to enter a project or revision number. This +# could be handy for archiving the generated documentation or if some version +# control system is used. + +PROJECT_NUMBER = + +# Using the PROJECT_BRIEF tag one can provide an optional one line description +# for a project that appears at the top of each page and should give viewer a +# quick idea about the purpose of the project. Keep the description short. + +PROJECT_BRIEF = + +# With the PROJECT_LOGO tag one can specify a logo or an icon that is included +# in the documentation. The maximum height of the logo should not exceed 55 +# pixels and the maximum width should not exceed 200 pixels. Doxygen will copy +# the logo to the output directory. + +PROJECT_LOGO = + +# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) path +# into which the generated documentation will be written. If a relative path is +# entered, it will be relative to the location where doxygen was started. If +# left blank the current directory will be used. + +OUTPUT_DIRECTORY = + +# If the CREATE_SUBDIRS tag is set to YES then doxygen will create 4096 sub- +# directories (in 2 levels) under the output directory of each output format and +# will distribute the generated files over these directories. Enabling this +# option can be useful when feeding doxygen a huge amount of source files, where +# putting all generated files in the same directory would otherwise causes +# performance problems for the file system. +# The default value is: NO. + +CREATE_SUBDIRS = NO + +# If the ALLOW_UNICODE_NAMES tag is set to YES, doxygen will allow non-ASCII +# characters to appear in the names of generated files. If set to NO, non-ASCII +# characters will be escaped, for example _xE3_x81_x84 will be used for Unicode +# U+3044. 
+# The default value is: NO. + +ALLOW_UNICODE_NAMES = NO + +# The OUTPUT_LANGUAGE tag is used to specify the language in which all +# documentation generated by doxygen is written. Doxygen will use this +# information to generate all constant output in the proper language. +# Possible values are: Afrikaans, Arabic, Armenian, Brazilian, Catalan, Chinese, +# Chinese-Traditional, Croatian, Czech, Danish, Dutch, English (United States), +# Esperanto, Farsi (Persian), Finnish, French, German, Greek, Hungarian, +# Indonesian, Italian, Japanese, Japanese-en (Japanese with English messages), +# Korean, Korean-en (Korean with English messages), Latvian, Lithuanian, +# Macedonian, Norwegian, Persian (Farsi), Polish, Portuguese, Romanian, Russian, +# Serbian, Serbian-Cyrillic, Slovak, Slovene, Spanish, Swedish, Turkish, +# Ukrainian and Vietnamese. +# The default value is: English. + +OUTPUT_LANGUAGE = English + +# If the BRIEF_MEMBER_DESC tag is set to YES, doxygen will include brief member +# descriptions after the members that are listed in the file and class +# documentation (similar to Javadoc). Set to NO to disable this. +# The default value is: YES. + +BRIEF_MEMBER_DESC = YES + +# If the REPEAT_BRIEF tag is set to YES, doxygen will prepend the brief +# description of a member or function before the detailed description +# +# Note: If both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the +# brief descriptions will be completely suppressed. +# The default value is: YES. + +REPEAT_BRIEF = YES + +# This tag implements a quasi-intelligent brief description abbreviator that is +# used to form the text in various listings. Each string in this list, if found +# as the leading text of the brief description, will be stripped from the text +# and the result, after processing the whole list, is used as the annotated +# text. Otherwise, the brief description is used as-is. If left blank, the +# following values are used ($name is automatically replaced with the name of +# the entity):The $name class, The $name widget, The $name file, is, provides, +# specifies, contains, represents, a, an and the. + +ABBREVIATE_BRIEF = + +# If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then +# doxygen will generate a detailed section even if there is only a brief +# description. +# The default value is: NO. + +ALWAYS_DETAILED_SEC = NO + +# If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all +# inherited members of a class in the documentation of that class as if those +# members were ordinary class members. Constructors, destructors and assignment +# operators of the base classes will not be shown. +# The default value is: NO. + +INLINE_INHERITED_MEMB = NO + +# If the FULL_PATH_NAMES tag is set to YES, doxygen will prepend the full path +# before files name in the file list and in the header files. If set to NO the +# shortest path that makes the file name unique will be used +# The default value is: YES. + +FULL_PATH_NAMES = YES + +# The STRIP_FROM_PATH tag can be used to strip a user-defined part of the path. +# Stripping is only done if one of the specified strings matches the left-hand +# part of the path. The tag can be used to show relative paths in the file list. +# If left blank the directory from which doxygen is run is used as the path to +# strip. +# +# Note that you can specify absolute paths here, but also relative paths, which +# will be relative from the directory where doxygen is started. +# This tag requires that the tag FULL_PATH_NAMES is set to YES. 
+ +STRIP_FROM_PATH = + +# The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of the +# path mentioned in the documentation of a class, which tells the reader which +# header file to include in order to use a class. If left blank only the name of +# the header file containing the class definition is used. Otherwise one should +# specify the list of include paths that are normally passed to the compiler +# using the -I flag. + +STRIP_FROM_INC_PATH = + +# If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter (but +# less readable) file names. This can be useful is your file systems doesn't +# support long names like on DOS, Mac, or CD-ROM. +# The default value is: NO. + +SHORT_NAMES = NO + +# If the JAVADOC_AUTOBRIEF tag is set to YES then doxygen will interpret the +# first line (until the first dot) of a Javadoc-style comment as the brief +# description. If set to NO, the Javadoc-style will behave just like regular Qt- +# style comments (thus requiring an explicit @brief command for a brief +# description.) +# The default value is: NO. + +JAVADOC_AUTOBRIEF = NO + +# If the QT_AUTOBRIEF tag is set to YES then doxygen will interpret the first +# line (until the first dot) of a Qt-style comment as the brief description. If +# set to NO, the Qt-style will behave just like regular Qt-style comments (thus +# requiring an explicit \brief command for a brief description.) +# The default value is: NO. + +QT_AUTOBRIEF = NO + +# The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make doxygen treat a +# multi-line C++ special comment block (i.e. a block of //! or /// comments) as +# a brief description. This used to be the default behavior. The new default is +# to treat a multi-line C++ comment block as a detailed description. Set this +# tag to YES if you prefer the old behavior instead. +# +# Note that setting this tag to YES also means that rational rose comments are +# not recognized any more. +# The default value is: NO. + +MULTILINE_CPP_IS_BRIEF = NO + +# If the INHERIT_DOCS tag is set to YES then an undocumented member inherits the +# documentation from any documented member that it re-implements. +# The default value is: YES. + +INHERIT_DOCS = YES + +# If the SEPARATE_MEMBER_PAGES tag is set to YES then doxygen will produce a new +# page for each member. If set to NO, the documentation of a member will be part +# of the file/class/namespace that contains it. +# The default value is: NO. + +SEPARATE_MEMBER_PAGES = NO + +# The TAB_SIZE tag can be used to set the number of spaces in a tab. Doxygen +# uses this value to replace tabs by spaces in code fragments. +# Minimum value: 1, maximum value: 16, default value: 4. + +TAB_SIZE = 4 + +# This tag can be used to specify a number of aliases that act as commands in +# the documentation. An alias has the form: +# name=value +# For example adding +# "sideeffect=@par Side Effects:\n" +# will allow you to put the command \sideeffect (or @sideeffect) in the +# documentation, which will result in a user-defined paragraph with heading +# "Side Effects:". You can put \n's in the value part of an alias to insert +# newlines. + +ALIASES = + +# This tag can be used to specify a number of word-keyword mappings (TCL only). +# A mapping has the form "name=value". For example adding "class=itcl::class" +# will allow you to use the command class in the itcl::class meaning. + +TCL_SUBST = + +# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources +# only. 
Doxygen will then generate output that is more tailored for C. For +# instance, some of the names that are used will be different. The list of all +# members will be omitted, etc. +# The default value is: NO. + +OPTIMIZE_OUTPUT_FOR_C = YES + +# Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java or +# Python sources only. Doxygen will then generate output that is more tailored +# for that language. For instance, namespaces will be presented as packages, +# qualified scopes will look different, etc. +# The default value is: NO. + +OPTIMIZE_OUTPUT_JAVA = NO + +# Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran +# sources. Doxygen will then generate output that is tailored for Fortran. +# The default value is: NO. + +OPTIMIZE_FOR_FORTRAN = NO + +# Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL +# sources. Doxygen will then generate output that is tailored for VHDL. +# The default value is: NO. + +OPTIMIZE_OUTPUT_VHDL = NO + +# Doxygen selects the parser to use depending on the extension of the files it +# parses. With this tag you can assign which parser to use for a given +# extension. Doxygen has a built-in mapping, but you can override or extend it +# using this tag. The format is ext=language, where ext is a file extension, and +# language is one of the parsers supported by doxygen: IDL, Java, Javascript, +# C#, C, C++, D, PHP, Objective-C, Python, Fortran (fixed format Fortran: +# FortranFixed, free formatted Fortran: FortranFree, unknown formatted Fortran: +# Fortran. In the later case the parser tries to guess whether the code is fixed +# or free formatted code, this is the default for Fortran type files), VHDL. For +# instance to make doxygen treat .inc files as Fortran files (default is PHP), +# and .f files as C (default is Fortran), use: inc=Fortran f=C. +# +# Note: For files without extension you can use no_extension as a placeholder. +# +# Note that for custom extensions you also need to set FILE_PATTERNS otherwise +# the files are not read by doxygen. + +EXTENSION_MAPPING = + +# If the MARKDOWN_SUPPORT tag is enabled then doxygen pre-processes all comments +# according to the Markdown format, which allows for more readable +# documentation. See http://daringfireball.net/projects/markdown/ for details. +# The output of markdown processing is further processed by doxygen, so you can +# mix doxygen, HTML, and XML commands with Markdown formatting. Disable only in +# case of backward compatibilities issues. +# The default value is: YES. + +MARKDOWN_SUPPORT = YES + +# When the TOC_INCLUDE_HEADINGS tag is set to a non-zero value, all headings up +# to that level are automatically included in the table of contents, even if +# they do not have an id attribute. +# Note: This feature currently applies only to Markdown headings. +# Minimum value: 0, maximum value: 99, default value: 0. +# This tag requires that the tag MARKDOWN_SUPPORT is set to YES. + +TOC_INCLUDE_HEADINGS = 0 + +# When enabled doxygen tries to link words that correspond to documented +# classes, or namespaces to their corresponding documentation. Such a link can +# be prevented in individual cases by putting a % sign in front of the word or +# globally by setting AUTOLINK_SUPPORT to NO. +# The default value is: YES. + +AUTOLINK_SUPPORT = YES + +# If you use STL classes (i.e. std::string, std::vector, etc.) 
but do not want +# to include (a tag file for) the STL sources as input, then you should set this +# tag to YES in order to let doxygen match functions declarations and +# definitions whose arguments contain STL classes (e.g. func(std::string); +# versus func(std::string) {}). This also make the inheritance and collaboration +# diagrams that involve STL classes more complete and accurate. +# The default value is: NO. + +BUILTIN_STL_SUPPORT = NO + +# If you use Microsoft's C++/CLI language, you should set this option to YES to +# enable parsing support. +# The default value is: NO. + +CPP_CLI_SUPPORT = NO + +# Set the SIP_SUPPORT tag to YES if your project consists of sip (see: +# http://www.riverbankcomputing.co.uk/software/sip/intro) sources only. Doxygen +# will parse them like normal C++ but will assume all classes use public instead +# of private inheritance when no explicit protection keyword is present. +# The default value is: NO. + +SIP_SUPPORT = NO + +# For Microsoft's IDL there are propget and propput attributes to indicate +# getter and setter methods for a property. Setting this option to YES will make +# doxygen to replace the get and set methods by a property in the documentation. +# This will only work if the methods are indeed getting or setting a simple +# type. If this is not the case, or you want to show the methods anyway, you +# should set this option to NO. +# The default value is: YES. + +IDL_PROPERTY_SUPPORT = YES + +# If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC +# tag is set to YES then doxygen will reuse the documentation of the first +# member in the group (if any) for the other members of the group. By default +# all members of a group must be documented explicitly. +# The default value is: NO. + +DISTRIBUTE_GROUP_DOC = NO + +# If one adds a struct or class to a group and this option is enabled, then also +# any nested class or struct is added to the same group. By default this option +# is disabled and one has to add nested compounds explicitly via \ingroup. +# The default value is: NO. + +GROUP_NESTED_COMPOUNDS = NO + +# Set the SUBGROUPING tag to YES to allow class member groups of the same type +# (for instance a group of public functions) to be put as a subgroup of that +# type (e.g. under the Public Functions section). Set it to NO to prevent +# subgrouping. Alternatively, this can be done per class using the +# \nosubgrouping command. +# The default value is: YES. + +SUBGROUPING = YES + +# When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and unions +# are shown inside the group in which they are included (e.g. using \ingroup) +# instead of on a separate page (for HTML and Man pages) or section (for LaTeX +# and RTF). +# +# Note that this feature does not work in combination with +# SEPARATE_MEMBER_PAGES. +# The default value is: NO. + +INLINE_GROUPED_CLASSES = NO + +# When the INLINE_SIMPLE_STRUCTS tag is set to YES, structs, classes, and unions +# with only public data fields or simple typedef fields will be shown inline in +# the documentation of the scope in which they are defined (i.e. file, +# namespace, or group documentation), provided this scope is documented. If set +# to NO, structs, classes, and unions are shown on a separate page (for HTML and +# Man pages) or section (for LaTeX and RTF). +# The default value is: NO. 
+ +INLINE_SIMPLE_STRUCTS = NO + +# When TYPEDEF_HIDES_STRUCT tag is enabled, a typedef of a struct, union, or +# enum is documented as struct, union, or enum with the name of the typedef. So +# typedef struct TypeS {} TypeT, will appear in the documentation as a struct +# with name TypeT. When disabled the typedef will appear as a member of a file, +# namespace, or class. And the struct will be named TypeS. This can typically be +# useful for C code in case the coding convention dictates that all compound +# types are typedef'ed and only the typedef is referenced, never the tag name. +# The default value is: NO. + +TYPEDEF_HIDES_STRUCT = NO + +# The size of the symbol lookup cache can be set using LOOKUP_CACHE_SIZE. This +# cache is used to resolve symbols given their name and scope. Since this can be +# an expensive process and often the same symbol appears multiple times in the +# code, doxygen keeps a cache of pre-resolved symbols. If the cache is too small +# doxygen will become slower. If the cache is too large, memory is wasted. The +# cache size is given by this formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range +# is 0..9, the default is 0, corresponding to a cache size of 2^16=65536 +# symbols. At the end of a run doxygen will report the cache usage and suggest +# the optimal cache size from a speed point of view. +# Minimum value: 0, maximum value: 9, default value: 0. + +LOOKUP_CACHE_SIZE = 0 + +#--------------------------------------------------------------------------- +# Build related configuration options +#--------------------------------------------------------------------------- + +# If the EXTRACT_ALL tag is set to YES, doxygen will assume all entities in +# documentation are documented, even if no documentation was available. Private +# class members and static file members will be hidden unless the +# EXTRACT_PRIVATE respectively EXTRACT_STATIC tags are set to YES. +# Note: This will also disable the warnings about undocumented members that are +# normally produced when WARNINGS is set to YES. +# The default value is: NO. + +EXTRACT_ALL = NO + +# If the EXTRACT_PRIVATE tag is set to YES, all private members of a class will +# be included in the documentation. +# The default value is: NO. + +EXTRACT_PRIVATE = NO + +# If the EXTRACT_PACKAGE tag is set to YES, all members with package or internal +# scope will be included in the documentation. +# The default value is: NO. + +EXTRACT_PACKAGE = NO + +# If the EXTRACT_STATIC tag is set to YES, all static members of a file will be +# included in the documentation. +# The default value is: NO. + +EXTRACT_STATIC = YES + +# If the EXTRACT_LOCAL_CLASSES tag is set to YES, classes (and structs) defined +# locally in source files will be included in the documentation. If set to NO, +# only classes defined in header files are included. Does not have any effect +# for Java sources. +# The default value is: YES. + +EXTRACT_LOCAL_CLASSES = NO + +# This flag is only useful for Objective-C code. If set to YES, local methods, +# which are defined in the implementation section but not in the interface are +# included in the documentation. If set to NO, only methods in the interface are +# included. +# The default value is: NO. + +EXTRACT_LOCAL_METHODS = NO + +# If this flag is set to YES, the members of anonymous namespaces will be +# extracted and appear in the documentation as a namespace called +# 'anonymous_namespace{file}', where file will be replaced with the base name of +# the file that contains the anonymous namespace. 
By default anonymous namespace +# are hidden. +# The default value is: NO. + +EXTRACT_ANON_NSPACES = NO + +# If the HIDE_UNDOC_MEMBERS tag is set to YES, doxygen will hide all +# undocumented members inside documented classes or files. If set to NO these +# members will be included in the various overviews, but no documentation +# section is generated. This option has no effect if EXTRACT_ALL is enabled. +# The default value is: NO. + +HIDE_UNDOC_MEMBERS = NO + +# If the HIDE_UNDOC_CLASSES tag is set to YES, doxygen will hide all +# undocumented classes that are normally visible in the class hierarchy. If set +# to NO, these classes will be included in the various overviews. This option +# has no effect if EXTRACT_ALL is enabled. +# The default value is: NO. + +HIDE_UNDOC_CLASSES = NO + +# If the HIDE_FRIEND_COMPOUNDS tag is set to YES, doxygen will hide all friend +# (class|struct|union) declarations. If set to NO, these declarations will be +# included in the documentation. +# The default value is: NO. + +HIDE_FRIEND_COMPOUNDS = NO + +# If the HIDE_IN_BODY_DOCS tag is set to YES, doxygen will hide any +# documentation blocks found inside the body of a function. If set to NO, these +# blocks will be appended to the function's detailed documentation block. +# The default value is: NO. + +HIDE_IN_BODY_DOCS = NO + +# The INTERNAL_DOCS tag determines if documentation that is typed after a +# \internal command is included. If the tag is set to NO then the documentation +# will be excluded. Set it to YES to include the internal documentation. +# The default value is: NO. + +INTERNAL_DOCS = NO + +# If the CASE_SENSE_NAMES tag is set to NO then doxygen will only generate file +# names in lower-case letters. If set to YES, upper-case letters are also +# allowed. This is useful if you have classes or files whose names only differ +# in case and if your file system supports case sensitive file names. Windows +# and Mac users are advised to set this option to NO. +# The default value is: system dependent. + +CASE_SENSE_NAMES = YES + +# If the HIDE_SCOPE_NAMES tag is set to NO then doxygen will show members with +# their full class and namespace scopes in the documentation. If set to YES, the +# scope will be hidden. +# The default value is: NO. + +HIDE_SCOPE_NAMES = NO + +# If the HIDE_COMPOUND_REFERENCE tag is set to NO (default) then doxygen will +# append additional text to a page's title, such as Class Reference. If set to +# YES the compound reference will be hidden. +# The default value is: NO. + +HIDE_COMPOUND_REFERENCE= NO + +# If the SHOW_INCLUDE_FILES tag is set to YES then doxygen will put a list of +# the files that are included by a file in the documentation of that file. +# The default value is: YES. + +SHOW_INCLUDE_FILES = YES + +# If the SHOW_GROUPED_MEMB_INC tag is set to YES then Doxygen will add for each +# grouped member an include statement to the documentation, telling the reader +# which file to include in order to use the member. +# The default value is: NO. + +SHOW_GROUPED_MEMB_INC = NO + +# If the FORCE_LOCAL_INCLUDES tag is set to YES then doxygen will list include +# files with double quotes in the documentation rather than with sharp brackets. +# The default value is: NO. + +FORCE_LOCAL_INCLUDES = NO + +# If the INLINE_INFO tag is set to YES then a tag [inline] is inserted in the +# documentation for inline members. +# The default value is: YES. 
+ +INLINE_INFO = YES + +# If the SORT_MEMBER_DOCS tag is set to YES then doxygen will sort the +# (detailed) documentation of file and class members alphabetically by member +# name. If set to NO, the members will appear in declaration order. +# The default value is: YES. + +SORT_MEMBER_DOCS = YES + +# If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the brief +# descriptions of file, namespace and class members alphabetically by member +# name. If set to NO, the members will appear in declaration order. Note that +# this will also influence the order of the classes in the class list. +# The default value is: NO. + +SORT_BRIEF_DOCS = NO + +# If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen will sort the +# (brief and detailed) documentation of class members so that constructors and +# destructors are listed first. If set to NO the constructors will appear in the +# respective orders defined by SORT_BRIEF_DOCS and SORT_MEMBER_DOCS. +# Note: If SORT_BRIEF_DOCS is set to NO this option is ignored for sorting brief +# member documentation. +# Note: If SORT_MEMBER_DOCS is set to NO this option is ignored for sorting +# detailed member documentation. +# The default value is: NO. + +SORT_MEMBERS_CTORS_1ST = NO + +# If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the hierarchy +# of group names into alphabetical order. If set to NO the group names will +# appear in their defined order. +# The default value is: NO. + +SORT_GROUP_NAMES = NO + +# If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be sorted by +# fully-qualified names, including namespaces. If set to NO, the class list will +# be sorted only by class name, not including the namespace part. +# Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES. +# Note: This option applies only to the class list, not to the alphabetical +# list. +# The default value is: NO. + +SORT_BY_SCOPE_NAME = NO + +# If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to do proper +# type resolution of all parameters of a function it will reject a match between +# the prototype and the implementation of a member function even if there is +# only one candidate or it is obvious which candidate to choose by doing a +# simple string match. By disabling STRICT_PROTO_MATCHING doxygen will still +# accept a match between prototype and implementation in such cases. +# The default value is: NO. + +STRICT_PROTO_MATCHING = NO + +# The GENERATE_TODOLIST tag can be used to enable (YES) or disable (NO) the todo +# list. This list is created by putting \todo commands in the documentation. +# The default value is: YES. + +GENERATE_TODOLIST = YES + +# The GENERATE_TESTLIST tag can be used to enable (YES) or disable (NO) the test +# list. This list is created by putting \test commands in the documentation. +# The default value is: YES. + +GENERATE_TESTLIST = YES + +# The GENERATE_BUGLIST tag can be used to enable (YES) or disable (NO) the bug +# list. This list is created by putting \bug commands in the documentation. +# The default value is: YES. + +GENERATE_BUGLIST = YES + +# The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or disable (NO) +# the deprecated list. This list is created by putting \deprecated commands in +# the documentation. +# The default value is: YES. + +GENERATE_DEPRECATEDLIST= YES + +# The ENABLED_SECTIONS tag can be used to enable conditional documentation +# sections, marked by \if ... \endif and \cond +# ... \endcond blocks. 
+ +ENABLED_SECTIONS = + +# The MAX_INITIALIZER_LINES tag determines the maximum number of lines that the +# initial value of a variable or macro / define can have for it to appear in the +# documentation. If the initializer consists of more lines than specified here +# it will be hidden. Use a value of 0 to hide initializers completely. The +# appearance of the value of individual variables and macros / defines can be +# controlled using \showinitializer or \hideinitializer command in the +# documentation regardless of this setting. +# Minimum value: 0, maximum value: 10000, default value: 30. + +MAX_INITIALIZER_LINES = 30 + +# Set the SHOW_USED_FILES tag to NO to disable the list of files generated at +# the bottom of the documentation of classes and structs. If set to YES, the +# list will mention the files that were used to generate the documentation. +# The default value is: YES. + +SHOW_USED_FILES = YES + +# Set the SHOW_FILES tag to NO to disable the generation of the Files page. This +# will remove the Files entry from the Quick Index and from the Folder Tree View +# (if specified). +# The default value is: YES. + +SHOW_FILES = YES + +# Set the SHOW_NAMESPACES tag to NO to disable the generation of the Namespaces +# page. This will remove the Namespaces entry from the Quick Index and from the +# Folder Tree View (if specified). +# The default value is: YES. + +SHOW_NAMESPACES = YES + +# The FILE_VERSION_FILTER tag can be used to specify a program or script that +# doxygen should invoke to get the current version for each file (typically from +# the version control system). Doxygen will invoke the program by executing (via +# popen()) the command command input-file, where command is the value of the +# FILE_VERSION_FILTER tag, and input-file is the name of an input file provided +# by doxygen. Whatever the program writes to standard output is used as the file +# version. For an example see the documentation. + +FILE_VERSION_FILTER = + +# The LAYOUT_FILE tag can be used to specify a layout file which will be parsed +# by doxygen. The layout file controls the global structure of the generated +# output files in an output format independent way. To create the layout file +# that represents doxygen's defaults, run doxygen with the -l option. You can +# optionally specify a file name after the option, if omitted DoxygenLayout.xml +# will be used as the name of the layout file. +# +# Note that if you run doxygen from a directory containing a file called +# DoxygenLayout.xml, doxygen will parse it automatically even if the LAYOUT_FILE +# tag is left empty. + +LAYOUT_FILE = layout.xml + +# The CITE_BIB_FILES tag can be used to specify one or more bib files containing +# the reference definitions. This must be a list of .bib files. The .bib +# extension is automatically appended if omitted. This requires the bibtex tool +# to be installed. See also http://en.wikipedia.org/wiki/BibTeX for more info. +# For LaTeX the style of the bibliography can be controlled using +# LATEX_BIB_STYLE. To use this feature you need bibtex and perl available in the +# search path. See also \cite for info how to create references. + +CITE_BIB_FILES = + +#--------------------------------------------------------------------------- +# Configuration options related to warning and progress messages +#--------------------------------------------------------------------------- + +# The QUIET tag can be used to turn on/off the messages that are generated to +# standard output by doxygen. 
If QUIET is set to YES this implies that the +# messages are off. +# The default value is: NO. + +QUIET = NO + +# The WARNINGS tag can be used to turn on/off the warning messages that are +# generated to standard error (stderr) by doxygen. If WARNINGS is set to YES +# this implies that the warnings are on. +# +# Tip: Turn warnings on while writing the documentation. +# The default value is: YES. + +WARNINGS = YES + +# If the WARN_IF_UNDOCUMENTED tag is set to YES then doxygen will generate +# warnings for undocumented members. If EXTRACT_ALL is set to YES then this flag +# will automatically be disabled. +# The default value is: YES. + +WARN_IF_UNDOCUMENTED = YES + +# If the WARN_IF_DOC_ERROR tag is set to YES, doxygen will generate warnings for +# potential errors in the documentation, such as not documenting some parameters +# in a documented function, or documenting parameters that don't exist or using +# markup commands wrongly. +# The default value is: YES. + +WARN_IF_DOC_ERROR = YES + +# This WARN_NO_PARAMDOC option can be enabled to get warnings for functions that +# are documented, but have no documentation for their parameters or return +# value. If set to NO, doxygen will only warn about wrong or incomplete +# parameter documentation, but not about the absence of documentation. +# The default value is: NO. + +WARN_NO_PARAMDOC = YES + +# If the WARN_AS_ERROR tag is set to YES then doxygen will immediately stop when +# a warning is encountered. +# The default value is: NO. + +WARN_AS_ERROR = NO + +# The WARN_FORMAT tag determines the format of the warning messages that doxygen +# can produce. The string should contain the $file, $line, and $text tags, which +# will be replaced by the file and line number from which the warning originated +# and the warning text. Optionally the format may contain $version, which will +# be replaced by the version of the file (if it could be obtained via +# FILE_VERSION_FILTER) +# The default value is: $file:$line: $text. + +WARN_FORMAT = "$file:$line: $text" + +# The WARN_LOGFILE tag can be used to specify a file to which warning and error +# messages should be written. If left blank the output is written to standard +# error (stderr). + +WARN_LOGFILE = + +#--------------------------------------------------------------------------- +# Configuration options related to the input files +#--------------------------------------------------------------------------- + +# The INPUT tag is used to specify the files and/or directories that contain +# documented source files. You may enter file names like myfile.cpp or +# directories like /usr/src/myproject. Separate the files or directories with +# spaces. See also FILE_PATTERNS and EXTENSION_MAPPING +# Note: If this tag is empty the current directory is searched. + +INPUT = . \ + ../../inference-engine/src/plugin_api + +# This tag can be used to specify the character encoding of the source files +# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses +# libiconv (or the iconv built into libc) for the transcoding. See the libiconv +# documentation (see: http://www.gnu.org/software/libiconv) for the list of +# possible encodings. +# The default value is: UTF-8. + +INPUT_ENCODING = UTF-8 + +# If the value of the INPUT tag contains directories, you can use the +# FILE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp and +# *.h) to filter out the source-files in the directories. 
+# +# Note that for custom extensions or not directly supported extensions you also +# need to set EXTENSION_MAPPING for the extension otherwise the files are not +# read by doxygen. +# +# If left blank the following patterns are tested:*.c, *.cc, *.cxx, *.cpp, +# *.c++, *.java, *.ii, *.ixx, *.ipp, *.i++, *.inl, *.idl, *.ddl, *.odl, *.h, +# *.hh, *.hxx, *.hpp, *.h++, *.cs, *.d, *.php, *.php4, *.php5, *.phtml, *.inc, +# *.m, *.markdown, *.md, *.mm, *.dox, *.py, *.pyw, *.f90, *.f95, *.f03, *.f08, +# *.f, *.for, *.tcl, *.vhd, *.vhdl, *.ucf and *.qsf. + +FILE_PATTERNS = *.c \ + *.cpp \ + *.c++ \ + *.h \ + *.hpp \ + *.md + +# The RECURSIVE tag can be used to specify whether or not subdirectories should +# be searched for input files as well. +# The default value is: NO. + +RECURSIVE = YES + +# The EXCLUDE tag can be used to specify files and/or directories that should be +# excluded from the INPUT source files. This way you can easily exclude a +# subdirectory from a directory tree whose root is specified with the INPUT tag. +# +# Note that relative paths are relative to the directory from which doxygen is +# run. + +EXCLUDE = + +# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or +# directories that are symbolic links (a Unix file system feature) are excluded +# from the input. +# The default value is: NO. + +EXCLUDE_SYMLINKS = NO + +# If the value of the INPUT tag contains directories, you can use the +# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude +# certain files from those directories. +# +# Note that the wildcards are matched against the file with absolute path, so to +# exclude all test directories for example use the pattern */test/* + +EXCLUDE_PATTERNS = cnn_network_ngraph_impl.hpp \ + ie_imemory_state_internal.hpp \ + ie_memory_state_internal.hpp \ + ie_memory_state_base.hpp \ + convert_function_to_cnn_network.hpp \ + generic_ie.hpp + +# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names +# (namespaces, classes, functions, etc.) that should be excluded from the +# output. The symbol name can be a fully qualified name, a word, or if the +# wildcard * is used, a substring. Examples: ANamespace, AClass, +# AClass::ANamespace, ANamespace::*Test +# +# Note that the wildcards are matched against the file with absolute path, so to +# exclude all test directories use the pattern */test/* + +EXCLUDE_SYMBOLS = + +# The EXAMPLE_PATH tag can be used to specify one or more files or directories +# that contain example code fragments that are included (see the \include +# command). + +EXAMPLE_PATH = ../template_plugin/src \ + ../template_plugin/include \ + ../template_plugin/src/CMakeLists.txt \ + ../template_plugin/tests/functional/CMakeLists.txt \ + ../examples + +# If the value of the EXAMPLE_PATH tag contains directories, you can use the +# EXAMPLE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp and +# *.h) to filter out the source-files in the directories. If left blank all +# files are included. + +EXAMPLE_PATTERNS = *.cpp \ + *.hpp + +# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be +# searched for input files to be used with the \include or \dontinclude commands +# irrespective of the value of the RECURSIVE tag. +# The default value is: NO. + +EXAMPLE_RECURSIVE = YES + +# The IMAGE_PATH tag can be used to specify one or more files or directories +# that contain images that are to be included in the documentation (see the +# \image command). 
+ +IMAGE_PATH = + +# The INPUT_FILTER tag can be used to specify a program that doxygen should +# invoke to filter for each input file. Doxygen will invoke the filter program +# by executing (via popen()) the command: +# +# <filter> <input-file> +# +# where <filter> is the value of the INPUT_FILTER tag, and <input-file> is the +# name of an input file. Doxygen will then use the output that the filter +# program writes to standard output. If FILTER_PATTERNS is specified, this tag +# will be ignored. +# +# Note that the filter must not add or remove lines; it is applied before the +# code is scanned, but not when the output code is generated. If lines are added +# or removed, the anchors will not be placed correctly. +# +# Note that for custom extensions or not directly supported extensions you also +# need to set EXTENSION_MAPPING for the extension otherwise the files are not +# properly processed by doxygen. + +INPUT_FILTER = + +# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern +# basis. Doxygen will compare the file name with each pattern and apply the +# filter if there is a match. The filters are a list of the form: pattern=filter +# (like *.cpp=my_cpp_filter). See INPUT_FILTER for further information on how +# filters are used. If the FILTER_PATTERNS tag is empty or if none of the +# patterns match the file name, INPUT_FILTER is applied. +# +# Note that for custom extensions or not directly supported extensions you also +# need to set EXTENSION_MAPPING for the extension otherwise the files are not +# properly processed by doxygen. + +FILTER_PATTERNS = + +# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using +# INPUT_FILTER) will also be used to filter the input files that are used for +# producing the source files to browse (i.e. when SOURCE_BROWSER is set to YES). +# The default value is: NO. + +FILTER_SOURCE_FILES = NO + +# The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file +# pattern. A pattern will override the setting for FILTER_PATTERN (if any) and +# it is also possible to disable source filtering for a specific pattern using +# *.ext= (so without naming a filter). +# This tag requires that the tag FILTER_SOURCE_FILES is set to YES. + +FILTER_SOURCE_PATTERNS = + +# If the USE_MDFILE_AS_MAINPAGE tag refers to the name of a markdown file that +# is part of the input, its contents will be placed on the main page +# (index.html). This can be useful if you have a project on for instance GitHub +# and want to reuse the introduction page also for the doxygen output. + +USE_MDFILE_AS_MAINPAGE = + +#--------------------------------------------------------------------------- +# Configuration options related to source browsing +#--------------------------------------------------------------------------- + +# If the SOURCE_BROWSER tag is set to YES then a list of source files will be +# generated. Documented entities will be cross-referenced with these sources. +# +# Note: To get rid of all source code in the generated output, make sure that +# also VERBATIM_HEADERS is set to NO. +# The default value is: NO. + +SOURCE_BROWSER = NO + +# Setting the INLINE_SOURCES tag to YES will include the body of functions, +# classes and enums directly into the documentation. +# The default value is: NO. + +INLINE_SOURCES = NO + +# Setting the STRIP_CODE_COMMENTS tag to YES will instruct doxygen to hide any +# special comment blocks from generated source code fragments. Normal C, C++ and +# Fortran comments will always remain visible. +# The default value is: YES.
+ +STRIP_CODE_COMMENTS = NO + +# If the REFERENCED_BY_RELATION tag is set to YES then for each documented +# function all documented functions referencing it will be listed. +# The default value is: NO. + +REFERENCED_BY_RELATION = NO + +# If the REFERENCES_RELATION tag is set to YES then for each documented function +# all documented entities called/used by that function will be listed. +# The default value is: NO. + +REFERENCES_RELATION = NO + +# If the REFERENCES_LINK_SOURCE tag is set to YES and SOURCE_BROWSER tag is set +# to YES then the hyperlinks from functions in REFERENCES_RELATION and +# REFERENCED_BY_RELATION lists will link to the source code. Otherwise they will +# link to the documentation. +# The default value is: YES. + +REFERENCES_LINK_SOURCE = YES + +# If SOURCE_TOOLTIPS is enabled (the default) then hovering a hyperlink in the +# source code will show a tooltip with additional information such as prototype, +# brief description and links to the definition and documentation. Since this +# will make the HTML file larger and loading of large files a bit slower, you +# can opt to disable this feature. +# The default value is: YES. +# This tag requires that the tag SOURCE_BROWSER is set to YES. + +SOURCE_TOOLTIPS = YES + +# If the USE_HTAGS tag is set to YES then the references to source code will +# point to the HTML generated by the htags(1) tool instead of doxygen built-in +# source browser. The htags tool is part of GNU's global source tagging system +# (see http://www.gnu.org/software/global/global.html). You will need version +# 4.8.6 or higher. +# +# To use it do the following: +# - Install the latest version of global +# - Enable SOURCE_BROWSER and USE_HTAGS in the config file +# - Make sure the INPUT points to the root of the source tree +# - Run doxygen as normal +# +# Doxygen will invoke htags (and that will in turn invoke gtags), so these +# tools must be available from the command line (i.e. in the search path). +# +# The result: instead of the source browser generated by doxygen, the links to +# source code will now point to the output of htags. +# The default value is: NO. +# This tag requires that the tag SOURCE_BROWSER is set to YES. + +USE_HTAGS = NO + +# If the VERBATIM_HEADERS tag is set the YES then doxygen will generate a +# verbatim copy of the header file for each class for which an include is +# specified. Set to NO to disable this. +# See also: Section \class. +# The default value is: YES. + +VERBATIM_HEADERS = YES + +#--------------------------------------------------------------------------- +# Configuration options related to the alphabetical class index +#--------------------------------------------------------------------------- + +# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index of all +# compounds will be generated. Enable this if the project contains a lot of +# classes, structs, unions or interfaces. +# The default value is: YES. + +ALPHABETICAL_INDEX = YES + +# The COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns in +# which the alphabetical index list will be split. +# Minimum value: 1, maximum value: 20, default value: 5. +# This tag requires that the tag ALPHABETICAL_INDEX is set to YES. + +COLS_IN_ALPHA_INDEX = 5 + +# In case all classes in a project start with a common prefix, all classes will +# be put under the same header in the alphabetical index. The IGNORE_PREFIX tag +# can be used to specify a prefix (or a list of prefixes) that should be ignored +# while generating the index headers. 
+# This tag requires that the tag ALPHABETICAL_INDEX is set to YES. + +IGNORE_PREFIX = + +#--------------------------------------------------------------------------- +# Configuration options related to the HTML output +#--------------------------------------------------------------------------- + +# If the GENERATE_HTML tag is set to YES, doxygen will generate HTML output +# The default value is: YES. + +GENERATE_HTML = YES + +# The HTML_OUTPUT tag is used to specify where the HTML docs will be put. If a +# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of +# it. +# The default directory is: html. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_OUTPUT = html + +# The HTML_FILE_EXTENSION tag can be used to specify the file extension for each +# generated HTML page (for example: .htm, .php, .asp). +# The default value is: .html. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_FILE_EXTENSION = .html + +# The HTML_HEADER tag can be used to specify a user-defined HTML header file for +# each generated HTML page. If the tag is left blank doxygen will generate a +# standard header. +# +# To get valid HTML the header file that includes any scripts and style sheets +# that doxygen needs, which is dependent on the configuration options used (e.g. +# the setting GENERATE_TREEVIEW). It is highly recommended to start with a +# default header using +# doxygen -w html new_header.html new_footer.html new_stylesheet.css +# YourConfigFile +# and then modify the file new_header.html. See also section "Doxygen usage" +# for information on how to generate the default header that doxygen normally +# uses. +# Note: The header is subject to change so you typically have to regenerate the +# default header when upgrading to a newer version of doxygen. For a description +# of the possible markers and block names see the documentation. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_HEADER = + +# The HTML_FOOTER tag can be used to specify a user-defined HTML footer for each +# generated HTML page. If the tag is left blank doxygen will generate a standard +# footer. See HTML_HEADER for more information on how to generate a default +# footer and what special commands can be used inside the footer. See also +# section "Doxygen usage" for information on how to generate the default footer +# that doxygen normally uses. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_FOOTER = + +# The HTML_STYLESHEET tag can be used to specify a user-defined cascading style +# sheet that is used by each HTML page. It can be used to fine-tune the look of +# the HTML output. If left blank doxygen will generate a default style sheet. +# See also section "Doxygen usage" for information on how to generate the style +# sheet that doxygen normally uses. +# Note: It is recommended to use HTML_EXTRA_STYLESHEET instead of this tag, as +# it is more robust and this tag (HTML_STYLESHEET) will in the future become +# obsolete. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_STYLESHEET = + +# The HTML_EXTRA_STYLESHEET tag can be used to specify additional user-defined +# cascading style sheets that are included after the standard style sheets +# created by doxygen. Using this option one can overrule certain style aspects. +# This is preferred over using HTML_STYLESHEET since it does not replace the +# standard style sheet and is therefore more robust against future updates. 
+# Doxygen will copy the style sheet files to the output directory. +# Note: The order of the extra style sheet files is of importance (e.g. the last +# style sheet in the list overrules the setting of the previous ones in the +# list). For an example see the documentation. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_EXTRA_STYLESHEET = + +# The HTML_EXTRA_FILES tag can be used to specify one or more extra images or +# other source files which should be copied to the HTML output directory. Note +# that these files will be copied to the base HTML output directory. Use the +# $relpath^ marker in the HTML_HEADER and/or HTML_FOOTER files to load these +# files. In the HTML_STYLESHEET file, use the file name only. Also note that the +# files will be copied as-is; there are no commands or markers available. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_EXTRA_FILES = + +# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. Doxygen +# will adjust the colors in the style sheet and background images according to +# this color. Hue is specified as an angle on a colorwheel, see +# http://en.wikipedia.org/wiki/Hue for more information. For instance the value +# 0 represents red, 60 is yellow, 120 is green, 180 is cyan, 240 is blue, 300 +# purple, and 360 is red again. +# Minimum value: 0, maximum value: 359, default value: 220. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_COLORSTYLE_HUE = 220 + +# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of the colors +# in the HTML output. For a value of 0 the output will use grayscales only. A +# value of 255 will produce the most vivid colors. +# Minimum value: 0, maximum value: 255, default value: 100. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_COLORSTYLE_SAT = 100 + +# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to the +# luminance component of the colors in the HTML output. Values below 100 +# gradually make the output lighter, whereas values above 100 make the output +# darker. The value divided by 100 is the actual gamma applied, so 80 represents +# a gamma of 0.8, The value 220 represents a gamma of 2.2, and 100 does not +# change the gamma. +# Minimum value: 40, maximum value: 240, default value: 80. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_COLORSTYLE_GAMMA = 80 + +# If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML +# page will contain the date and time when the page was generated. Setting this +# to YES can help to show when doxygen was last run and thus if the +# documentation is up to date. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_TIMESTAMP = NO + +# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML +# documentation will contain sections that can be hidden and shown after the +# page has loaded. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_DYNAMIC_SECTIONS = NO + +# With HTML_INDEX_NUM_ENTRIES one can control the preferred number of entries +# shown in the various tree structured indices initially; the user can expand +# and collapse entries dynamically later on. Doxygen will expand the tree to +# such a level that at most the specified number of entries are visible (unless +# a fully collapsed tree already exceeds this amount). 
So setting the number of +# entries 1 will produce a full collapsed tree by default. 0 is a special value +# representing an infinite number of entries and will result in a full expanded +# tree by default. +# Minimum value: 0, maximum value: 9999, default value: 100. +# This tag requires that the tag GENERATE_HTML is set to YES. + +HTML_INDEX_NUM_ENTRIES = 100 + +# If the GENERATE_DOCSET tag is set to YES, additional index files will be +# generated that can be used as input for Apple's Xcode 3 integrated development +# environment (see: http://developer.apple.com/tools/xcode/), introduced with +# OSX 10.5 (Leopard). To create a documentation set, doxygen will generate a +# Makefile in the HTML output directory. Running make will produce the docset in +# that directory and running make install will install the docset in +# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find it at +# startup. See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html +# for more information. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_DOCSET = NO + +# This tag determines the name of the docset feed. A documentation feed provides +# an umbrella under which multiple documentation sets from a single provider +# (such as a company or product suite) can be grouped. +# The default value is: Doxygen generated docs. +# This tag requires that the tag GENERATE_DOCSET is set to YES. + +DOCSET_FEEDNAME = "Doxygen generated docs" + +# This tag specifies a string that should uniquely identify the documentation +# set bundle. This should be a reverse domain-name style string, e.g. +# com.mycompany.MyDocSet. Doxygen will append .docset to the name. +# The default value is: org.doxygen.Project. +# This tag requires that the tag GENERATE_DOCSET is set to YES. + +DOCSET_BUNDLE_ID = org.doxygen.Project + +# The DOCSET_PUBLISHER_ID tag specifies a string that should uniquely identify +# the documentation publisher. This should be a reverse domain-name style +# string, e.g. com.mycompany.MyDocSet.documentation. +# The default value is: org.doxygen.Publisher. +# This tag requires that the tag GENERATE_DOCSET is set to YES. + +DOCSET_PUBLISHER_ID = org.doxygen.Publisher + +# The DOCSET_PUBLISHER_NAME tag identifies the documentation publisher. +# The default value is: Publisher. +# This tag requires that the tag GENERATE_DOCSET is set to YES. + +DOCSET_PUBLISHER_NAME = Publisher + +# If the GENERATE_HTMLHELP tag is set to YES then doxygen generates three +# additional HTML index files: index.hhp, index.hhc, and index.hhk. The +# index.hhp is a project file that can be read by Microsoft's HTML Help Workshop +# (see: http://www.microsoft.com/en-us/download/details.aspx?id=21138) on +# Windows. +# +# The HTML Help Workshop contains a compiler that can convert all HTML output +# generated by doxygen into a single compiled HTML file (.chm). Compiled HTML +# files are now used as the Windows 98 help format, and will replace the old +# Windows help format (.hlp) on all Windows platforms in the future. Compressed +# HTML files also contain an index, a table of contents, and you can search for +# words in the documentation. The HTML workshop also contains a viewer for +# compressed HTML files. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_HTMLHELP = NO + +# The CHM_FILE tag can be used to specify the file name of the resulting .chm +# file. 
You can add a path in front of the file if the result should not be +# written to the html output directory. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +CHM_FILE = + +# The HHC_LOCATION tag can be used to specify the location (absolute path +# including file name) of the HTML help compiler (hhc.exe). If non-empty, +# doxygen will try to run the HTML help compiler on the generated index.hhp. +# The file has to be specified with full path. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +HHC_LOCATION = + +# The GENERATE_CHI flag controls if a separate .chi index file is generated +# (YES) or that it should be included in the master .chm file (NO). +# The default value is: NO. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +GENERATE_CHI = NO + +# The CHM_INDEX_ENCODING is used to encode HtmlHelp index (hhk), content (hhc) +# and project file content. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +CHM_INDEX_ENCODING = + +# The BINARY_TOC flag controls whether a binary table of contents is generated +# (YES) or a normal table of contents (NO) in the .chm file. Furthermore it +# enables the Previous and Next buttons. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +BINARY_TOC = NO + +# The TOC_EXPAND flag can be set to YES to add extra items for group members to +# the table of contents of the HTML help documentation and to the tree view. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTMLHELP is set to YES. + +TOC_EXPAND = NO + +# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and +# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated that +# can be used as input for Qt's qhelpgenerator to generate a Qt Compressed Help +# (.qch) of the generated HTML documentation. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_QHP = NO + +# If the QHG_LOCATION tag is specified, the QCH_FILE tag can be used to specify +# the file name of the resulting .qch file. The path specified is relative to +# the HTML output folder. +# This tag requires that the tag GENERATE_QHP is set to YES. + +QCH_FILE = + +# The QHP_NAMESPACE tag specifies the namespace to use when generating Qt Help +# Project output. For more information please see Qt Help Project / Namespace +# (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#namespace). +# The default value is: org.doxygen.Project. +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_NAMESPACE = org.doxygen.Project + +# The QHP_VIRTUAL_FOLDER tag specifies the namespace to use when generating Qt +# Help Project output. For more information please see Qt Help Project / Virtual +# Folders (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#virtual- +# folders). +# The default value is: doc. +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_VIRTUAL_FOLDER = doc + +# If the QHP_CUST_FILTER_NAME tag is set, it specifies the name of a custom +# filter to add. For more information please see Qt Help Project / Custom +# Filters (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom- +# filters). +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_CUST_FILTER_NAME = + +# The QHP_CUST_FILTER_ATTRS tag specifies the list of the attributes of the +# custom filter to add. 
For more information please see Qt Help Project / Custom +# Filters (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom- +# filters). +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_CUST_FILTER_ATTRS = + +# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this +# project's filter section matches. Qt Help Project / Filter Attributes (see: +# http://qt-project.org/doc/qt-4.8/qthelpproject.html#filter-attributes). +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHP_SECT_FILTER_ATTRS = + +# The QHG_LOCATION tag can be used to specify the location of Qt's +# qhelpgenerator. If non-empty doxygen will try to run qhelpgenerator on the +# generated .qhp file. +# This tag requires that the tag GENERATE_QHP is set to YES. + +QHG_LOCATION = + +# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files will be +# generated, together with the HTML files, they form an Eclipse help plugin. To +# install this plugin and make it available under the help contents menu in +# Eclipse, the contents of the directory containing the HTML and XML files needs +# to be copied into the plugins directory of eclipse. The name of the directory +# within the plugins directory should be the same as the ECLIPSE_DOC_ID value. +# After copying Eclipse needs to be restarted before the help appears. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +GENERATE_ECLIPSEHELP = NO + +# A unique identifier for the Eclipse help plugin. When installing the plugin +# the directory name containing the HTML and XML files should also have this +# name. Each documentation set should have its own identifier. +# The default value is: org.doxygen.Project. +# This tag requires that the tag GENERATE_ECLIPSEHELP is set to YES. + +ECLIPSE_DOC_ID = org.doxygen.Project + +# If you want full control over the layout of the generated HTML pages it might +# be necessary to disable the index and replace it with your own. The +# DISABLE_INDEX tag can be used to turn on/off the condensed index (tabs) at top +# of each HTML page. A value of NO enables the index and the value YES disables +# it. Since the tabs in the index contain the same information as the navigation +# tree, you can set this option to YES if you also set GENERATE_TREEVIEW to YES. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. + +DISABLE_INDEX = NO + +# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index +# structure should be generated to display hierarchical information. If the tag +# value is set to YES, a side panel will be generated containing a tree-like +# index structure (just like the one that is generated for HTML Help). For this +# to work a browser that supports JavaScript, DHTML, CSS and frames is required +# (i.e. any modern browser). Windows users are probably better off using the +# HTML help feature. Via custom style sheets (see HTML_EXTRA_STYLESHEET) one can +# further fine-tune the look of the index. As an example, the default style +# sheet generated by doxygen has an example that shows how to put an image at +# the root of the tree instead of the PROJECT_NAME. Since the tree basically has +# the same information as the tab index, you could consider setting +# DISABLE_INDEX to YES when enabling this option. +# The default value is: NO. +# This tag requires that the tag GENERATE_HTML is set to YES. 
+
+GENERATE_TREEVIEW = NO
+
+# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values that
+# doxygen will group on one line in the generated HTML documentation.
+#
+# Note that a value of 0 will completely suppress the enum values from appearing
+# in the overview section.
+# Minimum value: 0, maximum value: 20, default value: 4.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+ENUM_VALUES_PER_LINE = 1
+
+# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be used
+# to set the initial width (in pixels) of the frame in which the tree is shown.
+# Minimum value: 0, maximum value: 1500, default value: 250.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+TREEVIEW_WIDTH = 250
+
+# If the EXT_LINKS_IN_WINDOW option is set to YES, doxygen will open links to
+# external symbols imported via tag files in a separate window.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+EXT_LINKS_IN_WINDOW = NO
+
+# Use this tag to change the font size of LaTeX formulas included as images in
+# the HTML documentation. When you change the font size after a successful
+# doxygen run you need to manually remove any form_*.png images from the HTML
+# output directory to force them to be regenerated.
+# Minimum value: 8, maximum value: 50, default value: 10.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+FORMULA_FONTSIZE = 10
+
+# Use the FORMULA_TRANSPARENT tag to determine whether or not the images
+# generated for formulas are transparent PNGs. Transparent PNGs are not
+# supported properly for IE 6.0, but are supported on all modern browsers.
+#
+# Note that when changing this option you need to delete any form_*.png files in
+# the HTML output directory before the changes take effect.
+# The default value is: YES.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+FORMULA_TRANSPARENT = YES
+
+# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax (see
+# http://www.mathjax.org) which uses client-side JavaScript for the rendering
+# instead of using pre-rendered bitmaps. Use this if you do not have LaTeX
+# installed or if you want the formulas to look prettier in the HTML output.
+# When enabled you may also need to install MathJax separately and configure
+# the path to it using the MATHJAX_RELPATH option.
+# The default value is: NO.
+# This tag requires that the tag GENERATE_HTML is set to YES.
+
+USE_MATHJAX = NO
+
+# When MathJax is enabled you can set the default output format to be used for
+# the MathJax output. See the MathJax site (see:
+# http://docs.mathjax.org/en/latest/output.html) for more details.
+# Possible values are: HTML-CSS (which is slower, but has the best
+# compatibility), NativeMML (i.e. MathML) and SVG.
+# The default value is: HTML-CSS.
+# This tag requires that the tag USE_MATHJAX is set to YES.
+
+MATHJAX_FORMAT = HTML-CSS
+
+# When MathJax is enabled you need to specify the location relative to the HTML
+# output directory using the MATHJAX_RELPATH option. The destination directory
+# should contain the MathJax.js script. For instance, if the mathjax directory
+# is located at the same level as the HTML output directory, then
+# MATHJAX_RELPATH should be ../mathjax. The default value points to the MathJax
+# Content Delivery Network so you can quickly see the result without installing
+# MathJax. However, it is strongly recommended to install a local copy of
+# MathJax from http://www.mathjax.org before deployment.
+# The default value is: http://cdn.mathjax.org/mathjax/latest.
+# This tag requires that the tag USE_MATHJAX is set to YES.
+
+MATHJAX_RELPATH = http://cdn.mathjax.org/mathjax/latest
+
+# The MATHJAX_EXTENSIONS tag can be used to specify one or more MathJax
+# extension names that should be enabled during MathJax rendering. For example
+# MATHJAX_EXTENSIONS = TeX/AMSmath TeX/AMSsymbols
+# This tag requires that the tag USE_MATHJAX is set to YES.
+
+MATHJAX_EXTENSIONS =
+
+# The MATHJAX_CODEFILE tag can be used to specify a file with javascript pieces
+# of code that will be used on startup of the MathJax code. See the MathJax site
+# (see: http://docs.mathjax.org/en/latest/output.html) for more details. For an
+# example see the documentation.
+# This tag requires that the tag USE_MATHJAX is set to YES.
+
+MATHJAX_CODEFILE =
+
+# When the SEARCHENGINE tag is enabled doxygen will generate a search box for
+# the HTML output. The underlying search engine uses javascript and DHTML and
+# should work on any modern browser. Note that when using HTML help
+# (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets (GENERATE_DOCSET)
+# there is already a search function so this one should typically be disabled.
+# For large projects the javascript based search engine can be slow, in which
+# case enabling SERVER_BASED_SEARCH may provide a better solution. It is
+# possible to search using the keyboard; to jump to the search box use
+# <access key> + S (what the <access key> is depends on the OS and browser,
+# but it is typically <CTRL>, <ALT>/