diff --git a/.gitattributes b/.gitattributes index 4d1315ede4efec..de9c0b51d763cd 100644 --- a/.gitattributes +++ b/.gitattributes @@ -63,3 +63,9 @@ #*.PDF diff=astextplain #*.rtf diff=astextplain #*.RTF diff=astextplain + +*.PNG filter=lfs diff=lfs merge=lfs -text +*.png filter=lfs diff=lfs merge=lfs -text +*.jpg filter=lfs diff=lfs merge=lfs -text +*.gif filter=lfs diff=lfs merge=lfs -text +*.vsdx filter=lfs diff=lfs merge=lfs -text diff --git a/docs/HOWTO/Custom_Layers_Guide.md b/docs/HOWTO/Custom_Layers_Guide.md new file mode 100644 index 00000000000000..ddbd8126798aaa --- /dev/null +++ b/docs/HOWTO/Custom_Layers_Guide.md @@ -0,0 +1,212 @@ +# Custom Layers Guide {#openvino_docs_HOWTO_Custom_Layers_Guide} + +The Intel® Distribution of OpenVINO™ toolkit supports neural network model layers in multiple frameworks including TensorFlow*, Caffe*, MXNet*, Kaldi* and ONYX*. The list of known layers is different for each of the supported frameworks. To see the layers supported by your framework, refer to [supported frameworks](../MO_DG/prepare_model/Supported_Frameworks_Layers.md). + +Custom layers are layers that are not included in the list of known layers. If your topology contains any layers that are not in the list of known layers, the Model Optimizer classifies them as custom. + +This guide illustrates the workflow for running inference on topologies featuring custom layers, allowing you to plug in your own implementation for existing or completely new layers. +For a step-by-step example of creating and executing a custom layer, see the [Custom Layer Implementation Tutorials for Linux and Windows.](https://github.com/david-drew/OpenVINO-Custom-Layers/tree/master/2019.r2.0) + +## Terms used in this guide + +- *Layer* — The abstract concept of a math function that is selected for a specific purpose (relu, sigmoid, tanh, convolutional). This is one of a sequential series of building blocks within the neural network. +- *Kernel* — The implementation of a layer function, in this case, the math programmed (in C++ and Python) to perform the layer operation for target hardware (CPU or GPU). +- *Intermediate Representation (IR)* — Neural Network used only by the Inference Engine in OpenVINO abstracting the different frameworks and describing topology, layer parameters and weights. +The original format will be a supported framework such as TensorFlow, Caffe, or MXNet. + +- *Model Extension Generator* — Generates template source code files for each of the extensions needed by the Model Optimizer and the Inference Engine. + +- *Inference Engine Extension* — Device-specific module implementing custom layers (a set of kernels). + + +## Custom Layer Overview + +The [Model Optimizer](https://docs.openvinotoolkit.org/2019_R1.1/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) searches the list of known layers for each layer contained in the input model topology before building the model's internal representation, optimizing the model, and producing the Intermediate Representation files. + +The [Inference Engine](https://docs.openvinotoolkit.org/2019_R1.1/_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide.html) loads the layers from the input model IR files into the specified device plugin, which will search a list of known layer implementations for the device. If your topology contains layers that are not in the list of known layers for the device, the Inference Engine considers the layer to be unsupported and reports an error. 
To see the layers that are supported by each device plugin for the Inference Engine, refer to the [Supported Devices](https://docs.openvinotoolkit.org/2019_R1.1/_docs_IE_DG_supported_plugins_Supported_Devices.html) documentation. +
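+
+You can also check device support for a particular model programmatically through the Query API. The following is a minimal sketch (the model file names are placeholders and error handling is omitted):
+
+```cpp
+#include <inference_engine.hpp>
+#include <iostream>
+
+int main() {
+    InferenceEngine::Core core;
+    // Read the IR produced by the Model Optimizer (placeholder file names).
+    auto network = core.ReadNetwork("model.xml", "model.bin");
+    // Ask the device plugin which layers of this network it can execute.
+    auto queryResult = core.QueryNetwork(network, "GPU");
+    for (const auto& layer : queryResult.supportedLayersMap) {
+        // Maps a layer name to the device that will execute it.
+        std::cout << layer.first << " -> " << layer.second << std::endl;
+    }
+    return 0;
+}
+```
+
+Layers missing from `supportedLayersMap` are the ones that need a custom layer extension (or a fallback device) before the model can run on that plugin.
+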
+**Note:** If a device doesn't support a particular layer, an alternative to creating a new custom layer is to target an additional device using the HETERO plugin. The [Heterogeneous Plugin](https://docs.openvinotoolkit.org/2019_R1.1/_docs_IE_DG_supported_plugins_HETERO.html) may be used to run an inference model on multiple devices allowing the unsupported layers on one device to "fallback" to run on another device (e.g., CPU) that does support those layers. + +## Custom Layer Implementation Workflow + +When implementing a custom layer for your pre-trained model in the Intel® Distribution of OpenVINO™ toolkit, you will need to add extensions to both the Model Optimizer and the Inference Engine. + +## Custom Layer Extensions for the Model Optimizer + +The following figure shows the basic processing steps for the Model Optimizer highlighting the two necessary custom layer extensions, the Custom Layer Extractor and the Custom Layer Operation. + +![](img/MO_extensions_flow.png) + + +The Model Optimizer first extracts information from the input model which includes the topology of the model layers along with parameters, input and output format, etc., for each layer. The model is then optimized from the various known characteristics of the layers, interconnects, and data flow which partly comes from the layer operation providing details including the shape of the output for each layer. Finally, the optimized model is output to the model IR files needed by the Inference Engine to run the model. + +The Model Optimizer starts with a library of known extractors and operations for each [supported model framework](https://docs.openvinotoolkit.org/2019_R1.1/_docs_MO_DG_prepare_model_Supported_Frameworks_Layers.html) which must be extended to use each unknown custom layer. The custom layer extensions needed by the Model Optimizer are: + +- Custom Layer Extractor + - Responsible for identifying the custom layer operation and extracting the parameters for each instance of the custom layer. The layer parameters are stored per instance and used by the layer operation before finally appearing in the output IR. Typically the input layer parameters are unchanged, which is the case covered by this tutorial. +- Custom Layer Operation + - Responsible for specifying the attributes that are supported by the custom layer and computing the output shape for each instance of the custom layer from its parameters.
The `--mo-op` command-line argument shown in the examples below generates a custom layer operation for the Model Optimizer. + +## Custom Layer Extensions for the Inference Engine + +The following figure shows the basic flow for the Inference Engine highlighting two custom layer extensions for the CPU and GPU Plugins, the Custom Layer CPU extension and the Custom Layer GPU Extension. + +![](img/IE_extensions_flow.png) + +Each device plugin includes a library of optimized implementations to execute known layer operations which must be extended to execute a custom layer. The custom layer extension is implemented according to the target device: + +- Custom Layer CPU Extension + - A compiled shared library (.so or .dll binary) needed by the CPU Plugin for executing the custom layer on the CPU. +- Custom Layer GPU Extension + - OpenCL source code (.cl) for the custom layer kernel that will be compiled to execute on the GPU along with a layer description file (.xml) needed by the GPU Plugin for the custom layer kernel. + +## Model Extension Generator + +Using answers to interactive questions or a *.json* configuration file, the Model Extension Generator tool generates template source code files for each of the extensions needed by the Model Optimizer and the Inference Engine. To complete the implementation of each extension, the template functions may need to be edited to fill-in details specific to the custom layer or the actual custom layer functionality itself. + +### Command-line + +The Model Extension Generator is included in the Intel® Distribution of OpenVINO™ toolkit installation and is run using the command (here with the "--help" option): + +```bash +python3 /opt/intel/openvino/deployment_tools/tools/extension_generator/extgen.py new --help +``` + +where the output will appear similar to: + +``` +usage: You can use any combination of the following arguments: + +Arguments to configure extension generation in the interactive mode: + +optional arguments: + -h, --help show this help message and exit + --mo-caffe-ext generate a Model Optimizer Caffe* extractor + --mo-mxnet-ext generate a Model Optimizer MXNet* extractor + --mo-tf-ext generate a Model Optimizer TensorFlow* extractor + --mo-op generate a Model Optimizer operation + --ie-cpu-ext generate an Inference Engine CPU extension + --ie-gpu-ext generate an Inference Engine GPU extension + --output_dir OUTPUT_DIR + set an output directory. If not specified, the current + directory is used by default. +``` + +The available command-line arguments are used to specify which extension(s) to generate templates for the Model Optimizer or Inference Engine. The generated extension files for each argument will appear starting from the top of the output directory as follows: + +Command-line Argument | Output Directory Location | +--------------------- | ------------------------------ | +`--mo-caffe-ext` | user_mo_extensions/front/caffe | +`--mo-mxnet-ext` | user_mo_extensions/front/mxnet | +`--mo-tf-ext` | user_mo_extensions/front/tf | +`--mo-op` | user_mo_extensions/ops | +`--ie-cpu-ext` | user_ie_extensions/cpu | +`--ie-gpu-ext` | user_ie_extensions/gpu | + +### Extension Workflow + +The workflow for each generated extension follows the same basic steps: + +![](img/MEG_generic_flow.png) + +**Step 1: Generate:** Use the Model Extension Generator to generate the Custom Layer Template Files. + +**Step 2: Edit:** Edit the Custom Layer Template Files as necessary to create the specialized Custom Layer Extension Source Code. 
+
+**Step 3: Specify:** Specify the custom layer extension locations to be used by the Model Optimizer or Inference Engine.
+
+## Caffe\* Models with Custom Layers
+
+If your Caffe\* model has custom layers:
+
+**Register the custom layers as extensions to the Model Optimizer**. For instructions, see [Extending Model Optimizer with New Primitives](../MO_DG/prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md). When your custom layers are registered as extensions, the Model Optimizer generates a valid and optimized Intermediate Representation. You will need a small amount of Python\* code that lets the Model Optimizer:
+
+- Generate a valid Intermediate Representation according to the rules you specified.
+- Be independent from the availability of Caffe on your computer.
+
+If your model contains custom layers, it is important to understand the internal workflow of the Model Optimizer. Consider the following example.
+
+**Example**:
+
+The network has:
+
+* One input layer (#1)
+* One output layer (#5)
+* Three internal layers (#2, 3, 4)
+
+The custom and standard layer types are:
+
+* Layers #2 and #5 are implemented as Model Optimizer extensions.
+* Layers #1 and #4 are supported in the Model Optimizer out of the box.
+* Layer #3 is neither in the list of supported layers nor in extensions, but is specified in CustomLayersMapping.xml.
+
+> **NOTE**: If any of the layers are not in one of the three categories described above, the Model Optimizer fails with an appropriate message and a link to the corresponding question in the [Model Optimizer FAQ](../MO_DG/prepare_model/Model_Optimizer_FAQ.md).
+
+The general process is as shown:
+
+![Example custom layer network](img/mo_caffe_priorities.png)
+
+ +**Step 1:** The example model is fed to the Model Optimizer that **loads the model** with the special parser built on top of the `caffe.proto` file. In case of failure, the Model Optimizer asks you to prepare the parser that can read the model. For more information, refer to the Model Optimizer, FAQ #1. + +**Step 2:** The Model Optimizer **extracts the attributes of all layers** by going through the list of layers and attempting to find the appropriate extractor. In order of priority, the Model Optimizer checks if the layer is: + +* A. Registered as a Model Optimizer extension +* B. Registered as a standard Model Optimizer layer + +When the Model Optimizer finds a satisfying condition from the list above, it extracts the attributes according to the following rules: + +* For A. - takes only the parameters specified in the extension +* For B. - takes only the parameters specified in the standard extractor +
+
+**Step 3:** The Model Optimizer **calculates the output shape of all layers**. The logic is the same as for the extractor priorities above. **Important:** the Model Optimizer always takes the first available option.
+
+**Step 4:** The Model Optimizer **optimizes the original model and produces the Intermediate Representation (IR)**: an `.xml` file describing the topology and a `.bin` file with the weights.
+
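+
+Once the IR is produced, the Inference Engine side needs the device-specific extensions described above at load time. The following is a minimal sketch of loading a generated CPU extension library together with the IR; the file names are placeholders:
+
+```cpp
+#include <inference_engine.hpp>
+
+int main() {
+    InferenceEngine::Core core;
+    // Load the compiled Custom Layer CPU Extension (.so/.dll) into the CPU plugin.
+    auto extension = std::make_shared<InferenceEngine::Extension>("libuser_cpu_extension.so");
+    core.AddExtension(extension, "CPU");
+    // For a GPU custom kernel, the .xml kernel description would be passed instead:
+    // core.SetConfig({{InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE, "user_gpu_kernel.xml"}}, "GPU");
+
+    // Read the IR produced by the Model Optimizer and load it on the device.
+    auto network = core.ReadNetwork("model.xml", "model.bin");
+    auto executableNetwork = core.LoadNetwork(network, "CPU");
+    return 0;
+}
+```
+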
+ +## TensorFlow\* Models with Custom Layers + +You have two options for TensorFlow\* models with custom layers: +
+
+* **Register those layers as extensions to the Model Optimizer.** In this case, the Model Optimizer generates a valid and optimized Intermediate Representation.
+* **If you have sub-graphs that should not be expressed with the analogous sub-graph in the Intermediate Representation, but another sub-graph should appear in the model, the Model Optimizer provides such an option.** This feature is helpful for many TensorFlow models. To read more, see [Sub-graph Replacement in the Model Optimizer](../MO_DG/prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md).
+
+## MXNet\* Models with Custom Layers
+
+There are two options to convert your MXNet* model that contains custom layers:
+
+1. Register the custom layers as extensions to the Model Optimizer. For instructions, see [Extending MXNet Model Optimizer with New Primitives](../MO_DG/prepare_model/customize_model_optimizer/Extending_MXNet_Model_Optimizer_with_New_Primitives.md). When your custom layers are registered as extensions, the Model Optimizer generates a valid and optimized Intermediate Representation. You can create Model Optimizer extensions both for MXNet layers with the op `Custom` and for layers which are not standard MXNet layers.
+
+2. If you have sub-graphs that should not be expressed with the analogous sub-graph in the Intermediate Representation, but another sub-graph should appear in the model, the Model Optimizer provides such an option. In MXNet, this feature is actively used for SSD models: it provides an opportunity to find the necessary sub-graph sequences and replace them. To read more, see [Sub-graph Replacement in the Model Optimizer](../MO_DG/prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md).
+
+## Kaldi\* Models with Custom Layers
+For information on converting your Kaldi* model containing custom layers, see [Converting a Kaldi Model in the Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Kaldi.html).
+
+## ONNX\* Models with Custom Layers
+For information on converting your ONNX* model containing custom layers, see [Converting an ONNX Model in the Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX.html).
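+
+If implementing an extension for a particular device is not practical, the HETERO plugin mentioned earlier can be used instead, so that layers unsupported on one device fall back to another device that supports them. A minimal sketch (the model path is a placeholder):
+
+```cpp
+#include <inference_engine.hpp>
+
+int main() {
+    InferenceEngine::Core core;
+    auto network = core.ReadNetwork("model.xml");
+    // Layers the GPU plugin cannot execute fall back to the CPU plugin.
+    auto executableNetwork = core.LoadNetwork(network, "HETERO:GPU,CPU");
+    return 0;
+}
+```
+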
+
+## Step-by-Step Custom Layers Tutorial
+For a step-by-step walk-through of creating and executing a custom layer, see the [Custom Layer Implementation Tutorial for Linux and Windows](https://github.com/david-drew/OpenVINO-Custom-Layers/tree/master/2019.r2.0).
+
+## Additional Resources
+
+- Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit)
+- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org)
+- [Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html)
+- [Kernel Extensibility in the Inference Engine Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Integrate_your_kernels_into_IE.html)
+- [Inference Engine Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html)
+- [Overview of OpenVINO™ Toolkit Pre-Trained Models](https://docs.openvinotoolkit.org/latest/_intel_models_index.html)
+- [Inference Engine Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic)
+- For IoT Libraries and Code Samples, see the [Intel® IoT Developer Kit](https://github.com/intel-iot-devkit).
+
+## Converting Models:
+
+- [Convert Your Caffe* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md)
+- [Convert Your TensorFlow* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md)
+- [Convert Your MXNet* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md)
+- [Convert Your ONNX* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md)
+
+
+
diff --git a/docs/HOWTO/add_regression_test_vpu.md b/docs/HOWTO/add_regression_test_vpu.md
new file mode 100644
index 00000000000000..e48a34cb7fed85
--- /dev/null
+++ b/docs/HOWTO/add_regression_test_vpu.md
@@ -0,0 +1,83 @@
+# Regression tests howto {#openvino_docs_HOWTO_add_regression_test_vpu}
+
+## Purpose
+
+This document contains instructions for correctly modifying a set of regression tests.
+
+## Common
+
+Regression tests for the Myriad and HDDL plugins are located at:
+`inference-engine/tests/functional/vpu/regression_tests/`
+
+The tests are divided into the following groups:
+* Classification
+* Detection
+* Raw-results
+* Compilation
+* VPU hetero
+
+Testing framework – [Google Test](https://github.com/google/googletest/).
+Each group contains [parameterized](https://github.com/google/googletest/blob/master/googletest/docs/advanced.md) tests. The main idea is that to add a new test, you only need to add a new parameter, except for scenarios that differ from the generalized case.
+
+## Classification and Detection tests
+
+These groups contain two cases:
+
+* For the generalized scenario (`VpuNoClassificationRegression`, `VpuNoDetectionRegression`)
+* For specific scenarios (`VpuNoClassificationRegressionSpecific`, `VpuNoDetectionRegressionSpecific`)
+
+### Generalized scenario
+
+If you want to test a new parameter (batch, precision, model, etc.), you need to edit the existing initialization of the parameterized tests or create a new one.
+Example of initialization of parameterized tests: + +``` c++ +INSTANTIATE_TEST_CASE_P( + VPURegTestWithResources_nightly, + VpuNoClassificationRegression, + Combine(ValuesIn(VpuTestParamsContainer::testingPlugin()), + Values(Precision::FP16), + Values(1), // batches + Values(true), //IsHwAdaptiveMode + Values(false), //DoReshape + Values(3, 5, 7), //Resources + Values(false), //IsIgnoreStatistic + Values(ClassificationSrcParam{ModelName::GoogleNetV1, SourceImages::kCat3, 0.01, Regression::EMean::eValues})), + VpuNoClassificationRegression::getTestCaseName); +``` + +### Specific scenario + +If You need a test to perform some actions that are not provided in the generalized scenario, then add a specific test case. As with the generalized scenario You can change parameters for these tests. +Example of specific test case: + +``` c++ +TEST_P(VpuNoClassificationRegressionSpecific, onAlexNetWithNetworkConfig) { + DISABLE_ON_WINDOWS_IF(HDDL_PLUGIN); + DISABLE_IF(do_reshape_); + + if (!hw_adaptive_mode_) { + config_[VPU_CONFIG_KEY(NETWORK_CONFIG)] = "data=data,scale=1"; + } + + assertThat().classificationResultsForInferRequestAPI() + .on(SourceImages::kDog2) + .withInputPrecision(in_precision_) + .times(batch_) + .withBatch(batch_) + .onModel(ModelName::AlexNet) + .setMean(Regression::EMean::eImage) + .onFP16() + .withTopK(1) + .withPluginConfig(config_) + .equalToReferenceWithDelta(0.04); +} +``` + +## Raw-results tests + +There is no generalized scenario and recommendations are the same as for specific test cases for Classification/Detection groups. + +## Compilation tests + +The tests are in the `vpu_classification_regression.cpp` file and contains only one scenario ` VpuNoRegressionWithCompilation `. To add a new test just update parameters just as in generalized scenarion of Classification/Detection test groups. diff --git a/docs/HOWTO/fuzzing-HOWTO.md b/docs/HOWTO/fuzzing-HOWTO.md new file mode 100644 index 00000000000000..614e94eec4e603 --- /dev/null +++ b/docs/HOWTO/fuzzing-HOWTO.md @@ -0,0 +1,94 @@ +# Fuzzing howto {#openvino_docs_HOWTO_fuzzing_HOWTO} + +## Intended Audience + +This document is for a developer who wants to contribute fuzz tests. + +## Purpose + +This document walks you through creating your first fuzzer, running it and evaluating its quality. + +## Prerequisites + +- Linux OS or Mac OS. + +- [American Fuzzy Loop](http://lcamtuf.coredump.cx/afl/) if building with GCC. + +## Steps + +1. Create a fuzz test in the existing project at `./tests/fuzz`. Fuzz test must + follow `-fuzzer.cc` naming scheme and implement a + `LLVMFuzzerTestOneInput` entry point. + +``` bash +cat << EOF > ./tests/fuzz/test_name-fuzzer.cc +#include +#include + +extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { + // put your fuzzing code here and use data+size as input. + return 0; // always return 0 +} +EOF +``` + +2. Implement test logic under `LLVMFuzzerTestOneInput`. + +See example fuzz test at `tests/fuzz/read_network-fuzzer.cc`. + +3. Build fuzz tests with `-DENABLE_FUZZING=ON` flag for cmake. + +``` bash + mkdir -p build && \ + (cd build && \ + CXX=afl-g++ CC=afl-gcc cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_FUZZING=ON -DENABLE_TESTS=ON .. && \ + make fuzz --jobs=$(getconf _NPROCESSORS_ONLN)) +``` + +4. Prepare sample inputs for your fuzz test to teach fuzzer engine on input + structure + +``` bash +(cd bin/intel64/Debug && \ +mkdir test_name-corpus && \ +echo sample input > test_name-corpus/in1.txt) +``` + +5. 
Evaluate fuzz test with `afl-fuzz` fuzzing engine + +Run fuzz test: + +``` bash +(cd bin/intel64/Debug && \ +afl-fuzz -i test_name-corpus -o test_name-out -- ./test_name-fuzzer @@ +``` + +While fuzz test is running it prints out statistics. Besides just crashes `uniq +crashes` and hangs `uniq hangs` you should care about fuzz test quality: + +- Fuzz test should be fast - speed of execution `exec speed` should be at least + 100 exec/s. Speed less than 20 exec/s is not acceptable. + +- Fuzz test should be able to explore new code paths `map coverage` and + `findings in depth`. Confirm it is increasing while fuzz test is running. + +6. Reproduce fuzz test findings + +All issues found by fuzz test are stored as a file in output folder specified +earlier via `-o` afl-fuzz option. To reproduce an issue run fuzz test executable +with an issue file as an argument. + +## Summary + +We have created a simple fuzz test, run it and asses its results. + +## Extension + +Try run parallel fuzzing with the help of +[afl-utils](https://gitlab.com/rc0r/afl-utils). + +## Tips or FAQs + +GCC 7 in Ubuntu 18.04 LTS has a +[defect](https://bugs.launchpad.net/ubuntu/+source/afl/+bug/1774816). Upgrade +GCC 7 for AFL to work. GCC version `Ubuntu 7.3.0-27ubuntu1~18.04` works OK. diff --git a/docs/HOWTO/img/IE_extensions_flow.png b/docs/HOWTO/img/IE_extensions_flow.png new file mode 100644 index 00000000000000..ca665ca3298bbb --- /dev/null +++ b/docs/HOWTO/img/IE_extensions_flow.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c2f362a39ae6c2af080e4f055b6fdba4954f918f85731545d1df3d687d9213d5 +size 421056 diff --git a/docs/HOWTO/img/MEG_generic_flow.png b/docs/HOWTO/img/MEG_generic_flow.png new file mode 100644 index 00000000000000..a492c3fff5026b --- /dev/null +++ b/docs/HOWTO/img/MEG_generic_flow.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cb5c700d003936779455353bfa4ed9432410c0975c46e2dfd30c6a1abccd1727 +size 23320 diff --git a/docs/HOWTO/img/MO_extensions_flow.png b/docs/HOWTO/img/MO_extensions_flow.png new file mode 100644 index 00000000000000..5009c0ce2604ad --- /dev/null +++ b/docs/HOWTO/img/MO_extensions_flow.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:99d6b5146be85fa408dc5432883c3e2745cffe890133854a97dcf22f5c5962d4 +size 47564 diff --git a/docs/HOWTO/img/mo_caffe_priorities.png b/docs/HOWTO/img/mo_caffe_priorities.png new file mode 100644 index 00000000000000..665892316c17fc --- /dev/null +++ b/docs/HOWTO/img/mo_caffe_priorities.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0a4de6e502cae7542f1f311bcdbea6bb145f960f0d27d86a03160d1a60133778 +size 301310 diff --git a/docs/IE_DG/API_Changes.md b/docs/IE_DG/API_Changes.md new file mode 100644 index 00000000000000..f3a7c45417dde4 --- /dev/null +++ b/docs/IE_DG/API_Changes.md @@ -0,0 +1,496 @@ +# Inference Engine API Changes History {#openvino_docs_IE_DG_API_Changes} + +The sections below contain detailed list of changes made to the Inference Engine API in recent releases. + +## Deprecation Notice + + + + + + + + + + +
+<table>
+  <tr>
+    <td><strong>Deprecation Begins</strong></td>
+    <td>June 1, 2020</td>
+  </tr>
+  <tr>
+    <td><strong>Removal Date</strong></td>
+    <td>December 1, 2020</td>
+  </tr>
+</table>
+ +Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit. + +Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware. + + +## 2020.4 + +### New API + + **CPU Plugin API:** + + * InferenceEngine::PluginConfigParams::KEY_ENFORCE_BF16 config key + + **Metrics and values for Query API:** + + * METRIC_KEY(OPTIMIZATION_CAPABILITIES) + * METRIC_VALUE(BF16) + +## 2020.2 + +### New API + + **Extensibility API:** + + * InferenceEngine::IExtension::getImplTypes(const std::shared_ptr& node) method + * InferenceEngine::IExtension::getImplementation(const std::shared_ptr& node, const std::string& implType) method + +### Deprecated API + + **Extensibility API:** + + * InferenceEngine::ILayerImplFactory class + * InferenceEngine::IShapeInferImpl class + * InferenceEngine::IShapeInferImpl class + * InferenceEngine::IShapeInferExtension class + * InferenceEngine::IExtension::getFactoryFor(ILayerImplFactory\*& factory, const CNNLayer\* cnnLayer, ResponseDesc\* resp) noexcept method + * InferenceEngine::IExtension::getPrimitiveTypes(char\*\*& types, unsigned int& size, ResponseDesc\* resp) noexcept method + * InferenceEngine::ShapeInferImpl class + * InferenceEngine::Extension::getFactoryFor(ILayerImplFactory\*& factory, const CNNLayer\* cnnLayer, ResponseDesc\* resp) noexcept method + * InferenceEngine::Extension::getPrimitiveTypes(char\*\*& types, unsigned int& size, ResponseDesc\* resp) noexcept method + + **Network API:** + + * InferenceEngine::details::CNNNetworkIterator class + * InferenceEngine::CNNNetwork::getPrecision() const method + * InferenceEngine::CNNNetwork::getLayerByName(const char\* layerName) const method + * InferenceEngine::CNNNetwork::size() const method + * InferenceEngine::CNNNetwork::begin() const method + * InferenceEngine::CNNNetwork::end() const method + * InferenceEngine::CNNNetwork::AddExtension(const IShapeInferExtensionPtr& extension) method + * InferenceEngine::ICNNNetwork::getPrecision() const noexcept method + * InferenceEngine::ICNNNetwork::getName(char\* pName, size_t len) const noexcept method + * InferenceEngine::ICNNNetwork::getData(const char\* dname) noexcept method + * InferenceEngine::ICNNNetwork::addLayer(const CNNLayerPtr& layer) noexcept method + * InferenceEngine::ICNNNetwork::getLayerByName(const char\* layerName, CNNLayerPtr& out, ResponseDesc\* resp) const noexcept method + * InferenceEngine::ICNNNetwork::AddExtension(const IShapeInferExtensionPtr& extension, ResponseDesc\* resp) noexcept method + * InferenceEngine::ICNNNetwork::getStats(ICNNNetworkStats\*\* stats, ResponseDesc\* resp) const noexcept method + * InferenceEngine::ICNNNetworkStats class + * InferenceEngine::NetworkNodeStats class + * InferenceEngine::Data::getCreatorLayer() method + * InferenceEngine::Data::getInputTo() method + * InferenceEngine::LayerParams class + + **Layer API:** + + * InferenceEngine::CNNLayer class + * InferenceEngine::WeightableLayer class + * InferenceEngine::BatchNormalizationLayer class + * InferenceEngine::BatchToSpaceLayer class + * 
InferenceEngine::BinaryConvolutionLayer class + * InferenceEngine::BroadcastLayer class + * InferenceEngine::BucketizeLayer class + * InferenceEngine::ClampLayer class + * InferenceEngine::ConcatLayer class + * InferenceEngine::ConvolutionLayer class + * InferenceEngine::CropLayer class + * InferenceEngine::DeconvolutionLayer class + * InferenceEngine::DeformableConvolutionLayer class + * InferenceEngine::DepthToSpaceLayer class + * InferenceEngine::EltwiseLayer class + * InferenceEngine::ExperimentalDetectronPriorGridGenerator class + * InferenceEngine::ExperimentalDetectronPriorGridGeneratorLayer class + * InferenceEngine::ExperimentalSparseWeightedReduceLayer class + * InferenceEngine::FillLayer class + * InferenceEngine::FullyConnectedLayer class + * InferenceEngine::GRNLayer class + * InferenceEngine::GRUCell class + * InferenceEngine::GatherLayer class + * InferenceEngine::GemmLayer class + * InferenceEngine::LSTMCell class + * InferenceEngine::MVNLayer class + * InferenceEngine::MathLayer class + * InferenceEngine::NonMaxSuppression class + * InferenceEngine::NormLayer class + * InferenceEngine::OneHotLayer class + * InferenceEngine::PReLULayer class + * InferenceEngine::PadLayer class + * InferenceEngine::PoolingLayer class + * InferenceEngine::PowerLayer class + * InferenceEngine::QuantizeLayer class + * InferenceEngine::RNNCell class + * InferenceEngine::RNNCellBase class + * InferenceEngine::RNNSequenceLayer class + * InferenceEngine::RangeLayer class + * InferenceEngine::ReLU6Layer class + * InferenceEngine::ReLULayer class + * InferenceEngine::ReduceLayer class + * InferenceEngine::ReshapeLayer class + * InferenceEngine::ReverseSequenceLayer class + * InferenceEngine::ScaleShiftLayer class + * InferenceEngine::ScatterLayer class + * InferenceEngine::SelectLayer class + * InferenceEngine::ShuffleChannelsLayer class + * InferenceEngine::SoftMaxLayer class + * InferenceEngine::SpaceToBatchLayer class + * InferenceEngine::SpaceToDepthLayer class + * InferenceEngine::SparseFillEmptyRowsLayer class + * InferenceEngine::SparseSegmentReduceLayer class + * InferenceEngine::SparseToDenseLayer class + * InferenceEngine::SplitLayer class + * InferenceEngine::StridedSliceLayer class + * InferenceEngine::TensorIterator class + * InferenceEngine::TileLayer class + * InferenceEngine::TopKLayer class + * InferenceEngine::UniqueLayer class + +## 2020.1 + +### New API + + **Integration with ngraph API:** + + * InferenceEngine::CNNNetwork(const std::shared_ptr& network) ctor from ngraph::Function + * InferenceEngine::CNNNetwork::getFunction() const noexcept method + * InferenceEngine::ICNNNetwork::getFunction() const noexcept method + * InferenceEngine::Parameter(const std::shared_ptr& var) ctor + * InferenceEngine::Parameter::asVariant() const method + * InferenceEngine::Parameter::operator std::shared_ptr() const operator + * InferenceEngine::Core::ReadNetwork(const std::wstring& modelPath, const std::wstring& binPath) method + * InferenceEngine::Core::ReadNetwork(const std::string& modelPath, const std::string& binPath = "") method + * InferenceEngine::Core::ReadNetwork(const std::string& model, const Blob::CPtr& weights) method + * InferenceEngine::Code::AddExtension(const IExtensionPtr& extension) method + * InferenceEngine::IExtension::getOpSets() method + + + **Offline compilation: import / export to std::stream:** + + * InferenceEngine::ExecutableNetwork::Export(std::ostream& networkModel) method + * InferenceEngine::Core::ImportNetwork(std::istream& networkModel, const std::string& 
deviceName = {}, const std::map& config = {}) method + * InferenceEngine::IExecutableNetwork::Export(std::ostream& networkModel, ResponseDesc \*resp) noexcept method + + + **RemoteBlob accelerator memory sharing API:** + + * InferenceEngine::RemoteContext class + * InferenceEngine::RemoteBlob class + * InferenceEngine::Core::CreateContext(const std::string& deviceName, const ParamMap& params) method + * InferenceEngine::Core::GetDefaultContext(const std::string& deviceName) method + * InferenceEngine::Core::LoadNetwork(CNNNetwork network, RemoteContext::Ptr context, const std::map& config = std::map()) method + + + **GNA firmware model image generation:** + + * GNA_CONFIG_KEY(FIRMWARE_MODEL_IMAGE_GENERATION) config key + * GNA_CONFIG_VALUE(GEN) value + * GNA_CONFIG_VALUE(GEN_EXACT) value + * GNA_CONFIG_VALUE(SSE) value + * GNA_CONFIG_VALUE(SSE_EXACT) value + * GNA_CONFIG_VALUE(AVX1) value + * GNA_CONFIG_VALUE(AVX1_EXACT) value + * GNA_CONFIG_VALUE(AVX2) value + * GNA_CONFIG_VALUE(AVX2_EXACT) value + + **MemoryBlob mapping of memory to the user space:** + + * InferenceEngine::MemoryBlob::rwmap() noexcept method + * InferenceEngine::MemoryBlob::rmap() noexcept method + * InferenceEngine::MemoryBlob::wmap() noexcept method + + **Memory interoperability on acceleration devices. General classes and GPU helper functions** + * InferenceEngine::RemoteBlob class + * InferenceEngine::RemoteContext class + * InferenceEngine::Core::CreateContext(const std::string& deviceName, const ParamMap& params) method + * InferenceEngine::Core::GetDefaultContext(const std::string& deviceName) method + * InferenceEngine::make_shared_blob(const TensorDesc& desc, RemoteContext::Ptr ctx) function + * InferenceEngine::gpu::make_shared_blob_nv12(size_t height, size_t width, RemoteContext::Ptr ctx, VASurfaceID nv12_surf) function + * InferenceEngine::gpu::make_shared_context(Core& core, std::string deviceName, VADisplay device) function + * InferenceEngine::gpu::make_shared_blob(const TensorDesc& desc, RemoteContext::Ptr ctx, VASurfaceID surface, uint32_t plane = 0) function + * InferenceEngine::gpu::make_shared_blob_nv12(RemoteContext::Ptr ctx, cl::Image2D& nv12_image_plane_y, cl::Image2D& nv12_image_plane_uv) function + * InferenceEngine::gpu::make_shared_context(Core& core, std::string deviceName, cl_context ctx) function + * InferenceEngine::gpu::make_shared_blob(const TensorDesc& desc, ClContext::Ptr ctx) function + * InferenceEngine::gpu::make_shared_blob(const TensorDesc& desc, RemoteContext::Ptr ctx, cl::Buffer& buffer) function + * InferenceEngine::gpu::make_shared_blob(const TensorDesc& desc, RemoteContext::Ptr ctx, cl_mem buffer) function + * InferenceEngine::gpu::make_shared_blob(const TensorDesc& desc, RemoteContext::Ptr ctx, cl::Image2D& image) function + +### Deprecated API + + **Inference Engine NN Builder API:** + + * InferenceEngine::Builder::EltwiseLayer + * InferenceEngine::Builder::MemoryLayer + * InferenceEngine::Builder::ROIPoolingLayer + * InferenceEngine::Builder::DeconvolutionLayer + * InferenceEngine::Builder::ReLULayer + * InferenceEngine::Builder::TanHLayer + * InferenceEngine::Builder::InputLayer + * InferenceEngine::Builder::PoolingLayer + * InferenceEngine::Builder::CropLayer + * InferenceEngine::Builder::GRUSequenceLayer + * InferenceEngine::Builder::NormLayer + * InferenceEngine::Builder::LSTMSequenceLayer + * InferenceEngine::Builder::ClampLayer + * InferenceEngine::Builder::PSROIPoolingLayer + * InferenceEngine::Builder::Layer + * InferenceEngine::Builder::RNNSequenceLayer + * 
InferenceEngine::Builder::ReorgYoloLayer + * InferenceEngine::Builder::NormalizeLayer + * InferenceEngine::Builder::PriorBoxClusteredLayer + * InferenceEngine::Builder::MVNLayer + * InferenceEngine::Builder::PermuteLayer + * InferenceEngine::Builder::SimplerNMSLayer + * InferenceEngine::Builder::ConstLayer + * InferenceEngine::Builder::DeformableConvolutionLayer + * InferenceEngine::Builder::FullyConnectedLayer + * InferenceEngine::Builder::PriorBoxLayer + * InferenceEngine::Builder::SoftMaxLayer + * InferenceEngine::Builder::OutputLayer + * InferenceEngine::Builder::TileLayer + * InferenceEngine::Builder::SplitLayer + * InferenceEngine::Builder::PReLULayer + * InferenceEngine::Builder::RegionYoloLayer + * InferenceEngine::Builder::ReshapeLayer + * InferenceEngine::Builder::ConvolutionLayer + * InferenceEngine::Builder::DetectionOutputLayer + * InferenceEngine::Builder::ConcatLayer + * InferenceEngine::Builder::ELULayer + * InferenceEngine::Builder::GRNLayer + * InferenceEngine::Builder::LRNLayer + * InferenceEngine::Builder::ArgMaxLayer + * InferenceEngine::Builder::ReLU6Layer + * InferenceEngine::Builder::ScaleShiftLayer + * InferenceEngine::Builder::ProposalLayer + * InferenceEngine::Builder::SigmoidLayer + * InferenceEngine::Builder::ResampleLayer + * InferenceEngine::Builder::CTCGreedyDecoderLayer + * InferenceEngine::Builder::BatchNormalizationLayer + * InferenceEngine::Builder::LayerDecorator + * InferenceEngine::Builder::PowerLayer + * InferenceEngine::Builder::Network + * InferenceEngine::Builder::PortInfo + * InferenceEngine::Builder::Connection + * InferenceEngine::Builder::PortData + * InferenceEngine::Builder::Port + * InferenceEngine::Builder::ILayer + * InferenceEngine::Builder::INetworkIterator + * InferenceEngine::Builder::INetwork + * InferenceEngine::Builder::ILayer + + + **Plugin API:** + + * InferenceEngine::InferencePlugin C++ plugin wrapper class + * InferenceEngine::IInferencePlugin plugin interface + * InferenceEngine::PluginDispatcher class + * InferenceEngine::InferenceEnginePluginPtr typedef + * InferenceEngine::ICNNNetReader reader interface + * InferenceEngine::CNNNetReader class + + **Blob API:** + + * Blob::element_size() const noexcept method + * Blob::buffer() noexcept method + * Blob::cbuffer() noexcept method + * MemoryBlob::buffer() noexcept method + * MemoryBlob::cbuffer() noexcept method + + +### Removed API + + Removed all [Inference Engine API which deprecated in 2019'R2](https://docs.openvinotoolkit.org/2019_R3/_docs_IE_DG_API_Changes.html#deprecated_api) + +## 2019 R3 + +### New API + + **New supported layers:** + + * InferenceEngine::SparseFillEmptyRowsLayer new class + * InferenceEngine::UniqueLayer new class + * InferenceEngine::NonMaxSuppressionLayer new class + * InferenceEngine::ScatterLayer new class + + **FPGA plugin streaming support:** + + * DLIA_METRIC_VALUE(INPUT_STREAMING) value to METRIC_KEY(OPTIMIZATION_CAPABILITIES) + * DLIA_CONFIG_KEY(ENABLE_STREAMING) config key + +### Removed API + + * InferenceEngine::EltwiseLayer::Select from InferenceEngine::EltwiseLayer::eOperation enumeration + +## 2019 R2 + +### New API + + **Inference Engine Core API:** + + * Introduced InferenceEngine::Core high level class to manage devices + + **Query API extensions to InferenceEngine::ExecutableNetwork and InferenceEngine::IExecutableNetwork:** + + * InferenceEngine::ExecutableNetwork::SetConfig method + * InferenceEngine::ExecutableNetwork::GetConfig method + * InferenceEngine::ExecutableNetwork::GetMetric method + * 
InferenceEngine::IExecutableNetwork::SetConfig method + * InferenceEngine::IExecutableNetwork::GetConfig method + * InferenceEngine::IExecutableNetwork::GetMetric method + + **Metrics and values for Query API:** + + * METRIC_KEY(AVAILABLE_DEVICES) + * METRIC_KEY(SUPPORTED_METRICS) + * METRIC_KEY(SUPPORTED_CONFIG_KEYS) + * METRIC_KEY(FULL_DEVICE_NAME) + * METRIC_KEY(OPTIMIZATION_CAPABILITIES) + * METRIC_VALUE(FP32) + * METRIC_VALUE(FP16) + * METRIC_VALUE(INT8) + * METRIC_VALUE(BIN) + * METRIC_VALUE(WINOGRAD) + * DLIA_METRIC_VALUE(FP11) + * METRIC_KEY(RANGE_FOR_STREAMS) + * METRIC_KEY(NUMBER_OF_WAITING_INFER_REQUESTS) + * METRIC_KEY(NUMBER_OF_EXEC_INFER_REQUESTS) + * METRIC_KEY(DEVICE_THERMAL) + * METRIC_KEY(RANGE_FOR_ASYNC_INFER_REQUESTS) + * EXEC_NETWORK_METRIC_KEY(NETWORK_NAME) + * EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) + + **Common API:** + + * CLDNN_CONFIG_KEY(INT8_ENABLED) config key + * CONFIG_KEY(GPU_THROUGHPUT_AUTO) + * CONFIG_KEY(GPU_THROUGHPUT_STREAMS) + * DLIA_CONFIG_KEY(IO_TRANSFORMATIONS_NATIVE) config key + * DLIA_CONFIG_KEY(DUMP_SUPPORTED_LAYERS_INFORMATION) config key + * GNA_CONFIG_VALUE(SW_FP32) config value for GNA_CONFIG_KEY(DEVICE_MODE) key + * MULTI_CONFIG_KEY(DEVICE_PRIORITIES) config key for `MULTI` device + * InferenceEngine::CNNNetReader::ReadNetwork(const std::wstring &filepath) new method + * InferenceEngine::CNNNetReader::ReadWeights(const std::wstring &filepath) new method + * InferenceEngine::ExecutableNetwork::ExecutableNetwork(IExecutableNetwork::Ptr actual, InferenceEnginePluginPtr plg) constructor with additional `plg` parameter + * InferenceEngine::InferRequest::InferRequest(IInferRequest::Ptr request, InferenceEnginePluginPtr plg) constructor with additional `plg` parameter + * InferenceEngine::Data::setName method + * InferenceEngine::QueryNetworkResult::supportedLayersMap + * InferenceEngine::Precision::I64 extension to InferenceEngine::Precision::ePrecision enumeration + + **New supported primitives:** + + * InferenceEngine::Builder::DeformableConvolutionLayer new class + * InferenceEngine::DeformableConvolutionLayer new class + * InferenceEngine::EltwiseLayer::Logical_NOT, InferenceEngine::EltwiseLayer::Mean, InferenceEngine::EltwiseLayer::Select extensions to InferenceEngine::EltwiseLayer::eOperation enumeration + * InferenceEngine::OneHotLayer new class + * InferenceEngine::SelectLayer new class + * InferenceEngine::BroadcastLayer new class + * InferenceEngine::MathLayer new class + * InferenceEngine::ReduceLayer new class + * InferenceEngine::TopKLayer new class + + **Extensions to Blob creation API:** + + * InferenceEngine::Blob::is method + * InferenceEngine::Blob::is const method + * InferenceEngine::Blob::as method + * InferenceEngine::Blob::as const method + * InferenceEngine::Blob::getAllocator abstract method + * InferenceEngine::Blob::getHandle abstract method + * InferenceEngine::MemoryBlob class + * InferenceEngine::ColorFormat enumeration + * InferenceEngine::PreProcessInfo::setColorFormat method + * InferenceEngine::PreProcessInfo::getColorFormat method + * InferenceEngine::CompoundBlob class to work with blobs consisting of several planes + * InferenceEngine::NV12Blob class representing NV12 blob with two planes + +### Deprecated API + +The methods listed below are deprecated and will be removed in 2019 R4 release: + + **Common API:** + + * InferenceEngine::InputInfo::getInputPrecision method + * InferenceEngine::InputInfo::setInputPrecision method + * InferenceEngine::InputInfo::getDims method + * 
InferenceEngine::CNNLayer::GetParamsAsBool method + * InferenceEngine::CNNNetwork::CNNNetwork(ICNNNetwork* actual) constructor + * InferenceEngine::CNNNetwork::setTargetDevice method + * HETERO_CONFIG_KEY(DUMP_DLA_MESSAGES) config key + * InferenceEngine::ILayerImplFactory::getShapes method + * InferenceEngine::IShapeInferImpl::inferShapes(const std::vector&, const std::map& , const std::map&, std::vector&, ResponseDesc\*) method + * InferenceEngine::Data::setBatchSize method + * InferenceEngine::QueryNetworkResult::supportedLayers field + * InferenceEngine::ICNNNetwork::setBatchSize(const size_t size) method + * InferenceEngine::Blob::Resize method + * InferenceEngine::Blob::Reshape method + * InferenceEngine::TBlob::set method + + **InferenceEngine::IInferencePlugin and InferenceEngine:InferencePlugin obsolete methods:** + + * InferenceEngine::InferencePlugin::LoadNetwork(ICNNNetwork &network) method + * InferenceEngine::InferencePlugin::Infer method + * InferenceEngine::InferencePlugin::GetPerformanceCounts method + * InferenceEngine::InferencePlugin::QueryNetwork(const ICNNNetwork &network, QueryNetworkResult &res) const method + * InferenceEngine::IInferencePlugin::LoadNetwork(ICNNNetwork &network, ResponseDesc \*resp) method + * InferenceEngine::IInferencePlugin::Infer(const Blob &input, Blob &result, ResponseDesc \*resp) method + * InferenceEngine::IInferencePlugin::Infer(const BlobMap &input, BlobMap &result, ResponseDesc \*resp) method + * InferenceEngine::IInferencePlugin::GetPerformanceCounts method + * InferenceEngine::IInferencePlugin::QueryNetwork(const ICNNNetwork& network, QueryNetworkResult& res) const method + + + **Fields in InferenceEngine::Data class are replaced with appropriate methods:** + + * InferenceEngine::Data::precision field + * InferenceEngine::Data::layout field + * InferenceEngine::Data::dims field + * InferenceEngine::Data::creatorLayer field + * InferenceEngine::Data::name field + * InferenceEngine::Data::inputTo field + * InferenceEngine::Data::userObject field + + **Heterogeneous plugin:** + + * InferenceEngine::IHeteroDeviceLoader class + * InferenceEngine::IHeteroInferencePlugin class + * InferenceEngine::HeteroPluginPtr class + * operator InferenceEngine::InferencePlugin::HeteroPluginPtr operator + + **Blob creation API with dimensions in reverse order:** + + * InferenceEngine::Blob::Blob(Precision p) constructor + * InferenceEngine::Blob::Blob(Precision p, Layout l) constructor + * InferenceEngine::Blob::Blob(Precision p, const SizeVector &dims) constructor + * InferenceEngine::Blob::Blob(Precision p, Layout l, const SizeVector &dims) constructor + * InferenceEngine::TBlob::TBlob(Precision p, Layout l) constructor + * InferenceEngine::TBlob::TBlob(Precision p, Layout l, const SizeVector& dims) constructor + * InferenceEngine::TBlob::TBlob(Precision p, Layout l, const SizeVector& dims, T* ptr, size_t data_size) constructor + * InferenceEngine::TBlob::TBlob(Precision p, Layout l, const SizeVector &dims, std::shared_ptr alloc) constructor + * InferenceEngine::Blob::type() method + * InferenceEngine::Blob::precision() method + * InferenceEngine::Blob::layout() method + * InferenceEngine::Blob::dims() method + * InferenceEngine::make_shared_blob(Precision p, Layout l, const SizeVector &dims) function + * InferenceEngine::make_shared_blob(Precision p, const SizeVector &dims) function + * InferenceEngine::make_shared_blob(Precision p, Layout l, const TArg &arg) function + * InferenceEngine::make_shared_blob(Precision p, const TArg &arg) function + * 
InferenceEngine::make_shared_blob(TBlob &&arg) function + * InferenceEngine::make_shared_blob(Precision p, Layout l) function + * InferenceEngine::make_shared_blob(Precision p, Layout l, SizeVector dims, const std::vector &arg) function + * InferenceEngine::make_shared_blob(Precision p, Layout l, const std::vector &arg) function + * InferenceEngine::make_shared_blob(Precision p, const std::vector &arg) function + * InferenceEngine::make_shared_blob(Precision p, Layout l, const SizeVector &dims, TypeTo * ptr, size_t size) function + * InferenceEngine::make_shared_blob(Precision p, const SizeVector &dims, TypeTo * ptr, size_t size) function + * InferenceEngine::I_N variable + * InferenceEngine::I_C variable + * InferenceEngine::I_H variable + * InferenceEngine::I_W variable + * InferenceEngine::LayoutOffsetCounter class + * InferenceEngine::ConvertLayout function + + **API working with device enumeration:** + + * InferenceEngine::TargetDevice enumeration + * InferenceEngine::TargetDeviceInfo class + * InferenceEngine::getDeviceName function + * InferenceEngine::FindPluginRequest class + * InferenceEngine::FindPluginResponse class + * InferenceEngine::findPlugin(const FindPluginRequest &req, FindPluginResponse &result, ResponseDesc *resp) function + * InferenceEngine::ICNNNetwork::setTargetDevice method + * InferenceEngine::ICNNNetwork::getTargetDevice method + * InferenceEngine::PluginDispatcher::getPluginByDevice method + * InferenceEngine::PluginDispatcher::getSuitablePlugin method diff --git a/docs/IE_DG/Bfloat16Inference.md b/docs/IE_DG/Bfloat16Inference.md new file mode 100644 index 00000000000000..dcc48409cf2b71 --- /dev/null +++ b/docs/IE_DG/Bfloat16Inference.md @@ -0,0 +1,90 @@ +# Bfloat16 Inference {#openvino_docs_IE_DG_Bfloat16Inference} + +## Disclaimer + +Inference Engine with the bfloat16 inference implemented on CPU must support the `avx512_bf16` instruction and therefore the bfloat16 data format. + +## Introduction + +Bfloat16 computations (referred to as BF16) is the Brain Floating-Point format with 16 bits. This is a truncated 16-bit version of the 32-bit IEEE 754 single-precision floating-point format FP32. BF16 preserves 8 exponent bits as FP32 but reduces precision of the sign and mantissa from 24 bits to 8 bits. + +![bf16_format] + +Preserving the exponent bits keeps BF16 to the same range as the FP32 (~1e-38 to ~3e38). This simplifies conversion between two data types: you just need to skip or flush to zero 16 low bits. +Truncated mantissa leads to occasionally less precision, but according to [investigations](https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus), neural networks are more sensitive to the size of the exponent than the mantissa size. Also, in lots of models, precision is needed close to zero but not so much at the maximum range. +Another useful feature of BF16 is possibility to encode an INT8 in BF16 without loss of accuracy, because INT8 range completely fits in BF16 mantissa field. It reduces data flow in conversion from INT8 input image data to BF16 directly without intermediate representation in FP32, or in combination of [INT8 inference](Int8Inference.md) and BF16 layers. + +See the [Intel's site](https://software.intel.com/sites/default/files/managed/40/8b/bf16-hardware-numerics-definition-white-paper.pdf) for more bfloat16 format details. + +There are two ways to check if CPU device can support bfloat16 computations for models: +1. 
Query the instruction set via the system command `lscpu | grep avx512_bf16` or `cat /proc/cpuinfo | grep avx512_bf16`.
+2. Use the [Query API](InferenceEngine_QueryAPI.md) with `METRIC_KEY(OPTIMIZATION_CAPABILITIES)`, which should return `BF16` in the list of CPU optimization options:
+
+```cpp
+InferenceEngine::Core core;
+auto cpuOptimizationCapabilities = core.GetMetric("CPU", METRIC_KEY(OPTIMIZATION_CAPABILITIES)).as<std::vector<std::string>>();
+```
+
+The current Inference Engine solution for bfloat16 inference uses the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) and supports inference of the following layers in BF16 computation mode:
+* Convolution
+* FullyConnected
+* InnerProduct
+* LRN
+* Pooling
+
+This means that BF16 inference can only be performed with the CPU plugin on the layers listed above. All other layers are executed in FP32.
+
+## Lowering Inference Precision
+
+Lowering precision to increase performance is [widely used](https://software.intel.com/content/www/us/en/develop/articles/lower-numerical-precision-deep-learning-inference-and-training.html) for optimization of inference. Using the bfloat16 data type on CPU for the first time opens the possibility of a default optimization approach:
+use the optimization capabilities of the current platform to achieve maximum performance while keeping the accuracy of calculations within an acceptable range.
+
+Using bfloat16 data provides the following benefits that increase performance:
+1. Faster multiplication of two BF16 numbers because of the shorter mantissa of bfloat16 data.
+2. No need to support denormals or handle exceptions, which is a performance optimization.
+3. Fast conversion of float32 to bfloat16 and vice versa.
+4. Reduced size of data in memory, so larger models fit in the same memory bounds.
+5. Reduced amount of data that must be transferred and, as a result, reduced data transfer time.
+
+For default optimization on CPU, the source model is converted from FP32 or FP16 to BF16 and executed internally on platforms with native BF16 support. In that case, `KEY_ENFORCE_BF16` is set to `YES`.
+The code below demonstrates how to check if the key is set:
+
+```cpp
+InferenceEngine::Core core;
+auto exeNetwork = core.LoadNetwork(network, "CPU");
+auto enforceBF16 = exeNetwork.GetConfig(PluginConfigParams::KEY_ENFORCE_BF16).as<std::string>();
+```
+
+To disable BF16 internal transformations, set `KEY_ENFORCE_BF16` to `NO`. In this case, the model is inferred as is, without modifications, with the precisions that were set on each layer edge.
+
+```cpp
+InferenceEngine::Core core;
+core.SetConfig({ { CONFIG_KEY(ENFORCE_BF16), CONFIG_VALUE(NO) } }, "CPU");
+```
+
+An exception with the message `Platform doesn't support BF16 format` is thrown if `KEY_ENFORCE_BF16` is set to `YES` on a CPU without native BF16 support.
+
+Low-precision 8-bit integer models are not converted to BF16, even if bfloat16 optimization is set by default.
+
+## Performance Counters
+
+Information about layer precision is stored in the performance counters that are
+available from the Inference Engine API.
The layers have the following marks: +* Suffix `BF16` for layers that had bfloat16 data type input and were computed in BF16 precision +* Suffix `FP32` for layers computed in 32-bit precision + +For example, the performance counters table for the Inception model can look as follows: + +``` +pool5 EXECUTED layerType: Pooling realTime: 143 cpu: 143 execType: jit_avx512_BF16 +fc6 EXECUTED layerType: FullyConnected realTime: 47723 cpu: 47723 execType: jit_gemm_BF16 +relu6 NOT_RUN layerType: ReLU realTime: 0 cpu: 0 execType: undef +fc7 EXECUTED layerType: FullyConnected realTime: 7558 cpu: 7558 execType: jit_gemm_BF16 +relu7 NOT_RUN layerType: ReLU realTime: 0 cpu: 0 execType: undef +fc8 EXECUTED layerType: FullyConnected realTime: 2193 cpu: 2193 execType: jit_gemm_BF16 +prob EXECUTED layerType: SoftMax realTime: 68 cpu: 68 execType: jit_avx512_FP32 +``` + +The `execType` column of the table includes inference primitives with specific suffixes. + +[bf16_format]: img/bf16_format.png \ No newline at end of file diff --git a/docs/IE_DG/Cross_Check_Tool.md b/docs/IE_DG/Cross_Check_Tool.md new file mode 100644 index 00000000000000..495afa790fcccc --- /dev/null +++ b/docs/IE_DG/Cross_Check_Tool.md @@ -0,0 +1,298 @@ +Cross Check Tool {#openvino_docs_IE_DG_Cross_Check_Tool} +================ + +Cross Check Tool is a console application that enables comparing accuracy and performance metrics for two successive +model inferences that are performed +on two different supported Intel® devices or with different precisions. +The Cross Check Tool can compare metrics per layer or all over the model. + +On Linux* OS, before running the Cross Check Tool binary, make sure your application can find the +Deep Learning Inference Engine libraries. +Navigate to the `/deployment_tools/inference_engine/bin` folder and run the `setvars.sh` script to +set all necessary environment variables: + +```sh +source setvars.sh +``` + +## Running the Cross Check Tool + +Cross Check Tool is distributed as a binary file and there is no need to build it. To run the Cross Check Tool, +execute the tool's binary file with necessary parameters. Please note that the Inference Engine assumes that weights +are in the same folder as the _.xml_ file. + +You can get the list of all available options using the -h option: + +```sh +$./cross_check_tool -h +InferenceEngine: + API version ............ 1.0 + Build .................. ### +[ INFO ] Parsing input parameters + +./cross_check_tool [OPTION] +Options: + + -h Prints a usage message. + -i "" Optional. Path to an input image file or multi-input file to infer. Generates input(s) from normal distribution if empty + -m "" Required. Path to an .xml file that represents the first IR of the trained model to infer. + -l "" Required for MKLDNN (CPU)-targeted custom layers. Absolute path to a shared library with the kernels implementation. + Or + -c "" Required for clDNN (GPU)-targeted custom kernels. Absolute path to the xml file with the kernels description. + -conf "" Optional. Path to config file for -d device plugin + -ref_conf "" Optional. Path to config file for -ref_d device plugin + -pp "" Optional. Path to a plugin folder. + -d "" Required. The first target device to infer the model specified with the -m option. CPU, GPU, HDDL or MYRIAD is acceptable. + -ref_m "" Optional. Path to an .xml file that represents the second IR in different precision to compare the metrics. + -ref_d "" Required. The second target device to infer the model and compare the metrics. 
CPU, GPU, HDDL or MYRIAD is acceptable. + -layers "" Defines layers to check. Options: all, None - for output layers check, list of comma-separated layer names to check. Default value is None. + -eps "" Optional. Threshold for filtering out those blob statistics that do not statify the condition: max_abs_diff < eps. + -dump Enables blobs statistics dumping + -load "" Path to a file to load blobs from +``` +### Examples + +1. To check per-layer accuracy and performance of inference in FP32 precision on the CPU against the GPU, run: +```sh +./cross_check_tool -i \ + -m \ + -d CPU \ + -ref_d GPU \ + -layers all +``` +The output looks as follows: +``` +InferenceEngine: + API version ............ 1.0 + Build .................. ### +[ INFO ] Parsing input parameters + The same IR on both devices: + +[ INFO ] No extensions provided + + API version ............ 1.0 + Build .................. lnx_20180510 + Description ....... MKLDNNPlugin + + API version ............ 0.1 + Build .................. ci-main-03659 + Description ....... clDNNPlugin +[ INFO ] Inputs detected: Placeholder +[ INFO ] Statistics will be dumped for X layers: , , ... , +[ INFO ] Layer statistics + Max absolute difference: 1.52588e-05 + Min absolute difference: 0 + Max relative difference: 0.000288028% + Min relative difference: 0% + Blob size: 1000 + + Devices: CPU_FP32 GPU_FP32 + Status: EXECUTED EXECUTED + Layer type: Reshape Reshape + Real time, microsec: 20 154 + Execution type: unknown GPU + Number of NAN: 0 0 + Number of INF: 0 0 + Number of ZERO: 0 0 +... + +... + +[ INFO ] Overall max absolute difference 2.81334e-05 was reached by layer +[ INFO ] Overall min absolute difference 0 was reached by layer +[ INFO ] Overall max relative difference 0.744893% was reached by layer +[ INFO ] Overall min relative difference -2.47948% was reached by layer +[ INFO ] Execution successful +``` + +2. To check the overall accuracy and performance of inference on the CPU in FP32 precision against the +Intel® Movidius™ Myriad™ device in FP16 precision, run: +```sh +./cross_check_tool -i \ + -m \ + -ref_d CPU \ + -ref_m \ + -d MYRIAD \ +``` +The output looks as follows: +``` +InferenceEngine: + API version ............ 1.0 + Build .................. ### + +[ INFO ] Parsing input parameters +[ INFO ] MYRIAD vs CPU + IR for MYRIAD : + IR for CPU : + +[ INFO ] No extensions provided +[ INFO ] Loading plugins + + API version ............ 0.1 + Build .................. ### + Description ....... myriadPlugin + + + API version ............ 1.0 + Build .................. ### + Description ....... MKLDNNPlugin + +[ INFO ] Inputs detected: +[ INFO ] Statistics will be dumped for 1 layers: +[ INFO ] Layer statistics + Max absolute difference: 0.003889 + Min absolute difference: 2.49778e-12 + Max relative difference: 290.98% + Min relative difference: 0.0327804% + Devices: MYRIAD_FP16 CPU_FP32 + Real time, microsec: 69213.978946 4149.904940 +[ INFO ] Execution successful +``` + +3. To dump layer statistics from specific list of layers, run: +```sh +./cross_check_tool -i \ + -m \ + -d MYRIAD \ + -dump \ + -layers +``` +The output looks as follows: +``` +InferenceEngine: + API version ............ 1.0 + Build .................. ### +[ INFO ] Blob and statistics dumping enabled +[ INFO ] No extensions provided + + API version ............ 0.1 + Build .................. custom_releases/cvsdk-2018-r2_e28ec0278fb749d6b999c688a8e90a8a25c0f2b5 + Description ....... 
myriadPlugin + +[ INFO ] Inputs detected: +[ INFO ] Statistics will be dumped for X layers: +[ INFO ] Dump path: +[ INFO ] layer processing +... +[ INFO ] layer processing +[ INFO ] Execution successful +``` +If you do not provide the `-i` key, the Cross Check Tool generates an input from normal distributed noise and saves +it in a multi-input file format with the filename `_input_layers_dump.txt` in the same folder as the IR. +4. To check the overall accuracy and performance of inference on the CPU in FP32 precision against dumped results, run: +```sh +./cross_check_tool -i \ + -m \ + -d CPU \ + -load \ + -layers all +``` +The output looks as follows: +``` +InferenceEngine: + API version ............ 1.0 + Build .................. ### +[ INFO ] Blob and statistics loading enabled. File /localdisk/models/FP16/icv_squeezenet_v1.0_MYRIAD_FP16_dump.txt + The same IR on both devices: + +[ INFO ] No extensions provided + + API version ............ 0.1 + Build .................. ### + Description ....... myriadPlugin + +[ INFO ] Inputs detected: +[ INFO ] Statistics will be dumped for X layers: , , ... , +[ INFO ] layer processing +[ INFO ] Layer statistics + Max absolute difference: 0 + Min absolute difference: 0 + Max relative difference: 0% + Min relative difference: 0% + Blob size: 1000 + + Devices: MYRIAD_FP16 MYRIAD_FP16_loaded + Status: EXECUTED EXECUTED + Layer type: SoftMax SoftMax + Real time, microsec: 43 43 + Execution type: SoftMax SoftMax + Number of NAN: 0 0 + Number of INF: 0 0 + Number of ZERO: 0 0 +... + +... +[ INFO ] Overall max absolute difference 0 +[ INFO ] Overall min absolute difference 0 was reached by layer +[ INFO ] Overall max relative difference 0% +[ INFO ] Overall min relative difference 0% was reached by layer +[ INFO ] Execution successful +``` + +### Multi-input and dump file experimental format + +Text file contains description of each layer in structure like this: +* 1st line is layer name (required) +* 2nd line is shape like "(1,224,224,3)" (required) +* 3rd line is a device and precision information like "CPU_FP32" (optional for multi-input file) +* 4th line is execution status Options are: EXECUTED, OPTIMIZED_OUT (optional for multi-input file) +* 5th line is type of layer (optional for multi-input file) +* 6th line is execution time in microseconds (optional for multi-input file) +* 7th line is type of execution (optional for multi-input file) +* 8th line is word "CONTENT" which means that the next line or lines are consisted of blob elements +* Next line or lines are for blob elements. They may be separated with one or several spaces, tabs and new lines. + + +#### Multi-input file example + +``` +Input_1 +(1,10) +CONTENT +0 0.000628471375 0.00185108185 +0.000580787659 +0.00137138367 +0.000561237335 0.0040473938 0 0 0 +Input_2 +(1,8) +CONTENT +0 0 0.00194549561 0.0017490387 7.73072243e-05 0.000135779381 0.000186920166 0 7.52806664e-05 +``` + +#### Dump file example + +``` +Softmax +(1,10) +MYRIAD_FP16 +EXECUTED +SoftMax +43 +SoftMax +CONTENT +7.44462013e-05 +0 +0.000810623169 +0.000361680984 +0 +9.14335251e-05 +0 +0 +8.15987587e-05 +0 +``` + + +### Configuration file + +There is an option to pass configuration file to plugin by providing +`-conf` and/or `--ref_conf` keys. + +Configuration file is a text file with content of pairs of keys and values. 
+ +Structure of configuration file: + +```sh +KEY VALUE +ANOTHER_KEY ANOTHER_VALUE,VALUE_1 +``` diff --git a/docs/IE_DG/Deep_Learning_Inference_Engine_DevGuide.md b/docs/IE_DG/Deep_Learning_Inference_Engine_DevGuide.md new file mode 100644 index 00000000000000..2e6e033069a83c --- /dev/null +++ b/docs/IE_DG/Deep_Learning_Inference_Engine_DevGuide.md @@ -0,0 +1,93 @@ +# Inference Engine Developer Guide {#openvino_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide} + +## Introduction to the OpenVINO™ Toolkit + +The OpenVINO™ toolkit is a comprehensive toolkit that you can use to develop and deploy vision-oriented solutions on +Intel® platforms. Vision-oriented means the solutions use images or videos to perform specific tasks. +A few of the solutions use cases include autonomous navigation, digital surveillance cameras, robotics, +and mixed-reality headsets. + +The OpenVINO™ toolkit: + +* Enables CNN-based deep learning inference on the edge +* Supports heterogeneous execution across an Intel® CPU, Intel® Integrated Graphics, Intel® Movidius™ Neural Compute Stick and Intel® Neural Compute Stick 2 +* Speeds time-to-market via an easy-to-use library of computer vision functions and pre-optimized kernels +* Includes optimized calls for computer vision standards including OpenCV\*, OpenCL™, and OpenVX\* + +The OpenVINO™ toolkit includes the following components: + +* Intel® Deep Learning Deployment Toolkit (Intel® DLDT) + - [Deep Learning Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) — A cross-platform command-line tool for importing models and + preparing them for optimal execution with the Deep Learning Inference Engine. The Model Optimizer supports converting Caffe*, + TensorFlow*, MXNet*, Kaldi*, ONNX* models. + - [Deep Learning Inference Engine](inference_engine_intro.md) — A unified API to allow high performance inference on many hardware types + including Intel® CPU, Intel® Processor Graphics, Intel® FPGA, Intel® Neural Compute Stick 2. + - [nGraph](nGraph_Flow.md) — graph representation and manipulation engine which is used to represent a model inside Inference Engine and allows the run-time model construction without using Model Optimizer. +* [OpenCV](https://docs.opencv.org/) — OpenCV* community version compiled for Intel® hardware. +Includes PVL libraries for computer vision. +* Drivers and runtimes for OpenCL™ version 2.1 +* [Intel® Media SDK](https://software.intel.com/en-us/media-sdk) +* [OpenVX*](https://software.intel.com/en-us/cvsdk-ovx-guide) — Intel's implementation of OpenVX* +optimized for running on Intel® hardware (CPU, GPU, IPU). +* [Demos and samples](Samples_Overview.md). + + +This Guide provides overview of the Inference Engine describing the typical workflow for performing +inference of a pre-trained and optimized deep learning model and a set of sample applications. + +> **NOTES:** +> - Before you perform inference with the Inference Engine, your models should be converted to the Inference Engine format using the Model Optimizer or built directly in run-time using nGraph API. To learn about how to use Model Optimizer, refer to the [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). To learn about the pre-trained and optimized models delivered with the OpenVINO™ toolkit, refer to [Pre-Trained Models](@ref omz_models_intel_index). 
+> - [Intel® System Studio](https://software.intel.com/en-us/system-studio) is an all-in-one, cross-platform tool suite, purpose-built to simplify system bring-up and improve system and IoT device application performance on Intel® platforms. If you are using the Intel® Distribution of OpenVINO™ with Intel® System Studio, go to [Get Started with Intel® System Studio](https://software.intel.com/en-us/articles/get-started-with-openvino-and-intel-system-studio-2019). + + +## Table of Contents + +* [Introduction to Intel® Deep Learning Deployment Toolkit](Introduction.md) + +* [Inference Engine API Changes History](API_Changes.md) + +* [Introduction to Inference Engine](inference_engine_intro.md) + +* [Introduction to nGraph Flow](nGraph_Flow.md) + +* [Understanding Inference Engine Memory Primitives](Memory_primitives.md) + +* [Introduction to Inference Engine Device Query API](InferenceEngine_QueryAPI.md) + +* [Adding Your Own Layers to the Inference Engine](Extensibility_DG/Intro.md) + +* [Integrating Inference Engine in Your Application](Integrate_with_customer_application_new_API.md) + +* [Migration from Inference Engine Plugin API to Core API](Migration_CoreAPI.md) + +* [Introduction to Performance Topics](Intro_to_Performance.md) + +* [Inference Engine Python API Overview](../../inference-engine/ie_bridges/python/docs/api_overview.md) + +* [Using Dynamic Batching feature](DynamicBatching.md) + +* [Using Static Shape Infer feature](ShapeInference.md) + +* [Using Low-Precision 8-bit Integer Inference](Int8Inference.md) + +* [Using Bfloat16 Inference](Bfloat16Inference.md) + +* Utilities to Validate Your Converted Model + * [Using Cross Check Tool for Per-Layer Comparison Between Plugins](../../inference-engine/tools/cross_check_tool/README.md) + +* [Supported Devices](supported_plugins/Supported_Devices.md) + * [GPU](supported_plugins/CL_DNN.md) + * [CPU](supported_plugins/CPU.md) + * [FPGA](supported_plugins/FPGA.md) + * [VPU](supported_plugins/VPU.md) + * [MYRIAD](supported_plugins/MYRIAD.md) + * [HDDL](supported_plugins/HDDL.md) + * [Heterogeneous execution](supported_plugins/HETERO.md) + * [GNA](supported_plugins/GNA.md) + * **NEW!** [MULTI](supported_plugins/MULTI.md) + +* [Pre-Trained Models](@ref omz_models_intel_index) + +* [Known Issues](Known_Issues_Limitations.md) + +**Typical Next Step:** [Introduction to Intel® Deep Learning Deployment Toolkit](Introduction.md) diff --git a/docs/IE_DG/DynamicBatching.md b/docs/IE_DG/DynamicBatching.md new file mode 100644 index 00000000000000..696b245d45c07e --- /dev/null +++ b/docs/IE_DG/DynamicBatching.md @@ -0,0 +1,83 @@ +Using Dynamic Batching {#openvino_docs_IE_DG_DynamicBatching} +====================== + +Dynamic Batching feature allows you+ to dynamically change batch size for inference calls +within preset batch size limit. +This feature might be useful when batch size is unknown beforehand, and using extra large batch size is +undesired or impossible due to resource limitations. +For example, face detection with person age, gender, or mood recognition is a typical usage scenario. + + +## Usage + +You can activate Dynamic Batching by setting KEY_DYN_BATCH_ENABLED flag to YES in a configuration map that is +passed to the plugin while loading a network. +This configuration creates an ExecutableNetwork object that will allow setting batch size +dynamically in all of its infer requests using SetBatch() method. +The batch size that was set in passed CNNNetwork object will be used as a maximum batch size limit. 
+ +Here is a code example: +```cpp +int dynBatchLimit = FLAGS_bl; //take dynamic batch limit from command line option + +// Read network model +Core core; +CNNNetwork network = core.ReadNetwork(modelFileName, weightFileName); + +// enable dynamic batching and prepare for setting max batch limit +const std::map dyn_config = +{ { PluginConfigParams::KEY_DYN_BATCH_ENABLED, PluginConfigParams::YES } }; +network.setBatchSize(dynBatchLimit); + +// create executable network and infer request +auto executable_network = core.LoadNetwork(network, "CPU", dyn_config); +auto infer_request = executable_network.CreateInferRequest(); + + +... + + +// process a set of images +// dynamically set batch size for subsequent Infer() calls of this request +size_t batchSize = imagesData.size(); +infer_request.SetBatch(batchSize); +infer_request.Infer(); + +... + +// process another set of images +batchSize = imagesData2.size(); +infer_request.SetBatch(batchSize); +infer_request.Infer(); +``` + + +## Limitations + +Currently, certain limitations for using Dynamic Batching exist: + +* Use Dynamic Batching with CPU and GPU plugins only. + +* Use Dynamic Batching on topologies that consist of certain layers only: + + * Convolution + * Deconvolution + * Activation + * LRN + * Pooling + * FullyConnected + * SoftMax + * Split + * Concatenation + * Power + * Eltwise + * Crop + * BatchNormalization + * Copy + +Do not use layers that might arbitrary change tensor shape (such as Flatten, Permute, Reshape), +layers specific to object detection topologies (ROIPooling, ProirBox, DetectionOutput), and +custom layers. +Topology analysis is performed during the process of loading a network into plugin, and if topology is +not applicable, an exception is generated. + diff --git a/docs/IE_DG/Extensibility_DG/AddingNGraphOps.md b/docs/IE_DG/Extensibility_DG/AddingNGraphOps.md new file mode 100644 index 00000000000000..8c181062cb60fd --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/AddingNGraphOps.md @@ -0,0 +1,89 @@ +# Add Custom nGraph Operations {#openvino_docs_IE_DG_Extensibility_DG_AddingNGraphOps} + +Inference Engine Extension API allows to register operation sets (opsets) with custom nGraph operations, it allows to support Networks with unknown operations. + +## Operation Class + +To add your custom nGraph operation, create a new class that extends `ngraph::Op`, which is in turn derived from `ngraph::Node`, the base class for all graph operations in nGraph. Follow the steps below: + +1. Define a `NodeTypeInfo` object that identifies the type of the operation to the graph users and helps with dynamic type resolution. The type info of an nGraph operation currently consists of a string identifier and a version number, but this may change in the future. + +2. Implement constructors that can optionally take the operation inputs and attributes as parameters. + +3. Override the shape inference method `validate_and_infer_types`. This method is called multiple times during graph manipulations to determine the shapes and element types of the outputs of the operations. You can access the input shapes through the `get_input_partial_shape()` method and input element types through the `get_input_element_type()` method of `ngraph::Node`. Set the inferred shape and element type of the output using `set_output_type`. + +4. Override the `copy_with_new_args` method, which allows graph manipulation routines to create copies of this operation and connect it to different nodes during optimization. + +5. 
Override the `visit_attributes` method, which allows serialization and deserialization of attributes. An `AttributeVisitor` is passed to the method, and the implementation is expected to walk over all the attributes in the op using the type-aware `on_attribute` helper. Helpers are already implemented for standard C++ types like `int64_t`, `float`, `bool`, `vector` and for existing nGraph defined types. + +Based on that, declaration of a operation class can look as follows: + +@snippet op.hpp op:header + +### Class Fields + +The provided implementation has several fields: + + * `add` of type `int64_t` is an attribute of custom operation + * `type_info` of type `ngraph::NodeTypeInfo` defines the type and version of operation + +### Operation Constructors + +nGraph operation contains two constructors: a default constructor, which allows to create operation without attributes and a constructor that creates and validates operation with specified inputs and attributes. + +@snippet op.cpp op:ctor + +### `validate_and_infer_types()` + +`ngraph::Node::validate_and_infer_types` method validates operation attributes and calculates output shapes using attributes of operation. + +@snippet op.cpp op:validate + +### `copy_with_new_args()` + +`ngraph::Node::copy_with_new_args` method creates a copy of nGraph operation with new inputs. + +@snippet op.cpp op:copy + +### `visit_attributes()` + +`ngraph::Node::visit_attributes` method allows to visit all operation attributes. + +@snippet op.cpp op:visit_attributes + +## Register Custom Operations in Extension Class + +To add custom operations to the [Extension](Extension.md) class, create an operation set with custom operations and implement the `InferenceEngine::IExtension::getOpSets` method: + +@snippet extension.cpp extension:getOpSets + +This method returns a map of opsets that exist in the extension library. + +nGraph provides opsets mechanism for operation versioning. Different opsets distinguish between different versions of one operation. + +When specifying opset names, follow the rules below: +* Use unique opset names. +* Do not use the following built-in opset names: `extension`, `experimental`, `opset1`, `opest2`. +* Make sure that the Model Optimizer and your extension use the same opset names. +* IR v10 layers have the mandatory `version` attribute specifying the opset. +* `opset1` is the name of default operations set. +Operations from the default opset cannot be redefined. + +Use a custom opset to create a new operation or extend functionality of an existing operation from another opset. + +## Deprecation Notice + + + + + + + + + + +
Deprecation Begins: June 1, 2020
Removal Date: December 1, 2020
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* diff --git a/docs/IE_DG/Extensibility_DG/Building.md b/docs/IE_DG/Extensibility_DG/Building.md new file mode 100644 index 00000000000000..8d33678da50897 --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/Building.md @@ -0,0 +1,19 @@ +# Build Extension Library Using CMake* {#openvino_docs_IE_DG_Extensibility_DG_Building} + +Inference Engine build infrastructure provides the Inference Engine Package for application development. + +To build an extension library, use the following CMake script: + +@snippet CMakeLists.txt cmake:extension + +This CMake script finds the Inference Engine and nGraph using the `find_package` CMake command. + +To build an extension library, run the commands below: + +```sh +$ cd template_extension +$ mkdir build +$ cd build +$ cmake -DInferenceEngine_DIR=[IE_DIR] -Dngraph_DIR=[NGRAPH_DIR] ../ +$ cmake --build . +``` diff --git a/docs/IE_DG/Extensibility_DG/CPU_Kernel.md b/docs/IE_DG/Extensibility_DG/CPU_Kernel.md new file mode 100644 index 00000000000000..22fd0d062dea2e --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/CPU_Kernel.md @@ -0,0 +1,74 @@ +# How to Implement Custom CPU Layers {#openvino_docs_IE_DG_Extensibility_DG_CPU_Kernel} + +The primary vehicle for the performance of the CPU codepath in the Inference Engine is the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), and new CPU kernels extend the Inference Engine plugin for the Intel MKL-DNN. Implementing the InferenceEngine::ILayerExecImpl defines a general CPU-side extension. There are no Intel MKL-DNN specifics in the way you need to implement a kernel. + +## Implementation Class + +All custom kernels for the CPU plugin should be inherited from the InferenceEngine::ILayerExecImpl interface. +Based on that, declaration of a kernel implementation class can look as follows: + +@snippet cpu_kernel.hpp cpu_implementation:header + +### Class Fields + +The provided implementation has several fields: + + * `add` of the type `int64_t` is an attribute of a custom operation + * `inShape` of the type `ngraph::Shape` is an input shape + * `outShape` of the type `ngraph::Shape` is an output shape + * `error` of the type `std::string` is a field to handle errors from a constructor + +### Constructor of Implementation + +An implementation constructor checks parameters of nGraph operation, stores needed attributes, and stores an error message in the case of an error. + +@snippet cpu_kernel.cpp cpu_implementation:ctor + +### `getSupportedConfigurations` + +InferenceEngine::ILayerExecImpl::getSupportedConfigurations method returns all supported configuration formats (input/output tensor layouts) for your implementation. To specify formats of data, use InferenceEngine::TensorDesc. Refer to the [Memory Primitives](../Memory_primitives.md) section for instructions on how to do it. 
+ +@snippet cpu_kernel.cpp cpu_implementation:getSupportedConfigurations + +### `init` + +InferenceEngine::ILayerExecImpl::init method gets a runtime-selected configuration from a vector that is populated from the `getSupportedConfigurations` method and checks the parameters: + +@snippet cpu_kernel.cpp cpu_implementation:init + +### `execute` + +InferenceEngine::ILayerExecImpl::execute method accepts and processes the actual tenors as input/output blobs: + +@snippet cpu_kernel.cpp cpu_implementation:execute + +## Register Implementation in `Extension` Class + +To register custom kernel implementation in the [Extension](Extension.md) class, implement the following methods: +* getImplTypes +* getImplementation + +### getImplTypes + +InferenceEngine::IExtension::getImplTypes returns a vector of implementation types for an operation. + +@snippet extension.cpp extension:getImplTypes + +### getImplementation + +InferenceEngine::IExtension::getImplementation returns the kernel implementation with a specified type for an operation. + +@snippet extension.cpp extension:getImplementation + + +## Load Extension with Executable Kernels to Plugin + +Use the `AddExtension` method of the general plugin interface to load your primitives: +```cpp +InferenceEngine::Core core; +// Load CPU extension as a shared library +auto extension_ptr = make_so_pointer(""); +// Add extension to the CPU device +core.AddExtension(extension_ptr, "CPU"); +``` + diff --git a/docs/IE_DG/Extensibility_DG/Extension.md b/docs/IE_DG/Extensibility_DG/Extension.md new file mode 100644 index 00000000000000..1eb84bb5c694d9 --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/Extension.md @@ -0,0 +1,25 @@ +# Extension Library {#openvino_docs_IE_DG_Extensibility_DG_Extension} + +Inference Engine provides an InferenceEngine::IExtension interface, which defines the interface for Inference Engine Extension libraries. +All extension libraries should be inherited from this interface. + +Based on that, declaration of an extension class can look as follows: + +@snippet extension.hpp extension:header + +The extension library should contain and export the method InferenceEngine::CreateExtension, which creates an `Extension` class: + +@snippet extension.cpp extension:CreateExtension + +Also, an `Extension` object should implement the following methods: + +* InferenceEngine::IExtension::Release deletes an extension object + +* InferenceEngine::IExtension::GetVersion returns information about version of the library + +@snippet extension.cpp extension:GetVersion + +Implement the InferenceEngine::IExtension::getOpSets method if the extension contains custom layers. +Read the [guide about custom operations](AddingNGraphOps.md) for more information. + +To understand how integrate execution kernels to the extension library, read the [guide about development of custom CPU kernels](CPU_Kernel.md). diff --git a/docs/IE_DG/Extensibility_DG/GPU_Kernel.md b/docs/IE_DG/Extensibility_DG/GPU_Kernel.md new file mode 100644 index 00000000000000..24c7599d8baad0 --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/GPU_Kernel.md @@ -0,0 +1,250 @@ +# How to Implement Custom GPU Layers {#openvino_docs_IE_DG_Extensibility_DG_GPU_Kernel} + +The GPU codepath abstracts many details about OpenCL™. You need to provide the kernel code in OpenCL C and the configuration file that connects the kernel and its parameters to the parameters of the layer. 
+ +There are two options of using custom layer configuration file: + +* Include a section with your kernels into the global automatically-loaded `cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml` file, which is hosted in the `/deployment_tools/inference_engine/bin/intel64/{Debug/Release}` folder +* Call the `InferenceEngine::Core::SetConfig()` method from your application with the `InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE` key and the configuration file name as a value before loading the network that uses custom layers to the plugin: +```cpp +InferenceEngine::Core core; +// Load GPU Extensions +core.SetConfig({ { InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE, "" } }, "GPU"); +``` + +All Inference Engine samples, except trivial `hello_classification`, +feature a dedicated command-line option `-c` to load custom kernels. For example, to load custom layers for the classification sample, run the command below: +```sh +$ ./classification_sample -m /bvlc_alexnet_fp16.xml -i ./validation_set/daily/227x227/apron.bmp -d GPU + -c /custom_layer_example.xml +``` + +## Configuration File Format + +The configuration file is expected to follow the `.xml` file structure +with a node of the type `CustomLayer` for every custom layer you provide. + +The definitions described in the sections below use the following notations: + +Notation | Description +---|--- +(0/1) | Can have 0 or 1 instances of this node/attribute +(1) | Must have only 1 instance of this node/attribute +(0+) | Can have any number of instances of this node/attribute +(1+) | Can have 1 or more instances of this node/attribute + +### CustomLayer Node and Sub-node Structure + +`CustomLayer` node contains the entire configuration for a single custom +layer. + +| Attribute Name |\# | Description | +|-----|-----|-----| +| `name` | (1) | The name of the layer type to be used. This name should be identical to the type used in the IR.| +| `type` | (1) | Must be `SimpleGPU`. | +| `version` | (1) | Must be `1`. | + +**Sub-nodes**: `Kernel` (1), `Buffers` (1), `CompilerOptions` (0+), +`WorkSizes` (0/1) + +### Kernel Node and Sub-node Structure + +`Kernel` node contains all kernel source code configuration. No kernel +node structure exists. + +**Sub-nodes**: `Source` (1+), `Define` (0+) + +### Source Node and Sub-node Structure + +`Source` node points to a single OpenCL source file. + +| Attribute Name | \# || +|-----|-----|-----| +| `filename` | (1) | Name of the file containing OpenCL source code. Notice that path is relative to your executable. Multiple source nodes will have their sources concatenated in order. | + +**Sub-nodes**: None + +### Define Node and Sub-node Structure + +`Define` node configures a single `#‍define` instruction to be added to +the sources during compilation (JIT). + +| Attribute Name | \# | Description | +|------|-------|------| +| `name` | (1) | The name of the defined JIT. For static constants, this can include the value as well (taken as a string). | +| `param` | (0/1) | This parameter value is used as the value of this JIT definition. | +| `type` | (0/1) | The parameter type. Accepted values: `int`, `float`, and `int[]`, `float[]` for arrays. | +| `default` | (0/1) | The default value to be used if the specified parameters is missing from the layer in the IR. | + +**Sub-nodes:** None + +The resulting JIT has the following form: +`#‍define [name] [type] [value/default]`. 
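For illustration, the sketch below shows how a kernel can consume such a JIT definition. The `Define` node attributes, the `neg_slope` name, and the `0.1` value are hypothetical; they simply mirror the leaky ReLU example kernel shown later in this section:
```cpp
// Sketch only: a hand-written stand-in for the JIT line that a hypothetical node like
//   <Define name="neg_slope" param="negative_slope" type="float" default="0.0"/>
// would produce (the generated line follows the "#define [name] [type] [value/default]"
// form described above) when the layer in the IR carries negative_slope="0.1".
#define neg_slope (0.1f)

// Kernel-side code can then use the injected constant directly,
// as the leaky ReLU example kernel later in this guide does:
float leaky_relu(float value) {
    return value < 0.0f ? value * neg_slope : value;
}
```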
+ +### Buffers Node and Sub-node Structure + +`Buffers` node configures all input/output buffers for the OpenCL entry +function. No buffers node structure exists. + +**Sub-nodes:** `Data` (0+), `Tensor` (1+) + +### Data Node and Sub-node Structure + +`Data` node configures a single input with static data (for example, +weights or biases). + +| Attribute Name | \# | Description | +|----|-----|------| +| `name` | (1) | Name of a blob attached to a layer in the IR | +| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to | + +**Sub-nodes**: None + +### Tensor Node and Sub-node Structure + +`Tensor` node configures a single input or output tensor. + +| Attribute Name | \# | Description | +|------|-------|-------| +| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to. | +| `type` | (1) | `input` or `output` | +| `port-index` | (1) | 0-based index in the layer’s input/output ports in the IR | +| `format` | (0/1) | Data layout declaration for the tensor. Accepted values: `BFYX`, `BYXF`, `YXFB`, `FYXB` (also in all lowercase). Default value: `BFYX` | + +### CompilerOptions Node and Sub-node Structure + +`CompilerOptions` node configures the compilation flags for the OpenCL +sources. + +| Attribute Name | \# | Description | +|--------|-----|------| +| `options` | (1) | Options string to be passed to the OpenCL compiler | + +**Sub-nodes**: None + +### WorkSizes Node and Sub-node Structure + +`WorkSizes` node configures the global/local work sizes to be used when +queuing the OpenCL program for execution. + +| Attribute Name | \# | Description | +|-----|------|-----| +| `global`
`local` | (0/1)<br>
(0/1) | An array of up to 3 integers (or formulas) for defining the OpenCL work-sizes to be used during execution.<br>
The formulas can use the values of the B, F, Y, X dimensions and contain the operators +, -, /, \*, % (all evaluated in integer arithmetic).<br>
Default value: `global=”B*F*Y*X” local=””` | +| `dim` | (0/1) | A tensor to take the work size from. Accepted values: `input N`, `output`, where `N` is an index of input tensor starting with 0. Default value: `output` | + +**Sub-nodes**: None + +## Example Configuration File + +The following code sample provides an example configuration file (in the +`.xml` format). For information on configuration file structure, see +[Configuration File Format](#config-file-format). +```xml + + + + + + + + + + + + +``` + +## Built-In Defines for Custom Layers + +The following table includes definitions that are attached before +the user sources, where `` is the actual input and output, for +example, `INPUT0` or `OUTPUT0`. + +For an example, see [Example Kernel](#example-kernel). + +| Name | Value | +|---|---| +| `NUM_INPUTS` | Number of the input tensors bound to this kernel | +| `GLOBAL_WORKSIZE` | An array of global work sizes used to execute this kernel | +| `GLOBAL_WORKSIZE_SIZE` | The size of the `GLOBAL_WORKSIZE` array | +| `LOCAL_WORKSIZE` | An array of local work sizes used to execute this kernel | +| `LOCAL_WORKSIZE_SIZE` | The size of the `LOCAL_WORKSIZE` array | +| `_DIMS`| An array of the tensor dimension sizes. Always ordered as `BFYX` | +| `_DIMS_SIZE`| The size of the `_DIMS` array.| +| `_TYPE`| The datatype of the tensor: `float`, `half`, or `char`| +| `_FORMAT_` | The format of the tensor, BFYX, BYXF, YXFB , FYXB, or ANY. The format is concatenated to the defined name. You can use the tensor format to define codepaths in your code with `#‍ifdef/#‍endif`. | +| `_LOWER_PADDING` | An array of padding elements used for the tensor dimensions before they start. Always ordered as BFYX.| +| `_ LOWER_PADDING_SIZE` | The size of the `_LOWER_PADDING` array | +| `_UPPER_PADDING` | An array of padding elements used for the tensor dimensions after they end. Always ordered as BFYX. | +| `_UPPER_PADDING_SIZE` | The size of the `_UPPER_PADDING` array | +| `_PITCHES` | The number of elements between adjacent elements in each dimension. Always ordered as BFYX.| +| `_PITCHES_SIZE`| The size of the `_PITCHES` array | +| `_OFFSET`| The number of elements from the start of the tensor to the first valid element (bypassing the lower padding) | +All `` values are automatically defined for every tensor +bound to this layer (`INPUT0`, `INPUT1`, `OUTPUT0`, and so on), as shown +in the following for example: + +```sh +#define INPUT0_DIMS_SIZE 4 +#define INPUT0_DIMS (int []){ 1,96,55,55, } +``` + +## Example Kernel + +```c +#pragma OPENCL EXTENSION cl_khr_fp16 : enable +__kernel void example_relu_kernel( + const __global INPUT0_TYPE* input0, + __global OUTPUT0_TYPE* output) +{ + const uint idx = get_global_id(0); + const uint idy = get_global_id(1); + const uint idbf = get_global_id(2);//batches*features, as OpenCL supports 3D nd-ranges only + const uint feature = idbf%OUTPUT0_DIMS[1]; + const uint batch = idbf/OUTPUT0_DIMS[1]; + //notice that pitches are in elements, not in bytes! + const uint in_id = batch*INPUT0_PITCHES[0] + feature*INPUT0_PITCHES[1] + idy*INPUT0_PITCHES[2] + idx*INPUT0_PITCHES[3] + INPUT0_OFFSET; + const uint out_id = batch*OUTPUT0_PITCHES[0] + feature*OUTPUT0_PITCHES[1] + idy*OUTPUT0_PITCHES[2] + idx*OUTPUT0_PITCHES[3] + OUTPUT0_OFFSET; + + INPUT0_TYPE value = input0[in_id]; + //neg_slope (which is non-zero for leaky ReLU) is put automatically as #define, refer to the config xml + output[out_id] = value < 0 ? 
value * neg_slope : value; +} +``` + +> **NOTE:** As described in the previous section, all the things like +> `INPUT0_TYPE` are actually defined as OpenCL (pre-)compiler inputs by +> the Inference Engine for efficiency reasons. See [Debugging +> Tips](#debugging-tips) for information on debugging the results. + +> **NOTE**: Several GPU-targeted kernels are also added to the binaries upon samples compilation +> so that the sample application can easy load them. +> Refer to the `cldnn_global_custom_kernels` folder in the GPU plugin installation directory. + +## Debugging Tips + +* **Dumping the Resulting Kernels**. +It is recommended to get a dump of the kernel with all of +the values set by the Inference Engine, such as tensor sizes, +floating-point, and integer kernel parameters. To get the dump, add the +following line to your code that configures the GPU plugin to output the +custom kernels: +```cpp +core.SetConfig({ { PluginConfigParams::KEY_DUMP_KERNELS, PluginConfigParams::YES } }, "GPU"); +``` +When the Inference Engine compiles the kernels for the specific network, +it also outputs the resulting code for the custom kernels. In the +directory of your executable, find files like +`clDNN_program0.cl`, `clDNN_program1.cl`. There are as many files as +distinct sets of parameters for your custom kernel: different input +tensor sizes and kernel parameters. + +* **Using `printf` in the OpenCL™ Kernels**. +To debug the specific values, you can use `printf` in your kernels. +However, be careful: for instance, do not output excessively +as it would generate too much data. The `printf` output is typical, so +your output can be truncated to fit the buffer. Also, because of +buffering, you actually get an entire buffer of output when the +execution ends.
+For more information, refer to the [printf +Function](https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/printfFunction.html). diff --git a/docs/IE_DG/Extensibility_DG/Intro.md b/docs/IE_DG/Extensibility_DG/Intro.md new file mode 100644 index 00000000000000..d63e333b946c32 --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/Intro.md @@ -0,0 +1,56 @@ +# Inference Engine Extensibility Mechanism {#openvino_docs_IE_DG_Extensibility_DG_Intro} + +Inference Engine Extensibility API allows to add support of custom operations to the Inference Engine. +Extension should contain operation sets with custom operations and execution kernels for custom operations. +Physically, an extension library can be represented as a dynamic library exporting the single `CreateExtension` function that allows to create a new extension instance. + +Extensibility library can be loaded to the InferenceEngine::Core object using the InferenceEngine::Core::AddExtension method. + +## Inference Engine Extension Library + +Inference Engine Extension dynamic library contains several main components: + + * [Extension class](Extension.md): + - Contains custom operation sets + - Provides CPU implementations for custom operations + * [Custom operations](Intro.md): + - Allows to use InferenceEngine::Core::ReadNetwork to read Intermediate Representation (IR) with unsupported operations + - Allows to create `ngraph::Function` with unsupported operations + - Provides shape inference mechanism for custom operations + +> **NOTE**: This documentation is written based on the `Template extension`, which demonstrates extension +development details. Find the complete code of the `Template extension`, which is fully compilable and up-to-date, +at `/docs/template_extension`. + +## Execution Kernels + +The Inference Engine workflow involves the creation of custom kernels and either custom or existing operations. + +An _Operation_ is a Network building block implemented in the training framework, for example, `Convolution` in Caffe*. +A _Kernel_ is defined as the corresponding implementation in the Inference Engine. + +Refer to the [Custom Layers in the Model Optimizer](../../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) section for details on how +mapping between framework layers and Inference Engine kernels is registered. + +In short, you can plug your own kernel implementations into the Inference Engine and map them to the layers in the original framework. + +The following pages describe how to integrate custom _kernels_ into the Inference Engine: + + * [Introduction to development of custom CPU kernels](CPU_Kernel.md) + * [Introduction to development of custom GPU kernels](GPU_Kernel.md) + * [Introduction to development of custom VPU kernels](VPU_Kernel.md) + +## Deprecated Extensibility API + +Shape Inference API and some methods of extensibility mechanism was deprecated and will be removed soon. +Old Extensibility mechanism contains two parts shape inference and execution kernel. 
+ * [Shape Inference](deprecated/ShapeInfer.md) + * [Execution Kernel](deprecated/Factory.md) + +## Additional Resources + +* [Build an extension library using CMake*](Building.md) + +## See Also +* [Using Inference Engine Samples](../Samples_Overview.md) +* [Hello Shape Infer SSD sample](../../../inference-engine/samples/hello_reshape_ssd/README.md) diff --git a/docs/IE_DG/Extensibility_DG/VPU_Kernel.md b/docs/IE_DG/Extensibility_DG/VPU_Kernel.md new file mode 100644 index 00000000000000..a3c97d0a8533cd --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/VPU_Kernel.md @@ -0,0 +1,679 @@ +# How to Implement Custom Layers for VPU (Intel® Neural Compute Stick 2) {#openvino_docs_IE_DG_Extensibility_DG_VPU_Kernel} + +> **NOTE:** OpenCL™ custom layer support is available in the preview mode. + +> **NOTE:** This section assumes you are familiar with developing kernels using OpenCL™. + +To customize your topology with an OpenCL™ layer, follow the steps below: + +1. Write and compile you OpenCL™ code with the standalone offline OpenCL™ compiler (`clc`). +2. Write a configuration file to bind the OpenCL™ kernel to the topology file (`.xml`) of the model IR. +3. Pass the configuration file to Inference engine with the model IR. + +## Compile OpenCL™ code for VPU (Intel® Neural Compute Stick 2) + +> **NOTE:** OpenCL compiler, targeting Intel® Neural Compute Stick 2 for the SHAVE* processor only, is redistributed with OpenVINO. +OpenCL support is provided by ComputeAorta*, and is distributed under a license agreement between Intel® and Codeplay* Software Ltd. + +The OpenCL™ toolchain for the Intel® Neural Compute Stick 2 supports offline compilation only, so first compile OpenCL C code using the standalone `clc` compiler. You can find the compiler binary at `/deployment_tools/tools/cl_compiler`. + +> **NOTE:** By design, custom OpenCL layers support any OpenCL kernels written with 1.2 version assumed. It also supports half float +extension and is optimized for this type, because it is a native type for Intel® Movidius™ VPUs. + +1. Prior to running a compilation, make sure that the following variables are set: + * `SHAVE_MA2X8XLIBS_DIR=/deployment_tools/tools/cl_compiler/lib/` + * `SHAVE_LDSCRIPT_DIR=/deployment_tools/tools/cl_compiler/ldscripts/` + * `SHAVE_MYRIAD_LD_DIR=/deployment_tools/tools/cl_compiler/bin/` + * `SHAVE_MOVIASM_DIR=/deployment_tools/tools/cl_compiler/bin/` +2. Run the compilation with the command below. You should use `--strip-binary-header` to make an OpenCL runtime-agnostic binary runnable with the Inference Engine. +```bash +cd /deployment_tools/tools/cl_compiler/bin +./clc --strip-binary-header custom_layer.cl -o custom_layer.bin +``` + +## Write a Configuration File + +To tie the topology IR for a layer you customize, prepare a configuration file, so that the Inference Engine can find parameters for your kernel and the execution work grid is described. +For example, given the following OpenCL kernel signature: +```cpp +__kernel void reorg_nhwc(__global const half *src, __global half *out, int w, int h, int c, int stride); +``` +Configuration file for this kernel might be the following: +```xml + + + + + + + + + + + + + + +``` +Each custom layer is described with the `CustomLayer` node. It has the following nodes and attributes: + - Root node `CustomLayer` contains the following attributes: + - `name` – (Required) A name of the Inference Engine layer to bind the kernel with. + - `type` and `version` – (Required) Reserved for future use. Set them to `MVCL` and `1` respectively. 
+ - `max-shaves` – (Optional) The maximum number of SHAVE cores that should be dedicated for the layer. It is useful for debugging concurrency issues or for resource saving if memory bound kernel does not scale well with the number of cores, so more resources can be left for the rest of a topology. + - Sub-node `Kernel` must contain the following attributes: + - `entry` – A name of your kernel function as you defined it in a source file (in the example above, it is `reorg_nhwc`). + - Node `Source` must contain the following attributes: + - `filename` – A path to a compiled binary relative to the `.xml` binding file. + - Sub-node `Parameters` – Describes parameters bindings. For more information, see the description below. + - Sub-node `WorkSizes` – Describes local and global work group sizes and the source for dimension deduction as a pair `direction,port`. In the example above, the work group is described relatively to the dimension of the input tensor that comes through port 0 in the IR. `global` and `local` work group configurations support any simple math expressions with +,-,\*,/, and () from `B`(batch), `Y`(height), `X`(width) and `F`(channels). + - Sub-node `Where` – Allows to customize bindings with the `key="value"` attribute. For example, to substitute only 3x3 convolutions, write `` in the binging xml. + + Parameter description supports `Tensor` of one of tensor types such as `input`, `output`, `input_buffer`, `output_buffer` or `data`, `Scalar`, or `Data` nodes and has the following format: + - Each `Tensor` node of `input` or `output` type must contain the following attributes: + - `arg-name` – A name of a kernel parameter in the kernel signature. + - `type` – Node type: `input` or `output` as in the IR. + - `port-index` – A number of input/output ports as in the IR. + - `format` – The channel order in the tensor. Optional conversion layers are generated if the custom layer format is not compatible with formats of neighboring layers. `BFXY`, `BYXF`, and `ANY` formats are supported currently. + - Each `Tensor` node of `input_buffer` or `output_buffer` type must contain the following attributes: + - `arg-name` – A name of a kernel parameter in the kernel signature. + - `type` – Node type: `input_buffer` or `output_buffer`. Use the appropriate type to bind multiple kernels that correspond to different stages of the same layer. + - `port-index` – The unique identifier to bind by. + - `dim` – The dim source with the same `direction,port` format used for `WorkSizes` bindings. + - `size` – Amount of bytes needed. Current expression syntax supports only expression over dimensions of over selected input/output tensor or constants and might be expended in the future. + + Here is an example of multi-stage MVN layer binding: + ```xml + + + + + + + + + + + + + + + + + + + + + + + + + + ``` + - Each `Tensor` node that has the type `data` must contain the following attributes: + - `source` – A name of the blob as it is in the IR (typical example is `weights` for convolution + - `format` – Specifies the channel order in the tensor. Optional conversion layers are generated if the custom layer format is not. + ```xml + + + + + + + + + + + + + ``` + - Each `Scalar` node must contain the following attributes: + - `arg-name` – A name of a kernel parameter in the kernel signature. + - `type` – `int` or `float` value. It is used for correct argument extraction from IR parameters. 
+ - `source` – Contains the name of the parameter in the IR file or input/output (`I`/`O`, `In`/`On`, where `n` is a port number) + followed by dimension `B`(batch), `Y`(height), `X`(width), or `F`(channels). + + - Each `Data` node must contain the following attributes: + - `arg-name` – A name of a kernel parameter in the kernel signature. + - `type` – Node type. Currently, `local_data` is the only supported value, which defines buffer allocated in fast local on-chip memory. It is limited to 100K for all `__local` and + `__private` arrays defined inside the kernel as well as all `__local` parameters passed to the kernel. Please, consider that a manual-DMA extension requires double buffering. + If the custom layer is detected to run out of local memory, the inference fails. + - `dim` – The dim source with the same `direction,port` format used for `WorkSizes` bindings. + - `size` – Amount of bytes needed. The current expression syntax supports only expression over dimensions of over selected input/output tensor or constants and may be extended in the future. + The example binding below illustrates a kernel with two local buffers passed to the kernel. + ```xml + + + + + + + + + + + + + + +``` + +## Pass Configuration File to Inference Runtime + +> **NOTE**: If both native and custom layer implementations are present, the custom kernel has a priority over the native one. + +Before loading the network that features the custom layers, provide a separate configuration file and load it using the InferenceEngine::Core::SetConfig() method with the PluginConfigParams::KEY_CONFIG_FILE key and the configuration file name as a value: +```cpp +InferenceEngine::Core core; +// Load custom layers +core.SetConfig({ { InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE, "" } }, "MYRIAD"); +``` +Optionally, set a path to a custom layers description with a pair of `VPU_CUSTOM_LAYERS` and `/path/to/your/customLayers.xml` +as a network configuration: +```cpp +InferenceEngine::Core core; +std::map networkConfig; +config["VPU_CUSTOM_LAYERS"] = "/path/to/your/customLayers.xml"; +// Load custom layers in network config +auto exeNetwork = core.LoadNetwork(cnnNetwork, "MYRIAD", networkConfig); +``` + +## Optimizing Kernels with OpenCL™ for VPU (Intel® Neural Compute Stick 2) + +This section provides optimization guidelines on writing custom layers with OpenCL for VPU devices. Knowledge about general OpenCL +programming model and OpenCL kernel language is assumed and not a subject of this section. The OpenCL model mapping to VPU is described in the table below. + +| OpenCL Model | VPU Mapping| +|-----|----| +| Device code | Executed on SHAVE cores | +| Private memory | Mapped to CMX internal memory, limited to 100KB per work group, valid only while the work group is executed | +| Local memory | Mapped to CMX internal memory, limited to 100KB per work group, valid only while the work group is executed | +| Global memory | Mapped to DDR, used to pass execution preserved parameters for inputs, outputs, and blobs | +| Work group | Executed on a single SHAVE core iterating over multiple work items | + +Note that by the OpenCL specification, the work group execution order is not specified. This means that it is your +responsibility to ensure that race conditions among work groups are not introduced. Custom layer runtime spits evenly +work grid among available compute resources and executes them in an arbitrary order. 
This static scheduling approach works best if the load is evenly spread out across work groups, which is a typical case for Deep Learning kernels. The following guidelines are recommended to use for work group partitioning: + +1. Split work evenly across work groups. +2. Adjust work group granularity to maintain equal workload for all compute codes. +3. Set the maximum number of cores (using the `max-shaves` attribute for the `CustomLayer` node). This keeps more resources for the rest of topology. It is also useful if the kernel scalability reached its limits, which may happen while optimizing memory bound kernels or kernels with poor parallelization. +4. Try an alternate data layout (`BFXY`/`BYXF`) for the kernel if it improves work group partitioning or data access patterns. +Consider full topology performance (not just specific layer boost) since data conversion layers would be automatically inserted +as appropriate. + +Offline OpenCL compiler (`clc`) features automatic vectorization over `get_global_id(0)` usage, if uniform access is detected. +For example, the kernel below could be automatically vectorized: +```cpp +__kernel void cvtf32f16(__global float* restrict inImage, __global half* restrict outImage, + float scale, float bais) +{ + int idx = get_global_id(0) + get_global_id(1) * get_global_size(0) + get_global_id(2) * get_global_size(0) * get_global_size(1); + outImage[idx] = convert_half(inImage[idx]*scale+bais); +} +``` +However, this work-group based vectorizer (WGV) conflicts with the default LLVM vectorizer based on superword level parallelism +(SLP) for the current compiler version. Manual vectorization is recommended to provide the best performance for non-uniform code +patterns. WGV works if and only if vector types are not used in the code. + +Here is a short list of optimization tips: + +1. Help auto-vectorizer ensure non-aliasing pointers for kernel parameters by putting `restrict` where possible. + - This may give a performance boost, especially for kernels with unrolling, like `ocl_grn` from the example below. + - Place `restrict` markers for kernels with manually vectorized codes. In the `ocl_grn` kernel below, the unrolled version without `restrict` is up to 20% slower than the most optimal one, which combines unrolling and `restrict`. +2. Put `#‍pragma unroll N` to your loop header. Since the compiler does not trigger unrolling by default, it is your responsibility to +annotate the code with pragmas as appropriate. The `ocl_grn` version with `#‍pragma unroll 4` is up to 50% faster, most of which comes from unrolling the first loop, because LLVM, in general, is better in scheduling 3-stage loops (load-compute-store), while the fist loop + `variance += (float)(src_data[c*H*W + y*W + x] * src_data[c*H*W + y*W + x]);` is only 2-stage (load-compute). Please, pay +attention to unrolling such cases first. Unrolling factor is loop-dependent. Choose the smallest number that +still improves performance as an optimum between the kernel size and execution speed. For this specific kernel, changing the unroll factor from `4`to `6` results in the same performance, so unrolling factor equal to 4 is an optimum. 
For Intel® Neural Compute Stick 2, unrolling is conjugated with the automatic software pipelining for load, store, and compute stages: +```cpp +__kernel void ocl_grn(__global const half* restrict src_data, __global half* restrict dst_data, int C, float bias) +{ + int x = get_global_id(0); + int W = get_global_size(0); + int y = get_global_id(1); + int H = get_global_size(1); + + float variance = bias + 1e-9f; + + #pragma unroll 4 + for (int c = 0; c < C; c++) + variance += (float)(src_data[c*H*W + y*W + x] * src_data[c*H*W + y*W + x]); + + variance = 1.f / native_sqrt(variance); + + #pragma unroll 4 + for (int c = 0; c < C; c++) + dst_data[c*H*W + y*W + x] = (half)((float)src_data[c*H*W + y*W + x] * variance); +} +``` +To check the efficiency of WGV, you can compare performance of the kernel above with the kernel below, which is manually vectorized over width: +```cpp +__kernel void ocl_grn_line(__global const half* restrict src_data, __global half* restrict dst_data, int C, int W, float bias) +{ + int y = get_global_id(1); + int H = get_global_size(1); + + for (int x = 0; x < W/8; x++) + { + float8 variance = (float8)(bias+1e-9f); + + #pragma unroll 4 + for (int c = 0; c < C; c++) + { + __global const half8* restrict src_line = ((__global const half8 * restrict)(src_data + c*H*W + y*W)); + half8 sh = src_line[x]; + variance += convert_float8(sh*sh); + } + + variance = 1.f/native_sqrt(variance); + + #pragma unroll 4 + for (int c = 0; c < C; c++) + { + __global const half8* restrict src_line = ((__global const half8 * restrict)(src_data + c*H*W + y*W)); + __global half8* restrict dst_line = ((__global half8 * restrict)(dst_data + c*H*W + y*W)); + + dst_line[x] = convert_half8(convert_float8(src_line[x])*variance); + } + } + for (int x = W/8*8; x < W; x++) + { + float variance = bias+1e-9f; + #pragma unroll 4 + for (int c = 0; c < C; c++) + variance += (float)(src_data[c*H*W + y*W + x]*src_data[c*H*W + y*W + x]); + + variance = 1.f/native_sqrt(variance); + + #pragma unroll 4 + for (int c = 0; c < C; c++) + dst_data[c*H*W + y*W + x] = (float)src_data[c*H*W + y*W + x]*variance; + } +} +``` +Both versions perform the same, but the second one has more complex code. + +3. If it is easy to predict the work group size, you can also use the `reqd_work_group_size` kernel attribute to ask the compiler +to unroll the code up to local size of the work group. Please note that if the kernel is actually executed with the +different work group configuration, the result is undefined. + +4. Prefer to use the `half` compute, if it keeps reasonable accuracy. 16-bit float is a native type for Intel® Neural Compute Stick 2, most of the functions `half_*` are mapped to a single hardware instruction. +Use the standard `native_*` function for the rest of types. + +5. Prefer to use the `convert_half` function over `vstore_half` if conversion to 32-bit float is required. `convert_half` is mapped to a single hardware instruction. For the `cvtf32f16` kernel above, the line `outImage[idx] = convert_half(inImage[idx]*scale+bais);` is 8 times slower than the code with `vstore_half`. + +6. Mind early exits. Early exit may be extremely costly for the current version of the `clc` compiler due to conflicts with the +auto-vectorizer. The generic advice would be to setup local size by `x` dimension equal to inputs or/and outputs width. 
+If it is impossible to define the work grid that exactly matches inputs or/and outputs to eliminate checks, for example, +`if (get_global_id(0) >= width) return`, use line-wise kernel variant with manual vectorization. +The kernel example below demonstrates the impact of early exits on kernel performance. + ```cpp + // Initial version + __kernel void reorg(const __global half* restrict src, __global half* restrict out, int stride) + { + int w = get_global_id(0); + int W = get_global_size(0); + + int h = get_global_id(1); + int H = get_global_size(1); + + int c = get_global_id(2); + int C = get_global_size(2); + + int C2 = C/(stride*stride); + int offset = c / C2; + int c2 = c - C2 * offset; + + int H2 = H*stride; + int W2 = W*stride; + + int h2 = h*stride + offset / stride; + int w2 = w*stride + offset - stride * (offset / stride); + + out[W*H*c + W*h + w] = src[W2*H2*c2 + W2*h2 + w2]; + } + ``` +This `reorg` kernel is auto-vectorizable, but an input for YOLO v2 topology is `NCHW=<1,64,26,26>` and it is not multiple of vector width (which is `8` for `half` data type). As a result, the Inference Engine does not select the auto-vectorized kernel. +To compare performance of auto-vectorized and scalar version of the kernel, change the input size to`NCHW=<1,64,26,32>`. This allows the auto-vectorized version to be selected by the Inference Engine and can give you about 30% uplift. +Since the auto-vectorized version is faster, it makes sense to enable it for the YOLO v2 topology input size by setting the local size multiple of vector (e.g. 32) and adjust global sizes accordingly. As a result, the execution work grid exceeds actual input dimension, so out-of-bound checks should be inserted. See the updated kernel version below: + ```cpp + // Version with out-of-bound checks added + __kernel void reorg(const __global half* restrict src, __global half* restrict out, int W, int stride) + { + int w = get_global_id(0); + w = min(w, W-1); + + int h = get_global_id(1); + int H = get_global_size(1); + + int c = get_global_id(2); + int C = get_global_size(2); + + int C2 = C/(stride*stride); + int offset = c / C2; + int c2 = c - C2 * offset; + + int H2 = H*stride; + int W2 = W*stride; + + int h2 = h*stride + offset / stride; + int w2 = w*stride + offset - stride * (offset / stride); + + out[W*H*c + W*h + w] = src[W2*H2*c2 + W2*h2 + w2]; + } + ``` +This code performs the same as the initial kernel above (scalar) due to branching overhead. If you replace min/max expression `w = min(w, W-1);` with `if (w >= W) return;`, runtime increases up to 2x against to code without branching (initial version).
+If branching is inevitable for your element-based kernel, it is recommended to change the scheme to line-based. See the kernel variant below: +```cpp +// Line-wise version +__kernel void reorg(const __global half* restrict src, __global half* restrict out, int H, int W, int stride) +{ + int h = min((int)get_global_id(0), H-1); + + int c = get_global_id(1); + int C = get_global_size(1); + int C2 = C/(stride*stride); + int offset = c / C2; + int c2 = c - C2 * offset; + + int H2 = H*stride; + int W2 = W*stride; + + for (int w = 0; w < W; ++w) + { + int h2 = h*stride + offset / stride; + int w2 = w*stride + offset - stride * (offset / stride); + + out[W*H*c + W*h + w] = src[W2*H2*c2 + W2*h2 + w2]; + } +} +``` +This decreases the execution time up to 40% against the best performing vectorized kernel without early exits (initial version). +7. Reuse computations among work items by using line-based kernels or sharing values though `__local` memory. +8. Improve data access locality. Most of custom kernels are memory bound while convolution and fully connected layers are hardware-implemented. The code below demonstrates a further optimized version of the `reorg` kernel unrolled by `stride`: + ```cpp + // Unrolled line-wise version + __kernel void reorg_unrolled_by_stride(const __global half* restrict src, __global half* restrict dst, + int H, int W, int stride) + { + int h = min((int)get_global_id(0), H-1); + + int c2 = get_global_id(1); + int C2 = get_global_size(1); + int C = C2*stride*stride; + + int H2 = H*stride; + int W2 = W*stride; + + for (int stride_y = 0; stride_y < stride; stride_y++) + for (int stride_x = 0; stride_x < stride; stride_x++) + for (int w2 = 0, w = 0; w < W; w2 += stride, w++) + dst[W*H*C2*(stride_y*stride+stride_x) + W*H*c2 + W*h + w] = src[W2*H2*c2 + W2*h*stride + W2*stride_y + w2 + stride_x]; + } + ``` +`scr` data in this case loaded only once. As the result, the cycle count drops up to 45% against the line-wise version. + +9. Copy data from `__dlobal` to `__local` or `__private` memory if the data is accessed more than once. Access to +`__dlobal` memory is orders of magnitude slower than access to `__local`/`__private` due to statically scheduled pipeline, which +stalls completely on memory access without any prefetch. The same recommendation is applicable for scalar load/store +from/to a `__blobal` pointer since work-group copying could be done in a vector fashion. + +10. Use a manual DMA extension. Local (on-chip) memory throughput is up to 24x higher than DDR throughput. Starting from OpenVINO™ 2020.1, VPU OpenCL features manual-DMA kernel extension to copy sub-tensor used by work group into local memory and performing compute without DDR evolved. Here is the simple GRN kernel implementation that runs over DDR. 
The local size is equal to (width of the input tensor, 1, 1) to define a large enough work group to get the code automatically vectorized and unrolled, while the global size is (width of the input tensor, height of the input tensor, 1):
+ ```cpp
+ __kernel void grn_NCHW(
+     __global const half* restrict src_data,
+     __global half* restrict dst_data,
+     int C,
+     float bias)
+ {
+     float variance = bias + 1e-9f;
+
+     #pragma unroll 4
+     for (int c = 0; c < C; c++)
+     {
+         float val = (float) src_data[c*get_global_size(1)*get_global_size(0) + get_global_id(1)*get_global_size(0) + get_global_id(0)];
+         variance += val*val;
+     }
+
+     half hvariance = (half)(native_rsqrt((half)(variance/16.f))*0.25f);
+
+     #pragma unroll 4
+     for (int c = 0; c < C; c++)
+     {
+         dst_data[c*get_global_size(1)*get_global_size(0) + get_global_id(1)*get_global_size(0) + get_global_id(0)]
+             = src_data[c*get_global_size(1)*get_global_size(0) + get_global_id(1)*get_global_size(0) + get_global_id(0)] * hvariance;
+     }
+ }
+ ```
+This kernel can be rewritten to introduce the special data-binding `__dma_preload` and `__dma_postwrite` intrinsics. This means that instead of one kernel, a group of three kernels should be implemented: `kernelName`, `__dma_preload_kernelName`, and `__dma_postwrite_kernelName`. `__dma_preload_kernelName` for a particular work group `n` is guaranteed to be executed before the `n`-th work group itself, while `__dma_postwrite_kernelName` is guaranteed to be executed after the corresponding work group. These functions are intended to copy data to and from `__global` and `__local` memory. The syntax requires an exact function signature match. The example below illustrates how to prepare your kernel for manual DMA.
+ ```cpp
+ __kernel void __dma_preload_grn_NCHW(
+     __global const half* restrict src,
+     __global half* restrict dst,
+     __local half* restrict local_src,
+     __local half* restrict local_dst,
+     int C,
+     float bias)
+ {
+     // TODO: copy the required piece of the src tensor into local_src
+ }
+
+ __kernel void __dma_postwrite_grn_NCHW(
+     __global const half* restrict src,
+     __global half* restrict dst,
+     __local const half* restrict local_src,
+     __local half* restrict local_dst,
+     int C,
+     float bias)
+ {
+     // TODO: copy back the computed piece of local_dst into dst
+ }
+
+ __kernel void grn_NCHW(
+     __global const half* restrict src_data,
+     __global half* restrict dst_data,
+     __local half* restrict src,
+     __local half* restrict dst,
+     int C,
+     float bias)
+ {
+     // same as the example above
+ }
+ ```
+The GRN kernel operates on channel-major tensors to compute the average over the full channel range and then normalizes input elements to produce the output.
+As part of the manual DMA extension, a group of work-group copy functions is introduced in addition to `async_work_group_copy`, which is also mapped to a DMA call.
+ +Here is the list of supported functions: +```cpp +// 2D sub-tensor copy +event_t WorkGroupDmaCreateStrideTransaction( + const local T *src, + global T *dst, + size_t src_width, // width of the line of source in bytes + size_t dst_width, // width of the line of destination in bytes + size_t src_stride, // stride between corresponding 2 consecutive lines of source in bytes + size_t dst_stride, // stride between corresponding 2 consecutive lines of destination in bytes + size_t size, // total number of bytes loaded for all lines from source to destination + event_t event) __OVERLOAD; + + +event_t WorkGroupDmaCreateStrideTransaction( + const global T *src, + local T *dst, + size_t src_width, // width of the line of source in bytes + size_t dst_width, // width of the line of destination in bytes + size_t src_stride, // stride between corresponding 2 consecutive lines of source in bytes + size_t dst_stride, // stride between corresponding 2 consecutive lines of destination in bytes + size_t size, // total number of bytes loaded for all lines from source to destination + event_t event) __OVERLOAD; + +// 3D sub-tensor copy +event_t WorkGroupDmaCreate3DTransaction( + const local T *src, + global T *dst, + size_t src_width, // width of the line of source in bytes + size_t dst_width, // width of the line of destination in bytes + size_t src_stride, // stride between corresponding 2 consecutive lines of source in bytes + size_t dst_stride, // stride between corresponding 2 consecutive lines of destination in bytes + size_t num_planes, // number of planes to be copied + size_t src_plane_stride, // stride between corresponding 2 consecutive planes of source in bytes + size_t dst_plane_stride, // stride between corresponding 2 consecutive planes of destination in bytes + size_t size, // size of the loaded plane in bytes, analogues to the size in 2D case + event_t event) __OVERLOAD; + +event_t WorkGroupDmaCreate3DTransaction( + const global T *src, + local T *dst, + size_t src_width, // width of the line of source in bytes + size_t dst_width, // width of the line of destination in bytes + size_t src_stride, // stride between corresponding 2 consecutive lines of source in bytes + size_t dst_stride, // stride between corresponding 2 consecutive lines of destination in bytes + size_t num_planes, // number of planes to be copied + size_t src_plane_stride, // stride between corresponding 2 consecutive planes of source in bytes + size_t dst_plane_stride, // stride between corresponding 2 consecutive planes of destination in bytes + size_t size, // size of the loaded plane in bytes, analogues to the size in 2D case + event_t event) __OVERLOAD; +``` +where `T` can be `uchar`, `char`, `short`, `ushort`, `int`, `uint`, `long`, `ulong`, `half` or `float`. 
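+As an illustration only, a minimal sketch of the 2D (strided) work-group copy might look as follows. The kernel name, tensor layout, and sizes here are hypothetical assumptions that only show how the width, stride, and size arguments relate to each other; the complete manual-DMA rewrite of the GRN kernel follows right after.
+```cpp
+// Hypothetical sketch: preload get_local_size(1) consecutive W-element rows of a
+// row-major half tensor from DDR (__global) into local memory for this work group.
+__kernel void __dma_preload_copy_rows(
+    __global const half* restrict src,
+    __global half* restrict dst,
+    __local half* restrict local_src,
+    __local half* restrict local_dst,
+    int W)
+{
+    WorkGroupDmaCreateStrideTransaction(
+        src + get_group_id(1) * get_local_size(1) * W,  // first row owned by this work group
+        local_src,                                      // destination in local memory
+        W * sizeof(half),                               // width of a source line in bytes
+        W * sizeof(half),                               // width of a destination line in bytes
+        W * sizeof(half),                               // stride between source lines in bytes
+        W * sizeof(half),                               // stride between destination lines in bytes
+        get_local_size(1) * W * sizeof(half),           // total bytes copied for all lines
+        0);
+}
+```
+Because the rows are contiguous here, width and stride coincide; the strided form becomes useful when a sub-tensor narrower than the full line is copied.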
+ +Modified version of the GRN kernel could be the following: +```cpp +__kernel void __dma_preload_grn_NCHW( + __global const half* restrict src, + __global half* restrict dst, + __local half* restrict local_src, + __local half* restrict local_dst, + int C, + float bias) +{ + WorkGroupDmaCreate3DTransaction( + src + get_group_id(0)*get_local_size(0) + + get_group_id(1)*get_local_size(1)*get_global_size(0), // src + local_src, // dst + get_local_size(0) * sizeof(half), // src width + get_local_size(0) * sizeof(half), // dst width + get_global_size(0) * sizeof(half), // src stride + get_local_size(0) * sizeof(half), // dst stride + C, // num planes + get_global_size(0) * get_global_size(1) * sizeof(half), // src plane stride + get_local_size(0) * get_local_size(1) * sizeof(half), // dst plane stride + get_local_size(0) * get_local_size(1) * sizeof(half), // plane size + 0); +} + +__kernel void __dma_postwrite_grn_NCHW( + __global const half* restrict src, + __global half* restrict dst, + __local const half* restrict local_src, + __local half* restrict local_dst, + int C, + float bias) +{ + WorkGroupDmaCreate3DTransaction( + local_dst, // src + dst + get_group_id(0)*get_local_size(0) + + get_group_id(1)*get_local_size(1)*get_global_size(0), // dst + get_local_size(0) * sizeof(half), // src width + get_local_size(0) * sizeof(half), // dst width + get_local_size(0) * sizeof(half), // src stride + get_global_size(0) * sizeof(half), // dst stride + C, // num planes + get_local_size(0) * get_local_size(1) * sizeof(half), // src plane stride + get_global_size(0) * get_global_size(1) * sizeof(half), // dst plane stride + get_local_size(0) * get_local_size(1) * sizeof(half), // plane size + 0); +} + +__kernel void grn_NCHW( + __global const half* restrict src_data, + __global half* restrict dst_data, + __local half* restrict src, + __local half* restrict dst, + int C, + float bias) +{ + float variance = bias + 1e-9f; + + #pragma unroll 8 + for (int c = 0; c < C; c++) + { + float val = (float) src[c*get_local_size(1)*get_local_size(0) + get_local_id(1)*get_local_size(0) + get_local_id(0)]; + variance += val*val; + } + + half hvariance = (half)(native_rsqrt((half)(variance/16.f))*0.25f); + + #pragma unroll 8 + for (int c = 0; c < C; c++) + { + dst[c*get_local_size(1)*get_local_size(0) + get_local_id(1)*get_local_size(0) + get_local_id(0)] + = src[c*get_local_size(1)*get_local_size(0) + get_local_id(1)*get_local_size(0) + get_local_id(0)] * hvariance; + } +} +``` + +Please note `get_local_size` and `get_local_id` usage inside the kernel. 21x speedup is expected for a kernel on enet-curbs setup since it was completely limited by memory usage. + +An alternative method of using DMA is to use work item copy extension. Those functions are executed inside a kernel and requires work groups equal to single work item. 
+ +Here is the list of supported work item functions: +```cpp +item_dma_event_t WorkItemDmaCreateTransaction( + const global T *src, + private T *dst, + size_t size, + item_dma_event_t event) __OVERLOAD; + +item_dma_event_t WorkItemDmaCreateTransaction( + const private T *src, + global T *dst, + size_t size, + item_dma_event_t event) __OVERLOAD; + +item_dma_event_t WorkItemDmaCreateStrideTransaction( + const global T *src, + private T *dst, + size_t src_width, + size_t dst_width, + size_t src_stride, + size_t dst_stride, + size_t size, + item_dma_event_t event) __OVERLOAD; + +item_dma_event_t WorkItemDmaCreateStrideTransaction( + const private T *src, + global T *dst, + size_t src_width, + size_t dst_width, + size_t src_stride, + size_t dst_stride, + size_t size, + item_dma_event_t event) __OVERLOAD; + +item_dma_event_t WorkItemDmaCreate3DTransaction( + const global T *src, + private T *dst, + size_t src_width, + size_t dst_width, + size_t src_stride, + size_t dst_stride, + size_t num_planes, + size_t src_plane_stride, + size_t dst_plane_stride, + size_t size, + item_dma_event_t event) __OVERLOAD; + +item_dma_event_t WorkItemDmaCreate3DTransaction( + const private T *src, + global T *dst, + size_t src_width, + size_t dst_width, + size_t src_stride, + size_t dst_stride, + size_t num_planes, + size_t src_plane_stride, + size_t dst_plane_stride, + size_t size, + item_dma_event_t event) __OVERLOAD; +``` +where `T` can be `uchar`, `char`, `short`, `ushort`, `int`, `uint`, `long`, `ulong`, `half` or `float`. diff --git a/docs/IE_DG/Extensibility_DG/deprecated/Factory.md b/docs/IE_DG/Extensibility_DG/deprecated/Factory.md new file mode 100644 index 00000000000000..82370cbfc80dab --- /dev/null +++ b/docs/IE_DG/Extensibility_DG/deprecated/Factory.md @@ -0,0 +1,96 @@ +# Deprecated API for CPU kernels creation {#openvino_docs_IE_DG_Extensibility_DG_deprecated_Factory} + +List of deprecated API for kernels development: + * `InferenceEngine::IExtension::getPrimitiveTypes(char**& types, unsigned int& size, ResponseDesc* resp)` method + * `InferenceEngine::IExtension::getFactoryFor(ILayerImplFactory *&factory, const CNNLayer *cnnLayer, ResponseDesc *resp)` method + * `InferenceEngine::ILayerImplFactory` class + +>**NOTE**: This guide demonstrates how to use deprecated API for kernels creation. However, keep in mind that this API will be deleted soon. + +1. Create your custom layer factory `CustomLayerFactory` class: +```cpp +// custom_layer.h +// A CustomLayerFactory class is an example layer, which makes exponentiation by 2 for the input and does not change dimensions +class CustomLayerFactory { + +}; +``` +2. Inherit it from the abstract `InferenceEngine::ILayerImplFactory` class: +```cpp +// custom_layer.h +class CustomLayerFactory: public InferenceEngine::ILayerImplFactory { + +}; +``` + +3. Create a constructor, a virtual destructor, and a data member to keep the layer info: +```cpp +// custom_layer.h +class CustomLayerFactory: public InferenceEngine::ILayerImplFactory { +public: + explicit CustomLayerFactory(const CNNLayer *layer): cnnLayer(*layer) {} +private: + CNNLayer cnnLayer; +}; +``` + +4. Overload and implement the abstract methods `getShapes` and `getImplementations` of the `InferenceEngine::ILayerImplFactory` class: +```cpp +// custom_layer.h +class CustomLayerFactory: public InferenceEngine::ILayerImplFactory { +public: + // ... 
constructor and destructor
+
+    StatusCode getShapes(const std::vector<TensorDesc>& inShapes, std::vector<TensorDesc>& outShapes, ResponseDesc *resp) noexcept override {
+        if (cnnLayer == nullptr) {
+            std::string errorMsg = "Cannot get cnn layer!";
+            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
+            return GENERAL_ERROR;
+        }
+        if (inShapes.size() != 1) {
+            std::string errorMsg = "Incorrect input shapes!";
+            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
+            return GENERAL_ERROR;
+        }
+        outShapes.clear();
+        outShapes.emplace_back(inShapes[0]);
+        return OK;
+    }
+
+    StatusCode getImplementations(std::vector<ILayerImpl::Ptr>& impls, ResponseDesc *resp) noexcept override {
+        // You can pass cnnLayer to the implementation if necessary
+        impls.push_back(ILayerImpl::Ptr(new CustomLayerImpl()));
+        return OK;
+    }
+};
+```
+5. Create your custom layer implementation `CustomLayerImpl` class using the [instructions](../CPU_Kernel.md).
+
+6. Implement methods in the `Extension` class:
+```cpp
+// custom_extension.h
+class CustomExtension : public InferenceEngine::IExtension {
+public:
+    // ... utility methods
+    // Returns the list of supported kernels/layers
+    StatusCode getPrimitiveTypes(char**& types, unsigned int& size, ResponseDesc* resp) noexcept override {
+        std::string type_name = "CustomLayer";
+        types = new char *[1];
+        size = 1;
+        types[0] = new char[type_name.size() + 1];
+        std::copy(type_name.begin(), type_name.end(), types[0]);
+        types[0][type_name.size()] = '\0';
+        return OK;
+    }
+    // Main function
+    StatusCode getFactoryFor(ILayerImplFactory *&factory, const CNNLayer *cnnLayer, ResponseDesc *resp) noexcept override {
+        if (cnnLayer->type != "CustomLayer") {
+            std::string errorMsg = std::string("Factory for ") + cnnLayer->type + " wasn't found!";
+            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
+            return NOT_FOUND;
+        }
+        factory = new CustomLayerFactory(cnnLayer);
+        return OK;
+    }
+};
+```
diff --git a/docs/IE_DG/Extensibility_DG/deprecated/ShapeInfer.md b/docs/IE_DG/Extensibility_DG/deprecated/ShapeInfer.md
new file mode 100644
index 00000000000000..5e7101c9d269cf
--- /dev/null
+++ b/docs/IE_DG/Extensibility_DG/deprecated/ShapeInfer.md
@@ -0,0 +1,18 @@
+# Old ShapeInference Extensibility API {#openvino_docs_IE_DG_Extensibility_DG_deprecated_ShapeInfer}
+
+The new approach to shape inference suggests creating a custom nGraph operation that contains a special method for shape inference.
+The following classes and methods were deprecated:
+
+ * `InferenceEngine::IShapeInferExtension` class
+ * `InferenceEngine::IShapeInferExtension::getShapeInferTypes(char**&, unsigned int&, ResponseDesc*)` method
+ * `InferenceEngine::IShapeInferExtension::getShapeInferImpl(IShapeInferImpl::Ptr&, const char*, ResponseDesc*)` method
+
+However, the old approach with the `InferenceEngine::IShapeInferExtension` method still works for already existing custom layers.
+Custom shape inference functions are registered by calling `InferenceEngine::ICNNNetwork::AddExtension` with the implemented `InferenceEngine::IShapeInferExtension` method, which is a holder of custom implementations.
+The holder requires implementing two key methods:
+* `InferenceEngine::IShapeInferExtension::getShapeInferImpl` - Returns the custom shape inference implementation for the given type.
+* `InferenceEngine::IShapeInferExtension::getShapeInferTypes` - Provides all custom types.
+
+The custom shape inference implementation is represented by the `InferenceEngine::IShapeInferImpl::inferShapes` method.
+
+It is impossible to overwrite built-in shape inference functions.
A custom type must be different from the supported ones.
diff --git a/docs/IE_DG/GPU_Kernels_Tuning.md b/docs/IE_DG/GPU_Kernels_Tuning.md
new file mode 100644
index 00000000000000..0b308682f40ff4
--- /dev/null
+++ b/docs/IE_DG/GPU_Kernels_Tuning.md
@@ -0,0 +1,43 @@
+Using GPU Kernels Tuning {#openvino_docs_IE_DG_GPU_Kernels_Tuning}
+======================
+
+GPU Kernels Tuning allows you to tune models so that their heavy computational layers are configured to better fit
+the hardware on which the tuning was done. It is required to achieve the best performance on GPU.
+> **NOTE** Currently, only convolution and fully connected layers undergo the tuning process. This means that the performance boost depends on the number of such layers in the model.
+
+OpenVINO™ releases include the `/inference_engine/bin/intel64/Release/cache.json` file with pre-tuned data for current state-of-the-art models. It is highly recommended to do the
+tuning for new kinds of models, hardware, or drivers.
+
+## Tuned data
+
+GPU tuning data is saved in JSON format.
+The file content is composed of two types of attributes and one type of value:
+1. Execution units number - this attribute splits the content into different EU sections.
+2. Hash - hashed tuned kernel data.
+Key: an array with the kernel name and the kernel's mode index.
+
+## Usage
+
+---
+
+You can activate the Kernels Tuning process by setting the `KEY_TUNING_MODE` flag to `TUNING_CREATE` and `KEY_TUNING_FILE` to `<"filename">` in a configuration map that is
+passed to the plugin while loading a network.
+This configuration modifies the behavior of the `ExecutableNetwork` object. Instead of standard network compilation, it runs the tuning process.
+Please keep in mind that the tuning can be very time-consuming. The bigger the network, the longer it takes.
+A file with tuned data is the result of this step.
+
+> **NOTE** If a filename passed to `KEY_TUNING_FILE` points to existing tuned data and you are tuning a new model, then this file will be extended by new data. This allows you to extend the existing `cache.json` provided in the OpenVINO™ release package.
+
+The example below shows how to set and use the key files:
+```cpp
+Core ie;
+ie.SetConfig({{ CONFIG_KEY(TUNING_MODE), CONFIG_VALUE(TUNING_CREATE) }}, "GPU");
+ie.SetConfig({{ CONFIG_KEY(TUNING_FILE), "/path/to/tuning/file.json" }}, "GPU");
+// Further LoadNetwork calls will use the specified tuning parameters
+```
+---
+
+You can activate the inference with tuned data by setting the `KEY_TUNING_MODE` flag to `TUNING_USE_EXISTING` and
+the `KEY_TUNING_FILE` flag to `<"filename">` (see the sketch below).
+
+The GPU backend will process the content of the file during network compilation to configure the OpenCL kernels for the best performance.
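+A minimal sketch of that second mode, mirroring the tuning-creation snippet above (and assuming the `TUNING_USE_EXISTING` configuration value is available in your OpenVINO™ version), might look like this:
+```cpp
+Core ie;
+// Reuse previously collected tuning data instead of running the tuning process again
+ie.SetConfig({{ CONFIG_KEY(TUNING_MODE), CONFIG_VALUE(TUNING_USE_EXISTING) }}, "GPU");
+ie.SetConfig({{ CONFIG_KEY(TUNING_FILE), "/path/to/tuning/file.json" }}, "GPU");
+// Further LoadNetwork calls on "GPU" will pick up the tuned kernel configurations
+```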
diff --git a/docs/IE_DG/Glossary.md b/docs/IE_DG/Glossary.md new file mode 100644 index 00000000000000..139a35bb84e11a --- /dev/null +++ b/docs/IE_DG/Glossary.md @@ -0,0 +1,89 @@ +Glossary {#openvino_docs_IE_DG_Glossary} +======= + +## Acronyms and Abbreviations + +| Abbreviation | Description | +| :--- | :--- | +| API | Application Programming Interface | +| AVX | Advanced Vector Extensions | +| clDNN | Compute Library for Deep Neural Networks | +| CLI | Command Line Interface | +| CNN | Convolutional Neural Network | +| CPU | Central Processing Unit | +| CV | Computer Vision | +| DL | Deep Learning | +| DLDT | Intel(R) Deep Learning Deployment Toolkit | +| DLL | Dynamic Link Library | +| DNN | Deep Neural Networks | +| ELU | Exponential Linear rectification Unit | +| FCN | Fully Convolutional Network | +| FP | Floating Point | +| FPGA | Field-Programmable Gate Array | +| GCC | GNU Compiler Collection | +| GPU | Graphics Processing Unit | +| HD | High Definition | +| IE | Inference Engine | +| IR | Intermediate Representation | +| JIT | Just In Time | +| JTAG | Joint Test Action Group | +| LPR | License-Plate Recognition | +| LRN | Local Response Normalization | +| mAP | Mean Average Precision | +| Intel(R) MKL-DNN | Intel(R) Math Kernel Library Deep Neural Networks | +| MO | Model Optimizer | +| MVN | Mean Variance Normalization | +| NCDHW | Number of images, Channels, Depth, Height, Width | +| NCHW | Number of images, Channels, Height, Width | +| NHWC | Number of images, Height, Width, Channels | +| NMS | Non-Maximum Suppression | +| NN | Neural Network | +| NST | Neural Style Transfer | +| OD | Object Detection | +| OS | Operating System | +| PCI | Peripheral Component Interconnect | +| PReLU | Parametric Rectified Linear Unit | +| PSROI | Position Sensitive Region Of Interest | +| RCNN, R-CNN | Region-based Convolutional Neural Network | +| ReLU | Rectified Linear Unit | +| ROI | Region Of Interest | +| SDK | Software Development Kit | +| SSD | Single Shot multibox Detector | +| SSE | Streaming SIMD Extensions | +| USB | Universal Serial Bus | +| VGG | Visual Geometry Group | +| VOC | Visual Object Classes | +| WINAPI | Windows Application Programming Interface | + +## Terms + +Glossary of terms used in the Inference Engine + + +| Term | Description | +| :--- | :--- | +| Batch | Number of images to analyze during one call of infer. Maximum batch size is a property of the network and it is set before loading of the network to the plugin. In NHWC, NCHW and NCDHW image data layout representation, the N refers to the number of images in the batch | +| Blob | Memory container used for storing inputs, outputs of the network, weights and biases of the layers | +| Device (Affinitity) | A preferred Intel(R) hardware device to run the inference (CPU, GPU, FPGA, etc.) | +| Extensibility mechanism, Custom layers | The mechanism that provides you with capabilities to extend the Inference Engine and Model Optimizer so that they can work with topologies containing layers that are not yet supported | +| ICNNNetwork | An Interface of the Convolutional Neural Network that Inference Engine reads from IR. 
Consists of topology, weights and biases | +| IExecutableNetwork | An instance of the loaded network which allows the Inference Engine to request (several) infer requests and perform inference synchronously or asynchronously | +| IHeteroInferencePlugin | Interface that is implemented by the heterogeneity plugin to allow the Inference Engine to set the default affinities for layers by devices before loading the network to the heterogeneous plugin. You can modify affinities manually before loading to the plugin. | +| IInferencePlugin | Interface provided by each plugin to allow the Inference Engine to load ICNNNetwork to the plugin, create Executable network and set special dedicated options for the plugin | +| IInferRequest | Interface that represents the end point of inference on the model loaded to the plugin and represented by executable network. Inputs are set here, outputs should be requested from this interface as well | +| InferenceEngineProfileInfo | Represents basic inference profiling information per layer | +| Inference Engine | A C++ library with a set of classes that you can use in your application to infer input data (images) and get the result | +| Inference Engine API | The basic default API for all supported devices, which allows you to load a model from Intermediate Representation, set input and output formats and execute the model on various devices | +| Inference Engine Plugin | Inference Engine plugin is a software component that contains complete implementation for inference on a certain Intel(R) hardware device: CPU, GPU, VPU, FPGA, etc. Each plugin implements the unified API and provides additional hardware-specific APIs. | +| Layer catalog or Operations specification | A list of supported layers or operations and its parameters. Sets of supported layers are different for different plugins, please check the documentation on plugins to verify if the Inference Engine supports certain layer on the dedicated hardware | +| Layout | Image data layout refers to the representation of images batch. Layout shows a sequence of 4D or 5D tensor data in memory. A typical NCHW format represents pixel in horizontal direction, rows by vertical dimension, planes by channel and images into batch | +| OutputsDataMap | Structure which contains information about output precisions and layouts | +| Precision | Represents data precision. For example, FP32 is 32-bit floating point, FP16 is 16-bit floating point. Precision can be changed before loading the network to the plugin | +| PreProcessInfo | Class that represents input data for the network. It contains information about input precision, its layout, and pre-processing | +| ResponseDesc | Represents debug information for an error | + + +## See Also +* [Deep Learning Model Optimizer IR Operations Catalog](../ops/opset.md) +* [Inference Engine Memory primitives](Memory_primitives.md) +* [Terminology](supported_plugins/Supported_Devices.md) diff --git a/docs/IE_DG/Graph_debug_capabilities.md b/docs/IE_DG/Graph_debug_capabilities.md new file mode 100644 index 00000000000000..856bbeb49eb463 --- /dev/null +++ b/docs/IE_DG/Graph_debug_capabilities.md @@ -0,0 +1,64 @@ +# Graph Debug Capabilities {#openvino_docs_IE_DG_Graph_debug_capabilities} + +Inference Engine supports two different objects for a graph representation: the nGraph function and +CNNNetwork. Both representations provide an API to get detailed information about the graph structure. 
+
+## nGraph Function
+
+To receive additional messages about applied graph modifications, rebuild the nGraph library with
+the `-DNGRAPH_DEBUG_ENABLE=ON` option.
+
+To enable serialization and deserialization of the nGraph function to a JSON file, rebuild the
+nGraph library with the `-DNGRAPH_JSON_ENABLE=ON` option. To serialize or deserialize the nGraph
+function, call the nGraph function as follows:
+
+```cpp
+#include <ngraph/serializer.hpp>
+
+std::shared_ptr<ngraph::Function> nGraph;
+...
+ngraph::serialize("test_json.json", nGraph);  // For graph serialization
+std::ifstream file("test_json.json");         // Open a JSON file
+nGraph = ngraph::deserialize(file);           // For graph deserialization
+```
+
+To visualize the nGraph function to the xDot format or to an image file, use the
+`ngraph::pass::VisualizeTree` graph transformation pass:
+```cpp
+#include <ngraph/pass/visualize_tree.hpp>
+
+std::shared_ptr<ngraph::Function> nGraph;
+...
+std::vector<std::shared_ptr<ngraph::Function>> g2{nGraph};
+ngraph::pass::VisualizeTree("after.png").run_on_module(g2);  // Visualize the nGraph function to an image
+```
+
+## CNNNetwork
+
+To serialize the CNNNetwork to the Inference Engine Intermediate Representation (IR) format, use the
+`CNNNetwork::serialize(...)` method:
+```cpp
+std::shared_ptr<ngraph::Function> nGraph;
+...
+CNNNetwork network(nGraph);
+network.serialize("test_ir.xml", "test_ir.bin");
+```
+> **NOTE**: A CNNNetwork created from the nGraph function might differ from the original nGraph
+> function because the Inference Engine applies some graph transformations.
+
+## Deprecation Notice
+
+<table>
+  <tr>
+    <td><strong>Deprecation Begins</strong></td>
+    <td>June 1, 2020</td>
+  </tr>
+  <tr>
+    <td><strong>Removal Date</strong></td>
+    <td>December 1, 2020</td>
+  </tr>
+</table>
+
+*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.*
+
+*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.*
diff --git a/docs/IE_DG/InferenceEngine_QueryAPI.md b/docs/IE_DG/InferenceEngine_QueryAPI.md
new file mode 100644
index 00000000000000..bed82bca12b32c
--- /dev/null
+++ b/docs/IE_DG/InferenceEngine_QueryAPI.md
@@ -0,0 +1,102 @@
+Introduction to Inference Engine Device Query API {#openvino_docs_IE_DG_InferenceEngine_QueryAPI}
+===============================
+
+This section provides a high-level description of the process of querying different device properties and configuration values.
+Refer to the [Hello Query Device Sample](../../inference-engine/samples/hello_query_device/README.md) sources and the [Multi-Device Plugin guide](supported_plugins/MULTI.md) for an example of using the Inference Engine Query API in user applications.
+
+## Using the Inference Engine Query API in Your Code
+
+The Inference Engine `Core` class provides the following API to query device information and set or get different device configuration properties:
+
+* InferenceEngine::Core::GetAvailableDevices - Provides a list of available devices. If there is more than one instance of a specific device, the devices are enumerated with a `.suffix`, where `suffix` is a unique string identifier. The device name can be passed to all methods of the `InferenceEngine::Core` class that work with devices, for example `InferenceEngine::Core::LoadNetwork`.
+* InferenceEngine::Core::GetMetric - Provides information about a specific device.
+* InferenceEngine::Core::GetConfig - Gets the current value of a specific configuration key.
+* InferenceEngine::Core::SetConfig - Sets a new value for the configuration key.
+
+The `InferenceEngine::ExecutableNetwork` class is also extended to support the Query API:
+
+* InferenceEngine::ExecutableNetwork::GetMetric
+* InferenceEngine::ExecutableNetwork::GetConfig
+* InferenceEngine::ExecutableNetwork::SetConfig
+
+## Query API in the Core Class
+
+### GetAvailableDevices
+
+```cpp
+InferenceEngine::Core core;
+std::vector<std::string> availableDevices = core.GetAvailableDevices();
+```
+
+The function returns a list of available devices, for example:
+```
+MYRIAD.1.2-ma2480
+MYRIAD.1.4-ma2480
+FPGA.0
+FPGA.1
+CPU
+GPU
+...
+```
+
+Each device name can then be passed to:
+
+* `InferenceEngine::Core::LoadNetwork` to load the network to a specific device.
+* `InferenceEngine::Core::GetMetric` to get common or device-specific metrics.
+* All other methods of the `Core` class that accept `deviceName`.
+
+### GetConfig()
+
+The code below demonstrates how to understand whether the `HETERO` device dumps `.dot` files with split graphs during the split stage:
+
+```cpp
+InferenceEngine::Core core;
+bool dumpDotFile = core.GetConfig("HETERO", HETERO_CONFIG_KEY(DUMP_GRAPH_DOT)).as<bool>();
+```
+
+For documentation about common configuration keys, refer to `ie_plugin_config.hpp`. Device-specific configuration keys can be found in the corresponding plugin folders.
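+For completeness, the `InferenceEngine::Core::SetConfig` counterpart listed above takes a similar map of string keys and values. A minimal sketch (the `KEY_CPU_THREADS_NUM` key and the value `"4"` are illustrative assumptions, not a recommendation):
+```cpp
+InferenceEngine::Core core;
+// Illustrative example: limit the number of CPU inference threads before loading a network
+core.SetConfig({{ PluginConfigParams::KEY_CPU_THREADS_NUM, "4" }}, "CPU");
+```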
+
+### GetMetric()
+
+* To extract device properties such as available devices, device name, supported configuration keys, and others, use the `InferenceEngine::Core::GetMetric` method:
+
+```cpp
+InferenceEngine::Core core;
+std::string cpuDeviceName = core.GetMetric("CPU", METRIC_KEY(FULL_DEVICE_NAME)).as<std::string>();
+```
+
+A returned value looks as follows: `Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz`.
+
+> **NOTE**: All metrics have a specific type, which is specified during metric instantiation. The list of common device-agnostic metrics can be found in `ie_plugin_config.hpp`. Device-specific metrics (for example, for `HDDL` or `MYRIAD` devices) can be found in the corresponding plugin folders.
+
+## Query API in the ExecutableNetwork Class
+
+### GetMetric()
+
+The method is used to get an executable network specific metric such as `METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)`:
+```cpp
+InferenceEngine::Core core;
+auto exeNetwork = core.LoadNetwork(network, "CPU");
+auto nireq = exeNetwork.GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
+```
+
+Or the current temperature of the `MYRIAD` device:
+```cpp
+InferenceEngine::Core core;
+auto exeNetwork = core.LoadNetwork(network, "MYRIAD");
+float temperature = exeNetwork.GetMetric(METRIC_KEY(DEVICE_THERMAL)).as<float>();
+```
+
+### GetConfig()
+
+The method is used to get information about configuration values the executable network has been created with:
+
+```cpp
+InferenceEngine::Core core;
+auto exeNetwork = core.LoadNetwork(network, "CPU");
+auto ncores = exeNetwork.GetConfig(PluginConfigParams::KEY_CPU_THREADS_NUM).as<std::string>();
+```
+
+### SetConfig()
+
+The only device that supports this method is [Multi-Device](supported_plugins/MULTI.md).
diff --git a/docs/IE_DG/Int8Inference.md b/docs/IE_DG/Int8Inference.md
new file mode 100644
index 00000000000000..b815f0b15fd031
--- /dev/null
+++ b/docs/IE_DG/Int8Inference.md
@@ -0,0 +1,127 @@
+# Low-Precision 8-bit Integer Inference {#openvino_docs_IE_DG_Int8Inference}
+
+## Disclaimer
+
+Inference Engine with low-precision 8-bit integer inference requires the following prerequisites to be satisfied:
+- The Inference Engine [CPU Plugin](supported_plugins/CPU.md) must be built with the Intel® Math Kernel Library (Intel® MKL) dependency. In the Intel® Distribution of OpenVINO™ this requirement is
+  satisfied by default. It mostly matters if you are using the [open source version of OpenVINO™](https://github.com/openvinotoolkit/openvino), because it can be built with OpenBLAS*, which is not suitable for 8-bit integer inference.
+- Intel® platforms that support at least one extension to the x86 instruction set from the following list (a run-time query sketch follows this list):
+  - Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
+  - Intel® Advanced Vector Extensions 2.0 (Intel® AVX2)
+  - Intel® Streaming SIMD Extensions 4.2 (Intel® SSE4.2)
+- A model must be quantized. To quantize the model, you can use the [Post-Training Optimization Tool](@ref pot_README) delivered with the Intel® Distribution of OpenVINO™ toolkit release package.
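+A non-authoritative sketch of how the instruction-set prerequisite can be checked at run time through the Query API described earlier (assuming the CPU plugin reports the `OPTIMIZATION_CAPABILITIES` metric and uses the `INT8` capability string):
+```cpp
+#include <algorithm>
+
+InferenceEngine::Core core;
+// Ask the CPU plugin which optimization capabilities it reports
+std::vector<std::string> capabilities =
+    core.GetMetric("CPU", METRIC_KEY(OPTIMIZATION_CAPABILITIES)).as<std::vector<std::string>>();
+// "INT8" in the list indicates that 8-bit integer inference is supported on this platform
+bool int8Supported = std::find(capabilities.begin(), capabilities.end(), "INT8") != capabilities.end();
+```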
+ +The 8-bit inference feature was validated on the following topologies: +* **Classification models:** + * Caffe\* DenseNet-121, DenseNet-161, DenseNet-169, DenseNet-201 + * Caffe Inception v1, Inception v2, Inception v3, Inception v4 + * Caffe YOLO v1 tiny, YOLO v3 + * Caffe ResNet-50 v1, ResNet-101 v1, ResNet-152 v1, ResNet-269 v1 + * Caffe ResNet-18 + * Caffe MobileNet, MobileNet v2 + * Caffe SE ResNeXt-50 + * Caffe SqueezeNet v1.0, SqueezeNet v1.1 + * Caffe VGG16, VGG19 + * TensorFlow\* DenseNet-121, DenseNet-169 + * TensorFlow Inception v1, Inception v2, Inception v3, Inception v4, Inception ResNet v2 + * TensorFlow Lite Inception v1, Inception v2, Inception v3, Inception v4, Inception ResNet v2 + * TensorFlow Lite MobileNet v1, MobileNet v2 + * TensorFlow MobileNet v1, MobileNet v2 + * TensorFlow ResNet-50 v1.5, ResNet-50 v1, ResNet-101 v1, ResNet-152 v1, ResNet-50 v2, ResNet-101 v2, ResNet-152 v2 + * TensorFlow VGG16, VGG19 + * TensorFlow YOLO v3 + * MXNet\* CaffeNet + * MXNet DenseNet-121, DenseNet-161, DenseNet-169, DenseNet-201 + * MXNet Inception v3, inception_v4 + * MXNet Mobilenet, Mobilenet v2 + * MXNet ResNet-101 v1, ResNet-152 v1, ResNet-101 v2, ResNet-152 v2 + * MXNet ResNeXt-101 + * MXNet SqueezeNet v1.1 + * MXNet VGG16, VGG19 + + +* **Object detection models:** + * Caffe SSD GoogLeNet + * Caffe SSD MobileNet + * Caffe SSD SqueezeNet + * Caffe SSD VGG16 300, SSD VGG16 512 + * TensorFlow SSD MobileNet v1, SSD MobileNet v2 + * MXNet SSD Inception v3 512 + * MXNet SSD MobileNet 512 + * MXNet SSD ResNet-50 512 + * MXNet SSD VGG16 300 + * ONNX\* SSD ResNet 34 + +* **Semantic segmentation models:** + * Unet2D + +* **Recommendation system models:** + * NCF + +## Introduction + +A lot of investigation was made in the field of deep learning with the idea of using low precision computations during inference in order to boost deep learning pipelines and gather higher performance. For example, one of the popular approaches is to shrink the precision of activations and weights values from `fp32` precision to smaller ones, for example, to `fp11` or `int8`. For more information about this approach, refer to +**Brief History of Lower Precision in Deep Learning** section in [this whitepaper](https://software.intel.com/en-us/articles/lower-numerical-precision-deep-learning-inference-and-training). + +8-bit computations (referred to as `int8`) offer better performance compared to the results of inference in higher precision (for example, `fp32`), because they allow loading more data into a single processor instruction. Usually the cost for significant boost is a reduced accuracy. However, it is proved that an accuracy drop can be negligible and depends on task requirements, so that the application engineer can set up the maximum accuracy drop that is acceptable. + +Current Inference Engine solution for low-precision inference uses Intel MKL-DNN and supports inference of the following layers in 8-bit integer computation mode: +* Convolution +* FullyConnected +* ReLU +* ReLU6 +* Reshape +* Permute +* Pooling +* Squeeze +* Eltwise +* Concat +* Resample +* MVN + +This means that 8-bit inference can only be performed with the CPU plugin on the layers listed above. All other layers are executed in the format supported by the CPU plugin: 32-bit floating point format (`fp32`). + +## Low-Precision 8-bit Integer Inference Workflow + +For 8-bit integer computations, a model must be quantized. 
If the model is not quantized, you can use the [Post-Training Optimization Tool](@ref pot_README) to quantize it. The quantization process adds `FakeQuantize` layers on activations and weights for most layers. Read more about the mathematical computations under the hood in the [white paper](https://intel.github.io/mkl-dnn/ex_int8_simplenet.html).
+
+The 8-bit inference pipeline includes two stages (also refer to the figure below):
+1. *Offline stage*, or *model quantization*. During this stage, `FakeQuantize` layers are added before most layers so that tensors are quantized before those layers in a way that the low-precision accuracy drop for 8-bit integer inference satisfies the specified threshold. The output of this stage is a quantized model. The quantized model's precision is not changed; quantized tensors remain in the original precision range (`fp32`). The `FakeQuantize` layer has a `Quantization Levels` attribute, which defines the number of quantization levels. The number of quantization levels defines the precision that is used during inference. For the `int8` range, the `Quantization Levels` attribute value has to be 255 or 256.
+
+2. *Run-time stage*. This stage is an internal procedure of the [CPU Plugin](supported_plugins/CPU.md). During this stage, the quantized model is loaded to the plugin. The plugin updates each `FakeQuantize` layer on activations and weights to have `FakeQuantize` output tensor values in the low precision range.
+![int8_flow]
+
+### Offline Stage: Model Quantization
+
+To infer a layer in low precision and get maximum performance, the input tensor for the layer has to be quantized and each value has to be in the target low precision range. For this purpose, the `FakeQuantize` layer is used in the OpenVINO™ intermediate representation file (IR). To quantize the model, you can use the [Post-Training Optimization Tool](@ref pot_README) delivered with the Intel® Distribution of OpenVINO™ toolkit release package.
+
+When you pass the calibrated IR to the [CPU plugin](supported_plugins/CPU.md), the plugin automatically recognizes it as a quantized model and performs 8-bit inference. Note that if you pass a quantized model to another plugin that does not support 8-bit inference, the model is inferred in a precision that this plugin supports.
+
+### Run-Time Stage: Quantization
+
+This is the second stage of the 8-bit integer inference. After you load the quantized model IR to a plugin, the plugin uses the `Low Precision Transformation` component to update the model to infer it in low precision:
+* `FakeQuantize` layers are updated to have quantized output tensors in the low precision range, and dequantization layers are added to compensate for the update. Dequantization layers are pushed through as many layers as possible to have more layers in low precision. After that, most layers have quantized input tensors in the low precision range and can be inferred in low precision. Ideally, dequantization layers should be fused into the next `FakeQuantize` or `ScaleShift` layers.
+* Weights are quantized and stored in `Const` layers.
+* Biases are updated to avoid shifts in dequantization layers.
+
+## Performance Counters
+
+Information about layer precision is stored in the performance counters that are
+available from the Inference Engine API.
The layers have the following marks: +* Suffix `I8` for layers that had 8-bit data type input and were computed in 8-bit precision +* Suffix `FP32` for layers computed in 32-bit precision + +For example, the performance counters table for the Inception model can look as follows: + +``` +inception_5b/5x5_reduce EXECUTED layerType: Convolution realTime: 417 cpu: 417 execType: gemm_blas_I8 +inception_5b/output EXECUTED layerType: Concat realTime: 34 cpu: 34 execType: ref_I8 +inception_5b/output_U8_nhw... EXECUTED layerType: Reorder realTime: 33092 cpu: 33092 execType: reorder_I8 +inception_5b/output_oScale... EXECUTED layerType: ScaleShift realTime: 1390 cpu: 1390 execType: jit_avx2_FP32 +inception_5b/output_oScale... EXECUTED layerType: Reorder realTime: 143 cpu: 143 execType: reorder_FP32 +inception_5b/pool EXECUTED layerType: Pooling realTime: 59301 cpu: 59301 execType: ref_any_I8 +``` + +The `execType` column of the table includes inference primitives with specific suffixes. + +[int8_flow]: img/cpu_int8_flow.png \ No newline at end of file diff --git a/docs/IE_DG/Integrate_with_customer_application_new_API.md b/docs/IE_DG/Integrate_with_customer_application_new_API.md new file mode 100644 index 00000000000000..07618a77a0a9a1 --- /dev/null +++ b/docs/IE_DG/Integrate_with_customer_application_new_API.md @@ -0,0 +1,320 @@ +Integrate the Inference Engine with Your Application {#openvino_docs_IE_DG_Integrate_with_customer_application_new_API} +=============================== + +This section provides a high-level description of the process of integrating the Inference Engine into your application. +Refer to the [Hello Classification Sample](../../inference-engine/samples/hello_classification/README.md) sources +for example of using the Inference Engine in applications. + +> **NOTE**: For 2019 R2 Release, the new Inference Engine Core API is introduced. This guide is updated to reflect the new API approach. +> The Inference Engine Plugin API is still supported, but is going to be deprecated in future releases. Please, refer to [Migration from Inference Engine Plugin API to Core API](Migration_CoreAPI.md) guide to update your application. + +## Use the Inference Engine API in Your Code + +The core `libinference_engine.so` library implements loading and parsing a model Intermediate Representation (IR), and triggers inference using a specified device. The core library has the following API: + +* `InferenceEngine::Core` +* `InferenceEngine::Blob`, `InferenceEngine::TBlob`, + `InferenceEngine::NV12Blob` +* `InferenceEngine::BlobMap` +* `InferenceEngine::InputsDataMap`, `InferenceEngine::InputInfo`, +* `InferenceEngine::OutputsDataMap` + +C++ Inference Engine API wraps the capabilities of core library: + +* `InferenceEngine::CNNNetwork` +* `InferenceEngine::ExecutableNetwork` +* `InferenceEngine::InferRequest` + +## Integration Steps + +Integration process includes the following steps: +![integration_process] + +1) **Create Inference Engine Core** to manage available devices and read network objects: +```cpp +InferenceEngine::Core core; +``` + +2) **Read a model IR** created by the Model Optimizer (.xml is supported format): +```cpp +auto network = core.ReadNetwork("Model.xml"); +``` +**Or read the model from ONNX format** (.onnx and .prototxt are supported formats) +```cpp +auto network = core.ReadNetwork("model.onnx"); +``` + +3) **Configure input and output**. 
Request input and output information using `InferenceEngine::CNNNetwork::getInputsInfo()`, and `InferenceEngine::CNNNetwork::getOutputsInfo()` +methods: +```cpp +/** Take information about all topology inputs **/ +InferenceEngine::InputsDataMap input_info = network.getInputsInfo(); +/** Take information about all topology outputs **/ +InferenceEngine::OutputsDataMap output_info = network.getOutputsInfo(); +``` + Optionally, set the number format (precision) and memory layout for inputs and outputs. Refer to the + [Supported configurations](supported_plugins/Supported_Devices.md) chapter to choose the relevant configuration. + + You can also allow input of any size. To do this, mark each input as resizable by setting a desired resize algorithm (e.g. `BILINEAR`) inside of the appropriate input info. + + Basic color format conversions are supported as well. By default, the Inference Engine assumes + that the input color format is `BGR` and color format conversions are disabled. The Inference + Engine supports the following color format conversions: + * `RGB->BGR` + * `RGBX->BGR` + * `BGRX->BGR` + * `NV12->BGR` + + where `X` is a channel that will be ignored during inference. To enable the conversions, set a + desired color format (for example, `RGB`) for each input inside of the appropriate input info. + + If you want to run inference for multiple images at once, you can use the built-in batch + pre-processing functionality. + +> **NOTE**: Batch pre-processing is not supported if input color format is set to `ColorFormat::NV12`. + + You can use the following code snippet to configure input and output: +```cpp +/** Iterate over all input info**/ +for (auto &item : input_info) { + auto input_data = item.second; + input_data->setPrecision(Precision::U8); + input_data->setLayout(Layout::NCHW); + input_data->getPreProcess().setResizeAlgorithm(RESIZE_BILINEAR); + input_data->getPreProcess().setColorFormat(ColorFormat::RGB); +} +/** Iterate over all output info**/ +for (auto &item : output_info) { + auto output_data = item.second; + output_data->setPrecision(Precision::FP32); + output_data->setLayout(Layout::NC); +} +``` + +> **NOTE**: NV12 input color format pre-processing differs from other color conversions. In case of NV12, +> Inference Engine expects two separate image planes (Y and UV). You must use a specific +> `InferenceEngine::NV12Blob` object instead of default blob object and set this blob to +> the Inference Engine Infer Request using `InferenceEngine::InferRequest::SetBlob()`. +> Refer to [Hello NV12 Input Classification C++ Sample](../../inference-engine/samples/hello_nv12_input_classification/README.md) +> for more details. + + If you skip this step, the default values are set: + + * no resize algorithm is set for inputs + * input color format - `ColorFormat::RAW` meaning that input does not need color + conversions + * input and output precision - `Precision::FP32` + * input layout - `Layout::NCHW` + * output layout depends on number of its dimensions: + +|Number of dimensions | 5 | 4 | 3 | 2 | 1 | +|:--------------------|-------|------|-----|----|----| +|Layout | NCDHW | NCHW | CHW | NC | C | + +4) **Load the model** to the device using `InferenceEngine::Core::LoadNetwork()`: +```cpp +auto executable_network = core.LoadNetwork(network, "CPU"); +``` + It creates an executable network from a network object. The executable network is associated with single hardware device. 
It is possible to create as many networks as needed and to use them simultaneously (up to the limitation of the hardware resources).
+   The third parameter is a configuration for the plugin. It is a map of pairs: (parameter name, parameter value). See the
+   [Supported devices](supported_plugins/Supported_Devices.md) page for details about the supported configuration parameters for each device.
+```cpp
+/** Optional config. E.g. this enables profiling of performance counters. **/
+std::map<std::string, std::string> config = {{ PluginConfigParams::KEY_PERF_COUNT, PluginConfigParams::YES }};
+auto executable_network = core.LoadNetwork(network, "CPU", config);
+```
+
+5) **Create an infer request**:
+```cpp
+auto infer_request = executable_network.CreateInferRequest();
+```
+
+6) **Prepare input**. You can use one of the following options to prepare input:
+   * **Optimal way for a single network.** Get blobs allocated by an infer request using `InferenceEngine::InferRequest::GetBlob()`
+     and feed an image and the input data to the blobs. In this case, input data must be aligned (resized manually) with a
+     given blob size and have a correct color format.
+```cpp
+/** Iterate over all input blobs **/
+for (auto & item : input_info) {
+    auto input_name = item.first;
+    /** Get input blob **/
+    auto input = infer_request.GetBlob(input_name);
+    /** Fill input tensor with planes. First b channel, then g and r channels **/
+    ...
+}
+```
+   * **Optimal way for a cascade of networks (output of one network is input for another).** Get the output blob from the first
+     request using `InferenceEngine::InferRequest::GetBlob()` and set it as input for the second request using
+     `InferenceEngine::InferRequest::SetBlob()`.
+```cpp
+auto output = infer_request1->GetBlob(output_name);
+infer_request2->SetBlob(input_name, output);
+```
+   * **Optimal way to handle ROI (a ROI object located inside the input of one network is input for another).** It is
+     possible for several networks to re-use a shared input. You do not need to allocate a separate input blob for a network if
+     it processes a ROI object located inside an already allocated input of a previous network. For instance, the first
+     network detects objects in a video frame (stored as an input blob) and the second network accepts detected bounding boxes
+     (a ROI inside the frame) as input.
+     In this case, the second network can re-use the pre-allocated input blob (used by the first network) and just crop the
+     ROI without allocating new memory, using `InferenceEngine::make_shared_blob()` with
+     `InferenceEngine::Blob::Ptr` and `InferenceEngine::ROI` as parameters.
+```cpp
+/** inputBlob points to input of a previous network and
+    cropROI contains coordinates of output bounding box **/
+InferenceEngine::Blob::Ptr inputBlob;
+InferenceEngine::ROI cropRoi;
+...
+
+/** roiBlob uses shared memory of inputBlob and describes cropROI
+    according to its coordinates **/
+auto roiBlob = InferenceEngine::make_shared_blob(inputBlob, cropRoi);
+infer_request2->SetBlob(input_name, roiBlob);
+```
+     Make sure that the shared input is kept valid during the execution of each network. Otherwise, the ROI blob may be corrupted if the
+     original input blob (that the ROI is cropped from) has already been overwritten.
+
+   * Allocate input blobs of the appropriate types and sizes, feed an image and the input data to the blobs, and call
+     `InferenceEngine::InferRequest::SetBlob()` to set these blobs for an infer request:
+```cpp
+/** Iterate over all input blobs **/
+for (auto & item : input_info) {
+    auto input_data = item.second;
+    /** Create input blob **/
+    InferenceEngine::TBlob<unsigned char>::Ptr input;
+    // assuming input precision was asked to be U8 in prev step
+    input = InferenceEngine::make_shared_blob<unsigned char>(InferenceEngine::Precision::U8, input_data->getDims());
+    input->allocate();
+    infer_request.SetBlob(item.first, input);
+
+    /** Fill input tensor with planes. First b channel, then g and r channels **/
+    ...
+}
+```
+   A blob can be filled before and after `SetBlob()`.
+
+> **NOTE:**
+>
+> * The `SetBlob()` method compares the precision and layout of an input blob with the ones defined in step 3 and
+>   throws an exception if they do not match. It also compares the size of the input blob with the input
+>   size of the read network. But if the input was configured as resizable, you can set an input blob of
+>   any size (for example, any ROI blob). Input resize will be invoked automatically using the resize
+>   algorithm configured in step 3. Similarly to the resize, color format conversions allow the color
+>   format of an input blob to differ from the color format of the read network. Color format
+>   conversion will be invoked automatically using the color format configured in step 3.
+>
+> * The `GetBlob()` logic is the same for pre-processable and non-pre-processable input. Even if it is
+>   called with input configured as resizable or as having a specific color format, a blob allocated by
+>   an infer request is returned. Its size and color format are already consistent with the
+>   corresponding values of the read network. No pre-processing will happen for this blob. If you
+>   call `GetBlob()` after `SetBlob()`, you will get the blob you set in `SetBlob()`.
+
+7) **Do inference** by calling the `InferenceEngine::InferRequest::StartAsync` and `InferenceEngine::InferRequest::Wait`
+methods for an asynchronous request:
+```cpp
+infer_request.StartAsync();
+infer_request.Wait(IInferRequest::WaitMode::RESULT_READY);
+```
+
+or by calling the `InferenceEngine::InferRequest::Infer` method for a synchronous request:
+```cpp
+sync_infer_request->Infer();
+```
+`StartAsync` returns immediately and starts inference without blocking the main thread, while `Infer` blocks
+the main thread and returns when inference is completed.
+Call `Wait` to wait for the result to become available for an asynchronous request.
+
+There are three ways to use it:
+* Specify a maximum duration in milliseconds to block for. The method is blocked until the specified timeout has elapsed
+or the result becomes available, whichever comes first.
+* `InferenceEngine::IInferRequest::WaitMode::RESULT_READY` - waits until the inference result becomes available.
+* `InferenceEngine::IInferRequest::WaitMode::STATUS_ONLY` - immediately returns the request status. It does not
+block or interrupt the current thread.
+
+Both synchronous and asynchronous requests are thread-safe: their methods can be called from different threads without fear of corruption or failure.
+
+Multiple requests for a single `ExecutableNetwork` are executed sequentially one by one in FIFO order.
+
+While a request is ongoing, all its methods except `InferenceEngine::InferRequest::Wait` throw an
+exception.
+
+8) Go over the output blobs and **process the results**.
+Note that casting `Blob` to `TBlob` via `std::dynamic_pointer_cast` is not the recommended way.
+It is better to access the data via the `buffer()` and `as()` methods as follows:
+```cpp
+    for (auto &item : output_info) {
+        auto output_name = item.first;
+        auto output = infer_request.GetBlob(output_name);
+        {
+            auto const memLocker = output->cbuffer(); // use const memory locker
+            // output_buffer is valid as long as the lifetime of memLocker
+            const float *output_buffer = memLocker.as<const float *>();
+            /** output_buffer[] - accessing output blob data **/
+        }
+    }
+```
+
+## Build Your Application
+
+For details about building your application, refer to the CMake files for the sample applications.
+All samples' source code is located in the `<INSTALL_DIR>/openvino/inference_engine/samples` directory, where `<INSTALL_DIR>` is the OpenVINO™ installation directory.
+
+### CMake project creation
+
+1. **Create a structure** for the project:
+``` sh
+project/
+    ├── CMakeLists.txt  - CMake file to build
+    ├── ...             - Additional folders like includes/
+    └── src/            - source folder
+        └── main.cpp
+build/                  - build directory
+    ...
+```
+
+2. **Include Inference Engine, nGraph and OpenCV libraries** in `project/CMakeLists.txt`.
+[OpenCV](https://docs.opencv.org/master/db/df5/tutorial_linux_gcc_cmake.html) integration is needed mostly for pre-processing input data, and nGraph for more complex applications using the [nGraph API](nGraph_Flow.md).
+``` cmake
+cmake_minimum_required(VERSION 3.0.0)
+project(project_name)
+find_package(ngraph REQUIRED)
+find_package(InferenceEngine REQUIRED)
+find_package(OpenCV REQUIRED)
+add_executable(${PROJECT_NAME} src/main.cpp)
+target_link_libraries(${PROJECT_NAME} PRIVATE ${InferenceEngine_LIBRARIES} ${OpenCV_LIBS} ${NGRAPH_LIBRARIES})
+```
+3. **To build your project** using CMake with the default build tools currently available on your machine, execute the following commands:
+> **NOTE**: Make sure the **Set the Environment Variables** step in the [OpenVINO Installation](../../inference-engine/samples/hello_nv12_input_classification/README.md) document is applied to your terminal; otherwise the `InferenceEngine_DIR` and `OpenCV_DIR` variables won't be configured properly for the `find_package` calls.
+```sh
+cd build/
+cmake ../project
+cmake --build .
+```
+You can specify additional build options (for example, to build a CMake project on Windows with specific build tools). Please refer to the [CMake page](https://cmake.org/cmake/help/latest/manual/cmake.1.html#manual:cmake(1)) for details.
+
+### Run Your Application
+
+> **NOTE**: Before running, make sure you completed the **Set the Environment Variables** section in the [OpenVINO Installation](../../inference-engine/samples/hello_nv12_input_classification/README.md) document so that the application can find the libraries.
+
+To run compiled applications on Microsoft* Windows* OS, make sure that the Microsoft* Visual C++ 2015
+Redistributable and Intel® C++ Compiler 2017 Redistributable packages are installed and the
+`<INSTALL_DIR>/bin/intel64/Release/*.dll` files are placed in the
+application folder or are accessible via the `%PATH%` environment variable.
+
+[integration_process]: img/integration_process.png
+
+## Deprecation Notice
+
+<table>
+  <tr>
+    <td><strong>Deprecation Begins</strong></td>
+    <td>June 1, 2020</td>
+  </tr>
+  <tr>
+    <td><strong>Removal Date</strong></td>
+    <td>December 1, 2020</td>
+  </tr>
+</table>
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* diff --git a/docs/IE_DG/Intro_to_Performance.md b/docs/IE_DG/Intro_to_Performance.md new file mode 100644 index 00000000000000..2987a3628bab17 --- /dev/null +++ b/docs/IE_DG/Intro_to_Performance.md @@ -0,0 +1,99 @@ +# Introduction to the Performance Topics {#openvino_docs_IE_DG_Intro_to_Performance} + +This section is a shorter version of the +[Optimization Guide](supported_plugins/MULTI.md) for the Intel Deep Learning Deployment Toolkit. + +## Precision +Inference precision directly affects the performance. + +Model Optimizer can produce an IR with different precision. For example, float16 IR initially targets VPU and GPU devices, while, for example, the CPU can also execute regular float32. +Also, further device-specific inference precision settings are available, for example, [8-bit integer](Int8Inference.md) or [bfloat16](Bfloat16Inference.md) inference on the CPU. +Note that for [MULTI device](supported_plugins/MULTI.md) that supports automatic inference on multiple devices in parallel, you can use the FP16 IR. +You can find more information, including preferred data types for specific devices, in the +[Supported Devices](supported_plugins/Supported_Devices.md) section. + +## Lowering Inference Precision +Default optimization is used for CPU and implies that inference is made with lower precision if it is possible on a given platform to reach better performance with acceptable range of accuracy. +This approach is used for CPU device if platform supports the AVX512_BF16 instruction. In this case, a regular float32 model is converted to [bfloat16](Bfloat16Inference.md) internal representation and inference is provided with bfloat16 layers usage. +Below is the example command line to disable this feature on the CPU device with the AVX512_BF16 instruction and execute regular float32. +``` +$ benchmark_app -m -enforcebf16=false + ``` + +## Latency vs. Throughput +One way to increase computational efficiency is batching, which combines many (potentially tens) of +input images to achieve optimal throughput. However, high batch size also comes with a +latency penalty. So, for more real-time oriented usages, lower batch sizes (as low as a single input) are used. +Refer to the [Benchmark App](../../inference-engine/samples/benchmark_app/README.md) sample, which allows latency vs. throughput measuring. + +## Using Async API +To gain better performance on accelerators, such as VPU or FPGA, the Inference Engine uses the asynchronous approach (see +[Integrating Inference Engine in Your Application (current API)](Integrate_with_customer_application_new_API.md)). +The point is amortizing the costs of data transfers, by pipe-lining, see [Async API explained](@ref omz_demos_object_detection_demo_ssd_async_README). +Since the pipe-lining relies on the availability of the parallel slack, running multiple inference requests in parallel is essential. 
+Refer to the [Benchmark App](../../inference-engine/samples/benchmark_app/README.md) sample, which enables running a number of inference requests in parallel. Specifying different number of request produces different throughput measurements. + +## Best Latency on the Multi-Socket CPUs +Note that when latency is of concern, there are additional tips for multi-socket systems. +When input is limited to the single image, the only way to achieve the best latency is to limit execution to the single socket. +The reason is that single image is simply not enough +to saturate more than one socket. Also NUMA overheads might dominate the execution time. +Below is the example command line that limits the execution to the single socket using numactl for the best *latency* value +(assuming the machine with 28 phys cores per socket): +``` +limited to the single socket). +$ numactl -m 0 --physcpubind 0-27 benchmark_app -m -api sync -nthreads 28 + ``` +Note that if you have more than one input, running as many inference requests as you have NUMA nodes (or sockets) +usually gives the same best latency as a single request on the single socket, but much higher throughput. Assuming two NUMA nodes machine: +``` +$ benchmark_app -m -nstreams 2 + ``` +Number of NUMA nodes on the machine can be queried via 'lscpu'. +Please see more on the NUMA support in the [Optimization Guide](supported_plugins/MULTI.md). + +## Throughput Mode for CPU +Unlike most accelerators, CPU is perceived as an inherently latency-oriented device. +Since 2018 R5 release, the Inference Engine introduced the "throughput" mode, which allows the Inference Engine to efficiently run multiple inference requests on the CPU simultaneously, greatly improving the throughput. + +Internally, the execution resources are split/pinned into execution "streams". +Using this feature gains much better performance for the networks that originally are not scaled well with a number of threads (for example, lightweight topologies). This is especially pronounced for the many-core server machines. + +Run the [Benchmark App](../../inference-engine/samples/benchmark_app/README.md) and play with number of infer requests running in parallel, next section. +Try different values of the `-nstreams` argument from `1` to a number of CPU cores and find one that provides the best performance. + +In addition to the number of streams, it is also possible to play with the batch size to find the throughput sweet-spot. + +The throughput mode relaxes the requirement to saturate the CPU by using a large batch: running multiple independent inference requests in parallel often gives much better performance, than using a batch only. +This allows you to simplify the app-logic, as you don't need to combine multiple inputs into a batch to achieve good CPU performance. +Instead, it is possible to keep a separate infer request per camera or another source of input and process the requests in parallel using Async API. + +## Benchmark App +[Benchmark App](../../inference-engine/samples/benchmark_app/README.md) sample is the best performance reference. +It has a lot of device-specific knobs, but the primary usage is as simple as: +```bash +$ ./benchmark_app –d GPU –m -i +``` +to measure the performance of the model on the GPU. +Or +```bash +$ ./benchmark_app –d CPU –m -i +``` +to execute on the CPU instead. + +For example, for the CPU throughput mode from the previous section, you can play with number of streams (`-nstreams` command-line param). 
+Try different values of the `-nstreams` argument from `1` to a number of CPU cores and find one that provides the best performance. For example, on a 8-core CPU, compare the `-nstreams 1` (which is a latency-oriented scenario) to the `2`, `4` and `8` streams. Notice that `benchmark_app` automatically queries/creates/runs number of requests required to saturate the given number of streams. + +Finally, notice that when you don't specify number of streams with `-nstreams`, "AUTO" value for the streams is used, e.g. for the CPU this is [CPU_THROUGHPUT_AUTO](supported_plugins/CPU.md). You can spot the actual value behind "AUTO" for your machine in the application output. +Notice that the "AUTO" number is not necessarily most optimal, so it is generally recommended to play either with the benchmark_app's "-nstreams" as described above, or via [new Workbench tool](@ref workbench_docs_Workbench_DG_Introduction).This allows you to simplify the app-logic, as you don't need to combine multiple inputs into a batch to achieve good CPU performance. +Instead, it is possible to keep a separate infer request per camera or another source of input and process the requests in parallel using Async API. + +## Kernels Tuning for GPU + +GPU backend comes with a feature, that allows models tuning, so the workload is configured to fit better into hardware. + +Tuning is time consuming process, which internally execute every layer several (or even hundreds) times to find most performant configuration. + +This configuration is saved into json-formatted file, whose name can be passed as plugin param to network. GPU backend will process this data to configure kernels for the best performance. + +For more details about Kernels Tuning and How-To please refer to [GPU Kernels Tuning](GPU_Kernels_Tuning.md). diff --git a/docs/IE_DG/Introduction.md b/docs/IE_DG/Introduction.md new file mode 100644 index 00000000000000..27d223c4edcfc6 --- /dev/null +++ b/docs/IE_DG/Introduction.md @@ -0,0 +1,145 @@ +# Introduction to Intel® Deep Learning Deployment Toolkit {#openvino_docs_IE_DG_Introduction} + +## Deployment Challenges + +Deploying deep learning networks from the training environment to embedded platforms for inference +might be a complex task that introduces a number of technical challenges that must be addressed: + +* There are a number of deep learning frameworks widely used in the industry, such as Caffe*, TensorFlow*, MXNet*, Kaldi* etc. + +* Typically the training of the deep learning networks is performed in data centers or server farms while the inference +might take place on embedded platforms, optimized for performance and power consumption. Such platforms are typically +limited both from software perspective (programming languages, third party dependencies, memory consumption, +supported operating systems), and from hardware perspective (different data types, limited power envelope), +so usually it is not recommended (and sometimes just impossible) to use original training framework for inference. +An alternative solution would be to use dedicated inference APIs that are well optimized for specific hardware platforms. + +* Additional complications of the deployment process include supporting various layer types and networks that are getting +more and more complex. Obviously, ensuring the accuracy of the transforms networks is not trivial. + +## Deployment Workflow +The process assumes that you have a network model trained using one of the [supported frameworks](#SupportedFW). 
+The scheme below illustrates the typical workflow for deploying a trained deep learning model: +![scheme] + +The steps are: + +1. [Configure Model Optimizer](../MO_DG/prepare_model/Config_Model_Optimizer.md) for the specific framework (used to train your model). + +2. Run [Model Optimizer](#MO) to produce an optimized [Intermediate Representation (IR)](../MO_DG/IR_and_opsets.md) +of the model based on the trained network topology, weights and biases values, and other optional parameters. + +3. Test the model in the IR format using the [Inference Engine](#IE) in the target environment with provided +[Inference Engine sample applications](Samples_Overview.md). + +4. [Integrate Inference Engine](Integrate_with_customer_application_new_API.md) in your application to deploy the model in the target environment. + + +## Model Optimizer + +Model Optimizer is a cross-platform command line tool that facilitates the transition between the training and +deployment environment, performs static model analysis and automatically adjusts deep learning +models for optimal execution on end-point target devices. + +Model Optimizer is designed to support multiple deep learning [supported frameworks and formats](#SupportedFW). + +While running Model Optimizer you do not need to consider what target device you wish to use, the same output of the MO can be used in all targets. + +### Model Optimizer Workflow + +The process assumes that you have a network model trained using one of the [supported frameworks](#SupportedFW). +The Model Optimizer workflow can be described as following: + +* [Configure Model Optimizer](../MO_DG/prepare_model/Config_Model_Optimizer.md) for one of the supported deep learning framework that was used to train the model. +* Provide as input a trained network that contains a certain network topology, and the adjusted weights and +biases (with some optional parameters). +* [Run Model Optimizer](../MO_DG/prepare_model/convert_model/Converting_Model.md) to perform specific model optimizations (for example, horizontal fusion of certain network layers). Exact optimizations +are framework-specific, refer to appropriate documentation pages: [Converting a Caffe Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md), +[Converting a TensorFlow Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md), [Converting a MXNet Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md), [Converting a Kaldi Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md), +[Converting an ONNX Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md). +* Model Optimizer produces as output an [Intermediate Representation (IR)](../MO_DG/IR_and_opsets.md) of the network which is used as an input for the Inference Engine on all targets. 
+ + +### Supported Frameworks and Formats +* Caffe* (most public branches) +* TensorFlow* +* MXNet* +* Kaldi* +* ONNX* + +### Supported Models +For the list of supported models refer to the framework or format specific page: +* [Supported Caffe* models](../MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md) +* [Supported TensorFlow* models](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md) +* [Supported MXNet* models](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md) +* [Supported ONNX* models](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md) +* [Supported Kaldi* models](../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) + + +## Intermediate Representation + +Intermediate representation describing a deep learning model plays an important role connecting the OpenVINO™ toolkit components. +The IR is a pair of files: + * `.xml`: The topology file - an XML file that describes the network topology + * `.bin`: The trained data file - a .bin file that contains the weights and biases binary data + +Intermediate Representation (IR) files can be read, loaded and inferred with the [Inference Engine](#IE). +Inference Engine API offers a unified API across a number of [supported Intel® platforms](#SupportedTargets). +IR is also consumed, modified and written by Post-Training Optimization Tool which provides quantization capabilities. + +Refer to a dedicated description about [Intermediate Representation and Operation Sets](../MO_DG/IR_and_opsets.md) for further details. + +## nGraph Integration + +OpenVINO toolkit is powered by nGraph capabilities for Graph construction API, Graph transformation engine and Reshape. +nGraph Function is used as an intermediate representation for a model in the run-time underneath the CNNNetwork API. +The conventional representation for CNNNetwork is still available if requested for backward compatibility when some conventional API methods are used. +Please refer to the [Overview of nGraph Flow](nGraph_Flow.md) describing the details of nGraph integration into the Inference Engine and co-existence with the conventional representation. + +**Deprecation Notice** + + + + + + + + + + +
Deprecation Begins: June 1, 2020
Removal Date: December 1, 2020
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* + +## Inference Engine + +Inference Engine is a runtime that delivers a unified API to integrate the inference with application logic: + +* Takes as input the model. The model presented in the specific form of [Intermediate Representation (IR)](../MO_DG/IR_and_opsets.md) +produced by Model Optimizer. +* Optimizes inference execution for target hardware. +* Delivers inference solution with reduced footprint on embedded inference platforms. + +The Inference Engine supports inference of multiple image classification networks, +including AlexNet, GoogLeNet, VGG and ResNet families of networks, fully convolutional networks like FCN8 used for image + segmentation, and object detection networks like Faster R-CNN. + +For the full list of supported hardware, refer to the +[Supported Devices](supported_plugins/Supported_Devices.md) section. + +For Intel® Distribution of OpenVINO™ toolkit, the Inference Engine package contains [headers](files.html), runtime libraries, and +[sample console applications](Samples_Overview.md) demonstrating how you can use +the Inference Engine in your applications. + +The open source version is available in the [OpenVINO™ toolkit GitHub repository](https://github.com/openvinotoolkit/openvino) and can be built for supported platforms using the Inference Engine Build Instructions. +## See Also +- [Inference Engine Samples](Samples_Overview.md) +- [Intel® Deep Learning Deployment Toolkit Web Page](https://software.intel.com/en-us/computer-vision-sdk) + + +[scheme]: img/workflow_steps.png + +#### Optimization Notice +For complete information about compiler optimizations, see our [Optimization Notice](https://software.intel.com/en-us/articles/optimization-notice#opt-en). diff --git a/docs/IE_DG/Known_Issues_Limitations.md b/docs/IE_DG/Known_Issues_Limitations.md new file mode 100644 index 00000000000000..ec3e4ffd8e2862 --- /dev/null +++ b/docs/IE_DG/Known_Issues_Limitations.md @@ -0,0 +1,58 @@ +# Known Issues and Limitations {#openvino_docs_IE_DG_Known_Issues_Limitations} + +## Multiple OpenMP Loadings + +If the application uses the Inference Engine with third-party components that depend on Intel OpenMP, multiple loadings of the libiomp library may occur and cause OpenMP runtime initialization conflicts. This may happen, for example, if the application uses Intel® Math Kernel Library (Intel® MKL) through the “Single Dynamic Library” (libmkl_rt.so) mechanism and calls Intel MKL after loading the Inference Engine plugin. +The error log looks as follows: +```sh +OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized. +OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. 
by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/. +``` + +Possible workarounds: + +* Preload the OpenMP runtime using the LD_PRELOAD variable: +```sh +LD_PRELOAD= +``` + This eliminates multiple loadings of libiomp, and makes all the components use this specific version of OpenMP. + +* Alternatively, you can set KMP_DUPLICATE_LIB_OK=TRUE. However, performance degradation or results incorrectness may occur in this case. + + +## Old proto compiler breaks protobuf library + +With python protobuf library version 3.5.1 the following incompatibility can happen. +The known case is for Cent OS 7.4 + +The error log looks as follows: + +```sh +File "../lib64/python3.5/site-packages/google/protobuf/descriptor.py", line 829, in _new_ +return _message.default_pool.AddSerializedFile(serialized_pb) +TypeError: expected bytes, str found +``` + +Possible workaround is to upgrade default protobuf compiler (libprotoc 2.5.0) to newer version, for example +libprotoc 2.6.1. + +[protobuf_issue]: https://github.com/google/protobuf/issues/4272 + +## Dynamic batching +Refer to the **Limitations** section of [Dynamic batching page](DynamicBatching.md) + +## Static Shape Infer +Refer to the **Limitations** section of [Static Shape Infer page](ShapeInference.md) + + +## Image Pre-Processing Performance Optimization Issue + +As described in [documentation for new API](Integrate_with_customer_application_new_API.md), you can set an image blob of any size to an +infer request using resizable input. Resize is executed during inference using configured resize algorithm. + +But currently resize algorithms are not completely optimized. So expect performance degradation if resizable input is +specified and an input blob (to be resized) is set (`SetBlob()` is used). Required performance is met for +[CPU](supported_plugins/CPU.md) plugin only (because enabled openMP* provides parallelism). + +Another limitation is that currently, resize algorithms support NCHW layout only. So if you set NHWC layout for an input +blob, NHWC is converted to NCHW before resize and back to NHWC after resize. diff --git a/docs/IE_DG/Legal_Information.md b/docs/IE_DG/Legal_Information.md new file mode 100644 index 00000000000000..3b39dba5810fa4 --- /dev/null +++ b/docs/IE_DG/Legal_Information.md @@ -0,0 +1,12 @@ +# Legal Information {#openvino_docs_IE_DG_Legal_Information} + +No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
+Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
+This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
+The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.
+Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting [www.intel.com/design/literature.htm](http://www.intel.com/design/literature.htm).
+Intel, the Intel logo, Intel Core, VTune, and Xeon are trademarks of Intel Corporation in the U.S. and other countries.
+\* Other names and brands may be claimed as the property of others.
+Copyright © 2016-2018 Intel Corporation.
+This software and the related documents are Intel copyrighted materials, and your use of them is governed by the express license under which they were provided to you (License). Unless the License provides otherwise, you may not use, modify, copy, publish, distribute, disclose or transmit this software or the related documents without Intel's prior written permission.
+This software and the related documents are provided as is, with no express or implied warranties, other than those that are expressly stated in the License.
diff --git a/docs/IE_DG/Memory_primitives.md b/docs/IE_DG/Memory_primitives.md new file mode 100644 index 00000000000000..a6fed433d3c765 --- /dev/null +++ b/docs/IE_DG/Memory_primitives.md @@ -0,0 +1,55 @@ +Inference Engine Memory primitives {#openvino_docs_IE_DG_Memory_primitives} +===================================================================== + +## Blobs + +InferenceEngine::Blob is the main class intended for working with memory. +Using this class you can read and write memory, get information about the memory structure etc. + +The right way to create Blob objects with a specific layout is to use constructors with InferenceEngine::TensorDesc. +
+InferenceEngine::TensorDesc tdesc(InferenceEngine::Precision::FP32, {1, 3, 227, 227}, InferenceEngine::Layout::NCHW);
+InferenceEngine::Blob::Ptr blob = InferenceEngine::make_shared_blob<float>(tdesc);
+
+ +## Layouts + +InferenceEngine::TensorDesc is a special class that describes the memory layout format. + +This class allows you to create planar layouts using the standard formats (such as InferenceEngine::Layout::NCDHW, InferenceEngine::Layout::NCHW, InferenceEngine::Layout::NC, InferenceEngine::Layout::C, and so on) as well as non-planar layouts using InferenceEngine::BlockingDesc. + +To create a complex layout, use InferenceEngine::BlockingDesc, which allows you to define blocked memory with offsets and strides. + +## Examples + +1. You can define a blob with dimensions {N: 1, C: 25, H: 20, W: 20} and format NHWC using the following parameters:
+
+InferenceEngine::BlockingDesc({1, 20, 20, 25}, {0, 2, 3, 1}); // or
+InferenceEngine::BlockingDesc({1, 20, 20, 25}, InferenceEngine::Layout::NHWC);
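As a sketch only (reusing the illustrative dimensions above and assuming FP32 precision), the NHWC BlockingDesc can then be attached to a full tensor descriptor and a blob:

```cpp
InferenceEngine::BlockingDesc nhwc_blk({1, 20, 20, 25}, {0, 2, 3, 1});  // NHWC memory order
InferenceEngine::TensorDesc tdesc(InferenceEngine::Precision::FP32,
                                  {1, 25, 20, 20},                      // logical NCHW dimensions
                                  nhwc_blk);
InferenceEngine::Blob::Ptr blob = InferenceEngine::make_shared_blob<float>(tdesc);
```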
+
+2. If you have memory with real dimensions {N: 1, C: 25, H: 20, W: 20} but with channels blocked by 8, you can define it using the following parameters:
+
+InferenceEngine::BlockingDesc({1, 4, 20, 20, 8}, {0, 1, 2, 3, 1});
+
+3. You can also set strides and offsets if the layout requires them. +4. If you have a complex blob layout and do not want to calculate the real offset into the data yourself, you can use the methods +InferenceEngine::TensorDesc::offset(size_t l) or InferenceEngine::TensorDesc::offset(SizeVector v).
+For example: +
+InferenceEngine::BlockingDesc blk({1, 4, 20, 20, 8}, {0, 1, 2, 3, 1});
+InferenceEngine::TensorDesc tdesc(InferenceEngine::Precision::FP32, {1, 25, 20, 20}, blk);
+tdesc.offset(0); // = 0
+tdesc.offset(1); // = 8
+tdesc.offset({0, 0, 0, 2}); // = 16
+tdesc.offset({0, 1, 0, 2}); // = 17
+
+5. If you would like to create a TensorDesc with a planar format for an arbitrary number of dimensions (1, 2, 4, and so on), you can use the method +InferenceEngine::TensorDesc::getLayoutByDims.
+InferenceEngine::TensorDesc::getLayoutByDims({1}); // InferenceEngine::Layout::C
+InferenceEngine::TensorDesc::getLayoutByDims({1, 2}); // InferenceEngine::Layout::NC
+InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3, 4}); // InferenceEngine::Layout::NCHW
+InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3}); // InferenceEngine::Layout::BLOCKED
+InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3, 4, 5}); // InferenceEngine::Layout::NCDHW
+InferenceEngine::TensorDesc::getLayoutByDims({1, 2, 3, 4, 5, ...}); // InferenceEngine::Layout::BLOCKED
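For instance (a sketch with assumed dimensions), the deduced layout can be passed directly to a TensorDesc constructor:

```cpp
InferenceEngine::SizeVector dims = {1, 3, 224, 224};
InferenceEngine::TensorDesc tdesc(InferenceEngine::Precision::FP32, dims,
                                  InferenceEngine::TensorDesc::getLayoutByDims(dims));  // NCHW
```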
+
\ No newline at end of file diff --git a/docs/IE_DG/Migration_CoreAPI.md b/docs/IE_DG/Migration_CoreAPI.md new file mode 100644 index 00000000000000..21a01991b7fb77 --- /dev/null +++ b/docs/IE_DG/Migration_CoreAPI.md @@ -0,0 +1,77 @@ +Migration from Inference Engine Plugin API to Core API {#openvino_docs_IE_DG_Migration_CoreAPI} +=============================== + +For 2019 R2 Release, the new Inference Engine Core API is introduced. This guide is updated to reflect the new API approach. The Inference Engine Plugin API is still supported, but is going to be deprecated in future releases. + +This section provides common steps to migrate your application written using the Inference Engine Plugin API (`InferenceEngine::InferencePlugin`) to the Inference Engine Core API (`InferenceEngine::Core`). + +To learn how to write a new application using the Inference Engine, refer to [Integrate the Inference Engine Request API with Your Application](Integrate_with_customer_application_new_API.md) and [Inference Engine Samples Overview](Samples_Overview.md). + +## Inference Engine Core Class + +The Inference Engine Core class is implemented on top existing Inference Engine Plugin API and handles plugins internally. +The main responsibility of the `InferenceEngine::Core` class is to hide plugin specifics inside and provide a new layer of abstraction that works with devices (`InferenceEngine::Core::GetAvailableDevices`). Almost all methods of this class accept `deviceName` as an additional parameter that denotes an actual device you are working with. Plugins are listed in the `plugins.xml` file, which is loaded during constructing `InferenceEngine::Core` objects: + +```bash + + + + + ... + +``` + +## Migration Steps + +Common migration process includes the following steps: + +1. Migrate from the `InferenceEngine::InferencePlugin` initialization: +```cpp +InferenceEngine::InferencePlugin plugin = InferenceEngine::PluginDispatcher({ FLAGS_pp }).getPluginByDevice(FLAGS_d); +``` +to the `InferenceEngine::Core` class initialization: +```cpp +InferenceEngine::Core core; +``` + +2. Instead of using `InferenceEngine::CNNNetReader` to read IR: +```cpp +CNNNetReader network_reader; +network_reader.ReadNetwork(fileNameToString(input_model)); +network_reader.ReadWeights(fileNameToString(input_model).substr(0, input_model.size() - 4) + ".bin"); +CNNNetwork network = network_reader.getNetwork(); +``` +read networks using the Core class: +```cpp +CNNNetwork network = core.ReadNetwork(input_model); +``` +The Core class also allows reading models from ONNX format: +```cpp +CNNNetwork network = core.ReadNetwork("model.onnx"); +``` + +3. Instead of adding CPU device extensions to the plugin: +```cpp +plugin.AddExtension(std::make_shared()); +``` +add extensions to CPU device using the Core class: +```cpp +core.AddExtension(std::make_shared(), "CPU"); +``` + +4. Instead of setting configuration keys to a particular plugin, set (key, value) pairs via `InferenceEngine::Core::SetConfig` +```cpp +core.SetConfig({{PluginConfigParams::KEY_CONFIG_FILE, FLAGS_c}}, "GPU"); +``` +> **NOTE**: If `deviceName` is omitted as the last argument, configuration is set for all Inference Engine devices. + +5. 
Migrate from loading the network to a particular plugin: +```cpp +auto execNetwork = plugin.LoadNetwork(network, { }); +``` +to `InferenceEngine::Core::LoadNetwork` to a particular device: +```cpp +auto execNetwork = core.LoadNetwork(network, deviceName, { }); +``` + +After you have an instance of `InferenceEngine::ExecutableNetwork`, all other steps are as usual. diff --git a/docs/IE_DG/OnnxImporterTutorial.md b/docs/IE_DG/OnnxImporterTutorial.md new file mode 100644 index 00000000000000..7b336f97a633fc --- /dev/null +++ b/docs/IE_DG/OnnxImporterTutorial.md @@ -0,0 +1,118 @@ +# ONNX* Importer API Tutorial {#openvino_docs_IE_DG_OnnxImporterTutorial} + +> **NOTE**: This tutorial is deprecated. Since OpenVINO™ 2020.4 version, Inference Engine enables reading ONNX models via the Inference Engine Core API +> and there is no need to use directly the low-level ONNX* Importer API anymore. +> To read ONNX\* models, it's recommended to use the `Core::ReadNetwork()` method that provide a uniform way to read models from IR or ONNX format. + +This tutorial demonstrates how to use the ONNX\* Importer API. +This API makes it possible to create an nGraph `Function` object from an imported ONNX model. + +All functions of the ONNX Importer API are in the [onnx.hpp][onnx_header] header file. + +Two categories of API functions: +* Helper functions that check which ONNX ops are supported in a current version of the ONNX Importer +* Functions that read ONNX models from a stream or file and result in an nGraph function, which can be executed using the Inference Engine + +## Check Which ONNX Ops Are Supported + +To list all supported ONNX ops in a specific version and domain, use the `get_supported_operators` +as shown in the example below: +```cpp +const std::int64_t version = 12; +const std::string domain = "ai.onnx"; +const std::set supported_ops = ngraph::onnx_import::get_supported_operators(version, domain); + +for(const auto& op : supported_ops) +{ + std::cout << op << std::endl; +} +``` +The above code produces a list of all the supported operators for the `version` and `domain` you specified and outputs a list similar to this: +```cpp +Abs +Acos +... +Xor +``` + +To determine whether a specific ONNX operator in a particular version and domain is supported by the importer, use the `is_operator_supported` function as shown in the example below: +```cpp +const std::string op_name = "Abs"; +const std::int64_t version = 12; +const std::string domain = "ai.onnx"; +const bool is_abs_op_supported = ngraph::onnx_import::is_operator_supported(op_name, version, domain); + +std::cout << "Abs in version 12, domain `ai.onnx`is supported: " << (is_abs_op_supported ? "true" : "false") << std::endl; +``` + +## Import ONNX Model + +To import an ONNX model, use the `import_onnx_model` function. +The method has two overloads: +* `import_onnx_model` takes a stream as an input, for example, file stream, memory stream +* `import_onnx_model` takes a file path as an input + +Refer to the sections below for details. + +> **NOTE**: The examples below use the ONNX ResNet50 model, which is available at the [ONNX Model Zoo][onnx_model_zoo]: +> ```bash +> $ wget https://s3.amazonaws.com/download.onnx/models/opset_8/resnet50.tar.gz +> $ tar -xzvf resnet50.tar.gz +> ``` + +Once you create the `ng_function`, you can use it to run computation on the Inference Engine. +As it was shown in [Build a Model with nGraph Library](nGraphTutorial.md), `std::shared_ptr` can be transformed into a `CNNNetwork`. 
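As a minimal sketch (assuming `ng_function` holds the `std::shared_ptr<ngraph::Function>` returned by `import_onnx_model` in the snippets below, and using `CPU` only as a placeholder device), the conversion and loading might look like this:

```cpp
InferenceEngine::CNNNetwork network(ng_function);  // wrap the nGraph function as a CNNNetwork
InferenceEngine::Core core;
auto executable_network = core.LoadNetwork(network, "CPU");
```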
+ + +### Stream as Input + +The code below shows how to convert the ONNX ResNet50 model to the nGraph function using `import_onnx_model` with the stream as an input: + +```cpp + const std::string resnet50_path = "resnet50/model.onnx"; + std::ifstream resnet50_stream(resnet50_path); + if(resnet50_stream.is_open()) + { + try + { + const std::shared_ptr ng_function = ngraph::onnx_import::import_onnx_model(resnet50_stream); + + // Check shape of the first output, for example + std::cout << ng_function->get_output_shape(0) << std::endl; + // The output is Shape{1, 1000} + } + catch (const ngraph::ngraph_error& error) + { + std::cout << "Error when importing ONNX model: " << error.what() << std::endl; + } + } + resnet50_stream.close(); +``` + +### Filepath as Input + +The code below shows how to convert the ONNX ResNet50 model to the nGraph function using `import_onnx_model` with the filepath as an input: +```cpp +const std::shared_ptr ng_function = ngraph::onnx_import::import_onnx_model(resnet50_path); +``` + +[onnx_header]: https://github.com/NervanaSystems/ngraph/blob/master/src/ngraph/frontend/onnx_import/onnx.hpp +[onnx_model_zoo]: https://github.com/onnx/models + + +## Deprecation Notice + + + + + + + + + + +
Deprecation Begins: June 1, 2020
Removal Date: December 1, 2020
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* \ No newline at end of file diff --git a/docs/IE_DG/Optimization_notice.md b/docs/IE_DG/Optimization_notice.md new file mode 100644 index 00000000000000..3c128d95b6c5bc --- /dev/null +++ b/docs/IE_DG/Optimization_notice.md @@ -0,0 +1,3 @@ +# Optimization Notice {#openvino_docs_IE_DG_Optimization_notice} + +![Optimization_notice](img/opt-notice-en_080411.gif) \ No newline at end of file diff --git a/docs/IE_DG/PythonPackage_Overview.md b/docs/IE_DG/PythonPackage_Overview.md new file mode 100644 index 00000000000000..411f082609f3d8 --- /dev/null +++ b/docs/IE_DG/PythonPackage_Overview.md @@ -0,0 +1,15 @@ +OpenVINO™ Python* package {#openvino_docs_IE_DG_PythonPackage_Overview} +======================== + +OpenVINO™ Python\* package includes types to measure model and calibrate to low precision. + +The OpenVINO™ Python\* package available in the `/python/python3.X` directory. + +The OpenVINO™ Python\* package includes the following sub-packages: + + - [openvino.inference_engine](../../inference-engine/ie_bridges/python/docs/api_overview.md) - Python\* wrapper on OpenVINO™ Inference Engine. + - `openvino.tools.accuracy_checker` - Measure accuracy. + - `openvino.tools.benchmark` - Measure latency and throughput. + +## See Also +* [Introduction to Intel's Deep Learning Inference Engine](Introduction.md) diff --git a/docs/IE_DG/Samples_Overview.md b/docs/IE_DG/Samples_Overview.md new file mode 100644 index 00000000000000..af60575f2aaf2b --- /dev/null +++ b/docs/IE_DG/Samples_Overview.md @@ -0,0 +1,184 @@ +# Inference Engine Samples {#openvino_docs_IE_DG_Samples_Overview} + +The Inference Engine sample applications are simple console applications that show how to utilize specific Inference Engine capabilities within an application, assist developers in executing specific tasks such as loading a model, running inference, querying specific device capabilities and etc. + +After installation of Intel® Distribution of OpenVINO™ toolkit, С, C++ and Python* sample applications are available in the following directories, respectively: +* `/inference_engine/samples/c` +* `/inference_engine/samples/cpp` +* `/inference_engine/samples/python` + +Inference Engine sample applications include the following: +- **[Automatic Speech Recognition C++ Sample](../../inference-engine/samples/speech_sample/README.md)** – Acoustic model inference based on Kaldi neural networks and speech feature vectors. +- **Benchmark Application** – Estimates deep learning inference performance on supported devices for synchronous and asynchronous modes. + - [Benchmark C++ Application](../../inference-engine/samples/benchmark_app/README.md) + - [Benchmark Python Application](../../inference-engine/tools/benchmark_tool/README.md) +- **Hello Classification Sample** – Inference of image classification networks like AlexNet and GoogLeNet using Synchronous Inference Request API. 
Input of any size and layout can be set to an infer request which will be pre-processed automatically during inference (the sample supports only images as inputs and supports Unicode paths). + - [Hello Classification C++ Sample](../../inference-engine/samples/hello_classification/README.md) + - [Hello Classification C Sample](../../inference-engine/ie_bridges/c/samples/hello_classification/README.md) +- **Hello NV12 Input Classification Sample** – Input of any size and layout can be provided to an infer request. The sample transforms the input to the NV12 color format and pre-process it automatically during inference. The sample supports only images as inputs. + - [Hello NV12 Input Classification C++ Sample](../../inference-engine/samples/hello_nv12_input_classification/README.md) + - [Hello NV12 Input Classification C Sample](../../inference-engine/ie_bridges/c/samples/hello_nv12_input_classification/README.md) +- **Hello Query Device Sample** – Query of available Inference Engine devices and their metrics, configuration values. + - [Hello Query Device C++ Sample](../../inference-engine/samples/hello_query_device/README.md) + - [Hello Query Device Python* Sample](../../inference-engine/ie_bridges/python/sample/hello_query_device/README.md) +- **[Hello Reshape SSD C++ Sample**](../../inference-engine/samples/hello_reshape_ssd/README.md)** – Inference of SSD networks resized by ShapeInfer API according to an input size. +- **Image Classification Sample Async** – Inference of image classification networks like AlexNet and GoogLeNet using Asynchronous Inference Request API (the sample supports only images as inputs). + - [Image Classification C++ Sample Async](../../inference-engine/samples/classification_sample_async/README.md) + - [Image Classification Python* Sample Async](../../inference-engine/ie_bridges/python/sample/classification_sample_async/README.md) +- **[Image Classification Python* Sample](../../inference-engine/ie_bridges/python/sample/classification_sample/README.md)** – Inference of image classification networks like AlexNet and GoogLeNet using Synchronous Inference Request API (the sample supports only images as inputs). +- **Neural Style Transfer Sample** – Style Transfer sample (the sample supports only images as inputs). + - [Neural Style Transfer C++ Sample](../../inference-engine/samples/style_transfer_sample/README.md) + - [Neural Style Transfer Python* Sample](../../inference-engine/ie_bridges/python/sample/style_transfer_sample/README.md) +- **[nGraph Function Creation C++ Sample](../../inference-engine/samples/ngraph_function_creation_sample/README.md)** – Construction of the LeNet network using the nGraph function creation sample. +- **Object Detection for SSD Sample** – Inference of object detection networks based on the SSD, this sample is simplified version that supports only images as inputs. + - [Object Detection for SSD C++ Sample](../../inference-engine/samples/object_detection_sample_ssd/README.md) + - [Object Detection for SSD C Sample](../../inference-engine/ie_bridges/c/samples/object_detection_sample_ssd/README.md) + - [Object Detection for SSD Python* Sample](../../inference-engine/ie_bridges/python/sample/object_detection_sample_ssd/README.md) + +## Media Files Available for Samples + +To run the sample applications, you can use images and videos from the media files collection available at https://github.com/intel-iot-devkit/sample-videos. 
+ +## Samples that Support Pre-Trained Models + +You can download the [pre-trained models](@ref omz_models_intel_index) using the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or from [https://download.01.org/opencv/](https://download.01.org/opencv/). + +## Build the Sample Applications + +### Build the Sample Applications on Linux* + +The officially supported Linux* build environment is the following: + +* Ubuntu* 16.04 LTS 64-bit or CentOS* 7.4 64-bit +* GCC* 5.4.0 (for Ubuntu* 16.04) or GCC* 4.8.5 (for CentOS* 7.4) +* CMake* version 2.8 or higher + +To build the C or C++ sample applications for Linux, go to the `/inference_engine/samples/c` or `/inference_engine/samples/cpp` directory, respectively, and run the `build_samples.sh` script: +```sh +build_samples.sh +``` + +Once the build is completed, you can find sample binaries in the following folders: +* C samples: `~/inference_engine_c_samples_build/intel64/Release` +* C++ samples: `~/inference_engine_cpp_samples_build/intel64/Release` + +You can also build the sample applications manually: + +> **NOTE**: If you have installed the product as a root user, switch to root mode before you continue: `sudo -i` + +1. Navigate to a directory that you have write access to and create a samples build directory. This example uses a directory named `build`: +```sh +mkdir build +``` +> **NOTE**: If you ran the Image Classification verification script during the installation, the C++ samples build directory was already created in your home directory: `~/inference_engine_samples_build/` + +2. Go to the created directory: +```sh +cd build +``` + +3. Run CMake to generate the Make files for release or debug configuration. For example, for C++ samples: + - For release configuration: + ```sh + cmake -DCMAKE_BUILD_TYPE=Release /inference_engine/samples/cpp + ``` + - For debug configuration: + ```sh + cmake -DCMAKE_BUILD_TYPE=Debug /inference_engine/samples/cpp + ``` +4. Run `make` to build the samples: +```sh +make +``` + +For the release configuration, the sample application binaries are in `/intel64/Release/`; +for the debug configuration — in `/intel64/Debug/`. + +### Build the Sample Applications on Microsoft Windows* OS + +The recommended Windows* build environment is the following: +* Microsoft Windows* 10 +* Microsoft Visual Studio* 2015, 2017, or 2019 +* CMake* version 2.8 or higher + +> **NOTE**: If you want to use Microsoft Visual Studio 2019, you are required to install CMake 3.14. + +To build the C or C++ sample applications on Windows, go to the `\inference_engine\samples\c` or `\inference_engine\samples\cpp` directory, respectively, and run the `build_samples_msvc.bat` batch file: +```sh +build_samples_msvc.bat +``` + +By default, the script automatically detects the highest Microsoft Visual Studio version installed on the machine and uses it to create and build +a solution for a sample code. Optionally, you can also specify the preferred Microsoft Visual Studio version to be used by the script. Supported +versions are `VS2015`, `VS2017`, and `VS2019`. 
For example, to build the C++ samples using the Microsoft Visual Studio 2017, use the following command: +```sh +\inference_engine\samples\cpp\build_samples_msvc.bat VS2017 +``` + +Once the build is completed, you can find sample binaries in the following folders: +* C samples: `C:\Users\\Documents\Intel\OpenVINO\inference_engine_c_samples_build\intel64\Release` +* C++ samples: `C:\Users\\Documents\Intel\OpenVINO\inference_engine_cpp_samples_build\intel64\Release` + +You can also build a generated solution manually. For example, if you want to build C++ sample binaries in Debug configuration, run the appropriate version of the +Microsoft Visual Studio and open the generated solution file from the `C:\Users\\Documents\Intel\OpenVINO\inference_engine_cpp_samples_build\Samples.sln` +directory. + +## Get Ready for Running the Sample Applications + +### Get Ready for Running the Sample Applications on Linux* + +Before running compiled binary files, make sure your application can find the +Inference Engine and OpenCV libraries. +Run the `setupvars` script to set all necessary environment variables: +```sh +source /bin/setupvars.sh +``` + +**(Optional)**: The OpenVINO environment variables are removed when you close the +shell. As an option, you can permanently set the environment variables as follows: + +1. Open the `.bashrc` file in ``: +```sh +vi /.bashrc +``` + +2. Add this line to the end of the file: +```sh +source /opt/intel/openvino/bin/setupvars.sh +``` + +3. Save and close the file: press the **Esc** key, type `:wq` and press the **Enter** key. +4. To test your change, open a new terminal. You will see `[setupvars.sh] OpenVINO environment initialized`. + +You are ready to run sample applications. To learn about how to run a particular +sample, read the sample documentation by clicking the sample name in the samples +list above. + +### Get Ready for Running the Sample Applications on Windows* + +Before running compiled binary files, make sure your application can find the +Inference Engine and OpenCV libraries. +Use the `setupvars` script, which sets all necessary environment variables: +```sh +\bin\setupvars.bat +``` + +To debug or run the samples on Windows in Microsoft Visual Studio, make sure you +have properly configured **Debugging** environment settings for the **Debug** +and **Release** configurations. Set correct paths to the OpenCV libraries, and +debug and release versions of the Inference Engine libraries. +For example, for the **Debug** configuration, go to the project's +**Configuration Properties** to the **Debugging** category and set the `PATH` +variable in the **Environment** field to the following: + +```sh +PATH=\deployment_tools\inference_engine\bin\intel64\Debug;\opencv\bin;%PATH% +``` +where `` is the directory in which the OpenVINO toolkit is installed. + +You are ready to run sample applications. To learn about how to run a particular +sample, read the sample documentation by clicking the sample name in the samples +list above. 
+ +## See Also +* [Introduction to Intel's Deep Learning Inference Engine](Introduction.md) diff --git a/docs/IE_DG/ShapeInference.md b/docs/IE_DG/ShapeInference.md new file mode 100644 index 00000000000000..58203a3f841ad6 --- /dev/null +++ b/docs/IE_DG/ShapeInference.md @@ -0,0 +1,129 @@ +Using Shape Inference {#openvino_docs_IE_DG_ShapeInference} +========================================== + +Inference Engine takes two kinds of model description as an input: [Intermediate Representation (IR)](../MO_DG/IR_and_opsets.md) and [nGraph::Function](nGraph_Flow.md) objects. +Both should have fixed input shapes to be successfully loaded to the Inference Engine. +To feed input data of a shape that is different from the model input shape, resize the model first. + +Model resizing on the stage of IR generation or [nGraph::Function creation](nGraphTutorial.md) is the recommended approach. +OpenVINO™ provides the following experimental methods for runtime model reshaping: + +1. Setting a new input shape with the `InferenceEngine::CNNNetwork::reshape` method + + `InferenceEngine::CNNNetwork::reshape` method updates input shapes and propagates them down to the outputs of the model through all intermediate layers. + + Shape propagation for `InferenceEngine::CNNNetwork` objects created from `nGraph::Function` or IR of the version 10 works through the `nGraph` shape inference mechanism. + `InferenceEngine::CNNNetwork` objects created from lower IR versions are considered deprecated and may be reshaped incorrectly or give unexpected results. + + To keep the v10 IR resizable by the `InferenceEngine::CNNNetwork::reshape` method, convert the model with the additional Model Optimizer key `--keep_shape_ops`. + +2. Setting a new batch dimension value with the `InferenceEngine::CNNNetwork::setBatchSize` method + + The meaning of a model batch may vary depending on choices you made during the model designing. + The `InferenceEngine::CNNNetwork::setBatchSize` method deduces index of batch dimension relying only on the input rank. + This method does not work for models with a non-zero index batch placement or models with inputs without a batch dimension. + + Batch-setting algorithm does not involve shape inference mechanism. + Batch of input and output shapes for all layers is set to a new batch value without layer validation. + It may cause both positive and negative side effects. + + Due to the limitations described above, the current method is recommended for simple image processing models only. + + +Practically, some models are not ready to be resized. In this case, a new input shape cannot be set with the Model Optimizer or the `InferenceEngine::CNNNetwork::reshape` method. + +## Troubleshooting Resize Errors + +Operation semantics may impose restrictions on input shapes of the operation. +Shape collision during shape propagation may be a sign that a new shape does not satisfy the restrictions. +Changing the model input shape may result in intermediate operations shape collision. + +Examples of such operations: +- `Reshape` operation with a hard-coded output shape value +- `MatMul` operation with the `Const` second input cannot be resized by spatial dimensions due to operation semantics + +Model structure and logic should not change significantly after resizing. +- The Global Pooling operation is commonly used to reduce output feature map of classification models output. +Having the input of the shape [N, C, H, W], Global Pooling returns the output of the shape [N, C, 1, 1]. 
+Model architects usually express Global Pooling with the help of the `Pooling` operation with the fixed kernel size [H, W]. +During spatial reshape, having the input of the shape [N, C, H1, W1], Pooling with the fixed kernel size [H, W] returns the output of the shape [N, C, H2, W2], where H2 and W2 are commonly not equal to `1`. +It breaks the classification model structure. +For example, [publicly available Inception family models from TensorFlow*](https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models) have this issue. + +- Resizing the model input shape may significantly affect its accuracy. +For example, Object Detection models from TensorFlow have resizing restrictions by design. +To keep the model valid after the reshape, choose a new input shape that satisfies conditions listed in the `pipeline.config` file. +For details, refer to the Tensorflow Object Detection API models resizing techniques. + +## Usage of Reshape Method + +The primary method of the feature is `InferenceEngine::CNNNetwork::reshape`. +It gets new input shapes and propagates it from input to output for all intermediates layers of the given network. +The method takes `InferenceEngine::ICNNNetwork::InputShapes` - a map of pairs: name of input data and its dimension. + +The algorithm for resizing network is the following: + +1) **Collect the map of input names and shapes from Intermediate Representation (IR)** using helper method `InferenceEngine::CNNNetwork::getInputShapes` + +2) **Set new input shapes** + +3) **Call reshape** + +Here is a code example: +```cpp + InferenceEngine::Core core; + // ------------- 0. Read IR and image ---------------------------------------------- + CNNNetwork network = core.ReadNetwork("path/to/IR/xml"); + cv::Mat image = cv::imread("path/to/image"); + // --------------------------------------------------------------------------------- + + // ------------- 1. Collect the map of input names and shapes from IR--------------- + auto input_shapes = network.getInputShapes(); + // --------------------------------------------------------------------------------- + + // ------------- 2. Set new input shapes ------------------------------------------- + std::string input_name; + SizeVector input_shape; + std::tie(input_name, input_shape) = *input_shapes.begin(); // let's consider first input only + input_shape[0] = batch_size; // set batch size to the first input dimension + input_shape[2] = image.rows; // changes input height to the image one + input_shape[3] = image.cols; // changes input width to the image one + input_shapes[input_name] = input_shape; + // --------------------------------------------------------------------------------- + + // ------------- 3. Call reshape --------------------------------------------------- + network.reshape(input_shapes); + // --------------------------------------------------------------------------------- + + ... + + // ------------- 4. Loading model to the device ------------------------------------ + std::string device = "CPU"; + ExecutableNetwork executable_network = core.LoadNetwork(network, device); + // --------------------------------------------------------------------------------- + + +``` +Shape Inference feature is used in [Smart classroom sample](@ref omz_demos_smart_classroom_demo_README). + +## Extensibility + +Inference Engine provides a special mechanism that allows to add the support of shape inference for custom operations. 
+This mechanism is described in the [Extensibility documentation](Extensibility_DG/Intro.md) + +## Deprecation Notice + + + + + + + + + + +
Deprecation Begins: June 1, 2020
Removal Date: December 1, 2020
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* diff --git a/docs/IE_DG/Tools_Overview.md b/docs/IE_DG/Tools_Overview.md new file mode 100644 index 00000000000000..6c543c810d0d2f --- /dev/null +++ b/docs/IE_DG/Tools_Overview.md @@ -0,0 +1,17 @@ +# OpenVINO™ Tools {#openvino_docs_IE_DG_Tools_Overview} + +OpenVINO™ tools are C++ and Python\* console command line applications that can be used for models downloading, accuracy measurement, calibration and checking. + +The OpenVINO™ toolkit installation includes the following tools: + +|Tool | Location in the Installation Directory| +|-----------------------------------------------------------------------------|---------------------------------------| +|[Accuracy Checker Tool](@ref omz_tools_accuracy_checker_README) | `/deployment_tools/tools/open_model_zoo/tools/accuracy_checker`| +|[Post-Training Optimization Tool](@ref pot_README) | `/deployment_tools/tools/post_training_optimization_toolkit`| +|[Model Downloader](@ref omz_tools_downloader_README) | `/deployment_tools/tools/model_downloader`| +|[Cross Check Tool](../../inference-engine/tools/cross_check_tool/README.md) | `/deployment_tools/tools/cross_check_tool`| +|[Compile Tool](../../inference-engine/tools/compile_tool/README.md) | `/deployment_tools/inference_engine/lib/intel64/`| + + +## See Also +* [Introduction to Deep Learning Inference Engine](Introduction.md) diff --git a/docs/IE_DG/img/NewAndOldCNNNetworkImpl.png b/docs/IE_DG/img/NewAndOldCNNNetworkImpl.png new file mode 100644 index 00000000000000..b5868b343487f8 --- /dev/null +++ b/docs/IE_DG/img/NewAndOldCNNNetworkImpl.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5389b6d0a25e8356002bd8c68526ceedf39f6c4efa5e7097b5ac0308fd42dee3 +size 48611 diff --git a/docs/IE_DG/img/TopLevelNGraphFlow.png b/docs/IE_DG/img/TopLevelNGraphFlow.png new file mode 100644 index 00000000000000..4359676d20ca52 --- /dev/null +++ b/docs/IE_DG/img/TopLevelNGraphFlow.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c416156d9ed77213ead230fc49c32a3c3918e52128ac2db442f56062e206bc01 +size 708262 diff --git a/docs/IE_DG/img/bf16_format.png b/docs/IE_DG/img/bf16_format.png new file mode 100644 index 00000000000000..bf92086a96faa8 --- /dev/null +++ b/docs/IE_DG/img/bf16_format.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ce6fb1c626ac0858b411c86fa2e3a46c5ca0dc2e88692284ce4ec24edb141e7f +size 9326 diff --git a/docs/IE_DG/img/conv_depth_01.png b/docs/IE_DG/img/conv_depth_01.png new file mode 100644 index 00000000000000..516b01d6d1b0d3 --- /dev/null +++ b/docs/IE_DG/img/conv_depth_01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:80edd1da1c5673d18afa44bc2c0503ba9ecdcc37c2acb94960303b61c602ceee +size 12649 diff --git a/docs/IE_DG/img/conv_simple_01.png b/docs/IE_DG/img/conv_simple_01.png new file mode 100644 index 00000000000000..6de6f46e36e3af --- /dev/null +++ 
b/docs/IE_DG/img/conv_simple_01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d3e8856aa175d6fcf940af57a53f962ff6c58acf0a3838bfccc6a093bff1756d +size 9015 diff --git a/docs/IE_DG/img/conv_sum_relu_01.png b/docs/IE_DG/img/conv_sum_relu_01.png new file mode 100644 index 00000000000000..7007115294fbac --- /dev/null +++ b/docs/IE_DG/img/conv_sum_relu_01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7d53ce33f180cf4d170bbeb69635ee7c49a67d3f6ee8b1c01ec12568fe1cca38 +size 17157 diff --git a/docs/IE_DG/img/cpu_int8_flow.png b/docs/IE_DG/img/cpu_int8_flow.png new file mode 100644 index 00000000000000..130e54ceafa638 --- /dev/null +++ b/docs/IE_DG/img/cpu_int8_flow.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3965f4830c45518ee1dc169c2b1760cae83f8a8819023770a28893c6cef558c2 +size 68441 diff --git a/docs/IE_DG/img/deploy_encrypted_model.png b/docs/IE_DG/img/deploy_encrypted_model.png new file mode 100644 index 00000000000000..9338c59dcf273d --- /dev/null +++ b/docs/IE_DG/img/deploy_encrypted_model.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:25ed719bdd525dc0b606ef17a3fec5303ea032dfe6b2d167e1b19b6100b6fb37 +size 16516 diff --git a/docs/IE_DG/img/deploy_encrypted_model.vsdx b/docs/IE_DG/img/deploy_encrypted_model.vsdx new file mode 100644 index 00000000000000..9d1086462bd0c3 --- /dev/null +++ b/docs/IE_DG/img/deploy_encrypted_model.vsdx @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:55c5fd6517ae9e3639f2214167665ffbb4b641cd2abef155ff816c68478915e2 +size 54233 diff --git a/docs/IE_DG/img/example_sample_output.png b/docs/IE_DG/img/example_sample_output.png new file mode 100644 index 00000000000000..f9299373c97e21 --- /dev/null +++ b/docs/IE_DG/img/example_sample_output.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5fbfb33c1a860978b8b99cf4dfbc04b5f7fbe0e20af03cd3e5ffd1d6a9f2db40 +size 353490 diff --git a/docs/IE_DG/img/fpga_full_workflow.png b/docs/IE_DG/img/fpga_full_workflow.png new file mode 100644 index 00000000000000..754bb37cea7fe0 --- /dev/null +++ b/docs/IE_DG/img/fpga_full_workflow.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f0f329112b9c8227cbba3d394b778a6d219b4f3fc0d02cc5f2f8598c3d4eb51 +size 151678 diff --git a/docs/IE_DG/img/fpga_platform_hub.png b/docs/IE_DG/img/fpga_platform_hub.png new file mode 100644 index 00000000000000..bc5e7e66492611 --- /dev/null +++ b/docs/IE_DG/img/fpga_platform_hub.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0b46a1f89df96410a87f90801c9a86a28a6aacb39fa4677b434d856559f163fe +size 217954 diff --git a/docs/IE_DG/img/fullyconnected_activation_01.png b/docs/IE_DG/img/fullyconnected_activation_01.png new file mode 100644 index 00000000000000..776b14b46feb2a --- /dev/null +++ b/docs/IE_DG/img/fullyconnected_activation_01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:88745fd132531e943d59afe59ed6af8eaae6b62ba1fda2493dfef76080d31a25 +size 7788 diff --git a/docs/IE_DG/img/group_convolutions_01.png b/docs/IE_DG/img/group_convolutions_01.png new file mode 100644 index 00000000000000..237523823c3503 --- /dev/null +++ b/docs/IE_DG/img/group_convolutions_01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9709bc83f903943b4d737d379babf80a391a72ad8eab98e71abcc0de5424fbfc +size 12361 diff --git a/docs/IE_DG/img/hor_fusion_1.png b/docs/IE_DG/img/hor_fusion_1.png new file mode 100644 index 
00000000000000..4fee4887cdb208 --- /dev/null +++ b/docs/IE_DG/img/hor_fusion_1.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f6ff04de33684f00d0d2da8fed6d30b5162c566b35b8894e9e14f7921db70592 +size 8598 diff --git a/docs/IE_DG/img/hor_fusion_2.png b/docs/IE_DG/img/hor_fusion_2.png new file mode 100644 index 00000000000000..937fbafe09b84e --- /dev/null +++ b/docs/IE_DG/img/hor_fusion_2.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a453412cf37f06e1e5a63f5ff629d4e16ed1707fc55b5a63cc03e710807b33e +size 10151 diff --git a/docs/IE_DG/img/hor_fusion_3.png b/docs/IE_DG/img/hor_fusion_3.png new file mode 100644 index 00000000000000..3aacdbd6f00a61 --- /dev/null +++ b/docs/IE_DG/img/hor_fusion_3.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b3be59a71703b640eac6ad99ce3d463141a36e58f5299bf21e4f6aba152d9ed6 +size 9359 diff --git a/docs/IE_DG/img/hor_fusion_4.png b/docs/IE_DG/img/hor_fusion_4.png new file mode 100644 index 00000000000000..0a439dafc18f69 --- /dev/null +++ b/docs/IE_DG/img/hor_fusion_4.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:50f41274758a989c9ef43e558343d420d7e4e288c88ac2d19a2bf396d5ee573c +size 9937 diff --git a/docs/IE_DG/img/integration_process.png b/docs/IE_DG/img/integration_process.png new file mode 100644 index 00000000000000..cb1070821064d7 --- /dev/null +++ b/docs/IE_DG/img/integration_process.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9fff52e5faaf108371db87e53959453216554152b15ca0432b1541f94def297e +size 19145 diff --git a/docs/IE_DG/img/intel_logo.png b/docs/IE_DG/img/intel_logo.png new file mode 100644 index 00000000000000..77a3ff51275b83 --- /dev/null +++ b/docs/IE_DG/img/intel_logo.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2d147adf801535e95d8b627a8a1d23f7b89dea1eabe06218235e756b0a9866fe +size 1636 diff --git a/docs/IE_DG/img/ir_add_n_ref.png b/docs/IE_DG/img/ir_add_n_ref.png new file mode 100644 index 00000000000000..cc21c584f0ed4f --- /dev/null +++ b/docs/IE_DG/img/ir_add_n_ref.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a9aae473dcc469ebdb5c2d9ac8067bf8c7caa11d4cdbc7e0dd0b2006621ce526 +size 4267 diff --git a/docs/IE_DG/img/mkldnn_conv_sum.png b/docs/IE_DG/img/mkldnn_conv_sum.png new file mode 100644 index 00000000000000..d1c56f77128b3f --- /dev/null +++ b/docs/IE_DG/img/mkldnn_conv_sum.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:af2641e8e685b027123681ab542162932b008eff257ef5b7105950bfe8b4ade8 +size 10373 diff --git a/docs/IE_DG/img/mkldnn_conv_sum_result.png b/docs/IE_DG/img/mkldnn_conv_sum_result.png new file mode 100644 index 00000000000000..67dc87cd3263b7 --- /dev/null +++ b/docs/IE_DG/img/mkldnn_conv_sum_result.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02efdda675c16def7c2705e978964ce8bf65d1ec6cedfdb0a5afc837fb57abf0 +size 5660 diff --git a/docs/IE_DG/img/mkldnn_group_conv.png b/docs/IE_DG/img/mkldnn_group_conv.png new file mode 100644 index 00000000000000..c433a6b5484a1b --- /dev/null +++ b/docs/IE_DG/img/mkldnn_group_conv.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e69242d80da7676311e20e5db67c01bd6562008ecf3a53df8fdedaefabb91b70 +size 7226 diff --git a/docs/IE_DG/img/opt-notice-en_080411.gif b/docs/IE_DG/img/opt-notice-en_080411.gif new file mode 100644 index 00000000000000..ceddf9732d7809 --- /dev/null +++ b/docs/IE_DG/img/opt-notice-en_080411.gif @@ 
-0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d4457dbe05630bf90294396c4185b280634a5bf1ac7a6ca1c5186be67eb1cc4a +size 54231 diff --git a/docs/IE_DG/img/optimizations/groups.png b/docs/IE_DG/img/optimizations/groups.png new file mode 100644 index 00000000000000..b497e16547b85c --- /dev/null +++ b/docs/IE_DG/img/optimizations/groups.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3812efef32bd7f1bf40b130d5d522bc3df6aebd406bd1186699d214bca856722 +size 43721 diff --git a/docs/IE_DG/img/optimizations/inception_v4.png b/docs/IE_DG/img/optimizations/inception_v4.png new file mode 100644 index 00000000000000..64058527a5de82 --- /dev/null +++ b/docs/IE_DG/img/optimizations/inception_v4.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e232c47e8500f42bd0e1f2b93f94f58e2d59caee149c687be3cdc3e8a5be59a +size 18417 diff --git a/docs/IE_DG/img/optimizations/resnet_269.png b/docs/IE_DG/img/optimizations/resnet_269.png new file mode 100644 index 00000000000000..4ef638090e9f61 --- /dev/null +++ b/docs/IE_DG/img/optimizations/resnet_269.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:92d36b9527a3e316cd9eb2b6f5054c312466df004e4aa9c3458e165330bc6561 +size 24157 diff --git a/docs/IE_DG/img/optimizations/resnet_optimization.png b/docs/IE_DG/img/optimizations/resnet_optimization.png new file mode 100644 index 00000000000000..b276e81a2dd18e --- /dev/null +++ b/docs/IE_DG/img/optimizations/resnet_optimization.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2adeca1e3512b9fe7b088a5412ce21592977a1f352a013735537ec92e895dc94 +size 15653 diff --git a/docs/IE_DG/img/pooling_fakequant_01.png b/docs/IE_DG/img/pooling_fakequant_01.png new file mode 100644 index 00000000000000..2310488df403a9 --- /dev/null +++ b/docs/IE_DG/img/pooling_fakequant_01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:37c7908d2379cc2ba1909965c58de7bc55d131a330c47e173321c718846d6745 +size 7809 diff --git a/docs/IE_DG/img/workflow_steps.png b/docs/IE_DG/img/workflow_steps.png new file mode 100644 index 00000000000000..6bf780127ad14c --- /dev/null +++ b/docs/IE_DG/img/workflow_steps.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e22bc22d614c7335ae461a8ce449ea8695973d755faca718cf74b95972c94e2 +size 19773 diff --git a/docs/IE_DG/img/yolo_tiny_v1.png b/docs/IE_DG/img/yolo_tiny_v1.png new file mode 100644 index 00000000000000..a92f7ed806adc9 --- /dev/null +++ b/docs/IE_DG/img/yolo_tiny_v1.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f5d909bcaa7f6ec95cb0e3bf1b676b031489e89afa411e6add1aa2faaf90e0b3 +size 101557 diff --git a/docs/IE_DG/inference_engine_intro.md b/docs/IE_DG/inference_engine_intro.md new file mode 100644 index 00000000000000..0e54e11c5787fc --- /dev/null +++ b/docs/IE_DG/inference_engine_intro.md @@ -0,0 +1,106 @@ +Introduction to Inference Engine {#openvino_docs_IE_DG_inference_engine_intro} +================================ + +After you have used the Model Optimizer to create an Intermediate Representation (IR), use the Inference Engine to infer the result for a given input data. + +Inference Engine is a set of C++ libraries providing a common API to deliver inference solutions on the platform of your choice: CPU, GPU, VPU, or FPGA. Use the Inference Engine API to read the Intermediate Representation, set the input and output formats, and execute the model on devices. 
While the C++ libraries are the primary implementation, C libraries and Python bindings are also available.

For the Intel® Distribution of OpenVINO™ toolkit, Inference Engine binaries are delivered within release packages.

The open source version is available in the [OpenVINO™ toolkit GitHub repository](https://github.com/openvinotoolkit/openvino) and can be built for supported platforms using the Inference Engine Build Instructions.

To learn how to use the Inference Engine API for your application, see the [Integrating Inference Engine in Your Application](Integrate_with_customer_application_new_API.md) documentation.

For the complete API Reference, see the [API Reference](usergroup29.html) section.

Inference Engine uses a plugin architecture. An Inference Engine plugin is a software component that contains the complete implementation for inference on a certain Intel® hardware device: CPU, GPU, VPU, FPGA, and so on. Each plugin implements the unified API and provides additional hardware-specific APIs.

Modules in the Inference Engine component
---------------------------------------

### Core Inference Engine Libraries ###

Your application must link to the core Inference Engine libraries:
* Linux* OS:
  - `libinference_engine.so`, which depends on `libinference_engine_transformations.so` and `libngraph.so`
  - `libinference_engine_legacy.so`, which depends on `libtbb.so`
* Windows* OS:
  - `inference_engine.dll`, which depends on `inference_engine_transformations.dll` and `ngraph.dll`
  - `inference_engine_legacy.dll`, which depends on `tbb.dll`

The required C++ header files are located in the `include` directory.

These libraries provide classes to:
* Read the network (InferenceEngine::CNNNetReader)
* Manipulate network information (InferenceEngine::CNNNetwork)
* Create an Inference Engine Core object to work with devices (InferenceEngine::Core)
* Execute and pass inputs and outputs (InferenceEngine::ExecutableNetwork and InferenceEngine::InferRequest)

### Device-specific Plugin Libraries ###

For each supported target device, Inference Engine provides a plugin, which is a DLL/shared library that contains the complete implementation for inference on this particular device.
The following plugins are available: + +| Plugin | Device Type | +| ------------- | ------------- | +|CPU| Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® SSE | +|GPU| Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics +|FPGA| Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA, Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Speed Grade 2) | +|MYRIAD| Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X| +|GNA| Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver J5005 Processor, Intel® Pentium® Silver N5000 Processor, Intel® Celeron® J4005 Processor, Intel® Celeron® J4105 Processor, Intel® Celeron® Processor N4100, Intel® Celeron® Processor N4000, Intel® Core™ i3-8121U Processor, Intel® Core™ i7-1065G7 Processor, Intel® Core™ i7-1060G7 Processor, Intel® Core™ i5-1035G4 Processor, Intel® Core™ i5-1035G7 Processor, Intel® Core™ i5-1035G1 Processor, Intel® Core™ i5-1030G7 Processor, Intel® Core™ i5-1030G4 Processor, Intel® Core™ i3-1005G1 Processor, Intel® Core™ i3-1000G1 Processor, Intel® Core™ i3-1000G4 Processor +|HETERO|Automatic splitting of a network inference between several devices (for example if a device doesn't support certain layers| +|MULTI| Simultaneous inference of the same network on several devices in parallel| + +The table below shows the plugin libraries and additional dependencies for Linux and Windows platforms. + +| Plugin | Library name for Linux | Dependency libraries for Linux | Library name for Windows | Dependency libraries for Windows | +|--------|------------------------|-------------------------------------------------|--------------------------|--------------------------------------------------------------------------------------------------------| +| CPU | `libMKLDNNPlugin.so` | `libinference_engine_lp_transformations.so` | `MKLDNNPlugin.dll` | `inference_engine_lp_transformations.dll` | +| GPU | `libclDNNPlugin.so` | `libinference_engine_lp_transformations.so`, `libOpenCL.so` | `clDNNPlugin.dll` | `OpenCL.dll`, `inference_engine_lp_transformations.dll` | +| FPGA | `libdliaPlugin.so` | `libdla_compiler_core.so`, `libdla_runtime_core.so`, `libcrypto.so`, `libalteracl.so`, `liblpsolve5525.so`, `libprotobuf.so`, `libacl_emulator_kernel_rt.so` | `dliaPlugin.dll` | `dla_compiler_core.dll`, `dla_runtime_core.dll`, `crypto.dll`, `alteracl.dll`, `lpsolve5525.dll`, `protobuf.dll`, `acl_emulator_kernel_rt.dll` +| MYRIAD | `libmyriadPlugin.so` | `libusb.so`, `libinference_engine_lp_transformations.so` | `myriadPlugin.dll` | `usb.dll`, `inference_engine_lp_transformations.dll` | +| HDDL | `libHDDLPlugin.so` | `libbsl.so`, `libhddlapi.so`, `libmvnc-hddl.so`, `libinference_engine_lp_transformations.so`| `HDDLPlugin.dll` | `bsl.dll`, `hddlapi.dll`, `json-c.dll`, `libcrypto-1_1-x64.dll`, `libssl-1_1-x64.dll`, `mvnc-hddl.dll`, `inference_engine_lp_transformations.dll` | +| GNA | `libGNAPlugin.so` | `libgna.so`, `libinference_engine_lp_transformations.so` | `GNAPlugin.dll` | `gna.dll`, `inference_engine_lp_transformations.dll` | +| HETERO | `libHeteroPlugin.so` | Same as for selected plugins | `HeteroPlugin.dll` | Same as for selected plugins | +| MULTI | `libMultiDevicePlugin.so` | Same as for selected plugins | `MultiDevicePlugin.dll` | Same as for selected plugins | + +> **NOTE**: All plugin libraries also depend on core Inference Engine libraries. 
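Once the environment is configured as described below, a quick way to confirm that the plugin libraries and their dependencies can be resolved is to query the `InferenceEngine::Core` object. The following is a minimal sketch, not taken from the toolkit samples; it only relies on the `Core` API and prints the devices whose plugins could be loaded on the current machine:

```cpp
#include <inference_engine.hpp>
#include <iostream>

int main() {
    InferenceEngine::Core core;

    // Devices whose plugin libraries (and their dependencies) were found and loaded successfully
    for (const std::string& device : core.GetAvailableDevices()) {
        std::cout << device << std::endl;
        // Print the version information reported by each plugin
        for (const auto& version : core.GetVersions(device)) {
            std::cout << "  " << version.first << " : " << version.second.description << std::endl;
        }
    }
    return 0;
}
```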
Make sure those libraries are in your computer's path or in the place you pointed to in the plugin loader. Make sure each plugin's related dependencies are in the:

* Linux: `LD_LIBRARY_PATH`
* Windows: `PATH`

On Linux, use the script `bin/setupvars.sh` to set the environment variables.

On Windows, run the `bin\setupvars.bat` batch file to set the environment variables.

To learn more about supported devices and corresponding plugins, see the [Supported Devices](supported_plugins/Supported_Devices.md) chapter.

Common Workflow for Using the Inference Engine API
---------------------------
The common workflow contains the following steps:

1. **Create an Inference Engine Core object** - Create an `InferenceEngine::Core` object to work with different devices; all device plugins are managed internally by the `Core` object. Register extensions with custom nGraph operations (`InferenceEngine::Core::AddExtension`).

2. **Read the Intermediate Representation** - Using the `InferenceEngine::Core` class, read an Intermediate Representation file into an object of the `InferenceEngine::CNNNetwork` class. This class represents the network in the host memory.

3. **Prepare input and output formats** - After loading the network, specify the input and output precision and layout on the network. For these specifications, use the `InferenceEngine::CNNNetwork::getInputsInfo()` and `InferenceEngine::CNNNetwork::getOutputsInfo()` methods.

4. **Configure the target device** - Pass per-device loading configurations specific to this device (`InferenceEngine::Core::SetConfig`), and register extensions to this device (`InferenceEngine::Core::AddExtension`).

5. **Compile and load the network to the device** - Use the `InferenceEngine::Core::LoadNetwork()` method with a specific device (for example, `CPU` or `GPU`) to compile and load the network on the device. Pass in the per-target load configuration for this compilation and load operation.

6. **Set input data** - With the network loaded, you have an `InferenceEngine::ExecutableNetwork` object. Use this object to create an `InferenceEngine::InferRequest` in which you signal the input buffers to use for input and output. Specify a device-allocated memory and copy it into the device memory directly, or tell the device to use your application memory to save a copy.

7. **Execute** - With the input and output memory now defined, choose your execution mode:

   * Synchronously - the `InferenceEngine::InferRequest::Infer()` method. Blocks until inference is completed.
   * Asynchronously - the `InferenceEngine::InferRequest::StartAsync()` method. Check status with the `InferenceEngine::InferRequest::Wait()` method (0 timeout), wait, or specify a completion callback.

8. **Get the output** - After inference is completed, get the output memory or read the memory you provided earlier. Do this with the `InferenceEngine::IInferRequest::GetBlob()` method.


Further Reading
---------------

For more details on the Inference Engine API, refer to the [Integrating Inference Engine in Your Application](Integrate_with_customer_application_new_API.md) documentation.

diff --git a/docs/IE_DG/nGraphTutorial.md b/docs/IE_DG/nGraphTutorial.md new file mode 100644 index 00000000000000..41a0a294964d52 --- /dev/null +++ b/docs/IE_DG/nGraphTutorial.md @@ -0,0 +1,81 @@

# Build a Model with nGraph Library {#openvino_docs_IE_DG_nGraphTutorial}

This section illustrates how to construct an nGraph function composed of operations from the `opset3` namespace.
Once created, it can be wrapped into a `CNNNetwork`, giving data scientists or app developers a way to define a deep-learning model in a neutral form that does not depend on existing Deep Learning (DL) frameworks.

The operation set `opsetX` integrates a list of pre-compiled nGraph operations that work for this purpose. In other words, `opsetX` defines a set of operations for building a graph.

For a complete list of operation sets supported by the Inference Engine, see [Available Operations Sets](../ops/opset.md).

To add custom nGraph operations to an existing `CNNNetwork`, see the [Add Custom nGraph Operations](Extensibility_DG/Intro.md) document.

Now that you can build graphs with anything from the `opset3` definition, some parameters for shape-relevant (or shape-specific) inputs can be added. The following code prepares a graph for shape-relevant parameters.

> **NOTE**: `validate_nodes_and_infer_types(ops)` must be included for partial shape inference.

```cpp
#include "ngraph/opsets/opset.hpp"
#include "ngraph/opsets/opset3.hpp"

using namespace std;
using namespace ngraph;

auto arg0 = make_shared<opset3::Parameter>(element::f32, Shape{7});
auto arg1 = make_shared<opset3::Parameter>(element::f32, Shape{7});
// Create an 'Add' operation with two inputs 'arg0' and 'arg1'
auto add0 = make_shared<opset3::Add>(arg0, arg1);
auto abs0 = make_shared<opset3::Abs>(add0);
// Create a node whose inputs/attributes will be specified later
auto acos0 = make_shared<opset3::Acos>();
// Create a node using opset factories
auto add1 = shared_ptr<Node>(get_opset3().create("Add"));
// Set inputs to nodes explicitly
acos0->set_argument(0, add0);
add1->set_argument(0, acos0);
add1->set_argument(1, abs0);

// Run shape inference on the nodes
NodeVector ops{arg0, arg1, add0, abs0, acos0, add1};
validate_nodes_and_infer_types(ops);

// Create a graph with one output (add1) and two inputs (arg0, arg1)
auto ng_function = make_shared<Function>(OutputVector{add1}, ParameterVector{arg0, arg1});
```

To wrap it into a `CNNNetwork`, use:
```cpp
CNNNetwork net(ng_function);
```

## Deprecation Notice

| Deprecation Begins | Removal Date |
| :--- | :--- |
| June 1, 2020 | December 1, 2020 |
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* + +## See Also + +* [Available Operation Sets](../ops/opset.md) +* [Operation Set `opset1` Specification](../ops/opset1.md) +* [Operation Set `opset2` Specification](../ops/opset2.md) +* [Operation Set `opset3` Specification](../ops/opset3.md) +* [Inference Engine Extensibility Developer Guide](Extensibility_DG/Intro.md) diff --git a/docs/IE_DG/nGraph_Flow.md b/docs/IE_DG/nGraph_Flow.md new file mode 100644 index 00000000000000..abd4e3db0eeb64 --- /dev/null +++ b/docs/IE_DG/nGraph_Flow.md @@ -0,0 +1,159 @@ +# Introduction to nGraph Flow in Inference Engine {#openvino_docs_IE_DG_nGraph_Flow} + +## New Run-Time Intermediate Representation (IR): nGraph + +Starting from the OpenVINO™ release 2020.1, the Inference Engine integrates the +nGraph Core. +That implies that the Inference Engine uses a new way to represent a model in run time underneath of +the conventional `CNNNetwork` API, which is an instance of `ngraph::Function`. + +Besides the representation update, nGraph integration resulted in the following changes and new features: + +1. New operations sets. When operations from the nGraph Core were combined with conventional layers +from `CNNNetwork`, there were created a [new sets of operations called `opset1`, `opset2` and etc.](../ops/opset.md), +which covered both interfaces except several not very important cases. +Operations from `opset3` are generated by the Model Optimizer and are accepted in the Inference Engine. + +2. New version approach that attaches a version to each operation rather than to the entire IR file format. +IR is still versioned but has a different meaning. For details, see [Deep Learning Network Intermediate Representation and Operation Sets in OpenVINO™](../MO_DG/IR_and_opsets.md). + +3. Creating models in run-time without loading IR from an xml/binary file. You can enable it by creating +`ngraph::Function` passing it to `CNNNetwork`. + +4. Run-time reshape capability and constant folding are implemented through the nGraph code for more operations compared to previous releases. +As a result, more models can be reshaped. For details, see the [dedicated guide about the reshape capability](ShapeInference.md). + +5. Loading model from ONNX format without converting it to the Inference Engine IR. + +The conventional flow that is not based on nGraph is still available. +The complete picture of co-existence of legacy and new flows is presented below. +The rest of the document describes the coexistence of legacy and new flows showed in the picture below: + +![](img/TopLevelNGraphFlow.png) + + +## Read the Intermediate Representation to `CNNNetwork` + +As the new operation set is introduced, the Model Optimizer generates the IR version 10 using the new operations by default. +Each layer generated in the IR has a semantics matching to the corresponding operation from the nGraph namespace `opset3`. 
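As the rest of this section explains, reading such an IR produces a `CNNNetwork` that is backed by an `ngraph::Function`. The following is a minimal sketch of what this looks like from the application side, assuming hypothetical `model.xml`/`model.bin` files produced by the Model Optimizer:

```cpp
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;

    // Reading an IR version 10 model transparently builds the nGraph representation underneath
    InferenceEngine::CNNNetwork network = core.ReadNetwork("model.xml", "model.bin");  // hypothetical paths

    // The underlying nGraph function can be inspected directly
    auto function = network.getFunction();
    return 0;
}
```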
+The IR version 10 automatically triggers the nGraph flow inside the Inference Engine. +When such IR is read in an application, the Inference Engine IR reader produces `CNNNetwork` that encapsulates the `ngraph::Function` instance underneath. +Thus the OpenVINO IR becomes a new serialization format for the nGraph IR, and it can be deserialized reading the `CNNNetwork`. + +> **IMPORTANT**: Conventional interfaces are used (`CNNNetwork`, the reader), so no changes required in most applications. + +> **NOTE**: While you still can use old APIs, there is an independent process of continuous improvements in the Inference Engine API. +> For example, the Core::Read API is recommended to use instead of `CNNNetworkReader`. +> These changes are independent of nGraph integration and do not enable or disable new features. + +Interpretation of the IR version 10 differs from the old IR version. +Besides having a different operations set, the IR version 10 ignores the shapes and data types assigned to the ports in an XML file. +Both shapes and types are reinferred while loading to the Inference Engine using the nGraph shape and type propagation function that is a part of each nGraph operation. + +### Legacy IR Versions + +You can read old versions of the IR in the Inference Engine. +Each version below or equal to 7 is treated as an old one. +When the Inference Engine reader reads an old version of the IR, it does not use the nGraph representation. +There is no way to activate nGraph flow with an old IR version. +The rest of this document is not applied in this case. + +Model Optimizer generates the IR version 10 by default, and there is the command line key `--generate_deprecated_IR_V7` which switches generation to the legacy IR version 7. +It is useful when the new nGraph flow does not work for some reason. + +## Build a Model in the Application + +Alternative method to feed the Inference Engine with a model is to create the model in the run time. +It is achieved by creation of the `ngraph::Function` construction using nGraph operation classes and optionally user-defined operations. +For details, see [Add Custom nGraph Operations](Extensibility_DG/AddingNGraphOps.md) and [examples](nGraphTutorial.md). +At this stage, the code is completely independent of the rest of the Inference Engine code and can be built separately. +After you construct an instance of `ngraph::Function`, you can use it to create `CNNNetwork` by passing it to the new constructor for this class. + +Initializing `CNNNetwork` from the nGraph Function means encapsulating the object and not converting it to a conventional representation. +Going to low-level details, technically it is achieved by using another class for the `CNNNetwork` internals. +The old representation that is used for former versions of IR before version 10 uses `CNNNetworkImpl`. +The new representation that is built around nGraph uses `CNNNetworkNGraphImpl`. + +![](img/NewAndOldCNNNetworkImpl.png) + +## Automatic Conversion to the Old Representation + +The old representation is still required in the cases listed below. +When old representation is required, the conversion from the `ngraph::Function` to the old representation is called automatically. +The following methods lead to the automatic conversion: + +1. Using the old API, which is expected to produce an old representation. Guaranteed to be read-only. Once you call such a method, the original nGraph representation is preserved and continues to be used in the successive calls. + + 1.1. `CNNNetwork::serialize`. 
Dumps the old representation after automatically called conversion. Cannot be used to dump IR V10. For details, see [Graph Debug Capabilities](Graph_debug_capabilities.md). + +2. Calling `CNNNetwork` methods that modify the model. After that nGraph representation is lost and cannot be used afterwards. + + 1.1. `CNNNetwork::addLayer` + + 1.2. CNNNetwork::setBatchSize. Still implemented through old logic for backward compatibility without using nGraph capabilities. + For details, see [Using Shape Inference](ShapeInference.md). + +3. Using methods that return objects inside an old representation. +Using these methods does not mean modification of the model, but you are not limited by the API to make read-only changes. +These methods should be used in the read-only mode with respect to a model representation. +If the model is changed, for example attribute of some layer is changed or layers are reconnected, the modification is lost whenever any method that uses nGraph is called, including methods inside plugins like CNNNetwork::reshape. +It is hard to predict whether the nGraph function is used in a plugin or other methods of CNNNetworks, so modifying a network using the following methods is *strongly not recommended*. +This is an important limitation that is introduced for the old API calls listed below: + + 1.1. `Data::getInputTo` + + 1.2. `Data::getCreatorLayer` + + 1.3. `CNNNetwork::getLayerByName` + + 1.4. Iterating over `CNNLayer` objects in `CNNNetwork`: `CNNNetwork::begin`, `details::CNNNetworkIterator` class. + +4. Using a conventional plugin that accepts the old representation only. + +Though the conversion is always a one-way process, which means there is no method to convert back, there are important caveats. + +In the cases [1] and [3], both representations are held underneath and you should use the old representation in the read-only mode only from the caller side. +It is hard to track from the Inference Engine side whether the API is used in the read-only mode or for modification of the model. + +That is why when using potentially modifying methods listed in section [3] above, you should not modify the model via those methods. +Use a direct manipulation of the nGraph function instead. + +## Conversion Function + +Inference Engine implements the conversion function that is used when the nGraph function is transformed to the old `CNNNetworkImpl` representation. +This conversion function is hidden and you cannot call it directly from the application. +Nevertheless, it is an important component of the model transformation pipeline in the Inference Engine. +Some issues of models may be caught during the conversion process in this function. +Exceptions are thrown in this function, and you should know what this function does to find a root cause. + +The conversion function performs the following steps: + +1. Convert and decompose some operations as the first step of the nGraph function preparation for optimization. +Reduce operation set to easily optimize it at the next stages. +For example, decomposing of BatchNormInference happens at this stage. + +2. Optimizing transformations that usually happen in the Model Optimizer are called here, because the nGraph function is not always read from an already optimized IR. + +3. Changing operation set from `opsetX` to legacy layer semantics described in the [Legacy Layers Catalog](../MO_DG/prepare_model/convert_model/Legacy_IR_Layers_Catalog_Spec.md). 
+The model is still represented as the nGraph function at this stage, but the operation set is completely different. + +4. One-to-one conversion of nGraph representation to the corresponding `CNNNetworkImpl` without changing its semantics. +You can see the result of the conversion by calling the `CNNNetwork::serialize` method, which produces legacy IR semantics, which is not nGraph-based even if it is applied to `CNNNetwork` constructed from the nGraph Function. +It may help in debugging, see [Graph Debug Capabilities](Graph_debug_capabilities.md) to view all options for dumping new and old IR representations. + +## Deprecation Notice + + + + + + + + + + +
| Deprecation Begins | Removal Date |
| :--- | :--- |
| June 1, 2020 | December 1, 2020 |
*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.*

*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.*

diff --git a/docs/IE_DG/protecting_model_guide.md b/docs/IE_DG/protecting_model_guide.md new file mode 100644 index 00000000000000..75e82ebe2c6b3a --- /dev/null +++ b/docs/IE_DG/protecting_model_guide.md @@ -0,0 +1,71 @@

# Using Encrypted Models with OpenVINO™ {#openvino_docs_IE_DG_protecting_model_guide}

Deploying deep-learning capabilities to edge devices can present security challenges, such as ensuring inference integrity and providing copyright protection of your deep-learning models.

One possible solution is to use cryptography to protect models as they are deployed and stored on edge devices. Model encryption, decryption, and authentication are not provided by OpenVINO™ but can be implemented with third-party tools, like OpenSSL\*. While implementing encryption, ensure that you use the latest versions of tools and follow cryptography best practices.

This guide demonstrates how to use OpenVINO securely with protected models.

## Secure Model Deployment

After a model is optimized by the OpenVINO Model Optimizer, it is deployed to target devices in the Intermediate Representation (IR) format. An optimized model is stored on an edge device and executed by the Inference Engine.

To protect deep-learning models, you can encrypt an optimized model before deploying it to the edge device. The edge device should keep the stored model protected at all times and have the model decrypted **in runtime only** for use by the Inference Engine.

![deploy_encrypted_model]

## Loading Encrypted Models

The OpenVINO Inference Engine requires model decryption before loading. Allocate a temporary memory block for model decryption, and use the `InferenceEngine::Core::ReadNetwork` method to load the model from the memory buffer. For more information, see the `InferenceEngine::Core` Class Reference Documentation.

```cpp
std::vector<uint8_t> model;
std::vector<uint8_t> weights;

// Read model files and decrypt them into the temporary memory block
decrypt_file(model_file, password, model);
decrypt_file(weights_file, password, weights);
```

Hardware-based protection, such as Intel® Software Guard Extensions (Intel® SGX), can be utilized to protect decryption operation secrets and bind them to a device. For more information, go to [Intel® Software Guard Extensions](https://software.intel.com/en-us/sgx).

Use `InferenceEngine::Core::ReadNetwork()` to set the model representation and weights, respectively.
+ +```cpp +Core core; +// Load model from temporary memory block +std::string strModel(model.begin(), model.end()); +CNNNetwork network = core.ReadNetwork(strModel, make_shared_blob({Precision::U8, {weights.size()}, C}, weights.data())); +``` + +[deploy_encrypted_model]: img/deploy_encrypted_model.png + +## Additional Resources + +- Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) +- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) +- Model Optimizer Developer Guide: [Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) +- Inference Engine Developer Guide: [Inference Engine Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide.html) +- For more information on Sample Applications, see the [Inference Engine Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html) +- For information on a set of pre-trained models, see the [Overview of OpenVINO™ Toolkit Pre-Trained Models](@ref omz_models_intel_index) +- For information on Inference Engine Tutorials, see the [Inference Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic) +- For IoT Libraries and Code Samples see the [Intel® IoT Developer Kit](https://github.com/intel-iot-devkit). diff --git a/docs/IE_DG/supported_plugins/CL_DNN.md b/docs/IE_DG/supported_plugins/CL_DNN.md new file mode 100644 index 00000000000000..a25012bf0732a0 --- /dev/null +++ b/docs/IE_DG/supported_plugins/CL_DNN.md @@ -0,0 +1,123 @@ +GPU Plugin {#openvino_docs_IE_DG_supported_plugins_CL_DNN} +======= + +The GPU plugin uses the Intel® Compute Library for Deep Neural Networks ([clDNN](https://01.org/cldnn)) to infer deep neural networks. +clDNN is an open source performance library for Deep Learning (DL) applications intended for acceleration of Deep Learning Inference on Intel® Processor Graphics including Intel® HD Graphics and Intel® Iris® Graphics. +For an in-depth description of clDNN, see: [clDNN sources](https://github.com/intel/clDNN) and [Accelerate Deep Learning Inference with Intel® Processor Graphics](https://software.intel.com/en-us/articles/accelerating-deep-learning-inference-with-intel-processor-graphics). + +## Optimizations + +The plugin supports algorithms that fuse several operations into one optimized operation. Refer to the sections below for details. + +> **NOTE**: For operation descriptions, see the [IR Notation Reference](../../ops/opset.md). + +### Fusing Convolution and Simple Layers + +Merge of a Convolution layer and any of the simple layers listed below: +- Activation: ReLU, ELU, Sigmoid, Clamp, and others +- Depthwise: ScaleShift, PReLU +- FakeQuantize + +> **NOTE**: You can have any number and order of simple layers. 
+ +A combination of a Convolution layer and simple layers results in a single fused layer called +*Convolution*: +![conv_simple_01] + + +### Fusing Pooling and FakeQuantize Layers + +A combination of Pooling and FakeQuantize layers results in a single fused layer called *Pooling*: +![pooling_fakequant_01] + +### Fusing Activation Layers + +Given the linear pattern, an Activation layer can be fused into other layers: + +![fullyconnected_activation_01] + + +### Fusing Convolution and Sum Layers + +A combination of Convolution, Simple, and Eltwise layers with the sum operation results in a single layer called *Convolution*: +![conv_sum_relu_01] + +### Fusing a Group of Convolutions + +If a topology contains the following pipeline, a GPU plugin merges Split, Convolution, and Concatenation layers into a single Convolution layer with the group parameter: +> **NOTE**: Parameters of the Convolution layers must coincide. + +![group_convolutions_01] + +### Optimizing Layers Out + +The following layers are optimized out under certain conditions: + * Crop + * Concatenate + * Reshape + * Flatten + * Split + * Copy + +### Load-Time Execution + +Some layers are executed during the load time, not during the inference. One of such layers is PriorBox. + + +## CPU Executed Layers + +The following layers are not accelerated on the GPU and executed on the host CPU instead: +* Proposal +* SimplerNMS +* PriorBox +* DetectionOutput + +## Known Layers Limitations +* ROIPooling is supported for 'max' value of 'method' attribute. + +## Supported Configuration Parameters + +The plugin supports the configuration parameters listed below. +All parameters must be set before calling InferenceEngine::Core::LoadNetwork() in order to take effect. +When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. + +| Parameter Name | Parameter Values | Default | Description | +|---------------------|-----------------------------|-----------------|-----------------------------------------------------------| +| `KEY_PERF_COUNT` | `YES` / `NO` | `NO` | Collect performance counters during inference | +| `KEY_CONFIG_FILE` | `" [ ...]"` | `""` | Load custom layer configuration files | +| `KEY_DUMP_KERNELS` | `YES` / `NO` | `NO` | Dump the final kernels used for custom layers | +| `KEY_TUNING_MODE` | `TUNING_DISABLED`
`TUNING_CREATE`
`TUNING_USE_EXISTING` | `TUNING_DISABLED` | Disable inference kernel tuning
Create tuning file (expect much longer runtime)
Use an existing tuning file | +| `KEY_TUNING_FILE` | `""` | `""` | Tuning file to create / use | +| `KEY_CLDNN_PLUGIN_PRIORITY` | `<0-3>` | `0` | OpenCL queue priority (before usage, make sure your OpenCL driver supports appropriate extension)
Higher value means higher priority for clDNN OpenCL queue. 0 disables the setting. | +| `KEY_CLDNN_PLUGIN_THROTTLE` | `<0-3>` | `0` | OpenCL queue throttling (before usage, make sure your OpenCL driver supports appropriate extension)
Lower value means lower driver thread priority and longer sleep time for it. 0 disables the setting. | +| `KEY_CLDNN_GRAPH_DUMPS_DIR` | `""` | `""` | clDNN graph optimizer stages dump output directory (in GraphViz format) | +| `KEY_CLDNN_SOURCES_DUMPS_DIR` | `""` | `""` | Final optimized clDNN OpenCL sources dump output directory | +| `KEY_GPU_THROUGHPUT_STREAMS` | `KEY_GPU_THROUGHPUT_AUTO`, or positive integer| 1 | Specifies a number of GPU "execution" streams for the throughput mode (upper bound for a number of inference requests that can be executed simultaneously).
This option can be used to decrease GPU stall time by providing a more effective load from several streams. Increasing the number of streams is usually more effective for smaller topologies or smaller input sizes. Note that your application should provide enough parallel slack (for example, running many inference requests) to leverage full GPU bandwidth. Additional streams consume several times more GPU memory, so make sure the system has enough memory available to suit parallel stream execution. Multiple streams might also put additional load on the CPU. If CPU load increases, it can be regulated by setting an appropriate `KEY_CLDNN_PLUGIN_THROTTLE` option value (see above). If your target system has a relatively weak CPU, keep throttling low.

The default value is 1, which implies latency-oriented behaviour.
`KEY_GPU_THROUGHPUT_AUTO` creates bare minimum of streams to improve the performance; this is the most portable option if you are not sure how many resources your target machine has (and what would be the optimal number of streams).
A positive integer value creates the requested number of streams. | +| `KEY_EXCLUSIVE_ASYNC_REQUESTS` | `YES` / `NO` | `NO` | Forces async requests (also from different executable networks) to execute serially.| + +## Note on Debug Capabilities of the GPU Plugin + +Inference Engine GPU plugin provides possibility to dump the user custom OpenCL™ kernels to a file to allow you to properly debug compilation issues in your custom kernels. + +The application can use the SetConfig() function with the key PluginConfigParams::KEY_DUMP_KERNELS and value: PluginConfigParams::YES. Then during network loading, all custom layers will print their OpenCL kernels with the JIT instrumentation added by the plugin. +The kernels will be stored in the working directory under files named the following way: clDNN_program0.cl, clDNN_program1.cl. + +This option is disabled by default. Additionally, the application can call the SetConfig() function with the key PluginConfigParams::KEY_DUMP_KERNELS and value: PluginConfigParams::NO before network loading. + +How to verify that this option is disabled: +1. Delete all clDNN_program*.cl files from the current directory +2. Run your application to load a network +3. Examine the working directory for the presence of any kernel file (for example, clDNN_program0.cl) + +## GPU Context and Video Memory Sharing RemoteBlob API + +See [RemoteBlob API of GPU Plugin](GPU_RemoteBlob_API.md) + +## See Also +* [Supported Devices](Supported_Devices.md) + +[conv_simple_01]: ../img/conv_simple_01.png +[pooling_fakequant_01]: ../img/pooling_fakequant_01.png +[fullyconnected_activation_01]: ../img/fullyconnected_activation_01.png +[group_convolutions_01]: ../img/group_convolutions_01.png +[conv_sum_relu_01]: ../img/conv_sum_relu_01.png diff --git a/docs/IE_DG/supported_plugins/CPU.md b/docs/IE_DG/supported_plugins/CPU.md new file mode 100644 index 00000000000000..dec4b850c4d08c --- /dev/null +++ b/docs/IE_DG/supported_plugins/CPU.md @@ -0,0 +1,131 @@ +CPU Plugin {#openvino_docs_IE_DG_supported_plugins_CPU} +======= + +## Introducing CPU Plugin +The CPU plugin was developed in order to provide opportunity for high performance scoring of neural networks on CPU, using the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN). + +Currently, the CPU plugin uses Intel® Threading Building Blocks (Intel® TBB) in order to parallelize calculations. Please refer to the [Optimization Guide](../../optimization_guide/dldt_optimization_guide.md) for associated performance considerations. + +The set of supported layers can be expanded with [the Extensibility mechanism](../Extensibility_DG/Intro.md). + +## Supported Platforms + +OpenVINO™ toolkit is officially supported and validated on the following platforms: + +| Host | OS (64-bit) | +| :--- | :--- | +| Development | Ubuntu* 16.04/CentOS* 7.4/MS Windows* 10 | +| Target | Ubuntu* 16.04/CentOS* 7.4/MS Windows* 10 | + +The CPU Plugin supports inference on Intel® Xeon® with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and AVX512_BF16, Intel® Core™ +Processors with Intel® AVX2, Intel Atom® Processors with Intel® Streaming SIMD Extensions (Intel® SSE). + +You can use `-pc` the flag for samples to know which configuration is used by some layer. +This flag shows execution statistics that you can use to get information about layer name, +execution status, layer type, execution time, and the type of the execution primitive. 
+ +## Internal CPU Plugin Optimizations + +CPU plugin supports several graph optimization algorithms, such as fusing or removing layers. +Refer to the sections below for details. + +> **NOTE**: For layer descriptions, see the [IR Notation Reference](../../ops/opset.md). + +### Lowering Inference Precision + +CPU plugin follows default optimization approach. This approach means that inference is made with lower precision if it is possible on a given platform to reach better performance with acceptable range of accuracy. + +> **NOTE**: For details, see the [Using Bfloat16 Inference](../Bfloat16Inference.md). + +### Fusing Convolution and Simple Layers + +Merge of a Convolution layer and any of the simple layers listed below: +- Activation: ReLU, ELU, Sigmoid, Clamp +- Depthwise: ScaleShift, PReLU +- FakeQuantize + +> **NOTE**: You can have any number and order of simple layers. + +A combination of a Convolution layer and simple layers results in a single fused layer called +*Convolution*: +![conv_simple_01] + + +### Fusing Pooling and FakeQuantize Layers + +A combination of Pooling and FakeQuantize layers results in a single fused layer called *Pooling*: +![pooling_fakequant_01] + +### Fusing FullyConnected and Activation Layers + +A combination of FullyConnected and Activation layers results in a single fused layer called +*FullyConnected*: +![fullyconnected_activation_01] + + +### Fusing Convolution and Depthwise Convolution Layers Grouped with Simple Layers + +> **NOTE**: This pattern is possible only on CPUs with support of Streaming SIMD Extensions 4.2 +> (SSE 4.2) and Intel AVX2 Instruction Set Architecture (ISA). + +A combination of a group of a Convolution (or Binary Convolution) layer and simple layers and a group of a Depthwise Convolution +layer and simple layers results in a single layer called *Convolution* (or *Binary Convolution*): +> **NOTE**: Depthwise convolution layers should have the same values for the `group`, input channels, and output channels parameters. + +![conv_depth_01] + +### Fusing Convolution and Sum Layers + +A combination of Convolution, Simple, and Eltwise layers with the sum operation results in a single layer called *Convolution*: +![conv_sum_relu_01] + +### Fusing a Group of Convolutions + +If a topology contains the following pipeline, a CPU plugin merges Split, Convolution, and Concatenation layers into a single Convolution layer with the group parameter: +> **NOTE**: Parameters of the Convolution layers must coincide. + +![group_convolutions_01] + +### Removing a Power Layer + +CPU plugin removes a Power layer from a topology if it has the following parameters: + - power = 1 + - scale = 1 + - offset = 0 + + +## Supported Configuration Parameters + +The plugin supports the configuration parameters listed below. +All parameters must be set with the InferenceEngine::Core::LoadNetwork() method. +When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. +Refer to the OpenVINO samples for usage examples: [Benchmark App](../../../inference-engine/samples/benchmark_app/README.md). + +These are general options, also supported by other plugins: + +| Parameter name | Parameter values | Default | Description | +| :--- | :--- | :--- | :----------------------------------------------------------------------------------------------------------------------------| +| KEY_EXCLUSIVE_ASYNC_REQUESTS | YES/NO | NO | Forces async requests (also from different executable networks) to execute serially. 
This prevents potential oversubscription| +| KEY_PERF_COUNT | YES/NO | NO | Enables gathering performance counters | + +CPU-specific settings: + +| Parameter name | Parameter values | Default | Description | +| :--- | :--- | :--- | :--- | +| KEY_CPU_THREADS_NUM | positive integer values| 0 | Specifies the number of threads that CPU plugin should use for inference. Zero (default) means using all (logical) cores| +| KEY_CPU_BIND_THREAD | YES/NUMA/NO | YES | Binds inference threads to CPU cores. 'YES' (default) binding option maps threads to cores - this works best for static/synthetic scenarios like benchmarks. The 'NUMA' binding is more relaxed, binding inference threads only to NUMA nodes, leaving further scheduling to specific cores to the OS. This option might perform better in the real-life/contended scenarios. Note that for the latency-oriented cases (single execution stream, see below) both YES and NUMA options limit number of inference threads to the number of hardware cores (ignoring hyper-threading) on the multi-socket machines. | +| KEY_CPU_THROUGHPUT_STREAMS | KEY_CPU_THROUGHPUT_NUMA, KEY_CPU_THROUGHPUT_AUTO, or positive integer values| 1 | Specifies number of CPU "execution" streams for the throughput mode. Upper bound for the number of inference requests that can be executed simultaneously. All available CPU cores are evenly distributed between the streams. The default value is 1, which implies latency-oriented behavior with all available cores processing requests one by one.
KEY_CPU_THROUGHPUT_NUMA creates as many streams as needed to accommodate NUMA and avoid associated penalties.
KEY_CPU_THROUGHPUT_AUTO creates bare minimum of streams to improve the performance; this is the most portable option if you don't know how many cores your target machine has (and what would be the optimal number of streams). Note that your application should provide enough parallel slack (for example, run many inference requests) to leverage the throughput mode.
A positive integer value creates the requested number of streams. | +| KEY_ENFORCE_BF16 | YES/NO| YES | The name for setting to execute in bfloat16 precision whenever it is possible. This option lets plugin know to downscale the precision where it sees performance benefits from bfloat16 execution. Such option does not guarantee accuracy of the network, you need to verify the accuracy in this mode separately, based on performance and accuracy results. It should be your decision whether to use this option or not. | + +## See Also +* [Supported Devices](Supported_Devices.md) + +[mkldnn_group_conv]: ../img/mkldnn_group_conv.png +[mkldnn_conv_sum]: ../img/mkldnn_conv_sum.png +[mkldnn_conv_sum_result]: ../img/mkldnn_conv_sum_result.png +[conv_simple_01]: ../img/conv_simple_01.png +[pooling_fakequant_01]: ../img/pooling_fakequant_01.png +[fullyconnected_activation_01]: ../img/fullyconnected_activation_01.png +[conv_depth_01]: ../img/conv_depth_01.png +[group_convolutions_01]: ../img/group_convolutions_01.png +[conv_sum_relu_01]: ../img/conv_sum_relu_01.png diff --git a/docs/IE_DG/supported_plugins/FPGA.md b/docs/IE_DG/supported_plugins/FPGA.md new file mode 100644 index 00000000000000..c7c080bb4cc152 --- /dev/null +++ b/docs/IE_DG/supported_plugins/FPGA.md @@ -0,0 +1,294 @@ +FPGA Plugin {#openvino_docs_IE_DG_supported_plugins_FPGA} +=========== + +## Introducing FPGA Plugin + +The FPGA plugin provides an opportunity for high performance scoring of neural networks on Intel® FPGA devices. + +> **NOTE**: Before using the FPGA plugin, ensure that you have installed and configured either the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) or the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA. For installation and configuration details, see [FPGA installation](Supported_Devices.md). + +## Heterogeneous Execution + +When your topology contains layers that are not supported by the Intel® FPGA plugin, use [Heterogeneous plugin](HETERO.md) with dedicated fallback device. + +If a network has layers that are not supported in the Intel® FPGA plugin or in a fallback plugin, you can implement a custom layer on the CPU/GPU and use the [Extensibility mechanism](../Extensibility_DG/Intro.md). +In addition to adding custom kernels, you must still point to the CPU plugin or the GPU plugin as fallback devices for heterogeneous plugin. + +## Supported Networks + +The following network topologies are supported in heterogeneous mode, running on FPGA with fallback to CPU or GPU devices. + +> **IMPORTANT**: Use only bitstreams from the current version of the OpenVINO toolkit. Bitstreams from older versions of the OpenVINO toolkit are incompatible with later versions of the OpenVINO toolkit. For example, you cannot use the `1-0-1_A10DK_FP16_Generic` bitstream, when the OpenVINO toolkit supports the `2019R2_PL2_FP16_InceptionV1_SqueezeNet_VGG_YoloV3.aocx` bitstream. 
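As described in the Heterogeneous Execution section above, the fallback device is specified entirely through the device string passed when the network is loaded. The following is a minimal sketch, assuming a CPU fallback and a hypothetical `model.xml`:

```cpp
#include <inference_engine.hpp>

using namespace InferenceEngine;

int main() {
    Core core;
    CNNNetwork network = core.ReadNetwork("model.xml");  // hypothetical model path

    // Layers supported by the FPGA plugin run on the FPGA; unsupported layers fall back to the CPU
    ExecutableNetwork executable = core.LoadNetwork(network, "HETERO:FPGA,CPU");

    InferRequest request = executable.CreateInferRequest();
    request.Infer();
    return 0;
}
```

The table below lists the topologies validated in this mode together with the matching bitstreams.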
+ + +| Network | Bitstreams (Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2)) | Bitstreams (Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA) | +|:-------------------------------------|:-------------------------------------------------------------------|:---------------------------------------------------------------------------------------------| +| AlexNet | 2020-4_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic, 2020-4_PL2_FP11_AlexNet_GoogleNet_Generic | 2020-4_RC_FP16_AlexNet_GoogleNet_Generic, 2020-4_RC_FP11_AlexNet_GoogleNet_Generic | +| GoogleNet v1 | 2020-4_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic, 2020-4_PL2_FP11_AlexNet_GoogleNet_Generic | 2020-4_RC_FP16_AlexNet_GoogleNet_Generic, 2020-4_RC_FP11_AlexNet_GoogleNet_Generic | +| VGG-16 | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_InceptionV1_SqueezeNet_TinyYolo_VGG, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| VGG-19 | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_InceptionV1_SqueezeNet_TinyYolo_VGG, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| SqueezeNet v 1.0 | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_SqueezeNet | 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3, 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | +| SqueezeNet v 1.1 | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_SqueezeNet | 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3, 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | +| ResNet-18 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| ResNet-50 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| ResNet-101 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| ResNet-152 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| MobileNet (Caffe) | 2020-4_PL2_FP16_MobileNet_Clamp, 2020-4_PL2_FP11_MobileNet_Clamp | 2020-4_RC_FP16_MobileNet_Clamp, 2020-4_RC_FP11_MobileNet_Clamp | +| MobileNet (TensorFlow) | 2020-4_PL2_FP16_MobileNet_Clamp, 2020-4_PL2_FP11_MobileNet_Clamp | 2020-4_RC_FP16_MobileNet_Clamp, 2020-4_RC_FP11_MobileNet_Clamp| +| SqueezeNet-based variant of the SSD* | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_SqueezeNet | 2020-4_RC_FP16_InceptionV1_SqueezeNet_TinyYolo_VGG, 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | +| ResNet-based variant of SSD | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | +| RMNet | 2020-4_PL2_FP16_RMNet, 2020-4_PL2_FP11_RMNet | 2020-4_RC_FP16_RMNet, 2020-4_RC_FP11_RMNet | +| Yolo v3 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_YoloV3_ELU | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | + + +In addition to the list above, arbitrary topologies having big continues subgraphs consisting of layers supported by FPGA plugin are recommended to be executed on FPGA plugin. + +## Bitstreams that are Optimal to Use with the Intel's Pre-Trained Models + +The table below provides you with a list of Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) bitstreams that are optimal to use for the Intel's pre-trained models. + +
+ Click to expand/collapse the table + +| Model Name | FP11 Bitstreams | FP16 Bitstreams | +| :--- | :--- | :--- | +| action-recognition-0001-decoder | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| action-recognition-0001-encoder | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| age-gender-recognition-retail-0013 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| asl-recognition-0004 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| driver-action-recognition-adas-0002-decoder | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| driver-action-recognition-adas-0002-encoder | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| emotions-recognition-retail-0003 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| face-detection-0100 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| face-detection-0102 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| face-detection-0104 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| face-detection-0105 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| face-detection-0106 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| face-detection-adas-0001 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| face-detection-adas-binary-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| face-detection-retail-0004 | 2020-3_PL2_FP11_TinyYolo_SSD300.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| face-detection-retail-0005 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| face-reidentification-retail-0095 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| facial-landmarks-35-adas-0002 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| faster-rcnn-resnet101-coco-sparse-60-0001 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| gaze-estimation-adas-0002 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| handwritten-japanese-recognition-0001 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| handwritten-score-recognition-0003 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| head-pose-estimation-adas-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| human-pose-estimation-0001 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| icnet-camvid-ava-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| icnet-camvid-ava-sparse-30-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| icnet-camvid-ava-sparse-60-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| image-retrieval-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| instance-segmentation-security-0010 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 
2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| instance-segmentation-security-0050 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| instance-segmentation-security-0083 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| instance-segmentation-security-1025 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| landmarks-regression-retail-0009 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| license-plate-recognition-barrier-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| pedestrian-and-vehicle-detector-adas-0001 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| pedestrian-detection-adas-0002 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| pedestrian-detection-adas-binary-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| person-attributes-recognition-crossroad-0230 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-detection-action-recognition-0005 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-detection-action-recognition-0006 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-detection-action-recognition-teacher-0002 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-detection-asl-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| person-detection-raisinghand-recognition-0001 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-detection-retail-0002 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-detection-retail-0013 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-reidentification-retail-0031 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_ELU.aocx | +| person-reidentification-retail-0248 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-reidentification-retail-0249 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| person-reidentification-retail-0300 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| person-vehicle-bike-detection-crossroad-0078 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_ELU.aocx | +| person-vehicle-bike-detection-crossroad-1016 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| product-detection-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| resnet18-xnor-binary-onnx-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_RMNet.aocx | +| resnet50-binary-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| road-segmentation-adas-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| semantic-segmentation-adas-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| single-image-super-resolution-1032 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_RMNet.aocx | +| single-image-super-resolution-1033 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 
2020-3_PL2_FP16_RMNet.aocx | +| text-detection-0003 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| text-detection-0004 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| text-image-super-resolution-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_RMNet.aocx | +| text-recognition-0012 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| text-spotting-0002-detector | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| text-spotting-0002-recognizer-decoder | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| text-spotting-0002-recognizer-encoder | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| unet-camvid-onnx-0001 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| vehicle-attributes-recognition-barrier-0039 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| vehicle-detection-adas-0002 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | +| vehicle-detection-adas-binary-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | +| vehicle-license-plate-detection-barrier-0106 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | +| yolo-v2-ava-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| yolo-v2-ava-sparse-35-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| yolo-v2-ava-sparse-70-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | +| yolo-v2-tiny-ava-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| yolo-v2-tiny-ava-sparse-30-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | +| yolo-v2-tiny-ava-sparse-60-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | + +
+ +## Translate from Architecture to FPGA Bitstream Files + +Various FPGA bitstreams that support CNN are available in the OpenVINO™ toolkit package for FPGA. + +To select the correct bitstream (`.aocx`) file for an architecture, select a network (for example, Resnet-18) from the table above for either the Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Speed Grade 1), Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Speed Grade 2) or the Intel® Programmable Acceleration Card (PAC) with Intel® Arria® 10 GX FPGA and note the corresponding architecture. + +The following table describes several parameters that might help you to select the proper bitstream for your needs: + +| Name | Board | Precision | LRN Support | Leaky ReLU Support | PReLU Support | Clamp Support | ELU Support | +|:------------------------------------------|:--------------------------------------------------------------------------------|:----------|:------------|:-------------------|:--------------|:--------------|:------------| +| 2020-4_PL2_FP11_AlexNet_GoogleNet_Generic | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | true | true | true | false | false | +| 2020-4_PL2_FP11_SqueezeNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | false | false | +| 2020-4_PL2_FP11_MobileNet_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | true | false | +| 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | false | false | false | false | +| 2020-4_PL2_FP11_RMNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | false | true | +| 2020-4_PL2_FP11_TinyYolo_SSD300 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | true | true | true | false | false | +| 2020-4_PL2_FP11_YoloV3_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | false | true | +| 2020-4_PL2_FP11_Streaming_InternalUseOnly | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | false | false | false | false | +| 2020-4_PL2_FP11_Streaming_Slicing_InternalUseOnly | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | false | false | false | false | +| 2020-4_PL2_FP11_SwishExcitation | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | false | false | false | false | +| 2020-4_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | true | true | true | false | false | +| 2020-4_PL2_FP16_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | true | +| 2020-4_PL2_FP16_MobileNet_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | true | false | +| 2020-4_PL2_FP16_ResNet_YoloV3 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | false | +| 2020-4_PL2_FP16_RMNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | true | +| 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG | Intel® 
Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | false | +| 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | false | false | false | false | +| 2020-4_RC_FP11_AlexNet_GoogleNet_Generic | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | true | true | true | false | false | +| 2020-4_RC_FP11_RMNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | false | true | +| 2020-4_RC_FP11_Streaming_InternalUseOnly | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | true | false | false | false | false | +| 2020-4_RC_FP11_Streaming_Slicing_InternalUseOnly | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | true | false | false | false | false | +| 2020-4_RC_FP11_ELU | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | false | true | +| 2020-4_RC_FP11_SwishExcitation | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | false | false | false | false | +| 2020-4_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_YoloV3 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | false | false | +| 2020-4_RC_FP11_MobileNet_Clamp | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | true | false | +| 2020-4_RC_FP16_AlexNet_GoogleNet_Generic | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | true | true | true | false | false | +| 2020-4_RC_FP16_InceptionV1_SqueezeNet_TinyYolo_VGG | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | false | +| 2020-4_RC_FP16_RMNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | true | +| 2020-4_RC_FP16_SwishExcitation | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | false | false | false | false | +| 2020-4_RC_FP16_MobileNet_Clamp | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | true | false | +| 2020-4_RC_FP16_ResNet_YoloV3 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | false | +| 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | false | + +## Set Environment for Running the FPGA Plugin + +To make the FPGA plugin run directly or through the heterogeneous plugin, set up the environment: +1. Set up environment to access Intel® FPGA RTE for OpenCL: +``` +source /opt/altera/aocl-pro-rte/aclrte-linux64/init_opencl.sh +``` +2. Set the following environment variable and program the board with a DLA bitstream. Programming of the board is not supported during runtime and must be done before running an application. 
+
+    | Variable                            | Setting                                                                   |
+    | :----------------------------------| :-------------------------------------------------------------------------|
+    | ACL_PCIE_USE_JTAG_PROGRAMMING       | Set this variable to a value of 1 to force FPGA reprogramming using JTAG  |
+
+## Analyzing Heterogeneous Execution
+
+In addition to generating `.dot` files, you can use the error listening mechanism:
+
+```cpp
+class FPGA_ErrorListener : public InferenceEngine::IErrorListener
+{
+public:
+    virtual void onError(const char *msg) noexcept override {
+        std::cout << msg;
+    }
+};
+...
+FPGA_ErrorListener err_listener;
+core.SetLogCallback(err_listener); // will be used for FPGA device as well
+```
+If, during network loading, some layers are assigned to a fallback plugin, a message like the following is printed:
+
+```
+Layer (Name: detection_out, Type: DetectionOutput) is not supported:
+    custom or unknown.
+    Has (3) sets of inputs, must be 1, or 2.
+    Input dimensions (2) should be 4.
+```
+
+## Multiple FPGA Devices Support
+
+The Inference Engine FPGA plugin can load different networks on multiple FPGA devices. For example, to load two networks, AlexNet and MobileNet v2, on two different FPGA devices, follow the steps below:
+
+1. Program each FPGA device with a corresponding bitstream:
+```bash
+aocl program acl0 2019R3_PV_PL1_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic.aocx
+```
+```bash
+aocl program acl1 2019R3_PV_PL1_FP16_MobileNet_Clamp.aocx
+```
+For more information about bitstream programming, refer to [Installation Guide for Linux* with Support for FPGA](Supported_Devices.md).
+2. All FPGA devices are enumerated with a unique ID starting from `0`. By default, all networks are loaded to the default
+device with ID `0`. If you want to load a network on a particular non-default device, specify the `KEY_DEVICE_ID`
+parameter for C++ and the `DEVICE_ID` parameter for Python\*.
+
+The following code snippets demonstrate how to load the AlexNet network on the FPGA device with ID `0` and the
+MobileNet v2 network on the device with ID `1`:
+ * With C++:
+```cpp
+InferenceEngine::Core core;
+
+// Load AlexNet network on the first FPGA device programmed with a bitstream supporting AlexNet
+auto alexnetNetwork = core.ReadNetwork("alexnet.xml");
+auto exeNetwork1 = core.LoadNetwork(alexnetNetwork, "FPGA.0");
+
+// Load MobileNet network on the second FPGA device programmed with the MobileNet bitstream
+auto mobilenetNetwork = core.ReadNetwork("mobilenet_v2.xml");
+auto exeNetwork2 = core.LoadNetwork(mobilenetNetwork, "FPGA", { { KEY_DEVICE_ID, "1" } });
+```
+ * With Python:
+```python
+# Load AlexNet network on the first FPGA device programmed with a bitstream supporting AlexNet
+net1 = IENetwork(model="alexnet.xml", weights="alexnet.bin")
+plugin.load(network=net1, config={"DEVICE_ID": "0"})
+
+# Load MobileNet network on the second FPGA device programmed with the MobileNet bitstream
+net2 = IENetwork(model="mobilenet_v2.xml", weights="mobilenet_v2.bin")
+plugin.load(network=net2, config={"DEVICE_ID": "1"})
+```
+Note that you have to use asynchronous infer requests to utilize several FPGA devices; otherwise, execution on the devices is performed sequentially.
+
+## Import and Export Network Flow
+
+Since the 2019 R4 release, the FPGA and HETERO plugins support the export and import flow, which allows you to export a compiled network from a plugin to a binary blob by running the command below:
+
+```bash
+$ ./compile_tool -m resnet.xml -DLA_ARCH_NAME 4x2x16x32_fp16_sb9408_fcd1024_actk4_poolk4_normk1_owk2_image300x300x8192_mbfr -d HETERO:FPGA,CPU
+Inference Engine:
+        API version ............ 2.1
+        Build .................. 6db44e09a795cb277a63275ea1395bfcb88e46ac
+        Description ....... API
+Done
+```
+
+Once the command is executed, the binary blob named `resnet.blob` is created in the working directory. Refer to the [Compile tool](../../../inference-engine/tools/compile_tool/README.md) documentation for more details.
+
+A compiled binary blob can later be imported via `InferenceEngine::Core::Import`:
+
+```cpp
+InferenceEngine::Core core;
+std::ifstream strm("resnet.blob");
+auto execNetwork = core.Import(strm);
+```
+
+## How to Interpret Performance Counters
+
+The performance counters collected with InferenceEngine::InferRequest::GetPerformanceCounts provide data about execution on the FPGA, pre-processing and post-processing of data, and data transfers to and from the FPGA card.
+
+If parts of the network are executed on the CPU, you can also find performance data about the Intel® MKL-DNN kernels, their types, and other useful information.
+
+## Limitations of the FPGA Support for CNN
+
+The Inference Engine FPGA plugin has limitations on network topologies, kernel parameters, and batch size.
+
+* Depending on the bitstream loaded on the target device, the FPGA performs calculations with precision rates ranging from FP11 to FP16. This might have accuracy implications. Use the [Accuracy Checker](@ref omz_tools_accuracy_checker_README) to verify the network accuracy on the validation data set.
+* Networks in which many layers unsupported on FPGA are interleaved between supported layers may be divided into many subgraphs, which can lead to a `CL_OUT_OF_HOST_MEMORY` error. Such topologies are not FPGA-friendly in this release.
+* When you use the heterogeneous plugin, the affinity and distribution of nodes by devices depends on the FPGA bitstream that you use. Some layers might not be supported by a bitstream or parameters of the layer are not supported by the bitstream. + +## See Also +* [Supported Devices](Supported_Devices.md) diff --git a/docs/IE_DG/supported_plugins/GNA.md b/docs/IE_DG/supported_plugins/GNA.md new file mode 100644 index 00000000000000..a51cd47ffdce03 --- /dev/null +++ b/docs/IE_DG/supported_plugins/GNA.md @@ -0,0 +1,166 @@ +# GNA Plugin {#openvino_docs_IE_DG_supported_plugins_GNA} + +## Introducing the GNA Plugin + +Intel® Gaussian & Neural Accelerator is a low-power neural coprocessor for continuous inference at the edge. + +Intel® GNA is not intended to replace classic inference devices such as +CPU, graphics processing unit (GPU), or vision processing unit (VPU) . It is designed for offloading +continuous inference workloads including but not limited to noise reduction or speech recognition +to save power and free CPU resources. + +The GNA plugin provides a way to run inference on Intel® GNA, as well as in the software execution mode on CPU. + +## Devices with Intel® GNA + +Devices with Intel® GNA support: + +* [Intel® Speech Enabling Developer Kit](https://www.intel.com/content/www/us/en/support/articles/000026156/boards-and-kits/smart-home.html) + +* [Amazon Alexa* Premium Far-Field Developer Kit](https://developer.amazon.com/en-US/alexa/alexa-voice-service/dev-kits/amazon-premium-voice) + +* [Gemini Lake](https://ark.intel.com/content/www/us/en/ark/products/codename/83915/gemini-lake.html): + - Intel® Pentium® Silver J5005 Processor + - Intel® Pentium® Silver N5000 Processor + - Intel® Celeron® J4005 Processor + - Intel® Celeron® J4105 Processor + - Intel® Celeron® Processor N4100 + - Intel® Celeron® Processor N4000 + +* [Cannon Lake](https://ark.intel.com/content/www/us/en/ark/products/136863/intel-core-i3-8121u-processor-4m-cache-up-to-3-20-ghz.html): +Intel® Core™ i3-8121U Processor + +* [Ice Lake](https://ark.intel.com/content/www/us/en/ark/products/codename/74979/ice-lake.html): + - Intel® Core™ i7-1065G7 Processor + - Intel® Core™ i7-1060G7 Processor + - Intel® Core™ i5-1035G4 Processor + - Intel® Core™ i5-1035G7 Processor + - Intel® Core™ i5-1035G1 Processor + - Intel® Core™ i5-1030G7 Processor + - Intel® Core™ i5-1030G4 Processor + - Intel® Core™ i3-1005G1 Processor + - Intel® Core™ i3-1000G1 Processor + - Intel® Core™ i3-1000G4 Processor + +> **NOTE**: On platforms where Intel® GNA is not enabled in the BIOS, the driver cannot be installed, so the GNA plugin uses the software emulation mode only. + +## Drivers and Dependencies + +Intel® GNA hardware requires a driver to be installed on the system. + +* Linux\* OS: +[Download Intel® GNA driver for Ubuntu Linux 18.04.3 LTS (with HWE Kernel version 5.0+)](https://download.01.org/opencv/drivers/gna/) + +* Windows\* OS: +Intel® GNA driver for Windows is available through Windows Update\* + +## Models and Layers Limitations + +Because of specifics of hardware architecture, Intel® GNA supports a limited set of layers, their kinds and combinations. +For example, you should not expect the GNA Plugin to be able to run computer vision models, except those specifically adapted for the GNA Plugin, because the plugin does not fully support +2D convolutions. + +The list of supported layers can be found +[here](Supported_Devices.md) (see the GNA column of Supported Layers section). 
+Limitations include: + +- Only 1D convolutions (in the models converted from [Kaldi](../../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) framework) are natively supported +- The number of output channels for convolutions must be a multiple of 4 +- Permute layer support is limited to the cases where no data reordering is needed, or when reordering is happening for 2 dimensions, at least one of which is not greater than 8 +- Power layer only supports the power parameter equal to 1 + +#### Experimental Support for 2D Convolutions + +The Intel® GNA hardware natively supports only 1D convolution. + +However, 2D convolutions can be mapped to 1D when a convolution kernel moves in a single direction. Such a transformation is performed by the GNA Plugin for Kaldi `nnet1` convolution. From this perspective, the Intel® GNA hardware convolution operation accepts a `NHWC` input and produces `NHWC` output. Because OpenVINO™ only supports the `NCHW` layout, it may be necessary to insert `Permute` layers before or after convolutions. + +For example, the Kaldi model optimizer inserts such a permute after convolution for the [rm_cnn4a network](https://download.01.org/openvinotoolkit/models_contrib/speech/kaldi/rm_cnn4a_smbr/). This `Permute` layer is automatically removed by the GNA Plugin, because the Intel® GNA hardware convolution layer already produces the required `NHWC` result. + +## Operation Precision + +Intel® GNA essentially operates in the low-precision mode, which represents a mix of 8-bit (`I8`), 16-bit (`I16`), and 32-bit (`I32`) integer computations, so compared to 32-bit floating point (`FP32`) results – for example, calculated on CPU using Inference Engine [CPU Plugin](CPU.md) – outputs calculated using reduced integer precision are different from the scores calculated using floating point. + +Unlike other plugins supporting low-precision execution, the GNA plugin calculates quantization factors at the model loading time, so a model can run without calibration. + +## Execution Modes + +| Mode | Description | +| :---------------------------------| :---------------------------------------------------------| +| `GNA_AUTO` | Uses Intel® GNA if available, otherwise uses software execution mode on CPU. | +| `GNA_HW` | Uses Intel® GNA if available, otherwise raises an error. | +| `GNA_SW` | *Deprecated*. Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA, but not in the bit-exact mode. | +| `GNA_SW_EXACT` | Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA in the bit-exact mode. | +| `GNA_SW_FP32` | Executes the GNA-compiled graph on CPU but substitutes parameters and calculations from low precision to floating point (`FP32`). | + +## Supported Configuration Parameters + +The plugin supports the configuration parameters listed below. +The parameters are passed as `std::map` on `InferenceEngine::Core::LoadNetwork` or `InferenceEngine::SetConfig`. + +The parameter `KEY_GNA_DEVICE_MODE` can also be changed at run time using `InferenceEngine::ExecutableNetwork::SetConfig` (for any values excluding `GNA_SW_FP32`). This allows switching the +execution between software emulation mode and hardware emulation mode after the model is loaded. + +The parameter names below correspond to their usage through API keys, such as `GNAConfigParams::KEY_GNA_DEVICE_MODE` or `PluginConfigParams::KEY_PERF_COUNT`. 
+When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. + +| Parameter Name | Parameter Values | Default Value | Description | +| :---------------------------------| :---------------------------------------------------------| :-----------| :------------------------------------------------------------------------| +| `KEY_GNA_COMPACT_MODE` | `YES`/`NO` | `YES` | Reuse I/O buffers to save space (makes debugging harder) | +| `KEY_GNA_SCALE_FACTOR` | `FP32` number | 1.0 | Scale factor to use for input quantization | +| `KEY_GNA_DEVICE_MODE` | `GNA_AUTO`/`GNA_HW`/`GNA_SW_EXACT`/`GNA_SW_FP32` | `GNA_AUTO` | One of the modes described Execution Models | +| `KEY_GNA_FIRMWARE_MODEL_IMAGE` | `std::string` | `""` | Name for embedded model binary dump file | +| `KEY_GNA_PRECISION` | `I16`/`I8` | `I16` | Hint to GNA plugin: preferred integer weight resolution for quantization | +| `KEY_PERF_COUNT` | `YES`/`NO` | `NO` | Turn on performance counters reporting | +| `KEY_GNA_LIB_N_THREADS` | 1-127 integer number | 1 | Sets the number of GNA accelerator library worker threads used for inference computation in software modes + +## How to Interpret Performance Counters + +As a result of collecting performance counters using `InferenceEngine::IInferencePlugin::GetPerformanceCounts`, you can find various performance data about execution on GNA. +Returned map stores a counter description as a key, counter value is stored in the `realTime_uSec` field of the `InferenceEngineProfileInfo` structure. Current GNA implementation calculates counters for the whole utterance scoring and does not provide per-layer information. API allows to retrieve counter units in cycles, but they can be converted to seconds as follows: + +``` +seconds = cycles / frequency +``` + +Refer to the table below to learn about the frequency of Intel® GNA inside a particular processor. +Processor | Frequency of Intel® GNA +---|--- +Intel® Ice Lake processors| 400MHz +Intel® Core™ i3-8121U processor| 400MHz +Intel® Gemini Lake processors | 200MHz + +Performance counters provided for the time being: + +* Scoring request performance results + * Number of total cycles spent on scoring in hardware (including compute and memory stall cycles) + * Number of stall cycles spent in hardware + +## Multithreading Support in GNA Plugin + +The GNA plugin supports the following configuration parameters for multithreading management: + +* `KEY_GNA_LIB_N_THREADS` + + By default, the GNA plugin uses one worker thread for inference computations. This parameter allows you to create up to 127 threads for software modes. + +> **NOTE:** Multithreading mode does not guarantee the same computation order as the order of issuing. Additionally, in this case, software modes do not implement any serializations. + +## Network Batch Size + +Intel® GNA plugin supports the processing of context-windowed speech frames in batches of 1-8 frames in one +input blob using `InferenceEngine::ICNNNetwork::setBatchSize`. Increasing batch size only improves efficiency of `Fully Connected` layers. + +> **NOTE**: For networks with `Convolutional`, `LSTM`, or `Memory` layers, the only supported batch size is 1. + +## Compatibility with Heterogeneous Plugin + +Heterogeneous plugin was tested with the Intel® GNA as a primary device and CPU as a secondary device. To run inference of networks with layers unsupported by the GNA plugin (for example, Softmax), use the Heterogeneous plugin with the `HETERO:GNA,CPU` configuration. 
For the list of supported networks, see the [Supported Frameworks](#supported-frameworks). + +> **NOTE:** Due to limitation of the Intel® GNA backend library, heterogenous support is limited to cases where in the resulted sliced graph, only one subgraph is scheduled to run on GNA\_HW or GNA\_SW devices. + +## See Also + +* [Supported Devices](Supported_Devices.md) +* [Converting Model](../../MO_DG/prepare_model/convert_model/Converting_Model.md) +* [Convert model from Kaldi](../../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) diff --git a/docs/IE_DG/supported_plugins/GPU_RemoteBlob_API.md b/docs/IE_DG/supported_plugins/GPU_RemoteBlob_API.md new file mode 100644 index 00000000000000..2518bb80d6b814 --- /dev/null +++ b/docs/IE_DG/supported_plugins/GPU_RemoteBlob_API.md @@ -0,0 +1,227 @@ +Remote Blob API of GPU Plugin {#openvino_docs_IE_DG_supported_plugins_GPU_RemoteBlob_API} +================================ + +The GPU plugin implementation of the `RemoteContext` and `RemoteBlob` interfaces supports GPU +pipeline developers who need video memory sharing and interoperability with existing native APIs +such as OpenCL\*, Microsoft DirectX\*, or VAAPI\*. +Using these interfaces allows to avoid any memory copy overhead when plugging the OpenVINO™ inference +into an existing GPU pipeline. It also enables OpenCL kernels participating in the pipeline to become +native buffer consumers or producers of the OpenVINO™ inference. +Since the GPU plugin works on top of the clDNN library, the functionality above is also implemented +using OpenCL and its sharing extensions provided by Intel®. + +There are two interoperability scenarios that are supported for the Remote Blob API: + +* GPU plugin context and memory objects can be constructed from low-level device, display, or memory +handles and used to create the OpenVINO™ `ExecutableNetwork` or `Blob` class. +* OpenCL context or buffer handles can be obtained from existing GPU plugin objects, and used in OpenCL processing. + +Class and function declarations for the API are defined in the following files: +* Windows\*: `gpu/gpu_context_api_ocl.hpp` and `gpu/gpu_context_api_dx.hpp` +* Linux\*: `gpu/gpu_context_api_ocl.hpp` and `gpu/gpu_context_api_va.hpp` + +The most common way to enable the interaction of your application with the Remote Blob API is to use user-side utility classes +and functions that consume or produce native handles directly. + +## Execution Context User-Side Wrappers + +GPU plugin classes that implement the `RemoteContext` interface are responsible for context sharing. +Obtaining a pointer to a context object is the first step of sharing pipeline objects. +The context object of the GPU plugin directly wraps OpenCL context, setting a scope for sharing +`ExecutableNetwork` and `RemoteBlob` objects. +To create such objects within user context, explicitly provide the context to the plugin using the +`make_shared_context()` overloaded function. Depending on the platform, the function accepts the +`cl_context` handle, the pointer to the `ID3D11Device` interface, or the `VADisplay` handle, and +returns a smart pointer to the `RemoteContext` plugin object. + +If you do not provide any user context, the plugin uses its default internal context. +The plugin attempts to use the same internal context object as long as plugin options are kept the same. +Therefore, all ExecutableNetwork objects created during this time share the same context. +Once the plugin options are changed, the internal context is replaced by the new one. 
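+
+The sketch below condenses the flow described above. It assumes an existing OpenCL context handle named `my_cl_context` (an illustrative assumption); complete, platform-specific examples are given in the Examples section below:
+
+```cpp
+InferenceEngine::Core ie;
+auto net = ie.ReadNetwork("model.xml");
+
+// wrap the existing OpenCL context into a GPU plugin RemoteContext
+auto remote_context = gpu::make_shared_context(ie, "GPU", my_cl_context);
+
+// networks compiled against this context share the user-provided OpenCL context
+auto exec_net = ie.LoadNetwork(net, remote_context);
+```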
+ +To request the current default context of the plugin, call the `GetDefaultContext()` method of the core engine. +To request the internal context of the given `ExecutableNetwork`, use the `GetContext()` method. + +## Shared Blob User-Side Wrappers + +The classes that implement the `RemoteBlob` interface both are wrappers for native API +memory handles (which can be obtained from them at any moment) and act just like regular OpenVINO™ +`Blob` objects. + +Once you obtain the context, you can use it to compile a new `ExecutableNetwork` or create `RemoteBlob` +objects. +For network compilation, use a dedicated flavor of `LoadNetwork()`, which accepts the context as an +additional parameter. + +To create a shared blob from a native memory handle, use `make_shared_blob()` overloaded functions +that can accept the `cl::Buffer`, `cl::Image2D`, `cl_mem` handles, and either `ID3D11Buffer`, +`ID3D11Texture2D` pointers or the `VASurfaceID` handle. +All `make_shared_blob()` flavors return a smart pointer to the `Blob` object, which can be directly +passed to the `SetBlob() `method of an inference request object. + +## Direct NV12 video surface input + +To support the direct consumption of a hardware video decoder output, plugin accepts two-plane video +surfaces as arguments for the `make_shared_blob_nv12()` function, which creates an `NV12Blob` object +and returns a smart pointer to it, which is cast to `Blob::Ptr`. + +To ensure that the plugin generates the correct execution graph for the NV12 dual-plane input, set +the `CLDNNConfigParams::KEY_CLDNN_NV12_TWO_INPUTS` plugin configuration flag to `PluginConfigParams::YES`. + +## Low-Level Methods and Their Parameter Description + +The high-level wrappers above bring a direct dependency on native APIs to the user program. +If you want to avoid the dependency, you still can directly use the `CreateContext()`, +`CreateBlob()`, and `getParams()` methods. +On this level, native handles are re-interpreted as void pointers and all arguments are passed +using `std::map` containers that are filled with `std::string, InferenceEngine::Parameter` pairs. +Two types of map entries are possible: descriptor and container. The first map entry is a +descriptor, which sets the expected structure and possible parameter values of the map. + +**Parameter Map Entries** + +| Key Name | Description and Possible Parameter Values | +|----------------|---------------------------------------------------------------------| +| `CONTEXT_TYPE` | Describes the type of the shared context in a map. Can be `OCL` (for pure OpenCL context) or `VA_SHARED` (for context shared with a video decoding device). | +| `OCL_CONTEXT` | Contains the OpenCL context handle. | +| `VA_DEVICE` | Contains the native video decoding device handle. Can be `VADisplay` or `ID3D11Device` (a pointer). | +| `SHARED_MEM_TYPE` | Describes the type of the shared memory buffer in a map. Can be `OCL_BUFFER` (clBuffer), `OCL_IMAGE2D` (clImage2D), `VA_SURFACE()`, or `DX_BUFFER`. | +| `MEM_HANDLE` | Contains the OpenCL memory handle. | +| `DEV_OBJECT_HANDLE` | Contains the native video decoder surface handle. | +| `VA_PLANE` | Contains the NV12 video decoder surface plane index. Can be `0` or `1`. | + +> **NOTE**: To initialize the entry key and value, use the `GPU_PARAM_KEY()` or `GPU_PARAM_VALUE()` macro. + +## Examples + +Refer to the sections below to see pseudo-code of usage examples. 
+ +> **NOTE**: For low-level parameter usage examples, see the source code of user-side wrappers from the include files mentioned above. + +### OpenCL Kernel Execution on a Shared Buffer + +This example uses the OpenCL context obtained from an executable network object. + +```cpp +#define CL_HPP_MINIMUM_OPENCL_VERSION 120 +#define CL_HPP_TARGET_OPENCL_VERSION 120 + +#include +#include + +... + +// initialize the plugin and load the network +InferenceEngine::Core ie; +auto exec_net = ie.LoadNetwork(net, "GPU", config); + +// obtain the RemoteContext pointer from the executable network object +auto cldnn_context = exec_net.GetContext(); +// obtain the OpenCL context handle from the RemoteContext, +// get device info and create a queue +cl::Context ctx = std::dynamic_pointer_cast(cldnn_context); +_device = cl::Device(_context.getInfo()[0].get(), true); +cl::CommandQueue _queue; +cl_command_queue_properties props = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE; +_queue = cl::CommandQueue(_context, _device, props); + +// create the OpenCL buffer within the obtained context +cl::Buffer shared_buffer(ctx, CL_MEM_READ_WRITE, image_size * num_channels, NULL, &err); +// wrap the buffer into RemoteBlob +auto shared_blob = gpu::make_shared_blob(input_info->getTensorDesc(), cldnn_context, shared_buffer); + +... +// execute user kernel +cl::Kernel kernel(program, kernelName.c_str()); +kernel.setArg(0, shared_buffer); +queue.enqueueNDRangeKernel(kernel, + cl::NDRange(0), + cl::NDRange(image_size), + cl::NDRange(1), + 0, // wait events * + &profileEvent); +queue.finish(); +... + +// pass results to the inference +inf_req_shared.SetBlob(input_name, shared_blob); +inf_req_shared.Infer(); + +``` + +### Running GPU Plugin Inference within User-Supplied Shared Context + +```cpp +#define CL_HPP_MINIMUM_OPENCL_VERSION 120 +#define CL_HPP_TARGET_OPENCL_VERSION 120 + +#include +#include + +... + +cl::Context ctx = get_my_OpenCL_context(); + +// share the context with GPU plugin and compile ExecutableNetwork +auto remote_context = gpu::make_shared_context(ie, "GPU", ocl_instance->_context.get()); +auto exec_net_shared = ie.LoadNetwork(net, remote_context); +auto inf_req_shared = exec_net_shared.CreateInferRequest(); + +... +// do OpenCL processing stuff +... + +// run the inference +inf_req_shared.Infer(); + +``` +### Direct Consuming of the NV12 VAAPI Video Decoder Surface on Linux + +```cpp +#include +#include + +... + +// initialize the objects +CNNNetwork network = ie.ReadNetwork(xmlFileName, binFileName); + +... + +auto inputInfoItem = *inputInfo.begin(); +inputInfoItem.second->setPrecision(Precision::U8); +inputInfoItem.second->setLayout(Layout::NCHW); +inputInfoItem.second->getPreProcess().setColorFormat(ColorFormat::NV12); + +VADisplay disp = get_VA_Device(); +// create the shared context object +auto shared_va_context = gpu::make_shared_context(ie, "GPU", disp); +// compile network within a shared context +ExecutableNetwork executable_network = ie.LoadNetwork(network, + shared_va_context, + { { CLDNNConfigParams::KEY_CLDNN_NV12_TWO_INPUTS, + PluginConfigParams::YES } }); + +// decode/inference loop +for (int i = 0; i < nframes; i++) { + ... + // execute decoding and obtain decoded surface handle + decoder.DecodeFrame(); + VASurfaceID va_surface = decoder.get_VA_output_surface(); + ... 
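+            // Note: infer requests are double-buffered (currentFrame/prevFrame below), so decoding of
+            // the next frame overlaps with inference on the previously submitted one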
+            //wrap decoder output into RemoteBlobs and set it as inference input
+            auto nv12_blob = gpu::make_shared_blob_nv12(ieInHeight,
+                                                        ieInWidth,
+                                                        shared_va_context,
+                                                        va_surface
+                                                        );
+            inferRequests[currentFrame].SetBlob(input_name, nv12_blob);
+            inferRequests[currentFrame].StartAsync();
+            inferRequests[prevFrame].Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
+    }
+```
+
+## See Also
+
+* InferenceEngine::Core
+* InferenceEngine::RemoteBlob
diff --git a/docs/IE_DG/supported_plugins/HDDL.md b/docs/IE_DG/supported_plugins/HDDL.md
new file mode 100644
index 00000000000000..cc53925558e25e
--- /dev/null
+++ b/docs/IE_DG/supported_plugins/HDDL.md
@@ -0,0 +1,39 @@
+# HDDL Plugin {#openvino_docs_IE_DG_supported_plugins_HDDL}
+
+## Introducing HDDL Plugin
+
+The Inference Engine HDDL plugin is developed for inference of neural networks on Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, which is designed for use cases that require large throughput of deep learning inference. It provides dozens of times the throughput of the MYRIAD plugin.
+
+## Installation on Linux* OS
+
+For installation instructions, refer to the [Installation Guide for Linux\*](VPU.md).
+
+## Installation on Windows* OS
+
+For installation instructions, refer to the [Installation Guide for Windows\*](Supported_Devices.md).
+
+## Supported networks
+
+For the list of supported networks, see the [MYRIAD Plugin](MYRIAD.md) documentation.
+
+## Supported Configuration Parameters
+
+See VPU common configuration parameters for the [VPU Plugins](VPU.md).
+When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix.
+
+In addition to the parameters common to the MYRIAD and HDDL plugins, the HDDL plugin accepts the following options:
+
+| Parameter Name | Parameter Values | Default | Description |
+| :--- | :--- | :--- | :--- |
+| KEY_PERF_COUNT | YES/NO | NO | Enable performance counter option. |
+| KEY_VPU_HDDL_GRAPH_TAG | string | empty string | Allows executing the network on a specified number of devices. |
+| KEY_VPU_HDDL_STREAM_ID | string | empty string | Allows executing inference on a specified device. |
+| KEY_VPU_HDDL_DEVICE_TAG | string | empty string | Allows allocating/deallocating networks on specified devices. |
+| KEY_VPU_HDDL_BIND_DEVICE | YES/NO | NO | Whether the network should bind to a device. Refer to vpu_plugin_config.hpp. |
+| KEY_VPU_HDDL_RUNTIME_PRIORITY | signed int | 0 | Specifies the runtime priority of a device among all devices that run the same network. Refer to vpu_plugin_config.hpp. |
+
+## See Also
+
+* [Supported Devices](Supported_Devices.md)
+* [VPU Plugins](VPU.md)
+* [MYRIAD Plugin](MYRIAD.md)
diff --git a/docs/IE_DG/supported_plugins/HETERO.md b/docs/IE_DG/supported_plugins/HETERO.md
new file mode 100644
index 00000000000000..6648150be614a9
--- /dev/null
+++ b/docs/IE_DG/supported_plugins/HETERO.md
@@ -0,0 +1,126 @@
+Heterogeneous Plugin {#openvino_docs_IE_DG_supported_plugins_HETERO}
+=======
+
+## Introducing Heterogeneous Plugin
+
+The heterogeneous plugin enables executing inference of one network on several devices.
+Purposes of executing networks in heterogeneous mode:
+* To utilize the power of accelerators: calculate the heaviest parts of the network on an accelerator and execute unsupported layers on fallback devices like the CPU
+* To utilize all available hardware more efficiently during one inference
+
+The execution through the heterogeneous plugin can be divided into two independent steps:
+* Setting affinity to layers (binding them to devices in InferenceEngine::ICNNNetwork)
+* Loading the network to the heterogeneous plugin, splitting the network into parts, and executing them through the plugin
+
+These steps are decoupled. The setting of affinity can be done automatically using the fallback policy or in manual mode.
+
+The automatic fallback policy is greedy: following the device priorities, it assigns each layer to the first device that can execute it.
+
+Some topologies are not friendly to heterogeneous execution on certain devices or cannot be executed in this mode at all.
+Examples of such networks are networks that have activation layers not supported on the primary device.
+If transferring data from one part of the network to another takes a relatively long time,
+it makes little sense to execute those parts in heterogeneous mode on these devices.
+In this case, you can define the heaviest part manually and set the affinity so that data is not sent back and forth many times during one inference.
+
+## Annotation of Layers per Device and Default Fallback Policy
+The default fallback policy decides which layer goes to which device automatically, according to the support in the dedicated plugins (FPGA, GPU, CPU, MYRIAD).
+
+Another way to annotate a network is to set affinity manually using the CNNLayer::affinity field. This field accepts string values of device names, such as "CPU" or "FPGA".
+
+The fallback policy does not work if even one layer has an initialized affinity. The recommended sequence is to set affinities automatically first and then fix them manually.
+```cpp
+InferenceEngine::Core core;
+auto network = core.ReadNetwork("Model.xml");
+
+// This example demonstrates how to perform default affinity initialization and then
+// correct affinity manually for some layers
+const std::string device = "HETERO:FPGA,CPU";
+
+// QueryNetworkResult object contains map layer -> device
+InferenceEngine::QueryNetworkResult res = core.QueryNetwork(network, device, { });
+
+// update default affinities
+res.supportedLayersMap["layerName"] = "CPU";
+
+// set affinities to network
+for (auto && layer : res.supportedLayersMap) {
+    network.getLayerByName(layer.first.c_str())->affinity = layer.second;
+}
+
+// load network with affinities set before
+auto executable_network = core.LoadNetwork(network, device);
+```
+
+If you rely on the default affinity distribution, you can avoid calling InferenceEngine::Core::QueryNetwork and just call InferenceEngine::Core::LoadNetwork instead:
+```cpp
+InferenceEngine::Core core;
+auto network = core.ReadNetwork("Model.xml");
+auto executable_network = core.LoadNetwork(network, "HETERO:FPGA,CPU");
+```
+
+
+## Details of Splitting Network and Execution
+During loading of the network to the heterogeneous plugin, the network is divided into separate parts and loaded to dedicated plugins.
+Intermediate blobs between these subgraphs are allocated automatically in the most efficient way.
+
+## Execution Precision
+Precision for inference in the heterogeneous plugin is defined by:
+* Precision of the IR.
+* Ability of final plugins to execute in precision defined in IR + +Examples: +* If you want to execute GPU with CPU fallback with FP16 on GPU, you need to use only FP16 IR. +Weight are converted from FP16 to FP32 automatically for execution on CPU by heterogeneous plugin automatically. +* If you want to execute on FPGA with CPU fallback, you can use any precision for IR. The execution on FPGA is defined by bitstream, +the execution on CPU happens in FP32. + +Samples can be used with the following command: + +```sh +./object_detection_sample_ssd -m /ModelSSD.xml -i /picture.jpg -d HETERO:FPGA,CPU +``` +where: +- `HETERO` stands for heterogeneous plugin +- `FPGA,CPU` points to fallback policy with priority on FPGA and fallback to CPU + +You can point more than two devices: `-d HETERO:FPGA,GPU,CPU` + +## Analyzing Heterogeneous Execution +After enabling of KEY_HETERO_DUMP_GRAPH_DOT config key, you can dump GraphViz* `.dot` files with annotations of devices per layer. + +Heterogeneous plugin can generate two files: +* `hetero_affinity_.dot` - annotation of affinities per layer. This file is written to the disk only if default fallback policy was executed +* `hetero_subgraphs_.dot` - annotation of affinities per graph. This file is written to the disk during execution of ICNNNetwork::LoadNetwork() for heterogeneous plugin + +```cpp +#include "ie_plugin_config.hpp" +#include "hetero/hetero_plugin_config.hpp" +using namespace InferenceEngine::PluginConfigParams; +using namespace InferenceEngine::HeteroConfigParams; + +... +InferenceEngine::Core core; +core.SetConfig({ { KEY_HETERO_DUMP_GRAPH_DOT, YES } }, "HETERO"); +``` + +You can use GraphViz* utility or converters to `.png` formats. On Ubuntu* operating system, you can use the following utilities: +* `sudo apt-get install xdot` +* `xdot hetero_subgraphs.dot` + + +You can use performance data (in samples, it is an option `-pc`) to get performance data on each subgraph. + +Here is an example of the output: for Googlenet v1 running on FPGA with fallback to CPU: +```cpp +subgraph1: 1. input preprocessing (mean data/FPGA):EXECUTED layerType: realTime: 129 cpu: 129 execType: +subgraph1: 2. input transfer to DDR:EXECUTED layerType: realTime: 201 cpu: 0 execType: +subgraph1: 3. FPGA execute time:EXECUTED layerType: realTime: 3808 cpu: 0 execType: +subgraph1: 4. output transfer from DDR:EXECUTED layerType: realTime: 55 cpu: 0 execType: +subgraph1: 5. FPGA output postprocessing:EXECUTED layerType: realTime: 7 cpu: 7 execType: +subgraph1: 6. copy to IE blob:EXECUTED layerType: realTime: 2 cpu: 2 execType: +subgraph2: out_prob: NOT_RUN layerType: Output realTime: 0 cpu: 0 execType: unknown +subgraph2: prob: EXECUTED layerType: SoftMax realTime: 10 cpu: 10 execType: ref +Total time: 4212 microseconds +``` +## See Also +* [Supported Devices](Supported_Devices.md) diff --git a/docs/IE_DG/supported_plugins/MULTI.md b/docs/IE_DG/supported_plugins/MULTI.md new file mode 100644 index 00000000000000..2d30a5e4322ea8 --- /dev/null +++ b/docs/IE_DG/supported_plugins/MULTI.md @@ -0,0 +1,160 @@ +# Multi-Device Plugin {#openvino_docs_IE_DG_supported_plugins_MULTI} + +## Introducing Multi-Device Execution + +Multi-Device plugin automatically assigns inference requests to available computational devices to execute the requests in parallel. 
+Potential gains are as follows:
+* Improved throughput that multiple devices can deliver (compared to single-device execution)
+* More consistent performance, since the devices can now share the inference burden
+(so that if one device is becoming too busy, another device can take more of the load)
+
+Notice that with multi-device the application logic is left unchanged, so you don't need to explicitly load the network to every device,
+create and balance the inference requests, and so on. From the application point of view, this is just another device that handles the actual machinery.
+The only thing that is required to leverage performance is to provide the multi-device (and hence the underlying devices) with enough inference requests to crunch.
+For example, if you were processing 4 cameras on the CPU (with 4 inference requests), you may now want to process more cameras (with more requests in flight)
+to keep CPU+GPU busy via multi-device.
+
+The "setup" of multi-device can be described in three major steps:
+* First is the configuration of each device as usual (for example, via the conventional SetConfig method)
+* Second is loading a network to the Multi-Device plugin created on top of a (prioritized) list of the configured devices. This is the only change that you need in your application.
+* Finally, just like with any other ExecutableNetwork (resulting from LoadNetwork), you create as many requests as needed to saturate the devices.
+These steps are covered below in detail.
+
+## Defining and Configuring the Multi-Device
+Following the OpenVINO notion of "devices", the multi-device has the name "MULTI".
+The only configuration option for the multi-device is the prioritized list of devices to use:
+
+| Parameter name | Parameter values | Default | Description |
+| :--- | :--- | :--- | :----------------------------------------------------------------------------------------------------------------------------|
+| "MULTI_DEVICE_PRIORITIES" | comma-separated device names with no spaces | N/A | Prioritized list of devices |
+
+You can use the name of the configuration directly as a string, or use MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES from multi/multi_device_config.hpp, which defines the same string.
+ +Basically, there are three ways to specify the devices to be use by the "MULTI": +```cpp + Core ie; + //NEW IE-CENTRIC API, the "MULTI" plugin is (globally) pre-configured with the explicit option: + ie.SetConfig({{"MULTI_DEVICE_PRIORITIES", "HDDL,GPU"}}, "MULTI"); + ExecutableNetwork exec0 = ie.LoadNetwork(network, "MULTI", {}); + + //NEW IE-CENTRIC API, configuration of the "MULTI" is part of the network configuration (and hence specific to the network): + ExecutableNetwork exec1 = ie.LoadNetwork(network, "MULTI", {{"MULTI_DEVICE_PRIORITIES", "HDDL,GPU"}}); + //NEW IE-CENTRIC API, same as previous, but configuration of the "MULTI" is part of the name (so config is empty), also network-specific: + ExecutableNetwork exec2 = ie.LoadNetwork(network, "MULTI:HDDL,GPU", {}); + + //Similarly for the deprecated (plugin-centric) API + //for example globally pre-configuring the plugin with the explicit option: + //auto plugin0 = PluginDispatcher().getPluginByDevice("MULTI"); + //plugin0.SetConfig({{"MULTI_DEVICE_PRIORITIES", "HDDL,GPU"}}); + //ExecutableNetwork exec3 = plugin.LoadNetwork(network, {}); + // part of the config for the LoadNetwork or device name + //ExecutableNetwork exec4 = plugin0.LoadNetwork(network, {{"MULTI_DEVICE_PRIORITIES", "HDDL,GPU"}}); + // part of the device name + //auto plugin1 = PluginDispatcher().getPluginByDevice("MULTI:HDDL,GPU"); + //ExecutableNetwork exec5 = plugin1.LoadNetwork(network, {}); +``` +Notice that the priorities of the devices can be changed in real-time for the executable network: +```cpp + Core ie; + ExecutableNetwork exec = ie.LoadNetwork(network, "MULTI:HDDL,GPU", {}); + //... + exec.SetConfig({{"MULTI_DEVICE_PRIORITIES", "GPU,HDDL"}}); + // you can even exclude some device + exec.SetConfig({{"MULTI_DEVICE_PRIORITIES", "GPU"}}); + //... + // and then return it back + exec.SetConfig({{"MULTI_DEVICE_PRIORITIES", "GPU,HDDL"}}); + //but you cannot add new devices on the fly, the next line will trigger the following exception: + //[ ERROR ] [NOT_FOUND] You can only change device priorities but not add new devices with the Network's SetConfig(MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES. + //CPU device was not in the original device list! + exec.SetConfig({{"MULTI_DEVICE_PRIORITIES", "CPU,GPU,HDDL"}}); +``` +Finally, there is a way to specify number of requests that the multi-device will internally keep for each device. +Say if your original app was running 4 cameras with 4 inference requests now you would probably want to share these 4 requests between 2 devices used in the MULTI. The easiest way is to specify a number of requests for each device using parentheses: "MULTI:CPU(2),GPU(2)" and use the same 4 requests in your app. However, such an explicit configuration is not performance portable and hence not recommended. Instead, the better way is to configure the individual devices and query the resulting number of requests to be used in the application level (see [Configuring the Individual Devices and Creating the Multi-Device On Top](#configuring-the-individual-devices-and-creating-the-multi-device-on-top)). + +## Enumerating Available Devices +Inference Engine now features a dedicated API to enumerate devices and their capabilities. See [Hello Query Device C++ Sample](../../../inference-engine/samples/hello_query_device/README.md). This is example output of the sample (truncated to the devices' names only): + +```sh +./hello_query_device +Available devices: + Device: CPU +... + Device: GPU +... 
+ Device: HDDL +``` +Simple programmatic way to enumerate the devices and use with the multi-device is as follows: +```cpp + Core ie; + std::string allDevices = "MULTI:"; + std::vector availableDevices = ie.GetAvailableDevices(); + for (auto && device : availableDevices) { + allDevices += device; + allDevices += ((device == availableDevices[availableDevices.size()-1]) ? "" : ","); + } + ExecutableNetwork exeNetwork = ie.LoadNetwork(cnnNetwork, allDevices, {}); +``` +Beyond trivial "CPU", "GPU", "HDDL" and so on, when multiple instances of a device are available the names are more qualified. +For example this is how two Intel® Movidius™ Myriad™ X sticks are listed with the hello_query_sample: +``` +... + Device: MYRIAD.1.2-ma2480 +... + Device: MYRIAD.1.4-ma2480 +``` +So the explicit configuration to use both would be "MULTI:MYRIAD.1.2-ma2480,MYRIAD.1.4-ma2480". +Accordingly, the code that loops over all available devices of "MYRIAD" type only is below: +```cpp + Core ie; + std::string allDevices = "MULTI:"; + std::vector myriadDevices = ie->GetMetric("MYRIAD", METRIC_KEY(myriadDevices))); + for (int i = 0; i < myriadDevices.size(); ++i) { + allDevices += std::string("MYRIAD.") + + myriadDevices[i] + + std::string(i < (myriadDevices.size() -1) ? "," : ""); + } + ExecutableNetwork exeNetwork = ie.LoadNetwork(cnnNetwork, allDevices, {}); +``` + + +## Configuring the Individual Devices and Creating the Multi-Device On Top +As discussed in the first section, you shall configure each individual device as usual and then just create the "MULTI" device on top: +```cpp +#include +// configure the HDDL device first +Core ie; +ie.SetConfig(hddl_config, "HDDL"); +// configure the GPU device +ie.SetConfig(gpu_config, "GPU"); +// load the network to the multi-device, while specifying the configuration (devices along with priorities): +ExecutableNetwork exeNetwork = ie.LoadNetwork(cnnNetwork, "MULTI", {{MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES, "HDDL,GPU"}}); +// new metric allows to query the optimal number of requests: +uint32_t nireq = exeNetwork.GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as(); +``` +Alternatively, you can combine all the individual device settings into single config and load that, allowing the multi-device plugin to parse and apply that to the right devices. See code example in the next section. + +Notice that while the performance of accelerators combines really well with multi-device, the CPU+GPU execution poses some performance caveats, as these devices share the power, bandwidth and other resources. For example it is recommended to enable the GPU throttling hint (which save another CPU thread for the CPU inference). +See section of the [Using the multi-device with OpenVINO samples and benchmarking the performance](#using-the-multi-device-with-openvino-samples-and-benchmarking-the-performance) below. + +## Querying the Optimal Number of Inference Requests +Notice that until R2 you had to calculate number of requests in your application for any device, e.g. you had to know that Intel® Vision Accelerator Design with Intel® Movidius™ VPUs required at least 32 inference requests to perform well. Now you can use the new GetMetric API to query the optimal number of requests. 
Similarly, when using the multi-device, you don't need to sum over the included devices yourself; you can query the metric directly:
+```cpp
+// 'device_name' can be "MULTI:HDDL,GPU" to configure the multi-device to use HDDL and GPU
+ExecutableNetwork exeNetwork = ie.LoadNetwork(cnnNetwork, device_name, full_config);
+// a new metric allows you to query the optimal number of requests:
+uint32_t nireq = exeNetwork.GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
+```
+
+## Using the Multi-Device with OpenVINO Samples and Benchmarking the Performance
+Notice that every OpenVINO sample that supports the "-d" (which stands for "device") command-line option transparently accepts the multi-device.
+The [Benchmark Application](../../../inference-engine/samples/benchmark_app/README.md) is the best reference for the optimal usage of the multi-device. As discussed multiple times earlier, you don't need to set up the number of requests, CPU streams, or threads, as the application provides optimal out-of-the-box performance.
+Below is an example command line to evaluate HDDL+GPU performance with it:
+```bash
+$ ./benchmark_app -d MULTI:HDDL,GPU -m <path_to_model> -i <path_to_input> -niter 1000
+```
+Notice that you can use the FP16 IR to work with the multi-device: the CPU automatically upconverts it to FP32, and the rest of the devices support it natively.
+Also notice that no demos are (yet) fully optimized for the multi-device, by means of supporting the OPTIMAL_NUMBER_OF_INFER_REQUESTS metric, using the GPU streams/throttling, and so on.
+
+## See Also
+* [Supported Devices](Supported_Devices.md)
diff --git a/docs/IE_DG/supported_plugins/MYRIAD.md b/docs/IE_DG/supported_plugins/MYRIAD.md
new file mode 100644
index 00000000000000..5fbee431ee1c92
--- /dev/null
+++ b/docs/IE_DG/supported_plugins/MYRIAD.md
@@ -0,0 +1,89 @@
+# MYRIAD Plugin {#openvino_docs_IE_DG_supported_plugins_MYRIAD}
+
+## Introducing MYRIAD Plugin
+
+The Inference Engine MYRIAD plugin is developed for inference of neural networks on Intel® Movidius™ Neural Compute Stick and Intel® Neural Compute Stick 2.
+
+## Installation on Linux* OS
+
+For installation instructions, refer to the [Installation Guide for Linux*](../../../inference-engine/samples/benchmark_app/README.md).
+
+## Installation on Windows* OS
+
+For installation instructions, refer to the [Installation Guide for Windows*](../../../inference-engine/samples/benchmark_app/README.md).
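+
+Once the plugin is installed, a model is loaded to the device through the regular Inference Engine `Core` workflow. Below is a minimal sketch of that flow (the model path is a placeholder; note that VPU devices expect an FP16 IR, see [Supported Devices](Supported_Devices.md)):
+```cpp
+#include <inference_engine.hpp>
+
+int main() {
+    InferenceEngine::Core ie;
+    // read an FP16 IR produced by the Model Optimizer (the path is a placeholder)
+    InferenceEngine::CNNNetwork network = ie.ReadNetwork("model.xml");
+    // load the network to the Neural Compute Stick through the MYRIAD plugin
+    InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "MYRIAD");
+    // create and run a synchronous inference request
+    InferenceEngine::InferRequest request = exeNetwork.CreateInferRequest();
+    request.Infer();
+    return 0;
+}
+```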
+ +## Supported networks + +The Inference Engine MYRIAD plugin supports the following networks: + +**Caffe\***: +* AlexNet +* CaffeNet +* GoogleNet (Inception) v1, v2, v4 +* VGG family (VGG16, VGG19) +* SqueezeNet v1.0, v1.1 +* ResNet v1 family (18\*\* \*\*\*, 50, 101, 152) +* MobileNet (mobilenet-v1-1.0-224, mobilenet-v2) +* Inception ResNet v2 +* DenseNet family\*\* (121,161,169,201) +* SSD-300, SSD-512, SSD-MobileNet, SSD-GoogleNet, SSD-SqueezeNet + +**TensorFlow\***: +* AlexNet +* Inception v1, v2, v3, v4 +* Inception ResNet v2 +* MobileNet v1, v2 +* ResNet v1 family (50, 101, 152) +* ResNet v2 family (50, 101, 152) +* SqueezeNet v1.0, v1.1 +* VGG family (VGG16, VGG19) +* Yolo family (yolo-v2, yolo-v3, tiny-yolo-v1, tiny-yolo-v2, tiny-yolo-v3) +* faster_rcnn_inception_v2, faster_rcnn_resnet101 +* ssd_mobilenet_v1 +* DeepLab-v3+ + +**MXNet\***: +* AlexNet and CaffeNet +* DenseNet family\*\* (121,161,169,201) +* SqueezeNet v1.1 +* MobileNet v1, v2 +* NiN +* ResNet v1 (101, 152) +* ResNet v2 (101) +* SqueezeNet v1.1 +* VGG family (VGG16, VGG19) +* SSD-Inception-v3, SSD-MobileNet, SSD-ResNet-50, SSD-300 + +\*\* Network is tested on Intel® Movidius™ Neural Compute Stick with BatchNormalization fusion optimization disabled during Model Optimizer import + +\*\*\* Network is tested on Intel® Neural Compute Stick 2 with BatchNormalization fusion optimization disabled during Model Optimizer import + +## Supported Configuration Parameters + +See VPU common configuration parameters for the [VPU Plugins](VPU.md). +When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. + +In addition to common parameters, the MYRIAD plugin accepts the following options: + +| Parameter Name | Parameter Values | Default | Description | +| :--- | :--- | :--- | :--- | +| `KEY_VPU_MYRIAD_PLATFORM` | empty string/`VPU_MYRIAD_2450`/`VPU_MYRIAD_2480` | empty string | If set, the plugin will use a device with specific platform to allocate a network. | +| `KEY_VPU_MYRIAD_PROTOCOL` | empty string/`VPU_MYRIAD_USB`/`VPU_MYRIAD_PCIE` | empty string | If set, the plugin will use a device with specific protocol to allocate a network. | +| `KEY_VPU_MYRIAD_FORCE_RESET` | `YES`/`NO` | `NO` | Enables force reset of all booted devices when new ExecutableNetwork is created.
This is a plugin scope option and must be used with the plugin's SetConfig method only.
See Device allocation section for details. | +| `KEY_VPU_PLATFORM` | empty string/`VPU_2450`/`VPU_2480` | empty string | **Deprecated** Use `KEY_VPU_MYRIAD_PLATFORM` instead.
If set, the plugin will use a device with specific platform to allocate a network. | +| `KEY_VPU_FORCE_RESET` | `YES`/`NO` | `NO` | **Deprecated** Use `KEY_VPU_MYRIAD_FORCE_RESET` instead.
Enables force reset of all booted devices when new ExecutableNetwork is created.
This is a plugin scope option and must be used with the plugin's SetConfig method only.
See Device allocation section for details. | + +## Device allocation   + +Each `IExecutableNetwork` instance tries to allocate new device on `InferenceEngine::Core::LoadNetwork`, but if all available devices are already allocated it will use the one with the minimal number of uploaded networks. +The maximum number of networks single device can handle depends on device memory capacity and the size of the networks. + +If `KEY_VPU_MYRIAD_FORCE_RESET` option is set to `YES` the plugin will reset all VPU devices in the system. + +Single device cannot be shared across multiple processes. + +## See Also + +* [Supported Devices](Supported_Devices.md) +* [VPU Plugins](VPU.md) +* [Intel® Neural Compute Stick 2 Get Started](https://software.intel.com/en-us/neural-compute-stick/get-started) diff --git a/docs/IE_DG/supported_plugins/Supported_Devices.md b/docs/IE_DG/supported_plugins/Supported_Devices.md new file mode 100644 index 00000000000000..7e4111837a14bb --- /dev/null +++ b/docs/IE_DG/supported_plugins/Supported_Devices.md @@ -0,0 +1,263 @@ +Supported Devices {#openvino_docs_IE_DG_supported_plugins_Supported_Devices} +================== + +The Inference Engine can infer models in different formats with various input and output formats. This section provides supported and optimal configurations per device. + +> **NOTE**: With OpenVINO™ 2020.4 release, Intel® Movidius™ Neural Compute Stick is no longer supported. + +The Inference Engine provides unique capabilities to infer deep learning models on the following device types with corresponding plugins: + +| Plugin | Device types | +|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------| +|[GPU plugin](CL_DNN.md) |Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics | +|[CPU plugin](CPU.md) |Intel® Xeon® with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and AVX512_BF16, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® Streaming SIMD Extensions (Intel® SSE) | +|[FPGA plugin](FPGA.md) (available in the Intel® Distribution of OpenVINO™ toolkit) |Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Speed Grade 2), Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | +|[VPU plugins](VPU.md) (available in the Intel® Distribution of OpenVINO™ toolkit) |Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X, Intel® Vision Accelerator Design with Intel® Movidius™ VPUs | +|[GNA plugin](GNA.md) (available in the Intel® Distribution of OpenVINO™ toolkit) |Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver J5005 Processor, Intel® Pentium® Silver N5000 Processor, Intel® Celeron® J4005 Processor, Intel® Celeron® J4105 Processor, Intel® Celeron® Processor N4100, Intel® Celeron® Processor N4000, Intel® Core™ i3-8121U Processor, Intel® Core™ i7-1065G7 Processor, Intel® Core™ i7-1060G7 Processor, Intel® Core™ i5-1035G4 Processor, Intel® Core™ i5-1035G7 Processor, Intel® Core™ i5-1035G1 Processor, Intel® Core™ i5-1030G7 Processor, Intel® Core™ i5-1030G4 Processor, Intel® Core™ i3-1005G1 Processor, Intel® Core™ i3-1000G1 Processor, Intel® Core™ i3-1000G4 Processor| +|[Multi-Device plugin](MULTI.md) |Multi-Device plugin enables simultaneous inference of the same network on several Intel® devices in parallel | 
+|[Heterogeneous plugin](HETERO.md) |Heterogeneous plugin enables automatic inference splitting between several Intel® devices (for example if a device doesn't [support certain layers](#supported-layers)). | + +## Supported Configurations + +The Inference Engine can inference models in different formats with various input and output formats. +This chapter provides supported and optimal configurations for each plugin. + +### Terminology + +| Acronym/Term | Description | +| :-----------------| :---------------------------------------------| +| DL | Deep Learning | +| FP32 format | Single-precision floating-point format | +| BF16 format | Brain floating-point format | +| FP16 format | Half-precision floating-point format | +| I16 format | 2-byte signed integer format | +| I8 format | 1-byte signed integer format | +| U16 format | 2-byte unsigned integer format | +| U8 format | 1-byte unsigned integer format | + +NHWC, NCHW - Image data layout. Refers to the representation of batches of images. +NCDHW - Images sequence data layout. + +* N - Number of images in a batch +* D - Depth. Depend on model it could be spatial or time dimension +* H - Number of pixels in the vertical dimension +* W - Number of pixels in the horizontal dimension +* C - Number of channels + +CHW, NC, C - Tensor memory layout. +For example, the CHW value at index (c,h,w) is physically located at index (c\*H+h)\*W+w, for others by analogy + +### Supported Model Formats + +|Plugin |FP32 |FP16 |I8 | +|:-------------|:----------------------:|:----------------------:|:----------------------:| +|CPU plugin |Supported and preferred |Supported |Supported | +|GPU plugin |Supported |Supported and preferred |Supported\* | +|FPGA plugin |Supported |Supported |Not supported | +|VPU plugins |Not supported |Supported |Not supported | +|GNA plugin |Supported |Supported |Not supported | +
\* - currently, only a limited set of topologies might benefit from enabling I8 models on GPU
+For [Multi-Device](MULTI.md) and [Heterogeneous](HETERO.md) execution
+the supported model formats depend on the actual underlying devices. _Generally, FP16 is preferable as it is most ubiquitous and performant_.
+
+### Supported Input Precision
+
+|Plugin        |FP32      |FP16           |U8             |U16            |I8            |I16            |
+|:-------------|:--------:|:-------------:|:-------------:|:-------------:|:------------:|:-------------:|
+|CPU plugin    |Supported |Not supported  |Supported      |Supported      |Not supported |Supported      |
+|GPU plugin    |Supported |Supported\*    |Supported\*    |Supported\*    |Not supported |Supported\*    |
+|FPGA plugin   |Supported |Supported\*    |Supported      |Supported      |Not supported |Supported      |
+|VPU plugins   |Supported |Supported      |Supported      |Not supported  |Not supported |Not supported  |
+|GNA plugin    |Supported |Not supported  |Supported      |Not supported  |Supported     |Supported      |
+
+
\* - Supported via `SetBlob` only, `GetBlob` returns FP32
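+
+As a brief illustration of how an application requests one of these input precisions before the network is loaded to a device, here is a minimal sketch (the model path and the `CPU` device are placeholders; see the configuration steps referenced below for the full workflow):
+```cpp
+#include <inference_engine.hpp>
+
+int main() {
+    InferenceEngine::Core ie;
+    InferenceEngine::CNNNetwork network = ie.ReadNetwork("model.xml");
+    // request U8 input precision for every input before compiling the network for the device
+    InferenceEngine::InputsDataMap inputsInfo(network.getInputsInfo());
+    for (auto & item : inputsInfo) {
+        item.second->setPrecision(InferenceEngine::Precision::U8);
+    }
+    InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "CPU");
+    return 0;
+}
+```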
+For [Multi-Device](MULTI.md) and [Heterogeneous](HETERO.md) execution +the supported input precision depends on the actual underlying devices. _Generally, U8 is preferable as it is most ubiquitous_. + +### Supported Output Precision + +|Plugin |FP32 |FP16 | +|:-------------|:--------:|:------------:| +|CPU plugin |Supported |Not supported | +|GPU plugin |Supported |Supported | +|FPGA plugin |Supported |Supported | +|VPU plugins |Supported |Supported | +|GNA plugin |Supported |Not supported | +For [Multi-Device](MULTI.md) and [Heterogeneous](HETERO.md) execution +the supported output precision depends on the actual underlying devices. _Generally, FP32 is preferable as it is most ubiquitous_. + +### Supported Input Layout + +|Plugin |NCDHW |NCHW |NHWC |NC | +|:-------------|:------------:|:------------:|:------------:|:------------:| +|CPU plugin |Supported |Supported |Supported |Supported | +|GPU plugin |Supported |Supported |Supported |Supported | +|FPGA plugin |Not supported |Supported |Supported |Not supported | +|VPU plugins |Not supported |Supported |Supported |Supported | +|GNA plugin |Not supported |Not supported |Not supported |Supported | + +### Supported Output Layout + +|Number of dimensions|5 |4 |3 |2 |1 | +|:-------------------|:---:|:---:|:---:|:---:|:---:| +|Layout |NCDHW|NCHW |CHW |NC |C | + +For setting relevant configuration, refer to the +[Integrate with Customer Application New Request API](../Integrate_with_customer_application_new_API.md) topic +(step 3 "Configure input and output"). + +### Supported Layers +The following layers are supported by the plugins and by [Shape Inference feature](../ShapeInference.md): + +| Layers | GPU | CPU | VPU | GNA | FPGA | ShapeInfer | +|:-------------------------------|:-------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------------:| +| Abs | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Acos | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Acosh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Activation-Clamp | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Activation-ELU | Supported |Supported\*\*\*| Supported | Not Supported | Supported | Supported | +| Activation-Exp | Supported |Supported\*\*\*| Not Supported | Supported | Not Supported | Supported | +| Activation-Leaky ReLU | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Activation-Not | Supported |Supported\*\*\*| Not Supported | Not Supported | Not Supported | Supported | +| Activation-PReLU | Supported |Supported\*\*\*| Supported | Not Supported | Supported | Supported | +| Activation-ReLU | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Activation-ReLU6 | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Activation-Sigmoid/Logistic | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported | +| Activation-TanH | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported | +| ArgMax | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Asin | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Asinh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Atan | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | 
Supported | +| Atanh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| BatchNormalization | Supported | Supported | Supported | Not Supported | Supported\* | Supported | +| BinaryConvolution | Supported | Supported | Not Supported | Not Supported | Not Supported | Supported | +| Broadcast | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Ceil | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Concat | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Const | Supported | Supported | Supported | Supported | Not Supported | Not Supported | +| Convolution-Dilated | Supported | Supported | Supported | Not Supported | Supported | Supported | +| Convolution-Dilated 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| Convolution-Grouped | Supported | Supported | Supported | Not Supported | Supported | Supported | +| Convolution-Grouped 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| Convolution-Ordinary | Supported | Supported | Supported | Supported\* | Supported | Supported | +| Convolution-Ordinary 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| Cos | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Cosh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Crop | Supported | Supported | Supported | Supported | Not Supported | Supported | +| CTCGreedyDecoder | Supported\*\* | Supported\*\* | Supported\* | Not Supported | Not Supported | Supported | +| Deconvolution | Supported | Supported | Supported | Not Supported | Supported\* | Supported | +| Deconvolution 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| DeformableConvolution | Supported | Supported | Not Supported | Not Supported | Not Supported | Supported | +| DepthToSpace | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| DetectionOutput | Supported | Supported\*\* | Supported\* | Not Supported | Not Supported | Supported | +| Eltwise-And | Supported |Supported\*\*\*| Not Supported | Not Supported | Not Supported | Supported | +| Eltwise-Add | Supported |Supported\*\*\*| Not Supported | Not Supported | Supported | Supported | +| Eltwise-Div | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Equal | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-FloorMod | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Greater | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-GreaterEqual | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Less | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-LessEqual | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-LogicalAnd | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-LogicalOr | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-LogicalXor | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Max | 
Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Min | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Mul | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported | +| Eltwise-NotEqual | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Pow | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Prod | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported | +| Eltwise-SquaredDiff | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Eltwise-Sub | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Eltwise-Sum | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Erf | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Exp | Supported | Supported | Not Supported | Supported | Not Supported | Supported | +| FakeQuantize | Not Supported | Supported | Not Supported | Not Supported | Not Supported | Supported | +| Fill | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Flatten | Supported | Supported | Supported | Not Supported | Not Supported | Supported | +| Floor | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| FullyConnected (Inner Product) | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | +| Gather | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| GatherTree | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Gemm | Supported | Supported | Supported | Not Supported | Not Supported | Supported | +| GRN | Supported\*\* | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| HardSigmoid | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Interp | Supported\*\* | Supported\*\* | Supported | Not Supported | Not Supported | Supported\* | +| Log | Supported | Supported\*\* | Supported | Supported | Not Supported | Supported | +| LRN (Norm) | Supported | Supported | Supported | Not Supported | Supported | Supported | +| LSTMCell | Supported | Supported | Supported | Supported | Not Supported | Not Supported | +| GRUCell | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| RNNCell | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| LSTMSequence | Supported | Supported | Supported | Not Supported | Not Supported | Not Supported | +| GRUSequence | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| RNNSequence | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| LogSoftmax | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Not Supported | +| Memory | Not Supported | Supported | Not Supported | Supported | Not Supported | Supported | +| MVN | Supported | Supported\*\* | Supported\* | Not Supported | Not Supported | Supported | +| Neg | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| NonMaxSuppression | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Normalize | Supported | Supported\*\* | Supported\* | Not Supported | Not 
Supported | Supported | +| OneHot | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Pad | Supported | Supported\*\* | Supported\* | Not Supported | Not Supported | Supported | +| Permute | Supported | Supported | Supported | Supported\* | Not Supported | Supported | +| Pooling(AVG,MAX) | Supported | Supported | Supported | Supported | Supported | Supported | +| Pooling(AVG,MAX) 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| Power | Supported | Supported\*\* | Supported | Supported\* | Supported\* | Supported | +| PowerFile | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Not Supported | +| PriorBox | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| PriorBoxClustered | Supported\*\* | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Proposal | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| PSROIPooling | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Range | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Reciprocal | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceAnd | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceL1 | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceL2 | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceLogSum | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceLogSumExp | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceMax | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceMean | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceMin | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceOr | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceProd | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceSum | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ReduceSumSquare | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| RegionYolo | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| ReorgYolo | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Resample | Supported | Supported\*\* | Supported | Not Supported | Supported\* | Supported | +| Reshape | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported\* | +| ReverseSequence | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| RNN | Not Supported | Supported | Supported | Not Supported | Not Supported | Not Supported | +| ROIPooling | Supported\* | Supported | Supported | Not Supported | Not Supported | Supported | +| ScaleShift | Supported |Supported\*\*\*| Supported\* | Supported | Supported | Supported | +| ScatterUpdate | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Select | Supported | Supported | Supported | Not Supported | Not 
Supported | Supported | +| Selu | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| ShuffleChannels | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Sign | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | +| Sin | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Sinh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| SimplerNMS | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Slice | Supported |Supported\*\*\*| Supported | Supported | Supported\* | Supported | +| SoftMax | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| Softplus | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Softsign | Supported | Supported\*\* | Not Supported | Supported | Not Supported | Supported | +| SpaceToDepth | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| SpatialTransformer | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Split | Supported |Supported\*\*\*| Supported | Supported | Supported\* | Supported | +| Squeeze | Supported | Supported\*\* | Supported | Supported | Not Supported | Supported | +| StridedSlice | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Tan | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| TensorIterator | Not Supported | Supported | Supported | Supported | Not Supported | Not Supported | +| Tile | Supported\*\* |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | +| TopK | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | +| Unpooling | Supported | Not Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| Unsqueeze | Supported | Supported\*\* | Supported | Supported | Not Supported | Supported | +| Upsampling | Supported | Not Supported | Not Supported | Not Supported | Not Supported | Not Supported | + +\*- support is limited to the specific parameters. Refer to "Known Layers Limitation" section for the device [from the list of supported](Supported_Devices.md). + +\*\*- support is implemented via [Extensibility mechanism](../Extensibility_DG/Intro.md). + +\*\*\*- supports NCDHW layout. diff --git a/docs/IE_DG/supported_plugins/VPU.md b/docs/IE_DG/supported_plugins/VPU.md new file mode 100644 index 00000000000000..7c04290f7dd16d --- /dev/null +++ b/docs/IE_DG/supported_plugins/VPU.md @@ -0,0 +1,104 @@ +# VPU Plugins {#openvino_docs_IE_DG_supported_plugins_VPU} + +This chapter provides information on the Inference Engine plugins that enable inference of deep learning models on the supported VPU devices: + +* Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X — Supported by the [MYRIAD Plugin](MYRIAD.md) +* Intel® Vision Accelerator Design with Intel® Movidius™ VPUs — Supported by the [HDDL Plugin](HDDL.md) + +> **NOTE**: With OpenVINO™ 2020.4 release, Intel® Movidius™ Neural Compute Stick powered by the Intel® Movidius™ Myriad™ 2 is no longer supported. + +## Known Layers Limitations + +* `'ScaleShift'` layer is supported for zero value of `'broadcast'` attribute only. +* `'CTCGreedyDecoder'` layer works with `'ctc_merge_repeated'` attribute equal 1. 
+* `'DetectionOutput'` layer works with zero values of `'interpolate_orientation'` and `'num_orient_classes'` parameters only. +* `'MVN'` layer uses fixed value for `'eps'` parameters (1e-9). +* `'Normalize'` layer uses fixed value for `'eps'` parameters (1e-9) and is supported for zero value of `'across_spatial'` only. +* `'Pad'` layer works only with 4D tensors. + +## Optimizations + +VPU plugins support layer fusion and decomposition. + +### Layer Fusion + +#### Fusing Rules + +Certain layers can be merged into Convolution, ReLU, and Eltwise layers according to the patterns below: + +- Convolution + - Convolution + ReLU → Convolution + - Convolution + Clamp → Convolution + - Convolution + LeakyReLU → Convolution + - Convolution (3x3, stride=1, padding=1) + Pooling (2x2, stride=2, padding=0) → Convolution + +- Pooling + ReLU → Pooling + +- FullyConnected + ReLU → FullyConnected + +- Eltwise + - Eltwise + ReLU → Eltwise + - Eltwise + LeakyReLU → Eltwise + - Eltwise + Clamp → Eltwise + +#### Joining Rules + +> **NOTE**: Application of these rules depends on tensor sizes and resources available. + +Layers can be joined when the two conditions below are met: +- Layers are located on topologically independent branches. +- Layers can be executed simultaneously on the same hardware units. + +### Decomposition Rules + +- Convolution and Pooling layers are tiled resulting in the following pattern: + - A Split layer that splits tensors into tiles + - A set of tiles, optionally with service layers like Copy + - Depending on a tiling scheme, a Concatenation or Sum layer that joins all resulting tensors into one and restores the full blob that contains the result of a tiled operation + + Names of tiled layers contain the `@soc=M/N` part, where `M` is the tile number and `N` is the number of tiles: + ![](../img/yolo_tiny_v1.png) + +> **NOTE**: Nominal layers, such as Shrink and Expand, are not executed. + +> **NOTE**: VPU plugins can add extra layers like Copy. + + +## VPU Common Configuration Parameters + +The VPU plugins supports the configuration parameters listed below. +The parameters are passed as `std::map` on `InferenceEngine::Core::LoadNetwork` +or `InferenceEngine::Core::SetConfig`. +When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix. + +| Parameter Name | Parameter Values | Default | Description | +| :--- | :--- | :--- | :--- | +| `KEY_VPU_HW_STAGES_OPTIMIZATION` | `YES`/`NO` | `YES` | Turn on HW stages usage
Applicable for Intel Movidius Myriad X and Intel Vision Accelerator Design devices only. | +| `KEY_VPU_COMPUTE_LAYOUT` | `VPU_AUTO`, `VPU_NCHW`, `VPU_NHWC` | `VPU_AUTO` | Specify internal input and output layouts for network layers. | +| `KEY_VPU_PRINT_RECEIVE_TENSOR_TIME` | `YES`/`NO` | `NO` | Add device-side time spent waiting for input to PerformanceCounts.
See the Data Transfer Pipelining section for details. |
+| `KEY_VPU_IGNORE_IR_STATISTIC` | `YES`/`NO` | `NO` | The VPU plugin can use statistics present in the IR to try to improve calculation precision.
Enable this option if you do not want these statistics to be used. |
+| `KEY_VPU_CUSTOM_LAYERS` | path to XML file | empty string | This option allows you to pass an XML file with custom layer bindings.
If layer is present in such file, it would be used during inference even if the layer is natively supported. | + + +## Data Transfer Pipelining   + +MYRIAD plugin tries to pipeline data transfer to/from device with computations. +While one infer request is executed the data for next infer request can be uploaded to device in parallel. +Same applicable for result downloading. + +`KEY_VPU_PRINT_RECEIVE_TENSOR_TIME` configuration parameter can be used to check the efficiency of current pipelining. +The new record in performance counters will show the time that device spent waiting for input before starting the inference. +In perfect pipeline this time should be near to zero, which means that the data was already transferred when new inference started. + +## Troubleshooting + +**Get the following message when running inference with the VPU plugin: "[VPU] Cannot convert layer due to unsupported layer type "** + +This means that your topology has a layer that is unsupported by your target VPU plugin. To resolve this issue, you can implement the custom layer for the target device using the [Inference Engine Extensibility mechanism](../Extensibility_DG/Intro.md). Or, to quickly get a working prototype, you can use the heterogeneous scenario with the default fallback policy (see the [HETERO Plugin](HETERO.md) section). Use the HETERO plugin with a fallback device that supports this layer, for example, CPU: `HETERO:MYRIAD,CPU`. +For a list of VPU supported layers, see the Supported Layers section of the [Supported Devices](Supported_Devices.md) topic. + + +## See Also + +* [Supported Devices](Supported_Devices.md) +* [Intel® Neural Compute Stick 2 Get Started](https://software.intel.com/en-us/neural-compute-stick/get-started) diff --git a/docs/Inference_Engine_Development_Procedure/CONTRIBUTING.md b/docs/Inference_Engine_Development_Procedure/CONTRIBUTING.md new file mode 100644 index 00000000000000..b121254303ebf9 --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/CONTRIBUTING.md @@ -0,0 +1,130 @@ +# Inference Engine development configuration document {#openvino_docs_Inference_Engine_Development_Procedure_CONTRIBUTING} + +To create MakeFiles use following process or run build-after-clone.sh script located in the root +folder if you use Ubuntu 16.04. +To create Visual Studio project run create_vs_proj_x64.cmd from scripts folder. + +## Setting up the environment for development + +1. Update/init submodules bu running +```bash +git submodule init +git submodule update --recursive +``` +2. Install [Git LFS](https://git-lfs.github.com) extension. It's required to download models + from the [repo](https://gitlab-icv.inn.intel.com/inference-engine/models-ir) + Below is step by step guide to install Git LFS. + + 2.1 Linux + ```bash + wget https://github.com/git-lfs/git-lfs/releases/download/v2.3.4/git-lfs-linux-amd64-2.3.4.tar.gz + tar xf git-lfs-linux-amd64-2.3.4.tar.gz + cd git-lfs-2.3.4 + sudo PREFIX=/usr/ ./install.sh + git config --global http.sslverify false + ``` + 2.1 Windows + 2.1.1 Download + [Git LFS](https://github.com/git-lfs/git-lfs/releases/download/v2.3.4/git-lfs-windows-2.3.4.exe) + and install it. + 2.1.2 Run console command + ```bash + git config --global http.sslverify false + ``` + > **NOTE**: HTTPS protocol is used to download files by Git LFS. 
You either have to + > disable HTTPS proxy for local resources like GitLab server gitlab-icv.inn.intel.com by setting + > `no_proxy=localhost,gitlab-icv.inn.intel.com` or switch to `http://proxy-chain.intel.com:911` proxy server, + > because it disables proxy for local servers automatically. + +3. Use Cmake to fetch project dependencies and create Unix makefiles + ```bash + mkdir build + cd build + ``` + There are number of options which turn on some components during builds and initiate downloading of the models + + `-DENABLE_TESTS=ON` - to build functional and behavior tests + this will copy necessary dependencies to ./temp folder, or to ENV.DL_SDK_TEMP folder if environment variable set + `-DENABLE_FUNCTIONAL_TESTS=ON` - to build functional tests + `-DCMAKE_BUILD_TYPE=Debug/Release` - to point debug or release configuration. Missing this option will generate something between + Release and Debug and you might be surprised by certain aspects of the compiled binaries + `-DENABLE_PRIVATE_MODELS=ON` - copy private models from https://gitlab-icv.inn.intel.com/inference-engine-models/private-ir with restricted access + + The full command line enough for development is following: + ```bash + cmake -DENABLE_TESTS=ON -DENABLE_FUNCTIONAL_TESTS=ON -DCMAKE_BUILD_TYPE=Debug .. + ``` + + The full command line enough for validation before push to the server + ```bash + cmake -DENABLE_TESTS=ON -DENABLE_FUNCTIONAL_TESTS=ON -DCMAKE_BUILD_TYPE=Release .. + ``` + +4. Build project and tests: +```bash +make -j16 +``` + +5. To build documentation: + a. Install doxygen and graphviz: + ```bash + apt-get install doxygen && apt-get install graphviz && apt-get install texlive + ``` + b. Go to the documentation build directory: + ```bash + cd to scripts/build_documentation + ``` + c. Run the `build_docs.sh` script: + * To build the documentation set that includes documentation from the current branch of the + `inference-engine` repo and specific branches of the `openvino-documentation`, `models` and + `model-optimizer-tensorflow` repos, specify three branches as parameters: + ```sh + ./build_docs.sh ovinodoc: models: mo: + ``` + * To build the documentation set that includes only documentation from the current branch of the + `inference-engine` repo, run the script with no parameters: + ```sh + ./build_docs.sh + ``` + + > **NOTE**: You should run the script either with specifying all three parameters or without any parameters. + + d. Find the generated documentation in the `root_directory/doc` directory + + > **NOTE**: If you make any changes in the documentation source files, it is recommended to cleanup the + > documentation build directory and continue with step 3: + >```sh + > cd scripts/build_documentation + > ./clean.sh + > ``` + + > **NOTE**: The scripts for building documentation use SSH for cloning repositories. Please, make sure that + you have + > added your SSH key to git-lab. For more information about it, please visit the + > [instructions page](https://gitlab-icv.inn.intel.com/help/ssh/README.md) + + +## Compilers supported and verified + +All others may be compatible but Inference Engine does not guarantee that. 
+ +* Linux : gcc(5.4)\*, clang(3.9) +* MacOS : gcc(5.4), clang(3.9)\* +* Windows: MSVC(14), ICC(17.0)\* + \* - is target compiler for platform and used for public external drops + +## TeamCity CI + +TeamCity CI server is available +[here](https://teamcity01-ir.devtools.intel.com/project.html?projectId=DeepLearningSdk_DeepLearningSdk_InferenceEngine) + +To get access to the server, go to +[AGS](https://ags.intel.com/identityiq/lcm/requestAccess.jsf) and search "DevTools -- INDE xOS - Project Developer". + + +## Troubleshooting steps + +1. **Issue**: Build of the "mkldnn" project failed on Windows with "Error MSB6003 The specified task + executable "cmd.exe" could not be run. The working directory "\mkl\tools" does not exist". + **Solution**: open InferenceEngine.sln -> goto "mkldnn" project + Properties -> Configuration Properties -> Intel Performance Libraries -> Use Intel MKL -> choose "No" \ No newline at end of file diff --git a/docs/Inference_Engine_Development_Procedure/COVERAGE.md b/docs/Inference_Engine_Development_Procedure/COVERAGE.md new file mode 100644 index 00000000000000..ee9642150af401 --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/COVERAGE.md @@ -0,0 +1,58 @@ +# Inference Engine coverage report build {#openvino_docs_Inference_Engine_Development_Procedure_COVERAGE} + +The coverage report is generated using Lcov tool and based on profile data generated by GCC. +The generated reports are in HTML form and located in `/coverage`. The reports are generated for the following components: + +1. `inference_engine` - main Inference Engine library +1. `inference_engine_legacy` - legacy Inference Engine library +1. `inference_engine_ir_reader` - Inference Engine IR reader library +1. `low_precision_transformations` - library with Low Precision transformations. +1. `inference_engine_transformations` - Ngraph-based transformation for Inference Engine. +1. `preprocessing` - Inference Engine G-API based preprocessing plugin. +1. Inference Engine open-sources plugins: + - `hetero_plugin` - Heterogeneous plugin. + - `multi_device` - Multi device plugin. + - `cldnn_engine` - GPU plugin. + - `mkldnn_plugin` - CPU plugin. + - `gna_plugin` - GNA plugin. + +## Build with profiling data support + +To build coverage report, compile DLDT with an additional CMake option `-DENABLE_COVERAGE=ON`: + +```bash +$ cmake -DENABLE_COVERAGE=ON . +``` + +And build DLDT as usual. + +## Generate coverage report + +In order to generate coverage reports, first of all, the tests must be run. Depending on how many tests are run, the better covegare percentage can be achieved. E.g. for `inference_engine` component, `InferenceEngineUnitTests`, `ieUnitTests`, `ieFuncTests` must be run as well as plugin tests. + +```bash +$ ctest -V -L IE +``` + +After sufficient number of tests are executed, the coverage numbers can be calculated. 
In order to do this, run: + +```bash +$ make ie_coverage +``` + +The following tree of reports are generated: + +```bash +$ find coverage/ -maxdepth 2 -name index.html +coverage/hetero_plugin/index.html +coverage/inference_engine/index.html +coverage/inference_engine_ir_reader/index.html +coverage/inference_engine_legacy/index.html +coverage/low_precision_transformations/index.html +coverage/mkldnn_plugin/index.html +coverage/multi_device/index.html +coverage/preprocessing/index.html +coverage/inference_engine_transformations/index.html +coverage/gna_plugin/index.html +coverage/cldnn_engine/index.html +``` \ No newline at end of file diff --git a/docs/Inference_Engine_Development_Procedure/IE_Dev_Procedure.md b/docs/Inference_Engine_Development_Procedure/IE_Dev_Procedure.md new file mode 100644 index 00000000000000..2be7f8dcb476be --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/IE_Dev_Procedure.md @@ -0,0 +1,167 @@ +# Development Flow for Adding New Changes to the Inference Engine {#openvino_docs_Inference_Engine_Development_Procedure_IE_Dev_Procedure} + +## Develop Your Feature +1. Create a branch based on the latest version of a target branch (`master` or `release`). + Use the following branch naming: + * *feature//* - stands for temporary work area for creating a feature or performing bug fixes + * *scout/* - is used for shared development if several developers work on one feature + **IMPORTANT**: Do not use long branch name, because it may lead to failed CI jobs on Windows. Name length must be less 40 characters. +2. Commit changes on your branch and push it to remote. + + +## Create a Merge Request +1. Go to the [GitLab\\\* merge request page](https://gitlab-icv.inn.intel.com/inference-engine/dldt/merge_requests) + +2. Create a merge request by pressing on **New Merge Request** or **Create merge request**. Choose an `inference-engine` project from the drop-down list. +
+ ![mr1] + + a. Fill the **Title** and **Description** fields with meaningful information. This information should be enough to understand the changes you made: + + Use this template for the **Title** field. If you did not finish your work on current bug/feature, add `[WIP]` to the beginning of the title: + ``` + [Domain] Small description for merge request (can use first string from the commit message) + ``` + Use domain from the following list: + * [IE COMMON] - if a solution impacts common Inference Engine functionality + * [IE EXTENSION] + * [IE PYTHON] + * [IE SAMPLES] + * [IE TESTS] + * [IE DOCS] + * [IE MKLDNN] + * [IE FPGA] + * [IE GNA] + * [IE CLDNN] + * [IE MYRIAD] + * [IE HDDL] + * [IE HETERO] + + You can use several domains in one commit message. For example: [COMMON][MKLDNN] + + Use this template to fill **Description** field: + ``` + + + JIRA: + CI: + + ``` + + b. Add **Milestone** and **Labels** to the MR if it is possible. + + c. If your work is finished, assign the MR to a reviewer. If it is in progress, assing the MR to yourself (`[WIP]` case). + + Example of an [MR](https://gitlab-icv.inn.intel.com/inference-engine/inference-engine/merge_requests/2512): +
+ ![mr_example1] + + >**NOTE**: When creating an MR, please remember that even a person who does not know what this feature/merge request stands for >should be able to understand the reasons why this feature/change was created. + +3. Change the history of your branch: + + a. Squash commits on your branch into one (you can have some number of logically separated commits if it is needed). Use the following template for commit message: + + ``` + [Domain] Small description of commit + + Multiline commit additional comments: + * additional comment 1 + * additional comment 2 + + JIRA: CVS-xxxx + ``` + + + Example of the commit message: +
+   ![commit_message]
+
+    >**NOTE**: if there is no JIRA ticket for your task, you can leave this line empty.
+
+    b. Rebase your branch to the latest version of the target branch. Your branch must be no more than one day behind the last validated version of the target branch.
+
+    c. Push your updates to the remote branch. You can use force push if needed, for example, when you rebase your branch to the latest version of the target branch.
+
+4. Add required people as reviewers of the MR. Please list appropriate GitLab accounts in the discussion section of the MR.
+
+5. Run a validation build (refer to ["Run Tests under CI"](#run-tests-under-ci))
+
+6. When the review is completed and you have one or several **Thumbs up** in your MR:
+   a. Make sure that all comments are fixed and all discussions are resolved
+   b. Run the validation build again, make sure that all tests pass, and update the link to CI in the MR description
+   c. Create a cherry-picked MR if needed (refer to ["Create a Cherry-Picked MR (Optional)"](#create-a-cherry-picked-mr-(optional))) and validate it by CI.
+
+   Example of an [MR](https://gitlab-icv.inn.intel.com/inference-engine/inference-engine/merge_requests/2111):
+   ![mr_example]
+
+7. Reassign the MR to someone from the integration managers.
+
+8. An integration manager will close or merge your MR.
+
+   ![mr2]
+
+## Create a Cherry-Picked MR (Optional)
+1. If you need to merge your changes into both target branches (`release` and `master`), create a new merge request to the other target branch containing a cherry-pick of the approved commit.
+   Follow the rules above to create the MR (sections 1 and 2 in [**Create a Merge Request**](#create-a-merge-request)), but add the line "**cherry-picked from MR: !xxxx**" to the MR description and assign the MR to someone from the integration managers.
+
+2. Run a validation build (refer to the "Run Tests under CI" section).
+
+3. Assign the MR to an integration manager.
+
+4. The integration manager will merge or close your MR.
+
+## Run Tests under CI
+1. Go to the CI page: [TeamCity](https://teamcity01-ir.devtools.intel.com/project.html?projectId=DeepLearningSdk_DeepLearningSdk_InferenceEngineUnifiedRepo)
+
+2. Click the **Run Engineering Validation** (if you want to merge your changes into the `master` branch) or **Run Engineering Validation for XXXX RX Release branch** (if you want to merge changes into the `release` branch):
+
+ ![run_engineering] + +3. Select your branch in the top left corner: +
+ ![select_branch] + +4. If you have not committed anything to your branch for the past several days, you might not see your branch in the list. In this case, you can choose it in the properties of **Run Engineering Validation** task (if you want merge changes into `master` branch) or **Run Release Validation** task (if you want merge changes into `release` branch). Click on **Run** button. On the third tab, choose your branch and click the **Run Build**: +
+ ![run_tests] + +5. Click on an arrow right after **1 queued**. On a new dialog window, click on a build number. In this case, it is 1825: +
+ ![view_results] + + You will see the current status of tests: +
+   ![analyze_results]
+
+6. Make sure that all tests pass.
+   If a test failed, see the build log in TeamCity: choose the failed build in dependencies, click on the result, and go to the **Build log** tab.
+   * If it looks like an infrastructure issue (for example, missing software or a network issue), restart the build.
+   * If you think that your changes could not have broken the test, rebase your branch on the latest version of `master` and restart the build. If the build fails again, explore the build history: incorrect code might have been merged into the target branch before you branched and might still be unfixed.
+   * If you have an issue with code style, run the code style checks (`/scripts/run_code_check.sh`) locally to analyze the reason for the failed CI jobs (see the picture below) and restart them if required:
+
+ ![code_style_artifacts] + + Commit your changes. + + **Please add link to restarted build in MR description** + +## Merge Changes to a Target Branch (master or release branches) +1. The `master` and `release` branches are protected. Only integration managers can merge to these branches + +2. Assigned integration manager checks if all the requirements are met. If so, they can merge MR with the **Merge** button or manually. + +3. An integration manager removes a branch if an MR author has set an appropriate flag for this MR in GitLab GUI. + +[mr_example]: img/mr_example.png +[mr_example1]: img/mr_example1.png +[commit_message]: img/commit_message.png +[code_style_artifacts]: img/code_style_artifacts.png +[select_branch]: img/select_branch.png +[run_engineering]: img/run_engineering.png +[run_tests]: img/run_tests.png +[view_results]: img/view_results.png +[analyze_results]: img/analyze_results.png +[mr1]: img/mr1.png +[mr2]: img/mr2.png diff --git a/docs/Inference_Engine_Development_Procedure/img/analyze_results.png b/docs/Inference_Engine_Development_Procedure/img/analyze_results.png new file mode 100644 index 00000000000000..414bbcf25c0ea2 --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/img/analyze_results.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5a5a0ce07310382c6265bef5942e676f29114981e56f6329b901055c42c8dff5 +size 293945 diff --git a/docs/Inference_Engine_Development_Procedure/img/code_style_artifacts.png b/docs/Inference_Engine_Development_Procedure/img/code_style_artifacts.png new file mode 100644 index 00000000000000..7898c469627d85 --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/img/code_style_artifacts.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:67e1a33ba22ef58c7dbb78e4ef07b1f0d83ef926e6cd63e55731e056b06964d6 +size 218208 diff --git a/docs/Inference_Engine_Development_Procedure/img/commit_message.png b/docs/Inference_Engine_Development_Procedure/img/commit_message.png new file mode 100644 index 00000000000000..a3d838d4d4c726 --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/img/commit_message.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a88aa0d55207761ab3b5a9ebbe9a8610d55623a12e867438616461fd79e13082 +size 10739 diff --git a/docs/Inference_Engine_Development_Procedure/img/mr1.png b/docs/Inference_Engine_Development_Procedure/img/mr1.png new file mode 100644 index 00000000000000..f1721196cad9dd --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/img/mr1.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3e9b00393f8d26dbaa89f82c653bc03214305255369eee4ac0b7843f4fa18ab6 +size 107057 diff --git a/docs/Inference_Engine_Development_Procedure/img/mr2.png b/docs/Inference_Engine_Development_Procedure/img/mr2.png new file mode 100644 index 00000000000000..235db81285ede8 --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/img/mr2.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8e08347a27464c90038fd1f3737154e1af4c9546dfff46bfe7fcbb720758c08d +size 43082 diff --git a/docs/Inference_Engine_Development_Procedure/img/mr_example.png b/docs/Inference_Engine_Development_Procedure/img/mr_example.png new file mode 100644 index 00000000000000..c33d94051aa3e6 --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/img/mr_example.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fab30e9b33189493bb4e129aaf672d9240a812295fb8c359070bbaceb03d7c9b +size 40107 
diff --git a/docs/Inference_Engine_Development_Procedure/img/mr_example1.png b/docs/Inference_Engine_Development_Procedure/img/mr_example1.png new file mode 100644 index 00000000000000..e156f1e101538b --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/img/mr_example1.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:163daea84811ab00fcda952915302a6a6ad712b53a27cc3e49c04146bfda698b +size 43281 diff --git a/docs/Inference_Engine_Development_Procedure/img/run_engineering.png b/docs/Inference_Engine_Development_Procedure/img/run_engineering.png new file mode 100644 index 00000000000000..fd3d640c48e6bc --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/img/run_engineering.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed9e74765a8eba71b3df08b9f2d5a5acf2e605db73fd3c1c20bc28334408bdf4 +size 31140 diff --git a/docs/Inference_Engine_Development_Procedure/img/run_tests.png b/docs/Inference_Engine_Development_Procedure/img/run_tests.png new file mode 100644 index 00000000000000..e65cd491978f5d --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/img/run_tests.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a2285feece5b7ae18d690cd58c3282eafd1913279a4dd86afdaa18a50af9a890 +size 180101 diff --git a/docs/Inference_Engine_Development_Procedure/img/select_branch.png b/docs/Inference_Engine_Development_Procedure/img/select_branch.png new file mode 100644 index 00000000000000..6fb5ba25a8f7b0 --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/img/select_branch.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f0b0ce84951be0da5326d28540f57ad561f49c3c0abaad6eff33a7189bc808d +size 102331 diff --git a/docs/Inference_Engine_Development_Procedure/img/view_results.png b/docs/Inference_Engine_Development_Procedure/img/view_results.png new file mode 100644 index 00000000000000..adb59489d6bedd --- /dev/null +++ b/docs/Inference_Engine_Development_Procedure/img/view_results.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3ff4bcfde5169f74ac09220ea04022fff3d12607a67805e10b13876768651d8d +size 47649 diff --git a/docs/Legal_Information.md b/docs/Legal_Information.md new file mode 100644 index 00000000000000..17c3788f9eac0f --- /dev/null +++ b/docs/Legal_Information.md @@ -0,0 +1,17 @@ +# Legal Information {#openvino_docs_Legal_Information} + +This software and the related documents are Intel copyrighted materials, and your use of them is governed by the express license (the “License”) under which they were provided to you. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Unless the License provides otherwise, you may not use, modify, copy, publish, distribute, disclose or transmit this software or the related documents without Intel's prior written permission. This software and the related documents are provided as is, with no express or implied warranties, other than those that are expressly stated in the License. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. + +This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. 
Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting [www.intel.com/design/literature.htm](www.intel.com/design/literature.htm). + +Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. + +Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit [www.intel.com/benchmarks](www.intel.com/benchmarks). + +Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure. + +Your costs and results may vary. + +Intel technologies may require enabled hardware, software or service activation. + +© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. diff --git a/docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md b/docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md new file mode 100644 index 00000000000000..b2169b0cfba2a7 --- /dev/null +++ b/docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md @@ -0,0 +1,102 @@ +# Model Optimizer Developer Guide {#openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide} + +Model Optimizer is a cross-platform command-line tool that facilitates the transition between the training and deployment environment, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices. + +Model Optimizer process assumes you have a network model trained using a supported deep learning framework. The scheme below illustrates the typical workflow for deploying a trained deep learning model: + +![](img/workflow_steps.png) + +Model Optimizer produces an Intermediate Representation (IR) of the network, which can be read, loaded, and inferred with the Inference Engine. The Inference Engine API offers a unified API across a number of supported Intel® platforms. The Intermediate Representation is a pair of files describing the model: + +* .xml - Describes the network topology + +* .bin - Contains the weights and biases binary data. + +## What's New in the Model Optimizer in this Release? + +**Deprecation Notice** + + + + + + + + + + +
**Deprecation Begins**: June 1, 2020
**Removal Date**: December 1, 2020
*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.*

*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.*

* Common changes:
  * Implemented generation of a compressed OpenVINO IR suitable for INT8 inference, which takes up to 4 times less disk space than an expanded one. Use the `--disable_weights_compression` Model Optimizer command-line parameter to get an expanded version (see the example command after the note below).
  * Implemented an optimization transformation that replaces a sub-graph containing the `Erf` operation with the `GeLU` operation.
  * Implemented an optimization transformation that replaces an upsampling pattern represented as a sequence of `Split` and `Concat` operations with a single `Interpolate` operation.
  * Fixed a number of Model Optimizer bugs to generate reshape-able IRs of many models with the command-line parameter `--keep_shape_ops`.
  * Fixed a number of Model Optimizer transformations to set operation names in the IR equal to the original framework model operation names.
  * The following operations are no longer generated with `version="opset1"`: `MVN`, `ROIPooling`, `ReorgYolo`. They are now part of the new `opset2` operation set and are generated with `version="opset2"`. Before this fix, the operations were generated with `version="opset1"` by mistake; they were not part of the `opset1` nGraph namespace, and the `opset1` specification was fixed accordingly.

* ONNX*:
  * Added support for the following operations: `MeanVarianceNormalization` if normalization is performed over spatial dimensions.

* TensorFlow*:
  * Added support for the TensorFlow Object Detection models version 1.15.X.
  * Added support for the following operations: `BatchToSpaceND`, `SpaceToBatchND`, `Floor`.

* MXNet*:
  * Added support for the following operations:
    * `Reshape` with input shape values equal to -2, -3, and -4.

> **NOTE:**
> [Intel® System Studio](https://software.intel.com/en-us/system-studio) is an all-in-one, cross-platform tool suite, purpose-built to simplify system bring-up and improve system and IoT device application performance on Intel® platforms. If you are using the Intel® Distribution of OpenVINO™ with Intel® System Studio, go to [Get Started with Intel® System Studio](https://software.intel.com/en-us/articles/get-started-with-openvino-and-intel-system-studio-2019).
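As an illustration, here is a minimal sketch of producing an expanded (non-compressed) IR with the parameter mentioned above. The model file name and output directory are placeholders, and the `mo.py` entry point is assumed to be invoked from the `model_optimizer` directory:

```sh
# Produce an expanded IR instead of the default compressed one
python3 mo.py --input_model model.onnx \
              --output_dir ./ir_expanded \
              --disable_weights_compression
```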
+ +## Table of Content + +* [Introduction to OpenVINO™ Deep Learning Deployment Toolkit](../IE_DG/Introduction.md) + +* [Preparing and Optimizing your Trained Model with Model Optimizer](prepare_model/Prepare_Trained_Model.md) + * [Configuring Model Optimizer](prepare_model/Config_Model_Optimizer.md) + * [Converting a Model to Intermediate Representation (IR)](prepare_model/convert_model/Converting_Model.md) + * [Converting a Model Using General Conversion Parameters](prepare_model/convert_model/Converting_Model_General.md) + * [Converting Your Caffe* Model](prepare_model/convert_model/Convert_Model_From_Caffe.md) + * [Converting Your TensorFlow* Model](prepare_model/convert_model/Convert_Model_From_TensorFlow.md) + * [Converting BERT from TensorFlow](prepare_model/convert_model/tf_specific/Convert_BERT_From_Tensorflow.md) + * [Converting GNMT from TensorFlow](prepare_model/convert_model/tf_specific/Convert_GNMT_From_Tensorflow.md) + * [Converting YOLO from DarkNet to TensorFlow and then to IR](prepare_model/convert_model/tf_specific/Convert_YOLO_From_Tensorflow.md) + * [Converting Wide and Deep Models from TensorFlow](prepare_model/convert_model/tf_specific/Convert_WideAndDeep_Family_Models.md) + * [Converting FaceNet from TensorFlow](prepare_model/convert_model/tf_specific/Convert_FaceNet_From_Tensorflow.md) + * [Converting DeepSpeech from TensorFlow](prepare_model/convert_model/tf_specific/Convert_DeepSpeech_From_Tensorflow.md) + * [Converting Language Model on One Billion Word Benchmark from TensorFlow](prepare_model/convert_model/tf_specific/Convert_lm_1b_From_Tensorflow.md) + * [Converting Neural Collaborative Filtering Model from TensorFlow*](prepare_model/convert_model/tf_specific/Convert_NCF_From_Tensorflow.md) + + * [Converting TensorFlow* Object Detection API Models](prepare_model/convert_model/tf_specific/Convert_Object_Detection_API_Models.md) + * [Converting TensorFlow*-Slim Image Classification Model Library Models](prepare_model/convert_model/tf_specific/Convert_Slim_Library_Models.md) + * [Converting CRNN Model from TensorFlow*](prepare_model/convert_model/tf_specific/Convert_CRNN_From_Tensorflow.md) + * [Converting Your MXNet* Model](prepare_model/convert_model/Convert_Model_From_MxNet.md) + * [Converting a Style Transfer Model from MXNet](prepare_model/convert_model/mxnet_specific/Convert_Style_Transfer_From_MXNet.md) + * [Converting Your Kaldi* Model](prepare_model/convert_model/Convert_Model_From_Kaldi.md) + * [Converting Your ONNX* Model](prepare_model/convert_model/Convert_Model_From_ONNX.md) + * [Converting Mask-RCNN ONNX* Model](prepare_model/convert_model/onnx_specific/Convert_Mask_RCNN.md) + * [Converting DLRM ONNX* Model](prepare_model/convert_model/onnx_specific/Convert_DLRM.md) + * [Model Optimizations Techniques](prepare_model/Model_Optimization_Techniques.md) + * [Cutting parts of the model](prepare_model/convert_model/Cutting_Model.md) + * [Sub-graph Replacement in Model Optimizer](prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md) + * [(Deprecated) Case-Study: Converting SSD models created with the TensorFlow* Object Detection API](prepare_model/customize_model_optimizer/TensorFlow_SSD_ObjectDetection_API.md) + * [(Deprecated) Case-Study: Converting Faster R-CNN models created with the TensorFlow* Object Detection API](prepare_model/customize_model_optimizer/TensorFlow_Faster_RCNN_ObjectDetection_API.md) + * [Supported Framework Layers](prepare_model/Supported_Frameworks_Layers.md) + * [Intermediate Representation and 
Operation Sets](IR_and_opsets.md)
    * [Operations Specification](../ops/opset.md)
    * [Intermediate Representation suitable for INT8 inference](prepare_model/convert_model/IR_suitable_for_INT8_inference.md)

  * [Custom Layers in Model Optimizer](prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md)
    * [Extending Model Optimizer with New Primitives](prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md)
    * [Legacy Mode for Caffe* Custom Layers](prepare_model/customize_model_optimizer/Legacy_Mode_for_Caffe_Custom_Layers.md)

  * [Model Optimizer Frequently Asked Questions](prepare_model/Model_Optimizer_FAQ.md)

* [Known Issues](Known_Issues_Limitations.md)

**Typical Next Step:** [Introduction to Intel® Deep Learning Deployment Toolkit](../IE_DG/Introduction.md) diff --git a/docs/MO_DG/IR_and_opsets.md b/docs/MO_DG/IR_and_opsets.md new file mode 100644 index 00000000000000..0b64c9b932d280 --- /dev/null +++ b/docs/MO_DG/IR_and_opsets.md @@ -0,0 +1,261 @@

# Deep Learning Network Intermediate Representation and Operation Sets in OpenVINO™ {#openvino_docs_MO_DG_IR_and_opsets}

This document provides essential information on the format used for representation of deep learning models in the OpenVINO™ toolkit and the supported operation sets.

## Overview of Artificial Neural Networks Representation

This section provides an overview of how a deep learning network is represented in various deep learning frameworks.

A deep learning network is usually represented as a directed graph describing the flow of data from the network input data to the inference results.
Input data can be a photograph, video, audio information, or some preprocessed data that represents an object from the target area of interest in a convenient way.

Here is an illustration of a small graph representing a model that consists of a single Convolutional layer and activation function:

![](img/small_IR_graph_demonstration.png)

Vertices in the graph represent layers or operation instances, like convolution, pooling, or element-wise operations with tensors.
The terms layer and operation are used interchangeably throughout the OpenVINO™ documentation and define how input data is processed to produce output data for a node in a graph.
An operation node in a graph may consume data at one or multiple input ports.
For example, an element-wise addition operation has two input ports which accept the tensors that are added together.
Some operations don't have any input ports, for example, the Const operation, which knows the data to be produced without any input.
An edge between operations represents a data flow or data dependency implied from one operation node to another.

Each operation produces data on one or multiple output ports. For example, convolution produces an output tensor with activations at a single output port. A Split operation usually has multiple output ports, each producing a part of the input tensor.

Depending on the deep learning framework, the graph can also contain extra nodes that explicitly represent tensors between operations.
In such representations, operation nodes are not connected directly to each other, but rather use data nodes as intermediate stops for the data flow.
If data nodes are not used, the produced data is associated with an output port of the corresponding operation node that produces the data.

A set of various operations used in a network is usually fixed for each deep learning framework.
It determines the expressiveness and level of representation available in that framework.
It may happen that a network that can be represented in one framework is hard or impossible to represent in another one, or requires a significantly different graph, because the operation sets used in the two frameworks do not match.

## Intermediate Representation Used in OpenVINO™

The OpenVINO™ toolkit introduces its own format of graph representation and its own operation set.
A graph is represented with two files: an XML file and a binary file.
This representation is commonly referred to as the *Intermediate Representation* or *IR*.

The XML file describes a network topology using a `<layer>` tag for an operation node and an `<edge>` tag for a data-flow connection.
Each operation has a fixed number of attributes that define the operation flavor used for a node.
For example, the `Convolution` operation has such attributes as `dilation`, `stride`, `pads_begin`, and `pads_end`.

The XML file doesn't contain big constant values, like convolution weights.
Instead, it refers to a part of the accompanying binary file that stores such values in binary format.

Here is an example of a small IR XML file that corresponds to a graph from the previous section:

```xml
<?xml version="1.0" ?>
<net name="model_file_name" version="10">
    <layers>
        <layer id="0" name="input" type="Parameter" version="opset1">
            <data element_type="f32" shape="1,3,32,100"/>
            <output>
                <port id="0" precision="FP32">
                    <dim>1</dim>
                    <dim>3</dim>
                    <dim>32</dim>
                    <dim>100</dim>
                </port>
            </output>
        </layer>
        <layer id="1" name="conv1/weights" type="Const" version="opset1">
            <data element_type="f32" offset="0" size="6912" shape="64,3,3,3"/>
            <output>
                <port id="0" precision="FP32">
                    <dim>64</dim>
                    <dim>3</dim>
                    <dim>3</dim>
                    <dim>3</dim>
                </port>
            </output>
        </layer>
        <layer id="2" name="conv1" type="Convolution" version="opset1">
            <data dilations="1,1" pads_begin="1,1" pads_end="1,1" strides="1,1"/>
            <input>
                <port id="0">
                    <dim>1</dim>
                    <dim>3</dim>
                    <dim>32</dim>
                    <dim>100</dim>
                </port>
                <port id="1">
                    <dim>64</dim>
                    <dim>3</dim>
                    <dim>3</dim>
                    <dim>3</dim>
                </port>
            </input>
            <output>
                <port id="2" precision="FP32">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>32</dim>
                    <dim>100</dim>
                </port>
            </output>
        </layer>
        <layer id="3" name="conv1/activation" type="ReLU" version="opset1">
            <input>
                <port id="0">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>32</dim>
                    <dim>100</dim>
                </port>
            </input>
            <output>
                <port id="1" precision="FP32">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>32</dim>
                    <dim>100</dim>
                </port>
            </output>
        </layer>
        <layer id="4" name="output" type="Result" version="opset1">
            <input>
                <port id="0">
                    <dim>1</dim>
                    <dim>64</dim>
                    <dim>32</dim>
                    <dim>100</dim>
                </port>
            </input>
        </layer>
    </layers>
    <edges>
        <edge from-layer="0" from-port="0" to-layer="2" to-port="0"/>
        <edge from-layer="1" from-port="0" to-layer="2" to-port="1"/>
        <edge from-layer="2" from-port="2" to-layer="3" to-port="0"/>
        <edge from-layer="3" from-port="1" to-layer="4" to-port="0"/>
    </edges>
    <meta_data>
        ...
    </meta_data>
</net>
```

The IR doesn't use the explicit data nodes described in the previous section.
Instead, properties of data, such as tensor dimensions and their data types, are described as properties of the input and output ports of operations.

## Operation Set

Operations in the OpenVINO™ Operation Set are selected based on the capabilities of the supported deep learning frameworks and the hardware capabilities of the target inference device.
It consists of several groups of operations:

 * Conventional deep learning layers like Convolution, MaxPool, MatMul (also known as FullyConnected).

 * Various activation functions, e.g. ReLU, Tanh, PReLU.

 * Generic element-wise arithmetic tensor operations like Add, Subtract, Multiply.

 * Comparison operations that compare two numeric tensors and produce boolean tensors, for example Less, Equal, Greater.

 * Logical operations that deal with boolean tensors, like And, Xor, Not.

 * Data movement operations that deal with parts of tensors: Concat, Split, StridedSlice, Select.

 * Specialized operations that implement complex algorithms dedicated to models of a specific type: DetectionOutput, RegionYolo, PriorBox.

Refer to the complete description of the supported operation sets in the [Available Operation Sets](../ops/opset.md) document.

## IR Versions vs Operation Set Versions

The expressiveness of operations in OpenVINO™ is highly dependent on the supported frameworks and target hardware capabilities.
As the frameworks and hardware capabilities grow over time, the operation set is constantly evolving to support new models.
To maintain backward compatibility and meet growing demands, both the IR format and the operation set are versioned.

The IR version specifies the rules used to read the XML and binary files that represent a model. It defines an XML schema and a compatible operation set that can be used to describe operations.

Historically, there are two major IR version epochs.

1. The older one includes IR versions from version 1 to version 7 without versioning of the operation set. During that epoch, the operation set was growing evolutionarily, accumulating more layer types and extending existing layer semantics. A change to the operation set for those versions meant an increase of the IR version.

2. OpenVINO™ 2020.1 is the starting point of the next epoch. With IR version 10 introduced in OpenVINO™ 2020.1, the versioning of the operation set is tracked separately from the IR versioning. Also, the operation set was significantly reworked as a result of the nGraph integration into OpenVINO.

The first supported operation set in the new epoch is `opset1`.
The number after `opset` is increased each time new operations are added or old operations are deleted, following the release cadence.

The operations from the new epoch cover more TensorFlow* and ONNX* operators, in a form that is closer to the original operation semantics from the frameworks, compared to the operation set used in former IR versions (7 and lower).

The name of the opset is specified for each operation in the IR.
The IR version is specified once for the whole IR.
Here is an example snippet from an IR:

```xml
<?xml version="1.0" ?>
<net name="model_file_name" version="10">
    <layers>
        <layer id="0" name="input" type="Parameter" version="opset1">
            <output>
                <port id="0" precision="FP32">
                    <dim>1</dim>
                    <dim>3</dim>
    ...
```

The attributes `type="Parameter"` and `version="opset1"` in the example above mean "use the version of the `Parameter` operation that is included in the operation set `opset1`".

When a new operation set is introduced, a significant part of the operations remains unchanged and is just aliased from the previous operation set within the new one.
The goal of operation set evolution is to add new operations and possibly change a small fraction of existing operations (fixing bugs and extending semantics).
However, such changes affect only the new versions of operations from the new operation set, while old operations are used by specifying the appropriate `version`.
When an old `version` is specified, the behavior is kept unchanged from that specified version to provide backward compatibility with older IRs.

A single IR `xml` file may contain operations from different opsets.
An operation that is included in several opsets may be referred to with a `version` that points to any opset that includes it.
For example, the same `Convolution` can be used with `version="opset1"` and `version="opset2"` because both opsets contain the same `Convolution` operation.

## How to Read the Specification

The [Available Operation Sets](../ops/opset.md) document lists both opsets and operations.
Each opset specification has a list of links to descriptions of the operations included in that specific opset.
Two or more opsets may refer to the same operation.
That means the operation is kept unchanged from one operation set to another.

Each operation description has a `Versioned name` field.
For example, the `ReLU` entry in [`opset1`](../ops/opset1.md) refers to [`ReLU-1`](../ops/activation/ReLU_1.md) as the versioned name.
`ReLU` in `opset2` refers to the same `ReLU-1`; both `ReLU` entries denote the same operation, which has a single [description](../ops/activation/ReLU_1.md).
So `opset1` and `opset2` share the same `ReLU` operation.

To differentiate versions of the same operation type, like `ReLU`, the suffix `-N` is used in the versioned name of the operation.
`N` usually refers to the first `opsetN` where this version of the operation is introduced.
It is not guaranteed that new operations will be named according to that rule; the naming convention might change, but not for old operations, which are frozen completely.

## Deprecation Notice

**Deprecation Begins**: June 1, 2020
**Removal Date**: December 1, 2020
*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.*

*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.*

diff --git a/docs/MO_DG/Known_Issues_Limitations.md b/docs/MO_DG/Known_Issues_Limitations.md new file mode 100644 index 00000000000000..075cbc6e7c333b --- /dev/null +++ b/docs/MO_DG/Known_Issues_Limitations.md @@ -0,0 +1,47 @@

# Known Issues and Limitations in the Model Optimizer {#openvino_docs_MO_DG_Known_Issues_Limitations}

## Model Optimizer for TensorFlow* should be run on Intel® hardware that supports the AVX instruction set

TensorFlow* provides only prebuilt binaries with AVX instructions enabled. When you configure the Model Optimizer by running the `install_prerequisites` or `install_prerequisites_tf` scripts, they download only these prebuilt binaries, which are not supported on hardware such as Intel® Pentium® processor N4200/5, N3350/5, N3450/5 (formerly known as Apollo Lake).

To run the Model Optimizer on this hardware, you should compile TensorFlow binaries from source as described at the [TensorFlow website](https://www.tensorflow.org/install/source).

Another option is to run the Model Optimizer to generate an IR on hardware that supports AVX and then perform inference on hardware without AVX.


## Multiple OpenMP Loadings

If the application uses the Inference Engine with third-party components that depend on Intel OpenMP, multiple loadings of the libiomp library may occur and cause OpenMP runtime initialization conflicts. This may happen, for example, if the application uses Intel® Math Kernel Library (Intel® MKL) through the “Single Dynamic Library” (libmkl_rt.so) mechanism and calls Intel MKL after loading the Inference Engine plugin.
The error log looks as follows:
```sh
OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
```

Possible workarounds:

* Preload the OpenMP runtime using the LD_PRELOAD variable:
  ```sh
  LD_PRELOAD=<path_to_libiomp5.so> <path_to_your_application>
  ```
  This eliminates multiple loadings of libiomp and makes all the components use this specific version of OpenMP.

* Alternatively, you can set KMP_DUPLICATE_LIB_OK=TRUE. However, performance degradation or incorrect results may occur in this case.


## Old proto compiler breaks protobuf library

With the Python protobuf library version 3.5.1, the following incompatibility can happen.
+The known case is for Cent OS 7.4 + +The error log looks as follows: + +```sh +File "../lib64/python3.5/site-packages/google/protobuf/descriptor.py", line 829, in _new_ +return _message.default_pool.AddSerializedFile(serialized_pb) +TypeError: expected bytes, str found +``` + +Possible workaround is to upgrade default protobuf compiler (libprotoc 2.5.0) to newer version, for example +libprotoc 2.6.1. + +[protobuf_issue]: https://github.com/google/protobuf/issues/4272 diff --git a/docs/MO_DG/img/DeepSpeech.png b/docs/MO_DG/img/DeepSpeech.png new file mode 100644 index 00000000000000..b6f1ca96486850 --- /dev/null +++ b/docs/MO_DG/img/DeepSpeech.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7ed2c9052f631055090ef3744117ca5a8e8314e0717ba0fdc984e295caa5b925 +size 112455 diff --git a/docs/MO_DG/img/FaceNet.png b/docs/MO_DG/img/FaceNet.png new file mode 100644 index 00000000000000..9f9a2eedaab3fa --- /dev/null +++ b/docs/MO_DG/img/FaceNet.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:11579795c778b28d57cbf080dedc10149500d78cc8b16a74fe2b113c76a94f6b +size 26152 diff --git a/docs/MO_DG/img/NCF_start.png b/docs/MO_DG/img/NCF_start.png new file mode 100644 index 00000000000000..ee6aeea74b3bf4 --- /dev/null +++ b/docs/MO_DG/img/NCF_start.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a570510808fb2997ee0d51af6f92c5a4a8f8a59dbd275000489f856e89124d5 +size 120211 diff --git a/docs/MO_DG/img/compressed_int8_Convolution_weights.png b/docs/MO_DG/img/compressed_int8_Convolution_weights.png new file mode 100644 index 00000000000000..ea3c831b1cc2cb --- /dev/null +++ b/docs/MO_DG/img/compressed_int8_Convolution_weights.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6c9ddc759bc419268f4c23089b91a9e3373114a1d36b01d6fe62a5e87b5c0ad4 +size 59827 diff --git a/docs/MO_DG/img/expanded_int8_Convolution_weights.png b/docs/MO_DG/img/expanded_int8_Convolution_weights.png new file mode 100644 index 00000000000000..918e2376a482fe --- /dev/null +++ b/docs/MO_DG/img/expanded_int8_Convolution_weights.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:59890c0c4a6d1c721dfaca22f0c1d0b305401f75dcd30418f858382830be2d31 +size 49598 diff --git a/docs/MO_DG/img/inception_v1_first_block.png b/docs/MO_DG/img/inception_v1_first_block.png new file mode 100644 index 00000000000000..6ec06171d56a9d --- /dev/null +++ b/docs/MO_DG/img/inception_v1_first_block.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:344b2fcb9b7a180a8d8047e65b4aad3ca2651cfc7d5e1e408710a5a3730fed09 +size 20851 diff --git a/docs/MO_DG/img/inception_v1_std_input.png b/docs/MO_DG/img/inception_v1_std_input.png new file mode 100644 index 00000000000000..747d12a757f408 --- /dev/null +++ b/docs/MO_DG/img/inception_v1_std_input.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:78a73487434f4178f111595eb34b344b35af14bd4ccb03e6a5b00509f86e19c5 +size 5348 diff --git a/docs/MO_DG/img/inception_v1_std_output.png b/docs/MO_DG/img/inception_v1_std_output.png new file mode 100644 index 00000000000000..6f295ee6ba7d16 --- /dev/null +++ b/docs/MO_DG/img/inception_v1_std_output.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:939e1aa0d2ba28dab1c930c6271a9f4063fd9f8c539d4713c0bd0f87c34f66c3 +size 15020 diff --git a/docs/MO_DG/img/lm_1b.png b/docs/MO_DG/img/lm_1b.png new file mode 100644 index 00000000000000..fb7493e39e22c1 --- /dev/null +++ b/docs/MO_DG/img/lm_1b.png @@ -0,0 
+1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9859464a5c3ec91e4d6316109f523f48ad8972d2213a6797330e665d45b35c54 +size 44117 diff --git a/docs/MO_DG/img/mo_caffe_priorities.png b/docs/MO_DG/img/mo_caffe_priorities.png new file mode 100644 index 00000000000000..9dd5273f211d13 --- /dev/null +++ b/docs/MO_DG/img/mo_caffe_priorities.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fcef3ef39c12df68649ce73b3d5016e85b322bff5d6b34cf2ea5016468ba3450 +size 230106 diff --git a/docs/MO_DG/img/optimizations/groups.png b/docs/MO_DG/img/optimizations/groups.png new file mode 100644 index 00000000000000..b497e16547b85c --- /dev/null +++ b/docs/MO_DG/img/optimizations/groups.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3812efef32bd7f1bf40b130d5d522bc3df6aebd406bd1186699d214bca856722 +size 43721 diff --git a/docs/MO_DG/img/optimizations/inception_v4.png b/docs/MO_DG/img/optimizations/inception_v4.png new file mode 100644 index 00000000000000..64058527a5de82 --- /dev/null +++ b/docs/MO_DG/img/optimizations/inception_v4.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e232c47e8500f42bd0e1f2b93f94f58e2d59caee149c687be3cdc3e8a5be59a +size 18417 diff --git a/docs/MO_DG/img/optimizations/resnet_269.png b/docs/MO_DG/img/optimizations/resnet_269.png new file mode 100644 index 00000000000000..4ef638090e9f61 --- /dev/null +++ b/docs/MO_DG/img/optimizations/resnet_269.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:92d36b9527a3e316cd9eb2b6f5054c312466df004e4aa9c3458e165330bc6561 +size 24157 diff --git a/docs/MO_DG/img/optimizations/resnet_optimization.png b/docs/MO_DG/img/optimizations/resnet_optimization.png new file mode 100644 index 00000000000000..b276e81a2dd18e --- /dev/null +++ b/docs/MO_DG/img/optimizations/resnet_optimization.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2adeca1e3512b9fe7b088a5412ce21592977a1f352a013735537ec92e895dc94 +size 15653 diff --git a/docs/MO_DG/img/small_IR_graph_demonstration.png b/docs/MO_DG/img/small_IR_graph_demonstration.png new file mode 100644 index 00000000000000..91a3fe385ae32f --- /dev/null +++ b/docs/MO_DG/img/small_IR_graph_demonstration.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c8ae479880ab43cdb12eeb2fbaaf3b7861f786413c583eeba906c5fdf4b66730 +size 30696 diff --git a/docs/MO_DG/img/workflow_steps.png b/docs/MO_DG/img/workflow_steps.png new file mode 100644 index 00000000000000..6bf780127ad14c --- /dev/null +++ b/docs/MO_DG/img/workflow_steps.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e22bc22d614c7335ae461a8ce449ea8695973d755faca718cf74b95972c94e2 +size 19773 diff --git a/docs/MO_DG/prepare_model/Config_Model_Optimizer.md b/docs/MO_DG/prepare_model/Config_Model_Optimizer.md new file mode 100644 index 00000000000000..a837ceef09cc96 --- /dev/null +++ b/docs/MO_DG/prepare_model/Config_Model_Optimizer.md @@ -0,0 +1,253 @@ +# Configuring the Model Optimizer {#openvino_docs_MO_DG_prepare_model_Config_Model_Optimizer} + +You must configure the Model Optimizer for the framework that was used to train +the model. This section tells you how to configure the Model Optimizer either +through scripts or by using a manual process. + +## Using Configuration Scripts + +You can either configure all three frameworks at the same time or install an +individual framework. 
The scripts delivered with the tool install all required dependencies and provide the fastest and easiest way to configure the Model Optimizer.

To configure all supported frameworks, go to the `<INSTALL_DIR>/deployment_tools/model_optimizer/install_prerequisites` directory and run:

* For Linux\* OS:
```
install_prerequisites.sh
```
> **NOTE**: This command installs prerequisites globally. If you want to keep the Model Optimizer in a separate sandbox, run the following commands instead:
```
virtualenv --system-site-packages -p python3 ./venv
```
```
source ./venv/bin/activate  # sh, bash, ksh, or zsh
```
```
./install_prerequisites.sh
```


* For Windows\* OS:
```
install_prerequisites.bat
```

To configure a specific framework, go to the `<INSTALL_DIR>/deployment_tools/model_optimizer/install_prerequisites` directory and run:

* For Caffe\* on Linux:
```
install_prerequisites_caffe.sh
```
* For Caffe on Windows:
```
install_prerequisites_caffe.bat
```
* For TensorFlow\* on Linux:
```
install_prerequisites_tf.sh
```
* For TensorFlow on Windows:
```
install_prerequisites_tf.bat
```
* For MXNet\* on Linux:
```
install_prerequisites_mxnet.sh
```
* For MXNet on Windows:
```
install_prerequisites_mxnet.bat
```
* For Kaldi\* on Linux:
```
install_prerequisites_kaldi.sh
```
* For Kaldi on Windows:
```
install_prerequisites_kaldi.bat
```
* For ONNX\* on Linux:
```
install_prerequisites_onnx.sh
```
* For ONNX on Windows:
```
install_prerequisites_onnx.bat
```

> **IMPORTANT**: **ONLY FOR CAFFE\*** By default, you do not need to install Caffe to create an Intermediate Representation for a Caffe model, unless you use Caffe for custom layer shape inference and do not write Model Optimizer extensions. To learn more about implementing Model Optimizer custom operations and the limitations of using Caffe for shape inference, see [Custom Layers in Model Optimizer](customize_model_optimizer/Customize_Model_Optimizer.md).

## Using Manual Configuration Process

If you prefer, you can manually configure the Model Optimizer for one framework at a time.

1. Go to the Model Optimizer directory:
```shell
cd <INSTALL_DIR>/deployment_tools/model_optimizer/
```
2. **Strongly recommended to avoid installing the Model Optimizer dependencies globally**: Create and activate a virtual environment. While not required, this step is strongly recommended since the virtual environment creates a Python\* sandbox, and dependencies for the Model Optimizer do not influence the global Python configuration, installed libraries, or other components. In addition, the `--system-site-packages` flag ensures that system-wide Python libraries are available in this sandbox. Skip this step only if you do want to install all the Model Optimizer dependencies globally:
   * Create a virtual environment:
```shell
virtualenv -p /usr/bin/python3.6 .env3 --system-site-packages
```
   * Activate the virtual environment:
```shell
source .env3/bin/activate
```
3. Install all dependencies or only the dependencies for a specific framework:
   * To install dependencies for all frameworks:
```shell
pip3 install -r requirements.txt
```
   * To install dependencies only for Caffe:
```shell
pip3 install -r requirements_caffe.txt
```
   * To install dependencies only for TensorFlow:
```shell
pip3 install -r requirements_tf.txt
```
   * To install dependencies only for MXNet:
```shell
pip3 install -r requirements_mxnet.txt
```
   * To install dependencies only for Kaldi:
```shell
pip3 install -r requirements_kaldi.txt
```
   * To install dependencies only for ONNX:
```shell
pip3 install -r requirements_onnx.txt
```

## Using the protobuf Library in the Model Optimizer for Caffe\*

These procedures require:

* Access to GitHub and the ability to use git commands
* Microsoft Visual Studio\* 2013 for Win64\*
* C/C++

Model Optimizer uses the protobuf library to load trained Caffe models. By default, the library uses the pure Python\* implementation, which is slow. These steps show how to use the faster C++ implementation of the protobuf library on Windows OS or Linux OS.

### Using the protobuf Library on Linux\* OS

To use the C++ implementation of the protobuf library on Linux, it is enough to set the following environment variable:
```sh
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
```

### Using the protobuf Library on Windows\* OS

On Windows, pre-built protobuf packages for Python versions 3.4, 3.5, 3.6, and 3.7 are provided with the installation package and can be found in the `<INSTALL_DIR>\deployment_tools\model_optimizer\install_prerequisites` folder. Please note that they are not installed with the `install_prerequisites.bat` installation script due to possible issues with `pip`, and you can install them at your own discretion. Make sure that you install the protobuf version that matches the Python version you use:

- `protobuf-3.6.1-py3.4-win-amd64.egg` for Python 3.4
- `protobuf-3.6.1-py3.5-win-amd64.egg` for Python 3.5
- `protobuf-3.6.1-py3.6-win-amd64.egg` for Python 3.6
- `protobuf-3.6.1-py3.7-win-amd64.egg` for Python 3.7

To install the protobuf package:

1. Open the command prompt as administrator.
2. Go to the `install_prerequisites` folder of the OpenVINO toolkit installation directory:
```sh
cd <INSTALL_DIR>\deployment_tools\model_optimizer\install_prerequisites
```

3. Run the following command to install the protobuf for Python 3.6. If you want to install the protobuf for Python 3.4, 3.5, or 3.7, replace `protobuf-3.6.1-py3.6-win-amd64.egg` with the corresponding file name from the list above.
```sh
python -m easy_install protobuf-3.6.1-py3.6-win-amd64.egg
```
   If the Python version you use is lower than 3.4, you need to update it or build the library manually.

#### Building the protobuf Library on Windows\* OS

> **NOTE**: These steps are optional. If you use Python version 3.4, 3.5, 3.6, or 3.7, you can install the protobuf library using the pre-built packages.

To compile the protobuf library from sources on Windows OS, do the following:

1. Clone protobuf source files from GitHub:
```shell
git clone https://github.com/google/protobuf.git
cd protobuf
```
2. Create a Visual Studio solution file. Run these commands:
```shell
cd C:\Path\to\protobuf\cmake\build
mkdir solution
cd solution
cmake -G "Visual Studio 12 2013 Win64" ../..
```
3. Change the runtime library option for `libprotobuf` and `libprotobuf-lite`:

   * Open the project's **Property Pages** dialog box
   * Expand the **C/C++** tab
   * Select the **Code Generation** property page
   * Change the **Runtime Library** property to **Multi-thread DLL (/MD)**
4. Build the `libprotoc`, `protoc`, `libprotobuf`, and `libprotobuf-lite` projects in the **Release** configuration.
5. Add a path to the build directory to the `PATH` environment variable:
```shell
set PATH=%PATH%;C:\Path\to\protobuf\cmake\build\solution\Release
```
6. Go to the `python` directory:
```shell
cd C:\Path\to\protobuf\python
```
7. Use a text editor to open and change these `setup.py` options:

   * Change `libraries = ['protobuf']` to `libraries = ['libprotobuf', 'libprotobuf-lite']`
   * Change `extra_objects = ['../src/.libs/libprotobuf.a', '../src/.libs/libprotobuf-lite.a']` to `extra_objects = ['../cmake/build/solution/Release/libprotobuf.lib', '../cmake/build/solution/Release/libprotobuf-lite.lib']`
8. Build the Python package with the C++ implementation:
```shell
python setup.py build --cpp_implementation
```
9. Install the Python package with the C++ implementation:
```shell
python3 -m easy_install dist/protobuf-3.6.1-py3.6-win-amd64.egg
```
10. Set an environment variable to boost the protobuf performance:
```shell
set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
```

## See Also

* [Converting a Model to Intermediate Representation (IR)](convert_model/Converting_Model.md) diff --git a/docs/MO_DG/prepare_model/Model_Optimization_Techniques.md b/docs/MO_DG/prepare_model/Model_Optimization_Techniques.md new file mode 100644 index 00000000000000..f65fa2181401ee --- /dev/null +++ b/docs/MO_DG/prepare_model/Model_Optimization_Techniques.md @@ -0,0 +1,65 @@

# Model Optimization Techniques {#openvino_docs_MO_DG_prepare_model_Model_Optimization_Techniques}

Optimization offers methods to accelerate inference with convolutional neural networks (CNN) that do not require model retraining.

* * *

## Linear Operations Fusing

Many convolutional neural networks include `BatchNormalization` and `ScaleShift` layers (for example, ResNet\*, Inception\*) that can be represented as a sequence of linear operations: additions and multiplications. For example, a ScaleShift layer can be represented as a Mul → Add sequence. These layers can be fused into previous `Convolution` or `FullyConnected` layers, except for the case when a Convolution comes after an Add operation (due to Convolution paddings).

### Usage

In the Model Optimizer, this optimization is turned on by default. To disable it, pass the `--disable_fusing` parameter to the Model Optimizer.

### Optimization Description

This optimization method consists of three stages:

1. **`BatchNormalization` and `ScaleShift` decomposition**: at this stage, the `BatchNormalization` layer is decomposed into a `Mul → Add → Mul → Add` sequence, and the `ScaleShift` layer is decomposed into a `Mul → Add` sequence.

2. **Linear operations merge**: at this stage, sequences of `Mul` and `Add` operations are merged into a single `Mul → Add` instance.
   For example, if there is a `BatchNormalization → ScaleShift` sequence in the topology, it is replaced with `Mul → Add` by the first stage. At the next stage, the latter is replaced with a `ScaleShift` layer if there is no available `Convolution` or `FullyConnected` layer to fuse into (next stage).
3. **Linear operations fusion**: at this stage, the tool fuses `Mul` and `Add` operations into `Convolution` or `FullyConnected` layers. Note that it searches for `Convolution` and `FullyConnected` layers both backward and forward in the graph (except for the `Add` operation, which cannot be fused into a `Convolution` layer in the forward direction).

### Usage Examples

The picture below shows a part of the Caffe\* ResNet269 topology where `BatchNorm` and `ScaleShift` layers will be fused into `Convolution` layers.

![Caffe ResNet269 block before and after optimization generated with Netscope*](../img/optimizations/resnet_269.png)

* * *

## ResNet optimization (stride optimization)

ResNet optimization is a specific optimization that applies to Caffe ResNet topologies such as ResNet50, ResNet101, ResNet152 and to ResNet-based topologies. This optimization is turned on by default, and can be disabled with the `--disable_resnet_optimization` key.

### Optimization Description

In the picture below, you can see the original and optimized parts of a Caffe ResNet50 model. The main idea of this optimization is to move a stride that is greater than 1 from Convolution layers with the kernel size = 1 to upper Convolution layers. In addition, the Model Optimizer adds a Pooling layer to align the input shape for an Eltwise layer, if it was changed during the optimization.

![ResNet50 blocks (original and optimized) from Netscope*](../img/optimizations/resnet_optimization.png)

In this example, the stride from the `res3a_branch1` and `res3a_branch2a` Convolution layers moves to the `res2c_branch2b` Convolution layer. Also, to align the input shape for the `res2c` Eltwise layer, the optimization inserts a Pooling layer with kernel size = 1 and stride = 2.

* * *

## Grouped Convolution Fusing

Grouped convolution fusing is a specific optimization that applies to TensorFlow\* topologies. The main idea of this optimization is to combine the convolution results for the `Split` outputs and then recombine them using a `Concat` operation in the same order as they came out of the `Split`.

![Split→Convolutions→Concat block from TensorBoard*](../img/optimizations/groups.png)

* * *

## Disable Fusing

Model Optimizer allows you to disable optimizations for specified nodes via `--finegrain_fusing <node_name1>,<node_name2>,...` (regex is also supported). Using this key, you mark nodes that will not be touched by any optimizations.

### Examples of usage

In the picture below, you can see two visualized Intermediate Representations (IR) of the TensorFlow InceptionV4 topology.
The first one is the original IR produced by the Model Optimizer.
The second one is produced by the Model Optimizer with the key `--finegrain_fusing InceptionV4/InceptionV4/Conv2d_1a_3x3/Conv2D`; there you can see that the `Convolution` was not fused with the `Mul1_3752` and `Mul1_4061/Fused_Mul_5096/FusedScaleShift_5987` operations. A sketch of the corresponding command line is shown below.
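To illustrate, here is a hedged sketch of how these keys might be passed on the Model Optimizer command line for a TensorFlow\* model. The frozen graph file name and input shape are hypothetical; the node pattern is the one from the example above:

```sh
# Keep the InceptionV4 Conv2d_1a_3x3 convolution out of the fusing transformations
python3 mo_tf.py --input_model inception_v4_frozen.pb \
                 --input_shape [1,299,299,3] \
                 --finegrain_fusing InceptionV4/InceptionV4/Conv2d_1a_3x3/Conv2D

# Alternatively, disable linear operations fusing for the whole model
python3 mo_tf.py --input_model inception_v4_frozen.pb --disable_fusing
```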
![TF InceptionV4 block without/with key --finegrain_fusing (from IR visualizer)](../img/optimizations/inception_v4.png) diff --git a/docs/MO_DG/prepare_model/Model_Optimizer_FAQ.md b/docs/MO_DG/prepare_model/Model_Optimizer_FAQ.md new file mode 100644 index 00000000000000..799accd0d6d61f --- /dev/null +++ b/docs/MO_DG/prepare_model/Model_Optimizer_FAQ.md @@ -0,0 +1,617 @@

# Model Optimizer Frequently Asked Questions {#openvino_docs_MO_DG_prepare_model_Model_Optimizer_FAQ}

If your question is not covered by the topics below, use the [OpenVINO™ Support page](https://software.intel.com/en-us/openvino-toolkit/documentation/get-started), where you can participate in a free forum.

#### 1. What does the message "[ ERROR ]: Current caffe.proto does not contain field" mean?

Internally, the Model Optimizer uses a protobuf library to parse and load Caffe\* models. This library requires a file grammar and a generated parser. For a Caffe fallback, the Model Optimizer uses a Caffe-generated parser for a Caffe-specific `.proto` file (which is usually located in the `src/caffe/proto` directory). So, if you have Caffe installed on your machine with the Python* interface available, make sure that this is exactly the version of Caffe that was used to create the model.

If you just want to experiment with the Model Optimizer and test a Python extension for working with your custom layers without building Caffe, add the layer description to the `caffe.proto` file and generate a parser for it.

For example, to add the description of the `CustomReshape` layer, which is an artificial layer not present in any `caffe.proto` files:

1. Add the following lines to the `caffe.proto` file:
```shell
    package mo_caffe; // to avoid conflict with system Caffe* it is highly recommended to specify a different package name
    ...
    message LayerParameter {
      // other layer parameters description
      ...
      optional CustomReshapeParameter custom_reshape_param = 546; // 546 - ID is any number not present in caffe.proto
    }
    // add these lines to the end of the file - they describe the contents of this parameter
    message CustomReshapeParameter {
      optional BlobShape shape = 1; // we just use the same parameter type as some other Caffe layers
    }
```

2. Generate a new parser:
```shell
cd <INSTALL_DIR>/deployment_tools/model_optimizer/mo/front/caffe/proto
python3 generate_caffe_pb2.py --input_proto <PATH_TO_CUSTOM_CAFFE>/src/caffe/proto/caffe.proto
```
where `PATH_TO_CUSTOM_CAFFE` is the path to the root directory of custom Caffe\*.

3. Now, the Model Optimizer is able to load the model into memory and start working with your extensions if there are any.

However, because your model has custom layers, you must register your custom layers as custom. To learn more about it, refer to the section [Custom Layers in Model Optimizer](customize_model_optimizer/Customize_Model_Optimizer.md).

#### 2. How do I create a bare caffemodel, if I have only a prototxt?

You need the Caffe\* Python\* interface. In this case, do the following:
```shell
python3
import caffe
net = caffe.Net('<PATH_TO_PROTOTXT>/my_net.prototxt', caffe.TEST)
net.save('<PATH_TO_PROTOTXT>/my_net.caffemodel')
```
#### 3. What does the message "[ ERROR ]: Unable to create ports for node with id" mean?

Most likely, the Model Optimizer does not know how to infer output shapes of some layers in the given topology.
To narrow down the scope, compile the list of layers that are custom for the Model Optimizer: layers that are present in the topology but absent from the [list of supported layers](Supported_Frameworks_Layers.md) for the target framework. Then refer to the available options in the corresponding section in [Custom Layers in Model Optimizer](customize_model_optimizer/Customize_Model_Optimizer.md).

#### 4. What does the message "Input image of shape is larger than mean image from file" mean?

Your model input shapes must be smaller than or equal to the shapes of the mean image file you provide. The idea behind the mean file is to subtract its values from the input image in an element-wise manner. When the mean file is smaller than the input image, there are not enough values to perform element-wise subtraction. Also, make sure that you use the mean file that was used during the network training phase. Note that the mean file is dataset dependent.

#### 5. What does the message "Mean file is empty" mean?

Most likely, the mean file that you have specified with the `--mean_file` flag while launching the Model Optimizer is empty. Make sure that this is exactly the required mean file and try to regenerate it from the given dataset if possible.

#### 6. What does the message "Probably mean file has incorrect format" mean?

The mean file that you provide for the Model Optimizer must be in a `.binaryproto` format. You can try to check the content using the recommendations from BVLC Caffe\* ([#290](https://github.com/BVLC/caffe/issues/290)).

#### 7. What does the message "Invalid proto file: there is neither 'layer' nor 'layers' top-level messages" mean?

The structure of any Caffe\* topology is described in the `caffe.proto` file of any Caffe version. For example, in the Model Optimizer, you can find the following proto file, used by default: `<INSTALL_DIR>/deployment_tools/model_optimizer/mo/front/caffe/proto/my_caffe.proto`. There you can find the structure:
```
message NetParameter {
  // ... some other parameters
  // The layers that make up the net. Each of their configurations, including
  // connectivity and behavior, is specified as a LayerParameter.
  repeated LayerParameter layer = 100; // ID 100 so layers are printed last.
  // DEPRECATED: use 'layer' instead.
  repeated V1LayerParameter layers = 2;
}
```
This means that any topology should contain layers as top-level structures in `prototxt`. For example, see the [LeNet topology](https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet.prototxt).

#### 8. What does the message "Old-style inputs (via 'input_dims') are not supported. Please specify inputs via 'input_shape'" mean?

The structure of any Caffe\* topology is described in the `caffe.proto` file for any Caffe version. For example, in the Model Optimizer you can find the following `.proto` file, used by default: `<INSTALL_DIR>/deployment_tools/model_optimizer/mo/front/caffe/proto/my_caffe.proto`. There you can find the structure: +```sh +message NetParameter { + + optional string name = 1; // consider giving the network a name + // DEPRECATED. See InputParameter. The input blobs to the network. + repeated string input = 3; + // DEPRECATED. See InputParameter. The shape of the input blobs. + repeated BlobShape input_shape = 8; + // 4D input dimensions -- deprecated. Use "input_shape" instead. + // If specified, for each input blob there should be four + // values specifying the num, channels, height and width of the input blob. + // Thus, there should be a total of (4 * #input) numbers.
+ repeated int32 input_dim = 4; + // ... other parameters +} +``` +So, the input layer of the provided model must be specified in one of the following styles: + +* +```sh +input: "data" +input_shape +{ + dim: 1 + dim: 3 + dim: 227 + dim: 227 +} +``` + +* +```sh +input: "data" +input_shape +{ + dim: 1 + dim: 3 + dim: 600 + dim: 1000 +} +input: "im_info" +input_shape +{ + dim: 1 + dim: 3 +} +``` +* +```sh +layer +{ + name: "data" + type: "Input" + top: "data" + input_param {shape: {dim: 1 dim: 3 dim: 600 dim: 1000}} +} +layer +{ + name: "im_info" + type: "Input" + top: "im_info" + input_param {shape: {dim: 1 dim: 3}} +} +``` +* +```sh +input: "data" +input_dim: 1 +input_dim: 3 +input_dim: 500 +``` + +However, if your model contains more than one input, the Model Optimizer is able to convert the model with inputs specified in a form of 1, 2, 3 of the list above. The last form is not supported for multi-input topologies. + +#### 9. What does the message "Mean file for topologies with multiple inputs is not supported" mean? + +Model Optimizer does not support mean file processing for topologies with more than one input. In this case, you need to perform preprocessing of the inputs for a generated Intermediate Representation in the Inference Engine to perform subtraction for every input of your multi-input model. + +#### 10. What does the message "Cannot load or process mean file: value error" mean? + +There are multiple reasons why the Model Optimizer does not accept the mean file. See FAQs [#4](#FAQ4), [#5](#FAQ5), and [#6](#FAQ6). + +#### 11. What does the message "Invalid prototxt file: value error" mean? + +There are multiple reasons why the Model Optimizer does not accept a Caffe* topology. See FAQs [#7](#FAQ7) and [#20](#FAQ20). + +#### 12. What does the message "Error happened while constructing caffe.Net in the Caffe fallback function" mean? + +Model Optimizer tried to infer a specified layer via the Caffe\* framework, however it cannot construct a net using the Caffe Python* interface. Make sure that your `caffemodel` and `prototxt` files are correct. To prove that the problem is not in the `prototxt` file, see FAQ [#2](#FAQ2). + +#### 13. What does the message "Cannot infer shapes due to exception in Caffe" mean? + +Model Optimizer tried to infer a custom layer via the Caffe\* framework, however an error occurred, meaning that the model could not be inferred using the Caffe. It might happen if you try to convert the model with some noise weights and biases resulting in problems with layers with dynamic shapes. You should write your own extension for every custom layer you topology might have. For more details, refer to [Extending Model Optimizer with New Primitives](customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md). + +#### 14. What does the message "Cannot infer shape for node {} because there is no Caffe available. Please register python infer function for op or use Caffe for shape inference" mean? + +Your model contains a custom layer and you have correctly registered it with the `CustomLayersMapping.xml` file. These steps are required to offload shape inference of the custom layer with the help of the system Caffe\*. However, the Model Optimizer could not import a Caffe package. Make sure that you have built Caffe with a `pycaffe` target and added it into the `PYTHONPATH` environment variable. For more information, please refer to the [Configuring the Model Optimizer](customize_model_optimizer/Legacy_Mode_for_Caffe_Custom_Layers.md). 
At the same time, it is highly recommend to avoid dependency on Caffe and write your own Model Optimizer extension for your custom layer. For more information, refer to the FAQ [#45](#FAQ45). + +#### 15. What does the message "Framework name can not be deduced from the given options. Use --framework to choose one of Caffe, TensorFlow, MXNet" mean? + +You have run the Model Optimizer without a flag `--framework caffe|tf|mxnet`. Model Optimizer tries to deduce the framework by the input model file extension (`.pb` for TensorFlow\*, `.caffemodel` for Caffe\*, `.params` for MXNet\*). Your input model might have a different extension and you need to explicitly set the source framework. For example, use `--framework caffe`. + +#### 16. What does the message "Input shape is required to convert MXNet model. Please provide it with --input_shape" mean? + +Input shape was not provided. That is mandatory for converting an MXNet\* model to the Intermediate Representation, because MXNet models do not contain information about input shapes. Please, use the `--input_shape` flag to specify it. For more information about using the `--input_shape`, refer to the FAQ [#57](#FAQ57). + +#### 17. What does the message "Both --mean_file and mean_values are specified. Specify either mean file or mean values" mean? + +`--mean_file` and `--mean_values` are two ways of specifying preprocessing for the input. However, they cannot be used together, as it would mean double subtraction and lead to ambiguity. Choose one of these options and pass it using the corresponding CLI option. + +#### 18. What does the message "Negative value specified for --mean_file_offsets option. Please specify positive integer values in format '(x,y)'" mean? + +You might have specified negative values with `--mean_file_offsets`. Only positive integer values in format '(x,y)' must be used. + +#### 19. What does the message "Both --scale and --scale_values are defined. Specify either scale factor or scale values per input channels" mean? + +`--scale` sets a scaling factor for all channels. `--scale_values` sets a scaling factor per each channel. Using both of them simultaneously produces ambiguity, so you must use only one of them. For more information, refer to the Using Framework-Agnostic Conversion Parameters: for Converting a Caffe* Model, Converting a TensorFlow* Model, Converting an MXNet* Model. + +#### 20. What does the message "Cannot find prototxt file: for Caffe please specify --input_proto - a protobuf file that stores topology and --input_model that stores pretrained weights" mean? + +Model Optimizer cannot find a `.prototxt` file for a specified model. By default, it must be located in the same directory as the input model with the same name (except extension). If any of these conditions is not satisfied, use `--input_proto` to specify the path to the `.prototxt` file. + +#### 22. What does the message "Failed to create directory .. . Permission denied!" mean? + +Model Optimizer cannot create a directory specified via `--output_dir`. Make sure that you have enough permissions to create the specified directory. + +#### 23. What does the message "Discovered data node without inputs and value" mean? + +One of the layers in the specified topology might not have inputs or values. Please make sure that the provided `caffemodel` and `protobuf` files are correct. + +#### 24. What does the message "Part of the nodes was not translated to IE. Stopped" mean? 
+
+Some of the layers are not supported by the Inference Engine and cannot be translated to an Intermediate Representation. You can extend the Model Optimizer to generate new types of layers and implement these layers in dedicated Inference Engine plugins. For more information, refer to the [Extending the Model Optimizer with New Primitives](customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md) page and the [Inference Engine Extensibility Mechanism](../../IE_DG/Extensibility_DG/Intro.md) documentation.
+
+#### 25. What does the message "While creating an edge from .. to .. : node name is undefined in the graph. Check correctness of the input model" mean?
+
+Model Optimizer cannot build a graph based on the specified model. Most likely, the model is incorrect.
+
+#### 26. What does the message "Node does not exist in the graph" mean?
+
+You might have specified an output node via the `--output` flag that does not exist in the provided model. Make sure that the specified output is correct and that this node exists in the current model.
+
+#### 27. What does the message "--input parameter was provided. Other inputs are needed for output computation. Provide more inputs or choose another place to cut the net" mean?
+
+Most likely, the Model Optimizer tried to cut the model at the specified input, but other inputs are needed to compute the output.
+
+#### 28. What does the message "Placeholder node does not have an input port, but input port was provided" mean?
+
+You might have specified an input port for a placeholder node, while the placeholder node does not have one in the model.
+
+#### 29. What does the message "Port index is out of number of available input ports for node" mean?
+
+This error occurs when an incorrect input port is specified with the `--input` command line argument. When using `--input`, you can optionally specify an input port in the form `X:node_name`, where `X` is an integer index of the input port starting from 0 and `node_name` is the name of a node in the model. This error occurs when the specified input port `X` is not in the range 0..(n-1), where n is the number of input ports for the node. Specify a correct port index, or do not use it if it is not needed.
+
+#### 30. What does the message "Node has more than 1 input and input shapes were provided. Try not to provide input shapes or specify input port with PORT:NODE notation, where PORT is an integer" mean?
+
+This error occurs when an incorrect combination of the `--input` and `--input_shape` command line options is used. Using both `--input` and `--input_shape` is valid only if `--input` points to a `Placeholder` node or a node with one input port, or if `--input` has the form `PORT:NODE`, where `PORT` is an integer port index of the input for node `NODE`. Otherwise, the combination of `--input` and `--input_shape` is incorrect.
+
+#### 31. What does the message "Input port > 0 in --input is not supported if --input_shape is not provided. Node: NAME_OF_THE_NODE. Omit port index and all input ports will be replaced by placeholders. Or provide --input_shape" mean?
+
+When using the `PORT:NODE` notation for the `--input` command line argument and `PORT` > 0, you should specify `--input_shape` for this input. This is a limitation of the current Model Optimizer implementation.
+
+#### 32. What does the message "No or multiple placeholders in the model, but only one shape is provided, cannot set it" mean?
+
+It looks like you provided only one shape for a placeholder, but the model has either no inputs or multiple inputs, so the shape cannot be assigned.
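+For example, for a hypothetical frozen model with two inputs named `data` and `im_info`, a shape can be provided for each input (the file name and shapes below are placeholders):
+```sh
+python3 mo.py --input_model model.pb --input data,im_info --input_shape [1,300,300,3],[1,3]
+```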
Please, make sure that you have provided correct data for placeholder nodes. + +#### 33. What does the message "The amount of input nodes for port is not equal to 1" mean? + +This error occurs when the `SubgraphMatch.single_input_node` function is used for an input port that supplies more than one node in a sub-graph. The `single_input_node` function can be used only for ports that has a single consumer inside the matching sub-graph. When multiple nodes are connected to the port, use the `input_nodes` function or `node_by_pattern` function instead of `single_input_node`. Please, refer to [Sub-Graph Replacement in the Model Optimizer](customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md) for more details. + +#### 34. What does the message "Output node for port has already been specified" mean? + +This error occurs when the `SubgraphMatch._add_output_node` function is called manually from user's extension code. This is an internal function, and you should not call it directly. + +#### 35. What does the message "Unsupported match kind.... Match kinds "points" or "scope" are supported only" mean? + +While using configuration file to implement a TensorFlow\* front replacement extension, an incorrect match kind was used. Only `points` or `scope` match kinds are supported. Please, refer to [Sub-Graph Replacement in the Model Optimizer](customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md) for more details. + +#### 36. What does the message "Cannot write an event file for the TensorBoard to directory" mean? + +Model Optimizer tried to write an event file in the specified directory but failed to do that. That could happen because the specified directory does not exist or you do not have enough permissions to write in it. + +#### 37. What does the message "There is no registered 'infer' function for node with op = .. . Please implement this function in the extensions" mean? + +Most likely, you tried to extend Model Optimizer with a new primitive, but did not specify an infer function. For more information on extensions, see [Extending the Model Optimizer with New Primitives](customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md). + +#### 38. What does the message "Stopped shape/value propagation at node" mean? + +Model Optimizer cannot infer shapes or values for the specified node. It can happen because of a bug in the custom shape infer function, because the node inputs have incorrect values/shapes, or because the input shapes are incorrect. + +#### 39. What does the message "The input with shape .. does not have the batch dimension" mean? + +Batch dimension is the first dimension in the shape and it should be equal to 1 or undefined. In your case, it is not equal to either 1 or undefined, which is why the `-b` shortcut produces undefined and unspecified behavior. To resolve the issue, specify full shapes for each input with the `--input_shape` option. Run Model Optimizer with the `--help` option to learn more about the notation for input shapes. + +#### 40. What does the message "Not all output shapes were inferred or fully defined for node" mean? + +Most likely, the shape is not defined (partially or fully) for the specified node. You can use `--input_shape` with positive integers to override model input shapes. + +#### 41. What does the message "Shape for tensor is not defined. Can not proceed" mean? 
+ +This error occurs when the `--input` command line option is used to cut a model and `--input_shape` is not used to override shapes for a node and a shape for the node cannot be inferred by Model Optimizer. You need to help Model Optimizer and specify shapes with `--input_shape` for each node that is specified with the `--input` command line option. + +#### 42. What does the message "Module TensorFlow was not found. Please install TensorFlow 1.2 or higher" mean? + +To convert TensorFlow\* models with Model Optimizer, TensorFlow 1.2 or newer must be installed. For more information on prerequisites, see [Configuring the Model Optimizer](Config_Model_Optimizer.md). + +#### 43. What does the message "Cannot read the model file: it is incorrect TensorFlow model file or missing" mean? + +The model file should contain a frozen TensorFlow\* graph in the text or binary format. Make sure that `--input_model_is_text` is provided for a model in the text format. By default, a model is interpreted as binary file. + +#### 44. What does the message "Cannot pre-process TensorFlow graph after reading from model file. File is corrupt or has unsupported format" mean? + +Most likely, there is a problem with the specified file for model. The file exists, but it has bad formatting or is corrupted. + +#### 45. What does the message "Found custom layer. Model Optimizer does not support this layer. Please, register it in CustomLayersMapping.xml or implement extension" mean? + +This means that the layer `{layer_name}` is not supported in the Model Optimizer. You can find a list of all unsupported layers in the corresponding section. You should add this layer to `CustomLayersMapping.xml` ([Legacy Mode for Caffe* Custom Layers](customize_model_optimizer/Legacy_Mode_for_Caffe_Custom_Layers.md)) or implement the extensions for this layer ([Extending Model Optimizer with New Primitives](customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md)). + +#### 46. What does the message "Custom replacement configuration file does not exist" mean? + +Path to the custom replacement configuration file was provided with the `--transformations_config` flag, but the file could not be found. Please, make sure that the specified path is correct and the file exists. + +#### 47. What does the message "Extractors collection have case insensitive duplicates" mean? + +When extending Model Optimizer with new primitives keep in mind that their names are case insensitive. Most likely, another operation with the same name is already defined. For more information, see [Extending the Model Optimizer with New Primitives](customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md). + +#### 48. What does the message "Input model name is not in an expected format, cannot extract iteration number" mean? + +Model Optimizer can not load an MXNet\* model in the specified file format. Please, use the `.json` or `.param` format. + +#### 49. What does the message "Cannot convert type of placeholder because not all of its outputs are 'Cast' to float operations" mean? + +There are models where `Placeholder` has the UINT8 type and the first operation after it is 'Cast', which casts the input to FP32. Model Optimizer detected that the `Placeholder` has the UINT8 type, but the next operation is not 'Cast' to float. Model Optimizer does not support such a case. Please, change the model to have placeholder FP32 data type. + +#### 50. What does the message "Data type is unsupported" mean? 
+ +Model Optimizer cannot convert the model to the specified data type. Currently, FP16 and FP32 are supported. Please, specify the data type with the `--data_type` flag. The available values are: FP16, FP32, half, float. + +#### 51. What does the message "No node with name ..." mean? + +Model Optimizer tried to access a node that does not exist. This could happen if you have incorrectly specified placeholder, input or output node name. + +#### 52. What does the message "Module mxnet was not found. Please install MXNet 1.0.0" mean? + +To convert MXNet\* models with Model Optimizer, MXNet 1.0.0 must be installed. For more information about prerequisites, see [Configuring the Model Optimizer](Config_Model_Optimizer.md). + +#### 53. What does the message "The following error happened while loading MXNet model .." mean? + +Most likely, there is a problem with loading of the MXNet\* model. Please, make sure that the specified path is correct, the model exists, it is not corrupted, and you have sufficient permissions to work with it. + +#### 54. What does the message "The following error happened while processing input shapes: .." mean? + +Please, make sure that inputs are defined and have correct shapes. You can use `--input_shape` with positive integers to override model input shapes. + +#### 55. What does the message "Attempt to register of custom name for the second time as class. Note that custom names are case-insensitive" mean? + +When extending Model Optimizer with new primitives keep in mind that their names are case insensitive. Most likely, another operation with the same name is already defined. For more information, see [Extending the Model Optimizer with New Primitives](customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md) . + +#### 56. What does the message "Both --input_shape and --batch were provided. Please, provide only one of them" mean? + +You cannot specify the batch and the input shape at the same time. You should specify a desired batch as the first value of the input shape. + +#### 57. What does the message "Input shape .. cannot be parsed" mean? + +The specified input shape cannot be parsed. Please, define it in one of the following ways: + +* +```shell +python3 mo.py --input_model .caffemodel --input_shape (1,3,227,227) +``` +* +```shell +python3 mo.py --input_model .caffemodel --input_shape [1,3,227,227] +``` +* In case of multi input topology you should also specify inputs: +```shell +python3 mo.py --input_model /path-to/your-model.caffemodel --input data,rois --input_shape (1,3,227,227),(1,6,1,1) +``` + +Keep in mind that there is no space between and inside the brackets for input shapes. + +#### 58. What does the message "Please provide input layer names for input layer shapes" mean? + +When specifying input shapes for several layers, you must provide names for inputs, whose shapes will be overwritten. For usage examples, see [Converting a Caffe\* Model](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Caffe.html). Additional information for `--input_shape` is in FAQ [#57](#FAQ57). + +#### 59. What does the message "Values cannot be parsed" mean? + +Mean values for the given parameter cannot be parsed. It should be a string with a list of mean values. For example, in '(1,2,3)', 1 stands for the RED channel, 2 for the GREEN channel, 3 for the BLUE channel. + +#### 60. What does the message ".. channels are expected for given values" mean? 
+ +The number of channels and the number of given values for mean values do not match. The shape should be defined as '(R,G,B)' or '[R,G,B]'. The shape should not contain undefined dimensions (? or -1). The order of values is as follows: (value for a RED channel, value for a GREEN channel, value for a BLUE channel). + +#### 61. What does the message "You should specify input for each mean value" mean? + +Most likely, you have not specified inputs using `--mean_values`. Please, specify inputs with the `--input` flag. For usage examples, please, refer to FAQ [#63](#FAQ63). + +#### 62. What does the message "You should specify input for each scale value" mean? + +Most likely, you have not specified inputs using `--scale_values`. Please, specify inputs with the `--input` flag. For usage examples, please, refer to FAQ [#64](#FAQ64). + +#### 63. What does the message "Number of inputs and mean values does not match" mean? + +The number of specified mean values and the number of inputs must be equal. Please, refer to [Converting a Caffe* Model](convert_model/Convert_Model_From_Caffe.md) for a usage example. + +#### 64. What does the message "Number of inputs and scale values does not match" mean? + +The number of specified scale values and the number of inputs must be equal. Please, refer to [Converting a Caffe* Model](convert_model/Convert_Model_From_Caffe.md) for a usage example. + +#### 65. What does the message "No class registered for match kind ... Supported match kinds are .. " mean? + +A replacement defined in the configuration file for sub-graph replacement using node names patterns or start/end nodes has the `match_kind` attribute. The attribute may have only one of the values: `scope` or `points`. If a different value is provided, this error is displayed. + +#### 66. What does the message "No instance(s) is(are) defined for the custom replacement" mean? + +A replacement defined in the configuration file for sub-graph replacement using node names patterns or start/end nodes has the `instances` attribute. This attribute is mandatory, and it causes this error if it is missing. Refer to documentation with a description of the sub-graph replacement feature. + +#### 67. What does the message "The instance must be a single dictionary for the custom replacement with id .." mean? + +A replacement defined in the configuration file for sub-graph replacement using start/end nodes has the `instances` attribute. For this type of replacement, the instance must be defined with a dictionary with two keys `start_points` and `end_points`. Values for these keys are lists with the start and end node names, respectively. Refer to documentation with a description of the sub-graph replacement feature. + +#### 68. What does the message "No instances are defined for replacement with id .. " mean? + +A replacement for the specified id is not defined in the configuration file. Please, refer to FAQ [#66](#FAQ66) for more information. + +#### 69. What does the message "Custom replacements configuration file .. does not exist" mean? + +Path to a custom replacement configuration file was provided with the `--transformations_config` flag, but it cannot be found. Please, make sure that the specified path is correct and the file exists. + +#### 70. What does the message "Failed to parse custom replacements configuration file .." mean? + +The file for custom replacement configuration provided with the `--transformations_config` flag cannot be parsed. In particular, it should have a valid JSON structure. 
For more details, refer to [JSON Schema Reference](https://spacetelescope.github.io/understanding-json-schema/reference/index.html). + +#### 71. What does the message "One of the custom replacements in the configuration file .. does not contain attribute 'id'" mean? + +Every custom replacement should declare a set of mandatory attributes and their values. For more details, refer to FAQ [#72](#FAQ72). + +#### 72. What does the message "File .. validation failed" mean? + +The file for custom replacement configuration provided with the `--transformations_config` flag cannot pass validation. Make sure that you have specified `id`, `instances` and `match_kind` for all the patterns. + +#### 73. What does the message "Cannot update the file .. because it is broken" mean? + +The custom replacement configuration file provided with the `--tensorflow_custom_operations_config_update` cannot be parsed. Please, make sure that the file is correct and refer to FAQs [#69](#FAQ69), [#70](#FAQ70), [#71](#FAQ71), and [#72](#FAQ72). + +#### 74. What does the message "End node .. is not reachable from start nodes: .." mean? + +This error occurs when you try to make a sub-graph match. It is detected that between the start and end nodes that were specified as inputs/outputs of the subgraph to find, there are nodes that are marked as outputs but there is no path from them to the input nodes. Make sure that the subgraph you want to match does actually contain all the specified output nodes. + +#### 75. What does the message "Sub-graph contains network input node .." mean? + +Start or end node for the sub-graph replacement using start/end nodes is specified incorrectly. Model Optimizer finds internal nodes of the sub-graph strictly "between" the start and end nodes. Then it adds all input nodes to the sub-graph (and inputs of their inputs and so on) for these "internal" nodes. The error reports, that the Model Optimizer reached input node during this phase. This means that the start/end points are specified incorrectly in the configuration file. Refer to documentation with a description of the sub-graph replacement feature. + +#### 76. What does the message "... elements of ... were clipped to infinity while converting a blob for node [...] to ..." mean? + +This message may appear when the `--data_type=FP16` command line option is used. This option implies conversion of all the blobs in the node to FP16. If a value in a blob is out of the range of valid FP16 values, the value is converted to positive or negative infinity. It may lead to incorrect results of inference or may not be a problem, depending on the model. The number of such elements and the total number of elements in the blob is printed out together with the name of the node, where this blob is used. + +#### 77. What does the message "... elements of ... were clipped to zero while converting a blob for node [...] to ..." mean? + +This message may appear when the `--data_type=FP16` command line option is used. This option implies conversion of all blobs in the mode to FP16. If a value in the blob is so close to zero that it cannot be represented as a valid FP16 value, it is converted to a true zero FP16 value. Depending on the model, it may lead to incorrect results of inference or may not be a problem. The number of such elements and the total number of elements in the blob are printed out together with a name of the node, where this blob is used. + +#### 78. What does the message "The amount of nodes matched pattern ... is not equal to 1" mean? 
+ +This error occurs when the `SubgraphMatch.node_by_pattern` function is used with a pattern that does not uniquely identify a single node in a sub-graph. Try to extend the pattern string to make unambiguous match to a single sub-graph node. For more details, refer to [Sub-graph Replacement in the Model Optimizer](customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md). + +#### 79. What does the message "The topology contains no "input" layers" mean? + +Your Caffe\* topology `.prototxt` file is intended for training. Model Optimizer expects a deployment-ready `.prototxt` file. To fix the problem, prepare a deployment-ready `.prototxt` file. Usually, preparation of a deploy-ready topology results in removing `data` layer(s), adding `input` layer(s), and removing loss layer(s). + +#### 80. What does the message "Warning: please expect that Model Optimizer conversion might be slow" mean? + +You are using an unsupported Python\* version. Use only versions 3.4 - 3.6 for the C++ `protobuf` implementation that is supplied with the OpenVINO Toolkit. You can still boost conversion speed by building protobuf library from sources. For complete instructions about building `protobuf` from sources, see the appropriate section in [Converting a Model to Intermediate Representation](Config_Model_Optimizer.md). + +#### 81. What does the message "Arguments --nd_prefix_name, --pretrained_model_name and --input_symbol should be provided. Please provide all or do not use any." mean? + +This error occurs if you do not provide `--nd_prefix_name`, `--pretrained_model_name` and `--input_symbol` parameters. +Model Optimizer requires both `.params` and `.nd` model files to merge into the result file (`.params`). Topology +description (`.json` file) should be prepared (merged) in advance and provided with `--input_symbol` parameter. + +If you add to your model additional layers and weights that are in `.nd` files, the Model Optimizer can build a model +from one `.params` file and two additional `.nd` files (`*_args.nd`, `*_auxs.nd`). +To do that, provide both CLI options or do not pass them if you want to convert an MXNet model without additional weights. +For more information, refer to [Converting a MXNet* Model](convert_model/Convert_Model_From_MxNet.md). + +#### 82. What does the message "You should specify input for mean/scale values" mean? + +In case when the model has multiple inputs and you want to provide mean/scale values, you need to pass those values for each input. More specifically, a number of passed values should be the same as the number of inputs of the model. +For more information, refer to [Converting a Model to Intermediate Representation](convert_model/Converting_Model.md). + +#### 83. What does the message "Input with name ... not found!" mean? + +When you passed the mean/scale values and specify names of input layers of the model, you might have used the name that does not correspond to any input layer. Make sure that by passing values with `--input` option, you list only names of the input layers of your model. +For more information, refer to the [Converting a Model to Intermediate Representation](convert_model/Converting_Model.md). + +#### 84. What does the message "Specified input json ... does not exist" mean? + +Most likely, `.json` file does not exist or has a name that does not match the notation of MXNet. Make sure that the file exists and it has a correct name. +For more information, refer to [Converting a MXNet\* Model](convert_model/Convert_Model_From_MxNet.md). 
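+As an illustration of FAQs [#82](#FAQ82) and [#83](#FAQ83) above, mean values for a hypothetical two-input MXNet\* model can be passed per input as follows (the file name, input names, and values are placeholders):
+```sh
+python3 mo.py --input_model model-0000.params --input data,rois --mean_values data[123.68,116.779,103.939],rois[0,0,0,0,0]
+```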
+ +#### 85. What does the message "Unsupported Input model file type ... Model Optimizer support only .params and .nd files format" mean? + +Model Optimizer for MXNet supports only `.params` and `.nd` files formats. Most likely, you specified some unsupported file format in `--input_model`. +For more information, refer to [Converting a MXNet* Model](convert_model/Convert_Model_From_MxNet.md). + +#### 86. What does the message "Operation ... not supported. Please register it as custom op" mean? + +Model Optimizer tried to load the model that contains some unsupported operations. +If you want to convert model that contains unsupported operations you need to prepare extension for all such operations. +For more information, refer to [Extending Model Optimizer with New Primitives](customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md). + +#### 87. What does the message "Can not register Op ... Please, call function 'register_caffe_python_extractor' with parameter 'name'" mean? + +This error appears if the class of implementation of op for Python Caffe layer could not be used by Model Optimizer. Python layers should be handled differently compared to ordinary Caffe layers. + +In particular, you need to call the function `register_caffe_python_extractor` and pass `name` as the second argument of the function. +The name should be the compilation of the layer name and the module name separated by a dot. + +For example, your topology contains this layer with type `Python`: + +``` +layer { + name: 'proposal' + type: 'Python' + ... + python_param { + module: 'rpn.proposal_layer' + layer: 'ProposalLayer' + param_str: "'feat_stride': 16" + } +} +``` + +What you do first is implementing an extension for this layer in the Model Optimizer as an ancestor of `Op` class. +``` +class ProposalPythonExampleOp(Op): + op = 'Proposal' + + def __init__(self, graph: nx.MultiDiGraph, attrs: dict): + ... +``` + +It is mandatory to call two functions right after the implementation of that class: +``` +class ProposalPythonExampleOp(Op): + ... + +register_caffe_python_extractor(ProposalPythonExampleOp, 'rpn.proposal_layer.ProposalLayer') +Op.excluded_classes.append(ProposalPythonExampleOp) +``` + +Note that the first call register_caffe_python_extractor(ProposalPythonExampleOp, 'rpn.proposal_layer.ProposalLayer') registers extension of the layer in the Model Optimizer that will be found by the specific name (mandatory to join module name and layer name): rpn.proposal_layer.ProposalLayer. + +The second call prevents Model Optimizer from using this extension as if it is an extension for +a layer with type `Proposal`. Otherwise, this layer can be chosen as an implementation of extension that can lead to potential issues. +For more information, refer to the [Extending Model Optimizer with New Primitives](customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md). + +#### 88. What does the message "Model Optimizer is unable to calculate output shape of Memory node .." mean? + +Model Optimizer supports only `Memory` layers, in which `input_memory` goes before `ScaleShift` or `FullyConnected` layer. +This error message means that in your model the layer after input memory is not of type `ScaleShift` or `FullyConnected`. +This is a known limitation. + +#### 89. What do the messages "File ... does not appear to be a Kaldi file (magic number does not match)", "Kaldi model should start with tag" mean? 
+
+These error messages mean that the Model Optimizer does not support your Kaldi\* model, because the check sum of the model is not 16896 (the model should start with this number) or the model file does not start with the `<Nnet>` tag.
+Double-check that you provide a path to a valid Kaldi model and try again.
+
+#### 90. What do the messages "Expect counts file to be one-line file." or "Expect counts file to contain list of integers" mean?
+
+These messages mean that the counts file you passed does not have the expected format: the file must consist of a single line that starts with `[` and ends with `]`, with integer values separated by spaces between those brackets.
+
+#### 91. What does the message "Model Optimizer is not able to read Kaldi model .." mean?
+
+There are multiple reasons why the Model Optimizer does not accept a Kaldi topology, for example, the file is not available or does not exist. Refer to FAQ [#89](#FAQ89).
+
+#### 92. What does the message "Model Optimizer is not able to read counts file .." mean?
+
+There are multiple reasons why the Model Optimizer does not accept a counts file, for example, the file is not available or does not exist. Also refer to FAQ [#90](#FAQ90).
+
+#### 93. What does the message "For legacy MXNet models Model Optimizer does not support conversion of old MXNet models (trained with 1.0.0 version of MXNet and lower) with custom layers." mean?
+
+This message means that if your model has custom layers and its `.json` file was generated with an MXNet version lower than 1.0.0, the Model Optimizer does not support such a topology. If you want to convert it, you have to rebuild MXNet with the unsupported layers or generate a new `.json` file with MXNet version 1.0.0 or higher. You also need to implement an Inference Engine extension for the custom layers used.
+For more information, refer to the [appropriate section of Model Optimizer configuration](customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md).
+
+#### 97. What does the message "Graph contains a cycle. Can not proceed .." mean?
+
+Model Optimizer supports only straightforward models without cycles.
+
+There are multiple ways to avoid cycles:
+
+For TensorFlow:
+* [Convert models, created with TensorFlow Object Detection API](convert_model/tf_specific/Convert_Object_Detection_API_Models.md)
+
+For all frameworks:
+1. [Replace cycle containing Sub-graph in Model Optimizer](customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md)
+2. [Extend Model Optimizer with New Primitives from first step](customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md)
+
+or
+* Edit the network in the original framework to exclude the cycle.
+
+#### 98. What does the message "Can not transpose attribute '..' with value .. for node '..' .." mean?
+
+This message means that the model is not supported. It may be caused by using shapes larger than 4-D.
+There are two ways to avoid this message:
+
+1. [Cut model part containing such layers in Model Optimizer](convert_model/Cutting_Model.md)
+2. Edit the network in the original framework to exclude such layers.
+
+#### 99. What does the message "Expected token `</ParallelComponent>`, has `...`" mean?
+
+This error message means that the Model Optimizer does not support your Kaldi model, because the net contains a `ParallelComponent` that does not end with the `</ParallelComponent>` tag.
+Double-check that you provide a path to a valid Kaldi model and try again.
+
+#### 100. What does the message "Interp layer shape inference function may be wrong, please, try to update layer shape inference function in the file (extensions/ops/interp.op at the line ...)." mean?
+
+There are many flavors of the Caffe framework, and most layers in them are implemented identically. However, there are exceptions. For example, the output value of the `Interp` layer is calculated differently in Deeplab-Caffe and classic Caffe. Therefore, if your model contains the `Interp` layer and the conversion of your model has failed, modify the `interp_infer` function in the `extensions/ops/interp.op` file according to the comments in that file.
+
+#### 101. What does the message "Mean/scale values should ..." mean?
+
+It means that your mean/scale values have an incorrect format. Specify mean/scale values in the form `layer_name(val1,val2,val3)`. You need to specify values for each input of the model. For more information, refer to [Converting a Model to Intermediate Representation](convert_model/Converting_Model.md).
+
+#### 102. What does the message "Operation _contrib_box_nms is not supported ..." mean?
+
+It means that you are trying to convert a topology that contains the `_contrib_box_nms` operation, which is not supported directly. However, the sub-graph of operations including `_contrib_box_nms` can be replaced with the `DetectionOutput` layer if your topology is one of the GluonCV topologies. Specify the `--enable_ssd_gluoncv` command line parameter for the Model Optimizer to enable this transformation.
diff --git a/docs/MO_DG/prepare_model/Prepare_Trained_Model.md b/docs/MO_DG/prepare_model/Prepare_Trained_Model.md
new file mode 100644
index 00000000000000..f0dca5283f8b0e
--- /dev/null
+++ b/docs/MO_DG/prepare_model/Prepare_Trained_Model.md
@@ -0,0 +1,63 @@
+# Preparing and Optimizing Your Trained Model {#openvino_docs_MO_DG_prepare_model_Prepare_Trained_Model}
+
+The Inference Engine enables _deploying_ your network model trained with any of the supported deep learning frameworks: Caffe\*, TensorFlow\*, Kaldi\*, MXNet\*, or converted to the ONNX\* format. To perform inference, the Inference Engine does not operate on the original model, but on its Intermediate Representation (IR), which is optimized for execution on end-point target devices. To generate an IR for your trained model, the Model Optimizer tool is used.
+
+## How the Model Optimizer Works
+
+Model Optimizer loads a model into memory, reads it, builds the internal representation of the model, optimizes it, and produces the Intermediate Representation. The Intermediate Representation is the only format that the Inference Engine accepts.
+
+> **NOTE**: Model Optimizer does not infer models. Model Optimizer is an offline tool that runs before the inference takes place.
+
+Model Optimizer has two main purposes:
+
+* **Produce a valid Intermediate Representation**. If this main conversion artifact is not valid, the Inference Engine cannot run. The primary responsibility of the Model Optimizer is to produce the two files (`.xml` and `.bin`) that form the Intermediate Representation.
+* **Produce an optimized Intermediate Representation**. Pre-trained models contain layers that are important for training, such as the `Dropout` layer. These layers are useless during inference and might increase the inference time. In many cases, these operations can be automatically removed from the resulting Intermediate Representation. In addition, if a group of operations can be represented as a single mathematical operation, and thus as a single operation node in a model graph, the Model Optimizer recognizes such patterns and replaces this group of operation nodes with a single operation.
The result is an Intermediate Representation that has fewer operation nodes than the original model. This decreases the inference time. + +To produce a valid Intermediate Representation, the Model Optimizer must be able to read the original model operations, handle their properties and represent them in Intermediate Representation format, while maintaining validity of the resulting Intermediate Representation. The resulting model consists of operations described in the [Operations Specification](../../ops/opset.md). + +## What You Need to Know about Your Model + +Many common layers exist across known frameworks and neural network topologies. Examples of these layers are `Convolution`, `Pooling`, and `Activation`. To read the original model and produce the Intermediate Representation of a model, the Model Optimizer must be able to work with these layers. + +The full list of them depends on the framework and can be found in the [Supported Framework Layers](Supported_Frameworks_Layers.md) section. If your topology contains only layers from the list of layers, as is the case for the topologies used by most users, the Model Optimizer easily creates the Intermediate Representation. After that you can proceed to work with the Inference Engine. + +However, if you use a topology with layers that are not recognized by the Model Optimizer out of the box, see [Custom Layers in the Model Optimizer](customize_model_optimizer/Customize_Model_Optimizer.md) to learn how to work with custom layers. + +## Model Optimizer Directory Structure + +After installation with OpenVINO™ toolkit or Intel® Deep Learning Deployment Toolkit, the Model Optimizer folder has the following structure: +``` +|-- model_optimizer + |-- extensions + |-- front - Front-End framework agnostic transformations (operations output shapes are not defined yet). + |-- caffe - Front-End Caffe-specific transformations and Caffe layers extractors + |-- CustomLayersMapping.xml.example - example of file for registering custom Caffe layers (compatible with the 2017R3 release) + |-- kaldi - Front-End Kaldi-specific transformations and Kaldi operations extractors + |-- mxnet - Front-End MxNet-specific transformations and MxNet symbols extractors + |-- onnx - Front-End ONNX-specific transformations and ONNX operators extractors + |-- tf - Front-End TensorFlow-specific transformations, TensorFlow operations extractors, sub-graph replacements configuration files. + |-- middle - Middle-End framework agnostic transformations (layers output shapes are defined). + |-- back - Back-End framework agnostic transformations (preparation for IR generation). 
+ |-- mo + |-- back - Back-End logic: contains IR emitting logic + |-- front - Front-End logic: contains matching between Framework-specific layers and IR specific, calculation of output shapes for each registered layer + |-- graph - Graph utilities to work with internal IR representation + |-- middle - Graph transformations - optimizations of the model + |-- pipeline - Sequence of steps required to create IR for each framework + |-- utils - Utility functions + |-- tf_call_ie_layer - Source code that enables TensorFlow fallback in Inference Engine during model inference + |-- mo.py - Centralized entry point that can be used for any supported framework + |-- mo_caffe.py - Entry point particularly for Caffe + |-- mo_kaldi.py - Entry point particularly for Kaldi + |-- mo_mxnet.py - Entry point particularly for MXNet + |-- mo_onnx.py - Entry point particularly for ONNX + |-- mo_tf.py - Entry point particularly for TensorFlow +``` + +The following sections provide the information about how to use the Model Optimizer, from configuring the tool and generating an IR for a given model to customizing the tool for your needs: + +* [Configuring Model Optimizer](Config_Model_Optimizer.md) +* [Converting a Model to Intermediate Representation](convert_model/Converting_Model.md) +* [Custom Layers in Model Optimizer](customize_model_optimizer/Customize_Model_Optimizer.md) +* [Model Optimization Techniques](Model_Optimization_Techniques.md) +* [Model Optimizer Frequently Asked Questions](Model_Optimizer_FAQ.md) diff --git a/docs/MO_DG/prepare_model/Supported_Frameworks_Layers.md b/docs/MO_DG/prepare_model/Supported_Frameworks_Layers.md new file mode 100644 index 00000000000000..048e0b89cdd68a --- /dev/null +++ b/docs/MO_DG/prepare_model/Supported_Frameworks_Layers.md @@ -0,0 +1,391 @@ +# Supported Framework Layers {#openvino_docs_MO_DG_prepare_model_Supported_Frameworks_Layers} + +## Caffe\* Supported Layers + +Standard Caffe\* layers: + +| Layer Name in Caffe\* | Limitations | +|:---------- | :----------| +| Axpy | No | +| BN | No | +| BatchNorm | No | +| Bias | No | +| Concat | No | +| Convolution | No | +| Deconvolution | No | +| DetectionOutput | No | +| Dropout | Not needed for inference | +| Eltwise | No | +| Flatten | No | +| GlobalInput | No | +| InnerProduct | No | +| Input | No | +| LRN | No | +| Permute | No | +| Pooling | No | +| Power | No | +| ROIPooling | No | +| ReLU | No | +| Reshape | No | +| Scale | No | +| ShuffleChannel | No | +| Slice | No | +| Softmax | No | +| Tile | No | + + +## MXNet\* Supported Symbols + +Standard MXNet\* symbols: + +| Symbol Name in MXNet\*| Limitations| +| :----------| :----------| +| Activation | supported "act_type" = "relu", "sigmoid", "softrelu" or "tanh" | +| BatchNorm | No | +| Concat | No | +| Convolution | No | +| Crop | "center_crop" = 1 is not supported | +| Custom | [Custom Layers in the Model Optimizer](customize_model_optimizer/Customize_Model_Optimizer.md) | +| Deconvolution | No | +| DeformableConvolution | No | +| DeformablePSROIPooling | No | +| Dropout | Not needed for inference | +| ElementWiseSum | No | +| Embedding | No | +| Flatten | No | +| FullyConnected | No | +| InstanceNorm | No | +| L2Normalization | only 4D input is supported | +| LRN | No | +| LeakyReLU | No | +| Pad | No | +| Pooling | No | +| ROIPooling | No | +| ReLU | No | +| Reshape | No | +| ScaleShift | No | +| SoftmaxActivation | No | +| SoftmaxOutput | No | +| SoftSign | No | +| Tile | No | +| UpSampling | No | +| Where | No | +| _Plus | No | +| _contrib_MultiBoxDetection 
| "force_suppress" = 1 is not supported, non-default variances are not supported | +| _contrib_MultiBoxPrior | No | +| _contrib_Proposal | No | +| _copy | Not needed for inference | +| _minus_scalar | No | +| _mul_scalar | No | +| _arange | No | +| _contrib_AdaptiveAvgPooling2D | Converted to the Average Pooling with fixed paddings | +| _maximum | No | +| _minimum | No | +| add_n | No | +| broadcast_add | No | +| broadcast_mul | No | +| cumsum | No | +| div_scalar | No | +| elementwise_sub | No | +| elemwise_add | No | +| elemwise_mul | No | +| exp | No | +| expand_dims | No | +| greater_scalar | No | +| minus_scalar | No | +| null | Not needed for inference | +| repeat | No | +| rnn | No | +| rnn_param_concat | No | +| sigmoid | No | +| slice | No | +| slice_axis | No | +| slice_channel | No | +| slice_like | No | +| stack | No | +| swapaxis | No | +| tile | No | +| transpose | No | +| zeros | No | + + +## TensorFlow\* Supported Operations + +Some TensorFlow\* operations do not match to any Inference Engine layer, but are still supported by the Model Optimizer and can be used on constant propagation path. These layers are labeled 'Constant propagation' in the table. + +Standard TensorFlow\* operations: + +| Operation Name in TensorFlow\* | Limitations| +| :----------| :----------| +| Add | No | +| AddN | No | +| ArgMax | No | +| AvgPool | No | +| BatchToSpaceND | No | +| BiasAdd | No | +| Bucketize | CPU only | +| Cast | No | +| Ceil | No | +| Concat | No | +| ConcatV2 | No | +| Const | No | +| Conv2D | No | +| Conv2DBackpropInput | No | +| Cos | No | +| Cosh | No | +| CropAndResize | "method" = "bilinear" only | +| CumSum | No | +| DepthToSpace| No | +| DepthwiseConv2dNative| No | +| Enter | Supported only when it is fused to the TensorIterator layer | +| Equal | No | +| Exit | Supported only when it is fused to the TensorIterator layer | +| Exp | No | +| ExpandDims | No | +| ExperimentalSparseWeightedSum | CPU only | +| ExtractImagePatches | No | +| Fill | No | +| Floor | No | +| FusedBatchNorm | No | +| Gather | No | +| GatherNd | Supported if it can be replaced with Gather | +| GatherV2 | No | +| Greater | No | +| GreaterEqual | No | +| Identity | Not needed for shape inference | +| LRN | No | +| Less | No | +| Log | No | +| Log1p | No | +| LogicalAnd | No | +| LogicalOr | No | +| LogicalNot | No | +| LogSoftmax | No | +| LoopCond | Supported only when it is fused to the TensorIterator layer | +| MatMul | No | +| Max | No | +| MaxPool | No | +| Maximum | No | +| Mean | No | +| Merge | Supported only when it is fused to the TensorIterator layer | +| Min | No | +| Minimum | No | +| MirrorPad | No | +| Mul | No | +| Neg | No | +| NextIteration | Supported only when it is fused to the TensorIterator layer | +| NonMaxSuppressionV3 | No | +| NonMaxSuppressionV4 | No | +| NonMaxSuppressionV5 | No | +| NoOp | No | +| OneHot | No | +| Pack | No | +| Pad | No | +| PadV2 | No | +| Placeholder | No | +| PlaceholderWithDefault | No | +| Prod | No | +| Range | No | +| Rank | No | +| RealDiv | No | +| Relu | No | +| Relu6 | No | +| Reshape | No | +| ResizeBilinear | No | +| ResizeNearestNeighbor | No | +| ResourceGather| No | +| ReverseSequence | No | +| Round | No | +| Rsqrt | No | +| Shape | No | +| Sigmoid | No | +| Sin | No | +| Sinh | No | +| Size | No | +| Slice | No | +| Softmax | No | +| Softplus | No | +| Softsign | No | +| SpaceToBatchND | No | +| SparseToDense | CPU only | +| Split | No | +| SplitV | No | +| Sqrt | No | +| Square | No | +| SquaredDifference | No | +| Square| No | +| 
Squeeze | The case when squeeze axis is not specified is not supported | +| StopGradient | Not needed for shape inference | +| StridedSlice | No | +| Sub | No | +| Sum | No | +| Swish | No | +| Switch | Control flow propagation | +| Tan | No | +| Tanh | No | +| TensorArrayGatherV3 | Supported only when it is fused to the TensorIterator layer | +| TensorArrayReadV3 | Supported only when it is fused to the TensorIterator layer | +| TensorArrayScatterV3 | Supported only when it is fused to the TensorIterator layer | +| TensorArraySizeV3 | Supported only when it is fused to the TensorIterator layer | +| TensorArrayV3 | Supported only when it is fused to the TensorIterator layer | +| TensorArrayWriteV3 | Supported only when it is fused to the TensorIterator layer | +| Tile | No | +| TopkV2 | No | +| Transpose | No | +| Unpack | No | +| Where | No | +| ZerosLike | No | + + +## Kaldi\* Supported Layers + +Standard Kaldi\* Layers: + +| Symbol Name in Kaldi\*| Limitations| +| :----------| :----------| +| addshift | No | +| affinecomponent | No | +| affinetransform | No | +| clipgradientcomponent | Not needed for inference | +| concat | No | +| convolutional1dcomponent | No | +| convolutionalcomponent | No | +| copy | No | +| Crop | No | +| elementwiseproductcomponent | No | +| fixedaffinecomponent | No | +| linearcomponent | No | +| logsoftmaxcomponent | No | +| lstmnonlinearitycomponent | No | +| lstmprojected | No | +| lstmprojectedstreams | No | +| maxpoolingcomponent | No | +| naturalgradientaffinecomponent | No | +| naturalgradientperelementscalecomponent | No | +| noopcomponent | Not needed for inference | +| normalizecomponent | No | +| parallelcomponent | No | +| pnormcomponent | No | +| rectifiedlinearcomponent | No | +| rescale | No | +| sigmoid | No | +| slice | No | +| softmax | No | +| softmaxComponent | No | +| softsign | No | +| splicecomponent | No | +| tanhcomponent | No | + + +## ONNX\* Supported Operators + +Standard ONNX\* operators: + +| Symbol Name in ONNX\*| Limitations| +| :----------| :----------| +| Abs | No | +| Acos | No | +| Add | No | +| Affine | No | +| ArgMax | No | +| Asin | No | +| Atan | No | +| AveragePool | No | +| BatchMatMul | No | +| BatchNormalization | No | +| Cast | No | +| Ceil | No | +| Clip | No | +| Concat | No | +| Constant | No | +| ConstantFill | No | +| ConstantOfShape | No | +| Conv | No | +| ConvTranspose | | +| Cos | No | +| Cosh | No | +| Crop | No | +| CumSum | No | +| DequantizeLinear | Only in combination with QuantizeLinear, refer to the desc of the latter | +| DetectionOutput (Intel experimental) | No | +| Div | No | +| Dropout | Not needed for inference | +| Elu | No | +| Equal | No | +| Erf | No | +| Expand | No | +| FakeQuantize (Intel experimental) | No | +| Fill | No | +| Flatten | No | +| Floor | No | +| GRU | No | +| Gather | No | +| GatherTree | No | +| Gemm | No | +| GlobalAveragePool | No | +| GlobalMaxPool | No | +| Greater | No | +| GreaterEqual | No | +| HardSigmoid | No | +| Identity | Not needed for inference | +| ImageScaler | No | +| LRN | No | +| LSTM | Peepholes are not supported | +| LeakyRelu | No | +| Less | No | +| LessEqual | No | +| Log | No | +| LogicalAnd | No | +| LogicalOr | No | +| LogSoftmax | No | +| MatMul | No | +| MaxPool | No | +| MeanVarianceNormalization | Reduction over the batch dimension is not supported, reduction over all dimensions except batch and channel ones is obligatory | +| Mul | No | +| Neg | No | +| NonMaxSuppression | No | +| NonZero | No | +| Not | No | +| NotEqual | No | +| OneHot | No 
| +| Pad | No | +| Pow | No | +| PriorBox (Intel experimental) | No | +| QuantizeLinear | Only in combination with DequantizeLinear. When the ops following each other in the graph and the scale and zero-point values for these operations are the same (or explicitly shared), the combination is fused into a 'FakeQuantization'| +| RNN | No | +| ROIAlign | No | +| Range | No | +| Reciprocal | No | +| ReduceMax | No | +| ReduceMean | No | +| ReduceMin | No | +| ReduceProd | No | +| ReduceSum | No | +| Relu | No | +| Reshape | No | +| Resize | Opset-10 version is supported | +| ReverseSequence | No | +| Scatter | Supported if fuse-able to ScatterUpdate. MYRIAD only | +| ScatterElements | Supported if fuse-able to ScatterUpdate. MYRIAD only | +| Select | No | +| Shape | No | +| Sigmoid | No | +| Sign | No | +| Sin | No | +| Slice | No | +| Softmax | No | +| Softplus | No | +| Softsign | No | +| SpaceToDepth | No | +| Sqrt | No | +| Squeeze | The case when squeeze axis is not specified is not supported | +| Sub | No | +| Sum | No | +| Tan | No | +| Tanh | No | +| TopK | No | +| Transpose | No | +| Unsqueeze | No | +| Upsample | No | +| Where | No | +| Xor | No | diff --git a/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md b/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md new file mode 100644 index 00000000000000..cb111e9004bc4d --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md @@ -0,0 +1,146 @@ +# Converting a Caffe* Model {#openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Caffe} + +A summary of the steps for optimizing and deploying a model that was trained with Caffe\*: + +1. [Configure the Model Optimizer](../Config_Model_Optimizer.md) for Caffe\*. +2. [Convert a Caffe\* Model](#Convert_From_Caffe) to produce an optimized [Intermediate Representation (IR)](../../IR_and_opsets.md) of the model based on the trained network topology, weights, and biases values +3. Test the model in the Intermediate Representation format using the [Inference Engine](../../../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md) in the target environment via provided Inference Engine [sample applications](../../../IE_DG/Samples_Overview.md) +4. [Integrate](../../../IE_DG/Samples_Overview.md) the [Inference Engine](../../../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md) in your application to deploy the model in the target environment + +## Supported Topologies + +* **Classification models:** + * AlexNet + * VGG-16, VGG-19 + * SqueezeNet v1.0, SqueezeNet v1.1 + * ResNet-50, ResNet-101, Res-Net-152 + * Inception v1, Inception v2, Inception v3, Inception v4 + * CaffeNet + * MobileNet + * Squeeze-and-Excitation Networks: SE-BN-Inception, SE-Resnet-101, SE-ResNet-152, SE-ResNet-50, SE-ResNeXt-101, SE-ResNeXt-50 + * ShuffleNet v2 + +* **Object detection models:** + * SSD300-VGG16, SSD500-VGG16 + * Faster-RCNN + * RefineDet (Myriad plugin only) + +* **Face detection models:** + * VGG Face + * SSH: Single Stage Headless Face Detector + +* **Semantic segmentation models:** + * FCN8 + +> **NOTE:** It is necessary to specify mean and scale values for most of the Caffe\* models to convert them with the Model Optimizer. The exact values should be determined separately for each model. For example, for Caffe\* models trained on ImageNet, the mean values usually are `123.68`, `116.779`, `103.939` for blue, green and red channels respectively. The scale value is usually `127.5`. 
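+For example, a conversion command that applies these typical ImageNet values to a hypothetical Caffe\* model might look as follows (the model file name is a placeholder):
+```sh
+python3 mo.py --input_model alexnet.caffemodel --mean_values [123.68,116.779,103.939] --scale 127.5
+```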
Refer to [Framework-agnostic parameters](Converting_Model_General.md) for the information on how to specify mean and scale values. + +## Convert a Caffe* Model + +To convert a Caffe\* model: + +1. Go to the `/deployment_tools/model_optimizer` directory. +2. Use the `mo.py` script to simply convert a model with the path to the input model `.caffemodel` file: +```sh +python3 mo.py --input_model .caffemodel +``` + +Two groups of parameters are available to convert your model: + +* [Framework-agnostic parameters](Converting_Model_General.md): These parameters are used to convert a model trained with any supported framework. +* [Caffe-specific parameters](#caffe_specific_conversion_params): Parameters used to convert only Caffe\* models + +### Using Caffe\*-Specific Conversion Parameters + +The following list provides the Caffe\*-specific parameters. + +``` +Caffe*-specific parameters: + --input_proto INPUT_PROTO, -d INPUT_PROTO + Deploy-ready prototxt file that contains a topology + structure and layer attributes + --caffe_parser_path CAFFE_PARSER_PATH + Path to python Caffe parser generated from caffe.proto + -k K Path to CustomLayersMapping.xml to register custom + layers + --mean_file MEAN_FILE, -mf MEAN_FILE + Mean image to be used for the input. Should be a + binaryproto file + --mean_file_offsets MEAN_FILE_OFFSETS, -mo MEAN_FILE_OFFSETS + Mean image offsets to be used for the input + binaryproto file. When the mean image is bigger than + the expected input, it is cropped. By default, centers + of the input image and the mean image are the same and + the mean image is cropped by dimensions of the input + image. The format to pass this option is the + following: "-mo (x,y)". In this case, the mean file is + cropped by dimensions of the input image with offset + (x,y) from the upper left corner of the mean image + --disable_omitting_optional + Disable omitting optional attributes to be used for + custom layers. Use this option if you want to transfer + all attributes of a custom layer to IR. Default + behavior is to transfer the attributes with default + values and the attributes defined by the user to IR. + --enable_flattening_nested_params + Enable flattening optional params to be used for + custom layers. Use this option if you want to transfer + attributes of a custom layer to IR with flattened + nested parameters. Default behavior is to transfer the + attributes without flattening nested parameters. +``` + +#### Command-Line Interface (CLI) Examples Using Caffe\*-Specific Parameters + +* Launching the Model Optimizer for the [bvlc_alexnet.caffemodel](https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet) with a specified `prototxt` file. This is needed when the name of the Caffe\* model and the `.prototxt` file are different or are placed in different directories. Otherwise, it is enough to provide only the path to the input `model.caffemodel` file. +```sh +python3 mo.py --input_model bvlc_alexnet.caffemodel --input_proto bvlc_alexnet.prototxt +``` + +* Launching the Model Optimizer for the [bvlc_alexnet.caffemodel](https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet) with a specified `CustomLayersMapping` file. This is the legacy method of quickly enabling model conversion if your model has custom layers. This requires system Caffe\* on the computer. To read more about this, see [Legacy Mode for Caffe* Custom Layers](../customize_model_optimizer/Legacy_Mode_for_Caffe_Custom_Layers.md). 
+Optional parameters without default values and not specified by the user in the `.prototxt` file are removed from the Intermediate Representation, and nested parameters are flattened: +```sh +python3 mo.py --input_model bvlc_alexnet.caffemodel -k CustomLayersMapping.xml --disable_omitting_optional --enable_flattening_nested_params +``` + This example shows a multi-input model with input layers: `data`, `rois` +``` +layer { + name: "data" + type: "Input" + top: "data" + input_param { + shape { dim: 1 dim: 3 dim: 224 dim: 224 } + } +} +layer { + name: "rois" + type: "Input" + top: "rois" + input_param { + shape { dim: 1 dim: 5 dim: 1 dim: 1 } + } +} +``` + +* Launching the Model Optimizer for a multi-input model with two inputs and providing a new shape for each input in the order they are passed to the Model Optimizer. In particular, for data, set the shape to `1,3,227,227`. For rois, set the shape to `1,6,1,1`: +```sh +python3 mo.py --input_model /path-to/your-model.caffemodel --input data,rois --input_shape (1,3,227,227),[1,6,1,1] +``` + +## Custom Layer Definition + +Internally, when you run the Model Optimizer, it loads the model, goes through the topology, and tries to find each layer type in a list of known layers. Custom layers are layers that are not included in the list of known layers. If your topology contains any layers that are not in this list of known layers, the Model Optimizer classifies them as custom. + +## Supported Caffe\* Layers +Refer to [Supported Framework Layers](../Supported_Frameworks_Layers.md) for the list of supported standard layers. + +## Frequently Asked Questions (FAQ) + +The Model Optimizer provides explanatory messages if it is unable to run to completion due to issues like typographical errors, incorrectly used options, or other issues. The message describes the potential cause of the problem and gives a link to the [Model Optimizer FAQ](../Model_Optimizer_FAQ.md). The FAQ has instructions on how to resolve most issues. The FAQ also includes links to relevant sections in the Model Optimizer Developer Guide to help you understand what went wrong. + +## Summary + +In this document, you learned: + +* Basic information about how the Model Optimizer works with Caffe\* models +* Which Caffe\* models are supported +* How to convert a trained Caffe\* model using the Model Optimizer with both framework-agnostic and Caffe-specific command-line options diff --git a/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md b/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md new file mode 100644 index 00000000000000..98bf2b78c3ae42 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md @@ -0,0 +1,108 @@ +# Converting a Kaldi* Model {#openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Kaldi} + +A summary of the steps for optimizing and deploying a model that was trained with Kaldi\*: + +1. [Configure the Model Optimizer](../Config_Model_Optimizer.md) for Kaldi\*. +2. [Convert a Kaldi\* Model](#Convert_From_Kaldi) to produce an optimized [Intermediate Representation (IR)](../../IR_and_opsets.md) of the model based on the trained network topology, weights, and biases values. +3. Test the model in the Intermediate Representation format using the [Inference Engine](../../../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md) in the target environment via provided Inference Engine [sample applications](../../../IE_DG/Samples_Overview.md). +4. 
[Integrate](../../../IE_DG/Samples_Overview.md) the [Inference Engine](../../../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md) in your application to deploy the model in the target environment. + +> **NOTE:** The Model Optimizer supports the [nnet1](http://kaldi-asr.org/doc/dnn1.html) and [nnet2](http://kaldi-asr.org/doc/dnn2.html) formats of Kaldi models. Support of the [nnet3](http://kaldi-asr.org/doc/dnn3.html) format is limited. + +## Supported Topologies +* Convolutional Neural Networks (CNN): + * Wall Street Journal CNN (wsj_cnn4b) + * Resource Management CNN (rm_cnn4a_smbr) + +* Long Short Term Memory (LSTM) Networks: + * Resource Management LSTM (rm_lstm4f) + * TED-LIUM LSTM (ted_lstm4f) + +* Deep Neural Networks (DNN): + * Wall Street Journal DNN (wsj_dnn5b_smbr); + * TED-LIUM DNN (ted_dnn_smbr) + +* Time delay neural network (TDNN) + * [ASpIRE Chain TDNN](kaldi_specific/Aspire_Tdnn_Model.md); + * [Librispeech nnet3](https://github.com/ryanleary/kaldi-test/releases/download/v0.0/LibriSpeech-trained.tgz). + +* TDNN-LSTM model + + +## Convert a Kaldi* Model + +To convert a Kaldi\* model: + +1. Go to the `/deployment_tools/model_optimizer` directory. +2. Use the `mo.py` script to simply convert a model with the path to the input model `.nnet` or `.mdl` file: +```sh +python3 mo.py --input_model .nnet +``` + +Two groups of parameters are available to convert your model: + +* [Framework-agnostic parameters](Converting_Model_General.md): These parameters are used to convert any model trained in any supported framework. +* [Kaldi-specific parameters](#kaldi_specific_conversion_params): Parameters used to convert only Kaldi\* models. + +### Using Kaldi\*-Specific Conversion Parameters + +The following list provides the Kaldi\*-specific parameters. + +```sh +Kaldi-specific parameters: + --counts COUNTS A file name with full path to the counts file + --remove_output_softmax + Removes the Softmax that is the output layer + --remove_memory Remove the Memory layer and add new inputs and outputs instead +``` + +### Examples of CLI Commands + +* To launch the Model Optimizer for the wsj_dnn5b_smbr model with the specified `.nnet` file: +```sh +python3 mo.py --input_model wsj_dnn5b_smbr.nnet +``` + +* To launch the Model Optimizer for the wsj_dnn5b_smbr model with existing file that contains counts for the last layer with biases: +```sh +python3 mo.py --input_model wsj_dnn5b_smbr.nnet --counts wsj_dnn5b_smbr.counts +``` + * The Model Optimizer normalizes сounts in the following way: + \f[ + S = \frac{1}{\sum_{j = 0}^{|C|}C_{j}} + \f] + \f[ + C_{i}=log(S*C_{i}) + \f] + where \f$C\f$ - the counts array, \f$C_{i} - i^{th}\f$ element of the counts array, + \f$|C|\f$ - number of elements in the counts array; + * The normalized counts are subtracted from biases of the last or next to last layer (if last layer is SoftMax). + +* If you want to remove the last SoftMax layer in the topology, launch the Model Optimizer with the +`--remove_output_softmax` flag. +```sh +python3 mo.py --input_model wsj_dnn5b_smbr.nnet --counts wsj_dnn5b_smbr.counts --remove_output_softmax +``` +The Model Optimizer finds the last layer of the topology and removes this layer only if it is a SoftMax layer. + + > **NOTE:** Model Optimizer can remove SoftMax layer only if the topology has one output. + + > **NOTE:** For sample inference of Kaldi models, you can use the Inference Engine Speech Recognition sample application. The sample supports models with one output. 
If your model has several outputs, specify the desired one with the `--output` option. + + If you want to convert a model for inference on Intel® Movidius™ Myriad™, use the `--remove_memory` option. +It removes Memory layers from the IR. Instead of it, additional inputs and outputs appear in the IR. +The Model Optimizer outputs the mapping between inputs and outputs. For example: +```sh +[ WARNING ] Add input/output mapped Parameter_0_for_Offset_fastlstm2.r_trunc__2Offset_fastlstm2.r_trunc__2_out -> Result_for_Offset_fastlstm2.r_trunc__2Offset_fastlstm2.r_trunc__2_out +[ WARNING ] Add input/output mapped Parameter_1_for_Offset_fastlstm2.r_trunc__2Offset_fastlstm2.r_trunc__2_out -> Result_for_Offset_fastlstm2.r_trunc__2Offset_fastlstm2.r_trunc__2_out +[ WARNING ] Add input/output mapped Parameter_0_for_iteration_Offset_fastlstm3.c_trunc__3390 -> Result_for_iteration_Offset_fastlstm3.c_trunc__3390 +``` + Based on this mapping, link inputs and outputs in your application manually as follows: + +1. Initialize inputs from the mapping as zeros in the first frame of an utterance. +2. Copy output blobs from the mapping to the corresponding inputs. For example, data from `Result_for_Offset_fastlstm2.r_trunc__2Offset_fastlstm2.r_trunc__2_out` +must be copied to `Parameter_0_for_Offset_fastlstm2.r_trunc__2Offset_fastlstm2.r_trunc__2_out`. + + +## Supported Kaldi\* Layers +Refer to [Supported Framework Layers ](../Supported_Frameworks_Layers.md) for the list of supported standard layers. diff --git a/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md b/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md new file mode 100644 index 00000000000000..85431af45d6f5f --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md @@ -0,0 +1,104 @@ +# Converting a MXNet* Model {#openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_MxNet} + +A summary of the steps for optimizing and deploying a model that was trained with the MXNet\* framework: + +1. [Configure the Model Optimizer](../Config_Model_Optimizer.md) for MXNet* (MXNet was used to train your model) +2. [Convert a MXNet model](#ConvertMxNet) to produce an optimized [Intermediate Representation (IR)](../../IR_and_opsets.md) of the model based on the trained network topology, weights, and biases values +3. Test the model in the Intermediate Representation format using the [Inference Engine](../../../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md) in the target environment via provided Inference Engine [sample applications](../../../IE_DG/Samples_Overview.md) +4. [Integrate](../../../IE_DG/Samples_Overview.md) the [Inference Engine](../../../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md) in your application to deploy the model in the target environment + +## Supported Topologies + +> **NOTE:** SSD models from the table require converting to the deploy mode. For details, see the [Conversion Instructions](https://github.com/zhreshold/mxnet-ssd/#convert-model-to-deploy-mode) in the GitHub MXNet-SSD repository. 
+ +| Model Name| Model File | +| ------------- |:-------------:| +|VGG-16| [Repo](https://github.com/dmlc/mxnet-model-gallery/tree/master), [Symbol](http://data.mxnet.io/models/imagenet/vgg/vgg16-symbol.json), [Params](http://data.mxnet.io/models/imagenet/vgg/vgg16-0000.params)| +|VGG-19| [Repo](https://github.com/dmlc/mxnet-model-gallery/tree/master), [Symbol](http://data.mxnet.io/models/imagenet/vgg/vgg19-symbol.json), [Params](http://data.mxnet.io/models/imagenet/vgg/vgg19-0000.params)| +|ResNet-152 v1| [Repo](https://github.com/dmlc/mxnet-model-gallery/tree/master), [Symbol](http://data.mxnet.io/models/imagenet/resnet/152-layers/resnet-152-symbol.json), [Params](http://data.mxnet.io/models/imagenet/resnet/152-layers/resnet-152-0000.params)| +|SqueezeNet_v1.1| [Repo](https://github.com/dmlc/mxnet-model-gallery/tree/master), [Symbol](http://data.mxnet.io/models/imagenet/squeezenet/squeezenet_v1.1-symbol.json), [Params](http://data.mxnet.io/models/imagenet/squeezenet/squeezenet_v1.1-0000.params)| +|Inception BN| [Repo](https://github.com/dmlc/mxnet-model-gallery/tree/master), [Symbol](http://data.mxnet.io/models/imagenet/inception-bn/Inception-BN-symbol.json), [Params](http://data.mxnet.io/models/imagenet/inception-bn/Inception-BN-0126.params)| +|CaffeNet| [Repo](https://github.com/dmlc/mxnet-model-gallery/tree/master), [Symbol](http://data.mxnet.io/mxnet/models/imagenet/caffenet/caffenet-symbol.json), [Params](http://data.mxnet.io/models/imagenet/caffenet/caffenet-0000.params)| +|DenseNet-121| [Repo](https://github.com/miraclewkf/DenseNet), [Symbol](https://raw.githubusercontent.com/miraclewkf/DenseNet/master/model/densenet-121-symbol.json), [Params](https://drive.google.com/file/d/0ByXcv9gLjrVcb3NGb1JPa3ZFQUk/view?usp=drive_web)| +|DenseNet-161| [Repo](https://github.com/miraclewkf/DenseNet), [Symbol](https://raw.githubusercontent.com/miraclewkf/DenseNet/master/model/densenet-161-symbol.json), [Params](https://drive.google.com/file/d/0ByXcv9gLjrVcS0FwZ082SEtiUjQ/view)| +|DenseNet-169| [Repo](https://github.com/miraclewkf/DenseNet), [Symbol](https://raw.githubusercontent.com/miraclewkf/DenseNet/master/model/densenet-169-symbol.json), [Params](https://drive.google.com/file/d/0ByXcv9gLjrVcOWZJejlMOWZvZmc/view)| +|DenseNet-201| [Repo](https://github.com/miraclewkf/DenseNet), [Symbol](https://raw.githubusercontent.com/miraclewkf/DenseNet/master/model/densenet-201-symbol.json), [Params](https://drive.google.com/file/d/0ByXcv9gLjrVcUjF4MDBwZ3FQbkU/view)| +|MobileNet| [Repo](https://github.com/KeyKy/mobilenet-mxnet), [Symbol](https://github.com/KeyKy/mobilenet-mxnet/blob/master/mobilenet.py), [Params](https://github.com/KeyKy/mobilenet-mxnet/blob/master/mobilenet-0000.params)| +|SSD-ResNet-50| [Repo](https://github.com/zhreshold/mxnet-ssd), [Symbol + Params](https://github.com/zhreshold/mxnet-ssd/releases/download/v0.6/resnet50_ssd_512_voc0712_trainval.zip)| +|SSD-VGG-16-300| [Repo](https://github.com/zhreshold/mxnet-ssd), [Symbol + Params](https://github.com/zhreshold/mxnet-ssd/releases/download/v0.5-beta/vgg16_ssd_300_voc0712_trainval.zip)| +|SSD-Inception v3| [Repo](https://github.com/zhreshold/mxnet-ssd), [Symbol + Params](https://github.com/zhreshold/mxnet-ssd/releases/download/v0.7-alpha/ssd_inceptionv3_512_voc0712trainval.zip)| +|FCN8 (Semantic Segmentation)| [Repo](https://github.com/apache/incubator-mxnet/tree/master/example/fcn-xs), [Symbol](https://www.dropbox.com/sh/578n5cxej7ofd6m/AAA9SFCBN8R_uL2CnAd3WQ5ia/FCN8s_VGG16-symbol.json?dl=0), 
[Params](https://www.dropbox.com/sh/578n5cxej7ofd6m/AABHWZHCtA2P6iR6LUflkxb_a/FCN8s_VGG16-0019-cpu.params?dl=0)| +|MTCNN part 1 (Face Detection)| [Repo](https://github.com/pangyupo/mxnet_mtcnn_face_detection), [Symbol](https://github.com/pangyupo/mxnet_mtcnn_face_detection/blob/master/model/det1-symbol.json), [Params](https://github.com/pangyupo/mxnet_mtcnn_face_detection/blob/master/model/det1-0001.params)| +|MTCNN part 2 (Face Detection)| [Repo](https://github.com/pangyupo/mxnet_mtcnn_face_detection), [Symbol](https://github.com/pangyupo/mxnet_mtcnn_face_detection/blob/master/model/det2-symbol.json), [Params](https://github.com/pangyupo/mxnet_mtcnn_face_detection/blob/master/model/det2-0001.params)| +|MTCNN part 3 (Face Detection)| [Repo](https://github.com/pangyupo/mxnet_mtcnn_face_detection), [Symbol](https://github.com/pangyupo/mxnet_mtcnn_face_detection/blob/master/model/det3-symbol.json), [Params](https://github.com/pangyupo/mxnet_mtcnn_face_detection/blob/master/model/det3-0001.params)| +|MTCNN part 4 (Face Detection)| [Repo](https://github.com/pangyupo/mxnet_mtcnn_face_detection), [Symbol](https://github.com/pangyupo/mxnet_mtcnn_face_detection/blob/master/model/det4-symbol.json), [Params](https://github.com/pangyupo/mxnet_mtcnn_face_detection/blob/master/model/det4-0001.params)| +|Lightened_moon| [Repo](https://github.com/tornadomeet/mxnet-face/tree/master/model/lightened_moon), [Symbol](https://github.com/tornadomeet/mxnet-face/blob/master/model/lightened_moon/lightened_moon_fuse-symbol.json), [Params](https://github.com/tornadomeet/mxnet-face/blob/master/model/lightened_moon/lightened_moon_fuse-0082.params)| +|RNN-Transducer| [Repo](https://github.com/HawkAaron/mxnet-transducer) | +|word_lm| [Repo](https://github.com/apache/incubator-mxnet/tree/master/example/rnn/word_lm) | + +**Other supported topologies** + +* Style transfer [model](https://github.com/zhaw/neural_style) can be converted using [instruction](mxnet_specific/Convert_Style_Transfer_From_MXNet.md), + +## Convert an MXNet* Model + +To convert an MXNet\* model: + +1. Go to the `/deployment_tools/model_optimizer` directory. +2. To convert an MXNet\* model contained in a `model-file-symbol.json` and `model-file-0000.params`, run the Model Optimizer launch script `mo.py`, specifying a path to the input model file: +```sh +python3 mo_mxnet.py --input_model model-file-0000.params +``` + +Two groups of parameters are available to convert your model: + +* [Framework-agnostic parameters](Converting_Model_General.md): These parameters are used to convert any model trained in any supported framework. +* [MXNet-specific parameters](#mxnet_specific_conversion_params): Parameters used to convert only MXNet models + + +### Using MXNet\*-Specific Conversion Parameters +The following list provides the MXNet\*-specific parameters. + +``` +MXNet-specific parameters: + --input_symbol + Symbol file (for example, "model-symbol.json") that contains a topology structure and layer attributes + --nd_prefix_name + Prefix name for args.nd and argx.nd files + --pretrained_model_name + Name of a pretrained MXNet model without extension and epoch + number. This model will be merged with args.nd and argx.nd + files + --save_params_from_nd + Enable saving built parameters file from .nd files + --legacy_mxnet_model + Enable MXNet loader to make a model compatible with the latest MXNet version. 
+ Use only if your model was trained with MXNet version lower than 1.0.0 + --enable_ssd_gluoncv + Enable transformation for converting the gluoncv ssd topologies. + Use only if your topology is one of ssd gluoncv topologies +``` + +> **NOTE:** By default, the Model Optimizer does not use the MXNet loader, as it transforms the topology to another format, which is compatible with the latest +> version of MXNet, but it is required for models trained with lower version of MXNet. If your model was trained with MXNet version lower than 1.0.0, specify the +> `--legacy_mxnet_model` key to enable the MXNet loader. However, the loader does not support models with custom layers. In this case, you must manually +> recompile MXNet with custom layers and install it to your environment. + +## Custom Layer Definition + +Internally, when you run the Model Optimizer, it loads the model, goes through the topology, and tries to find each layer type in a list of known layers. Custom layers are layers that are not included in the list of known layers. If your topology contains any layers that are not in this list of known layers, the Model Optimizer classifies them as custom. + +## Supported MXNet\* Layers +Refer to [Supported Framework Layers ](../Supported_Frameworks_Layers.md) for the list of supported standard layers. + +## Frequently Asked Questions (FAQ) + +The Model Optimizer provides explanatory messages if it is unable to run to completion due to issues like typographical errors, incorrectly used options, or other issues. The message describes the potential cause of the problem and gives a link to the [Model Optimizer FAQ](../Model_Optimizer_FAQ.md). The FAQ has instructions on how to resolve most issues. The FAQ also includes links to relevant sections in the Model Optimizer Developer Guide to help you understand what went wrong. + +## Summary + +In this document, you learned: + +* Basic information about how the Model Optimizer works with MXNet\* models +* Which MXNet\* models are supported +* How to convert a trained MXNet\* model using the Model Optimizer with both framework-agnostic and MXNet-specific command-line options diff --git a/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md b/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md new file mode 100644 index 00000000000000..c3633e185b7a26 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md @@ -0,0 +1,80 @@ +# Converting a ONNX* Model {#openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX} + +## Introduction to ONNX + +[ONNX*](https://github.com/onnx/onnx) is a representation format for deep learning models. ONNX allows AI developers easily transfer models between different frameworks that helps to choose the best combination for them. Today, PyTorch\*, Caffe2\*, Apache MXNet\*, Microsoft Cognitive Toolkit\* and other tools are developing ONNX support. 
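+
+For example, a PyTorch\* model can typically be exported to ONNX with `torch.onnx.export` and then passed to the Model Optimizer. The snippet below is only a sketch: the torchvision ResNet-50 model, the opset version, and the file names are illustrative choices, not requirements of the conversion flow.
+```python
+# Sketch: export a torchvision model to ONNX so the resulting file can be fed to mo.py.
+import torch
+import torchvision
+
+model = torchvision.models.resnet50(pretrained=True)
+model.eval()
+
+dummy_input = torch.randn(1, 3, 224, 224)   # NCHW input expected by the network
+torch.onnx.export(
+    model,
+    dummy_input,
+    "resnet50.onnx",              # output file name (example)
+    input_names=["input"],
+    output_names=["output"],
+    opset_version=10,             # example opset; pick one supported by your tools
+)
+# The resulting file can then be converted with:
+#   python3 mo.py --input_model resnet50.onnx
+```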
+ +## Supported Public ONNX Topologies +| Model Name | Path to Public Models master branch| +|:----|:----| +| bert_large | [model archive](https://github.com/mlperf/inference/tree/master/v0.7/language/bert) | +| bvlc_alexnet | [model archive](https://s3.amazonaws.com/download.onnx/models/opset_8/bvlc_alexnet.tar.gz) | +| bvlc_googlenet | [model archive](https://s3.amazonaws.com/download.onnx/models/opset_8/bvlc_googlenet.tar.gz) | +| bvlc_reference_caffenet | [model archive](https://s3.amazonaws.com/download.onnx/models/opset_8/bvlc_reference_caffenet.tar.gz) | +| bvlc_reference_rcnn_ilsvrc13 | [model archive](https://s3.amazonaws.com/download.onnx/models/opset_8/bvlc_reference_rcnn_ilsvrc13.tar.gz) | +| inception_v1 | [model archive](https://s3.amazonaws.com/download.onnx/models/opset_8/inception_v1.tar.gz) | +| inception_v2 | [model archive](https://s3.amazonaws.com/download.onnx/models/opset_8/inception_v2.tar.gz) | +| resnet50 | [model archive](https://s3.amazonaws.com/download.onnx/models/opset_8/resnet50.tar.gz) | +| squeezenet | [model archive](https://s3.amazonaws.com/download.onnx/models/opset_8/squeezenet.tar.gz) | +| densenet121 | [model archive](https://s3.amazonaws.com/download.onnx/models/opset_8/densenet121.tar.gz) | +| emotion_ferplus | [model archive](https://www.cntk.ai/OnnxModels/emotion_ferplus/opset_2/emotion_ferplus.tar.gz) | +| mnist | [model archive](https://www.cntk.ai/OnnxModels/mnist/opset_1/mnist.tar.gz) | +| shufflenet | [model archive](https://s3.amazonaws.com/download.onnx/models/opset_8/shufflenet.tar.gz) | +| VGG19 | [model archive](https://s3.amazonaws.com/download.onnx/models/opset_8/vgg19.tar.gz) | +| zfnet512 | [model archive](https://s3.amazonaws.com/download.onnx/models/opset_8/zfnet512.tar.gz) | +| GPT-2 | [model archive](https://github.com/onnx/models/blob/master/text/machine_comprehension/gpt-2/model/gpt2-10.tar.gz) | + +Listed models are built with the operation set version 8 except the GPT-2 model. Models that are upgraded to higher operation set versions may not be supported. + +## Supported Pytorch* Models via ONNX Conversion +Starting from the 2019R4 release, the OpenVINO™ toolkit officially supports public Pytorch* models (from `torchvision` 0.2.1 and `pretrainedmodels` 0.7.4 packages) via ONNX conversion. +The list of supported topologies is presented below: + +|Package Name|Supported Models| +|:----|:----| +| [Torchvision Models](https://pytorch.org/docs/stable/torchvision/index.html) | alexnet, densenet121, densenet161, densenet169, densenet201, resnet101, resnet152, resnet18, resnet34, resnet50, vgg11, vgg13, vgg16, vgg19 | +| [Pretrained Models](https://github.com/Cadene/pretrained-models.pytorch) | alexnet, fbresnet152, resnet101, resnet152, resnet18, resnet34, resnet152, resnet18, resnet34, resnet50, resnext101_32x4d, resnext101_64x4d, vgg11 | +| [ESPNet Models](https://github.com/sacmehta/ESPNet/tree/master/pretrained) | | + +## Supported PaddlePaddle* Models via ONNX Conversion +Starting from the R5 release, the OpenVINO™ toolkit officially supports public PaddlePaddle* models via ONNX conversion. 
+The list of supported topologies downloadable from PaddleHub is presented below: + +| Model Name | Command to download the model from PaddleHub | +|:----|:----| +| [MobileNetV2](https://www.paddlepaddle.org.cn/hubdetail?name=mobilenet_v2_imagenet) | `hub install mobilenet_v2_imagenet==1.0.1` | +| [ResNet18](https://www.paddlepaddle.org.cn/hubdetail?name=resnet_v2_18_imagenet) | `hub install resnet_v2_18_imagenet==1.0.0` | +| [ResNet34](https://www.paddlepaddle.org.cn/hubdetail?name=resnet_v2_34_imagenet) | `hub install resnet_v2_34_imagenet==1.0.0` | +| [ResNet50](https://www.paddlepaddle.org.cn/hubdetail?name=resnet_v2_50_imagenet) | `hub install resnet_v2_50_imagenet==1.0.1` | +| [ResNet101](https://www.paddlepaddle.org.cn/hubdetail?name=resnet_v2_101_imagenet) | `hub install resnet_v2_101_imagenet==1.0.1` | +| [ResNet152](https://www.paddlepaddle.org.cn/hubdetail?name=resnet_v2_152_imagenet) | `hub install resnet_v2_152_imagenet==1.0.1` | +> **NOTE**: To convert a model downloaded from PaddleHub use [paddle2onnx](https://github.com/PaddlePaddle/paddle2onnx) converter. + +The list of supported topologies from the [models v1.5](https://github.com/PaddlePaddle/models/tree/release/1.5) package: +* [MobileNetV1](https://github.com/PaddlePaddle/models/blob/release/1.5/PaddleCV/image_classification/models/mobilenet.py) +* [MobileNetV2](https://github.com/PaddlePaddle/models/blob/release/1.5/PaddleCV/image_classification/models/mobilenet_v2.py) +* [ResNet](https://github.com/PaddlePaddle/models/blob/release/1.5/PaddleCV/image_classification/models/resnet.py) +* [ResNet_vc](https://github.com/PaddlePaddle/models/blob/release/1.5/PaddleCV/image_classification/models/resnet_vc.py) +* [ResNet_vd](https://github.com/PaddlePaddle/models/blob/release/1.5/PaddleCV/image_classification/models/resnet_vd.py) +* [ResNeXt](https://github.com/PaddlePaddle/models/blob/release/1.5/PaddleCV/image_classification/models/resnext.py) +* [ResNeXt_vd](https://github.com/PaddlePaddle/models/blob/release/1.5/PaddleCV/image_classification/models/resnext_vd.py) + +> **NOTE**: To convert these topologies one should first serialize the model by calling `paddle.fluid.io.save_inference_model` + ([description](https://www.paddlepaddle.org.cn/documentation/docs/en/1.3/api/io.html#save-inference-model)) command and + after that use [paddle2onnx](https://github.com/PaddlePaddle/paddle2onnx) converter. + +## Convert an ONNX* Model +The Model Optimizer process assumes you have an ONNX model that was directly downloaded from a public repository or converted from any framework that supports exporting to the ONNX format. + +To convert an ONNX\* model: + +1. Go to the `/deployment_tools/model_optimizer` directory. +2. Use the `mo.py` script to simply convert a model with the path to the input model `.nnet` file: +```sh +python3 mo.py --input_model .onnx +``` + +There are no ONNX\* specific parameters, so only [framework-agnostic parameters](Converting_Model_General.md) are available to convert your model. + +## Supported ONNX\* Layers +Refer to [Supported Framework Layers](../Supported_Frameworks_Layers.md) for the list of supported standard layers. 
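+
+When checking a particular ONNX file against this list, it can also help to inspect which operator types and operation set version the file actually declares. A minimal sketch using the `onnx` Python package (the file name is an example):
+```python
+# Sketch: validate an ONNX file and print its declared opsets and operator types.
+import onnx
+
+model = onnx.load("model.onnx")
+onnx.checker.check_model(model)                      # raises if the model is malformed
+for opset in model.opset_import:                     # declared operation set version(s)
+    print(opset.domain or "ai.onnx", opset.version)
+print(sorted({node.op_type for node in model.graph.node}))  # operator types used
+```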
diff --git a/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md b/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md new file mode 100644 index 00000000000000..42c1d487b7202e --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md @@ -0,0 +1,367 @@ +# Converting a TensorFlow* Model {#openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_TensorFlow} + +A summary of the steps for optimizing and deploying a model that was trained with the TensorFlow\* framework: + +1. [Configure the Model Optimizer](../Config_Model_Optimizer.md) for TensorFlow\* (TensorFlow was used to train your model). +2. [Freeze the TensorFlow model](#freeze-the-tensorflow-model) if your model is not already frozen or skip this step and use the [instruction](#loading-nonfrozen-models) to a convert a non-frozen model. +3. [Convert a TensorFlow\* model](#Convert_From_TF) to produce an optimized [Intermediate Representation (IR)](../../IR_and_opsets.md) of the model based on the trained network topology, weights, and biases values. +4. Test the model in the Intermediate Representation format using the [Inference Engine](../../../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md) in the target environment via provided [sample applications](../../../IE_DG/Samples_Overview.md). +5. [Integrate](../../../IE_DG/Samples_Overview.md) the Inference Engine in your application to deploy the model in the target environment. + +## Supported Topologies + +**Supported Non-Frozen Topologies with Links to the Associated Slim Model Classification Download Files** + +Detailed information on how to convert models from the TensorFlow\*-Slim Image Classification Model Library is available in the [Converting TensorFlow*-Slim Image Classification Model Library Models](tf_specific/Convert_Slim_Library_Models.md) chapter. The table below contains list of supported TensorFlow\*-Slim Image Classification Model Library models and required mean/scale values. The mean values are specified as if the input image is read in BGR channels order layout like Inference Engine classification sample does. 
+ +| Model Name| Slim Model Checkpoint File| \-\-mean_values | \-\-scale| +| ------------- | ------------ | ------------- | -----:| +|Inception v1| [inception_v1_2016_08_28.tar.gz](http://download.tensorflow.org/models/inception_v1_2016_08_28.tar.gz)| [127.5,127.5,127.5]| 127.5| +|Inception v2| [inception_v1_2016_08_28.tar.gz](http://download.tensorflow.org/models/inception_v1_2016_08_28.tar.gz)| [127.5,127.5,127.5]| 127.5| +|Inception v3| [inception_v3_2016_08_28.tar.gz](http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz)| [127.5,127.5,127.5]| 127.5| +|Inception V4| [inception_v4_2016_09_09.tar.gz](http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz)| [127.5,127.5,127.5]| 127.5| +|Inception ResNet v2| [inception_resnet_v2_2016_08_30.tar.gz](http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz)| [127.5,127.5,127.5]| 127.5| +|MobileNet v1 128| [mobilenet_v1_0.25_128.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.25_128.tgz)| [127.5,127.5,127.5]| 127.5| +|MobileNet v1 160| [mobilenet_v1_0.5_160.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_0.5_160.tgz)| [127.5,127.5,127.5]| 127.5| +|MobileNet v1 224| [mobilenet_v1_1.0_224.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_224.tgz)| [127.5,127.5,127.5]| 127.5| +|NasNet Large| [nasnet-a_large_04_10_2017.tar.gz](https://storage.googleapis.com/download.tensorflow.org/models/nasnet-a_large_04_10_2017.tar.gz)| [127.5,127.5,127.5]| 127.5| +|NasNet Mobile| [nasnet-a_mobile_04_10_2017.tar.gz](https://storage.googleapis.com/download.tensorflow.org/models/nasnet-a_mobile_04_10_2017.tar.gz)| [127.5,127.5,127.5]| 127.5| +|ResidualNet-50 v1| [resnet_v1_50_2016_08_28.tar.gz](http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz)| [103.94,116.78,123.68] | 1 | +|ResidualNet-50 v2| [resnet_v2_50_2017_04_14.tar.gz](http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz)| [103.94,116.78,123.68] | 1 | +|ResidualNet-101 v1| [resnet_v1_101_2016_08_28.tar.gz](http://download.tensorflow.org/models/resnet_v1_101_2016_08_28.tar.gz)| [103.94,116.78,123.68] | 1 | +|ResidualNet-101 v2| [resnet_v2_101_2017_04_14.tar.gz](http://download.tensorflow.org/models/resnet_v2_101_2017_04_14.tar.gz)| [103.94,116.78,123.68] | 1 | +|ResidualNet-152 v1| [resnet_v1_152_2016_08_28.tar.gz](http://download.tensorflow.org/models/resnet_v1_152_2016_08_28.tar.gz)| [103.94,116.78,123.68] | 1 | +|ResidualNet-152 v2| [resnet_v2_152_2017_04_14.tar.gz](http://download.tensorflow.org/models/resnet_v2_152_2017_04_14.tar.gz)| [103.94,116.78,123.68] | 1 | +|VGG-16| [vgg_16_2016_08_28.tar.gz](http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz)| [103.94,116.78,123.68] | 1 | +|VGG-19| [vgg_19_2016_08_28.tar.gz](http://download.tensorflow.org/models/vgg_19_2016_08_28.tar.gz)| [103.94,116.78,123.68] | 1 | + +**Supported Frozen Topologies from TensorFlow Object Detection Models Zoo** + +Detailed information on how to convert models from the Object Detection Models Zoo is available in the [Converting TensorFlow Object Detection API Models](tf_specific/Convert_Object_Detection_API_Models.md) chapter. The table below contains models from the Object Detection Models zoo that are supported. 
+ +| Model Name| TensorFlow Object Detection API Models (Frozen)| +| :------------- | -----:| +|SSD MobileNet V1 COCO\*| [ssd_mobilenet_v1_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz)| +|SSD MobileNet V1 0.75 Depth COCO| [ssd_mobilenet_v1_0.75_depth_300x300_coco14_sync_2018_07_03.tar.gz](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_0.75_depth_300x300_coco14_sync_2018_07_03.tar.gz)| +|SSD MobileNet V1 PPN COCO| [ssd_mobilenet_v1_ppn_shared_box_predictor_300x300_coco14_sync_2018_07_03.tar.gz](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_ppn_shared_box_predictor_300x300_coco14_sync_2018_07_03.tar.gz)| +|SSD MobileNet V1 FPN COCO| [ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz)| +|SSD ResNet50 FPN COCO| [ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz](http://download.tensorflow.org/models/object_detection/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz)| +|SSD MobileNet V2 COCO| [ssd_mobilenet_v2_coco_2018_03_29.tar.gz](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz)| +|SSD Lite MobileNet V2 COCO| [ssdlite_mobilenet_v2_coco_2018_05_09.tar.gz](http://download.tensorflow.org/models/object_detection/ssdlite_mobilenet_v2_coco_2018_05_09.tar.gz)| +|SSD Inception V2 COCO| [ssd_inception_v2_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2018_01_28.tar.gz)| +|RFCN ResNet 101 COCO| [rfcn_resnet101_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/rfcn_resnet101_coco_2018_01_28.tar.gz)| +|Faster R-CNN Inception V2 COCO| [faster_rcnn_inception_v2_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz)| +|Faster R-CNN ResNet 50 COCO| [faster_rcnn_resnet50_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_coco_2018_01_28.tar.gz)| +|Faster R-CNN ResNet 50 Low Proposals COCO| [faster_rcnn_resnet50_lowproposals_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_lowproposals_coco_2018_01_28.tar.gz)| +|Faster R-CNN ResNet 101 COCO| [faster_rcnn_resnet101_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_2018_01_28.tar.gz)| +|Faster R-CNN ResNet 101 Low Proposals COCO| [faster_rcnn_resnet101_lowproposals_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_lowproposals_coco_2018_01_28.tar.gz)| +|Faster R-CNN Inception ResNet V2 COCO| [faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz)| +|Faster R-CNN Inception ResNet V2 Low Proposals COCO| [faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco_2018_01_28.tar.gz)| +|Faster R-CNN NasNet COCO| [faster_rcnn_nas_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/faster_rcnn_nas_coco_2018_01_28.tar.gz)| +|Faster R-CNN NasNet Low 
Proposals COCO| [faster_rcnn_nas_lowproposals_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/faster_rcnn_nas_lowproposals_coco_2018_01_28.tar.gz)| +|Mask R-CNN Inception ResNet V2 COCO| [mask_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/mask_rcnn_inception_resnet_v2_atrous_coco_2018_01_28.tar.gz)| +|Mask R-CNN Inception V2 COCO| [mask_rcnn_inception_v2_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/mask_rcnn_inception_v2_coco_2018_01_28.tar.gz)| +|Mask R-CNN ResNet 101 COCO| [mask_rcnn_resnet101_atrous_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/mask_rcnn_resnet101_atrous_coco_2018_01_28.tar.gz)| +|Mask R-CNN ResNet 50 COCO| [mask_rcnn_resnet50_atrous_coco_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/mask_rcnn_resnet50_atrous_coco_2018_01_28.tar.gz)| +|Faster R-CNN ResNet 101 Kitti\*| [faster_rcnn_resnet101_kitti_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_kitti_2018_01_28.tar.gz)| +|Faster R-CNN Inception ResNet V2 Open Images\*| [faster_rcnn_inception_resnet_v2_atrous_oid_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_oid_2018_01_28.tar.gz)| +|Faster R-CNN Inception ResNet V2 Low Proposals Open Images\*| [faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2018_01_28.tar.gz](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_resnet_v2_atrous_lowproposals_oid_2018_01_28.tar.gz)| +|Faster R-CNN ResNet 101 AVA v2.1\*| [faster_rcnn_resnet101_ava_v2.1_2018_04_30.tar.gz](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_ava_v2.1_2018_04_30.tar.gz)| + +**Supported Frozen Quantized Topologies** + +The topologies hosted on the TensorFlow\* Lite [site](https://www.tensorflow.org/lite/guide/hosted_models). The frozen model file (`.pb` file) should be fed to the Model Optimizer. 
+ +| Model Name | Frozen Model File | +|:----------------------|---------------------------------------------------------------------------------------------------------------------------------:| +| Mobilenet V1 0.25 128 | [mobilenet_v1_0.25_128_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_128_quant.tgz) | +| Mobilenet V1 0.25 160 | [mobilenet_v1_0.25_160_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_160_quant.tgz) | +| Mobilenet V1 0.25 192 | [mobilenet_v1_0.25_192_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_192_quant.tgz) | +| Mobilenet V1 0.25 224 | [mobilenet_v1_0.25_224_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.25_224_quant.tgz) | +| Mobilenet V1 0.50 128 | [mobilenet_v1_0.5_128_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.5_128_quant.tgz) | +| Mobilenet V1 0.50 160 | [mobilenet_v1_0.5_160_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.5_160_quant.tgz) | +| Mobilenet V1 0.50 192 | [mobilenet_v1_0.5_192_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.5_192_quant.tgz) | +| Mobilenet V1 0.50 224 | [mobilenet_v1_0.5_224_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.5_224_quant.tgz) | +| Mobilenet V1 0.75 128 | [mobilenet_v1_0.75_128_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.75_128_quant.tgz) | +| Mobilenet V1 0.75 160 | [mobilenet_v1_0.75_160_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.75_160_quant.tgz) | +| Mobilenet V1 0.75 192 | [mobilenet_v1_0.75_192_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.75_192_quant.tgz) | +| Mobilenet V1 0.75 224 | [mobilenet_v1_0.75_224_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_0.75_224_quant.tgz) | +| Mobilenet V1 1.0 128 | [mobilenet_v1_1.0_128_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_128_quant.tgz) | +| Mobilenet V1 1.0 160 | [mobilenet_v1_1.0_160_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_160_quant.tgz) | +| Mobilenet V1 1.0 192 | [mobilenet_v1_1.0_192_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_192_quant.tgz) | +| Mobilenet V1 1.0 224 | [mobilenet_v1_1.0_224_quant.tgz](http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224_quant.tgz) | +| Mobilenet V2 1.0 224 | [mobilenet_v2_1.0_224_quant.tgz](http://download.tensorflow.org/models/tflite_11_05_08/mobilenet_v2_1.0_224_quant.tgz) | +| Inception V1 | [inception_v1_224_quant_20181026.tgz](http://download.tensorflow.org/models/inception_v1_224_quant_20181026.tgz) | +| Inception V2 | [inception_v2_224_quant_20181026.tgz](http://download.tensorflow.org/models/inception_v2_224_quant_20181026.tgz) | +| Inception V3 | [inception_v3_quant.tgz](http://download.tensorflow.org/models/tflite_11_05_08/inception_v3_quant.tgz) | +| Inception V4 | [inception_v4_299_quant_20181026.tgz](http://download.tensorflow.org/models/inception_v4_299_quant_20181026.tgz) | + +It is necessary to specify the following command line parameters for the Model Optimizer to convert some of the models from the list above: `--input input --input_shape 
[1,HEIGHT,WIDTH,3]`. +Where `HEIGHT` and `WIDTH` are the input images height and width for which the model was trained. + +**Other supported topologies** + +| Model Name| Repository | +| :------------- | -----:| +| ResNext | [Repo](https://github.com/taki0112/ResNeXt-Tensorflow)| +| DenseNet | [Repo](https://github.com/taki0112/Densenet-Tensorflow)| +| CRNN | [Repo](https://github.com/MaybeShewill-CV/CRNN_Tensorflow) | +| NCF | [Repo](https://github.com/tensorflow/models/tree/master/official/recommendation) | +| lm_1b | [Repo](https://github.com/tensorflow/models/tree/master/research/lm_1b) | +| DeepSpeech | [Repo](https://github.com/mozilla/DeepSpeech) | +| A3C | [Repo](https://github.com/miyosuda/async_deep_reinforce) | +| VDCNN | [Repo](https://github.com/WenchenLi/VDCNN) | +| Unet | [Repo](https://github.com/kkweon/UNet-in-Tensorflow) | +| Keras-TCN | [Repo](https://github.com/philipperemy/keras-tcn) | +| PRNet | [Repo](https://github.com/YadiraF/PRNet) | + +* YOLO topologies from DarkNet* can be converted using [instruction](tf_specific/Convert_YOLO_From_Tensorflow.md), +* FaceNet topologies can be converted using [instruction](tf_specific/Convert_FaceNet_From_Tensorflow.md). +* CRNN topologies can be converted using [instruction](tf_specific/Convert_CRNN_From_Tensorflow.md). +* NCF topologies can be converted using [instruction](tf_specific/Convert_NCF_From_Tensorflow.md) +* [GNMT](https://github.com/tensorflow/nmt) topology can be converted using [instruction](tf_specific/Convert_GNMT_From_Tensorflow.md) +* [BERT](https://github.com/google-research/bert) topology can be converted using [this instruction](tf_specific/Convert_BERT_From_Tensorflow.md). +* [XLNet](https://github.com/zihangdai/xlnet) topology can be converted using [this instruction](tf_specific/Convert_XLNet_From_Tensorflow.md). + + + +## Loading Non-Frozen Models to the Model Optimizer + +There are three ways to store non-frozen TensorFlow models and load them to the Model Optimizer: + +1. Checkpoint: + + In this case, a model consists of two files: + - `inference_graph.pb` or `inference_graph.pbtxt` + - `checkpoint_file.ckpt` + + If you do not have an inference graph file, refer to [Freezing Custom Models in Python](#freeze-the-tensorflow-model). + + To convert such TensorFlow model: + + 1. Go to the `/deployment_tools/model_optimizer` directory + 2. Run the `mo_tf.py` script with the path to the checkpoint file to convert a model: + + * If input model is in `.pb` format:
+```sh
+python3 mo_tf.py --input_model <INFERENCE_GRAPH>.pb --input_checkpoint <INPUT_CHECKPOINT>
+```
+   * If input model is in `.pbtxt` format:
+```sh
+python3 mo_tf.py --input_model <INFERENCE_GRAPH>.pbtxt --input_checkpoint <INPUT_CHECKPOINT> --input_model_is_text
+```
+
+2. MetaGraph:
+
+   In this case, a model consists of three or four files stored in the same directory:
+   - `model_name.meta`
+   - `model_name.index`
+   - `model_name.data-00000-of-00001` (digit part may vary)
+   - `checkpoint` (optional)
+
+   To convert such a TensorFlow model:
+
+   1. Go to the `/deployment_tools/model_optimizer` directory
+   2. Run the `mo_tf.py` script with a path to the MetaGraph `.meta` file to convert a model:
+```sh
+python3 mo_tf.py --input_meta_graph <INPUT_META_GRAPH>.meta
+```
+
+3. SavedModel format of TensorFlow 1.x and 2.x versions:
+
+   In this case, a model consists of a special directory with a `.pb` file and several subfolders: `variables`, `assets`, and `assets.extra`. For more information about the SavedModel directory, refer to the [README](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/saved_model#components) file in the TensorFlow repository.
+
+   To convert such a TensorFlow model:
+
+   1. Go to the `/deployment_tools/model_optimizer` directory
+   2. Run the `mo_tf.py` script with a path to the SavedModel directory to convert a model:
+```sh +python3 mo_tf.py --saved_model_dir +``` + +You can convert TensorFlow 1.x SavedModel format in the environment that has a 1.x or 2.x version of TensorFlow. However, TensorFlow 2.x SavedModel format strictly requires the 2.x version of TensorFlow. +If a model contains operations currently unsupported by OpenVINO, prune these operations by explicit specification of input nodes using the `--input` option. +To determine custom input nodes, display a graph of the model in TensorBoard. To generate TensorBoard logs of the graph, use the `--tensorboard_logs` option. +TensorFlow 2.x SavedModel format has a specific graph due to eager execution. In case of pruning, find custom input nodes in the `StatefulPartitionedCall/*` subgraph of TensorFlow 2.x SavedModel format. + +## Freezing Custom Models in Python\* + +When a network is defined in Python\* code, you have to create an inference graph file. Usually graphs are built in a form +that allows model training. That means that all trainable parameters are represented as variables in the graph. +To be able to use such graph with Model Optimizer such graph should be frozen. +The graph is frozen and dumped to a file with the following code: +```python +import tensorflow as tf +from tensorflow.python.framework import graph_io +frozen = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ["name_of_the_output_node"]) +graph_io.write_graph(frozen, './', 'inference_graph.pb', as_text=False) +``` + +Where: + +* `sess` is the instance of the TensorFlow\* Session object where the network topology is defined. +* `["name_of_the_output_node"]` is the list of output node names in the graph; `frozen` graph will + include only those nodes from the original `sess.graph_def` that are directly or indirectly used + to compute given output nodes. `'name_of_the_output_node'` here is an example of possible output + node name. You should derive the names based on your own graph. +* `./` is the directory where the inference graph file should be generated. +* `inference_graph.pb` is the name of the generated inference graph file. +* `as_text` specifies whether the generated file should be in human readable text format or binary. + +## Convert a TensorFlow* Model + +To convert a TensorFlow model: + +1. Go to the `/deployment_tools/model_optimizer` directory +2. Use the `mo_tf.py` script to simply convert a model with the path to the input model `.pb` file: +```sh +python3 mo_tf.py --input_model .pb +``` + +Two groups of parameters are available to convert your model: + +* [Framework-agnostic parameters](Converting_Model_General.md): These parameters are used to convert any model trained in any supported framework. +* [TensorFlow-specific parameters](#tensorflow_specific_conversion_params): Parameters used to convert only TensorFlow models. + +> **NOTE:** The color channel order (RGB or BGR) of an input data should match the channel order of the model training dataset. If they are different, perform the `RGB<->BGR` conversion specifying the command-line parameter: `--reverse_input_channels`. Otherwise, inference results may be incorrect. For more information about the parameter, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](Converting_Model_General.md). + +### Using TensorFlow\*-Specific Conversion Parameters +The following list provides the TensorFlow\*-specific parameters. 
+ +``` +TensorFlow*-specific parameters: + --input_model_is_text + TensorFlow*: treat the input model file as a text + protobuf format. If not specified, the Model Optimizer + treats it as a binary file by default. + --input_checkpoint INPUT_CHECKPOINT + TensorFlow*: variables file to load. + --input_meta_graph INPUT_META_GRAPH + Tensorflow*: a file with a meta-graph of the model + before freezing + --saved_model_dir SAVED_MODEL_DIR + TensorFlow*: directory with a model in SavedModel format + of TensorFlow 1.x or 2.x version + --saved_model_tags SAVED_MODEL_TAGS + Group of tag(s) of the MetaGraphDef to load, in string + format, separated by ','. For tag-set contains + multiple tags, all tags must be passed in. + --tensorflow_custom_operations_config_update TENSORFLOW_CUSTOM_OPERATIONS_CONFIG_UPDATE + TensorFlow*: update the configuration file with node + name patterns with input/output nodes information. + --tensorflow_object_detection_api_pipeline_config TENSORFLOW_OBJECT_DETECTION_API_PIPELINE_CONFIG + TensorFlow*: path to the pipeline configuration file + used to generate model created with help of Object + Detection API. + --tensorboard_logdir TENSORBOARD_LOGDIR + TensorFlow*: dump the input graph to a given directory + that should be used with TensorBoard. + --tensorflow_custom_layer_libraries TENSORFLOW_CUSTOM_LAYER_LIBRARIES + TensorFlow*: comma separated list of shared libraries + with TensorFlow* custom operations implementation. + --disable_nhwc_to_nchw + Disables default translation from NHWC to NCHW +``` + +> **NOTE:** Models produces with TensorFlow\* usually have not fully defined shapes (contain `-1` in some dimensions). It is necessary to pass explicit shape for the input using command line parameter `--input_shape` or `-b` to override just batch dimension. If the shape is fully defined, then there is no need to specify either `-b` or `--input_shape` options. + +#### Command-Line Interface (CLI) Examples Using TensorFlow\*-Specific Parameters + +* Launching the Model Optimizer for Inception V1 frozen model when model file is a plain text protobuf: +```sh +python3 mo_tf.py --input_model inception_v1.pbtxt --input_model_is_text -b 1 +``` + +* Launching the Model Optimizer for Inception V1 frozen model and update custom sub-graph replacement file `transform.json` with information about input and output nodes of the matched sub-graph. For more information about this feature, refer to [Sub-Graph Replacement in the Model Optimizer](../customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md). +```sh +python3 mo_tf.py --input_model inception_v1.pb -b 1 --tensorflow_custom_operations_config_update transform.json +``` + +* Launching the Model Optimizer for Inception V1 frozen model and use custom sub-graph replacement file `transform.json` for model conversion. For more information about this feature, refer to [Sub-Graph Replacement in the Model Optimizer](../customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md). 
+```sh +python3 mo_tf.py --input_model inception_v1.pb -b 1 --tensorflow_use_custom_operations_config transform.json +``` + +* Launching the Model Optimizer for Inception V1 frozen model and dump information about the graph to TensorBoard log dir `/tmp/log_dir` +```sh +python3 mo_tf.py --input_model inception_v1.pb -b 1 --tensorboard_logdir /tmp/log_dir +``` + +* Launching the Model Optimizer for a model with custom TensorFlow operations (refer to the [TensorFlow* documentation](https://www.tensorflow.org/extend/adding_an_op)) implemented in C++ and compiled into the shared library `my_custom_op.so`. Model Optimizer falls back to TensorFlow to infer output shape of operations implemented in the library if a custom TensorFlow operation library is provided. If it is not provided, a custom operation with an inference function is needed. For more information about custom operations, refer to the [Extending the Model Optimizer with New Primitives](../customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md). +```sh +python3 mo_tf.py --input_model custom_model.pb --tensorflow_custom_layer_libraries ./my_custom_op.so +``` + + +## Convert TensorFlow* 2 Models + +TensorFlow* 2.X officially supports two model formats: SavedModel and Keras H5 (or HDF5). +Below are the instructions on how to convert each of them. + +### SavedModel Format + +A model in the SavedModel format consists of a directory with a `saved_model.pb` file and two subfolders: `variables` and `assets`. +To convert such a model: +1. Go to the `/deployment_tools/model_optimizer` directory. +2. Run the `mo_tf.py` script with a path to the SavedModel directory: +```sh +python3 mo_tf.py --saved_model_dir +``` + +TensorFlow* 2 SavedModel format strictly requires the 2.x version of TensorFlow installed in the +environment for conversion to the Intermediate Representation (IR). + +If a model contains operations currently unsupported by OpenVINO™, +prune these operations by explicit specification of input nodes using the `--input` or `--output` +options. To determine custom input nodes, visualize a model graph in the TensorBoard. + +To generate TensorBoard logs of the graph, use the Model Optimizer `--tensorboard_logs` command-line +option. + +TensorFlow* 2 SavedModel format has a specific graph structure due to eager execution. In case of +pruning, find custom input nodes in the `StatefulPartitionedCall/*` subgraph. + +### Keras H5 + +If you have a model in the HDF5 format, load the model using TensorFlow* 2 and serialize it in the +SavedModel format. Here is an example of how to do it: +```python +import tensorflow as tf +model = tf.keras.models.load_model('model.h5') +tf.saved_model.save(model,'model') +``` + +Then follow the above instructions for the SavedModel format. + +> **NOTE:** Do not use other hacks to resave TensorFlow* 2 models into TensorFlow* 1 formats. + +> **NOTE**: Currently, OpenVINO™ support for TensorFlow* 2 models is in preview (aka Beta), which means limited and not of production quality yet. OpenVINO™ does not support models with Keras RNN and Embedding layers. + + +## Custom Layer Definition + +Internally, when you run the Model Optimizer, it loads the model, goes through the topology, and tries to find each layer type in a list of known layers. Custom layers are layers that are not included in the list of known layers. If your topology contains any layers that are not in this list of known layers, the Model Optimizer classifies them as custom. 
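+
+A quick way to see which operation types a frozen TensorFlow\* graph actually contains, and to compare them against the supported layers list, is to read the `GraphDef` directly. This is only a helper sketch; the file name is an example:
+```python
+# Sketch: list the unique operation types used by a frozen TensorFlow graph.
+import tensorflow as tf
+
+graph_def = tf.compat.v1.GraphDef()
+with open("frozen_inference_graph.pb", "rb") as f:
+    graph_def.ParseFromString(f.read())
+
+print(sorted({node.op for node in graph_def.node}))
+```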
+ +See [Custom Layers in the Model Optimizer](../customize_model_optimizer/Customize_Model_Optimizer.md) for information about: + +* Model Optimizer internal procedure for working with custom layers +* How to convert a TensorFlow model that has custom layers +* Custom layer implementation details + + +## Supported TensorFlow\* Layers +Refer to [Supported Framework Layers ](../Supported_Frameworks_Layers.md) for the list of supported standard layers. + + +## Frequently Asked Questions (FAQ) + +The Model Optimizer provides explanatory messages if it is unable to run to completion due to issues like typographical errors, incorrectly used options, or other issues. The message describes the potential cause of the problem and gives a link to the [Model Optimizer FAQ](../Model_Optimizer_FAQ.md). The FAQ has instructions on how to resolve most issues. The FAQ also includes links to relevant sections in the Model Optimizer Developer Guide to help you understand what went wrong. + + +## Summary +In this document, you learned: + +* Basic information about how the Model Optimizer works with TensorFlow\* models +* Which TensorFlow models are supported +* How to freeze a TensorFlow model +* How to convert a trained TensorFlow model using the Model Optimizer with both framework-agnostic and TensorFlow-specific command-line options diff --git a/docs/MO_DG/prepare_model/convert_model/Converting_Model.md b/docs/MO_DG/prepare_model/convert_model/Converting_Model.md new file mode 100644 index 00000000000000..b523897a773c57 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/Converting_Model.md @@ -0,0 +1,42 @@ +# Converting a Model to Intermediate Representation (IR) {#openvino_docs_MO_DG_prepare_model_convert_model_Converting_Model} + +Use the mo.py script from the `/deployment_tools/model_optimizer` directory to run the Model Optimizer and convert the model to the Intermediate Representation (IR). +The simplest way to convert a model is to run mo.py with a path to the input model file: +```sh +python3 mo.py --input_model INPUT_MODEL +``` + +> **NOTE**: Some models require using additional arguments to specify conversion parameters, such as `--scale`, `--scale_values`, `--mean_values`, `--mean_file`. To learn about when you need to use these parameters, refer to [Converting a Model Using General Conversion Parameters](Converting_Model_General.md). + +The mo.py script is the universal entry point that can deduce the framework that has produced the input model by a standard extension of the model file: + +* `.caffemodel` - Caffe\* models +* `.pb` - TensorFlow\* models +* `.params` - MXNet\* models +* `.onnx` - ONNX\* models +* `.nnet` - Kaldi\* models. + +If the model files do not have standard extensions, you can use the ``--framework {tf,caffe,kaldi,onnx,mxnet}`` option to specify the framework type explicitly. + +For example, the following commands are equivalent: +```sh +python3 mo.py --input_model /user/models/model.pb +``` +```sh +python3 mo.py --framework tf --input_model /user/models/model.pb +``` + +To adjust the conversion process, you may use general parameters defined in the [Converting a Model Using General Conversion Parameters](Converting_Model_General.md) and +Framework-specific parameters for: +* [Caffe](Convert_Model_From_Caffe.md), +* [TensorFlow](Convert_Model_From_TensorFlow.md), +* [MXNet](Convert_Model_From_MxNet.md), +* [ONNX](Convert_Model_From_ONNX.md), +* [Kaldi](Convert_Model_From_Kaldi.md). 
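+
+If you script conversions, a thin wrapper can mirror the extension-based deduction described above and pass `--framework` explicitly. The wrapper below is an illustration only, not part of the Model Optimizer, and uses only the options shown in this document:
+```python
+# Sketch: call mo.py for a model file, deducing the framework from its extension.
+import subprocess
+from pathlib import Path
+
+FRAMEWORK_BY_EXTENSION = {
+    ".caffemodel": "caffe",
+    ".pb": "tf",
+    ".params": "mxnet",
+    ".onnx": "onnx",
+    ".nnet": "kaldi",
+}
+
+def convert(model_path, framework=None, output_dir="."):
+    # For standard extensions, mo.py deduces the framework itself;
+    # passing --framework explicitly is equivalent (see the examples above).
+    framework = framework or FRAMEWORK_BY_EXTENSION.get(Path(model_path).suffix)
+    cmd = ["python3", "mo.py", "--input_model", model_path, "--output_dir", output_dir]
+    if framework:
+        cmd += ["--framework", framework]
+    subprocess.run(cmd, check=True)
+
+convert("/user/models/model.pb")
+```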
+ + +## See Also +* [Configuring the Model Optimizer](../Config_Model_Optimizer.md) +* [IR Notation Reference](../../IR_and_opsets.md) +* [Custom Layers in Model Optimizer](../customize_model_optimizer/Customize_Model_Optimizer.md) +* [Model Cutting](Cutting_Model.md) \ No newline at end of file diff --git a/docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md b/docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md new file mode 100644 index 00000000000000..044b6fb013f985 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md @@ -0,0 +1,241 @@ +# Converting a Model Using General Conversion Parameters {#openvino_docs_MO_DG_prepare_model_convert_model_Converting_Model_General} + +To simply convert a model trained by any supported framework, run the Model Optimizer launch script ``mo.py`` with +specifying a path to the input model file: +```sh +python3 mo.py --input_model INPUT_MODEL +``` + +> **NOTE:** The color channel order (RGB or BGR) of an input data should match the channel order of the model training dataset. If they are different, perform the `RGB<->BGR` conversion specifying the command-line parameter: `--reverse_input_channels`. Otherwise, inference results may be incorrect. For details, refer to [When to Reverse Input Channels](#when_to_reverse_input_channels). + +To adjust the conversion process, you can also use the general (framework-agnostic) parameters: + +```sh +optional arguments: + -h, --help show this help message and exit + --framework {tf,caffe,mxnet,kaldi,onnx} + Name of the framework used to train the input model. + +Framework-agnostic parameters: + --input_model INPUT_MODEL, -w INPUT_MODEL, -m INPUT_MODEL + Tensorflow*: a file with a pre-trained model (binary + or text .pb file after freezing). Caffe*: a model + proto file with model weights + --model_name MODEL_NAME, -n MODEL_NAME + Model_name parameter passed to the final create_ir + transform. This parameter is used to name a network in + a generated IR and output .xml/.bin files. + --output_dir OUTPUT_DIR, -o OUTPUT_DIR + Directory that stores the generated IR. By default, it + is the directory from where the Model Optimizer is + launched. + --input_shape INPUT_SHAPE + Input shape(s) that should be fed to an input node(s) + of the model. Shape is defined as a comma-separated + list of integer numbers enclosed in parentheses or + square brackets, for example [1,3,227,227] or + (1,227,227,3), where the order of dimensions depends + on the framework input layout of the model. For + example, [N,C,H,W] is used for Caffe* models and + [N,H,W,C] for TensorFlow* models. Model Optimizer + performs necessary transformations to convert the + shape to the layout required by Inference Engine + (N,C,H,W). The shape should not contain undefined + dimensions (? or -1) and should fit the dimensions + defined in the input operation of the graph. If there + are multiple inputs in the model, --input_shape should + contain definition of shape for each input separated + by a comma, for example: [1,3,227,227],[2,4] for a + model with two inputs with 4D and 2D shapes. + Alternatively, specify shapes with the --input + option. + --scale SCALE, -s SCALE + All input values coming from original network inputs + will be divided by this value. When a list of inputs + is overridden by the --input parameter, this scale is + not applied for any input that does not match with the + original input of the model. 
+ --reverse_input_channels + Switch the input channels order from RGB to BGR (or + vice versa). Applied to original inputs of the model + if and only if a number of channels equals 3. Applied + after application of --mean_values and --scale_values + options, so numbers in --mean_values and + --scale_values go in the order of channels used in the + original model. + --log_level {CRITICAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET} + Logger level + --input INPUT Quoted list of comma-separated input nodes names with + shapes, data types, and values for freezing. The shape + and value are specified as space-separated lists. The + data type of input node is specified in braces and can + have one of the values: f64 (float64), f32 (float32), + f16 (float16), i64 (int64), i32 (int32), u8 (uint8), + boolean. For example, use the following format to set + input port 0 of the node `node_name1` with the shape + [3 4] as an input node and freeze output port 1 of the + node `node_name2` with the value [20 15] of the int32 + type and shape [2]: "0:node_name1[3 + 4],node_name2:1[2]{i32}->[20 15]". + --output OUTPUT The name of the output operation of the model. For + TensorFlow*, do not add :0 to this name. + --mean_values MEAN_VALUES, -ms MEAN_VALUES + Mean values to be used for the input image per + channel. Values to be provided in the (R,G,B) or + [R,G,B] format. Can be defined for desired input of + the model, for example: "--mean_values + data[255,255,255],info[255,255,255]". The exact + meaning and order of channels depend on how the + original model was trained. + --scale_values SCALE_VALUES + Scale values to be used for the input image per + channel. Values are provided in the (R,G,B) or [R,G,B] + format. Can be defined for desired input of the model, + for example: "--scale_values + data[255,255,255],info[255,255,255]". The exact + meaning and order of channels depend on how the + original model was trained. + --data_type {FP16,FP32,half,float} + Data type for all intermediate tensors and weights. If + original model is in FP32 and --data_type=FP16 is + specified, all model weights and biases are quantized + to FP16. + --disable_fusing Turn off fusing of linear operations to Convolution + --disable_resnet_optimization + Turn off resnet optimization + --finegrain_fusing FINEGRAIN_FUSING + Regex for layers/operations that won't be fused. + Example: --finegrain_fusing Convolution1,.*Scale.* + --disable_gfusing Turn off fusing of grouped convolutions + --enable_concat_optimization + Turn on Concat optimization. + --move_to_preprocess Move mean values to IR preprocess section + --extensions EXTENSIONS + Directory or a comma separated list of directories + with extensions. To disable all extensions including + those that are placed at the default location, pass an + empty string. + --batch BATCH, -b BATCH + Input batch size + --version Version of Model Optimizer + --silent Prevent any output messages except those that + correspond to log level equals ERROR, that can be set + with the following option: --log_level. By default, + log level is already ERROR. + --freeze_placeholder_with_value FREEZE_PLACEHOLDER_WITH_VALUE + Replaces input layer with constant node with provided + value, for example: "node_name->True". It will be + DEPRECATED in future releases. Use --input option to + specify a value for freezing. + --generate_deprecated_IR_V7 + Force to generate old deprecated IR V7 with layers + from old IR specification. 
+ --keep_shape_ops [ Experimental feature ] Enables `Shape` operation + with all children keeping. This feature makes model + reshapable in Inference Engine + --disable_weights_compression + Disable compression and store weights with original + precision. + --progress Enable model conversion progress display. + --stream_output Switch model conversion progress display to a + multiline mode. + --transformations_config TRANSFORMATIONS_CONFIG + Use the configuration file with transformations + description. +``` + +The sections below provide details on using particular parameters and examples of CLI commands. + +## When to Specify Mean and Scale Values +Usually neural network models are trained with the normalized input data. This means that the input data values are converted to be in a specific range, for example, `[0, 1]` or `[-1, 1]`. Sometimes the mean values (mean images) are subtracted from the input data values as part of the pre-processing. There are two cases how the input data pre-processing is implemented. + * The input pre-processing operations are a part of a topology. In this case, the application that uses the framework to infer the topology does not pre-process the input. + * The input pre-processing operations are not a part of a topology and the pre-processing is performed within the application which feeds the model with an input data. + +In the first case, the Model Optimizer generates the IR with required pre-processing layers and Inference Engine samples may be used to infer the model. + +In the second case, information about mean/scale values should be provided to the Model Optimizer to embed it to the generated IR. Model Optimizer provides a number of command line parameters to specify them: `--scale`, `--scale_values`, `--mean_values`, `--mean_file`. + +If both mean and scale values are specified, the mean is subtracted first and then scale is applied. Input values are *divided* by the scale value(s). + +There is no a universal recipe for determining the mean/scale values for a particular model. The steps below could help to determine them: +* Read the model documentation. Usually the documentation describes mean/scale value if the pre-processing is required. +* Open the example script/application executing the model and track how the input data is read and passed to the framework. +* Open the model in a visualization tool and check for layers performing subtraction or multiplication (like `Sub`, `Mul`, `ScaleShift`, `Eltwise` etc) of the input data. If such layers exist, the pre-processing is most probably the part of the model. + +## When to Specify Input Shapes +There are situations when the input data shape for the model is not fixed, like for the fully-convolutional neural networks. In this case, for example, TensorFlow\* models contain `-1` values in the `shape` attribute of the `Placeholder` operation. Inference Engine does not support input layers with undefined size, so if the input shapes are not defined in the model, the Model Optimizer fails to convert the model. The solution is to provide the input shape(s) using the `--input` or `--input_shape` command line parameter for all input(s) of the model or provide the batch size using the `-b` command line parameter if the model contains just one input with undefined batch size only. In the latter case, the `Placeholder` shape for the TensorFlow\* model looks like this `[-1, 224, 224, 3]`. + +## When to Reverse Input Channels +Input data for your application can be of RGB or BRG color input order. 
For example, Inference Engine samples load input images in BGR channel order. However, the model may have been trained on images loaded in the opposite order (for example, most TensorFlow\* models are trained on images in RGB order). In this case, inference results produced with the Inference Engine samples may be incorrect. The solution is to provide the `--reverse_input_channels` command line parameter. With this parameter, the Model Optimizer modifies the weights of the first convolution or another channel-dependent operation so that these operations produce output as if the image were passed in RGB channel order.
+
+## When to Specify `--keep_shape_ops` Command Line Parameter
+`--keep_shape_ops` is an **experimental** command line parameter, so the model conversion may fail if it is specified.
+
+By default, the Model Optimizer evaluates the shapes of all operations in the model (shape propagation) for fixed input shape(s). During shape propagation, the Model Optimizer evaluates *Shape* operations and removes them from the computation graph. With that approach, an initial model that can consume inputs of different shapes may be converted to an IR that works with one fixed input shape only. For example, consider the case when a blob is reshaped from a 4D shape *[N, C, H, W]* to a shape *[N, C, H \* W]*. During the model conversion, the Model Optimizer calculates the output shape as a constant 1D blob with values *[N, C, H \* W]*. So if the input shape changes to some other value *[N, C, H1, W1]* (a possible scenario for a fully convolutional model), the reshape layer becomes invalid.
+
+If the `--keep_shape_ops` command line parameter is specified, the Model Optimizer keeps *Shape* operations in the model and inserts additional layers to convert the graph layout from NHWC to NCHW if necessary.
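+
+As a minimal sketch of this parameter in use, assuming a hypothetical frozen fully-convolutional TensorFlow\* model `fcn.pb` (not a model shipped with the toolkit), the conversion command could look as follows. Because the parameter is experimental, the conversion may fail:
+```sh
+# Hypothetical model file and input shape; --keep_shape_ops keeps Shape operations
+# so the resulting IR stays reshapable in the Inference Engine.
+python3 mo.py --input_model fcn.pb --input_shape [1,320,320,3] --keep_shape_ops
+```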
+
+## Examples of CLI Commands
+
+Launch the Model Optimizer for the Caffe bvlc_alexnet model with debug log level:
+```sh
+python3 mo.py --input_model bvlc_alexnet.caffemodel --log_level DEBUG
+```
+
+Launch the Model Optimizer for the Caffe bvlc_alexnet model with the output IR called `result.*` in the specified `output_dir`:
+```sh
+python3 mo.py --input_model bvlc_alexnet.caffemodel --model_name result --output_dir /../../models/
+```
+
+Launch the Model Optimizer for the Caffe bvlc_alexnet model with one input and scale values:
+```sh
+python3 mo.py --input_model bvlc_alexnet.caffemodel --scale_values [59,59,59]
+```
+
+Launch the Model Optimizer for the Caffe bvlc_alexnet model with multiple inputs and scale values:
+```sh
+python3 mo.py --input_model bvlc_alexnet.caffemodel --input data,rois --scale_values [59,59,59],[5,5,5]
+```
+
+Launch the Model Optimizer for the Caffe bvlc_alexnet model with multiple inputs and scale and mean values specified for particular nodes:
+```sh
+python3 mo.py --input_model bvlc_alexnet.caffemodel --input data,rois --mean_values data[59,59,59] --scale_values rois[5,5,5]
+```
+
+Launch the Model Optimizer for the Caffe bvlc_alexnet model with a specified input layer, overridden input shape, scale 5, batch 8, and a specified name of the output operation:
+```sh
+python3 mo.py --input_model bvlc_alexnet.caffemodel --input "data[1 3 224 224]" --output pool5 -s 5 -b 8
+```
+
+Launch the Model Optimizer for the Caffe bvlc_alexnet model with fusing of linear operations to Convolution and fusing of grouped convolutions disabled:
+```sh
+python3 mo.py --input_model bvlc_alexnet.caffemodel --disable_fusing --disable_gfusing
+```
+
+Launch the Model Optimizer for the Caffe bvlc_alexnet model with the input channel order reversed between RGB and BGR, mean values specified for the input image per channel, and a specified data type for input tensor values:
+```sh
+python3 mo.py --input_model bvlc_alexnet.caffemodel --reverse_input_channels --mean_values [255,255,255] --data_type FP16
+```
+
+Launch the Model Optimizer for the Caffe bvlc_alexnet model with extensions listed in the specified directories and a specified mean_images binaryproto file. For more information about extensions, refer to [this](../customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md) page.
+```sh
+python3 mo.py --input_model bvlc_alexnet.caffemodel --extensions /home/,/some/other/path/ --mean_file /path/to/binaryproto
+```
+
+Launch the Model Optimizer for the TensorFlow* FaceNet* model with a placeholder frozen to a value.
+It replaces the placeholder with a constant layer that contains the passed value.
+For more information about FaceNet conversion, refer to [this](tf_specific/Convert_FaceNet_From_Tensorflow.md) page.
+```sh
+python3 mo.py --input_model FaceNet.pb --input "phase_train->False"
+```
+
+Launch the Model Optimizer for any model with a placeholder frozen to a tensor of values.
+It replaces the placeholder with a constant layer that contains the passed values.
+
+The tensor is specified in square brackets with the values separated by whitespace.
+If the data type is set in the model, this tensor is reshaped to the placeholder shape and cast to the placeholder data type.
+Otherwise, it is cast to the data type passed with the `--data_type` parameter (FP32 by default).
+```sh +python3 mo.py --input_model FaceNet.pb --input "placeholder_layer_name->[0.1 1.2 2.3]" +``` diff --git a/docs/MO_DG/prepare_model/convert_model/Cutting_Model.md b/docs/MO_DG/prepare_model/convert_model/Cutting_Model.md new file mode 100644 index 00000000000000..096254ce19e25d --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/Cutting_Model.md @@ -0,0 +1,392 @@ +# Cutting Off Parts of a Model {#openvino_docs_MO_DG_prepare_model_convert_model_Cutting_Model} + +Sometimes some parts of a model must be removed while the Model Optimizer is converting models to the Intermediate Representation. This chapter describes methods of doing cutting off parts of a model using Model Optimizer command-line options. Model cutting applies mostly to TensorFlow\* models, but is also useful for other frameworks. In this chapter, TensorFlow examples are used for illustration. + +## Purpose of Model Cutting + +The following examples are the situations when model cutting is useful or even required: + +* model has pre- or post-processing parts that cannot be translated to existing Inference Engine layers. +* model has a training part that is convenient to be kept in the model, but not used during inference. +* model is too complex (contains lots of unsupported operations that cannot be easily implemented as custom layers), so the complete model cannot be converted in one shot. +* model is one of the supported [SSD models](../customize_model_optimizer/TensorFlow_SSD_ObjectDetection_API.md). In this case, you need to cut a post-processing part off. +* problem with model conversion in the Model Optimizer or inference in the Inference Engine occurred. To localize the issue, limit the scope for conversion by iteratively searching for problematic places in the model. +* single custom layer or a combination of custom layers is isolated for debugging purposes. + +## Command-Line Options + +Model Optimizer provides command line options `--input` and `--output` to specify new entry and exit nodes, while ignoring the rest of the model: + +* `--input` option accepts a comma-separated list of layer names of the input model that should be treated as new entry points to the model. +* `--output` option accepts a comma-separated list of layer names of the input model that should be treated as new exit points from the model. + +The `--input` option is required for cases unrelated to model cutting. For example, when the model contains several inputs and `--input_shape` or `--mean_values` options are used, you should use the `--input` option to specify the order of input nodes for correct mapping between multiple items provided in `--input_shape` and `--mean_values` and the inputs in the model. This is out of scope. + +Model cutting is illustrated with Inception V1. This model is in `models/research/slim` repository. [This section](Converting_Model.md) describes pre-work to prepare the model for the Model Optimizer to be ready to proceed with this chapter. + +## Default Behavior without --input and --output + +The input model is converted as a whole if neither `--input` nor `--output` command line options are used. All `Placeholder` operations in a TensorFlow\* graph are automatically identified as entry points. The `Input` layer type is generated for each of them. All nodes that have no consumers are automatically identified as exit points. + +For Inception_V1, there is one `Placeholder`: input. 
If the model is viewed in the TensorBoard\*, the input operation is easy to find: + +![Placeholder in Inception V1](../../img/inception_v1_std_input.png) + +There is only one output operation, which enclosed in a nested name scope `InceptionV1/Logits/Predictions`, the `Reshape` operation has a full name `InceptionV1/Logits/Predictions/Reshape_1`. + +In the TensorBoard, it looks the following way together with some predecessors: + +![TensorBoard with predecessors](../../img/inception_v1_std_output.png) + +Convert this model: +```sh +python3 mo.py --input_model=inception_v1.pb -b 1 +``` +The output `.xml` file with an Intermediate Representation contains the `Input` layer among other layers in the model: +```xml + + + + 1 + 3 + 224 + 224 + + + +``` +The `input` layer is converted from the TensorFlow graph `Placeholder` operation `input` and has the same name. + +The `-b` option is used here for conversion to override a possible undefined batch size (coded as -1 in TensorFlow models). If a model was frozen with a defined batch size, you may omit this option in all the examples. + +The last layer in the model is `InceptionV1/Logits/Predictions/Reshape_1`, which matches an output operation in the TensorFlow graph: +```xml + + + + + 1 + 1001 + + + + + 1 + 1001 + + + +``` +Due to automatic identification of inputs and outputs, you do not need to provide the `--input` and `--output` options to convert the whole model. The following commands are equivalent for the Inception V1 model: +```sh +python3 mo.py --input_model=inception_v1.pb -b 1 + +python3 mo.py --input_model=inception_v1.pb -b 1 --input=input --output=InceptionV1/Logits/Predictions/Reshape_1 +``` +The Intermediate Representations are identical for both conversions. The same is true if the model has multiple inputs and/or outputs. + +## Model Cutting + +Now consider how to cut some parts of the model off. This chapter uses the first convolution block `InceptionV1/InceptionV1/Conv2d_1a_7x7` of the Inception V1 model to illustrate cutting: + +![Inception V1 first convolution block](../../img/inception_v1_first_block.png) + +### Cutting at the End + +If you want to cut your model at the end, you have the following options: + +1. The following command cuts off the rest of the model after the `InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu`, making this node the last in the model: +```sh +python3 mo.py --input_model=inception_v1.pb -b 1 --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu +``` + The resulting Intermediate Representation has three layers: +```xml + + + + + + ... + + + + + + ... + + + ... + + + + + + + + + ... + + + ... + + + + + + + + +``` + As you can see in the TensorBoard picture, the original model has more nodes than Intermediate Representation. Model Optimizer has fused batch normalization `InceptionV1/InceptionV1/Conv2d_1a_7x7/BatchNorm` to the convolution `InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution`, and it is not present in the final Intermediate Representation. This is not an effect of the `--output` option, it is usual behavior of the Model Optimizer for batch normalizations and convolutions. The effect of the `--output` is that the `ReLU` layer becomes the last one in the converted model. + +2. 
The following command cuts the edge that comes from 0 output port of the `InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu` and the rest of the model, making this node the last one in the model: +```sh +python3 mo.py --input_model=inception_v1.pb -b 1 --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu:0 +``` + The resulting Intermediate Representation has three layers, which are the same as in the previous case: +```xml + + + + + + ... + + + + + + ... + + + ... + + + + + + + + + ... + + + ... + + + + + + + + +``` + This type of cutting is useful to cut edges in case of multiple output edges. + +3. The following command cuts the edge that comes to 0 input port of the `InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu` and the rest of the model including `InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu`, deleting this node and making the previous node `InceptionV1/InceptionV1/Conv2d_1a_7x7/Conv2D` the last in the model: +```sh +python3 mo.py --input_model=inception_v1.pb -b 1 --output=0:InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu +``` + The resulting Intermediate Representation has two layers, which are the same as the first two layers in the previous case: +```xml + + + + + + ... + + + + + + ... + + + ... + + + + + + + + + + + +``` + +### Cutting from the Beginning + +If you want to go further and cut the beginning of the model, leaving only the `ReLU` layer, you have the following options: + +1. You can use the following command line, where `--input` and `--output` specify the same node in the graph: +```sh +python3 mo.py --input_model=inception_v1.pb -b 1 --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu --input=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu +``` + The resulting Intermediate Representation looks as follows: +```xml + + + + + + ... + + + + + ... + + + ... + + + + + + + +``` + `Input` layer is automatically created to feed the layer that is converted from the node specified in `--input`, which is `InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu` in this case. Model Optimizer does not replace the `ReLU` node by the `Input` layer, it produces such Intermediate Representation to make the node be the first executable node in the final Intermediate Representation. So the Model Optimizer creates enough `Inputs` to feed all input ports of the node that is passed in `--input`. + + Even though `--input_shape` is not specified in the command line, the shapes for layers are inferred from the beginning of the original TensorFlow* model to the point at which the new input is defined. It has the same shape [1,64,112,112] as the model converted as a whole or without cutting off the beginning. + +2. You can cut edge incoming to layer by port number. To specify incoming port use notation `--input=port:input_node`. +So, to cut everything before `ReLU` layer, cut edge incoming in port 0 of `InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu` node: +```sh +python3 mo.py --input_model=inception_v1.pb -b 1 --input=0:InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu +``` + The resulting Intermediate Representation looks as follows: +```xml + + + + + + ... + + + + + ... + + + ... + + + + + + + +``` + `Input` layer is automatically created to feed the layer that is converted from the node specified in `--input`, which is `InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu` in this case. Model Optimizer does not replace the `ReLU` node by the `Input` layer, it produces such Intermediate Representation to make the node be the first executable node in the final Intermediate Representation. 
So the Model Optimizer creates enough `Inputs` to feed all input ports of the node that is passed in `--input`. + + Even though `--input_shape` is not specified in the command line, the shapes for layers are inferred from the beginning of the original TensorFlow* model to the point at which the new input is defined. It has the same shape [1,64,112,112] as the model converted as a whole or without cutting off the beginning. + +3. You can cut edge outcoming from layer by port number. To specify outcoming port use notation `--input=input_node:port`. +So, to cut everything before `ReLU` layer, cut edge from `InceptionV1/InceptionV1/Conv2d_1a_7x7/BatchNorm/batchnorm/add_1` node to `ReLU`: +```sh +python3 mo.py --input_model=inception_v1.pb -b 1 --input=InceptionV1/InceptionV1/Conv2d_1a_7x7/BatchNorm/batchnorm/add_1:0 --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu +``` + The resulting Intermediate Representation looks as follows: +```xml + + + + + + ... + + + + + ... + + + ... + + + + + + + +``` + +## Shape Override for New Inputs + +The input shape can be overridden with `--input_shape`. In this case, the shape is applied to the node referenced in `--input`, not to the original `Placeholder` in the model. For example, this command line +```sh +python3 mo.py --input_model=inception_v1.pb --input_shape=[1,5,10,20] --output=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu --input=InceptionV1/InceptionV1/Conv2d_1a_7x7/Relu +``` + +gives the following shapes in the `Input` and `ReLU` layers: + +```xml + + + + 1 + 20 + 5 + 10 + + + + + + + 1 + 20 + 5 + 10 + + + + + 1 + 20 + 5 + 10 + + + +``` +An input shape [1,20,5,10] in the final Intermediate Representation differs from the shape [1,5,10,20] specified in the command line, because the original TensorFlow\* model uses NHWC layout, but the Intermediate Representation uses NCHW layout. So usual NHWC to NCHW layout conversion occurred. + +When `--input_shape` is specified, shape inference inside the Model Optimizer is not performed for the nodes in the beginning of the model that are not included in the translated region. It differs from the case when `--input_shape` is not specified as noted in the previous section where the shape inference is still performed for such nodes to deduce shape for the layers that should fall into the final Intermediate Representation. So `--input_shape` should be used for a model with a complex graph with loops, which are not supported by the Model Optimizer, to exclude such parts from the Model Optimizer shape inference process completely. + +## Inputs with Multiple Input Ports + +There are operations that contain more than one input ports. In the example considered here, the convolution `InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution` is such operation. When `--input_shape` is not provided, a new `Input` layer is created for each dynamic input port for the node. If a port is evaluated to a constant blob, this constant remains in the model and a corresponding input layer is not created. TensorFlow convolution used in this model contains two ports: + +* port 0: input tensor for convolution (dynamic) +* port 1: convolution weights (constant) + +Following this behavior, the Model Optimizer creates an `Input` layer for port 0 only, leaving port 1 as a constant. 
So the result of: + +```sh +python3 mo.py --input_model=inception_v1.pb -b 1 --input=InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution +``` + +is identical to the result of conversion of the model as a whole, because this convolution is the first executable operation in Inception V1. + +Different behavior occurs when `--input_shape` is also used as an attempt to override the input shape: +```sh +python3 mo.py --input_model=inception_v1.pb--input=InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution --input_shape=[1,224,224,3] +``` +An error occurs (for more information, see FAQ #30): +```sh +[ ERROR ] Node InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution has more than 1 input and input shapes were provided. +Try not to provide input shapes or specify input port with PORT:NODE notation, where PORT is an integer. +For more information, see FAQ #30 +``` +In this case, when `--input_shape` is specified and the node contains multiple input ports, you need to specify an input port index together with an input node name. The input port index is specified in front of the node name with ':' as a separator (`PORT:NODE`). In the considered case, the port index 0 of the node `InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution` should be specified as `0:InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution`. + +The correct command line is: +```sh +python3 mo.py --input_model=inception_v1.pb --input=0:InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution --input_shape=[1,224,224,3] +``` \ No newline at end of file diff --git a/docs/MO_DG/prepare_model/convert_model/IR_suitable_for_INT8_inference.md b/docs/MO_DG/prepare_model/convert_model/IR_suitable_for_INT8_inference.md new file mode 100644 index 00000000000000..50b0020ee2f161 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/IR_suitable_for_INT8_inference.md @@ -0,0 +1,37 @@ +# Intermediate Representation Suitable for INT8 Inference {#openvino_docs_MO_DG_prepare_model_convert_model_IR_suitable_for_INT8_inference} + +## Introduction + +Inference Engine CPU plugin can infer models in the 8-bit integer (INT8) precision. +For details, refer to [INT8 inference on the CPU](../../../IE_DG/Int8Inference.md). + +Intermediate Representation (IR) should be specifically formed to be suitable for the INT8 inference. +Such an IR is called an INT8 IR and you can generate it in two ways: +- [Quantize model with the Post-Training Optimization tool](@ref pot_README) +- Use the Model Optimizer for TensorFlow\* pre-TFLite models (`.pb` model file with `FakeQuantize*` operations) + +For an operation to be executed in INT8, it must have `FakeQuantize` operations as inputs with the `levels` attribute set to `255` or `256`. +See the [specification of `FakeQuantize` operation](../../../ops/quantization/FakeQuantize_1.md) for details. +To see the list of supported INT8 layers, refer to [INT8 inference on the CPU](../../../IE_DG/Int8Inference.md). + +To execute the `Convolution` operation in INT8 on CPU, both data and weight inputs should have `FakeQuantize` as an input operation: +![](../../img/expanded_int8_Convolution_weights.png) + +INT8 IR is also suitable for FP32 and FP16 inference if a chosen plugin supports all operations of the IR, because the only difference between an INT8 IR and FP16 or FP32 IR is the existence of `FakeQuantize` in the INT8 IR. +Plugins with the INT8 inference support recognize these sub-graphs and quantize them during the inference time. 
+Plugins without the INT8 support execute all operations, including `FakeQuantize`, as is in the FP32 or FP16 precision. + +Accordingly, the presence of FakeQuantize operations in the IR is a recommendation for a plugin on how to quantize particular operations in the model. +If capable, a plugin accepts the recommendation and performs the INT8 inference, otherwise the plugin ignores the recommendation and executes a model in the floating-point precision. + +## Compressed INT8 Weights + +Weighted operations, like `Convolution`, `MatMul`, and others, store weights as floating-point `Constant` in the graph followed by the `FakeQuantize` operation. +`Constant` followed by the `FakeQuantize` operation could be optimized memory-wise due to the `FakeQuantize` operation semantics. +The resulting weights sub-graph stores weights in INT8 `Constant`, which gets unpacked back to floating point with the `Convert` operation. +Weights compression leaves `FakeQuantize` output arithmetically the same and weights storing takes four times less memory. + +See the visualization of `Convolution` with the compressed weights: +![](../../img/compressed_int8_Convolution_weights.png) + +Both Model Optimizer and Post-Training Optimization tool generate a compressed IR by default. To generate an expanded INT8 IR, use `--disable_weights_compression`. \ No newline at end of file diff --git a/docs/MO_DG/prepare_model/convert_model/Legacy_IR_Layers_Catalog_Spec.md b/docs/MO_DG/prepare_model/convert_model/Legacy_IR_Layers_Catalog_Spec.md new file mode 100644 index 00000000000000..569e52381501bc --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/Legacy_IR_Layers_Catalog_Spec.md @@ -0,0 +1,5317 @@ +# Intermediate Representation Notation Reference Catalog {#openvino_docs_MO_DG_prepare_model_convert_model_Legacy_IR_Layers_Catalog_Spec} + +> **NOTE**: This IR Notation Reference is no longer supported since the new concept of operation sets is introduced in OpenVINO 2020.1 version. For a complete list of supported operations, see the [Intermediate Representation and Operation Sets](../../IR_and_opsets.md) topic. 
+ +## Table of Сontents + +* Activation Layer +* ArgMax Layer +* BatchNormalization Layer +* BinaryConvolution Layer +* Bucketize Layer +* Broadcast Layer +* Clamp Layer +* Concat Layer +* Const Layer +* Convolution Layer +* Crop (Type 1) Layer +* Crop (Type 2) Layer +* Crop (Type 3) Layer +* CTCGreadyDecoder Layer +* Deconvolution Layer +* DeformableConvolution Layer +* DepthToSpace Layer +* DetectionOutput Layer +* Erf Layer +* Eltwise Layer +* Fill Layer +* Flatten Layer +* FullyConnected Layer +* Gather Layer +* GRN Layer +* GRUCell Layer +* Input Layer +* Interp Layer +* LSTMCell Layer +* Memory Layer +* MVN Layer +* NonMaxSuppression Layer +* Norm Layer +* Normalize Layer +* OneHot Layer +* Pad Layer +* Permute Layer +* Pooling Layer +* Power Layer +* PReLU Layer +* PriorBox Layer +* PriorBoxClustered Layer +* Proposal Layer +* PSROIPooling Layer +* FakeQuantize Layer +* Range Layer +* RegionYolo Layer +* ReLU Layer +* ReorgYolo Layer +* Resample (Type 1) Layer +* Resample (Type 2) Layer +* Reshape Layer +* ReverseSequence Layer +* RNNCell Layer +* ROIPooling Layer +* ExperimentalDetectronROIFeatureExtractor layer +* ExperimentalSparseWeightedSum layer +* ScaleShift Layer +* Select Layer +* Shape Layer +* ShuffleChannels Layer +* SimplerNMS Layer +* Slice Layer +* SoftMax Layer +* SparseFillEmptyRows Layer +* SparseSegmentMean Layer +* SparseSegmentSqrtN Layer +* SparseSegmentSum Layer +* SparseToDense Layer +* Split Layer +* Squeeze Layer +* StridedSlice Layer +* TensorIterator Layer +* Tile Layer +* TopK Layer +* Unique Layer +* Unsqueeze Layer + +## Activation Layer +Back to top + +**Name**: *Activation* + +**Category**: *Activation* + +**Short description**: *Activation* layer represents an activation function of each neuron in a layer, which is used to add non-linearity to the computational flow. + +**Detailed description**: [Reference](https://medium.com/the-theory-of-everything/understanding-activation-functions-in-neural-networks-9491262884e0) + +**Parameters**: *Activation* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *type* + + * **Description**: *type* represents particular activation function. For example, *type* equal to `sigmoid` means that the neurons of this layer have a sigmoid activation function. + * **Range of values**: + * *sigmoid* - sigmoid activation function. Learn more from the **Detailed description** section. + * *tanh* - tanh activation function. Learn more from the **Detailed description** section. + * *elu* - elu activation function. Learn more from the **Detailed description** section. + * *relu6* - relu6 activation function + * *not* - logical NOT function + * *exp* - exponent function + * **Type**: `string` + * **Default value**: None + * **Required**: *yes* + +**Mathematical Formulation** + +* Sigmoid function: + \f[ + f(x) = \frac{1}{1+e^{-x}} + \f] +* Tahn function: + \f[ + f (x) = \frac{2}{1+e^{-2x}} - 1 = 2sigmoid(2x) - 1 + \f] +* Elu function: + \f[ + f(x) = \left\{\begin{array}{ll} + e^{x} - 1 \quad \mbox{if } x < 0 \\ + x \quad \mbox{if } x \geq 0 + \end{array}\right. + \f] +* Relu6 function: + \f[ + f(x) = min(max(0, x), 6) + \f] + +**Inputs**: + +* **1**: Multidimensional input blob. Required. + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## ArgMax Layer +Back to top + +**Name**: *ArgMax* + +**Category**: *Layer* + +**Short description**: *ArgMax* layer computes indexes and values of the *top_k* maximum values for each datum across all dimensions *CxHxW*. 
+ +**Detailed description**: *ArgMax* layer is used after a classification layer to produce a prediction. If the parameter *out_max_val* is 1, output is a vector of pairs `(max_ind, max_val)` for each batch. The *axis* parameter specifies an axis along which to maximize. + +**Parameters**: *ArgMax* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *out_max_val* + + * **Description**: If *out_max_val* is 1, the output is a list of pairs `(max_ind, max_val)`. If *out_max_val* is 0, the output is a list of indexes of size *top_k*. + * **Range of values**: 0 or 1 + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *top_k* + + * **Description**: *top_k* is the number of elements to save in output. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *axis* + + * **Description**: If *axis* is set, maximizes along the specified axis, else maximizes the flattened trailing dimensions for each index of the first / num dimension. + * **Range of values**: an integer. Negative value means counting dimension from the end. + * **Type**: `int` + * **Default value**: None + * **Required**: *no* + +**Inputs**: + +* **1**: 4D input blob. Required. + +**Mathematical Formulation** + +*ArgMax* generally does the following with the input blobs: +\f[ +o_{i} = \left\{ +x| x \in S \wedge \forall y \in S : f(y) \leq f(x) +\right\} +\f] + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## BatchNormalization Layer +Back to top + +**Name**: *BatchNormalization* + +**Category**: *Normalization* + +**Short description**: [Reference](http://caffe.berkeleyvision.org/tutorial/layers/batchnorm.html) + +**Detailed description**: [Reference](https://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html) + +**Parameters**: *BatchNormalization* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *epsilon* + + * **Description**: *epsilon* is the number to be added to the variance to avoid division by zero when normalizing a value. For example, *epsilon* equal to 0.001 means that 0.001 is added to the variance. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +**Inputs**: + +* **1**: 4D input blob. Required. + +**Mathematical Formulation** + +*BatchNormalization* normalizes the output in each hidden layer. +* **Input**: Values of \f$x\f$ over a mini-batch: + \f[ + \beta = \{ x_{1...m} \} + \f] +* **Parameters to learn**: \f$ \gamma, \beta\f$ +* **Output**: + \f[ + \{ o_{i} = BN_{\gamma, \beta} ( b_{i} ) \} + \f] +* **Mini-batch mean**: + \f[ + \mu_{\beta} \leftarrow \frac{1}{m}\sum_{i=1}^{m}b_{i} + \f] +* **Mini-batch variance**: + \f[ + \sigma_{\beta }^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m} ( b_{i} - \mu_{\beta} )^{2} + \f] +* **Normalize**: + \f[ + \hat{b_{i}} \leftarrow \frac{b_{i} - \mu_{\beta}}{\sqrt{\sigma_{\beta }^{2} + \epsilon }} + \f] +* **Scale and shift**: + \f[ + o_{i} \leftarrow \gamma\hat{b_{i}} + \beta = BN_{\gamma ,\beta } ( b_{i} ) + \f] + +**Example** + +```xml + + + ... + ... 
+ +``` + +* * * + + +## BinaryConvolution Layer +Back to top + +**Name**: *BinaryConvolution* + +**Category**: *Layer* + +**Short description**: *BinaryConvolution* convolution with binary weights + +**Parameters**: *BinaryConvolution* layer parameters are specified in the `data` node, which is a child of the `layer` node. The layer has the same parameters as a regular *Convolution* layer and several unique parameters. + +* **Parameter name**: *input* + + * **Description**: *input* is the number of input channels. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *mode* + + * **Description**: *mode* defines how input tensor 0/1 values and weights 0/1 are interpreted as real numbers and how the result is computed. + * **Range of values**: + * *xnor-popcount* + * **Type**: `string` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *pad_value* + + * **Description**: *pad_value* is a floating-point value used to fill pad area. + * **Range of values**: a floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +**Inputs**: + +* **1**: 4D input blob containing integer or floats; filled with 0/1 values. 0 means -1, 1 means 1 for `mode="xnor-popcount"`. Required. + +* * * + +## Bucketize Layer +Back to top + +**Name**: *Bucketize* + +**Category**: *Layer* + +**Short description**: *Bucketize* bucketizes the input based on boundaries. This is an equivalent to np.digitize. + +**Detailed description**: [Reference](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/bucketize) + +* **Parameter name**: *with_right_bound* + + * **Description**: Indicates whether the intervals include the right or the left bucket edge. + * **Range of values**: *True* or *False* + * **Type**: `bool` + * **Default value**: None + * **Required**: *yes* + +**Inputs**: + +* **1**: N-D tensor. Input tensor for the bucketization. It contains with float or integer types. Required. +* **2**: 1-D tensor. Sorted boundaries of the buckets. It contains with a float type. Required. + +**Outputs**: + +* **1**: Output tensor with bucket indices for each element of the first input tensor. If the second input is empty, the bucket indice for all elements is equal to 0. The output tensor shape is the same as the first input tensor shape. + +* * * + +## Clamp Layer +Back to top + +**Name**: *Clamp* + +**Category**: *Layer* + +**Short description**: *Clamp* layer represents clipping activation operation. + +**Detailed description**: [Reference](https://www.tensorflow.org/versions/r1.2/api_docs/MO_DG/prepare_model/python/tf/clip_by_value) + +**Parameters**: *Clamp* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *min* + + * **Description**: *min* is the lower bound of values in the output. Any value in the input that is smaller than the bound is replaced by the *min* value. For example, *min* equal to 10.0 means that any value in the input that is smaller than the bound is replaced by 10.0. + * **Range of values**: a non-negative floating-point number + * **Type**: `float` + * **Default value**: 0.0 + * **Required**: *yes* + +* **Parameter name**: *max* + + * **Description**: *max* is the upper bound of values in the output. Any value in the input that is greater than the bound, is replaced by the *max* value. 
For example, *max* equal to 50.0 means that any value in the input that is greater than the bound is replaced by 50.0. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: 6.0 + * **Required**: *yes* + +**Inputs**: + +* **1**: Multidimensional input blob. Required. + +**Mathematical Formulation** + +*Clamp* generally does the following with the input blobs: +\f[ +out_i=\left\{\begin{array}{ll} + max\_value \quad \mbox{if } \quad input_i>max\_value \\ + min\_value \quad \mbox{if } \quad input_i +\end{array}\right. +\f] + +**Example** + +```xml + + + ... + ... + +``` + + +* * * + +## Broadcast +Back to top + +**Category**: Layer + +**Short description**: *Broadcast* replicates data on the first input to fit a given shape. + +**Detailed description**: + +*Broadcast* takes the first tensor and, following the [NumPy broadcasting rules specification](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html), builds a new tensor with shape matching the second input tensor. The second input value represents desired output shape. + +**Parameters**: *Broadcast* layer does not have parameters. + +**Inputs**: + +* **1**: Source tensor that is being broadcasted. Required. + +* **2**: 1D tensor describing output shape. Required. + + +**Outputs**: + +* **1**: Output tensor with replicated content from the first tensor with shape defined by the second input. + +**Example** + +```xml + + + + 16 + 1 + 1 + + + 4 + + + + + 1 + 16 + 50 + 50 + + + +``` + +* * * + + +## Concat Layer +Back to top + +**Name**: *Concat* + +**Category**: *Layer* + +**Short description**: [Reference](http://caffe.berkeleyvision.org/tutorial/layers/concat.html) + +**Parameters**: *Concat* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *axis* + + * **Description**: *axis* is the number of axis over which input blobs are concatenated. For example, *axis* equal to 1 means that input blobs are concatenated over the first axis. + * **Range of values**: a non-negative integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *yes* + +**Inputs**: + +* **1**: Multidimensional input blob. Required. + +* **2**: Multidimensional input blob. Required. + +**Mathematical Formulation** + +*Axis* parameter specifies a blob dimension to concatenate values over. For example, for two input blobs *B1xC1xH1xW1* and *B2xC2xH2xW2*, if `axis="1"`, the output blob is *B1xC1+C2xH1xW1*. This is only possible if *B1=B2*, *H1=H2*, *W1=W2*. + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## Const Layer +Back to top + +**Name**: *Const* + +**Category**: *Layer* + +**Short description**: *Const* layer produces a blob with a constant value specified in the *blobs* section. + +**Parameters**: *Const* layer does not have parameters. + +**Example** + +```xml + + + + 3 + 100 + + + + + + +``` + +* * * + +## Convolution Layer +Back to top + +**Name**: *Convolution* + +**Category**: *Layer* + +**Short description**: [Reference](http://caffe.berkeleyvision.org/tutorial/layers/convolution.html) + +**Detailed description**: [Reference](http://cs231n.github.io/convolutional-networks/#conv) + +**Parameters**: *Convolution* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *strides* + + * **Description**: *strides* is a distance (in pixels) to slide the filter on the feature map over the `(z, y, x)` axes for 3D convolutions and `(y, x)` axes for 2D convolutions. 
For example, *strides* equal to "4,2,1" means sliding the filter four pixels at a time over depth dimension, two pixels over height dimension, and one pixel over width dimension. + * **Range of values**: a list of non-negative integers + * **Type**: `int[]` + * **Default value**: a list of 1 with length equal to the number of convolution kernel dimensions + * **Required**: *no* + +* **Parameter name**: *pads_begin* + + * **Description**: *pads_begin* is the number of pixels to add to the beginning of each axis. For example, *pads_begin* equal to "1,2" means adding one pixel to the top of the input and two pixels to the left of the input. + * **Range of values**: a list of non-negative integers + * **Type**: `int[]` + * **Default value**: a list of 0 with length equal to the number of convolution kernel dimensions + * **Required**: *no* + +* **Parameter name**: *pads_end* + + * **Description**: *pads_end* is the number of pixels to add to the end of each axis. For example, *pads_end* equal to "1,2" means adding one pixel to the bottom of the input and two pixels to the right of the input. + * **Range of values**: a list of non-negative integers + * **Type**: `int[]` + * **Default value**: a list of 0 with length equal to the number of convolution kernel dimensions + * **Required**: *no* + +* **Parameter name**: *kernel* + + * **Description**: *kernel* is a size of each filter. For example, *kernel* equal to "2,3" means that each filter has height equal to 2 and width equal to 3. + * **Range of values**: a list of non-negative integers + * **Type**: `int[]` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *output* + + * **Description**: *output* is a number of output feature maps in the output. If *group* parameter value is greater than 1, *output* still matches the number of output features regardless of *group* value. For example, *output* equal to 1 means that there is one output feature map in a layer. + * **Range of values**: a non-negative integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *group* + + * **Description**: *group* is the number of groups which *output* and *input* should be split into. For example, *group* equal to 1 means that all filters are applied to the whole input (usual convolution), *group* equal to 2 means that both *input* and *output* channels are separated into two groups and the *i-th output* group is connected to the *i-th input* group channel. *group* equal to a number of output feature maps implies depth-wise separable convolution. For more information, see the [Reference](https://medium.com/towards-data-science/types-of-convolutions-in-deep-learning-717013397f4d#6f51). + * **Range of values**: an integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +* **Parameter name**: *dilations* + + * **Description**: *dilations* is a distance in width and height between elements (weights) in the filter. For example, *dilations* equal to "1,1" means that all elements in the filter are neighbors, so it is the same as the usual convolution. *dilations* equal to "2,2" means that all elements in the filter are matched to the elements in the input matrix separated by one pixel. 
+ * **Range of values**: a non-negative integer + * **Type**: `int[]` + * **Default value**: a list of 1 with length equal to the number of convolution kernel dimensions + * **Required**: *no* + +* **Parameter name**: *auto_pad* + + * **Description**: *auto_pad* defines how the padding is calculated. Possible values: + * Not specified: use explicit padding values + * *same_upper/same_lower*: add paddings to the input to match the output size. In case of odd padding value, an extra padding is added to the beginning if `auto_pad="same_upper"` or to the end if `auto_pad="same_lower"`. + * *valid*: do not use padding + * **Type**: `string` + * **Default value**: None + * **Required**: *no* + +**Inputs**: + +* **1**: 4D or 5D input blob. Required. + +**Weights Layout** + +Weights layout is GOIYX (GOIZYX for 3D convolution), which means that *X* changes the fastest, then *Y*, *Input* and *Output*, *Group*. + + +**Mathematical Formulation** + +* For the convolutional layer, the number of output features in each dimension is calculated as: +\f[ +n_{out} = \left ( \frac{n_{in} + 2p - k}{s} \right ) + 1 +\f] +* The receptive field in each layer is calculated as: + * Jump in the output feature map: + \f[ + j_{out} = j_{in} * s + \f] + * Size of the receptive field of output feature: + \f[ + r_{out} = r_{in} + ( k - 1 ) * j_{in} + \f] + * Center position of the receptive field of the first output feature: + \f[ + start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) * j_{in} + \f] + * Output is calculated as: + \f[ + out = \sum_{i = 0}^{n}w_{i}x_{i} + b + \f] + +**Example** + +```xml + + + ... + ... + + + +``` + +* * * + +## Crop (Type 1) Layer +Back to top + +**Name**: *Crop* + +**Category**: *Layer* + +**Short description**: *Crop* layer changes selected dimensions of the input blob according to the specified parameters. + +**Parameters**: *Crop* layer parameters are specified in the `data` section, which is a child of the `layer` node. *Crop* **Type 1** layer takes two input blobs, and the shape of the second blob specifies the *Crop* size. The *Crop* layer of this type supports shape inference. + +* **Parameter name**: *axis* + + * **Description**: *axis* is the number of a dimension to crop. For example, *axis* equal to [1] means that the first dimension is cropped. + * **Range of values**: a list of unique integers, where each element is greater than or equal to 0 and less than input shape length + * **Type**: `int[]` + * **Default value**: `[1]` + * **Required**: *yes* + +* **Parameter name**: *offset* + + * **Description**: *offset* is the starting point for crop in the input blob. For example, *offset* equal to 2 means that crop starts from the second value of a specified axis. + * **Range of values**: a list of integers of the length equal to the length of the *axis* attribute. In the list, *offset[i]* is greater than or equal to 0 and less than or equal to *input_shape[axis[i]] - crop_size[axis[i]]*, where *crop_size* is the shape of the second input. + * **Type**: `int[]` + * **Default value**: None + * **Required**: *yes* + +**Inputs** + +* **1**: Multidimensional input blob +* **2**: Shape of this input will be used for crop + +**Example** + +```xml + + + + + 1 + 21 + 44 + 44 + + + 1 + 21 + 34 + 34 + + + + + 1 + 21 + 34 + 34 + + + +``` + +* * * + +## Crop (Type 2) Layer +Back to top + +**Name**: *Crop* + +**Category**: *Layer* + +**Short description**: *Crop* layer changes selected dimensions of the input blob according to the specified parameters. 
+ +**Parameters**: Specify parameters for the *Crop* layer in the `data` section, which is a child of the `layer` node. *Crop* **Type 2** layer takes one input blob to crop. The *Crop* layer of this type supports shape inference only when shape propagation is applied to dimensions not specified in the *axis* attribute. + +* **Parameter name**: *axis* + + * **Description**: *axis* is the number of a dimension to crop. For example, *axis* equal to [1] means that the first dimension is cropped. + * **Range of values**: a list of unique integers, where each element is greater than or equal to 0 and less than input shape length + * **Type**: `int[]` + * **Default value**: `[1]` + * **Required**: *yes* + +* **Parameter name**: *offset* + + * **Description**: *offset* is the starting point for crop in the input blob. For example, *offset* equal to 2 means that cropping starts from the second value of the specified axis. + * **Range of values**: a list of integers with the length equal to the length of *axis* attribute, where *offset[i]* is greater than or equal to 0 and less or equal to *input_shape[axis[i]] - dim[i]* + * **Type**: `int[]` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *dim* + + * **Description**: *dim* is the resulting size of the output blob for the specified axis. For example, *dim* equal to [88] means that the output blob gets the dimension equal to 88 for the specified axis. + * **Range of values**: a list of integers + * **Type**: `int[]` + * **Default value**: None + * **Required**: *yes* + +**Example** + +```xml + + + + + 1 + 21 + 44 + 44 + + + + + 1 + 21 + 34 + 34 + + + +``` + +* * * + +## Crop (Type 3) Layer +Back to top + +**Name**: *Crop* + +**Category**: *Layer* + +**Short description**: *Crop* layer changes selected dimensions of the input blob according to the specified parameters. + +**Parameters**: *Crop* layer parameters are specified in the `data` section, which is a child of the `layer` node. *Crop* **Type 3** layer takes one input blob to crop. The *Crop* layer of this type supports shape inference. + +* **Parameter name**: *axis* + + * **Description**: *axis* is the number of a dimension to crop. For example, *axis* equal to [1] means that the first dimension is cropped. + * **Range of values**: a list of unique integers, where each element is greater than or equal to 0 and less than input shape length + * **Type**: `int[]` + * **Default value**: `[1]` + * **Required**: *yes* + +* **Parameter name**: *crop_begin* + + * **Description**: *crop_begin* specifies the starting offset for crop in the input blob for a specified axes. + * **Range of values**: a list of integers, where *crop_begin[i]* is greater than or equal to 0 and less than *input_shape[axis[i]] - crop_end[i]* + * **Type**: `int[]` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *crop_end* + + * **Description**: *crop_end* specifies the ending offset for crop in the input blob for the specified axes. + * **Range of values**: a list of integers, where *crop_end[i]* is greater than or equal to 0 and less than *input_shape[axis[i]] - crop_begin[i]* + * **Type**: `int[]` + * **Default value**: None + * **Required**: *yes* + +**Example** + +```xml + + + + + 1 + 21 + 44 + 44 + + + + + 1 + 21 + 34 + 34 + + + +``` + +* * * + +## CTCGreedyDecoder Layer +Back to top + +**Name**: *CTCGreedyDecoder* + +**Category**: *Layer* + +**Short description**: *CTCGreedyDecoder* performs greedy decoding on the logits given in input (best path). 
+ +**Detailed description**: [Reference](https://www.tensorflow.org/api_docs/python/tf/nn/ctc_greedy_decoder) + +**Parameters**: *CTCGreedyDecoder* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *ctc_merge_repeated* + + * **Description**: *ctc_merge_repeated* is a flag for merging repeated labels during the CTC calculation. + * **Range of values**: 0 or 1 + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +**Mathematical Formulation** + +Given an input sequence \f$X\f$ of length \f$T\f$, *CTCGreadyDecoder* assumes the probability of a length \f$T\f$ character sequence \f$C\f$ is given by +\f[ +p(C|X) = \prod_{t=1}^{T} p(c_{t}|X) +\f] + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## Deconvolution Layer +Back to top + +**Name**: *Deconvolution* + +**Category**: *Layer* + +**Short description**: *Deconvolution* layer is applied for upsampling the output to the higher image resolution. + +**Detailed description**: [Reference](https://distill.pub/2016/deconv-checkerboard/) + +**Parameters**: *Deconvolution* layer parameters should be specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *strides* + + * **Description**: *strides* is a distance (in pixels) to slide the filter on the feature map over the `(z, y, x)` axes for 3D deconvolutions and `(y, x)` axes for 2D deconvolutions. For example, *strides* equal to "4,2,1" means sliding the filter four pixels at a time over depth dimension, two pixels over height dimension, and one pixel over width dimension. + * **Range of values**: a list of non-negative integers + * **Type**: `int[]` + * **Default value**: a list of 1 with length equal to the number of convolution kernel dimensions + * **Required**: *no* + +* **Parameter name**: *pads_begin* + + * **Description**: *pads_begin* is the number of pixels to add to the beginning of each axis. For example, *pads_begin* equal to "1,2" means adding one pixel to the top of the input and two pixels to the left of the input. + * **Range of values**: a list of non-negative integers + * **Type**: `int[]` + * **Default value**: a list of 0 with length equal to the number of convolution kernel dimensions + * **Required**: *no* + +* **Parameter name**: *pads_end* + + * **Description**: *pads_end* is the number of pixels to add to the end of each axis. For example, *pads_end* equal to "1,2" means adding one pixel to the bottom of the input and two pixels to the right of the input. + * **Range of values**: a list of non-negative integers + * **Type**: `int[]` + * **Default value**: a list of 1 with length equal to the number of convolution kernel dimensions + * **Required**: *no* + +* **Parameter name**: *kernel* + + * **Description**: *kernel* is a size of each filter. For example, *kernel* equal to "2,3" means that each filter has height equal to 2 and width equal to 3. + * **Range of values**: a list of positive integers + * **Type**: `int[]` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *output* + + * **Description**: *output* is the number of output feature maps in the output. If *group* parameter value is greater than 1, *output* still matches the number of output features regardless of the *group* value. For example, *output* equal to 1 means that there is one output feature map in a layer. 
+ * **Range of values**: a non-negative integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *group* + + * **Description**: *group* denotes the number of groups to which *output* and *input* should be split. For example, *group* equal to 1 means that all filters are applied to the whole input (usual convolution), *group* equal to 2 means that both *input* and *output* channels are separated into 2 groups and *i-th output* group is connected to *i-th input* group channels. *group* equal to a number of output feature maps implies depth-wise separable convolution. For more information, see the [Reference](https://medium.com/towards-data-science/types-of-convolutions-in-deep-learning-717013397f4d#6f51). + * **Range of values**: a non-negative integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +* **Parameter name**: *dilations* + + * **Description**: *dilations* is the distance in width and height between elements (weights) in the filter. For example, *dilation* equal to "1,1" means that all elements in the filter are neighbors, so it is the same as the usual convolution. *dilation* equal to "2,2" means that all elements in the filter are matched to the elements in the input matrix separated by one pixel. + * **Range of values**: a list of non-negative integers + * **Type**: `int[]` + * **Default value**: a list of 1 with length equal to the number of convolution kernel dimensions + * **Required**: *no* + +* **Parameter name**: *auto_pad* + + * **Description**: *auto_pad* defines how the padding is calculated. + * **Range of values**: + * Not specified: use explicit padding values. + * *same_upper/same_lower*: add paddings to the input to match the output size. In case of odd padding value, an extra padding is added to the beginning if `auto_pad="same_upper"` or to the end if `auto_pad="same_lower"`. + * *valid*: do not use padding + * **Type**: string + * **Default value**: None + * **Required**: *no* + +**Inputs**: + +* **1**: 4D or 5D blob with input data. Required. + +**Weights Layout** + +Weights layout is GOIYX, which means that *X* changes the fastest, then *Y*, *Input* and *Output*, *Group*. + + +**Mathematical Formulation** + +*Deconvolution* is also called transpose convolution and performs operation that is reverse to convolution. +The number of output features for each dimensions is calculated as: +\f[S_{o}=stride(S_{i} - 1 ) + S_{f} - 2pad \f] +Where \f$S\f$ is the size of output, input, and filter. +Output is calculated in the same way as for convolution layer: +\f[out = \sum_{i = 0}^{n}w_{i}x_{i} + b\f] + +**Example** + +```xml + + + + + 1 + 512 + 8 + 8 + 8 + + + + + 1 + 512 + 16 + 16 + 16 + + + + + + + +``` + +* * * + +## DeformableConvolution Layer +Back to top + +**Name**: *DeformableConvolution* + +**Category**: *Layer* + +**Short description**: *DeformableConvolution* convolution layer enhances the transformation modeling capacity of CNNs. + +**Detailed description**: [Reference](https://arxiv.org/abs/1703.06211) + +**Parameters**: *DeformableConvolution* layer parameters are specified in the `data` node, which is a child of the `layer` node. The layer has the same parameters as a regular *Convolution* layer and several unique parameters. + +* **Parameter name**: *num_deformable_group* + + * **Description**: *num_deformable_group* is the number of deformable group partitions. 
+ * **Range of values**: a non-negative integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +**Inputs**: + +* **1**: 4D or 5D blob with input data. Required. +* **2**: Input offset to the DeformableConvolution + +**Weights Layout** + +Weights layout is GOIYX (GOIZYX for 3D convolution), which means that *X* changes the fastest, then *Y*, *Input* and *Output*, *Group*. + +**Example** + +```xml + + + + + 1 + 512 + 40 + 27 + + + 1 + 72 + 40 + 27 + + + + + 1 + 512 + 40 + 27 + + + + + + +``` + +* * * + +## DepthToSpace Layer +Back to top + +**Name**: *DepthToSpace* + +**Category**: *Layer* + +**Short description**: *DepthToSpace* layer rearranges data from the depth dimension of the input blob into spatial dimensions. + +**Detailed description**: *DepthToSpace* layer outputs a copy of the input blob, where values from the depth dimension (features) are moved to spatial blocks. Refer to the [ONNX* specification](https://github.com/onnx/onnx/blob/master/docs/Operators.md#DepthToSpace) for an example of the 4D input blob case. + +**Parameters**: *DepthToSpace* layer parameters are specified parameters in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *block_size* + + * **Description**: *block_size* specifies the size of the value block to be moved. The depth dimension size must be evenly divided by `block_size ^ (len(input.shape) - 2)`. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +**Inputs**: + +* **1**: 3D+ blob with input data. Required. + +**Mathematical Formulation** + +The operation is equivalent to the following transformation of the input blob *x* with *K* spatial dimensions of shape *[N, C, D1, D2, D3 , ... , DK]*: + +``` +x' = reshape(x, [N, block_size, block_size, ... , block_size, D1 * block_size, D2 * block_size, ... Dk * block_size]) +x'' = transpose(x', [0, K + 1, K + 2, 1, K + 3, 2, K + 4, 3, ... K + K + 1, K]) +y = reshape(x'', [N, C / block_size ^ K, D1 * block_size, D2 * block_size, D3 * block_size, ... , DK * block_size]) + +``` +**Example** + +```xml + + + + + 5 + 4 + 2 + 3 + + + + + 5 + 1 + 4 + 6 + + + +``` + +* * * + +## DetectionOutput Layer +Back to top + +**Name**: *DetectionOutput* + +**Category**: *Layer* + +**Short description**: *DetectionOutput* layer performs non-maximum suppression to generate the detection output using information on location and confidence predictions. + +**Detailed description**: [Reference](https://arxiv.org/pdf/1512.02325.pdf). The layer has three required inputs: blob with box logits, blob with confidence predictions, and blob with box coordinates (proposals). It can have two additional inputs with additional confidence predictions and box coordinates described in the [article](https://arxiv.org/pdf/1711.06897.pdf). The five input version of the layer is supported with MYRIAD plugin only. The output blob contains information about filtered detections described with seven element tuples: *[batch_id, class_id, confidence, x_1, y_1, x_2, y_2]*. The first tuple with *batch_id* equal to -1 means end of output. + +**Parameters**: *DetectionOutput* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *num_classes* + + * **Description**: *num_classes* is the number of classes to be predicted. 
+ * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *background_label_id* + + * **Description**: *background_label_id* is the background label ID. If there is no background class, set it to -1. + * **Range of values**: an integer + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *top_k* + + * **Description**: *top_k* is the maximum number of results to keep per batch after NMS step. -1 means keeping all bounding boxes. + * **Range of values**: an integer + * **Type**: `int` + * **Default value**: -1 + * **Required**: *no* + +* **Parameter name**: *variance_encoded_in_target* + + * **Description**: *variance_encoded_in_target* is a flag that specifies if variance is encoded in target. If flag is 0 (that is, `false`), you need to adjust the predicted offset accordingly. + * **Range of values**: 0 or 1 + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *keep_top_k* + + * **Description**: *keep_top_k* is the maximum number of bounding boxes per batch to keep after NMS step. -1 means keeping all bounding boxes. + * **Range of values**: an integer + * **Type**: `int` + * **Default value**: -1 + * **Required**: *yes* + +* **Parameter name**: *code_type* + + * **Description**: *code_type* is a coding method for bounding boxes. + * **Range of values**: `"caffe.PriorBoxParameter.CENTER_SIZE"`, `"caffe.PriorBoxParameter.CORNER"` + * **Type**: `string` + * **Default value**: `caffe.PriorBoxParameter.CORNER` + * **Required**: *no* + +* **Parameter name**: *share_location* + + * **Description**: *share_location* is a flag that specifies if bounding boxes are shared among different classes. + * **Range of values**: 0 or 1 + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +* **Parameter name**: *nms_threshold* + + * **Description**: *nms_threshold* is the threshold to be used in the NMS stage. + * **Range of values**: a floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *confidence_threshold* + + * **Description**: *confidence_threshold* is a threshold to filter out detections with smaller confidence. If not set, all boxes are used. + * **Range of values**: a floating-point number + * **Type**: `float` + * **Default value**: `-FLT_MAX` + * **Required**: *no* + +* **Parameter name**: *clip_after_nms* + + * **Description**: *clip_after_nms* is a flag that specifies whether to perform clip bounding boxes after non-maximum suppression or not. + * **Range of values**: 0 or 1 + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *clip_before_nms* + + * **Description**: *clip_before_nms* is a flag that specifies whether to clip bounding boxes before non-maximum suppression or not. + * **Range of values**: 0 or 1 + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *decrease_label_id* + + * **Description**: *decrease_label_id* is a flag that denotes how to perform NMS. + * **Range of values**: + * *0* - perform NMS like in Caffe\* + * *1* - perform NMS like in MxNet\* + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *normalized* + + * **Description**: *normalized* is a flag that specifies whether input blobs with boxes are normalized. 
If blobs are not normalized, the *input_height* and *input_width* parameters are used to normalize box coordinates. + * **Range of values**: 0 or 1 + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *input_height* + + * **Description**: *input_height* is the height of an input image. If the *normalized* is 1, *input_height* is ignored. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +* **Parameter name**: *input_width* + + * **Description**: *input_width* is the width of an input image. If the *normalized* is 1, *input_width* is ignored. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +* **Parameter name**: *objectness_score* + + * **Description**: *objectness_score* is the threshold to sort out confidence predictions. Used only when the *DetectionOutput* layer has five inputs. + * **Range of values**: a non-negative floating-point number + * **Type**: `float` + * **Default value**: 0.0 + * **Required**: *no* + +**Inputs**: + +* **1**: 2D input blob with box logits. Required. +* **2**: 2D input blob with class predictions. Required. +* **3**: 3D input blob with proposals. Required. +* **4**: 2D input blob with additional class predictions information described in the [article](https://arxiv.org/pdf/1711.06897.pdf). Optional. +* **5**: 2D input blob with additional box predictions information described in the [article](https://arxiv.org/pdf/1711.06897.pdf). Optional. + +**Mathematical Formulation** + +At each feature map cell, *DetectionOutput* predicts the offsets relative to the default box shapes in the cell, as well as the per-class scores that indicate the presence of a class instance in each of those boxes. Specifically, for each box out of k at a given location, *DetectionOutput* computes class scores and the four offsets relative to the original default box shape. This results are a total of \f$(c + 4)k\f$ filters that are applied around each location in the feature map, yielding \f$(c + 4)kmn\f$ outputs for a *m \* n* feature map. + +**Example** + +```xml + + + ... + ... + +``` + +* * * + + +## Erf Layer +Back to top + +**Name**: *Erf* + +**Category**: *Layer* + +**Short description**: *Erf* layer computes the Gauss error function of input element-wise. + +**Detailed Description**: [Reference](https://www.tensorflow.org/api_docs/python/tf/math/erf) + +**Parameters**: *Erf* layer does not have parameters. + +**Inputs**: + +* **1**: Input tensor X of any floating-point type. Required. + +**Outputs**: + +* **1**: Result of Erf function applied on input tensor x. Floating point tensor with shape and type matching input tensor. Required. + +**Mathematical Formulation** + +For each element from an input tensor, *Erf* layer calculates corresponding +element in the output tensor by the formula: +\f[ +erf(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} dt +\f] + +**Example** + +```xml + + + + 5 + 4 + + + + + 5 + 4 + + + +``` + + + +* * * + + +## Eltwise Layer +Back to top + +**Name**: *Eltwise* + +**Category**: *Layer* + +**Short description**: *Eltwise* layer performs element-wise operation specified in parameters, over given inputs. + +**Parameters**: *Eltwise* layer parameters are specified in the `data` node, which is a child of the `layer` node. *Eltwise* accepts two inputs of arbitrary number of dimensions. 
The operation supports broadcasting input blobs according to the [NumPy specification](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html). + +* **Parameter name**: *operation* + + * **Description**: *operation* is a mathematical operation to be performed over inputs. + * **Range of values**: + * *sum* - summation + * *sub* - subtraction + * *mul* - multiplication + * *div* - division + * *max* - maximum + * *min* - minimum + * *squared_diff* - squared difference + * *floor_mod* - reminder of division + * *pow* - power + * *logical_and* - logical AND + * *logical_or* - logical OR + * *logical_xor* - logical XOR + * *less* - less + * *less_equal* - less or equal + * *greater* - greater + * *greater_equal* - greater equal + * *equal* - equal + * *not_equal* - not equal + * **Type**: string + * **Default value**: *sum* + * **Required**: *no* + +**Inputs** + +* **1**: Multidimensional input blob. Required. +* **2**: Multidimensional input blob. Required. + +**Mathematical Formulation** +*Eltwise* does the following with the input blobs: +\f[ +o_{i} = f(b_{i}^{1}, b_{i}^{2}) +\f] +where \f$b_{i}^{1}\f$ - first blob \f$i\f$-th element, \f$b_{i}^{2}\f$ - second blob \f$i\f$-th element, \f$o_{i}\f$ - output blob \f$i\f$-th element, \f$f(a, b)\f$ - is a function that performs an operation over its two arguments \f$a, b\f$. + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## Fill Layer +Back to top + +**Name**: *Fill* + +**Category**: *Layer* + +**Short description**: *Fill* layer generates a blob of the specified shape filled with the specified value. + +**Parameters**: *Fill* layer does not have parameters. + +**Inputs**: + +* **1**: 1D blob with an output blob shape. Required. + +* **2**: 0D blob (constant) with the value for fill. Required. + +**Example** + +```xml + + + + 2 + + + + + + 3 + 4 + + + +``` + +* * * + +## Flatten Layer +Back to top + +**Name**: *Flatten* + +**Category**: *Layer* + +**Short description**: *Flatten* layer performs flattening of specific dimensions of the input blob. + +**Parameters**: *Flatten* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *axis* + + * **Description**: *axis* specifies the first axis to flatten. + * **Range of values**: a non-negative integer + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *end_axis* + + * **Description**: *end_axis* speficies the last dimension to flatten. The value can be negative meaning counting axes from the end. + * **Range of values**: an integer + * **Type**: `int` + * **Default value**: -1 + * **Required**: *no* + +**Inputs** + +* **1**: Multidimensional input blob. Required. + +**Example** + +```xml + + + + + 7 + 19 + 19 + 12 + + + + + 7 + 4332 + + + +``` + +* * * + +## FullyConnected Layer +Back to top + +**Name**: *FullyConnected* + +**Category**: *Layer* + +**Short description**: [Reference](http://caffe.berkeleyvision.org/tutorial/layers/innerproduct.html) + +**Detailed description**: [Reference](http://cs231n.github.io/convolutional-networks/#fc) + +**Parameters**: *FullyConnected* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *out-size* + + * **Description**: *out-size* is the length of the output vector. For example, *out-size* equal to 4096 means that the output vector length is 4096. 
+ * **Range of values**: a non-negative integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +**Inputs** + +* **1**: 2D or 4D input blob. Required. + +**Weights Layout** + +OI, which means that Input changes the fastest, then Output. + +**Mathematical Formulation** + +* If previous layer is *FullyConnected*: + \f[ + y_{i} = f( z_{i} ) \quad with \quad z_{i} = \sum_{j=1}^{m_{1}^{( l-1 )}}w_{i,j}^{( l )}y_{i}^{ ( l -1 )} + \f] +* Otherwise: + \f[ + y_{i} = f( z_{i} ) \quad with \quad z_{i}^{ ( l )} = \sum_{j=1}^{m_{1}^{( l-1 )}}\sum_{r=1}^{m_{2}^{ ( l-1 )}}\sum_{s=1}^{m_{3}^{ ( l-1 )}}w_{i,j,r,s}^{ ( l )} ( Y_{i}^{ (l-1) })_{r,s} + \f] + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## Gather Layer +Back to top + +**Name**: *Gather* + +**Category**: *Layer* + +**Short description**: *Gather* layer takes slices of data in the second input blob according to the indexes specified in the first input blob. The output blob shape is `input2.shape[:axis] + input1.shape + input2.shape[axis + 1:]`. + +**Parameters**: *Gather* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *axis* + + * **Description**: *axis* is a dimension index to gather data from. For example, *axis* equal to 1 means that gathering is performed over the first dimension. + * **Range of values**: an integer in the range `[-len(input2.shape), len(input2.shape) - 1]`. + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +**Mathematical Formulation** + + \f[ + output[:, ... ,:, i, ... , j,:, ... ,:] = input2[:, ... ,:, input1[i, ... ,j],:, ... ,:] + \f] + + +**Inputs** + +* **1**: Multidimensional input blob with indexes to gather. The values for indexes are in the range `[0, input1[axis] - 1]`. +* **2**: Multidimensional input blob with arbitrary data. + +**Example** + +```xml + + + + + 15 + 4 + 20 + 28 + + + 6 + 12 + 10 + 24 + + + + + 6 + 15 + 4 + 20 + 28 + 10 + 24 + + + +``` + +* * * + +## GRN Layer +Back to top + +**Name**: *GRN* + +**Category**: *Normalization* + +**Short description**: *GRN* is the Global Response Normalization with L2 norm (across channels only). + +**Parameters**: *GRN* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *bias* + + * **Description**: *bias* is added to the variance. + * **Range of values**: a floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +**Inputs** + +* **1**: 2D, 3D or 4D input blob. Required. + +**Mathematical Formulation** + +*GRN* computes the L2 norm by channels for input blob. *GRN* generally does the following with the input blob: +\f[ +output_{i} = \frac{input_{i}}{\sqrt{\sum_{i}^{C} input_{i}}} +\f] + +**Example** + +```xml + + + ... + ... + +``` + +* * * +## GRUCell Layer +Back to top + +**Name**: *GRUCell* + +**Category**: *Layer* + +**Short description**: *GRUCell* layer computes the output using the formula described in the [paper](https://arxiv.org/abs/1406.1078). + +**Parameters**: *GRUCell* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *hidden_size* + + * **Description**: *hidden_size* specifies hidden state size. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *activations* + + * **Description**: *activations* specifies activation functions for gates. 
+ * **Range of values**: any combination of *relu*, *sigmoid*, *tanh* + * **Type**: a list of strings + * **Default value**: *sigmoid,tanh* + * **Required**: *no* + +* **Parameter name**: *activations_alpha, activations_beta* + + * **Description**: *activations_alpha, activations_beta* parameters of functions + * **Range of values**: a list of floating-point numbers + * **Type**: `float[]` + * **Default value**: None + * **Required**: *no* + +* **Parameter name**: *clip* + + * **Description**: *clip* specifies bound values *[-C, C]* for tensor clipping. Clipping is performed before activations. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *no* + +* **Parameter name**: *linear_before_reset* + + * **Description**: *linear_before_reset* flag denotes if the layer behaves according to the modification of *GRUCell* described in the formula in the [ONNX documentation](https://github.com/onnx/onnx/blob/master/docs/Operators.md#GRU). + * **Range of values**: 0 or 1 + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +**Inputs** + +* **1**: `X` - 2D ([batch_size, input_size]) input data. Required. + +* **2**: `Hi` - 2D ([batch_size, hidden_size]) input hidden state data. Required. + +**Outputs** + +* **1**: `Ho` - 2D ([batch_size, hidden_size]) output hidden state. + +**Example** +```xml + + + + + 1 + 16 + + + 1 + 128 + + + + + 1 + 128 + + + + + + + +``` + +* * * +## Input Layer +Back to top + +**Name**: *Input* + +**Category**: *Layer* + +**Short description**: *Input* layer specifies input to the model. + +**Parameters**: *Input* layer does not have parameters. + +**Example** + +```xml + + + + 1 + 3 + 224 + 224 + + + +``` + +* * * + +## Interp Layer +Back to top + +**Name**: *Interp* + +**Category**: *Layer* + +**Short description**: *Interp* layer performs bilinear interpolation of the input blob by the specified parameters. + +**Parameters**: *Interp* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *height* + + * **Description**: *height* specifies output height. If the parameter is not set, other parameters are used for output size calculation. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *no* + +* **Parameter name**: *width* + + * **Description**: *width* specifies output width. If the parameter is not set, other parameters are used for output size calculation. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *no* + +* **Parameter name**: *align_corners* + + * **Description**: *align_corners* is a flag that specifies whether to align corners or not. + * **Range of values**: 0 or 1 + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +* **Parameter name**: *pad_beg* + + * **Description**: *pad_beg* specify the number of pixels to add to the beginning of the image being interpolated. + * **Range of values**: a non-negative integer number + * **Type**: `int` + * **Default value**: 0 + * **Required**: *yes* + +* **Parameter name**: *pad_end* + + * **Description**: *pad_end* specify the number of pixels to add to the end of the image being interpolated. + * **Range of values**: a non-negative integer + * **Type**: `int` + * **Default value**: 0 + * **Required**: *yes* + +**Inputs** + +* **1**: 4D input blob. Required. 
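+
+As an illustration of how *align_corners* changes the sampling grid, here is a minimal bilinear-resize sketch for a single 2D plane. The simple `index * scale` coordinate mapping used when *align_corners* is 0, and the omission of *pad_beg*/*pad_end* handling, are simplifications of this sketch rather than a description of the plugin implementations.
+
+```python
+import numpy as np
+
+def bilinear_resize_2d(plane, out_h, out_w, align_corners=True):
+    """Illustrative bilinear interpolation of one [H, W] plane."""
+    x = np.asarray(plane, dtype=np.float64)
+    in_h, in_w = x.shape
+    # With align_corners, the corner pixels of input and output coincide,
+    # which changes the effective scale factors.
+    scale_h = (in_h - 1) / (out_h - 1) if align_corners and out_h > 1 else in_h / out_h
+    scale_w = (in_w - 1) / (out_w - 1) if align_corners and out_w > 1 else in_w / out_w
+    out = np.empty((out_h, out_w), dtype=np.float64)
+    for i in range(out_h):
+        for j in range(out_w):
+            src_i, src_j = min(i * scale_h, in_h - 1), min(j * scale_w, in_w - 1)
+            i0, j0 = int(src_i), int(src_j)
+            i1, j1 = min(i0 + 1, in_h - 1), min(j0 + 1, in_w - 1)
+            di, dj = src_i - i0, src_j - j0
+            # Weighted average of the four nearest input pixels.
+            out[i, j] = ((1 - di) * (1 - dj) * x[i0, j0] + di * (1 - dj) * x[i1, j0]
+                         + (1 - di) * dj * x[i0, j1] + di * dj * x[i1, j1])
+    return out
+```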
+ +**Example** + +```xml + + + + + 1 + 2 + 48 + 80 + + + + + 1 + 2 + 96 + 160 + + + +``` + +* * * + +## LSTMCell Layer +Back to top + +**Name**: *LSTMCell* + +**Category**: *Layer* + +**Short description**: *LSTMCell* layer computes the output using the formula described in the original paper [Long Short-Term Memory](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.676.4320&rep=rep1&type=pdf). + +**Parameters**: *LSTMCell* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *hidden_size* + + * **Description**: *hidden_size* specifies hidden state size. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *activations* + + * **Description**: *activations* specifies activation functions for gates. + * **Range of values**: any combination of *relu*, *sigmoid*, *tanh* + * **Type**: a list of strings + * **Default value**: *sigmoid,tanh,tanh* + * **Required**: *no* + +* **Parameter name**: *activations_alpha, activations_beta* + + * **Description**: *activations_alpha, activations_beta* parameters of functions. + * **Range of values**: a list of floating-point numbers + * **Type**: `float[]` + * **Default value**: None + * **Required**: *no* + +* **Parameter name**: *clip* + + * **Description**: *clip* specifies bound values *[-C, C]* for tensor clipping. Clipping is performed before activations. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *no* + +**Inputs** + +* **1**: `X` - 2D ([batch_size, input_size]) input data. Required. + +* **2**: `Hi` - 2D ([batch_size, hidden_size]) input hidden state data. Required. + +* **3**: `Ci` - 2D ([batch_size, hidden_size]) input cell state data. Required. + + +**Outputs** + +* **1**: `Ho` - 2D ([batch_size, hidden_size]) output hidden state. + +* **2**: `Co` - 2D ([batch_size, hidden_size]) output cell state. + +**Mathematical Formulation** + +``` +Formula: + * - matrix mult + (.) - eltwise mult + [,] - concatenation +sigm - 1/(1 + e^{-x}) +tanh - (e^{2x} - 1)/(e^{2x} + 1) + f = sigm(Wf*[Hi, X] + Bf) + i = sigm(Wi*[Hi, X] + Bi) + c = tanh(Wc*[Hi, X] + Bc) + o = sigm(Wo*[Hi, X] + Bo) + Co = f (.) Ci + i (.) c + Ho = o (.) tanh(Co) +``` + +**Example** + +```xml + + ... + ... + +``` + +* * * + +## Memory Layer +Back to top + +**Name**: *Memory* + +**Category**: *Layer* + +**Short description**: *Memory* layer represents the delay layer in terms of LSTM terminology. For more information about LSTM topologies, please refer to this [article](http://colah.github.io/posts/2015-08-Understanding-LSTMs). + +**Detailed description**: *Memory* layer saves the state between two infer requests. In the topology, it is the single layer, however, in the Intermediate Representation, it is always represented as a pair of **Memory** layers. One of these layers does not have outputs and another does not have inputs (in terms of the Intermediate Representation). + +**Parameters**: *Memory* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *id* + + * **Description**: *id* is the ID of the pair of *Memory* layers. Two layers with the same value of the *id* parameter are paired. 
+ * **Range of values**: any combination of Latin characters, numbers, and underscores (`_`) in the `string` format + * **Type**: string + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *index* + + * **Description**: *index* specifies whether the given layer is input or output. For example, *index* equal to 0 means the layer is output. + * **Range of values**: + * 0 - current layer is output + * 1 - current layer is input + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *size* + + * **Description**: *size* is the size of the group. For example, *size* equal to 2 means this group is a pair. + * **Range of values**: only *2* is supported + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +**Mathematical Formulation** + +*Memory* saves data from the input blob. + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## MVN Layer +Back to top + +**Name**: *MVN* + +**Category**: *Normalization* + +**Short description**: [Reference](http://caffe.berkeleyvision.org/tutorial/layers/mvn.html) + +**Parameters**: *MVN* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *across_channels* + + * **Description**: *across_channels* is a flag that specifies whether mean values are shared across channels. For example, *across_channels* equal to 0 means that mean values are not shared across channels. + * **Range of values**: + * 0 - do not share mean values across channels + * 1 - share mean values across channels + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *normalize_variance* + + * **Description**: *normalize_variance* is a flag that specifies whether to perform variance normalization. + * **Range of values**: + * 0 - do not normalize variance + * 1 - normalize variance + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *eps* + + * **Description**: *eps* is the number to be added to the variance to avoid division by zero when normalizing the value. For example, *epsilon* equal to 0.001 means that 0.001 is added to the variance. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +**Inputs** + +* **1**: 4D or 5D input blob. Required. + +**Mathematical Formulation** + +*MVN* subtracts mean value from the input blob: +\f[ +o_{i} = i_{i} - \frac{\sum{i_{k}}}{C * H * W} +\f] +If *normalize_variance* is set to 1, the output blob is divided by variance: +\f[ +o_{i}=\frac{o_{i}}{\sum \sqrt {o_{k}^2}+\epsilon} +\f] + +**Example** + +```xml + + + + ... + + + ... + + +``` + +* * * + +## NonMaxSuppression Layer +Back to top + +**Name**: *NonMaxSuppression* + +**Category**: *Layer* + +**Short description**: *NonMaxSuppression* performs non-maximum suppression of the input boxes and return indices of the selected boxes. + +**Detailed description**: [Reference](https://github.com/onnx/onnx/blob/rel-1.5.0/docs/Operators.md#NonMaxSuppression) + +**Parameters**: *NonMaxSuppression* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *center_point_box* + + * **Description**: *center_point_box* is flag that specifies the format of the box data. + * **Range of values**: + * false (0) - the box data is supplied as `[y1, x1, y2, x2]` where `(y1, x1)` and `(y2, x2)` are the coordinates of any diagonal pair of box corners. 
+        * true (1) - the box data is supplied as `[x_center, y_center, width, height]`.
+    * **Type**: `bool`
+    * **Default value**: false
+    * **Required**: *no*
+
+
+**Inputs**
+
+* **1**: 3D floating point blob with the boxes data of shape [batch_size, num_boxes, 4]. Required.
+* **2**: 3D floating point blob with the boxes scores of shape [batch_size, num_classes, num_boxes]. Required.
+* **3**: 1D integer blob of shape [1] representing the maximum number of boxes to be selected per class. Optional. If not specified, all boxes are selected.
+* **4**: 1D floating point blob of shape [1] representing the intersection over union threshold. Optional. If not specified, it is equal to 1.0.
+* **5**: 1D floating point blob of shape [1] representing the box score threshold. Optional. If not specified, it is equal to 0.0.
+
+
+**Mathematical Formulation**
+
+Boxes with scores below the score threshold are discarded. The remaining boxes are processed in decreasing order of their scores: a box is selected if its intersection over union (IoU) with every box already selected for the same class does not exceed the IoU threshold; otherwise, it is suppressed. Selection stops for a class once the limit given by the third input is reached.
+
+**Example**
+
+```xml
+<layer ... type="NonMaxSuppression" ... >
+    <input>
+        ...
+    </input>
+    <output>
+        ...
+    </output>
+</layer>
+```
+
+* * *
+
+## Norm Layer
+Back to top
+
+**Name**: *Norm*
+
+**Category**: *Normalization*
+
+**Short description**: [Reference](http://caffe.berkeleyvision.org/tutorial/layers/lrn.html)
+
+**Detailed description**: [Reference](http://yeephycho.github.io/2016/08/03/Normalizations-in-neural-networks/#Local-Response-Normalization-LRN)
+
+**Parameters**: *Norm* layer parameters are specified in the `data` node, which is a child of the `layer` node.
+
+* **Parameter name**: *alpha*
+
+    * **Description**: *alpha* is a scaling parameter for the normalizing sum. For example, *alpha* equal to 0.0001 means that the normalizing sum is multiplied by 0.0001.
+    * **Range of values**: a positive floating-point number
+    * **Type**: `float`
+    * **Default value**: None
+    * **Required**: *yes*
+
+* **Parameter name**: *beta*
+
+    * **Description**: *beta* is an exponent for the normalizing sum. For example, *beta* equal to 0.75 means that the normalizing sum is raised to the power of 0.75.
+    * **Range of values**: a positive floating-point number
+    * **Type**: `float`
+    * **Default value**: None
+    * **Required**: *yes*
+
+* **Parameter name**: *region*
+
+    * **Description**: *region* is the strategy of local regions extension. For example, *region* equal to *across* means that the normalizing sum is performed over adjacent channels.
+    * **Range of values**:
+        * *across* - normalize sum over adjacent channels
+        * *same* - normalize sum over nearby spatial locations
+    * **Type**: string
+    * **Default value**: `across`
+    * **Required**: *yes*
+
+* **Parameter name**: *local-size*
+
+    * **Description**: *local-size* represents the side length of the region to be used for the normalization sum or the number of channels, depending on the strategy specified in the *region* parameter. For example, *local-size* equal to 5 for the *across* strategy means that the sum is applied across 5 adjacent channels.
+    * **Range of values**: a positive integer
+    * **Type**: `int`
+    * **Default value**: None
+    * **Required**: *yes*
+
+**Inputs**
+
+* **1**: 4D input blob. Required.
+
+**Mathematical Formulation**
+
+\f[o_{i} = \left( 1 + \left( \frac{\alpha}{n} \right)\sum_{i}x_{i}^{2} \right)^{\beta}\f]
+Where \f$n\f$ is the size of each local region.
+
+**Example**
+
+```xml
+<layer ... type="Norm" ... >
+    <data alpha="..." beta="..." local-size="..." region="..."/>
+    <input>
+        ...
+    </input>
+    <output>
+        ...
+    </output>
+</layer>
+```
+
+* * *
+
+## Normalize Layer
+Back to top
+
+**Name**: *Normalize*
+
+**Category**: *Normalization*
+
+**Short description**: *Normalize* layer performs l-p normalization over the channel dimension (axis 1) of the input blob.
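+
+For intuition, the following minimal NumPy sketch shows the commonly used p = 2 case for an NCHW blob: each spatial position is normalized across channels (which corresponds to *across_spatial* = 0 described below) and a per-channel scale is applied. The NCHW layout and the exact placement of *eps* under the square root are assumptions of this sketch, not statements about the plugin implementations.
+
+```python
+import numpy as np
+
+def normalize_l2(x, scale, eps=1e-10):
+    """Illustrative channel-wise L2 normalization of an NCHW blob.
+
+    x:     input of shape [N, C, H, W]
+    scale: per-channel scale of shape [C], or a scalar when the scale
+           is shared across channels (channel_shared=1)
+    """
+    # One norm per (n, h, w) position, computed over the C axis only.
+    norm = np.sqrt(np.sum(x * x, axis=1, keepdims=True) + eps)
+    return x / norm * np.reshape(scale, (1, -1, 1, 1))
+```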
+ +**Parameters**: *Normalize* layer parameters should be specified as the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *across_spatial* + + * **Description**: *across_spatial* is a flag that specifies if normalization is performed over CHW or HW. For example, *across_spatial* equal to 0 means that normalization is not shared across channels. + * **Range of values**: + * 0 - do not share normalization across channels + * 1 - not supported + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *channel_shared* + + * **Description**: *channel_shared* is a flag that specifies if scale parameters are shared across channels. For example, *channel_shared* equal to 0 means that scale parameters are not shared across channels. + * **Range of values**: + * 0 - do not share scale parameters across channels + * 1 - not supported + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *eps* + + * **Description**: *eps* is the number to be added to the variance to avoid division by zero when normalizing the value. For example, *eps* equal to 0.001 means that 0.001 is used if all the values in normalization are equal to zero. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +**Inputs** + +* **1**: 2D, 3D or 4D input blob. Required. + +**Mathematical Formulation** + +\f[ +o_{i} = \sum_{i}^{H*W}\frac{\left ( n*C*H*W \right )` scale}{\sqrt{\sum_{i=0}^{C*H*W}\left ( n*C*H*W \right )^{2}}} +\f] + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## OneHot Layer +Back to top + +**Name**: *OneHot* + +**Category**: *Layer* + +**Short description**: *OneHot* layer fills the locations represented by indices specified in input with the value of *on_value* and fills all other locations with the value of *off_value*. If an index is out of range, the corresponding element is also filled with the *off_value*. + +**Detailed description**: [Reference](https://www.tensorflow.org/api_docs/python/tf/one_hot) + +**Parameters**: *OneHot* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *axis* + + * **Description**: *axis* is a new axis position in the output shape to fill with one-hot values. + * **Range of values**: an integer. Negative value means counting dimension from the end. + * **Type**: `int` + * **Default value**: -1 + * **Required**: *no* + +* **Parameter name**: *depth* + + * **Description**: *depth* is depth of a new one-hot dimension. + * **Range of values**: a non-negative integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *on_value* + + * **Description**: *on_value* is the value that the locations represented by indices in input take. + * **Range of values**: a floating-point number. + * **Type**: `float` + * **Default value**: 1.0 + * **Required**: *no* + +* **Parameter name**: *off_value* + + * **Description**: *off_value* is the value that the locations not represented by indices in input take. + * **Range of values**: a floating-point number. + * **Type**: `float` + * **Default value**: 0.0 + * **Required**: *no* + +**Inputs**: + +* **1**: Multidimensional input tensor with indices of type T (can be 0D). Required. + +**Outputs**: + +* **1** Multidimensional output tensor. If the input indices have rank N, the output will have rank N+1. 
+ A new axis of the size `depth` is created at the dimension `axis`. + +**Examples** + +```xml + + + + + 3 + + + + + 3 + 3 + + + +``` + +* * * + +## Pad Layer +Back to top + +**Name**: *Pad* + +**Category**: *Layer* + +**Short description**: *Pad* layer extends an input blob on edges. New element values are generated based on the *Pad* layer parameters described below. + +**Parameters**: *Pad* layer parameters are specified in the `data` section, which is a child of the `layer` node. The parameters specify a number of elements to add along each axis and a rule by which new element values are generated: for example, whether they are filled with a given constant or generated based on the input blob content. + +* **Parameter name**: *pads_begin* + + * **Description**: *pads_begin* specifies the number of padding elements at the beginning of each axis. + * **Range of values**: a list of non-negative integers. The length of the list must be equal to the number of dimensions in the input blob. + * **Type**: `int[]` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *pads_end* + + * **Description**: *pads_end* specfies the number of padding elements at the end of each axis. + * **Range of values**: a list of non-negative integers. The length of the list must be equal to the number of dimensions in the input blob. + * **Type**: `int[]` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *pad_mode* + + * **Description**: *pad_mode* specifies the method used to generate new element values. + * **Range of values**: Name of the method in string format: + * `constant` - padded values are equal to the value of the *pad_value* layer parameter. + * `edge` - padded values are copied from the respective edge of the input blob. + * `reflect` - padded values are a reflection of the input blob; values on the edges are not duplicated. `pads_begin[D]` and `pads_end[D]` must be not greater than `input.shape[D] – 1` for any valid `D`. + * `symmetric` - padded values are symmetrically added from the input blob. This method is similar to the `reflect`, but values on edges are duplicated. Refer to the examples below for more details. `pads_begin[D]` and `pads_end[D]` must be not greater than `input.shape[D]` for any valid `D`. + * **Type**: string + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *pad_value* + + * **Description**: Use with the `pad_mode = "constant"` only. All new elements are filled with this value. + * **Range of values**: a floating-point number + * **Type**: `float` + * **Default value**: 0.0 + * **Required**: *no* + +**Inputs** + +* **1**: Multidimensional input blob. Required. + + +**Outputs** + +* **1**: Multidimensional input blob with dimensions `pads_begin[D] + input.shape[D] + pads_end[D]` for each `D` from `0` to `len(input.shape) - 1`. + + +**pad_mode Examples** + +The following examples illustrate how output blob is generated for the *Pad* layer for a given input blob: +``` +INPUT = +[[ 1 2 3 4 ] +[ 5 6 7 8 ] +[ 9 10 11 12 ]] +``` +with the following parameters: +``` +pads_begin = [0, 1] +pads_end = [2, 3] +``` +depending on the *pad_mode*. 
+* `pad_mode = "constant"`: +``` +OUTPUT = +[[ 0 1 2 3 4 0 0 0 ] +[ 0 5 6 7 8 0 0 0 ] +[ 0 9 10 11 12 0 0 0 ] +[ 0 0 0 0 0 0 0 0 ] +[ 0 0 0 0 0 0 0 0 ]] +``` +* `pad_mode = "edge"`: +``` +OUTPUT = +[[ 1 1 2 3 4 4 4 4 ] +[ 5 5 6 7 8 8 8 8 ] +[ 9 9 10 11 12 12 12 12 ] +[ 9 9 10 11 12 12 12 12 ] +[ 9 9 10 11 12 12 12 12 ]] +``` +* `pad_mode = "reflect"`: +``` +OUTPUT = +[[ 2 1 2 3 4 3 2 1 ] +[ 6 5 6 7 8 7 6 5 ] +[ 10 9 10 11 12 11 10 9 ] +[ 6 5 6 7 8 7 6 5 ] +[ 2 1 2 3 4 3 2 1 ]] +``` +* `pad_mode = "symmetric"`: +``` +OUTPUT = +[[ 1 1 2 3 4 4 3 2 ] +[ 5 5 6 7 8 8 7 6 ] +[ 9 9 10 11 12 12 11 10 ] +[ 9 9 10 11 12 12 11 10 ] +[ 5 5 6 7 8 8 7 6 ]] +``` + +**Example** + +```xml + + + + + 1 + 3 + 32 + 40 + + + + + 2 + 8 + 37 + 48 + + + +``` + +* * * + +## Permute Layer +Back to top + +**Name**: *Permute* + +**Category**: *Layer* + +**Short description**: *Permute* layer reorders input blob dimensions. + +**Detailed description**: [Reference](http://caffe.help/manual/layers/tile.html) + +**Parameters**: *Permute* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *order* + + * **Description**: *order* is a list of dimensions indexes for output blob. For example, *order* equal to "0,2,3,1" means that the output blob has the following dimensions: the first dimension from the input blob, the third dimension from the input blob, the fourth dimension from the input blob, the second dimension from the input blob. + * **Range of values**: a list of positive integers separated by comma + * **Type**: `int[]` + * **Default value**: None + * **Required**: *yes* + + +**Inputs**: + +* **1**: Multidimensional input blob. Required. + +**Mathematical Formulation** + +*Permute* layer reorders input blob dimensions. Source indexes and destination indexes are bound by the formula: +\f[ +src\_ind_{offset} = n * ordered[1] * ordered[2] * ordered[3] + (h * ordered[3] + w) +\f] +\f[ +n \in ( 0, order[0] ) +\f] +\f[ +h \in ( 0, order[2] ) +\f] +\f[ +w \in ( 0, order[3] ) +\f] + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## Pooling Layer +Back to top + +**Name**: *Pooling* + +**Category**: *Pool* + +**Short description**: [Reference](http://caffe.berkeleyvision.org/tutorial/layers/pooling.html) + +**Detailed description**: [Reference](http://cs231n.github.io/convolutional-networks/#pool) + +**Parameters**: *Pooling* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *strides* + + * **Description**: *strides* is a distance (in pixels) to slide the window on the feature map over the `(z, y, x)` axes for 3D poolings and `(y, x)` axes for 2D poolings. For example, *strides* equal to "4,2,1" means sliding the window four pixels at a time over depth dimension, two pixels over height dimension, and one pixel over width dimension. + * **Range of values**: a list of non-negative integers + * **Type**: `int[]` + * **Default value**: a list of 1 with length equal to the number of convolution kernel dimensions + * **Required**: *no* + +* **Parameter name**: *pads_begin* + + * **Description**: *pads_begin* is the number of pixels to add to the beginning along each axis. For example, *pads_begin* equal to "1,2" means adding one pixel to the top of the input and two pixels to the left of the input. 
+ * **Range of values**: a list of non-negative integers + * **Type**: `int[]` + * **Default value**: a list of 0 with length equal to the number of convolution kernel dimensions + * **Required**: *no* + +* **Parameter name**: *pads_end* + + * **Description**: *pads_end* is the number of pixels to add to the ending along each axis. For example, *pads_end* equal "1,2" means adding one pixel to the bottom of the input and two pixels to the right of the input. + * **Range of values**: a list of non-negative integers + * **Type**: `int[]` + * **Default value**: a list of 1 with length equal to the number of convolution kernel dimensions + * **Required**: *no* + +* **Parameter name**: *kernel* + + * **Description**: *kernel* is a size of each filter. For example, *kernel* equal to "2,3" means that each filter has height equal to 2 and width equal to 3. + * **Range of values**: a list of positive integers + * **Type**: `int[]` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *pool-method* + + * **Description**: *pool-method* is a type of pooling strategy for values. + * **Range of values**: + * *max* - choose the biggest value in a feature map for each window position + * *avg* - take the average value in a feature map for each windows position + * **Type**: string + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *exclude-pad* + + * **Description**: *exclude-pad* is a flag that specifies whether to ignore zeros in a padding area. For example, *exclude-pad* equal to *true* means that zero values in the padding are not used. + * **Range of values**: *true* or *false* + * **Type**: string + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *rounding_type* + + * **Description**: *rounding_type* is a type of rounding to apply. + * **Range of values**: + * *ceil* + * *floor* + * **Type**: string + * **Default value**: *floor* + * **Required**: *no* + +* **Parameter name**: *auto_pad* + + * **Description**: *auto_pad* specifies how to calculate padding. + * **Range of values**: + * Not specified: use explicit padding values + * *same_upper/same_lower*: the input is padded to match the output size. In case of odd padding value, an extra padding is added at the end (at the beginning). + * *valid*: do not use padding + * **Type**: string + * **Default value**: None + * **Required**: *no* + +**Inputs**: + +* **1**: 4D or 5D input blob. Required. + +**Mathematical Formulation** + +* For `pool-method="max"`: + \f[ + output_{j} = MAX\{ x_{0}, ... x_{i}\} + \f] +* For `pool-method="avg"`: + \f[ + output_{j} = \frac{\sum_{i = 0}^{n}x_{i}}{n} + \f] + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## Power Layer +Back to top + +**Name**: *Power* + +**Category**: *Layer* + +**Short description**: *Power* layer computes the output as `(shift + scale * x) ^ power` for each input element `x`. + +**Parameters**: *Power* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *power* + + * **Description**: *power* is a parameter in the formula described above. + * **Range of values**: a floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *scale* + + * **Description**: *scale* is a parameter in the formula described above. 
+ * **Range of values**: a floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *shift* + + * **Description**: *shift* is a parameter in the formula described above. + * **Range of values**: a floating-point number + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +**Inputs**: + +* **1**: Multidimensional input blob. Required. + +**Mathematical Formulation** + +\f[ +p = (shift + scale * x)^{power} +\f] + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## PReLU Layer +Back to top + +**Name**: *PReLU* + +**Category**: *Activation* + +**Short description**: *PReLU* is the Parametric Rectifier Linear Unit. The difference from *ReLU* is that negative slopes can vary across channels. + +**Parameters**: *PReLU* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *channel_shared* + + * **Description**: *channel_shared* specifies whether a negative slope is shared across channels or not. If the *channel_shared* is equal to 0, the slope shape is equal to the number of channels, if the *channel_shared* is equal to 1, the slope is scalar. + * **Range of values**: 0 or 1 + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +**Inputs**: + +* **1**: 4D or 5D input blob. Required. + + +**Mathematical Formulation** + +*PReLU* accepts one input with four dimensions. The produced blob has the same dimensions as input. +*PReLU* does the following with the input blob: +\f[ +o_{i} = max(0, x_{i}) + w_{i} * min(0,x_{i}) +\f] +where \f$w_{i}\f$ is from weights blob. + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## PriorBox Layer +Back to top + +**Name**: *PriorBox* + +**Category**: *Layer* + +**Short description**: *PriorBox* layer generates prior boxes of specified sizes and aspect ratios across all dimensions. + +**Parameters**: *PriorBox* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *min_size* + + * **Description**: *min_size* is the minimum box size (in pixels). For example, *min_size* equal to `[15.0]` means that the minimum box size is 15.0. + * **Range of values**: a list of positive floating-point numbers + * **Type**: `float[]` + * **Default value**: `[]` + * **Required**: *yes* + +* **Parameter name**: *max_size* + + * **Description**: *max_size* is the maximum box size (in pixels). For example, *max_size* equal to `[15.0]` means that the maximum box size is 15.0. + * **Range of values**: a list of positive floating-point numbers + * **Type**: `float[]` + * **Default value**: `[]` + * **Required**: *yes* + +* **Parameter name**: *aspect_ratio* + + * **Description**: *aspect_ratio* is a variance of aspect ratios. Duplicate values are ignored. For example, *aspect_ratio* equal to "[2.0,3.0]" means that for the first box, aspect ratio is 2.0, for the second box, it is 3.0. + * **Range of values**: a list of positive floating-point numbers + * **Type**: `float[]` + * **Default value**: `[]` + * **Required**: *no* + +* **Parameter name**: *flip* + + * **Description**: *flip* is a flag that specifies whether each *aspect_ratio* is duplicated and flipped. For example, *flip* equal to 1 and *aspect_ratio* equal to "4.0,2.0" mean that *aspect_ratio* is equal to "4.0,2.0,0.25,0.5". 
+ * **Range of values**: + * 0 - flip each *aspect_ratio* + * 1 - do not flip each *aspect_ratio* + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *clip* + + * **Description**: *clip* is a flag that specifies if each value in the output blob is clipped to *[0,1]* interval. + * **Range of values**: + * 0 - do not perform clipping + * 1 - clip each value in the output blob to *[0,1]* interval + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *step* + + * **Description**: *step* is the distance between box centers. For example, *step* equal to `85.0` means that the distance between neighborhood prior boxes centers is 85.0. + * **Range of values**: a non-negative floating-point number + * **Type**: `float` + * **Default value**: 0.0 + * **Required**: *yes* + +* **Parameter name**: *offset* + + * **Description**: *offset* is a shift of box respectively to top left corner. For example, *offset* equal to `85.0` means that the shift of neighborhood prior boxes centers is 85.0. + * **Range of values**: a non-negative floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *variance* + + * **Description**: *variance* is the variance of adjusting bounding boxes. The parameter can contain 0, 1 or 4 elements. + * **Range of values**: a list of positive floating-point numbers + * **Type**: `float[]` + * **Default value**: `[]` + * **Required**: *yes* + +* **Parameter name**: *scale_all_sizes* + + * **Description**: *scale_all_sizes* is a flag that specifies the type of inference. For example, *scale_all_sizes* equal to 0 means that the *PriorBox* layer is inferred in MXNet-like manner, which means that the *max_size* parameter is ignored. + * **Range of values**: + * 0 - do not use *max_size* + * 1 - use *max_size* + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +* **Parameter name**: *fixed_ratio* + + * **Description**: *fixed_ratio* is an aspect ratio of a box. For example, *fixed_ratio* equal to 2.000000 means that the aspect ratio for the first box aspect ratio is 2. + * **Range of values**: a list of positive floating-point numbers + * **Type**: `float[]` + * **Default value**: None + * **Required**: *no* + +* **Parameter name**: *fixed_size* + + * **Description**: *fixed_size* is an initial box size (in pixels). For example, *fixed_size* equal to 15 means that the initial box size is 15. + * **Range of values**: a list of positive floating-point numbers + * **Type**: `float[]` + * **Default value**: None + * **Required**: *no* + +* **Parameter name**: *density* + + * **Description**: *density* is the square root of the number of boxes of each type. For example, *density* equal to 2 means that the first box generates four boxes of the same size and with the same shifted centers. + * **Range of values**: a list of positive floating-point numbers + * **Type**: `float[]` + * **Default value**: None + * **Required**: *no* + +**Inputs**: + +* **1**: 4D input blob. Used to get height and width only. Required. + +* **2**: 4D input blob. Used to get image height and image width only. Required. + +**Mathematical Formulation**: + +*PriorBox* computes coordinates of prior boxes as follows: +1. 
Calculates *center_x* and *center_y* of prior box: + \f[ + W \equiv Width \quad Of \quad Image + \f] + \f[ + H \equiv Height \quad Of \quad Image + \f] + * If step equals 0: + \f[ + center_x=(w+0.5) + \f] + \f[ + center_y=(h+0.5) + \f] + * else: + \f[ + center_x=(w+offset)`step + \f] + \f[ + center_y=(h+offset)`step + \f] + \f[ + w \subset \left( 0, W \right ) + \f] + \f[ + h \subset \left( 0, H \right ) + \f] +2. For each \f$ s \subset \left( 0, min_sizes \right ) \f$, calculates coordinates of prior boxes: + \f[ + xmin = \frac{\frac{center_x - s}{2}}{W} + \f] + \f[ + ymin = \frac{\frac{center_y - s}{2}}{H} + \f] + \f[ + xmax = \frac{\frac{center_x + s}{2}}{W} + \f] + \f[ + ymax = \frac{\frac{center_y + s}{2}}{H} + \f] + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## PriorBoxClustered Layer +Back to top + +**Name**: *PriorBoxClustered* + +**Category**: *Layer* + +**Short description**: *PriorBoxClustered* layer generates prior boxes of specified sizes normalized to the input image size. + +**Parameters**: *PriorBoxClustered* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *width* + + * **Description**: *width* specifies desired boxes widths in pixels. + * **Range of values**: a list of positive floating-point numbers + * **Type**: `float[]` + * **Default value**: `[]` + * **Required**: *yes* + +* **Parameter name**: *height* + + * **Description**: *height* specifies desired boxes heights in pixels. + * **Range of values**: positive floating-point numbers + * **Type**: `float[]` + * **Default value**: `[]` + * **Required**: *yes* + +* **Parameter name**: *clip* + + * **Description**: *clip* is a flag that specifies if each value in the output blob is clipped within *[0,1]*. + * **Range of values**: + * 0 - do not perform clipping + * 1 - clip each value in the output blob to *[0,1]* + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *step (step_w, step_h)* + + * **Description**: *step (step_w, step_h)* is the distance between box centers. For example, *step* equal to 85.0 means that the distance between neighborhood prior boxes centers is 85.0. If both *step_h* and *step_w* are 0.0, they are updated with value of *step*. If after that they are still 0.0, they are calculated as input image heights/width divided by the first input heights/width. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: 0.0 + * **Required**: *yes* + +* **Parameter name**: *offset* + + * **Description**: *offset* is a shift of box respectively to top left corner. For example, *offset* equal to 85.0 means that the shift of neighborhood prior boxes centers is 85.0. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *variance* + + * **Description**: *variance* is the variance of adjusting bounding boxes. + * **Range of values**: a list of positive floating-point numbers + * **Type**: `float[]` + * **Default value**: `[]` + * **Required**: *yes* + +* **Parameter name**: *img_h* + + * **Description**: *img_h* is the height of input image. It is calculated as the second input height unless provided explicitly. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: 1.0 + * **Required**: *yes* + +* **Parameter name**: *img_w* + + * **Description**: *img_w* is the width of input image. 
It is calculated as second input width unless provided explicitly. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: 1.0 + * **Required**: *yes* + +**Inputs**: + +* **1**: 4D input blob. Used to get height and width only. Required. + +* **2**: 4D input blob. Used to get image height and image width only. Required. + +**Mathematical Formulation** + +*PriorBoxClustered* computes coordinates of prior boxes as follows: +1. Calculates the *center_x* and *center_y* of prior box: + \f[ + W \equiv Width \quad Of \quad Image + \f] + \f[ + H \equiv Height \quad Of \quad Image + \f] + \f[ + center_x=(w+offset)`step + \f] + \f[ + center_y=(h+offset)`step + \f] + \f[ + w \subset \left( 0, W \right ) + \f] + \f[ + h \subset \left( 0, H \right ) + \f] +2. For each \f$s \subset \left( 0, W \right )\f$, calculates the prior boxes coordinates: + \f[ + xmin = \frac{center_x - \frac{width_s}{2}}{W} + \f] + \f[ + ymin = \frac{center_y - \frac{height_s}{2}}{H} + \f] + \f[ + xmax = \frac{center_x - \frac{width_s}{2}}{W} + \f] + \f[ + ymax = \frac{center_y - \frac{height_s}{2}}{H} + \f] +If *clip* is defined, the coordinates of prior boxes are recalculated with the formula: +\f$coordinate = \min(\max(coordinate,0), 1)\f$ + +**Example** + +```xml + + + + ... + + + ... + + +``` + +* * * + +## Proposal Layer +Back to top + +**Name**: *Proposal* + +**Category**: *Layer* + +**Short description**: *Proposal* layer filters bounding boxes and outputs only those with the highest prediction confidence. + +**Parameters**: *Proposal* layer parameters are specified in the `data` node, which is a child of the `layer` node. The layer has three inputs: a blob with probabilities whether particular bounding box corresponds to background and foreground, a blob with logits for each of the bounding boxes, a blob with input image size in the [`image_height`, `image_width`, `scale_height_and_width`] or [`image_height`, `image_width`, `scale_height`, `scale_width`] format. + +* **Parameter name**: *base_size* + + * **Description**: *base_size* is the size of the anchor to which *scale* and *ratio* parameters are applied. + * **Range of values**: a positive integer number + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *pre_nms_topn* + + * **Description**: *pre_nms_topn* is the number of bounding boxes before the NMS operation. For example, *pre_nms_topn* equal to 15 means that the minimum box size is 15. + * **Range of values**: a positive integer number + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *post_nms_topn* + + * **Description**: *post_nms_topn* is the number of bounding boxes after the NMS operation. For example, *post_nms_topn* equal to 15 means that the maximum box size is 15. + * **Range of values**: a positive integer number + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *nms_thresh* + + * **Description**: *nms_thresh* is the minimum value of the proposal to be taken into consideration. For example, *nms_thresh* equal to 0.5 means that all boxes with prediction probability less than 0.5 are filtered out. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *feat_stride* + + * **Description**: *feat_stride* is the step size to slide over boxes (in pixels). 
For example, *feat_stride* equal to 16 means that all boxes are analyzed with the slide 16. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *min_size* + + * **Description**: *min_size* is the minimum size of box to be taken into consideration. For example, *min_size* equal 35 means that all boxes with box size less than 35 are filtered out. + * **Range of values**: a positive integer number + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *ratio* + + * **Description**: *ratio* is the ratios for anchor generation. + * **Range of values**: a list of floating-point numbers + * **Type**: `float[]` + * **Default value**: `[]` + * **Required**: *yes* + +* **Parameter name**: *scale* + + * **Description**: *scale* is the scales for anchor generation. + * **Range of values**: a list of floating-point numbers + * **Type**: `float[]` + * **Default value**: `[]` + * **Required**: *yes* + +* **Parameter name**: *clip_before_nms* + + * **Description**: *clip_before_nms* flag that specifies whether to perform clip bounding boxes before non-maximum suppression or not. + * **Range of values**: 0 or 1 + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +* **Parameter name**: *clip_after_nms* + + * **Description**: *clip_after_nms* is a flag that specifies whether to perform clip bounding boxes after non-maximum suppression or not. + * **Range of values**: 0 or 1 + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *normalize* + + * **Description**: *normalize* is a flag that specifies whether to perform normalization of output boxes to *[0,1]* interval or not. + * **Range of values**: 0 or 1 + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *box_size_scale* + + * **Description**: *box_size_scale* specifies the scale factor applied to logits of box sizes before decoding. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: 1.0 + * **Required**: *no* + +* **Parameter name**: *box_coordinate_scale* + + * **Description**: *box_coordinate_scale* specifies the scale factor applied to logits of box coordinates before decoding. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: 1.0 + * **Required**: *no* + +* **Parameter name**: *framework* + + * **Description**: *framework* specifies how the box coordinates are calculated. + * **Range of values**: + * "" (empty string) - calculate box coordinates like in Caffe* + * *tensorflow* - calculate box coordinates like in the TensorFlow* Object Detection API models + * **Type**: string + * **Default value**: "" (empty string) + * **Required**: *no* + +* **Parameter name**: *for_deformable* + + * **Description**: *for_deformable* specifies how the box coordinates are calculated. + * **Range of values**: 0 or 1 + * **Type**: int + * **Default value**: 0 + * **Required**: *no* + +**Mathematical Formulation** + +*Proposal* layer accepts three inputs with four dimensions. The produced blob has two dimensions: the first one equals `batch_size * post_nms_topn`. +*Proposal* layer does the following with the input blob: +1. Generates initial anchor boxes. Left top corner of all boxes is at (0, 0). Width and height of boxes are calculated from *base_size* with *scale* and *ratio* parameters. +2. 
For each point in the first input blob: + * pins anchor boxes to the image according to the second input blob that contains four deltas for each box: for *x* and *y* of center, for *width* and for *height* + * finds out score in the first input blob +3. Filters out boxes with size less than *min_size* +4. Sorts all proposals (*box*, *score*) by score from highest to lowest +5. Takes top *pre_nms_topn* proposals +6. Calculates intersections for boxes and filter out all boxes with \f$intersection/union > nms\_thresh\f$ +7. Takes top *post_nms_topn* proposals +8. Returns top proposals + +**Inputs**: + +* **1**: 4D input blob with class prediction scores. Required. + +* **2**: 4D input blob with box logits. Required. + +* **3**: 1D input blob 3 or 4 elements: [image height, image width, scale for image height/width OR scale for image height and scale for image width]. Required. + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## PSROIPooling Layer +Back to top + +**Name**: *PSROIPooling* + +**Category**: *Pool* + +**Short description**: *PSROIPooling* layer compute position-sensitive pooling on regions of interest specified by input. + +**Detailed description**: [Reference](https://arxiv.org/pdf/1703.06211.pdf) + +**Parameters**: *PSRoiPooling* layer parameters are specified in the `data` node, which is a child of the `layer` node. *PSROIPooling* layer takes two input blobs: with feature maps and with regions of interests (box coordinates). The latter is specified as five element tuples: *[batch_id, x_1, y_1, x_2, y_2]*. ROIs coordinates are specified in absolute values for the average mode and in normalized values (to *[0,1]* interval) for bilinear interpolation. + +* **Parameter name**: *output_dim* + + * **Description**: *output_dim* is a pooled output channel number. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *group_size* + + * **Description**: *group_size* is the number of groups to encode position-sensitive score maps. Use for *average* mode only. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +* **Parameter name**: *spatial_scale* + + * **Description**: *spatial_scale* is a multiplicative spatial scale factor to translate ROI coordinates from their input scale to the scale used when pooling. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *mode* + * **Description**: *mode* specifies mode for pooling. + * **Range of values**: + * *average* - perform average pooling + * *bilinear* - perform pooling with bilinear interpolation + * **Type**: string + * **Default value**: *average* + * **Required**: *yes* + +* **Parameter name**: *spatial_bins_x* + * **Description**: *spatial_bins_x* specifies numbers of bins to divide the input feature maps over width. Used for "bilinear" mode only. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +* **Parameter name**: *spatial_bins_y* + * **Description**: *spatial_bins_y* specifies numbers of bins to divide the input feature maps over height. Used for *bilinear* mode only. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +**Inputs**: + +* **1**: 4D input blob with feature maps. Required. 
+ +* **2**: 2D input blob describing box consisting of five element tuples: `[batch_id, x_1, y_1, x_2, y_2]`. Required. + +**Example** + +```xml + + + + + 1 + 3240 + 38 + 38 + + + 100 + 5 + + + + + 100 + 360 + 6 + 6 + + + +``` + +* * * + +## FakeQuantize Layer +Back to top + +**Name**: *FakeQuantize* + +**Category**: *Layer* + +**Short description**: *FakeQuantize* layer is element-wise linear quantization of floating-point input values into a discrete set of floating-point values. + +**Detailed description**: Input and output ranges as well as the number of levels of quantization are specified by dedicated inputs and attributes. There can be different limits for each element or groups of elements (channels) of the input blobs. Otherwise, one limit applies to all elements. It depends on shape of inputs that specify limits and regular broadcasting rules applied for input blobs. The output of the operator is a floating-point number of the same type as the input blob. In general, there are four values that specify quantization for each element: *input_low*, *input_high*, *output_low*, *output_high*. *input_low* and *input_high* parameters specify the input range of quantization. All input values that are outside this range are clipped to the range before actual quantization. *output_low* and *output_high* specify minimum and maximum quantized values at the output. + +**Parameters**: *Quantize* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *levels* + + * **Description**: *levels* is the number of quantization levels. + * **Range of values**: an integer greater than or equal to 2 + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +**Inputs**: + +* **1**: `X` - multidimensional input blob to quantize. Required. + +* **2**: `input_low` - minimum limit for input value. The shape must be broadcastable to the shape of `X`. Required. + +* **3**: `input_high` - maximum limit for input value. Can be the same as `input_low` for binarization. The shape must be broadcastable to the shape of `X`. Required. + +* **4**: `output_low` - minimum quantized value. The shape must be broadcastable to the shape of `X`. Required. + +* **5**: `output_high` - maximum quantized value. The shape must be broadcastable to the of `X`. Required. + +**Mathematical Formulation** + +Each element of the output is defined as the result of the following expression: + +```python +if x <= input_low: + output = output_low +elif x > input_high: + output = output_high +else: + # input_low < x <= input_high + output = round((x - input_low) / (input_high - input_low) * (levels-1)) / (levels-1) * (output_high - output_low) + output_low +``` + +**Example** +```xml + + + + + 1 + 64 + 56 + 56 + + + 1 + 64 + 1 + 1 + + + 1 + 64 + 1 + 1 + + + 1 + 1 + 1 + 1 + + + 1 + 1 + 1 + 1 + + + + + 1 + 64 + 56 + 56 + + + +``` + +* * * + +## Range Layer +Back to top + +**Name**: *Range* + +**Category**: *Layer* + +**Short description**: *Range* sequence of numbers according input values. + +**Detailed description**: *Range* layers generates a sequence of numbers starting from the value in the first input up to but not including the value in the second input with a step equal to the value in the third input. + +**Parameters**: *Range* layer does not have parameters. + +**Inputs**: + +* **1**: 0D blob (constant) with the start value of the range. Required. + +* **2**: 0D blob (constant) with the limit value of the range. Required. 
+ +* **3**: 0D blob (constant) with the step value. Required. + +**Example** + +```xml + + + + + + + + + 10 + + + +``` + +* * * + +## RegionYolo Layer +Back to top + +**Name**: *RegionYolo* + +**Category**: *Layer* + +**Short description**: *RegionYolo* computes the coordinates of regions with probability for each class. + +**Detailed description**: [Reference](https://arxiv.org/pdf/1612.08242.pdf) + +**Parameters**: *RegionYolo* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *coords* + + * **Description**: *coords* is the number of coordinates for each region. + * **Range of values**: an integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *classes* + + * **Description**: *classes* is the number of classes for each region. + * **Range of values**: an integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *num* + + * **Description**: *num* is the number of regions. + * **Range of values**: an integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *do_softmax* + + * **Description**: *do_softmax* is a flag that specifies the inference method and affects how the number of regions is determined. + * **Range of values**: + * *0* - do not perform softmax + * *1* - perform softmax + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +* **Parameter name**: *mask* + + * **Description**: *mask* specifies the number of regions. Use this parameter instead of *num* when *do_softmax* is equal to 0. + * **Range of values**: a list of integers + * **Type**: `int[]` + * **Default value**: `[]` + * **Required**: *no* + +**Inputs**: + +* **1**: 4D input blob. Required. + +**Example** + +```xml + + + ... + ... + + +``` + +* * * + +## ReLU Layer +Back to top + +**Name**: *ReLU* + +**Category**: *Activation* + +**Short description**: [Reference](http://caffe.berkeleyvision.org/tutorial/layers/relu.html) + +**Detailed description**: [Reference](https://github.com/Kulbear/deep-learning-nano-foundation/wiki/ReLU-and-Softmax-Activation-Functions#rectified-linear-units) + +**Parameters**: *ReLU* layer parameters are specified parameters in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *negative_slope* + + * **Description**: *negative_slope* is a multiplier, which is used if the unit is not active (that is, negative). For example, *negative_slope* equal to 0.1 means that an inactive unit value would be multiplied by 0.1 and this is the [Leaky ReLU](https://keras.io/layers/advanced-activations/#leakyrelu). If *negative_slope* is equal to 0.0, this is the usual *ReLU*. + * **Range of values**: a non-negative floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *no* + +**Mathematical Formulation** + +\f[ +Y_{i}^{( l )} = max(0, Y_{i}^{( l - 1 )}) +\f] + +**Inputs**: + +* **1**: Multidimensional input blob. Required. + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## ReorgYolo Layer +Back to top + +**Name**: *ReorgYolo* + +**Category**: *Layer* + +**Short description**: *ReorgYolo* reorganizes input blob taking into account strides. + +**Detailed description**: [Reference](https://arxiv.org/pdf/1612.08242.pdf) + +**Parameters**: *ReorgYolo* layer parameters are specified parameters in the `data` node, which is a child of the `layer` node. 
+ +* **Parameter name**: *stride* + + * **Description**: *stride* is the distance between cut throws in output blobs. + * **Range of values**: an integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +**Inputs**: + +* **1**: 4D input blob. Required. + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## Resample (Type 1) Layer +Back to top + +**Name**: *Resample* + +**Category**: *Layer* + +**Short description**: *Resample* layer scales the input blob by the specified parameters. + +**Parameters**: *Resample* layer parameters are specified in the `data` node, which is a child of the `layer` node. *Resample* **Type 1** layer has one input blob containing image to resample. + +* **Parameter name**: *type* + + * **Description**: *type* parameter specifies the type of blob interpolation. + * **Range of values**: + * *caffe.ResampleParameter.LINEAR* - linear blob interpolation + * *caffe.ResampleParameter.NEAREST* - nearest-neighbor blob interpolation + * **Type**: string + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *antialias* + + * **Description**: *antialias* is a flag that specifies whether to perform anti-aliasing. + * **Range of values**: + * 0 - do not perform anti-aliasing + * 1 - perform anti-aliasing + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *factor* + + * **Description**: *factor* specifies a scale factor for output height and width. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +**Inputs**: + +* **1**: 4D input blob. Required. + +**Example** + +```xml + + + + + 1 + 3 + 25 + 30 + + + + + 1 + 3 + 50 + 60 + + +​ +``` + +* * * + +## Resample (Type 2) Layer +Back to top + +**Name**: *Resample* + +**Category**: *Layer* + +**Short description**: *Resample* layer scales the input blob by the specified parameters. + +**Parameters**: *Resample* layer parameters are specified in the `data` node, which is a child of the `layer` node. *Resample* **Type 2** layer has two input blobs containing image to resample and output dimensions. + +* **Parameter name**: *type* + + * **Description**: *type* parameter specifies the type of blob interpolation. + * **Range of values**: + * *caffe.ResampleParameter.LINEAR* - linear blob interpolation + * *caffe.ResampleParameter.NEAREST* - nearest-neighbor blob interpolation + * **Type**: string + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *antialias* + + * **Description**: *antialias* is a flag that specifies whether to perform anti-aliasing. + * **Range of values**: + * 0 - do not perform anti-aliasing + * 1 - perform anti-aliasing + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *factor* + + * **Description**: *factor* parameter is ignored in the *Resample* **Type 2**. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +**Inputs**: + +* **1**: 4D input blob. Required. + +* **2**: 1D blob describing output shape. Required. + +**Example** + +```xml + + + + + 1 + 3 + 25 + 30 + + + 4 + + + + + 1 + 3 + 50 + 60 + + +​ +``` + +* * * + +## Reshape Layer +Back to top + +**Name**: *Reshape* + +**Category**: *Layer* + +**Short description**: *Reshape* layer changes dimensions of the input blob according to the specified order. Input blob volume is equal to output blob volume, where volume is the product of dimensions. 
+ +**Detailed description**: [Reference](http://caffe.berkeleyvision.org/tutorial/layers/reshape.html) + +**Parameters**: *Reshape* layer does not have parameters. *Reshape* layer takes two input blobs: the blob to be resized and the output blob shape. The values in the second blob can be -1, 0 and any positive integer number. The two special values -1 and 0: + * 0 means copying the respective dimension of the input blob. + * -1 means that this dimension is calculated to keep the overall elements count the same as in the input blob. No more than one `-1` can be used in a reshape operation. + +**Inputs**: + +* **1**: Multidimensional input blob. Required. + +* **2**: 1D blob describing output shape. Required. + +**Example** + +```xml + + + + 2 + 5 + 5 + 24 + + + 3 + + + + + 2 + 150 + 4 + + + +``` + +* * * + +## ReverseSequence Layer +Back to top + +**Name**: *ReverseSequence* + +**Category**: *Layer* + +**Short description**: *ReverseSequence* reverses variable length slices of data. + +**Detailed description**: *ReverseSequence* slices input along the dimension specified in the *batch_axis*, and for each slice *i*, reverses the first *lengths[i]* (the second input) elements along the dimension specified in the *seq_axis*. + +**Parameters**: *ReverseSequence* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *batch_axis* + + * **Description**: *batch_axis* is the index of the batch dimension. + * **Range of values**: an integer. Can be negative. + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + +* **Parameter name**: *seq_axis* + + * **Description**: *seq_axis* is the index of the sequence dimension. + * **Range of values**: an integer. Can be negative. + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +**Inputs**: + +* **1**: Blob with input data to reverse. Required. + +* **2**: 1D blob with sequence lengths in the first input blob. Required. + +**Example** + +```xml + + + + + 3 + 10 + 100 + 200 + + + 10 + + + + + 3 + 10 + 100 + 200 + + + +``` + +* * * +## RNNCell Layer +Back to top + +**Name**: *RNNCell* + +**Category**: *Layer* + +**Short description**: *RNNCell* layer computes the output using the formula described in the [article](https://hackernoon.com/understanding-architecture-of-lstm-cell-from-scratch-with-code-8da40f0b71f4). + +**Parameters**: *RNNCell* layer parameters should be specified as the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *hidden_size* + + * **Description**: *hidden_size* specifies hidden state size. 
+ * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *activations* + + * **Description**: activation functions for gates + * **Range of values**: any combination of *relu*, *sigmoid*, *tanh* + * **Type**: a list of strings + * **Default value**: *sigmoid,tanh* + * **Required**: *no* + +* **Parameter name**: *activations_alpha, activations_beta* + + * **Description**: *activations_alpha, activations_beta* functions parameters + * **Range of values**: a list of floating-point numbers + * **Type**: `float[]` + * **Default value**: None + * **Required**: *no* + +* **Parameter name**: *clip* + + * **Description**: *clip* specifies value for tensor clipping to be in *[-C, C]* before activations + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *no* + +**Inputs** + +* **1**: `X` - 2D ([batch_size, input_size]) input data. Required. + +* **2**: `Hi` - 2D ([batch_size, hidden_size]) input hidden state data. Required. + +**Outputs** + +* **1**: `Ho` - 2D ([batch_size, hidden_size]) output hidden state. + +* * * + +## ROIPooling Layer +Back to top + +**Name**: *ROIPooling* + +**Category**: *Pool* + +**Short description**: *ROIPooling* is a *pooling layer* used over feature maps of non-uniform input sizes and outputs a feature map of a fixed size. + +**Detailed description**: [deepsense.io reference](https://blog.deepsense.ai/region-of-interest-pooling-explained/) + +**Parameters**: *ROIPooling* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *pooled_h* + + * **Description**: *pooled_h* is the height of the ROI output feature map. For example, *pooled_h* equal to 6 means that the height of the output of *ROIPooling* is 6. + * **Range of values**: a non-negavive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *pooled_w* + + * **Description**: *pooled_w* is the width of the ROI output feature map. For example, *pooled_w* equal to 6 means that the width of the output of *ROIPooling* is 6. + * **Range of values**: a non-negative integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *spatial_scale* + + * **Description**: *spatial_scale* is the ratio of the input feature map over the input image size. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *method* + + * **Description**: *method* specifies a method to perform pooling. If the method is *bilinear*, the input box coordinates are normalized to the [0,1] interval. + * **Range of values**: *max* or *bilinear* + * **Type**: string + * **Default value**: *max* + * **Required**: *no* + +**Inputs**: + +* **1**: 4D input blob with feature maps. Required. + +* **2**: 2D input blob describing box consisting of 5 element tuples: [batch_id, x_1, y_1, x_2, y_2]. Required. + +**Mathematical Formulation** + +\f[ +output_{j} = MAX\{ x_{0}, ... x_{i}\} +\f] + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## ExperimentalDetectronROIFeatureExtractor Layer +Back to top + +**Name**: *ExperimentalDetectronROIFeatureExtractor* (aka *ROIAlignPyramid*) + +**Category**: *Pool* + +**Short description**: *ExperimentalDetectronROIFeatureExtractor* is the *ROIAlign* operation applied over a feature pyramid. 

**Detailed description**: *ExperimentalDetectronROIFeatureExtractor* maps input ROIs to the levels of the pyramid depending on the sizes of the ROIs and the parameters of the operation, and then extracts features via *ROIAlign* from the corresponding pyramid levels.
For more details, see the math formulas below and the following sources:

 * [Feature Pyramid Networks for Object Detection](https://arxiv.org/pdf/1612.03144.pdf)
 * [Facebook AI / detectron](https://ai.facebook.com/tools/detectron/)
 * [ONNX / ROI Align](https://github.com/onnx/onnx/blob/rel-1.5.0/docs/Operators.md#RoiAlign)
 * [NNEF / ROI Align](https://www.khronos.org/registry/NNEF/specs/1.0/nnef-1.0.2.html#roi-resize)

**Parameters**: *ExperimentalDetectronROIFeatureExtractor* layer parameters are specified in the `data` node, which is a child of the `layer` node.

* **Parameter name**: *output_size*

  * **Description**: *output_size* is the width and height of the output tensor.
  * **Range of values**: a positive integer number
  * **Type**: `int`
  * **Default value**: None
  * **Required**: *yes*

* **Parameter name**: *sampling_ratio*

  * **Description**: *sampling_ratio* is the number of sampling points per output value. If 0, an adaptive number is used, computed as `ceil(roi_width / output_width)`, and likewise for height.
  * **Range of values**: a non-negative integer number
  * **Type**: `int`
  * **Default value**: 0
  * **Required**: *yes*

* **Parameter name**: *pyramid_scales*

  * **Description**: *pyramid_scales* lists the `image_size / layer_size[l]` ratios for pyramid layers `l=1,...,L`, where `L` is the number of pyramid layers and `image_size` refers to the network's input image. Note that the largest pyramid layer may be smaller than the input image; for example, `image_size` is 640 in the XML example below.
  * **Range of values**: a list of positive integer numbers
  * **Type**: `int[]`
  * **Default value**: None
  * **Required**: *yes*

**Inputs**:

* **0**: 2D input blob describing the ROIs as 4-tuples: [x1, y1, x2, y2]. Batch size is the number of ROIs. Coordinates *x* and *y* are `float` numbers and refer to the input *image_size*. Required.

* **1**, ..., **L**: Pyramid of 4D input blobs with feature maps. Batch size must be 1. The number of channels must be the same for all layers of the pyramid. The layer width and height must be equal to `layer_size[l] = image_size / pyramid_scales[l]`. Required.

**Outputs**:

* **0**: 4D output blob. Batch size is equal to the number of ROIs. The number of channels is the same as in the feature maps of the input pyramid. Data type is `float`. Required.

**Mathematical Formulation**

*ExperimentalDetectronROIFeatureExtractor* applies the *ROIAlign* algorithm to the pyramid layers:

* output[i, :, :, :] = ROIAlign(inputPyramid[j], rois[i])
* j = PyramidLevelMapper(rois[i])

PyramidLevelMapper maps the ROI to the pyramid level using the following formula:

* j = floor(2 + log2(sqrt(w * h) / 224))

Here 224 is the "canonical" size, 2 is the pyramid starting level, and w, h are the ROI width and height.
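
For illustration, here is a minimal Python sketch of the level-mapping step above. The formula is taken from this section; clamping the result to the available pyramid levels is an assumption of the sketch, not something stated by the layer specification.

```python
import math

def map_roi_to_pyramid_level(x1, y1, x2, y2, num_levels):
    """Map one ROI (given in input-image coordinates) to a pyramid level."""
    w = x2 - x1
    h = y2 - y1
    # j = floor(2 + log2(sqrt(w * h) / 224)), with 224 as the canonical size
    level = math.floor(2 + math.log2(math.sqrt(w * h) / 224))
    # Clamp to the levels that actually exist (assumption, not from the spec)
    return min(max(level, 1), num_levels)

# A 224x224 ROI maps to the starting level 2
print(map_roi_to_pyramid_level(0, 0, 224, 224, num_levels=4))  # 2
```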
+ +**Example** + +```xml + + + + + 100 + 4 + + + 1 + 256 + 160 + 160 + + + 1 + 256 + 80 + 80 + + + 1 + 256 + 40 + 40 + + + 1 + 256 + 20 + 20 + + + + + 100 + 256 + 14 + 14 + + + +``` + +* * * + +## ExperimentalSparseWeightedSum Layer +Back to top + +**Name**: *ExperimentalSparseWeightedSum* + +**Category**: *Layer* + +**Short description**: *ExperimentalSparseWeightedSum* extracts embedding vectors from the parameters table for each object feature value and sum up these embedding vectors multiplied by weights for each object. + +**Detailed description**: [Reference](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup_sparse). This is similar to *embedding_lookup_sparse* but it accepts objects with empty feature values for which it uses a defaut value to extract an embedding from the parameters table. In comparison with *embedding_lookup_sparse* it has a limitation to work only with two-dimensional indices tensor. + +**Inputs**: + +* **1**: 2-D tensor. Input indices of the sparse tensor. It contains with an integer type. Required. +* **2**: 1-D tensor. Input values of the sparse tensor. It contains with an integer type. Required. +* **3**: 1-D tensor. Dense shape of the sparse tensor. It contains with an integer type. Required. +* **4**: N-D tensor. The parameters table. It contains with a float type. Required. +* **5**: 0-D tensor. The default value. It contains with an integer type. Required. +* **6**: 1-D tensor. Input weights. It contains with a float type. Optional. + +**Outputs**: + +* **1**: The output tensor of resulted embedding vectors for each object. It is has a shape [batch_size, params_table_shape[1], ..., params_table_shape[-1]] where batch_size is a number of objects or a number of rows in the sparse tensor. + +* * * + +## ScaleShift Layer +Back to top + +**Name**: *ScaleShift* + +**Category**: *Layer* + +**Short description**: *ScaleShift* layer performs linear transformation of the input blobs. Weights denote a scaling parameter, biases denote a shift. + +**Parameters**: *ScaleShift* layer does not have parameters. + +**Inputs**: + +* **1**: 4D input blob. Required. + +**Mathematical Formulation** + +\f[ +o_{i} =\gamma b_{i} + \beta +\f] + +**Example** + +``` + + ... + ... + +``` + +* * * + +## Select Layer +Back to top +**Name**: *Select* + +**Category**: *Layer* + +**Short description**: *Select* layer returns a tensor filled with the elements from the second or the third input, depending on the condition (the first input) value. + +**Detailed description**: *Select* takes elements from the second (`then`) or the third (`else`) input based on a condition mask + provided in the first input (`cond`). The `cond` tensor is broadcasted to `then` and `else` tensors. The output tensor shape is equal + to the broadcasted shape of `cond`, `then`, and `else`. The behavior is similar to [numpy.where](https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html) with three parameters. + +**Parameters**: *Select* layer does not have parameters. + +**Inputs**: +* **1**: `cond` tensor with selection mask (only integer values). The tensor can be 0D. +* **2**: `then` the tensor with elements to take where condition is true. +* **3**: `else` the tensor with elements to take where condition is false. + +**Example** + +```xml + + + + 3 + 2 + + + 3 + 2 + + + 3 + 2 + + + + + 3 + 2 + + + +``` + +* * * + +## Shape Layer +Back to top + +**Name**: *Shape* + +**Category**: *Layer* + +**Short description**: *Shape* produces a blob with the input blob shape. 
+ +**Parameters**: *Shape* layer does not have parameters. + +**Inputs**: + +* **1**: Multidimensional input blob. Required. + +**Example** + +```xml + + + + 2 + 3 + 224 + 224 + + + + + 4 + + + +``` + +* * * + +## ShuffleChannels Layer +Back to top + +**Name**: *ShuffleChannels* + +**Category**: *Layer* + +**Short description**: *ShuffleChannels* permutes data in the channel dimension of the input blob. + +**Parameters**: *ShuffleChannels* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *axis* + + * **Description**: *axis* specifies the index of a channel dimension. + * **Range of values**: a non-negative integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *No* + +* **Parameter name**: *group* + + * **Description**: *group* specifies the number of groups to split the channel dimension into. This number must evenly divide the channel dimension size. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *No* + +**Inputs**: + +* **1**: 4D input blob. Required. + +**Mathematical Formulation** + +The operation is the equivalent with the following transformation of the input blob *x* of shape *[N, C, H, W]*: + +``` +x' = reshape(x, [N, group, C / group, H * W]) +x'' = transpose(x', [0, 2, 1, 3]) +y = reshape(x'', [N, C, H, W]) +``` + +where *group* is the layer parameter described above. + +**Example** + +```xml + + + + + 3 + 12 + 200 + 400 + + + + + 3 + 12 + 200 + 400 + + + +``` + +* * * + +## SimplerNMS Layer +Back to top + +**Name**: *SimplerNMS* + +**Category**: *Layer* + +**Short description**: *SimplerNMS* layer filters bounding boxes and outputs only those with the highest confidence of prediction. + +**Parameters**: *SimplerNMS* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *pre_nms_topn* + + * **Description**: *pre_nms_topn* is the number of bounding boxes before the NMS operation. For example, *pre_nms_topn* equal to 15 means that the minimum box size is 15. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *post_nms_topn* + + * **Description**: *post_nms_topn* is the quantity of bounding boxes after the NMS operation. For example, *post_nms_topn* equal to 15 means that the maximum box size is 15. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *iou_threshold* + + * **Description**: *iou_threshold* is the minimum ratio of boxes overlapping to be taken into consideration. For example, *iou_threshold* equal to 0.7 means that all boxes with overlapping ratio less than 0.7 are filtered out. + * **Range of values**: a positive floating-point number + * **Type**: `float` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *feat_stride* + + * **Description**: *feat_stride* is the step size to slide over boxes (in pixels). For example, *feat_stride* equal to 16 means that all boxes are analyzed with the slide 16. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *min_bbox_size* + + * **Description**: *min_bbox_size* is the minimum size of a box to be taken into consideration. 
+ * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *scale* + + * **Description**: *scale* is for generating anchor boxes. + * **Range of values**: a list of positive floating-point numbers + * **Type**: `float[]` + * **Default value**: `[]` + * **Required**: *no* + +**Inputs**: + +* **1**: 4D input blob with class prediction scores. Required. + +* **2**: 4D input blob with box logits. Required. + +* **3**: 1D input blob 3 or 4 elements: [image height, image width, scale for image height/width OR scale for image height and scale for image width]. Required. + +**Mathematical Formulation** + +*SimplerNMS* accepts three inputs with four dimensions. Produced blob has two dimensions, the first one equals *post_nms_topn*. +*SimplerNMS* does the following with the input blob: +1. Generates initial anchor boxes. Left top corner of all boxes is (0, 0). Width and height of boxes are calculated based on scaled (according to the scale parameter) default widths and heights +2. For each point in the first input blob: + * pins anchor boxes to a picture according to the second input blob, which contains four deltas for each box: for `x` and `y` of the center, for width, and for height + * finds out score in the first input blob +3. Filters out boxes with size less than *min_bbox_size.* +4. Sorts all proposals (*box, score*) by score from highest to lowest +5. Takes top *pre_nms_topn* proposals +6. Calculates intersections for boxes and filters out all with \f$intersection/union > iou\_threshold\f$ +7. Takes top *post_nms_topn* proposals +8. Returns top proposals + +**Example** + +```xml + + + ... + ... + +``` + +* * * + +## Slice Layer +Back to top + +**Name**: *Slice* + +**Category**: *Layer* + +**Short description**: *Slice* layer splits the input blob into several pieces over the specified axis. + +**Parameters**: *Slice* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *axis* + + * **Description**: *axis* specifies the axis to split the input blob along + * **Range of values**: a non-negative integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +**Inputs**: + +* **1**: Multidimensional input blob. Required. + +**Example** + +```xml + + + + + 1 + 1048 + 14 + 14 + + + + + 1 + 1024 + 14 + 14 + + + 1 + 24 + 14 + 14 + + + +``` + +* * * + +## SoftMax Layer +Back to top + +**Name**: *SoftMax* + +**Category**: *Activation* + +**Short description**: [Reference](https://github.com/Kulbear/deep-learning-nano-foundation/wiki/ReLU-and-Softmax-Activation-Functions#softmax) + +**Detailed description**: [Reference](http://cs231n.github.io/linear-classify/#softmax) + +**Parameters**: *SoftMax* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *axis* + + * **Description**: *axis* is the axis along which the *SoftMax* is calculated. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +**Mathematical Formulation** + +\f[ +y_{c} = \frac{e^{Z_{c}}}{\sum_{d=1}^{C}e^{Z_{d}}} +\f] +where \f$C\f$ is a number of classes + +**Example** + +```xml + + + ... + ... + +``` + +**Inputs**: + +* **1**: Multidimensional input blob. Required. 
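
As an informal illustration of the formula above, the following NumPy sketch computes *SoftMax* along the `axis` parameter. Subtracting the per-slice maximum is a standard numerical-stability trick and does not change the result; this is not the Inference Engine kernel itself.

```python
import numpy as np

def softmax(z, axis=1):
    # exp of shifted values; the shift cancels out in the ratio below
    e = np.exp(z - np.max(z, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

x = np.random.randn(2, 1000).astype(np.float32)
y = softmax(x, axis=1)
print(np.allclose(y.sum(axis=1), 1.0))  # every slice along `axis` sums to 1
```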
+ +* * * + +## SparseFillEmptyRows Layer +Back to top + +**Name**: *SparseFillEmptyRows* + +**Category**: *Layer* + +**Short description**: *SparseFillEmptyRows* fills empty rows in the input 2-D SparseTensor with a default value. + +**Detailed description**: [Reference](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/sparse-fill-empty-rows) + +**Inputs**: + +* **1**: 2-D tensor. Input indices of the sparse tensor. Required. +* **2**: 1-D tensor. Input values of the sparse tensor. Required. +* **3**: 1-D tensor. Shape of the sparse tensor. Value of this input is required for the Model Optimizer. +* **4**: 0-D tensor. Default value to insert at rows missing from the input sparse tensor. Required. + +**Outputs**: + +* **1**: 2-D tensor. Indices of the filled sparse tensor. +* **2**: 1-D tensor. Values of the filled sparse tensor. +* **3**: 1-D tensor. An indicator of whether the dense row was missing in the input sparse tensor. + +* * * + +## SparseSegmentMean Layer +Back to top + +**Name**: *SparseSegmentMean* + +**Category**: *Layer* + +**Short description**: *SparseSegmentMean* computes the mean along sparse segments of a tensor. + +**Detailed description**: [Reference](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/sparse-segment-mean) + +**Parameters**: *SparseSegmentMean* layer does not have parameters. + +**Inputs**: + +* **1**: ND tensor. Data tensor from which rows are selected for the mean operation. Required. +* **2**: 1D tensor. Tensor of rows indices selected from the first input tensor along 0 dimension. Required. +* **3**: 1D tensor. Tensor of segment IDs that rows selected for the operation belong to. Rows beloging to the same segment are summed up and divided by N, where N is a number of selected rows in a segment. This input has the same size as the second input. Values must be sorted in ascending order and can be repeated. Required. + +**Outputs**: + +* **1**: ND tensor. It has the same shape as the data tensor, except for dimension 0, which has a size equal to a size of an indices tensor. + +* * * + +## SparseSegmentSqrtN Layer +Back to top + +**Name**: *SparseSegmentSqrtN* + +**Category**: *Layer* + +**Short description**: *SparseSegmentSqrtN* computes the sum along sparse segments of a tensor and divides it by the square root of N, where N is a number of rows in a segment. + +**Detailed description**: [Reference](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/sparse-segment-sqrt-n) + +**Parameters**: *SparseSegmentSqrtN* layer does not have parameters. + +**Inputs**: + +* **1**: ND tensor. Data tensor from which rows are selected. Required. +* **2**: 1D tensor. Tensor of rows indices selected from the first input tensor along 0 dimension. Required. +* **3**: 1D tensor. Tensor of segment IDs that selected rows belong to. Rows belonging to the same segment are summed up and divided by the square root of N, where N is a number of rows in a segment. This input tensor has the same size as the second input. Values must be sorted in ascending order and can be repeated. Required. + +**Outputs**: + +* **1**: ND tensor. It has the same shape as the data tensor, except for a dimension 0, which has a size equal to a size of an indices tensor. + +* * * + +## SparseSegmentSum Layer +Back to top + +**Name**: *SparseSegmentSum* + +**Category**: *Layer* + +**Short description**: *SparseSegmentSum* computes the sum along sparse segments of a tensor. 
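
To make the segment semantics shared by *SparseSegmentMean*, *SparseSegmentSqrtN*, and this layer concrete, here is an informal NumPy sketch. It follows the TensorFlow reference operations linked in these sections; in particular, the number of output rows (`max(segment_ids) + 1`) is taken from that reference and is an assumption here, and the code is an illustration rather than the Inference Engine kernel.

```python
import numpy as np

def sparse_segment_reduce(data, indices, segment_ids, mode="sum"):
    # segment_ids is sorted, has the same length as indices, and assigns
    # each selected row of `data` to a segment
    num_segments = segment_ids[-1] + 1
    out = np.zeros((num_segments,) + data.shape[1:])
    counts = np.zeros(num_segments)
    for row, seg in zip(indices, segment_ids):
        out[seg] += data[row]
        counts[seg] += 1
    counts = counts.reshape((-1,) + (1,) * (data.ndim - 1))
    if mode == "mean":                      # SparseSegmentMean
        out /= np.maximum(counts, 1)
    elif mode == "sqrtn":                   # SparseSegmentSqrtN
        out /= np.sqrt(np.maximum(counts, 1))
    return out                              # mode == "sum": SparseSegmentSum

data = np.arange(12.0).reshape(4, 3)
print(sparse_segment_reduce(data, [0, 1, 3], [0, 0, 1], mode="mean"))
# segment 0 averages rows 0 and 1, segment 1 contains row 3 only
```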
+ +**Detailed description**: [Reference](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/sparse-segment-sum) + +**Parameters**: *SparseSegmentSum* layer does not have parameters. + +**Inputs**: + +* **1**: ND tensor. Data tensor from which rows are selected. Required. +* **2**: 1D tensor. Tensor of rows indices selected from the first input tensor along 0 dimension. Required. +* **3**: 1D tensor. Tensor of segment IDs that selected rows belong to. Rows belonging to the same segment are summed up. This input tensor has the same size as the second input. Values must be sorted in ascending order and can be repeated. Required. + +**Outputs**: + +* **1**: ND tensor. It has the same shape as the data tensor, except for a dimension 0, which has a size equal to a size of an indices tensor. + +* * * + +## SparseToDense Layer +Back to top + +**Name**: *SparseToDense* + +**Category**: *Layer* + +**Short description**: *SparseToDense* converts a sparse tensor into a dense tensor. + +**Detailed description**: [Reference](https://www.tensorflow.org/api_docs/python/tf/sparse/to_dense) + +**Inputs**: + +* **1**: 2-D tensor. Input indices of the sparse tensor. It contains with an integer type. Required. +* **2**: 1-D tensor. Dense shape of the sparse tensor. It contains with an integer type. Required. +* **3**: 1-D tensor. Input values of the sparse tensor. It contains with integer and float types. Required. +* **4**: 0-D tensor. Default value to insert at missing positions. The fourth input type must be the same as the third input type. If it is not specified, zero value is used. Optional. + +**Outputs**: + +* **1**: The output dense tensor. The output tensor shape is equal to a value of the second input. + +* * * + +## Split Layer +Back to top + +**Name**: *Split* + +**Category**: *Layer* + +**Short description**: *Split* layer splits the input along the specified axis into several output pieces. + +**Detailed description**: [Reference](http://caffe.berkeleyvision.org/tutorial/layers/split.html) + +**Parameters**: *Split* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *axis* + + * **Description**: *axis* is the number of the axis to split input blob along. + * **Range of values**: a non-negative integer less than the number of dimensions in the input + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *num_split* + + * **Description**: *num_split* is the number of pieces to split the input into. The *num_split* must evenly divide the size of the *axis* dimension. + * **Range of values**: a positive integer less than or equal to the size of the dimension being split over + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +**Mathematical Formulation** + +For example, if the blob is *BxC+CxHxW*, `axis="1"`, and `num_split="2"`, the sizes of output blobs are *BxCxHxW*. + +**Inputs**: + +* **1**: Multidimensional input blob. Required. + +**Example** + +```xml + + + ... + ... + +``` + +* * * + + +## Squeeze Layer + +**Name**: *Squeeze* + +**Category**: *Layer* + +**Short description**: *Squeeze* removes specified dimensions (second input) equal to 1 of the first input tensor. If the second input is omitted then all dimensions equal to 1 are removed. If the specified dimension is not equal to one then error is raised. + +**Parameters**: *Squeeze* layer doesn't have parameters. + +**Inputs**: + +* **1**: Multidimensional input blob. Required. 
+ +* **2**: `(optional)`: 0D or 1D tensor with dimensions indices to squeeze. Values could be negative. Indices could be integer or float values. + +**Example** + +*Example 1:* +```xml + + + + 1 + 3 + 1 + 2 + + + + + 2 + + + + + 3 + 2 + + + +``` + +*Example 2: squeeze 1D tensor with 1 element to a 0D tensor (constant)* +```xml + + + + 1 + + + + + 1 + + + + + + + +``` + + +* * * + +## StridedSlice Layer + +**Name**: *StridedSlice* + +**Short description**: *StridedSlice* layer extracts a strided slice of a blob. + It is similar to generalized array indexing in Python\*. + +**Parameters**: *StridedSlice* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *begin_mask* + + * **Description**: *begin_mask* is a bit mask. *begin_mask[i]* equal to 0 means that the corresponding dimension of the `begin` input is ignored. + * **Range of values**: a list of `0`s and `1`s + * **Type**: `int[]` + * **Default value**: `[1]` + * **Required**: *yes* + +* **Parameter name**: *end_mask* + + * **Description**: *end_mask* is a bit mask. If *end_mask[i]* is 0, the corresponding dimension of the `end` input is ignored. + * **Range of values**: a list of `0`s and `1`s + * **Type**: `int[]` + * **Default value**: `[1]` + * **Required**: *yes* + +* **Parameter name**: *new_axis_mask* + + * **Description**: *new_axis_mask* is a bit mask. If *new_axis_mask[i]* is 1, a length 1 dimension is inserted on the `i`-th position of input blob. + * **Range of values**: a list of `0`s and `1`s + * **Type**: `int[]` + * **Default value**: `[0]` + * **Required**: *no* + + +* **Parameter name**: *shrink_axis_mask* + + * **Description**: *shrink_axis_mask* is a bit mask. If *shrink_axis_mask[i]* is 1, the dimension on the `i`-th position is deleted. + * **Range of values**: a list of `0`s and `1`s + * **Type**: `int[]` + * **Default value**: `[0]` + * **Required**: *no* + +* **Parameter name**: *ellipsis_mask* + + * **Description**: *ellipsis_mask* is a bit mask. It inserts missing dimensions on a position of a non-zero bit. + * **Range of values**: a list of `0`s and `1`. Only one non-zero bit is allowed. + * **Type**: `int[]` + * **Default value**: `[0]` + * **Required**: *no* + +**Inputs**: + +* **1**: Multidimensional input blob. Required. + +* **2**: `begin` input - 1D input blob with begin indexes for input blob slicing. Required. + Out-of-bounds values are silently clamped. If `begin_mask[i]` is 0, the value of `begin[i]` is ignored + and the range of the appropriate dimension starts from 0. + Negative values mean indexing starts from the end. For example, if `foo=[1,2,3]`, `begin[0]=-1` means `begin[0]=3`. + +* **3**: `end` input - 1D input blob with end indexes for input blob slicing. Required. + Out-of-bounds values will be silently clamped. If `end_mask[i]` is 0, the value of `end[i]` is ignored + and the full range of the appropriate dimension is used instead. + Negative values mean indexing starts from the end. For example, if `foo=[1,2,3]`, `end[0]=-1` means `end[0]=3`. + +* **4**: `stride` input - 1D input blob with strides. Optional. + +**Example** +```xml + + + + + 1 + 2 + 384 + 640 + 8 + + + 5 + + + 5 + + + 5 + + + + + 1 + 384 + 640 + 8 + + + +``` + + +* * * + + +## TensorIterator Layer +Back to top + +**Name**: *TensorIterator* + +**Category**: *Layer* + +**Short description**: *TensorIterator* (TI) layer performs recurrent sub-graph execution iterating through the data. 
+ +**Parameters**: The parameters are specified in the child nodes of the `port_map` and `back_edges` sections, which are child nodes of the layer node. The `port_map` and `back_edges` sections specify data mapping rules. + +* **Node**: *port_map* is a set of rules to map input/output data blobs of the `TensorIterator` layer onto `body` data blobs. Port mapping rule is presented as `input`/`output` nodes. + + * **Parameter name**: *external_port_id* + + * **Description**: *external_port_id* is a port ID of the `TensorIterator` layer. + * **Range of values**: indexes of the *TensorIterator* outputs + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + + * **Parameter name**: *internal_layer_id* + + * **Description**: *internal_layer_id* is a layer ID inside the `body` sub-network to map to. + * **Range of values**: IDs of the layers inside in the *TensorIterator* layer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + + * **Parameter name**: *internal_port_id* + + * **Description**: *internal_port_id* is a port ID of the `body` layer to map to. + * **Range of values**: indexes of the `body` layer input + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + + * **Parameter name**: *axis* + + * **Description**: *axis* is an axis to iterate through. `-1` means no iteration is done. + * **Range of values**: an integer + * **Type**: `int` + * **Default value**: -1 + * **Required**: *no* + + * **Parameter name**: *start* + + * **Description**: *start* is an index where the iteration starts from. Negative value means counting indexes from the end. + * **Range of values**: an integer + * **Type**: `int` + * **Default value**: 0 + * **Required**: *no* + + * **Parameter name**: *end* + + * **Description**: *end* is an index where iteration ends. Negative value means counting indexes from the end. + * **Range of values**: an integer + * **Type**: `int` + * **Default value**: -1 + * **Required**: *no* + + * **Parameter name**: *stride* + + * **Description**: *stride* is a step of iteration. Negative value means backward iteration. + * **Range of values**: an integer + * **Type**: `int` + * **Default value**: 1 + * **Required**: *no* + +* **Node**: *back_edges* is a set of rules to transfer data blobs between `body` iteration. Mapping rule is presented as a general `edge` node with port and layer indexes of `body` sub-network. + + * **Parameter name**: *from-layer* + + * **Description**: *from-layer* is a layer ID inside the `body` sub-network. + * **Range of values**: IDs of the layers inside the *TensorIterator* + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + + * **Parameter name**: *from-port* + + * **Description**: *from-port* is a port ID inside the `body` sub-network to start mapping from. + * **Range of values**: the respective layer port index + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + + * **Parameter name**: *to-layer* + + * **Description**: *to-layer* is a layer ID inside the `body` sub-network to end mapping. + * **Range of values**: IDs of the layers inside the *TensorIterator* + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + + * **Parameter name**: *to-port* + + * **Description**: *to-port* is a port ID inside the `body` sub-network to end mapping. + * **Range of values**: the respective layer port index + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Node**: *body* is a sub-network that will be recurrently executed. 
+ + * **Parameters**: The *body* node does not have parameters. + +**Example** + +```xml + + ... + ... + + + + ... + + ... + + + + ... + + + ... + ... + + +``` + +* * * + +## Tile Layer +Back to top + +**Name**: *Tile* + +**Category**: *Layer* + +**Short description**: *Tile* layer extends input blob with copies of data along a specified axis. + +**Detailed description**: [Reference](http://caffe.help/manual/layers/tile.html) + +**Parameters**: *Tile* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *axis* + + * **Description**: *axis* is the index of an axis to tile. For example, *axis* equal to 3 means that the fourth axis is used for tiling. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *tiles* + + * **Description**: *tiles* is the size of the specified axis in the output blob. For example, *tiles* equal to 88 means that the output blob gets 88 copies of data from the specified axis. + * **Range of values**: a positive integer + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +**Mathematical Formulation** + +*Tile* extends input blobs and filling in output blobs by the following rules: +\f[ +out_i=input_i[inner\_dim*t] +\f] +\f[ +t \in \left ( 0, \quad tiles \right ) +\f] + +**Inputs**: + +* **1**: Multidimensional input blob. Required. + +**Example** + +```xml + + + ... + ... + +``` + +* * * + + +## TopK Layer +Back to top + +**Name**: *TopK* + +**Category**: *Layer* + +**Short description**: *TopK* layer computes indices and values of the *k* maximum/minimum values for each slice along the axis specified. + +**Parameters**: *TopK* layer parameters are specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *axis* + + * **Description**: Specifies the axis along which to search for k maximum/minimum values. + * **Range of values**: an integer. Negative value means counting dimension from the end. + * **Type**: `int` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *mode* + + * **Description**: *mode* specifies an operation to use for selecting the largest element of two. + * **Range of values**: `min`, `max` + * **Type**: `string` + * **Default value**: None + * **Required**: *yes* + +* **Parameter name**: *sort* + + * **Description**: *sort* specifies an order of output elements and/or indices. + * **Range of values**: `value`, `index`, `none` + * **Type**: `string` + * **Default value**: None + * **Required**: *yes* + + +**Inputs**: + +* **1**: Arbitrary tensor. Required. + +* **2**: *k* - scalar specifies how many maximum/minimum elements should be computed + +**Outputs**: + +* **1**: Output tensor with top *k* values from input tensor along specified dimension *axis*. The shape of the tensor is `[input1.shape[0], ..., input1.shape[axis-1], k, input1.shape[axis+1], ...]`. + +* **2**: Output tensor with top *k* indices for each slice along *axis* dimension. 
+ The shape of the tensor is the same as for the 1st output, that is `[input1.shape[0], ..., input1.shape[axis-1], k, input1.shape[axis+1], ...]` + +**Mathematical Formulation** + +Output tensor is populated by values computes in the following way: + + output[i1, ..., i(axis-1), j, i(axis+1) ..., iN] = top_k(input[i1, ...., i(axis-1), :, i(axis+1), ..., iN]), k, sort, mode) + +So for each slice `input[i1, ...., i(axis-1), :, i(axis+1), ..., iN]` which represents 1D array, top_k value is computed individually. Sorting and minimum/maximum are controlled by `sort` and `mode` attributes. + +**Example** + +```xml + + + + + 6 + 12 + 10 + 24 + + + + + + 6 + 3 + 10 + 24 + + + +``` + +* * * + +## Unique Layer +Back to top + +**Name**: *Unique* + +**Category**: *Layer* + +**Short description**: *Unique* finds unique elements in 1-D tensor. + +**Detailed description**: [Reference](https://pytorch.org/docs/stable/torch.html?highlight=unique#torch.unique) + +**Parameters**: *Unique* layer parameters should be specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *sorted* + + * **Description**: If *sorted* is equal to *true*, the unique elements in the output are sorted in ascending order. Otherwise, all of the unique elements are sorted in the same order as they occur in the input. + * **Range of values**: *true* or *false* + * **Type**: `string` + * **Required**: *yes* + +* **Parameter name**: *return_inverse* + + * **Description**: If *return_inverse* is equal to *true*, the layer outputs the indices. Otherwise, it does not. + * **Range of values**: *true* or *false* + * **Type**: `string` + * **Required**: *yes* + +* **Parameter name**: *return_counts* + + * **Description**: If *return_counts* is equal to *true*, the layer outputs the counts for each unique element. Otherwise, it does not. + * **Range of values**: *true* or *false* + * **Type**: `string` + * **Required**: *yes* + +**Input**: + +* **1**: 1-D tensor. Input tensor. Required. + +**Outputs**: + +* **1**: 1-D tensor. Tensor of all unique elements from the input tensor. As a number of unique elements can be less than a size of the input, the end of this tensor is marked with the latest unique element. Required. +* **2**: 1-D tensor. Tensor of indices of unique elements of the first output that can be used to reconstruct the input. The size of this tensor is equal to the input size. It outputs in the second output port. Optional. +* **3**: 1-D tensor. Tensor of counts of occurrences of each unique element in the input. It has the same size as the output with unique elements. The end of this tensor is marked with zero. It outputs in the second output port if return_inverse is *false*, otherwise, it outputs in the third output port. Optional. + +**Example** +```xml + + + + + 20 + + + + + 20 + + + 20 + + + +``` + +* * * + + + +## Unsqueeze Layer + +**Name**: *Unsqueeze* + +**Category**: *Layer* + +**Short description**: *Unsqueeze* adds dimensions of size 1 to the first input tensor. The second input value specifies a list of dimensions that will be inserted. Indices specify dimensions in the output tensor. + +**Parameters**: *Unsqueeze* layer doesn't have parameters. + +**Inputs**: + +* **1**: Multidimensional input blob. Required. + +* **2**: OD or 1D tensor with dimensions indices to be set to 1. Values could be negative. Indices could be integer or float values. 
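+
+As an informal illustration of these semantics, the following NumPy sketch reproduces *Example 1* below; the concrete dimension indices `[0, 3]` are an assumption chosen to match that example:
+```python
+import numpy as np
+
+data = np.zeros((2, 3))        # first input blob with shape [2, 3]
+unsqueeze_dims = [0, 3]        # assumed values of the second input (output dimension indices)
+
+out = data
+for axis in sorted(unsqueeze_dims):   # non-negative indices, applied in ascending order
+    out = np.expand_dims(out, axis)
+
+print(out.shape)               # (1, 2, 3, 1)
+```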
+ +**Example** + +*Example 1:* +```xml + + + + 2 + 3 + + + + + 2 + + + + + 1 + 2 + 3 + 1 + + + +``` + +*Example 2: (unsqueeze 0D tensor (constant) to 1D tensor)* +```xml + + + + + + + + 1 + + + + + 1 + + + +``` + +* * * + +## Unique Layer +Back to top + +**Name**: *Unique* + +**Category**: *Layer* + +**Short description**: *Unique* finds unique elements in 1-D tensor. + +**Detailed description**: [Reference](https://pytorch.org/docs/stable/torch.html?highlight=unique#torch.unique) + +**Parameters**: *Unique* layer parameters should be specified in the `data` node, which is a child of the `layer` node. + +* **Parameter name**: *sorted* + + * **Description**: If *sorted* is equal to *true*, the unique elements in the output are sorted in ascending order. Otherwise, all of the unique elements are sorted in the same order as they occur in the input. + * **Range of values**: *true* or *false* + * **Type**: `string` + * **Required**: *yes* + +* **Parameter name**: *return_inverse* + + * **Description**: If *return_inverse* is equal to *true*, the layer outputs the indices. Otherwise, it does not. + * **Range of values**: *true* or *false* + * **Type**: `string` + * **Required**: *yes* + +* **Parameter name**: *return_counts* + + * **Description**: If *return_counts* is equal to *true*, the layer outputs the counts for each unique element. Otherwise, it does not. + * **Range of values**: *true* or *false* + * **Type**: `string` + * **Required**: *yes* + +**Input**: + +* **1**: 1-D tensor. Input tensor. Required. + +**Outputs**: + +* **1**: 1-D tensor. Tensor of all unique elements from the input tensor. As a number of unique elements can be less than a size of the input, the end of this tensor is marked with the latest unique element. Required. +* **2**: 1-D tensor. Tensor of indices of unique elements of the first output that can be used to reconstruct the input. The size of this tensor is equal to the input size. It outputs in the second output port. Optional. +* **3**: 1-D tensor. Tensor of counts of occurrences of each unique element in the input. It has the same size as the output with unique elements. The end of this tensor is marked with zero. It outputs in the second output port if return_inverse is *false*, otherwise, it outputs in the third output port. Optional. + +**Example** +```xml + + + + + 20 + + + + + 20 + + + 20 + + + +``` + +* * * \ No newline at end of file diff --git a/docs/MO_DG/prepare_model/convert_model/kaldi_specific/Aspire_Tdnn_Model.md b/docs/MO_DG/prepare_model/convert_model/kaldi_specific/Aspire_Tdnn_Model.md new file mode 100644 index 00000000000000..b4e0ea06651ccf --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/kaldi_specific/Aspire_Tdnn_Model.md @@ -0,0 +1,112 @@ +# Convert Kaldi* ASpIRE Chain Time Delay Neural Network (TDNN) Model to the Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_kaldi_specific_Aspire_Tdnn_Model} + +You can [download a pre-trained model](https://kaldi-asr.org/models/1/0001_aspire_chain_model.tar.gz) +for the ASpIRE Chain Time Delay Neural Network (TDNN) from the Kaldi* project official web-site. + +## Convert ASpIRE Chain TDNN Model to IR + +To generate the Intermediate Representation (IR) of the model, run the Model Optimizer with the following parameters: +```sh +python3 ./mo_kaldi.py --input_model exp/chain/tdnn_7b/final.mdl --output output +``` + +The IR will have two inputs: `input` for data and `ivector` for ivectors. 
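+
+To verify the input names before preparing the data, you can, for example, load the generated IR with the Inference Engine Python API (a minimal sketch; the `final.xml`/`final.bin` file names are an assumption based on the conversion command above):
+```python
+from openvino.inference_engine import IENetwork
+
+# Point these paths to the IR produced by the Model Optimizer.
+net = IENetwork(model='final.xml', weights='final.bin')
+
+for name in net.inputs:
+    print(name)   # expected input names: 'input' and 'ivector'
+```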
+ +## Example: Run ASpIRE Chain TDNN Model with the Speech Recognition Sample + +These instructions show how to run the converted model with the [Speech Recognition sample](../../../../../inference-engine/samples/speech_sample/README.md). +In this example, the input data contains one utterance from one speaker. + +To follow the steps described below, you must first do the following: +1. Download a [Kaldi repository](https://github.com/kaldi-asr/kaldi). +2. Build it using instructions in `README.md` in the repository. +3. Download the [model archive](https://kaldi-asr.org/models/1/0001_aspire_chain_model.tar.gz) from Kaldi website. +4. Extract the downloaded model archive to the `egs/aspire/s5` folder of the Kaldi repository. + + +To run the ASpIRE Chain TDNN Model with Speech Recognition sample: + +1. Prepare the model for decoding. Refer to the `README.txt` file from the downloaded model archive for instructions. +2. Convert data and ivectors to `.ark` format. Refer to the corresponding sections below for instructions. + +### Prepare Data + +If you have a `.wav` data file, you can convert it to `.ark` format using the following command: +```sh +/src/featbin/compute-mfcc-feats --config=/egs/aspire/s5/conf/mfcc_hires.conf scp:./wav.scp ark,scp:feats.ark,feats.scp +``` +Add the `feats.ark` absolute path to `feats.scp` to avoid errors in later commands. + +### Prepare Ivectors + +To prepare ivectors for the Speech Recognition sample, do the following: + +1. Copy the `feats.scp` file to the `egs/aspire/s5/` directory of the built Kaldi repository and navigate there: +```sh +cp feats.scp /egs/aspire/s5/ +cd /egs/aspire/s5/ +``` + +2. Extract ivectors from the data: +```sh +./steps/online/nnet2/extract_ivectors_online.sh --nj 1 --ivector_period exp/tdnn_7b_chain_online/ivector_extractor +``` +To simplify the preparation of ivectors for the Speech Recognition sample, +specify the maximum number of frames in utterances as a parameter for `--ivector_period` +to get only one ivector per utterance. + +To get the maximum number of frames in utterances, you can use the following command line: +```sh +../../../src/featbin/feat-to-len scp:feats.scp ark,t: | cut -d' ' -f 2 - | sort -rn | head -1 +``` +As a result, in ``, you will find the `ivector_online.1.ark` file. + +3. Go to the ``: +```sh +cd +``` + +4. Convert the `ivector_online.1.ark` file to text format using the `copy-feats` tool. Run the following command: +```sh +/src/featbin/copy-feats --binary=False ark:ivector_online.1.ark ark,t:ivector_online.1.ark.txt +``` + +5. For the Speech Recognition sample, the `.ark` file must contain an ivector +for each frame. You must copy the ivector `frame_count` times. +To do this, you can run the following script in the Python* command prompt: +```python +import subprocess + +subprocess.run(["/src/featbin/feat-to-len", "scp:/egs/aspire/s5/feats.scp", "ark,t:feats_length.txt"]) + +f = open("ivector_online.1.ark.txt", "r") +g = open("ivector_online_ie.ark.txt", "w") +length_file = open("feats_length.txt", "r") +for line in f: + if "[" not in line: + for i in range(frame_count): + line = line.replace("]", " ") + g.write(line) + else: + g.write(line) + frame_count = int(length_file.read().split(" ")[1]) +g.write("]") +f.close() +g.close() +length_file.close() +``` + +6. 
Create an `.ark` file from `.txt`: +```sh +/src/featbin/copy-feats --binary=True ark,t:ivector_online_ie.ark.txt ark:ivector_online_ie.ark +``` + +### Run the Speech Recognition Sample + +Run the Speech Recognition sample with the created ivector `.ark` file as follows: +```sh +speech_sample -i feats.ark,ivector_online_ie.ark -m final.xml -d CPU -o prediction.ark -cw_l 17 -cw_r 12 +``` + +Results can be decoded as described in "Use of Sample in Kaldi* Speech Recognition Pipeline" chapter +in [the Speech Recognition Sample description](../../../../../inference-engine/samples/speech_sample/README.md). diff --git a/docs/MO_DG/prepare_model/convert_model/mxnet_specific/Convert_Style_Transfer_From_MXNet.md b/docs/MO_DG/prepare_model/convert_model/mxnet_specific/Convert_Style_Transfer_From_MXNet.md new file mode 100644 index 00000000000000..f6a9f189750598 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/mxnet_specific/Convert_Style_Transfer_From_MXNet.md @@ -0,0 +1,114 @@ +# Converting a Style Transfer Model from MXNet* {#openvino_docs_MO_DG_prepare_model_convert_model_mxnet_specific_Convert_Style_Transfer_From_MXNet} + +The tutorial explains how to generate a model for style transfer using the public MXNet\* neural style transfer sample. +To use the style transfer sample from OpenVINO™, follow the steps below as no public pre-trained style transfer model is provided with the OpenVINO toolkit. + +#### 1. Download or clone the repository with an MXNet neural style transfer sample: [Zhaw's Neural Style Transfer repository](https://github.com/zhaw/neural_style). + +#### 2. Prepare the environment required to work with the cloned repository: +1. Install packages dependency:
+```sh +sudo apt-get install python-tk +``` + +2. Install Python\* requirements: +```sh +pip3 install --user mxnet +pip3 install --user matplotlib +pip3 install --user scikit-image +``` + +#### 3. Download the pre-trained [VGG19 model](https://github.com/dmlc/web-data/raw/master/mxnet/neural-style/model/vgg19.params) and save it to the root directory of the cloned repository because the sample expects the model `vgg19.params` file to be in that directory.
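+
+If you prefer to script the download, one possible way is the Python snippet below; it assumes you run it from the root directory of the cloned repository:
+```python
+import urllib.request
+
+# URL of the pre-trained VGG19 weights referenced above.
+url = 'https://github.com/dmlc/web-data/raw/master/mxnet/neural-style/model/vgg19.params'
+urllib.request.urlretrieve(url, 'vgg19.params')   # saves the file where the sample expects it
+```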
+
+#### 4. Modify the source code files of the style transfer sample in the cloned repository.
+
+1. Go to the `fast_mrf_cnn` subdirectory:
+```sh
+cd ./fast_mrf_cnn
+```
+
+2. Open the `symbol.py` file and modify the `decoder_symbol()` function. Replace the lines:
+```py
+def decoder_symbol():
+    data = mx.sym.Variable('data')
+    data = mx.sym.Convolution(data=data, num_filter=256, kernel=(3,3), pad=(1,1), stride=(1, 1), name='deco_conv1')
+```
+with the following code:
+```py +def decoder_symbol_with_vgg(vgg_symbol): + data = mx.sym.Convolution(data=vgg_symbol, num_filter=256, kernel=(3,3), pad=(1,1), stride=(1, 1), name='deco_conv1') +``` + +3. Save and close the `symbol.py` file. + +4. Open and edit the `make_image.py` file: +Modify the `__init__()` function in the `Maker` class. Replace:
+```py +decoder = symbol.decoder_symbol() +``` +with the following code:
+```py +decoder = symbol.decoder_symbol_with_vgg(vgg_symbol) +``` + +5. To join the pre-trained weights with the decoder weights, make the following changes: +After the code lines for loading the decoder weights:
+```py +args = mx.nd.load('%s_decoder_args.nd'%model_prefix) +auxs = mx.nd.load('%s_decoder_auxs.nd'%model_prefix) +``` +add the following line:
+```py +arg_dict.update(args) +``` + +6. Use `arg_dict` instead of `args` as a parameter of the `decoder.bind()` function. Replace the line:
+```py +self.deco_executor = decoder.bind(ctx=mx.cpu(), args=args, aux_states=auxs) +``` +with the following:
+```py +self.deco_executor = decoder.bind(ctx=mx.cpu(), args=arg_dict, aux_states=auxs) +``` +7. Replace all `mx.gpu` with `mx.cpu` in the `decoder.bind()` function. +8. To save the result model as a `.json` file, add the following code to the end of the `generate()` function in the `Maker` class:
+```py +self.vgg_executor._symbol.save('{}-symbol.json'.format('vgg19')) +self.deco_executor._symbol.save('{}-symbol.json'.format('nst_vgg19')) +``` +9. Save and close the `make_image.py` file. + +#### 5. Run the sample with a decoder model according to the instructions from the `README.md` file in the cloned repository. +For example, to run the sample with the pre-trained decoder weights from the `models` folder and output shape, use the following code:
+```py
+import make_image
+maker = make_image.Maker('models/13', (1024, 768))
+maker.generate('output.jpg', '../images/tubingen.jpg')
+```
+The `'models/13'` string consists of two parts:
+* `'models/'` - the path to the folder that contains the `.nd` files with pre-trained style weights
+* `'13'` - the decoder prefix; the repository contains a default decoder, `13_decoder`
+
+You can choose any style from the [collection of pre-trained weights](https://pan.baidu.com/s/1skMHqYp). The `generate()` function generates the `nst_vgg19-symbol.json` and `vgg19-symbol.json` files for the specified shape. In the code above, the shape is [1024 x 768] for a 4:3 aspect ratio; you can specify another shape, for example, [224, 224] for a square image.
+
+#### 6. Run the Model Optimizer to generate an Intermediate Representation (IR):
+
+1. Create a new directory. For example:
+```sh +mkdir nst_model +``` +2. Copy the initial and generated model files to the created directory. For example, to copy the pre-trained decoder weights from the `models` folder to the `nst_model` directory, run the following commands:
+```sh +cp nst_vgg19-symbol.json nst_model +cp vgg19-symbol.json nst_model +cp ../vgg19.params nst_model/vgg19-0000.params +cp models/13_decoder_args.nd nst_model +cp models/13_decoder_auxs.nd nst_model +``` +> **NOTE**: Make sure that all the `.params` and `.json` files are in the same directory as the `.nd` files. Otherwise, the conversion process fails. + +3. Run the Model Optimizer for MXNet. Use the `--nd_prefix_name` option to specify the decoder prefix and `--input_shape` to specify input shapes in [N,C,W,H] order. For example:
+```sh +python3 mo.py --input_symbol /nst_vgg19-symbol.json --framework mxnet --output_dir --input_shape [1,3,224,224] --nd_prefix_name 13_decoder --pretrained_model /vgg19-0000.params +``` +4. The IR is generated (`.bin`, `.xml` and `.mapping` files) in the specified output directory and ready to be consumed by the Inference Engine. diff --git a/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_DLRM.md b/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_DLRM.md new file mode 100644 index 00000000000000..2d12ee0a1e7b02 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_DLRM.md @@ -0,0 +1,29 @@ +# Convert ONNX* DLRM to the Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_DLRM} + +> **NOTE**: These instructions are currently deprecated. Since OpenVINO™ 2020.4 version, no specific steps are needed to convert ONNX\* DLRM models. For general instructions on converting ONNX models, please refer to [Converting a ONNX* Model](../Convert_Model_From_ONNX.md) topic. + +These instructions are applicable only to the DLRM converted to the ONNX* file format from the [facebookresearch/dlrm model](https://github.com/facebookresearch/dlrm). + +**Step 1**. Save trained Pytorch* model to ONNX* format. If you training model using [script provided in model repository](https://github.com/facebookresearch/dlrm/blob/master/dlrm_s_pytorch.py) just add `--save-onnx` flag to the command line parameters and you'll get `dlrm_s_pytorch.onnx` file containing model serialized in ONNX* format. + +**Step 2**. To generate the Intermediate Representation (IR) of the model, change your current working directory to the Model Optimizer installation directory and run the Model Optimizer with the following parameters: +```sh +python3 ./mo.py --input_model dlrm_s_pytorch.onnx +``` + +Note that Pytorch model uses operation `torch.nn.EmbeddingBag`. This operation converts to onnx as custom `ATen` layer and not directly supported by OpenVINO*, but it is possible to convert this operation to: +* `Gather` if each "bag" consists of exactly one index. In this case `offsets` input becomes obsolete and not needed. They will be removed during conversion. +* `ExperimentalSparseWeightedSum` if "bags" contain not just one index. In this case Model Optimizer will print warning that pre-process of offsets is needed, because `ExperimentalSparseWeightedSum` and `torch.nn.EmbeddingBag` have different format of inputs. +For example if you have `indices` input of shape [indices_shape] and `offsets` input of shape [num_bags] you need to get offsets of shape [indices_shape, 2]. To do that you may use the following code snippet: +```python +import numpy as np + +new_offsets = np.zeros((indices.shape[-1], 2), dtype=np.int32) +new_offsets[:, 1] = np.arange(indices.shape[-1]) +bag_index = 0 +for i in range(offsets.shape[-1] - 1): + new_offsets[offsets[i]:offsets[i + 1], 0] = bag_index + bag_index += 1 +new_offsets[offsets[-1]:, 0] = bag_index +``` +If you have more than one `torch.nn.EmbeddingBag` operation you'll need to do that for every offset input. If your offsets have same shape they will be merged into one input of shape [num_embedding_bags, indices_shape, 2]. 
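+
+As an illustration, the sketch below wraps the snippet above in a helper function and applies it to small hypothetical `indices` and `offsets` arrays (the function name and sample values are for demonstration only):
+```python
+import numpy as np
+
+def embedding_bag_offsets_to_2d(indices, offsets):
+    # Build the [indices_shape, 2] offsets tensor described above from 1-D EmbeddingBag offsets.
+    new_offsets = np.zeros((indices.shape[-1], 2), dtype=np.int32)
+    new_offsets[:, 1] = np.arange(indices.shape[-1])
+    bag_index = 0
+    for i in range(offsets.shape[-1] - 1):
+        new_offsets[offsets[i]:offsets[i + 1], 0] = bag_index
+        bag_index += 1
+    new_offsets[offsets[-1]:, 0] = bag_index
+    return new_offsets
+
+# Six indices grouped into three "bags" that start at positions 0, 2 and 5.
+indices = np.array([4, 9, 1, 7, 3, 8], dtype=np.int32)
+offsets = np.array([0, 2, 5], dtype=np.int32)
+print(embedding_bag_offsets_to_2d(indices, offsets))
+# [[0 0]
+#  [0 1]
+#  [1 2]
+#  [1 3]
+#  [1 4]
+#  [2 5]]
+```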
diff --git a/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_GPT2.md b/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_GPT2.md new file mode 100644 index 00000000000000..c2117ee516877b --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_GPT2.md @@ -0,0 +1,17 @@ +# Convert ONNX* GPT-2 Model to the Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_GPT2} + +[Public pre-trained GPT-2 model](https://github.com/onnx/models/tree/master/text/machine_comprehension/gpt-2) is a large +transformer-based language model with a simple objective: predict the next word, given all of the previous words within some text. + +## Download the Pre-Trained Base GPT-2 Model + +To download the model, click **Download** on [https://github.com/onnx/models/blob/master/text/machine_comprehension/gpt-2/model/gpt2-10.onnx](https://github.com/onnx/models/blob/master/text/machine_comprehension/gpt-2/model/gpt2-10.onnx). + +To download the model and sample test data, click **Download** on [https://github.com/onnx/models/blob/master/text/machine_comprehension/gpt-2/model/gpt2-10.tar.gz](https://github.com/onnx/models/blob/master/text/machine_comprehension/gpt-2/model/gpt2-10.tar.gz). + +## Convert ONNX* GPT-2 Model to IR + +To generate the Intermediate Representation (IR) of the model GPT-2, run the Model Optimizer with the following parameters: +```sh +python3 mo.py --input_model gpt2-10.onnx --input_shape [X,Y,Z] +``` diff --git a/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_Mask_RCNN.md b/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_Mask_RCNN.md new file mode 100644 index 00000000000000..7ef839d9621e80 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/onnx_specific/Convert_Mask_RCNN.md @@ -0,0 +1,19 @@ +# Convert ONNX* Mask R-CNN Model to the Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_onnx_specific_Convert_Mask_RCNN} + +These instructions are applicable only to the [Mask R-CNN model](https://onnxzoo.blob.core.windows.net/models/opset_10/mask_rcnn/mask_rcnn_R_50_FPN_1x.onnx) converted to the ONNX* file format from the [facebookresearch/maskrcnn-benchmark model](https://github.com/facebookresearch/maskrcnn-benchmark). + +**Step 1**. Download the [pre-trained model file](https://onnxzoo.blob.core.windows.net/models/opset_10/mask_rcnn/mask_rcnn_R_50_FPN_1x.onnx). + +**Step 2**. To generate the Intermediate Representation (IR) of the model, change your current working directory to the Model Optimizer installation directory and run the Model Optimizer with the following parameters: +```sh +python3 ./mo_onnx.py +--input_model mask_rcnn_R_50_FPN_1x.onnx \ +--input "0:2" \ +--input_shape [1,3,800,800] \ +--mean_values [102.9801,115.9465,122.7717] \ +--transformations_config ./extensions/front/onnx/mask_rcnn.json +``` + +Note that the height and width specified with the `input_shape` command line parameter could be different. Refer to the [documentation](https://github.com/onnx/models/tree/master/vision/object_detection_segmentation/mask-rcnn) for more information about supported input image dimensions and required pre- and post-processing steps. + +**Step 3**. Interpret the outputs. The generated IR file has several outputs: masks, class indices, probabilities and box coordinates. The first one is a layer with the name "6849/sink_port_0". The rest three are outputs from the "DetectionOutput" layer. 
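+
+If it is unclear which IR output corresponds to which piece of data, one option is to list the output names and shapes with the Inference Engine Python API (a minimal sketch; the IR file names below are assumed to follow the input model name):
+```python
+from openvino.inference_engine import IENetwork
+
+# Point these paths to the IR generated in Step 2.
+net = IENetwork(model='mask_rcnn_R_50_FPN_1x.xml', weights='mask_rcnn_R_50_FPN_1x.bin')
+
+for name, data in net.outputs.items():
+    print(name, data.shape)
+```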
diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_BERT_From_Tensorflow.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_BERT_From_Tensorflow.md new file mode 100644 index 00000000000000..f9f815ca99cb8d --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_BERT_From_Tensorflow.md @@ -0,0 +1,121 @@ +# Convert TensorFlow* BERT Model to the Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_BERT_From_Tensorflow} + +Pre-trained models for BERT (Bidirectional Encoder Representations from Transformers) are +[publicly available](https://github.com/google-research/bert). + +## Supported Models + +Currently, the following models from the [pre-trained BERT model list](https://github.com/google-research/bert#pre-trained-models) are supported: + +* `BERT-Base, Cased` +* `BERT-Base, Uncased` +* `BERT-Base, Multilingual Cased` +* `BERT-Base, Multilingual Uncased` +* `BERT-Base, Chinese` +* `BERT-Large, Cased` +* `BERT-Large, Uncased` + +## Download the Pre-Trained BERT Model + +Download and unzip an archive with the [BERT-Base, Multilingual Uncased Model](https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip). + +After the archive is unzipped, the directory `uncased_L-12_H-768_A-12` is created and contains the following files: +* `bert_config.json` +* `bert_model.ckpt.data-00000-of-00001` +* `bert_model.ckpt.index` +* `bert_model.ckpt.meta` +* `vocab.txt` + +Pre-trained model meta-graph files are `bert_model.ckpt.*`. + +## Convert TensorFlow BERT Model to IR + +To generate the BERT Intermediate Representation (IR) of the model, run the Model Optimizer with the following parameters: +```sh +python3 ./mo_tf.py +--input_meta_graph uncased_L-12_H-768_A-12/bert_model.ckpt.meta \ +--output bert/pooler/dense/Tanh \ +--disable_nhwc_to_nchw \ +--input Placeholder{i32},Placeholder_1{i32},Placeholder_2{i32} +``` + +Pre-trained models are not suitable for batch reshaping out-of-the-box because of multiple hardcoded shapes in the model. + +# Convert Reshape-able TensorFlow* BERT Model to the Intermediate Representation + +Follow these steps to make pre-trained TensorFlow BERT model reshape-able over batch dimension: +1. Download pre-trained BERT model you would like to use from the Supported Models list +2. Clone google-research/bert git repository: +```sh +https://github.com/google-research/bert.git +``` +3. Go to the root directory of the cloned repository:
+```sh
+cd bert
+```
+4. (Optional) Check out the commit that the conversion was tested on:
+```sh +git checkout eedf5716c +``` +5. Download script to load GLUE data: + * For UNIX*-like systems, run the following command: +```sh +wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py +``` + * For Windows* systems:
+ Download the [Python script](https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py) to the current working directory. +6. Download GLUE data by running: +```sh +python3 download_glue_data.py --tasks MRPC +``` +7. Open the file `modeling.py` in the text editor and delete lines 923-924. They should look like this: +```python + if not non_static_indexes: + return shape +``` +8. Open the file `run_classifier.py` and insert the following code after the line 645: +```python + import os, sys + from tensorflow.python.framework import graph_io + with tf.Session(graph=tf.get_default_graph()) as sess: + (assignment_map, initialized_variable_names) = \ + modeling.get_assignment_map_from_checkpoint(tf.trainable_variables(), init_checkpoint) + tf.train.init_from_checkpoint(init_checkpoint, assignment_map) + sess.run(tf.global_variables_initializer()) + frozen = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ["bert/pooler/dense/Tanh"]) + graph_io.write_graph(frozen, './', 'inference_graph.pb', as_text=False) + print('BERT frozen model path {}'.format(os.path.join(os.path.dirname(__file__), 'inference_graph.pb'))) + sys.exit(0) +``` +Lines before the inserted code should look like this: +```python + (total_loss, per_example_loss, logits, probabilities) = create_model( + bert_config, is_training, input_ids, input_mask, segment_ids, label_ids, + num_labels, use_one_hot_embeddings) +``` +9. Set environment variables `BERT_BASE_DIR`, `BERT_REPO_DIR` and run the script `run_classifier.py` to create `inference_graph.pb` file in the root of the cloned BERT repository. +```sh +export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12 +export BERT_REPO_DIR=/current/working/directory + +python3 run_classifier.py \ + --task_name=MRPC \ + --do_eval=true \ + --data_dir=$BERT_REPO_DIR/glue_data/MRPC \ + --vocab_file=$BERT_BASE_DIR/vocab.txt \ + --bert_config_file=$BERT_BASE_DIR/bert_config.json \ + --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \ + --output_dir=./ +``` + +Run the Model Optimizer with the following command line parameters to generate reshape-able BERT Intermediate Representation (IR): +```sh +python3 ./mo_tf.py +--input_model inference_graph.pb +--input "IteratorGetNext:0{i32}[1 128],IteratorGetNext:1{i32}[1 128],IteratorGetNext:4{i32}[1 128]" +--disable_nhwc_to_nchw +--keep_shape_ops +``` +For other applicable parameters, refer to [Convert Model from TensorFlow](../Convert_Model_From_TensorFlow.md). + +For more information about reshape abilities, refer to [Using Shape Inference](../../../../IE_DG/ShapeInference.md). diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_CRNN_From_Tensorflow.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_CRNN_From_Tensorflow.md new file mode 100644 index 00000000000000..906ca8c1e4d3cc --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_CRNN_From_Tensorflow.md @@ -0,0 +1,55 @@ +# Convert CRNN* Models to the Intermediate Representation (IR) {#openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_CRNN_From_Tensorflow} + +This tutorial explains how to convert a CRNN model to Intermediate Representation (IR). + +On GitHub*, you can find several public versions of TensorFlow\* CRNN model implementation. This tutorial explains how to convert the model from +the [https://github.com/MaybeShewill-CV/CRNN_Tensorflow](https://github.com/MaybeShewill-CV/CRNN_Tensorflow) repository to IR. 
If you +have another implementation of CRNN model, you can convert it to IR in similar way: you need to get inference graph and run the Model Optimizer on it. + +**To convert this model to the IR:** + +**Step 1.** Clone this GitHub repository and checkout the commit: + 1. Clone reposirory: +```sh + git clone https://github.com/MaybeShewill-CV/CRNN_Tensorflow.git +``` + 2. Checkout necessary commit: +```sh +git checkout 64f1f1867bffaacfeacc7a80eebf5834a5726122 +``` + +**Step 2.** Train the model using framework or use the pretrained checkpoint provided in this repository. + +**Step 3.** Create an inference graph: + 1. Go to the `CRNN_Tensorflow` directory with the cloned repository: +```sh +cd path/to/CRNN_Tensorflow +``` + 2. Add `CRNN_Tensorflow` folder to `PYTHONPATH`. + * For Linux\* OS: +```sh +export PYTHONPATH="${PYTHONPATH}:/path/to/CRNN_Tensorflow/" +``` + * For Windows\* OS add `/path/to/CRNN_Tensorflow/` to the `PYTHONPATH` environment variable in settings. + 3. Open the `tools/demo_shadownet.py` script. After `saver.restore(sess=sess, save_path=weights_path)` line, add the following code: +```python +from tensorflow.python.framework import graph_io +frozen = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ['shadow/LSTMLayers/transpose_time_major']) +graph_io.write_graph(frozen, '.', 'frozen_graph.pb', as_text=False) +``` + 4. Run the demo with the following command: +```sh +python tools/demo_shadownet.py --image_path data/test_images/test_01.jpg --weights_path model/shadownet/shadownet_2017-10-17-11-47-46.ckpt-199999 +``` + If you want to use your checkpoint, replace the path in the `--weights_path` parameter with a path to your checkpoint. + 5. In the `CRNN_Tensorflow` directory, you will find the inference CRNN graph `frozen_graph.pb`. You can use this graph with the OpenVINO™ toolkit + to convert the model into IR and run inference. + +**Step 4.** Convert the model into IR: +```sh +python3 path/to/model_optimizer/mo_tf.py --input_model path/to/your/CRNN_Tensorflow/frozen_graph.pb +``` + + + + diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_DeepSpeech_From_Tensorflow.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_DeepSpeech_From_Tensorflow.md new file mode 100644 index 00000000000000..4b8bd1e40484f8 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_DeepSpeech_From_Tensorflow.md @@ -0,0 +1,72 @@ +# Convert TensorFlow* DeepSpeech Model to the Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_DeepSpeech_From_Tensorflow} + +[DeepSpeech project](https://github.com/mozilla/DeepSpeech) provides an engine to train speech-to-text models. + +## Download the Pre-Trained DeepSpeech Model + +[Pre-trained English speech-to-text model](https://github.com/mozilla/DeepSpeech#getting-the-pre-trained-model) +is publicly available. To download the model, please follow the instruction below: + +* For UNIX*-like systems, run the following command: +``` +wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz | tar xvfz - +``` +* For Windows* systems: + 1. Download the archive from the DeepSpeech project repository: [https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz](https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz). + 2. Unpack it with a file archiver application. 
+ +After you unpack the archive with the pre-trained model, you will have the new `models` directory with the +following files: +``` +alphabet.txt +lm.binary +output_graph.pb +output_graph.pbmm +output_graph.rounded.pb +output_graph.rounded.pbmm +trie +``` + +Pre-trained frozen model file is `output_graph.pb`. + +![DeepSpeech model view](../../../img/DeepSpeech.png) + +As you can see, the frozen model still has two variables: `previous_state_c` and +`previous_state_h`. It means that the model keeps training those variables at each inference. + +At the first inference of this graph, the variables are initialized by zero tensors. After executing the +`lstm_fused_cell` nodes, cell state and hidden state, which are the results of the `BlockLSTM` execution, +are assigned to these two variables. + +With each inference of the DeepSpeech graph, initial cell state and hidden state data for `BlockLSTM` is taken +from previous inference from variables. Outputs (cell state and hidden state) of `BlockLSTM` are reassigned +to the same variables. + +It helps the model to remember the context of the words that it takes as input. + +## Convert the TensorFlow* DeepSpeech Model to IR + +The Model Optimizer assumes that the output model is for inference only. That is why you should cut those variables off and +resolve keeping cell and hidden states on the application level. + +There are certain limitations for the model conversion: +- Time length (`time_len`) and sequence length (`seq_len`) are equal. +- Original model cannot be reshaped, so you should keep original shapes. + +To generate the DeepSpeech Intermediate Representation (IR), provide the TensorFlow DeepSpeech model to the Model Optimizer with the following parameters: +```sh +python3 ./mo_tf.py +--input_model path_to_model/output_graph.pb \ +--freeze_placeholder_with_value input_lengths->[16] \ +--input input_node,previous_state_h/read,previous_state_c/read \ +--input_shape [1,16,19,26],[1,2048],[1,2048] \ +--output raw_logits,lstm_fused_cell/GatherNd,lstm_fused_cell/GatherNd_1 \ +--disable_nhwc_to_nchw +``` + +Where: +* `--freeze_placeholder_with_value input_lengths->[16]` freezes sequence length +* `--input input_node,previous_state_h/read,previous_state_c/read` and +`--input_shape [1,16,19,26],[1,2048],[1,2048]` replace the variables with a placeholder +* `--output raw_logits,lstm_fused_cell/GatherNd,lstm_fused_cell/GatherNd_1` gets data for the next model +execution. diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_FaceNet_From_Tensorflow.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_FaceNet_From_Tensorflow.md new file mode 100644 index 00000000000000..0c229d5469cdb2 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_FaceNet_From_Tensorflow.md @@ -0,0 +1,28 @@ +# Convert TensorFlow* FaceNet Models to Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_FaceNet_From_Tensorflow} + +[Public pre-trained FaceNet models](https://github.com/davidsandberg/facenet#pre-trained-models) contain both training +and inference part of graph. Switch between this two states is manageable with placeholder value. +Intermediate Representation (IR) models are intended for inference, which means that train part is redundant. + +There are two inputs in this network: boolean `phase_train` which manages state of the graph (train/infer) and +`batch_size` which is a part of batch joining pattern. 
+ + +![FaceNet model view](../../../img/FaceNet.png) + +## Convert TensorFlow FaceNet Model to IR + +To generate FaceNet IR provide TensorFlow FaceNet model to Model Optimizer with parameters: +```sh +python3 ./mo_tf.py +--input_model path_to_model/model_name.pb \ +--freeze_placeholder_with_value "phase_train->False" +``` + +Batch joining pattern transforms to placeholder with model default shape if `--input_shape` or `--batch`/`-b` was not +provided. Otherwise, placeholder shape has custom parameters. + +* `--freeze_placeholder_with_value "phase_train->False"` to switch graph to inference mode +* `--batch`/`-b` is applicable to override original network batch +* `--input_shape` is applicable with or without `--input` +* other options are applicable diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_GNMT_From_Tensorflow.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_GNMT_From_Tensorflow.md new file mode 100644 index 00000000000000..587e2f53db344d --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_GNMT_From_Tensorflow.md @@ -0,0 +1,277 @@ +# Convert GNMT* Model to the Intermediate Representation (IR) {#openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_GNMT_From_Tensorflow} + +This tutorial explains how to convert Google\* Neural Machine Translation (GNMT) model to the Intermediate Representation (IR). + +On GitHub*, you can find several public versions of TensorFlow* GNMT model implementation. This tutorial explains how to convert the GNMT model from the [TensorFlow* Neural Machine Translation (NMT) repository](https://github.com/tensorflow/nmt) to the IR. + +## Create a Patch File + +Before converting the model, you need to create a patch file for the repository. The patch modifies the framework code by adding a special command-line argument to the framework options that enables inference graph dumping: + +1. Go to a writable directory and create a `GNMT_inference.patch` file. +2. Copy the following diff code to the file: +```git +diff --git a/nmt/inference.py b/nmt/inference.py +index 2cbef07..e185490 100644 +--- a/nmt/inference.py ++++ b/nmt/inference.py +@@ -17,9 +17,11 @@ + from __future__ import print_function + + import codecs ++import os + import time + + import tensorflow as tf ++from tensorflow.python.framework import graph_io + + from . import attention_model + from . 
import gnmt_model +@@ -105,6 +107,29 @@ def start_sess_and_load_model(infer_model, ckpt_path): + return sess, loaded_infer_model + + ++def inference_dump_graph(ckpt_path, path_to_dump, hparams, scope=None): ++ model_creator = get_model_creator(hparams) ++ infer_model = model_helper.create_infer_model(model_creator, hparams, scope) ++ sess = tf.Session( ++ graph=infer_model.graph, config=utils.get_config_proto()) ++ with infer_model.graph.as_default(): ++ loaded_infer_model = model_helper.load_model( ++ infer_model.model, ckpt_path, sess, "infer") ++ utils.print_out("Dumping inference graph to {}".format(path_to_dump)) ++ loaded_infer_model.saver.save( ++ sess, ++ os.path.join(path_to_dump + 'inference_GNMT_graph') ++ ) ++ utils.print_out("Dumping done!") ++ ++ output_node_name = 'index_to_string_Lookup' ++ utils.print_out("Freezing GNMT graph with output node {}...".format(output_node_name)) ++ frozen = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ++ [output_node_name]) ++ graph_io.write_graph(frozen, '.', os.path.join(path_to_dump, 'frozen_GNMT_inference_graph.pb'), as_text=False) ++ utils.print_out("Freezing done. Freezed model frozen_GNMT_inference_graph.pb saved to {}".format(path_to_dump)) ++ ++ + def inference(ckpt_path, + inference_input_file, + inference_output_file, +diff --git a/nmt/nmt.py b/nmt/nmt.py +index f5823d8..a733748 100644 +--- a/nmt/nmt.py ++++ b/nmt/nmt.py +@@ -310,6 +310,13 @@ def add_arguments(parser): + parser.add_argument("--num_intra_threads", type=int, default=0, + help="number of intra_op_parallelism_threads") + ++ # Special argument for inference model dumping without inference ++ parser.add_argument("--dump_inference_model", type="bool", nargs="?", ++ const=True, default=False, ++ help="Argument for dump inference graph for specified trained ckpt") ++ ++ parser.add_argument("--path_to_dump", type=str, default="", ++ help="Path to dump inference graph.") + + def create_hparams(flags): + """Create training hparams.""" +@@ -396,6 +403,9 @@ def create_hparams(flags): + language_model=flags.language_model, + num_intra_threads=flags.num_intra_threads, + num_inter_threads=flags.num_inter_threads, ++ ++ dump_inference_model=flags.dump_inference_model, ++ path_to_dump=flags.path_to_dump, + ) + + +@@ -613,7 +623,7 @@ def create_or_load_hparams( + return hparams + + +-def run_main(flags, default_hparams, train_fn, inference_fn, target_session=""): ++def run_main(flags, default_hparams, train_fn, inference_fn, inference_dump, target_session=""): + """Run main.""" + # Job + jobid = flags.jobid +@@ -653,8 +663,26 @@ def run_main(flags, default_hparams, train_fn, inference_fn, target_session=""): + out_dir, default_hparams, flags.hparams_path, + save_hparams=(jobid == 0)) + +- ## Train / Decode +- if flags.inference_input_file: ++ # Dumping inference model ++ if flags.dump_inference_model: ++ # Inference indices ++ hparams.inference_indices = None ++ if flags.inference_list: ++ (hparams.inference_indices) = ( ++ [int(token) for token in flags.inference_list.split(",")]) ++ ++ # Ckpt ++ ckpt = flags.ckpt ++ if not ckpt: ++ ckpt = tf.train.latest_checkpoint(out_dir) ++ ++ # Path to dump graph ++ assert flags.path_to_dump != "", "Please, specify path_to_dump model." 
++ path_to_dump = flags.path_to_dump ++ if not tf.gfile.Exists(path_to_dump): tf.gfile.MakeDirs(path_to_dump) ++ ++ inference_dump(ckpt, path_to_dump, hparams) ++ elif flags.inference_input_file: + # Inference output directory + trans_file = flags.inference_output_file + assert trans_file +@@ -693,7 +721,8 @@ def main(unused_argv): + default_hparams = create_hparams(FLAGS) + train_fn = train.train + inference_fn = inference.inference +- run_main(FLAGS, default_hparams, train_fn, inference_fn) ++ inference_dump = inference.inference_dump_graph ++ run_main(FLAGS, default_hparams, train_fn, inference_fn, inference_dump) + + + if __name__ == "__main__": + +``` +3. Save and close the file. + +## Convert GNMT Model to the IR + +> **NOTE**: Please, use TensorFlow version 1.13 or lower. + +**Step 1**. Clone the GitHub repository and check out the commit: + +1. Clone the NMT reposirory: +```sh +git clone https://github.com/tensorflow/nmt.git +``` +2. Check out the necessary commit: +```sh +git checkout b278487980832417ad8ac701c672b5c3dc7fa553 +``` + +**Step 2**. Get a trained model. You have two options: + +* Train the model with the GNMT `wmt16_gnmt_4_layer.json` or `wmt16_gnmt_8_layer.json` configuration file using the NMT framework. +* Use the pretrained checkpoints provided in the NMT repository. Refer to the [Benchmarks](https://github.com/tensorflow/nmt#benchmarks) section for more information (*checkpoints in this section are outdated and can be incompatible with the current repository version. To avoid confusion, train a model by yourself*). + +This tutorial assumes the use of the trained GNMT model from `wmt16_gnmt_4_layer.json` config, German to English translation. + +**Step 3**. Create an inference graph: + +The OpenVINO™ assumes that a model is used for inference only. Hence, before converting the model into the IR, you need to transform the training graph into the inference graph. +For the GNMT model, the training graph and the inference graph have different decoders: the training graph uses a greedy search decoding algorithm, while the inference graph uses a beam search decoding algorithm. + +1. Apply the `GNMT_inference.patch` patch to the repository. Refer to the Create a Patch File instructions if you do not have it: +```sh + git apply /path/to/patch/GNMT_inference.patch +``` + +2. Run the NMT framework to dump the inference model: + +```sh +python -m nmt.nmt + --src=de + --tgt=en + --ckpt=/path/to/ckpt/translate.ckpt + --hparams_path=/path/to/repository/nmt/nmt/standard_hparams/wmt16_gnmt_4_layer.json + --vocab_prefix=/path/to/vocab/vocab.bpe.32000 + --out_dir="" + --dump_inference_model + --infer_mode beam_search + --path_to_dump /path/to/dump/model/ +``` + +If you use different checkpoints, use the corresponding values for the `src`,`tgt`,`ckpt`,`hparams_path`, and `vocab_prefix` parameters. +Inference checkpoint `inference_GNMT_graph` and frozen inference graph `frozen_GNMT_inference_graph.pb` will appear in the `/path/to/dump/model/` folder. + +To generate `vocab.bpe.32000`, execute the `nmt/scripts/wmt16_en_de.sh` script. If you face an issue of a size mismatch between the checkpoint graph's embedding layer and vocabulary (both src and target), we recommend you to add the following code to the `nmt.py` file to the `extend_hparams` function after the line 508 (after initialization of the `src_vocab_size` and `tgt_vocab_size` variables): +```py +src_vocab_size -= 1 +tgt_vocab_size -= 1 +``` + +**Step 4**. 
Convert the model to the IR: + +```sh +python3 path/to/model_optimizer/mo_tf.py +--input_model /path/to/dump/model/frozen_GNMT_inference_graph.pb +--input "IteratorGetNext:1{i32}[1],IteratorGetNext:0{i32}[1 50],dynamic_seq2seq/hash_table_Lookup_1:0[1]->[2],dynamic_seq2seq/hash_table_Lookup:0[1]->[1]" +--output dynamic_seq2seq/decoder/decoder/GatherTree +--output_dir /path/to/output/IR/ +``` + +Input and output cutting with the `--input` and `--output` options is required since OpenVINO™ does not support `IteratorGetNext` and `LookupTableFindV2` operations. + +Input cutting: + +* `IteratorGetNext` operation iterates over a dataset. It is cut by output ports: port 0 contains data tensor with shape `[batch_size, max_sequence_length]`, port 1 contains `sequence_length` for every batch with shape `[batch_size]`. + +* `LookupTableFindV2` operations (`dynamic_seq2seq/hash_table_Lookup_1` and `dynamic_seq2seq/hash_table_Lookup` nodes in the graph) are cut with constant values). + +Output cutting: + +* `LookupTableFindV2` operation is cut from the output and the `dynamic_seq2seq/decoder/decoder/GatherTree` node is treated as a new exit point. + +For more information about model cutting, refer to [Cutting Off Parts of a Model](../Cutting_Model.md). + +## How to Use GNMT Model + +> **NOTE**: This step assumes you have converted a model to the Intermediate Representation. + +Inputs of the model: +* `IteratorGetNext/placeholder_out_port_0` input with shape `[batch_size, max_sequence_length]` contains `batch_size` decoded input sentences. + Every sentence is decoded the same way as indices of sentence elements in vocabulary and padded with index of `eos` (end of sentence symbol). If the length of the sentence is less than `max_sequence_length`, remaining elements are filled with index of `eos` token. + +* `IteratorGetNext/placeholder_out_port_1` input with shape `[batch_size]` contains sequence lengths for every sentence from the first input. \ + For example, if `max_sequence_length = 50`, `batch_size = 1` and the sentence has only 30 elements, then the input tensor for `IteratorGetNext/placeholder_out_port_1` should be `[30]`. + + +Outputs of the model: + +* `dynamic_seq2seq/decoder/decoder/GatherTree` tensor with shape `[max_sequence_length * 2, batch, beam_size]`, + that contains `beam_size` best translations for every sentence from input (also decoded as indices of words in + vocabulary). \ +> **NOTE**: Shape of this tensor in TensorFlow\* can be different: instead of `max_sequence_length * 2`, it can be any value less than that, because OpenVINO™ does not support dynamic shapes of outputs, while TensorFlow can stop decoding iterations when `eos` symbol is generated.* + +#### How to RUN GNMT IR + +1. With benchmark app: +```sh +python3 benchmark_app.py -m -d CPU +``` + + +2. With Inference Engine Python API: + +> **NOTE**: Before running the example, insert a path to your GNMT `.xml` and `.bin` files into `MODEL_PATH` and `WEIGHTS_PATH`, and fill `input_data_tensor` and `seq_lengths` tensors according to your input data. 
+ +```python +from openvino.inference_engine import IENetwork, IECore + +MODEL_PATH = '/path/to/IR/frozen_GNMT_inference_graph.xml' +WEIGHTS_PATH = '/path/to/IR/frozen_GNMT_inference_graph.bin' + +# Creating network +net = IENetwork( + model=MODEL_PATH, + weights=WEIGHTS_PATH) + +# Creating input data +input_data = {'IteratorGetNext/placeholder_out_port_0': input_data_tensor, + 'IteratorGetNext/placeholder_out_port_1': seq_lengths} + +# Creating plugin and loading extensions +ie = IECore() +ie.add_extension(extension_path="libcpu_extension.so", device_name="CPU") + +# Loading network +exec_net = ie.load_network(network=net, device_name="CPU") + +# Run inference +result_ie = exec_net.infer(input_data) +``` + +For more information about Python API, refer to [Inference Engine Python API Overview](../../../../../inference-engine/ie_bridges/python/docs/api_overview.md). diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_NCF_From_Tensorflow.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_NCF_From_Tensorflow.md new file mode 100644 index 00000000000000..6e03abe921d71a --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_NCF_From_Tensorflow.md @@ -0,0 +1,48 @@ +# Convert Neural Collaborative Filtering Model from TensorFlow* to the Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_NCF_From_Tensorflow} + +This tutorial explains how to convert Neural Collaborative Filtering (NCF) model to Intermediate Representation (IR). + +[Public TensorFlow NCF model](https://github.com/tensorflow/models/tree/master/official/recommendation) does not contain + pretrained weights. To convert this model to the IR: + 1. Use [the instructions](https://github.com/tensorflow/models/tree/master/official/recommendation#train-and-evaluate-model) from this repository to train the model. + 2. Freeze the inference graph you get on previous step in `model_dir` following +the instructions from the Freezing Custom Models in Python* section of +[Converting a TensorFlow* Model](../Convert_Model_From_TensorFlow.md). +Run the following commands: +```python +import tensorflow as tf +from tensorflow.python.framework import graph_io + +sess = tf.Session() +saver = tf.train.import_meta_graph("/path/to/model/model.meta") +saver.restore(sess, tf.train.latest_checkpoint('/path/to/model/')) + +frozen = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, \ + ["rating/BiasAdd"]) +graph_io.write_graph(frozen, './', 'inference_graph.pb', as_text=False) +``` +where `rating/BiasAdd` is an output node. + + 3. Convert the model to the IR.If you look at your frozen model, you can see that +it has one input that is split to four `ResourceGather` layers. + +![NCF model beginning](../../../img/NCF_start.png) + + But as the Model Optimizer does not support such data feeding, you should skip it. Cut +the edges incoming in `ResourceGather`s port 1: +```sh +python3 mo_tf.py --input_model inference_graph.pb \ +--input 1:embedding/embedding_lookup,1:embedding_1/embedding_lookup,\ +1:embedding_2/embedding_lookup,1:embedding_3/embedding_lookup \ +--input_shape [256],[256],[256],[256] +``` +Where 256 is a `batch_size` you choose for your model. 
+ +Alternatively, you can do steps 2 and 3 in one command line: +```sh +python3 mo_tf.py --input_meta_graph /path/to/model/model.meta \ +--input 1:embedding/embedding_lookup,1:embedding_1/embedding_lookup,\ +1:embedding_2/embedding_lookup,1:embedding_3/embedding_lookup \ +--input_shape [256],[256],[256],[256] --output rating/BiasAdd +``` + diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_Object_Detection_API_Models.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_Object_Detection_API_Models.md new file mode 100644 index 00000000000000..0f14e8f2cd99c2 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_Object_Detection_API_Models.md @@ -0,0 +1,1045 @@ +# Converting TensorFlow* Object Detection API Models {#openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_Object_Detection_API_Models} + +> **NOTES**: +> +> * Starting with the 2019 R1 release, the Model Optimizer supports the `--keep_shape_ops` command line parameter that allows you to convert the TensorFlow\* Object Detection API Faster and Mask RCNNs topologies so they can be re-shaped in the Inference Engine using dedicated reshape API. Refer to [Using Shape Inference](../../../../IE_DG/ShapeInference.md) for more information on how to use this feature. It is possible to change the both spatial dimensions of the input image and batch size. +> * Starting with the 2018 R4 release, the Model Optimizer supports the `--input_shape` command line parameter for the TensorFlow\* Object Detection API topologies. Refer to the [Custom Input Shape](#tf_od_custom_input_shape) for more information. +> * To generate IRs for SSD topologies, the Model Optimizer creates a number of `PriorBoxClustered` layers instead of a constant node with prior boxes calculated for the particular input image size. This change allows you to reshape the topology in the Inference Engine using dedicated Inference Engine API. The reshaping is supported for all SSD topologies except FPNs which contain hardcoded shapes for some operations preventing from changing topology input shape. + +## How to Convert a Model + +With 2018 R3 release, the Model Optimizer introduces a new approach to convert models created using the TensorFlow\* Object Detection API. Compared with the previous approach, the new process produces inference results with higher accuracy and does not require modifying any configuration files and providing intricate command line parameters. + +You can download TensorFlow\* Object Detection API models from the Object Detection Model Zoo. + +NOTE: Before converting, make sure you have configured the Model Optimizer. For configuration steps, refer to [Configuring the Model Optimizer](../../Config_Model_Optimizer.md). + +To convert a TensorFlow\* Object Detection API model, go to the `/deployment_tools/model_optimizer` directory and run the `mo_tf.py` script with the following required parameters: + +* `--input_model ` --- File with a pre-trained model (binary or text .pb file after freezing) +* `--transformations_config ` --- A subgraph replacement configuration file with transformations description. For the models downloaded from the TensorFlow\* Object Detection API zoo, you can find the configuration files in the `/deployment_tools/model_optimizer/extensions/front/tf` directory. 
Use: + * `ssd_v2_support.json` --- for frozen SSD topologies from the models zoo version up to 1.13.X inclusively + * `ssd_support_api_v.1.14.json` --- for frozen SSD topologies trained manually using the TensorFlow* Object Detection API version 1.14 up to 1.14.X inclusively + * `ssd_support_api_v.1.15.json` --- for frozen SSD topologies trained manually using the TensorFlow* Object Detection API version 1.15 or higher + * `faster_rcnn_support.json` --- for frozen Faster R-CNN topologies from the models zoo + * `faster_rcnn_support_api_v1.7.json` --- for Faster R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.7.0 up to 1.9.X inclusively + * `faster_rcnn_support_api_v1.10.json` --- for Faster R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.10.0 up to 1.12.X inclusively + * `faster_rcnn_support_api_v1.13.json` --- for Faster R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.13.X + * `faster_rcnn_support_api_v1.14.json` --- for Faster R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.14.0 up to 1.14.X inclusively + * `faster_rcnn_support_api_v1.15.json` --- for Faster R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.15.0 or higher + * `mask_rcnn_support.json` --- for frozen Mask R-CNN topologies from the models zoo + * `mask_rcnn_support_api_v1.7.json` --- for Mask R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.7.0 up to 1.9.X inclusively + * `mask_rcnn_support_api_v1.11.json` --- for Mask R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.11.0 up to 1.12.X inclusively + * `mask_rcnn_support_api_v1.13.json` --- for Mask R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.13.0 up to 1.13.X inclusively + * `mask_rcnn_support_api_v1.14.json` --- for Mask R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.14.0 up to 1.14.X inclusively + * `mask_rcnn_support_api_v1.15.json` --- for Mask R-CNN topologies trained manually using the TensorFlow* Object Detection API version 1.15.0 or higher + * `rfcn_support.json` --- for the frozen RFCN topology from the models zoo frozen with TensorFlow\* version 1.9.0 or lower. + * `rfcn_support_api_v1.10.json` --- for the frozen RFCN topology from the models zoo frozen with TensorFlow\* version 1.10.0 up to 1.12.X inclusively + * `rfcn_support_api_v1.13.json` --- for the frozen RFCN topology from the models zoo frozen with TensorFlow\* version 1.13.X. + * `rfcn_support_api_v1.14.json` --- for the frozen RFCN topology from the models zoo frozen with TensorFlow\* version 1.14.0 or higher. +* `--tensorflow_object_detection_api_pipeline_config ` --- A special configuration file that describes the topology hyper-parameters and structure of the TensorFlow Object Detection API model. For the models downloaded from the TensorFlow\* Object Detection API zoo, the configuration file is named `pipeline.config`. If you plan to train a model yourself, you can find templates for these files in the [models repository](https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs). +* `--input_shape` (optional) --- A custom input image shape. Refer to [Custom Input Shape](#tf_od_custom_input_shape) for more information how the `--input_shape` parameter is handled for the TensorFlow* Object Detection API models. 
+ +> **NOTE:** The color channel order (RGB or BGR) of an input data should match the channel order of the model training dataset. If they are different, perform the `RGB<->BGR` conversion specifying the command-line parameter: `--reverse_input_channels`. Otherwise, inference results may be incorrect. If you convert a TensorFlow\* Object Detection API model to use with the Inference Engine sample applications, you must specify the `--reverse_input_channels` parameter. For more information about the parameter, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../Converting_Model_General.md). + +Additionally to the mandatory parameters listed above you can use optional conversion parameters if needed. A full list of parameters is available in the [Converting a TensorFlow* Model](../Convert_Model_From_TensorFlow.md) topic. + +For example, if you downloaded the [pre-trained SSD InceptionV2 topology](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2018_01_28.tar.gz) and extracted archive to the directory `/tmp/ssd_inception_v2_coco_2018_01_28`, the sample command line to convert the model looks as follows: + +``` +/deployment_tools/model_optimizer/mo_tf.py --input_model=/tmp/ssd_inception_v2_coco_2018_01_28/frozen_inference_graph.pb --transformations_config /deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json --tensorflow_object_detection_api_pipeline_config /tmp/ssd_inception_v2_coco_2018_01_28/pipeline.config --reverse_input_channels +``` + +## Custom Input Shape +Model Optimizer handles command line parameter `--input_shape` for TensorFlow\* Object Detection API models in a special way depending on the image resizer type defined in the `pipeline.config` file. TensorFlow\* Object Detection API generates different `Preprocessor` sub-graph based on the image resizer type. Model Optimizer supports two types of image resizer: +* `fixed_shape_resizer` --- *Stretches* input image to the specific height and width. The `pipeline.config` snippet below shows a `fixed_shape_resizer` sample definition: +``` +image_resizer { + fixed_shape_resizer { + height: 300 + width: 300 + } +} +``` +* `keep_aspect_ratio_resizer` --- Resizes the input image *keeping aspect ratio* to satisfy the minimum and maximum size constraints. The `pipeline.config` snippet below shows a `keep_aspect_ratio_resizer` sample definition: +``` +image_resizer { + keep_aspect_ratio_resizer { + min_dimension: 600 + max_dimension: 1024 + } +} +``` + +### Fixed Shape Resizer Replacement +* If the `--input_shape` command line parameter is not specified, the Model Optimizer generates an input layer with the height and width as defined in the `pipeline.config`. + +* If the `--input_shape [1, H, W, 3]` command line parameter is specified, the Model Optimizer sets the input layer height to `H` and width to `W` and convert the model. However, the conversion may fail because of the following reasons: + * The model is not reshape-able, meaning that it's not possible to change the size of the model input image. For example, SSD FPN models have `Reshape` operations with hard-coded output shapes, but the input size to these `Reshape` instances depends on the input image size. In this case, the Model Optimizer shows an error during the shape inference phase. Run the Model Optimizer with `--log_level DEBUG` to see the inferred layers output shapes to see the mismatch. + * Custom input shape is too small. 
For example, if you specify `--input_shape [1,100,100,3]` to convert an SSD Inception V2 model, one of the convolution or pooling nodes decreases the input tensor spatial dimensions to non-positive values. In this case, the Model Optimizer shows an error message like this: '[ ERROR ] Shape [ 1 -1 -1 256] is not fully defined for output X of "node_name".'
+
+
+### Keep Aspect Ratio Resizer Replacement
+* If the `--input_shape` command line parameter is not specified, the Model Optimizer generates an input layer with both height and width equal to the value of the `min_dimension` parameter in the `keep_aspect_ratio_resizer`.
+
+* If the `--input_shape [1, H, W, 3]` command line parameter is specified, the Model Optimizer scales the specified input image height `H` and width `W` to satisfy the `min_dimension` and `max_dimension` constraints defined in the `keep_aspect_ratio_resizer`. The following function calculates the input layer height and width:
+
+```python
+def calculate_shape_keeping_aspect_ratio(H: int, W: int, min_dimension: int, max_dimension: int):
+    ratio_min = min_dimension / min(H, W)
+    ratio_max = max_dimension / max(H, W)
+    ratio = min(ratio_min, ratio_max)
+    return int(round(H * ratio)), int(round(W * ratio))
+```
+
+Models with `keep_aspect_ratio_resizer` are trained to recognize objects at their real aspect ratio, in contrast with most classification topologies, which are trained to recognize objects stretched vertically and horizontally as well. By default, the Model Optimizer converts topologies with `keep_aspect_ratio_resizer` to consume a square input image. If a non-square image is provided as input, it is stretched without keeping the aspect ratio, which decreases the object detection quality.
+
+> **NOTE**: It is highly recommended to specify the `--input_shape` command line parameter for models with `keep_aspect_ratio_resizer` if the input image dimensions are known in advance.
+
+## Important Notes About Feeding Input Images to the Samples
+
+Inference Engine comes with a number of samples that use Object Detection API models, including:
+
+* [Object Detection for SSD Sample](../../../../../inference-engine/samples/object_detection_sample_ssd/README.md) --- for RFCN, SSD and Faster R-CNNs
+* [Mask R-CNN Sample for TensorFlow* Object Detection API Models](@ref omz_demos_mask_rcnn_demo_README) --- for Mask R-CNNs
+
+There are a number of important notes about feeding input images to the samples:
+
+1. Inference Engine samples stretch the input image to the size of the input layer without preserving aspect ratio. This behavior is usually correct for most topologies (including SSDs), but incorrect for the following Faster R-CNN topologies: Inception ResNet, Inception V2, ResNet50 and ResNet101. Image pre-processing for these topologies keeps the aspect ratio. All Mask R-CNN and R-FCN topologies also require keeping the aspect ratio. The type of pre-processing is defined in the `image_resizer` section of the pipeline configuration file. If keeping the aspect ratio is required, resize the image before passing it to the sample.
+
+2. The TensorFlow\* implementation of image resizing may differ from the one implemented in the sample. Even reading the input image from a compressed format (like `.jpg`) could give different results in the sample and in TensorFlow\*. So, if it is necessary to compare accuracy between TensorFlow\* and the Inference Engine, it is recommended to pass a pre-scaled input image in a non-compressed format (like `.bmp`).
+
+3. 
If you want to infer the model with the Inference Engine samples, convert the model with the `--reverse_input_channels` command line parameter. The samples load images in BGR channel order, while TensorFlow* models are trained on images in RGB order. When `--reverse_input_channels` is specified, the Model Optimizer modifies the weights of the first convolution (or another channel-dependent operation) so that the converted model behaves as if the image were passed in RGB channel order.
+
+
+## Detailed Explanations of Model Conversion Process
+
+This section is intended for users who want to understand in detail how the Model Optimizer converts Object Detection API models. The knowledge given in this section is also useful for users with complex models that the Model Optimizer does not convert out of the box. It is highly recommended to read the [Sub-Graph Replacement in Model Optimizer](../../customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md) chapter first to understand the sub-graph replacement concepts used here.
+
+Implementation of the sub-graph replacers for Object Detection API models is located in the file `/deployment_tools/model_optimizer/extensions/front/tf/ObjectDetectionAPI.py`.
+
+It is also useful to open the model in [TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard) to see the topology structure. The Model Optimizer can create an event file that can then be fed to the TensorBoard* tool. To do so, run the Model Optimizer with two additional command line parameters:
+* `--input_model ` --- Path to the frozen model
+* `--tensorboard_logdir` --- Path to the directory where TensorBoard looks for the event files.
+
+### SSD (Single Shot Multibox Detector) Topologies
+
+The SSD topologies are the simplest ones among the Object Detection API topologies, so they are analyzed first. The sub-graph replacement configuration file `ssd_v2_support.json`, which should be used to convert these models, contains three sub-graph replacements: `ObjectDetectionAPIPreprocessorReplacement`, `ObjectDetectionAPISSDPostprocessorReplacement` and `ObjectDetectionAPIOutputReplacement`. Their implementation is described below.
+
+#### Preprocessor Block
+
+All Object Detection API topologies contain a `Preprocessor` block of nodes (also known as a ["scope"](https://www.tensorflow.org/guide/graph_viz)) that performs two tasks:
+
+* Scales the image to the size required by the topology.
+* Applies mean and scale values if needed.
+
+The Model Optimizer cannot convert the part of the `Preprocessor` block that performs scaling because the TensorFlow implementation uses `while` loops, which the Inference Engine does not support. Another reason is that the Inference Engine samples automatically scale input images to the size of the input layer from the Intermediate Representation (IR). Therefore, it is necessary to cut off the scaling part of the `Preprocessor` block and leave only the operations applying mean and scale values. This task is solved using the Model Optimizer [sub-graph replacer mechanism](../../customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md).
+
+The `Preprocessor` block has two outputs: the tensor with pre-processed image(s) data and a tensor with pre-processed image(s) size(s). While converting the model, the Model Optimizer keeps only the nodes producing the first tensor. The second tensor is a constant whose value can be obtained from the `pipeline.config` file and used in the other replacers.
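+
+Conceptually, the part of the `Preprocessor` block that is kept after this replacement reduces to a cast followed by the optional scaling and the mean value subtraction. A minimal NumPy sketch of this computation is shown below; the `scale` and `mean` arguments are placeholders for the `Preprocessor/mul/x` and `Preprocessor/sub/y` constants of a particular model:
+
+```python
+import numpy as np
+
+def kept_preprocessor_part(image_u8: np.ndarray, scale: float, mean: float) -> np.ndarray:
+    # 'Cast' node: UINT8 -> FP32
+    x = image_u8.astype(np.float32)
+    # 'Preprocessor/mul' node (kept only when scaling is present in the model)
+    x = x * scale
+    # 'Preprocessor/sub' node: mean value subtraction
+    return x - mean
+```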
+ +The implementation of the `Preprocessor` block sub-graph replacer is the following (file `/deployment_tools/model_optimizer/extensions/front/tf/ObjectDetectionAPI.py`): + +```python +class ObjectDetectionAPIPreprocessorReplacement(FrontReplacementFromConfigFileSubGraph): + """ + The class replaces the "Preprocessor" block resizing input image and applying mean/scale values. Only nodes related + to applying mean/scaling values are kept. + """ + replacement_id = 'ObjectDetectionAPIPreprocessorReplacement' + + def run_before(self): + return [Pack, Sub] + + def nodes_to_remove(self, graph: Graph, match: SubgraphMatch): + new_nodes_to_remove = match.matched_nodes_names() + # do not remove nodes that perform input image scaling and mean value subtraction + for node_to_keep in ('Preprocessor/sub', 'Preprocessor/sub/y', 'Preprocessor/mul', 'Preprocessor/mul/x'): + if node_to_keep in new_nodes_to_remove: + new_nodes_to_remove.remove(node_to_keep) + return new_nodes_to_remove + + def generate_sub_graph(self, graph: Graph, match: SubgraphMatch): + argv = graph.graph['cmd_params'] + layout = graph.graph['layout'] + if argv.tensorflow_object_detection_api_pipeline_config is None: + raise Error(missing_param_error) + pipeline_config = PipelineConfig(argv.tensorflow_object_detection_api_pipeline_config) + + sub_node = match.output_node(0)[0] + if not sub_node.has('op') or sub_node.op != 'Sub': + raise Error('The output op of the Preprocessor sub-graph is not of type "Sub". Looks like the topology is ' + 'not created with TensorFlow Object Detection API.') + + mul_node = None + if sub_node.in_node(0).has('op') and sub_node.in_node(0).op == 'Mul': + log.info('There is image scaling node in the Preprocessor block.') + mul_node = sub_node.in_node(0) + + initial_input_node_name = 'image_tensor' + if initial_input_node_name not in graph.nodes(): + raise Error('Input node "{}" of the graph is not found. Do not run the Model Optimizer with ' + '"--input" command line parameter.'.format(initial_input_node_name)) + placeholder_node = Node(graph, initial_input_node_name) + + # set default value of the batch size to 1 if user didn't specify batch size and input shape + batch_dim = get_batch_dim(layout, 4) + if argv.batch is None and placeholder_node.shape[batch_dim] == -1: + placeholder_node.shape[batch_dim] = 1 + if placeholder_node.shape[batch_dim] > 1: + print("[ WARNING ] The batch size more than 1 is supported for SSD topologies only.") + height, width = calculate_placeholder_spatial_shape(graph, match, pipeline_config) + placeholder_node.shape[get_height_dim(layout, 4)] = height + placeholder_node.shape[get_width_dim(layout, 4)] = width + + # save the pre-processed image spatial sizes to be used in the other replacers + graph.graph['preprocessed_image_height'] = placeholder_node.shape[get_height_dim(layout, 4)] + graph.graph['preprocessed_image_width'] = placeholder_node.shape[get_width_dim(layout, 4)] + + to_float_node = placeholder_node.out_node(0) + if not to_float_node.has('op') or to_float_node.op != 'Cast': + raise Error('The output of the node "{}" is not Cast operation. Cannot apply replacer.'.format( + initial_input_node_name)) + + # connect to_float_node directly with node performing scale on mean value subtraction + if mul_node is None: + create_edge(to_float_node, sub_node, 0, 0) + else: + create_edge(to_float_node, mul_node, 0, 1) + + print('The Preprocessor block has been removed. 
Only nodes performing mean value subtraction and scaling (if'
+              ' applicable) are kept.')
+        return {}
+```
+The `run_before` function defines a list of replacers that the current replacer must be run before. In this case, these are `Pack` and `Sub`. The `Sub` operation is not supported by the Inference Engine plugins, so the Model Optimizer replaces it with a combination of the `Eltwise` layer (element-wise sum) and the `ScaleShift` layer. However, the `Preprocessor` replacer expects to see a `Sub` node, so it must be called before the `Sub` node is replaced.
+
+The `nodes_to_remove` function returns the list of nodes that should be removed after the replacement happens. In this case, it removes all nodes matched in the `Preprocessor` scope except the `Sub` and `Mul` nodes performing mean value subtraction and scaling.
+
+The `generate_sub_graph` function performs the following actions:
+
+* Lines 20-24: Reads the `pipeline.config` configuration file to get the model hyper-parameters and other attributes.
+* Lines 25-29: Checks that the output node of the `Preprocessor` scope is of type `Sub`.
+* Lines 31-34: Checks whether the input of the `Sub` node is of type `Mul`. This information is needed to correctly connect the input node of the topology later.
+* Lines 36-50: Finds the topology input (placeholder) node and sets its width and height according to the image resizer defined in the `pipeline.config` file and the `--input_shape` provided by the user. The batch size is set to 1 by default, but it is overridden if you specify a batch size with the command-line option `-b`. Refer to [Custom Input Shape](#tf_od_custom_input_shape) for how the Model Optimizer calculates the input layer height and width.
+* Lines 52-54: Saves the placeholder shape in the `graph` object for the other sub-graph replacements.
+* Lines 56-59: Checks that the node following the placeholder is a 'Cast' operation, which converts the model input data from UINT8 to FP32.
+* Lines 61-65: Creates an edge from the 'Cast' node to the `Mul` node (if present) or the `Sub` node, using the correct input port (0 for `Sub` and 1 for `Mul`).
+* Line 69: The replacer returns a dictionary with a mapping of nodes that is used by other sub-graph replacement functions. In this case it is not needed, so an empty dictionary is returned.
+
+#### Postprocessor Block
+
+A distinct feature of any SSD topology is the part performing non-maximum suppression of the proposed bounding boxes. This part of the topology is implemented with dozens of primitive operations in TensorFlow, while in the Inference Engine it is a single [layer](../../../../ops/opset.md) called `DetectionOutput`. Thus, to convert an SSD model from TensorFlow, the Model Optimizer should replace the entire sub-graph of operations that implements the `DetectionOutput` layer with a single `DetectionOutput` node.
+
+The Inference Engine `DetectionOutput` layer implementation consumes three tensors in the following order:
+
+1. Tensor with locations of bounding boxes
+2. Tensor with confidences for each bounding box
+3. Tensor with prior boxes ("anchors" in TensorFlow terminology)
+
+The Inference Engine `DetectionOutput` layer implementation produces one tensor with seven numbers for each actual detection:
+
+* batch index
+* class label
+* class probability
+* x_1 box coordinate
+* y_1 box coordinate
+* x_2 box coordinate
+* y_2 box coordinate.
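+
+For illustration, the sketch below shows how such an output could be decoded on the application side. It assumes the common `[1, 1, N, 7]` output blob layout with the seven values listed above and a batch index of `-1` marking the end of valid detections; the function is illustrative and is not taken from the Inference Engine samples:
+
+```python
+import numpy as np
+
+def decode_detection_output(blob: np.ndarray, prob_threshold: float = 0.5):
+    """Iterate over a DetectionOutput-style blob of shape [1, 1, N, 7], where each
+    row is [batch_index, class_label, probability, x_1, y_1, x_2, y_2]."""
+    detections = []
+    for image_id, label, prob, x_1, y_1, x_2, y_2 in blob.reshape(-1, 7):
+        if image_id < 0:          # conventionally marks the end of valid detections
+            break
+        if prob >= prob_threshold:
+            detections.append((int(image_id), int(label), float(prob), x_1, y_1, x_2, y_2))
+    return detections
+```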
+ +There are more output tensors in the TensorFlow Object Detection API: "detection_boxes", "detection_classes", "detection_scores" and "num_detections", but the values in them are consistent with the output values of the Inference Engine DetectionOutput layer. + +The sub-graph replacement by points is used in the `ssd_v2_support.json` to match the `Postprocessor` block. The start points are defined the following way: + +* "Postprocessor/Shape" receives tensor with bounding boxes; +* "Postprocessor/scale_logits" receives tensor with confidences(probabilities) for each box; +* "Postprocessor/Tile" receives tensor with prior boxes (anchors); +* "Postprocessor/Reshape_1" is specified only to match the whole `Postprocessor` scope. Not used in the replacement code; +* "Postprocessor/ToFloat" is specified only to match the whole `Postprocessor` scope. Not used in the replacement code. + +There are a number of differences in layout, format and content of in input tensors to `DetectionOutput` layer and what tensors generates TensorFlow, so additional tensors processing before creating `DetectionOutput` layer is required. It is described below. The sub-graph replacement class for the `DetectionOutput` layer is given below: + +```python +class ObjectDetectionAPISSDPostprocessorReplacement(FrontReplacementFromConfigFileSubGraph): + replacement_id = 'ObjectDetectionAPISSDPostprocessorReplacement' + + def run_after(self): + return [ObjectDetectionAPIPreprocessorReplacement] + + def run_before(self): + # the replacer uses node of type "RealDiv" as one of the start points, but Model Optimizer replaces nodes of + # type "RealDiv" with a new ones, so it is necessary to replace the sub-graph before replacing the "RealDiv" + # nodes + return [Div, StandaloneConstEraser] + + def output_edges_match(self, graph: Graph, match: SubgraphMatch, new_sub_graph: dict): + # the DetectionOutput in IE produces single tensor, but in TF it produces two tensors, so create only one output + # edge match + return {match.output_node(0)[0].id: new_sub_graph['detection_output_node'].id} + + def generate_sub_graph(self, graph: Graph, match: SubgraphMatch): + argv = graph.graph['cmd_params'] + if argv.tensorflow_object_detection_api_pipeline_config is None: + raise Error(missing_param_error) + pipeline_config = PipelineConfig(argv.tensorflow_object_detection_api_pipeline_config) + num_classes = _value_or_raise(match, pipeline_config, 'num_classes') + + # reshapes confidences to 4D before applying activation function + expand_dims_op = Reshape(graph, {'dim': int64_array([0, 1, -1, num_classes + 1])}) + # do not convert from NHWC to NCHW this node shape + expand_dims_node = expand_dims_op.create_node([match.input_nodes(1)[0][0].in_node(0)], + dict(name='do_ExpandDims_conf')) + + activation_function = _value_or_raise(match, pipeline_config, 'postprocessing_score_converter') + activation_conf_node = add_activation_function_after_node(graph, expand_dims_node, activation_function) + PermuteAttrs.set_permutation(expand_dims_node, expand_dims_node.out_node(), None) + + # IE DetectionOutput layer consumes flattened tensors + # reshape operation to flatten locations tensor + reshape_loc_op = Reshape(graph, {'dim': int64_array([0, -1])}) + reshape_loc_node = reshape_loc_op.create_node([match.input_nodes(0)[0][0].in_node(0)], + dict(name='do_reshape_loc')) + + # IE DetectionOutput layer consumes flattened tensors + # reshape operation to flatten confidence tensor + reshape_conf_op = Reshape(graph, {'dim': int64_array([0, -1])}) + 
reshape_conf_node = reshape_conf_op.create_node([activation_conf_node], dict(name='do_reshape_conf')) + + if pipeline_config.get_param('ssd_anchor_generator_num_layers') is not None or \ + pipeline_config.get_param('multiscale_anchor_generator_min_level') is not None: + # change the Reshape operations with hardcoded number of output elements of the convolution nodes to be + # reshapable + _relax_reshape_nodes(graph, pipeline_config) + + # create PriorBoxClustered nodes instead of a constant value with prior boxes so the model could be reshaped + if pipeline_config.get_param('ssd_anchor_generator_num_layers') is not None: + priors_node = _create_prior_boxes_node(graph, pipeline_config) + elif pipeline_config.get_param('multiscale_anchor_generator_min_level') is not None: + priors_node = _create_multiscale_prior_boxes_node(graph, pipeline_config) + else: + log.info('The anchor generator is not known. Save constant with prior-boxes to IR.') + priors_node = match.input_nodes(2)[0][0].in_node(0) + + # creates DetectionOutput Node object from Op class + detection_output_op = DetectionOutput(graph, match.custom_replacement_desc.custom_attributes) + detection_output_op.attrs['old_infer'] = detection_output_op.attrs['infer'] + detection_output_op.attrs['infer'] = __class__.do_infer + detection_output_node = detection_output_op.create_node( + [reshape_loc_node, reshape_conf_node, priors_node], + dict(name=detection_output_op.attrs['type'], + clip=1, + confidence_threshold=_value_or_raise(match, pipeline_config, 'postprocessing_score_threshold'), + top_k=_value_or_raise(match, pipeline_config, 'postprocessing_max_detections_per_class'), + keep_top_k=_value_or_raise(match, pipeline_config, 'postprocessing_max_total_detections'), + nms_threshold=_value_or_raise(match, pipeline_config, 'postprocessing_iou_threshold'))) + + return {'detection_output_node': detection_output_node} +``` + +The `run_before` and `run_after` functions define lists of replacers that this replacer should be run before and after respectively. + +The `input_edges_match` and `output_edges_match` functions generate dictionaries describing how the input/output nodes matched with the replacer should be connected with new nodes generated in the `generate_sub_graph` function. Refer to [sub-graph replacements](../../customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md) documentation for more information. + +The `generate_sub_graph` function performs the following actions: + +* Lines 19-23: Reads the `pipeline.config` configuration file to get the model hyper-parameters and other attributes. +* Lines 25-32: Makes tensor with confidences 4D and apply correct activation function (read from the `pipeline.config` file) to it. +* Line 33: Disables permutation of `expand_dims_node`'s attributes because they are already in the NCHW layout. +* Lines 35-39: Makes tensor with bounding boxes 2D, where the first dimension corresponds to a batch size. +* Lines 49-52: Makes tensor with confidences 2D, where the first dimension corresponds to a batch size. +* Lines 41-44: Creates a node with `DetectionOutput` layer with a number of layer attributes from the `pipeline.config` file. Also the inference function (`infer` attribute) is updated with a custom inference function `__class__.do_infer`. The latter change is described below. +* Lines 46-59: Creates several `PriorBoxClustered` layers which generate prior boxes depending on the type of the grid anchor generator defined in the `pipeline.config` file. 
If the grid anchor type is not known, `priors_node` is initialized to the node matched by the sub-graph replacement. In the latter case, it is a constant node with prior boxes calculated for a particular input image shape.
+* Lines 61-72: Creates the `DetectionOutput` layer with attributes from the `pipeline.config` file.
+* Line 74: Returns a dictionary with the mapping of nodes that is used in the `input_edges_match` and `output_edges_match` functions.
+
+The paragraphs below explain why the inference function for the `DetectionOutput` layer is modified. To understand the change, it is necessary to know some high-level steps of the Model Optimizer model conversion pipeline. Only the steps required for understanding the change are listed:
+
+1. Model Optimizer creates the computation graph from the initial topology, where each node corresponds to an operation from the initial model.
+2. Model Optimizer performs "Front replacers" (including the one being described now).
+3. Model Optimizer adds data nodes between operation nodes to the graph.
+4. Model Optimizer performs "Middle replacers".
+5. Model Optimizer performs the "shape inference" phase. During this phase, the shape of every data node is calculated. Model Optimizer also calculates values for data tensors that are constant, that is, do not depend on the input. For example, the tensor with prior boxes (generated with the `MultipleGridAnchorGenerator` or similar scopes) does not depend on the input and is evaluated by the Model Optimizer during shape inference. The Model Optimizer uses the inference function stored in the 'infer' attribute of operation nodes.
+6. Model Optimizer performs "Back replacers".
+7. Model Optimizer generates the IR.
+
+The `do_infer` function is needed to perform adjustments to the tensor with prior boxes (anchors) that is known only after the shape inference phase, and to perform the additional transformations described below. This change is performed only if the tensor with prior boxes is not constant (that is, it is produced by `PriorBoxClustered` layers during inference). It is possible to implement the `Postprocessor` block replacement as a Middle replacer (so the prior boxes tensor would be evaluated by the time the replacer is called), but in this case it would be necessary to correctly handle the data nodes that are created between each pair of initially adjacent operation nodes. To inject the required modification into the inference function of the `DetectionOutput` node, a new function is created that performs the modifications and then calls the initial inference function.
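+
+As a toy illustration of the prior-box re-layout that this new inference function performs (with made-up numbers; the real variance values are read from the `pipeline.config` file), consider the following NumPy snippet:
+
+```python
+import numpy as np
+
+# two prior boxes in the TensorFlow YXYX layout and a 4-element variance vector
+prior_boxes = np.array([[0.1, 0.2, 0.3, 0.4],
+                        [0.5, 0.6, 0.7, 0.8]], dtype=np.float32)
+variance = np.array([0.1, 0.1, 0.2, 0.2], dtype=np.float32)
+
+variances = np.tile(variance, [prior_boxes.shape[-2], 1])              # replicate variances per box
+boxes = np.concatenate((prior_boxes.reshape([-1, 4]), variances), 0)   # box values followed by variances
+boxes = np.concatenate((boxes[:, 1:2], boxes[:, 0:1],
+                        boxes[:, 3:4], boxes[:, 2:3]), 1)              # swap YXYX -> XYXY
+boxes = boxes.reshape((1, 2, -1))                                      # 3D layout expected by DetectionOutput
+print(boxes.shape)  # (1, 2, 8)
+```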
The code of a new inference function is the following: + +```python +@staticmethod +def do_infer(node: Node): + prior_boxes = node.in_node(2).value + if prior_boxes is not None: + argv = node.graph.graph['cmd_params'] + if argv.tensorflow_object_detection_api_pipeline_config is None: + raise Error(missing_param_error) + pipeline_config = PipelineConfig(argv.tensorflow_object_detection_api_pipeline_config) + variance = _variance_from_pipeline_config(pipeline_config) + # replicating the variance values for all prior-boxes + variances = np.tile(variance, [prior_boxes.shape[-2], 1]) + # DetectionOutput Inference Engine expects the prior-boxes in the following layout: (values, variances) + prior_boxes = prior_boxes.reshape([-1, 4]) + prior_boxes = np.concatenate((prior_boxes, variances), 0) + # compared to the IE's DetectionOutput, the TF keeps the prior-boxes in YXYX, need to get back to the XYXY + prior_boxes = np.concatenate((prior_boxes[:, 1:2], prior_boxes[:, 0:1], + prior_boxes[:, 3:4], prior_boxes[:, 2:3]), 1) + # adding another dimensions, as the prior-boxes are expected as 3d tensors + prior_boxes = prior_boxes.reshape((1, 2, -1)) + node.in_node(2).shape = int64_array(prior_boxes.shape) + node.in_node(2).value = prior_boxes + + node.old_infer(node) + # compared to the IE's DetectionOutput, the TF keeps the locations in YXYX, need to get back to the XYXY + # for last convolutions that operate the locations need to swap the X and Y for output feature weights & biases + conv_nodes = backward_bfs_for_operation(node.in_node(0), ['Conv2D']) + swap_weights_xy(conv_nodes) + squeeze_reshape_and_concat(conv_nodes) + + for node_name in node.graph.nodes(): + node = Node(node.graph, node_name) + if node.has_and_set('swap_xy_count') and len(node.out_nodes()) != node['swap_xy_count']: + raise Error('The weights were swapped for node "{}", but this weight was used in other nodes.'.format( + node.name)) +``` + +* Lines 3-18: Updates the value of the tensor with prior boxes by appending variance values if the prior boxes are pre-calculated. Inference Engine implementation of the `DetectionOutput` layer expects these values located within the tensor with bounding boxes, but in TensorFlow they are applied in different way. +* Line 23: Executes initial inference function to calculate the output shape of this node. +* Lines 26-27: Finds predecessor node of type "Conv2D" of the node with bounding boxes (which is `node.in_node(0)`) and modifies convolution weights so "X" and "Y" coordinates are swapped. In TensorFlow bounding boxes are stored in the tensors in "YXYX" order, while in the Inference Engine it is "XYXY". +* Line 28: Executes function looking for `Reshape` operations after the `Conv2D` nodes found above with 4D output and remove the dimension with index 2 which should be equal to 1. This is a workaround to make tensor 3D so its shape will not be transposed during the IR generation. The problem arises when bounding boxes predictions are reshaped from [1, 1, 1, X] to [1, X / 4, 1, 4]. The result tensor should not be transposed because after transpose it will have shape [1, 4, X / 4, 1] and the concatenation over dimension with index 2 will produce incorrect tensor. Also the function looks for `Concat` operations and changes the concatenation dimension from 2 to 1. + +### Faster R-CNN Topologies +The Faster R-CNN models contain several building blocks similar to building blocks from SSD models so it is highly recommended to read the section about converting them first. 
Detailed information about Faster R-CNN topologies is provided [in the abstract](https://arxiv.org/abs/1506.01497). + +#### Preprocessor Block +Faster R-CNN topologies contain similar `Preprocessor` block as SSD topologies. The same `ObjectDetectionAPIPreprocessorReplacement` sub-graph replacer is used to cut it off. + +#### Proposal Layer +The `Proposal` layer is implemented with dozens of primitive operations in TensorFlow, meanwhile, it is a single layer in the Inference Engine. The `ObjectDetectionAPIProposalReplacement` sub-graph replacer identifies nodes corresponding to the layer and replaces them with required new nodes. + +```python +class ObjectDetectionAPIProposalReplacement(FrontReplacementFromConfigFileSubGraph): + """ + This class replaces sub-graph of operations with Proposal layer and additional layers transforming + tensors from layout of TensorFlow to layout required by Inference Engine. + Refer to comments inside the function for more information about performed actions. + """ + replacement_id = 'ObjectDetectionAPIProposalReplacement' + + def run_after(self): + return [ObjectDetectionAPIPreprocessorReplacement] + + def run_before(self): + return [Sub, CropAndResizeReplacement] + + def output_edges_match(self, graph: Graph, match: SubgraphMatch, new_sub_graph: dict): + return {match.output_node(0)[0].id: new_sub_graph['proposal_node'].id} + + def nodes_to_remove(self, graph: Graph, match: SubgraphMatch): + new_list = match.matched_nodes_names().copy() + # do not remove nodes that produce box predictions and class predictions + new_list.remove(match.single_input_node(0)[0].id) + new_list.remove(match.single_input_node(1)[0].id) + return new_list + + def generate_sub_graph(self, graph: Graph, match: SubgraphMatch): + argv = graph.graph['cmd_params'] + if argv.tensorflow_object_detection_api_pipeline_config is None: + raise Error(missing_param_error) + pipeline_config = PipelineConfig(argv.tensorflow_object_detection_api_pipeline_config) + + max_proposals = _value_or_raise(match, pipeline_config, 'first_stage_max_proposals') + proposal_ratios = _value_or_raise(match, pipeline_config, 'anchor_generator_aspect_ratios') + proposal_scales = _value_or_raise(match, pipeline_config, 'anchor_generator_scales') + anchors_count = len(proposal_ratios) * len(proposal_scales) + + # Convolution/matmul node that produces classes predictions + # Permute result of the tensor with classes permissions so it will be in a correct layout for Softmax + predictions_node = backward_bfs_for_operation(match.single_input_node(1)[0], ['Add'])[0] + + reshape_classes_op = Reshape(graph, dict(dim=int64_array([0, anchors_count, 2, -1]))) + reshape_classes_node = reshape_classes_op.create_node([], dict(name='predictions/Reshape', nchw_layout=True)) + predictions_node.insert_node_after(reshape_classes_node, 0) + + softmax_conf_op = Softmax(graph, dict(axis=2, nchw_layout=True, name=reshape_classes_node.id + '/Softmax')) + softmax_conf_node = softmax_conf_op.create_node([reshape_classes_node]) + permute_reshape_softmax_op = Permute(graph, dict(order=int64_array([0, 2, 1, 3]), nchw_layout=True)) + permute_reshape_softmax_node = permute_reshape_softmax_op.create_node([softmax_conf_node], dict( + name=softmax_conf_node.name + '/Permute')) + + initial_shape_op = Shape(graph, dict(name=predictions_node.id + '/Shape')) + initial_shape_node = initial_shape_op.create_node([predictions_node]) + + # implement custom reshape infer function because we need to know the input convolution node output dimension + # sizes but 
we can know it only after partial infer + reshape_permute_op = Reshape(graph, dict()) + reshape_permute_node = reshape_permute_op.create_node([permute_reshape_softmax_node, initial_shape_node], + dict(name='Reshape_Permute_Class')) + + variance_height = pipeline_config.get_param('frcnn_variance_height') + variance_width = pipeline_config.get_param('frcnn_variance_width') + variance_x = pipeline_config.get_param('frcnn_variance_x') + variance_y = pipeline_config.get_param('frcnn_variance_y') + anchor_generator_height_stride = pipeline_config.get_param('anchor_generator_height_stride') + anchor_generator_width_stride = pipeline_config.get_param('anchor_generator_width_stride') + anchor_generator_height = pipeline_config.get_param('anchor_generator_height') + anchor_generator_width = pipeline_config.get_param('anchor_generator_width') + + if variance_height != variance_width: + log.error('The values for variance for height "{}" is not equal to variance for width "{}". The detection ' + 'results will be inaccurate.'.format(variance_height, variance_width)) + if variance_x != variance_y: + log.error('The values for variance for x "{}" is not equal to variance for y "{}". The detection ' + 'results will be inaccurate.'.format(variance_x, variance_y)) + if anchor_generator_height_stride != anchor_generator_width_stride: + log.error('The values for the anchor generator height stride "{}" is not equal to the anchor generator ' + 'width stride "{}". The detection results will be inaccurate.'.format( + anchor_generator_height_stride, anchor_generator_width_stride)) + if anchor_generator_height != anchor_generator_width: + log.error('The values for the anchor generator height "{}" is not equal to the anchor generator width ' + 'stride "{}". The detection results will be inaccurate.'.format(anchor_generator_height, + anchor_generator_width)) + + proposal_op = ProposalOp(graph, dict(min_size=1, + framework='tensorflow', + pre_nms_topn=2 ** 31 - 1, + box_size_scale=variance_height, + box_coordinate_scale=variance_x, + post_nms_topn=max_proposals, + feat_stride=anchor_generator_height_stride, + ratio=proposal_ratios, + scale=proposal_scales, + normalize=1, + base_size=anchor_generator_height, + nms_thresh=_value_or_raise(match, pipeline_config, + 'first_stage_nms_iou_threshold'))) + for key in ('clip_before_nms', 'clip_after_nms'): + if key in match.custom_replacement_desc.custom_attributes: + proposal_op.attrs[key] = int(match.custom_replacement_desc.custom_attributes[key]) + + anchors_node = backward_bfs_for_operation(match.single_input_node(0)[0], ['Add'])[0] + + # creates input to store input image height, width and scales (usually 1.0s) + # the batch size for this input is fixed because it is allowed to pass images of the same size only as input + input_op_with_image_size = Input(graph, dict(shape=int64_array([1, 3]), fixed_batch=True)) + input_with_image_size_node = input_op_with_image_size.create_node([], dict(name='image_info')) + + proposal_node = proposal_op.create_node([reshape_permute_node, anchors_node, input_with_image_size_node], + dict(name='proposals')) + + if 'do_not_swap_proposals' in match.custom_replacement_desc.custom_attributes and \ + match.custom_replacement_desc.custom_attributes['do_not_swap_proposals']: + swapped_proposals_node = proposal_node + else: + swapped_proposals_node = add_convolution_to_swap_xy_coordinates(graph, proposal_node, 5) + + proposal_reshape_2d_op = Reshape(graph, dict(dim=int64_array([-1, 5]), nchw_layout=True)) + proposal_reshape_2d_node = 
proposal_reshape_2d_op.create_node([swapped_proposals_node], + dict(name="reshape_swap_proposals_2d")) + + # feed the CropAndResize node with a correct boxes information produced with the Proposal layer + # find the first CropAndResize node in the BFS order + crop_and_resize_nodes_ids = [node_id for node_id in bfs_search(graph, [match.single_input_node(0)[0].id]) if + graph.node[node_id]['op'] == 'CropAndResize'] + assert len(crop_and_resize_nodes_ids) != 0, "Didn't find any CropAndResize nodes in the graph." + if 'do_not_swap_proposals' not in match.custom_replacement_desc.custom_attributes or not \ + match.custom_replacement_desc.custom_attributes['do_not_swap_proposals']: + crop_and_resize_node = Node(graph, crop_and_resize_nodes_ids[0]) + # set a marker that the input with box coordinates has been pre-processed so the CropAndResizeReplacement + # transform doesn't try to merge the second and the third inputs + crop_and_resize_node['inputs_preprocessed'] = True + graph.remove_edge(crop_and_resize_node.in_node(1).id, crop_and_resize_node.id) + graph.create_edge(proposal_reshape_2d_node, crop_and_resize_node, out_port=0, in_port=1) + + tf_proposal_reshape_4d_op = Reshape(graph, dict(dim=int64_array([-1, 1, max_proposals, 5]), nchw_layout=True)) + tf_proposal_reshape_4d_node = tf_proposal_reshape_4d_op.create_node([swapped_proposals_node], + dict(name="reshape_proposal_4d")) + + crop_op = Crop(graph, dict(axis=int64_array([3]), offset=int64_array([1]), dim=int64_array([4]), + nchw_layout=True)) + crop_node = crop_op.create_node([tf_proposal_reshape_4d_node], dict(name='crop_proposals')) + + tf_proposals_crop_reshape_3d_op = Reshape(graph, dict(dim=int64_array([0, -1, 4]), nchw_layout=True)) + tf_proposals_crop_reshape_3d_node = tf_proposals_crop_reshape_3d_op.create_node([crop_node], + dict(name="reshape_crop_3d")) + + return {'proposal_node': tf_proposals_crop_reshape_3d_node} +``` +The main interest of the implementation of this replacer is the `generate_sub_graph` function. + +Lines 26-34: Parses the `pipeline.config` file and gets required parameters for the `Proposal` layer. + +Lines 38-57: Performs the following manipulations with the tensor with class predictions. TensorFlow uses the NHWC layout, while the Inference Engine uses NCHW. Model Optimizer by default performs transformations with all nodes data in the inference graph to convert it to the NCHW layout. The size of 'C' dimension of the tensor with class predictions is equal to \f$base\_anchors\_count \cdot 2\f$, where 2 corresponds to a number of classes (background and foreground) and \f$base\_anchors\_count\f$ is equal to number of anchors that are applied to each position of 'H' and 'W' dimensions. Therefore, there are \f$H \cdot W \cdot base\_anchors\_count\f$ bounding boxes. Lines 44-45 apply the Softmax layer to this tensor to get class probabilities for each bounding box. + +Lines 59-81: Reads topology parameters related to variances and anchors generation. + +Lines 83-108: Adds the `Proposal` layer to the graph. This layer has one input (generated in lines 104-105) which should be filled with three values before inference: input image height, input image width, image scale factor. + +Lines 110-132: Swaps output values of the `Proposal` layer if the parameter `do_not_swap_proposals` is not set to `True` in the sub-graph replacement configuration file for the replacer. 
+ +Lines 134-144: Crops the output from the `Proposal` node to remove the batch indices (the Inference Engine implementation of the `Proposal` layer generates tensor with shape `[num_proposals, 5]`). The final tensor contains just box coordinates as in the TensorFlow implementation. + +#### SecondStagePostprocessor Block +The `SecondStagePostprocessor` block is similar to the `Postprocessor` block from the SSDs topologies. But there are a number of differences in conversion of the `SecondStagePostprocessor` block. + +```python +class ObjectDetectionAPIDetectionOutputReplacement(FrontReplacementFromConfigFileSubGraph): + """ + Replaces the sub-graph that is equal to the DetectionOutput layer from Inference Engine. This replacer is used for + Faster R-CNN, R-FCN and Mask R-CNN topologies conversion. + The replacer uses a value of the custom attribute 'coordinates_swap_method' from the sub-graph replacement + configuration file to choose how to swap box coordinates of the 0-th input of the generated DetectionOutput layer. + Refer to the code for more details. + """ + replacement_id = 'ObjectDetectionAPIDetectionOutputReplacement' + + def run_before(self): + return [ObjectDetectionAPIMaskRCNNROIPoolingSecondReplacement, Unpack, Sub] + + def run_after(self): + return [ObjectDetectionAPIProposalReplacement, CropAndResizeReplacement] + + def nodes_to_remove(self, graph: Graph, match: SubgraphMatch): + new_nodes_to_remove = match.matched_nodes_names().copy() + outputs = ['detection_boxes', 'detection_scores', 'num_detections'] + for output in outputs: + children = Node(graph, output).out_nodes() + if len(children) != 1: + log.warning('Output {} has {} children. It should have only one output: with op==`OpOutput`' + ''.format(output, len(children))) + elif children[list(children.keys())[0]].op == 'OpOutput': + new_nodes_to_remove.append(children[list(children.keys())[0]].id) + else: + continue + new_nodes_to_remove.extend(outputs) + return new_nodes_to_remove + + def output_edges_match(self, graph: Graph, match: SubgraphMatch, new_sub_graph: dict): + # the DetectionOutput in IE produces single tensor, but in TF it produces four tensors, so we need to create + # only one output edge match + return {match.output_node(0)[0].id: new_sub_graph['detection_output_node'].id} + + @staticmethod + def skip_nodes_by_condition(current_node: Node, condition: callable): + while condition(current_node): + current_node = current_node.in_node() + return current_node + + def generate_sub_graph(self, graph: Graph, match: SubgraphMatch): + argv = graph.graph['cmd_params'] + if argv.tensorflow_object_detection_api_pipeline_config is None: + raise Error(missing_param_error) + pipeline_config = PipelineConfig(argv.tensorflow_object_detection_api_pipeline_config) + + num_classes = _value_or_raise(match, pipeline_config, 'num_classes') + max_proposals = _value_or_raise(match, pipeline_config, 'first_stage_max_proposals') + activation_function = _value_or_raise(match, pipeline_config, 'postprocessing_score_converter') + + activation_conf_node = add_activation_function_after_node(graph, match.single_input_node(1)[0].in_node(0), + activation_function) + + # IE DetectionOutput layer consumes flattened tensors so need add a Reshape layer. 
+ # The batch value of the input tensor is not equal to the batch of the topology, so it is not possible to use + # "0" value in the Reshape layer attribute to refer to the batch size, but we know how to + # calculate the second dimension so the batch value will be deduced from it with help of "-1". + reshape_conf_op = Reshape(graph, dict(dim=int64_array([-1, (num_classes + 1) * max_proposals]))) + reshape_conf_node = reshape_conf_op.create_node([activation_conf_node], dict(name='do_reshape_conf')) + + # Workaround for PermuteForReshape pass. + # We looking for first not Reshape-typed node before match.single_input_node(0)[0].in_node(0). + # And add reshape_loc node after this first not Reshape-typed node. + current_node = self.skip_nodes_by_condition(match.single_input_node(0)[0].in_node(0), + lambda x: x['kind'] == 'op' and x.soft_get('type') == 'Reshape') + + reshape_loc_op = Reshape(graph, dict(dim=int64_array([-1, num_classes, 1, 4]))) + reshape_loc_node = reshape_loc_op.create_node([current_node], dict(name='reshape_loc', nchw_layout=True)) + update_attrs(reshape_loc_node, 'shape_attrs', 'dim') + + # constant node with variances + variances_const_op = Const(graph, dict(value=_variance_from_pipeline_config(pipeline_config))) + variances_const_node = variances_const_op.create_node([]) + + # TF produces locations tensor without boxes for background. + # Inference Engine DetectionOutput layer requires background boxes so we generate them + loc_node = add_fake_background_loc(graph, reshape_loc_node) + PermuteAttrs.set_permutation(reshape_loc_node, loc_node, None) + + # reshape locations tensor to 2D so it could be passed to Eltwise which will be converted to ScaleShift + reshape_loc_2d_op = Reshape(graph, dict(dim=int64_array([-1, 4]))) + reshape_loc_2d_node = reshape_loc_2d_op.create_node([loc_node], dict(name='reshape_locs_2d', nchw_layout=True)) + PermuteAttrs.set_permutation(loc_node, reshape_loc_2d_node, None) + + # element-wise multiply locations with variances + eltwise_locs_op = Eltwise(graph, dict(operation='mul')) + eltwise_locs_node = eltwise_locs_op.create_node([reshape_loc_2d_node, variances_const_node], + dict(name='scale_locs')) + + # IE DetectionOutput layer consumes flattened tensors so need add a Reshape layer. + # The batch value of the input tensor is not equal to the batch of the topology, so it is not possible to use + # "0" value in the Reshape layer attribute to refer to the batch size, but we know how to + # calculate the second dimension so the batch value will be deduced from it with help of "-1". + reshape_loc_do_op = Reshape(graph, dict(dim=int64_array([-1, (num_classes + 1) * max_proposals * 4]))) + + custom_attributes = match.custom_replacement_desc.custom_attributes + coordinates_swap_method = 'add_convolution' + if 'coordinates_swap_method' not in custom_attributes: + log.error('The ObjectDetectionAPIDetectionOutputReplacement sub-graph replacement configuration file ' + 'must contain "coordinates_swap_method" in the "custom_attributes" dictionary. Two values are ' + 'supported: "swap_weights" and "add_convolution". The first one should be used when there is ' + 'a MatMul or Conv2D node before the "SecondStagePostprocessor" block in the topology. With this ' + 'solution the weights of the MatMul or Conv2D nodes are permutted, simulating the swap of XY ' + 'coordinates in the tensor. The second could be used in any other cases but it is worse in terms ' + 'of performance because it adds the Conv2D node which performs permutting of data. 
Since the ' + 'attribute is not defined the second approach is used by default.') + else: + coordinates_swap_method = custom_attributes['coordinates_swap_method'] + supported_swap_methods = ['swap_weights', 'add_convolution'] + if coordinates_swap_method not in supported_swap_methods: + raise Error('Unsupported "coordinates_swap_method" defined in the sub-graph replacement configuration ' + 'file. Supported methods are: {}'.format(', '.join(supported_swap_methods))) + + if coordinates_swap_method == 'add_convolution': + swapped_locs_node = add_convolution_to_swap_xy_coordinates(graph, eltwise_locs_node, 4) + reshape_loc_do_node = reshape_loc_do_op.create_node([swapped_locs_node], dict(name='do_reshape_locs')) + else: + reshape_loc_do_node = reshape_loc_do_op.create_node([eltwise_locs_node], dict(name='do_reshape_locs')) + + # find Proposal output which has the data layout as in TF: YXYX coordinates without batch indices. + proposal_nodes_ids = [node_id for node_id, attrs in graph.nodes(data=True) + if 'name' in attrs and attrs['name'] == 'crop_proposals'] + if len(proposal_nodes_ids) != 1: + raise Error("Found the following nodes '{}' with name 'crop_proposals' but there should be exactly 1. " + "Looks like ObjectDetectionAPIProposalReplacement replacement didn't work.". + format(proposal_nodes_ids)) + proposal_node = Node(graph, proposal_nodes_ids[0]) + + # check whether it is necessary to permute proposals coordinates before passing them to the DetectionOutput + # currently this parameter is set for the RFCN topologies + if 'swap_proposals' in custom_attributes and custom_attributes['swap_proposals']: + proposal_node = add_convolution_to_swap_xy_coordinates(graph, proposal_node, 4) + + # reshape priors boxes as Detection Output expects + reshape_priors_op = Reshape(graph, dict(dim=int64_array([-1, 1, max_proposals * 4]))) + reshape_priors_node = reshape_priors_op.create_node([proposal_node], + dict(name='DetectionOutput_reshape_priors_')) + + detection_output_op = DetectionOutput(graph, {}) + if coordinates_swap_method == 'swap_weights': + # update infer function to re-pack weights + detection_output_op.attrs['old_infer'] = detection_output_op.attrs['infer'] + detection_output_op.attrs['infer'] = __class__.do_infer + for key in ('clip_before_nms', 'clip_after_nms'): + if key in match.custom_replacement_desc.custom_attributes: + detection_output_op.attrs[key] = int(match.custom_replacement_desc.custom_attributes[key]) + + detection_output_node = detection_output_op.create_node( + [reshape_loc_do_node, reshape_conf_node, reshape_priors_node], + dict(name=detection_output_op.attrs['type'], share_location=0, variance_encoded_in_target=1, + code_type='caffe.PriorBoxParameter.CENTER_SIZE', pad_mode='caffe.ResizeParameter.CONSTANT', + resize_mode='caffe.ResizeParameter.WARP', + num_classes=num_classes, + confidence_threshold=_value_or_raise(match, pipeline_config, 'postprocessing_score_threshold'), + top_k=_value_or_raise(match, pipeline_config, 'postprocessing_max_detections_per_class'), + keep_top_k=_value_or_raise(match, pipeline_config, 'postprocessing_max_total_detections'), + nms_threshold=_value_or_raise(match, pipeline_config, 'postprocessing_iou_threshold'))) + # sets specific name to the node so we can find it in other replacers + detection_output_node.name = 'detection_output' + + output_op = Output(graph, dict(name='do_OutputOp')) + output_op.create_node([detection_output_node]) + + print('The graph output nodes "num_detections", "detection_boxes", "detection_classes", 
"detection_scores" ' + 'have been replaced with a single layer of type "Detection Output". Refer to IR catalogue in the ' + 'documentation for information about this layer.') + + return {'detection_output_node': detection_output_node} + + @staticmethod + def do_infer(node): + node.old_infer(node) + # compared to the IE's DetectionOutput, the TF keeps the locations in YXYX, need to get back to the XYXY + # for last matmul/Conv2D that operate the locations need to swap the X and Y for output feature weights & biases + swap_weights_xy(backward_bfs_for_operation(node.in_node(0), ['MatMul', 'Conv2D'])) +``` + +The differences in conversion are the following: + +* The locations tensor does not contain information about class 0 (background), but Inference Engine `DetectionOutput` layer expects it. Line 79 append dummy tensor with fake coordinates. +* The prior boxes tensor are not constant like in SSDs models, so it is not possible to apply the same solution. Instead, the element-wise multiplication is added to scale prior boxes tensor values with the variances values. The attribute `variance_encoded_in_target=1` is set to the `DetectionOutput` layer (lines 141-159). +* The X and Y coordinates in the tensor with bounding boxes locations adjustments should be swapped. For some topologies it could be done by updating preceding convolution weights, but if there is no preceding convolutional node, the Model Optimizer inserts convolution node with specific kernel and weights that performs coordinates swap during topology inference. +* Added marker node of type `OpOutput` that is used by the Model Optimizer to determine output nodes of the topology. It is used in the dead nodes elimination pass. + +#### Cutting Off Part of the Topology + +There is an ability to cut-off part of the topology using the `--output` command line parameter. Detailed information on why it could be useful is provided in the [Cutting Off Parts of a Model ](../Cutting_Model.md). The Faster R-CNN models are cut at the end using the sub-graph replacer `ObjectDetectionAPIOutputReplacement`. + +```python +class ObjectDetectionAPIOutputReplacement(FrontReplacementFromConfigFileGeneral): + """ + This replacer is used to cut-off the network by specified nodes for models generated with Object Detection API. + The custom attribute for the replacer contains one value for key "outputs". This string is a comma separated list + of outputs alternatives. Each output alternative is a '|' separated list of node name which could be outputs. The + first node from each alternative that exits in the graph is chosen. Others are ignored. + For example, if the "outputs" is equal to the following string: + + "Reshape_16,SecondStageBoxPredictor_1/Conv_3/BiasAdd|SecondStageBoxPredictor_1/Conv_1/BiasAdd" + + then the "Reshape_16" will be an output if it exists in the graph. The second output will be + SecondStageBoxPredictor_1/Conv_3/BiasAdd if it exist in the graph, if not then + SecondStageBoxPredictor_1/Conv_1/BiasAdd will be output if it exists in the graph. + """ + replacement_id = 'ObjectDetectionAPIOutputReplacement' + + def run_before(self): + return [ObjectDetectionAPIPreprocessorReplacement] + + def transform_graph(self, graph: Graph, replacement_descriptions: dict): + if graph.graph['cmd_params'].output is not None: + log.warning('User defined output nodes are specified. 
Skip the graph cut-off by the ' + 'ObjectDetectionAPIOutputReplacement.') + return + outputs = [] + outputs_string = replacement_descriptions['outputs'] + for alternatives in outputs_string.split(','): + for out_node_name in alternatives.split('|'): + if graph.has_node(out_node_name): + outputs.append(out_node_name) + break + else: + log.debug('A node "{}" does not exist in the graph. Do not add it as output'.format(out_node_name)) + _outputs = output_user_data_repack(graph, outputs) + add_output_ops(graph, _outputs, graph.graph['inputs']) +``` + +This is a replacer of type "general" which is called just once in comparison with other Front-replacers ("scope" and "points") which are called for each matched instance. The replacer reads node names that should become new output nodes, like specifying `--output `. The only difference is that the string containing node names could contain '|' character specifying output node names alternatives. Detailed explanation is provided in the class description in the code. + +The `detection_boxes`, `detection_scores`, `num_detections` nodes are specified as outputs in the `faster_rcnn_support.json` file. These nodes are used to remove part of the graph that is not be needed to calculate value of specified output nodes. + +### R-FCN topologies + +The R-FCN models are based on Faster R-CNN models so it is highly recommended to read the section about converting them first. Detailed information about R-FCN topologies is provided [in the abstract](https://arxiv.org/abs/1605.06409). + +#### Preprocessor Block + +R-FCN topologies contain similar `Preprocessor` block as SSD and Faster R-CNN topologies. The same `ObjectDetectionAPIPreprocessorReplacement` sub-graph replacer is used to cut it off. + +#### Proposal Layer + +Similar to Faster R-CNNs, R-FCN topologies contain implementation of Proposal layer before the `SecondStageBoxPredictor` block, so `ObjectDetectionAPIProposalReplacement` replacement is used in the sub-graph replacement configuration file. + +#### SecondStageBoxPredictor block + +The `SecondStageBoxPredictor` block differs from the self-titled block from Faster R-CNN topologies. It contains a number of `CropAndResize` operations consuming variously scaled boxes generated with a Proposal layer. The combination of `CropAndResize` layers located in the `while` loop forms a single position-sensitive ROI pooling (PSROIPooling) layer with bilinear interpolation. The `ObjectDetectionAPIPSROIPoolingReplacement` replacement matches two `while` loops with PSROIPooling layers applied to the blobs with box coordinates and classes predictions. 
+ +```python +class ObjectDetectionAPIPSROIPoolingReplacement(FrontReplacementFromConfigFileSubGraph): + replacement_id = 'ObjectDetectionAPIPSROIPoolingReplacement' + + def run_after(self): + return [ObjectDetectionAPIProposalReplacement] + + def output_edges_match(self, graph: Graph, match: SubgraphMatch, new_sub_graph: dict): + return {match.output_node(0)[0].id: new_sub_graph['output_node'].id} + + def generate_sub_graph(self, graph: Graph, match: SubgraphMatch): + argv = graph.graph['cmd_params'] + if argv.tensorflow_object_detection_api_pipeline_config is None: + raise Error(missing_param_error) + pipeline_config = PipelineConfig(argv.tensorflow_object_detection_api_pipeline_config) + num_classes = _value_or_raise(match, pipeline_config, 'num_classes') + + input_node = match.input_nodes(0)[0][0].in_node(0) + if 'class_predictions' in input_node.id: + psroipooling_output_dim = num_classes + 1 + else: + psroipooling_output_dim = num_classes * 4 + + num_spatial_bins_height = pipeline_config.get_param('num_spatial_bins_height') + num_spatial_bins_width = pipeline_config.get_param('num_spatial_bins_width') + crop_height = pipeline_config.get_param('crop_height') + crop_width = pipeline_config.get_param('crop_width') + if crop_height != crop_width: + raise Error('Different "crop_height" and "crop_width" parameters from the pipeline config are not ' + 'supported: {} vs {}'.format(crop_height, crop_width)) + psroipooling_op = PSROIPoolingOp(graph, {'name': input_node.soft_get('name') + '/PSROIPooling', + 'output_dim': psroipooling_output_dim, + 'group_size': crop_width / num_spatial_bins_width, + 'spatial_bins_x': num_spatial_bins_width, + 'spatial_bins_y': num_spatial_bins_height, + 'mode': 'bilinear', + 'spatial_scale': 1, + }) + + if 'reshape_swap_proposals_2d' in graph.nodes(): + reshape_swap_proposals_node = Node(graph, 'reshape_swap_proposals_2d') + else: + swap_proposals_node = add_convolution_to_swap_xy_coordinates(graph, Node(graph, 'proposals'), 5) + reshape_swap_proposals_node = Reshape(graph, {'dim': [-1, 5], 'nchw_layout': True, + 'name': 'reshape_swap_proposals_2d'}).create_node( + [swap_proposals_node]) + psroipooling_node = psroipooling_op.create_node([input_node, reshape_swap_proposals_node]) + + reduce_op = Reduce(graph, {'name': 'mean', + 'reduce_type': 'mean', + 'axis': int64_array([1, 2]), + 'keep_dims': True + }) + reduce_node = reduce_op.create_node([psroipooling_node]) + + graph.erase_node(match.output_node(0)[0].out_node()) + + return {'output_node': reduce_node} +``` + +The main interest of the implementation of this replacer is the `generate_sub_graph` function. + +Lines 12-15: Parses the `pipeline.config` file and gets required parameters for the `PSROIPooling` layer. +Lines 17-21: Determines number of output channels for the `PSROIPooling` layer for box coordinates and classes predictions. +Lines 23-46: Create `PSROIPooling` layer based on model parameters determined from the pipeline configuration file. +Lines 48-53: Add Reduce layer which is the output of the `while` loops being replaced. + +#### SecondStagePostprocessor block + +The `SecondStagePostprocessor` block implements functionality of the `DetectionOutput` layer from the Inference Engine. The `ObjectDetectionAPIDetectionOutputReplacement` sub-graph replacement is used to replace the block. For this type of topologies the replacer adds convolution node to swap coordinates of boxes in of the 0-th input tensor to the `DetectionOutput` layer. 
The custom attribute `coordinates_swap_method` is set to value `add_convolution` in the sub-graph replacement configuration file to enable that behaviour. A method (`swap_weights`) is not suitable for this type of topologies because there are no `Mul` or `Conv2D` operations before the 0-th input of the `DetectionOutput` layer. + +#### Cutting Off Part of the Topology + +The R-FCN models are cut at the end with the sub-graph replacer `ObjectDetectionAPIOutputReplacement` as Faster R-CNNs topologies using the following output node names: `detection_boxes`. + +### Mask R-CNN Topologies + +The Mask R-CNN models are based on Faster R-CNN models so it is highly recommended to read the section about converting them first. Detailed information about Mask R-CNN topologies is provided [in the abstract](https://arxiv.org/abs/1703.06870). + +#### Preprocessor Block + +Mask R-CNN topologies contain similar `Preprocessor` block as SSD and Faster R-CNN topologies. The same `ObjectDetectionAPIPreprocessorReplacement` sub-graph replacer is used to cut it off. + +#### Proposal and ROI (Region of Interest) Pooling + +Proposal and ROI Pooling layers are added to Mask R-CNN topologies like in Faster R-CNNs. + +#### DetectionOutput Layer + +Unlike in SSDs and Faster R-CNNs, the implementation of the `DetectionOutput` layer in Mask R-CNNs topologies is not separated in a dedicated scope. But the matcher is defined with start/end points defined in the `mask_rcnn_support.json` so the replacer correctly adds the `DetectionOutput` layer. + +#### One More ROIPooling + +There is the second `CropAndResize` (equivalent of `ROIPooling` layer) that uses boxes produced with the `DetectionOutput` layer. The `ObjectDetectionAPIMaskRCNNROIPoolingSecondReplacement` replacer is used to replace this node. + +```python +class ObjectDetectionAPIMaskRCNNROIPoolingSecondReplacement(FrontReplacementFromConfigFileSubGraph): + replacement_id = 'ObjectDetectionAPIMaskRCNNROIPoolingSecondReplacement' + + def run_after(self): + return [ObjectDetectionAPIProposalReplacement] + + def output_edges_match(self, graph: Graph, match: SubgraphMatch, new_sub_graph: dict): + return {match.output_node(0)[0].id: new_sub_graph['roi_pooling_node'].id} + + def generate_sub_graph(self, graph: Graph, match: SubgraphMatch): + argv = graph.graph['cmd_params'] + if argv.tensorflow_object_detection_api_pipeline_config is None: + raise Error(missing_param_error) + pipeline_config = PipelineConfig(argv.tensorflow_object_detection_api_pipeline_config) + roi_pool_size = _value_or_raise(match, pipeline_config, 'initial_crop_size') + + detection_output_nodes_ids = [node_id for node_id, attrs in graph.nodes(data=True) + if 'name' in attrs and attrs['name'] == 'detection_output'] + if len(detection_output_nodes_ids) != 1: + raise Error("Found the following nodes '{}' with 'detection_output' but there should be exactly 1.". 
+ format(detection_output_nodes_ids)) + detection_output_node = Node(graph, detection_output_nodes_ids[0]) + + # add reshape of Detection Output so it can be an output of the topology + reshape_detection_output_2d_op = Reshape(graph, dict(dim=int64_array([-1, 7]))) + reshape_detection_output_2d_node = reshape_detection_output_2d_op.create_node( + [detection_output_node], dict(name='reshape_do_2d')) + + # adds special node of type "Output" that is a marker for the output nodes of the topology + output_op = Output(graph, dict(name='do_reshaped_OutputOp')) + output_node = output_op.create_node([reshape_detection_output_2d_node]) + + # add attribute 'output_sort_order' so it will be used as a key to sort output nodes before generation of IR + output_node.in_edge()['data_attrs'].append('output_sort_order') + output_node.in_edge()['output_sort_order'] = [('detection_boxes', 0)] + + # creates two Crop operations which get input from the DetectionOutput layer, cuts of slices of data with class + # ids and probabilities and produce a tensor with batch ids and bounding boxes only (as it is expected by the + # ROIPooling layer) + crop_batch_op = Crop(graph, dict(axis=int64_array([3]), offset=int64_array([0]), dim=int64_array([1]), + nchw_layout=True)) + crop_batch_node = crop_batch_op.create_node([detection_output_node], dict(name='crop_do_batch_ids')) + + crop_coordinates_op = Crop(graph, dict(axis=int64_array([3]), offset=int64_array([3]), dim=int64_array([4]), + nchw_layout=True)) + crop_coordinates_node = crop_coordinates_op.create_node([detection_output_node], dict(name='crop_do_coords')) + + concat_op = Concat(graph, dict(axis=3)) + concat_node = concat_op.create_node([crop_batch_node, crop_coordinates_node], dict(name='batch_and_coords', + nchw_layout=True)) + + # reshape bounding boxes as required by ROIPooling + reshape_do_op = Reshape(graph, dict(dim=int64_array([-1, 5]))) + reshape_do_node = reshape_do_op.create_node([concat_node], dict(name='reshape_do')) + + roi_pooling_op = ROIPooling(graph, dict(method="bilinear", spatial_scale=1, + pooled_h=roi_pool_size, pooled_w=roi_pool_size)) + roi_pooling_node = roi_pooling_op.create_node([match.single_input_node(0)[0].in_node(), reshape_do_node], + dict(name='ROI_pooling_2')) + return {'roi_pooling_node': roi_pooling_node} +``` +The Inference Engine `DetectionOutput` layer implementation produces one tensor with seven numbers for each actual detection: + +* batch index +* class label +* class probability +* x_1 box coordinate +* y_1 box coordinate +* x_2 box coordinate +* y_2 box coordinate. + +The boxes coordinates must be fed to the `ROIPooling` layer, so the `Crop` layer is added to remove unnecessary part (lines 37-50). + +Then the result tensor is reshaped (lines 53-54) and `ROIPooling` layer is created (lines 56-59). + +#### Mask Tensors Processing + +The post-processing part of Mask R-CNN topologies filters out bounding boxes with low probabilities and applies activation function to the rest one. This post-processing is implemented using the `Gather` operation, which is not supported by the Inference Engine. Special Front-replacer removes this post-processing and just inserts the activation layer to the end. The filtering of bounding boxes is done in the dedicated demo `mask_rcnn_demo`. The code of the replacer is the following: + +```python +class ObjectDetectionAPIMaskRCNNSigmoidReplacement(FrontReplacementFromConfigFileGeneral): + """ + This replacer is used to convert Mask R-CNN topologies only. 
+ Adds activation with sigmoid function to the end of the network producing masks tensors. + """ + replacement_id = 'ObjectDetectionAPIMaskRCNNSigmoidReplacement' + + def run_after(self): + return [ObjectDetectionAPIMaskRCNNROIPoolingSecondReplacement] + + def transform_graph(self, graph: Graph, replacement_descriptions): + output_node = None + op_outputs = [n for n, d in graph.nodes(data=True) if 'op' in d and d['op'] == 'OpOutput'] + for op_output in op_outputs: + last_node = Node(graph, op_output).in_node(0) + if last_node.name.startswith('SecondStageBoxPredictor'): + sigmoid_op = Activation(graph, dict(operation='sigmoid')) + sigmoid_node = sigmoid_op.create_node([last_node], dict(name=last_node.id + '/sigmoid')) + sigmoid_node.name = 'masks' + + if output_node is not None: + raise Error('Identified two possible outputs from the topology. Cannot proceed.') + # add special node of type "Output" that is a marker for the output nodes of the topology + output_op = Output(graph, dict(name=sigmoid_node.name + '/OutputOp')) + output_node = output_op.create_node([sigmoid_node]) + + print('The predicted masks are produced by the "masks" layer for each bounding box generated with a ' + '"detection_output" layer.\n Refer to IR catalogue in the documentation for information ' + 'about the DetectionOutput layer and Inference Engine documentation about output data interpretation.\n' + 'The topology can be inferred using dedicated demo "mask_rcnn_demo".') +``` +The replacer looks for the output node which name starts with 'SecondStageBoxPredictor' (the another node of type 'OpOutput' is located after the `DetectionOutput` node). This node contains the generated masks. The replacer adds activation layer 'Sigmoid' after this node as it is done in the initial TensorFlow* model. + +#### Cutting Off Part of the Topology + +The Mask R-CNN models are cut at the end with the sub-graph replacer `ObjectDetectionAPIOutputReplacement` using the following output node names: + +```SecondStageBoxPredictor_1/Conv_3/BiasAdd|SecondStageBoxPredictor_1/Conv_1/BiasAdd``` + +One of these two nodes produces output mask tensors. The child nodes of these nodes are related to post-processing which is implemented in the [Mask R-CNN demo](@ref omz_demos_mask_rcnn_demo_README) and should be cut off. diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_Slim_Library_Models.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_Slim_Library_Models.md new file mode 100644 index 00000000000000..e750661c9c8eb8 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_Slim_Library_Models.md @@ -0,0 +1,89 @@ +# Converting TensorFlow*-Slim Image Classification Model Library Models {#openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_Slim_Library_Models} + +TensorFlow\*-Slim Image Classification Model Library is a library to define, train and evaluate classification models in TensorFlow\*. The library contains Python scripts defining the classification topologies together with checkpoint files for several pre-trained classification topologies. To convert a TensorFlow\*-Slim library model, complete the following steps: + +1. Download the TensorFlow\*-Slim models [git repository](https://github.com/tensorflow/models). +2. Download the pre-trained model [checkpoint](https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models). +3. Export the inference graph. +4. Convert the model using the Model Optimizer. 

The [Example of an Inception V1 Model Conversion](#example_of_an_inception_v1_model_conversion) section below illustrates the process of converting an Inception V1 Model.

## Example of an Inception V1 Model Conversion
This example demonstrates how to convert the model on Linux\* OSes, but it can easily be adapted for Windows\* OSes.

Step 1. Create a new directory and clone the TensorFlow\*-Slim git repository into it:

```sh
mkdir tf_models
```
```sh
git clone https://github.com/tensorflow/models.git tf_models
```

Step 2. Download and unpack the [Inception V1 model checkpoint file](http://download.tensorflow.org/models/inception_v1_2016_08_28.tar.gz):

```sh
wget http://download.tensorflow.org/models/inception_v1_2016_08_28.tar.gz
```
```sh
tar xzvf inception_v1_2016_08_28.tar.gz
```

Step 3. Export the inference graph --- the protobuf file (`.pb`) containing the architecture of the topology. Note that this file does *not* contain the neural network weights and cannot be used for inference.

```sh
python3 tf_models/research/slim/export_inference_graph.py \
    --model_name inception_v1 \
    --output_file inception_v1_inference_graph.pb
```

Model Optimizer comes with the summarize graph utility, which identifies graph input and output nodes. Run the utility to determine the input/output nodes of the Inception V1 model:

```sh
python3 /mo/utils/summarize_graph.py --input_model ./inception_v1_inference_graph.pb
```

The output looks as follows:
+```sh +1 input(s) detected: +Name: input, type: float32, shape: (-1,224,224,3) +1 output(s) detected: +InceptionV1/Logits/Predictions/Reshape_1 +``` +The tool finds one input node with name `input`, type `float32`, fixed image size `(224,224,3)` and undefined batch size `-1`. The output node name is `InceptionV1/Logits/Predictions/Reshape_1`.
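
If TensorFlow\* 1.x is available in your environment, you can cross-check the summarize graph output by reading the exported graph directly. The snippet below is a minimal sketch: only the `inception_v1_inference_graph.pb` file name comes from the steps above, the rest is plain TensorFlow 1.x API.

```python
# A minimal sketch: list Placeholder (input) nodes of the exported graph
# to cross-check the summarize_graph output above.
import tensorflow as tf

graph_def = tf.GraphDef()
with open('inception_v1_inference_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.op == 'Placeholder':
        dims = [d.size for d in node.attr['shape'].shape.dim]
        dtype = tf.DType(node.attr['dtype'].type).name
        print('input: {}, type: {}, shape: {}'.format(node.name, dtype, dims))
```

For the Inception V1 graph exported above, this should report the same `input` placeholder of type `float32` with shape `[-1, 224, 224, 3]`.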
+ +Step 4. Convert the model with the Model Optimizer: + +```sh +/mo_tf.py --input_model ./inception_v1_inference_graph.pb --input_checkpoint ./inception_v1.ckpt -b 1 --mean_value [127.5,127.5,127.5] --scale 127.5 +``` + +The `-b` command line parameter is required because the Model Optimizer cannot convert a model with undefined input size. + +Refer to the [Mean and Scale Values for TensorFlow\*-Slim Models](#tf_slim_mean_scale_values) for the information why `--mean_values` and `--scale` command line parameters are used. + +## Mean and Scale Values for TensorFlow\*-Slim Models +The TensorFlow\*-Slim Models were trained with normalized input data. There are several different normalization algorithms used in the Slim library. Inference Engine classification sample does not perform image pre-processing except resizing to the input layer size. It is necessary to pass mean and scale values to the Model Optimizer so they are embedded into the generated IR in order to get correct classification results. + +The file [preprocessing_factory.py](https://github.com/tensorflow/models/blob/master/research/slim/preprocessing/preprocessing_factory.py) contains a dictionary variable `preprocessing_fn_map` defining mapping between the model type and pre-processing function to be used. The function code should be analyzed to figure out the mean/scale values. + +The [inception_preprocessing.py](https://github.com/tensorflow/models/blob/master/research/slim/preprocessing/inception_preprocessing.py) file defines the pre-processing function for the Inception models. The `preprocess_for_eval` function contains the following code: + +```python3 + ... + if image.dtype != tf.float32: + image = tf.image.convert_image_dtype(image, dtype=tf.float32) + ... + image = tf.subtract(image, 0.5) + image = tf.multiply(image, 2.0) + return image +``` + +Firstly, the `image` is converted to data type `tf.float32` and the values in the tensor are scaled to the `[0, 1]` range using the [tf.image.convert_image_dtype](https://www.tensorflow.org/api_docs/python/tf/image/convert_image_dtype) function. Then the `0.5` is subtracted from the image values and values multiplied by `2.0`. The final image range of values is `[-1, 1]`. + +Inference Engine classification sample reads an input image as a three-dimensional array of integer values from the range `[0, 255]`. In order to scale them to `[-1, 1]` range, the mean value `127.5` for each image channel should be specified as well as scale factor `127.5`. + +Similarly, the mean/scale values can be determined for other Slim models. + +The exact mean/scale values are defined in the table with list of supported TensorFlow\*-Slim models at the [Converting a TensorFlow* Model](../Convert_Model_From_TensorFlow.md). \ No newline at end of file diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_WideAndDeep_Family_Models.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_WideAndDeep_Family_Models.md new file mode 100644 index 00000000000000..7e28a7ac0533e3 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_WideAndDeep_Family_Models.md @@ -0,0 +1,130 @@ +# Converting TensorFlow* Wide and Deep Family Models to the Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_WideAndDeep_Family_Models} + +The Wide and Deep models is a combination of wide and deep parts for memorization and generalization of object features respectively. 
+These models can contain different types of object features such as numerical, categorical, sparse and sequential features. These feature types are specified +through Tensorflow* tf.feature_column API. Table below presents what feature types are supported by the OpenVINO™ toolkit. + +| numeric | (weighted) categorical | categorical with hash | bucketized | sequential | crossed | +|:-------:|:----------------------:|:---------------------:|:----------:|:----------:|:-------:| +| yes | yes | no | yes | yes | no | + +**NOTE**: the categorical with hash and crossed features are currently unsupported since The OpenVINO™ toolkit does not support tensors of `string` type and operations with them. + +## Prepare an Example of Wide and Deep Model + +**Step 1**. Clone the GitHub repository with TensorFlow models and move to the directory with an example of Wide and Deep model: + +```sh +git clone https://github.com/tensorflow/models.git; +cd official/r1/wide_deep +``` + +**Step 2**. Train the model + +As the OpenVINO™ toolkit does not support the categorical with hash and crossed features, such feature types must be switched off in the model +by changing the `build_model_columns()` function in `census_dataset.py` as follows: + +```python +def build_model_columns(): + """Builds a set of wide and deep feature columns.""" + # Continuous variable columns + age = tf.feature_column.numeric_column('age') + education_num = tf.feature_column.numeric_column('education_num') + capital_gain = tf.feature_column.numeric_column('capital_gain') + capital_loss = tf.feature_column.numeric_column('capital_loss') + hours_per_week = tf.feature_column.numeric_column('hours_per_week') + education = tf.feature_column.categorical_column_with_vocabulary_list( + 'education', [ + 'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college', + 'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school', + '5th-6th', '10th', '1st-4th', 'Preschool', '12th']) + marital_status = tf.feature_column.categorical_column_with_vocabulary_list( + 'marital_status', [ + 'Married-civ-spouse', 'Divorced', 'Married-spouse-absent', + 'Never-married', 'Separated', 'Married-AF-spouse', 'Widowed']) + relationship = tf.feature_column.categorical_column_with_vocabulary_list( + 'relationship', [ + 'Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried', + 'Other-relative']) + workclass = tf.feature_column.categorical_column_with_vocabulary_list( + 'workclass', [ + 'Self-emp-not-inc', 'Private', 'State-gov', 'Federal-gov', + 'Local-gov', '?', 'Self-emp-inc', 'Without-pay', 'Never-worked']) + # To show an example of hashing: + #occupation = tf.feature_column.categorical_column_with_hash_bucket( + # 'occupation', hash_bucket_size=_HASH_BUCKET_SIZE) + # Transformations. + age_buckets = tf.feature_column.bucketized_column( + age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65]) + # Wide columns and deep columns. 
+ base_columns = [ + education, marital_status, relationship, workclass, + age_buckets, + ] + crossed_columns = [] + wide_columns = base_columns + crossed_columns + deep_columns = [ + age, + education_num, + capital_gain, + capital_loss, + hours_per_week, + tf.feature_column.indicator_column(workclass), + tf.feature_column.indicator_column(education), + tf.feature_column.indicator_column(marital_status), + tf.feature_column.indicator_column(relationship), + # To show an example of embedding + ] + return wide_columns, deep_columns +``` + +After that start training by the following command: + +```sh +python census_main.py +``` + +## Convert the Wide and Deep Model to IR + +Use the following command line to convert the saved model file with the checkpoint: + +```sh +python mo.py +--input_checkpoint checkpoint --input_meta_graph model.ckpt.meta +--input "IteratorGetNext:0[2], + IteratorGetNext:1[2], + IteratorGetNext:2[2], + IteratorGetNext:4[2], + IteratorGetNext:7[2], + linear/linear_model/linear_model/linear_model/education/to_sparse_input/indices:0[10 2]{i32}, + linear/linear_model/linear_model/linear_model/education/hash_table_Lookup/LookupTableFindV2:0[10]{i32}, + linear/linear_model/linear_model/linear_model/education/to_sparse_input/dense_shape:0[2]{i32}->[2 50], + linear/linear_model/linear_model/linear_model/marital_status/to_sparse_input/indices:0[10 2]{i32}, + linear/linear_model/linear_model/linear_model/marital_status/hash_table_Lookup/LookupTableFindV2:0[10]{i32}, + linear/linear_model/linear_model/linear_model/marital_status/to_sparse_input/dense_shape:0[2]{i32}->[2 50], + linear/l inear_model/linear_model/linear_model/relationship/to_sparse_input/indices:0[10 2]{i32}, + linear/linear_model/linear_model/linear_model/relationship/hash_table_Lookup/LookupTableFindV2:0[10]{i32}, + linear/linear_model/linear_model/linear_model/relationship/to_sparse_input/dense_shape:0[2]{i32}->[2 50], + linear/linear_model/linear_model/linear_model/workclass/to_sparse_input/indices:0[10 2]{i32}, + linear/linear_model/linear_model/linear_model/workclass/hash_table_Lookup/LookupTableFindV2:0[10]{i32}, + linear/linear_model/linear_model/linear_model/workclass/to_sparse_input/dense_shape:0[2]{i32}->[2 50], + dnn/input_from_feature_columns/input_layer/education_indicator/to_sparse_input/indices:0[10 2]{i32}, + dnn/input_from_feature_columns/input_layer/education_indicator/hash_table_Lookup/LookupTableFindV2:0[10]{i32}, + dnn/input_from_feature_columns/input_layer/education_indicator/to_sparse_input/dense_shape:0[2]{i32}->[2 50], + dnn/input_from_feature_columns/input_layer/marital_status_indicator/to_sparse_input/indices:0[10 2]{i32}, + dnn/input_from_feature_columns/input_layer/marital_status_indicator/hash_table_Lookup/LookupTableFindV2:0[10]{i32}, + dnn/input_from_feature_columns/input_layer/marital_status_indicator/to_sparse_input/dense_shape:0[2]{i32}->[2 50], + dnn/input_from_feature_columns/input_layer/relationship_indicator/to_sparse_input/indices:0[10 2]{i32}, + dnn/input_from_feature_columns/input_layer/relationship_indicator/hash_table_Lookup/LookupTableFindV2:0[10]{i32}, + dnn/input_from_feature_columns/input_layer/relationship_indicator/to_sparse_input/dense_shape:0[2]{i32}->[2 50], + dnn/input_from_feature_columns/input_layer/workclass_indicator/to_sparse_input/indices:0[10 2]{i32}, + dnn/input_from_feature_columns/input_layer/workclass_indicator/hash_table_Lookup/LookupTableFindV2:0[10]{i32}, + 
dnn/input_from_feature_columns/input_layer/workclass_indicator/to_sparse_input/dense_shape:0[2]{i32}->[2 50]" +--output head/predictions/probabilities +``` + +The model contains operations unsupported by the OpenVINO™ toolkit such as `IteratorGetNext` and `LookupTableFindV2`, so the Model Optimizer must prune these nodes. +The pruning is specified through `--input` option. The prunings for `IteratorGetNext:*` nodes correspond to numeric features. +The pruning for each categorical feature consists of three prunings for the following nodes: `*/to_sparse_input/indices:0`, `*/hash_table_Lookup/LookupTableFindV2:0`, and `*/to_sparse_input/dense_shape:0`. + +The above command line generates IR for a batch of two objects, with total number of actual categorical feature values equal to 10 and maximum size of sparse categorical feature for one object equal to 50. diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_XLNet_From_Tensorflow.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_XLNet_From_Tensorflow.md new file mode 100644 index 00000000000000..493f05ba8546ac --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_XLNet_From_Tensorflow.md @@ -0,0 +1,190 @@ +# Convert TensorFlow* XLNet Model to the Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_XLNet_From_Tensorflow} + +Pre-trained models for XLNet (Bidirectional Encoder Representations from Transformers) are +[publicly available](https://github.com/zihangdai/xlnet). + +## Supported Models + +Currently, the following models from the [pre-trained XLNet model list](https://github.com/zihangdai/xlnet#pre-trained-models) are supported: + +* **[`XLNet-Large, Cased`](https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip)** +* **[`XLNet-Base, Cased`](https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip)** + +## Download the Pre-Trained Base XLNet Model + +Download and unzip an archive with the [XLNet-Base, Cased](https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip). + +After the archive is unzipped, the directory `cased_L-12_H-768_A-12` is created and contains the following files: +* TensorFlow checkpoint (`xlnet_model.ckpt`) containing the pre-trained weights (which is actually 3 files) +* sentence piece model (`spiece.model`) used for (de)tokenization +* config file (`xlnet_config.json`) which specifies the hyperparameters of the model + +To get pb-file from the archive contents, you need to do the following. + +1. Run commands + +```sh + cd ~ + mkdir XLNet-Base + cd XLNet-Base + git clone https://github.com/zihangdai/xlnet + wget https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip + unzip cased_L-12_H-768_A-12.zip + mkdir try_save +``` + + + +2. 
Save and run the following script: + +```python +from collections import namedtuple + +import tensorflow as tf +from tensorflow.python.framework import graph_io + +import model_utils +import xlnet + +LENGTHS = 50 +BATCH = 1 +OUTPUT_DIR = '~/XLNet-Base/try_save/' +INIT_CKPT_PATH = '~/XLNet-Base/xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt' +XLNET_CONFIG_PATH = '~/XLNet-Base/xlnet_cased_L-12_H-768_A-12/xlnet_config.json' + +FLags = namedtuple('FLags', 'use_tpu init_checkpoint') +FLAGS = FLags(use_tpu=False, init_checkpoint=INIT_CKPT_PATH) + +xlnet_config = xlnet.XLNetConfig(json_path=XLNET_CONFIG_PATH) +run_config = xlnet.RunConfig(is_training=False, use_tpu=False, use_bfloat16=False, dropout=0.1, dropatt=0.1,) + + +sentence_features_input_idx = tf.placeholder(tf.int32, shape=[LENGTHS, BATCH], name='input_ids') +sentence_features_segment_ids = tf.placeholder(tf.int32, shape=[LENGTHS, BATCH], name='seg_ids') +sentence_features_input_mask = tf.placeholder(tf.float32, shape=[LENGTHS, BATCH], name='input_mask') + +with tf.Session() as sess: + xlnet_model = xlnet.XLNetModel(xlnet_config=xlnet_config, run_config=run_config, + input_ids=sentence_features_input_idx, + seg_ids=sentence_features_segment_ids, + input_mask=sentence_features_input_mask) + + sess.run(tf.global_variables_initializer()) + model_utils.init_from_checkpoint(FLAGS, True) + + # Save the variables to disk. + saver = tf.train.Saver() + + # Saving checkpoint + save_path = saver.save(sess, OUTPUT_DIR + "model.ckpt") + + # Freezing model + outputs = ['model/transformer/dropout_2/Identity'] + graph_def_freezed = tf.graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), outputs) + + # Saving non-frozen and frozen model to pb + graph_io.write_graph(sess.graph.as_graph_def(), OUTPUT_DIR, 'model.pb', as_text=False) + graph_io.write_graph(graph_def_freezed,OUTPUT_DIR, 'model_frozen.pb', + as_text=False) + + # Write to tensorboard + with tf.summary.FileWriter(logdir=OUTPUT_DIR, graph_def=graph_def_freezed) as writer: + writer.flush() +``` + +The script should save into `~/XLNet-Base/xlnet`. + +## Download the Pre-Trained Large XLNet Model + +Download and unzip an archive with the [XLNet-Large, Cased](https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip). + +After the archive is unzipped, the directory `cased_L-12_H-1024_A-16` is created and contains the following files: + +* TensorFlow checkpoint (`xlnet_model.ckpt`) containing the pre-trained weights (which is actually 3 files) +* sentence piece model (`spiece.model`) used for (de)tokenization +* config file (`xlnet_config.json`) which specifies the hyperparameters of the model + +To get pb-file from the archive contents, you need to do the following. + +1. Run commands + +```sh + cd ~ + mkdir XLNet-Large + cd XLNet-Large + git clone https://github.com/zihangdai/xlnet + wget https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip + unzip cased_L-24_H-1024_A-16.zip + mkdir try_save +``` + + + +2. 
Save and run the following script: + +```python +from collections import namedtuple + +import tensorflow as tf +from tensorflow.python.framework import graph_io + +import model_utils +import xlnet + +LENGTHS = 50 +BATCH = 1 +OUTPUT_DIR = '~/XLNet-Large/try_save' +INIT_CKPT_PATH = '~/XLNet-Large/cased_L-24_H-1024_A-16/xlnet_model.ckpt' +XLNET_CONFIG_PATH = '~/XLNet-Large/cased_L-24_H-1024_A-16/xlnet_config.json' + +FLags = namedtuple('FLags', 'use_tpu init_checkpoint') +FLAGS = FLags(use_tpu=False, init_checkpoint=INIT_CKPT_PATH) + +xlnet_config = xlnet.XLNetConfig(json_path=XLNET_CONFIG_PATH) +run_config = xlnet.RunConfig(is_training=False, use_tpu=False, use_bfloat16=False, dropout=0.1, dropatt=0.1,) + + +sentence_features_input_idx = tf.placeholder(tf.int32, shape=[LENGTHS, BATCH], name='input_ids') +sentence_features_segment_ids = tf.placeholder(tf.int32, shape=[LENGTHS, BATCH], name='seg_ids') +sentence_features_input_mask = tf.placeholder(tf.float32, shape=[LENGTHS, BATCH], name='input_mask') + +with tf.Session() as sess: + xlnet_model = xlnet.XLNetModel(xlnet_config=xlnet_config, run_config=run_config, + input_ids=sentence_features_input_idx, + seg_ids=sentence_features_segment_ids, + input_mask=sentence_features_input_mask) + + sess.run(tf.global_variables_initializer()) + model_utils.init_from_checkpoint(FLAGS, True) + + # Save the variables to disk. + saver = tf.train.Saver() + + # Saving checkpoint + save_path = saver.save(sess, OUTPUT_DIR + "model.ckpt") + + # Freezing model + outputs = ['model/transformer/dropout_2/Identity'] + graph_def_freezed = tf.graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), outputs) + + # Saving non-frozen and frozen model to pb + graph_io.write_graph(sess.graph.as_graph_def(), OUTPUT_DIR, 'model.pb', as_text=False) + graph_io.write_graph(graph_def_freezed,OUTPUT_DIR, 'model_frozen.pb', + as_text=False) + + # Write to tensorboard + with tf.summary.FileWriter(logdir=OUTPUT_DIR, graph_def=graph_def_freezed) as writer: + writer.flush() +``` + +The script should save into `~/XLNet-Large/xlnet`. + + + +## Convert frozen TensorFlow XLNet Model to IR + +To generate the XLNet Intermediate Representation (IR) of the model, run the Model Optimizer with the following parameters: +```sh +python3 mo.py --input_model path-to-model/model_frozen.pb --input "input_mask[50 1],input_ids[50 1],seg_ids[50 1]" --log_level DEBUG --disable_nhwc_to_nchw +``` + diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_YOLO_From_Tensorflow.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_YOLO_From_Tensorflow.md new file mode 100644 index 00000000000000..b7288322441692 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_YOLO_From_Tensorflow.md @@ -0,0 +1,176 @@ +# Converting YOLO* Models to the Intermediate Representation (IR) {#openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_YOLO_From_Tensorflow} + +This tutorial explains how to convert real-time object detection YOLOv1\*, YOLOv2\*, and YOLOv3\* public models to the Intermediate Representation (IR). All YOLO\* models are originally implemented in the DarkNet\* framework and consist of two files: +* `.cfg` file with model configurations +* `.weights` file with model weights + +Depending on a YOLO model version, the Model Optimizer converts it differently: + +- YOLOv3 has several implementations. This tutorial uses a TensorFlow implementation of YOLOv3 model, which can be directly converted to the IR. 
+- YOLOv1 and YOLOv2 models must be first converted to TensorFlow\* using DarkFlow\*. + +## Convert YOLOv3 Model to IR + +On GitHub*, you can find several public versions of TensorFlow YOLOv3 model implementation. This tutorial explains how to convert YOLOv3 model from +the [https://github.com/mystic123/tensorflow-yolo-v3](https://github.com/mystic123/tensorflow-yolo-v3) repository (commit ed60b90) to IR , but the process is similar for other versions of TensorFlow YOLOv3 model. + +### Overview of YOLOv3 Model Architecture +Originally, YOLOv3 model includes feature extractor called `Darknet-53` with three branches at the end that make detections at three different scales. These branches must end with the YOLO `Region` layer. + +`Region` layer was first introduced in the DarkNet framework. Other frameworks, including TensorFlow, do not have the +`Region` implemented as a single layer, so every author of public YOLOv3 model creates it using +simple layers. This badly affects performance. For this reason, the main idea of YOLOv3 model conversion to IR is to cut off these +custom `Region`-like parts of the model and complete the model with the `Region` layers where required. + +### Dump YOLOv3 TensorFlow\* Model +To dump TensorFlow model out of [https://github.com/mystic123/tensorflow-yolo-v3](https://github.com/mystic123/tensorflow-yolo-v3) GitHub repository (commit ed60b90), follow the instructions below: + +1. Clone the repository:
```sh
git clone https://github.com/mystic123/tensorflow-yolo-v3.git
cd tensorflow-yolo-v3
```
2. (Optional) Check out the commit that the conversion was tested on:
```sh
git checkout ed60b90
```
3. Download the [coco.names](https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names) file from the DarkNet website **OR** use labels that fit your task.
4. Download the [yolov3.weights](https://pjreddie.com/media/files/yolov3.weights) (for the YOLOv3 model) or [yolov3-tiny.weights](https://pjreddie.com/media/files/yolov3-tiny.weights) (for the YOLOv3-tiny model) file **OR** use your own pretrained weights with the same structure.
5. Run the converter:
- for YOLOv3:
```sh
python3 convert_weights_pb.py --class_names coco.names --data_format NHWC --weights_file yolov3.weights
```
- for YOLOv3-tiny:
```sh
python3 convert_weights_pb.py --class_names coco.names --data_format NHWC --weights_file yolov3-tiny.weights --tiny
```

If you have YOLOv3 weights trained for an input image with a size different from 416 (320, 608, or your own size), provide the `--size` key with your image size when running the converter. For example, run the following command for an image of size 608:
```sh
python3 convert_weights_pb.py --class_names coco.names --data_format NHWC --weights_file yolov3_608.weights --size 608
```

### Convert YOLOv3 TensorFlow Model to the IR

To solve the problems explained in the YOLOv3 architecture overview section, use the `yolo_v3.json` or `yolo_v3_tiny.json` (depending on the model) configuration file with custom operations located in the `/deployment_tools/model_optimizer/extensions/front/tf` directory.

The configuration file consists of several attributes:
```sh
[
  {
    "id": "TFYOLOV3",
    "match_kind": "general",
    "custom_attributes": {
      "classes": 80,
      "anchors": [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326],
      "coords": 4,
      "num": 9,
      "masks":[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
      "entry_points": ["detector/yolo-v3/Reshape", "detector/yolo-v3/Reshape_4", "detector/yolo-v3/Reshape_8"]
    }
  }
]
```
where:
- `id` and `match_kind` are parameters that you cannot change.
- `custom_attributes` is a parameter that stores all the YOLOv3 specific attributes:
    - `classes`, `coords`, `num`, and `masks` are attributes that you should copy from the configuration file that was used for model training. If you used the officially shared DarkNet weights, you can use the `yolov3.cfg` or `yolov3-tiny.cfg` configuration file from https://github.com/pjreddie/darknet/tree/master/cfg. Replace the default values in `custom_attributes` with the parameters that follow the `[yolo]` titles in the configuration file.
    - `anchors` is an optional parameter that is not used during inference of the model, but it is used in a demo to parse the `Region` layer output.
    - `entry_points` is a list of node names used to cut off the model and append the `Region` layer with the custom attributes specified above.


To generate the IR of the YOLOv3 TensorFlow model, run:
+```sh +python3 mo_tf.py +--input_model /path/to/yolo_v3.pb +--tensorflow_use_custom_operations_config $MO_ROOT/extensions/front/tf/yolo_v3.json +--batch 1 +``` + +To generate the IR of the YOLOv3-tiny TensorFlow model, run:
```sh
python3 mo_tf.py
--input_model /path/to/yolo_v3_tiny.pb
--tensorflow_use_custom_operations_config $MO_ROOT/extensions/front/tf/yolo_v3_tiny.json
--batch 1
```

where:

* `--batch` defines the shape of the model input. In the example, `--batch` is equal to 1, but you can also specify other integers larger than 1.
* `--tensorflow_use_custom_operations_config` adds missing `Region` layers to the model. In the IR, the `Region` layer has the name `RegionYolo`.

> **NOTE:** The color channel order (RGB or BGR) of the input data should match the channel order of the model training dataset. If they are different, perform the `RGB<->BGR` conversion specifying the command-line parameter: `--reverse_input_channels`. Otherwise, inference results may be incorrect. For more information about the parameter, refer to the **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../Converting_Model_General.md).

OpenVINO™ toolkit provides a demo that uses the YOLOv3 model. For more information, refer to [Object Detection YOLO* V3 Demo, Async API Performance Showcase](@ref omz_demos_object_detection_demo_yolov3_async_README).

## Convert YOLOv1 and YOLOv2 Models to the IR

Before converting, choose a YOLOv1 or YOLOv2 model version that best suits your task and download the model configuration file and the corresponding weight file:
* From the [DarkFlow repository](https://github.com/thtrieu/darkflow): configuration files are stored in the `cfg` directory, links to weight files are given in the `README.md` file. The files from this repository are adapted for conversion to TensorFlow using DarkFlow.
* From the DarkNet website and repository: configuration files are stored in the `cfg` directory of the [repository](https://github.com/pjreddie/darknet), links to weight files are given on the [YOLOv1](https://pjreddie.com/darknet/yolov1/) and [YOLOv2](https://pjreddie.com/darknet/yolov2/) websites.

To convert DarkNet YOLOv1 and YOLOv2 models to the IR, follow these steps:

1. Install DarkFlow
2. Convert the DarkNet YOLOv1 or YOLOv2 model to TensorFlow using DarkFlow
3. Convert the TensorFlow YOLOv1 or YOLOv2 model to the IR

#### Install DarkFlow*

You need DarkFlow to convert YOLOv1 and YOLOv2 models to TensorFlow. To install DarkFlow:
1. Install DarkFlow [required dependencies](https://github.com/thtrieu/darkflow#dependencies).
2. Clone the DarkFlow git repository:
+```sh +git clone https://github.com/thtrieu/darkflow.git +``` +3. Go to the root directory of the cloned repository:
```sh
cd darkflow
```
4. Install DarkFlow using the instructions from the `README.md` file in the [DarkFlow repository](https://github.com/thtrieu/darkflow/blob/master/README.md#getting-started).

#### Convert DarkNet\* YOLOv1 or YOLOv2 Model to TensorFlow\*

To convert a YOLOv1 or YOLOv2 model to TensorFlow, go to the root directory of the cloned DarkFlow repository and run the following command:
```sh
python3 ./flow --model /.cfg --load /.weights --savepb
```

If the model was successfully converted, you can find the `.meta` and `.pb` files in the `built_graph` subdirectory of the cloned DarkFlow repository.

The `.pb` file is a TensorFlow representation of the YOLO model.

#### Convert TensorFlow YOLOv1 or YOLOv2 Model to the IR

The converted TensorFlow YOLO model is missing the `Region` layer and its parameters. The original YOLO `Region` layer parameters are stored in the configuration `/.cfg` file under the `[region]` title.

To recreate the original model structure, use the corresponding yolo `.json` configuration file with custom operations and `Region` layer parameters when converting the model to the IR. This file is located in the `/deployment_tools/model_optimizer/extensions/front/tf` directory.

If the chosen model has specific values of these parameters, create another configuration file with custom operations and use it for conversion.

To generate the IR of the YOLOv1 model, provide the TensorFlow YOLOv1 or YOLOv2 model to the Model Optimizer with the following parameters:
+```sh +python3 ./mo_tf.py +--input_model /.pb \ +--batch 1 \ +--scale 255 \ +--tensorflow_use_custom_operations_config /deployment_tools/model_optimizer/extensions/front/tf/.json +``` +where: + +* `--batch` defines shape of model input. In the example, `--batch` is equal to 1, but you can also specify other integers larger than 1. +* `--scale` specifies scale factor that input values will be divided by. +The model was trained with input values in the range `[0,1]`. OpenVINO™ toolkit samples read input images as values in `[0,255]` range, so the scale 255 must be applied. +* `--tensorflow_use_custom_operations_config` adds missing `Region` layers to the model. In the IR, the `Region` layer has name `RegionYolo`. +For other applicable parameters, refer to [Convert Model from TensorFlow](../Convert_Model_From_TensorFlow.md). + +> **NOTE:** The color channel order (RGB or BGR) of an input data should match the channel order of the model training dataset. If they are different, perform the `RGB<->BGR` conversion specifying the command-line parameter: `--reverse_input_channels`. Otherwise, inference results may be incorrect. For more information about the parameter, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../Converting_Model_General.md). diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_lm_1b_From_Tensorflow.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_lm_1b_From_Tensorflow.md new file mode 100644 index 00000000000000..21fb9ff3278f56 --- /dev/null +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_lm_1b_From_Tensorflow.md @@ -0,0 +1,101 @@ +# Converting TensorFlow* Language Model on One Billion Word Benchmark to the Intermediate Representation {#openvino_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_lm_1b_From_Tensorflow} + +## Download the Pre-trained Language Model on One Billion Word Benchmark + +TensorFlow* provides [a pre-trained Language Model on One Billion Word Benchmark](https://github.com/tensorflow/models/tree/master/research/lm_1b). + +To download the model for IR conversion, please follow the instruction: +1. Create new directory to store the model: +```shell +mkdir lm_1b +``` +2. Go to the `lm_1b` directory: +```shell +cd lm_1b +``` +3. Download the model GraphDef file: +``` +wget http://download.tensorflow.org/models/LM_LSTM_CNN/graph-2016-09-10.pbtxt +``` +4. Create new directory to store 12 checkpoint shared files: +```shell +mkdir ckpt +``` +5. Go to the `ckpt` directory: +```shell +cd ckpt +``` +6. 
Download 12 checkpoint shared files: +``` +wget http://download.tensorflow.org/models/LM_LSTM_CNN/all_shards-2016-09-10/ckpt-base +wget http://download.tensorflow.org/models/LM_LSTM_CNN/all_shards-2016-09-10/ckpt-char-embedding +wget http://download.tensorflow.org/models/LM_LSTM_CNN/all_shards-2016-09-10/ckpt-lstm +wget http://download.tensorflow.org/models/LM_LSTM_CNN/all_shards-2016-09-10/ckpt-softmax0 +wget http://download.tensorflow.org/models/LM_LSTM_CNN/all_shards-2016-09-10/ckpt-softmax1 +wget http://download.tensorflow.org/models/LM_LSTM_CNN/all_shards-2016-09-10/ckpt-softmax2 +wget http://download.tensorflow.org/models/LM_LSTM_CNN/all_shards-2016-09-10/ckpt-softmax3 +wget http://download.tensorflow.org/models/LM_LSTM_CNN/all_shards-2016-09-10/ckpt-softmax4 +wget http://download.tensorflow.org/models/LM_LSTM_CNN/all_shards-2016-09-10/ckpt-softmax5 +wget http://download.tensorflow.org/models/LM_LSTM_CNN/all_shards-2016-09-10/ckpt-softmax6 +wget http://download.tensorflow.org/models/LM_LSTM_CNN/all_shards-2016-09-10/ckpt-softmax7 +wget http://download.tensorflow.org/models/LM_LSTM_CNN/all_shards-2016-09-10/ckpt-softmax8 +``` + +After you download the pre-trained model files, you will have the `lm_1b` directory with the following hierarchy: + +``` +lm_1b/ + graph-2016-09-10.pbtxt + ckpt/ + ckpt-base + ckpt-char-embedding + ckpt-lstm + ckpt-softmax0 + ckpt-softmax1 + ckpt-softmax2 + ckpt-softmax3 + ckpt-softmax4 + ckpt-softmax5 + ckpt-softmax6 + ckpt-softmax7 + ckpt-softmax8 +``` + + +![lm_1b model view](../../../img/lm_1b.png) + +As you can see, the frozen model still has two variables: `Variable` and `Variable_1`. +It means that the model keeps training those variables at each inference. + +At the first inference of this graph, the variables are initialized by initial values. +After executing the `lstm` nodes, results of execution are assigned to these two variables. + +With each inference of the `lm_1b` graph, `lstm` initial states data is taken from previous inference +from variables and states of current inference of `lstm` is reassigned to the same variables. + +It helps the model to remember the context of the words that it takes as input. + +## Convert TensorFlow Language Model on One Billion Word Benchmark to IR + +The Model Optimizer assumes that output model is for inference only. +That is why you should cut those variables off and resolve keeping cell and hidden states on application level. + +There is a certain limitations for the model conversion: +- Original model cannot be reshaped, so you should keep original shapes. + +To generate the `lm_1b` Intermediate Representation (IR), provide TensorFlow `lm_1b` model to the +Model Optimizer with parameters: +```sh +python3 ./mo_tf.py +--input_model lm_1b/graph-2016-09-10.pbtxt \ +--input_checkpoint lm_1b/ckpt \ +--input_model_is_text \ +--input_shape [50],[50],[1,9216],[1,9216] \ +--output softmax_out,lstm/lstm_0/concat_2,lstm/lstm_1/concat_2 \ +--input char_embedding/EmbeddingLookupUnique/Unique:0,char_embedding/EmbeddingLookupUnique/Unique:1,Variable/read,Variable_1/read +``` + +Where: +* `--input char_embedding/EmbeddingLookupUnique/Unique:0,char_embedding/EmbeddingLookupUnique/Unique:1,Variable/read,Variable_1/read` + and `--input_shape [50],[50],[1,9216],[1,9216]` replace the variables with a placeholder +* `--output softmax_out,lstm/lstm_0/concat_2,lstm/lstm_1/concat_2` specifies output node name and names of LSTM cell states. 
diff --git a/docs/MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md b/docs/MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md new file mode 100644 index 00000000000000..2eb6b1717a58f5 --- /dev/null +++ b/docs/MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md @@ -0,0 +1,82 @@ +# Custom Layers in the Model Optimizer {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer} + +Model Optimizer searches for each layer of the input model in the list of known layers before building the model's internal representation, optimizing the model, and producing the Intermediate Representation. + +The list of known layers is different for each of supported frameworks. To see the layers supported by your framework, refer to the [corresponding section](../Supported_Frameworks_Layers.md). + +Custom layers are layers that are not included into a list of known layers. If your topology contains any layers that are not in the list of known layers, the Model Optimizer classifies them as custom. + +## Caffe\* Models with Custom Layers + +You have two options if your Caffe\* model has custom layers: + +* **Register the custom layers as extensions to the Model Optimizer**. For instructions, see [Extending Model Optimizer with New Primitives](Extending_Model_Optimizer_with_New_Primitives.md). When your custom layers are registered as extensions, the Model Optimizer generates a valid and optimized Intermediate Representation. You only need to write a small chunk of Python\* code that lets the Model Optimizer: + + * Generate a valid Intermediate Representation according to the rules you specified + * Be independent from the availability of Caffe on your computer + +* **Register the custom layers as Custom and use the system Caffe to calculate the output shape of each Custom Layer**, which is required by the Intermediate Representation format. For this method, the Model Optimizer requires the Caffe Python interface on your system. When registering the custom layer in the `CustomLayersMapping.xml` file, you can specify if layer parameters should appear in Intermediate Representation or if they should be skipped. To read more about the expected format and general structure of this file, see [Legacy Mode for Caffe* Custom Layers](Legacy_Mode_for_Caffe_Custom_Layers.md). This approach has several limitations: + + * If your layer output shape depends on dynamic parameters, input data or previous layers parameters, calculation of output shape of the layer via Caffe can be incorrect. In this case, you need to patch Caffe on your own. + + * If the calculation of output shape of the layer via Caffe fails inside the framework, Model Optimizer is unable to produce any correct Intermediate Representation and you also need to investigate the issue in the implementation of layers in the Caffe and patch it. + + * You are not able to produce Intermediate Representation on any machine that does not have Caffe installed. If you want to use Model Optimizer on multiple machines, your topology contains Custom Layers and you use `CustomLayersMapping.xml` to fallback on Caffe, you need to configure Caffe on each new machine. + + For these reasons, it is best to use the Model Optimizer extensions for Custom Layers: you do not depend on the framework and fully control the workflow. + +If your model contains Custom Layers, it is important to understand the internal workflow of Model Optimizer. Consider the following example. 
+ +**Example**: + +The network has: + +* One input layer (#1) +* One output Layer (#5) +* Three internal layers (#2, 3, 4) + +The custom and standard layer types are: + +* Layers #2 and #5 are implemented as Model Optimizer extensions. +* Layers #1 and #4 are supported in Model Optimizer out-of-the box. +* Layer #3 is neither in the list of supported layers nor in extensions, but is specified in CustomLayersMapping.xml. + +> **NOTE**: If any of the layers are not in one of three categories described above, the Model Optimizer fails with an appropriate message and a link to the corresponding question in [Model Optimizer FAQ](../Model_Optimizer_FAQ.md). + +The general process is as shown: + +![Example custom layer network](../../img/mo_caffe_priorities.png) + +1. The example model is fed to the Model Optimizer that **loads the model** with the special parser, built on top of `caffe.proto` file. In case of failure, Model Optimizer asks you to prepare the parser that can read the model. For more information, refer to Model Optimizer, FAQ #1. + +2. Model Optimizer **extracts the attributes of all layers**. In particular, it goes through the list of layers and attempts to find the appropriate extractor. In order of priority, Model Optimizer checks if the layer is: + + * Registered in `CustomLayersMapping.xml` + * Registered as a Model Optimizer extension + * Registered as a standard Model Optimizer layer + + When the Model Optimizer finds a satisfying condition from the list above, it extracts the attributes according to the following rules: + + * For bullet #1 - either takes all parameters or no parameters, according to the content of `CustomLayersMapping.xml` + * For bullet #2 - takes only the parameters specified in the extension + * For bullet #3 - takes only the parameters specified in the standard extractor + +3. Model Optimizer **calculates the output shape of all layers**. The logic is the same as it is for the priorities. **Important:** the Model Optimizer always takes the first available option. + +4. Model Optimizer **optimizes the original model and produces the Intermediate Representation**. + +## TensorFlow\* Models with Custom Layers + +You have two options for TensorFlow\* models with custom layers: + +* **Register those layers as extensions to the Model Optimizer.** In this case, the Model Optimizer generates a valid and optimized Intermediate Representation. +* **If you have sub-graphs that should not be expressed with the analogous sub-graph in the Intermediate Representation, but another sub-graph should appear in the model, the Model Optimizer provides such an option.** This feature is helpful for many TensorFlow models. To read more, see [Sub-graph Replacement in the Model Optimizer](Subgraph_Replacement_Model_Optimizer.md). + +## MXNet\* Models with Custom Layers + +There are two options to convert your MXNet* model that contains custom layers: + +1. Register the custom layers as extensions to the Model Optimizer. For instructions, see [Extending MXNet Model Optimizer with New Primitives](Extending_MXNet_Model_Optimizer_with_New_Primitives.md). When your custom layers are registered as extensions, the Model Optimizer generates a valid and optimized Intermediate Representation. You can create Model Optimizer extensions for both MXNet layers with op `Custom` and layers which are not standard MXNet layers. + +2. 
If you have sub-graphs that should not be expressed with the analogous sub-graph in the Intermediate Representation, but another sub-graph should appear in the model, the Model Optimizer provides such an option. In MXNet, this approach is actively used for SSD models: it finds the necessary sub-graph sequences and replaces them. To read more, see [Sub-graph Replacement in the Model Optimizer](Subgraph_Replacement_Model_Optimizer.md).

diff --git a/docs/MO_DG/prepare_model/customize_model_optimizer/Extending_MXNet_Model_Optimizer_with_New_Primitives.md b/docs/MO_DG/prepare_model/customize_model_optimizer/Extending_MXNet_Model_Optimizer_with_New_Primitives.md new file mode 100644 index 00000000000000..4203a1f74114de --- /dev/null +++ b/docs/MO_DG/prepare_model/customize_model_optimizer/Extending_MXNet_Model_Optimizer_with_New_Primitives.md @@ -0,0 +1,45 @@

# Extending the MXNet Model Optimizer with New Primitives {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_MXNet_Model_Optimizer_with_New_Primitives}

This section describes how you can create a Model Optimizer extension for a custom layer from your MXNet* model. It supplements the main document [Extending Model Optimizer with New Primitives](Extending_Model_Optimizer_with_New_Primitives.md) and provides a step-by-step procedure. To create an extension for a particular layer, perform the following steps:

1. Create the file `custom_proposal_ext.py` in the folder `/deployment_tools/model_optimizer/extensions/front/mxnet`.
If your MXNet layer has op `Custom`, create the `CustomProposalFrontExtractor` class inherited from `MXNetCustomFrontExtractorOp`:
```py
from mo.front.extractor import MXNetCustomFrontExtractorOp
class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp):
    pass
```
Otherwise, for layers that are not standard MXNet layers, create the `ProposalFrontExtractor` class inherited from `FrontExtractorOp`:
```py
from mo.front.extractor import FrontExtractorOp
class ProposalFrontExtractor(FrontExtractorOp):
    pass
```
2. Specify the operation that the extractor refers to and a specific flag. The flag represents whether the operation should be used by the Model Optimizer or should be excluded from processing:
```py
from mo.front.extractor import MXNetCustomFrontExtractorOp
class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp):
    op = '_contrib_Proposal'
    enabled = True
```
3. 
Register a mapping rule between the original model and the `PythonProposalOp` attributes by overriding the following function:
+```py
+from mo.front.mxnet.extractors.utils import get_mxnet_layer_attrs
+from mo.front.extractor import MXNetCustomFrontExtractorOp
+from mo.ops.op import Op
+
+class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp):
+    op = '_contrib_Proposal'
+    enabled = True
+    @staticmethod
+    def extract(node):
+        attrs = get_mxnet_layer_attrs(node.symbol_dict)
+        node_attrs = {
+            'feat_stride': attrs.float('feat_stride', 16)
+        }
+
+        # update the attributes of the node
+        Op.get_op_class_by_name('Proposal').update_node_stat(node, node_attrs) # <------ here goes the name ('Proposal') of the Operation that was implemented before
+        return __class__.enabled
+```
+
diff --git a/docs/MO_DG/prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md b/docs/MO_DG/prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md
new file mode 100644
index 00000000000000..8625f406078861
--- /dev/null
+++ b/docs/MO_DG/prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md
@@ -0,0 +1,476 @@
+# Extending the Model Optimizer with New Primitives {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_with_New_Primitives}
+
+This section explains how to register a custom layer in the Model Optimizer, including how to register Proposal as a custom layer. This section also demonstrates how `Proposal` works as a custom layer.
+
+Model Optimizer loads the model, goes through the topology, and tries to find each layer type in the list of known layers. If the Model Optimizer does not find a layer in that list, it looks for the layer in the list of custom layers. If the Model Optimizer fails to find the layer among the defined custom layers, it registers a Caffe\* fallback for the output shape inference. If the Model Optimizer does not find Caffe and cannot infer shapes, the Model Optimizer fails with an appropriate message.
+
+You must know two things about custom layers with the Model Optimizer:
+
+* How to map a subgraph in a framework model to a subgraph consisting of Inference Engine layers. For Caffe, the subgraph is a 1-to-1 mapping of a Caffe layer to an Inference Engine layer.
+* How to infer shapes for unknown subgraphs. This can be either for a step in which the internal representation consists of framework-specific layers, or for a step in which the internal representation consists of Inference Engine layers.
+
+You also have the option of a framework fallback for unknown subgraphs, for when the original framework is used for inference of output shapes of operations. The example below demonstrates the case in which the framework is not available or should not be used.
+
+## Preparing an Example Topology
+
+> **NOTE**: Skip this section if you have a topology with a layer that is not known to the Model Optimizer.
+
+The information in this section prepares a Caffe\* model with the provided, deployment-ready `prototxt` for a
+well-known topology called
+[Faster-R-CNN prototxt](https://raw.githubusercontent.com/rbgirshick/py-faster-rcnn/master/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt)
+to demonstrate the workflow. To use this example, you must have
+[weights and biases](http://dl.dropboxusercontent.com/s/o6ii098bu51d139/faster_rcnn_models.tgz?dl=0) for inference,
+because `prototxt` just describes the structure of the topology.
+
+1. 
Download the `.caffemodel` and `.prototxt` files
+2. Run the Model Optimizer on the `.caffemodel` and `.prototxt` files:
+```shell
+python mo.py --input_model VGG16_faster_rcnn_final.caffemodel --input_proto test.prototxt
+```
+You will likely see the error message:
+```shell
+Error parsing text-format caffe.NetParameter: 196:16: Message type "caffe.DropoutParameter" has no field named "scale_train".
+```
+Whether you see the error depends on your Caffe version. For example, BVLC Caffe does not support the boolean parameter `scale_train` for the `dropout` layer. The error message does not matter, because the dropout layer is needed only for training, and the Model Optimizer removes it.
+3. To proceed, comment out these lines in `test.prototxt`:
+```sh
+...
+layer {
+  name: "drop6"
+  type: "Dropout"
+  bottom: "fc6"
+  top: "fc6"
+  dropout_param {
+    dropout_ratio: 0.5
+    # scale_train: false # <-------------- comment out this line
+  }
+}
+...
+layer {
+  name: "drop7"
+  type: "Dropout"
+  bottom: "fc7"
+  top: "fc7"
+  dropout_param {
+    dropout_ratio: 0.5
+    # scale_train: false # <-------------- comment out this line
+  }
+}
+...
+```
+4. Run the Model Optimizer on this model again:
+```shell
+python mo.py --input_model VGG16_faster_rcnn_final.caffemodel --input_proto test.prototxt
+```
+   You get the model successfully converted to the Intermediate Representation, and you can infer it with the Inference Engine.
+
+   However, the aim of this tutorial is to demonstrate how to support custom layers that are not yet supported by the Model Optimizer.
+   If you want to better understand how the Model Optimizer works, remove the extension for the `Proposal` layer and follow all steps of this tutorial.
+
+5. Remove the extension for the `Proposal` layer:
+```sh
+mkdir extensions/old
+mv extensions/front/caffe/proposal_python_ext.py extensions/old/proposal_python_ext_old.py
+mv extensions/ops/proposal_python_example.py extensions/old/proposal_python__example_old.py
+```
+6. Now you can run the Model Optimizer on this model once again:
+```sh
+python mo.py --input_model VGG16_faster_rcnn_final.caffemodel --input_proto test.prototxt
+```
+You will see the message:
+```shell
+[ ERROR ] Found custom layer proposal. Model Optimizer does not support this layer.
+Please, register it in CustomLayersMapping.xml or implement extension.
+For more information please refer to Model Optimizer FAQ, question #FAQ45.
+```
+This message means the Model Optimizer can load the model, but is unable to infer the shape and handle the custom layer properties.
+
+## Registering a Custom Layer as a Model Optimizer Extension
+
+In the following sections, you will learn how to make the Model Optimizer independent from Caffe\* when processing a
+model that has a custom layer. In this example, the custom layer is referred to as the Proposal layer.
+
+Use this section to implement the mapping rules for the `Proposal` layer attributes and the output shape calculation. As part of these steps, you must first create a class for the `Proposal` layer and inherit it from the general-purpose `Op` class that defines the interface of every new custom layer.
+
+In this section, it is important to understand the `Op` class and its function. The implementation of this class shows that it expects a graph and attributes to be passed when initializing. The `Op` class itself is implemented in `/deployment_tools/model_optimizer/mo/ops/op.py`.
+
+`Op` keeps the attributes for each operation and contains logic for handling node creation for internal model representation. 
`Op` is responsible for dumping each particular operation to the `.xml` format for the Intermediate Representation. By inheriting from it, the technical items are complete and you concentrate on the specificity of this layer: the attributes it supports and the rules on computing its output shape. + +Follow these steps: + +1. Create the file `python_proposal.py` in the directory `/deployment_tools/model_optimizer/extensions/ops`: +```python +from mo.ops.op import Op +class PythonProposalOp(Op): + pass +``` +2. Define the name of the operation and make a stub constructor: +```python +from mo.ops.op import Op +class PythonProposalOp(Op): + op = 'Proposal' + def __init__(self, graph, attrs): + super().__init__(graph) +``` +3. Every `Op` must have three specific fields defined: `type`, `op`, and `infer`. In most cases, the `type` and `op` names are the same, and `infer` is defined as a function to compute the output shape. Reflect these fields in your constructor: +```python +from mo.ops.op import Op +class PythonProposalOp(Op): + op = 'Proposal' + def __init__(self, graph, attrs): + mandatory_props = { + 'type': __class__.op, + 'op': __class__.op, + 'infer': None + } + super().__init__(graph, mandatory_props, attrs) +``` + According to the Intermediate Representation catalog, Proposal layer has the following attributes: + + * `pre_nms_topn` + * `post_nms_topn` + * `nms_thresh` + * `feat_stride` + * `min_size` + * `base_size` + * `ratio` + * `scale` +4. In defining supported attribute names, it is best to use the same names as in the original models. The names are similar to parameters and have no connection with the model layer properties. For clarity, you can use the name `my_ratio` for `ratio`. Other than defining the list of supported parameters, you can define only the parameters that appear in the Intermediate Representation in the `backend_attrs` method. + Define your attributes: +```python +class PythonProposalOp(Op): + # ... constructor + def supported_attrs(self): + return [ + 'pre_nms_topn', + 'post_nms_topn', + 'nms_thresh', + 'feat_stride', + 'min_size', + 'base_size', + 'ratio', + 'scale' + ] +``` +5. Model Optimizer now knows how to create the layer called Proposal when it is in the topology and what attributes this layer has. However, the Model Optimizer does not know how to calculate the output shape of this operation. Define a rule to calculate the output shape: +```python +import numpy as np +from mo.graph.graph import Node +from mo.ops.op import Op +class PythonProposalOp(Op): + def __init__(self, graph, attrs): + mandatory_props = { + 'type': __class__.op, + 'op': __class__.op, + 'infer': PythonProposalOp.calculate_output_shape + } + super().__init__(graph, mandatory_props, attrs) + # ... supported attrs + @staticmethod + def calculate_output_shape(node: Node): + node.out_node().shape = (1, 1, 1, 1) # any Proposal now has always the same output +``` +6. According to the Intermediate Representation catalog, Proposal layer has the following output calculation formula, where shape dynamically depends on the `post_nms_topn` parameter. + Implement the output calculation formula in Python\*: +```python +import numpy as np +class PythonProposalOp(Op): + # ... static fields + # ... constructor + # ... 
supported attrs + @staticmethod + def calculate_output_shape(node: Node): + input_shape = node.in_node(0).shape + out_shape = np.array([0, 0], dtype=np.int64) + # rois blob: holds R regions of interest, each is a 5 - tuple + # (n, x1, y1, x2, y2) specifying an image batch index n and a + # rectangle(x1, y1, x2, y2) + out_shape[0] = input_shape[0] * node.post_nms_topn + out_shape[1] = 5 + node.out_node(0).shape = out_shape +``` + The node does not contain this parameter because it should be initialized in the constructor and in other parameters. The Inference Engine contains the implementation of a Caffe\*-like Proposal layer and works well with the default values from `caffe.proto`: +``` +// Message that stores parameters used by ProposalLayer message ProposalParameter { optional uint32 feat_stride = 1 [default = 16]; optional uint32 base_size = 2 [default = 16]; optional uint32 min_size = 3 [default = 16]; repeated float ratio = 4; repeated float scale = 5; optional uint32 pre_nms_topn = 6 [default = 6000]; optional uint32 post_nms_topn = 7 [default = 300]; optional float nms_thresh = 8 [default = 0.7]; } +``` +7. Change the constructor as follows: +```python +class PythonProposalOp(Op): + # ... static fields + def __init__(self, graph, attrs): + mandatory_props = { + 'type': __class__.op, + 'op': __class__.op, + 'feat_stride': 16, + 'base_size': 16, + 'min_size': 16, + 'ratio': [0.5, 1, 2], + 'scale': [8, 16, 32], + 'pre_nms_topn': 6000, + 'post_nms_topn': 300, + 'nms_thresh': 0.7, + 'infer': PythonProposalOp.calculate_output_shape + } + super().__init__(graph, mandatory_props, attrs) + # ... supported attrs + # ... calculate output shape + +``` + +It is mandatory to call two functions right after the implementation of that class: + +``` +class ProposalPythonOp(Op): + ... + +register_caffe_python_extractor(ProposalPythonOp, 'rpn.proposal_layer.ProposalLayer') +Op.excluded_classes.append(ProposalPythonOp) +``` + +Note that the first call register_caffe_python_extractor(ProposalPythonOp, 'rpn.proposal_layer.ProposalLayer') registers the extension of the layer in the Model Optimizer that will be found by a specific name (it is mandatory to join module name and layer name): 'rpn.proposal_layer.ProposalLayer'. + +The second call prevents the Model Optimizer from using this extension as if it is an extension for a layer with type `Proposal`. Otherwise, this layer can be chosen as an implementation of extension that can lead to potential issues. + +**Summary** + +In this section you implemented support for a custom layer with type `Python` that is `Proposal` layer in the topology. You learned how to calculate output shape of this layer. + +The values of attributes are hardcoded, and in the next section you will learn how to extract these values from original framework model (Caffe model in this case). + +## Registering Rules to Pass Extension Layer Properties from a Caffe\* Model to the Intermediate Representation + +Model Optimizer now knows how to set the shape of the `PythonProposalOp` operation, but it is incorrect to initialize attributes with same values for every operation. Instead, the values should be extracted from the original topology. Model Optimizer does not know how to map the custom layer properties to the `PythonProposalOp`. For this, you must register the `FrontExtractorOp` instance. + +> **NOTE**: This step is required only if the layer requires parameters from the original model. + +1. 
Remove call functions `register_caffe_python_extractor` and `Op.excluded_classes.append` from the file with `op`, because you will implement extracted attributes from prototxt by yourself. +There are multiple types of layers in Caffe: for example, `Convolution` and `Pooling`. Also, there is a specific type for custom Python\* layers called `Python`. Therefore, it is necessary to distinguish between those 'usual' types of layers and custom ones. If you want to implement extensions for a layer with type different to `Python`, you need to inherit your class of operation (for example, `ProposalFrontExtractor`) from `FrontExtractorOp`. Otherwise, inherit your class of operation from `CaffePythonFrontExtractorOp`. +2. Create a file `python_proposal_ext.py` in the folder `/deployment_tools/model_optimizer/extensions/front/caffe` +```py +from mo.front.extractor import CaffePythonFrontExtractorOp +class PythonProposalFrontExtractor(CaffePythonFrontExtractorOp): + pass +``` +For other layers types, inherit from `FrontExtractorOp`: +```py + from mo.front.extractor import FrontExtractorOp + class ProposalFrontExtractor(FrontExtractorOp): + pass +``` +You will implement extractor for layer with type `Python`, however, the steps are generally the same for layers with other types. +3. Specify the operation that the extractor refers to and a specific flag. The flag represents whether the operation should be used by the Model Optimizer or should be excluded from processing: +```py +from mo.front.extractor import CaffePythonFrontExtractorOp +class PythonProposalFrontExtractor(CaffePythonFrontExtractorOp): + op = 'rpn.proposal_layer.ProposalLayer' + enabled = True +``` +4. Register a mapping rule between the original model and the `PythonProposalOp` attributes by overriding the following function: +```py +from mo.front.extractor import CaffePythonFrontExtractorOp +from mo.ops.op import Op +class ProposalPythonFrontExtractor(CaffePythonFrontExtractorOp): + op = 'rpn.proposal_layer.ProposalLayer' + enabled = True + @staticmethod + def extract(node): + proto_layer = node.pb + param = proto_layer.python_param # each layer has a specific parameter, take a look at caffe.proto + python_params = str(param.param_str) # for Python layers, all params are in param_str + attrs = { + 'feat_stride': int(python_params.split(':')[-1]) + } + # update the attributes of the node + Op.get_op_class_by_name('Proposal').update_node_stat(node, attrs) # <------ here goes the name ('Proposal') of the Operation that was implemented before + return __class__.enabled +``` +> **NOTE:** if you implement extension for layer with type different to `Python`, change the following line: Op.get_op_class_by_name('Proposal').update_node_stat(node, attrs) to this line: Op.get_op_class_by_name(__class__.op).update_node_stat(node, mapping_rule). +You have successfully extracted the parameter `feat_stride` from `prototxt`, assuming it is the only parameter in this layer. +5. To increase the implementation flexibility: +```py + from mo.front.extractor import CaffePythonFrontExtractorOp + from mo.ops.op import Op + class PythonProposalFrontExtractor(CaffePythonFrontExtractorOp): + op = 'rpn.proposal_layer.ProposalLayer' + enabled = True + @staticmethod + def extract(node): + param = node.pb.python_param + attrs = CaffePythonFrontExtractorOp.parse_param_str(param.param_str) + Op.get_op_class_by_name('Proposal').update_node_stat(node, attrs) + return ProposalPythonFrontExtractor.enabled +``` + +You can successfully convert the model. 
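+To see what kind of dictionary the extractor ends up with, the following standalone sketch mimics the parsing of a Caffe Python layer `param_str`, assuming it has the form `'feat_stride': 16` (the single parameter extracted in the steps above). The helper name and parsing logic are illustrative only and do not reproduce the actual `CaffePythonFrontExtractorOp.parse_param_str` implementation:
+```py
+import ast
+
+def parse_param_str_sketch(param_str: str) -> dict:
+    # Wrap the python_param string (for example, "'feat_stride': 16") in braces
+    # so that it can be evaluated as a Python dictionary literal.
+    return ast.literal_eval('{' + param_str.strip() + '}')
+
+print(parse_param_str_sketch("'feat_stride': 16"))  # {'feat_stride': 16}
+```
+A dictionary of this form is what the extractor passes to `update_node_stat` as the node attributes.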
Open the `.xml` file and view your code: +```xml +... + + + + + 1 + 18 + 15 + 15 + + + 1 + 36 + 15 + 15 + + + 1 + 3 + + + + + 300 + 5 + + + +... +``` + +Look at the output shape of the custom layer you implemented. The shape was calculated according to the rules specified in `PythonProposalOp`. The `ratio` and `scale` properties have the value `[0.5, 1, 2]` and `[8, 16, 32]`. They have square brackets because they are originally a repeated parameter. You converted the parameter to a list in `PythonProposalOp`. Model Optimizer cast the value to a string. According to Python\* rules, a list has a string representation of opening and closing square brackets and values joined by commas. + +This is not a valid notation for the Intermediate Representation specification, because repeated parameters must be separated by a comma but without the brackets. Therefore, you must override the Model Optimizer default behavior regarding how it handles those parameters during the Intermediate Representation emitting stage, after the optimizations are complete. To do so, implement `backend_attrs()` in the `PythonProposalOp` class: +```python +class PythonProposalOp(Op): + ... other methods + def backend_attrs(self) -> list: + """ + Gets list of attributes that should appear in resulting IR + Returns: + list of attributes names or list of tuples (name of attribute, pre-processing rule) + """ + return [ + ( # a tuple per attribute + 'ratio', # name of attribute + # pre-processing rule in a form of lambda + # lambda takes a PythonProposalOp node with all defined properties + # it translates [1,2,3] -> "1,2,3" + lambda node: ','.join(map(str, node['ratio'])) + ), + ( + 'scale', + lambda node: ','.join(map(str, node['scale'])) + ), + 'feat_stride', + 'base_size', + 'min_size', + 'pre_nms_topn', + 'post_nms_topn', + 'nms_thresh' + ] +``` +The model can now be successfully converted. + +Open the `.xml` file. `ratio` and `scale` have the expected correct values `0.5,1,2` and `8,16,32`: +```xml + ... + + + + + ... + + + ... + + + + ... +``` + +> **NOTE**: Model Optimizer supports the Faster-R-CNN topology. Run the following command for the same Intermediate Representation: + +```sh +python mo.py --input_model VGG16_faster_rcnn_final.caffemodel --input_proto test.prototxt --extensions /deployment_tools/inference-engine/samples/object_detection_sample/fasterrcnn_extensions +``` + +**Summary** + +In this section you learned how to: + +1. Create a framework-independent extension implementation of the Intermediate Representation custom layer with unified logic for calculating output shapes, specified set of attributes +2. Use the Framework-Specific property extractor to map original model custom layer properties to the expected properties of the Framework-Independent extension +3. 
Manipulate the custom layer properties representation in the resulting Intermediate Representation + +Files used in this section: + +* `/deployment_tools/model_optimizer/extensions/ops/python_proposal.py`: + +```py +import networkx as nx +import numpy as np +from mo.front.extractor import attr_getter +from mo.graph.graph import Node +from mo.ops.op import Op + +class ProposalOp(Op): + op = 'Proposal' + + def __init__(self, graph: nx.MultiDiGraph, attrs: dict): + mandatory_props = { + 'type': __class__.op, + 'op': __class__.op, + 'post_nms_topn': 300, # default in caffe-shared + 'infer': ProposalOp.proposal_infer + } + super().__init__(graph, mandatory_props, attrs) + + def supported_attrs(self): + return [ + 'feat_stride', + 'base_size', + 'min_size', + 'ratio', + 'scale', + 'pre_nms_topn', + 'post_nms_topn', + 'nms_thresh' + ] + + def backend_attrs(self): + return [ + 'feat_stride', + 'base_size', + 'min_size', + ('ratio', lambda node: attr_getter(node, 'ratio')), + ('scale', lambda node: attr_getter(node, 'scale')), + 'pre_nms_topn', + 'post_nms_topn', + 'nms_thresh', + ] + + @staticmethod + def proposal_infer(node: Node): + input_shape = node.in_node(0).shape + out_shape = np.array([0, 0], dtype=np.int64) + # rois blob: holds R regions of interest, each is a 5 - tuple + # (n, x1, y1, x2, y2) specifying an image batch index n and a + # rectangle(x1, y1, x2, y2) + out_shape[0] = input_shape[0] * node.post_nms_topn + out_shape[1] = 5 + node.out_node(0).shape = out_shape +``` +* `/deployment_tools/model_optimizer/extensions/front/caffe/python_proposal_ext.py`: + +```py +from mo.front.extractor import CaffePythonFrontExtractorOp +from mo.ops.op import Op + +class ProposalPythonFrontExtractor(CaffePythonFrontExtractorOp): + op = 'rpn.proposal_layer.ProposalLayer' + enabled = True + + @staticmethod + def extract(node): + param = node.pb.python_param + attrs = CaffePythonFrontExtractorOp.parse_param_str(param.param_str) + Op.get_op_class_by_name('Proposal').update_node_stat(node, attrs) + return ProposalPythonFrontExtractor.enabled +``` diff --git a/docs/MO_DG/prepare_model/customize_model_optimizer/Legacy_Mode_for_Caffe_Custom_Layers.md b/docs/MO_DG/prepare_model/customize_model_optimizer/Legacy_Mode_for_Caffe_Custom_Layers.md new file mode 100644 index 00000000000000..ba56ecfcaa147d --- /dev/null +++ b/docs/MO_DG/prepare_model/customize_model_optimizer/Legacy_Mode_for_Caffe_Custom_Layers.md @@ -0,0 +1,71 @@ +# Legacy Mode for Caffe* Custom Layers {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Legacy_Mode_for_Caffe_Custom_Layers} + +> **NOTE**: This functionality is deprecated and will be removed in future releases. + +Model Optimizer can register custom layers in a way that the output shape is calculated by the Caffe\* framework installed on your system. This chapter covers this option. + +> **NOTE**: Caffe Python\* API has an issue when layer name does not correspond to the name of its top. The fix was implemented on [BVLC Caffe\*](https://github.com/BVLC/caffe/commit/35a7b87ad87457291dfc79bf8a7e7cf7ef278cbb). The Caffe framework on your computer must contain this fix. Otherwise, Caffe framework can unexpectedly fail during the fallback procedure. + +> **NOTE**: The Caffe fallback feature was validated against [this GitHub revision](https://github.com/BVLC/caffe/tree/99466224dac86ddb86296b1e727794fb836bd80f). You may have issues with forks or later Caffe framework versions. + +1. 
Create a file `CustomLayersMapping.xml`:
+```shell
+mv extensions/front/caffe/CustomLayersMapping.xml.example extensions/front/caffe/CustomLayersMapping.xml
+```
+2. Add (register) custom layers to `CustomLayersMapping.xml`:
+```
+\
+```
+
+Where:
+
+* `${Type}` is the type of the layer in Caffe
+* `${has_params}` is "true" if the layer has parameters, and is "false" otherwise
+* `${layer_param}` is the name of the layer parameters in `caffe.proto` if the layer has them
+
+**Example**:
+
+1. The `Proposal` layer has parameters, and they appear in the Intermediate Representation. The parameters are stored in the `proposal_param` property of the layer:
+```shell
+\
+```
+2. The CustomLayer layer has no parameters:
+```shell
+\
+```
+
+For this feature, you need an appropriate version of Caffe installed on the computer on which you run the Model Optimizer.
+
+## Constraints of Using the Caffe Fallback
+
+Several layers in the Caffe\* framework can have shapes that dynamically depend on the input data, not only on the layers that precede them and on their parameters. For example, `SimplerNMS` filters out bounding boxes that do not satisfy the condition. Internally, the Caffe fallback forwards the whole net without any meaningful data - just some noise. It is natural to get only one bounding box (0,0,0,0) instead of the expected number (for example, 15). There is an option to patch Caffe accordingly, but it makes the success of the Intermediate Representation generation dependent on the patched Caffe being present on the particular machine. To keep the solution independent from Caffe, we recommend using the extensions mechanism for such layers.
+
+Known cases like `Proposal`, `DetectionOutput`, `SimplerNMS` are implemented as extensions and can be used out of the box.
+
+A detailed description of supported layers is in the [Operations Specification](../../../ops/opset.md) document.
+
+## Building Caffe\*
+
+1. Build Caffe\* with Python\* 3.5:
+```shell
+export CAFFE_HOME=PATH_TO_CAFFE
+cd $CAFFE_HOME
+rm -rf ./build
+mkdir ./build
+cd ./build
+cmake -DCPU_ONLY=ON -DOpenCV_DIR= -DPYTHON_EXECUTABLE=/usr/bin/python3.5 ..
+make all # also builds pycaffe
+make install
+make runtest # optional
+```
+2. Add the Caffe Python directory to `PYTHONPATH` to let it be imported from the Python program:
+```shell
+export PYTHONPATH=$CAFFE_HOME/python:$PYTHONPATH
+```
+3. Check the Caffe installation:
+```shell
+python3
+import caffe
+```
+
+If Caffe was installed correctly, the `caffe` module is imported without errors.
\ No newline at end of file
diff --git a/docs/MO_DG/prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md b/docs/MO_DG/prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md
new file mode 100644
index 00000000000000..d3ba399a87745d
--- /dev/null
+++ b/docs/MO_DG/prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md
@@ -0,0 +1,363 @@
+# Sub-Graph Replacement in the Model Optimizer {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Subgraph_Replacement_Model_Optimizer}
+
+There are several reasons why the Model Optimizer might not generate an Intermediate Representation for a model. However, in some cases, the Intermediate Representation can be generated after providing certain hints to the tool. 
The examples of hints below are mostly related to TensorFlow\*, but potentially could be applicable to models created in any framework:
+
+* Topology contains an operation (or a sub-graph of operations) not known to the Model Optimizer, but this operation (sub-graph) could be expressed as a combination of known operations. A hint would be a description of this combination to the tool.
+* Sub-graph of operations in the topology expresses a single layer known to Inference Engine.
+* TensorFlow and Inference Engine use different layouts of tensors, NHWC and NCHW respectively. If some tensor in NHWC layout is flattened (for example, all the dimensions are squashed into a single dimension), it is not possible to convert it to the NCHW layout required for Inference Engine, so Model Optimizer cannot produce a correct Intermediate Representation.
+
+The detailed solutions for the examples above are given later. The next subsection shows what is common in all three examples.
+
+## Sub-graph Replacement
+
+In these cases, the sub-graph (or a single node) of the initial graph is replaced with a new sub-graph (single node). The sub-graph replacement consists of the following steps:
+
+1. Identify an existing sub-graph for replacement
+
+2. Generate a new sub-graph
+
+3. Connect the new sub-graph to the graph (create input/output edges to the new sub-graph)
+
+4. Create output edges out of the new sub-graph to the graph
+
+5. Do something with the original sub-graph (for example, remove it)
+
+Model Optimizer provides several ways to perform most of the sub-graph replacement steps. The next subsections describe these methods.
+
+## Replace a Single Operation with a Sub-graph of Operations
+
+For example, there is an operation `SquaredDifference` in TensorFlow which calculates \f$(a - b)^2\f$, where \f$a\f$ and \f$b\f$ are input tensors. The Inference Engine does not support such an operation. However, `SquaredDifference` could be expressed using two `Power` operations and one `Eltwise Add`. The `Power` operation calculates \f$scale * (a ^ {power}) + shift\f$, where \f$a\f$ is a tensor and \f$scale\f$, \f$power\f$ and \f$shift\f$ are float values. The first `Power` operation negates the value of tensor \f$b\f$. The second one is used to square the result of \f$a + (- b)\f$ which is calculated using the `Eltwise Add` operation applied to tensor \f$a\f$ and tensor \f$-b\f$.
+
+Given that, we can replace all `SquaredDifference` operations in the initial model with two `Power` and one `Eltwise` operations. The replacer is implemented in the file `/deployment_tools/model_optimizer/extensions/front/SquaredDifference.py`.
+```python
+import networkx as nx
+from mo.front.common.replacement import FrontReplacementOp
+from mo.graph.graph import Node
+from mo.ops.eltwise import Eltwise
+from mo.ops.power import Power
+class SquaredDifference(FrontReplacementOp):
+    """
+    Example class illustrating how to implement replacement of a single op in the front-end of the MO pipeline.
+    This class replaces a single op SquaredDifference by a sub-graph consisting of 3 lower-level ops. 
+ """ + op = "SquaredDifference" + enabled = True + def replace_op(self, graph: nx.MultiDiGraph, node: Node): + negate = Power(graph, dict(scale=-1, name=node.name + '/negate_')) + add = Eltwise(graph, dict(operation='sum', name=node.name + '/add_')) + squared = Power(graph, dict(power=2, name=node.name + '/squared_')) + out_node = squared.create_node([add.create_node([node.in_node(0), negate.create_node([node.in_node(1)])])]) + # Replace edge from out port 0 of the matched node with a edge from node out_node.id with port 0. + # The "explicit" version of the return value is: [(out_node.id, 0)]) + return [out_node.id] +``` +Model Optimizer internal representation of the graph uses the networkx module. + +**Key lines**: + +* Line 1: Imports this module. + +* Line 3: Imports class `FrontReplacementOp` that is used to replace operation of particular type with a new sub-graph. This class performs the first step of the sub-graph replacement (identifies an existing sub-graph for replacement). It is important to mention that the replacement happens before shape inference and creation of data nodes representing tensors with values. At this stage of model conversion pipeline, all nodes in the graph are operation nodes or nodes of type `Const` that produce tensor with fixed value embedded into the node. + +* Line 4: Imports class `Node` representing a single node in the computation graph. + +* Lines 5 - 6: Import classes representing operations `Power` and `Eltwise`. These classes are inherited from base class `mo.ops.Op` that represents operation and stores its attributes. + +* Line 9: Defines class `SquaredDifference` inherited from `FrontReplacementOp`. This is a replacer class that is automatically registered and executed by Model Optimizer. Since the class is located in the common (not framework) specific directory `/deployment_tools/model_optimizer/extensions/front`, it is used for replacement for all supported frameworks. + +* Line 15: Defines the class variable `op` that stores the name of the operation to be replaced. In this case, it is `SquaredDifference`. + +* Line 16: Defines class variable `enabled` that controls whether the replacer is enabled or not. The only function that should be implemented in the class is `replace_op`. It gets graph to operate on and an instance of node of desired operation (`SquaredDifference` in this case). This function performs step two and three of the sub-graph replacement (generates a new sub-graph to replace with and connects a new sub-graph to the graph). + +* Lines 19 - 21: Create instances of operations classes with required attributes. + +* Line 23: Creates a sub-graph from the operations defined above. The `create_node` method of the `Op` class generates `Node` from the `Op` and uses single mandatory argument - the list of input nodes (represented as instances of `Node` class) to create input edges to the node being generated. Inputs of the `SquaredDifference` node are retrieved using `node.in_node(0)` and `node.in_node(1)` method calls. The `Eltwise Add` node gets first input as initial first input of `SquaredDifference` node, the second input of `add` is the result of negation of the second input of `SquaredDifference` node: `[add.create_node([node.in_node(0), negate.create_node([node.in_node(1)])])]`. Then the result of `Add` node is squared. `out_node` node performs this calculation. + +The `replace_op` function returns a list of node names used to create output edges of the sub-graph to connect it with the rest of the graph. 
Each element of the list describes mapping between old output edge of the matched node and new sub-graph node and output edge index. The i-th element of the list corresponds to the i-th output tensor of the matched node. In this case, `SquaredDifference` produces single tensor through output port 0, so the returned list contains single element. In general, each element is a tuple, where the first element is the name of a new node producing required tensor and the second is the output port for that tensor. If the output port is 0, it is possible to use shortcut - just the name of the node instead of a tuple. Line 26 uses this shortcut. The returned value is used to create the new sub-graph output edges (step 4 of the sub-graph replacement). + +Default implementation of the `FrontReplacementOp` class removes matched node and all its input/output edges (step 5 of the sub-graph replacement). + +Another example of such kind of replacement is in the `/deployment_tools/model_optimizer/extensions/front/Sub.py` class where all instances of `Sub` operations are replaced with two operations: `Power` to negate the second argument and the `Eltwise` to perform elementwise add. + +## Replace Sub-graph of Operations with a New Sub-graph of Operations + +The previous example considered situation when one single node of a specific type is replaced. When it is necessary to replace a sub-graph of operations it is necessary to tell Model Optimizer how to identify this sub-graph. There are three ways to achieve that: + +* Use graph isomorphism pattern of the networkx module + +* Use nodes name pattern to identify `scope` (according to TensorFlow terminology) to be replaced + +* Use sets of `start` and `end` node names to match all nodes "between" them + +The next sections explain each option using real examples. + +### Replace Sub-graph of Operations Using Graph Isomorphism Pattern + +networkx Python\* module provides methods to find graph isomorphic to the given one using nodes and edges match: for example, `networkx.algorithms.isomorphism.categorical_node_match`, `networkx.algorithms.isomorphism.categorical_multiedge_match`. Model Optimizer uses these methods and provides simple API to use that feature. + +For example, the Caffe\* has layer called [Mean-Variance Normalization (MVN)](http://caffe.berkeleyvision.org/tutorial/layers/mvn.html), which is also supported by the Inference Engine. This layer is implemented with low-level operations in TensorFlow: `Mean`, `StopGradient`, `SquaredDifference`, `Squeeze` and `FusedBatchNorm`. Model Optimizer should replace sub-graph with these operations with a single Inference Engine layer of type `MVN`. + +The file `/deployment_tools/model_optimizer/extensions/front/tf/mvn.py` performs such a replacement. 
The first part of the file is: +```python +class MVN(FrontReplacementSubgraph): + enabled = True + def pattern(self): + log.debug('Enabled MVN replacement') + return dict( + nodes=[ + ('mean', dict(op='Mean')), + ('stop_grad', dict(op='StopGradient')), + ('sqdiff', dict(op='SquaredDifference')), + ('variance', dict(op='Mean')), + ('squeeze_mean', dict(op='Squeeze')), + ('squeeze_variance', dict(op='Squeeze')), + ('fbn', dict(op='FusedBatchNorm')), + ], + edges=[ + ('mean', 'stop_grad', {'in': 0}), + ('stop_grad', 'sqdiff', {'in': 1}), + ('sqdiff', 'variance', {'in': 0}), + ('mean', 'squeeze_mean', {'in': 0}), + ('variance', 'squeeze_variance', {'in': 0}), + ('squeeze_mean', 'fbn', {'in': 3}), + ('squeeze_variance', 'fbn', {'in': 4}), + ], + node_attrs=['op'], + edge_attrs=['in']) +``` +**Key lines**: + +* Line 1: Defines class `MVN` inherited from class `FrontReplacementSubgraph` that performs sub-graph replacement using sub-graph isomorphism pattern. + +* Line 3: Sets class variable `enabled` to value True meaning that this replacer is enabled. + +* The function `pattern` defines the sub-graph constraints to be matched. It returns a dictionary with four keys: + + * the `nodes` defines a list of nodes to be matched. Each element in the list is a tuple. The first element is the alias name assigned for the matched node, the second element is a dictionary with desired attributes of the node. + + * the `edges` defines a list of edges to be matched. Each element in the list is a tuple. The first and the second elements are the start and end edge nodes alias names respectively. The third element is a dictionary with desired edge attributes. + + * the `node_attrs` contains the names of nodes attributes to use during sub-graph isomorphism search. + + * the `edge_attrs` contains the names of edges attributes to use during sub-graph isomorphism search. + + The sub-graph is matched if all provided constraints are satisfied. If at least one node with desired attributes is missing or at least one defined edge is absent, the sub-graph is not matched. +* Line 9: Adds constraint that sub-graph should contain node with attribute `op` with value `Mean`. The matched node gets an alias name `mean`. The same way the line 10 add constrain for node `StopGradient`, the matched node gets an alias name `stop_grad`. + +* Line 18: Defines edge from node with alias name `mean` to node with alias name `stop_grad` having attribute `in` equal to 0. This means that the output of node `mean` is connected to the node `stop_grad` as a first input (Model Optimizer uses zero-based indexing that is why `in` is 0). Another example of defining the edges constraints is in line 25 where the edge from `squeeze_mean` is connected to the `fbn` node as fourth input. + +* Lines 26 - 27: Specify a list of attributes to be checked. In fact, these lists are just list of all keys in the dictionaries for node and edge attributes. + +Now when the Model Optimizer knows how to find sub-graph (step 1 of the sub-graph replacement), it is necessary to implement function that will perform actual sub-graph replacement (step 2 and 3). 
The code for this function is: +```python +def replace_sub_graph(self, graph: nx.MultiDiGraph, match: dict): + fbn = match['fbn'] + input = fbn.in_node(0) + log.debug('Found potential MVN pattern after {} with name {}'.format(input.op, input.name)) + if input.id != match['mean'].in_node(0).id or input.id != match['sqdiff'].in_node(0).id: + return + log.debug('Confirmed MVN pattern after {} with name {}'.format(input.op, input.name)) + MVN = Op.get_op_class_by_name('MVN') + mvn = MVN(graph, dict( + name=fbn.name + '/MVN_', + eps=fbn.eps, + required_reduction_indices=[1,2] if fbn.data_format == b'NHWC' else [2,3] + )) + mvn.attrs['old_infer'] = mvn.attrs['infer'] + mvn.attrs['infer'] = __class__.infer + mul = Eltwise(graph, dict(operation='mul', name=fbn.name + '/Mul_')) + add = Eltwise(graph, dict(operation='sum', name=fbn.name + '/Add_')) + input_gamma = fbn.in_node(1) + input_beta = fbn.in_node(2) + mean_reduction = match['mean'].in_node(1) + variance_reduction = match['mean'].in_node(1) + new_subgraph = add.create_node([ + mul.create_node([ + mvn.create_node([input, mean_reduction, variance_reduction]), + input_gamma + ]), + input_beta + ]) + replace_node(fbn, new_subgraph) +``` +The function accepts two arguments - the graph and the dictionary `match`. The keys in the dictionary are the alias names of matched nodes (defined in the `nodes` list in the function `pattern`) and the values are the matched node of the graph (the instance of Node object). + +The function generates new sub-graph with node of type `MVN` and two nodes of the type `Eltwise` calculating sum and product. There is nothing interesting in how the graph is generated and mathematics behind that, so attention will be put to two aspects of this function. + +The first one is the call to function `replace_node` in line 36. `FusedBatchNorm` node is replaced with the output node of the generated sub-graph: all input edges of the `FusedBatchNorm` node are re-connected to the `new_subgraph` node, all consumers of the `FusedBatchNorm` node are updated to get inputs from the `new_subgraph` node. This action connects newly generated sub-graph with an existing graph (step 4 of the sub-graph replacement). + +The second one is that the default implementation of the inference function for `MVN` operation is overwritten. In line 16, the default implementation of the inference function for `MVN` is saved to attribute `old_infer`. In line 17, the new inference function is saved to the instance of the `MVN` operation class. The new inference function code looks the following way: +```python +@staticmethod +def infer(node: Node): + if not(node.in_node(1).has_valid('value') and node.in_node(2).has_valid('value')): + log.warning('Reduction indices for mean and variance for MVN node {} are not constants'.format(node.name)) + return + if not(all(node.in_node(1).value == node.required_reduction_indices) and + all(node.in_node(2).value == node.required_reduction_indices)): + log.warning('Reduction indices for mean {} and variance {} do not match required ones {}'.format( + node.in_node(1).value, + node.in_node(2).value, + node.required_reduction_indices + )) + return + node.graph.remove_edge(node.in_node(1).id, node.id) + node.graph.remove_edge(node.in_node(2).id, node.id) + node.old_infer(node) +``` +The `infer` function is needed to infer value of the node (if it is possible) and to infer shapes of the output tensors of the node (mandatory). 
The custom `infer` function performs additional checks that describe limitations of the `MVN` layer implementation in the Inference Engine. For example, reduction indices for mean and variance must be constants (line 10), while in TensorFlow they could be computed during model inference. In addition, the function removes two edges from the graph (lines 17 and 18) because all required information is already stored in the `MVN` node attributes. This is due to different `MVN` layer implementation in Inference Engine and TensorFlow\*: `mean` and `variance` are attributes of the node in Inference Engine while in TensorFlow they are input tensors. Edges are not removed in the `replace_sub_graph` function, because these edges are used in the `infer` function (lines 7-12). + +The last action in the `infer` method (line 19) is to call default infer function for the `MVN`, which is saved in the attribute `old_infer` of the node to infer output tensors shapes. + +On the step 5 of the sub-graph replacement, six matching nodes are automatically removed during the dead code elimination pass that is performed after applying of custom sub-graph replacements defined. Six matching nodes are no more connected to the inputs of the network after replacing node `fbn` with a newly created sub-graph node. Since they are not marked as output nodes (using `--output` command line parameter), they could be removed. + +The replacement works for all sub-graph isomorphism instances found in the network. + +### Replace Sub-graph of Operations Using Nodes Name Pattern + +TensorFlow uses a mechanism of scope to group related operation nodes. It is a good practice to put nodes performing particular task into the scope. This approach divides a graph into logical blocks that are easier to review in TensorBoard\*. The `scope`, in fact, just defines a common prefix for the node names in the scope. + +For example, Inception topologies contain several types of so-called "Inception blocks". Some of them are exactly equal to each other, but located in different places of the network. For example, Inception V4 from `tensorflow.contrib.slim` module has inception blocks `Mixed_5b`, `Mixed_5c` and `Mixed_5d` with exactly the same nodes with the same attributes. + +Now consider situation when someone implemented these Inception blocks extremely efficiently using single Inference Engine custom layer called `InceptionBlock` and would like to replace these blocks with instances of the layer to decrease inference time. Model Optimizer provides mechanism to replace sub-graph of operations defined by the regular expressions for the node names prefixes (scope). In this particular case, some of the patterns are: `.*InceptionV4/Mixed_5b`, `.*InceptionV4/Mixed_5c` and `.*InceptionV4/Mixed_5d`. Each pattern starts with `.*`, because a prefix `InceptionV4` is added to all nodes names during a model freeze. + +The sub-graph replacement using nodes name pattern is a bit trickier than replacements of single operation and networkx isomorphism pattern described above. You should do the following additional steps in comparison with previously described replacements: + +1. Prepare configuration file template defining node names patterns and information about custom layer attributes. + +2. Run Model Optimizer with command line parameter to add information about input and output nodes of the specified sub-graphs. 
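+Before preparing the configuration file, it can be helpful to check that the chosen scope patterns actually match the node names of the frozen graph. The snippet below is purely illustrative (the node names are hypothetical and depend on the model); it only demonstrates why each pattern starts with `.*`:
+```py
+import re
+
+# Hypothetical node names from a frozen graph
+node_names = [
+    'InceptionV4/InceptionV4/Mixed_5b/Branch_0/Conv2d_0a_1x1/Conv2D',
+    'InceptionV4/InceptionV4/Mixed_5b/concat',
+    'InceptionV4/InceptionV4/Mixed_6a/Branch_0/Conv2d_1a_3x3/Conv2D',
+]
+
+pattern = re.compile(r'.*InceptionV4/Mixed_5b')
+print([name for name in node_names if pattern.match(name)])
+# Only the two Mixed_5b names match. Without the leading ".*", the pattern would not match
+# any of them because of the "InceptionV4" prefix added to the names during the model freeze.
+```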
+ +Consider the following possible configuration file for the Inception Block replacer: +```json +[ + { + "custom_attributes": { + "attr1_key": "attr1_value", + "attr2_key": 123456 + }, + "id": "InceptionBlockReplacer", + "op": "InceptionBlock", + "instances": [ + ".*InceptionV4/Mixed_5b", + ".*InceptionV4/Mixed_5c", + ".*InceptionV4/Mixed_5d" + ], + "match_kind": "scope" + } +] +``` +The `.json` file contains list of dictionaries. Each dictionary defines one replacement. Each replacement is defined with several keys: + +* `id` (mandatory) is a unique identifier of the replacer. It is used in the Python\* code that implements sub-graph replacement to link the class and the replacement description from the configuration file. + +* `match_kind` (mandatory) is a string that specifies what matching algorithm is used. Currently supported `scope` and `points`. In this example, the first one is considered. The `points` match kind is described below. + +* `instances` (mandatory) specifies instances of the sub-graph to be matched. It contains a list of node names prefixes patterns for the match kind `scope`. + +* `custom_attributes` (optional) is a dictionary with static attributes of the layer to be dumped to Inference Engine Intermediate Representation `.xml` file. + +* `op` (optional) is used only if the sub-graph replacement Python code is not needed, because the sub-graph should be replaced with a single node of type `op`. If this attribute is not set, it is necessary to implement Python code with sub-graph generation code. Both options are considered in this example. + +When the configuration file is ready, run the Model Optimizer with regular command line parameters pointing to the file with model and input shapes (if necessary) and additional parameter `--tensorflow_custom_operations_config_update` pointing to the generated configuration file. If the file is correct, Model Optimizer adds two keys to the `InceptionBlockReplacer` dictionary: `inputs` and `outputs` with the following content: +```json +[ + { + "id": "InceptionBlockReplacer", + ... + "inputs": [ + [ + { + "node": "Branch_2/Conv2d_0a_1x1/Conv2D$", + "port": 0 + }, + { + "node": "Branch_3/AvgPool_0a_3x3/AvgPool$", + "port": 0 + }, + { + "node": "Branch_1/Conv2d_0a_1x1/Conv2D$", + "port": 0 + }, + { + "node": "Branch_0/Conv2d_0a_1x1/Conv2D$", + "port": 0 + } + ] + ], + "outputs": [ + { + "node": "concat$", + "port": 0 + } + ] + } +] +``` +The value for key `inputs` is a list of lists describing input tensors of the sub-graph. Each element of the top-level list corresponds to one unique input tensor of the sub-graph. Each internal list describes a list of nodes consuming this tensor and port numbers where the tensor is consumed. Model Optimizer generates regular expressions for the input nodes names to uniquely identify them in each instance of the sub-graph defined by the `instances`. Denote these nodes as input nodes of the sub-graph. + +In the InceptionV4 topology, the `InceptionV4/Mixed_5b` block has four input tensors from outside of the sub-graph, but all of them are produced by the node `InceptionV4/Mixed_5a/concat`. Therefore, the top-level list of the `inputs` contains one list corresponding to this tensor. Four input nodes of the sub-graph consume the tensor produced by `InceptionV4/Mixed_5a/concat` node. In this case, all four input nodes consume input tensor into port 0. + +The order of items in the internal list describing nodes does not matter, but the order of elements in the top-level list is important. 
This order defines the order in which the Model Optimizer attaches input tensors to a new generated node if the sub-graph is replaced with a single node. The i-th input node of the sub-graph is obtained using call `match.single_input_node(i)` in the sub-graph replacer code. More information about API is given below. If you need to change the order of input tensors, you can edit the configuration file in the text-editor. + +The value for the key `outputs` is a list describing nodes of the sub-graph producing tensor that goes outside of the sub-graph or does not have child nodes. Denote these nodes as output nodes of the sub-graph. The order of elements in the list is important. The i-th element of the list describes the i-th output tensor of the sub-graph, which could be obtained using call `match.output_node(i)`. The order of elements can be manually changed in the configuration file. Model Optimizer uses this order to connect output edges if the sub-graph is replaced with a single node. + +Now, when meaning of `inputs` and `outputs` attributes is clean, return back to the replacer implementation. The replacer `InceptionBlockReplacer` contains attribute `op` with the value `InceptionBlock`, which means that the identified sub-graph should be replaced with a single layer of type `InceptionBlock`. This layer is not known for the Model Optimizer, so it is necessary to define it. See [Extending the Model Optimizer with New Primitives](Extending_Model_Optimizer_with_New_Primitives.md). You must create file `extension/ops/InceptionBlock.py` with the following content: +```python +import numpy as np +from mo.graph.graph import Node +from mo.ops.op import Op +class InceptionBlock(Op): + op = "InceptionBlock" + enabled = True + def __init__(self, graph, attrs): + super().__init__(graph, attrs, { + 'type': __class__.op, + 'op': __class__.op, + }) +``` +The shape inference function is not defined. In this case, Model Optimizer uses TensorFlow fallback to calculate shapes of the sub-graph output tensors. + +Run the Model Optimizer with the regular command line parameters, path to the model file and input shape (if necessary), and the parameter `--tensorflow_use_custom_operations_config` and point to the created configuration file. Model Optimizer generates Intermediate Representation `.xml` file with three sequential layers of type `InceptionBlock` like in the following example: +```xml + + + + 1 + 384 + 35 + 35 + + + + + 1 + 384 + 35 + 35 + + + +``` +The implementation of the sub-graph replacement by scope with a single layer is complete. The next subsection explains +how Model Optimizer replaces sub-graph identified by start/end nodes (`points`) with another sub-graph. + +### Replace Sub-graph of Operations Using Points +In this scenario, for the matching algorithm user defines the sub-graph via a set of "start" and "end" nodes. +Given the set, the Model Optimizer performs the following steps: +1. Starts graph traversal from every _start_ nodes following the direction of the graph edges. +The search stops in _end_ nodes or in case of nodes without further children. All visited nodes are added to the matched sub-graph. +2. Starts another graph traversal from each non-start node of the sub-graph, i.e. every node except nodes from "start" set. +In this step the edges are traversed in the opposite edge direction. All newly visited nodes are added to the + matched sub-graph. This step is needed to add nodes required for calculation values of internal nodes of the + matched sub-graph. +3. 
Checks that all "end" nodes were reached from the "start" nodes. If not, the Model Optimizer exits with an error.
+4. Checks that there are no "Placeholder" operations among the added nodes. If there are, some side branch of
+ the sub-graph (added in step 2) depends on inputs of the network. Such a configuration is not correct, so the Model Optimizer exits with an error.
+
+This algorithm finds all nodes "between" the start and end nodes. Also, the nodes needed for calculation of values of the non-input nodes of the
+matched sub-graph produce _constant_ values, because they do not depend on the input of the network.
+**This sub-graph match has a limitation that each start node must have only one input**. Therefore, it is not possible
+to specify, for example, a convolution node as a start node, because it has two inputs: the data tensor and the tensor with weights.
+
+For an example of replacement with points, refer to the case study of the
+[conversion for the SSD models, created with TensorFlow Object Detection API](TensorFlow_SSD_ObjectDetection_API.md).
diff --git a/docs/MO_DG/prepare_model/customize_model_optimizer/TensorFlow_Faster_RCNN_ObjectDetection_API.md b/docs/MO_DG/prepare_model/customize_model_optimizer/TensorFlow_Faster_RCNN_ObjectDetection_API.md
new file mode 100644
index 00000000000000..09805f6a598cfb
--- /dev/null
+++ b/docs/MO_DG/prepare_model/customize_model_optimizer/TensorFlow_Faster_RCNN_ObjectDetection_API.md
@@ -0,0 +1,449 @@
+# Converting Faster R-CNN models, created with TensorFlow Object Detection API {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_TensorFlow_Faster_RCNN_ObjectDetection_API}
+
+This is a deprecated page. Consider reading [this](../convert_model/tf_specific/Convert_Object_Detection_API_Models.md) page instead, which describes a newer approach to converting Object Detection API models that gives results closer to TensorFlow inference.
+
+## Converting models created with TensorFlow Object Detection API version 1.6.0 or higher
+This chapter describes how to convert selected Faster R-CNN models from the TensorFlow Object Detection API model zoo version 1.6.0 or higher. The full list of supported models is provided in the table below. Note that currently only batch size 1 is supported. The only Inference Engine plugin that supports inference of these topologies is the CPU plugin.
+
+The Faster R-CNN models contain several building blocks similar to building blocks from SSD models, so it is highly recommended to read the chapter about [enabling TensorFlow Object Detection API SSD models](TensorFlow_SSD_ObjectDetection_API.md) first. Detailed information about Faster R-CNN topologies is provided [here](https://arxiv.org/abs/1506.01497).
+
+The TensorFlow network consists of a number of big blocks grouped by scope:
+
+* `Preprocessor` performs scaling/resizing of the image and converts input data to the [0, 1] interval. It has two outputs: the first one is the modified input image and the second one is a constant tensor with shape (batch_size, 3) and values (resized_image_height, resized_image_width, 3).
+
+* `FirstStageFeatureExtractor` is a backbone feature extractor.
+
+* `FirstStageBoxPredictor` calculates box and class predictions.
+
+* `GridAnchorGenerator` generates anchor coordinates.
+
+* `ClipToWindow` crops anchors to the resized image size.
+
+* `Decode` decodes coordinates of boxes using anchors and data from the `FirstStageBoxPredictor`.
+
+* `BatchMultiClassNonMaxSuppression` performs non-maximum suppression.
+
+* `map` scales coordinates of boxes to the [0, 1] interval by dividing coordinates by (resized_image_height, resized_image_width). 
+ +* `map_1` scales coordinates from [0, 1] interval to resized image sizes. + +* `SecondStageFeatureExtractor` is a feature extractor for predicted Regions of interest (ROIs). + +* `SecondStageBoxPredictor` refines box coordinates according `SecondStageFeatureExtractor`. + +* `SecondStagePostprocessor` is Detection Output layer performing final boxes predictions. + +### Sub-graph replacements +There are three sub-graph replacements defined in the `extensions/front/tf/legacy_faster_rcnn_support.json` used to convert these models: + +* the first one replaces the `Preprocessor` block. The implementation of this replacer is in the `/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py` + +* the second one replaces a number of blocks in the the graph including `GridAnchorGenerator`, `ClipToWindow`, `Decode`, `BatchMultiClassNonMaxSuppression`, `Tile`, `Tile_1` and `map` with Proposal and ROIRooling layers and some additional layers to pre-process input data + +* the third one replaces `SecondStagePostprocessor` with a DetectionOutput layer. + +The second replacer is defined using the following configuration that matches sub-graph by points: + +```json + { + "custom_attributes": { + "nms_threshold": 0.7, + "feat_stride": 16, + "max_proposals": 100, + "anchor_base_size": 256, + "anchor_scales": [0.25, 0.5, 1.0, 2.0], + "anchor_aspect_ratios": [0.5, 1.0, 2.0], + "roi_spatial_scale": 0.0625 + }, + "id": "TFObjectDetectionAPIFasterRCNNProposalAndROIPooling", + "include_inputs_to_sub_graph": true, + "include_outputs_to_sub_graph": true, + "instances": { + "end_points": [ + "CropAndResize", + "map_1/TensorArrayStack/TensorArrayGatherV3", + "map_1/while/strided_slice/Enter", + "BatchMultiClassNonMaxSuppression/map/TensorArrayStack_4/TensorArrayGatherV3" + ], + "start_points": [ + "FirstStageBoxPredictor/concat", + "FirstStageBoxPredictor/concat_1", + "GridAnchorGenerator/Identity", + "Shape", + "CropAndResize" + ] + }, + "match_kind": "points" + } +``` + +The `start_points` list contains the following nodes: + +* `FirstStageBoxPredictor/concat` node produces box coordinates predictions. + +* `FirstStageBoxPredictor/concat_1` node produces classes predictions which will be used for the ROIs + +* `GridAnchorGenerator/Identity` node produces anchors coordinates. + +* `Shape` and `CropAndResize` nodes are specified as inputs to correctly isolate the required sub-graph. Refer to the [chapter](Subgraph_Replacement_Model_Optimizer.md) for more information about replacements by points. + +The `end_points` list contains the following nodes: + +* `CropAndResize` is the node that performs ROI pooling operation. + +* `map_1/TensorArrayStack/TensorArrayGatherV3`, `map_1/while/strided_slice/Enter` and `BatchMultiClassNonMaxSuppression/map/TensorArrayStack_4/TensorArrayGatherV3` are specified to correctly isolate the sub-graph. + +The `custom_attributes` dictionary contains attributes where most values are taken from the topology-specific configuration file `samples/configs/faster_rcnn_*.config` of the [TensorFlow Object Detection API repository](https://github.com/tensorflow/models/tree/master/research/object_detection): + +* `nms_threshold` is the value of the `first_stage_nms_iou_threshold` parameter. + +* `feat_stride` is the value of the `height_stride` and `width_stride` parameters. Inference Engine supports case when these two values are equal that is why the replacement configuration file contains just one parameter. 
+
+* `max_proposals` is the value of the `max_total_detections` parameter, which is the maximum number of proposal boxes from the Proposal layer and of detected boxes.
+
+* `anchor_base_size` is the base size of the generated anchor. The default value for this parameter is 256, and it is not specified in the configuration file.
+
+* `anchor_scales` is the value of the `scales` attribute.
+
+* `anchor_aspect_ratios` is the value of the `aspect_ratios` attribute.
+
+* `roi_spatial_scale` is needed for the Inference Engine ROIPooling layer. The value specified here is a default one and is not actually used.
+
+The identifier for this replacer is `TFObjectDetectionAPIFasterRCNNProposalAndROIPooling`. The Python implementation of this replacer is in the file `/deployment_tools/model_optimizer/extensions/front/tf/FasterRCNNs.py`.
+
+The first four functions of the replacer class are the following:
+
+```python
+class TFObjectDetectionAPIFasterRCNNProposalAndROIPooling(FrontReplacementFromConfigFileSubGraph):
+    """
+    This class replaces sub-graph of operations with Proposal and ROIPooling layers and additional layers transforming
+    tensors from layout of TensorFlow to layout required by Inference Engine.
+    Refer to comments inside the function for more information about performed actions.
+    """
+    replacement_id = 'TFObjectDetectionAPIFasterRCNNProposalAndROIPooling'
+
+    def run_after(self):
+        return [PreprocessorReplacement]
+
+    def run_before(self):
+        return [SecondStagePostprocessorReplacement]
+
+    def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
+        return {match.output_node(0)[0].id: new_sub_graph['roi_pooling_node'].id}
+
+    def nodes_to_remove(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
+        new_list = match.matched_nodes_names().copy()
+        # do not remove nodes that produce box predictions and class predictions
+        new_list.remove(match.single_input_node(0)[0].id)
+        new_list.remove(match.single_input_node(1)[0].id)
+        return new_list
+```
+
+The `run_after` function returns a list of Python classes, inherited from one of the replacer classes (`FrontReplacementOp`, `FrontReplacementPattern`, `FrontReplacementFromConfigFileSubGraph`, etc.), that the current sub-graph replacement class must be run after. In this case, the replacer must be run after the `Preprocessor` block is removed by the `PreprocessorReplacement` replacer. In a similar way, the `run_before` function is used to tell the Model Optimizer to execute `SecondStagePostprocessorReplacement` before this replacer.
+
+The `output_edges_match` function describes the matching between the output nodes of the sub-graph before and after replacement. In this case, the only needed output node of the sub-graph is the `CropAndResize` node, which is identified with `match.output_node(0)[0]`. The new output node, which is created in the `generate_sub_graph` function, is identified with `new_sub_graph['roi_pooling_node']`.
+
+The `nodes_to_remove` function takes the default list of nodes to be removed, which contains all matched nodes, and removes from it the two input nodes identified with `match.single_input_node(0)[0]` and `match.single_input_node(1)[0]`. These nodes will be connected as inputs to the new nodes generated in the `generate_sub_graph` function, so they should not be removed.
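+
+The next code listing uses these configuration attributes. As a quick illustration of one detail that appears there, the number of anchors per feature-map position is simply the product of the number of scales and the number of aspect ratios from the configuration file above. The following standalone sketch (not Model Optimizer code) just reproduces that arithmetic:
+```python
+# Values taken from the "custom_attributes" section of the replacement configuration file
+anchor_scales = [0.25, 0.5, 1.0, 2.0]
+anchor_aspect_ratios = [0.5, 1.0, 2.0]
+
+# One anchor per (scale, aspect ratio) combination at every feature-map position;
+# the replacer computes the same value as len(proposal_ratios) * len(proposal_scales)
+anchors_count = len(anchor_scales) * len(anchor_aspect_ratios)
+print(anchors_count)  # 12
+```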
+ +The code generating new sub-graph is the following: + +```python + def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch): + log.debug('TFObjectDetectionAPIFasterRCNNProposal: matched_nodes = {}'.format(match.matched_nodes_names())) + + config_attrs = match.custom_replacement_desc.custom_attributes + nms_threshold = config_attrs['nms_threshold'] + feat_stride = config_attrs['feat_stride'] + max_proposals = config_attrs['max_proposals'] + anchor_base_size = config_attrs['anchor_base_size'] + roi_spatial_scale = config_attrs['roi_spatial_scale'] + proposal_ratios = config_attrs['anchor_aspect_ratios'] + proposal_scales = config_attrs['anchor_scales'] + anchors_count = len(proposal_ratios) * len(proposal_scales) +``` + +These lines get parameters defined in the sub-graph replacement configuration file and calculate initial anchors count. + +```python + # get the ROIPool size from the CropAndResize which performs the same action + if 'CropAndResize' not in graph.nodes(): + raise Error('Failed to find node with name "CropAndResize" in the topology. Probably this is not Faster' + ' RCNN topology or it is not supported') + roi_pool_size = Node(graph, 'CropAndResize').in_node(3).value[0] +``` + +The code above gets the ROI Pooling spatial output dimension size as a value from the fourth argument of the node with name `CropAndResize`. + +```python + # Convolution/matmul node that produces classes predictions + # Permute result of the tensor with classes permissions so it will be in a correct layout for Softmax + predictions_node = match.single_input_node(1)[0].in_node(0).in_node(0) + permute_predictions_op = Permute(graph, {'order': np.array([0, 2, 3, 1])}) + permute_predictions_node = permute_predictions_op.create_node([], dict(name=predictions_node.name + '/Permute_')) + insert_node_after(predictions_node, permute_predictions_node, 0) + + reshape_classes_op = Reshape(graph, {'dim': np.array([0, -1, 2])}) + reshape_classes_node = reshape_classes_op.create_node([permute_predictions_node], + dict(name='Reshape_FirstStageBoxPredictor_Class_')) + update_attrs(reshape_classes_node, 'shape_attrs', 'dim') + + softmax_conf_op = Softmax(graph, {'axis': 1}) + softmax_conf_node = softmax_conf_op.create_node([reshape_classes_node], + dict(name='FirstStageBoxPredictor_SoftMax_Class_')) +``` + +The output with class predictions from the `FirstStageBoxPredictor` is generated with a convolution operation. The convolution output data layout in TensorFlow is NHWC while Inference Engine uses NCHW layout. Model Optimizer by default converts the weights of TensorFlow convolutions to produce output tensor in NCHW layout required by Inference Engine. The issue arises because the class predictions tensor is passed through the Softmax operation to produce class probabilities. The Inference Engine Softmax is performed over the fastest-changing dimension which is 'W' in Inference Engine. Thus, the softmax operation will be performed over a wrong dimension after conversion of the convolution node producing classes predicitions. The solution is to add Permute and Reshape operations to prepare the input data for Softmax. The Reshape operation is required to make the size of the fastest-changing dimension equal to 2, because there are 2 classes being predicted: background and foreground. + +Another issue is that layout of elements in the predicted classes tensor is different between TensorFlow and Inference Engine Proposal layer requirements. 
In TensorFlow the tensor has the following virtual layout [N, H, W, num_anchors, num_classes] while the Inference Engine Proposal layer requires in the following virtual layout [N, num_classes, num_anchors, H, W]. Thus, it is necessary to reshape, permute and then reshape again output from the Softmax to the required shape for the Proposal layer: + +```python + reshape_softmax_op = Reshape(graph, {'dim': np.array([1, anchors_count, 2, -1])}) + reshape_softmax_node = reshape_softmax_op.create_node([softmax_conf_node], dict(name='Reshape_Softmax_Class_')) + update_attrs(reshape_softmax_node, 'shape_attrs', 'dim') + + permute_reshape_softmax_op = Permute(graph, {'order': np.array([0, 1, 3, 2])}) + permute_reshape_softmax_node = permute_reshape_softmax_op.create_node([reshape_softmax_node], + dict(name='Permute_')) + + # implement custom reshape infer function because we need to know the input convolution node output dimension + # sizes but we can know it only after partial infer + reshape_permute_op = Reshape(graph, {'dim': np.ones([4]), 'anchors_count': anchors_count, + 'conv_node': predictions_node}) + reshape_permute_op.attrs['old_infer'] = reshape_permute_op.attrs['infer'] + reshape_permute_op.attrs['infer'] = __class__.classes_probabilities_reshape_shape_infer + reshape_permute_node = reshape_permute_op.create_node([permute_reshape_softmax_node], + dict(name='Reshape_Permute_Class_')) + update_attrs(reshape_permute_node, 'shape_attrs', 'dim') +``` + +The Proposal layer has 3 inputs: classes probabilities, boxes predictions and a input shape of the image. The first two tensors are ready so it is necessary to create the Const operation that produces the desired third input tensor. + +```python + # create constant input with the image height, width and scale H and scale W (if present) required for Proposal + const_value = np.array([[input_height, input_width, 1]], dtype=np.float32) + const_op = Const(graph, dict(value=const_value, shape=const_value.shape)) + const_node = const_op.create_node([], dict(name='Proposal_const_image_size_')) +``` + +Now add the Proposal layer: + +```python + + proposal_op = ProposalOp(graph, dict(min_size=10, framework='tensorflow', box_coordinate_scale=10, + box_size_scale=5, post_nms_topn=max_proposals, feat_stride=feat_stride, + ratio=proposal_ratios, scale=proposal_scales, base_size=anchor_base_size, + pre_nms_topn=2**31 - 1, + nms_thresh=nms_threshold)) + proposal_node = proposal_op.create_node([reshape_permute_node, + match.single_input_node(0)[0].in_node(0).in_node(0), + const_node], + dict(name=proposal_op.attrs['type'] + '_')) +``` + +The box coordinates in the TensorFlow are in the following layout "YXYX" while Inference Engine uses "XYXY" layout so it is necessary to swap coordinates produced by Proposal layer. 
It is implemented with help of a convolution node with a special filter of a size [5, 5]: + +```python + proposal_reshape_4d_op = Reshape(graph, {'dim': np.array([max_proposals, 1, 1, 5])}) + proposal_reshape_4d_node = proposal_reshape_4d_op.create_node([proposal_node], dict(name="reshape_4d_")) + update_attrs(proposal_reshape_4d_node, 'shape_attrs', 'dim') + + # create convolution node to swap X and Y coordinates in the proposals + conv_filter_const_data = np.array(np.array([[1, 0, 0, 0, 0], + [0, 0, 1, 0, 0], + [0, 1, 0, 0, 0], + [0, 0, 0, 0, 1], + [0, 0, 0, 1, 0]], + dtype=np.float32).reshape([1, 1, 5, 5]), dtype=np.float32) + conv_filter_const_op = Const(graph, dict(value=conv_filter_const_data, spatial_dims=np.array([2, 3]))) + conv_filter_const_node = conv_filter_const_op.create_node([], dict(name="conv_weights")) + + conv_op = Op(graph, { + 'op': 'Conv2D', + 'bias_addable': False, + 'spatial_dims': np.array([1, 2]), + 'channel_dims': np.array([3]), + 'batch_dims': np.array([0]), + 'pad': None, + 'pad_spatial_shape': None, + 'input_feature_channel': 2, + 'output_feature_channel': 2, + 'output_shape': [max_proposals, 1, 1, 5], + 'dilation': np.array([1, 1, 1, 1], dtype=np.int64), + 'stride': np.array([1, 1, 1, 1]), + 'type': 'Convolution', + 'group': None, + 'layout': 'NHWC', + 'infer': __class__.fake_conv_shape_infer}) + predictions_node = conv_op.create_node([proposal_reshape_4d_node, conv_filter_const_node], dict(name="conv_")) + update_ie_fields(graph.node[predictions_node.id]) + + proposal_reshape_2d_op = Reshape(graph, {'dim': np.array([max_proposals, 5])}) + proposal_reshape_2d_node = proposal_reshape_2d_op.create_node([predictions_node], dict(name="reshape_2d_")) + # set specific name for this Reshape operation so we can use it in the DetectionOutput replacer + proposal_reshape_2d_node['name'] = 'swapped_proposals' +``` + +The ROIPooling layer in TensorFlow is implemented with operation called `CropAndResize` with bi-linear filtration. Inference Engine implementation of the ROIPooling layer with bi-linear filtration requires input boxes coordinates be scaled to [0, 1] interval. Adding elementwise multiplication of box coordinates solves this issue: + +```python + # the TF implementation of Proposal with bi-linear filtration need proposals scaled by image size + proposal_scale_const = np.array([1.0, 1 / input_height, 1 / input_width, 1 / input_height, 1 / input_width], + dtype=np.float32) + proposal_scale_const_op = Const(graph, dict(value=proposal_scale_const, shape=proposal_scale_const.shape)) + proposal_scale_const_node = proposal_scale_const_op.create_node([], dict(name='Proposal_scale_const_')) + + scale_proposals_op = Eltwise(graph, {'operation': 'mul'}) + scale_proposals_node = scale_proposals_op.create_node([proposal_reshape_2d_node, proposal_scale_const_node], + dict(name='scale_proposals_')) +``` + +The last step is to create the ROIPooling node with 2 inputs: the identified feature maps from the `FirstStageFeatureExtractor` and the scaled output of the Proposal layer: + +```python + feature_extractor_output_nodes = scope_output_nodes(graph, 'FirstStageFeatureExtractor') + if len(feature_extractor_output_nodes) != 1: + raise Error("Failed to determine FirstStageFeatureExtractor output node to connect it to the ROIPooling." 
+ "Found the following nodes: {}".format([node.name for node in feature_extractor_output_nodes])) + + roi_pooling_op = ROIPooling(graph, dict(method="bilinear", framework="tensorflow", + pooled_h=roi_pool_size, pooled_w=roi_pool_size, + spatial_scale=roi_spatial_scale)) + roi_pooling_node = roi_pooling_op.create_node([feature_extractor_output_nodes[0], scale_proposals_node], + dict(name='ROI_Pooling_')) + + return {'roi_pooling_node': roi_pooling_node} +``` + +The are two additional methods implemented in the replacer class: + +* The `fake_conv_shape_infer` is a silly infer function for the convolution that permutes X and Y coordinates of the Proposal output which avoids setting a lot of internal attributes required for propoper shape inference. + +* The "classes_probabilities_reshape_shape_infer" function is used to update the output dimension of the reshape operation. The output spatial dimensions depends on the convolution output spatial dimensions thus they are not known until the shape inference pass which is performed after this sub-graph replacement class. So this custom infer function is called instead of default Reshape shape inference function, updates the required attribute "dim" of the node with the convolution output spatial dimensions which are known at the time of calling this inference function and then call the default Reshape inference function. + +```python + @staticmethod + def fake_conv_shape_infer(node: Node): + node.out_node(0).shape = node.in_node(0).shape + # call functions to update internal attributes required for correct IR generation + mark_input_bins(node) + assign_dims_to_weights(node.in_node(1), [0, 1], node.input_feature_channel, node.output_feature_channel, 4) + + @staticmethod + def classes_probabilities_reshape_shape_infer(node: Node): + # now we can determine the reshape dimensions from Convolution node + conv_node = node.conv_node + conv_output_shape = conv_node.out_node().shape + + # update desired shape of the Reshape node + node.dim = np.array([0, conv_output_shape[1], conv_output_shape[2], node.anchors_count * 2]) + node.old_infer(node) +``` + +The second replacer defined in the sub-graph replacement configuration file replaces the `SecondStagePostprocessor` block and is defined using scope: + +```json + { + "custom_attributes": { + "code_type": "caffe.PriorBoxParameter.CENTER_SIZE", + "confidence_threshold": 0.01, + "keep_top_k": 300, + "nms_threshold": 0.6, + "pad_mode": "caffe.ResizeParameter.CONSTANT", + "resize_mode": "caffe.ResizeParameter.WARP", + "max_detections_per_class": 100, + "num_classes": 90 + }, + "id": "SecondStagePostprocessorReplacement", + "inputs": [ + [ + { + "node": "Reshape$", + "port": 0 + } + ], + [ + { + "node": "Reshape_1$", + "port": 0 + } + ], + [ + { + "node": "ExpandDims$", + "port": 0 + } + ] + ], + "instances": [ + ".*SecondStagePostprocessor/" + ], + "match_kind": "scope", + "outputs": [ + { + "node": "BatchMultiClassNonMaxSuppression/map/TensorArrayStack/TensorArrayGatherV3$", + "port": 0 + } + ] + } +``` + +The replacement code is similar to the `SecondStagePostprocessor` replacement for the SSDs topologies. The are two major difference: + +* The tensor with bounding boxes doesn't contain locations for class 0 (background class) but Inference Engine Detection Output layer requires it. The Const node with some dummy values are created and concatenated with the tensor. + +* The priors tensor is not constant like in SSDs so the bounding boxes tensor must be scaled with variances [0.1, 0.1, 0.2, 0.2]. 
+ +The descibed above difference are resolved with the following code: + +```python + # TF produces locations tensor without boxes for background. + # Inference Engine DetectionOutput layer requires background boxes so we generate them with some values + # and concatenate with locations tensor + fake_background_locs_blob = np.tile([[[1, 1, 2, 2]]], [max_detections_per_class, 1, 1]) + fake_background_locs_const_op = Const(graph, dict(value=fake_background_locs_blob, + shape=fake_background_locs_blob.shape)) + fake_background_locs_const_node = fake_background_locs_const_op.create_node([]) + + reshape_loc_op = Reshape(graph, {'dim': np.array([max_detections_per_class, num_classes, 4])}) + reshape_loc_node = reshape_loc_op.create_node([match.single_input_node(0)[0].in_node(0)], + dict(name='Reshape_loc_')) + + concat_loc_op = Concat(graph, {'axis': 1}) + concat_loc_node = concat_loc_op.create_node([fake_background_locs_const_node, reshape_loc_node], + dict(name='Concat_fake_loc_')) + + # blob with variances + variances_blob = np.array([0.1, 0.1, 0.2, 0.2]) + variances_const_op = Const(graph, dict(value=variances_blob, shape=variances_blob.shape)) + variances_const_node = variances_const_op.create_node([]) + + # reshape locations tensor to 2D so it could be passed to Eltwise which will be converted to ScaleShift + reshape_loc_2d_op = Reshape(graph, {'dim': np.array([-1, 4])}) + reshape_loc_2d_node = reshape_loc_2d_op.create_node([concat_loc_node], dict(name='reshape_locs_2d_')) + + # element-wise multiply locations with variances + eltwise_locs_op = Eltwise(graph, {'operation': 'mul'}) + eltwise_locs_node = eltwise_locs_op.create_node([reshape_loc_2d_node, variances_const_node], + dict(name='scale_locs_')) +``` + +### Example of Model Optimizer Command-Line for TensorFlow's Faster R-CNNs +The final command line to convert Faster R-CNNs from the TensorFlow* Object Detection Zoo is the following: + +```sh +./mo.py --input_model= --output=detection_boxes,detection_scores,num_detections --tensorflow_use_custom_operations_config extensions/front/tf/legacy_faster_rcnn_support.json +``` + +Note that there are minor changes that should be made to the and sub-graph replacement configuration file `/deployment_tools/model_optimizer/extensions/front/tf/legacy_faster_rcnn_support.json` before converting particular Faster R-CNN topology. Refer to the table below. + +### Sub-Graph Replacement Configuration File Parameters to Convert Different Faster R-CNN Models +|Model Name | Configuration File Changes| +|:----|:----:| +| faster_rcnn_inception_v2_coco | None +| faster_rcnn_resnet50_coco | None +| faster_rcnn_resnet50_lowproposals_coco | None +| faster_rcnn_resnet101_coco | None +| faster_rcnn_resnet101_lowproposals_coco | None +| faster_rcnn_inception_resnet_v2_atrous_coco | "feat_stride: 8" +| faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco| "feat_stride: 8" + diff --git a/docs/MO_DG/prepare_model/customize_model_optimizer/TensorFlow_SSD_ObjectDetection_API.md b/docs/MO_DG/prepare_model/customize_model_optimizer/TensorFlow_SSD_ObjectDetection_API.md new file mode 100644 index 00000000000000..b43d5de15e21aa --- /dev/null +++ b/docs/MO_DG/prepare_model/customize_model_optimizer/TensorFlow_SSD_ObjectDetection_API.md @@ -0,0 +1,339 @@ +# (Deprecated) Case Study: Converting SSD Models Created with TensorFlow* Object Detection API {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_TensorFlow_SSD_ObjectDetection_API} + +This is a deprecated page. 
Please consider reading [this](../convert_model/tf_specific/Convert_Object_Detection_API_Models.md) page, which describes the new approach to converting Object Detection API models and gives results closer to TensorFlow inference.
+
+## Converting Models Created with TensorFlow Object Detection API Version Prior to 1.6.0
+
+As explained in the [Sub-graph Replacement in Model Optimizer](Subgraph_Replacement_Model_Optimizer.md) section, there are multiple
+ways to set up the sub-graph matching. In this example, we focus on defining the sub-graph via a set of
+"start" and "end" nodes.
+The result of matching is two buckets of nodes:
+* Nodes "between" start and end nodes.
+* Nodes connected to the first list, but just on the constant path (that is, these nodes are not connected to the inputs of the entire graph).
+
+Let's take a closer look at the SSD models from the TensorFlow* detection model
+zoo:
+[SSD MobileNet](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2017_11_17.tar.gz) and
+[SSD InceptionV2](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2017_11_17.tar.gz).
+
+A distinct layer of any SSD topology is the `DetectionOutput` layer. This layer is implemented with dozens of primitive operations in TensorFlow, while in the Inference Engine, it is one [layer](../../../ops/opset.md). Thus, to convert an SSD model from TensorFlow, the Model Optimizer should replace the entire sub-graph of operations that implement the `DetectionOutput` layer with a single well-known `DetectionOutput` node.
+
+The Inference Engine `DetectionOutput` layer consumes three tensors in the following order:
+
+1. Tensor with locations of bounding boxes
+2. Tensor with confidences for each bounding box
+3. Tensor with prior boxes (anchors in TensorFlow terminology)
+
+The `DetectionOutput` layer produces one tensor with seven numbers for each actual detection. There are more output tensors in the TensorFlow Object Detection API, but the values in them are consistent with the Inference Engine ones.
+
+The difference from [other examples](Subgraph_Replacement_Model_Optimizer.md) is that here the `DetectionOutput` sub-graph is replaced with a new sub-graph (not a single layer). 
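+
+For reference, the following is a minimal sketch of how an application could interpret the seven values per detection produced by the Inference Engine `DetectionOutput` layer. The `[image_id, label, confidence, x_min, y_min, x_max, y_max]` field order and the [0, 1] box coordinates are assumptions based on the layer specification, and the helper name is made up for illustration; this is not part of the replacement code discussed below:
+```python
+import numpy as np
+
+def parse_detection_output(detections: np.ndarray, conf_threshold: float = 0.5):
+    """Interpret a [1, 1, N, 7] DetectionOutput tensor.
+
+    Each row is assumed to be [image_id, label, confidence, x_min, y_min, x_max, y_max]
+    with box coordinates normalized to the [0, 1] interval.
+    """
+    results = []
+    for image_id, label, conf, x_min, y_min, x_max, y_max in detections.reshape(-1, 7):
+        if image_id < 0:  # a negative image_id conventionally marks the end of valid detections
+            break
+        if conf >= conf_threshold:
+            results.append((int(label), float(conf), (x_min, y_min, x_max, y_max)))
+    return results
+```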
+ +Look at sub-graph replacement configuration file `/deployment_tools/model_optimizer/extensions/front/tf/legacy_ssd_support.json` that is used to enable two models listed above: +```json +[ + { + "custom_attributes": { + "code_type": "caffe.PriorBoxParameter.CENTER_SIZE", + "confidence_threshold": 0.01, + "keep_top_k": 200, + "nms_threshold": 0.45, + "pad_mode": "caffe.ResizeParameter.CONSTANT", + "resize_mode": "caffe.ResizeParameter.WARP" + }, + "id": "TFObjectDetectionAPIDetectionOutput", + "include_inputs_to_sub_graph": true, + "include_outputs_to_sub_graph": true, + "instances": { + "end_points": [ + "detection_boxes", + "detection_scores", + "num_detections" + ], + "start_points": [ + "Postprocessor/Shape", + "Postprocessor/Slice", + "Postprocessor/ExpandDims", + "Postprocessor/Reshape_1" + ] + }, + "match_kind": "points" + }, + { + "custom_attributes": { + }, + "id": "PreprocessorReplacement", + "inputs": [ + [ + { + "node": "map/Shape$", + "port": 0 + }, + { + "node": "map/TensorArrayUnstack/Shape$", + "port": 0 + }, + { + "node": "map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3$", + "port": 2 + } + ] + ], + "instances": [ + ".*Preprocessor/" + ], + "match_kind": "scope", + "outputs": [ + { + "node": "sub$", + "port": 0 + }, + { + "node": "map/TensorArrayStack_1/TensorArrayGatherV3$", + "port": 0 + } + ] + } +] +``` + +**Key lines**: + +* Lines 3-10 define static attributes that will be saved to the Intermediate Representation `.xml` file for `DetectionOutput` layer. + +* Lines 12 and 13 define values for attributes that should be always set to "true" for this release of the Model Optimizer. These two attributes are specific for sub-graph match by points only. + +* Lines 14-26 define one instance of the sub-graph to be match. It is an important difference between sub-graph matching by scope and points. Several instances could be specified for matching by scope, but matching with points allows specifying just one instance. So the full node names (not regular expressions like in case of match with scope) are specified in `instances` dictionary. + +The second sub-graph replacer with identifier `PreprocessorReplacement` is used to remove the `Preprocessing` block from the graph. The replacer removes all nodes from this scope except nodes performing mean value subtraction and scaling (if applicable). Implementation of the replacer is in the `/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py` file. + +Now let's analyze the structure of the topologies generated with the Object Detection API. There are several blocks in the graph performing particular task: + +* `Preprocessor` block resizes, scales and subtracts mean values from the input image. + +* `FeatureExtractor` block is a [MobileNet](https://arxiv.org/abs/1704.04861) or other backbone to extract features. + +* `MultipleGridAnchorGenerator` block creates initial bounding boxes locations (anchors). + +* `Postprocessor` block acts as a `DetectionOutput` layer. So we need to replace `Postprocessor` block with `DetectionOutput` layer. It is necessary to add all input nodes of the `Postprocessor` scope to the list `start_points`. Consider inputs of each of these nodes: + + * `Postprocessor/Shape` consumes tensor with locations. + * `Postprocessor/Slice` consumes tensor with confidences. + * `Postprocessor/ExpandDims` consumes tensor with prior boxes. + * `Postprocessor/Reshape_1` consumes tensor with locations similarly to the `Postprocessor/Shape` node. 
Despite the fact that the last node `Postprocessor/Reshape_1` gets the same tensor as the node `Postprocessor/Shape`, it must be explicitly put into the list.
+
+The Object Detection API `Postprocessor` block generates the following output nodes: `detection_boxes`, `detection_scores`, `num_detections`, `detection_classes`.
+
+Now consider the implementation of the sub-graph replacer, available in the file `/deployment_tools/model_optimizer/extensions/front/tf/SSDs.py`. The file is rather big, so only some code snippets are shown:
+```python
+class PostprocessorReplacement(FrontReplacementFromConfigFileSubGraph):
+    replacement_id = 'TFObjectDetectionAPIDetectionOutput'
+```
+
+These lines define the new `PostprocessorReplacement` class inherited from `FrontReplacementFromConfigFileSubGraph`. `FrontReplacementFromConfigFileSubGraph` is designed to replace a sub-graph of operations described in the configuration file. The following methods are overridden to implement the custom replacement logic that we need:
+
+* `generate_sub_graph` performs the new sub-graph generation and returns a dictionary where the key is an alias name for the node and the value is a Node object. The dictionary has the same format as the parameter `match` in the `replace_sub_graph` method in the example with the networkx sub-graph isomorphism pattern. This dictionary is passed as an argument to the next three methods, so it should contain entries for the nodes that those functions need.
+
+* `input_edges_match` specifies the mapping between the input edges to the sub-graph before replacement and after replacement. The key of the dictionary is a tuple specifying an input tensor of the sub-graph before replacement: the sub-graph input node name and the input port number for this node. The value for this key is also a tuple specifying the node where this tensor should be attached during replacement: the node name (or alias name of the node) and the input port for this node. If the port number is zero, the parameter could be omitted, so the key or value is just a node name (alias). The default implementation of the method returns an empty dictionary, so the Model Optimizer does not create new edges.
+
+* `output_edges_match` returns the mapping between the old output edges of the matched nodes and the new sub-graph node and output edge index. The format is similar to the dictionary returned by the `input_edges_match` method. The only difference is that instead of specifying input port numbers for the nodes, it is necessary to specify output port numbers. Of course, this mapping is needed for the output nodes only. The default implementation of the method returns an empty dictionary, so the Model Optimizer does not create new edges.
+
+* `nodes_to_remove` specifies the list of nodes that the Model Optimizer should remove after the sub-graph replacement. The default implementation of the method removes all sub-graph nodes.
+
+Now review the replacer code, considering the details of the `DetectionOutput` layer implementation in the Inference Engine. There are several constraints on the input tensors of the `DetectionOutput` layer:
+
+* The tensor with locations must be of shape `[#batch, #prior_boxes * 4]` or `[#batch, #prior_boxes * 5]`, depending on whether the locations are shared between different batches or not.
+* The tensor with confidences must be of shape `[#batch, #prior_boxes * #classes]`, and the confidence values must be in the [0, 1] range, that is, passed through a `softmax` layer.
+* The tensor with prior boxes must be of shape `[#batch, 2, #prior_boxes * 4]`. 
Inference Engine expects that it contains variance values which TensorFlow Object Detection API does not add. + +To enable these models, add `Reshape` operations for locations and confidences tensors and update the values for the prior boxes to include the variance constants (they are not there in TensorFlow Object Detection API). + +Look at the `generate_sub_graph` method: +```python +def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch): + log.debug('PostprocessorReplacement.generate_sub_graph') + log.debug('matched_nodes = {}'.format(match.matched_nodes_names())) + # softmax to be applied to the confidence + softmax_conf_op = Softmax(graph, {'axis': 2, 'nchw_layout': True}) + softmax_conf_node = softmax_conf_op.add_node(dict(name='DetectionOutput_SoftMax_conf_')) + # Inference Engine DetectionOutput layer consumes flattened tensors + # reshape operation to flatten locations tensor + reshape_loc_op = Reshape(graph, {'dim': np.array([0, -1])}) + reshape_loc_node = reshape_loc_op.add_node(dict(name='DetectionOutput_Reshape_loc_')) + # Inference Engine DetectionOutput layer consumes flattened tensors + # reshape operation to flatten confidence tensor + reshape_conf_op = Reshape(graph, {'dim': np.array([0, -1])}) + reshape_conf_node = reshape_conf_op.add_node(dict(name='DetectionOutput_Reshape_conf_')) + # create Node object from Op class + detection_output_op = DetectionOutput(graph, match.custom_replacement_desc.custom_attributes) + detection_output_op.attrs['old_infer'] = detection_output_op.attrs['infer'] + detection_output_op.attrs['infer'] = __class__.do_infer + detection_output_node = detection_output_op.add_node(dict(name=detection_output_op.attrs['type'] + '_')) + # create internal edges of the sub-graph. In this case we add edges to connect input port 0 and 1 of the + # detection output with output of reshape of locations and reshape of confidence + create_edge(softmax_conf_node, reshape_conf_node, 0, 0) + create_edge(reshape_loc_node, detection_output_node, 0, 0) + create_edge(reshape_conf_node, detection_output_node, 0, 1) + return {'detection_output_node': detection_output_node, 'reshape_conf_node': softmax_conf_node, + 'reshape_loc_node': reshape_loc_node} +``` +The method has two inputs: the graph to operate on and the instance of `SubgraphMatch` object, which describes matched sub-graph. The latter class has several useful methods to get particular input/output node of the sub-graph by input/output index or by node name pattern. Examples of these methods usage are given below. + +**Key lines**: + +* Lines 6 and 7 create new instance of operation of type `Softmax` and graph Node object corresponding to that operation. + +* Lines 11-12 and 16-17 create new instance of operation of type `Reshape` to reshape locations and confidences tensors correspondingly. + +* Lines 20-23 create new instance of operation `DetectionOutput` and graph Node object corresponding to that operation. + +* Lines 27-29 connect `softmax` node with `reshape` node and connect two reshaped locations and confidences tensors with `DetectionOutput` node. + +* Lines 30-31 define dictionary with aliases for detection output node, reshape locations and confidences nodes. These aliases are used in the `input_edges_match` and `output_edges_match` methods. 
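+
+Before moving on to the edge matching methods, here is a small illustration of the prior-box constraint mentioned above: the `[#batch, 2, #prior_boxes * 4]` tensor is expected to carry variance values in its second row, which the TensorFlow Object Detection API does not provide. The sketch below only illustrates that layout; the helper name, the use of NumPy, and the boxes-then-variances row order are assumptions for the example, the variance values are the `[0.1, 0.1, 0.2, 0.2]` constants used in the Faster R-CNN chapter above, and this is not the code the Model Optimizer actually runs:
+```python
+import numpy as np
+
+def add_variances_to_priors(priors: np.ndarray, variances=(0.1, 0.1, 0.2, 0.2)) -> np.ndarray:
+    """Illustrative only: turn a [num_priors, 4] array of prior boxes into a
+    [1, 2, num_priors * 4] tensor, where the first row holds the flattened box
+    coordinates and the second row holds the per-coordinate variances."""
+    num_priors = priors.shape[0]
+    flat_priors = priors.reshape(1, 1, num_priors * 4)
+    flat_variances = np.tile(np.asarray(variances, dtype=priors.dtype),
+                             num_priors).reshape(1, 1, num_priors * 4)
+    return np.concatenate([flat_priors, flat_variances], axis=1)
+```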
+
+The `input_edges_match` method is the following:
+```python
+def input_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
+    locs_consumer_node, locs_consumer_node_port = match.input_nodes(0)[0]
+    conf_consumer_node, conf_consumer_node_port = match.input_nodes(1)[0]
+    priors_consumer_node, priors_consumer_node_port = match.input_nodes(2)[0]
+    # create matching nodes for locations and confidence tensors using simple scheme "old_node_name: new_node_name"
+    # which in fact means "(old_node_name, 0): (new_node_name, 0)", while first '0' means old_port and the second
+    # zero defines 'new_port'.
+    return {locs_consumer_node.id: new_sub_graph['reshape_loc_node'].id,
+            conf_consumer_node.id: new_sub_graph['reshape_conf_node'].id,
+            priors_consumer_node.id: (new_sub_graph['detection_output_node'].id, 2),
+            }
+```
+The method has three parameters: the input `graph`, the `match` object describing the matched sub-graph, and the `new_sub_graph` dictionary with the alias names returned from the `generate_sub_graph` method.
+
+**Key lines**:
+
+* Lines 2-4 initialize Node objects and input ports for the nodes where the input tensors for the sub-graph are consumed. The method `match.input_nodes(ind)` returns a list of tuples where the first element is a Node object and the second is the input port of this node that consumes the ind-th input tensor of the sub-graph. The `input_points` list in the configuration file defines the order of input tensors to the sub-graph. For example, the `locs_consumer_node` object of type Node is a node that consumes the tensor with locations on the port with number `locs_consumer_node_port`.
+
+* Lines 8-11 define the dictionary with the mapping of tensors as described above. Note that the attribute `id` of the Node object contains the name of the node in the graph.
+
+The `output_edges_match` method is the following:
+```python
+def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
+    # the DetectionOutput in IE produces single tensor, but in TF it produces two tensors, so we need to create only
+    # one output edge match
+    return {match.output_node(0)[0].id: new_sub_graph['detection_output_node'].id}
+```
+
+The method has the same three parameters as the `input_edges_match` method. The returned dictionary contains the mapping just for one tensor, initially produced by the first output node of the sub-graph (which is `detection_boxes` according to the configuration file), to a single output tensor of the created `DetectionOutput` node. In fact, it is possible to use any output node of the initial sub-graph in the mapping, because the sub-graph output nodes are the output nodes of the whole graph (their output is not consumed by any other nodes).
+
+Now the Model Optimizer knows how to replace the sub-graph. The last step to enable the model is to cut off some parts of the graph that are not needed during inference.
+
+It is necessary to remove the `Preprocessor` block where the image is resized. The Inference Engine does not support dynamic input shapes, so the Model Optimizer must freeze the input image size, and thus resizing of the image is not necessary. This is achieved by the replacer `/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py`, which is executed automatically.
+
+There are several `Switch` operations in the `Postprocessor` block without output edges. 
For example:
+```sh
+Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond/cond/switch_t
+Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond/cond/switch_f
+Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond_1/cond/switch_t
+Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond_1/cond/switch_f
+```
+
+The Model Optimizer marks these nodes as output nodes of the topology. Because of that, some parts of the `Postprocessor` block are not removed during the sub-graph replacement. In order to fix this issue, it is necessary to specify the output nodes of the graph manually using the `--output` command-line parameter.
+
+### Example of Model Optimizer Command-Line for TensorFlow\* SSD
+
+The final command line to convert SSDs from the TensorFlow Object Detection API Zoo is:
+```sh
+./mo_tf.py --input_model= --tensorflow_use_custom_operations_config extensions/front/tf/legacy_ssd_support.json --output="detection_boxes,detection_scores,num_detections"
+```
+
+## Converting MobileNet V2 model created with TensorFlow Object Detection API
+The [MobileNet V2 model](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz) differs from the previous version, so converting the model requires a new sub-graph replacement configuration file and new command-line parameters. The major differences are:
+
+* The `Preprocessor` block has two outputs: the pre-processed image and the pre-processed image size.
+* The `Postprocessor` block has one more input (in comparison with models created with TensorFlow Object Detection API
+version 1.6 or lower): the pre-processed image size.
+* Some node names have been changed in the `Postprocessor` block. 
+ +The updated sub-graph replacement configuration file `extensions/front/tf/ssd_v2_support.json` reflecting these changes +is the following: + +```json +[ + { + "custom_attributes": { + "code_type": "caffe.PriorBoxParameter.CENTER_SIZE", + "confidence_threshold": 0.01, + "keep_top_k": 200, + "nms_threshold": 0.6, + "pad_mode": "caffe.ResizeParameter.CONSTANT", + "resize_mode": "caffe.ResizeParameter.WARP" + }, + "id": "TFObjectDetectionAPIDetectionOutput", + "include_inputs_to_sub_graph": true, + "include_outputs_to_sub_graph": true, + "instances": { + "end_points": [ + "detection_boxes", + "detection_scores", + "num_detections" + ], + "start_points": [ + "Postprocessor/Shape", + "Postprocessor/scale_logits", + "Postprocessor/ExpandDims", + "Postprocessor/Reshape_1", + "Postprocessor/ToFloat" + ] + }, + "match_kind": "points" + }, + { + "custom_attributes": { + }, + "id": "PreprocessorReplacement", + "inputs": [ + [ + { + "node": "map/Shape$", + "port": 0 + }, + { + "node": "map/TensorArrayUnstack/Shape$", + "port": 0 + }, + { + "node": "map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3$", + "port": 2 + } + ] + ], + "instances": [ + ".*Preprocessor/" + ], + "match_kind": "scope", + "outputs": [ + { + "node": "sub$", + "port": 0 + }, + { + "node": "map/TensorArrayStack_1/TensorArrayGatherV3$", + "port": 0 + } + ] + } +] +``` + +### Example of Model Optimizer Command-Line for TensorFlow SSD MobileNet V2 +The final command line to convert MobileNet SSD V2 from the TensorFlow Object Detection Zoo is the following: + +```sh +./mo_tf.py --input_model= --tensorflow_use_custom_operations_config extensions/front/tf/ssd_v2_support.json --output="detection_boxes,detection_scores,num_detections" +``` diff --git a/docs/Optimization_notice.md b/docs/Optimization_notice.md new file mode 100644 index 00000000000000..99f71b905cc6b5 --- /dev/null +++ b/docs/Optimization_notice.md @@ -0,0 +1,3 @@ +# Optimization Notice {#openvino_docs_Optimization_notice} + +![Optimization_notice](img/opt-notice-en_080411.gif) \ No newline at end of file diff --git a/docs/benchmarks/performance_benchmarks.md b/docs/benchmarks/performance_benchmarks.md new file mode 100644 index 00000000000000..687cb940c42821 --- /dev/null +++ b/docs/benchmarks/performance_benchmarks.md @@ -0,0 +1,234 @@ +# Get a Deep Learning Model Performance Boost with Intel® Platforms {#openvino_docs_performance_benchmarks} + +## Increase Performance for Deep Learning Inference + +The [Intel® Distribution of OpenVINO™ toolkit](https://software.intel.com/en-us/openvino-toolkit) helps accelerate deep learning inference across a variety of Intel® processors and accelerators. Rather than a one-size-fits-all solution, Intel offers a powerful portfolio of scalable hardware and software solutions, powered by the Intel® Distribution of OpenVINO™ toolkit, to meet the various performance, power, and price requirements of any use case. The benchmarks below demonstrate high performance gains on several public neural networks for a streamlined, quick deployment on **Intel® CPU, VPU and FPGA** platforms. Use this data to help you decide which hardware is best for your applications and solutions, or to plan your AI workload on the Intel computing already included in your solutions. + +Measuring inference performance involves many variables and is extremely use-case and application dependent. We use the below four parameters for measurements, which are key elements to consider for a successful deep learning inference application: + +1. 
**Throughput** - Measures the number of inferences delivered within a latency threshold. (for example, number of frames per second). When deploying a system with deep learning inference, select the throughput that delivers the best trade-off between latency and power for the price and performance that meets your requirements. +2. **Value** - While throughput is important, what is more critical in edge AI deployments is the performance efficiency or performance-per-cost. Application performance in throughput per dollar of system cost is the best measure of value. +3. **Efficiency** - System power is a key consideration from the edge to the data center. When selecting deep learning solutions, power efficiency (throughput/watt) is a critical factor to consider. Intel designs provide excellent power efficiency for running deep learning workloads. +4. **Total Benefit** (Most applicable for Intel® VPU Platforms) - Combining the factors of value and efficiency can be a good way to compare which hardware yields the best performance per watt and per dollar for your particular use case. + +--- + +## Intel® Xeon® E-2124G + +![](img/throughput_xeon_e212g.png) +![](img/value_xeon_e212g.png) +![](img/eff_xeon_e212g.png) + +--- + +## Intel® Xeon® Silver 4216R + +![](img/throughput_xeon_silver.png) +![](img/value_xeon_silver.png) +![](img/eff_xeon_silver.png) + +--- + +## Intel® Xeon® Gold 5218T + +![](img/throughput_xeon_gold.png) +![](img/value_xeon_gold.png) +![](img/eff_xeon_gold.png) + +--- + +## Intel® Xeon® Platinum 8270 + +![](img/throughput_xeon_platinum.png) +![](img/value_xeon_platinum.png) +![](img/eff_xeon_platinum.png) + +--- + +## Intel® Atom™ x5-E3940 + +![](img/throughput_atom.png) +![](img/value_atom.png) +![](img/eff_atom.png) + +--- + +## Intel® Core™ i3-8100 + +![](img/throughput_i3.png) +![](img/value_i3.png) +![](img/eff_i3.png) + +--- + +## Intel® Core™ i5-8500 + +![](img/throughput_i5.png) +![](img/value_i5.png) +![](img/eff_i5.png) + +--- + +## Intel® Core™ i7-8700T + +![](img/throughput_i7.png) +![](img/value_i7.png) +![](img/eff_i7.png) + +--- + +## Intel® Core™ i9-10920X + +![](img/throughput_i9.png) +![](img/value_i9.png) +![](img/eff_i9.png) + +--- + +## Intel® Neural Compute Stick 2 + +![](img/throughput_ncs2.png) +![](img/value_ncs2.png) +![](img/eff_ncs2.png) +![](img/benefit_ncs2.png) + +--- + +## Intel® Vision Accelerator Design with Intel® Movidius™ VPUs (Uzel* UI-AR8) + +![](img/throughput_hddlr.png) +![](img/value_hddlr.png) +![](img/eff_hddlr.png) + +--- + +## Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA + +![](img/throughput_ivad_fpga.png) +![](img/value_ivad_fpga.png) +![](img/eff_ivad_fpga.png) + +## Platform Configurations + +Intel® Distribution of OpenVINO™ toolkit performance benchmark numbers are based on release 2020.4. + +Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. Performance results are based on testing as of July 8, 2020 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure. + +Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. 
You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information, see [Performance Benchmark Test Disclosure](https://www.intel.com/content/www/us/en/benchmarks/benchmark.html). + +Your costs and results may vary. + +© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. + +Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. [Notice Revision #2010804](https://software.intel.com/articles/optimization-notice). + +Testing by Intel done on: see test date for each HW platform below. + +**CPU Inference Engines** + +| | Intel® Xeon® E-2124G | Intel® Xeon® Silver 4216R | Intel® Xeon® Gold 5218T | Intel® Xeon® Platinum 8270 | +| ------------------------------- | ----------------------| ---------------------------- | ---------------------------- | ---------------------------- | +| Motherboard | ASUS* WS C246 PRO | Intel® Server Board S2600STB | Intel® Server Board S2600STB | Intel® Server Board S2600STB | +| CPU | Intel® Xeon® E-2124G CPU @ 3.40GHz | Intel® Xeon® Silver 4216R CPU @ 2.20GHz | Intel® Xeon® Gold 5218T CPU @ 2.10GHz | Intel® Xeon® Platinum 8270 CPU @ 2.70GHz | +| Hyper Threading | OFF | ON | ON | ON | +| Turbo Setting | ON | ON | ON | ON | +| Memory | 2 x 16 GB DDR4 2666MHz| 12 x 32 GB DDR4 2666MHz | 12 x 32 GB DDR4 2666MHz | 12 x 32 GB DDR4 2933MHz | +| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | +| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | 5.3.0-24-generic | 5.3.0-24-generic | +| BIOS Vendor | American Megatrends Inc.* | Intel Corporation | Intel Corporation | Intel Corporation | +| BIOS Version | 0904 | SE5C620.86B.02.01.
0009.092820190230 | SE5C620.86B.02.01.
0009.092820190230 | SE5C620.86B.02.01.
0009.092820190230 | +| BIOS Release | April 12, 2019 | September 28, 2019 | September 28, 2019 | September 28, 2019 | +| BIOS Settings | Select optimized default settings,
save & exit | Select optimized default settings,
change power policy
to "performance",
save & exit | Select optimized default settings,
change power policy to "performance",
save & exit | Select optimized default settings,
change power policy to "performance",
save & exit | +| Batch size | 1 | 1 | 1 | 1 | +| Precision | INT8 | INT8 | INT8 | INT8 | +| Number of concurrent inference requests | 4 | 32 | 32 | 52 | +| Test Date | July 8, 2020 | July 8, 2020 | July 8, 2020 | July 8, 2020 | +| Power dissipation, TDP in Watt | [71](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html#tab-blade-1-0-1) | [125](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | [105](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | [205](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html#tab-blade-1-0-1) | +| CPU Price on July 8, 2020, USD
Prices may vary | [213](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html) | [1,002](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html) | [1,349](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html) | [7,405](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html) | + +**CPU Inference Engines (continue)** + +| | Intel® Core™ i5-8500 | Intel® Core™ i7-8700T | Intel® Core™ i9-10920X | +| -------------------- | ---------------------------------- | ----------------------------------- |--------------------------------------| +| Motherboard | ASUS* PRIME Z370-A | GIGABYTE* Z370M DS3H-CF | ASUS* PRIME X299-A II | +| CPU | Intel® Core™ i5-8500 CPU @ 3.00GHz | Intel® Core™ i7-8700T CPU @ 2.40GHz | Intel® Core™ i9-10920X CPU @ 3.50GHz | +| Hyper Threading | OFF | ON | ON | +| Turbo Setting | ON | ON | ON | +| Memory | 2 x 16 GB DDR4 2666MHz | 4 x 16 GB DDR4 2400MHz | 4 x 16 GB DDR4 2666MHz | +| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | +| Kernel Version | 5.3.0-24-generic | 5.0.0-23-generic | 5.0.0-23-generic | +| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | American Megatrends Inc.* | +| BIOS Version | 2401 | F11 | 505 | +| BIOS Release | July 12, 2019 | March 13, 2019 | December 17, 2019 | +| BIOS Settings | Select optimized default settings,
save & exit | Select optimized default settings,
set OS type to "other",
save & exit | Default Settings | +| Batch size | 1 | 1 | 1 | +| Precision | INT8 | INT8 | INT8 | +| Number of concurrent inference requests | 3 | 4 | 24 | +| Test Date | July 8, 2020 | July 8, 2020 | July 8, 2020 | +| Power dissipation, TDP in Watt | [65](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html#tab-blade-1-0-1) | [35](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html#tab-blade-1-0-1) | [165](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | +| CPU Price on July 8, 2020, USD
Prices may vary | [192](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html) | [303](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html) | [700](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) + +**CPU Inference Engines (continue)** + +| | Intel Atom® x5-E3940 | Intel® Core™ i3-8100 | +| -------------------- | ---------------------------------- |----------------------------------- | +| Motherboard | | GIGABYTE* Z390 UD | +| CPU | Intel Atom® Processor E3940 @ 1.60GHz | Intel® Core™ i3-8100 CPU @ 3.60GHz | +| Hyper Threading | OFF | OFF | +| Turbo Setting | ON | OFF | +| Memory | 1 x 8 GB DDR3 1600MHz | 4 x 8 GB DDR4 2400MHz | +| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | +| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | +| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | +| BIOS Version | 5.12 | F8 | +| BIOS Release | September 6, 2017 | May 24, 2019 | +| BIOS Settings | Default settings | Select optimized default settings,
set OS type to "other",
save & exit | +| Batch size | 1 | 1 | +| Precision | INT8 | INT8 | +| Number of concurrent inference requests | 4 | 4 | +| Test Date | July 8, 2020 | July 8, 2020 | +| Power dissipation, TDP in Watt | [9.5](https://ark.intel.com/content/www/us/en/ark/products/96485/intel-atom-x5-e3940-processor-2m-cache-up-to-1-80-ghz.html) | [65](https://ark.intel.com/content/www/us/en/ark/products/126688/intel-core-i3-8100-processor-6m-cache-3-60-ghz.html#tab-blade-1-0-1)| +| CPU Price on July 8, 2020, USD
Prices may vary | [34](https://ark.intel.com/content/www/us/en/ark/products/96485/intel-atom-x5-e3940-processor-2m-cache-up-to-1-80-ghz.html) | [117](https://ark.intel.com/content/www/us/en/ark/products/126688/intel-core-i3-8100-processor-6m-cache-3-60-ghz.html) | + + + +**Accelerator Inference Engines** + +| | Intel® Neural Compute Stick 2 | Intel® Vision Accelerator Design
with Intel® Movidius™ VPUs (Uzel* UI-AR8) | Intel® Vision Accelerator Design
with Intel® Arria® 10 FPGA - IEI/SAF3*| +| -------------------- | ------------------------------------- | ------------------------------------- | ------------------------- | +| VPU | 1 X Intel® Movidius™ Myriad™ X MA2485 | 8 X Intel® Movidius™ Myriad™ X MA2485 | 1 X Intel® Arria® 10 FPGA | +| Connection | USB 2.0/3.0 | PCIe X4 | PCIe X8 | +| Batch size | 1 | 1 | 1 | +| Precision | FP16 | FP16 | FP11 | +| Number of concurrent inference requests | 4 | 32 | 5 | +| Power dissipation, TDP in Watt | 2.5 | [30](https://www.mouser.com/ProductDetail/IEI/MUSTANG-V100-MX8-R10?qs=u16ybLDytRaZtiUUvsd36w%3D%3D) | [60](https://www.mouser.com/ProductDetail/IEI/MUSTANG-F100-A10-R10?qs=sGAEpiMZZMtNlGR3Dbecs5Qs0RmP5oxxCbTJPjyRuMXthliRUwiVGw%3D%3D) | +| CPU Price, USD
Prices may vary | [69](https://ark.intel.com/content/www/us/en/ark/products/140109/intel-neural-compute-stick-2.html) (from July 8, 2020) | [768](https://www.mouser.com/ProductDetail/IEI/MUSTANG-V100-MX8-R10?qs=u16ybLDytRaZtiUUvsd36w%3D%3D) (from May 15, 2020) | [1,650](https://www.bhphotovideo.com/c/product/1477989-REG/qnap_mustang_f100_a10_r10_pcie_fpga_highest_performance.html/?ap=y&ap=y&smp=y&msclkid=371b373256dd1a52beb969ecf5981bf8) (from July 8, 2020) | +| Host Computer | Intel® Core™ i7 | Intel® Core™ i5 | Intel® Xeon® E3 | +| Motherboard | ASUS* Z370-A II | Uzelinfo* / US-E1300 | IEI/SAF3* | +| CPU | Intel® Core™ i7-8700 CPU @ 3.20GHz | Intel® Core™ i5-6600 CPU @ 3.30GHz | Intel® Xeon® CPU E3-1268L v5 @ 2.40GHz | +| Hyper Threading | ON | OFF | OFF | +| Turbo Setting | ON | ON | ON | +| Memory | 4 x 16 GB DDR4 2666MHz | 2 x 16 GB DDR4 2400MHz | 2 x 16 GB DDR4 2666MHz | +| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 16.04 LTS | +| Kernel Version | 5.0.0-23-generic | 5.0.0-23-generic | 4.13.0-45-generic | +| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | American Megatrends Inc.* | +| BIOS Version | 411 | 5.12 | V2RMAR15 | +| BIOS Release | September 21, 2018 | September 21, 2018 | December 03, 2019 | +| Test Date | July 8, 2020 | July 8, 2020 | July 8, 2020 | + +Please follow this link for more detailed configuration descriptions: [Configuration Details](https://docs.openvinotoolkit.org/resources/benchmark_files/system_configurations_2020.4.html) + +\htmlonly + +
+

+\endhtmlonly +For more complete information about performance and benchmark results, visit: [www.intel.com/benchmarks](https://www.intel.com/benchmarks) and [Optimization Notice](https://software.intel.com/articles/optimization-notice). [Legal Information](../Legal_Information.md). +\htmlonly +

+
+\endhtmlonly diff --git a/docs/benchmarks/performance_benchmarks_faq.md b/docs/benchmarks/performance_benchmarks_faq.md new file mode 100644 index 00000000000000..0abe0da4f88479 --- /dev/null +++ b/docs/benchmarks/performance_benchmarks_faq.md @@ -0,0 +1,62 @@ +# Performance Information Frequently Asked Questions {#openvino_docs_performance_benchmarks_faq} + +The following questions and answers are related to performance benchmarks published on the [Performance Information](https://docs.openvinotoolkit.org/latest/_docs_performance_benchmarks.html) documentation site. + +#### 1. How often do performance benchmarks get updated? +New performance benchmarks are typically published on every `major.minor` release of the Intel® Distribution of OpenVINO™ toolkit. + +#### 2. Where can I find the models used in the performance benchmarks? +All of the models used are included in the toolkit's [Open Model Zoo](https://github.com/opencv/open_model_zoo) GitHub repository. + +#### 3. Will there be new models added to the list used for benchmarking? +The models used in the performance benchmarks were chosen based on general adoption and usage in deployment scenarios. We're continuing to add new models that support a diverse set of workloads and usage. + +#### 4. What does CF or TF in the graphs stand for? +CF means Caffe*, while TF means TensorFlow*. + +#### 5. How can I run the benchmark results on my own? +All of the performance benchmarks were generated using the open-sourced tool within the Intel® Distribution of OpenVINO™ toolkit called `benchmark_app`, which is available in both [C++](https://docs.openvinotoolkit.org/latest/_inference_engine_samples_benchmark_app_README.html) and [Python](https://docs.openvinotoolkit.org/latest/_inference_engine_tools_benchmark_tool_README.html). + +#### 6. What image sizes are used for the classification network models? +The image size used in the inference depends on the network being benchmarked. The following table shows the list of input sizes for each network model. 
+| **Model** | **Public Network** | **Task** | **Input Size** (Height x Width) | +|------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------|-----------------------------|-----------------------------------| +| [faster_rcnn_resnet50_coco-TF](https://github.com/opencv/open_model_zoo/tree/master/models/public/faster_rcnn_resnet50_coco) | Faster RCNN Tf | object detection | 600x1024 | +| [googlenet-v1-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/googlenet-v1) | GoogLeNet_ILSVRC-2012_Caffe | classification | 224x224 | +| [googlenet-v3-TF](https://github.com/opencv/open_model_zoo/tree/master/models/public/googlenet-v3) | Inception v3 Tf | classification | 299x299 | +| [mobilenet-ssd-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/mobilenet-ssd) | SSD (MobileNet)_COCO-2017_Caffe | object detection | 300x300 | +| [mobilenet-v2-1.0-224-TF](https://github.com/opencv/open_model_zoo/tree/master/models/public/mobilenet-v2-1.0-224) | MobileNet v2 Tf | classification | 224x224 | +| [mobilenet-v2-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/mobilenet-v2) | Mobilenet V2 Caffe | classification | 224x224 | +| [resnet-101-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/resnet-101) | ResNet-101_ILSVRC-2012_Caffe | classification | 224x224 | +| [resnet-50-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/resnet-50) | ResNet-50_v1_ILSVRC-2012_Caffe | classification | 224x224 | +| [se-resnext-50-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/se-resnext-50) | Se-ResNext-50_ILSVRC-2012_Caffe | classification | 224x224 | +| [squeezenet1.1-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/squeezenet1.1) | SqueezeNet_v1.1_ILSVRC-2012_Caffe | classification | 227x227 | +| [ssd300-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/ssd300) | SSD (VGG-16)_VOC-2007_Caffe | object detection | 300x300 | + +#### 7. Where can I purchase the specific hardware used in the benchmarking? +Intel partners with various vendors all over the world. Visit the [Intel® AI: In Production Partners & Solutions Catalog](https://www.intel.com/content/www/us/en/internet-of-things/ai-in-production/partners-solutions-catalog.html) for a list of Equipment Makers and the [Supported Devices](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_Supported_Devices.html) documentation. You can also remotely test and run models before purchasing any hardware by using [Intel® DevCloud for the Edge](http://devcloud.intel.com/edge/). + +#### 8. How can I optimize my models for better performance or accuracy? +We published a set of guidelines and recommendations to optimize your models available in an [introductory](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Intro_to_Performance.html) guide and an [advanced](https://docs.openvinotoolkit.org/latest/_docs_optimization_guide_dldt_optimization_guide.html) guide. For further support, please join the conversation in the [Community Forum](https://software.intel.com/en-us/forums/intel-distribution-of-openvino-toolkit). + +#### 9. Why are INT8 optimized models used for benchmarking on CPUs with no VNNI support? +The benefit of low-precision optimization using the OpenVINO™ toolkit model optimizer extends beyond processors supporting VNNI through Intel® DL Boost. 
The reduced bit width of INT8 compared to FP32 allows an Intel® CPU to process data faster, so INT8 offers better throughput on any converted model, regardless of which low-precision optimizations the target Intel® hardware natively supports. Refer to [INT8 vs. FP32 Comparison on Select Networks and Platforms](https://docs.openvinotoolkit.org/latest/_docs_performance_int8_vs_fp32.html) for a comparison of boost factors across network models and a selection of Intel® CPU architectures, including AVX-2 with Intel® Core™ i7-8700T and AVX-512 (VNNI) with Intel® Xeon® 5218T and Intel® Xeon® 8270.
+
+#### 10. Previous releases included benchmarks on googlenet-v1. Why are there no longer benchmarks for this neural network model?
+We replaced googlenet-v1 with [resnet-18-pytorch](https://github.com/opencv/open_model_zoo/blob/master/models/public/resnet-18-pytorch/resnet-18-pytorch.md) due to changes in developer usage. The public resnet-18 model is widely used for image classification, and this pre-optimized model was also trained on the ImageNet database, like googlenet-v1. Both googlenet-v1 and resnet-18 remain part of the Open Model Zoo; developers are encouraged to use resnet-18-pytorch for image classification use cases.
+
+
+\htmlonly
+
+

+\endhtmlonly +For more complete information about performance and benchmark results, visit: [www.intel.com/benchmarks](https://www.intel.com/benchmarks) and [Optimization Notice](https://software.intel.com/articles/optimization-notice). [Legal Information](../Legal_Information.md). +\htmlonly +

+
+\endhtmlonly
\ No newline at end of file
diff --git a/docs/benchmarks/performance_int8_vs_fp32.md b/docs/benchmarks/performance_int8_vs_fp32.md new file mode 100644 index 00000000000000..1bf64469ea0389 --- /dev/null +++ b/docs/benchmarks/performance_int8_vs_fp32.md @@ -0,0 +1,375 @@
+# INT8 vs FP32 Comparison on Select Networks and Platforms {#openvino_docs_performance_int8_vs_fp32}
+
+The table below illustrates the speed-up factor for the performance gain by switching from an FP32 representation of an OpenVINO™ supported model to its INT8 representation. All values are the throughput speed-up of the FP16-INT8 model over FP32.
+
+| OpenVINO benchmark model name | Dataset | Intel® Core™ i7-8700T | Intel® Xeon® Gold 5218T | Intel® Xeon® Platinum 8270 | Intel® Core™ i7-1065G7 |
+| --- | --- | --- | --- | --- | --- |
+| bert-large-uncased-whole-word-masking-squad-0001 | SQuAD | 1.5 | 2.5 | 2.0 | N/A |
+| brain-tumor-segmentation-0001-mxnet | BraTS | 1.5 | 1.7 | 1.6 | 1.9 |
+| brain-tumor-segmentation-0002-cf2 | BraTS 2017 | 1.2 | 1.7 | 1.4 | 2.2 |
+| deeplabv3-tf | VOC 2012 Segmentation | 1.5 | 2.4 | 2.6 | 2.8 |
+| facenet-20180408-102900-tf | LFW | 2.0 | 3.5 | 3.5 | 3.5 |
+| faster_rcnn_resnet50_coco-tf | MS COCO | 1.7 | 3.4 | 3.4 | 3.6 |
+| googlenet-v1-caffe | ImageNet | 1.5 | 2.9 | 2.2 | 3.2 |
+| inception-v3-tf | ImageNet | 1.8 | 3.8 | 4.0 | 3.7 |
+| mobilenet-ssd-caffe | VOC2012 | 1.5 | 3.0 | 3.3 | 3.1 |
+| mobilenet-v1-1.0-224-caffe | ImageNet | 1.5 | 3.2 | 3.9 | 2.9 |
+| mobilenet-v2-1.0-224-tf | ImageNet | 1.3 | 2.6 | 3.8 | 2.2 |
+| mobilenet-v2-caffe | ImageNet | 1.3 | 2.5 | 3.4 | 2.2 |
+| resnet-101-caffe | ImageNet | 1.8 | 3.7 | 3.7 | 3.6 |
+| resnet-18-pytorch | ImageNet | 1.9 | 3.7 | 3.8 | 3.6 |
+| resnet-50-caffe | ImageNet | 1.8 | 3.6 | 3.9 | 3.5 |
+| resnet-50-pytorch | ImageNet | 1.8 | 3.6 | 3.9 | 3.5 |
+| squeezenet1.1-caffe | ImageNet | 1.6 | 2.9 | 3.2 | 3.0 |
+| ssd_mobilenet_v1_coco-tf | MS COCO | 1.6 | 3.0 | 3.4 | 3.1 |
+| ssd300-caffe | MS COCO | 1.8 | 3.7 | 3.7 | 3.8 |
+| ssdlite_mobilenet_v2-tf | MS COCO | 1.4 | 2.3 | 3.0 | 2.4 |
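+
+As a rough way to reproduce a speed-up factor like the ones above on your own hardware, you can run `benchmark_app` twice on the same model, once with its FP32 IR and once with its INT8 IR, and divide the reported throughput values. The sketch below is illustrative only: the IR paths are hypothetical placeholders, and it assumes the tool prints a `Throughput: ... FPS` line, so adapt the parsing to your actual output.
+
+```sh
+# Hypothetical FP32 and INT8 IRs of the same model; replace with your own paths.
+FP32_IR=~/models/public/resnet-50-pytorch/FP32/resnet-50-pytorch.xml
+INT8_IR=~/models/public/resnet-50-pytorch/INT8/resnet-50-pytorch.xml
+
+# Run the benchmark on each IR and keep only the reported throughput (FPS).
+fp32_fps=$(./benchmark_app -m "$FP32_IR" -d CPU -niter 100 | grep -oP 'Throughput:\s*\K[0-9.]+')
+int8_fps=$(./benchmark_app -m "$INT8_IR" -d CPU -niter 100 | grep -oP 'Throughput:\s*\K[0-9.]+')
+
+# Speed-up factor = INT8 throughput / FP32 throughput.
+echo "Speed-up: $(echo "$int8_fps / $fp32_fps" | bc -l)"
+```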
+
+The following table shows the absolute accuracy drop that is calculated as the difference in accuracy between the FP32 representation of a model and its INT8 representation. All values are the absolute accuracy drop in percent.
+
+| OpenVINO benchmark model name | Dataset | Metric name | Intel® Core™ i9-10920X CPU @ 3.50GHz (VNNI) | Intel® Core™ i9-9820X CPU @ 3.30GHz (AVX512) | Intel® Core™ i7-8700 CPU @ 3.20GHz (AVX2) |
+| --- | --- | --- | --- | --- | --- |
+| bert-large-uncased-whole-word-masking-squad-0001 | SQuAD | F1 | 0.46 | 0.70 | 0.64 |
+| brain-tumor-segmentation-0001-mxnet | BraTS | Dice-index@Mean@Overall Tumor | 0.08 | 0.08 | 0.14 |
+| brain-tumor-segmentation-0002-cf2 | BraTS 2017 | Dice-index@Mean@Overall Tumor | 0.16 | 0.14 | 0.13 |
+| deeplabv3-tf | VOC 2012 Segmentation | mean_iou | 0.28 | 0.71 | 0.71 |
+| facenet-20180408-102900-tf | LFW | pairwise_accuracy_subsets | 0.02 | 0.05 | 0.05 |
+| faster_rcnn_resnet50_coco-tf | MS COCO | coco_precision | 0.21 | 0.20 | 0.20 |
+| googlenet-v1-caffe | ImageNet | acc@top-1 | 0.24 | 0.19 | 0.20 |
+| inception-v3-tf | ImageNet | acc@top-1 | 0.03 | 0.01 | 0.01 |
+| mobilenet-ssd-caffe | VOC2012 | mAP | 0.35 | 0.34 | 0.34 |
+| mobilenet-v1-1.0-224-caffe | ImageNet | acc@top-1 | 0.19 | 0.18 | 0.18 |
+| mobilenet-v2-1.0-224-tf | ImageNet | acc@top-1 | 0.45 | 0.94 | 0.94 |
+| mobilenet-v2-caffe | ImageNet | acc@top-1 | 0.24 | 1.45 | 1.45 |
+| resnet-101-caffe | ImageNet | acc@top-1 | 0.00 | 0.02 | 0.02 |
+| resnet-18-pytorch | ImageNet | acc@top-1 | 0.26 | 0.25 | 0.25 |
+| resnet-50-caffe | ImageNet | acc@top-1 | 0.16 | 0.12 | 0.12 |
+| resnet-50-pytorch | ImageNet | acc@top-1 | 0.20 | 0.17 | 0.17 |
+| squeezenet1.1-caffe | ImageNet | acc@top-1 | 0.66 | 0.64 | 0.64 |
+| ssd_mobilenet_v1_coco-tf | MS COCO | COCO mAP | 0.24 | 3.07 | 3.07 |
+| ssd300-caffe | MS COCO | COCO mAP | 0.06 | 0.05 | 0.05 |
+| ssdlite_mobilenet_v2-tf | MS COCO | COCO mAP | 0.14 | 0.47 | 0.47 |
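+
+The three columns above are grouped by the instruction set available on the CPU (VNNI, AVX-512, AVX2). If you are not sure which group a particular Linux* machine falls into, a generic check of the CPU flags (standard system information, not an OpenVINO tool) is enough:
+
+```sh
+# Report the most specific SIMD/DL Boost capability the CPU exposes on Linux.
+if grep -q avx512_vnni /proc/cpuinfo; then
+    echo "AVX-512 with VNNI (Intel DL Boost)"
+elif grep -q avx512f /proc/cpuinfo; then
+    echo "AVX-512 without VNNI"
+elif grep -q avx2 /proc/cpuinfo; then
+    echo "AVX2"
+else
+    echo "No AVX2/AVX-512 support reported"
+fi
+```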
+ +![INT8 vs FP32 Comparison](img/int8vsfp32.png "INT8 vs FP32 Comparison on Select Networks and Platforms") + +\htmlonly + +
+

+\endhtmlonly +For more complete information about performance and benchmark results, visit: [www.intel.com/benchmarks](https://www.intel.com/benchmarks) and [Optimization Notice](https://software.intel.com/articles/optimization-notice). [Legal Information](../Legal_Information.md). +\htmlonly +

+
+\endhtmlonly
\ No newline at end of file
diff --git a/docs/doxygen/ie_c_api.xml b/docs/doxygen/ie_c_api.xml new file mode 100644 index 00000000000000..4c23367111f18e --- /dev/null +++ b/docs/doxygen/ie_c_api.xml @@ -0,0 +1,206 @@
diff --git a/docs/doxygen/ie_docs.xml b/docs/doxygen/ie_docs.xml new file mode 100644 index 00000000000000..f5e312d7285087 --- /dev/null +++ b/docs/doxygen/ie_docs.xml @@ -0,0 +1,984 @@
diff --git a/docs/doxygen/ie_plugin_api.xml b/docs/doxygen/ie_plugin_api.xml new file mode 100644 index 00000000000000..b2839444af8421 --- /dev/null +++ b/docs/doxygen/ie_plugin_api.xml @@ -0,0 +1,24 @@
diff --git a/docs/doxygen/ie_py_api.xml b/docs/doxygen/ie_py_api.xml new file mode 100644 index 00000000000000..35254f6edb5b15 --- /dev/null +++ b/docs/doxygen/ie_py_api.xml @@ -0,0 +1,204 @@
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/get_started/get_started_linux.md b/docs/get_started/get_started_linux.md new file mode 100644 index 00000000000000..495882d73f0d16 --- /dev/null +++ b/docs/get_started/get_started_linux.md @@ -0,0 +1,556 @@ +# Get Started with OpenVINO™ Toolkit on Linux* {#openvino_docs_get_started_get_started_linux} + +The OpenVINO™ toolkit optimizes and runs Deep Learning Neural Network models on Intel® hardware. This guide helps you get started with the OpenVINO™ toolkit you installed on a Linux* operating system. + +In this guide, you will: +* Learn the OpenVINO™ inference workflow. +* Run demo scripts that perform the steps for you. These demo scripts illustrate the workflow. +* Run the workflow steps yourself, using detailed instructions with a code sample and demo application. + +## OpenVINO™ toolkit Components +The toolkit consists of three primary components: +* **Inference Engine:** The software libraries that run inference against the Intermediate Representation (optimized model) to produce inference results. +* **Model Optimizer:** Optimizes models for Intel® architecture, converting models into a format compatible with the Inference Engine. This format is called an Intermediate Representation (IR). +* **Intermediate Representation (IR):** The Model Optimizer output. A model converted to a format that has been optimized for Intel® architecture and is usable by the Inference Engine. + +In addition, demo scripts, code samples and demo applications are provided to help you get up and running with the toolkit: +* **Demo Scripts** - Shell scripts that automatically perform the workflow steps to demonstrate running inference pipelines for different scenarios. +* [**Code Samples**](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html) - Small console applications that show you how to: + * Utilize specific OpenVINO capabilities in an application + * Perform specific tasks, such as loading a model, running inference, querying specific device capabilities, and more. +* [**Demo Applications**](https://docs.openvinotoolkit.org/latest/_demos_README.html) - Console applications that provide robust application templates to help you implement specific deep learning scenarios. These applications involve increasingly complex processing pipelines that gather analysis data from several models that run inference simultaneously, such as detecting a person in a video stream along with detecting the person's physical attributes, such as age, gender, and emotional state. + +## Intel® Distribution of OpenVINO™ toolkit Installation and Deployment Tools Directory Structure +This guide assumes you completed all Intel® Distribution of OpenVINO™ toolkit installation and configuration steps. If you have not yet installed and configured the toolkit, see [Install Intel® Distribution of OpenVINO™ toolkit for Linux*](https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_linux.html). + +By default, the installation directory is `/opt/intel/openvino`, but the installation gave you the option to use the directory of your choice. If you installed the Intel® Distribution of OpenVINO™ toolkit to a directory other than the default, replace `/opt/intel` with the directory in which you installed the software. + +The primary tools for deploying your models and applications are installed to the `/opt/intel/openvino/deployment_tools` directory. +
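+
+If you want to check the layout on your own machine before going through the table below, a quick listing of the installation directory (assuming the default install path) shows the top-level folders described in this guide:
+
+```sh
+# Expect entries such as demo/, inference_engine/, model_optimizer/, open_model_zoo/ and tools/.
+ls /opt/intel/openvino/deployment_tools
+```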
+ Click for the Intel® Distribution of OpenVINO™ toolkit directory structure + + +| Directory         | Description | +|:----------------------------------------|:--------------------------------------------------------------------------------------| +| `demo/` | Demo scripts. Demonstrate pipelines for inference scenarios, automatically perform steps and print detailed output to the console. For more information, see the [Use OpenVINO: Demo Scripts](#use-openvino-demo-scripts) section.| +| `inference_engine/` | Inference Engine directory. Contains Inference Engine API binaries and source files, samples and extensions source files, and resources like hardware drivers.| +| `~intel_models/` | Symbolic link to the `intel_models` subfolder of the `open_model-zoo` folder | +|       `include/` | Inference Engine header files. For API documentation, see the [Inference Engine API Reference](./annotated.html). | +|       `lib/` | Inference Engine binaries.| +|       `samples/` | Inference Engine samples. Contains source code for C++ and Python* samples and build scripts. See the [Inference Engine Samples Overview](../IE_DG/Samples_Overview.md). | +|       `src/` | Source files for CPU extensions.| +| `model_optimizer/` | Model Optimizer directory. Contains configuration scripts, scripts to run the Model Optimizer and other files. See the [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +| `open_model_zoo/` | Open Model Zoo directory. Includes the Model Downloader tool to download [pre-trained OpenVINO](@ref omz_models_intel_index) and public models, OpenVINO models documentation, demo applications and the Accuracy Checker tool to evaluate model accuracy.| +|       `demos/` | Demo applications for inference scenarios. Also includes documentation and build scripts.| +|       `intel_models/` | Pre-trained OpenVINO models and associated documentation. See the [Overview of OpenVINO™ Toolkit Pre-Trained Models](@ref omz_models_intel_index).| +|       `tools/` | Model Downloader and Accuracy Checker tools. | +| `tools/` | Contains a symbolic link to the Model Downloader folder and auxiliary tools to work with your models: Calibration tool, Benchmark and Collect Statistics tools.| + +
+
+## OpenVINO™ Workflow Overview
+
+The simplified OpenVINO™ workflow is:
+1. **Get a trained model** for your inference task. Example inference tasks: pedestrian detection, face detection, vehicle detection, license plate recognition, head pose.
+2. **Run the trained model through the Model Optimizer** to convert the model to an Intermediate Representation, which consists of a pair of `.xml` and `.bin` files that are used as the input for the Inference Engine.
+3. **Use the Inference Engine API in the application** to run inference against the Intermediate Representation (optimized model) and output inference results. The application can be an OpenVINO™ sample, demo, or your own application.
+
+## Use the Demo Scripts to Learn the Workflow
+
+The demo scripts in `/opt/intel/openvino/deployment_tools/demo` give you a starting point to learn the OpenVINO workflow. These scripts automatically perform the workflow steps to demonstrate running inference pipelines for different scenarios. The demo steps let you see how to:
+* Compile several samples from the source files delivered as part of the OpenVINO toolkit.
+* Download trained models.
+* Perform pipeline steps and see the output on the console.
+
+> **NOTE**: You must have Internet access to run the demo scripts. If your Internet access is through a proxy server, make sure the operating system environment proxy information is configured.
+
+The demo scripts can run inference on any [supported target device](https://software.intel.com/en-us/openvino-toolkit/hardware). Although the default inference device is CPU, you can use the `-d` parameter to change the inference device. The general command to run the scripts looks as follows:
+
+```sh
+./<script_name> -d [CPU, GPU, MYRIAD, HDDL]
+```
+
+Before running the demo applications on Intel® Processor Graphics or on an Intel® Neural Compute Stick 2 device, you must complete the [Steps for Intel® Processor Graphics (GPU)](https://docs.openvinotoolkit.org/2020.1/_docs_install_guides_installing_openvino_linux.html#additional-GPU-steps) or [Steps for Intel® Neural Compute Stick 2](https://docs.openvinotoolkit.org/2020.1/_docs_install_guides_installing_openvino_linux.html#additional-NCS-steps).
+
+The following paragraphs describe each demo script.
+
+### Image Classification Demo Script
+The `demo_squeezenet_download_convert_run` script illustrates the image classification pipeline.
+
+The script:
+1. Downloads a SqueezeNet model.
+2. Runs the Model Optimizer to convert the model to the IR.
+3. Builds the Image Classification Sample Async application.
+4. Runs the compiled sample with the `car.png` image located in the `demo` directory.
+
+ Click for an example of running the Image Classification demo script + +To run the script to perform inference on a CPU: + +```sh +./demo_squeezenet_download_convert_run.sh +``` + +When the script completes, you see the label and confidence for the top-10 categories: + +```sh + +Top 10 results: + +Image /home/user/dldt/inference-engine/samples/sample_data/car.png + +classid probability label +------- ----------- ----- +817 0.8363345 sports car, sport car +511 0.0946488 convertible +479 0.0419131 car wheel +751 0.0091071 racer, race car, racing car +436 0.0068161 beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon +656 0.0037564 minivan +586 0.0025741 half track +717 0.0016069 pickup, pickup truck +864 0.0012027 tow truck, tow car, wrecker +581 0.0005882 grille, radiator grille + + +total inference time: 2.6642941 +Average running time of one iteration: 2.6642941 ms + +Throughput: 375.3339402 FPS + +[ INFO ] Execution successful +``` + +
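+
+The demo script accepts the same `-d` flag as the general command shown earlier, so, as a sketch (assuming the target device has been configured as described above), the identical pipeline can run on another device:
+
+```sh
+# Run the same image classification demo on an Intel® Neural Compute Stick 2.
+./demo_squeezenet_download_convert_run.sh -d MYRIAD
+```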
+
+### Inference Pipeline Demo Script
+The `demo_security_barrier_camera` script demonstrates vehicle recognition, in which vehicle attributes build on each other to narrow in on a specific attribute.
+
+The script:
+1. Downloads three pre-trained model IRs.
+2. Builds the Security Barrier Camera Demo application.
+3. Runs the application with the downloaded models and the `car_1.bmp` image from the `demo` directory to show an inference pipeline.
+
+This application:
+
+1. Identifies an object as a vehicle.
+2. Uses the vehicle identification as input to the second model, which identifies specific vehicle attributes, including the license plate.
+3. Uses the license plate as input to the third model, which recognizes specific characters in the license plate.
+
+ Click for an example of Running the Pipeline demo script + +To run the script performing inference on Intel® Processor Graphics: + +```sh +./demo_security_barrier_camera.sh -d GPU +``` + +When the verification script completes, you see an image that displays the resulting frame with detections rendered as bounding boxes, and text: + +![](https://docs.openvinotoolkit.org/latest/inference_pipeline_script_lnx.png) + +
+ +### Benchmark Demo Script +The `demo_benchmark_app` script illustrates how to use the Benchmark Application to estimate deep learning inference performance on supported devices. + +The script: +1. Downloads a SqueezeNet model. +2. Runs the Model Optimizer to convert the model to the IR. +3. Builds the Inference Engine Benchmark tool. +4. Runs the tool with the `car.png` image located in the `demo` directory. + +
+ Click for an example of running the Benchmark demo script
+
+To run the script that performs inference on Intel® Vision Accelerator Design with Intel® Movidius™ VPUs:
+
+```sh
+./demo_benchmark_app.sh -d HDDL
+```
+When the verification script completes, you see the performance counters, resulting latency, and throughput values displayed on the screen.
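+
+Once the samples have been built, you can also invoke the Benchmark Application directly instead of going through the demo script. The paths below reuse the sample build directory and the SqueezeNet IR produced later in this guide, so treat this as an illustration rather than an exact command for your system:
+
+```sh
+# Benchmark the SqueezeNet IR on the CPU for 100 iterations and report latency and throughput.
+cd ~/inference_engine_cpp_samples_build/intel64/Release
+./benchmark_app -m ~/models/public/squeezenet1.1/ir/squeezenet1.1.xml \
+                -i /opt/intel/openvino/deployment_tools/demo/car.png \
+                -d CPU -niter 100
+```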
+
+## Use Code Samples and Demo Applications to Learn the Workflow
+
+This section guides you through a simplified workflow for the Intel® Distribution of OpenVINO™ toolkit using code samples and demo applications.
+
+You will perform the following steps:
+
+1. Use the Model Downloader to download suitable models.
+2. Convert the models with the Model Optimizer.
+3. Download media files to run inference on.
+4. Run inference on the Image Classification Code Sample and see the results.
+5. Run inference on the Security Barrier Camera Demo application and see the results.
+
+Each demo and code sample is a separate application, but they share the same behavior and components. The code samples and demo applications are:
+
+* [Code Samples](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html) - Small console applications that show how to utilize specific OpenVINO capabilities within an application and execute specific tasks such as loading a model, running inference, querying specific device capabilities, and more.
+
+* [Demo Applications](https://docs.openvinotoolkit.org/latest/_demos_README.html) - Console applications that provide robust application templates to support developers in implementing specific deep learning scenarios. They may also involve more complex processing pipelines that gather analysis from several models that run inference simultaneously. For example, concurrently detecting a person in a video stream and detecting attributes such as age, gender, and/or emotions.
+
+Inputs you'll need to specify:
+- **A compiled OpenVINO™ code sample or demo application** that runs inferencing against a model that has been run through the Model Optimizer, resulting in an IR, using the other inputs you provide.
+- **One or more models** in the Intermediate Representation format. Each model is trained for a specific task. Examples include pedestrian detection, face detection, vehicle detection, license plate recognition, head pose, and others. Different models are used for different applications. Models can be chained together to provide multiple features; for example vehicle + make/model + license plate recognition.
+- **One or more media files**. The media is typically a video file, but can be a still photo.
+- **One or more target devices** on which you run inference. The target device can be the CPU, GPU, FPGA, or VPU accelerator.
+
+### Build the Code Samples and Demo Applications
+
+To perform sample inference, run the Image Classification code sample and Security Barrier Camera demo application that were automatically compiled when you ran the Image Classification and Inference Pipeline demo scripts. The binary files are in the `~/inference_engine_cpp_samples_build/intel64/Release` and `~/inference_engine_demos_build/intel64/Release` directories, respectively.
+
+To run other sample code or demo applications, build them from the source files delivered as part of the OpenVINO toolkit. To learn how to build these, see the [Inference Engine Code Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html#build_samples_linux) and the [Demo Applications Overview](https://docs.openvinotoolkit.org/latest/_demos_README.html#build_the_demo_applications) sections.
+
+### Step 1: Download the Models
+
+You must have a model that is specific to your inference task. Example model types are:
+- Classification (AlexNet, GoogleNet, SqueezeNet, others) - Detects one type of element in a frame.
+- Object Detection (SSD, YOLO) - Draws bounding boxes around multiple types of objects.
+- Custom (often based on SSD)
+
+Options to find a model suitable for the OpenVINO™ toolkit are:
+- Download public and Intel's pre-trained models from the [Open Model Zoo](https://github.com/opencv/open_model_zoo) using the [Model Downloader tool](https://docs.openvinotoolkit.org/latest/_tools_downloader_README.html#model_downloader_usage).
+- Download from GitHub*, Caffe* Zoo, TensorFlow* Zoo, etc.
+- Train your own model.
+
+This guide uses the Model Downloader to get pre-trained models. You can use one of the following options to find a model:
+
+* **List the models available in the downloader**:
+```sh
+cd /opt/intel/openvino/deployment_tools/tools/model_downloader/
+```
+```sh
+python3 info_dumper.py --print_all
+```
+
+* **Use `grep` to list models that have a specific name pattern**:
+```sh
+python3 info_dumper.py --print_all | grep <model_name>
+```
+
+Use the Model Downloader to download the models to a models directory. This guide uses `<models_dir>` as the models directory and `<model_name>` as the model name:
+```sh
+sudo python3 ./downloader.py --name <model_name> --output_dir <models_dir>
+```
+> **NOTE:** Always run the downloader with `sudo`.
+
+Download the following models if you want to run the Image Classification Sample and Security Barrier Camera Demo application:
+
+|Model Name | Code Sample or Demo App |
+|-----------------------------------------------|-----------------------------------------------------|
+|`squeezenet1.1` | Image Classification Sample |
+|`vehicle-license-plate-detection-barrier-0106` | Security Barrier Camera Demo application |
+|`vehicle-attributes-recognition-barrier-0039` | Security Barrier Camera Demo application |
+|`license-plate-recognition-barrier-0001` | Security Barrier Camera Demo application |
+
+ Click for an example of downloading the SqueezeNet Caffe* model + +To download the SqueezeNet 1.1 Caffe* model to the `~/models` folder: + +```sh +sudo python3 ./downloader.py --name squeezenet1.1 --output_dir ~/models +``` + +Your screen looks similar to this after the download: +``` +###############|| Downloading models ||############### + +========= Downloading /home/username/models/public/squeezenet1.1/squeezenet1.1.prototxt + +========= Downloading /home/username/models/public/squeezenet1.1/squeezenet1.1.caffemodel +... 100%, 4834 KB, 3157 KB/s, 1 seconds passed + +###############|| Post processing ||############### + +========= Replacing text in /home/username/models/public/squeezenet1.1/squeezenet1.1.prototxt ========= +``` +
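+
+As a quick sanity check (the exact file list can vary between releases), confirm that the Caffe* files shown in the output above landed in the models directory:
+
+```sh
+# Expect squeezenet1.1.prototxt and squeezenet1.1.caffemodel after the download completes.
+ls ~/models/public/squeezenet1.1
+```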
+ +
+ Click for an example of downloading models for the Security Barrier Camera Demo application + +To download all three pre-trained models in FP16 precision to the `~/models` folder: + +```sh +./downloader.py --name vehicle-license-plate-detection-barrier-0106,vehicle-attributes-recognition-barrier-0039,license-plate-recognition-barrier-0001 --output_dir ~/models --precisions FP16 +``` +Your screen looks similar to this after the download: +``` +################|| Downloading models ||################ + +========== Downloading /home/username/models/intel/vehicle-license-plate-detection-barrier-0106/FP16/vehicle-license-plate-detection-barrier-0106.xml +... 100%, 204 KB, 183949 KB/s, 0 seconds passed + +========== Downloading /home/username/models/intel/vehicle-license-plate-detection-barrier-0106/FP16/vehicle-license-plate-detection-barrier-0106.bin +... 100%, 1256 KB, 3948 KB/s, 0 seconds passed + +========== Downloading /home/username/models/intel/vehicle-attributes-recognition-barrier-0039/FP16/vehicle-attributes-recognition-barrier-0039.xml +... 100%, 32 KB, 133398 KB/s, 0 seconds passed + +========== Downloading /home/username/models/intel/vehicle-attributes-recognition-barrier-0039/FP16/vehicle-attributes-recognition-barrier-0039.bin +... 100%, 1222 KB, 3167 KB/s, 0 seconds passed + +========== Downloading /home/username/models/intel/license-plate-recognition-barrier-0001/FP16/license-plate-recognition-barrier-0001.xml +... 100%, 47 KB, 85357 KB/s, 0 seconds passed + +========== Downloading /home/username/models/intel/license-plate-recognition-barrier-0001/FP16/license-plate-recognition-barrier-0001.bin +... 100%, 2378 KB, 5333 KB/s, 0 seconds passed + +################|| Post-processing ||################ +``` + +
+
+### Step 2: Convert the Models to the Intermediate Representation
+
+In this step, your trained models are ready to run through the Model Optimizer to convert them to the Intermediate Representation (IR) format. This is required before using the Inference Engine with the model.
+
+Models in the Intermediate Representation format always include a pair of `.xml` and `.bin` files. Make sure you have these files for the Inference Engine to find them.
+- **REQUIRED:** `model_name.xml`
+- **REQUIRED:** `model_name.bin`
+
+This guide uses the public SqueezeNet 1.1 Caffe\* model to run the Image Classification Sample. See the example in the Download Models section to learn how to download this model.
+
+The `squeezenet1.1` model is downloaded in the Caffe* format. You must use the Model Optimizer to convert the model to the IR.
+The `vehicle-license-plate-detection-barrier-0106`, `vehicle-attributes-recognition-barrier-0039`, and `license-plate-recognition-barrier-0001` models are downloaded in the Intermediate Representation format. You don't need to use the Model Optimizer to convert these models.
+
+1. Create an `<ir_dir>` directory to contain the model's Intermediate Representation (IR).
+
+2. The Inference Engine can perform inference on different precision formats, such as `FP32`, `FP16`, and `INT8`. To prepare an IR with a specific precision, run the Model Optimizer with the appropriate `--data_type` option.
+
+3. Run the Model Optimizer script:
+   ```sh
+   cd /opt/intel/openvino/deployment_tools/model_optimizer
+   ```
+   ```sh
+   python3 ./mo.py --input_model <model_dir>/<model_file> --data_type <model_precision> --output_dir <ir_dir>
+   ```
+   The produced IR files are in the `<ir_dir>` directory.
+
+ Click for an example of converting the SqueezeNet Caffe* model
+
+The following command converts the public SqueezeNet 1.1 Caffe\* model to the FP16 IR and saves it to the `~/models/public/squeezenet1.1/ir` output directory:
+
+```sh
+cd /opt/intel/openvino/deployment_tools/model_optimizer
+```
+```sh
+python3 ./mo.py --input_model ~/models/public/squeezenet1.1/squeezenet1.1.caffemodel --data_type FP16 --output_dir ~/models/public/squeezenet1.1/ir
+```
+
+After the Model Optimizer script has completed, the produced IR files (`squeezenet1.1.xml`, `squeezenet1.1.bin`) are in the specified `~/models/public/squeezenet1.1/ir` directory.
+
+Copy the `squeezenet1.1.labels` file from `/opt/intel/openvino/deployment_tools/demo/` to `<ir_dir>`. This file contains the classes that ImageNet uses, so the inference results show text labels instead of classification numbers:
+```sh
+cp /opt/intel/openvino/deployment_tools/demo/squeezenet1.1.labels <ir_dir>
+```
+
+### Step 3: Download a Video or a Still Photo as Media
+
+Many sources are available from which you can download video media to use with the code samples and demo applications. Possibilities include:
+- https://videos.pexels.com
+- https://images.google.com
+
+As an alternative, the Intel® Distribution of OpenVINO™ toolkit includes two sample images that you can use for running code samples and demo applications:
+* `/opt/intel/openvino/deployment_tools/demo/car.png`
+* `/opt/intel/openvino/deployment_tools/demo/car_1.bmp`
+
+### Step 4: Run the Image Classification Code Sample
+
+> **NOTE**: The Image Classification code sample was automatically compiled when you ran the Image Classification demo script. If you want to compile it manually, see the [Inference Engine Code Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html#build_samples_linux) section.
+
+To run the **Image Classification** code sample with an input image on the IR:
+
+1. Set up the OpenVINO environment variables:
+   ```sh
+   source /opt/intel/openvino/bin/setupvars.sh
+   ```
+2. Go to the code samples build directory:
+   ```sh
+   cd ~/inference_engine_samples_build/intel64/Release
+   ```
+3. Run the code sample executable, specifying the input media file, the IR of your model, and a target device on which you want to perform inference:
+   ```sh
+   classification_sample_async -i <path_to_media> -m <path_to_model> -d <target_device>
+   ```
+ Click for examples of running the Image Classification code sample on different devices + +The following commands run the Image Classification Code Sample using the `car.png` file from the `/opt/intel/openvino/deployment_tools/demo/` directory as an input image, the IR of your model from `~/models/public/squeezenet1.1/ir` and on different hardware devices: + +**CPU:** + ```sh + ./classification_sample_async -i /opt/intel/openvino/deployment_tools/demo/car.png -m ~/models/public/squeezenet1.1/ir/squeezenet1.1.xml -d CPU + ``` + + **GPU:** + + > **NOTE**: Running inference on Intel® Processor Graphics (GPU) requires + [additional hardware configuration steps](https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_linux.html#additional-GPU-steps). + ```sh + ./classification_sample_async -i /opt/intel/openvino/deployment_tools/demo/car.png -m ~/models/public/squeezenet1.1/ir/squeezenet1.1.xml -d GPU + ``` + + **MYRIAD:** + + > **NOTE**: Running inference on VPU devices (Intel® Movidius™ Neural Compute + Stick or Intel® Neural Compute Stick 2) with the MYRIAD plugin requires + [additional hardware configuration steps](inference-engine/README.md#optional-additional-installation-steps-for-the-intel-movidius-neural-compute-stick-and-neural-compute-stick-2). + ```sh + ./classification_sample_async -i /opt/intel/openvino/deployment_tools/demo/car.png -m ~/models/public/squeezenet1.1/ir/squeezenet1.1.xml -d MYRIAD + ``` + +When the Sample Application completes, you see the label and confidence for the top-10 categories on the display. Below is a sample output with inference results on CPU: +```sh +Top 10 results: + +Image /home/user/dldt/inference-engine/samples/sample_data/car.png + +classid probability label +------- ----------- ----- +817 0.8363345 sports car, sport car +511 0.0946488 convertible +479 0.0419131 car wheel +751 0.0091071 racer, race car, racing car +436 0.0068161 beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon +656 0.0037564 minivan +586 0.0025741 half track +717 0.0016069 pickup, pickup truck +864 0.0012027 tow truck, tow car, wrecker +581 0.0005882 grille, radiator grille + + +total inference time: 2.6642941 +Average running time of one iteration: 2.6642941 ms + +Throughput: 375.3339402 FPS + +[ INFO ] Execution successful +``` + +
+
+### Step 5: Run the Security Barrier Camera Demo Application
+
+> **NOTE**: The Security Barrier Camera Demo Application was automatically compiled when you ran the Inference Pipeline demo script. If you want to build it manually, see the [Demo Applications Overview](https://docs.openvinotoolkit.org/latest/_demos_README.html#build_the_demo_applications) section.
+
+To run the **Security Barrier Camera Demo Application** using an input image on the prepared IRs:
+
+1. Set up the OpenVINO environment variables:
+   ```sh
+   source /opt/intel/openvino/bin/setupvars.sh
+   ```
+2. Go to the demo application build directory:
+   ```sh
+   cd ~/inference_engine_demos_build/intel64/Release
+   ```
+3. Run the demo executable, specifying the input media file, the list of model IRs, and a target device on which to perform inference:
+   ```sh
+   ./security_barrier_camera_demo -i <path_to_media> -m <path_to_vehicle-license-plate-detection_model_xml> -m_va <path_to_vehicle_attributes_model_xml> -m_lpr <path_to_license_plate_recognition_model_xml> -d <target_device>
+   ```
+
+ Click for examples of running the Security Barrier Camera demo application on different devices + +**CPU:** + +```sh +./security_barrier_camera_demo -i /opt/intel/openvino/deployment_tools/demo/car_1.bmp -m /home/username/models/intel/vehicle-license-plate-detection-barrier-0106/FP16/vehicle-license-plate-detection-barrier-0106.xml -m_va /home/username/models/intel/vehicle-attributes-recognition-barrier-0039/FP16/vehicle-attributes-recognition-barrier-0039.xml -m_lpr /home/username/models/intel/license-plate-recognition-barrier-0001/FP16/license-plate-recognition-barrier-0001.xml -d CPU +``` + +**GPU:** + +> **NOTE**: Running inference on Intel® Processor Graphics (GPU) requires [additional hardware configuration steps](https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_linux.html#additional-GPU-steps). +```sh +./security_barrier_camera_demo -i /opt/intel/openvino/deployment_tools/demo/car_1.bmp -m /vehicle-license-plate-detection-barrier-0106.xml -m_va /vehicle-attributes-recognition-barrier-0039.xml -m_lpr /license-plate-recognition-barrier-0001.xml -d GPU +``` + +**MYRIAD:** + +> **NOTE**: Running inference on VPU devices (Intel® Movidius™ Neural Compute + Stick or Intel® Neural Compute Stick 2) with the MYRIAD plugin requires + [additional hardware configuration steps](https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_linux.html#additional-NCS-steps). +```sh +./classification_sample_async -i /inference-engine/samples/sample_data/car.png -m /squeezenet1.1.xml -d MYRIAD +``` + +
+ +## Basic Guidelines for Using Code Samples and Demo Applications + +Following are some basic guidelines for executing the OpenVINO™ workflow using the code samples and demo applications: + +1. Before using the OpenVINO™ samples, always set up the environment: +```sh +source /opt/intel/openvino/bin/setupvars.sh +``` +2. Have the directory path for the following: +- Code Sample binaries located in `~/inference_engine_cpp_samples_build/intel64/Release` +- Demo Application binaries located in `~/inference_engine_demos_build/intel64/Release` +- Media: Video or image. See Download Media. +- Model: Neural Network topology converted with the Model Optimizer to the IR format (.bin and .xml files). See Download Models for more information. + +## Typical Code Sample and Demo Application Syntax Examples + +Template to call sample code or a demo application: + +```sh + -i -m -d +``` + +With the sample information specified, the command might look like this: + +```sh +./object_detection_demo_ssd_async -i ~/Videos/catshow.mp4 \ +-m ~/ir/fp32/mobilenet-ssd.xml -d CPU +``` + +## Advanced Demo Use + +Some demo applications let you use multiple models for different purposes. In these cases, the output of the first model is usually used as the input for later models. + +For example, an SSD will detect a variety of objects in a frame, then age, gender, head pose, emotion recognition and similar models target the objects classified by the SSD to perform their functions. + +In these cases, the use pattern in the last part of the template above is usually: + +`-m_ … -d_ …` + +For head pose: + +`-m_hp -d_hp ` + +**Example of an Entire Command (object_detection + head pose):** + +```sh +./object_detection_demo_ssd_async -i ~/Videos/catshow.mp4 \ +-m ~/ir/fp32/mobilenet-ssd.xml -d CPU -m_hp headpose.xml \ +-d_hp CPU +``` + +**Example of an Entire Command (object_detection + head pose + age-gender):** + +```sh +./object_detection_demo_ssd_async -i ~/Videos/catshow.mp4 \ +-m ~/r/fp32/mobilenet-ssd.xml -d CPU -m_hp headpose.xml \ +-d_hp CPU -m_ag age-gender.xml -d_ag CPU +``` + +You can see all the sample application’s parameters by adding the `-h` or `--help` option at the command line. + + +## Additional Resources + +Use these resources to learn more about the OpenVINO™ toolkit: + +* [OpenVINO™ Release Notes](https://software.intel.com/en-us/articles/OpenVINO-RelNotes) +* [Introduction to Intel® Deep Learning Deployment Toolkit](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Introduction.html) +* [Inference Engine Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide.html) +* [Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) +* [Inference Engine Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html) +* [Overview of OpenVINO™ Toolkit Pre-Trained Models](https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models) +* [OpenVINO™ Hello World Face Detection Exercise](https://github.com/intel-iot-devkit/inference-tutorials-generic) diff --git a/docs/hetero-plugin.md b/docs/hetero-plugin.md new file mode 100644 index 00000000000000..7a47dbf7ddf6de --- /dev/null +++ b/docs/hetero-plugin.md @@ -0,0 +1,137 @@ +# Inference Engine hetero plugin design overview {#openvino_docs_hetero_plugin} + +## Subgraphs selection + +Algorithm: + +For each plugin +1. 
Select *root* node + * Node not in subgraph previously constructed + * Affinity is equal to plugin name +2. Select adjacent node to any node in already subgraph which is not in rejected list + * if there are no such nodes **end** +3. Check selected node has same affinity +4. Add node to subgraph if check was successful or add to rejected list otherwise +5. Check global condition + * Nodes in rejected list can never be added to subgraph + * Nodes not in subgraph and not in rejected list can possibly be added later + * Check subgraph topology (the only check now is there are no indirect subgraph self-references) +6. If global condition was failed remove last node from subgraph, add it to rejected list and go to step 5 + * we can rollback multiple times here because rejected list is changed every time +7. Go to step 2 + +Example: +``` + 1 + | + 2 + / \ + 3 4 + \ / + 5 + | + 6 + | + 7 +``` + +Nodes [1,2,3,5,6,7] are supported in plugin, [4] is not + +Possible roots: [1,2,3,5,6,7] +1. Select root [1] + * Subgraph: [1] + * Rejected: [] + * Global condition: ok +2. Merge [2] + * Subgraph: [1,2] + * Rejected: [] + * Global condition: ok +3. Merge [3] + * Subgraph: [1,2,3] + * Rejected: [] + * Global condition: ok +4. Merge [5] + * Subgraph: [1,2,3,5] + * Rejected: [] + * Global condition: There is possible self-references through node [4] but we do not know yet, ok +5. Merge [6] + * Subgraph: [1,2,3,5,6] + * Rejected: [] + * Global condition: There is possible self-references through node [4] but we do not know yet, ok +6. Merge [7] + * Subgraph: [1,2,3,5,6,7] + * Rejected: [] + * Global condition: There is possible self-references through node [4] but we do not know yet, ok +7. Failed to merge [4] + * Subgraph: [1,2,3,5,6,7] + * Rejected: [4] + * Global condition: There is self-references through node [4], reject +8. Rollback [7] + * Subgraph: [1,2,3,5,6] + * Rejected: [4,7] + * Global condition: There is self-references through node [4], reject +9. Rollback [6] + * Subgraph: [1,2,3,5] + * Rejected: [4,6,7] + * Global condition: There is self-references through node [4], reject +10. Rollback [5] + * Subgraph: [1,2,3] + * Rejected: [4,5,6,7] + * Global condition: ok +11. There are nodes to merge **end** + +Possible roots: [5,6,7] +1. Select root [5] + * Subgraph: [5] + * Rejected: [] + * Global condition: ok +2. Merge [6] + * Subgraph: [5,6] + * Rejected: [] + * Global condition: ok +3. Merge [7] + * Subgraph: [5,6,7] + * Rejected: [] + * Global condition: ok +4. Merge [3] + * Subgraph: [3,5,6,7] + * Rejected: [] + * Global condition: ok +5. Merge [2] + * Subgraph: [2,3,5,6,7] + * Rejected: [] + * Global condition: There is possible self-references through node [4] but we do not know yet, ok +6. Failed to merge [4] + * Subgraph: [2,3,5,6,7] + * Rejected: [4] + * Global condition: There is self-references through node [4], reject +7. Rollback [2] + * Subgraph: [3,5,6,7] + * Rejected: [2,4] + * Global condition: ok +8. There are nodes to merge **end** + +Possible roots: [] no roots, **END** + +Subgraphs: [1,2,3], [3,5,6,7] + +Select best subgraph: +* When we have multiple subgraphs larger ([3,5,6,7]) is always selected, always + +Repeat previous steps with remaining nodes [1,2] + +The final result is: +* First plugin: [3,5,6,7], [1,2] +* Second plugin: [4] + + +## Subgraphs self reference detection + +1. For each node in network build a list of reachable node (transitive closure) +2. 
For each pair of nodes in subgraph find `path` nodes (nodes through one node in pair reachable to other) + * assume `src` - one node in pair, `dst` - other node in pair + * get all nodes reachable from `src` + * in those nodes find nodes through you can reach `dst` those will be our `path` node +3. Results for pairs is cached. +4. Check if there intersection between `path` nodes set and rejected nodes set for each nodes pair in subgraph +5. In case of intersection we have a self-reference and subgraph is invalid diff --git a/docs/how_tos/how-to-links.md b/docs/how_tos/how-to-links.md new file mode 100644 index 00000000000000..89c4f210c0d38e --- /dev/null +++ b/docs/how_tos/how-to-links.md @@ -0,0 +1,75 @@ +# "Hot Topic" How-To Links {#openvino_docs_how_tos_how_to_links} + +## Blogs & Articles + +* [Maximize CPU Inference Performance with Improved Threads and Memory Management in Intel® Distribution of OpenVINO™ toolkit](https://www.edge-ai-vision.com/2020/03/maximize-cpu-inference-performance-with-improved-threads-and-memory-management-in-intel-distribution-of-openvino-toolkit/) +* [Simplifying Cloud to Edge AI Deployments with the Intel® Distribution of OpenVINO™ Toolkit, Microsoft Azure, and ONNX Runtime](https://www.intel.ai/microsoft-azure-openvino-toolkit/#gs.11oa13) +* [Streamline your Intel® Distribution of OpenVINO™ Toolkit development with Deep Learning Workbench](https://www.intel.ai/openvino-dlworkbench/#gs.wwj3bq) +* [Enhanced Low-Precision Pipeline to Accelerate Inference with OpenVINO Toolkit](https://www.intel.ai/open-vino-low-precision-pipeline/) +* [Improving DL Performance Using Binary Convolution Support in OpenVINO Toolkit](https://www.intel.ai/binary-convolution-openvino) +* [Automatic Multi-Device Inference with the Intel® Distribution of OpenVINO™ toolkit](https://www.intel.ai/automatic-multi-device-inference-with-intel-distribution-of-openvino-toolkit/) +* [CPU Inference Performance Boost with “Throughput” Mode in the Intel® Distribution of OpenVINO™ Toolkit](https://www.intel.ai/cpu-inference-performance-boost-openvino/) +* [Introducing int8 quantization for fast CPU inference using OpenVINO](https://www.intel.ai/introducing-int8-quantization-for-fast-cpu-inference-using-openvino/) +* [Accelerate Vision-based AI with Intel® Distribution of OpenVINO™ Toolkit](https://www.intel.ai/accelerate-vision-based-ai-with-intel-distribution-of-openvino-toolkit/) + +## Custom Layers Guide +To learn about what is *custom layers* and how to work with them in the Deep Learning Deployment Toolkit, see the [Custom Layers Guide](../HOWTO/Custom_Layers_Guide.md). 
+ +## Introducing OpenVINO™ and Computer Vision | IoT Developer Show Season 2 | Intel Software + +[![](https://img.youtube.com/vi/M6Nyh2JDLQs/0.jpg)](https://www.youtube.com/watch?v=M6Nyh2JDLQs) + + + +## OpenVINO™ Toolkit and Two Hardware Development Kits | IoT Developer Show Season 2 | Intel Software + +[![](https://img.youtube.com/vi/GtJPBYjuyVU/0.jpg)](https://www.youtube.com/watch?v=GtJPBYjuyVU) + + + +## Intel Demonstration of High Performance Vision Deployment - The OpenVINO Toolkit in Action + +[![](https://img.youtube.com/vi/1_iI_4Zgufw/0.jpg)](https://www.youtube.com/watch?v=1_iI_4Zgufw) + + + +## Deploying Intel® FPGAs for Deep Learning Inferencing with OpenVINO™ Toolkit + +[![](https://img.youtube.com/vi/7yh1c8kJn1A/0.jpg)](https://www.youtube.com/watch?v=7yh1c8kJn1A) + + + + +## Computer Vision at the Edge with OpenVINO by Krishnakumar Shetti at ODSC_India + +[![](https://img.youtube.com/vi/RfRCrq35LXg/0.jpg)](https://www.youtube.com/watch?v=RfRCrq35LXg) + + + +## Model optimizer concept + +[![](https://img.youtube.com/vi/Kl1ptVb7aI8/0.jpg)](https://www.youtube.com/watch?v=Kl1ptVb7aI8) + + + +## Computer Vision with Intel + +[![](https://img.youtube.com/vi/FZZD4FCvO9c/0.jpg)](https://www.youtube.com/watch?v=FZZD4FCvO9c) + + + +## Case Studies + +|| Link to tutorial | +|:---:|:---:| +|![dl_healthcare]"" | [Deep Learning for Healthcare Imaging](https://ai.intel.com/wp-content/uploads/sites/53/2018/03/IntelSWDevTools_OptimizeDLforHealthcare.pdf) | +|![performance-boost-dl]"" | [Performance Boost for a Deep Learning Algorithm](https://software.intel.com/en-us/download/geovision-case-study) | +|![digital-security-surveillance]"" | [Digital Security & Surveillance Solutions](https://software.intel.com/en-us/download/agent-vi-case-study) | +|![robotics-with-AI]"" | [Robotics with AI for Industry 4.0](https://software.intel.com/en-us/download/intel-vision-accelerator-design-products-intel-nexcom-solution-brief) | +|![people-counter-syestem]"" | [People Counter Reference Implementation](https://software.intel.com/en-us/articles/iot-reference-implementation-people-counter) | + +[dl_healthcare]: ../img/DL-for-Healthcare-Imaging.jpg +[performance-boost-dl]: ../img/performance-boost-DL-algorithm.jpg +[digital-security-surveillance]: ../img/digital-security-surveillance.jpg +[robotics-with-AI]: ../img/robotics-with-AI.jpg +[people-counter-syestem]: ../img/people-counter-syestem.jpg \ No newline at end of file diff --git a/docs/img/Add_Environment_Variable.PNG b/docs/img/Add_Environment_Variable.PNG new file mode 100644 index 00000000000000..4d2da1a6be0327 --- /dev/null +++ b/docs/img/Add_Environment_Variable.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:241ba6e72bd7176d26c61db539de393bee06b2fba584484a5e59a4d43ce1054d +size 74559 diff --git a/docs/img/Build_samples.png b/docs/img/Build_samples.png new file mode 100644 index 00000000000000..6e7f8703908d07 --- /dev/null +++ b/docs/img/Build_samples.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a463af45b1a78e662ae395aa4a29aba39e813a1138c3483a739445369a6e23b2 +size 145146 diff --git a/docs/img/Build_success.png b/docs/img/Build_success.png new file mode 100644 index 00000000000000..9a2cf143a0b144 --- /dev/null +++ b/docs/img/Build_success.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4e2358bed81ef556b0c6c07be40bf177c7284bc0b8aab03683ed41cae0ade65d +size 81284 diff --git a/docs/img/CMakeInstalled.PNG b/docs/img/CMakeInstalled.PNG new file mode 100644 index 
00000000000000..d864233d0f36c3 --- /dev/null +++ b/docs/img/CMakeInstalled.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:90b7a7265b7ef4ecab8d7de5c93e4506fe178b011a8706d81eaa96b259c34830 +size 39942 diff --git a/docs/img/Configure-MO.PNG b/docs/img/Configure-MO.PNG new file mode 100644 index 00000000000000..ad1971e60b3f8f --- /dev/null +++ b/docs/img/Configure-MO.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:92143a550cfa20740f12274fea53b15a0268ee5d722547f66a3cade38d4c4e74 +size 42301 diff --git a/docs/img/Confirm_install.png b/docs/img/Confirm_install.png new file mode 100644 index 00000000000000..94a4f1ed8de5bf --- /dev/null +++ b/docs/img/Confirm_install.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eda10b447d0b98b530f01901781a15c0b3f01b41dcbede93c4ae34c4a174b79e +size 67445 diff --git a/docs/img/DL-for-Healthcare-Imaging.jpg b/docs/img/DL-for-Healthcare-Imaging.jpg new file mode 100644 index 00000000000000..7e784a1abb5f02 --- /dev/null +++ b/docs/img/DL-for-Healthcare-Imaging.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4c00025d661d83f9c72061587597d3ffbd2acb12ceb416b0732dec8812b74f1d +size 44100 diff --git a/docs/img/DeviceDriverVersion.PNG b/docs/img/DeviceDriverVersion.PNG new file mode 100644 index 00000000000000..01f92aea8338b7 --- /dev/null +++ b/docs/img/DeviceDriverVersion.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c2f144de249eddf1c159cbc1a27a06ad40f57442efcf75f2f49cc02626fc6875 +size 13168 diff --git a/docs/img/DeviceManager.PNG b/docs/img/DeviceManager.PNG new file mode 100644 index 00000000000000..2ef1df565fa702 --- /dev/null +++ b/docs/img/DeviceManager.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:30cc5a9b8e6e37381aded7602b2006aa1867df99ce01b817614092f01aae8e4c +size 13401 diff --git a/docs/img/DownloadVisualStudioBuildTools.PNG b/docs/img/DownloadVisualStudioBuildTools.PNG new file mode 100644 index 00000000000000..732d2ab3bcc4e7 --- /dev/null +++ b/docs/img/DownloadVisualStudioBuildTools.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3e9efe895615d3746528db08ccbc223775d3932f6ae52dc46430ba6b9f70cbd8 +size 23161 diff --git a/docs/img/Environment_Variables-select_Path.PNG b/docs/img/Environment_Variables-select_Path.PNG new file mode 100644 index 00000000000000..1f9af3d3e23c5e --- /dev/null +++ b/docs/img/Environment_Variables-select_Path.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b91420cf0b50e6761652b3154f8948f60ff248222246eae173ef3bbd306c53fd +size 65875 diff --git a/docs/img/GX_Configure1.png b/docs/img/GX_Configure1.png new file mode 100644 index 00000000000000..7b310dc5d60d48 --- /dev/null +++ b/docs/img/GX_Configure1.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:58842c11fb9ab70253030c9d1122b1be227e7b0bf8695a5f6491d7b6fc4e576d +size 301148 diff --git a/docs/img/GX_Configure2.png b/docs/img/GX_Configure2.png new file mode 100644 index 00000000000000..6fc6f7d58d7e0c --- /dev/null +++ b/docs/img/GX_Configure2.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:86d868cb6202e36fbf7c32058766713405273a49ba00038e422b2c4e5ff87d13 +size 386528 diff --git a/docs/img/GX_Configure3.png b/docs/img/GX_Configure3.png new file mode 100644 index 00000000000000..0f816b8cf41855 --- /dev/null +++ b/docs/img/GX_Configure3.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:1d2a5dd1d7fa8f172ff310d5ede2ed493fdb35130bac7df4fd92b640850cbc19 +size 358637 diff --git a/docs/img/Image_Classification_Demo_Complete.PNG b/docs/img/Image_Classification_Demo_Complete.PNG new file mode 100644 index 00000000000000..d88ec597af8fec --- /dev/null +++ b/docs/img/Image_Classification_Demo_Complete.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f546b3bfce4cb15c20abdff940ba94e3c12a44f12af85bda70f5d2e91b855d9 +size 12931 diff --git a/docs/img/InstallPython64-bit.PNG b/docs/img/InstallPython64-bit.PNG new file mode 100644 index 00000000000000..85ea83599f029e --- /dev/null +++ b/docs/img/InstallPython64-bit.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a19288f618fc7c6ace4703c06ff018f8ffa480a0300a5cb6dc3ce924f8fc2a59 +size 85959 diff --git a/docs/img/InstallPython64-bit_1.PNG b/docs/img/InstallPython64-bit_1.PNG new file mode 100644 index 00000000000000..9005e0d4001d1f --- /dev/null +++ b/docs/img/InstallPython64-bit_1.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:67c9e9f6c146a3f7b5d1f45357cbe8820f295492cd3d2ba71387547fd5f368dc +size 88207 diff --git a/docs/img/InstallVisualComponents.png b/docs/img/InstallVisualComponents.png new file mode 100644 index 00000000000000..f2895ecdbe5a86 --- /dev/null +++ b/docs/img/InstallVisualComponents.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d17dfbb5ba7b3484d02afffea067360f5bc74269296ecb8c5326b54168306b1 +size 34384 diff --git a/docs/img/InstallVisualStudioComponentsSummary.png b/docs/img/InstallVisualStudioComponentsSummary.png new file mode 100644 index 00000000000000..14e5df0f58a7c4 --- /dev/null +++ b/docs/img/InstallVisualStudioComponentsSummary.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2ccf75c407ac691d9348d653288a5b751f2364e81e4c53f5ec02202158f517f6 +size 56518 diff --git a/docs/img/Install_Screen_3-dd.PNG b/docs/img/Install_Screen_3-dd.PNG new file mode 100644 index 00000000000000..82b942757e0321 --- /dev/null +++ b/docs/img/Install_Screen_3-dd.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5558b76c3faccf390b69c959a983e95ed38e2f6021ed5abb1250c233a0efa391 +size 106713 diff --git a/docs/img/OpenVINO_Install_Options.png b/docs/img/OpenVINO_Install_Options.png new file mode 100644 index 00000000000000..3bbaa9e829f751 --- /dev/null +++ b/docs/img/OpenVINO_Install_Options.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3bec596828897af0d59dc72a0a2ec5d00a92fee92112c0e28d974431107006f2 +size 43683 diff --git a/docs/img/OpenVINO_Install_warnings.PNG b/docs/img/OpenVINO_Install_warnings.PNG new file mode 100644 index 00000000000000..9b0e076e5b1315 --- /dev/null +++ b/docs/img/OpenVINO_Install_warnings.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:922d409ed5703f148daeb73d40a5f9d436de6513582968eaf271c3e2a2272ac7 +size 95774 diff --git a/docs/img/OpenVINO_Install_warnings_R3.PNG b/docs/img/OpenVINO_Install_warnings_R3.PNG new file mode 100644 index 00000000000000..7e0c7ce7e203fd --- /dev/null +++ b/docs/img/OpenVINO_Install_warnings_R3.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:98f3d5bd68025d3d8811d6ac7595749879be6d7467287573565661df31d41939 +size 82299 diff --git a/docs/img/OpenVINO_first_demo.PNG b/docs/img/OpenVINO_first_demo.PNG new file mode 100644 index 00000000000000..ccbc7f62770d5d --- /dev/null +++ b/docs/img/OpenVINO_first_demo.PNG @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:ab41914b2ec5b153fa423ac1245cae8535b5461606adc2beac8bf778691a08ab +size 17462 diff --git a/docs/img/Python-Download.png b/docs/img/Python-Download.png new file mode 100644 index 00000000000000..28e641e1038a19 --- /dev/null +++ b/docs/img/Python-Download.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:97b3c297e74357e7c08300a34370ca02f5fef9ec6036592573db9bc33e431ffa +size 193220 diff --git a/docs/img/Quartus_Lite_Download.png b/docs/img/Quartus_Lite_Download.png new file mode 100644 index 00000000000000..0f816b8cf41855 --- /dev/null +++ b/docs/img/Quartus_Lite_Download.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1d2a5dd1d7fa8f172ff310d5ede2ed493fdb35130bac7df4fd92b640850cbc19 +size 358637 diff --git a/docs/img/Quartus_Pro_Download.png b/docs/img/Quartus_Pro_Download.png new file mode 100644 index 00000000000000..6fc6f7d58d7e0c --- /dev/null +++ b/docs/img/Quartus_Pro_Download.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:86d868cb6202e36fbf7c32058766713405273a49ba00038e422b2c4e5ff87d13 +size 386528 diff --git a/docs/img/SelectSecondVisualStudioBox-2017.PNG b/docs/img/SelectSecondVisualStudioBox-2017.PNG new file mode 100644 index 00000000000000..9713f715babb69 --- /dev/null +++ b/docs/img/SelectSecondVisualStudioBox-2017.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:07477e2bb351c6826cfa649eae720fc1b1d491dff64da0f83d34abca0aea6fd8 +size 24639 diff --git a/docs/img/SelectVisualStudioBuildTooltoInstall-2017.PNG b/docs/img/SelectVisualStudioBuildTooltoInstall-2017.PNG new file mode 100644 index 00000000000000..76e586b30f544b --- /dev/null +++ b/docs/img/SelectVisualStudioBuildTooltoInstall-2017.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c594c4420bb7372d9327eeac20ce931e084c3cd887efcdcf0731b8c61930cc74 +size 31724 diff --git a/docs/img/SelectVisualStudioBuildTooltoInstall.png b/docs/img/SelectVisualStudioBuildTooltoInstall.png new file mode 100644 index 00000000000000..974066f9646f25 --- /dev/null +++ b/docs/img/SelectVisualStudioBuildTooltoInstall.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:60196edc21ada46ab92e52f2acec8a47c645e696c45929c674598e4c0c52798a +size 87484 diff --git a/docs/img/Summary-2017.PNG b/docs/img/Summary-2017.PNG new file mode 100644 index 00000000000000..fbcf635e16cfb6 --- /dev/null +++ b/docs/img/Summary-2017.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5c5212dcc04651cc36068b3dd825e062b5ee7e00d165827783cb58336c5979d3 +size 34841 diff --git a/docs/img/System_Properties.PNG b/docs/img/System_Properties.PNG new file mode 100644 index 00000000000000..5d6bbdff88724d --- /dev/null +++ b/docs/img/System_Properties.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ff0554c93bf72cc2f863175bb4446aca63f7b31aded8acef6ed83c3c837b63fe +size 32049 diff --git a/docs/img/VS_2015_Build_Tools_Download.PNG b/docs/img/VS_2015_Build_Tools_Download.PNG new file mode 100644 index 00000000000000..9f661d4f3698be --- /dev/null +++ b/docs/img/VS_2015_Build_Tools_Download.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:710be49ad01b379b3b1689d5ff4330b38fba0fe8b77765682e59cc30b6541f4c +size 26970 diff --git a/docs/img/VisionAcceleratorJTAG.png b/docs/img/VisionAcceleratorJTAG.png new file mode 100644 index 00000000000000..f1ac9b2f00e4ba --- /dev/null +++ b/docs/img/VisionAcceleratorJTAG.png @@ -0,0 
+1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:59c5c62c2999f5962552b3f0fd55f74cd7abe7d623aa5293b3b022673b6cf888 +size 148611 diff --git a/docs/img/Visual-Studio-2017-Downloads.PNG b/docs/img/Visual-Studio-2017-Downloads.PNG new file mode 100644 index 00000000000000..844e30fad85521 --- /dev/null +++ b/docs/img/Visual-Studio-2017-Downloads.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8d1dbc70fc0266a346d81777a353adbef1d800ae9548b86d44b0ac3b2e434c84 +size 22703 diff --git a/docs/img/VisualStudio2017BuildToolSummary.png b/docs/img/VisualStudio2017BuildToolSummary.png new file mode 100644 index 00000000000000..d7397200124407 --- /dev/null +++ b/docs/img/VisualStudio2017BuildToolSummary.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a2bd12f9754c9346b6f53cfaa0eadb498c07ef1df6b687ddcd340f45b7ec41ab +size 16791 diff --git a/docs/img/VisualStudio2019BuildToolSummary.PNG b/docs/img/VisualStudio2019BuildToolSummary.PNG new file mode 100644 index 00000000000000..e489306e6c204b --- /dev/null +++ b/docs/img/VisualStudio2019BuildToolSummary.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4daedcb024767f5364a76314d7de52d6c9c48ae0527591d240516c88ccb3af25 +size 99695 diff --git a/docs/img/VisualStudioBuildTools.PNG b/docs/img/VisualStudioBuildTools.PNG new file mode 100644 index 00000000000000..f80fe589fd7fb4 --- /dev/null +++ b/docs/img/VisualStudioBuildTools.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a166f68c464ae61d799989e70cd2660575d0b0cb69a9efc38205f88e594b83f +size 172954 diff --git a/docs/img/benefit_hddlr.png b/docs/img/benefit_hddlr.png new file mode 100644 index 00000000000000..155bce030e1bef --- /dev/null +++ b/docs/img/benefit_hddlr.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eed680b8104c14f30fc6ecd55b85aab580617d8080b1b48ade038d3f67d88ce8 +size 18879 diff --git a/docs/img/benefit_i3.png b/docs/img/benefit_i3.png new file mode 100644 index 00000000000000..e453bd222294c8 --- /dev/null +++ b/docs/img/benefit_i3.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cb937dcaf5dac64c73e17a53d59f9f98dae71bfc87f83eece2935af551d09235 +size 18667 diff --git a/docs/img/benefit_i5.png b/docs/img/benefit_i5.png new file mode 100644 index 00000000000000..985a16aecc9450 --- /dev/null +++ b/docs/img/benefit_i5.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:36c260a70c7c5bf1aca60d2754e98b1e071d14f74e8d870a3d0db7b07a79eb4d +size 18278 diff --git a/docs/img/benefit_i7.png b/docs/img/benefit_i7.png new file mode 100644 index 00000000000000..10410215d3380c --- /dev/null +++ b/docs/img/benefit_i7.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9444a62df34218b274edd125c6757cb6287ba44009b4c01d9386394c12ded50b +size 19118 diff --git a/docs/img/benefit_ivad_fpga.png b/docs/img/benefit_ivad_fpga.png new file mode 100644 index 00000000000000..cc8c35fbfcf354 --- /dev/null +++ b/docs/img/benefit_ivad_fpga.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cd039cef14b81c6cf573692566ce99a728181b692973bd1c52a95ba0561bd537 +size 20256 diff --git a/docs/img/benefit_ncs2.png b/docs/img/benefit_ncs2.png new file mode 100644 index 00000000000000..104a49a17cefc9 --- /dev/null +++ b/docs/img/benefit_ncs2.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ba09c1c8b748936350df0999b1a926a41ca040f171754bd7884950fe13402e4c +size 19279 diff --git 
a/docs/img/benefit_xeon_e212g.png b/docs/img/benefit_xeon_e212g.png new file mode 100644 index 00000000000000..e0440e579ef1d5 --- /dev/null +++ b/docs/img/benefit_xeon_e212g.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b0d5cd7e017399bc57fa2db7214643c654c0833f595d93b4ac2502ab8e99a3ef +size 19099 diff --git a/docs/img/benefit_xeon_gold.png b/docs/img/benefit_xeon_gold.png new file mode 100644 index 00000000000000..e141d22bcd1e51 --- /dev/null +++ b/docs/img/benefit_xeon_gold.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e672636e3e39904835a7840fb23587f239413ba8a00eb4a02cdf4b092dc82c0e +size 13102 diff --git a/docs/img/benefit_xeon_platinum.png b/docs/img/benefit_xeon_platinum.png new file mode 100644 index 00000000000000..f2698bebd0b824 --- /dev/null +++ b/docs/img/benefit_xeon_platinum.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:382e2dc25cd0bf4290d4a550e157561cea5becb8b9a520141263350b210fa064 +size 13855 diff --git a/docs/img/benefit_xeon_silver.png b/docs/img/benefit_xeon_silver.png new file mode 100644 index 00000000000000..b64195c6caa408 --- /dev/null +++ b/docs/img/benefit_xeon_silver.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:242338870c7bb08cd48930dbb024225a3f64d8ac96296b57d929bfca73c83219 +size 18010 diff --git a/docs/img/cmake-installer_1.png b/docs/img/cmake-installer_1.png new file mode 100644 index 00000000000000..436d214e8e2bb5 --- /dev/null +++ b/docs/img/cmake-installer_1.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e5c62a6356c3575b296736643aaceb9f3dbdc3e2d52ca71fe0590a4cb4046c6c +size 35011 diff --git a/docs/img/command_prompt.PNG b/docs/img/command_prompt.PNG new file mode 100644 index 00000000000000..ae89ba34a3feb6 --- /dev/null +++ b/docs/img/command_prompt.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:291655ea87e6dfe109f95204531863602e0b90a8dc69f2dd1be8c393b4ca047a +size 19471 diff --git a/docs/img/configuration_dialog.png b/docs/img/configuration_dialog.png new file mode 100644 index 00000000000000..ffd02aff2411b0 --- /dev/null +++ b/docs/img/configuration_dialog.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a9a30b2cc5ca8ebe2da122247e292a9b415beb7bb6fbfd88f6843061d81a9e83 +size 29381 diff --git a/docs/img/configure_MO_success.PNG b/docs/img/configure_MO_success.PNG new file mode 100644 index 00000000000000..736c6f31475f64 --- /dev/null +++ b/docs/img/configure_MO_success.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bbb2f8446a8b384a9fc6a4ec9921befb205e86ad85947348abb2d864f21fd752 +size 176782 diff --git a/docs/img/cpu_streams_explained.png b/docs/img/cpu_streams_explained.png new file mode 100644 index 00000000000000..21b27d18705559 --- /dev/null +++ b/docs/img/cpu_streams_explained.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4740f9c1c4215367a6e31af0fb23eb9c5abf594e87aaecc211845d0a8480c211 +size 105457 diff --git a/docs/img/digital-security-surveillance.jpg b/docs/img/digital-security-surveillance.jpg new file mode 100644 index 00000000000000..2e5078faa79848 --- /dev/null +++ b/docs/img/digital-security-surveillance.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ea4e900661f1321f7318ff279eb66e41907bc3bcff2f52c7120b6378040b0fcd +size 61349 diff --git a/docs/img/eff_atom.png b/docs/img/eff_atom.png new file mode 100644 index 00000000000000..d961336992158e --- /dev/null +++ 
b/docs/img/eff_atom.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4405ac12978a40d6257f0a57d7a99e544cab3b0f283de1006a89df0e2235b452 +size 25347 diff --git a/docs/img/eff_hddlr.png b/docs/img/eff_hddlr.png new file mode 100644 index 00000000000000..5f3e7d7c306d00 --- /dev/null +++ b/docs/img/eff_hddlr.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fee9f17f6d707f06aca42d7c334b4ae9cdaeb28c6360ac654ca7581c74bf2d6c +size 20665 diff --git a/docs/img/eff_i3.png b/docs/img/eff_i3.png new file mode 100644 index 00000000000000..b91327f4fd7228 --- /dev/null +++ b/docs/img/eff_i3.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e3ec66355bbc60db9b54ed7556c639a27e7ef89a9d0fc19649df8fcc9d34eff4 +size 25735 diff --git a/docs/img/eff_i5.png b/docs/img/eff_i5.png new file mode 100644 index 00000000000000..5da1b42179cc5b --- /dev/null +++ b/docs/img/eff_i5.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b347f0a7423bd99d1b9b177325c0e9e436c0b4a3c33b611195745c2138e77853 +size 22781 diff --git a/docs/img/eff_i7.png b/docs/img/eff_i7.png new file mode 100644 index 00000000000000..a79f1c70efcac1 --- /dev/null +++ b/docs/img/eff_i7.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f5cb9342b5ffb67dd0fbc8dc2540e6b2c993503a6e6c55feda881134f3c31a2a +size 25646 diff --git a/docs/img/eff_i9.png b/docs/img/eff_i9.png new file mode 100644 index 00000000000000..8df628be61cb7e --- /dev/null +++ b/docs/img/eff_i9.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:be5d78af17682c7cec7fbad4b3bff0b625a217179810458c3ebb941c66c98c02 +size 26404 diff --git a/docs/img/eff_ivad_fpga.png b/docs/img/eff_ivad_fpga.png new file mode 100644 index 00000000000000..6c2257cdb6a211 Binary files /dev/null and b/docs/img/eff_ivad_fpga.png differ diff --git a/docs/img/eff_ncs2.png b/docs/img/eff_ncs2.png new file mode 100644 index 00000000000000..83dd7a13483a52 --- /dev/null +++ b/docs/img/eff_ncs2.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5a5af49441cd924ecd0b155b20c120f3a38638fd6500646bc8e0d6005e701639 +size 17323 diff --git a/docs/img/eff_xeon_e212g.png b/docs/img/eff_xeon_e212g.png new file mode 100644 index 00000000000000..8d07bc5fce7da7 --- /dev/null +++ b/docs/img/eff_xeon_e212g.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a08a1dcbd57285dac978583287c3bf881598769a782346122ccea094369e0c98 +size 24643 diff --git a/docs/img/eff_xeon_gold.png b/docs/img/eff_xeon_gold.png new file mode 100644 index 00000000000000..ff87faed58aca1 --- /dev/null +++ b/docs/img/eff_xeon_gold.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:13031b485bba780de5eeb52230d7d9780c89efb458dd90150e40f4c35cf0f126 +size 24654 diff --git a/docs/img/eff_xeon_platinum.png b/docs/img/eff_xeon_platinum.png new file mode 100644 index 00000000000000..7fa94edbf3a836 --- /dev/null +++ b/docs/img/eff_xeon_platinum.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02f8567da0f9ccda74e96609769341e2bcc8366dd2b6ecc276e3b1b13953b67c +size 23156 diff --git a/docs/img/eff_xeon_silver.png b/docs/img/eff_xeon_silver.png new file mode 100644 index 00000000000000..f58c16538b2484 --- /dev/null +++ b/docs/img/eff_xeon_silver.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2b68b2766f9bde57c7c20967caef0517d9aa7d2d6c707c2df9dc3f7adb4ccdd8 +size 26332 diff --git a/docs/img/example_sample_output.png 
b/docs/img/example_sample_output.png new file mode 100644 index 00000000000000..2a830b853aac36 --- /dev/null +++ b/docs/img/example_sample_output.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6de604f7753746d7444bf766a9cae40b4758ef8f2382a4118fb98e36ea85631b +size 435896 diff --git a/docs/img/fpga-download-cable.png b/docs/img/fpga-download-cable.png new file mode 100644 index 00000000000000..7645f7898511b4 --- /dev/null +++ b/docs/img/fpga-download-cable.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dea06469d8bb27e6b536c0b9124401dab73e5a0e311cd2da8836ad924032bffe +size 297488 diff --git a/docs/img/image_classification_script_output_lnx.png b/docs/img/image_classification_script_output_lnx.png new file mode 100644 index 00000000000000..09f6c4aac5c7ad --- /dev/null +++ b/docs/img/image_classification_script_output_lnx.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3ffc6c50c91a1eb41b773654cb6ee4efbf4c1ae903688345a91d231ad6257d00 +size 38351 diff --git a/docs/img/image_classification_script_output_win.png b/docs/img/image_classification_script_output_win.png new file mode 100644 index 00000000000000..e78c1af3515d2b --- /dev/null +++ b/docs/img/image_classification_script_output_win.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:721c7599553af43d892be4b4da76e31833ccfdf26bd34a490215843640fbcf5c +size 29059 diff --git a/docs/img/inference_pipeline_script_lnx.png b/docs/img/inference_pipeline_script_lnx.png new file mode 100644 index 00000000000000..a4ff0ba7f8e93f --- /dev/null +++ b/docs/img/inference_pipeline_script_lnx.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ca0811c19b4108054bfa66d99107e469409d7a0200745da96dd3e8fdac79daf +size 397011 diff --git a/docs/img/inference_pipeline_script_mac.png b/docs/img/inference_pipeline_script_mac.png new file mode 100644 index 00000000000000..ebceb4a24353ed --- /dev/null +++ b/docs/img/inference_pipeline_script_mac.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:619da8838c460aa26253fa6cfed3d3346fcf7c7c5deb8f178e9bd55dc78c9c8f +size 2017750 diff --git a/docs/img/inference_pipeline_script_win.png b/docs/img/inference_pipeline_script_win.png new file mode 100644 index 00000000000000..a42f193d454aa6 --- /dev/null +++ b/docs/img/inference_pipeline_script_win.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c08b4d12634d3e17a7ed198cdc15be7b8e4b1fe33728d5f38d0998faa7ea8e7e +size 568383 diff --git a/docs/img/initialize_STOP_copy.png b/docs/img/initialize_STOP_copy.png new file mode 100644 index 00000000000000..7b310dc5d60d48 --- /dev/null +++ b/docs/img/initialize_STOP_copy.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:58842c11fb9ab70253030c9d1122b1be227e7b0bf8695a5f6491d7b6fc4e576d +size 301148 diff --git a/docs/img/install-linux-01.png b/docs/img/install-linux-01.png new file mode 100644 index 00000000000000..f28e7e54a528a9 --- /dev/null +++ b/docs/img/install-linux-01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6a42792cf5685c1a18530449d39b004027a0f1df2c731d827a6a22da335292d9 +size 96974 diff --git a/docs/img/install-linux-03_0.png b/docs/img/install-linux-03_0.png new file mode 100644 index 00000000000000..e6c714abc40ee6 --- /dev/null +++ b/docs/img/install-linux-03_0.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:083d1c6c43d9df83405a0348f9e3ca87fd68ca7e3c3ffa33e362d39b87da820c +size 64793 diff --git a/docs/img/install-linux-05.png b/docs/img/install-linux-05.png new file mode 100644 index 00000000000000..996df1a9274f43 --- /dev/null +++ b/docs/img/install-linux-05.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f563a54637a942f50b7ce9448b1fc778be2c5b74204e7f9d63176cef14559779 +size 71164 diff --git a/docs/img/install-linux-fpga-01.png b/docs/img/install-linux-fpga-01.png new file mode 100644 index 00000000000000..5eac578cd391f5 --- /dev/null +++ b/docs/img/install-linux-fpga-01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f4b9eef8c0b59891e659f016d1e6d1a8848f3da6a7fe9b8ca15b0580153e619e +size 81763 diff --git a/docs/img/install-linux-fpga-01R4.png b/docs/img/install-linux-fpga-01R4.png new file mode 100644 index 00000000000000..3e9fd76aeb672b --- /dev/null +++ b/docs/img/install-linux-fpga-01R4.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3a013212fc0e496eaec3b8536a198ca57d6cb3c20af3636b8b357f0d56fe562a +size 118929 diff --git a/docs/img/install-linux-fpga-02.png b/docs/img/install-linux-fpga-02.png new file mode 100644 index 00000000000000..339733d623dcf8 --- /dev/null +++ b/docs/img/install-linux-fpga-02.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:01bbc258e620e7c86f2eef0942a71ae80e145c23c770d3961f411562d0d69d37 +size 86357 diff --git a/docs/img/install-linux-fpga-02R4.png b/docs/img/install-linux-fpga-02R4.png new file mode 100644 index 00000000000000..ee7c9af9b26c0b --- /dev/null +++ b/docs/img/install-linux-fpga-02R4.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:98d444502a38f315f357772fb6618a752eac7bb7441efc7f3e16f4528a8cc3d3 +size 130350 diff --git a/docs/img/install-linux-fpga-03.png b/docs/img/install-linux-fpga-03.png new file mode 100644 index 00000000000000..35d448cae9fac5 --- /dev/null +++ b/docs/img/install-linux-fpga-03.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e2310580a11326896d9780176a080c81b78a7c5da06d4e9f7b8a0580ecbd8eea +size 121442 diff --git a/docs/img/install-linux-fpga-03R4.png b/docs/img/install-linux-fpga-03R4.png new file mode 100644 index 00000000000000..0b5b2eb1ca4b24 --- /dev/null +++ b/docs/img/install-linux-fpga-03R4.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f1d0e8076dadcc092e8198718441a033d48d4daacfafa2775dc6506d2dcf9120 +size 109345 diff --git a/docs/img/install-linux-fpga-04.png b/docs/img/install-linux-fpga-04.png new file mode 100644 index 00000000000000..1ab2d789d8eba0 --- /dev/null +++ b/docs/img/install-linux-fpga-04.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51348a146a4e2a1b3bb6f7155809b2a36cab91351ba32c5760d5fc5bd8210086 +size 105408 diff --git a/docs/img/install-linux-fpga-05.png b/docs/img/install-linux-fpga-05.png new file mode 100644 index 00000000000000..4ad6665fa55dbe --- /dev/null +++ b/docs/img/install-linux-fpga-05.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:56c00ce2b4b251b5be512758ea91f32faf846050c2b7480f223587b533f888b9 +size 81576 diff --git a/docs/img/int8vsfp32.png b/docs/img/int8vsfp32.png new file mode 100644 index 00000000000000..27680e89625960 --- /dev/null +++ b/docs/img/int8vsfp32.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:411d78dc7fa90d6e857d1486eab7c7e7e338f1e27f21d04fadd02433d8a74f32 +size 29204 diff --git 
a/docs/img/intel_logo.png b/docs/img/intel_logo.png new file mode 100644 index 00000000000000..77a3ff51275b83 --- /dev/null +++ b/docs/img/intel_logo.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2d147adf801535e95d8b627a8a1d23f7b89dea1eabe06218235e756b0a9866fe +size 1636 diff --git a/docs/img/myriad_driver.png b/docs/img/myriad_driver.png new file mode 100644 index 00000000000000..c1277480141a41 --- /dev/null +++ b/docs/img/myriad_driver.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4b9798c721eb6ba6a73489790ba8555dc2c2b3f9b3a99992c59e173e2624aba4 +size 8392 diff --git a/docs/img/openvino-install-linux-00.png b/docs/img/openvino-install-linux-00.png new file mode 100644 index 00000000000000..62e39cf4a2362c --- /dev/null +++ b/docs/img/openvino-install-linux-00.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:db0049a7cbd7872e8fa9d7500143fa3ab0a4efc6be93732e95449effe5178219 +size 116689 diff --git a/docs/img/openvino-install-linux-01.png b/docs/img/openvino-install-linux-01.png new file mode 100644 index 00000000000000..a1bfe4bd1f033d --- /dev/null +++ b/docs/img/openvino-install-linux-01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4ad22e7056c752e13c27b922d65df430977ee596caba430f0b796594b8d61cc3 +size 78355 diff --git a/docs/img/openvino-install-linux-02.png b/docs/img/openvino-install-linux-02.png new file mode 100644 index 00000000000000..8a6b2ab39cb15e --- /dev/null +++ b/docs/img/openvino-install-linux-02.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5de8915825f7518bf15b2646e5da1e6e9b2ee5d4b458f1db283fc18c8a7e4beb +size 99507 diff --git a/docs/img/openvino-install-linux-03.png b/docs/img/openvino-install-linux-03.png new file mode 100644 index 00000000000000..4f30356e7872a1 --- /dev/null +++ b/docs/img/openvino-install-linux-03.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e07066337fe9cd4fbe65d06e6960433b142fb01b77db52eece88fde8c63a07a +size 100357 diff --git a/docs/img/openvino-install-linux-04.png b/docs/img/openvino-install-linux-04.png new file mode 100644 index 00000000000000..22228a940d1258 --- /dev/null +++ b/docs/img/openvino-install-linux-04.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d56da346677edf0098ebfd769ec4c2f66b81720ea9eaa43e96320b15066b2f00 +size 69092 diff --git a/docs/img/openvino-install-macos-01.png b/docs/img/openvino-install-macos-01.png new file mode 100644 index 00000000000000..8ffebf1ca9a8d0 --- /dev/null +++ b/docs/img/openvino-install-macos-01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:55b47515883dbbbb2c133ed7ab8decee67f298948b06efd9d170fc7de6a5d55c +size 480186 diff --git a/docs/img/openvino-install-macos-02.png b/docs/img/openvino-install-macos-02.png new file mode 100644 index 00000000000000..6556b03e6a70d1 --- /dev/null +++ b/docs/img/openvino-install-macos-02.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8ac9f4df8618ce9782fb1c234838de9edf9cb9666d2441c9c8a0c38cb4e1b638 +size 81654 diff --git a/docs/img/openvino-install-macos-03.png b/docs/img/openvino-install-macos-03.png new file mode 100644 index 00000000000000..4d9c053981aa08 --- /dev/null +++ b/docs/img/openvino-install-macos-03.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd347719eb890cd407f0758d89d10b2c91305f164ee587de4971f48839b3fb63 +size 130658 diff --git a/docs/img/openvino-install-macos-04.png 
b/docs/img/openvino-install-macos-04.png new file mode 100644 index 00000000000000..a718ccc75cf6fe --- /dev/null +++ b/docs/img/openvino-install-macos-04.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:85dcfae244ea6e117371e41a7eb59b660a4e5b4b97c88962fa70fad44566940e +size 133677 diff --git a/docs/img/openvino-install-macos-05.png b/docs/img/openvino-install-macos-05.png new file mode 100644 index 00000000000000..ef520a1856b59b --- /dev/null +++ b/docs/img/openvino-install-macos-05.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d9b4638c301d543658e6304d1b339b8731f802dcd34f859cfdc0baa9f0554265 +size 128579 diff --git a/docs/img/openvino-install-windows-01.png b/docs/img/openvino-install-windows-01.png new file mode 100644 index 00000000000000..569052995c5fa8 --- /dev/null +++ b/docs/img/openvino-install-windows-01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:de85bd59edc66bfd37aab395bc7e2dde2988f16c7ff263153d382bfcbeb9ff2e +size 35998 diff --git a/docs/img/openvino-install-windows-02.png b/docs/img/openvino-install-windows-02.png new file mode 100644 index 00000000000000..b83cf8472c6acc --- /dev/null +++ b/docs/img/openvino-install-windows-02.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b2586ce56ff1a5c0527b53dc21aa09b489c11e24fec82c6a58e2db860a772c4 +size 39720 diff --git a/docs/img/openvino-install-windows-03.png b/docs/img/openvino-install-windows-03.png new file mode 100644 index 00000000000000..a96a2e17eab254 --- /dev/null +++ b/docs/img/openvino-install-windows-03.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3faa8b02a8477b5d764ea2d47502bc0a878087614e0516704cc1525b5b60dedb +size 26412 diff --git a/docs/img/openvino-install-windows-fpga-01.png b/docs/img/openvino-install-windows-fpga-01.png new file mode 100644 index 00000000000000..81dae7375c3c85 --- /dev/null +++ b/docs/img/openvino-install-windows-fpga-01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f827f4cf5a8cfa1ae93bb5105da0e86cc33cc17363f4203460781901ef39d8a +size 40812 diff --git a/docs/img/openvino-install-windows-fpga-02.png b/docs/img/openvino-install-windows-fpga-02.png new file mode 100644 index 00000000000000..de9829d22667a3 --- /dev/null +++ b/docs/img/openvino-install-windows-fpga-02.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:32c1d47529d291b8e7b0d1f09e927538681a777ce83fa370b2a71859ca5ed2dc +size 23184 diff --git a/docs/img/openvino-install-windows-fpga-03.png b/docs/img/openvino-install-windows-fpga-03.png new file mode 100644 index 00000000000000..c6c4bf5226c5e9 --- /dev/null +++ b/docs/img/openvino-install-windows-fpga-03.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:022eb8cac6cb8d254676802ecf05c95748094e8f592a4d26535e5de5dc22e21d +size 30581 diff --git a/docs/img/output_trimmed.png b/docs/img/output_trimmed.png new file mode 100644 index 00000000000000..cb6dbde165a16e --- /dev/null +++ b/docs/img/output_trimmed.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e47c2b259bea6d3539f0c4556de4cc3a07f6d60af54e1cf32002b4a7bc2cc90a +size 25231 diff --git a/docs/img/ov_r3_install_01.png b/docs/img/ov_r3_install_01.png new file mode 100644 index 00000000000000..557d2a039d8674 --- /dev/null +++ b/docs/img/ov_r3_install_01.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c4ca97c26a0c85b63bbfe4f877b2806e6feb6d4db435d0fbc94fdbee7f140730 +size 
42163 diff --git a/docs/img/ov_r3_install_02.png b/docs/img/ov_r3_install_02.png new file mode 100644 index 00000000000000..d75638a5218d2b --- /dev/null +++ b/docs/img/ov_r3_install_02.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:efe024e0e4df49978213b7500a9e93da25ff9f7b756e47cb82d1d4c0aad6e6cb +size 82166 diff --git a/docs/img/people-counter-syestem.jpg b/docs/img/people-counter-syestem.jpg new file mode 100644 index 00000000000000..f1ac4dc73a90f2 --- /dev/null +++ b/docs/img/people-counter-syestem.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bda7f6de0434e13166c7204393666ba1fdde931cc147211bfff9c3b935ef5ed4 +size 38027 diff --git a/docs/img/performance-boost-DL-algorithm.jpg b/docs/img/performance-boost-DL-algorithm.jpg new file mode 100644 index 00000000000000..4a20cc018b54cd --- /dev/null +++ b/docs/img/performance-boost-DL-algorithm.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1e2c8f9b205825bf9a08f66cd12b010d8d1f9e1677e1f89b2e0360cda1cc7dc6 +size 29618 diff --git a/docs/img/resnet_269.png b/docs/img/resnet_269.png new file mode 100644 index 00000000000000..4ef638090e9f61 --- /dev/null +++ b/docs/img/resnet_269.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:92d36b9527a3e316cd9eb2b6f5054c312466df004e4aa9c3458e165330bc6561 +size 24157 diff --git a/docs/img/robotics-with-AI.jpg b/docs/img/robotics-with-AI.jpg new file mode 100644 index 00000000000000..fa8c9e331b3ff7 --- /dev/null +++ b/docs/img/robotics-with-AI.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:375c4960136a158a1c5a4968439bacc2af75dc46b13b380c1f8790807d0a5b22 +size 94942 diff --git a/docs/img/security-barrier-results.png b/docs/img/security-barrier-results.png new file mode 100644 index 00000000000000..50ba5e1070f057 --- /dev/null +++ b/docs/img/security-barrier-results.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:abf2a9a8b9e55a8527b470df05fb53df26f4f681408d540614df735bfe7a6975 +size 940455 diff --git a/docs/img/selection_dialog.png b/docs/img/selection_dialog.png new file mode 100644 index 00000000000000..fa9e97725d3380 --- /dev/null +++ b/docs/img/selection_dialog.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aee73cd3275e6aaeb13a3df843ce23889cadc6e7e4d031349de7c4dfe851c2f5 +size 25629 diff --git a/docs/img/squeezenet_results.png b/docs/img/squeezenet_results.png new file mode 100644 index 00000000000000..1d3c24b3ceeae2 --- /dev/null +++ b/docs/img/squeezenet_results.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:888a593ce1b9ed484c98813d788f32fddd6461f7a949fb1a0296ac529d97b30e +size 298555 diff --git a/docs/img/throughput_atom.png b/docs/img/throughput_atom.png new file mode 100644 index 00000000000000..420ca1449c58d2 --- /dev/null +++ b/docs/img/throughput_atom.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6a023743a67b2e5de5bab46ff46de507ade2c0e6ad20479aac518e8b07afa7d4 +size 26528 diff --git a/docs/img/throughput_hddlr.png b/docs/img/throughput_hddlr.png new file mode 100644 index 00000000000000..687245ed714768 --- /dev/null +++ b/docs/img/throughput_hddlr.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d07e8946112157ba455b653c99f2f1a9110d5ae4ada2f0f0ba195cc3ab9f1f5a +size 19479 diff --git a/docs/img/throughput_i3.png b/docs/img/throughput_i3.png new file mode 100644 index 00000000000000..8e006a7f2cfbb1 --- /dev/null +++ 
b/docs/img/throughput_i3.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3156cf692745abe9995fc92d84a333e1f8ceff81e20409f918dd5ec0f7e03f7c +size 28910 diff --git a/docs/img/throughput_i5.png b/docs/img/throughput_i5.png new file mode 100644 index 00000000000000..a5d0089d157d8d --- /dev/null +++ b/docs/img/throughput_i5.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8bc8b968f7758abd04bc55add1099a20e5015b924263d14e2bec1289326ec003 +size 29832 diff --git a/docs/img/throughput_i7.png b/docs/img/throughput_i7.png new file mode 100644 index 00000000000000..0140b1529e7695 --- /dev/null +++ b/docs/img/throughput_i7.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8ce3ffe16b1f2033780482ffb2075c53a7a0f0d7012b25c1bdd9df5086f40c43 +size 28247 diff --git a/docs/img/throughput_i9.png b/docs/img/throughput_i9.png new file mode 100644 index 00000000000000..da7f99d038c099 --- /dev/null +++ b/docs/img/throughput_i9.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3b0946f59a32ffdd4e58cb1861f0c0bdbfa570476161ed9b7e7b1af9cb5204e8 +size 29158 diff --git a/docs/img/throughput_ivad_fpga.png b/docs/img/throughput_ivad_fpga.png new file mode 100644 index 00000000000000..63fa3a745f4c88 Binary files /dev/null and b/docs/img/throughput_ivad_fpga.png differ diff --git a/docs/img/throughput_ncs2.png b/docs/img/throughput_ncs2.png new file mode 100644 index 00000000000000..edb9017983fb52 --- /dev/null +++ b/docs/img/throughput_ncs2.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb7865973b2eae48a4d89fbbcac6ae5ed25d663ee93086999cf2e5f8aa5dbcdf +size 18914 diff --git a/docs/img/throughput_xeon_e212g.png b/docs/img/throughput_xeon_e212g.png new file mode 100644 index 00000000000000..05b3a831679c11 --- /dev/null +++ b/docs/img/throughput_xeon_e212g.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:248eb597c25b829dbcf41273e72a8a0e190fca95ec5c4b8b854ebfb6cacd16d0 +size 26356 diff --git a/docs/img/throughput_xeon_gold.png b/docs/img/throughput_xeon_gold.png new file mode 100644 index 00000000000000..d2e7759fd58767 --- /dev/null +++ b/docs/img/throughput_xeon_gold.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1c313bf9525b4257519dd3e7a9b4771f450f75d89e68a75b1ee709b4a0fc8ba9 +size 26674 diff --git a/docs/img/throughput_xeon_platinum.png b/docs/img/throughput_xeon_platinum.png new file mode 100644 index 00000000000000..7cd82b740234f8 --- /dev/null +++ b/docs/img/throughput_xeon_platinum.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f422e04f717e6c1d015ed0df908713104e77607315a0f346322afb2fc3aa43fb +size 26315 diff --git a/docs/img/throughput_xeon_silver.png b/docs/img/throughput_xeon_silver.png new file mode 100644 index 00000000000000..b390b3adbd98f2 --- /dev/null +++ b/docs/img/throughput_xeon_silver.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:57bee340d37eca2d2d89a0bec8b4fca8e13b4ad4acbe3e6aa2009c247d75dd75 +size 28028 diff --git a/docs/img/usb-blaster-setup.png b/docs/img/usb-blaster-setup.png new file mode 100644 index 00000000000000..88bbee86317e10 --- /dev/null +++ b/docs/img/usb-blaster-setup.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:48ff1ba6e556c20d97c57ada4f4a57de0bcdd5fc85d7b1357ae2db2d19607b32 +size 440566 diff --git a/docs/img/value_atom.png b/docs/img/value_atom.png new file mode 100644 index 00000000000000..0e04cc41941be6 --- /dev/null 
+++ b/docs/img/value_atom.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cd354d49dde3dec1f8fd468f60a9181b9757ea4d2da2460bd550fbdb45c43f2b +size 23932 diff --git a/docs/img/value_hddlr.png b/docs/img/value_hddlr.png new file mode 100644 index 00000000000000..ed385d55efb9f8 --- /dev/null +++ b/docs/img/value_hddlr.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e03b8173ab61261d51316bd33b9959b4ef30bc589791d9d8288f1996358b8d03 +size 18085 diff --git a/docs/img/value_i3.png b/docs/img/value_i3.png new file mode 100644 index 00000000000000..8c94040af5b5a6 --- /dev/null +++ b/docs/img/value_i3.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6b7fb638977d48804fc0a2887ecbd8cff8fe91d9c89d67f62d4a9bfc0c2e54e7 +size 23789 diff --git a/docs/img/value_i5.png b/docs/img/value_i5.png new file mode 100644 index 00000000000000..aaf4eb87c5c0c7 --- /dev/null +++ b/docs/img/value_i5.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6a673a39a511c31198be23b78211ca504ece5bb1faa09280d489266033dea205 +size 24144 diff --git a/docs/img/value_i7.png b/docs/img/value_i7.png new file mode 100644 index 00000000000000..643b786bd3856e --- /dev/null +++ b/docs/img/value_i7.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f2adaa13ce4148a26411b922a04045f2c072dee6b687d8b321a3208f7102706e +size 21928 diff --git a/docs/img/value_i9.png b/docs/img/value_i9.png new file mode 100644 index 00000000000000..c82683d632caee --- /dev/null +++ b/docs/img/value_i9.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:824e6ed1c463bd0d30cc857a0da9d21870960a66148e7413ceac08d49835d991 +size 24134 diff --git a/docs/img/value_ivad_fpga.png b/docs/img/value_ivad_fpga.png new file mode 100644 index 00000000000000..3c86fb0e35b005 Binary files /dev/null and b/docs/img/value_ivad_fpga.png differ diff --git a/docs/img/value_ncs2.png b/docs/img/value_ncs2.png new file mode 100644 index 00000000000000..c4726ccd9aaf6f --- /dev/null +++ b/docs/img/value_ncs2.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:673ab40557aea496cd3ee9ccf2a6c42bed553e6748a0367bbd6f17fe0395c700 +size 18082 diff --git a/docs/img/value_xeon_e212g.png b/docs/img/value_xeon_e212g.png new file mode 100644 index 00000000000000..668fa576a7f072 --- /dev/null +++ b/docs/img/value_xeon_e212g.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f7bbf369ad89f77be771f275a6ab9b9fcbc506168dc4f2a7fc25428aec576811 +size 23159 diff --git a/docs/img/value_xeon_gold.png b/docs/img/value_xeon_gold.png new file mode 100644 index 00000000000000..73f0e86839b149 --- /dev/null +++ b/docs/img/value_xeon_gold.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c4f598a4e99019b2bf099224794bd9fd88e78d6cd73a70766aa0bf93d32a2511 +size 23611 diff --git a/docs/img/value_xeon_platinum.png b/docs/img/value_xeon_platinum.png new file mode 100644 index 00000000000000..eec5e5c6317b93 --- /dev/null +++ b/docs/img/value_xeon_platinum.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4c8fae5e4d891c9436a8795ac1db8859391d487ef1c2fe88d1af1e6cf8b4e853 +size 23536 diff --git a/docs/img/value_xeon_silver.png b/docs/img/value_xeon_silver.png new file mode 100644 index 00000000000000..6aabca32d2c568 --- /dev/null +++ b/docs/img/value_xeon_silver.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:484c72f866cde63169ee1b5edec684af4d1f4de060265c2d90f87d45b24473d1 +size 25088 diff --git a/docs/img/visualStudioAccount.PNG b/docs/img/visualStudioAccount.PNG new file mode 100644 index 00000000000000..85635a0c64a7e1 --- /dev/null +++ b/docs/img/visualStudioAccount.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d38f4a540ff4db8801893f636fbb8fce30b43ca41257aea2706982f49fc7c7be +size 40036 diff --git a/docs/img/vs_Studio_2015_First_Install_Screen_CPP.PNG b/docs/img/vs_Studio_2015_First_Install_Screen_CPP.PNG new file mode 100644 index 00000000000000..e357f8c52219b3 --- /dev/null +++ b/docs/img/vs_Studio_2015_First_Install_Screen_CPP.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a711fcd717fa0f9bc9e7e31dd264486929e0554e4ca63309458e0af9336a21ff +size 52848 diff --git a/docs/img/vs_Studio_2015_Summary.PNG b/docs/img/vs_Studio_2015_Summary.PNG new file mode 100644 index 00000000000000..6382897f83c73a --- /dev/null +++ b/docs/img/vs_Studio_2015_Summary.PNG @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d1ddf36504fb7286ec9f37682feb1a209aa319391ce04deac496da7636080dd +size 31644 diff --git a/docs/img/vtune_async.png b/docs/img/vtune_async.png new file mode 100644 index 00000000000000..044d7a606e06e0 --- /dev/null +++ b/docs/img/vtune_async.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c47ede993681ba3f0a3e3f4274369ee1854365b1bcd1b5cb0f649a781fdf51bd +size 6215 diff --git a/docs/img/vtune_option.jpg b/docs/img/vtune_option.jpg new file mode 100644 index 00000000000000..f284465314bde8 --- /dev/null +++ b/docs/img/vtune_option.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4a82b414dbc4f7ce2eae625bb7c9c7b88c154a7c476374683dd9886564560f67 +size 7951 diff --git a/docs/img/vtune_regular.png b/docs/img/vtune_regular.png new file mode 100644 index 00000000000000..9d01e7627ad028 --- /dev/null +++ b/docs/img/vtune_regular.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a4fce51076df19fbca04a36d6886765771f8ffc174bebbd751bfc77d91ab1f2 +size 7081 diff --git a/docs/img/vtune_timeline.png b/docs/img/vtune_timeline.png new file mode 100644 index 00000000000000..f29cc161cdad36 --- /dev/null +++ b/docs/img/vtune_timeline.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c514316f78f04e8c000f6b95dc579d8c63c57f00c4c980ea4d358a6a4f1b9d7e +size 8744 diff --git a/docs/img/vtune_topdown_view.jpg b/docs/img/vtune_topdown_view.jpg new file mode 100644 index 00000000000000..fd3431c97ec70f --- /dev/null +++ b/docs/img/vtune_topdown_view.jpg @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:40c4b9096ef264807d930fe64d427f53a69ce2247c836415e64c5aa72d9f245e +size 36468 diff --git a/docs/img/workflow_steps.png b/docs/img/workflow_steps.png new file mode 100644 index 00000000000000..7e8f3030c5d6b7 --- /dev/null +++ b/docs/img/workflow_steps.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b630a7deb8bbcf1d5384c351baff7505dc96a1a5d59b5f6786845d549d93d9ab +size 36881 diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 00000000000000..1802e71249b74f --- /dev/null +++ b/docs/index.md @@ -0,0 +1,66 @@ +# OpenVINO™ Toolkit Documentation {#index} + +## Introduction to OpenVINO™ Toolkit + +OpenVINO™ toolkit quickly deploys applications and solutions that emulate human vision. 
Based on Convolutional Neural Networks (CNNs), the toolkit extends computer vision (CV) workloads across Intel® hardware, maximizing performance. The OpenVINO™ toolkit includes the Deep Learning Deployment Toolkit (DLDT). + +OpenVINO™ toolkit: + +- Enables CNN-based deep learning inference on the edge +- Supports heterogeneous execution across an Intel® CPU, Intel® Integrated Graphics, Intel® FPGA, Intel® Neural Compute Stick 2 and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs +- Speeds time-to-market via an easy-to-use library of computer vision functions and pre-optimized kernels +- Includes optimized calls for computer vision standards, including OpenCV\* and OpenCL™ + +## Toolkit Components + +OpenVINO™ toolkit includes the following components: + +- Deep Learning Deployment Toolkit (DLDT) + - [Deep Learning Model Optimizer](MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) - A cross-platform command-line tool for importing models and + preparing them for optimal execution with the Inference Engine. The Model Optimizer imports, converts, and optimizes models, which were trained in popular frameworks, such as Caffe*, + TensorFlow*, MXNet*, Kaldi*, and ONNX*. + - [Deep Learning Inference Engine](IE_DG/inference_engine_intro.md) - A unified API to allow high performance inference on many hardware types + including the following: + - Intel® CPU + - Intel® Integrated Graphics + - Intel® Neural Compute Stick 2 + - Intel® Vision Accelerator Design with Intel® Movidius™ vision processing unit (VPU) + - [Samples](IE_DG/Samples_Overview.md) - A set of simple console applications demonstrating how to use the Inference Engine in your applications + - [Tools](IE_DG/Tools_Overview.md) - A set of simple console tools to work with your models +- [Open Model Zoo](@ref omz_models_intel_index) + - [Demos](@ref omz_demos_README) - Console applications that demonstrate how you can use the Inference Engine in your applications to solve specific use cases + - [Tools](IE_DG/Tools_Overview.md) - Additional tools to download models and check accuracy + - [Documentation for Pretrained Models](@ref omz_models_intel_index) - Documentation for pretrained models is available in the [Open Model Zoo repository](https://github.com/opencv/open_model_zoo) +- [Post-Training Optimization tool](@ref pot_README) - A tool to calibrate a model and then execute it in the INT8 precision +- [Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Introduction) - A web-based graphical environment that allows you to easily use various sophisticated OpenVINO™ toolkit components +- Deep Learning Streamer (DL Streamer) – Streaming analytics framework, based on GStreamer, for constructing graphs of media analytics components. DL Streamer can be installed by the Intel® Distribution of OpenVINO™ toolkit installer. Its open source version is available on [GitHub](https://github.com/opencv/gst-video-analytics). 
For the DL Streamer documentation, see: + - [DL Streamer Samples](IE_DG/Tools_Overview.md) + - [API Reference](https://opencv.github.io/gst-video-analytics/) + - [Elements](https://github.com/opencv/gst-video-analytics/wiki/Elements) + - [Tutorial](https://github.com/opencv/gst-video-analytics/wiki/DL%20Streamer%20Tutorial) +- [OpenCV](https://docs.opencv.org/master/) - OpenCV* community version compiled for Intel® hardware +- Drivers and runtimes for OpenCL™ version 2.1 +- [Intel® Media SDK](https://software.intel.com/en-us/media-sdk) + +## Documentation Set Contents + +OpenVINO™ toolkit documentation set includes the following documents: + +- [Install the Intel® Distribution of OpenVINO™ Toolkit for Linux*](install_guides/installing-openvino-linux.md) +- [Install the Intel® Distribution of OpenVINO™ Toolkit for Linux with FPGA Support](install_guides/installing-openvino-linux-fpga.md) +- [Install the Intel® Distribution of OpenVINO™ Toolkit for Windows*](install_guides/installing-openvino-windows.md) +- [Install the Intel® Distribution of OpenVINO™ Toolkit for macOS*](install_guides/installing-openvino-macos.md) +- [Install the Intel® Distribution of OpenVINO™ Toolkit for Raspbian*](install_guides/installing-openvino-raspbian.md) +- [Install OpenVINO™ Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Install_Workbench) +- [Introduction to Deep Learning Deployment Toolkit](IE_DG/Introduction.md) +- [Model Optimizer Developer Guide](MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) +- [Inference Engine Developer Guide](IE_DG/Deep_Learning_Inference_Engine_DevGuide.md) +- [Post-Training Optimization Tool](@ref pot_README) +- [Inference Engine Samples](IE_DG/Samples_Overview.md) +- [Demo Applications](@ref omz_demos_README) +- [Tools](IE_DG/Tools_Overview.md) +- [Pretrained Models](@ref omz_models_intel_index) +- [Known Issues](IE_DG/Known_Issues_Limitations.md) +- [Legal Information](@ref omz_demos_README) + +> **Typical Next Step:** [Introduction to Deep Learning Deployment Toolkit](IE_DG/Introduction.md) diff --git a/docs/install_guides/PAC_Configure.md b/docs/install_guides/PAC_Configure.md new file mode 100644 index 00000000000000..e277c9f5b6fc2e --- /dev/null +++ b/docs/install_guides/PAC_Configure.md @@ -0,0 +1,241 @@ +# Configuration Guide for Intel® Distribution of OpenVINO™ toolkit 2020.4 and the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA on CentOS or Ubuntu* {#openvino_docs_install_guides_PAC_Configure} + +> **NOTE**: For previous versions, see [Configuration Guide for OpenVINO 2020.3](https://docs.openvinotoolkit.org/2020.3/_docs_install_guides_PAC_Configure.html), [Configuration Guide for OpenVINO 2020.2](https://docs.openvinotoolkit.org/2020.2/_docs_install_guides_PAC_Configure.html), [Configuration Guide for OpenVINO 2019R1/2019R2/2019R3](https://docs.openvinotoolkit.org/2019_R3.1/_docs_install_guides_PAC_Configure_2019RX.html), [Configuration Guide for OpenVINO 2018R5](https://docs.openvinotoolkit.org/2019_R1/_docs_install_guides_PAC_Configure_2018R5.html). + +## Get Started + +The following describes the set-up of the Intel® Distribution of OpenVINO™ toolkit on CentOS* 7.4 or Ubuntu* 16.04, kernel 4.15. This is based upon a completely fresh install of the OS with developer tools included. Official Intel® documentation for the install process can be found in the following locations and it is highly recommended that these are read, especially for new users. 
This document serves as a guide, and in some cases, adds additional detail where necessary. + +[Intel® Acceleration Stack for FPGAs Quick Start Guide](https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug-qs-ias-v1-2-1.pdf) + +[OpenCL™ on Intel® PAC Quick Start Guide](https://www.intel.com/content/dam/altera-www/global/en_US/pdfs/literature/ug/ug-qs-ias-opencl-a10.pdf) + +[Installing the Intel® Distribution of OpenVINO™ toolkit for Linux*](https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_linux.html) + +(Optional): Install NTFS support for transferring large installers that were already downloaded on another machine. +```sh +sudo yum -y install epel-release +``` +```sh +sudo yum -y install ntfs-3g +``` + +## Install Intel® PAC and the Intel® Programmable Acceleration Card Stack + +1. Download version 1.2.1 of the Acceleration Stack for Runtime from the [Intel FPGA Acceleration Hub](https://www.altera.com/solutions/acceleration-hub/downloads.html). +This downloads as `a10_gx_pac_ias_1_2_1_pv_rte.tar.gz`. Let it download to `~/Downloads`. + +2. Create a new directory to install to: +```sh +mkdir -p ~/tools/intelrtestack +``` + +3. Untar and launch the installer: +```sh +cd ~/Downloads +``` +```sh +tar xf a10_gx_pac_ias_1_2_1_pv_rte.tar.gz +``` +```sh +cd a10_gx_pac_ias_1_2_1_pv_rte_installer +``` +```sh +./setup.sh +``` + +4. Select **Y** to install OPAE and accept the license. When asked, specify `/home//tools/intelrtestack` as the absolute install path. During the installation, there should be a message stating that the directory already exists, since it was created in step 2 above. Select **Y** to install to this directory. If this message does not appear, the install path was probably mistyped. + +5. Tools are installed to the following directories: + * OpenCL™ Run-time Environment: `~/tools/intelrtestack/opencl_rte/aclrte-linux64` + * Intel® Acceleration Stack for FPGAs: `~/tools/intelrtestack/a10_gx_pac_ias_1_2_1_pv` + +6. Check the version of the FPGA Interface Manager firmware on the PAC board: +```sh +sudo fpgainfo fme +``` + +7. If the reported FIM (`Pr Interface Id`) is not `38d782e3-b612-5343-b934-2433e348ac4c`, follow the instructions in Appendix A: Updating the FIM and BMC Firmware of the [Intel® Acceleration Stack for FPGAs Quick Start Guide](https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug-qs-ias-v1-2-1.pdf) to update the FIM and BMC. + +8. Run the built-in self-test to verify operation of the Acceleration Stack and Intel® PAC in a non-virtualized environment: +```sh +sudo sh -c "echo 20 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages" +``` +```sh +source ~/tools/intelrtestack/init_env.sh +``` +```sh +sudo fpgabist $OPAE_PLATFORM_ROOT/hw/samples/nlb_mode_3/bin/nlb_mode_3.gbs +``` + +## Verify the Intel® Acceleration Stack for FPGAs OpenCL™ BSP + +1. Remove any FCD files left over from previous hardware installations from the `/opt/Intel/OpenCL/Boards/` directory: +```sh +cd /opt/Intel/OpenCL/Boards +sudo rm -rf *.fcd +``` + +2. Install `lsb_release` on your system if you are using CentOS: +```sh +sudo yum install redhat-lsb-core +``` + +3. Create an initialization script `~/init_openvino.sh` with the following content, to be sourced whenever you open a new terminal or reboot. It sources the Acceleration Stack initialization script from above and sets up the OpenCL™ environment.
+```sh +source $HOME/tools/intelrtestack/init_env.sh +``` +```sh +export CL_CONTEXT_COMPILER_MODE_ALTERA=3 +``` +```sh +export CL_CONTEXT_COMPILER_MODE_INTELFPGA=3 +``` +```sh +export INTELFPGAOCLSDKROOT="/opt/altera/aocl-pro-rte/aclrte-linux64" +``` +```sh +export ALTERAOCLSDKROOT="$INTELFPGAOCLSDKROOT" +``` +```sh +export AOCL_BOARD_PACKAGE_ROOT="$OPAE_PLATFORM_ROOT/opencl/opencl_bsp" +``` +```sh +$AOCL_BOARD_PACKAGE_ROOT/linux64/libexec/setup_permissions.sh +``` +```sh +source $INTELFPGAOCLSDKROOT/init_opencl.sh +``` + +4. Source the script: +```sh +source ~/init_openvino.sh +``` + +5. Some of the settings made in the child scripts need a reboot to take effect. Reboot the machine and source the script again. Note that this script should be sourced each time a new terminal is opened for use with the Intel® Acceleration Stack for FPGAs and Intel® Distribution of OpenVINO™ toolkit. +```sh +source ~/init_openvino.sh +``` + +6. Install the OpenCL™ driver: +```sh +cd ~ +``` +```sh +sudo -E ./tools/intelrtestack/opencl_rte/aclrte-linux64/bin/aocl install +``` +Select **Y** when asked to install the BSP. Note that the following warning can be safely ignored. +```sh +WARNING: install not implemented. Please refer to DCP Quick Start User Guide. +``` + +7. Program the Intel® PAC board with a pre-compiled `.aocx` file (OpenCL™ based FPGA bitstream). +```sh +cd $OPAE_PLATFORM_ROOT/opencl +``` +```sh +aocl program acl0 hello_world.aocx +``` + +8. Build and run the Hello World application: +```sh +sudo tar xf exm_opencl_hello_world_x64_linux.tgz +``` +```sh +sudo chmod -R a+w hello_world +``` +```sh +cd hello_world +``` +```sh +make +``` +```sh +cp ../hello_world.aocx ./bin +``` +```sh +./bin/host +``` + +## Add Intel® Distribution of OpenVINO™ toolkit with FPGA Support to Environment Variables + +1. To run the Intel® Distribution of OpenVINO™ toolkit, add the last four commands to the `~/init_openvino.sh` script. The previous content is shown as well. +```sh +source $HOME/tools/intelrtestack/init_env.sh +export CL_CONTEXT_COMPILER_MODE_ALTERA=3 +export CL_CONTEXT_COMPILER_MODE_INTELFPGA=3 +export INTELFPGAOCLSDKROOT="/opt/altera/aocl-pro-rte/aclrte-linux64" +export ALTERAOCLSDKROOT="$INTELFPGAOCLSDKROOT" +export AOCL_BOARD_PACKAGE_ROOT="$OPAE_PLATFORM_ROOT/opencl/opencl_bsp" +$AOCL_BOARD_PACKAGE_ROOT/linux64/libexec/setup_permissions.sh +source $INTELFPGAOCLSDKROOT/init_opencl.sh +export IE_INSTALL="/opt/intel/openvino/deployment_tools" +source $IE_INSTALL/../bin/setupvars.sh +export PATH="$PATH:$HOME/inference_engine_samples_build/intel64/Release" +alias mo="python3.6 $IE_INSTALL/model_optimizer/mo.py" +``` +For Ubuntu systems, it is recommended to use python3.5 above instead of python3.6. + +2. Source the script +```sh +source ~/init_openvino.sh +``` + +## Program a Bitstream + +The bitstream you program should correspond to the topology you want to deploy. In this section, you program a SqueezeNet bitstream and deploy the classification sample with a SqueezeNet model. + +> **IMPORTANT**: Only use bitstreams from the installed version of the Intel® Distribution of OpenVINO™ toolkit. Bitstreams from older versions of the Intel® Distribution of OpenVINO™ toolkit are incompatible with later versions. For example, you cannot use the `2020-3_RC_FP16_AlexNet_GoogleNet_Generic` bitstream, when the Intel® Distribution of OpenVINO™ toolkit supports the `2020-4_RC_FP16_AlexNet_GoogleNet_Generic bitstream`. 
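+
+As an optional sanity check against the version mismatch described in the note above, you can list the bitstreams that shipped with your installation and confirm that their file names carry the current release prefix. This is only a minimal sketch; it assumes the default bitstream location referenced later in this section.
+```sh
+# List the installed PAC bitstreams and keep only those with the 2020-4 prefix.
+# The directory below assumes the default install path used in this guide.
+ls /opt/intel/openvino/bitstreams/a10_dcp_bitstreams/ | grep "^2020-4"
+```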
+ +There are different folders for each FPGA card type which were downloaded in the Intel® Distribution of OpenVINO™ toolkit package. +For the Intel® Programmable Acceleration Card with Intel® Arria® 10 FPGA GX, the pre-trained bitstreams are in the `/opt/intel/openvino/bitstreams/a10_dcp_bitstreams` directory. This example uses a SqueezeNet bitstream with low precision for the classification sample. + +Program the bitstream for Intel® Programmable Acceleration Card with Intel® Arria® 10 FPGA GX. +```sh +aocl program acl0 /opt/intel/openvino/bitstreams/a10_dcp_bitstreams/2020-4_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_YoloV3.aocx +``` + +## Use the Intel® Distribution of OpenVINO™ toolkit + +1. Run inference with the Intel® Distribution of OpenVINO™ toolkit independent of the demo scripts using the SqueezeNet model that was download by the scripts. For convenience, copy the necessary files to a local directory. If the workstation has been rebooted or a new terminal is opened, source the script above first. +```sh +mkdir ~/openvino_test +``` +```sh +cd ~/openvino_test +``` +```sh +cp ~/openvino_models/models/public/squeezenet1.1/squeezenet1.1.* . +``` +```sh +cp ~/openvino_models/ir/public/squeezenet1.1/FP16/squeezenet1.1.labels . +``` + +2. Note that the `squeezenet1.1.labels` file contains the classes used by ImageNet and is included here so that the inference results show text rather than classification numbers. Convert the model with the [Model Optimizer](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html). Note that the command below uses the alias defined in the script above and is not referred to in other documentation. +```sh +mo --input_model squeezenet1.1.caffemodel +``` + +3. Now run Inference on the CPU using one of the built in Inference Engine samples: +```sh +classification_sample_async -m squeezenet1.1.xml -i $IE_INSTALL/demo/car.png +``` + +4. Add the `-d` option to run on FPGA: +```sh +classification_sample_async -m squeezenet1.1.xml -i $IE_INSTALL/demo/car.png -d HETERO:FPGA,CPU +``` + +Congratulations, You are done with the Intel® Distribution of OpenVINO™ toolkit installation for FPGA. To learn more about how the Intel® Distribution of OpenVINO™ toolkit works, the Hello World tutorial and are other resources are provided below. + +## Hello World Face Detection Tutorial + +Use the [Intel® Distribution of OpenVINO™ toolkit with FPGA Hello World Face Detection Exercise](https://github.com/fritzboyle/openvino-with-fpga-hello-world-face-detection) to learn more about how the software and hardware work together. 
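+
+If you would like to double-check your FPGA setup by comparing the CPU and FPGA throughput of the classification sample you just ran, a small loop such as the sketch below runs both configurations back to back. This is a convenience sketch, not part of the toolkit; it assumes you are still in the `~/openvino_test` directory and have sourced `~/init_openvino.sh`.
+```sh
+# Run the same sample on the CPU and then on the FPGA (HETERO:FPGA,CPU)
+# so the reported throughput figures can be compared directly.
+for DEVICE in CPU HETERO:FPGA,CPU; do
+    echo "=== Running on $DEVICE ==="
+    classification_sample_async -m squeezenet1.1.xml -i $IE_INSTALL/demo/car.png -d $DEVICE
+done
+```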
+ +## Additional Resources + +Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) + +Intel® Distribution of OpenVINO™ toolkit documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) + +Inference Engine FPGA plugin documentation: [https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html) diff --git a/docs/install_guides/PAC_Configure_2018R5.md b/docs/install_guides/PAC_Configure_2018R5.md new file mode 100644 index 00000000000000..8177adb315d2b5 --- /dev/null +++ b/docs/install_guides/PAC_Configure_2018R5.md @@ -0,0 +1,251 @@ +# Configuration Guide for Intel® Distribution of OpenVINO™ toolkit 2018R5 and the Intel® Programmable Acceleration Card with Intel® Arria® 10 FPGA GX on CentOS* {#openvino_docs_install_guides_PAC_Configure_2018R5} + +## Get Started + +The following describes the set-up of the Intel® Distribution of OpenVINO™ toolkit on CentOS* 7.4. This is based upon a completely fresh install of CentOS 7.4 with developer tools included. This document was written for the Intel® Distribution of OpenVINO™ toolkit 2018 R5 release and may be largely applicable for later versions. Official Intel® documentation for the install process can be found in the following locations and it is highly recommended that these are read, especially for new users. This document serves as a guide, and in some cases, adds additional detail, specifically for an install with `sudo` privileges on CentOS 7.4. + +[Intel® Acceleration Stack for FPGAs Quick Start Guide](https://www.intel.com/content/dam/altera-www/global/en_US/pdfs/literature/ug/ug-qs-ias-v1-1.pdf) + +[OpenCL™ on Intel® PAC Quick Start Guide](https://www.intel.com/content/dam/altera-www/global/en_US/pdfs/literature/ug/ug-qs-ias-opencl-a10-v1-1.pdf) + +[Installing the Intel® Distribution of OpenVINO™ toolkit for Linux*](https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_linux.html) + +(Optional): Install NTFS support for transferring large installers if already downloaded on another machine. +```sh +sudo yum -y install epel-release +``` +```sh +sudo yum -y install ntfs-3g +``` + +## Install Intel® PAC and the Intel® Programmable Acceleration Card Stack + +1. Download version 1.1 of the Acceleration Stack for Runtime from the [Intel FPGA Acceleration Hub](https://www.altera.com/solutions/acceleration-hub/downloads.html). +This downloads as `a10_gx_pac_ias_1_1_pv_rte_installer.tar.gz`. Let it download to `~/Downloads`. + +2. Create a new directory to install to: +```sh +mkdir -p ~/tools/intelrtestack +``` + +3. Untar and launch the installer: +```sh +cd ~/Downloads +``` +```sh +tar xf a10_gx_pac_ias_1_1_pv_rte_installer.tar.gz +``` +```sh +cd a10_gx_pac_ias_1_1_pv_rte_installer +``` +```sh +sudo ./setup.sh +``` + +4. Select **Y** to install OPAE and accept license and when asked, specify `~/tools/intelrtestack` as the install path. During the installation there should be a message stating the directory already exists as it was created in the first command above. Select Y to install to this directory. If this message is not seen, it suggests that there was a typo when entering the install location. + +5. 
Tools are installed to the following directories: + * `Intel® Quartus® software Programmer: ~/tools/inteltrestack/intelFPGA_pro/qprogrammer` + * `OpenCL™ Run Time Environment: ~/tools/intelrtestack/intelFPGA_pro/aclrte-linux64` + * `Intel® Acceleration Stack for FPGAs: ~/tools/intelrtestack/a10_gx_pac_ias_1_1_pv` + +6. Install E10/E40 Software Patch +```sh +source ~/tools/intelrtestack/init_env.sh +``` +```sh +cd $OPAE_PLATFORM_ROOT/hw +``` +```sh +sudo wget https://www.intel.com/content/dam/altera-www/global/en_US/others/solutions/acceleration-hub/a10_gx_pac_ias_1_1_pv_eth.patch +``` +```sh +sudo patch -s -p0 < a10_gx_pac_ias_1_1_pv_eth.patch +``` + +7. Check the version of the FPGA Interface Manager firmware on the PAC board. +```sh +sudo fpgainfo fme +``` + +8. If the reported `Pr Interface Id` is not `9926ab6d-6c92-5a68-aabc-a7d84c545738` then follow the instructions in section 4 of the [Intel® Acceleration Stack for FPGAs Quick Start Guide](https://www.intel.com/content/dam/altera-www/global/en_US/pdfs/literature/ug/ug-qs-ias-v1-1.pdf) to update the FME. + +9. Run the built in self-test to verify operation of the Acceleration Stack and Intel® PAC in a non-virtualized environment. +```sh +sudo sh -c "echo 20 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages" +``` +```sh +sudo fpgabist $OPAE_PLATFORM_ROOT/hw/samples/nlb_mode_3/bin/nlb_mode_3.gbs +``` + +## Extract and Verify the Intel® Acceleration Stack for FPGAs OpenCL™ BSP + +1. Extract the BSP +```sh +cd $OPAE_PLATFORM_ROOT/opencl +``` +```sh +sudo tar xf opencl_bsp.tar.gz +``` + +2. Create an initialization script `~/init_openvino.sh` with the following content that can be run upon opening a new terminal or rebooting. This will source the script ran above as well as setting up the OpenCL™ environment. +```sh +source \$HOME/tools/intelrtestack/init_env.sh +``` +```sh +export CL_CONTEXT_COMPILER_MODE_ALTERA=3 +``` +```sh +export CL_CONTEXT_COMPILER_MODE_INTELFPGA=3 +``` +```sh +export INTELFPGAOCLSDKROOT="\$HOME/tools/intelrtestack/intelFPGA_pro/aclrte-linux64" +``` +```sh +export ALTERAOCLSDKROOT="\$INTELFPGAOCLSDKROOT" +``` +```sh +export AOCL_BOARD_PACKAGE_ROOT="\$OPAE_PLATFORM_ROOT/opencl/opencl_bsp" +``` +```sh +\$AOCL_BOARD_PACKAGE_ROOT/linux64/libexec/setup_permissions.sh``` +```sh +source $INTELFPGAOCLSDKROOT/init_opencl.sh +``` + +3. Source the script: +```sh +source ~/init_openvino.sh +``` + +4. Some of the settings made in the child scripts need a reboot to take effect. Reboot the machine and source the script again. Note that this script should be sourced each time a new terminal is opened for use with the Intel® Acceleration Stack for FPGAs and Intel® Distribution of OpenVINO™ toolkit. +```sh +source ~/init_openvino.sh +``` + +5. Install the OpenCL™ driver: +```sh +cd ~ +``` +```sh +sudo -E ./tools/intelrtestack/intelFPGA_pro/aclrte-linux64/bin/aocl install +``` +Select **Y** when asked to install the BSP. Note that the following warning can be safely ignored. +```sh +WARNING: install not implemented. Please refer to DCP Quick Start User Guide. +``` + +6. Program the Intel® PAC board with a pre-compiled `.aocx` file (OpenCL™ based FPGA bitstream). +```sh +cd \$OPAE_PLATFORM_ROOT/opencl +``` +```sh +aocl program acl0 hello_world.aocx +``` + +7. 
Build and run the Hello World application: +```sh +sudo tar xf exm_opencl_hello_world_x64_linux.tgz +``` +```sh +sudo chmod -R a+w hello_world +``` +```sh +cd hello_world +``` +```sh +make +``` +```sh +cp ../hello_world.aocx ./bin +``` +```sh +./bin/host +``` + +## Add Intel® Distribution of OpenVINO™ toolkit with FPGA Support to Environment Variables + +1. To run the Intel® Distribution of OpenVINO™ toolkit, add the last four commands to the `~/init_openvino.sh` script. The previous content is shown as well. +```sh +source \$HOME/tools/intelrtestack/init_env.sh +export CL_CONTEXT_COMPILER_MODE_ALTERA=3 +export CL_CONTEXT_COMPILER_MODE_INTELFPGA=3 +export INTELFPGAOCLSDKROOT="\$HOME/tools/intelrtestack/intelFPGA_pro/aclrte-linux64" +export ALTERAOCLSDKROOT="\$INTELFPGAOCLSDKROOT" +export AOCL_BOARD_PACKAGE_ROOT="\$OPAE_PLATFORM_ROOT/opencl/opencl_bsp" +\$AOCL_BOARD_PACKAGE_ROOT/linux64/libexec/setup_permissions.sh +source $INTELFPGAOCLSDKROOT/init_opencl.sh +export IE_INSTALL="/opt/intel/openvino/deployment_tools" +source \$IE_INSTALL/../bin/setupvars.sh +export PATH="\$PATH:\$HOME/inference_engine_samples/intel64/Release" +alias mo="python3.6 \$IE_INSTALL/model_optimizer/mo.py" +``` + +2. Source the script +```sh +source ~/init_openvino.sh +``` + +## Program a Bitstream + +The bitstream you program should correspond to the topology you want to deploy. In this section, you program a SqueezeNet bitstream and deploy the classification sample with a SqueezeNet model. + +> **IMPORTANT**: Only use bitstreams from the installed version of the Intel® Distribution of OpenVINO™ toolkit. Bitstreams from older versions of the Intel® Distribution of OpenVINO™ toolkit are incompatible with later versions. For example, you cannot use the `1-0-1_RC_FP16_Generic` bitstream, when the Intel® Distribution of OpenVINO™ toolkit supports the `2-0-1_RC_FP16_Generic bitstream`. + +There are different folders for each FPGA card type which were downloaded in the Intel® Distribution of OpenVINO™ toolkit package. +For the Intel® Programmable Acceleration Card with Intel® Arria® 10 FPGA GX, the pre-trained bitstreams are in the `/opt/intel/openvino/bitstreams/a10_dcp_bitstreams` directory. This example uses a SqueezeNet bitstream with low precision for the classification sample. + +Program the bitstream for Intel® Programmable Acceleration Card with Intel® Arria® 10 FPGA GX: +```sh +aocl program acl0 /opt/intel/openvino/bitstreams/a10_dcp_bitstreams/5-0_RC_FP11_SqueezeNet.aocx +``` + +## Use the Intel® Distribution of OpenVINO™ toolkit + +1. Run inference with the Intel® Distribution of OpenVINO™ toolkit independent of the demo scripts using the SqueezeNet model that was download by the scripts. For convenience, copy the necessary files to a local directory. If the workstation has been rebooted or a new terminal is opened, source the script above first. +```sh +mkdir ~/openvino_test +``` +```sh +cd ~/openvino_test +``` +```sh +cp ~/openvino_models/classification/squeezenet/1.1/caffe/squeezenet1.1.* . +``` +```sh +cp ~/openvino_models/ir/squeezenet1.1/squeezenet1.1.labels . +``` + +2. Note that the `squeezenet1.1.labels` file contains the classes used by ImageNet and is included here so that the inference results show text rather than classification numbers. Convert the model with the [Model Optimizer](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html). Note that the command below uses the alias defined in the script above and is not referred to in other documentation. 
+```sh +mo --input_model squeezenet1.1.caffemodel +``` + +3. Now run Inference on the CPU using one of the built in Inference Engine samples: +```sh +classification_sample_async -m squeezenet1.1.xml -i $IE_INSTALL/demo/car.png +``` + +4. Add the `-d` option to run on FPGA: +```sh +classification_sample_async -m squeezenet1.1.xml -i $IE_INSTALL/demo/car.png -d HETERO:FPGA,CPU +``` + +5. Increase the number of iterations with the `-ni` option to reduce the impact of initialization: +```sh +classification_sample_async -m squeezenet1.1.xml -i $IE_INSTALL/demo/car.png -d HETERO:FPGA,CPU -ni 100 +``` + +Congratulations, You are done with the Intel® Distribution of OpenVINO™ toolkit installation for FPGA. To learn more about how the Intel® Distribution of OpenVINO™ toolkit works, the Hello World tutorial and are other resources are provided below. + +## Hello World Face Detection Tutorial + +Use the [Intel® Distribution of OpenVINO™ toolkit with FPGA Hello World Face Detection Exercise](https://github.com/fritzboyle/openvino-with-fpga-hello-world-face-detection) to learn more about how the software and hardware work together. + +## Additional Resources + +Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) + +Intel® Distribution of OpenVINO™ toolkit documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) + +Inference Engine FPGA plugin documentation: [https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html) diff --git a/docs/install_guides/PAC_Configure_2019RX.md b/docs/install_guides/PAC_Configure_2019RX.md new file mode 100644 index 00000000000000..867215540e4881 --- /dev/null +++ b/docs/install_guides/PAC_Configure_2019RX.md @@ -0,0 +1,252 @@ +# Configuration Guide for Intel® Distribution of OpenVINO™ toolkit 2019R1/2019R2/2019R3/2020.1 and the Intel® Programmable Acceleration Card with Intel® Arria® 10 FPGA GX on CentOS or Ubuntu* {#openvino_docs_install_guides_PAC_Configure_2019RX} + +## Get Started + +The following describes the set-up of the Intel® Distribution of OpenVINO™ toolkit on CentOS* 7.4 or Ubuntu* 16.04, kernel 4.15. This is based upon a completely fresh install of the OS with developer tools included. This document was written for the Intel® Distribution of OpenVINO™ toolkit 2019 release 1, 2, and 3 and may be largely applicable for later versions. Official Intel® documentation for the install process can be found in the following locations and it is highly recommended that these are read, especially for new users. This document serves as a guide, and in some cases, adds additional detail where necessary. + +[Intel® Acceleration Stack for FPGAs Quick Start Guide](https://www.intel.com/content/dam/altera-www/global/en_US/pdfs/literature/ug/ug-qs-ias-v1-1.pdf) + +[OpenCL™ on Intel® PAC Quick Start Guide](https://www.intel.com/content/dam/altera-www/global/en_US/pdfs/literature/ug/ug-qs-ias-opencl-a10-v1-1.pdf) + +[Installing the Intel® Distribution of OpenVINO™ toolkit for Linux*](https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_linux.html) + +(Optional): Install NTFS support for transferring large installers if already downloaded on another machine. +```sh +sudo yum -y install epel-release +``` +```sh +sudo yum -y install ntfs-3g +``` + +## Install Intel® PAC and the Intel® Programmable Acceleration Card Stack + +1. 
Download version 1.2 of the Acceleration Stack for Runtime from the [Intel FPGA Acceleration Hub](https://www.altera.com/solutions/acceleration-hub/downloads.html). +This downloads as `a10_gx_pac_ias_1_2_pv_rte_installer.tar.gz`. Let it download to `~/Downloads`. + +2. Create a new directory to install to: +```sh +mkdir -p ~/tools/intelrtestack +``` + +3. Untar and launch the installer: +```sh +cd ~/Downloads +``` +```sh +tar xf a10_gx_pac_ias_1_2_pv_rte_installer.tar.gz +``` +```sh +cd a10_gx_pac_ias_1_2_pv_rte_installer +``` +```sh +./setup.sh +``` + +4. Select **Y** to install OPAE and accept license and when asked, specify `/home//tools/intelrtestack` as the absolute install path. During the installation there should be a message stating the directory already exists as it was created in the first command above. Select **Y** to install to this directory. If this message is not seen, it suggests that there was a typo when entering the install location. + +5. Tools are installed to the following directories: + * OpenCL™ Run-time Environment: `~/tools/intelrtestack/opencl_rte/aclrte-linux64` + * Intel® Acceleration Stack for FPGAs: `~/tools/intelrtestack/a10_gx_pac_ias_1_2_pv` + +7. Check the version of the FPGA Interface Manager firmware on the PAC board. +```sh +sudo fpgainfo fme +``` + +8. If the reported `Pr Interface Id` is not `69528db6-eb31-577a-8c36-68f9faa081f6` then follow the instructions in section 4 of the [Intel® Acceleration Stack for FPGAs Quick Start Guide](https://www.intel.com/content/dam/altera-www/global/en_US/pdfs/literature/ug/ug-qs-ias-v1-2.pdf) to update the FME. + +9. Run the built in self-test to verify operation of the Acceleration Stack and Intel® PAC in a non-virtualized environment. +```sh +sudo sh -c "echo 20 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages" +``` +```sh +source ~/tools/intelrtestack/init_env.sh +``` +```sh +sudo fpgabist $OPAE_PLATFORM_ROOT/hw/samples/nlb_mode_3/bin/nlb_mode_3.gbs +``` + +## Verify the Intel® Acceleration Stack for FPGAs OpenCL™ BSP + +1. Remove any previous FCD files that may be from previous installations of hardware in the `/opt/Intel/OpenCL/Boards/` directory: +```sh +cd /opt/Intel/OpenCL/Boards +sudo rm -rf *.fcd +``` + +2. Install `lsb_release` on your system if you are using CentOS: +```sh +sudo yum install redhat-lsb-core +``` + +3. Create an initialization script `~/init_openvino.sh` with the following content that can be run upon opening a new terminal or rebooting. This will source the script ran above as well as setting up the OpenCL™ environment. +```sh +source $HOME/tools/intelrtestack/init_env.sh +``` +```sh +export CL_CONTEXT_COMPILER_MODE_ALTERA=3 +``` +```sh +export CL_CONTEXT_COMPILER_MODE_INTELFPGA=3 +``` +```sh +export INTELFPGAOCLSDKROOT="/opt/altera/aocl-pro-rte/aclrte-linux64" +``` +```sh +export ALTERAOCLSDKROOT="$INTELFPGAOCLSDKROOT" +``` +```sh +export AOCL_BOARD_PACKAGE_ROOT="$OPAE_PLATFORM_ROOT/opencl/opencl_bsp" +``` +```sh +$AOCL_BOARD_PACKAGE_ROOT/linux64/libexec/setup_permissions.sh +``` +```sh +source $INTELFPGAOCLSDKROOT/init_opencl.sh +``` + +4. Source the script: +```sh +source ~/init_openvino.sh +``` + +5. Some of the settings made in the child scripts need a reboot to take effect. Reboot the machine and source the script again. Note that this script should be sourced each time a new terminal is opened for use with the Intel® Acceleration Stack for FPGAs and Intel® Distribution of OpenVINO™ toolkit. +```sh +source ~/init_openvino.sh +``` + +6. 
Install the OpenCL™ driver: +```sh +cd ~ +``` +```sh +sudo -E ./tools/intelrtestack/opencl_rte/aclrte-linux64/bin/aocl install +``` +Select **Y** when asked to install the BSP. Note that the following warning can be safely ignored. +```sh +WARNING: install not implemented. Please refer to DCP Quick Start User Guide. +``` + +7. Program the Intel® PAC board with a pre-compiled `.aocx` file (OpenCL™ based FPGA bitstream). +```sh +cd $OPAE_PLATFORM_ROOT/opencl +``` +```sh +aocl program acl0 hello_world.aocx +``` + +8. Build and run the Hello World application: +```sh +sudo tar xf exm_opencl_hello_world_x64_linux.tgz +``` +```sh +sudo chmod -R a+w hello_world +``` +```sh +cd hello_world +``` +```sh +make +``` +```sh +cp ../hello_world.aocx ./bin +``` +```sh +./bin/host +``` + +## Add Intel® Distribution of OpenVINO™ toolkit with FPGA Support to Environment Variables + +1. To run the Intel® Distribution of OpenVINO™ toolkit, add the last four commands to the `~/init_openvino.sh` script. The previous content is shown as well. +```sh +source $HOME/tools/intelrtestack/init_env.sh +export CL_CONTEXT_COMPILER_MODE_ALTERA=3 +export CL_CONTEXT_COMPILER_MODE_INTELFPGA=3 +export INTELFPGAOCLSDKROOT="/opt/altera/aocl-pro-rte/aclrte-linux64" +export ALTERAOCLSDKROOT="$INTELFPGAOCLSDKROOT" +export AOCL_BOARD_PACKAGE_ROOT="$OPAE_PLATFORM_ROOT/opencl/opencl_bsp" +$AOCL_BOARD_PACKAGE_ROOT/linux64/libexec/setup_permissions.sh +source $INTELFPGAOCLSDKROOT/init_opencl.sh +export IE_INSTALL="/opt/intel/openvino/deployment_tools" +source $IE_INSTALL/../bin/setupvars.sh +export PATH="$PATH:$HOME/inference_engine_samples_build/intel64/Release" +alias mo="python3.6 $IE_INSTALL/model_optimizer/mo.py" +``` +For Ubuntu systems, it is recommended to use python3.5 above instead of python3.6. + +2. Source the script +```sh +source ~/init_openvino.sh +``` + +## Program a Bitstream + +The bitstream you program should correspond to the topology you want to deploy. In this section, you program a SqueezeNet bitstream and deploy the classification sample with a SqueezeNet model. + +> **IMPORTANT**: Only use bitstreams from the installed version of the Intel® Distribution of OpenVINO™ toolkit. Bitstreams from older versions of the Intel® Distribution of OpenVINO™ toolkit are incompatible with later versions. For example, you cannot use the `1-0-1_RC_FP16_Generic` bitstream, when the Intel® Distribution of OpenVINO™ toolkit supports the `2-0-1_RC_FP16_Generic bitstream`. + +There are different folders for each FPGA card type which were downloaded in the Intel® Distribution of OpenVINO™ toolkit package. +For the Intel® Programmable Acceleration Card with Intel® Arria® 10 FPGA GX, the pre-trained bitstreams are in the `/opt/intel/openvino/bitstreams/a10_dcp_bitstreams` directory. This example uses a SqueezeNet bitstream with low precision for the classification sample. + +Program the bitstream for Intel® Programmable Acceleration Card with Intel® Arria® 10 FPGA GX. 
+For R1: +```sh +aocl program acl0 /opt/intel/openvino/bitstreams/a10_dcp_bitstreams/2019R1_RC_FP11_ResNet_SqueezeNet_VGG.aocx +``` +Or for R2: +```sh +aocl program acl0 /opt/intel/openvino/bitstreams/a10_dcp_bitstreams/2019R2_RC_FP11_ResNet_SqueezeNet_VGG.aocx +``` +Or for R3: +```sh +aocl program acl0 /opt/intel/openvino/bitstreams/a10_dcp_bitstreams/2019R3_PV_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_VGG.aocx +``` +Or for 2020.1: +```sh +aocl program acl0 /opt/intel/openvino/bitstreams/a10_dcp_bitstreams/2019R4_RC_FP11_ResNet_SqueezeNet_TinyYolo.aocx +``` + +## Use the Intel® Distribution of OpenVINO™ toolkit + +1. Run inference with the Intel® Distribution of OpenVINO™ toolkit independent of the demo scripts using the SqueezeNet model that was download by the scripts. For convenience, copy the necessary files to a local directory. If the workstation has been rebooted or a new terminal is opened, source the script above first. +```sh +mkdir ~/openvino_test +``` +```sh +cd ~/openvino_test +``` +```sh +cp ~/openvino_models/models/public/squeezenet1.1/squeezenet1.1.* . +``` +```sh +cp ~/openvino_models/ir/public/squeezenet1.1/FP16/squeezenet1.1.labels . +``` + +2. Note that the `squeezenet1.1.labels` file contains the classes used by ImageNet and is included here so that the inference results show text rather than classification numbers. Convert the model with the [Model Optimizer](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html). Note that the command below uses the alias defined in the script above and is not referred to in other documentation. +```sh +mo --input_model squeezenet1.1.caffemodel +``` + +3. Now run Inference on the CPU using one of the built in Inference Engine samples: +```sh +classification_sample_async -m squeezenet1.1.xml -i $IE_INSTALL/demo/car.png +``` + +4. Add the `-d` option to run on FPGA: +```sh +classification_sample_async -m squeezenet1.1.xml -i $IE_INSTALL/demo/car.png -d HETERO:FPGA,CPU +``` + +Congratulations, You are done with the Intel® Distribution of OpenVINO™ toolkit installation for FPGA. To learn more about how the Intel® Distribution of OpenVINO™ toolkit works, the Hello World tutorial and are other resources are provided below. + +## Hello World Face Detection Tutorial + +Use the [Intel® Distribution of OpenVINO™ toolkit with FPGA Hello World Face Detection Exercise](https://github.com/fritzboyle/openvino-with-fpga-hello-world-face-detection) to learn more about how the software and hardware work together. 
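+
+If you maintain machines on several of the releases covered by this guide, the release-specific `aocl program` commands shown in the Program a Bitstream section can be collected into a small helper script, as in the sketch below. It is only illustrative; the `RELEASE`, `BITSTREAM_DIR`, and `AOCX` variables are hypothetical conveniences, and the bitstream file names are the ones listed above.
+```sh
+# Hypothetical helper: pick the bitstream listed earlier for the installed
+# release, then program the board. RELEASE must be one of the values below.
+RELEASE=2019R3
+BITSTREAM_DIR=/opt/intel/openvino/bitstreams/a10_dcp_bitstreams
+case "$RELEASE" in
+    2019R1) AOCX=2019R1_RC_FP11_ResNet_SqueezeNet_VGG.aocx ;;
+    2019R2) AOCX=2019R2_RC_FP11_ResNet_SqueezeNet_VGG.aocx ;;
+    2019R3) AOCX=2019R3_PV_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_VGG.aocx ;;
+    2020.1) AOCX=2019R4_RC_FP11_ResNet_SqueezeNet_TinyYolo.aocx ;;
+esac
+aocl program acl0 "$BITSTREAM_DIR/$AOCX"
+```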
+ +## Additional Resources + +Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) + +Intel® Distribution of OpenVINO™ toolkit documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) + +Inference Engine FPGA plugin documentation: [https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html) diff --git a/docs/install_guides/VisionAcceleratorFPGA_Configure.md b/docs/install_guides/VisionAcceleratorFPGA_Configure.md new file mode 100644 index 00000000000000..4bc05bc40710c8 --- /dev/null +++ b/docs/install_guides/VisionAcceleratorFPGA_Configure.md @@ -0,0 +1,326 @@ +# Configuration Guide for the Intel® Distribution of OpenVINO™ toolkit 2020.4 and the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA SG2 (IEI's Mustang-F100-A10) on Linux* {#openvino_docs_install_guides_VisionAcceleratorFPGA_Configure} + +> **NOTE**: Intel® Arria® 10 FPGA (Mustang-F100-A10) Speed Grade 1 is not available since the OpenVINO 2020.3 release. If you use Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Mustang-F100-A10) Speed Grade 1, we recommend continuing to use the [Intel® Distribution of OpenVINO™ toolkit 2020.1](https://docs.openvinotoolkit.org/2020.1/_docs_install_guides_VisionAcceleratorFPGA_Configure.html) release. +For previous versions, see [Configuration Guide for OpenVINO 2019R3](https://docs.openvinotoolkit.org/2019_R3.1/_docs_install_guides_VisionAcceleratorFPGA_Configure_2019R3.html), [Configuration Guide for OpenVINO 2019R1](https://docs.openvinotoolkit.org/2019_R3.1/_docs_install_guides_VisionAcceleratorFPGA_Configure_2019R1.html), [Configuration Guide for OpenVINO 2018R5](https://docs.openvinotoolkit.org/2019_R3.1/_docs_install_guides_VisionAcceleratorFPGA_Configure_2018R5.html). + +## 1. Configure and Set Up the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA + +1. Download [Intel® Quartus® Prime Programmer and Tools Standard Edition 18.1](http://fpgasoftware.intel.com/18.1/?edition=standard&platform=linux&download_manager=direct#tabs-4). Install the Intel® Quartus® Prime Programmer and Tools Software to the `/home//intelFPGA/18.1` directory. + +2. Download the [fpga_install.sh](https://docs.openvinotoolkit.org/downloads/2020/2/fpga_install.sh) script to the `/home/` directory. + + a. Switch to superuser: +```sh +sudo su +``` + b. Use the `fpga_install.sh` script from `/home/` to install your FPGA card (default is SG2). +```sh +source /home//fpga_install.sh +``` + c. To know more about the fpga_install options, invoke the script with `-h` command. +```sh +source /home//fpga_install.sh -h +``` + d. Follow the `fpga_install.sh` script prompts to finish installing your FPGA card. + + e. After reboot launch the script again with same options as in step 2.b. + + f. The `fpga_install.sh` script creates an initialization script `/home//init_openvino.sh` that should be used to setup proper environment variables. + + g. To test if FPGA card was installed succesfully run `aocl diagnose`: +```sh +aocl diagnose +``` +You should see `DIAGNOSTIC_PASSED` before proceeding to the next steps. + + h. If you prefer to install the FPGA card manually, follow the steps 3-17 in this section and [Steps to Flash the FPGA Card](#steps-to-flash-the-fpga-card), otherwise you can skip to "Program a Bitstream". + +3. 
Check if /etc/udev/rules.d/51-usbblaster.rules file exists and content matches with 3.b, if it does skip to next step. + + a. Switch to superuser: +```sh +sudo su +``` + + b. Create a file named /etc/udev/rules.d/51-usbblaster.rules and add the following lines to it (Red Hat Enterprise 5 and above): +```sh +# Intel FPGA Download Cable +SUBSYSTEM=="usb", ATTR{idVendor}=="09fb", ATTR{idProduct}=="6001", MODE="0666" +SUBSYSTEM=="usb", ATTR{idVendor}=="09fb", ATTR{idProduct}=="6002", MODE="0666" +SUBSYSTEM=="usb", ATTR{idVendor}=="09fb", ATTR{idProduct}=="6003", MODE="0666" + +# Intel FPGA Download Cable II +SUBSYSTEM=="usb", ATTR{idVendor}=="09fb", ATTR{idProduct}=="6010", MODE="0666" +SUBSYSTEM=="usb", ATTR{idVendor}=="09fb", ATTR{idProduct}=="6810", MODE="0666" +``` +> **CAUTION**: Do not add extra line breaks to the .rules file. + + c. Reload udev rules without reboot: +```sh +udevadm control --reload-rules +udevadm trigger +``` + + d. You can exit superuser if you wish. + + +4. Unpack the BSP for your Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA SG2: +> **NOTE**: If you installed OpenVINO™ as root you will need to switch to superuser +```sh +cd /opt/intel/openvino/bitstreams/a10_vision_design_sg2_bitstreams/BSP/ +sudo su +tar -xvzf a10_1150_sg2_r4.1.tgz +chmod -R 755 /opt/intel/openvino/bitstreams/a10_vision_design_sg2_bitstreams +``` +> **NOTE**: If you do not know which version of the board you have, please refer to the product label on the fan cover side or by the product SKU: Mustang-F100-A10E-R10 => SG2 + +5. Create an initialization script `/home//init_openvino.sh` with the following content that can be run upon opening a new terminal or rebooting. This will setup your proper environment variables. +```sh +export IOCL_BOARD_PACKAGE_ROOT=/opt/intel/openvino/bitstreams/a10_vision_design_sg2_bitstreams/BSP/a10_1150_sg2 +export AOCL_BOARD_PACKAGE_ROOT=/opt/intel/openvino/bitstreams/a10_vision_design_sg2_bitstreams/BSP/a10_1150_sg2 +export QUARTUS_DIR=/home//intelFPGA/18.1/qprogrammer +export QUARTUS_ROOTDIR=/home//intelFPGA/18.1/qprogrammer +export INTELFPGAOCLSDKROOT=/opt/altera/aocl-pro-rte/aclrte-linux64 +source $INTELFPGAOCLSDKROOT/init_opencl.sh +export PATH=$PATH:$INTELFPGAOCLSDKROOT/host/linux64/bin:$QUARTUS_ROOTDIR/bin +export CL_CONTEXT_COMPILER_MODE_INTELFPGA=3 +source /opt/intel/openvino/bin/setupvars.sh +``` + +6. Source the script. (This assumes you already have installed the Intel® FPGA Runtime Environment for OpenCL Linux x86-64 Pro Edition 19.1) +```sh +source /home//init_openvino.sh +``` + +7. Uninstall any previous BSP before installing the OpenCL BSP for the 2020.4 BSP. Enter **Y** when prompted to uninstall (Enter sudo credentials when prompted): +```sh +aocl uninstall +``` + +8. Install the new BSP. Enter **Y** when prompted to install (Enter sudo credentials when prompted): +```sh +aocl install +``` + +9. Set up the USB Blaster: + + 1. Connect the cable between the board and the host system. Use the letter codes in the diagram below for the connection points: + + 2. Connect the B end of the cable to point B on the board. + + 3. Connect the F end of the cable to point F on the FPGA download cable. + + 4. From point F end of the cable to point F on the FPGA download cable, the connection is as shown: +![](../img/VisionAcceleratorJTAG.png) + +10. 
Run `jtagconfig` to ensure that your Intel FPGA Download Cable driver is ready to use: +```sh +jtagconfig +``` +Your output is similar to: +```sh +1) USB-Blaster [1-6] +02E660DD 10AX115H1(.|E2|ES)/10AX115H2/.. +``` +or: +```sh +1) USB-BlasterII [3-3] + 02E660DD 10AX115H1(.|E2|ES)/10AX115H2/.. +``` + +11. Use `jtagconfig` to slow the clock. The message "No parameter named JtagClock" can be safely ignored. +```sh +jtagconfig --setparam 1 JtagClock 6M +``` + +12. (OPTIONAL) Confirm the clock is set to 6M: +```sh +jtagconfig --getparam 1 JtagClock +``` +You should see the following: +```sh +6M +``` + +13. Go to `/opt/intel/openvino/bitstreams/a10_vision_design_sg2_bitstreams/BSP/a10_1150_sg2/bringup`, where `sg2_boardtest_2ddr_base.sof`is located: +```sh +cd /opt/intel/openvino/bitstreams/a10_vision_design_sg2_bitstreams/BSP/a10_1150_sg2/bringup +``` + +14. Program the new sof file to the board: +```sh +quartus_pgm -c 1 -m JTAG -o "p;sg2_boardtest_2ddr_base.sof" +``` + +15. Soft reboot: +```sh +reboot +``` + +16. Source the environment variable script you made. +```sh +source /home//init_openvino.sh +``` + +17. Run `aocl diagnose`: +```sh +aocl diagnose +``` +Your screen displays `DIAGNOSTIC_PASSED`. + +> **NOTE**: at this point if you do not want to flash the FPGA Card you can go to "Program a Bitstream" + +### Steps to Flash the FPGA Card + +> **NOTE**: +> - To avoid having to reprogram the board after a power down, a bitstream will be programmed to permanent memory on the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA. This will take about 20 minutes. +> - The steps can be followed below in this guide to do this. + +18. Use `jtagconfig` to slow the clock. The message "No parameter named JtagClock" can be safely ignored. +```sh +jtagconfig --setparam 1 JtagClock 6M +``` + +19. Check if $QUARTUS_ROOTDIR/linux64/perl/bin exists +```sh +ls $QUARTUS_ROOTDIR/linux64/perl/bin +``` + +20. If you see message "ls: cannot access /home//intelFPGA/18.1/qprogrammer/linux64/perl/bin: No such file or directory" create perl/bin directory and a symbolic link to perl +```sh +mkdir -p $QUARTUS_ROOTDIR/linux64/perl/bin +ln -s /usr/bin/perl $QUARTUS_ROOTDIR/linux64/perl/bin/perl +``` + +21. If you see message "perl" go to the next step + +22. Go to `/opt/intel/openvino/bitstreams/a10_vision_design_sg2_bitstreams/BSP/a10_1150_sg2/bringup`, where `sg2_boardtest_2ddr_top.aocx` is located: +```sh +cd /opt/intel/openvino/bitstreams/a10_vision_design_sg2_bitstreams/BSP/a10_1150_sg2/bringup +``` + +23. Program the `sg2_boardtest_2ddr_top.aocx` file to the flash to be made permanently available even after power cycle: +```sh +sudo su +aocl flash acl0 sg2_boardtest_2ddr_top.aocx +``` +> **NOTE**: You will need the USB Blaster for this. + +24. Hard reboot the host system including powering off. + +25. Source the environment variable script you made. +```sh +source /home//init_openvino.sh +``` + +26. Check if the host system recognizes the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA board. Confirm you can detect the PCIe card: +```sh +lspci | grep -i Altera +``` +Your output is similar to: +```sh +01:00.0 Processing accelerators: Altera Corporation Device 2494 (rev 01) +``` + +27. Run `aocl diagnose`: +```sh +aocl diagnose +``` +You should see `DIAGNOSTIC_PASSED` before proceeding to the next steps. + +## 2. Program a Bitstream + +The bitstream you program should correspond to the topology you want to deploy. 
In this section, you program a SqueezeNet bitstream and deploy the classification sample with a SqueezeNet model that you used the Model Optimizer to convert in the steps before. + +> **IMPORTANT**: Only use bitstreams from the installed version of the Intel® Distribution of OpenVINO™ toolkit. Bitstreams from older versions of the Intel® Distribution of OpenVINO™ toolkit are incompatible with later versions of the Intel® Distribution of OpenVINO™ toolkit. For example, you cannot use the `2019R4_PL2_FP11_AlexNet_GoogleNet_Generic` bitstream, when the Intel® Distribution of OpenVINO™ toolkit supports the `2020-2_PL2_FP11_AlexNet_GoogleNet_Generic` bitstream. + +Depending on how many bitstreams you selected, there are different folders for each FPGA card type which were downloaded in the Intel® Distribution of OpenVINO™ toolkit package: + +1. For the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA SG2, the pre-trained bitstreams are in `/opt/intel/openvino/bitstreams/a10_vision_design_sg2_bitstreams/`. This example uses a SqueezeNet bitstream with low precision for the classification sample. + +2. Source the environment variable script you made. +```sh +source /home//init_openvino.sh +``` + +3. Change to your home directory: +```sh +cd /home/ +``` + +4. Program the bitstream for the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA SG2: +```sh +aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_sg2_bitstreams/2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx +``` + +## 3. Set up a Sample Neural Network Model for FPGA + +> **NOTE**: The SqueezeNet Caffe* model was already downloaded and converted to an FP16 IR when you ran the Image Classification Verification Script while [installing the Intel® Distribution of OpenVINO™ toolkit for Linux* with FPGA Support](installing-openvino-linux-fpga.md). Read this section only if you want to convert the model manually, otherwise skip and go to the next section to run the Image Classification sample application. + +In this section, you will create an FP16 model suitable for hardware accelerators. For more information, see the [FPGA plugin](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html) section in the Inference Engine Developer Guide. + + +1. Create a directory for the FP16 SqueezeNet Model: +```sh +mkdir ~/squeezenet1.1_FP16 +``` + +2. Go to `~/squeezenet1.1_FP16`: +```sh +cd ~/squeezenet1.1_FP16 +``` + +3. Use the Model Optimizer to convert the FP16 SqueezeNet Caffe* model into an FP16 optimized Intermediate Representation (IR). The model files were downloaded when you ran the the Image Classification verification script while [installing the Intel® Distribution of OpenVINO™ toolkit for Linux* with FPGA Support](installing-openvino-linux-fpga.md). To convert, run the Model Optimizer script with the following arguments: +```sh +python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model ~/openvino_models/models/public/squeezenet1.1/squeezenet1.1.caffemodel --data_type FP16 --output_dir . +``` + +4. The `squeezenet1.1.labels` file contains the classes `ImageNet` uses. This file is included so that the inference results show text instead of classification numbers. Copy `squeezenet1.1.labels` to the your optimized model location: +```sh +cp /opt/intel/openvino/deployment_tools/demo/squeezenet1.1.labels . +``` + +5. Copy a sample image to the release directory. 
You will use this with your optimized model: +```sh +cp /opt/intel/openvino/deployment_tools/demo/car.png ~/inference_engine_samples_build/intel64/Release +``` + +## 4. Run the Image Classification Sample Application + +In this section you will run the Image Classification sample application, with the Caffe* Squeezenet1.1 model on your Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA. + +Image Classification sample application binary file was automatically built and the FP16 model IR files are created when you ran the Image Classification Verification Script while [installing the Intel® Distribution of OpenVINO™ toolkit for Windows* with FPGA Support](installing-openvino-windows-fpga.md): +* Compiled sample Application binaries are located in the `~/inference_engine_samples_build/intel64/Release` directory. +* Generated IR files are in the `~/openvino_models/ir/public/squeezenet1.1/FP16/` directory. + + +1. Go to the samples directory +```sh +cd ~/inference_engine_samples_build/intel64/Release +``` + +2. Use an Inference Engine sample to run a sample inference on the CPU: +```sh +./classification_sample_async -i car.png -m ~/openvino_models/ir/public/squeezenet1.1/FP16/squeezenet1.1.xml +``` +Note the CPU throughput in Frames Per Second (FPS). This tells you how quickly the inference is done on the hardware. Now run the inference using the FPGA. + +3. Add the `-d` option to target the FPGA: +```sh +./classification_sample_async -i car.png -m ~/openvino_models/ir/public/squeezenet1.1/FP16/squeezenet1.1.xml -d HETERO:FPGA,CPU +``` +The throughput on FPGA is listed and may show a lower FPS. This may be due to the initialization time. To account for that, increase the number of iterations or batch size when deploying to get a better sense of the speed the FPGA can run inference at. + +Congratulations, you are done with the Intel® Distribution of OpenVINO™ toolkit installation for FPGA. To learn more about how the Intel® Distribution of OpenVINO™ toolkit works, the Hello World tutorial and are other resources are provided below. + +## Hello World Face Detection Tutorial + +Use the [Intel® Distribution of OpenVINO™ toolkit with FPGA Hello World Face Detection Exercise](https://github.com/fritzboyle/openvino-with-fpga-hello-world-face-detection) to learn more about how the software and hardware work together. + +## Additional Resources + +Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) + +Intel® Distribution of OpenVINO™ toolkit documentation: [https://docs.openvinotoolkit.org/](https://docs.openvinotoolkit.org/) + +Inference Engine FPGA plugin documentation: [https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html) diff --git a/docs/install_guides/VisionAcceleratorFPGA_Configure_2018R5.md b/docs/install_guides/VisionAcceleratorFPGA_Configure_2018R5.md new file mode 100644 index 00000000000000..7f99be9a57ecfe --- /dev/null +++ b/docs/install_guides/VisionAcceleratorFPGA_Configure_2018R5.md @@ -0,0 +1,334 @@ +# Configuration Guide for the Intel® Distribution of OpenVINO™ toolkit 2018R5 and the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (IEI's Mustang-F100-A10) on Linux* {#openvino_docs_install_guides_VisionAcceleratorFPGA_Configure_2018R5} + +> **NOTES:** +> * For a first-time installation, use all steps. 
+> * Use steps 1 and 2 only after receiving a new FPGA card. +> * Repeat steps 2-5 when installing a new version of the Intel® Distribution of OpenVINO™ toolkit. +> * Use steps 3-5 when a Neural Network topology used by an Intel® Distribution of OpenVINO™ toolkit application changes. + +## 1. Configure and Install the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA + +1. Download `fpga_support_files.tgz` from the [Intel Registration Center](http://registrationcenter-download.intel.com/akdlm/irc_nas/12954/fpga_support_files.tgz). The files in this `.tgz` archive are required to ensure your FPGA card and the Intel® Distribution of OpenVINO™ toolkit work correctly. + +2. Go to the directory where you downloaded the `fpga_support_files.tgz` archive. + +3. Unpack the `.tgz` file: +```sh +tar -xvzf fpga_support_files.tgz +``` +A directory named `fpga_support_files` is created. + +4. Go to the `fpga_support_files` directory: +```sh +cd fpga_support_files +``` + +5. Source `setup_env.sh` to set your environment variables: +```sh +source /home//Downloads/fpga_support_files/setup_env.sh +``` + +6. Configure the FPGA Driver Blacklist: +```sh +sudo mv config/blacklist-altera-cvp.conf /etc/modprobe.d +``` + +7. Switch to superuser: +```sh +sudo su +``` + +8. Use the `setup_env.sh` script from `fpga_support_files.tgz` to set your environment variables: +```sh +source /home//Downloads/fpga_support_files/setup_env.sh +``` + +9. Change directory to `Downloads/fpga_support_files/`: +```sh +cd /home//Downloads/fpga_support_files/ +``` + +10. Run the FPGA dependencies script, which allows OpenCL to support Ubuntu* and recent kernels: +```sh +./install_openvino_fpga_dependencies.sh +``` + +11. When asked, select the FPGA card, Intel® GPU, and Intel® Movidius™ Neural Compute Stick, then you can install the correct dependencies. + +12. If you installed the 4.14 kernel as part of the installation script, you will need to reboot the machine and select the new kernel in the Ubuntu (grub) boot menu. You will also need to rerun `setup_env.sh` to set up your environmental variables again. + +13. Install OpenCL™ devices. Enter **Y** when prompted to install: +```sh +aocl install +``` + +14. Reboot the machine: +```sh +reboot +``` + +15. Use the `setup_env.sh` script from `fpga_support_files.tgz` to set your environment variables: +```sh +source /home//Downloads/fpga_support_files/setup_env.sh +``` + +16. Run `aocl diagnose`: +```sh +aocl diagnose +``` +Your screen displays `DIAGNOSTIC_PASSED`. + +## 2. Set Up the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA for 2018R5 + +For the 2018R5 release, the Intel® Distribution of OpenVINO™ toolkit introduced a new board support package (BSP) `a10_1150_sg1` for the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA, which is included into the `fpga_support_files.tgz` archive. To program the bitstreams for the Intel® Distribution of OpenVINO™ toolkit R5, you need to program the BSP into the board using the USB blaster. + +> **NOTE**: These steps apply only if you update to the Intel® Distribution of OpenVINO™ toolkit R5. Otherwise, you can skip them. + +1. Go to the `config` folder of the `fpga_support_files` directory where the `a10_1150_sg1` is located: +```sh +cd /home//Downloads/fpga_support_files/config/ +``` + +2. Copy the `a10_1150_sg1` folder to the `board` directory: +```sh +sudo cp -rf a10_1150_sg1 /opt/altera/aocl-pro-rte/aclrte-linux64/board/ +``` + +3. 
Convert the BSP files from DOS to UNIX: +```sh +sudo chmod +x a10_1150_sg1 +find a10_1150_sg1 -type f -print0 | xargs -0 dos2unix +``` + +4. Set up the USB Blaster: + + 1. Connect the cable between the board and the host system. Use the letter codes in the diagram below for the connection points: + + 2. Connect the B end of the cable to point B on the board. + + 3. Connect the F end of the cable to point F on the FPGA download cable. + + 4. From point F end of the cable to point F on the FPGA download cable, the connection is as shown: +![](../img/VisionAcceleratorJTAG.png) + +5. Source the `setup_env.sh` script from the `fpga_support_files` to set up the environment variables: +```sh +source /home//Downloads/fpga_support_files/setup_env.sh +``` + +6. Update the Intel® FPGA Download Cable rules to program the board without root permissions and to flash the initialization bitstreams so that the Intel® FPGA Download Cable can communicate with the board: +```sh +sudo cp config/51-usbblaster.rules /etc/udev/rules.d +``` + +7. Load the USB rules: +```sh +sudo udevadm control --reload-rules && udevadm trigger +``` + +8. Unplug and re-plug the Intel® FPGA Download Cable to enable JTAG connection. + +9. Run `jtagconfig` to ensure that your Intel FPGA Download Cable driver is ready to use: +```sh +jtagconfig +``` +Your output is similar to: +```sh +1) USB-Blaster [1-6] +02E660DD 10AX115H1(.|E2|ES)/10AX115H2/.. +``` + +10. Download [Intel® Quartus® Prime Software Lite Edition 17.1](http://fpgasoftware.intel.com/17.1/?edition=lite). Install the Intel® Quartus® Prime Software Lite to the `/home//intelFPGA/17.1` directory. +> **NOTE**: You will need the complete the Intel® Quartus® Prime Software Lite version when you want to program the `boardtest_1ddr_top.aocx` into the flash for permanent availability. + +11. Export the Intel® Quartus® Prime Software Lite environment variable: +```sh +export QUARTUS_ROOTDIR=/home//intelFPGA/17.1/quartus +``` + +12. Use `jtagconfig` to slow the clock: +```sh +jtagconfig --setparam 1 JtagClock 6M +``` + +13. (OPTIONAL) Confirm the clock is set to 6M: +```sh +jtagconfig --getparam 1 JtagClock +``` +You should see the following: +```sh +6M +``` + +14. Go to `/opt/altera/aocl-pro-rte/aclrte-linux64/board/a10_1150_sg1/bringup`, where `boardtest_1ddr_top.aocx `is located: +```sh +cd /opt/altera/aocl-pro-rte/aclrte-linux64/board/a10_1150_sg1/bringup +``` + +15. Program the `boardtest_1ddr_top.aocx` file to the flash to be made permanently available even after power cycle: +```sh +aocl flash acl0 boardtest_1ddr_top.aocx +``` +> **NOTE**: You will need the USB Blaster for this. + +16. Reboot the host system. + +17. Check if the host system recognizes the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA board. Confirm you can detect the PCIe card: +```sh +lspci | grep -i Altera +``` +Your output is similar to: +```sh +01:00.0 Processing accelerators: Altera Corporation Device 2494 (rev 01) +``` + +18. Source the `setup_env.sh` script from the `fpga_support_files` directory to setup the environment variables: +```sh +source /home//Downloads/fpga_support_file/setup_env.sh +``` + +19. Uninstall the previous BSP before installing the OpenCL drivers for the R5 BSP: +```sh +aocl uninstall /opt/altera/aocl-pro-rte/aclrte-linux64/board// +``` + +20. Export and source the environment script: +```sh +export AOCL_BOARD_PACKAGE_ROOT=/opt/altera/aocl-pro-rte/aclrte-linux64/board/a10_1150_sg1 +``` +```sh +source /opt/altera/aocl-pro-rte/aclrte-linux64/init_opencl.sh +``` + +21. 
Install OpenCL™ devices: +```sh +aocl install +``` + +22. Run the `diagnose` command: +```sh +aocl diagnose +``` +You should see `DIAGNOSTIC_PASSED` before proceeding to the next steps. + +## 3. Program a Bitstream + +The bitstream you program should correspond to the topology you want to deploy. In this section, you program a SqueezeNet bitstream and deploy the classification sample with a SqueezeNet model that you used the Model Optimizer to convert in the steps before. + +> **IMPORTANT**: Only use bitstreams from the installed version of the Intel® Distribution of OpenVINO™ toolkit. Bitstreams from older versions of the Intel® Distribution of OpenVINO™ toolkit are incompatible with later versions of the Intel® Distribution of OpenVINO™ toolkit. For example, you cannot use the `1-0-1_A10DK_FP16_Generic` bitstream, when the Intel® Distribution of OpenVINO™ toolkit supports the `2-0-1_A10DK_FP16_Generic` bitstream. + +Depending on how many bitstreams you selected, there are different folders for each FPGA card type which were downloaded in the Intel® Distribution of OpenVINO™ toolkit package: + +1. For the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA the pre-trained bistreams are in `/opt/intel/openvino/bitstreams/a10_vision_design_bitstreams`. This example uses a SqueezeNet bitstream with low precision for the classification sample. + +2. Rerun the environment setup script: +```sh +source /home//Downloads/fpga_support_files/setup_env.sh +``` + +3. Change to your home directory: +```sh +cd /home/ +``` + +4. Program the bitstream for the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA: +```sh +aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_bitstreams/5-0_PL1_FP11_SqueezeNet.aocx +``` + +### Optional Steps to Flash the FPGA Card + +> **NOTE**: +> - To avoid having to reprogram the board after a power down, a bitstream will be programmed to permanent memory on the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA. This will take about 20 minutes. +> - The following steps 1-5 need to be done only once for a new Intel® Arria 10 FPGA card. + +1. Plug in the micro USB cable to the card and your host system. + +2. Run `jtagconfig` to ensure that the cable is properly inserted: +```sh +jtagconfig +``` + +3. Use `jtagconfig` to slow the clock: +```sh +jtagconfig --setparam 1 JtagClock 6M +``` + +4. Store the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA bistream on the board: +```sh +aocl flash acl0 /opt/intel/openvino/bitstreams/a10_vision_design_bitstreams/5-0_PL1_FP11_SqueezeNet.aocx +``` +Your output is similar to: +```sh +USB-BlasterII [1-14] +02E660DD 10AX115H1(.|E2|ES)/10AX115H2/.. +020A40DD 5M(1270ZF324|2210Z)/EPM2210 +``` + +## 4. Setup a Neural Network Model for FPGA + +In this section, you will create an FP16 model suitable for hardware accelerators. For more information, see the [FPGA plugin](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html) section in the Inference Engine Developer Guide. + + +1. Create a directory for the FP16 SqueezeNet Model: +```sh +mkdir /home//squeezenet1.1_FP16 +``` + +2. Go to `/home//squeezenet1.1_FP16`: +```sh +cd /home//squeezenet1.1_FP16 +``` + +3. 
Use the Model Optimizer to convert an FP16 SqueezeNet Caffe* model into an optimized Intermediate Representation (IR): +```sh +python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model /home//openvino_models/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.caffemodel --data_type FP16 --output_dir . +``` + +4. The `squeezenet1.1.labels` file contains the classes `ImageNet` uses. This file is included so that the inference results show text instead of classification numbers. Copy `squeezenet1.1.labels` to the your optimized model location: +```sh +cp /home//openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.labels . +``` + +5. Copy a sample image to the release directory. You will use this with your optimized model: +```sh +sudo cp /opt/intel/openvino/deployment_tools/demo/car.png ~/inference_engine_samples/intel64/Release +``` + +## 5. Run a Sample Application + +1. Go to the samples directory +```sh +cd /home//inference_engine_samples/intel64/Release +``` + +2. Use an Inference Engine sample to run a sample application on the CPU: +```sh +./classification_sample_async -i car.png -m ~/openvino_models/ir/squeezenet1.1/FP32/squeezenet1.1.xml +``` +Note the CPU throughput in Frames Per Second (FPS). This tells you how quickly the inference is done on the hardware. Now run the inference using the FPGA. + +3. Add the `-d` option to target the FPGA: +```sh +./classification_sample_async -i car.png -m ~/squeezenet1.1_FP16/squeezenet1.1.xml -d HETERO:FPGA,CPU +``` +The throughput on FPGA is listed and may show a lower FPS. This is due to the initialization time. To account for that, the next step increases the iterations to get a better sense of the speed the FPGA can run inference at. + +4. Use `-ni` to increase the number of iterations, This option reduces the initialization impact: +```sh +./classification_sample_async -i car.png -m ~/squeezenet1.1_FP16/squeezenet1.1.xml -d HETERO:FPGA,CPU -ni 100 +``` + +Congratulations, you are done with the Intel® Distribution of OpenVINO™ toolkit installation for FPGA. To learn more about how the Intel® Distribution of OpenVINO™ toolkit works, the Hello World tutorial and are other resources are provided below. + +## Hello World Face Detection Tutorial + +Use the [Intel® Distribution of OpenVINO™ toolkit with FPGA Hello World Face Detection Exercise](https://github.com/fritzboyle/openvino-with-fpga-hello-world-face-detection) to learn more about how the software and hardware work together. 
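+
+If the FPGA samples stop working after a reboot, the usual causes are an environment that has not been re-sourced or a board that is no longer visible on the PCIe bus. The short pre-flight sketch below reuses the commands introduced earlier in this guide; `$USER` stands in for your user name, and the `setup_env.sh` path assumes the `fpga_support_files` location used above.
+```sh
+# Pre-flight check before re-running the FPGA samples after a reboot.
+source /home/$USER/Downloads/fpga_support_files/setup_env.sh
+lspci | grep -i Altera   # the Intel Arria 10 accelerator should be listed
+aocl diagnose            # expect DIAGNOSTIC_PASSED
+```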
+ +## Additional Resources + +Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) + +Intel® Distribution of OpenVINO™ toolkit documentation: [https://docs.openvinotoolkit.org/](https://docs.openvinotoolkit.org/) + +Inference Engine FPGA plugin documentation: [https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html) diff --git a/docs/install_guides/VisionAcceleratorFPGA_Configure_2019R1.md b/docs/install_guides/VisionAcceleratorFPGA_Configure_2019R1.md new file mode 100644 index 00000000000000..640f5387c38fa7 --- /dev/null +++ b/docs/install_guides/VisionAcceleratorFPGA_Configure_2019R1.md @@ -0,0 +1,285 @@ +# Configuration Guide for the Intel® Distribution of OpenVINO™ toolkit 2019R1 and the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (IEI's Mustang-F100-A10) on Linux* {#openvino_docs_install_guides_VisionAcceleratorFPGA_Configure_2019R1} + +> **NOTES:** +> * For a first-time installation, use all steps. +> * Use step 1 only after receiving a new FPGA card. +> * Repeat steps 2-4 when installing a new version of the Intel® Distribution of OpenVINO™ toolkit. +> * Use steps 3-4 when a Neural Network topology used by an Intel® Distribution of OpenVINO™ toolkit application changes. + +## 1. Configure and Set Up the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA + +For the 2019R1.x releases, the Intel® Distribution of OpenVINO™ toolkit introduced a new board support package (BSP) `a10_1150_sg1` for the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA, which is included in the `fpga_support_files.tgz` archive below. To program the bitstreams for the Intel® Distribution of OpenVINO™ toolkit 2019R1.x, you need to program the BSP into the board using the USB blaster. + +1. Download [Intel® Quartus® Prime Programmer and Tools Standard Edition 18.1](http://fpgasoftware.intel.com/18.1/?edition=standard&platform=linux&download_manager=direct#tabs-4). Install the Intel® Quartus® Prime Programmer and Tools Software to the `/home//intelFPGA/18.1` directory. + +2. Download `fpga_support_files.tgz` from the [Intel Registration Center](http://registrationcenter-download.intel.com/akdlm/irc_nas/12954/fpga_support_files.tgz) to the `~/Downloads` directory. The files in this `.tgz` archive are required to ensure your FPGA card and the Intel® Distribution of OpenVINO™ toolkit work correctly. + +3. Go to the directory where you downloaded the `fpga_support_files.tgz` archive. + +4. Unpack the `.tgz` file: +```sh +tar -xvzf fpga_support_files.tgz +``` +A directory named `fpga_support_files` is created. + +5. Go to the `fpga_support_files` directory: +```sh +cd fpga_support_files +``` + +6. Switch to superuser: +```sh +sudo su +``` + +7. Use the `setup_env.sh` script from `fpga_support_files.tgz` to set your environment variables: +```sh +source /home//Downloads/fpga_support_files/setup_env.sh +``` + +8. Uninstall any previous BSP before installing the OpenCL BSP for the 2019R1.x BSP: +```sh +aocl uninstall /opt/altera/aocl-pro-rte/aclrte-linux64/board// +``` + +9. Change directory to `Downloads/fpga_support_files/`: +```sh +cd /home//Downloads/fpga_support_files/ +``` + +10. Run the FPGA dependencies script, which allows OpenCL to support Ubuntu* and recent kernels: +```sh +./install_openvino_fpga_dependencies.sh +``` + +11. 
When asked, select the appropriate hardware accelerators you plan to use so it installs the correct dependencies. + +12. If you installed the 4.14 kernel as part of the installation script, you will need to reboot the machine and select the new kernel in the Ubuntu (grub) boot menu. You will also need to rerun `setup_env.sh` to set up your environmental variables again. + +13. Export the Intel® Quartus® Prime Programmer environment variable: +```sh +export QUARTUS_ROOTDIR=/home//intelFPGA/18.1/qprogrammer +``` + +14. Set up the USB Blaster: + + 1. Connect the cable between the board and the host system. Use the letter codes in the diagram below for the connection points: + + 2. Connect the B end of the cable to point B on the board. + + 3. Connect the F end of the cable to point F on the FPGA download cable. + + 4. From point F end of the cable to point F on the FPGA download cable, the connection is as shown: +![](../img/VisionAcceleratorJTAG.png) + +15. Run `jtagconfig` to ensure that your Intel FPGA Download Cable driver is ready to use: +```sh +jtagconfig +``` +Your output is similar to: +```sh +1) USB-Blaster [1-6] +02E660DD 10AX115H1(.|E2|ES)/10AX115H2/.. +``` + +16. Use `jtagconfig` to slow the clock. The message "No parameter named JtagClock" can be safely ignored. +```sh +jtagconfig --setparam 1 JtagClock 6M +``` + +17. (OPTIONAL) Confirm the clock is set to 6M: +```sh +jtagconfig --getparam 1 JtagClock +``` +You should see the following: +```sh +6M +``` + +18. Go to `/opt/altera/aocl-pro-rte/aclrte-linux64/board/a10_1150_sg1/bringup`, where `sg1_boardtest_2ddr_base.sof`is located: +```sh +cd /opt/altera/aocl-pro-rte/aclrte-linux64/board/a10_1150_sg1/bringup +``` + +19. Program the new sof file to the board: +```sh +quartus_pgm -c 1 -m JTAG -o "p;sg1_boardtest_2ddr_base.sof" +``` + +20. Soft reboot: +```sh +sudo reboot +``` + +21. Open up a new terminal and restore sudo access and the environment variables: +```sh +sudo su +source /home//Downloads/fpga_support_files/setup_env.sh +``` + +22. Install OpenCL™ devices. Enter **Y** when prompted to install: +```sh +aocl install +``` + +23. Reboot the machine: +```sh +reboot +``` + +24. Open up a new terminal and restore sudo access and the environment variables: +```sh +sudo su +source /home//Downloads/fpga_support_files/setup_env.sh +export QUARTUS_ROOTDIR=/home//intelFPGA/18.1/qprogrammer +``` + +25. Run `aocl diagnose`: +```sh +aocl diagnose +``` +Your screen displays `DIAGNOSTIC_PASSED`. + +26. Use `jtagconfig` to slow the clock. The message "No parameter named JtagClock" can be safely ignored. +```sh +jtagconfig --setparam 1 JtagClock 6M +``` + +27. Go to `/opt/intel/openvino/bitstreams/a10_vision_design_bitstreams/`, where `2019R1_PL1_FP11_ResNet_SqueezeNet_VGG.aocx `is located: +```sh +cd /opt/intel/openvino/bitstreams/a10_vision_design_bitstreams/ +``` + +28. Program the `2019R1_PL1_FP11_ResNet_SqueezeNet_VGG.aocx` file to the flash to be made permanently available even after power cycle: +```sh +aocl flash acl0 2019R1_PL1_FP11_ResNet_SqueezeNet_VGG.aocx +``` +> **NOTE**: You will need the USB Blaster for this. + +29. Hard reboot the host system including powering off. + +30. Now Soft reboot the host system to ensure the new PCIe device is seen properly +```sh +reboot +``` + +31. Open up a new terminal and restore sudo access and the environment variables: +```sh +sudo su +source /home//Downloads/fpga_support_files/setup_env.sh +``` + +32. 
Check if the host system recognizes the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA board. Confirm you can detect the PCIe card: +```sh +lspci | grep -i Altera +``` +Your output is similar to: +```sh +01:00.0 Processing accelerators: Altera Corporation Device 2494 (rev 01) +``` + +33. Run `aocl diagnose`: +```sh +aocl diagnose +``` +You should see `DIAGNOSTIC_PASSED` before proceeding to the next steps. + +## 2. Program a Bitstream + +The bitstream you program should correspond to the topology you want to deploy. In this section, you program a SqueezeNet bitstream and deploy the classification sample with a SqueezeNet model that you used the Model Optimizer to convert in the steps before. + +> **IMPORTANT**: Only use bitstreams from the installed version of the Intel® Distribution of OpenVINO™ toolkit. Bitstreams from older versions of the Intel® Distribution of OpenVINO™ toolkit are incompatible with later versions of the Intel® Distribution of OpenVINO™ toolkit. For example, you cannot use the `1-0-1_A10DK_FP16_Generic` bitstream, when the Intel® Distribution of OpenVINO™ toolkit supports the `2-0-1_A10DK_FP16_Generic` bitstream. + +Depending on how many bitstreams you selected, there are different folders for each FPGA card type which were downloaded in the Intel® Distribution of OpenVINO™ toolkit package: + +1. For the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA the pre-trained bistreams are in `/opt/intel/openvino/bitstreams/a10_vision_design_bitstreams`. This example uses a SqueezeNet bitstream with low precision for the classification sample. + +2. Rerun the environment setup script: +```sh +source /home//Downloads/fpga_support_files/setup_env.sh +``` + +3. Change to your home directory: +```sh +cd /home/ +``` + +4. Program the bitstream for the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA: +```sh +aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_bitstreams/2019R1_PL1_FP11_ResNet_SqueezeNet_VGG.aocx +``` + +### Steps to Flash the FPGA Card + +> **NOTE**: +> - To avoid having to reprogram the board after a power down, a bitstream will be programmed to permanent memory on the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA. This will take about 20 minutes. +> - The steps can be followed in the [Configure and Setup the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA](#1-configure-and-setup-the-intel-vision-accelerator-design-with-an-intel-arria-10-fpga) section of this guide from steps 14-18 and 28-36. + + +## 3. Setup a Neural Network Model for FPGA + +In this section, you will create an FP16 model suitable for hardware accelerators. For more information, see the [FPGA plugin](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html) section in the Inference Engine Developer Guide. + + +1. Create a directory for the FP16 SqueezeNet Model: +```sh +mkdir /home//squeezenet1.1_FP16 +``` + +2. Go to `/home//squeezenet1.1_FP16`: +```sh +cd /home//squeezenet1.1_FP16 +``` + +3. Use the Model Optimizer to convert the FP32 SqueezeNet Caffe* model into an FP16 optimized Intermediate Representation (IR). The model files were downloaded when you ran the the Image Classification verification script while [installing the Intel® Distribution of OpenVINO™ toolkit for Linux* with FPGA Support](installing-openvino-linux-fpga.md). 
To convert, run the Model Optimizer script with the following arguments:
+```sh
+python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model /home//openvino_models/models/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.caffemodel --data_type FP16 --output_dir .
+```
+
+4. The `squeezenet1.1.labels` file contains the classes `ImageNet` uses. This file is included so that the inference results show text instead of classification numbers. Copy `squeezenet1.1.labels` to your optimized model location:
+```sh
+cp /home//openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.labels .
+```
+
+5. Copy a sample image to the release directory. You will use this with your optimized model:
+```sh
+sudo cp /opt/intel/openvino/deployment_tools/demo/car.png ~/inference_engine_samples_build/intel64/Release
+```
+
+## 4. Run a Sample Application
+
+1. Go to the samples directory:
+```sh
+cd /home//inference_engine_samples_build/intel64/Release
+```
+
+2. Use an Inference Engine sample to run a sample application on the CPU:
+```sh
+./classification_sample_async -i car.png -m ~/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
+```
+Note the CPU throughput in Frames Per Second (FPS). This tells you how quickly the inference is done on the hardware. Now run the inference using the FPGA.
+
+3. Add the `-d` option to target the FPGA:
+```sh
+./classification_sample_async -i car.png -m ~/squeezenet1.1_FP16/squeezenet1.1.xml -d HETERO:FPGA,CPU
+```
+The throughput on FPGA is listed and may show a lower FPS. This is due to the initialization time. To account for it, the next step increases the number of iterations to get a better sense of the speed at which the FPGA can run inference.
+
+4. Use `-ni` to increase the number of iterations. This option reduces the impact of initialization:
+```sh
+./classification_sample_async -i car.png -m ~/squeezenet1.1_FP16/squeezenet1.1.xml -d HETERO:FPGA,CPU -ni 100
+```
+
+Congratulations, you are done with the Intel® Distribution of OpenVINO™ toolkit installation for FPGA. To learn more about how the Intel® Distribution of OpenVINO™ toolkit works, see the Hello World tutorial and the other resources provided below.
+
+## Hello World Face Detection Tutorial
+
+Use the [Intel® Distribution of OpenVINO™ toolkit with FPGA Hello World Face Detection Exercise](https://github.com/fritzboyle/openvino-with-fpga-hello-world-face-detection) to learn more about how the software and hardware work together.
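+
+Keep in mind that the environment variables and the `aocl program` step do not persist across terminal sessions or reboots unless the bitstream was flashed to permanent memory. The following is a minimal sketch, based only on commands already shown in this guide, for restoring the setup before running the sample again; the `/home//` paths are written as in the rest of this guide, so insert your user name:
+```sh
+# Restore superuser access and the FPGA environment in a new terminal:
+sudo su
+source /home//Downloads/fpga_support_files/setup_env.sh
+# Re-program the bitstream if it was not flashed to permanent memory:
+aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_bitstreams/2019R1_PL1_FP11_ResNet_SqueezeNet_VGG.aocx
+# Confirm the board is healthy (expect DIAGNOSTIC_PASSED):
+aocl diagnose
+```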
+ +## Additional Resources + +Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) + +Intel® Distribution of OpenVINO™ toolkit documentation: [https://docs.openvinotoolkit.org/](https://docs.openvinotoolkit.org/) + +Inference Engine FPGA plugin documentation: [https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html) diff --git a/docs/install_guides/VisionAcceleratorFPGA_Configure_2019R3.md b/docs/install_guides/VisionAcceleratorFPGA_Configure_2019R3.md new file mode 100644 index 00000000000000..369555f35f2f8a --- /dev/null +++ b/docs/install_guides/VisionAcceleratorFPGA_Configure_2019R3.md @@ -0,0 +1,285 @@ +# Configuration Guide for the Intel® Distribution of OpenVINO™ toolkit 2019R3 and the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA SG1 and SG2 (IEI's Mustang-F100-A10) on Linux* {#openvino_docs_install_guides_VisionAcceleratorFPGA_Configure_2019R3} + +## 1. Configure and Set Up the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA + +1. Download [Intel® Quartus® Prime Programmer and Tools Standard Edition 18.1](http://fpgasoftware.intel.com/18.1/?edition=standard&platform=linux&download_manager=direct#tabs-4). Install the Intel® Quartus® Prime Programmer and Tools Software to the `/home//intelFPGA/18.1` directory. + +2. Download `fpga_support_files.tgz` from the [Intel Registration Center](http://registrationcenter-download.intel.com/akdlm/irc_nas/12954/fpga_support_files.tgz) to the `~/Downloads` directory. The files in this `.tgz` archive are required to ensure your FPGA card and the Intel® Distribution of OpenVINO™ toolkit work correctly. + +3. Go to the directory where you downloaded the `fpga_support_files.tgz` archive. + +4. Unpack the `.tgz` file: +```sh +tar -xvzf fpga_support_files.tgz +``` +A directory named `fpga_support_files` is created. + +5. Switch to superuser: +```sh +sudo su +``` + +6. Change directory to `Downloads/fpga_support_files/`: +```sh +cd /home//Downloads/fpga_support_files/ +``` + +7. Copy the USB Blaster Rules file: +```sh +cp config/51-usbblaster.rules /etc/udev/rules.d +udevadm control --reload-rules +udevadm trigger +``` + +8. Copy aocl fixes for latest kernels: +```sh +cp fixes/Command.pm /opt/altera/aocl-pro-rte/aclrte-linux64/share/lib/perl/acl/ +cp config/blacklist-altera-cvp.conf /etc/modprobe.d/ +``` + +9. Copy flash files so we don't need a full Quartus installation: +```sh +cp -r config/aocl_flash/linux64/* /home//intelFPGA/18.1/qprogrammer/linux64 +``` + +10. Unpack the BSP for your appropriate Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA SG1 or SG2: +```sh +cd /opt/intel/openvino/bitstreams/a10_vision_design_sg<#>_bitstreams/BSP/ +tar -xvzf a10_1150_sg<#>_r3.tgz +chmod -R 755 /opt/intel/openvino/bitstreams/a10_vision_design_sg<#>_bitstreams +``` +> **NOTE**: If you do not know which version of the board you have, please refer to the product label on the fan cover side or by the product SKU: Mustang-F100-A10-R10 => SG1; Mustang-F100-A10E-R10 => SG2 + +11. Create an initialization script `/home//init_openvino.sh` with the following content that can be run upon opening a new terminal or rebooting. This will setup your proper environment variables. 
+```sh +export AOCL_BOARD_PACKAGE_ROOT=/opt/intel/openvino/bitstreams/a10_vision_design_sg<#>_bitstreams/BSP/a10_1150_sg<#> +export QUARTUS_ROOTDIR=/home//intelFPGA/18.1/qprogrammer +export PATH=$PATH:/opt/altera/aocl-pro-rte/aclrte-linux64/bin:/opt/altera/aocl-pro-rte/aclrte-linux64/host/linux64/bin:/home//intelFPGA/18.1/qprogrammer/bin +export INTELFPGAOCLSDKROOT=/opt/altera/aocl-pro-rte/aclrte-linux64 +export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$AOCL_BOARD_PACKAGE_ROOT/linux64/lib +export CL_CONTEXT_COMPILER_MODE_INTELFPGA=3 +source /opt/altera/aocl-pro-rte/aclrte-linux64/init_opencl.sh +source /opt/intel/openvino/bin/setupvars.sh +``` + +12. Source the script. +```sh +source /home//init_openvino.sh +``` + +13. Uninstall any previous BSP before installing the OpenCL BSP for the 2019R3 BSP: +```sh +aocl uninstall /opt/altera/aocl-pro-rte/aclrte-linux64/board// +``` + +14. Set up the USB Blaster: + + 1. Connect the cable between the board and the host system. Use the letter codes in the diagram below for the connection points: + + 2. Connect the B end of the cable to point B on the board. + + 3. Connect the F end of the cable to point F on the FPGA download cable. + + 4. From point F end of the cable to point F on the FPGA download cable, the connection is as shown: +![](../img/VisionAcceleratorJTAG.png) + +15. Run `jtagconfig` to ensure that your Intel FPGA Download Cable driver is ready to use: +```sh +jtagconfig +``` +Your output is similar to: +```sh +1) USB-Blaster [1-6] +02E660DD 10AX115H1(.|E2|ES)/10AX115H2/.. +``` + +16. Use `jtagconfig` to slow the clock. The message "No parameter named JtagClock" can be safely ignored. +```sh +jtagconfig --setparam 1 JtagClock 6M +``` + +17. (OPTIONAL) Confirm the clock is set to 6M: +```sh +jtagconfig --getparam 1 JtagClock +``` +You should see the following: +```sh +6M +``` + +18. Go to `/opt/intel/openvino/bitstreams/a10_vision_design_sg<#>_bitstreams/BSP/a10_1150_sg<#>/bringup`, where `sg<#>_boardtest_2ddr_base.sof`is located: +```sh +cd /opt/intel/openvino/bitstreams/a10_vision_design_sg<#>_bitstreams/BSP/a10_1150_sg<#>/bringup +``` + +19. Program the new sof file to the board: +```sh +quartus_pgm -c 1 -m JTAG -o "p;sg<#>_boardtest_2ddr_base.sof" +``` + +20. Soft reboot: +```sh +reboot +``` + +21. Source the environment variable script you made. +```sh +sudo su +source /home//init_openvino.sh +``` + +22. Install OpenCL™ devices. Enter **Y** when prompted to install: +```sh +aocl install +``` + +23. Reboot the machine: +```sh +reboot +``` + +24. Source the environment variable script you made. +```sh +sudo su +source /home//init_openvino.sh +``` + +25. Run `aocl diagnose`: +```sh +aocl diagnose +``` +Your screen displays `DIAGNOSTIC_PASSED`. + +26. Use `jtagconfig` to slow the clock. The message "No parameter named JtagClock" can be safely ignored. +```sh +jtagconfig --setparam 1 JtagClock 6M +``` + +27. Go to `/opt/intel/openvino/bitstreams/a10_vision_design_sg<#>_bitstreams/`, where `2019R3_PV_PL<#>_FP11_InceptionV1_SqueezeNet.aocx `is located: +```sh +cd /opt/intel/openvino/bitstreams/a10_vision_design_sg<#>_bitstreams/ +``` + +28. Program the `2019R3_PV_PL<#>_FP11_InceptionV1_SqueezeNet.aocx` file to the flash to be made permanently available even after power cycle: +```sh +aocl flash acl0 2019R3_PV_PL<#>_FP11_InceptionV1_SqueezeNet.aocx +``` +> **NOTE**: You will need the USB Blaster for this. + +29. Hard reboot the host system including powering off. + +30. Source the environment variable script you made. 
+```sh +sudo su +source /home//init_openvino.sh +``` + +31. Check if the host system recognizes the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA board. Confirm you can detect the PCIe card: +```sh +lspci | grep -i Altera +``` +Your output is similar to: +```sh +01:00.0 Processing accelerators: Altera Corporation Device 2494 (rev 01) +``` + +32. Run `aocl diagnose`: +```sh +aocl diagnose +``` +You should see `DIAGNOSTIC_PASSED` before proceeding to the next steps. + +## 2. Program a Bitstream + +The bitstream you program should correspond to the topology you want to deploy. In this section, you program a SqueezeNet bitstream and deploy the classification sample with a SqueezeNet model that you used the Model Optimizer to convert in the steps before. + +> **IMPORTANT**: Only use bitstreams from the installed version of the Intel® Distribution of OpenVINO™ toolkit. Bitstreams from older versions of the Intel® Distribution of OpenVINO™ toolkit are incompatible with later versions of the Intel® Distribution of OpenVINO™ toolkit. For example, you cannot use the `1-0-1_A10DK_FP16_Generic` bitstream, when the Intel® Distribution of OpenVINO™ toolkit supports the `2-0-1_A10DK_FP16_Generic` bitstream. + +Depending on how many bitstreams you selected, there are different folders for each FPGA card type which were downloaded in the Intel® Distribution of OpenVINO™ toolkit package: + +1. For the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA SG1 or SG2, the pre-trained bistreams are in `/opt/intel/openvino/bitstreams/a10_vision_design_sg<#>_bitstreams/`. This example uses a SqueezeNet bitstream with low precision for the classification sample. + +2. Source the environment variable script you made. +```sh +source /home//init_openvino.sh +``` + +3. Change to your home directory: +```sh +cd /home/ +``` + +4. Program the bitstream for the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA SG1 or SG2: +```sh +aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_sg<#>_bitstreams/2019R3_PV_PL<#>_FP11_InceptionV1_SqueezeNet.aocx +``` + +### Steps to Flash the FPGA Card + +> **NOTE**: +> - To avoid having to reprogram the board after a power down, a bitstream will be programmed to permanent memory on the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA. This will take about 20 minutes. +> - The steps can be followed above in this guide to do this. + + +## 3. Setup a Neural Network Model for FPGA + +In this section, you will create an FP16 model suitable for hardware accelerators. For more information, see the [FPGA plugin](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html) section in the Inference Engine Developer Guide. + + +1. Create a directory for the FP16 SqueezeNet Model: +```sh +mkdir ~/squeezenet1.1_FP16 +``` + +2. Go to `~/squeezenet1.1_FP16`: +```sh +cd ~/squeezenet1.1_FP16 +``` + +3. Use the Model Optimizer to convert the FP32 SqueezeNet Caffe* model into an FP16 optimized Intermediate Representation (IR). The model files were downloaded when you ran the the Image Classification verification script while [installing the Intel® Distribution of OpenVINO™ toolkit for Linux* with FPGA Support](installing-openvino-linux-fpga.md). To convert, run the Model Optimizer script with the following arguments: +```sh +python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo.py --input_model ~/openvino_models/models/FP16/public/squeezenet1.1/squeezenet1.1.caffemodel --data_type FP16 --output_dir . +``` + +4. 
The `squeezenet1.1.labels` file contains the classes `ImageNet` uses. This file is included so that the inference results show text instead of classification numbers. Copy `squeezenet1.1.labels` to the your optimized model location: +```sh +cp ~/openvino_models/ir/FP16/public/squeezenet1.1/squeezenet1.1.labels . +``` + +5. Copy a sample image to the release directory. You will use this with your optimized model: +```sh +cp /opt/intel/openvino/deployment_tools/demo/car.png ~/inference_engine_samples_build/intel64/Release +``` + +## 4. Run a Sample Application + +1. Go to the samples directory +```sh +cd ~/inference_engine_samples_build/intel64/Release +``` + +2. Use an Inference Engine sample to run a sample application on the CPU: +```sh +./classification_sample_async -i car.png -m ~/openvino_models/ir/FP16/public/squeezenet1.1/squeezenet1.1.xml +``` +Note the CPU throughput in Frames Per Second (FPS). This tells you how quickly the inference is done on the hardware. Now run the inference using the FPGA. + +3. Add the `-d` option to target the FPGA: +```sh +./classification_sample_async -i car.png -m ~/openvino_models/ir/FP16/public/squeezenet1.1/squeezenet1.1.xml -d HETERO:FPGA,CPU +``` +The throughput on FPGA is listed and may show a lower FPS. This may be due to the initialization time. To account for that, increase the number of iterations or batch size when deploying to get a better sense of the speed the FPGA can run inference at. + +Congratulations, you are done with the Intel® Distribution of OpenVINO™ toolkit installation for FPGA. To learn more about how the Intel® Distribution of OpenVINO™ toolkit works, the Hello World tutorial and are other resources are provided below. + +## Hello World Face Detection Tutorial + +Use the [Intel® Distribution of OpenVINO™ toolkit with FPGA Hello World Face Detection Exercise](https://github.com/fritzboyle/openvino-with-fpga-hello-world-face-detection) to learn more about how the software and hardware work together. + +## Additional Resources + +Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) + +Intel® Distribution of OpenVINO™ toolkit documentation: [https://docs.openvinotoolkit.org/](https://docs.openvinotoolkit.org/) + +Inference Engine FPGA plugin documentation: [https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html) diff --git a/docs/install_guides/VisionAcceleratorFPGA_Configure_Windows.md b/docs/install_guides/VisionAcceleratorFPGA_Configure_Windows.md new file mode 100644 index 00000000000000..0ee1065016050a --- /dev/null +++ b/docs/install_guides/VisionAcceleratorFPGA_Configure_Windows.md @@ -0,0 +1,115 @@ +# Configuration Guide for the Intel® Distribution of OpenVINO™ toolkit and the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA SG2 (IEI's Mustang-F100-A10) on Windows* {#openvino_docs_install_guides_VisionAcceleratorFPGA_Configure_Windows} + +> **NOTE**: Intel® Arria® 10 FPGA (Mustang-F100-A10) Speed Grade 1 is not available in the OpenVINO 2020.3 package. + +## 1. Configure and Set Up the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA + +1. Download [Intel® Quartus® Prime Programmer and Tools Standard Edition 18.1](http://fpgasoftware.intel.com/18.1/?edition=standard&platform=windows&download_manager=direct#tabs-4). 
Install the Intel® Quartus® Prime Programmer and Tools Software to the `C:\intelFPGA\18.1` directory. + +2. Download [OpenSSL](http://slproweb.com/download/Win64OpenSSL_Light-1_1_1f.exe). Install the OpenSSL and add the `\bin` path to your system `PATH` variable. + +3. Unpack the BSP for your Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA SG2: +Extract `Intel_vision_accel_win_driver_1.2_SG2.zip` from `C:\Program Files (x86)\IntelSWTools\openvino\a10_vision_design_sg2_bitstreams\BSP` to `C:\intelFPGA\19.2\aclrte-windows64\board` +5. Open an admin command prompt. +6. Setup your environment variables: +```sh +set INTELFPGAOCLSDKROOT=C:\intelFPGA\19.2\aclrte-windows64 +set AOCL_BOARD_PACKAGE_ROOT=%INTELFPGAOCLSDKROOT%\board\a10_1150_sg2 +set IOCL_BOARD_PACKAGE_ROOT=%INTELFPGAOCLSDKROOT%\board\a10_1150_sg2 +C:\intelFPGA\19.2\aclrte-windows64\init_opencl.bat +"C:\Program Files (x86)\IntelSWTools\openvino\bin\setupvars.bat" +``` +7. Uninstall any previous BSP before installing the OpenCL BSP for the 2020.3 BSP. Enter **Y** when prompted to uninstall: +```sh +aocl uninstall +``` +8. Install the new BSP. Enter **Y** when prompted to install +```sh +aocl install +``` +9. Run `aocl diagnose`: +```sh +aocl diagnose +``` +Your screen displays `DIAGNOSTIC_PASSED`. + +## 2. Program a Bitstream + +The bitstream you program should correspond to the topology you want to deploy. In this section, you program a SqueezeNet bitstream and deploy the classification sample with a SqueezeNet model that you used the Model Optimizer to convert in the steps before. + +> **IMPORTANT**: Only use bitstreams from the installed version of the Intel® Distribution of OpenVINO™ toolkit. Bitstreams from older versions of the Intel® Distribution of OpenVINO™ toolkit are incompatible with later versions of the Intel® Distribution of OpenVINO™ toolkit. For example, you cannot use the `2019R4_PL2_FP11_AlexNet_GoogleNet_Generic` bitstream, when the Intel® Distribution of OpenVINO™ toolkit supports the `2020-3_PL2_FP11_AlexNet_GoogleNet_Generic` bitstream. + +Depending on how many bitstreams you selected, there are different folders for each FPGA card type which were downloaded in the Intel® Distribution of OpenVINO™ toolkit package: + +1. For the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA SG2, the pre-trained bitstreams are in `C:\Program Files (x86)\IntelSWTools\openvino\a10_vision_design_sg2_bitstreams`. This example uses a SqueezeNet bitstream with low precision for the classification sample. + +2. Program the bitstream for the Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA SG2: +```sh +aocl program acl0 "C:\Program Files (x86)\IntelSWTools\openvino\a10_vision_design_sg2_bitstreams/2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx" +``` + +## 3. Set up a Sample Neural Network Model for FPGA + +> **NOTE**: The SqueezeNet Caffe* model was already downloaded and converted to an FP16 IR when you ran the Image Classification Verification Script while [installing the Intel® Distribution of OpenVINO™ toolkit for Windows* with FPGA Support](installing-openvino-windows-fpga.md). Read this section only if you want to convert the model manually, otherwise skip and go to the next section to run the Image Classification sample application. + +In this section, you will prepare a sample FP16 model suitable for hardware accelerators. For more information, see the [FPGA plugin](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html) section in the Inference Engine Developer Guide. 
+
+1. Create a directory for the FP16 SqueezeNet Model:
+```sh
+mkdir %HOMEPATH%\squeezenet1.1_FP16
+```
+
+2. Go to `%HOMEPATH%\squeezenet1.1_FP16`:
+```sh
+cd %HOMEPATH%\squeezenet1.1_FP16
+```
+
+3. Use the Model Optimizer to convert the SqueezeNet Caffe* model into an FP16 optimized Intermediate Representation (IR). The model files were downloaded when you ran the Image Classification verification script while [installing the Intel® Distribution of OpenVINO™ toolkit for Windows* with FPGA Support](installing-openvino-windows-fpga.md). To convert, run the Model Optimizer script with the following arguments:
+```sh
+python "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\mo.py" --input_model %HOMEPATH%\Documents\Intel\OpenVINO\openvino_models\models\public\squeezenet1.1\squeezenet1.1.caffemodel --data_type FP16 --output_dir .
+```
+
+4. The `squeezenet1.1.labels` file contains the classes `ImageNet` uses. This file is included so that the inference results show text instead of classification numbers. Copy `squeezenet1.1.labels` to your optimized model location:
+```sh
+xcopy "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo\squeezenet1.1.labels" .
+```
+
+5. Copy a sample image to the release directory. You will use this with your optimized model:
+```sh
+xcopy "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo\car.png" %HOMEPATH%\Documents\Intel\OpenVINO\inference_engine_samples_build\intel64\Release
+```
+
+## 4. Run the Image Classification Sample Application
+
+In this section, you will run the Image Classification sample application with the Caffe* SqueezeNet1.1 model on your Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA.
+
+The Image Classification sample application binary was built automatically and the FP16 model IR files were created when you ran the Image Classification Verification Script while [installing the Intel® Distribution of OpenVINO™ toolkit for Windows* with FPGA Support](installing-openvino-windows-fpga.md):
+* Compiled sample application binaries are located in the `%HOMEPATH%\Documents\Intel\OpenVINO\inference_engine_samples_build\intel64\Release` folder.
+* Generated IR files are in the `%HOMEPATH%\Documents\Intel\OpenVINO\openvino_models\ir\public\squeezenet1.1\FP16` folder.
+
+1. Go to the samples directory:
+```sh
+cd %HOMEPATH%\Documents\Intel\OpenVINO\inference_engine_samples_build\intel64\Release
+```
+
+2. Use an Inference Engine sample to run a sample inference on the CPU:
+```sh
+classification_sample_async -i car.png -m %HOMEPATH%\Documents\Intel\OpenVINO\openvino_models\ir\public\squeezenet1.1\FP16\squeezenet1.1.xml
+```
+Note the CPU throughput in Frames Per Second (FPS). This tells you how quickly the inference is done on the hardware. Now run the inference using the FPGA.
+
+3. Add the `-d` option to target the FPGA:
+```sh
+classification_sample_async -i car.png -m %HOMEPATH%\Documents\Intel\OpenVINO\openvino_models\ir\public\squeezenet1.1\FP16\squeezenet1.1.xml -d HETERO:FPGA,CPU
+```
+The throughput on FPGA is listed and may show a lower FPS. This may be due to the initialization time. To account for that, increase the number of iterations or the batch size when deploying to get a better sense of the speed at which the FPGA can run inference.
+
+Congratulations, you are done with the Intel® Distribution of OpenVINO™ toolkit installation for FPGA.
To learn more about how the Intel® Distribution of OpenVINO™ toolkit works, try the other resources that are provided below. + +## Additional Resources + +Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) + +Intel® Distribution of OpenVINO™ toolkit documentation: [https://docs.openvinotoolkit.org/](https://docs.openvinotoolkit.org/) + +Inference Engine FPGA plugin documentation: [https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_FPGA.html) diff --git a/docs/install_guides/deployment-manager-tool.md b/docs/install_guides/deployment-manager-tool.md new file mode 100644 index 00000000000000..30287641b6f3aa --- /dev/null +++ b/docs/install_guides/deployment-manager-tool.md @@ -0,0 +1,124 @@ +# OpenVINO™ Deployment Manager Guide {#openvino_docs_install_guides_deployment_manager_tool} + +The Deployment Manager of Intel® Distribution of OpenVINO™ creates a deployment package by assembling the model, IR files, your application, and associated dependencies into a runtime package for your target device. + +The Deployment Manager is a Python\* command-line tool that is delivered within the Intel® Distribution of OpenVINO™ toolkit for Linux\* and Windows\* release packages and available after installation in the `/deployment_tools/tools/deployment_manager` directory. + +## Pre-Requisites + +* Intel® Distribution of OpenVINO™ toolkit for Linux\* (version 2019 R3 or higher) or Intel® Distribution of OpenVINO™ toolkit for Windows\* (version 2019 R4 or higher) installed on your development machine. +* Python\* 3.6 or higher is required to run the Deployment Manager. +* To run inference on a target device other than CPU, device drivers must be pre-installed. To install, see the following steps: + * **For Linux**: + * [Steps for Intel® Processor Graphics (GPU)](https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_linux.html#additional-GPU-steps) + * [Steps for Intel® Neural Compute Stick 2](https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_linux.html#additional-NCS-steps) + * [Steps for Intel® Vision Accelerator Design with Intel® Movidius™ VPUs](https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_linux.html#install-VPU) + * **For Windows**: + * [Steps for Intel® Processor Graphics (GPU)](https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_windows.html#Install-GPU) + * [Steps for the Intel® Vision Accelerator Design with Intel® Movidius™ VPUs](https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_windows.html#hddl-myriad) + + +> **IMPORTANT**: The operating system on the target host must be the same as the development system on which you are creating the package. For example, if the target system is Ubuntu 16.04, the deployment package must be created from the OpenVINO™ toolkit installed on Ubuntu 16.04. + +## Create Deployment Package Using Deployment Manager + +There are two ways to create a deployment package that includes inference-related components of the OpenVINO™ toolkit:
+You can run the Deployment Manager tool in either Interactive or Standard CLI mode. + +### Run Interactive Mode +
+ Click to expand/collapse + +Interactive mode provides a user-friendly command-line interface that will guide you through the process with text prompts. + +1. To launch the Deployment Manager in the interactive mode, open a new terminal window, go to the Deployment Manager tool directory and run the tool script without parameters: + ```sh + /deployment_tools/tools/deployment_manager + ``` + ```sh + ./deployment_manager.py + ``` +2. The target device selection dialog is displayed: +![Deployment Manager selection dialog](../img/selection_dialog.png "Deployment Manager selection dialog") +Use the options provided on the screen to complete selection of the target devices and press **Enter** to proceed to the package generation dialog. if you want to interrupt the generation process and exit the program, type **q** and press **Enter**. +3. Once you accept the selection, the package generation dialog is displayed: +![Deployment Manager configuration dialog](../img/configuration_dialog.png "Deployment Manager configuration dialog") + 1. The target devices you have selected at the previous step appear on the screen. If you want to change the selection, type **b** and press **Enter** to go back to the previous screen. + + 2. Use the options provided to configure the generation process, or use the default settings. + + 3. Once all the parameters are set, type **g** and press **Enter** to generate the package for the selected target devices. If you want to interrupt the generation process and exit the program, type **q** and press **Enter**. + +The script successfully completes and the deployment package is generated in the output directory specified. +
+ +### Run Standard CLI Mode +
 Click to expand/collapse
+
+Alternatively, you can run the Deployment Manager tool in the standard CLI mode. In this mode, you specify the target devices and other parameters as command-line arguments of the Deployment Manager Python script. This mode facilitates integrating the tool into an automation pipeline.
+
+To launch the Deployment Manager tool in the standard mode, open a new terminal window, go to the Deployment Manager tool directory, and run the tool command with the following syntax:
+```sh
+./deployment_manager.py <--targets> [--output_dir] [--archive_name] [--user_data]
+```
+
+The following options are available:
+
+* `<--targets>` — (Mandatory) List of target devices to run inference. To specify more than one target, separate them with spaces. For example: `--targets cpu gpu vpu`. You can get a list of currently available targets by running the tool's help:
+  ```sh
+  ./deployment_manager.py -h
+  ```
+* `[--output_dir]` — (Optional) Path to the output directory. By default, it is set to your home directory.
+
+* `[--archive_name]` — (Optional) Deployment archive name without extension. By default, it is set to `openvino_deployment_package`.
+
+* `[--user_data]` — (Optional) Path to a directory with user data (IRs, models, datasets, etc.) required for inference. By default, it is set to `None`, which means that the user data is already present on the target host machine.
+
+When the script completes successfully, the deployment package is generated in the output directory you specified.
+
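+
+For example, a single non-interactive invocation that packages the CPU and GPU runtimes together with a directory of user data might look like the following sketch; the target names, output path, and archive name are illustrative values, not defaults required by the tool:
+```sh
+# Sketch: package CPU and GPU runtime components plus local models into one archive.
+./deployment_manager.py --targets cpu gpu --output_dir ~/deployment_packages --archive_name my_openvino_package --user_data ~/my_models
+```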
+ +## Deploy Package on Target Hosts + +After the Deployment Manager has successfully completed, you can find the generated `.tar.gz` (for Linux) or `.zip` (for Windows) package in the output directory you specified. + +To deploy the Inference Engine components from the development machine to the target host, perform the following steps: + +1. Transfer the generated archive to the target host using your preferred method. + +2. Unpack the archive into the destination directory on the target host (if your archive name is different from the default shown below, replace the `openvino_deployment_package` with the name you use). + * For Linux: + ```sh + tar xf openvino_deployment_package.tar.gz -C + ``` + * For Windows, use an archiver your prefer. + + The package is unpacked to the destination directory and the following subdirectories are created: + * `bin` — Snapshot of the `bin` directory from the OpenVINO installation directory. + * `deployment_tools/inference_engine` — Contains the Inference Engine binary files. + * `install_dependencies` — Snapshot of the `install_dependencies` directory from the OpenVINO installation directory. + * `` — The directory with the user data (IRs, datasets, etc.) you specified while configuring the package. +3. For Linux, to run inference on a target Intel® GPU, Intel® Movidius™ VPU, or Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, you need to install additional dependencies by running the `install_openvino_dependencies.sh` script: + ```sh + cd /openvino/install_dependencies + ``` + ```sh + sudo -E ./install_openvino_dependencies.sh + ``` +4. Set up the environment variables: + * For Linux: + ```sh + cd /openvino/ + ``` + ```sh + source ./bin/setupvars.sh + ``` + * For Windows: + ``` + cd \openvino\ + ``` + ``` + .\bin\setupvars.bat + ``` + +Congratulations, you have finished the deployment of the Inference Engine components to the target host. \ No newline at end of file diff --git a/docs/install_guides/installing-openvino-apt.md b/docs/install_guides/installing-openvino-apt.md new file mode 100644 index 00000000000000..67b4837ed3e154 --- /dev/null +++ b/docs/install_guides/installing-openvino-apt.md @@ -0,0 +1,134 @@ +# Install Intel® Distribution of OpenVINO™ toolkit for Linux* Using APT Repository {#openvino_docs_install_guides_installing_openvino_apt} + +This guide provides installation steps for Intel® Distribution of OpenVINO™ toolkit for Linux* distributed through the APT repository. + +> **IMPORTANT**: By downloading and using this container and the included software, you agree to the terms and conditions of the [software license agreements](https://software.intel.com/en-us/license/eula-for-intel-software-development-products). Please, review the content inside the `/licensing` folder for more details. + +> **NOTE**: Intel® Graphics Compute Runtime for OpenCL™ is not a part of OpenVINO™ APT distribution. You can install it from the [Intel® Graphics Compute Runtime for OpenCL™ GitHub repo](https://github.com/intel/compute-runtime). + +## Set up the Repository +### Install the GPG key for the repository + +1. Download the public key from [https://apt.repos.intel.com/openvino/2020/GPG-PUB-KEY-INTEL-OPENVINO-2020](https://apt.repos.intel.com/openvino/2020/GPG-PUB-KEY-INTEL-OPENVINO-2020) and save it to a file. +2. Add this key to the system keyring: +```sh +sudo apt-key add +``` +3. 
Check the list of APT keys running the following command: +```sh +sudo apt-key list +``` + +### Add the APT Repository + +Run the following command: +```sh +echo "deb https://apt.repos.intel.com/openvino/2020 all main" | sudo tee /etc/apt/sources.list.d/intel-openvino-2020.list +``` + +### Update the list of packages + +Run the `update` command: +```sh +sudo apt update +``` +There are full release Runtime and Developer packages, and also some available components. + +**Runtime Packages** +- Ubuntu 18.04: `intel-openvino-runtime-ubuntu18` +- Ubuntu 16.04: `intel-openvino-runtime-ubuntu16` + +**Developer Packages** +- Ubuntu 18.04: `intel-openvino-dev-ubuntu18` +- Ubuntu 16.04: `intel-openvino-dev-ubuntu16` + +### Get the list of available packages + +Run the `apt-cache` command to see a list of all available OpenVINO packages and components: +```sh +apt-cache search openvino +``` + +#### Examples + +* **Runtime Packages** + + On Ubuntu 18.04: + ```sh + sudo apt-cache search intel-openvino-runtime-ubuntu18 + ``` + On Ubuntu 16.04: + ```sh + sudo apt-cache search intel-openvino-runtime-ubuntu16 + ``` +* **Developer Packages** + + On Ubuntu 18.04: + ```sh + sudo apt-cache search intel-openvino-dev-ubuntu18 + ``` + On Ubuntu 16.04: + ```sh + sudo apt-cache search intel-openvino-dev-ubuntu16 + ``` + + +## Install the runtime or developer packages using the APT Package Manager +Intel® OpenVINO will be installed in: `/opt/intel/openvino_..` + +A symlink will be created: `/opt/intel/openvino` + +--- +### To Install a specific version + +To get a list of OpenVINO packages available for installation: + +```sh +sudo apt-cache search intel-openvino-runtime-ubuntu18 +``` + +To install a specific version of an OpenVINO package: +```sh +sudo apt install intel-openvino--ubuntu-.. +``` + +#### Examples +* **Runtime Package** + + On Ubuntu 18.04: + ```sh + sudo apt install intel-openvino-runtime-ubuntu18-2020.1.023 + ``` + On Ubuntu 16.04: + ```sh + sudo apt install intel-openvino-runtime-ubuntu16-2020.1.023 + ``` +* **Developer Package**
+ On Ubuntu 18.04: + ```sh + sudo apt install intel-openvino-dev-ubuntu18-2020.1.023 + ``` + On Ubuntu 16.04: + ```sh + sudo apt install intel-openvino-dev-ubuntu16-2020.1.023 + ``` + +--- +### To Uninstall a specific version + +To uninstall a specific full runtime package: +```sh +sudo apt autoremove intel-openvino--ubuntu-.. +``` + + +**Additional Resources** + +- Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) +- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) +- [Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html). +- [Inference Engine Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide.html) +- For more information on Sample Applications, see the [Inference Engine Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html). +- For information on Inference Engine Tutorials, see the [Inference Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic). +- For IoT Libraries & Code Samples see the [Intel® IoT Developer Kit](https://github.com/intel-iot-devkit). + diff --git a/docs/install_guides/installing-openvino-conda.md b/docs/install_guides/installing-openvino-conda.md new file mode 100644 index 00000000000000..5f1dc3a222f146 --- /dev/null +++ b/docs/install_guides/installing-openvino-conda.md @@ -0,0 +1,59 @@ +# Install Intel® Distribution of OpenVINO™ toolkit from Anaconda* Cloud {#openvino_docs_install_guides_installing_openvino_conda} + +This guide provides installation steps for Intel® Distribution of OpenVINO™ toolkit distributed through the Anaconda* Cloud. + + +## System Requirements + + - [Anaconda* distribution](https://www.anaconda.com/products/individual/) + +**Operating Systems** + +- Ubuntu* 18.04 long-term support (LTS), 64-bit +- macOS* 10.14.x versions. +- Windows 10*, 64-bit Pro, Enterprise or Education (1607 Anniversary Update, Build 14393 or higher) editions +- Windows Server* 2016 or higher + + + +## Install the runtime package using the Anaconda* Package Manager + +1. Set up the Anaconda* environment.  + +2. Updated conda to the latest version: + ```sh + conda update --all + ``` +3. Install the Intel® Distribution of OpenVINO™ Toolkit: + - Ubuntu* 18.04 + ```sh + conda install openvino-ie4py-ubuntu18 -c intel + ``` + - Windows* 10 and macOS* + ```sh + conda install openvino-ie4py -c intel + ``` +4. Verify the package installed: + ```sh + python -c "import openvino" + ``` + +Now you can start to develop and run your application. + + +## Known Issues and Limitations + +- You cannot use Python bindings included in Intel® Distribution of OpenVINO™ toolkit with [Anaconda* distribution](https://www.anaconda.com/products/individual/) +- You cannot use Python OpenVINO™ bindings included in Anaconda* package with official [Python distribution](https://https://www.python.org/). + + +## Additional Resources + +- Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) +- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) +- [Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html). 
+- [Inference Engine Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide.html) +- For more information on Sample Applications, see the [Inference Engine Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html). +- For information on Inference Engine Tutorials, see the [Inference Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic). +- Intel® Distribution of OpenVINO™ toolkit Anaconda* home page: [https://anaconda.org/intel/openvino-ie4py](https://anaconda.org/intel/openvino-ie4py) + diff --git a/docs/install_guides/installing-openvino-docker-linux.md b/docs/install_guides/installing-openvino-docker-linux.md new file mode 100644 index 00000000000000..46ed7dc29a871c --- /dev/null +++ b/docs/install_guides/installing-openvino-docker-linux.md @@ -0,0 +1,339 @@ +# Install Intel® Distribution of OpenVINO™ toolkit for Linux* from a Docker* Image {#openvino_docs_install_guides_installing_openvino_docker_linux} + +The Intel® Distribution of OpenVINO™ toolkit quickly deploys applications and solutions that emulate human vision. Based on Convolutional Neural Networks (CNN), the toolkit extends computer vision (CV) workloads across Intel® hardware, maximizing performance. The Intel® Distribution of OpenVINO™ toolkit includes the Intel® Deep Learning Deployment Toolkit. + +This guide provides the steps for creating a Docker* image with Intel® Distribution of OpenVINO™ toolkit for Linux* and further installation. + +## System Requirements + +**Target Operating Systems** + +- Ubuntu\* 18.04 long-term support (LTS), 64-bit + +**Host Operating Systems** + +- Linux with installed GPU driver and with Linux kernel supported by GPU driver + +## Use Docker* Image for CPU + +- Kernel reports the same information for all containers as for native application, for example, CPU, memory information. +- All instructions that are available to host process available for process in container, including, for example, AVX2, AVX512. No restrictions. +- Docker* does not use virtualization or emulation. The process in Docker* is just a regular Linux process, but it is isolated from external world on kernel level. Performance penalty is small. + +### Build a Docker* Image for CPU + +To build a Docker image, create a `Dockerfile` that contains defined variables and commands required to create an OpenVINO toolkit installation image. + +Create your `Dockerfile` using the following example as a template: + +
+ Click to expand/collapse + +```sh +FROM ubuntu:18.04 + +USER root +WORKDIR / + +SHELL ["/bin/bash", "-xo", "pipefail", "-c"] + +# Creating user openvino +RUN useradd -ms /bin/bash openvino && \ + chown openvino -R /home/openvino + +ARG DEPENDENCIES="autoconf \ + automake \ + build-essential \ + cmake \ + cpio \ + curl \ + gnupg2 \ + libdrm2 \ + libglib2.0-0 \ + lsb-release \ + libgtk-3-0 \ + libtool \ + udev \ + unzip \ + dos2unix" + +RUN apt-get update && \ + apt-get install -y --no-install-recommends ${DEPENDENCIES} && \ + rm -rf /var/lib/apt/lists/* + +WORKDIR /thirdparty +RUN sed -Ei 's/# deb-src /deb-src /' /etc/apt/sources.list && \ + apt-get update && \ + apt-get source ${DEPENDENCIES} && \ + rm -rf /var/lib/apt/lists/* + +# setup Python +ENV PYTHON python3.6 + +RUN apt-get update && \ + apt-get install -y --no-install-recommends python3-pip python3-dev lib${PYTHON}=3.6.9-1~18.04 && \ + rm -rf /var/lib/apt/lists/* + +ARG package_url=http://registrationcenter-download.intel.com/akdlm/irc_nas/16612/l_openvino_toolkit_p_0000.0.000.tgz +ARG TEMP_DIR=/tmp/openvino_installer + +WORKDIR ${TEMP_DIR} +ADD ${package_url} ${TEMP_DIR} + +# install product by installation script +ENV INTEL_OPENVINO_DIR /opt/intel/openvino + +RUN tar -xzf ${TEMP_DIR}/*.tgz --strip 1 +RUN sed -i 's/decline/accept/g' silent.cfg && \ + ${TEMP_DIR}/install.sh -s silent.cfg && \ + ${INTEL_OPENVINO_DIR}/install_dependencies/install_openvino_dependencies.sh + +WORKDIR /tmp +RUN rm -rf ${TEMP_DIR} + +# installing dependencies for package +WORKDIR /tmp + +RUN ${PYTHON} -m pip install --no-cache-dir setuptools && \ + find "${INTEL_OPENVINO_DIR}/" -type f -name "*requirements*.*" -path "*/${PYTHON}/*" -exec ${PYTHON} -m pip install --no-cache-dir -r "{}" \; && \ + find "${INTEL_OPENVINO_DIR}/" -type f -name "*requirements*.*" -not -path "*/post_training_optimization_toolkit/*" -not -name "*windows.txt" -not -name "*ubuntu16.txt" -not -path "*/python3*/*" -not -path "*/python2*/*" -exec ${PYTHON} -m pip install --no-cache-dir -r "{}" \; + +WORKDIR ${INTEL_OPENVINO_DIR}/deployment_tools/open_model_zoo/tools/accuracy_checker +RUN source ${INTEL_OPENVINO_DIR}/bin/setupvars.sh && \ + ${PYTHON} -m pip install --no-cache-dir -r ${INTEL_OPENVINO_DIR}/deployment_tools/open_model_zoo/tools/accuracy_checker/requirements.in && \ + ${PYTHON} ${INTEL_OPENVINO_DIR}/deployment_tools/open_model_zoo/tools/accuracy_checker/setup.py install + +WORKDIR ${INTEL_OPENVINO_DIR}/deployment_tools/tools/post_training_optimization_toolkit +RUN if [ -f requirements.txt ]; then \ + ${PYTHON} -m pip install --no-cache-dir -r ${INTEL_OPENVINO_DIR}/deployment_tools/tools/post_training_optimization_toolkit/requirements.txt && \ + ${PYTHON} ${INTEL_OPENVINO_DIR}/deployment_tools/tools/post_training_optimization_toolkit/setup.py install; \ + fi; + +# Post-installation cleanup and setting up OpenVINO environment variables +RUN if [ -f "${INTEL_OPENVINO_DIR}"/bin/setupvars.sh ]; then \ + printf "\nsource \${INTEL_OPENVINO_DIR}/bin/setupvars.sh\n" >> /home/openvino/.bashrc; \ + printf "\nsource \${INTEL_OPENVINO_DIR}/bin/setupvars.sh\n" >> /root/.bashrc; \ + fi; +RUN find "${INTEL_OPENVINO_DIR}/" -name "*.*sh" -type f -exec dos2unix {} \; + +USER openvino +WORKDIR ${INTEL_OPENVINO_DIR} + +CMD ["/bin/bash"] +``` + +
+ +> **NOTE**: Please replace direct link to the Intel® Distribution of OpenVINO™ toolkit package to the latest version in the `package_url` argument. You can copy the link from the [Intel® Distribution of OpenVINO™ toolkit download page](https://software.seek.intel.com/openvino-toolkit) after registration. Right click on **Offline Installer** button on the download page for Linux in your browser and press **Copy link address**. + +You can select which OpenVINO components will be installed by modifying `COMPONENTS` parameter in the `silent.cfg` file. For example to install only CPU runtime for the Inference Engine, set +`COMPONENTS=intel-openvino-ie-rt-cpu__x86_64` in `silent.cfg`. + +To get a full list of available components for installation, run the `./install.sh --list_components` command from the unpacked OpenVINO™ toolkit package. + +To build a Docker* image for CPU, run the following command: +```sh +docker build . -t \ +--build-arg HTTP_PROXY= \ +--build-arg HTTPS_PROXY= +``` + +### Run the Docker* Image for CPU + +Run the image with the following command: +```sh +docker run -it +``` +## Use a Docker* Image for GPU +### Build a Docker* Image for GPU + +**Prerequisites:** +- GPU is not available in container by default, you must attach it to the container. +- Kernel driver must be installed on the host. +- Intel® OpenCL™ runtime package must be included into the container. +- In the container, user must be in the `video` group. + +Before building a Docker* image on GPU, add the following commands to the `Dockerfile` example for CPU above: + +```sh +WORKDIR /tmp/opencl +RUN usermod -aG video openvino +RUN apt-get update && \ + apt-get install -y --no-install-recommends ocl-icd-libopencl1 && \ + rm -rf /var/lib/apt/lists/* && \ + curl -L "https://github.com/intel/compute-runtime/releases/download/19.41.14441/intel-gmmlib_19.3.2_amd64.deb" --output "intel-gmmlib_19.3.2_amd64.deb" && \ + curl -L "https://github.com/intel/compute-runtime/releases/download/19.41.14441/intel-igc-core_1.0.2597_amd64.deb" --output "intel-igc-core_1.0.2597_amd64.deb" && \ + curl -L "https://github.com/intel/compute-runtime/releases/download/19.41.14441/intel-igc-opencl_1.0.2597_amd64.deb" --output "intel-igc-opencl_1.0.2597_amd64.deb" && \ + curl -L "https://github.com/intel/compute-runtime/releases/download/19.41.14441/intel-opencl_19.41.14441_amd64.deb" --output "intel-opencl_19.41.14441_amd64.deb" && \ + curl -L "https://github.com/intel/compute-runtime/releases/download/19.41.14441/intel-ocloc_19.04.12237_amd64.deb" --output "intel-ocloc_19.04.12237_amd64.deb" && \ + dpkg -i /tmp/opencl/*.deb && \ + ldconfig && \ + rm /tmp/opencl +``` + +To build a Docker* image for GPU, run the following command: +```sh +docker build . -t \ +--build-arg HTTP_PROXY= \ +--build-arg HTTPS_PROXY= +``` + +### Run the Docker* Image for GPU + +To make GPU available in the container, attach the GPU to the container using `--device /dev/dri` option and run the container: +```sh +docker run -it --device /dev/dri +``` + +## Use a Docker* Image for Intel® Neural Compute Stick 2 + +### Build a Docker* Image for Intel® Neural Compute Stick 2 + +Build a Docker image using the same steps as for CPU. + +### Run the Docker* Image for Intel® Neural Compute Stick 2 + +**Known limitations:** + +- Intel® Neural Compute Stick 2 device changes its VendorID and DeviceID during execution and each time looks for a host system as a brand new device. It means it cannot be mounted as usual. 
+- UDEV events are not forwarded to the container by default, so the container does not know about device reconnection.
+- Only one device per host is supported.
+
+**Possible solutions for Intel® Neural Compute Stick 2 (use one of the following):**
+
+- **Solution #1**:
+    1. Get rid of UDEV by rebuilding `libusb` without UDEV support in the Docker* image (add the following commands to the `Dockerfile` example for CPU above):
+```sh +RUN usermod -aG users openvino +WORKDIR /opt +RUN curl -L https://github.com/libusb/libusb/archive/v1.0.22.zip --output v1.0.22.zip && \ + unzip v1.0.22.zip + +WORKDIR /opt/libusb-1.0.22 +RUN ./bootstrap.sh && \ + ./configure --disable-udev --enable-shared && \ + make -j4 +RUN apt-get update && \ + apt-get install -y --no-install-recommends libusb-1.0-0-dev=2:1.0.21-2 && \ + rm -rf /var/lib/apt/lists/* + +WORKDIR /opt/libusb-1.0.22/libusb +RUN /bin/mkdir -p '/usr/local/lib' && \ + /bin/bash ../libtool --mode=install /usr/bin/install -c libusb-1.0.la '/usr/local/lib' && \ + /bin/mkdir -p '/usr/local/include/libusb-1.0' && \ + /usr/bin/install -c -m 644 libusb.h '/usr/local/include/libusb-1.0' && \ + /bin/mkdir -p '/usr/local/lib/pkgconfig' + +WORKDIR /opt/libusb-1.0.22/ +RUN /usr/bin/install -c -m 644 libusb-1.0.pc '/usr/local/lib/pkgconfig' && \ + ldconfig +``` +
+ 2. Run the Docker* image:
+```sh
+docker run --device-cgroup-rule='c 189:* rmw' -v /dev/bus/usb:/dev/bus/usb <image_name>
+```
+
+- **Solution #2**:
+    Run the container in privileged mode, use the host network configuration, and mount all devices into the container:
+```sh
+docker run --privileged -v /dev:/dev --network=host <image_name>
+```
+
+> **Notes**:
+> - This option is not secure.
+> - It conflicts with Kubernetes* and other tools that use orchestration and private networks.
+
+## Use a Docker* Image for Intel® Vision Accelerator Design with Intel® Movidius™ VPUs
+
+### Build Docker* Image for Intel® Vision Accelerator Design with Intel® Movidius™ VPUs
+To use the Docker container for inference on Intel® Vision Accelerator Design with Intel® Movidius™ VPUs:
+
+1. Set up the environment on the host machine that will be used to run Docker*. It is required to execute `hddldaemon`, which is responsible for communication between the HDDL plugin and the board. To learn how to set up the environment (the OpenVINO package must be pre-installed), see [Configuration Guide for Intel® Vision Accelerator Design with Intel® Movidius™ VPUs](installing-openvino-linux-ivad-vpu.md).
+2. Prepare the Docker* image. As a base image, you can use the image from the section [Building Docker Image for CPU](#building-for-cpu). To use it for inference on Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, rebuild the image with the following additional dependencies:
+```sh
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends \
+    libboost-filesystem1.65-dev=1.65.1+dfsg-0ubuntu5 \
+    libboost-thread1.65-dev=1.65.1+dfsg-0ubuntu5 \
+    libjson-c3=0.12.1-1.3 libxxf86vm-dev=1:1.1.4-1 && \
+    rm -rf /var/lib/apt/lists/*
+```
+3. Run `hddldaemon` on the host in a separate terminal session using the following command:
+```sh
+$HDDL_INSTALL_DIR/hddldaemon
+```
+
+### Run the Docker* Image for Intel® Vision Accelerator Design with Intel® Movidius™ VPUs
+To run the built Docker* image for Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, use the following command:
+```sh
+docker run --device=/dev/ion:/dev/ion -v /var/tmp:/var/tmp -ti <image_name>
+```
+
+> **NOTE**:
+> - The device `/dev/ion` needs to be shared so that ion buffers can be used among the plugin, `hddldaemon`, and the kernel.
+> - Since separate inference tasks share the same HDDL service communication interface (the service creates mutexes and a socket file in `/var/tmp`), `/var/tmp` needs to be mounted and shared among them.
+
+In some cases the ion driver is not enabled (for example, due to a newer kernel version or an iommu incompatibility) and `lsmod | grep myd_ion` returns empty output. To resolve this, run the container with the host IPC namespace instead of mounting `/dev/ion`:
+```sh
+docker run --rm --net=host -v /var/tmp:/var/tmp --ipc=host -ti <image_name>
+```
+> **NOTE**:
+> - When building Docker images, create a user in the Dockerfile that has the same UID and GID as the user that runs `hddldaemon` on the host.
+> - Run the application in the container as this user.
+> - Alternatively, you can start `hddldaemon` as the root user on the host, but this approach is not recommended.
+
+## Use a Docker* Image for FPGA
+### Build a Docker* Image for FPGA
+
+The FPGA card is not available in the container by default, but it can be mounted there with the following prerequisites:
+- The FPGA device is up and ready to run inference.
+- The FPGA bitstreams were pushed to the device over PCIe.
+
+To build a Docker* image for FPGA:
+
+1. Set additional environment variables in the `Dockerfile`:
+```sh +ENV CL_CONTEXT_COMPILER_MODE_INTELFPGA=3 +ENV DLA_AOCX=/opt/intel/openvino/a10_devkit_bitstreams/2-0-1_RC_FP11_Generic.aocx +ENV PATH=/opt/altera/aocl-pro-rte/aclrte-linux64/bin:$PATH +``` +2. Install the following UDEV rule:
+```sh +cat < fpga.rules +KERNEL=="acla10_ref*",GROUP="users",MODE="0660" +EOF +sudo cp fpga.rules /etc/udev/rules.d/ +sudo udevadm control --reload-rules +sudo udevadm trigger +sudo ldconfig +``` +Make sure that a container user is added to the "users" group with the same GID as on host. + +### Run the Docker* container for FPGA + +To run the built Docker* container for FPGA, use the following command: + +```sh +docker run --rm -it \ +--mount type=bind,source=/opt/intel/intelFPGA_pro,destination=/opt/intel/intelFPGA_pro \ +--mount type=bind,source=/opt/altera,destination=/opt/altera \ +--mount type=bind,source=/etc/OpenCL/vendors,destination=/etc/OpenCL/vendors \ +--mount type=bind,source=/opt/Intel/OpenCL/Boards,destination=/opt/Intel/OpenCL/Boards \ +--device /dev/acla10_ref0:/dev/acla10_ref0 \ + +``` + +## Examples +* [ubuntu18_runtime dockerfile](https://docs.openvinotoolkit.org/downloads/ubuntu18_runtime.dockerfile) - Can be used to build OpenVINO™ runtime image containing minimal dependencies needed to use OpenVINO™ in production environment. +* [ubuntu18_dev dockerfile](https://docs.openvinotoolkit.org/downloads/ubuntu18_dev.dockerfile) - Can be used to build OpenVINO™ developer image containing full OpenVINO™ package to use in development environment. + +## Additional Resources + +* Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) + +* OpenVINO™ toolkit documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) + +* Intel® Neural Compute Stick 2 Get Started: [https://software.intel.com/en-us/neural-compute-stick/get-started](https://software.intel.com/en-us/neural-compute-stick/get-started) + +* Intel® Distribution of OpenVINO™ toolkit Docker Hub* home page: [https://hub.docker.com/u/openvino](https://hub.docker.com/u/openvino) diff --git a/docs/install_guides/installing-openvino-docker-windows.md b/docs/install_guides/installing-openvino-docker-windows.md new file mode 100644 index 00000000000000..7a8af621619ea3 --- /dev/null +++ b/docs/install_guides/installing-openvino-docker-windows.md @@ -0,0 +1,156 @@ +# Install Intel® Distribution of OpenVINO™ toolkit for Windows* from Docker* Image {#openvino_docs_install_guides_installing_openvino_docker_windows} + +The Intel® Distribution of OpenVINO™ toolkit quickly deploys applications and solutions that emulate human vision. Based on Convolutional Neural Networks (CNN), the toolkit extends computer vision (CV) workloads across Intel® hardware, maximizing performance. The Intel® Distribution of OpenVINO™ toolkit includes the Intel® Deep Learning Deployment Toolkit. + +This guide provides the steps for creating a Docker* image with Intel® Distribution of OpenVINO™ toolkit for Windows* and further installation. + +## System Requirements + +**Target Operating Systems** + +- Windows Server Core* + +**Host Operating Systems** + +- Windows 10*, 64-bit Pro, Enterprise or Education (1607 Anniversary Update, Build 14393 or later) editions +- Windows Server* 2016 or higher + +## Build a Docker* Image for CPU + +To build a Docker image, create a `Dockerfile` that contains defined variables and commands required to create an OpenVINO toolkit installation image. + +Create your `Dockerfile` using the following example as a template: + +
+ Click to expand/collapse + +~~~ +# escape= ` +FROM mcr.microsoft.com/windows/servercore:ltsc2019 + +# Restore the default Windows shell for correct batch processing. +SHELL ["cmd", "/S", "/C"] + +USER ContainerAdministrator + +# Setup Redistributable Libraries for Intel(R) C++ Compiler for Windows* + +RUN powershell.exe -Command ` + Invoke-WebRequest -URI https://software.intel.com/sites/default/files/managed/59/aa/ww_icl_redist_msi_2018.3.210.zip -Proxy %HTTPS_PROXY% -OutFile "%TMP%\ww_icl_redist_msi_2018.3.210.zip" ; ` + Expand-Archive -Path "%TMP%\ww_icl_redist_msi_2018.3.210.zip" -DestinationPath "%TMP%\ww_icl_redist_msi_2018.3.210" -Force ; ` + Remove-Item "%TMP%\ww_icl_redist_msi_2018.3.210.zip" -Force + +RUN %TMP%\ww_icl_redist_msi_2018.3.210\ww_icl_redist_intel64_2018.3.210.msi /quiet /passive /log "%TMP%\redist.log" + +# setup Python +ARG PYTHON_VER=python3.7 + +RUN powershell.exe -Command ` + Invoke-WebRequest -URI https://www.python.org/ftp/python/3.7.6/python-3.7.6-amd64.exe -Proxy %HTTPS_PROXY% -OutFile %TMP%\\python-3.7.exe ; ` + Start-Process %TMP%\\python-3.7.exe -ArgumentList '/passive InstallAllUsers=1 PrependPath=1 TargetDir=c:\\Python37' -Wait ; ` + Remove-Item %TMP%\\python-3.7.exe -Force + +RUN python -m pip install --upgrade pip +RUN python -m pip install cmake + +# download package from external URL +ARG package_url=http://registrationcenter-download.intel.com/akdlm/irc_nas/16613/w_openvino_toolkit_p_0000.0.000.exe +ARG TEMP_DIR=/temp + +WORKDIR ${TEMP_DIR} +ADD ${package_url} ${TEMP_DIR} + +# install product by installation script +ARG build_id=0000.0.000 +ENV INTEL_OPENVINO_DIR C:\intel + +RUN powershell.exe -Command ` + Start-Process "./*.exe" -ArgumentList '--s --a install --eula=accept --installdir=%INTEL_OPENVINO_DIR% --output=%TMP%\openvino_install_out.log --components=OPENVINO_COMMON,INFERENCE_ENGINE,INFERENCE_ENGINE_SDK,INFERENCE_ENGINE_SAMPLES,OMZ_TOOLS,POT,INFERENCE_ENGINE_CPU,INFERENCE_ENGINE_GPU,MODEL_OPTIMIZER,OMZ_DEV,OPENCV_PYTHON,OPENCV_RUNTIME,OPENCV,DOCS,SETUPVARS,VC_REDIST_2017_X64,icl_redist' -Wait + +ENV INTEL_OPENVINO_DIR C:\intel\openvino_${build_id} + +# Post-installation cleanup +RUN rmdir /S /Q "%USERPROFILE%\Downloads\Intel" + +# dev package +WORKDIR ${INTEL_OPENVINO_DIR} +RUN python -m pip install --no-cache-dir setuptools && ` + python -m pip install --no-cache-dir -r "%INTEL_OPENVINO_DIR%\python\%PYTHON_VER%\requirements.txt" && ` + python -m pip install --no-cache-dir -r "%INTEL_OPENVINO_DIR%\python\%PYTHON_VER%\openvino\tools\benchmark\requirements.txt" && ` + python -m pip install --no-cache-dir torch==1.4.0+cpu torchvision==0.5.0+cpu -f https://download.pytorch.org/whl/torch_stable.html + +WORKDIR ${TEMP_DIR} +COPY scripts\install_requirements.bat install_requirements.bat +RUN install_requirements.bat %INTEL_OPENVINO_DIR% + + +WORKDIR ${INTEL_OPENVINO_DIR}\deployment_tools\open_model_zoo\tools\accuracy_checker +RUN %INTEL_OPENVINO_DIR%\bin\setupvars.bat && ` + python -m pip install --no-cache-dir -r "%INTEL_OPENVINO_DIR%\deployment_tools\open_model_zoo\tools\accuracy_checker\requirements.in" && ` + python "%INTEL_OPENVINO_DIR%\deployment_tools\open_model_zoo\tools\accuracy_checker\setup.py" install + +WORKDIR ${INTEL_OPENVINO_DIR}\deployment_tools\tools\post_training_optimization_toolkit +RUN python -m pip install --no-cache-dir -r "%INTEL_OPENVINO_DIR%\deployment_tools\tools\post_training_optimization_toolkit\requirements.txt" && ` + python "%INTEL_OPENVINO_DIR%\deployment_tools\tools\post_training_optimization_toolkit\setup.py" 
install + +WORKDIR ${INTEL_OPENVINO_DIR} + +# Post-installation cleanup +RUN powershell Remove-Item -Force -Recurse "%TEMP%\*" && ` + powershell Remove-Item -Force -Recurse "%TEMP_DIR%" && ` + rmdir /S /Q "%ProgramData%\Package Cache" + +USER ContainerUser + +CMD ["cmd.exe"] +~~~ + +
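+Similar to the Linux image, the `package_url` and `build_id` arguments declared in the Dockerfile above can be supplied on the command line instead of being edited in place. The following is a minimal sketch with hypothetical values (the image tag, download link, build number, and proxy addresses are all placeholders):
+
+~~~
+docker build . -t openvino-win-cpu:latest `
+--build-arg package_url=<direct_link_to_the_offline_installer_package> `
+--build-arg build_id=<build_number> `
+--build-arg HTTP_PROXY=<http_proxy> `
+--build-arg HTTPS_PROXY=<https_proxy>
+~~~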
+
+> **NOTE**: Replace the direct link in the `package_url` variable with a link to the latest version of the Intel® Distribution of OpenVINO™ toolkit package and modify the install package name in the subsequent commands. You can copy the link from the [Intel® Distribution of OpenVINO™ toolkit download page](https://software.seek.intel.com/openvino-toolkit) after registration. Right click the **Offline Installer** button on the download page for Windows in your browser and press **Copy link address**.
+> **NOTE**: Replace the build number of the package in the `build_id` variable according to the name of the downloaded Intel® Distribution of OpenVINO™ toolkit package. For example, for the installation file `w_openvino_toolkit_p_2020.3.333.exe`, the `build_id` variable should have the value `2020.3.333`.
+
+To build a Docker* image for CPU, run the following command:
+~~~
+docker build . -t <image_name> `
+--build-arg HTTP_PROXY=<http_proxy> `
+--build-arg HTTPS_PROXY=<https_proxy>
+~~~
+
+## Install Additional Dependencies
+### Install CMake
+To add CMake to the image, add the following commands to the `Dockerfile` example above:
+~~~
+RUN powershell.exe -Command `
+    Invoke-WebRequest -URI https://cmake.org/files/v3.14/cmake-3.14.7-win64-x64.msi -Proxy %HTTPS_PROXY% -OutFile %TMP%\\cmake-3.14.7-win64-x64.msi ; `
+    Start-Process %TMP%\\cmake-3.14.7-win64-x64.msi -ArgumentList '/quiet /norestart' -Wait ; `
+    Remove-Item %TMP%\\cmake-3.14.7-win64-x64.msi -Force
+
+RUN SETX /M PATH "C:\Program Files\CMake\Bin;%PATH%"
+~~~
+
+### Install Microsoft Visual Studio* Build Tools
+You can add Microsoft Visual Studio Build Tools* to a Windows* OS Docker image. The available options are to use the offline installer for Build Tools
+(follow the [instruction for the offline installer](https://docs.microsoft.com/en-us/visualstudio/install/create-an-offline-installation-of-visual-studio?view=vs-2019)) or
+the online installer for Build Tools (follow the [instruction for the online installer](https://docs.microsoft.com/en-us/visualstudio/install/build-tools-container?view=vs-2019)).
+Microsoft Visual Studio Build Tools* are licensed as a supplement to your existing Microsoft Visual Studio* license.
+Any images built with these tools should be for your personal use or for use in your organization in accordance with your existing Visual Studio* and Windows* licenses.
+
+## Run the Docker* Image for CPU
+
+To install the OpenVINO toolkit from the prepared Docker image, run the image with the following command:
+~~~
+docker run -it <image_name>
+~~~
+
+## Examples
+* [winserver2019_runtime dockerfile](https://docs.openvinotoolkit.org/downloads/winserver2019_runtime.dockerfile) - Can be used to build an OpenVINO™ runtime image containing the minimal dependencies needed to use OpenVINO™ in a production environment.
+* [winserver2019_dev dockerfile](https://docs.openvinotoolkit.org/downloads/winserver2019_dev.dockerfile) - Can be used to build an OpenVINO™ developer image containing the full OpenVINO™ package for use in a development environment.
+ +## Additional Resources + +* Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) + +* OpenVINO™ toolkit documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) + +* Intel® Distribution of OpenVINO™ toolkit Docker Hub* home page: [https://hub.docker.com/u/openvino](https://hub.docker.com/u/openvino) diff --git a/docs/install_guides/installing-openvino-linux-fpga.md b/docs/install_guides/installing-openvino-linux-fpga.md new file mode 100644 index 00000000000000..50e41b5922a899 --- /dev/null +++ b/docs/install_guides/installing-openvino-linux-fpga.md @@ -0,0 +1,329 @@ +# Install Intel® Distribution of OpenVINO™ toolkit for Linux* with FPGA Support {#openvino_docs_install_guides_installing_openvino_linux_fpga} + +**NOTES**: +- [Intel® System Studio](https://software.intel.com/en-us/system-studio) is an all-in-one, cross-platform tool suite, purpose-built to simplify system bring-up and improve system and IoT device application performance on Intel® platforms. If you are using the Intel® Distribution of OpenVINO™ with Intel® System Studio, go to [Get Started with Intel® System Studio](https://software.intel.com/en-us/articles/get-started-with-openvino-and-intel-system-studio-2019). +- The Intel® Distribution of OpenVINO™ toolkit was formerly known as the Intel® Computer Vision SDK. +- These steps apply to Ubuntu\*, CentOS\*, and Yocto\*. +- If you are using Intel® Distribution of OpenVINO™ toolkit on Windows\* OS, see the [Installation Guide for Windows*](installing-openvino-windows.md). +- For the Intel Distribution of OpenVINO toolkit without FPGA +support, see [Installation Guide for Linux*](installing-openvino-linux.md). +- CentOS and Yocto installations will require some modifications that +are not covered in this guide. +- An internet connection is required to follow the steps in this guide. + +## Introduction + +The Intel® Distribution of OpenVINO™ toolkit quickly deploys applications and solutions that emulate human vision. Based on Convolutional Neural Networks (CNN), the toolkit extends computer vision (CV) workloads across Intel® hardware, maximizing performance. The Intel® Distribution of OpenVINO™ toolkit includes the Intel® Deep Learning Deployment Toolkit (Intel® DLDT). + +The Intel® Distribution of OpenVINO™ toolkit for Linux\* with FPGA Support: + +- Enables CNN-based deep learning inference on the edge +- Supports heterogeneous execution across Intel® CPU, Intel® Integrated Graphics, Intel® FPGA, Intel® Neural Compute Stick 2 +- Speeds time-to-market via an easy-to-use library of computer vision functions and pre-optimized kernels +- Includes optimized calls for computer vision standards including OpenCV\* and OpenCL™ + +**Included with the Installation and installed by default:** + +| Component | Description | +|-----------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) | This tool imports, converts, and optimizes models that were trained in popular frameworks to a format usable by Intel tools, especially the Inference Engine. 
Popular frameworks include Caffe\*, TensorFlow\*, MXNet\*, and ONNX\*. | +| [Inference Engine](../IE_DG/inference_engine_intro.md) | This is the engine that runs the deep learning model. It includes a set of libraries for an easy inference integration into your applications. | +| Drivers and runtimes for OpenCL™ version 2.1 | Enables OpenCL on the GPU/CPU for Intel® processors | +| Intel® Media SDK | Offers access to hardware accelerated video codecs and frame processing | +| Pre-compiled FPGA bitstream samples | Pre-compiled bitstream samples for the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA, and Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA SG2. | +| Intel® FPGA SDK for OpenCL™ software technology | The Intel® FPGA RTE for OpenCL™ provides utilities, host runtime libraries, drivers, and RTE-specific libraries and files | +| [OpenCV](https://docs.opencv.org/master/) | OpenCV\* community version compiled for Intel® hardware | +| [Inference Engine Code Samples](../IE_DG/Samples_Overview.md) | A set of simple console applications demonstrating how to utilize specific OpenVINO capabilities in an application and how to perform specific tasks, such as loading a model, running inference, querying specific device capabilities, and more. | +| [Demo Applications](@ref omz_demos_README) | A set of simple console applications that provide robust application templates to help you implement specific deep learning scenarios. | + + +## Development and Target Platform + +The development and target platforms have the same requirements, but you can select different components during the installation, based on your intended use. + +**Hardware** + +* 6th to 10th generation Intel® Core™ processors and Intel® Xeon® processors +* Intel® Xeon® processor E family (formerly code named Sandy Bridge, Ivy Bridge, Haswell, and Broadwell) +* 3rd generation Intel® Xeon® Scalable processor (formerly code named Cooper Lake) +* Intel® Xeon® Scalable processor (formerly Skylake and Cascade Lake) +* Intel Atom® processor with support for Intel® Streaming SIMD Extensions 4.1 (Intel® SSE4.1) +* Intel Pentium® processor N4200/5, N3350/5, or N3450/5 with Intel® HD Graphics +* Intel® Neural Compute Stick 2 +* Intel® Vision Accelerator Design with Intel® Movidius™ VPUs +* Intel® Programmable Acceleration Card (PAC) with Intel® Arria® 10 GX FPGA +* Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Mustang-F100-A10) SG2 + +> **NOTE**: With OpenVINO™ 2020.4 release, Intel® Movidius™ Neural Compute Stick is no longer supported. + +> **NOTE**: Intel® Arria 10 FPGA (Mustang-F100-A10) SG1 is no longer supported. If you use Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Mustang-F100-A10) Speed Grade 1, we recommend continuing to use the [Intel® Distribution of OpenVINO™ toolkit 2020.1](https://docs.openvinotoolkit.org/2020.1/_docs_install_guides_VisionAcceleratorFPGA_Configure.html) release. + +> **NOTE**: Intel® Arria® 10 GX FPGA Development Kit is no longer supported. For the Intel® Arria® 10 FPGA GX Development Kit configuration guide, refer to the [2019 R1.1 documentation](http://docs.openvinotoolkit.org/2019_R1.1/_docs_install_guides_GX_Configure_2019R1.html). + +**Processor Notes:** + +- Processor graphics are not included in all processors. See [Product Specifications](https://ark.intel.com/) for information about your processor. +- A chipset that supports processor graphics is required for Intel® Xeon® processors. 
+ +**Operating Systems:** + +- Ubuntu 18.04 or 16.04 long-term support (LTS), 64-bit: Minimum supported kernel is 4.15 +- CentOS 7.4, 64-bit +- Yocto Project v3.0, 64-bit (for target only and requires modifications) + +## Overview + +This guide provides step-by-step instructions on how to install the Intel® Distribution of OpenVINO™ toolkit with FPGA Support. Links are provided for each type of compatible hardware including downloads, initialization and configuration steps. The following steps will be covered: + +1. Install the Intel® Distribution of OpenVINO™ Toolkit +2. Install External software dependencies +3. Configure the Model Optimizer +4. Run the Verification Scripts to Verify Installation and Compile Samples +5. Install your compatible hardware from the list of supported hardware
+6. Use the Face Detection Tutorial + +## Install the Intel® Distribution of OpenVINO™ Toolkit Core Components + +Download the Intel® Distribution of OpenVINO™ toolkit package file from [Intel® Distribution of OpenVINO™ toolkit for Linux* with FPGA Support](https://software.intel.com/en-us/openvino-toolkit/choose-download). +Select the Intel® Distribution of OpenVINO™ toolkit for Linux with FPGA Support package from the dropdown menu. + +1. Open a command prompt terminal window. +2. Change directories to where you downloaded the Intel Distribution of +OpenVINO toolkit for Linux\* with FPGA Support package file.
+ If you downloaded the package file to the current user's `Downloads` + directory: +```sh +cd ~/Downloads/ +``` +By default, the file is saved as `l_openvino_toolkit_fpga_p_.tgz`. + +3. Unpack the .tgz file: +```sh +tar -xvzf l_openvino_toolkit_fpga_p_.tgz +``` +The files are unpacked to the `l_openvino_toolkit_fpga_p_` directory. + +4. Go to the `l_openvino_toolkit_fpga_p_` directory: +```sh +cd l_openvino_toolkit_fpga_p_ +``` +If you have a previous version of the Intel Distribution of OpenVINO toolkit installed, rename or delete these two directories: +- `/home//inference_engine_samples` +- `/home//openvino_models` + +**Installation Notes:** +- Choose an installation option and run the related script as root. +- You can use either a GUI installation wizard or command-line instructions (CLI). +- Screenshots are provided for the GUI, but not for CLI. The following information also applies to CLI and will be helpful to your installation where you will be presented with the same choices and tasks. + +5. Choose your installation option: + - **Option 1:** GUI Installation Wizard: +```sh +sudo ./install_GUI.sh +``` + - **Option 2:** Command-Line Instructions: +```sh +sudo ./install.sh +``` +6. Follow the instructions on your screen. Watch for informational +messages such as the following in case you must complete additional +steps: +![](../img/install-linux-fpga-01.png) + +7. If you select the default options, the **Installation summary** GUI screen looks like this: +![](../img/install-linux-fpga-02.png) + - **Optional:** You can choose **Customize** and select only the bitstreams for your card. This will allow you to minimize + the size of the download by several gigabytes. + - The following bitstreams listed at the bottom of the customization screen are highlighted below. Choose the one for your FPGA: + ![](../img/install-linux-fpga-04.png) + - When installed as **root** the default installation directory for the Intel Distribution of OpenVINO is + `/opt/intel/openvino_fpga_2019./`.
+ For simplicity, a symbolic link to the latest installation is also created: `/opt/intel/openvino/`. + +8. A Complete screen indicates that the core components have been installed: +![](../img/install-linux-fpga-05.png) + +The first core components are installed. Continue to the next section to install additional dependencies. + +## Install External Software Dependencies + +These dependencies are required for: + +- Intel-optimized build of OpenCV library +- Deep Learning Inference Engine +- Deep Learning Model Optimizer tools + +1. Change to the `install_dependencies` directory: +```sh +cd /opt/intel/openvino/install_dependencies +``` +2. Run a script to download and install the external software dependencies: +```sh +sudo -E ./install_openvino_dependencies.sh +``` + +The dependencies are installed. Continue to the next section to configure the Model Optimizer. + +## Configure the Model Optimizer + +The Model Optimizer is a Python\*-based command line tool for importing +trained models from popular deep learning frameworks such as Caffe\*, +TensorFlow\*, Apache MXNet\*, ONNX\* and Kaldi\*. + +The Model Optimizer is a key component of the Intel Distribution of +OpenVINO toolkit. You cannot perform inference on your trained model without +running the model through the Model Optimizer. When you run a +pre-trained model through the Model Optimizer, your output is an +Intermediate Representation (IR) of the network. The Intermediate +Representation is a pair of files that describe the whole model: + +- `.xml`: Describes the network topology +- `.bin`: Contains the weights and biases binary data + +For more information about the Model Optimizer, refer to the [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).  + +### Model Optimizer Configuration Steps + +> **IMPORTANT**: The Internet access is required to execute the following steps successfully. If you have access to the Internet through the proxy server only, please make sure that it is configured in your environment. + +You can choose to either configure all supported frameworks at once **OR** configure one framework at a time. Choose the option that best suits your needs. If you see error messages, make sure you installed all dependencies. + +> **NOTE**: If you installed the Intel® Distribution of OpenVINO™ to the non-default install directory, replace `/opt/intel` with the directory in which you installed the software. + +**Option 1: Configure all supported frameworks at the same time** + +1. Go to the Model Optimizer prerequisites directory: +```sh +cd /opt/intel/openvino/deployment_tools/model_optimizer/install_prerequisites +``` +2. Run the script to configure the Model Optimizer for Caffe, + TensorFlow, MXNet, Kaldi\*, and ONNX: +```sh +sudo ./install_prerequisites.sh +``` + +**Option 2: Configure each framework separately** + +Configure individual frameworks separately **ONLY** if you did not select **Option 1** above. +1. Go to the Model Optimizer prerequisites directory: +```sh +cd /opt/intel/openvino/deployment_tools/model_optimizer/install_prerequisites +``` +2. Run the script for your model framework. 
You can run more than one script: +- For **Caffe**: + +```sh +sudo ./install_prerequisites_caffe.sh +``` +- For **TensorFlow**: +```sh +sudo ./install_prerequisites_tf.sh +``` +- For **MXNet**: +```sh +sudo ./install_prerequisites_mxnet.sh +``` +- For **ONNX**: +```sh +sudo ./install_prerequisites_onnx.sh +``` +- For **Kaldi**: +```sh +sudo ./install_prerequisites_kaldi.sh +``` +The Model Optimizer is configured for one or more frameworks. + +You are ready to compile the samples by running the verification scripts. + +## Run the Verification Scripts to Verify Installation and Compile Samples + +To verify the installation and compile two samples, run the verification applications provided with the product on the CPU: + +1. Go to the **Inference Engine demo** directory: +```sh +cd /opt/intel/openvino/deployment_tools/demo +``` + +2. Run the **Image Classification verification script**: +```sh +./demo_squeezenet_download_convert_run.sh +``` +This verification script downloads a SqueezeNet model, uses the Model Optimizer to convert the model to the .bin and .xml Intermediate Representation (IR) files. The Inference Engine requires this model conversion so it can use the IR as input and achieve optimum performance on Intel hardware.
+This verification script builds the [Image Classification Sample Async](../../inference-engine/samples/classification_sample_async/README.md) application and run it with the `car.png` image in the demo directory. When the verification script completes, you will have the label and confidence for the top-10 categories: +![](../img/image_classification_script_output_lnx.png) + +3. Run the **Inference Pipeline verification script**: +```sh +./demo_security_barrier_camera.sh +``` +This verification script builds the [Security Barrier Camera Demo](@ref omz_demos_security_barrier_camera_demo_README) application included in the package. + + This verification script uses the `car_1.bmp` image in the demo directory to show an inference pipeline using three of the pre-trained models. The verification script uses vehicle recognition in which vehicle attributes build on each other to narrow in on a specific attribute. + + First, an object is identified as a vehicle. This identification is used as input to the next model, which identifies specific vehicle attributes, including the license plate. Finally, the attributes identified as the license plate are used as input to the third model, which recognizes specific characters in the license plate. + + When the verification script completes, you will see an image that displays the resulting frame with detections rendered as bounding boxes, and text: + ![](../img/security-barrier-results.png) + +4. Close the image viewer window to complete the verification script. + + +To learn about the verification scripts, see the `README.txt` file in `/opt/intel/openvino/deployment_tools/demo`. + +For a description of the Intel Distribution of OpenVINO™ pre-trained object detection and object recognition models, see [Overview of OpenVINO™ Toolkit Pre-Trained Models](@ref omz_models_intel_index). + +You have completed all required installation, configuration and build steps in this guide to use your CPU to work with your trained models. To use other hardware, see Install and Configure your Compatible Hardware below. + +## Install and Configure Your Compatible Hardware + +Install your compatible hardware from the list of supported components below. + +> **NOTE**: Once you've completed your hardware installation, you'll return to this guide to finish installation and configuration of the Intel® Distribution of OpenVINO™ toolkit. + +Links to install and configure compatible hardware +- [The Intel® Programmable Acceleration Card (PAC) with Intel® Arria® 10 GX FPGA](PAC_Configure.md) +- [The Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA SG2 (Mustang-F100-A10)](VisionAcceleratorFPGA_Configure.md) +- [Intel® Vision Accelerator Design with Intel® Movidius™ VPUs](installing-openvino-linux-ivad-vpu.md) + +Congratulations, you have finished the Intel® Distribution of OpenVINO™ toolkit installation for FPGA. To learn more about how the Intel® Distribution of OpenVINO™ toolkit works, the Hello World tutorial and other resources are provided below. + +## Hello World Face Detection Tutorial + +Refer to the [OpenVINO™ with FPGA Hello World Face Detection Exercise](https://github.com/intel-iot-devkit/openvino-with-fpga-hello-world-face-detection). + +## Troubleshooting + +PRC developers might encounter pip installation related issues during OpenVINO™ installation. To resolve the issues, you may use one of the following options at your discretion: +* Add the download source with `-i` parameter in the `pip` command. 
For example: +``` +pip install numpy.py -i https://mirrors.aliyun.com/pypi/simple/ +``` +Use the `--trusted-host` parameter if the URL above is `http` instead of `https`. + +* Modify or create `~/.pip/pip.conf` file to change the default download source with the content below: +``` +[global] +index-url = http://mirrors.aliyun.com/pypi/simple/ +[install] +trusted-host = mirrors.aliyun.com +``` + +**Additional Resources** + +- Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) +- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) +- [Inference Engine FPGA plugin documentation](../IE_DG/supported_plugins/FPGA.md) +- [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +- For more information on Sample Applications, see the [Inference Engine Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html) +- To learn about pre-trained models for OpenVINO™ toolkit, see the [Pre-Trained Models Overview](https://docs.openvinotoolkit.org/latest/_docs_docs_Pre_Trained_Models.html) +- For information on Inference Engine Tutorials, see the [Inference Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic) +- For IoT Libraries & Code Samples see the [Intel® IoT Developer Kit](https://github.com/intel-iot-devkit). + +To learn more about converting models, go to: + +- [Convert Your Caffe* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md) +- [Convert Your TensorFlow* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md) +- [Convert Your MXNet* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md) +- [Convert Your ONNX* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md) + + diff --git a/docs/install_guides/installing-openvino-linux-ivad-vpu.md b/docs/install_guides/installing-openvino-linux-ivad-vpu.md new file mode 100644 index 00000000000000..405f2f6e32e8bb --- /dev/null +++ b/docs/install_guides/installing-openvino-linux-ivad-vpu.md @@ -0,0 +1,190 @@ +# Configuration Guide for the Intel® Distribution of OpenVINO™ toolkit and the Intel® Vision Accelerator Design with Intel® Movidius™ VPUs on Linux* {#openvino_docs_install_guides_installing_openvino_linux_ivad_vpu} + +> **NOTES**: +> - These steps are only required if you want to perform inference on Intel® Vision Accelerator Design with Intel® Movidius™ VPUs. +> - If you installed the Intel® Distribution of OpenVINO™ to the non-default install directory, replace `/opt/intel` with the directory in which you installed the software. + + +## Configuration Steps + +For Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, the following additional installation steps are required. + +1. Set the environment variables: +```sh +source /opt/intel/openvino/bin/setupvars.sh +``` +> **NOTE**: The `HDDL_INSTALL_DIR` variable is set to `/deployment_tools/inference_engine/external/hddl`. If you installed the Intel® Distribution of OpenVINO™ to the default install directory, the `HDDL_INSTALL_DIR` was set to `/opt/intel/openvino//deployment_tools/inference_engine/external/hddl`. + +2. 
Install dependencies: +```sh +${HDDL_INSTALL_DIR}/install_IVAD_VPU_dependencies.sh +``` +Note, if the Linux kernel is updated after the installation, it is required to install drivers again: +```sh +cd ${HDDL_INSTALL_DIR}/drivers +``` +```sh +sudo ./setup.sh install +``` +Now the dependencies are installed and you are ready to use the Intel® Vision Accelerator Design with Intel® Movidius™ with the Intel® Distribution of OpenVINO™ toolkit. + +## Optional Steps + +* For advanced configuration steps for your IEI Mustang-V100-MX8 accelerator, see [Intel® Movidius™ VPUs Setup Guide for Use with Intel® Distribution of OpenVINO™ toolkit](movidius-setup-guide.md). + +* After you've configured your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, see [Intel® Movidius™ VPUs Programming Guide for Use with Intel® Distribution of OpenVINO™ toolkit](movidius-programming-guide.md) to learn how to distribute a model across all 8 VPUs to maximize performance. + +## Troubleshooting + +### Unable to run inference with the MYRIAD Plugin after running with the HDDL Plugin + +Running inference with the MYRIAD Plugin after running with the HDDL Plugin is failed with the following error generated: + +```sh +E: [ncAPI] [ 965618] [MainThread] ncDeviceOpen:677 Failed to find a device, rc: X_LINK_ERROR +``` + +**Possible solutions (use one of the following):** + +* Reboot the host system and run with the MYRIAD Plugin + +* Kill the HDDL Plugin backend service (`hddldaemon`) and reset all Intel® Movidius™ VPUs before running an application that uses the MYRIAD Plugin: +```sh +kill -9 $(pidof hddldaemon autoboot) +pidof hddldaemon autoboot # Make sure none of them is alive +source /opt/intel/openvino/bin/setupvars.sh +${HDDL_INSTALL_DIR}/bin/bsl_reset +``` + +--- +### Get the "No space left on device" error while loading a network +When the application runs inference of a network with a big size(>4MB) of input/output or if the system is running out of the DMA buffer, +the HDDL Plugin will fall back to use shared memory. +In this case, if the application exits abnormally, the shared memory is not released automatically. +To release it manually, remove files with the `hddl_` prefix from the `/dev/shm` directory: +```sh +sudo rm -f /dev/shm/hddl_* +``` + +--- +### How to solve the permission issue? + +Make sure that the following udev rules exist: + - `/etc/udev/rules.d/97-myriad-usbboot.rules` + - `/etc/udev/rules.d/98-hddlbsl.rules` + - `/etc/udev/rules.d/99-hddl-ion.rules` + - `/etc/udev/rules.d/99-myriad-vsc.rules` + +Also make sure that the current user is included in the users groups. If not, run the command below to include: +```sh +sudo usermod -a -G users "$(whoami)" +``` + +--- +### `setup.sh` doesn't install the driver binaries to `/lib/modules` on CentOS systems + +As a temporary workaround, run the commands below to install the drivers. This issue will be fixed in future releases. 
+ +```sh +sudo mkdir -p /lib/modules/$(uname -r)/kernel/drivers/myd/ +``` +```sh +sudo cp drv_vsc/myd_vsc.ko /lib/modules/$(uname -r)/kernel/drivers/myd/ +``` +```sh +sudo cp drv_ion/myd_ion.ko /lib/modules/$(uname -r)/kernel/drivers/myd/ +``` +```sh +sudo touch /etc/modules-load.d/intel_vision_accelerator.conf +``` +```sh +sudo echo "myd_vsc" >> /etc/modules-load.d/intel_vision_accelerator.conf +``` +```sh +sudo echo "myd_ion" >> /etc/modules-load.d/intel_vision_accelerator.conf +``` +```sh +sudo depmod +``` +```sh +sudo modprobe myd_vsc +``` +```sh +sudo modprobe myd_ion +``` + +--- +### Host machine reboots after running an inference application with the HDDL plugin + +**Symptom:** Boot up the host machine, run the inference application with the HDDL plugin. System reboots in a uncertain time. + +**Root Cause:** The I2C address of the reset device of the Intel® Vision Accelerator Design with Intel® Movidius™ VPUs conflicts with another device I2C address in 0x20-0x27 range. If the target Intel® Vision Accelerator Design with Intel® Movidius™ VPUs device needs to be reset (for example, in case of device errors), the `libbsl` library, which is responsible for reset, expects that the target reset device I2C address is in the 0x20-0x27 range on SMBUS. If there is another device on SMBUS in this address range, `libbsl` treats this device as the target reset device and writes an unexpected value to this address. This causes system reboot. + +**Solution:** Detect if there is any I2C device on SMBUS with address in 0x20-0x27 range. If yes, do the following: + +1. Change the DIP switch on the target PCIE card +2. Disable autoscan for the reset device by setting `"autoscan": false` in `${HDDL_INSTALL_DIR}/config/bsl.json` +3. Set the correct address of the I2C reset device (for example, `0x21`) in `${HDDL_INSTALL_DIR}/config/bsl.json` + +```sh +{ + "autoscan": false, + "ioexpander": { + "enabled": true, + "i2c_addr": [ 33 ] + } +} +``` + +--- +###Cannot reset VPU device and cannot find any 0x20-0x27 (Raw data card with HW version Fab-B and before) I2C addresses on SMBUS (using i2c-tools) + +Please contact your motherboard vendor to verify SMBUS pins are connected to the PCIe slot. + +--- +### Get "Error: ipc_connection_linux_UDS : bind() failed" in hddldaemon log. + +You may have run hddldaemon under another user. Run the command below and try again: +```sh +sudo rm -rf /var/tmp/hddl_* +``` + +--- +### Get "I2C bus: SMBus I801 adapter at not found!" in hddldaemon log + +Run the following command to check if a SMBUS I801 adapter can be found: +```sh +i2cdetect -l +``` +Then run: +```sh +sudo modprobe i2c-i801 +``` +--- +### Get "open /dev/ion failed!" in hddldaemon log + +Check if `myd_ion` kernel module is installed by running the following command: +```sh +lsmod | grep myd_ion +``` +If you do not see any output from the command, reinstall the `myd_ion` module. + +--- +### Constantly get "\_name\_mapping open failed err=2,No such file or directory" in hddldaemon log + +Check if myd_vsc kernel module is installed by running the following command: +```sh +lsmod | grep myd_vsc +``` +If you do not see any output from the command reinstall the `myd_vsc` module. 
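+If the module binaries are already present on the system (for example, after completing the CentOS* workaround above), a minimal reload sequence might look like the following; the module name is the same one used throughout this guide:
+```sh
+sudo depmod
+sudo modprobe myd_vsc
+lsmod | grep myd_vsc    # the module should now be listed
+```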
+ +--- +### Get "Required key not available" when trying to install the `myd_ion` or `myd_vsc` modules + +Run the following commands: +```sh +sudo apt install mokutil +``` +```sh +sudo mokutil --disable-validation +``` diff --git a/docs/install_guides/installing-openvino-linux.md b/docs/install_guides/installing-openvino-linux.md new file mode 100644 index 00000000000000..1bf3dffe1f7db5 --- /dev/null +++ b/docs/install_guides/installing-openvino-linux.md @@ -0,0 +1,468 @@ +# Install Intel® Distribution of OpenVINO™ toolkit for Linux* {#openvino_docs_install_guides_installing_openvino_linux} + +> **NOTES**: +> - These steps apply to Ubuntu\*, CentOS\*, and Yocto\*. +> - If you are using Intel® Distribution of OpenVINO™ toolkit on Windows\* OS, see the [Installation Guide for Windows*](installing-openvino-windows.md). +> - For the Intel Distribution of OpenVINO toolkit with FPGA support, see [Installation Guide for Linux* with FPGA support](installing-openvino-linux-fpga.md). +> - CentOS and Yocto installations will require some modifications that are not covered in this guide. +> - An internet connection is required to follow the steps in this guide. +> - [Intel® System Studio](https://software.intel.com/en-us/system-studio) is an all-in-one, cross-platform tool suite, purpose-built to simplify system bring-up and improve system and IoT device application performance on Intel® platforms. If you are using the Intel® Distribution of OpenVINO™ with Intel® System Studio, go to [Get Started with Intel® System Studio](https://software.intel.com/en-us/articles/get-started-with-openvino-and-intel-system-studio-2019). + +## Introduction + +The Intel® Distribution of OpenVINO™ toolkit quickly deploys applications and solutions that emulate human vision. Based on Convolutional Neural Networks (CNN), the toolkit extends computer vision (CV) workloads across Intel® hardware, maximizing performance. The Intel® Distribution of OpenVINO™ toolkit includes the Intel® Deep Learning Deployment Toolkit (Intel® DLDT). + +The Intel® Distribution of OpenVINO™ toolkit for Linux\*: +- Enables CNN-based deep learning inference on the edge +- Supports heterogeneous execution across Intel® CPU, Intel® Integrated Graphics, Intel® Neural Compute Stick 2, and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs +- Speeds time-to-market via an easy-to-use library of computer vision functions and pre-optimized kernels +- Includes optimized calls for computer vision standards including OpenCV\* and OpenCL™ + +**Included with the Installation and installed by default:** + +| Component | Description | +|-----------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) | This tool imports, converts, and optimizes models that were trained in popular frameworks to a format usable by Intel tools, especially the Inference Engine. 
Popular frameworks include Caffe\*, TensorFlow\*, MXNet\*, and ONNX\*. | +| [Inference Engine](../IE_DG/inference_engine_intro.md) | This is the engine that runs the deep learning model. It includes a set of libraries for an easy inference integration into your applications. | +| Drivers and runtimes for OpenCL™ version 2.1 | Enables OpenCL on the GPU/CPU for Intel® processors | +| Intel® Media SDK | Offers access to hardware accelerated video codecs and frame processing | +| [OpenCV](https://docs.opencv.org/master/) | OpenCV\* community version compiled for Intel® hardware | +| [Inference Engine Code Samples](../IE_DG/Samples_Overview.md) | A set of simple console applications demonstrating how to utilize specific OpenVINO capabilities in an application and how to perform specific tasks, such as loading a model, running inference, querying specific device capabilities, and more. | +| [Demo Applications](@ref omz_demos_README) | A set of simple console applications that provide robust application templates to help you implement specific deep learning scenarios. | +| [Additional Tools](../IE_DG/Tools_Overview.md) | A set of tools to work with your models | +| [Documentation for Pre-Trained Models ](@ref omz_models_intel_index) | Documentation for the pre-trained models available in the [Open Model Zoo repo](https://github.com/opencv/open_model_zoo) | + +## System Requirements + +**Hardware** + +* 6th to 10th generation Intel® Core™ processors and Intel® Xeon® processors +* Intel® Xeon® processor E family (formerly code named Sandy Bridge, Ivy Bridge, Haswell, and Broadwell) +* 3rd generation Intel® Xeon® Scalable processor (formerly code named Cooper Lake) +* Intel® Xeon® Scalable processor (formerly Skylake and Cascade Lake) +* Intel Atom® processor with support for Intel® Streaming SIMD Extensions 4.1 (Intel® SSE4.1) +* Intel Pentium® processor N4200/5, N3350/5, or N3450/5 with Intel® HD Graphics +* Intel® Neural Compute Stick 2 +* Intel® Vision Accelerator Design with Intel® Movidius™ VPUs + +> **NOTE**: With OpenVINO™ 2020.4 release, Intel® Movidius™ Neural Compute Stick is no longer supported. + +**Processor Notes:** + +- Processor graphics are not included in all processors. See [Product Specifications](https://ark.intel.com/) for information about your processor. +- A chipset that supports processor graphics is required for Intel® Xeon® processors. + +**Operating Systems** + +- Ubuntu 18.04.x long-term support (LTS), 64-bit +- CentOS 7.4, 64-bit (for target only) +- Yocto Project v3.0, 64-bit (for target only and requires modifications) + +## Overview + +This guide provides step-by-step instructions on how to install the Intel® Distribution of OpenVINO™ toolkit. Links are provided for each type of compatible hardware including downloads, initialization and configuration steps. The following steps will be covered: + +1. Install the Intel® Distribution of OpenVINO™ Toolkit +2. Install External software dependencies +3. Set the OpenVINO™ Environment Variables: Optional Update to .bashrc. +4. Configure the Model Optimizer +5. Run the Verification Scripts to Verify Installation and Compile Samples +6. Steps for Intel® Processor Graphics (GPU) +7. Steps for Intel® Neural Compute Stick 2 +8. Steps for Intel® Vision Accelerator Design with Intel® Movidius™ VPU
+After installing your Intel® Movidius™ VPU, you will return to this guide to complete OpenVINO™ installation. +9. Run a Sample Application +10. Use the Face Detection Tutorial + +## Install the Intel® Distribution of OpenVINO™ Toolkit Core Components + +Download the Intel® Distribution of OpenVINO™ toolkit package file from [Intel® Distribution of OpenVINO™ toolkit for Linux*](https://software.intel.com/en-us/openvino-toolkit/choose-download). +Select the Intel® Distribution of OpenVINO™ toolkit for Linux package from the dropdown menu. + +1. Open a command prompt terminal window. +2. Change directories to where you downloaded the Intel Distribution of +OpenVINO toolkit for Linux\* package file.
+If you downloaded the package file to the current user's `Downloads` directory: +```sh +cd ~/Downloads/ +``` +By default, the file is saved as `l_openvino_toolkit_p_.tgz`. + +3. Unpack the .tgz file: +```sh +tar -xvzf l_openvino_toolkit_p_.tgz +``` +The files are unpacked to the `l_openvino_toolkit_p_` directory. + +4. Go to the `l_openvino_toolkit_p_` directory: +```sh +cd l_openvino_toolkit_p_ +``` +If you have a previous version of the Intel Distribution of OpenVINO +toolkit installed, rename or delete these two directories: +- `~/inference_engine_samples_build` +- `~/openvino_models` + +**Installation Notes:** + +- Choose an installation option and run the related script as root. +- You can use either a GUI installation wizard or command-line instructions (CLI). +- Screenshots are provided for the GUI, but not for CLI. The following information also applies to CLI and will be helpful to your installation where you will be presented with the same choices and tasks. + +5. Choose your installation option: + - **Option 1:** GUI Installation Wizard: +```sh +sudo ./install_GUI.sh +``` + - **Option 2:** Command-Line Instructions: +```sh +sudo ./install.sh +``` +6. Follow the instructions on your screen. Watch for informational +messages such as the following in case you must complete additional +steps: +![](../img/openvino-install-linux-01.png) + +7. If you select the default options, the **Installation summary** GUI screen +looks like this: +![](../img/openvino-install-linux-02.png) + - **Optional:** You can choose **Customize** to change the installation directory or the components you want to install: + ![](../img/openvino-install-linux-03.png) + When installed as **root** the default installation directory for the Intel Distribution of OpenVINO is + `/opt/intel/openvino_/`.
+ For simplicity, a symbolic link to the latest installation is also created: `/opt/intel/openvino/`. + > **NOTE**: The Intel® Media SDK component is always installed in the `/opt/intel/mediasdk` directory regardless of the OpenVINO installation path chosen. + +8. A Complete screen indicates that the core components have been installed: + +![](../img/openvino-install-linux-04.png) + +The first core components are installed. Continue to the next section to install additional dependencies. + +## Install External Software Dependencies + +> **NOTE**: If you installed the Intel® Distribution of OpenVINO™ to the non-default install directory, replace `/opt/intel` with the directory in which you installed the software. + +These dependencies are required for: + +- Intel-optimized build of OpenCV library +- Deep Learning Inference Engine +- Deep Learning Model Optimizer tools + +1. Change to the `install_dependencies` directory: +```sh +cd /opt/intel/openvino/install_dependencies +``` +2. Run a script to download and install the external software dependencies: +```sh +sudo -E ./install_openvino_dependencies.sh +``` +The dependencies are installed. Continue to the next section to set your environment variables. + +## Set the Environment Variables + +You must update several environment variables before you can compile and run OpenVINO™ applications. Run the following script to temporarily set your environment variables: + +```sh +source /opt/intel/openvino/bin/setupvars.sh +``` + +**Optional:** The OpenVINO environment variables are removed when you close the shell. As an option, you can permanently set the environment variables as follows: + +1. Open the `.bashrc` file in ``: +```sh +vi /.bashrc +``` + +2. Add this line to the end of the file: +```sh +source /opt/intel/openvino/bin/setupvars.sh +``` + +3. Save and close the file: press the **Esc** key and type `:wq`. + +4. To test your change, open a new terminal. You will see `[setupvars.sh] OpenVINO environment initialized`. + +The environment variables are set. Continue to the next section to configure the Model Optimizer. + +## Configure the Model Optimizer + +The Model Optimizer is a Python\*-based command line tool for importing +trained models from popular deep learning frameworks such as Caffe\*, +TensorFlow\*, Apache MXNet\*, ONNX\* and Kaldi\*. + +The Model Optimizer is a key component of the Intel Distribution of OpenVINO toolkit. You cannot perform inference on your trained model without +running the model through the Model Optimizer. When you run a pre-trained model through the Model Optimizer, your output is an +Intermediate Representation (IR) of the network. The Intermediate Representation is a pair of files that describe the whole model: + +- `.xml`: Describes the network topology +- `.bin`: Contains the weights and biases binary data + +For more information about the Model Optimizer, refer to the [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).  + +### Model Optimizer Configuration Steps + +You can choose to either configure all supported frameworks at once **OR** configure one framework at a time. Choose the option that best suits your needs. If you see error messages, make sure you installed all dependencies. + +> **NOTE**: Since the TensorFlow framework is not officially supported on CentOS*, the Model Optimizer for TensorFlow can't be configured and ran on those systems. + +> **IMPORTANT**: The Internet access is required to execute the following steps successfully. 
If you have access to the Internet through the proxy server only, please make sure that it is configured in your OS environment. + +**Option 1: Configure all supported frameworks at the same time** + +1. Go to the Model Optimizer prerequisites directory: +```sh +cd /opt/intel/openvino/deployment_tools/model_optimizer/install_prerequisites +``` +2. Run the script to configure the Model Optimizer for Caffe, + TensorFlow, MXNet, Kaldi\*, and ONNX: +```sh +sudo ./install_prerequisites.sh +``` + +**Option 2: Configure each framework separately** + +Configure individual frameworks separately **ONLY** if you did not select **Option 1** above. + +1. Go to the Model Optimizer prerequisites directory: +```sh +cd /opt/intel/openvino/deployment_tools/model_optimizer/install_prerequisites +``` +2. Run the script for your model framework. You can run more than one script: + + - For **Caffe**: + ```sh + sudo ./install_prerequisites_caffe.sh + ``` + + - For **TensorFlow**: + ```sh + sudo ./install_prerequisites_tf.sh + ``` + + - For **MXNet**: + ```sh + sudo ./install_prerequisites_mxnet.sh + ``` + + - For **ONNX**: + ```sh + sudo ./install_prerequisites_onnx.sh + ``` + + - For **Kaldi**: + ```sh + sudo ./install_prerequisites_kaldi.sh + ``` +The Model Optimizer is configured for one or more frameworks. + +You are ready to compile the samples by running the verification scripts. + +## Run the Verification Scripts to Verify Installation + +> **IMPORTANT**: This section is required. In addition to confirming your installation was successful, demo scripts perform other steps, such as setting up your computer to use the Inference Engine samples. + +To verify the installation and compile two samples, use the steps below to run the verification applications provided with the product on the CPU. + +> **NOTE:** To run the demo applications on Intel® Processor Graphics or Intel® Neural Compute Stick 2 devices, make sure you first completed the additional Steps for Intel® Processor Graphics (GPU) or Steps for Intel® Neural Compute Stick 2. + +1. Go to the **Inference Engine demo** directory: +```sh +cd /opt/intel/openvino/deployment_tools/demo +``` + +2. Run the **Image Classification verification script**: +```sh +./demo_squeezenet_download_convert_run.sh +``` +This verification script downloads a SqueezeNet model, uses the Model Optimizer to convert the model to the .bin and .xml Intermediate Representation (IR) files. The Inference Engine requires this model conversion so it can use the IR as input and achieve optimum performance on Intel hardware.
+This verification script builds the [Image Classification Sample Async](../../inference-engine/samples/classification_sample_async/README.md) application and runs it with the `car.png` image located in the demo directory. When the verification script completes, you will have the label and confidence for the top-10 categories:
+![](../img/image_classification_script_output_lnx.png)
+
+3. Run the **Inference Pipeline verification script**:
+```sh
+./demo_security_barrier_camera.sh
+```
+This script downloads three pre-trained model IRs, builds the [Security Barrier Camera Demo](@ref omz_demos_security_barrier_camera_demo_README) application, and runs it with the downloaded models and the `car_1.bmp` image from the `demo` directory to show an inference pipeline. The verification script uses vehicle recognition in which vehicle attributes build on each other to narrow in on a specific attribute.
+
+   First, an object is identified as a vehicle. This identification is used as input to the next model, which identifies specific vehicle attributes, including the license plate. Finally, the attributes identified as the license plate are used as input to the third model, which recognizes specific characters in the license plate.
+
+   When the verification script completes, you will see an image that displays the resulting frame with detections rendered as bounding boxes, and text:
+   ![](../img/inference_pipeline_script_lnx.png)
+
+4. Close the image viewer window to complete the verification script.
+
+To learn more about the verification scripts, see the `README.txt` file in `/opt/intel/openvino/deployment_tools/demo`.
+
+For a description of the Intel Distribution of OpenVINO™ pre-trained object detection and object recognition models, see the [Overview of OpenVINO™ Toolkit Pre-Trained Models](@ref omz_models_intel_index).
+
+You have completed all required installation, configuration and build steps in this guide to use your CPU to work with your trained models.
+To use other hardware, see:
+- Steps for Intel® Processor Graphics (GPU)
+- Steps for Intel® Neural Compute Stick 2
+- Steps for Intel® Vision Accelerator Design with Intel® Movidius™ VPUs
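+
+Once you complete the device-specific steps below, you can re-run the same verification scripts against that hardware by passing a target device with `-d`, the same way this guide uses `-d HDDL` later on. A minimal sketch, assuming the GPU and Intel® Neural Compute Stick 2 steps below are already done and that the scripts accept the same device names as the samples:
+```sh
+cd /opt/intel/openvino/deployment_tools/demo
+# Re-run the Image Classification verification script on the GPU plugin
+./demo_squeezenet_download_convert_run.sh -d GPU
+# Re-run it on an Intel® Neural Compute Stick 2 (MYRIAD plugin)
+./demo_squeezenet_download_convert_run.sh -d MYRIAD
+```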
+ +## Steps for Intel® Processor Graphics (GPU) + +The steps in this section are required only if you want to enable the toolkit components to use processor graphics (GPU) on your system. + +1. Go to the install_dependencies directory: +```sh +cd /opt/intel/openvino/install_dependencies/ +``` +2. Enter the super user mode: +```sh +sudo -E su +``` +3. Install the **Intel® Graphics Compute Runtime for OpenCL™** driver components required to use the GPU plugin and write custom layers for Intel® Integrated Graphics: +```sh +./install_NEO_OCL_driver.sh +``` +You may see the following command line output: + +- Add OpenCL user to video group +- Run script to install the 4.14 kernel script + +Ignore those suggestions and continue. + +4. **Optional** Install header files to allow compiling a new code. You can find the header files at [Khronos OpenCL™ API Headers](https://github.com/KhronosGroup/OpenCL-Headers.git). + +## Steps for Intel® Neural Compute Stick 2 + +These steps are only required if you want to perform inference on Intel® Movidius™ NCS powered by the Intel® Movidius™ Myriad™ 2 VPU or Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X VPU. See also the [Get Started page for Intel® Neural Compute Stick 2:](https://software.intel.com/en-us/neural-compute-stick/get-started) + +1. Add the current Linux user to the `users` group: +```sh +sudo usermod -a -G users "$(whoami)" +``` +Log out and log in for it to take effect. + +2. To perform inference on Intel® Neural Compute Stick 2, install the USB rules as follows: +```sh +sudo cp /opt/intel/openvino/inference_engine/external/97-myriad-usbboot.rules /etc/udev/rules.d/ +``` +```sh +sudo udevadm control --reload-rules +``` +```sh +sudo udevadm trigger +``` +```sh +sudo ldconfig +``` +> **NOTE**: You may need to reboot your machine for this to take effect. + +## Steps for Intel® Vision Accelerator Design with Intel® Movidius™ VPUs + +To install and configure your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, see the [Intel® Vision Accelerator Design with Intel® Movidius™ VPUs Configuration Guide](installing-openvino-linux-ivad-vpu.md). + +> **NOTE**: After installing your Intel® Movidius™ VPU, you will return to this guide to complete the Intel® Distribution of OpenVINO™ installation. + +After configuration is done, you are ready to run the verification scripts with the HDDL Plugin for your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs: + +1. Go to the **Inference Engine demo** directory: +```sh +cd /opt/intel/openvino/deployment_tools/demo +``` + +2. Run the **Image Classification verification script**. If you have access to the Internet through the proxy server only, please make sure that it is configured in your OS environment. +```sh +./demo_squeezenet_download_convert_run.sh -d HDDL +``` + +3. Run the **Inference Pipeline verification script**: +```sh +./demo_security_barrier_camera.sh -d HDDL +``` + +## Run a Sample Application + +> **IMPORTANT**: This section requires that you have [Run the Verification Scripts to Verify Installation](#run-the-demos). This script builds the Image Classification sample application and downloads and converts the required Caffe* Squeezenet model to an IR. + +In this section you will run the Image Classification sample application, with the Caffe* Squeezenet1.1 model on three types of Intel® hardware: CPU, GPU and VPUs. 
+
+The Image Classification sample application binary file was automatically built and the FP16 model IR files were created when you [ran the Image Classification verification script](#run-the-image-classification-verification-script).
+
+The Image Classification sample application binary file is located in the `~/inference_engine_samples_build/intel64/Release` directory.
+The Caffe* Squeezenet model IR files (`.bin` and `.xml`) are located in the `~/openvino_models/ir/public/squeezenet1.1/FP16/` directory.
+
+> **NOTE**: If you installed the Intel® Distribution of OpenVINO™ to a non-default install directory, replace `/opt/intel` with the directory in which you installed the software.
+
+To run the sample application:
+
+1. Set up environment variables:
+```sh
+source /opt/intel/openvino/bin/setupvars.sh
+```
+2. Go to the samples build directory:
+```sh
+cd ~/inference_engine_samples_build/intel64/Release
+```
+3. Run the sample executable, specifying the `car.png` file from the `demo` directory as an input image, the IR of your FP16 model, and a plugin for the hardware device to perform inference on.
+> **NOTE**: Running the sample application on hardware other than CPU requires performing [additional hardware configuration steps](#optional-steps).
+
+   - **For CPU**:
+   ```sh
+   ./classification_sample_async -i /opt/intel/openvino/deployment_tools/demo/car.png -m ~/openvino_models/ir/public/squeezenet1.1/FP16/squeezenet1.1.xml -d CPU
+   ```
+
+   - **For GPU**:
+   ```sh
+   ./classification_sample_async -i /opt/intel/openvino/deployment_tools/demo/car.png -m ~/openvino_models/ir/public/squeezenet1.1/FP16/squeezenet1.1.xml -d GPU
+   ```
+
+   - **For MYRIAD**:
+   > **NOTE**: Running inference on Intel® Neural Compute Stick 2 with the MYRIAD plugin requires performing [additional hardware configuration steps](#additional-NCS-steps).
+   ```sh
+   ./classification_sample_async -i /opt/intel/openvino/deployment_tools/demo/car.png -m ~/openvino_models/ir/public/squeezenet1.1/FP16/squeezenet1.1.xml -d MYRIAD
+   ```
+
+   - **For HDDL**:
+   > **NOTE**: Running inference on Intel® Vision Accelerator Design with Intel® Movidius™ VPUs with the HDDL plugin requires performing [additional hardware configuration steps](installing-openvino-linux-ivad-vpu.md).
+   ```sh
+   ./classification_sample_async -i /opt/intel/openvino/deployment_tools/demo/car.png -m ~/openvino_models/ir/public/squeezenet1.1/FP16/squeezenet1.1.xml -d HDDL
+   ```
+
+For information on Sample Applications, see the [Inference Engine Samples Overview](../IE_DG/Samples_Overview.md).
+
+Congratulations, you have finished the installation of the Intel® Distribution of OpenVINO™ toolkit for Linux*. To learn more about how the Intel® Distribution of OpenVINO™ toolkit works, the Hello World tutorial and other resources are provided below.
+
+## Hello World Face Detection Tutorial
+
+See the [OpenVINO™ Hello World Face Detection Exercise](https://github.com/intel-iot-devkit/inference-tutorials-generic).
+
+## Troubleshooting
+
+Developers in the People's Republic of China (PRC) might encounter pip installation-related issues during OpenVINO™ installation. To resolve them, use one of the following options at your discretion:
+* Add the download source with the `-i` parameter in the `pip` command. For example:
+```
+pip install numpy -i https://mirrors.aliyun.com/pypi/simple/
+```
+Use the `--trusted-host` parameter if the URL above is `http` instead of `https`.
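+For example, with the `http` variant of the mirror above, the command might look like this (a sketch; substitute the mirror you actually use):
+```
+pip install numpy -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
+```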
+ +* Modify or create `~/.pip/pip.conf` file to change the default download source with the content below: +``` +[global] +index-url = http://mirrors.aliyun.com/pypi/simple/ +[install] +trusted-host = mirrors.aliyun.com +``` + +## Additional Resources + +- Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) +- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) +- [Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) +- [Inference Engine Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide.html) +- For more information on Sample Applications, see the [Inference Engine Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html) +- For information on a set of pre-trained models, see the [Overview of OpenVINO™ Toolkit Pre-Trained Models](@ref omz_models_intel_index) +- For information on Inference Engine Tutorials, see the [Inference Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic) +- For IoT Libraries and Code Samples see the [Intel® IoT Developer Kit](https://github.com/intel-iot-devkit). + +To learn more about converting models, go to: + +- [Convert Your Caffe* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md) +- [Convert Your TensorFlow* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md) +- [Convert Your MXNet* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md) +- [Convert Your ONNX* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md) diff --git a/docs/install_guides/installing-openvino-macos.md b/docs/install_guides/installing-openvino-macos.md new file mode 100644 index 00000000000000..9da8173f28f904 --- /dev/null +++ b/docs/install_guides/installing-openvino-macos.md @@ -0,0 +1,313 @@ +# Install Intel® Distribution of OpenVINO™ toolkit for macOS* {#openvino_docs_install_guides_installing_openvino_macos} + +> **NOTES**: +> - The Intel® Distribution of OpenVINO™ is supported on macOS\* 10.14.x versions. +> - This installation has been validated on macOS 10.14.4. +> - An internet connection is required to follow the steps in this guide. If you have access to the Internet through the proxy server only, please make sure that it is configured in your OS environment. + +## Introduction + +The Intel® Distribution of OpenVINO™ toolkit quickly deploys applications and solutions that emulate human vision. Based on Convolutional Neural Networks (CNN), the toolkit extends computer vision (CV) workloads across Intel® hardware, maximizing performance. + +The Intel® Distribution of OpenVINO™ toolkit for macOS* includes the Intel® Deep Learning Deployment Toolkit (Intel® DLDT) and OpenCV* to deploy applications for accelerated inference on Intel® CPUs. 
+ +The Intel® Distribution of OpenVINO™ toolkit for macOS*: + +- Enables CNN-based deep learning inference on the edge +- Supports heterogeneous execution across Intel® CPU and Intel® Neural Compute Stick 2 with Intel® Movidius™ VPUs +- Speeds time-to-market via an easy-to-use library of computer vision functions and pre-optimized kernels +- Includes optimized calls for computer vision standards including OpenCV\* + +**Included with the Installation** + +The following components are installed by default: + +| Component | Description | +| :-------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) | This tool imports, converts, and optimizes models, which were trained in popular frameworks, to a format usable by Intel tools, especially the Inference Engine.
Popular frameworks include Caffe*, TensorFlow*, MXNet\*, and ONNX\*. | +| [Inference Engine](../IE_DG/inference_engine_intro.md) | This is the engine that runs a deep learning model. It includes a set of libraries for an easy inference integration into your applications. | +| [OpenCV\*](https://docs.opencv.org/master/) | OpenCV\* community version compiled for Intel® hardware | +| [Sample Applications](../IE_DG/Samples_Overview.md) | A set of simple console applications demonstrating how to use the Inference Engine in your applications. | +| [Demos](@ref omz_demos_README) | A set of console applications that demonstrate how you can use the Inference Engine in your applications to solve specific use-cases | +| [Additional Tools](../IE_DG/Tools_Overview.md) | A set of tools to work with your models | +| [Documentation for Pre-Trained Models ](@ref omz_models_intel_index) | Documentation for the pre-trained models available in the [Open Model Zoo repo](https://github.com/opencv/open_model_zoo) | + +## Development and Target Platform + +The development and target platforms have the same requirements, but you can select different components during the installation, based on your intended use. + +**Hardware** + +> **NOTE**: The current version of the Intel® Distribution of OpenVINO™ toolkit for macOS* supports inference on Intel CPUs and Intel® Neural Compute Sticks 2 only. + +* 6th to 10th generation Intel® Core™ processors and Intel® Xeon® processors +* Intel® Xeon® processor E family (formerly code named Sandy Bridge, Ivy Bridge, Haswell, and Broadwell) +* 3rd generation Intel® Xeon® Scalable processor (formerly code named Cooper Lake) +* Intel® Xeon® Scalable processor (formerly Skylake and Cascade Lake) +* Intel® Neural Compute Stick 2 + +**Software Requirements** + +- CMake 3.4 or higher +- Python 3.5 or higher +- Apple Xcode\* Command Line Tools +- (Optional) Apple Xcode\* IDE (not required for OpenVINO, but useful for development) + +**Operating Systems** + +- macOS\* 10.14.4 + +## Overview + +This guide provides step-by-step instructions on how to install the Intel® Distribution of OpenVINO™ 2020.1 toolkit for macOS*. + +The following steps will be covered: + +1. Install the Intel® Distribution of OpenVINO™ Toolkit . +2. Set the OpenVINO environment variables and (optional) Update to .bash_profile. +4. Configure the Model Optimizer. +5. Run verification scripts to verify installation and compile samples. + +## Install the Intel® Distribution of OpenVINO™ toolkit Core Components + +If you have a previous version of the Intel® Distribution of OpenVINO™ toolkit installed, rename or delete these two directories: + +- `/home//inference_engine_samples` +- `/home//openvino_models` + +[Download the latest version of OpenVINO toolkit for macOS*](https://software.intel.com/en-us/openvino-toolkit/choose-download/free-download-macos) then return to this guide to proceed with the installation. + +Install the OpenVINO toolkit core components: + +1. Go to the directory in which you downloaded the Intel® Distribution of OpenVINO™ toolkit. This document assumes this is your `Downloads` directory. By default, the disk image file is saved as `m_openvino_toolkit_p_.dmg`. + +2. Double-click the `m_openvino_toolkit_p_.dmg` file to mount. +The disk image is mounted to `/Volumes/m_openvino_toolkit_p_` and automatically opened in a separate window. + +3. Run the installation wizard application `m_openvino_toolkit_p_.app` + +4. 
On the **User Selection** screen, choose a user account for the installation: + - Root + - Administrator + - Current user + + ![](../img/openvino-install-macos-01.png) + + The default installation directory path depends on the privileges you choose for the installation. + +5. Click **Next** and follow the instructions on your screen. + +6. If you are missing external dependencies, you will see a warning screen. Take note of any dependencies you are missing. After installing the Intel® Distribution of OpenVINO™ toolkit core components, you will need to install the missing dependencies. For example, the screen example below indicates you are missing two dependencies: + ![](../img/openvino-install-macos-02.png) + +7. Click **Next**. + +8. The **Installation summary** screen shows you the default component set to install: + ![](../img/openvino-install-macos-03.png) + + - If you used **root** or **administrator** privileges to run the installer, it installs the OpenVINO toolkit to `/opt/intel/openvino_/` + + For simplicity, a symbolic link to the latest installation is also created: `/opt/intel/openvino/` + + - If you used **regular user** privileges to run the installer, it installs the OpenVINO toolkit to `/home//intel/openvino_/` + + For simplicity, a symbolic link to the latest installation is also created: `/home//intel/openvino/` + +9. If needed, click **Customize** to change the installation directory or the components you want to install: + ![](../img/openvino-install-macos-04.png) + + Click **Next** to save the installation options and show the Installation summary screen. + +10. On the **Installation summary** screen, press **Install** to begin the installation. + +11. When the first part of installation is complete, the final screen informs you that the core components have been installed + and additional steps still required: + ![](../img/openvino-install-macos-05.png) + +12. Click **Finish** to close the installation wizard. A new browser window opens to the next section of the Installation Guide to set the environment variables. If the installation did not indicate you must install dependencies, you can move ahead to [Set the Environment Variables](#set-the-environment-variables). If you received a message that you were missing external software dependencies, listed under **Software Requirements** at the top of this guide, you need to install them now before continuing on to the next section. + +## Set the Environment Variables + +You need to update several environment variables before you can compile and run OpenVINO™ applications. Open the macOS Terminal\* or a command-line interface shell you prefer and run the following script to temporarily set your environment variables: + + ```sh + source /opt/intel/openvino/bin/setupvars.sh + ``` + +Optional: The OpenVINO environment variables are removed when you close the shell. You can permanently set the environment variables as follows: + +1. Open the `.bash_profile` file in the current user home directory: + ```sh + vi ~/.bash_profile + ``` +2. Press the **i** key to switch to the insert mode. + +3. Add this line to the end of the file: + ```sh + source /opt/intel/openvino/bin/setupvars.sh + ``` + +3. Save and close the file: press the **Esc** key, type `:wq` and press the **Enter** key. + +4. To verify your change, open a new terminal. You will see `[setupvars.sh] OpenVINO environment initialized`. + +The environment variables are set. Continue to the next section to configure the Model Optimizer. 
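+
+As an alternative to editing the file in `vi`, you can append the same line from a terminal. A minimal sketch, equivalent to the optional steps above:
+```sh
+# Append the setupvars call to your shell profile and reload it in the current shell
+echo "source /opt/intel/openvino/bin/setupvars.sh" >> ~/.bash_profile
+source ~/.bash_profile
+```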
+ +## Configure the Model Optimizer + +The Model Optimizer is a Python\*-based command line tool for importing +trained models from popular deep learning frameworks such as Caffe\*, +TensorFlow\*, Apache MXNet\*, ONNX\* and Kaldi\*. + +The Model Optimizer is a key component of the OpenVINO toolkit. You cannot perform inference on your trained model without running the model through the Model Optimizer. When you run a pre-trained model through the Model Optimizer, your output is an Intermediate Representation (IR) of the network. The IR is a pair of files that describe the whole model: + +- `.xml`: Describes the network topology +- `.bin`: Contains the weights and biases binary data + +The Inference Engine reads, loads, and infers the IR files, using a common API on the CPU hardware. + +For more information about the Model Optimizer, see the [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). + +### Model Optimizer Configuration Steps + +You can choose to either configure the Model Optimizer for all supported frameworks at once, **OR** for one framework at a time. Choose the option that best suits your needs. If you see error messages, verify that you installed all dependencies listed under **Software Requirements** at the top of this guide. + +> **NOTE**: If you installed OpenVINO to a non-default installation directory, replace `/opt/intel/` with the directory where you installed the software. + +**Option 1: Configure the Model Optimizer for all supported frameworks at the same time:** + +1. Go to the Model Optimizer prerequisites directory: + ```sh + cd /opt/intel/openvino/deployment_tools/model_optimizer/install_prerequisites + ``` + +2. Run the script to configure the Model Optimizer for Caffe, TensorFlow, MXNet, Kaldi\*, and ONNX: + ```sh + sudo ./install_prerequisites.sh + ``` + +**Option 2: Configure the Model Optimizer for each framework separately:** + +Configure individual frameworks separately **ONLY** if you did not select **Option 1** above. + +1. Go to the Model Optimizer prerequisites directory: + ```sh + cd /opt/intel/openvino/deployment_tools/model_optimizer/install_prerequisites + ``` + +2. Run the script for your model framework. You can run more than one script: + + - For **Caffe**: + ```sh + sudo ./install_prerequisites_caffe.sh + ``` + + - For **TensorFlow**: + ```sh + sudo ./install_prerequisites_tf.sh + ``` + + - For **MXNet**: + ```sh + sudo ./install_prerequisites_mxnet.sh + ``` + + - For **ONNX**: + ```sh + sudo ./install_prerequisites_onnx.sh + ``` + + - For **Kaldi**: + ```sh + sudo ./install_prerequisites_kaldi.sh + ``` + +The Model Optimizer is configured for one or more frameworks. + +You are ready to verify the installation by running the verification scripts. + +## Run the Verification Scripts to Verify Installation and Compile Samples + +> **NOTES**: +> - The steps shown here assume you used the default installation directory to install the OpenVINO toolkit. If you installed the software to a directory other than `/opt/intel/`, update the directory path with the location where you installed the toolkit. +> - If you installed the product as a root user, you must switch to the root mode before you continue: `sudo -i`. + +To verify the installation and compile two Inference Engine samples, run the verification applications provided with the product on the CPU: + +### Run the Image Classification Verification Script + +1. 
Go to the **Inference Engine demo** directory: + ```sh + cd /opt/intel/openvino/deployment_tools/demo + ``` + +2. Run the **Image Classification verification script**: + ```sh + ./demo_squeezenet_download_convert_run.sh + ``` + +The Image Classification verification script downloads a public SqueezeNet Caffe* model and runs the Model Optimizer to convert the model to `.bin` and `.xml` Intermediate Representation (IR) files. The Inference Engine requires this model conversion so it can use the IR as input and achieve optimum performance on Intel hardware. + +This verification script creates the directory `/home//inference_engine_samples/`, builds the [Image Classification Sample](../../inference-engine/samples/classification_sample_async/README.md) application and runs with the model IR and `car.png` image located in the `demo` directory. When the verification script completes, you will have the label and confidence for the top-10 categories: + +![](../img/image_classification_script_output_lnx.png) + +For a brief description of the Intermediate Representation `.bin` and `.xml` files, see [Configuring the Model Optimizer](#configure-the-model-optimizer). + +This script is complete. Continue to the next section to run the Inference Pipeline verification script. + +### Run the Inference Pipeline Verification Script + +While still in `/opt/intel/openvino/deployment_tools/demo/`, run the Inference Pipeline verification script: + ```sh + ./demo_security_barrier_camera.sh + ``` + +This verification script downloads three pre-trained model IRs, builds the [Security Barrier Camera Demo](@ref omz_demos_security_barrier_camera_demo_README) application and runs it with the downloaded models and the `car_1.bmp` image from the `demo` directory to show an inference pipeline. The verification script uses vehicle recognition in which vehicle attributes build on each other to narrow in on a specific attribute. + +First, an object is identified as a vehicle. This identification is used as input to the next model, which identifies specific vehicle attributes, including the license plate. Finally, the attributes identified as the license plate are used as input to the third model, which recognizes specific characters in the license plate. + +When the verification script completes, you will see an image that displays the resulting frame with detections rendered as bounding boxes, and text: +![](../img/inference_pipeline_script_mac.png) + +Close the image viewer screen to end the demo. + +**Congratulations**, you have completed the Intel® Distribution of OpenVINO™ 2020.1 installation for macOS. To learn more about what you can do with the Intel® Distribution of OpenVINO™ toolkit, see the additional resources provided below. + +## Steps for Intel® Neural Compute Stick 2 + +These steps are only required if you want to perform inference on Intel® Neural Compute Stick 2 +powered by the Intel® Movidius™ Myriad™ X VPU. See also the +[Get Started page for Intel® Neural Compute Stick 2](https://software.intel.com/en-us/neural-compute-stick/get-started). + +To perform inference on Intel® Neural Compute Stick 2, the `libusb` library is required. You can build it from the [source code](https://github.com/libusb/libusb) or install using the macOS package manager you prefer: [Homebrew*](https://brew.sh/), [MacPorts*](https://www.macports.org/) or other. 
+ +For example, to install the `libusb` library using Homebrew\*, use the following command: +```sh +brew install libusb +``` + +## Hello World Tutorials + +Visit the Intel Distribution of OpenVINO Toolkit [Inference Tutorials for Face Detection and Car Detection Exercises](https://github.com/intel-iot-devkit/inference-tutorials-generic/tree/openvino_toolkit_r3_0) + + +## Additional Resources + +- To learn more about the verification applications, see `README.txt` in `/opt/intel/openvino/deployment_tools/demo/`. + +- For detailed description of the pre-trained models, go to the [Overview of OpenVINO toolkit Pre-Trained Models](@ref omz_models_intel_index) page. + +- More information on [sample applications](../IE_DG/Samples_Overview.md). + +- [Convert Your Caffe* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md) + +- [Convert Your TensorFlow* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md) + +- [Convert Your MXNet* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md) + +- [Convert Your ONNX* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md) + +- [Intel Distribution of OpenVINO Toolkit home page](https://software.intel.com/en-us/openvino-toolkit) + +- [Intel Distribution of OpenVINO Toolkit documentation](https://docs.openvinotoolkit.org) diff --git a/docs/install_guides/installing-openvino-raspbian.md b/docs/install_guides/installing-openvino-raspbian.md new file mode 100644 index 00000000000000..d21855c92e1fb7 --- /dev/null +++ b/docs/install_guides/installing-openvino-raspbian.md @@ -0,0 +1,193 @@ +# Install OpenVINO™ toolkit for Raspbian* OS {#openvino_docs_install_guides_installing_openvino_raspbian} + +> **NOTE**: +> - These steps apply to 32-bit Raspbian\* OS, which is an official OS for Raspberry Pi\* boards. +> - These steps have been validated with Raspberry Pi 3*. +> - All steps in this guide are required unless otherwise stated. +> - An internet connection is required to follow the steps in this guide. If you have access to the Internet through the proxy server only, please make sure that it is configured in your OS environment. + +## Introduction + +The OpenVINO™ toolkit quickly deploys applications and solutions that emulate human vision. Based on Convolutional Neural Networks (CNN), the toolkit extends computer vision (CV) workloads across Intel® hardware, maximizing performance. The OpenVINO toolkit includes the Intel® Deep Learning Deployment Toolkit (Intel® DLDT). + +The OpenVINO™ toolkit for Raspbian* OS includes the Inference Engine and the MYRIAD plugins. You can use it with the Intel® Neural Compute Stick 2 plugged in one of USB ports. + +### Included in the Installation Package + +The OpenVINO toolkit for Raspbian OS is an archive with pre-installed header files and libraries. The following components are installed by default: + +| Component | Description | +| :-------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| [Inference Engine](../IE_DG/inference_engine_intro.md) | This is the engine that runs the deep learning model. It includes a set of libraries for an easy inference integration into your applications. 
| +| [OpenCV\*](https://docs.opencv.org/master/) | OpenCV\* community version compiled for Intel® hardware. | +| [Sample Applications](../IE_DG/Samples_Overview.md) | A set of simple console applications demonstrating how to use Intel's Deep Learning Inference Engine in your applications. | + +> **NOTE**: +> * The package does not include the [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). To convert models to Intermediate Representation (IR), you need to install it separately to your host machine. +> * The package does not include the Open Model Zoo demo applications. You can download them separately from the [Open Models Zoo repository](https://github.com/opencv/open_model_zoo). + +## Development and Target Platforms + +**Hardware** + +- Raspberry Pi\* board with ARM* ARMv7-A CPU architecture. Check that `uname -m` returns `armv7l`. +- One of Intel® Movidius™ Visual Processing Units (VPU): + - Intel® Neural Compute Stick 2 + +> **NOTE**: With OpenVINO™ 2020.4 release, Intel® Movidius™ Neural Compute Stick is no longer supported. + +**Operating Systems** + +- Raspbian\* Buster, 32-bit +- Raspbian\* Stretch, 32-bit + +**Software** + +- CMake* 3.7.2 or higher +- Python* 3.5, 32-bit + + +## Overview + +This guide provides step-by-step instructions on how to install the OpenVINO™ toolkit for Raspbian* OS. Links are provided for each type of compatible hardware including downloads, initialization and configuration steps. The following steps will be covered: + +1. [Install the OpenVINO™ toolkit](#install-package) +2. [Install External Software Dependencies](#install-dependencies) +3. [Set the environment variables](#set-environment-variables) +4. [Add USB rules](#add-usb-rules) +5. [Run the Object Detection Sample](#run-sample) to validate Inference Engine installation +6. [Learn About Workflow for Raspberry Pi](#workflow-for-raspberry-pi) + +## Install the OpenVINO™ Toolkit for Raspbian* OS Package + +The guide assumes you downloaded the OpenVINO toolkit for Raspbian* OS. If you do not have a copy of the toolkit package file `l_openvino_toolkit_runtime_raspbian_p_.tgz`, download the latest version from the [Intel® Open Source Technology Center](https://download.01.org/opencv/2020/openvinotoolkit/) and then return to this guide to proceed with the installation. + +> **NOTE**: The OpenVINO toolkit for Raspbian OS is distributed without installer, so you need to perform extra steps comparing to the [Intel® Distribution of OpenVINO™ toolkit for Linux* OS](installing-openvino-linux.md). + +1. Open the Terminal\* or your preferred console application. + +2. Go to the directory in which you downloaded the OpenVINO toolkit. This document assumes this is your `~/Downloads` directory. If not, replace `~/Downloads` with the directory where the file is located. +```sh +cd ~/Downloads/ +``` +By default, the package file is saved as `l_openvino_toolkit_runtime_raspbian_p_.tgz`. + +3. Create an installation folder. +```sh +sudo mkdir -p /opt/intel/openvino +``` + +4. Unpack the archive: +```sh +sudo tar -xf l_openvino_toolkit_runtime_raspbian_p_.tgz --strip 1 -C /opt/intel/openvino +``` + +Now the OpenVINO toolkit components are installed. Additional configuration steps are still required. Continue to the next sections to install External Software Dependencies, configure the environment and set up USB rules. + +## Install External Software Dependencies + +CMake* version 3.7.2 or higher is required for building the Inference Engine sample application. 
To install, open a Terminal* window and run the following command: +```sh +sudo apt install cmake +``` + +CMake is installed. Continue to the next section to set the environment variables. + +## Set the Environment Variables + +You must update several environment variables before you can compile and run OpenVINO toolkit applications. Run the following script to temporarily set the environment variables: +```sh +source /opt/intel/openvino/bin/setupvars.sh +``` + +**(Optional)** The OpenVINO environment variables are removed when you close the shell. As an option, you can permanently set the environment variables as follows: +```sh +echo "source /opt/intel/openvino/bin/setupvars.sh" >> ~/.bashrc +``` + +To test your change, open a new terminal. You will see the following: +``` +[setupvars.sh] OpenVINO environment initialized +``` + +Continue to the next section to add USB rules for Intel® Neural Compute Stick 2 devices. + +## Add USB Rules + +1. Add the current Linux user to the `users` group: +```sh +sudo usermod -a -G users "$(whoami)" +``` +Log out and log in for it to take effect. + +2. If you didn't modify `.bashrc` to permanently set the environment variables, run `setupvars.sh` again after logging in: +```sh +source /opt/intel/openvino/bin/setupvars.sh +``` + +3. To perform inference on the Intel® Neural Compute Stick 2, install the USB rules running the `install_NCS_udev_rules.sh` script: +```sh +sh /opt/intel/openvino/install_dependencies/install_NCS_udev_rules.sh +``` +4. Plug in your Intel® Neural Compute Stick 2. + +You are ready to compile and run the Object Detection sample to verify the Inference Engine installation. + +## Build and Run Object Detection Sample + +Follow the next steps to run pre-trained Face Detection network using Inference Engine samples from the OpenVINO toolkit. + +1. Navigate to a directory that you have write access to and create a samples build directory. This example uses a directory named `build`: +```sh +mkdir build && cd build +``` + +2. Build the Object Detection Sample: +```sh +cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-march=armv7-a" /opt/intel/openvino/deployment_tools/inference_engine/samples +``` +```sh +make -j2 object_detection_sample_ssd +``` + +3. Download the pre-trained Face Detection model or copy it from the host machine: + + - To download the `.bin` file with weights: + ```sh + wget --no-check-certificate https://download.01.org/opencv/2020/openvinotoolkit/2020.1/open_model_zoo/models_bin/1/face-detection-adas-0001/FP16/face-detection-adas-0001.bin + ``` + + - To download the `.xml` file with the network topology: + ```sh + wget --no-check-certificate https://download.01.org/opencv/2020/openvinotoolkit/2020.1/open_model_zoo/models_bin/1/face-detection-adas-0001/FP16/face-detection-adas-0001.xml + ``` + +4. Run the sample with specifying the model and a path to the input image: +```sh +./armv7l/Release/object_detection_sample_ssd -m face-detection-adas-0001.xml -d MYRIAD -i +``` +The application outputs an image (`out_0.bmp`) with detected faced enclosed in rectangles. + +Congratulations, you have finished the OpenVINO™ toolkit for Raspbian* OS installation. You have completed all required installation, configuration and build steps in this guide. + +Read the next topic if you want to learn more about OpenVINO workflow for Raspberry Pi. 
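+
+Before moving on, here is what the full command from step 4 can look like with a concrete input; the image path is hypothetical, so substitute any photo that contains faces:
+```sh
+# /home/pi/photo.jpg is a placeholder path; use your own input image
+./armv7l/Release/object_detection_sample_ssd -m face-detection-adas-0001.xml -d MYRIAD -i /home/pi/photo.jpg
+```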
+ +## Workflow for Raspberry Pi* + +If you want to use your model for inference, the model must be converted to the .bin and .xml Intermediate Representation (IR) files that are used as input by Inference Engine. OpenVINO™ toolkit support on Raspberry Pi only includes the Inference Engine module of the Intel® Distribution of OpenVINO™ toolkit. The Model Optimizer is not supported on this platform. To get the optimized models you can use one of the following options: + +* Download a set of ready-to-use pre-trained models for the appropriate version of OpenVINO from the Intel® Open Source Technology Center: + + * Models for the 2020.1 release of OpenVINO are available at [https://download.01.org/opencv/2020/openvinotoolkit/2020.1/open_model_zoo/](https://download.01.org/opencv/2020/openvinotoolkit/2020.1/open_model_zoo/). + * Models for the 2019 R1 release of OpenVINO are available at [https://download.01.org/opencv/2019/open_model_zoo/R1/](https://download.01.org/opencv/2019/open_model_zoo/R1/). + * Models for the 2018 R5 release of OpenVINO are available at [https://download.01.org/openvinotoolkit/2018_R5/open_model_zoo/](https://download.01.org/openvinotoolkit/2018_R5/open_model_zoo/). + + For more information on pre-trained models, see [Pre-Trained Models Documentation](@ref omz_models_intel_index) + +* Convert the model using the Model Optimizer from a full installation of Intel® Distribution of OpenVINO™ toolkit on one of the supported platforms. Installation instructions are available: + + * [Installation Guide for macOS*](installing-openvino-macos.md) + * [Installation Guide for Windows*](installing-openvino-windows.md) + * [Installation Guide for Linux*](installing-openvino-linux.md) + + For more information about how to use the Model Optimizer, see the [Model Optimizer Developer Guide](https://software.intel.com/articles/OpenVINO-ModelOptimizer) diff --git a/docs/install_guides/installing-openvino-windows-fpga.md b/docs/install_guides/installing-openvino-windows-fpga.md new file mode 100644 index 00000000000000..d2ca21de7c46e5 --- /dev/null +++ b/docs/install_guides/installing-openvino-windows-fpga.md @@ -0,0 +1,430 @@ +# Install Intel® Distribution of OpenVINO™ toolkit for Windows* with FPGA Support {#openvino_docs_install_guides_installing_openvino_windows_fpga} + +**NOTES**: +- These steps apply to Microsoft Windows 10*. +- For the Intel Distribution of OpenVINO toolkit for Windows* without FPGA +support, see [Installation Guide for Windows*](installing-openvino-windows.md). +- An internet connection is required to follow the steps in this guide. +- [Intel® System Studio](https://software.intel.com/en-us/system-studio) is an all-in-one, cross-platform tool suite, purpose-built to simplify system bring-up and improve system and IoT device application performance on Intel® platforms. If you are using the Intel® Distribution of OpenVINO™ with Intel® System Studio, go to [Get Started with Intel® System Studio](https://software.intel.com/en-us/articles/get-started-with-openvino-and-intel-system-studio-2019). + +## Introduction + +> **IMPORTANT**: +> - All steps in this guide are required, unless otherwise stated.
+> - In addition to the download package, you must install dependencies and complete configuration steps. + +Your installation is complete when these are all completed: + +1. Install the Intel® Distribution of OpenVINO™ toolkit core components + +2. Install the dependencies: + + - [Microsoft Visual Studio* with C++ **2019, 2017, or 2015** with MSBuild](http://visualstudio.microsoft.com/downloads/) + - [CMake **3.4 or higher** 64-bit](https://cmake.org/download/) + > **NOTE**: If you want to use Microsoft Visual Studio 2019, you are required to install CMake 3.14. + - [Python **3.6.5** 64-bit](https://www.python.org/downloads/release/python-365/) + > **IMPORTANT**: As part of this installation, make sure you click the option to add the application to your `PATH` environment variable. + +3. Set Environment Variables + +4. Configure the Model Optimizer + +5. Run two Verification Scripts to Verify Installation + +6. Install your compatible hardware from the list of supported hardware
+ +7. Use the Face Detection Tutorial + +### About the Intel® Distribution of OpenVINO™ toolkit + +The Intel® Distribution of OpenVINO™ toolkit speeds the deployment of applications and solutions that emulate human vision. Based on Convolutional Neural Networks (CNN), the toolkit extends computer vision (CV) workloads across Intel® hardware to maximize performance. + +The Intel® Distribution of OpenVINO™ toolkit includes the Intel® Deep Learning Deployment Toolkit (Intel® DLDT). For more information, see the online [Intel® Distribution of OpenVINO™ toolkit Overview](https://software.intel.com/en-us/OpenVINO-toolkit) page. + +The Intel® Distribution of OpenVINO™ toolkit for Windows\* with FPGA Support: + +- Enables CNN-based deep learning inference on the edge +- Supports heterogeneous execution across Intel® CPU, Intel® Integrated Graphics, Intel® FPGA, Intel® Neural Compute Stick 2 +- Speeds time-to-market via an easy-to-use library of computer vision functions and pre-optimized kernels +- Includes optimized calls for computer vision standards including OpenCV\* and OpenCL™ + +#### Included in the Installation Package + +The following components are installed by default: + +| Component | Description | +|-----------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) | This tool imports, converts, and optimizes models that were trained in popular frameworks to a format usable by Intel tools, especially the Inference Engine. 
Popular frameworks include Caffe\*, TensorFlow\*, MXNet\*, and ONNX\*. | +| [Inference Engine](../IE_DG/inference_engine_intro.md) | This is the engine that runs the deep learning model. It includes a set of libraries for an easy inference integration into your applications. | +| Pre-compiled FPGA bitstream samples | Pre-compiled bitstream samples for the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA, and Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA SG2. | +| Intel® FPGA SDK for OpenCL™ software technology | The Intel® FPGA RTE for OpenCL™ provides utilities, host runtime libraries, drivers, and RTE-specific libraries and files | +| [OpenCV](https://docs.opencv.org/master/) | OpenCV\* community version compiled for Intel® hardware | +| [Inference Engine Code Samples](../IE_DG/Samples_Overview.md) | A set of simple console applications demonstrating how to utilize specific OpenVINO capabilities in an application and how to perform specific tasks, such as loading a model, running inference, querying specific device capabilities, and more. | +| [Demo Applications](@ref omz_demos_README) | A set of simple console applications that provide robust application templates to help you implement specific deep learning scenarios. | + + +### System Requirements + +The development and target platforms have the same requirements, but you can select different components during the installation, based on your intended use. + +**Hardware** + +* 6th to 10th generation Intel® Core™ processors and Intel® Xeon® processors +* Intel® Xeon® processor E family (formerly code named Sandy Bridge, Ivy Bridge, Haswell, and Broadwell) +* 3rd generation Intel® Xeon® Scalable processor (formerly code named Cooper Lake) +* Intel® Xeon® Scalable processor (formerly Skylake and Cascade Lake) +* Intel Atom® processor with support for Intel® Streaming SIMD Extensions 4.1 (Intel® SSE4.1) +* Intel Pentium® processor N4200/5, N3350/5, or N3450/5 with Intel® HD Graphics +* Intel® Neural Compute Stick 2 +* Intel® Vision Accelerator Design with Intel® Movidius™ VPUs +* Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Mustang-F100-A10) SG2 + +> **NOTE**: With OpenVINO™ 2020.4 release, Intel® Movidius™ Neural Compute Stick is no longer supported. + +> **NOTE**: With OpenVINO™ 2020.4 release, Intel® Programmable Acceleration Card (PAC) with Intel® Arria® 10 GX FPGA is no longer supported on Windows. + +**Processor Notes:** + +- Processor graphics are not included in all processors. See [Product Specifications](https://ark.intel.com/) for information about your processor. +- A chipset that supports processor graphics is required for Intel® Xeon® processors. + +**Operating Systems:** + +- Microsoft Windows 10*, 64-bit + +**Software** +- [Microsoft Visual Studio* with C++ **2019, 2017, or 2015** with MSBuild](http://visualstudio.microsoft.com/downloads/) +- [CMake **3.4 or higher** 64-bit](https://cmake.org/download/) + > **NOTE**: If you want to use Microsoft Visual Studio 2019, you are required to install CMake 3.14. +- [Python **3.6.5** 64-bit](https://www.python.org/downloads/release/python-365/) + +## Installation Steps + +### Install the Intel® Distribution of OpenVINO™ toolkit Core Components + +1. If you have not downloaded the Intel® Distribution of OpenVINO™ toolkit for Windows* with FPGA Support, [download the latest version](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit/choose-download/windows-fpga.html). 
By default, the file is saved to the `Downloads` directory as `w_openvino_toolkit_fpga_p_.exe`. +Select the Intel® Distribution of OpenVINO™ toolkit for Windows with FPGA Support package from the dropdown menu. + +2. Go to the `Downloads` folder and double-click `w_openvino_toolkit_fpga_p_.exe`. A window opens to let you choose your installation directory and components. You can also select only the bitstreams for your card. This will allow you to minimize the size of the installation by several gigabytes. The default installation directory is `C:\Program Files (x86)\IntelSWTools\openvino_`, for simplicity, a shortcut to the latest installation is also created: `C:\Program Files (x86)\IntelSWTools\openvino`. If you choose a different installation directory, the installer will create the directory for you. For the default options, the **Installation summary** GUI screen looks like this:: + + ![](../img/openvino-install-windows-fpga-01.png) + +3. Click **Next**. + +4. You are asked if you want to provide consent to gather information. Choose the option of your choice. Click **Next**. + +5. If you are missing external dependencies, you will see a warning screen. Write down the dependencies you are missing. **You need to take no other action at this time**. After installing the Intel® Distribution of OpenVINO™ toolkit core components, install the missing dependencies. +The screen example below indicates you are missing one dependency: + + ![](../img/openvino-install-windows-fpga-02.png) + +6. Click **Next**. + +7. When the first part of installation is complete, the final screen informs you that the core components have been installed and additional steps still required: + + ![](../img/openvino-install-windows-fpga-03.png) + +8. Click **Finish** to close the installation wizard. A new browser window opens to the next section of the installation guide to set the environment variables. You are in the same document. The new window opens in case you ran the installation without first opening this installation guide. + +9. If the installation indicated you must install dependencies, install them first. If there are no missing dependencies, you can go ahead and set the environment variables. + +### Set the Environment Variables + +> **NOTE**: If you installed the Intel® Distribution of OpenVINO™ to the non-default install directory, replace `C:\Program Files (x86)\IntelSWTools` with the directory in which you installed the software. + +You must update several environment variables before you can compile and run OpenVINO™ applications. Open the Command Prompt, and run the `setupvars.bat` batch file to temporarily set your environment variables: +```sh +cd C:\Program Files (x86)\IntelSWTools\openvino\bin\ +``` + +```sh +setupvars.bat +``` + +(Optional): OpenVINO toolkit environment variables are removed when you close the Command Prompt window. As an option, you can permanently set the environment variables manually. + +The environment variables are set. Continue to the next section to configure the Model Optimizer. + +## Configure the Model Optimizer + +> **IMPORTANT**: These steps are required. You must configure the Model Optimizer for at least one framework. The Model Optimizer will fail if you do not complete the steps in this section. + +> **NOTE**: If you see an error indicating Python is not installed when you know you installed it, your computer might not be able to find the program. 
For the instructions to add Python to your system environment variables, see Update Your Windows Environment Variables.
+
+The Model Optimizer is a key component of the Intel® Distribution of OpenVINO™ toolkit. You cannot perform inference on your trained model without running the model through the Model Optimizer. When you run a pre-trained model through the Model Optimizer, your output is an Intermediate Representation (IR) of the network. The IR is a pair of files that describe the whole model:
+
+- `.xml`: Describes the network topology
+- `.bin`: Contains the weights and biases binary data
+
+The Inference Engine reads, loads, and infers the IR files, using a common API across the CPU, GPU, or VPU hardware.
+
+The Model Optimizer is a Python*-based command line tool (`mo.py`), which is located in `C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer`. Use this tool on models trained with popular deep learning frameworks such as Caffe\*, TensorFlow\*, MXNet\*, and ONNX\* to convert them to an optimized IR format that the Inference Engine can use.
+
+This section explains how to use scripts to configure the Model Optimizer either for all of the supported frameworks at the same time or for individual frameworks. If you want to manually configure the Model Optimizer instead of using scripts, see the **Using Manual Configuration Process** section on the [Configuring the Model Optimizer](../MO_DG/prepare_model/Config_Model_Optimizer.md) page.
+
+For more information about the Model Optimizer, see the [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).
+
+### Model Optimizer Configuration Steps
+
+You can configure the Model Optimizer either for all supported frameworks at once or for one framework at a time. Choose the option that best suits your needs. If you see error messages, make sure you installed all dependencies.
+
+> **IMPORTANT**: Internet access is required to execute the following steps successfully. If you have access to the Internet through a proxy server only, please make sure that it is configured in your environment.
+
+> **NOTE**:
+> In the steps below:
+> - If you want to use the Model Optimizer from another installed version of the Intel® Distribution of OpenVINO™ toolkit, replace `openvino` with `openvino_`.
+> - If you installed the Intel® Distribution of OpenVINO™ toolkit to a non-default installation directory, replace `C:\Program Files (x86)\IntelSWTools` with the directory where you installed the software.
+
+These steps use a command prompt so that you can see any error messages.
+
+#### Option 1: Configure the Model Optimizer for all supported frameworks at the same time:
+
+1. Open a command prompt. To do so, type `cmd` in your **Search Windows** box and then press **Enter**.
+Type commands in the opened window:
+
+   ![](../img/command_prompt.PNG)
+
+2. Go to the Model Optimizer prerequisites directory:
+```sh +cd C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\install_prerequisites +``` + +3. Run the following batch file to configure the Model Optimizer for Caffe\*, TensorFlow\*, MXNet\*, Kaldi\*, and ONNX\*:
+```sh +install_prerequisites.bat +``` + +#### Option 2: Configure the Model Optimizer for each framework separately: + +1. Go to the Model Optimizer prerequisites directory:
+```sh +cd C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\install_prerequisites +``` + +2. Run the batch file for the framework you will use with the Model Optimizer. You can use more than one: + + * For **Caffe**:
+ ```sh + install_prerequisites_caffe.bat + ``` + + * For **TensorFlow**:
+ ```sh + install_prerequisites_tf.bat + ``` + + * For **MXNet**:
+ ```sh + install_prerequisites_mxnet.bat + ``` + + * For **ONNX**: + ```sh + install_prerequisites_onnx.bat + ``` + + * For **Kaldi**: + ```sh + install_prerequisites_kaldi.bat + ``` + +The Model Optimizer is configured for one or more frameworks. Success is indicated by a screen similar to this: + +![](../img/Configure-MO.PNG) + +You are ready to use two short demos to see the results of running the Intel Distribution of OpenVINO toolkit and to verify your installation was successful. The demo scripts are required since they perform additional configuration steps. Continue to the next section. + +If you want to use a GPU or VPU, or update your Windows* environment variables, read through the Optional Steps section. + + +## Use Verification Scripts to Verify Your Installation + +> **IMPORTANT**: This section is required. In addition to confirming your installation was successful, demo scripts perform other steps, such as setting up your computer to use the Inference Engine samples. + +> **NOTE**: +> The paths in this section assume you used the default installation directory. If you used a directory other than `C:\Program Files (x86)\IntelSWTools`, update the directory with the location where you installed the software. +To verify the installation and compile two samples, run the verification applications provided with the product on the CPU: + +1. Open a command prompt window. + +2. Go to the Inference Engine demo directory:
+ ```sh + cd C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo\ + ``` + +3. Run the verification scripts by following the instructions in the next section. + + +### Run the Image Classification Verification Script + +To run the script, start the `demo_squeezenet_download_convert_run.bat` file: +```sh +demo_squeezenet_download_convert_run.bat +``` + +This script downloads a SqueezeNet model, uses the Model Optimizer to convert the model to the `.‍bin` and `.‍xml` Intermediate Representation (IR) files. The Inference Engine requires this model conversion so it can use the IR as input and achieve optimum performance on Intel hardware.
+This verification script builds the [Image Classification Sample Async](../../inference-engine/samples/classification_sample_async/README.md) application and run it with the `car.png` image in the demo directory. For a brief description of the Intermediate Representation, see Configuring the Model Optimizer. + +When the verification script completes, you will have the label and confidence for the top-10 categories: +![](../img/image_classification_script_output_win.png) + +This demo is complete. Leave the console open and continue to the next section to run the Inference Pipeline demo. + + +### Run the Inference Pipeline Verification Script + +To run the script, start the `demo_security_barrier_camera.bat` file while still in the console: +```sh +demo_security_barrier_camera.bat +``` + +This script downloads three pre-trained model IRs, builds the [Security Barrier Camera Demo](@ref omz_demos_security_barrier_camera_demo_README) application, and runs it with the downloaded models and the `car_1.bmp` image from the `demo` directory to show an inference pipeline. The verification script uses vehicle recognition in which vehicle attributes build on each other to narrow in on a specific attribute. + +First, an object is identified as a vehicle. This identification is used as input to the next model, which identifies specific vehicle attributes, including the license plate. Finally, the attributes identified as the license plate are used as input to the third model, which recognizes specific characters in the license plate. + +When the demo completes, you have two windows open: + + * A console window that displays information about the tasks performed by the demo + * An image viewer window that displays a resulting frame with detections rendered as bounding boxes, similar to the following: + + ![](../img/inference_pipeline_script_win.png) + +Close the image viewer window to end the demo. + +To learn more about the verification scripts, see `README.txt` in `C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo`. + +For detailed description of the OpenVINO™ pre-trained object detection and object recognition models, see the [Overview of OpenVINO™ toolkit Pre-Trained Models](@ref omz_models_intel_index) page. + +In this section, you saw a preview of the Intel® Distribution of OpenVINO™ toolkit capabilities. + +Congratulations. You have completed all the required installation, configuration, and build steps to work with your trained models using CPU. + +If you want to use Intel® Processor graphics (GPU), Intel® Neural Compute Stick 2 or Intel® Vision Accelerator Design with Intel® Movidius™ (VPU), or add CMake* and Python* to your Windows* environment variables, read through the next section for additional steps. + +If you want to continue and run the Image Classification Sample Application on one of the supported hardware device, see the [Run the Image Classification Sample Application](#run-the-image-classification-sample-application) section. + +## Install and Configure Your Compatible FPGA Hardware + +Install your compatible hardware from the list of supported components below. + +> **NOTE**: Once you've completed your hardware installation, you'll return to this guide to finish installation and configuration of the Intel® Distribution of OpenVINO™ toolkit. 
+ +Links to install and configure compatible hardware +- [The Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA SG2 (Mustang-F100-A10)](VisionAcceleratorFPGA_Configure_Windows.md) + +Congratulations, you have finished the Intel® Distribution of OpenVINO™ toolkit installation for FPGA. To learn more about how the Intel® Distribution of OpenVINO™ toolkit works, the Hello World tutorial and other resources are provided below. + +## Optional Steps + +Use the optional steps below if you want to: +* Infer models on Intel® Processor Graphics +* Infer models on Intel® Vision Accelerator Design with Intel® Movidius™ VPUs +* Add CMake* or Python* to your Windows* environment variables. + +### Optional: Additional Installation Steps for Intel® Processor Graphics (GPU) + +> **NOTE**: These steps are required only if you want to use a GPU. + +If your applications offload computation to Intel® Integrated Graphics, you must have the Intel Graphics Driver for Windows version 15.65 or higher. To see if you have this driver installed: + +1. Type **device manager** in your **Search Windows** box. The **Device Manager** opens. + +2. Click the drop-down arrow to view the **Display adapters**. You see the adapter that is installed in your computer: + + ![](../img/DeviceManager.PNG) + +3. Right-click the adapter name and select **Properties**. + +4. Click the **Driver** tab to see the driver version. Make sure the version number is 15.65 or higher. + + ![](../img/DeviceDriverVersion.PNG) + +5. If your device driver version is lower than 15.65, [download and install a higher version](http://downloadcenter.intel.com/product/80939/Graphics-Drivers). + +You are done updating your device driver and are ready to use your GPU. + + +### Optional: Additional Installation Steps for the Intel® Vision Accelerator Design with Intel® Movidius™ VPUs + +> **NOTE**: These steps are required only if you want to use Intel® Vision Accelerator Design with Intel® Movidius™ VPUs. + +To perform inference on Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, the following additional installation steps are required: + + 1. If your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs card requires SMBUS connection to PCIe slot (Raw video data card with HW version Fab-B and before), install the SMBUS driver: + 1. Go to the `\deployment_tools\inference-engine\external\hddl\SMBusDriver` directory, where `` is the directory in which the Intel Distribution of OpenVINO toolkit is installed. + 2. Right click on the `hddlsmbus.inf` file and choose **Install** from the pop up menu. + + 2. Download and install Visual C++ Redistributable for Visual Studio 2015 + +You are done installing your device driver and are ready to use your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs. + +See also: + +* For advanced configuration steps for your IEI Mustang-V100-MX8 accelerator, see [Intel® Movidius™ VPUs Setup Guide for Use with Intel® Distribution of OpenVINO™ toolkit](movidius-setup-guide.md). + +* After you've configurated your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, see [Intel® Movidius™ VPUs Programming Guide for Use with Intel® Distribution of OpenVINO™ toolkit](movidius-programming-guide.md) to learn how to distribute a model across all 8 VPUs to maximize performance. + +After configuration is done, you are ready to run the verification scripts with the HDDL Plugin for your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs. + +1. Open a command prompt window. + +2. 
Go to the Inference Engine demo directory: + ```sh + cd C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo\ + ``` +3. Run the Image Classification verification script. If you have access to the Internet through the proxy server only, please make sure that it is configured in your environment. + ```sh + demo_squeezenet_download_convert_run.bat -d HDDL + ``` +4. Run the Inference Pipeline verification script: + ```sh + demo_security_barrier_camera.bat -d HDDL + ``` + +### Optional: Update Your Windows Environment Variables + +> **NOTE**: These steps are only required under special circumstances, such as if you forgot to check the box during the CMake\* or Python\* installation to add the application to your Windows `PATH` environment variable. + +Use these steps to update your Windows `PATH` if a command you execute returns an error message stating that an application cannot be found. This might happen if you do not add CMake or Python to your `PATH` environment variable during the installation. + +1. In your **Search Windows** box, type **Edit the system environment variables** and press **Enter**. A window similar to the following displays: + ![](../img/System_Properties.PNG) + +2. At the bottom of the screen, click **Environment Variables**. + +3. Under **System variables**, click **Path** and then **Edit**: + ![](../img/Environment_Variables-select_Path.PNG) + +4. In the opened window, click **Browse**. A browse window opens: + ![](../img/Add_Environment_Variable.PNG) + +5. If you need to add CMake to the `PATH`, browse to the directory in which you installed CMake. The default directory is `C:\Program Files\CMake`. + +6. If you need to add Python to the `PATH`, browse to the directory in which you installed Python. The default directory is `C:\Users\\AppData\Local\Programs\Python\Python36\Python`. + +7. Click **OK** repeatedly to close each screen. + +Your `PATH` environment variable is updated. + +## Hello World Face Detection Tutorial + +Refer to the [OpenVINO™ with FPGA Hello World Face Detection Exercise](https://github.com/intel-iot-devkit/openvino-with-fpga-hello-world-face-detection). + +**Additional Resources** + +- Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) +- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) +- [Inference Engine FPGA plugin documentation](../IE_DG/supported_plugins/FPGA.md) +- [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +- For more information on Sample Applications, see the [Inference Engine Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html) +- To learn about pre-trained models for OpenVINO™ toolkit, see the [Pre-Trained Models Overview](https://docs.openvinotoolkit.org/latest/_docs_docs_Pre_Trained_Models.html) +- For information on Inference Engine Tutorials, see the [Inference Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic) +- For IoT Libraries & Code Samples see the [Intel® IoT Developer Kit](https://github.com/intel-iot-devkit). 
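If you want to try the Hello World Face Detection exercise linked above, one straightforward way to get it is to clone the repository and follow its README (shown here as a convenience; the clone directory name is simply the repository name):

```sh
git clone https://github.com/intel-iot-devkit/openvino-with-fpga-hello-world-face-detection
cd openvino-with-fpga-hello-world-face-detection
```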
+ +To learn more about converting models, go to: + +- [Convert Your Caffe* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md) +- [Convert Your TensorFlow* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md) +- [Convert Your MXNet* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md) +- [Convert Your ONNX* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md) \ No newline at end of file diff --git a/docs/install_guides/installing-openvino-windows.md b/docs/install_guides/installing-openvino-windows.md new file mode 100644 index 00000000000000..6eb7e87da98c75 --- /dev/null +++ b/docs/install_guides/installing-openvino-windows.md @@ -0,0 +1,473 @@ +# Install Intel® Distribution of OpenVINO™ toolkit for Windows* 10 {#openvino_docs_install_guides_installing_openvino_windows} + +> **NOTES**: +> - This guide applies to Microsoft Windows\* 10 64-bit. For Linux* OS information and instructions, see the [Installation Guide for Linux](installing-openvino-linux.md). +> - For the Intel® Distribution of OpenVINO™ toolkit for Windows* 10 with FPGA +support, see [Installation Guide for Windows* with FPGA support](installing-openvino-windows-fpga.md). +> - [Intel® System Studio](https://software.intel.com/en-us/system-studio) is an all-in-one, cross-platform tool suite, purpose-built to simplify system bring-up and improve system and IoT device application performance on Intel® platforms. If you are using the Intel® Distribution of OpenVINO™ with Intel® System Studio, go to [Get Started with Intel® System Studio](https://software.intel.com/en-us/articles/get-started-with-openvino-and-intel-system-studio-2019). + +## Introduction + +> **IMPORTANT**: +> - All steps in this guide are required, unless otherwise stated.
+> - In addition to the download package, you must install dependencies and complete configuration steps. + +Your installation is complete when these are all completed: + +1. Install the Intel® Distribution of OpenVINO™ toolkit core components + +2. Install the dependencies: + + - [Microsoft Visual Studio* with C++ **2019, 2017, or 2015** with MSBuild](http://visualstudio.microsoft.com/downloads/) + - [CMake **3.4 or higher** 64-bit](https://cmake.org/download/) + > **NOTE**: If you want to use Microsoft Visual Studio 2019, you are required to install CMake 3.14. + - [Python **3.6.5** 64-bit](https://www.python.org/downloads/release/python-365/) + > **IMPORTANT**: As part of this installation, make sure you click the option to add the application to your `PATH` environment variable. + +3. Set Environment Variables + +4. Configure the Model Optimizer + +5. Run two Verification Scripts to Verify Installation + +6. Optional:  + + - Install the Intel® Graphics Driver for Windows* + + - Install the drivers and software for the Intel® Vision Accelerator Design with Intel® Movidius™ VPUs + + - Update Windows* environment variables + +### About the Intel® Distribution of OpenVINO™ toolkit + +The Intel® Distribution of OpenVINO™ toolkit speeds the deployment of applications and solutions that emulate human vision. Based on Convolutional Neural Networks (CNN), the toolkit extends computer vision (CV) workloads across Intel® hardware to maximize performance. + +The Intel® Distribution of OpenVINO™ toolkit includes the Intel® Deep Learning Deployment Toolkit (Intel® DLDT). For more information, see the online [Intel® Distribution of OpenVINO™ toolkit Overview](https://software.intel.com/en-us/OpenVINO-toolkit) page. + +The Intel® Distribution of OpenVINO™ toolkit for Windows\* 10 OS: + +- Enables CNN-based deep learning inference on the edge +- Supports heterogeneous execution across Intel® CPU, Intel® Processor Graphics (GPU), Intel® Neural Compute Stick 2, and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs +- Speeds time-to-market through an easy-to-use library of computer vision functions and pre-optimized kernels +- Includes optimized calls for computer vision standards including OpenCV\* and OpenCL™ + +#### Included in the Installation Package + +The following components are installed by default: + +| Component | Description | +|:---------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +|[Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) |This tool imports, converts, and optimizes models that were trained in popular frameworks to a format usable by Intel tools, especially the Inference Engine.
NOTE: Popular frameworks include such frameworks as Caffe\*, TensorFlow\*, MXNet\*, and ONNX\*. | +|[Inference Engine](../IE_DG/inference_engine_intro.md) |This is the engine that runs the deep learning model. It includes a set of libraries for an easy inference integration into your applications. | +|[OpenCV\*](https://docs.opencv.org/master/) |OpenCV* community version compiled for Intel® hardware | +|[Inference Engine Samples](../IE_DG/Samples_Overview.md) |A set of simple console applications demonstrating how to use Intel's Deep Learning Inference Engine in your applications. | +| [Demos](@ref omz_demos_README) | A set of console applications that demonstrate how you can use the Inference Engine in your applications to solve specific use-cases | +| [Additional Tools](../IE_DG/Tools_Overview.md) | A set of tools to work with your models | +| [Documentation for Pre-Trained Models ](@ref omz_models_intel_index) | Documentation for the pre-trained models available in the [Open Model Zoo repo](https://github.com/opencv/open_model_zoo) | + +### System Requirements + +**Hardware** + +* 6th to 10th generation Intel® Core™ processors and Intel® Xeon® processors +* Intel® Xeon® processor E family (formerly code named Sandy Bridge, Ivy Bridge, Haswell, and Broadwell) +* 3rd generation Intel® Xeon® Scalable processor (formerly code named Cooper Lake) +* Intel® Xeon® Scalable processor (formerly Skylake and Cascade Lake) +* Intel Atom® processor with support for Intel® Streaming SIMD Extensions 4.1 (Intel® SSE4.1) +* Intel Pentium® processor N4200/5, N3350/5, or N3450/5 with Intel® HD Graphics +* Intel® Neural Compute Stick 2 +* Intel® Vision Accelerator Design with Intel® Movidius™ VPUs + +> **NOTE**: With OpenVINO™ 2020.4 release, Intel® Movidius™ Neural Compute Stick is no longer supported. + +**Processor Notes:** + +- Processor graphics are not included in all processors. See [Processors specifications](https://ark.intel.com/#@Processors) for information about your processor. +- A chipset that supports processor graphics is required if you're using an Intel Xeon processor. See [Chipset specifications](https://ark.intel.com/#@Chipsets) for information about your chipset. + +**Operating System** + +- Microsoft Windows\* 10 64-bit + +**Software** +- [Microsoft Visual Studio* with C++ **2019, 2017, or 2015** with MSBuild](http://visualstudio.microsoft.com/downloads/) +- [CMake **3.4 or higher** 64-bit](https://cmake.org/download/) + > **NOTE**: If you want to use Microsoft Visual Studio 2019, you are required to install CMake 3.14. +- [Python **3.6.5** 64-bit](https://www.python.org/downloads/release/python-365/) + +## Installation Steps + +### Install the Intel® Distribution of OpenVINO™ toolkit Core Components + +1. If you have not downloaded the Intel® Distribution of OpenVINO™ toolkit, [download the latest version](http://software.intel.com/en-us/openvino-toolkit/choose-download/free-download-windows). By default, the file is saved to the `Downloads` directory as `w_openvino_toolkit_p_.exe`. + +2. Go to the `Downloads` folder and double-click `w_openvino_toolkit_p_.exe`. A window opens to let you choose your installation directory and components. The default installation directory is `C:\Program Files (x86)\IntelSWTools\openvino_`, for simplicity, a shortcut to the latest installation is also created: `C:\Program Files (x86)\IntelSWTools\openvino`. 
If you choose a different installation directory, the installer will create the directory for you: + + ![](../img/openvino-install-windows-01.png) + +3. Click **Next**. + +4. You are asked if you want to provide consent to gather information. Choose the option of your choice. Click **Next**. + +5. If you are missing external dependencies, you will see a warning screen. Write down the dependencies you are missing. **You need to take no other action at this time**. After installing the Intel® Distribution of OpenVINO™ toolkit core components, install the missing dependencies. +The screen example below indicates you are missing two dependencies: + + ![](../img/openvino-install-windows-02.png) + +6. Click **Next**. + +7. When the first part of installation is complete, the final screen informs you that the core components have been installed and additional steps still required: + + ![](../img/openvino-install-windows-03.png) + +8. Click **Finish** to close the installation wizard. A new browser window opens to the next section of the installation guide to set the environment variables. You are in the same document. The new window opens in case you ran the installation without first opening this installation guide. + +9. If the installation indicated you must install dependencies, install them first. If there are no missing dependencies, you can go ahead and set the environment variables. + +### Set the Environment Variables + +> **NOTE**: If you installed the Intel® Distribution of OpenVINO™ to the non-default install directory, replace `C:\Program Files (x86)\IntelSWTools` with the directory in which you installed the software. + +You must update several environment variables before you can compile and run OpenVINO™ applications. Open the Command Prompt, and run the `setupvars.bat` batch file to temporarily set your environment variables: +```sh +cd C:\Program Files (x86)\IntelSWTools\openvino\bin\ +``` + +```sh +setupvars.bat +``` + +(Optional): OpenVINO toolkit environment variables are removed when you close the Command Prompt window. As an option, you can permanently set the environment variables manually. + +The environment variables are set. Continue to the next section to configure the Model Optimizer. + +## Configure the Model Optimizer + +> **IMPORTANT**: These steps are required. You must configure the Model Optimizer for at least one framework. The Model Optimizer will fail if you do not complete the steps in this section. + +> **NOTE**: If you see an error indicating Python is not installed when you know you installed it, your computer might not be able to find the program. For the instructions to add Python to your system environment variables, see Update Your Windows Environment Variables. + +The Model Optimizer is a key component of the Intel® Distribution of OpenVINO™ toolkit. You cannot do inference on your trained model without running the model through the Model Optimizer. When you run a pre-trained model through the Model Optimizer, your output is an Intermediate Representation (IR) of the network. The IR is a pair of files that describe the whole model: + +- `.xml`: Describes the network topology +- `.bin`: Contains the weights and biases binary data + +The Inference Engine reads, loads, and infers the IR files, using a common API across the CPU, GPU, or VPU hardware. + +The Model Optimizer is a Python*-based command line tool (`mo.py`), which is located in `C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer`. 
Use this tool on models trained with popular deep learning frameworks such as Caffe\*, TensorFlow\*, MXNet\*, and ONNX\* to convert them to an optimized IR format that the Inference Engine can use. + +This section explains how to use scripts to configure the Model Optimizer either for all of the supported frameworks at the same time or for individual frameworks. If you want to manually configure the Model Optimizer instead of using scripts, see the **Using Manual Configuration Process** section on the [Configuring the Model Optimizer](../MO_DG/prepare_model/Config_Model_Optimizer.md) page. + +For more information about the Model Optimizer, see the [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). + + +### Model Optimizer Configuration Steps + +You can configure the Model Optimizer either for all supported frameworks at once or for one framework at a time. Choose the option that best suits your needs. If you see error messages, make sure you installed all dependencies. + +> **IMPORTANT**: Internet access is required to execute the following steps successfully. If you access the Internet only through a proxy server, make sure that the proxy is configured in your environment. + +> **NOTE**: +> In the steps below: +> - If you want to use the Model Optimizer from another installed version of the Intel® Distribution of OpenVINO™ toolkit, replace `openvino` with `openvino_`. +> - If you installed the Intel® Distribution of OpenVINO™ toolkit to the non-default installation directory, replace `C:\Program Files (x86)\IntelSWTools` with the directory where you installed the software. + +These steps use a command prompt to make sure you see error messages. + +#### Option 1: Configure the Model Optimizer for all supported frameworks at the same time: + +1. Open a command prompt. To do so, type `cmd` in your **Search Windows** box and then press **Enter**. +Type commands in the window that opens: + + ![](../img/command_prompt.PNG) + +2. Go to the Model Optimizer prerequisites directory.
+```sh +cd C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\install_prerequisites +``` + +3. Run the following batch file to configure the Model Optimizer for Caffe\*, TensorFlow\*, MXNet\*, Kaldi\*, and ONNX\*:
+```sh +install_prerequisites.bat +``` + +#### Option 2: Configure the Model Optimizer for each framework separately: + +1. Go to the Model Optimizer prerequisites directory:
+```sh +cd C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\install_prerequisites +``` + +2. Run the batch file for the framework you will use with the Model Optimizer. You can use more than one: + + * For **Caffe**:
+ ```sh + install_prerequisites_caffe.bat + ``` + + * For **TensorFlow**:
+ ```sh + install_prerequisites_tf.bat + ``` + + * For **MXNet**:
+ ```sh + install_prerequisites_mxnet.bat + ``` + + * For **ONNX**: + ```sh + install_prerequisites_onnx.bat + ``` + + * For **Kaldi**: + ```sh + install_prerequisites_kaldi.bat + ``` + +The Model Optimizer is configured for one or more frameworks. Success is indicated by a screen similar to this: + +![](../img/Configure-MO.PNG) + +You are ready to use two short demos to see the results of running the Intel Distribution of OpenVINO toolkit and to verify your installation was successful. The demo scripts are required since they perform additional configuration steps. Continue to the next section. + +If you want to use a GPU or VPU, or update your Windows* environment variables, read through the Optional Steps section. + + +## Use Verification Scripts to Verify Your Installation + +> **IMPORTANT**: This section is required. In addition to confirming your installation was successful, demo scripts perform other steps, such as setting up your computer to use the Inference Engine samples. + +> **NOTE**: +> The paths in this section assume you used the default installation directory. If you used a directory other than `C:\Program Files (x86)\IntelSWTools`, update the directory with the location where you installed the software. +To verify the installation and compile two samples, run the verification applications provided with the product on the CPU: + +1. Open a command prompt window. + +2. Go to the Inference Engine demo directory:
+ ```sh + cd C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo\ + ``` + +3. Run the verification scripts by following the instructions in the next section. + + +### Run the Image Classification Verification Script + +To run the script, start the `demo_squeezenet_download_convert_run.bat` file: +```sh +demo_squeezenet_download_convert_run.bat +``` + +This script downloads a SqueezeNet model, uses the Model Optimizer to convert the model to the `.‍bin` and `.‍xml` Intermediate Representation (IR) files. The Inference Engine requires this model conversion so it can use the IR as input and achieve optimum performance on Intel hardware.
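If you want to inspect the IR files the script produces, they are written under your user directory, in the same location referenced later in the Run the Image Classification Sample Application section. For example, from the same console (a convenience check; `%USERNAME%` stands in for your Windows user name and the path assumes the default location):

```sh
dir "C:\Users\%USERNAME%\Documents\Intel\OpenVINO\openvino_models\ir\public\squeezenet1.1" /s /b
```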
+This verification script builds the [Image Classification Sample Async](../../inference-engine/samples/classification_sample_async/README.md) application and run it with the `car.png` image in the demo directory. For a brief description of the Intermediate Representation, see Configuring the Model Optimizer. + +When the verification script completes, you will have the label and confidence for the top-10 categories: +![](../img/image_classification_script_output_win.png) + +This demo is complete. Leave the console open and continue to the next section to run the Inference Pipeline demo. + + +### Run the Inference Pipeline Verification Script + +To run the script, start the `demo_security_barrier_camera.bat` file while still in the console: +```sh +demo_security_barrier_camera.bat +``` + +This script downloads three pre-trained model IRs, builds the [Security Barrier Camera Demo](@ref omz_demos_security_barrier_camera_demo_README) application, and runs it with the downloaded models and the `car_1.bmp` image from the `demo` directory to show an inference pipeline. The verification script uses vehicle recognition in which vehicle attributes build on each other to narrow in on a specific attribute. + +First, an object is identified as a vehicle. This identification is used as input to the next model, which identifies specific vehicle attributes, including the license plate. Finally, the attributes identified as the license plate are used as input to the third model, which recognizes specific characters in the license plate. + +When the demo completes, you have two windows open: + + * A console window that displays information about the tasks performed by the demo + * An image viewer window that displays a resulting frame with detections rendered as bounding boxes, similar to the following: + + ![](../img/inference_pipeline_script_win.png) + +Close the image viewer window to end the demo. + +To learn more about the verification scripts, see `README.txt` in `C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo`. + +For detailed description of the OpenVINO™ pre-trained object detection and object recognition models, see the [Overview of OpenVINO™ toolkit Pre-Trained Models](@ref omz_models_intel_index) page. + +In this section, you saw a preview of the Intel® Distribution of OpenVINO™ toolkit capabilities. + +Congratulations. You have completed all the required installation, configuration, and build steps to work with your trained models using CPU. + +If you want to use Intel® Processor graphics (GPU), Intel® Neural Compute Stick 2 or Intel® Vision Accelerator Design with Intel® Movidius™ (VPU), or add CMake* and Python* to your Windows* environment variables, read through the next section for additional steps. + +If you want to continue and run the Image Classification Sample Application on one of the supported hardware device, see the [Run the Image Classification Sample Application](#run-the-image-classification-sample-application) section. + +## Optional Steps + +Use the optional steps below if you want to: +* Infer models on Intel® Processor Graphics +* Infer models on Intel® Vision Accelerator Design with Intel® Movidius™ VPUs +* Add CMake* or Python* to your Windows* environment variables. + +### Optional: Additional Installation Steps for Intel® Processor Graphics (GPU) + +> **NOTE**: These steps are required only if you want to use a GPU. 
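If you prefer a command-line check over the Device Manager steps below, you can query the installed display adapter and driver version directly; this is shown as a convenience and reports the same driver version string that Device Manager displays:

```sh
wmic path win32_videocontroller get name,driverversion
```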
+ +If your applications offload computation to Intel® Integrated Graphics, you must have the Intel Graphics Driver for Windows version 15.65 or higher. To see if you have this driver installed: + +1. Type **device manager** in your **Search Windows** box. The **Device Manager** opens. + +2. Click the drop-down arrow to view the **Display adapters**. You see the adapter that is installed in your computer: + + ![](../img/DeviceManager.PNG) + +3. Right-click the adapter name and select **Properties**. + +4. Click the **Driver** tab to see the driver version. Make sure the version number is 15.65 or higher. + + ![](../img/DeviceDriverVersion.PNG) + +5. If your device driver version is lower than 15.65, [download and install a higher version](http://downloadcenter.intel.com/product/80939/Graphics-Drivers). + +You are done updating your device driver and are ready to use your GPU. + + +### Optional: Additional Installation Steps for the Intel® Vision Accelerator Design with Intel® Movidius™ VPUs + +> **NOTE**: These steps are required only if you want to use Intel® Vision Accelerator Design with Intel® Movidius™ VPUs. + +To perform inference on Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, the following additional installation steps are required: + + 1. If your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs card requires SMBUS connection to PCIe slot (Raw video data card with HW version Fab-B and before), install the SMBUS driver: + 1. Go to the `\deployment_tools\inference-engine\external\hddl\SMBusDriver` directory, where `` is the directory in which the Intel Distribution of OpenVINO toolkit is installed. + 2. Right click on the `hddlsmbus.inf` file and choose **Install** from the pop up menu. + + 2. Download and install Visual C++ Redistributable for Visual Studio 2015 + +You are done installing your device driver and are ready to use your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs. + +See also: + +* For advanced configuration steps for your IEI Mustang-V100-MX8 accelerator, see [Intel® Movidius™ VPUs Setup Guide for Use with Intel® Distribution of OpenVINO™ toolkit](movidius-setup-guide.md). + +* After you've configurated your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, see [Intel® Movidius™ VPUs Programming Guide for Use with Intel® Distribution of OpenVINO™ toolkit](movidius-programming-guide.md) to learn how to distribute a model across all 8 VPUs to maximize performance. + +After configuration is done, you are ready to run the verification scripts with the HDDL Plugin for your Intel® Vision Accelerator Design with Intel® Movidius™ VPUs. + +1. Open a command prompt window. + +2. Go to the Inference Engine demo directory: + ```sh + cd C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo\ + ``` +3. Run the Image Classification verification script. If you have access to the Internet through the proxy server only, please make sure that it is configured in your environment. + ```sh + demo_squeezenet_download_convert_run.bat -d HDDL + ``` +4. Run the Inference Pipeline verification script: + ```sh + demo_security_barrier_camera.bat -d HDDL + ``` + +### Optional: Update Your Windows Environment Variables + +> **NOTE**: These steps are only required under special circumstances, such as if you forgot to check the box during the CMake\* or Python\* installation to add the application to your Windows `PATH` environment variable. 
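A quick way to confirm whether the tools are already on your `PATH` is to query them from a command prompt; if either command reports that it cannot find the file, continue with the steps below:

```sh
where cmake
where python
```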
+ +Use these steps to update your Windows `PATH` if a command you execute returns an error message stating that an application cannot be found. This might happen if you do not add CMake or Python to your `PATH` environment variable during the installation. + +1. In your **Search Windows** box, type **Edit the system environment variables** and press **Enter**. A window similar to the following displays: + ![](../img/System_Properties.PNG) + +2. At the bottom of the screen, click **Environment Variables**. + +3. Under **System variables**, click **Path** and then **Edit**: + ![](../img/Environment_Variables-select_Path.PNG) + +4. In the opened window, click **Browse**. A browse window opens: + ![](../img/Add_Environment_Variable.PNG) + +5. If you need to add CMake to the `PATH`, browse to the directory in which you installed CMake. The default directory is `C:\Program Files\CMake`. + +6. If you need to add Python to the `PATH`, browse to the directory in which you installed Python. The default directory is `C:\Users\\AppData\Local\Programs\Python\Python36\Python`. + +7. Click **OK** repeatedly to close each screen. + +Your `PATH` environment variable is updated. + +## Run the Image Classification Sample Application + +> **IMPORTANT**: This section requires that you have [Run the Verification Scripts to Verify Installation](#run-the-demos). This script builds the Image Classification sample application and downloads and converts the required Caffe* Squeezenet model to an IR. + +In this section you will run the Image Classification sample application, with the Caffe* Squeezenet1.1 model on three types of Intel® hardware: CPU, GPU and VPUs. + +Image Classification sample application binary file was automatically built and the FP16 model IR files are created when you [Ran the Image Classification Verification Script](#run-the-image-classification-verification-script). + +The Image Classification sample application binary file located in the `C:\Users\\Documents\Intel\OpenVINO\inference_engine_samples_build\intel64\Release\` directory. +The Caffe* Squeezenet model IR files (`.bin` and `.xml`) are located in the in the `C:\Users\\Documents\Intel\OpenVINO\openvino_models\ir\public\squeezenet1.1\FP16\` directory. + +> **NOTE**: If you installed the Intel® Distribution of OpenVINO™ toolkit to the non-default installation directory, replace `C:\Program Files (x86)\IntelSWTools` with the directory where you installed the software. + +To run the sample application: + +1. Set up environment variables: +```sh +cd C:\Program Files (x86)\IntelSWTools\openvino\bin\setupvars.bat +``` +2. Go to the samples build directory: +```sh +cd C:\Users\\Documents\Intel\OpenVINO\inference_engine_samples_build\intel64\Release +``` +3. Run the sample executable with specifying the `car.png` file from the `demo` directory as an input image, the IR of your FP16 model and a plugin for a hardware device to perform inference on. +> **NOTE**: Running the sample application on hardware other than CPU requires performing [additional hardware configuration steps](#optional-steps). 
+ + - For CPU: + ```sh + classification_sample_async.exe -i "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo\car.png" -m "C:\Users\\Documents\Intel\OpenVINO\openvino_models\ir\public\squeezenet1.1\FP16\squeezenet1.1.xml" -d CPU + ``` + + - For GPU: + ```sh + classification_sample_async.exe -i "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo\car.png" -m "C:\Users\\Documents\Intel\OpenVINO\openvino_models\ir\public\squeezenet1.1\FP16\squeezenet1.1.xml" -d GPU + ``` + + - For VPU (Intel® Neural Compute Stick 2): + ```sh + classification_sample_async.exe -i "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo\car.png" -m "C:\Users\\Documents\Intel\OpenVINO\openvino_models\ir\public\squeezenet1.1\FP16\squeezenet1.1.xml" -d MYRIAD + ``` + + - For VPU (Intel® Vision Accelerator Design with Intel® Movidius™ VPUs): + ```sh + classification_sample_async.exe -i "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo\car.png" -m "C:\Users\\Documents\Intel\OpenVINO\openvino_models\ir\public\squeezenet1.1\FP16\squeezenet1.1.xml" -d HDDL + ``` + +For information on Sample Applications, see the [Inference Engine Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html). + +Congratulations, you have finished the installation of the Intel® Distribution of OpenVINO™ toolkit for Windows*. To learn more about how the Intel® Distribution of OpenVINO™ toolkit works, the Hello World tutorial and other resources are provided below. + + +## Summary + +In this document, you installed the Intel® Distribution of OpenVINO™ toolkit and its dependencies. You also configured the Model Optimizer for one or more frameworks. After the software was installed and configured, you ran two verification scripts. You might have also installed drivers that will let you use a GPU or VPU to infer your models and run the Image Classification Sample application. + +You are now ready to learn more about converting models trained with popular deep learning frameworks to the Inference Engine format, following the links below, or you can move on to running the [sample applications](../IE_DG/Samples_Overview.md). 
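As a preview of that workflow, a typical Model Optimizer invocation looks like the sketch below; the model file name and output directory are placeholders, and the exact arguments depend on the framework and model you convert:

```sh
cd "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer"
python mo.py --input_model <path_to_your_model> --output_dir <output_directory>
```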
+ +To learn more about converting deep learning models, go to: + +- [Convert Your Caffe* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md) +- [Convert Your TensorFlow* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md) +- [Convert Your MXNet* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md) +- [Convert Your ONNX* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md) + +## Additional Resources + +- [Intel Distribution of OpenVINO Toolkit home page](https://software.intel.com/en-us/openvino-toolkit) +- [Intel Distribution of OpenVINO Toolkit documentation](https://software.intel.com/en-us/openvino-toolkit/documentation/featured) +- [OpenVINO™ Release Notes](https://software.intel.com/en-us/articles/OpenVINO-RelNotes) +- [Introduction to Intel® Deep Learning Deployment Toolkit](../IE_DG/Introduction.md) +- [Inference Engine Developer Guide](../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md) +- [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) +- [Inference Engine Samples Overview](../IE_DG/Samples_Overview.md) +- [Overview of OpenVINO™ Toolkit Pre-Trained Models](@ref omz_models_intel_index) +- Intel Distribution of OpenVINO Toolkit Hello World Activities, see the [Inference Tutorials for Face Detection and Car Detection Exercises](https://github.com/intel-iot-devkit/inference-tutorials-generic/tree/openvino_toolkit_r3_0) +- [Intel® Neural Compute Stick 2 Get Started](https://software.intel.com/en-us/neural-compute-stick/get-started) + + +[myriad_driver]: ../img/myriad_driver.png diff --git a/docs/install_guides/installing-openvino-yocto.md b/docs/install_guides/installing-openvino-yocto.md new file mode 100644 index 00000000000000..d992fe5c1e41f5 --- /dev/null +++ b/docs/install_guides/installing-openvino-yocto.md @@ -0,0 +1,102 @@ +# Create a Yocto* Image with OpenVINO™ toolkit {#openvino_docs_install_guides_installing_openvino_yocto} +This document provides instructions for creating a Yocto* image with OpenVINO™ toolkit. + +Instructions were validated and tested for [Yocto OpenVINO 2020.3 release](http://git.yoctoproject.org/cgit/cgit.cgi/meta-intel). + +## System Requirements +Use the [Yocto Project* official documentation](https://www.yoctoproject.org/docs/latest/mega-manual/mega-manual.html#brief-compatible-distro) to set up and configure your host machine to be compatible with BitBake*. + +## Setup + +### Set up Git repositories +The following Git repositories are required to build a Yocto image: + +- [Poky](https://www.yoctoproject.org/docs/latest/mega-manual/mega-manual.html#poky) +- [Meta-intel](http://git.yoctoproject.org/cgit/cgit.cgi/meta-intel/tree/README) +- [Meta-openembedded](http://cgit.openembedded.org/meta-openembedded/tree/README) +- Meta-clang + +Clone these Git repositories to your host machine: +```sh +git clone https://git.yoctoproject.org/git/poky +git clone https://git.yoctoproject.org/git/meta-intel +git clone https://git.openembedded.org/meta-openembedded +git clone https://github.com/kraj/meta-clang.git +``` + +### Set up BitBake* Layers + +```sh +source poky/oe-init-build-env +bitbake-layers add-layer ../meta-intel +bitbake-layers add-layer ../meta-openembedded/meta-oe +bitbake-layers add-layer ../meta-openembedded/meta-python +bitbake-layers add-layer ../meta-clang +``` + +### Set up BitBake Configurations + +Include extra configuration in conf/local.conf in your build directory as required. + +```sh +# Build with SSE4.2, AVX2 etc. 
extensions +MACHINE = "intel-skylake-64" + +# Enable clDNN GPU plugin when needed. +# This requires meta-clang and meta-oe layers to be included in bblayers.conf +# and is not enabled by default. +PACKAGECONFIG_append_pn-openvino-inference-engine = " opencl" + +# Enable building inference engine python API. +# This requires meta-python layer to be included in bblayers.conf. +PACKAGECONFIG_append_pn-openvino-inference-engine = " python3" + +# This adds inference engine related libraries in the target image. +CORE_IMAGE_EXTRA_INSTALL_append = " openvino-inference-engine" + +# This adds inference engine samples in the target image. +CORE_IMAGE_EXTRA_INSTALL_append = " openvino-inference-engine-samples" + +# Include inference engine python API package in the target image. +CORE_IMAGE_EXTRA_INSTALL_append = " openvino-inference-engine-python3" + +# This adds inference engine unit tests in the target image. +CORE_IMAGE_EXTRA_INSTALL_append = " openvino-inference-engine-ptest" + +# Enable MYRIAD plugin +CORE_IMAGE_EXTRA_INSTALL_append = " openvino-inference-engine-vpu-firmware" + +# Include model optimizer in the target image. +CORE_IMAGE_EXTRA_INSTALL_append = " openvino-model-optimizer" +``` + +## Build a Yocto Image with OpenVINO Packages + +Run BitBake to build the minimal image with OpenVINO packages: +```sh +bitbake core-image-minimal +``` + +## Verify the Created Yocto Image with OpenVINO Packages + +Verify that OpenVINO packages were built successfully. +Run 'oe-pkgdata-util list-pkgs | grep openvino' command. +```sh +oe-pkgdata-util list-pkgs | grep openvino +``` + +Verify that it returns the list of packages below: +```sh +openvino-inference-engine +openvino-inference-engine-dbg +openvino-inference-engine-dev +openvino-inference-engine-ptest +openvino-inference-engine-python3 +openvino-inference-engine-samples +openvino-inference-engine-src +openvino-inference-engine-staticdev +openvino-inference-engine-vpu-firmware +openvino-model-optimizer +openvino-model-optimizer-dbg +openvino-model-optimizer-dev +``` diff --git a/docs/install_guides/installing-openvino-yum.md b/docs/install_guides/installing-openvino-yum.md new file mode 100644 index 00000000000000..72a599d8e6c3a1 --- /dev/null +++ b/docs/install_guides/installing-openvino-yum.md @@ -0,0 +1,111 @@ +# Install Intel® Distribution of OpenVINO™ toolkit for Linux* Using YUM Repository {#openvino_docs_install_guides_installing_openvino_yum} + +This guide provides installation steps for the Intel® Distribution of OpenVINO™ toolkit for Linux* distributed through the YUM repository. + +> **IMPORTANT**: By downloading and using this container and the included software, you agree to the terms and conditions of the [software license agreements](https://software.intel.com/en-us/license/eula-for-intel-software-development-products). Please, review the content inside the `/licensing` folder for more details. + +> **NOTE**: Intel® Graphics Compute Runtime for OpenCL™ is not a part of OpenVINO™ YUM distribution. You can install it from the [Intel® Graphics Compute Runtime for OpenCL™ GitHub repo](https://github.com/intel/compute-runtime). + +## Set up the Repository + +> **NOTE:** You must be logged in as root to set up and install the repository. +
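The commands in this section are shown with `sudo`. Alternatively, you can switch to a root shell first and omit `sudo` (one common way, assuming `sudo` is available on your system):

```sh
sudo -s
```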
+Configure YUM with the OpenVINO repository to install OpenVINO. You have two options for this, using the `yum-config-manager` or manually by creating a text file and pointing YUM to the file. + +* **OPTION 1:** Import the `.repo` file using the `yum-config-manager`: + 1. `yum-utils` must be installed on your system. If it’s not currently installed, run the command: + ```sh + sudo yum install yum-utils + ``` + 2. Add repository using the `yum-config-manager`: + ```sh + sudo yum-config-manager --add-repo https://yum.repos.intel.com/openvino/2020/setup/intel-openvino-2020.repo + ``` + 3. Import the gpg public key for the repository: + ```sh + sudo rpm --import https://yum.repos.intel.com/openvino/2020/setup/RPM-GPG-KEY-INTEL-OPENVINO-2020 + ``` + +* **OPTION 2:** Create the repository file manually: + 1. Navigate to the repository directory: + ```sh + cd /etc/yum.repos.d + ``` + 2. Edit the repo file: + ```sh + vi intel-openvino-2020.repo + ``` + 3. Append the following code: + ```sh + [intel-openvino-2020] + name=Intel(R) Distribution of OpenVINO 2020 + baseurl=https://yum.repos.intel.com/openvino/2020 + enabled=1 + gpgcheck=1 + gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-INTEL-OPENVINO-2020 + ``` + 4. Save and close the `intel-openvino-2020.repo` file. + 5. Import the gpg public key for the repository: + ```sh + sudo rpm --import https://yum.repos.intel.com/openvino/2020/setup/RPM-GPG-KEY-INTEL-OPENVINO-2020 + ``` + +### Verify that the new repo is properly setup +Run the following command: +```sh +yum repolist | grep -i openvino +``` + +Results: +```sh +intel-openvino-2020 Intel(R) Distribution of OpenVINO 2020 +``` + +### To list the available OpenVINO packages +Use the following command: +```sh +yum list intel-openvino* +``` + +--- + +## Install the runtime packages Using the YUM Package Manager + +Intel® OpenVINO will be installed in: `/opt/intel/openvino_..` +
+A symlink will be created: `/opt/intel/openvino` + +--- + +### To install the latest version +To install the full runtime version of the OpenVINO package: +```sh +sudo yum install intel-openvino-runtime-centos7 +``` + +--- + +### To install a specific version +To install the full runtime version of the OpenVINO package: +```sh +sudo yum install intel-openvino-runtime-centos7-.. +``` + +--- + +### To Uninstall a specific version + +To uninstall a specific full runtime package: +```sh +sudo yum autoremove intel-openvino-runtime-centos-.. +``` +**Additional Resources** + +- Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) +- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) +- [Model Optimizer Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html). +- [Inference Engine Developer Guide](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Deep_Learning_Inference_Engine_DevGuide.html) +- For more information on Sample Applications, see the [Inference Engine Samples Overview](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Samples_Overview.html). +- For information on Inference Engine Tutorials, see the [Inference Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic). +- For IoT Libraries & Code Samples see the [Intel® IoT Developer Kit](https://github.com/intel-iot-devkit). + diff --git a/docs/install_guides/movidius-programming-guide.md b/docs/install_guides/movidius-programming-guide.md new file mode 100644 index 00000000000000..668ca9f697881d --- /dev/null +++ b/docs/install_guides/movidius-programming-guide.md @@ -0,0 +1,70 @@ +# Intel® Movidius™ VPUs Programming Guide for Use with Intel® Distribution of OpenVINO™ toolkit {#openvino_docs_install_guides_movidius_programming_guide} + +## See Also + +- [Intel® Movidius™ VPUs Setup Guide for use with the Intel® Distribution of OpenVINO™](movidius-setup-guide.md) +- Intel® Vision Accelerator Design with Intel® Movidius™ VPUs HAL Configuration Guide +- Intel® Vision Accelerator Design with Intel® Movidius™ VPUs Workload Distribution User Guide +- Intel® Vision Accelerator Design with Intel® Movidius™ VPUs Scheduler User Guide +- Intel® Vision Accelerator Design with Intel® Movidius™ VPUs Errata + +The following section provides information on how to distribute a model across all 8 VPUs to maximize performance. + +## Programming a C++ Application for the Accelerator + +### Declare a Structure to Track Requests + +The structure should hold: +1. A pointer to an inference request. +2. An ID to keep track of the request. +```cpp +struct Request { + InferenceEngine::InferRequest::Ptr inferRequest; + int frameidx; +}; +``` + +### Declare a Vector of Requests + +```cpp +// numRequests is the number of frames (max size, equal to the number of VPUs in use) +vector request(numRequests); +``` + +Declare and initialize 2 mutex variables: +1. For each request +2. For when all 8 requests are done + +### Declare a Conditional Variable + +Conditional variable indicates when at most 8 requests are done at a time. + +For inference requests, use the asynchronous IE API calls: + +```cpp +// initialize infer request pointer – Consult IE API for more detail. 
+request[i].inferRequest = executable_network.CreateInferRequestPtr(); +``` + +```cpp +// Run inference +request[i].inferRequest->StartAsync(); +``` + + +### Create a Lambda Function + +The lambda function enables parsing and displaying the results. + +Inside the lambda body, use the completion callback function: + +```cpp +request[i].inferRequest->SetCompletionCallback +(InferenceEngine::IInferRequest::Ptr context) +``` + +## Additional Resources + +- [Intel Distribution of OpenVINO Toolkit home page](https://software.intel.com/en-us/openvino-toolkit) + +- [Intel Distribution of OpenVINO Toolkit documentation](https://docs.openvinotoolkit.org) diff --git a/docs/install_guides/movidius-setup-guide.md b/docs/install_guides/movidius-setup-guide.md new file mode 100644 index 00000000000000..4c3956266140f5 --- /dev/null +++ b/docs/install_guides/movidius-setup-guide.md @@ -0,0 +1,136 @@ +# Intel® Movidius™ VPUs Setup Guide for Use with Intel® Distribution of OpenVINO™ toolkit {#openvino_docs_install_guides_movidius_setup_guide} + +## See Also + +- [Intel® Movidius™ VPUs Programming Guide for use with the Intel® Distribution of OpenVINO™](movidius-programming-guide.md) +- Intel® Vision Accelerator Design with Intel® Movidius™ VPUs HAL Configuration Guide +- Intel® Vision Accelerator Design with Intel® Movidius™ VPUs Workload Distribution User Guide +- Intel® Vision Accelerator Design with Intel® Movidius™ VPUs Scheduler User Guide +- Intel® Vision Accelerator Design with Intel® Movidius™ VPUs Errata + +The IEI Mustang-V100-MX8 is an OEM version of the Intel® Vision Accelerator Design with Intel® Movidius™ VPUs. +This guide assumes you have installed the [Mustang-V100-MX8](https://download.ieiworld.com/) and the [Intel® Distribution of OpenVINO™ Toolkit](https://software.intel.com/en-us/openvino-toolkit). + +Instructions in this guide for configuring your accelerator include: +1. Installing the required IEI\* BSL reset software +2. Configuration settings for the `hddldaemon` service + +> **NOTE**: This guide does not apply to Uzel\* cards. + +## IEI Reset Software Installation + +Using the IEI Mustang-V100-MX8 requires downloading and installing the most current software for your system. + +Visit the [IEI Download Center](https://download.ieiworld.com/) for the most current software and documentation. +Search for **Mustang-V100-MX8**. + +Download the appropriate software for your system, decompress the downloaded archive, enter the newly created directory, and run the install script: + +On **Linux**\*: +- Run the `install.sh` script with `sudo`, or as `root`. + +On **Windows**\*, do one of the following:
+- **GUI**: Double-click `install.bat` +- **CLI**: Open a console with administrator privileges, cd into the directory, and run `install.bat`. + +## Mustang-V100-MX8 Service Configuration + +The `hddldaemon` is a system service, a binary executable that is run to manage the computational workload on the board. It is a required abstraction layer that handles inference, graphics processing, and any type of computation that should be run on the video processing units (VPUs). Depending on the board configuration, there can be 8 or 16 VPUs. + +> **NOTE**: Graphics and other specialized processing may require some custom development. + +### Conventions Used in This Document + +`` refers to the following default OpenVINO™ Inference Engine directories: +- **Linux:** + ``` + /opt/intel/openvino/inference_engine + ``` +- **Windows:** +``` +C:\Program Files (x86)\IntelSWTools\openvino\inference_engine +``` + +If you have installed OpenVINO™ in a different directory on your system, you will need to enter your unique directory path. + +### Configuration File Location + +`\external\hddl\config\hddl_service.config` + +### Service Configuration File Settings + +Below are some possible configuration options. + +> **NOTE:** After changing a configuration file, the `hddldaemon` must be restarted. + +### Recommended Settings + +`device_snapshot_mode` +Changes the output of the `hddldaemon` to display a table with individual VPU statistics. + +**Default Setting:** +`"device_snapshot_mode": "none"` + +**Suggested Setting:** +`"device_snapshot_mode": "full"` + +**Supported Settings:** + - `none` (default) + - `base` + - `full` + +`device_snapshot_style` + +**Default Setting:** +`"device_snapshot_style": "table"` + +**Recommended Setting:** +`"device_snapshot_style": "table"` + +The `table` setting presents labels on the left for each column and is recommended as easier to read. +The `tape` setting prints the labels in each column. + +**Supported Settings:** +- `tape` +- `table` (default) + +`user_group ` +Restricts the service to group members. + +**Recommended setting depends on your unique system configuration.** + +**Default Setting** +`"user_group": "users"` + +The `hddldaemon` may be restricted to a privileged group of users. The appropriate group will vary according to the local system configuration. + +**Supported Settings:** +Valid groups on the current system. The `"users"` group is a default group that exists on Windows and most Linux distributions. + + +**Optional Recommended Settings:** + +`"device_utilization" : "off"` +This setting displays the percent of time each VPU is in use. It appears in the `table` column when turned on, or if `"device_fps"` is turned on. + +`"memory_usage" : "off"` +This setting reports the amount of memory being used by each VPU. + +`"max_cycle_switchout": 3` +Requires the squeeze scheduler. This setting might speed up performance significantly, depending on the app. + +> **NOTE:** This setting works in conjunction with: `max_task_number_switch_out`. + +`"client_fps" : "off"` +This setting reports the total FPS for the dispatching hddl_service (which will have one or more clients per app). 
+ +`debug_service` +`"debug_service": "false"` +(default: `"true"`) + + +## Additional Resources + +- [Intel Distribution of OpenVINO Toolkit home page](https://software.intel.com/en-us/openvino-toolkit) + +- [Intel Distribution of OpenVINO Toolkit documentation](https://docs.openvinotoolkit.org) \ No newline at end of file diff --git a/docs/ops/activation/Clamp_1.md b/docs/ops/activation/Clamp_1.md index 283498c8489aef..4a4151a4d18291 100644 --- a/docs/ops/activation/Clamp_1.md +++ b/docs/ops/activation/Clamp_1.md @@ -1,4 +1,4 @@ -## Clamp +## Clamp {#openvino_docs_ops_activation_Clamp_1} **Versioned name**: *Clamp-1* diff --git a/docs/ops/activation/Elu_1.md b/docs/ops/activation/Elu_1.md index 25eb4f251aa1e9..bc69b40de18a0f 100644 --- a/docs/ops/activation/Elu_1.md +++ b/docs/ops/activation/Elu_1.md @@ -1,4 +1,4 @@ -## Elu +## Elu {#openvino_docs_ops_activation_Elu_1} **Versioned name**: *Elu-1* diff --git a/docs/ops/activation/Exp_1.md b/docs/ops/activation/Exp_1.md index dc549c5256ecd9..c3f05c72db96f9 100644 --- a/docs/ops/activation/Exp_1.md +++ b/docs/ops/activation/Exp_1.md @@ -1,4 +1,4 @@ -## Exp +## Exp {#openvino_docs_ops_activation_Exp_1} **Versioned name**: *Exp-1* diff --git a/docs/ops/activation/GELU_2.md b/docs/ops/activation/GELU_2.md index 5625062567dca3..c22e72d2b99821 100644 --- a/docs/ops/activation/GELU_2.md +++ b/docs/ops/activation/GELU_2.md @@ -1,4 +1,4 @@ -## GELU- Gaussian Error Linear Unit +## GELU- Gaussian Error Linear Unit {#openvino_docs_ops_activation_GELU_2} **Versioned name**: *Gelu-2* diff --git a/docs/ops/activation/HardSigmoid_1.md b/docs/ops/activation/HardSigmoid_1.md index f368a45f340547..c36bf7a5a6f41d 100644 --- a/docs/ops/activation/HardSigmoid_1.md +++ b/docs/ops/activation/HardSigmoid_1.md @@ -1,4 +1,4 @@ -## HardSigmoid +## HardSigmoid {#openvino_docs_ops_activation_HardSigmoid_1} **Versioned name**: *HardSigmoid-1* diff --git a/docs/ops/activation/Mish_4.md b/docs/ops/activation/Mish_4.md index 21bc37208765f8..de8397c188825a 100644 --- a/docs/ops/activation/Mish_4.md +++ b/docs/ops/activation/Mish_4.md @@ -1,4 +1,4 @@ -## Mish +## Mish {#openvino_docs_ops_activation_Mish_4} **Versioned name**: *Mish-4* diff --git a/docs/ops/activation/PReLU_1.md b/docs/ops/activation/PReLU_1.md index a7261301e673af..74920e1306be47 100644 --- a/docs/ops/activation/PReLU_1.md +++ b/docs/ops/activation/PReLU_1.md @@ -1,4 +1,4 @@ -## PReLU +## PReLU {#openvino_docs_ops_activation_PReLU_1} **Versioned name**: *PReLU-1* diff --git a/docs/ops/activation/ReLU_1.md b/docs/ops/activation/ReLU_1.md index e31bae1f797d9f..56b907648e43d4 100644 --- a/docs/ops/activation/ReLU_1.md +++ b/docs/ops/activation/ReLU_1.md @@ -1,4 +1,4 @@ -## ReLU +## ReLU {#openvino_docs_ops_activation_ReLU_1} **Versioned name**: *ReLU-1* diff --git a/docs/ops/activation/Sigmoid_1.md b/docs/ops/activation/Sigmoid_1.md index 02ea475c71fbab..f4a70faaff0705 100644 --- a/docs/ops/activation/Sigmoid_1.md +++ b/docs/ops/activation/Sigmoid_1.md @@ -1,4 +1,4 @@ -## Sigmoid +## Sigmoid {#openvino_docs_ops_activation_Sigmoid_1} **Versioned name**: *Sigmoid-1* diff --git a/docs/ops/activation/SoftMax_1.md b/docs/ops/activation/SoftMax_1.md index a5bccd11f0e0b2..41a28f05792644 100644 --- a/docs/ops/activation/SoftMax_1.md +++ b/docs/ops/activation/SoftMax_1.md @@ -1,4 +1,4 @@ -## SoftMax +## SoftMax {#openvino_docs_ops_activation_SoftMax_1} **Versioned name**: *SoftMax-1* diff --git a/docs/ops/activation/SoftPlus_4.md b/docs/ops/activation/SoftPlus_4.md index ad57f4e0212e3c..112faa2873098e 100644 --- 
a/docs/ops/activation/SoftPlus_4.md +++ b/docs/ops/activation/SoftPlus_4.md @@ -1,4 +1,4 @@ -## SoftPlus +## SoftPlus {#openvino_docs_ops_activation_SoftPlus_4} **Versioned name**: *SoftPlus-4* diff --git a/docs/ops/activation/Swish_4.md b/docs/ops/activation/Swish_4.md index 403341d7112014..e8a51c9dc048db 100644 --- a/docs/ops/activation/Swish_4.md +++ b/docs/ops/activation/Swish_4.md @@ -1,4 +1,4 @@ -## Swish +## Swish {#openvino_docs_ops_activation_Swish_4} **Versioned name**: *Swish-4* diff --git a/docs/ops/arithmetic/Abs_1.md b/docs/ops/arithmetic/Abs_1.md index d9091192d22c22..91b296381a7aac 100644 --- a/docs/ops/arithmetic/Abs_1.md +++ b/docs/ops/arithmetic/Abs_1.md @@ -1,4 +1,4 @@ -## Abs +## Abs {#openvino_docs_ops_arithmetic_Abs_1} **Versioned name**: *Abs-1* diff --git a/docs/ops/arithmetic/Acos_1.md b/docs/ops/arithmetic/Acos_1.md index 31f10dd7df4dac..384f033ac25296 100644 --- a/docs/ops/arithmetic/Acos_1.md +++ b/docs/ops/arithmetic/Acos_1.md @@ -1,4 +1,4 @@ -## Acos +## Acos {#openvino_docs_ops_arithmetic_Acos_1} **Versioned name**: *Acos-1* diff --git a/docs/ops/arithmetic/Acosh_1.md b/docs/ops/arithmetic/Acosh_1.md new file mode 100644 index 00000000000000..2b944da5e8595f --- /dev/null +++ b/docs/ops/arithmetic/Acosh_1.md @@ -0,0 +1,50 @@ +## Acosh {#openvino_docs_ops_arithmetic_Acosh_1} + +**Versioned name**: *Acosh-1* + +**Category**: Arithmetic unary operation + +**Short description**: *Acosh* performs element-wise hyperbolic inverse cosine (arccosh) operation with given tensor. + +**Attributes**: + + No attributes available. + +**Inputs** + +* **1**: An tensor of type T. **Required.** + +**Outputs** + +* **1**: The result of element-wise acosh operation. A tensor of type T. + +**Types** + +* *T*: any numeric type. + +*Acosh* does the following with the input tensor *a*: + +\f[ +a_{i} = acosh(a_{i}) +\f] + +**Examples** + +*Example 1* + +```xml + + + + 256 + 56 + + + + + 256 + 56 + + + +``` diff --git a/docs/ops/arithmetic/Acosh_3.md b/docs/ops/arithmetic/Acosh_3.md index 7b1ca90d2d657f..23f6b64385f2b4 100644 --- a/docs/ops/arithmetic/Acosh_3.md +++ b/docs/ops/arithmetic/Acosh_3.md @@ -1,4 +1,4 @@ -## Acosh +## Acosh {#openvino_docs_ops_arithmetic_Acosh_3} **Versioned name**: *Acosh-3* diff --git a/docs/ops/arithmetic/Add_1.md b/docs/ops/arithmetic/Add_1.md index 291f256d56ca56..cd81141d1eaa44 100644 --- a/docs/ops/arithmetic/Add_1.md +++ b/docs/ops/arithmetic/Add_1.md @@ -1,4 +1,4 @@ -## Add +## Add {#openvino_docs_ops_arithmetic_Add_1} **Versioned name**: *Add-1* diff --git a/docs/ops/arithmetic/Asin_1.md b/docs/ops/arithmetic/Asin_1.md index 5f274704c9db35..db6ea5074bff67 100644 --- a/docs/ops/arithmetic/Asin_1.md +++ b/docs/ops/arithmetic/Asin_1.md @@ -1,4 +1,4 @@ -## Asin +## Asin {#openvino_docs_ops_arithmetic_Asin_1} **Versioned name**: *Asin-1* diff --git a/docs/ops/arithmetic/Asinh_1.md b/docs/ops/arithmetic/Asinh_1.md new file mode 100644 index 00000000000000..6e407f13525f5e --- /dev/null +++ b/docs/ops/arithmetic/Asinh_1.md @@ -0,0 +1,50 @@ +## Asinh {#openvino_docs_ops_arithmetic_Asinh_1} + +**Versioned name**: *Asinh-1* + +**Category**: Arithmetic unary operation + +**Short description**: *Asinh* performs element-wise hyperbolic inverse sine (arcsinh) operation with given tensor. + +**Attributes**: + + No attributes available. + +**Inputs** + +* **1**: An tensor of type T. **Required.** + +**Outputs** + +* **1**: The result of element-wise asinh operation. A tensor of type T. + +**Types** + +* *T*: any numeric type. 
+ +*Asinh* does the following with the input tensor *a*: + +\f[ +a_{i} = asinh(a_{i}) +\f] + +**Examples** + +*Example 1* + +```xml + + + + 256 + 56 + + + + + 256 + 56 + + + +``` diff --git a/docs/ops/arithmetic/Asinh_3.md b/docs/ops/arithmetic/Asinh_3.md index 67e0634f7ccaae..9c2e3096d8977f 100644 --- a/docs/ops/arithmetic/Asinh_3.md +++ b/docs/ops/arithmetic/Asinh_3.md @@ -1,4 +1,4 @@ -## Asinh +## Asinh {#openvino_docs_ops_arithmetic_Asinh_3} **Versioned name**: *Asinh-3* diff --git a/docs/ops/arithmetic/Atan_1.md b/docs/ops/arithmetic/Atan_1.md index 494015321b328a..43f9a1c4213572 100644 --- a/docs/ops/arithmetic/Atan_1.md +++ b/docs/ops/arithmetic/Atan_1.md @@ -1,4 +1,4 @@ -## Atan +## Atan {#openvino_docs_ops_arithmetic_Atan_1} **Versioned name**: *Atan-1* diff --git a/docs/ops/arithmetic/Atanh_1.md b/docs/ops/arithmetic/Atanh_1.md new file mode 100644 index 00000000000000..80c01571b16f09 --- /dev/null +++ b/docs/ops/arithmetic/Atanh_1.md @@ -0,0 +1,50 @@ +## Atanh {#openvino_docs_ops_arithmetic_Atanh_1} + +**Versioned name**: *Atanh-1* + +**Category**: Arithmetic unary operation + +**Short description**: *Atanh* performs element-wise hyperbolic inverse tangent (arctangenth) operation with given tensor. + +**Attributes**: + + No attributes available. + +**Inputs** + +* **1**: An tensor of type T. **Required.** + +**Outputs** + +* **1**: The result of element-wise atanh operation. A tensor of type T. + +**Types** + +* *T*: any numeric type. + +*Atanh* does the following with the input tensor *a*: + +\f[ +a_{i} = atanh(a_{i}) +\f] + +**Examples** + +*Example 1* + +```xml + + + + 256 + 56 + + + + + 256 + 56 + + + +``` diff --git a/docs/ops/arithmetic/Atanh_3.md b/docs/ops/arithmetic/Atanh_3.md index 0bbec4d53646d4..e2d5dfa36eb147 100644 --- a/docs/ops/arithmetic/Atanh_3.md +++ b/docs/ops/arithmetic/Atanh_3.md @@ -1,4 +1,4 @@ -## Atanh +## Atanh {#openvino_docs_ops_arithmetic_Atanh_3} **Versioned name**: *Atanh-3* diff --git a/docs/ops/arithmetic/Ceiling_1.md b/docs/ops/arithmetic/Ceiling_1.md index dd08cbdd9a9f6c..588b5ff6842f55 100644 --- a/docs/ops/arithmetic/Ceiling_1.md +++ b/docs/ops/arithmetic/Ceiling_1.md @@ -1,4 +1,4 @@ -## Ceiling +## Ceiling {#openvino_docs_ops_arithmetic_Ceiling_1} **Versioned name**: *Ceiling-1* diff --git a/docs/ops/arithmetic/Cos_1.md b/docs/ops/arithmetic/Cos_1.md index c8d42125593b54..3ff7c3593cabd8 100644 --- a/docs/ops/arithmetic/Cos_1.md +++ b/docs/ops/arithmetic/Cos_1.md @@ -1,4 +1,4 @@ -## Cos +## Cos {#openvino_docs_ops_arithmetic_Cos_1} **Versioned name**: *Cos-1* diff --git a/docs/ops/arithmetic/Cosh_1.md b/docs/ops/arithmetic/Cosh_1.md index fc59e6c24d3022..d90f6a182fe5c9 100644 --- a/docs/ops/arithmetic/Cosh_1.md +++ b/docs/ops/arithmetic/Cosh_1.md @@ -1,4 +1,4 @@ -## Cosh +## Cosh {#openvino_docs_ops_arithmetic_Cosh_1} **Versioned name**: *Cosh-1* diff --git a/docs/ops/arithmetic/CumSum_3.md b/docs/ops/arithmetic/CumSum_3.md index d1b579f30049da..952998069a1b51 100644 --- a/docs/ops/arithmetic/CumSum_3.md +++ b/docs/ops/arithmetic/CumSum_3.md @@ -1,4 +1,4 @@ -## CumSum +## CumSum {#openvino_docs_ops_arithmetic_CumSum_3} **Versioned name**: *CumSum-3* diff --git a/docs/ops/arithmetic/Divide_1.md b/docs/ops/arithmetic/Divide_1.md index ba16bd8ac055f3..63db0fa8a1bd01 100644 --- a/docs/ops/arithmetic/Divide_1.md +++ b/docs/ops/arithmetic/Divide_1.md @@ -1,4 +1,4 @@ -## Divide +## Divide {#openvino_docs_ops_arithmetic_Divide_1} **Versioned name**: *Divide-1* diff --git a/docs/ops/arithmetic/Erf_1.md b/docs/ops/arithmetic/Erf_1.md index 
f6f7654c511b1e..a672ab6431d304 100644 --- a/docs/ops/arithmetic/Erf_1.md +++ b/docs/ops/arithmetic/Erf_1.md @@ -1,4 +1,4 @@ -## Erf +## Erf {#openvino_docs_ops_arithmetic_Erf_1} **Versioned name**: *Erf-1* diff --git a/docs/ops/arithmetic/FloorMod_1.md b/docs/ops/arithmetic/FloorMod_1.md index 4087e9021526ca..fca707d5a31d68 100644 --- a/docs/ops/arithmetic/FloorMod_1.md +++ b/docs/ops/arithmetic/FloorMod_1.md @@ -1,4 +1,4 @@ -## FloorMod +## FloorMod {#openvino_docs_ops_arithmetic_FloorMod_1} **Versioned name**: *FloorMod-1* diff --git a/docs/ops/arithmetic/Floor_1.md b/docs/ops/arithmetic/Floor_1.md index dbf8a31ed36b82..f76c3b24752e8c 100644 --- a/docs/ops/arithmetic/Floor_1.md +++ b/docs/ops/arithmetic/Floor_1.md @@ -1,4 +1,4 @@ -## Floor +## Floor {#openvino_docs_ops_arithmetic_Floor_1} **Versioned name**: *Floor-1* diff --git a/docs/ops/arithmetic/Log_1.md b/docs/ops/arithmetic/Log_1.md index 2645f02b2ed36c..6f33b002b693b7 100644 --- a/docs/ops/arithmetic/Log_1.md +++ b/docs/ops/arithmetic/Log_1.md @@ -1,4 +1,4 @@ -## Log +## Log {#openvino_docs_ops_arithmetic_Log_1} **Versioned name**: *Log-1* diff --git a/docs/ops/arithmetic/Maximum_1.md b/docs/ops/arithmetic/Maximum_1.md index abae8341e86e11..9c4a96d4d637e9 100644 --- a/docs/ops/arithmetic/Maximum_1.md +++ b/docs/ops/arithmetic/Maximum_1.md @@ -1,4 +1,4 @@ -## Maximum +## Maximum {#openvino_docs_ops_arithmetic_Maximum_1} **Versioned name**: *Maximum-1* diff --git a/docs/ops/arithmetic/Minimum_1.md b/docs/ops/arithmetic/Minimum_1.md index 7304d43c50efca..283f33c0b094e2 100644 --- a/docs/ops/arithmetic/Minimum_1.md +++ b/docs/ops/arithmetic/Minimum_1.md @@ -1,4 +1,4 @@ -## Minimum +## Minimum {#openvino_docs_ops_arithmetic_Minimum_1} **Versioned name**: *Minimum-1* diff --git a/docs/ops/arithmetic/Mod_1.md b/docs/ops/arithmetic/Mod_1.md index b11375fc409fe1..e95ccf2e5c6aa8 100644 --- a/docs/ops/arithmetic/Mod_1.md +++ b/docs/ops/arithmetic/Mod_1.md @@ -1,4 +1,4 @@ -## Mod +## Mod {#openvino_docs_ops_arithmetic_Mod_1} **Versioned name**: *Mod-1* diff --git a/docs/ops/arithmetic/Multiply_1.md b/docs/ops/arithmetic/Multiply_1.md index a7dd4d40f4d896..3fa0365473c86c 100644 --- a/docs/ops/arithmetic/Multiply_1.md +++ b/docs/ops/arithmetic/Multiply_1.md @@ -1,4 +1,4 @@ -## Multiply +## Multiply {#openvino_docs_ops_arithmetic_Multiply_1} **Versioned name**: *Multiply-1* diff --git a/docs/ops/arithmetic/Negative_1.md b/docs/ops/arithmetic/Negative_1.md index 46bdff46fb3eda..2e17112e7bcc51 100644 --- a/docs/ops/arithmetic/Negative_1.md +++ b/docs/ops/arithmetic/Negative_1.md @@ -1,4 +1,4 @@ -## Negative +## Negative {#openvino_docs_ops_arithmetic_Negative_1} **Versioned name**: *Negative-1* diff --git a/docs/ops/arithmetic/Power_1.md b/docs/ops/arithmetic/Power_1.md index 6f145a00fec67b..81f13e9802b0d8 100644 --- a/docs/ops/arithmetic/Power_1.md +++ b/docs/ops/arithmetic/Power_1.md @@ -1,4 +1,4 @@ -## Power +## Power {#openvino_docs_ops_arithmetic_Power_1} **Versioned name**: *Power-1* diff --git a/docs/ops/arithmetic/Selu_1.md b/docs/ops/arithmetic/Selu_1.md index 3bebaa0eb917d4..8d69d13fbf2e37 100644 --- a/docs/ops/arithmetic/Selu_1.md +++ b/docs/ops/arithmetic/Selu_1.md @@ -1,4 +1,4 @@ -## Selu +## Selu {#openvino_docs_ops_arithmetic_Selu_1} **Versioned name**: *Selu-1* diff --git a/docs/ops/arithmetic/Sign_1.md b/docs/ops/arithmetic/Sign_1.md index c979e3b41221c1..8a42258e0da2e2 100644 --- a/docs/ops/arithmetic/Sign_1.md +++ b/docs/ops/arithmetic/Sign_1.md @@ -1,4 +1,4 @@ -## Sign +## Sign {#openvino_docs_ops_arithmetic_Sign_1} 
**Versioned name**: *Sign-1* diff --git a/docs/ops/arithmetic/Sin_1.md b/docs/ops/arithmetic/Sin_1.md index 5a677a9ed375c5..0a4e9ca413db3c 100644 --- a/docs/ops/arithmetic/Sin_1.md +++ b/docs/ops/arithmetic/Sin_1.md @@ -1,4 +1,4 @@ -## Sin +## Sin {#openvino_docs_ops_arithmetic_Sin_1} **Versioned name**: *Sin-1* diff --git a/docs/ops/arithmetic/Sinh_1.md b/docs/ops/arithmetic/Sinh_1.md index d7557239a78add..a4a0264f31c59e 100644 --- a/docs/ops/arithmetic/Sinh_1.md +++ b/docs/ops/arithmetic/Sinh_1.md @@ -1,4 +1,4 @@ -## Sinh +## Sinh {#openvino_docs_ops_arithmetic_Sinh_1} **Versioned name**: *Sinh-1* diff --git a/docs/ops/arithmetic/Sqrt_1.md b/docs/ops/arithmetic/Sqrt_1.md index f3b6505551d296..8677bd566d5bc9 100644 --- a/docs/ops/arithmetic/Sqrt_1.md +++ b/docs/ops/arithmetic/Sqrt_1.md @@ -1,4 +1,4 @@ -## Sqrt +## Sqrt {#openvino_docs_ops_arithmetic_Sqrt_1} **Versioned name**: *Sqrt-1* diff --git a/docs/ops/arithmetic/SquaredDifference_1.md b/docs/ops/arithmetic/SquaredDifference_1.md index 93cdfd48cb5ca8..565dc00f0fca44 100644 --- a/docs/ops/arithmetic/SquaredDifference_1.md +++ b/docs/ops/arithmetic/SquaredDifference_1.md @@ -1,4 +1,4 @@ -## SquaredDifference +## SquaredDifference {#openvino_docs_ops_arithmetic_SquaredDifference_1} **Versioned name**: *SquaredDifference-1* diff --git a/docs/ops/arithmetic/Subtract_1.md b/docs/ops/arithmetic/Subtract_1.md index 5af03d6625cd2b..b82a3f1c19061e 100644 --- a/docs/ops/arithmetic/Subtract_1.md +++ b/docs/ops/arithmetic/Subtract_1.md @@ -1,4 +1,4 @@ -## Subtract +## Subtract {#openvino_docs_ops_arithmetic_Subtract_1} **Versioned name**: *Subtract-1* diff --git a/docs/ops/arithmetic/Tan_1.md b/docs/ops/arithmetic/Tan_1.md index abc414cbdfca84..8a4640262cc56f 100644 --- a/docs/ops/arithmetic/Tan_1.md +++ b/docs/ops/arithmetic/Tan_1.md @@ -1,4 +1,4 @@ -## Tan +## Tan {#openvino_docs_ops_arithmetic_Tan_1} **Versioned name**: *Tan-1* diff --git a/docs/ops/arithmetic/Tanh_1.md b/docs/ops/arithmetic/Tanh_1.md index 950a3a1d48eac8..9f6e2d8079fb2f 100644 --- a/docs/ops/arithmetic/Tanh_1.md +++ b/docs/ops/arithmetic/Tanh_1.md @@ -1,4 +1,4 @@ -## Tanh +## Tanh {#openvino_docs_ops_arithmetic_Tanh_1} **Versioned name**: *Tanh-1* diff --git a/docs/ops/comparison/Equal_1.md b/docs/ops/comparison/Equal_1.md index 389afa0bcd25d9..22d75588da6e0f 100644 --- a/docs/ops/comparison/Equal_1.md +++ b/docs/ops/comparison/Equal_1.md @@ -1,4 +1,4 @@ -## Equal +## Equal {#openvino_docs_ops_comparison_Equal_1} **Versioned name**: *Equal-1* diff --git a/docs/ops/comparison/GreaterEqual_1.md b/docs/ops/comparison/GreaterEqual_1.md index 4f236ab6223769..8c0b22d6c32f11 100644 --- a/docs/ops/comparison/GreaterEqual_1.md +++ b/docs/ops/comparison/GreaterEqual_1.md @@ -1,4 +1,4 @@ -## GreaterEqual +## GreaterEqual {#openvino_docs_ops_comparison_GreaterEqual_1} **Versioned name**: *GreaterEqual-1* diff --git a/docs/ops/comparison/Greater_1.md b/docs/ops/comparison/Greater_1.md index 6fd4e63811c34c..0e286b51762e52 100644 --- a/docs/ops/comparison/Greater_1.md +++ b/docs/ops/comparison/Greater_1.md @@ -1,4 +1,4 @@ -## Greater +## Greater {#openvino_docs_ops_comparison_Greater_1} **Versioned name**: *Greater-1* diff --git a/docs/ops/comparison/LessEqual_1.md b/docs/ops/comparison/LessEqual_1.md index 1b4f5d535ceb1d..34d66e6592a405 100644 --- a/docs/ops/comparison/LessEqual_1.md +++ b/docs/ops/comparison/LessEqual_1.md @@ -1,4 +1,4 @@ -## LessEqual +## LessEqual {#openvino_docs_ops_comparison_LessEqual_1} **Versioned name**: *LessEqual-1* diff --git a/docs/ops/comparison/Less_1.md 
b/docs/ops/comparison/Less_1.md index 7f724cce897d84..90113253fcab6e 100644 --- a/docs/ops/comparison/Less_1.md +++ b/docs/ops/comparison/Less_1.md @@ -1,4 +1,4 @@ -## Less +## Less {#openvino_docs_ops_comparison_Less_1} **Versioned name**: *Less-1* diff --git a/docs/ops/comparison/NotEqual_1.md b/docs/ops/comparison/NotEqual_1.md index d1858c34f70030..c55b14be61a769 100644 --- a/docs/ops/comparison/NotEqual_1.md +++ b/docs/ops/comparison/NotEqual_1.md @@ -1,4 +1,4 @@ -## NotEqual +## NotEqual {#openvino_docs_ops_comparison_NotEqual_1} **Versioned name**: *NotEqual-1* diff --git a/docs/ops/condition/Bucketize_3.md b/docs/ops/condition/Bucketize_3.md index 578e79b2e913f6..f1b23cc0cb9d9b 100644 --- a/docs/ops/condition/Bucketize_3.md +++ b/docs/ops/condition/Bucketize_3.md @@ -1,4 +1,4 @@ -## Bucketize +## Bucketize {#openvino_docs_ops_condition_Bucketize_3} **Versioned name**: *Bucketize-3* diff --git a/docs/ops/condition/NonZero_3.md b/docs/ops/condition/NonZero_3.md index d1f14220450532..acf75ae0886684 100644 --- a/docs/ops/condition/NonZero_3.md +++ b/docs/ops/condition/NonZero_3.md @@ -1,4 +1,4 @@ -## NonZero +## NonZero {#openvino_docs_ops_condition_NonZero_3} **Versioned name**: *NonZero-3* diff --git a/docs/ops/condition/Select_1.md b/docs/ops/condition/Select_1.md index 10d91e504f0092..8f51624961078e 100644 --- a/docs/ops/condition/Select_1.md +++ b/docs/ops/condition/Select_1.md @@ -1,4 +1,4 @@ -## Select +## Select {#openvino_docs_ops_condition_Select_1} **Versioned name**: *Select-1* diff --git a/docs/ops/convolution/BinaryConvolution_1.md b/docs/ops/convolution/BinaryConvolution_1.md index 4f50aab06d675a..d6aabd0b20c11a 100644 --- a/docs/ops/convolution/BinaryConvolution_1.md +++ b/docs/ops/convolution/BinaryConvolution_1.md @@ -1,4 +1,4 @@ -## BinaryConvolution +## BinaryConvolution {#openvino_docs_ops_convolution_BinaryConvolution_1} **Versioned name**: *BinaryConvolution-1* diff --git a/docs/ops/convolution/ConvolutionBackpropData_1.md b/docs/ops/convolution/ConvolutionBackpropData_1.md index b9e6759eb25179..472309cf07c8a7 100644 --- a/docs/ops/convolution/ConvolutionBackpropData_1.md +++ b/docs/ops/convolution/ConvolutionBackpropData_1.md @@ -1,4 +1,4 @@ -## ConvolutionBackpropData +## ConvolutionBackpropData {#openvino_docs_ops_convolution_ConvolutionBackpropData_1} **Versioned name**: *ConvolutionBackpropData-1* diff --git a/docs/ops/convolution/Convolution_1.md b/docs/ops/convolution/Convolution_1.md index 2f74d127b40bb2..e1f9276b1584d2 100644 --- a/docs/ops/convolution/Convolution_1.md +++ b/docs/ops/convolution/Convolution_1.md @@ -1,4 +1,4 @@ -## Convolution +## Convolution {#openvino_docs_ops_convolution_Convolution_1} **Versioned name**: *Convolution-1* diff --git a/docs/ops/convolution/DeformableConvolution_1.md b/docs/ops/convolution/DeformableConvolution_1.md index 247fcc0792709e..323926c873f212 100644 --- a/docs/ops/convolution/DeformableConvolution_1.md +++ b/docs/ops/convolution/DeformableConvolution_1.md @@ -1,4 +1,4 @@ -## DeformableConvolution +## DeformableConvolution {#openvino_docs_ops_convolution_DeformableConvolution_1} **Versioned name**: *DeformableConvolution-1* diff --git a/docs/ops/convolution/GroupConvolutionBackpropData_1.md b/docs/ops/convolution/GroupConvolutionBackpropData_1.md index 3ffbacc4690ae4..c3ebfcc62306cb 100644 --- a/docs/ops/convolution/GroupConvolutionBackpropData_1.md +++ b/docs/ops/convolution/GroupConvolutionBackpropData_1.md @@ -1,4 +1,4 @@ -## GroupConvolutionBackpropData +## GroupConvolutionBackpropData 
{#openvino_docs_ops_convolution_GroupConvolutionBackpropData_1} **Versioned name**: *GroupConvolutionBackpropData-1* diff --git a/docs/ops/convolution/GroupConvolution_1.md b/docs/ops/convolution/GroupConvolution_1.md index 4f5e7667d7627d..4c59445a526fe8 100644 --- a/docs/ops/convolution/GroupConvolution_1.md +++ b/docs/ops/convolution/GroupConvolution_1.md @@ -1,4 +1,4 @@ -## GroupConvolution +## GroupConvolution {#openvino_docs_ops_convolution_GroupConvolution_1} **Versioned name**: *GroupConvolution-1* diff --git a/docs/ops/detection/DeformablePSROIPooling_1.md b/docs/ops/detection/DeformablePSROIPooling_1.md index 749e5013ef9758..2adcfb82e2e5c2 100644 --- a/docs/ops/detection/DeformablePSROIPooling_1.md +++ b/docs/ops/detection/DeformablePSROIPooling_1.md @@ -1,4 +1,4 @@ -## DeformablePSROIPooling +## DeformablePSROIPooling {#openvino_docs_ops_detection_DeformablePSROIPooling_1} **Versioned name**: *DeformablePSROIPooling-1* diff --git a/docs/ops/detection/DetectionOutput_1.md b/docs/ops/detection/DetectionOutput_1.md index d735eb474a72d9..84cd483d6d5b88 100644 --- a/docs/ops/detection/DetectionOutput_1.md +++ b/docs/ops/detection/DetectionOutput_1.md @@ -1,4 +1,4 @@ -## DetectionOutput +## DetectionOutput {#openvino_docs_ops_detection_DetectionOutput_1} **Versioned name**: *DetectionOutput-1* diff --git a/docs/ops/detection/PSROIPooling_1.md b/docs/ops/detection/PSROIPooling_1.md index 29d2c1b686e351..ae82d0f93dc898 100644 --- a/docs/ops/detection/PSROIPooling_1.md +++ b/docs/ops/detection/PSROIPooling_1.md @@ -1,4 +1,4 @@ -## PSROIPooling +## PSROIPooling {#openvino_docs_ops_detection_PSROIPooling_1} **Versioned name**: *PSROIPooling-1* diff --git a/docs/ops/detection/PriorBoxClustered_1.md b/docs/ops/detection/PriorBoxClustered_1.md index 36dc2c9037cad3..33eaf9ed78bf05 100644 --- a/docs/ops/detection/PriorBoxClustered_1.md +++ b/docs/ops/detection/PriorBoxClustered_1.md @@ -1,4 +1,4 @@ -## PriorBoxClustered +## PriorBoxClustered {#openvino_docs_ops_detection_PriorBoxClustered_1} **Versioned name**: *PriorBoxClustered-1* diff --git a/docs/ops/detection/PriorBox_1.md b/docs/ops/detection/PriorBox_1.md index 44f0daade39495..8cec849b965104 100644 --- a/docs/ops/detection/PriorBox_1.md +++ b/docs/ops/detection/PriorBox_1.md @@ -1,4 +1,4 @@ -## PriorBox +## PriorBox {#openvino_docs_ops_detection_PriorBox_1} **Versioned name**: *PriorBox-1* diff --git a/docs/ops/detection/Proposal_1.md b/docs/ops/detection/Proposal_1.md index 2309eccd2a4449..65858eb95870bb 100644 --- a/docs/ops/detection/Proposal_1.md +++ b/docs/ops/detection/Proposal_1.md @@ -1,4 +1,4 @@ -## Proposal +## Proposal {#openvino_docs_ops_detection_Proposal_1} **Versioned name**: *Proposal-1* diff --git a/docs/ops/detection/ROIAlign_3.md b/docs/ops/detection/ROIAlign_3.md index a46a4b0a1fece2..533a84a021d807 100644 --- a/docs/ops/detection/ROIAlign_3.md +++ b/docs/ops/detection/ROIAlign_3.md @@ -1,4 +1,4 @@ -## ROIAlign +## ROIAlign {#openvino_docs_ops_detection_ROIAlign_3} **Versioned name**: *ROIAlign-3* diff --git a/docs/ops/detection/ROIPooling_1.md b/docs/ops/detection/ROIPooling_1.md index 3d6aeb545026de..4ab319875dc45f 100644 --- a/docs/ops/detection/ROIPooling_1.md +++ b/docs/ops/detection/ROIPooling_1.md @@ -1,4 +1,4 @@ -## ROIPooling +## ROIPooling {#openvino_docs_ops_detection_ROIPooling_1} **Versioned name**: *ROIPooling-1* diff --git a/docs/ops/detection/RegionYolo_1.md b/docs/ops/detection/RegionYolo_1.md index 45f3b6c58d3829..f5a4067ad739a2 100644 --- a/docs/ops/detection/RegionYolo_1.md +++ 
b/docs/ops/detection/RegionYolo_1.md @@ -1,4 +1,4 @@ -## RegionYolo +## RegionYolo {#openvino_docs_ops_detection_RegionYolo_1} **Versioned name**: *RegionYolo-1* diff --git a/docs/ops/detection/ReorgYolo_1.md b/docs/ops/detection/ReorgYolo_1.md index c7c0627073dc03..25c4669e8b9a56 100644 --- a/docs/ops/detection/ReorgYolo_1.md +++ b/docs/ops/detection/ReorgYolo_1.md @@ -1,4 +1,4 @@ -## ReorgYolo Layer +## ReorgYolo Layer {#openvino_docs_ops_detection_ReorgYolo_1} **Versioned name**: *ReorgYolo-1* diff --git a/docs/ops/generation/Range_1.md b/docs/ops/generation/Range_1.md index 0b32cc294edb36..e1097f23278258 100644 --- a/docs/ops/generation/Range_1.md +++ b/docs/ops/generation/Range_1.md @@ -1,4 +1,4 @@ -## Range +## Range {#openvino_docs_ops_generation_Range_1} **Versioned name**: *Range-1* diff --git a/docs/ops/image/Interpolate_1.md b/docs/ops/image/Interpolate_1.md index d4d1e05902eb00..28935c4e80e8a3 100644 --- a/docs/ops/image/Interpolate_1.md +++ b/docs/ops/image/Interpolate_1.md @@ -1,4 +1,4 @@ -## Interpolate +## Interpolate {#openvino_docs_ops_image_Interpolate_1} **Versioned name**: *Interpolate-1* diff --git a/docs/ops/image/Interpolate_4.md b/docs/ops/image/Interpolate_4.md index a9adae1982faa7..0a283109f1fdf1 100644 --- a/docs/ops/image/Interpolate_4.md +++ b/docs/ops/image/Interpolate_4.md @@ -1,4 +1,4 @@ -## Interpolate +## Interpolate {#openvino_docs_ops_image_Interpolate_4} **Versioned name**: *Interpolate-4* diff --git a/docs/ops/infrastructure/Assign_3.md b/docs/ops/infrastructure/Assign_3.md index d899c51426c64b..8e4ed887795c2b 100644 --- a/docs/ops/infrastructure/Assign_3.md +++ b/docs/ops/infrastructure/Assign_3.md @@ -1,4 +1,4 @@ -## Assign +## Assign {#openvino_docs_ops_infrastructure_Assign_3} **Versioned name**: *Assign-3* diff --git a/docs/ops/infrastructure/Constant_1.md b/docs/ops/infrastructure/Constant_1.md index dfe32d650d873e..5a5919fc34bb2e 100644 --- a/docs/ops/infrastructure/Constant_1.md +++ b/docs/ops/infrastructure/Constant_1.md @@ -1,4 +1,4 @@ -## Constant +## Constant {#openvino_docs_ops_infrastructure_Constant_1} **Versioned name**: *Constant-1* diff --git a/docs/ops/infrastructure/Parameter_1.md b/docs/ops/infrastructure/Parameter_1.md index cc99c16b59dc19..0467d5a241e7ce 100644 --- a/docs/ops/infrastructure/Parameter_1.md +++ b/docs/ops/infrastructure/Parameter_1.md @@ -1,4 +1,4 @@ -## Parameter +## Parameter {#openvino_docs_ops_infrastructure_Parameter_1} **Versioned name**: *Parameter-1* diff --git a/docs/ops/infrastructure/ReadValue_3.md b/docs/ops/infrastructure/ReadValue_3.md index 74998f82b97eea..261ff657f64982 100644 --- a/docs/ops/infrastructure/ReadValue_3.md +++ b/docs/ops/infrastructure/ReadValue_3.md @@ -1,4 +1,4 @@ -## ReadValue +## ReadValue {#openvino_docs_ops_infrastructure_ReadValue_3} **Versioned name**: *ReadValue-3* diff --git a/docs/ops/infrastructure/Result_1.md b/docs/ops/infrastructure/Result_1.md index ef6eb4d7c17747..f2afa552fdd3fe 100644 --- a/docs/ops/infrastructure/Result_1.md +++ b/docs/ops/infrastructure/Result_1.md @@ -1,4 +1,4 @@ -## Result +## Result {#openvino_docs_ops_infrastructure_Result_1} **Versioned name**: *Result-1* diff --git a/docs/ops/infrastructure/TensorIterator_1.md b/docs/ops/infrastructure/TensorIterator_1.md index 3a3cc6fc87c9b3..6edffef5daef25 100644 --- a/docs/ops/infrastructure/TensorIterator_1.md +++ b/docs/ops/infrastructure/TensorIterator_1.md @@ -1,4 +1,4 @@ -## TensorIterator +## TensorIterator {#openvino_docs_ops_infrastructure_TensorIterator_1} **Versioned name**: *TensorIterator-1* 
diff --git a/docs/ops/logical/LogicalAnd_1.md b/docs/ops/logical/LogicalAnd_1.md index 50ae997f39ce30..54a04881fbfa28 100644 --- a/docs/ops/logical/LogicalAnd_1.md +++ b/docs/ops/logical/LogicalAnd_1.md @@ -1,4 +1,4 @@ -## LogicalAnd +## LogicalAnd {#openvino_docs_ops_logical_LogicalAnd_1} **Versioned name**: *LogicalAnd-1* diff --git a/docs/ops/logical/LogicalNot_1.md b/docs/ops/logical/LogicalNot_1.md index 7185c94e8b3340..f31eab7ddbc3f0 100644 --- a/docs/ops/logical/LogicalNot_1.md +++ b/docs/ops/logical/LogicalNot_1.md @@ -1,4 +1,4 @@ -## LogicalNot +## LogicalNot {#openvino_docs_ops_logical_LogicalNot_1} **Versioned name**: *LogicalNot-1* diff --git a/docs/ops/logical/LogicalOr_1.md b/docs/ops/logical/LogicalOr_1.md index c2563c5890cf83..b900f9c151aed6 100644 --- a/docs/ops/logical/LogicalOr_1.md +++ b/docs/ops/logical/LogicalOr_1.md @@ -1,4 +1,4 @@ -## LogicalOr +## LogicalOr {#openvino_docs_ops_logical_LogicalOr_1} **Versioned name**: *LogicalOr-1* diff --git a/docs/ops/logical/LogicalXor_1.md b/docs/ops/logical/LogicalXor_1.md index 1921718d3cf1ac..bc0b197a2b3053 100644 --- a/docs/ops/logical/LogicalXor_1.md +++ b/docs/ops/logical/LogicalXor_1.md @@ -1,4 +1,4 @@ -## LogicalXor +## LogicalXor {#openvino_docs_ops_logical_LogicalXor_1} **Versioned name**: *LogicalXor-1* diff --git a/docs/ops/matrix/MatMul_1.md b/docs/ops/matrix/MatMul_1.md index 2a27a495abc67f..efba993160b4d0 100644 --- a/docs/ops/matrix/MatMul_1.md +++ b/docs/ops/matrix/MatMul_1.md @@ -1,4 +1,4 @@ -## MatMul +## MatMul {#openvino_docs_ops_matrix_MatMul_1} **Versioned name**: *MatMul-1* diff --git a/docs/ops/movement/BatchToSpace_2.md b/docs/ops/movement/BatchToSpace_2.md index 39308ffe0a5328..936d597792eba3 100644 --- a/docs/ops/movement/BatchToSpace_2.md +++ b/docs/ops/movement/BatchToSpace_2.md @@ -1,4 +1,4 @@ -## BatchToSpace +## BatchToSpace {#openvino_docs_ops_movement_BatchToSpace_2} **Versioned name**: *BatchToSpace-2* diff --git a/docs/ops/movement/Broadcast_1.md b/docs/ops/movement/Broadcast_1.md index 0bb49c0ab45674..a9b37505ce81ea 100644 --- a/docs/ops/movement/Broadcast_1.md +++ b/docs/ops/movement/Broadcast_1.md @@ -1,4 +1,4 @@ -## Broadcast +## Broadcast {#openvino_docs_ops_movement_Broadcast_1} **Versioned name**: *Broadcast-1* diff --git a/docs/ops/movement/Broadcast_3.md b/docs/ops/movement/Broadcast_3.md index 200e44ec99fc90..460f613dbd739f 100644 --- a/docs/ops/movement/Broadcast_3.md +++ b/docs/ops/movement/Broadcast_3.md @@ -1,4 +1,4 @@ -## Broadcast +## Broadcast {#openvino_docs_ops_movement_Broadcast_3} **Versioned name**: *Broadcast-3* diff --git a/docs/ops/movement/Concat_1.md b/docs/ops/movement/Concat_1.md index da2274ef4cbe86..45ac2e2d4108dd 100644 --- a/docs/ops/movement/Concat_1.md +++ b/docs/ops/movement/Concat_1.md @@ -1,4 +1,4 @@ -## Concat +## Concat {#openvino_docs_ops_movement_Concat_1} **Versioned name**: *Concat-1* diff --git a/docs/ops/movement/DepthToSpace_1.md b/docs/ops/movement/DepthToSpace_1.md index ca0f38271f5312..14e192d4f75971 100644 --- a/docs/ops/movement/DepthToSpace_1.md +++ b/docs/ops/movement/DepthToSpace_1.md @@ -1,4 +1,4 @@ -## DepthToSpace +## DepthToSpace {#openvino_docs_ops_movement_DepthToSpace_1} **Versioned name**: *DepthToSpace-1* diff --git a/docs/ops/movement/ExtractImagePatches_3.md b/docs/ops/movement/ExtractImagePatches_3.md index dfc2ae7436bbd2..3604d3b49ca19d 100644 --- a/docs/ops/movement/ExtractImagePatches_3.md +++ b/docs/ops/movement/ExtractImagePatches_3.md @@ -1,4 +1,4 @@ -## ExtractImagePatches +## ExtractImagePatches 
{#openvino_docs_ops_movement_ExtractImagePatches_3} **Versioned name**: *ExtractImagePatches-3* diff --git a/docs/ops/movement/GatherTree_1.md b/docs/ops/movement/GatherTree_1.md index ea4ceb5c5e05a0..773beeae794568 100644 --- a/docs/ops/movement/GatherTree_1.md +++ b/docs/ops/movement/GatherTree_1.md @@ -1,4 +1,4 @@ -## GatherTree +## GatherTree {#openvino_docs_ops_movement_GatherTree_1} **Versioned name**: *GatherTree-1* diff --git a/docs/ops/movement/Gather_1.md b/docs/ops/movement/Gather_1.md index 107d9f2222c88f..984cdfc1ec515b 100644 --- a/docs/ops/movement/Gather_1.md +++ b/docs/ops/movement/Gather_1.md @@ -1,4 +1,4 @@ -## Gather +## Gather {#openvino_docs_ops_movement_Gather_1} **Versioned name**: *Gather-1* diff --git a/docs/ops/movement/Pad_1.md b/docs/ops/movement/Pad_1.md index 8f9d55236b2b74..5b2c591c10ac86 100644 --- a/docs/ops/movement/Pad_1.md +++ b/docs/ops/movement/Pad_1.md @@ -1,4 +1,4 @@ -## Pad +## Pad {#openvino_docs_ops_movement_Pad_1} **Versioned name**: *Pad-1* diff --git a/docs/ops/movement/ReverseSequence_1.md b/docs/ops/movement/ReverseSequence_1.md index c9e2100150a3ec..a4fc4e25a5db37 100644 --- a/docs/ops/movement/ReverseSequence_1.md +++ b/docs/ops/movement/ReverseSequence_1.md @@ -1,4 +1,4 @@ -## ReverseSequence +## ReverseSequence {#openvino_docs_ops_movement_ReverseSequence_1} **Versioned name**: *ReverseSequence-1* diff --git a/docs/ops/movement/Reverse_1.md b/docs/ops/movement/Reverse_1.md index 3bdf5c261627ba..4b96f0f035093e 100644 --- a/docs/ops/movement/Reverse_1.md +++ b/docs/ops/movement/Reverse_1.md @@ -1,4 +1,4 @@ -## Reverse +## Reverse {#openvino_docs_ops_movement_Reverse_1} **Versioned name**: *Reverse-1* diff --git a/docs/ops/movement/ScatterElementsUpdate_3.md b/docs/ops/movement/ScatterElementsUpdate_3.md index 39e741d3126a90..994715d214270c 100644 --- a/docs/ops/movement/ScatterElementsUpdate_3.md +++ b/docs/ops/movement/ScatterElementsUpdate_3.md @@ -1,4 +1,4 @@ -## ScatterElementsUpdate +## ScatterElementsUpdate {#openvino_docs_ops_movement_ScatterElementsUpdate_3} **Versioned name**: *ScatterElementsUpdate-3* diff --git a/docs/ops/movement/ScatterNDUpdate_3.md b/docs/ops/movement/ScatterNDUpdate_3.md index c59e98eda7e6c3..256cd963e1e20a 100644 --- a/docs/ops/movement/ScatterNDUpdate_3.md +++ b/docs/ops/movement/ScatterNDUpdate_3.md @@ -1,4 +1,4 @@ -## ScatterNDUpdate +## ScatterNDUpdate {#openvino_docs_ops_movement_ScatterNDUpdate_3} **Versioned name**: *ScatterNDUpdate-3* diff --git a/docs/ops/movement/ScatterUpdate_3.md b/docs/ops/movement/ScatterUpdate_3.md index 7e6ebebcb7dab6..2b71d74a8a46db 100644 --- a/docs/ops/movement/ScatterUpdate_3.md +++ b/docs/ops/movement/ScatterUpdate_3.md @@ -1,4 +1,4 @@ -## ScatterUpdate +## ScatterUpdate {#openvino_docs_ops_movement_ScatterUpdate_3} **Versioned name**: *ScatterUpdate-3* diff --git a/docs/ops/movement/ShuffleChannels_1.md b/docs/ops/movement/ShuffleChannels_1.md index 7150cc13e4063b..ec7cfc75d9db6f 100644 --- a/docs/ops/movement/ShuffleChannels_1.md +++ b/docs/ops/movement/ShuffleChannels_1.md @@ -1,4 +1,4 @@ -## ShuffleChannels +## ShuffleChannels {#openvino_docs_ops_movement_ShuffleChannels_1} **Versioned name**: *ShuffleChannels-1* diff --git a/docs/ops/movement/SpaceToBatch_2.md b/docs/ops/movement/SpaceToBatch_2.md index 8a4adea25a6796..66c064e27bee35 100644 --- a/docs/ops/movement/SpaceToBatch_2.md +++ b/docs/ops/movement/SpaceToBatch_2.md @@ -1,4 +1,4 @@ -## SpaceToBatch +## SpaceToBatch {#openvino_docs_ops_movement_SpaceToBatch_2} **Versioned name**: *SpaceToBatch-2* diff --git 
a/docs/ops/movement/SpaceToDepth_1.md b/docs/ops/movement/SpaceToDepth_1.md index 0c1a0a433588c6..0004a56a225acf 100644 --- a/docs/ops/movement/SpaceToDepth_1.md +++ b/docs/ops/movement/SpaceToDepth_1.md @@ -1,4 +1,4 @@ -## SpaceToDepth +## SpaceToDepth {#openvino_docs_ops_movement_SpaceToDepth_1} **Versioned name**: *SpaceToDepth-1* diff --git a/docs/ops/movement/Split_1.md b/docs/ops/movement/Split_1.md index a3e982b1bbd71d..67711f2947e2bd 100644 --- a/docs/ops/movement/Split_1.md +++ b/docs/ops/movement/Split_1.md @@ -1,4 +1,4 @@ -## Split +## Split {#openvino_docs_ops_movement_Split_1} **Versioned name**: *Split-1* diff --git a/docs/ops/movement/StridedSlice_1.md b/docs/ops/movement/StridedSlice_1.md index f60f7bb7f7d678..6c07665d8f930c 100644 --- a/docs/ops/movement/StridedSlice_1.md +++ b/docs/ops/movement/StridedSlice_1.md @@ -1,4 +1,4 @@ -## StridedSlice +## StridedSlice {#openvino_docs_ops_movement_StridedSlice_1} **Versioned name**: *StridedSlice-1* diff --git a/docs/ops/movement/Tile_1.md b/docs/ops/movement/Tile_1.md index 0018158ab94d97..d71f009f40eef7 100644 --- a/docs/ops/movement/Tile_1.md +++ b/docs/ops/movement/Tile_1.md @@ -1,4 +1,4 @@ -## Tile +## Tile {#openvino_docs_ops_movement_Tile_1} **Versioned name**: *Tile-1* diff --git a/docs/ops/movement/Transpose_1.md b/docs/ops/movement/Transpose_1.md index c752b89de5a16d..877f60181514c8 100644 --- a/docs/ops/movement/Transpose_1.md +++ b/docs/ops/movement/Transpose_1.md @@ -1,4 +1,4 @@ -## Transpose +## Transpose {#openvino_docs_ops_movement_Transpose_1} **Versioned name**: *Transpose-1* diff --git a/docs/ops/movement/VariadicSplit_1.md b/docs/ops/movement/VariadicSplit_1.md index 3f3784b342dcd8..0efdcafe467c3b 100644 --- a/docs/ops/movement/VariadicSplit_1.md +++ b/docs/ops/movement/VariadicSplit_1.md @@ -1,4 +1,4 @@ -## VariadicSplit +## VariadicSplit {#openvino_docs_ops_movement_VariadicSplit_1} **Versioned name**: *VariadicSplit-1* diff --git a/docs/ops/normalization/BatchNormInference_1.md b/docs/ops/normalization/BatchNormInference_1.md index 5d33775effdc25..d7bf1f59edd74f 100644 --- a/docs/ops/normalization/BatchNormInference_1.md +++ b/docs/ops/normalization/BatchNormInference_1.md @@ -1,4 +1,4 @@ -## BatchNormInference +## BatchNormInference {#openvino_docs_ops_normalization_BatchNormInference_1} **Versioned name**: *BatchNormInference-1* diff --git a/docs/ops/normalization/GRN_1.md b/docs/ops/normalization/GRN_1.md index ae0350cbf578dd..656a46e3cc16ee 100644 --- a/docs/ops/normalization/GRN_1.md +++ b/docs/ops/normalization/GRN_1.md @@ -1,4 +1,4 @@ -## GRN +## GRN {#openvino_docs_ops_normalization_GRN_1} **Versioned name**: *GRN-1* diff --git a/docs/ops/normalization/LRN_1.md b/docs/ops/normalization/LRN_1.md index a1c7585b24cc73..989b40bc521bc6 100644 --- a/docs/ops/normalization/LRN_1.md +++ b/docs/ops/normalization/LRN_1.md @@ -1,4 +1,4 @@ -## LRN +## LRN {#openvino_docs_ops_normalization_LRN_1} **Versioned name**: *LRN-1* diff --git a/docs/ops/normalization/MVN_1.md b/docs/ops/normalization/MVN_1.md index 8d712209453f7c..1c55d626679c0e 100644 --- a/docs/ops/normalization/MVN_1.md +++ b/docs/ops/normalization/MVN_1.md @@ -1,4 +1,4 @@ -## MVN +## MVN {#openvino_docs_ops_normalization_MVN_1} **Versioned name**: *MVN-1* diff --git a/docs/ops/normalization/NormalizeL2_1.md b/docs/ops/normalization/NormalizeL2_1.md index c5dfb8e273ec8b..56fd13092adcd4 100644 --- a/docs/ops/normalization/NormalizeL2_1.md +++ b/docs/ops/normalization/NormalizeL2_1.md @@ -1,4 +1,4 @@ -## NormalizeL2 +## NormalizeL2 
{#openvino_docs_ops_normalization_NormalizeL2_1} **Versioned name**: *NormalizeL2-1* diff --git a/docs/ops/opset.md b/docs/ops/opset.md index dcf30b6f46a559..958e7bd99b96a7 100644 --- a/docs/ops/opset.md +++ b/docs/ops/opset.md @@ -1,4 +1,4 @@ -# Available Operations Sets +# Available Operations Sets {#openvino_docs_ops_opset} According to capabilities of supported deep learning frameworks and hardware capabilities of a target inference device, all operations are combined into operations sets each fully supported in a specific version of OpenVINO™ toolkit. diff --git a/docs/ops/opset1.md b/docs/ops/opset1.md index 0f52e29ccaad2a..73da245d2dc541 100644 --- a/docs/ops/opset1.md +++ b/docs/ops/opset1.md @@ -1,4 +1,4 @@ -# Operation Set `opset1` Specification +# Operation Set `opset1` Specification {#openvino_docs_ops_opset1} This specification document describes `opset1` operation set supported in OpenVINO. Support for each particular operation from the list below depends on the capabilities available in a inference plugin diff --git a/docs/ops/opset2.md b/docs/ops/opset2.md index 41a4d31ff27781..bfee6cee9c45a8 100644 --- a/docs/ops/opset2.md +++ b/docs/ops/opset2.md @@ -1,4 +1,4 @@ -# Operation Set `opset2` Specification +# Operation Set `opset2` Specification {#openvino_docs_ops_opset2} This specification document describes `opset2` operation set supported in OpenVINO. Support for each particular operation from the list below depends on the capabilities available in a inference plugin diff --git a/docs/ops/opset3.md b/docs/ops/opset3.md index c042f5bdca46b2..e36d4be27c5227 100644 --- a/docs/ops/opset3.md +++ b/docs/ops/opset3.md @@ -1,4 +1,4 @@ -# Operation Set `opset3` Specification +# Operation Set `opset3` Specification {#openvino_docs_ops_opset3} This specification document describes `opset3` operation set supported in OpenVINO. Support for each particular operation from the list below depends on the capabilities available in a inference plugin diff --git a/docs/ops/opset4.md b/docs/ops/opset4.md index 8f8842fe5b22ba..0b4f66423563b5 100644 --- a/docs/ops/opset4.md +++ b/docs/ops/opset4.md @@ -1,4 +1,4 @@ -# Operation Set `opset4` Specification +# Operation Set `opset4` Specification {#openvino_docs_ops_opset4} This specification document describes `opset4` operation set supported in OpenVINO. Support for each particular operation from the list below depends on the capabilities available in a inference plugin diff --git a/docs/ops/pooling/AvgPool_1.md b/docs/ops/pooling/AvgPool_1.md index 76bb5975a7254c..dfa04c476b02ed 100644 --- a/docs/ops/pooling/AvgPool_1.md +++ b/docs/ops/pooling/AvgPool_1.md @@ -1,4 +1,4 @@ -## AvgPool +## AvgPool {#openvino_docs_ops_pooling_AvgPool_1} **Versioned name**: *AvgPool-1* diff --git a/docs/ops/pooling/MaxPool_1.md b/docs/ops/pooling/MaxPool_1.md index 51606d5805306e..6e705e49a22c8e 100644 --- a/docs/ops/pooling/MaxPool_1.md +++ b/docs/ops/pooling/MaxPool_1.md @@ -1,4 +1,4 @@ -## MaxPool +## MaxPool {#openvino_docs_ops_pooling_MaxPool_1} **Versioned name**: *MaxPool-1* diff --git a/docs/ops/quantization/FakeQuantize_1.md b/docs/ops/quantization/FakeQuantize_1.md index 3fa62dddc621ad..aa5271b70d0bbf 100644 --- a/docs/ops/quantization/FakeQuantize_1.md +++ b/docs/ops/quantization/FakeQuantize_1.md @@ -1,4 +1,4 @@ -## FakeQuantize +## FakeQuantize {#openvino_docs_ops_quantization_FakeQuantize_1} **Versioned name**: *FakeQuantize-1* @@ -93,4 +93,4 @@ else:
-``` \ No newline at end of file +``` diff --git a/docs/ops/reduction/ReduceLogicalAnd_1.md b/docs/ops/reduction/ReduceLogicalAnd_1.md index e917fd3b69dc2e..824d5fdec7fd28 100644 --- a/docs/ops/reduction/ReduceLogicalAnd_1.md +++ b/docs/ops/reduction/ReduceLogicalAnd_1.md @@ -1,4 +1,4 @@ -## ReduceLogicalAnd +## ReduceLogicalAnd {#openvino_docs_ops_reduction_ReduceLogicalAnd_1} **Versioned name**: *ReduceLogicalAnd-1* diff --git a/docs/ops/reduction/ReduceLogicalOr_1.md b/docs/ops/reduction/ReduceLogicalOr_1.md index 8f5f9e67ed524b..e67a1d0dc3da0c 100644 --- a/docs/ops/reduction/ReduceLogicalOr_1.md +++ b/docs/ops/reduction/ReduceLogicalOr_1.md @@ -1,4 +1,4 @@ -## ReduceLogicalOr +## ReduceLogicalOr {#openvino_docs_ops_reduction_ReduceLogicalOr_1} **Versioned name**: *ReduceLogicalOr-1* diff --git a/docs/ops/reduction/ReduceMax_1.md b/docs/ops/reduction/ReduceMax_1.md index b3915d004c7bf7..13379caf9bbe74 100644 --- a/docs/ops/reduction/ReduceMax_1.md +++ b/docs/ops/reduction/ReduceMax_1.md @@ -1,4 +1,4 @@ -## ReduceMax +## ReduceMax {#openvino_docs_ops_reduction_ReduceMax_1} **Versioned name**: *ReduceMax-1* diff --git a/docs/ops/reduction/ReduceMean_1.md b/docs/ops/reduction/ReduceMean_1.md index b5a27df81356a0..3c7b06ed17955d 100644 --- a/docs/ops/reduction/ReduceMean_1.md +++ b/docs/ops/reduction/ReduceMean_1.md @@ -1,4 +1,4 @@ -## ReduceMean +## ReduceMean {#openvino_docs_ops_reduction_ReduceMean_1} **Versioned name**: *ReduceMean-1* diff --git a/docs/ops/reduction/ReduceMin_1.md b/docs/ops/reduction/ReduceMin_1.md index ce29b405121f15..4dd3981eb512ab 100644 --- a/docs/ops/reduction/ReduceMin_1.md +++ b/docs/ops/reduction/ReduceMin_1.md @@ -1,4 +1,4 @@ -## ReduceMin +## ReduceMin {#openvino_docs_ops_reduction_ReduceMin_1} **Versioned name**: *ReduceMin-1* diff --git a/docs/ops/reduction/ReduceProd_1.md b/docs/ops/reduction/ReduceProd_1.md index 86a15f0e239d14..dab222ce8f86ef 100644 --- a/docs/ops/reduction/ReduceProd_1.md +++ b/docs/ops/reduction/ReduceProd_1.md @@ -1,4 +1,4 @@ -## ReduceProd +## ReduceProd {#openvino_docs_ops_reduction_ReduceProd_1} **Versioned name**: *ReduceProd-1* diff --git a/docs/ops/reduction/ReduceSum_1.md b/docs/ops/reduction/ReduceSum_1.md index f0b7982ee4df40..b348ad769ea11e 100644 --- a/docs/ops/reduction/ReduceSum_1.md +++ b/docs/ops/reduction/ReduceSum_1.md @@ -1,4 +1,4 @@ -## ReduceSum +## ReduceSum {#openvino_docs_ops_reduction_ReduceSum_1} **Versioned name**: *ReduceSum-1* diff --git a/docs/ops/sequence/CTCGreedyDecoder_1.md b/docs/ops/sequence/CTCGreedyDecoder_1.md index 2cec32880e3083..ca9acfa29d7e3e 100644 --- a/docs/ops/sequence/CTCGreedyDecoder_1.md +++ b/docs/ops/sequence/CTCGreedyDecoder_1.md @@ -1,4 +1,4 @@ -## CTCGreedyDecoder +## CTCGreedyDecoder {#openvino_docs_ops_sequence_CTCGreedyDecoder_1} **Versioned name**: *CTCGreedyDecoder-1* diff --git a/docs/ops/sequence/CTCLoss_4.md b/docs/ops/sequence/CTCLoss_4.md index 0fa47ad66af2ee..f9dcd2837e0b12 100644 --- a/docs/ops/sequence/CTCLoss_4.md +++ b/docs/ops/sequence/CTCLoss_4.md @@ -1,4 +1,4 @@ -## CTCLoss +## CTCLoss {#openvino_docs_ops_sequence_CTCLoss_4} **Versioned name**: *CTCLoss-4* diff --git a/docs/ops/sequence/GRUCell_3.md b/docs/ops/sequence/GRUCell_3.md index a669aea98982c7..b3be69715747bf 100644 --- a/docs/ops/sequence/GRUCell_3.md +++ b/docs/ops/sequence/GRUCell_3.md @@ -1,4 +1,4 @@ -## GRUCell +## GRUCell {#openvino_docs_ops_sequence_GRUCell_3} **Versioned name**: *GRUCell-3* diff --git a/docs/ops/sequence/LSTMCell_1.md b/docs/ops/sequence/LSTMCell_1.md index 
e30ebb90215759..0898357f29c124 100644 --- a/docs/ops/sequence/LSTMCell_1.md +++ b/docs/ops/sequence/LSTMCell_1.md @@ -1,4 +1,4 @@ -## LSTMCell +## LSTMCell {#openvino_docs_ops_sequence_LSTMCell_1} **Versioned name**: *LSTMCell-1* diff --git a/docs/ops/sequence/LSTMSequence_1.md b/docs/ops/sequence/LSTMSequence_1.md index 66743535d5a9dc..12825e081ef892 100644 --- a/docs/ops/sequence/LSTMSequence_1.md +++ b/docs/ops/sequence/LSTMSequence_1.md @@ -1,4 +1,4 @@ -## LSTMSequence +## LSTMSequence {#openvino_docs_ops_sequence_LSTMSequence_1} **Versioned name**: *LSTMSequence-1* diff --git a/docs/ops/sequence/OneHot_1.md b/docs/ops/sequence/OneHot_1.md index 0e343111b1e70f..9f96611802b67d 100644 --- a/docs/ops/sequence/OneHot_1.md +++ b/docs/ops/sequence/OneHot_1.md @@ -1,4 +1,4 @@ -## OneHot +## OneHot {#openvino_docs_ops_sequence_OneHot_1} **Versioned name**: *OneHot-1* diff --git a/docs/ops/sequence/RNNCell_1.md b/docs/ops/sequence/RNNCell_1.md index 873444b602a874..12f07544eed386 100644 --- a/docs/ops/sequence/RNNCell_1.md +++ b/docs/ops/sequence/RNNCell_1.md @@ -1,4 +1,4 @@ -## RNNCell +## RNNCell {#openvino_docs_ops_sequence_RNNCell_1} **Versioned name**: *RNNCell-1* diff --git a/docs/ops/sequence/RNNCell_3.md b/docs/ops/sequence/RNNCell_3.md index 75f248a0a5c22c..f9f694c6287bc7 100644 --- a/docs/ops/sequence/RNNCell_3.md +++ b/docs/ops/sequence/RNNCell_3.md @@ -1,4 +1,4 @@ -## RNNCell +## RNNCell {#openvino_docs_ops_sequence_RNNCell_3} **Versioned name**: *RNNCell-3* diff --git a/docs/ops/shape/Reshape_1.md b/docs/ops/shape/Reshape_1.md index c6213cb0b115b2..f8d975f588554a 100644 --- a/docs/ops/shape/Reshape_1.md +++ b/docs/ops/shape/Reshape_1.md @@ -1,4 +1,4 @@ -## Reshape +## Reshape {#openvino_docs_ops_shape_Reshape_1} **Versioned name**: *Reshape-1* diff --git a/docs/ops/shape/ShapeOf_1.md b/docs/ops/shape/ShapeOf_1.md index 7cd1c5bae00509..d19a24d4a37151 100644 --- a/docs/ops/shape/ShapeOf_1.md +++ b/docs/ops/shape/ShapeOf_1.md @@ -1,4 +1,4 @@ -## ShapeOf +## ShapeOf {#openvino_docs_ops_shape_ShapeOf_1} **Versioned name**: *ShapeOf-1* diff --git a/docs/ops/shape/ShapeOf_3.md b/docs/ops/shape/ShapeOf_3.md index 0a5dc105ad0688..a325b92cd63a95 100644 --- a/docs/ops/shape/ShapeOf_3.md +++ b/docs/ops/shape/ShapeOf_3.md @@ -1,4 +1,4 @@ -## ShapeOf +## ShapeOf {#openvino_docs_ops_shape_ShapeOf_3} **Versioned name**: *ShapeOf-3* diff --git a/docs/ops/shape/Squeeze_1.md b/docs/ops/shape/Squeeze_1.md index ee55b533736893..5b2426d2487659 100644 --- a/docs/ops/shape/Squeeze_1.md +++ b/docs/ops/shape/Squeeze_1.md @@ -1,4 +1,4 @@ -## Squeeze +## Squeeze {#openvino_docs_ops_shape_Squeeze_1} **Versioned name**: *Squeeze-1* diff --git a/docs/ops/shape/Unsqueeze_1.md b/docs/ops/shape/Unsqueeze_1.md index 6c1c068c7f049c..f0479e061b2b79 100644 --- a/docs/ops/shape/Unsqueeze_1.md +++ b/docs/ops/shape/Unsqueeze_1.md @@ -1,4 +1,4 @@ -## Unsqueeze +## Unsqueeze {#openvino_docs_ops_shape_Unsqueeze_1} **Versioned name**: *Unsqueeze-1* diff --git a/docs/ops/sort/NonMaxSuppression_1.md b/docs/ops/sort/NonMaxSuppression_1.md index b290b9dd1ad637..fa1b0bb135a512 100644 --- a/docs/ops/sort/NonMaxSuppression_1.md +++ b/docs/ops/sort/NonMaxSuppression_1.md @@ -1,4 +1,4 @@ -## NonMaxSuppression +## NonMaxSuppression {#openvino_docs_ops_sort_NonMaxSuppression_1} **Versioned name**: *NonMaxSuppression-1* diff --git a/docs/ops/sort/NonMaxSuppression_3.md b/docs/ops/sort/NonMaxSuppression_3.md index a869a56d1d4337..1b5e6f3746714b 100644 --- a/docs/ops/sort/NonMaxSuppression_3.md +++ b/docs/ops/sort/NonMaxSuppression_3.md @@ 
-1,4 +1,4 @@ -## NonMaxSuppression +## NonMaxSuppression {#openvino_docs_ops_sort_NonMaxSuppression_3} **Versioned name**: *NonMaxSuppression-3* diff --git a/docs/ops/sort/NonMaxSuppression_4.md b/docs/ops/sort/NonMaxSuppression_4.md index 22cb3606e94bd8..3c1b2cc7405397 100644 --- a/docs/ops/sort/NonMaxSuppression_4.md +++ b/docs/ops/sort/NonMaxSuppression_4.md @@ -1,4 +1,4 @@ -## NonMaxSuppression +## NonMaxSuppression {#openvino_docs_ops_sort_NonMaxSuppression_4} **Versioned name**: *NonMaxSuppression-4* diff --git a/docs/ops/sort/TopK_1.md b/docs/ops/sort/TopK_1.md index 6ed2111fe86405..ad42e13902273d 100644 --- a/docs/ops/sort/TopK_1.md +++ b/docs/ops/sort/TopK_1.md @@ -1,4 +1,4 @@ -## TopK +## TopK {#openvino_docs_ops_sort_TopK_1} **Versioned name**: *TopK-1* diff --git a/docs/ops/sort/TopK_3.md b/docs/ops/sort/TopK_3.md index eddf19fb6b4415..7636f66137f3c0 100644 --- a/docs/ops/sort/TopK_3.md +++ b/docs/ops/sort/TopK_3.md @@ -1,4 +1,4 @@ -## TopK +## TopK {#openvino_docs_ops_sort_TopK_3} **Versioned name**: *TopK-3* diff --git a/docs/ops/sparse/EmbeddingBagOffsetsSum_3.md b/docs/ops/sparse/EmbeddingBagOffsetsSum_3.md index 82476ddc8083d9..e702198988eece 100644 --- a/docs/ops/sparse/EmbeddingBagOffsetsSum_3.md +++ b/docs/ops/sparse/EmbeddingBagOffsetsSum_3.md @@ -1,4 +1,4 @@ -## EmbeddingBagOffsetsSum +## EmbeddingBagOffsetsSum {#openvino_docs_ops_sparse_EmbeddingBagOffsetsSum_3} **Versioned name**: *EmbeddingBagOffsetsSum-3* diff --git a/docs/ops/sparse/EmbeddingBagPackedSum_3.md b/docs/ops/sparse/EmbeddingBagPackedSum_3.md index 8114250b261d68..c288ef01d8c0c7 100644 --- a/docs/ops/sparse/EmbeddingBagPackedSum_3.md +++ b/docs/ops/sparse/EmbeddingBagPackedSum_3.md @@ -1,4 +1,4 @@ -## EmbeddingBagPackedSum +## EmbeddingBagPackedSum {#openvino_docs_ops_sparse_EmbeddingBagPackedSum_3} **Versioned name**: *EmbeddingBagPackedSum-3* diff --git a/docs/ops/sparse/EmbeddingSegmentsSum_3.md b/docs/ops/sparse/EmbeddingSegmentsSum_3.md index e9b29605da3b8f..80e45bdd1fed2f 100644 --- a/docs/ops/sparse/EmbeddingSegmentsSum_3.md +++ b/docs/ops/sparse/EmbeddingSegmentsSum_3.md @@ -1,4 +1,4 @@ -## EmbeddingSegmentsSum +## EmbeddingSegmentsSum {#openvino_docs_ops_sparse_EmbeddingSegmentsSum_3} **Versioned name**: *EmbeddingSegmentsSum-3* diff --git a/docs/ops/type/ConvertLike_1.md b/docs/ops/type/ConvertLike_1.md index 69cf74920a5893..2d82cbc5d0a894 100644 --- a/docs/ops/type/ConvertLike_1.md +++ b/docs/ops/type/ConvertLike_1.md @@ -1,4 +1,4 @@ -## ConvertLike +## ConvertLike {#openvino_docs_ops_type_ConvertLike_1} **Versioned name**: *ConvertLike-1* diff --git a/docs/ops/type/Convert_1.md b/docs/ops/type/Convert_1.md index e9bf1d9051de2d..ef9b0d9e95283f 100644 --- a/docs/ops/type/Convert_1.md +++ b/docs/ops/type/Convert_1.md @@ -1,4 +1,4 @@ -## Convert +## Convert {#openvino_docs_ops_type_Convert_1} **Versioned name**: *Convert-1* diff --git a/docs/optimization_guide/dldt_optimization_guide.md b/docs/optimization_guide/dldt_optimization_guide.md new file mode 100644 index 00000000000000..38d9b224d6b5cf --- /dev/null +++ b/docs/optimization_guide/dldt_optimization_guide.md @@ -0,0 +1,610 @@ +# Optimization Guide {#openvino_docs_optimization_guide_dldt_optimization_guide} + +## Introduction + +The purpose of this document is to give you performance-related insights to every step of the network deployment process. + +For information on the general workflow, refer to the documentation in See Also. For an example Inference Engine API snippet, see Request-Based API and “GetBlob” Idiom. 
+
+### Deep Learning Inference Engine Overview
+
+The Deep Learning Inference Engine is a part of the Intel® Deep Learning Deployment Toolkit (Intel® DL Deployment Toolkit) and the OpenVINO™ toolkit. The Inference Engine facilitates the deployment of deep learning solutions by delivering a unified, device-agnostic API.
+
+Below are the three main steps of the deployment process:
+
+1. **Conversion**
+ Trained models are converted from a specific framework (like Caffe\* or TensorFlow\*) to a framework-agnostic Intermediate Representation (IR) format. + + - *Performance flow*: This is an offline step where general topology-level optimizations happen automatically (see Model Optimizer Knobs Related to Performance). + + - *Tools*: Intel DL Deployment Toolkit features the Model Optimizer that enables automatic and seamless transition from the training environment to the deployment environment. + +2. **Model Inference/Execution**
+ After conversion, Inference Engine consumes the IR to perform inference. While Inference Engine API itself is target-agnostic, internally, it has a notion of plugins, which are device-specific libraries facilitating the hardware-assisted acceleration. + + - *Performance flow*: Upon conversion to IR, the execution starts with existing [Inference Engine samples](../IE_DG/Samples_Overview.md) to measure and tweak the performance of the network on different devices.
+ > **NOTE**: While consuming the same IR, each plugin performs additional device-specific optimizations at load time, so the resulting accuracy might differ. Also, enabling and optimizing custom kernels is error-prone (see Optimizing Custom Kernels). + + - *Tools*: Beyond inference performance that samples report (see Latency vs. Throughput), you can get further device- and kernel-level timing with the Inference Engine performance counters and Intel® VTune™. + +3. **Integration to the product**
+ After model inference is verified with the [samples](../IE_DG/Samples_Overview.md), the Inference Engine code is typically integrated into a real application or pipeline.
+
+ - *Performance flow*: The most important point is to preserve the sustained performance achieved with the stand-alone model execution. Take precautions when combining with other APIs and carefully test the performance of every integration step.
+
+ - *Tools*: Beyond tracking the actual wall-clock time of your application, see Intel® VTune™ Examples for application-level and system-level information.
+
+
+## Gathering the Performance Numbers
+
+Performance data comes in a variety of forms. For example, one of the most common performance metrics is latency, which represents the time required to complete a unit of work (for instance, inference time for a single image). The following sections give important recommendations for measuring performance.
+
+### Measure the Proper Set of Operations
+
+When evaluating the performance of your model with the Inference Engine, you must measure the proper set of operations. To do so, consider the following tips:
+
+- Avoid including one-time costs like model loading. For examples, refer to the [Inference Engine samples](../IE_DG/Samples_Overview.md).
+- Track separately the operations that happen outside the Inference Engine, like video decoding.
+
+> **NOTE**: Some image pre-processing can be baked into the IR and accelerated. For more information, refer to Model Optimizer Knobs Related to Performance.
+
+### Latency vs. Throughput
+
+In the asynchronous case (see Request-Based API and “GetBlob” Idiom), the performance of an individual infer request is usually of less concern. Instead, you typically execute multiple requests asynchronously and measure the throughput in images per second by dividing the number of images that were processed by the processing time.
+In contrast, for latency-oriented tasks, the time to a single frame is more important.
+
+Refer to the [Benchmark App](../../inference-engine/samples/benchmark_app/README.md) sample, which supports both latency and throughput measurements.
+
+> **NOTE**: Most samples also support batching (automatically packing multiple input images into a single request). However, a high batch size results in a latency penalty, so for more real-time oriented usages, lower batch sizes (as low as a single input) are usually used. Also, devices like CPU, Intel® Movidius™ Myriad™ 2 VPU, Intel® Movidius™ Myriad™ X VPU, or Intel® Vision Accelerator Design with Intel® Movidius™ VPU require a number of parallel requests instead of batching to leverage the performance.
+
+### Comparing Performance with Native/Framework Code
+
+When comparing the Inference Engine performance with the framework or another reference code, make sure that both versions are as similar as possible:
+
+- Wrap exactly the inference execution (refer to the [Inference Engine Samples](../IE_DG/Samples_Overview.md) for examples).
+- Do not include model loading time.
+- Ensure the inputs are identical for the Inference Engine and the framework. For example, Caffe\* allows you to auto-populate the input with random values. Notice that it might give different performance than real images do.
+- Similarly, for a correct performance comparison, make sure the access pattern, for example, input layouts, is optimal for the Inference Engine (currently, it is NCHW).
+- Any user-side pre-processing should be tracked separately.
+- Make sure to try the same environment settings that the framework developers recommend, for example, for TensorFlow*. In many cases, settings that are more machine-friendly, like respecting NUMA (see CPU Checklist), might work well for the Inference Engine as well.
+- If applicable, use batching with the Inference Engine.
+- If possible, demand the same accuracy. For example, TensorFlow allows `FP16` support, so when comparing to that, make sure to test the Inference Engine with `FP16` as well.
+
+### Getting Credible Performance Numbers
+
+You need to build your performance conclusions on reproducible data. Do the performance measurements with a large number of invocations of the same routine. Since the first iteration is almost always significantly slower than the subsequent ones, you can use an aggregated value for the execution time for final projections:
+
+- If the warm-up run does not help or execution time still varies, you can try running a large number of iterations and then average the results.
+- For time values that range too much, use the geometric mean.
+
+Refer to the [Inference Engine Samples](../IE_DG/Samples_Overview.md) for code examples of the performance measurements. Almost every sample, except interactive demos, has a `-ni` option to specify the number of iterations.
+
+## Model Optimizer Knobs Related to Performance
+
+Network training is typically done in high-end data centers, using popular training frameworks like Caffe\*, TensorFlow\*, and MXNet\*. The Model Optimizer converts trained models from their original proprietary formats to an IR that describes the topology. The IR is accompanied by a binary file with weights. These files in turn are consumed by the Inference Engine and used for scoring.
+
+![](../img/workflow_steps.png)
+
+As described in the [Model Optimizer Guide](../MO_DG/prepare_model/Prepare_Trained_Model.md), there are a number of device-agnostic optimizations the tool performs. For example, certain primitives like linear operations (BatchNorm and ScaleShift) are automatically fused into convolutions. Generally, these layers should not appear in the resulting IR:
+
+![](../img/resnet_269.png)
+
+The picture above shows the Caffe\* Resnet269\* topology. The left model is the original model, and the one on the right (after conversion) is the resulting model that the Model Optimizer produces, with BatchNorm and ScaleShift layers fused into the convolution weights rather than constituting separate layers.
+
+If you still see these operations, inspect the Model Optimizer output carefully while searching for warnings, such as a warning that the tool was unable to fuse something. For example, non-linear operations (like activations) in between convolutions and linear operations might prevent the fusing. If performance is of concern, try to change (and potentially re-train) the topology. Refer to the [Model Optimizer Guide](../MO_DG/prepare_model/Model_Optimization_Techniques.md) for more optimizations.
+
+Notice that the activation (`_relu`) is not touched by the Model Optimizer, and while it can be merged into convolution as well, this is rather a device-specific optimization, covered by the Inference Engine at model load time. You are encouraged to inspect the performance counters from the plugins, which should indicate that these particular layers are not executed (“Optimized out”). For more information, refer to Internal Inference Performance Counters.
+
+Also:
+
+- **Image mean/scale parameters**
+   Make sure to use the input image mean/scale parameters (`--scale` and `--mean_values`) with the Model Optimizer when you need pre-processing. These options let the tool bake the pre-processing into the IR so that it is accelerated by the Inference Engine.
+
+- **RGB vs. BGR inputs**
+ If, for example, your network assumes the RGB inputs, the Model Optimizer can swap the channels in the first convolution using the `--reverse_input_channels` command line option, so you do not need to convert your inputs to RGB every time you get the BGR image, for example, from OpenCV*. + +- **Larger batch size**
+   Notice that devices like the GPU perform better with a larger batch size. If needed, the batch size can also be set at runtime using the Inference Engine [ShapeInference feature](../IE_DG/ShapeInference.md).
+
+- **Resulting IR precision**
+The resulting IR precision, for instance, `FP16` or `FP32`, directly affects performance. The CPU now accepts `FP16` IR (internally converting it to `FP32` anyway), and `FP16` is the best precision for a GPU target, so you may want to always convert models to `FP16`. Notice that this is the only precision that the Intel® Movidius™ Myriad™ 2 and Intel® Movidius™ Myriad™ X VPUs support.
+
+
+## Device-Specific Optimizations
+
+The Inference Engine supports several target devices (CPU, GPU, Intel® Movidius™ Myriad™ 2 VPU, Intel® Movidius™ Myriad™ X VPU, Intel® Vision Accelerator Design with Intel® Movidius™ Vision Processing Units (VPU) and FPGA), and each of them has a corresponding plugin. If you want to optimize a specific device, keep in mind the following tips to increase performance.
+
+### CPU Checklist
+
+The CPU plugin relies entirely on the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) for acceleration of the major primitives, for example, Convolution or FullyConnected.
+
+The only hint you can get from that is how the major primitives are accelerated (and you cannot change this). For example, on Intel® Core™ machines, you should see variations of `jit_avx2` when inspecting the internal inference performance counters (with an additional `_int8` postfix for [int8 inference](../IE_DG/Int8Inference.md)). If you are an advanced user, you can further trace the CPU execution with Intel® VTune™ (see Intel® VTune™ Examples).
+
+Internally, the Inference Engine has a threading abstraction level, which allows compiling the [open source version](https://github.com/opencv/dldt) with either Intel® Threading Building Blocks (Intel® TBB), which is now the default, or OpenMP* as an alternative parallelism solution. When running inference on the CPU, it is particularly important to align the threading model with the rest of your application (and any third-party libraries that you use) to avoid oversubscription. For more information, see the Note on the App-Level Threading section.
+
+ Since the R1 2019 release, the OpenVINO™ toolkit comes pre-compiled with Intel TBB,
+ so any OpenMP* API or environment settings (like `OMP_NUM_THREADS`) have no effect anymore.
+ Certain tweaks (like the number of threads used for inference on the CPU) are still possible via the [CPU configuration options](../IE_DG/supported_plugins/CPU.md).
+ Finally, the OpenVINO CPU inference is NUMA-aware; refer to the Tips for inference on NUMA systems section.
+
+Other general recommendations:
+- Usually, batching improves CPU performance. However, the need to gather frames in the batch might complicate the application logic. Instead, you can keep a separate infer request per camera or other source of input and process the requests in parallel. For more information, see the next section.
+- If your application simultaneously performs inference of multiple models on the same CPU, make sure you do not oversubscribe the machine. See Performance Aspects of Running Multiple Requests Simultaneously for more information.
+- Notice that the heterogeneous execution might implicitly load the CPU. For details, refer to the Heterogeneity section.
+- Consider [8-bit integer inference on the CPU](../IE_DG/Int8Inference.md).
+
+#### Throughput Mode for CPU
+Unlike most accelerators, the CPU is perceived as an inherently latency-oriented device.
+In fact, OpenVINO does support a "throughput" mode for the CPU, which allows the Inference Engine to efficiently run multiple inference requests on the CPU simultaneously, greatly improving the overall throughput.
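+For illustration, below is a minimal sketch of enabling this throughput mode with the same plugin-based API used in the other snippets of this guide. The choice of `CPU_THROUGHPUT_AUTO` (instead of an explicit stream count) is an assumption; tune the value for your machine, for example, with the Benchmark App.
+
+```cpp
+#include "ie_plugin_config.hpp"
+using namespace InferenceEngine;
+using namespace InferenceEngine::PluginConfigParams;
+
+// ... 'network' is an already read CNNNetwork, 'pluginDirs' as in the samples
+InferenceEnginePluginPtr engine_ptr = PluginDispatcher(pluginDirs).getSuitablePlugin(TargetDevice::eCPU);
+InferencePlugin plugin(engine_ptr);
+
+// Ask the CPU plugin to create several execution streams ("AUTO" lets the plugin decide);
+// each stream serves its own infer request(s), so submit several requests in parallel.
+auto executable_network = plugin.LoadNetwork(network, {
+    { KEY_CPU_THROUGHPUT_STREAMS, CPU_THROUGHPUT_AUTO }
+});
+```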
+
+Internally, the execution resources are split/pinned into execution "streams".
+This feature usually provides much better performance for the networks than batching, especially on many-core server machines:
+![](../img/cpu_streams_explained.png)
+
+Try the [Benchmark App](../../inference-engine/samples/benchmark_app/README.md) sample and play with the number of streams running in parallel. The rule of thumb is to try up to the number of CPU cores on your machine.
+For example, on an 8-core CPU, compare `-nstreams 1` (which is a legacy, latency-oriented scenario) to 2, 4, and 8 streams.
+In addition, you can play with the batch size to find the throughput sweet spot.
+
+If it is hard or impossible to change your application in accordance with the multiple-requests logic, consider the "multiple-instance" trick to improve the throughput:
+- For multi-socket execution, it is recommended to set [`KEY_CPU_THREADS_NUM`](../IE_DG/supported_plugins/CPU.md) to the number of cores per socket, and run as many instances of the application as you have sockets.
+- Similarly, for extremely lightweight networks (running faster than 1 ms) and/or many-core machines (16+ cores), try limiting the number of CPU inference threads to the number of physical cores or even fewer, while saturating the machine by running multiple instances of the application.
+
+
+### GPU Checklist
+
+The Inference Engine relies on the [Compute Library for Deep Neural Networks (clDNN)](https://01.org/cldnn) for Convolutional Neural Networks acceleration on Intel® GPUs. Internally, clDNN uses OpenCL™ to implement the kernels. Thus, many general tips apply:
+
+- Prefer `FP16` over `FP32`, as the Model Optimizer can generate both variants, and `FP32` is the default.
+- Try to group individual infer jobs by using batches.
+- Notice that using the GPU introduces a one-time overhead (on the order of a few seconds) of compiling the OpenCL kernels. The compilation happens upon loading the network to the GPU plugin and does not affect the inference time.
+- If your application simultaneously runs inference on the CPU or otherwise loads the host heavily, make sure that the OpenCL driver threads do not starve. You can use the [CPU configuration options](../IE_DG/supported_plugins/CPU.md) to limit the number of inference threads for the CPU plugin.
+- In the GPU-only scenario, a GPU driver might occupy a CPU core with spin-loop polling for completion. If the _CPU_ utilization is a concern, consider the `KEY_CLDNN_PLUGIN_THROTTLE` configuration option.
+
+> **NOTE**: See the [Benchmark App Sample](../../inference-engine/samples/benchmark_app/README.md) code for a usage example.
+Notice that while this option disables the polling, it might also reduce the GPU performance, so it is usually used together with multiple [GPU streams](../IE_DG/supported_plugins/CL_DNN.md).
+
+
+### Intel® Movidius™ Myriad™ X Visual Processing Unit and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs
+
+Since the Intel® Movidius™ Myriad™ 2 and Intel® Movidius™ Myriad™ X VPUs communicate with the host over USB, a minimum of four infer requests in flight is recommended to hide the data transfer costs. See Request-Based API and “GetBlob” Idiom and the [Benchmark App Sample](../../inference-engine/samples/benchmark_app/README.md) for more information.
+
+Intel® Vision Accelerator Design with Intel® Movidius™ VPUs requires at least 32 inference requests in flight to fully saturate the device.
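+Below is a minimal sketch of keeping several infer requests in flight with the same plugin-based API used elsewhere in this guide; it assumes an already loaded `executable_network` and omits input population. The request count of four is an assumption that matches the USB-attached Myriad devices above — use around 32 for the Intel® Vision Accelerator Design with Intel® Movidius™ VPUs.
+
+```cpp
+// 'executable_network' is the result of plugin.LoadNetwork(network, {...})
+std::vector<InferenceEngine::InferRequest> requests;
+for (int i = 0; i < 4; i++) {
+    requests.push_back(executable_network.CreateInferRequest());
+}
+// Fill the input blobs of every request (omitted), then start them all asynchronously
+for (auto & request : requests) {
+    request.StartAsync();
+}
+// Wait for every request to complete before consuming its output blobs
+for (auto & request : requests) {
+    request.Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
+}
+```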
+ +### FPGA + +Below are listed the most important tips for the efficient usage of the FPGA: + +- Just like for the Intel® Movidius™ Myriad™ VPU flavors, for the FPGA, it is important to hide the communication overheads by running multiple inference requests in parallel. For examples, refer to the [Benchmark App Sample](../../inference-engine/samples/benchmark_app/README.md). +- Since the first inference iteration with FPGA is always significantly slower than the subsequent ones, make sure you run multiple iterations (all samples, except GUI-based demos, have the `-ni` or 'niter' option to do that). +- FPGA performance heavily depends on the bitstream. +- Number of the infer request per executable network is limited to five, so “channel” parallelism (keeping individual infer request per camera/video input) would not work beyond five inputs. Instead, you need to mux the inputs into some queue that will internally use a pool of (5) requests. +- In most scenarios, the FPGA acceleration is leveraged through heterogeneous execution with further specific tips. +- For multi-device FPGA execution please refer to the [FPGA plugin documentation](../IE_DG/supported_plugins/FPGA.md) + +## Heterogeneity + +Heterogeneous execution (constituted by the dedicated Inference Engine [“Hetero” plugin](../IE_DG/supported_plugins/HETERO.md)) enables to schedule a network inference to the multiple devices. + +### Typical Heterogeneous Scenarios of Concern + +The primary points for executing a network in heterogeneous mode are as follows: + +- Calculate the heaviest pieces of the network with an accelerator while falling back to the CPU for the layers that are not supported by the accelerator.
+ This is particularly useful when certain custom (user) kernels are implemented only for the CPU (and much harder or even impossible to implement for the accelerator). + +- Use all available compute devices more efficiently, for example, by running branches of the network on the different devices. + +### Heterogeneous Flow + +The execution through heterogeneous plugin has three distinct steps: + +1. **Applying affinity setting for the layers**, that is, binding them to the devices. + + - This can be done automatically using *fallback priorities*, or on the *per-layer* basis. + + - The affinity setting is made before loading the network to the (heterogeneous) plugin, so this is always a **static** setup with respect to execution. + +2. **Loading a network to the heterogeneous plugin**, which internally splits the network into subgraphs.
+   You can check the decisions the plugin makes; see Analyzing Heterogeneous Execution.
+
+3. **Executing the infer requests**. From the user's side, this looks identical to the single-device case, while internally the subgraphs are executed by the actual plugins/devices.
+
+Performance benefits of the heterogeneous execution depend heavily on the communication granularity between devices. If transmitting/converting data from one device to another takes more time than the execution itself, the heterogeneous approach makes little or no sense. Using Intel® VTune™ helps to visualize the execution flow on a timeline (see Intel® VTune™ Examples).
+
+Similarly, if there are too many subgraphs, the synchronization and data transfers might consume the entire performance gain. In some cases, you can define the (coarser) affinity manually to avoid sending data back and forth many times during one inference.
+
+The general affinity "rule of thumb" is to keep computationally-intensive kernels on the accelerator, and "glue" or helper kernels on the CPU. Notice that this includes the granularity considerations. For example, running some custom activation (that comes after every accelerator-equipped convolution) on the CPU might result in performance degradation due to too many data type and/or layout conversions, even though the activation itself can be extremely fast. In this case, it might make sense to consider implementing the kernel for the accelerator (see Optimizing Custom Kernels). The conversions typically manifest themselves as outstanding (compared to CPU-only execution) 'Reorder' entries (see Internal Inference Performance Counters).
+
+For general details on the heterogeneous plugin, refer to the [corresponding section in the Inference Engine Developer Guide](../IE_DG/supported_plugins/HETERO.md).
+
+### Trying the Heterogeneous Plugin with Inference Engine Samples
+
+Every Inference Engine sample supports the `-d` (device) option.
+
+For example, here is a command to run the [Object Detection Sample SSD](../../inference-engine/samples/object_detection_sample_ssd/README.md):
+
+```sh
+./object_detection_sample_ssd -m  /ModelSSD.xml -i /picture.jpg -d HETERO:FPGA,CPU
+```
+
+where:
+
+- `HETERO` stands for the Heterogeneous plugin.
+- `FPGA,CPU` points to the fallback policy with the first priority on the FPGA and further fallback to the CPU.
+
+You can specify more than two devices, for example: `-d HETERO:FPGA,GPU,CPU`.
+
+### Heterogeneous Scenarios with FPGA
+
+As the FPGA is considered an inference accelerator, most performance issues are related to the fact that, due to the fallback, the CPU can still be used quite heavily.
+- Yet in most cases, the CPU does only small/lightweight layers, for example, post-processing (`SoftMax` in most classification models or `DetectionOutput` in the SSD*-based topologies). In that case, limiting the number of CPU threads with the [`KEY_CPU_THREADS_NUM`](../IE_DG/supported_plugins/CPU.md) config (see the sketch after this list) would further reduce the CPU utilization without significantly degrading the overall performance.
+- Also, if you are still using an OpenVINO version earlier than R1 2019, or if you have recompiled the Inference Engine with OpenMP (say, for backward compatibility), setting the `KMP_BLOCKTIME` environment variable to something less than the default 200ms (we suggest 1ms) is particularly helpful. Use `KMP_BLOCKTIME=0` if the CPU subgraph is small.
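+As an illustration of the `KEY_CPU_THREADS_NUM` tip above, here is a minimal sketch that loads a network to the heterogeneous plugin while capping the CPU threads available to the fallback subgraphs. The thread count of "4" is an assumption — pick a value that matches how much CPU-side work your topology actually has.
+
+```cpp
+#include "ie_plugin_config.hpp"
+using namespace InferenceEngine;
+using namespace InferenceEngine::PluginConfigParams;
+
+// ... 'dispatcher' is a PluginDispatcher, 'network' is an already read CNNNetwork
+InferenceEnginePluginPtr engine_ptr = dispatcher.getPluginByDevice("HETERO:FPGA,CPU");
+InferencePlugin plugin(engine_ptr);
+// Cap the CPU threads so that the lightweight fallback layers (e.g. DetectionOutput)
+// do not oversubscribe the host while the FPGA executes the heavy subgraph.
+auto executable_network = plugin.LoadNetwork(network, {
+    { KEY_CPU_THREADS_NUM, "4" }
+});
+```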
+ +> **NOTE**: General threading tips (see Note on the App-Level Threading) apply well, even when the entire topology fits the FPGA, because there is still a host-side code for data pre- and post-processing. + +### General Tips on GPU/CPU Execution + +The following tips are provided to give general guidance on optimizing execution on GPU/CPU devices. + +- Generally, GPU performance is better on heavy kernels (like Convolutions) and large inputs. So if the network inference time is already too small (~1ms of execution time), using the GPU would unlikely give a boost. + +- A typical strategy to start with is to test the CPU-only and GPU-only scenarios first (with samples this is plain `-d CPU` or `-d GPU`). If there are specific kernels that are not supported by the GPU, the best option to try is the `HETERO:GPU,CPU` that automatically applies default splitting (based on the plugins layers support). Then, you can play with the manual affinity settings (for example, to further minimize the number of subgraphs). + +- The general affinity “rule of thumb” is to keep computationally-intensive kernels on the accelerator, and "glue" (or helper) kernels on the CPU. Notice that this includes the granularity considerations. For example, running some (custom) activation on the CPU would result in too many conversions. + +- It is advised to do performance analysis to determine “hotspot” kernels, which should be the first candidates for offloading. At the same time, it is often more efficient to offload some reasonably sized sequence of kernels, rather than individual kernels, to minimize scheduling and other run-time overheads. + +- Notice that GPU can be busy with other tasks (like rendering). Similarly, the CPU can be in charge for the general OS routines and other application threads (see Note on the App-Level Threading). Also, a high interrupt rate due to many subgraphs can raise the frequency of the one device and drag the frequency of another down. + +- Device performance can be affected by dynamic frequency scaling. For example, running long kernels on both devices simultaneously might eventually result in one or both devices stopping use of the Intel® Turbo Boost Technology. This might result in overall performance decrease, even comparing to single-device scenario. + +- Mixing the `FP16` (GPU) and `FP32` (CPU) execution results in conversions and, thus, performance issues. If you are seeing a lot of heavy outstanding (compared to the CPU-only execution) Reorders, consider implementing actual GPU kernels. Refer to Internal Inference Performance Counters for more information. + +### Analyzing Heterogeneous Execution + +There is a dedicated configuration option that enables dumping the visualization of the subgraphs created by the heterogeneous plugin: + +```cpp +#include "ie_plugin_config.hpp" +#include "hetero/hetero_plugin_config.hpp" +using namespace InferenceEngine::PluginConfigParams; +using namespace InferenceEngine::HeteroConfigParams; + +... +enginePtr = dispatcher.getPluginByDevice("HETERO:FPGA,CPU"); +InferencePlugin plugin(enginePtr); +plugin.SetConfig({ {KEY_HETERO_DUMP_GRAPH_DOT, YES} }); +``` + +After enabling the configuration key, the heterogeneous plugin generates two files: + +- `hetero_affinity.dot` - per-layer affinities. This file is generated only if default fallback policy was executed (as otherwise you have set the affinities by yourself, so you know them). +- `hetero_subgraphs.dot` - affinities per sub-graph. 
This file is written to the disk during execution of `ICNNNetwork::LoadNetwork` for the heterogeneous plugin. + +You can use GraphViz\* utility or `.dot` converters (for example, to `.png` or `.pdf`), like xdot\*, available on Linux\* OS with `sudo apt-get install xdot`. Below is an example of the output trimmed to the two last layers (one executed on the FPGA and another on the CPU): + +![](../img/output_trimmed.png) + +You can also use performance data (in samples, it is an option `-pc`) to get performance data on each subgraph. Refer to Internal Inference Performance Counters for more information. + + +## Optimizing Custom Kernels + +### Few Initial Performance Considerations + +The Inference Engine supports CPU, GPU and VPU custom kernels. Typically, custom kernels are used to quickly implement missing layers for new topologies. You should not override standard layers implementation, especially on the critical path, for example, Convolutions. Also, overriding existing layers can disable some existing performance optimizations, such as fusing. + +It is usually easier to start with the CPU extension and switch to the GPU after debugging with the CPU path. Sometimes, when the custom layers are at the very end of your pipeline, it is easier to implement them as regular post-processing in your application without wrapping them as kernels. This is particularly true for the kernels that do not fit the GPU well, for example, output bounding boxes sorting. In many cases, you can do such post-processing on the CPU. + +There are many cases when sequence of the custom kernels can be implemented as a "super" kernel allowing to save on data accesses. + +Finally, with the heterogeneous execution, it is possible to execute the vast majority of intensive computations with the accelerator and keep the custom pieces on the CPU. The tradeoff is granularity/costs of communication between different devices. + +For more details on custom layers in Inference Engine, see [Inference Engine Extensibility Mechanism](../IE_DG/Extensibility_DG/Intro.md) + +### Understanding Performance Contribution of Your Custom Kernels + +In most cases, before actually implementing a full-blown code for the kernel, you can estimate the final performance by doing a simple stub kernel that does nothing (and thus is "infinitely" fast) just to let the topology execute end-to-end. Of course, the estimation is valid only if the kernel output does not affect the performance, for instance, if its output is not driving any branches or loops. + +Other than that, when implementing the kernels, you can try the methods from the previous chapter to understand actual contribution and, if any custom kernel is in the hotspots, optimize that. + +### Few Device-Specific Tips + +- As already outlined in the CPU Checklist, align the threading model that you use in your CPU kernels with the model that the rest of the Inference Engine compiled with. +- For CPU extensions, consider kernel flavor that supports blocked layout, if your kernel is in the hotspots (see Internal Inference Performance Counters). Since Intel MKL-DNN internally operates on the blocked layouts, this would save you a data packing (Reorder) on tensor inputs/outputs of your kernel. For example of the blocked layout support, please, refer to the extensions in the `/deployment_tools/samples/extension/`. 
+
+## Plugging Inference Engine to Applications
+
+### Tips for inference on NUMA systems
+For inference on the CPU, there are multiple thread binding options; see the
+[CPU configuration options](../IE_DG/supported_plugins/CPU.md).
+ - The 'YES' (default) binding option maps threads to cores and works best for static/synthetic scenarios like benchmarks. It uses only the available cores (determined through the process mask) and binds the threads in a round-robin fashion.
+ - The 'NUMA' binding may perform better in real-life scenarios, leaving much more room for the OS scheduling. The main anticipated usage is _contended_ scenarios, for example, multiple (CPU) inference-heavy apps executed simultaneously on a single machine.
+
+If you are building an app-level pipeline with third-party components like GStreamer*, the general guidance for NUMA machines is as follows:
+- Whenever possible, use at least one instance of the pipeline per NUMA node:
+   - Pin the _entire_ pipeline instance to the specific NUMA node at the outer-most level (for example, use Kubernetes* and/or the `numactl` command with proper settings before the actual GStreamer commands).
+   - Disable any individual pinning by the pipeline components (e.g. set [CPU_BIND_THREADS to 'NO'](../IE_DG/supported_plugins/CPU.md)).
+   - Limit each instance with respect to the number of inference threads. Use [CPU_THREADS_NUM](../IE_DG/supported_plugins/CPU.md) or other means (e.g. virtualization, Kubernetes*) to avoid oversubscription.
+- If instancing/pinning of the entire pipeline is not possible or desirable, relax the inference threads pinning to just 'NUMA'.
+   - This is less restrictive compared to the default pinning of threads to cores, yet avoids NUMA penalties.
+
+### Note on the App-Level Threading
+
+- As explained in the CPU Checklist section, by default the Inference Engine uses Intel TBB as a parallel engine. Thus, any OpenVINO-internal threading (including CPU inference) uses the same thread pool provided by TBB. But there are also other threads in your application, so oversubscription is possible at the application level.
+- The rule of thumb is that you should try to have the overall number of active threads in your application equal to the number of cores in your machine. Keep in mind the spare core(s) that the OpenCL driver under the GPU plugin might also need.
+- One specific workaround to limit the number of threads for the Inference Engine is using the [CPU configuration options](../IE_DG/supported_plugins/CPU.md).
+- To avoid further oversubscription, use the same threading model in all modules/libraries that your application uses. Notice that third-party components might bring their own threading. For example, using the Inference Engine, which is now compiled with TBB by default, might lead to [performance troubles](https://www.threadingbuildingblocks.org/docs/help/reference/appendices/known_issues/interoperability.html) when mixed in the same application with another computationally-intensive library compiled with OpenMP. You can try to compile the [open source version](https://github.com/opencv/dldt) of the Inference Engine to use OpenMP as well. But notice that, in general, TBB offers much better composability than other threading solutions.
+- If your code (or third-party libraries) uses GNU OpenMP, the Intel® OpenMP (if you have recompiled the Inference Engine with that) must be initialized first.
This can be achieved by linking your application with the Intel OpenMP instead of GNU OpenMP, or using `LD_PRELOAD` on Linux* OS. + +### Letting the Inference Engine Accelerate Image Pre-processing/Conversion + +In many cases, a network expects a pre-processed image, so make sure you do not perform unnecessary steps in your code: +- Model Optimizer can efficiently bake the mean and normalization (scale) values into the model (for example, weights of the first convolution). See Model Optimizer Knobs Related to Performance. +- If regular 8-bit per channel images are your native media (for instance, decoded frames), do not convert to the `FP32` on your side, as this is something that plugins can accelerate. Use the `InferenceEngine::Precision::U8` as your input format:
+```cpp
+InferenceEngine::InputsDataMap info(netReader.getNetwork().getInputsInfo());
+auto& inputInfoFirst = info.begin()->second;
+inputInfoFirst->setPrecision(Precision::U8);
+```
+
+Note that in many cases, you can directly share the (input) data with the Inference Engine.
+
+### Basic Interoperability with Other APIs
+
+The general approach for sharing data between the Inference Engine and media/graphics APIs like Intel® Media Server Studio (Intel® MSS) is based on sharing the *system* memory. That is, in your code, you should map or copy the data from the API to the CPU address space first.
+
+For Intel MSS, it is recommended to perform viable pre-processing, for example, crop/resize, and then convert to RGB again with the [Video Processing Procedures (VPP)](https://software.intel.com/en-us/node/696108). Then lock the result and create an Inference Engine blob on top of that. The resulting pointer can be used for `SetBlob`:
+```cpp
+//Lock Intel MSS surface
+mfxFrameSurface1 *frame_in; //Input MSS surface.
+mfxFrameAllocator* pAlloc = &m_mfxCore.FrameAllocator();
+pAlloc->Lock(pAlloc->pthis, frame_in->Data.MemId, &frame_in->Data);
+//Inference Engine code
+```
+
+**WARNING**: The `InferenceEngine::NHWC` layout is not supported natively by most Inference Engine plugins, so an internal conversion might happen.
+
+```cpp
+InferenceEngine::SizeVector dims_src = {
+    1        /* batch, N */,
+    (size_t) frame_in->Info.Height /* Height */,
+    (size_t) frame_in->Info.Width  /* Width */,
+    3        /* Channels */,
+};
+TensorDesc desc(InferenceEngine::Precision::U8, dims_src, InferenceEngine::NHWC);
+/* wrapping the surface data; as RGB is interleaved, pass only the pointer to the R plane.
+   Notice that this wouldn't work with planar formats, as these are 3 separate planes/pointers */
+InferenceEngine::TBlob<uint8_t>::Ptr p = InferenceEngine::make_shared_blob<uint8_t>(desc, (uint8_t*) frame_in->Data.R);
+inferRequest.SetBlob("input", p);
+inferRequest.Infer();
+//Make sure to unlock the surface upon inference completion, to return the ownership back to the Intel MSS
+pAlloc->Unlock(pAlloc->pthis, frame_in->Data.MemId, &frame_in->Data);
+```
+
+Alternatively, you can use the RGBP (planar RGB) output from Intel MSS. This allows wrapping the (locked) result as a regular NCHW blob, which is generally friendly for most plugins (unlike NHWC). Then you can use it with `SetBlob` just as in the previous example:
+
+```cpp
+InferenceEngine::SizeVector dims_src = {
+    1        /* batch, N */,
+    3        /* Channels */,
+    (size_t) frame_in->Info.Height /* Height */,
+    (size_t) frame_in->Info.Width  /* Width */,
+};
+TensorDesc desc(InferenceEngine::Precision::U8, dims_src, InferenceEngine::NCHW);
+/* wrapping the RGBP surface data */
+InferenceEngine::TBlob<uint8_t>::Ptr p = InferenceEngine::make_shared_blob<uint8_t>(desc, (uint8_t*) frame_in->Data.R);
+inferRequest.SetBlob("input", p);
+// ...
+```
+
+The only downside of this approach is that the VPP conversion to RGBP is not hardware accelerated (it is performed on the GPU EUs). Also, it is available only on Linux* OS.
+
+### OpenCV* Interoperability Example
+
+Unlike APIs that use a dedicated address space and/or special data layouts (for instance, compressed OpenGL* textures), regular OpenCV data objects like `cv::Mat` reside in the conventional system memory. That is, the memory can actually be shared with the Inference Engine, with only the data ownership transferred.
+
+Again, if the OpenCV and Inference Engine layouts match, the data can be wrapped as an Inference Engine (input/output) blob.
+Notice that by default, the Inference Engine accepts the **planar** and **not interleaved** inputs in NCHW, so NHWC (which is exactly the interleaved layout) should be specified explicitly:
+
+**WARNING**: The `InferenceEngine::NHWC` layout is not supported natively by most Inference Engine plugins, so an internal conversion might happen.
+
+```cpp
+cv::Mat frame = ...;  // regular CV_8UC3 image, interleaved
+// creating a blob that wraps the OpenCV's Mat
+// (the data it points to should persist until the blob is released):
+InferenceEngine::SizeVector dims_src = {
+    1                        /* batch, N */,
+    (size_t)frame.rows       /* Height */,
+    (size_t)frame.cols       /* Width */,
+    (size_t)frame.channels() /* Channels */,
+};
+TensorDesc desc(InferenceEngine::Precision::U8, dims_src, InferenceEngine::NHWC);
+InferenceEngine::TBlob<uint8_t>::Ptr p = InferenceEngine::make_shared_blob<uint8_t>(desc, (uint8_t*)frame.data, frame.step[0] * frame.rows);
+inferRequest.SetBlob("input", p);
+inferRequest.Infer();
+// ...
+// similarly, you can wrap the output tensor (let's assume it is FP32)
+// notice that the output should also be explicitly stated as NHWC with setLayout
+// ('output_blob' below is the blob obtained, for example, with GetBlob for the output)
+const float* output_data = output_blob->buffer()
+    .as<PrecisionTrait<Precision::FP32>::value_type*>();
+cv::Mat res(rows, cols, CV_32FC3, (void*)output_data, cv::Mat::AUTO_STEP);
+```
+
+Notice that the original `cv::Mat`/blobs cannot be used simultaneously by the application and the Inference Engine. Alternatively, the data that the pointer references can be copied to unlock the original data and return ownership to the original API.
+
+### Request-Based API and “GetBlob” Idiom
+
+The Infer Request-based API offers two types of requests: Sync and Async. The Sync is considered below. The Async splits the (synchronous) `Infer` into `StartAsync` and `Wait` (see Inference Engine Async API).
+
+More importantly, an infer request encapsulates the reference to the “executable” network and the actual inputs/outputs. Now, when you load the network to the plugin, you get a reference to the executable network (you may consider it as a queue). Actual infer requests are created by the executable network:
+
+```cpp
+CNNNetReader network_reader;
+network_reader.ReadNetwork("Model.xml");
+network_reader.ReadWeights("Model.bin");
+auto network = network_reader.getNetwork();
+InferenceEngine::InputsDataMap input_info(network.getInputsInfo());
+
+InferenceEnginePluginPtr engine_ptr = PluginDispatcher(pluginDirs).getSuitablePlugin(TargetDevice::eGPU);
+InferencePlugin plugin(engine_ptr);
+
+auto executable_network = plugin.LoadNetwork(network, {/*opt config*/});
+auto infer_request = executable_network.CreateInferRequest();
+
+for (auto & item : input_info) {
+    std::string input_name = item.first;
+    auto input = infer_request.GetBlob(input_name);
+    /** Lock/Fill input tensor with data **/
+    unsigned char* data =
+        input->buffer().as<PrecisionTrait<Precision::U8>::value_type*>();
+    // ...
+}
+
+infer_request.Infer();
+```
+
+`GetBlob` is the recommended way to communicate with the network, as it internally allocates the data with the right padding/alignment for the device. For example, the GPU input/output blobs are mapped to the host (which is fast) if `GetBlob` is used. But if you call `SetBlob`, a copy (from/to the blob you set) into the internal GPU plugin structures will happen.
+
+### Performance Aspects of Running Multiple Requests Simultaneously
+
+If your application simultaneously executes multiple infer requests:
+
+- For the CPU, the best solution is to use the CPU "throughput" mode.
+  - If latency is of more concern, you can try the `EXCLUSIVE_ASYNC_REQUESTS` [configuration option](../IE_DG/supported_plugins/CPU.md) that limits the number of simultaneously executed requests for all (executable) networks that share the specific device to just one:
+ ```cpp + //these two networks go thru same plugin (aka device) and their requests will not overlap. + auto executable_network0 = plugin.LoadNetwork(network0, {{PluginConfigParams::KEY_EXCLUSIVE_ASYNC_REQUESTS, PluginConfigParams::YES}}); + auto executable_network1 = plugin.LoadNetwork(network1, {{PluginConfigParams::KEY_EXCLUSIVE_ASYNC_REQUESTS, PluginConfigParams::YES}}); + ``` +
For more information on the executable networks notation, see Request-Based API and “GetBlob” Idiom. + + - The heterogeneous device uses the `EXCLUSIVE_ASYNC_REQUESTS` by default. + + - `KEY_EXCLUSIVE_ASYNC_REQUESTS` option affects only device queues of the individual application. + +- For FPGA and GPU, the actual work is serialized by a plugin and/or a driver anyway. + +- Finally, for any VPU flavor, using multiple requests is a must for achieving good throughput. + +In the Inference Engine, there is no notion of requests priorities. It is left to the user side (for example, not queuing the low priority infer request, until another higher priority is waiting). Notice that it would require additional logic to synchronize between executable networks (queues) in your application code. + +### Inference Engine Async API + +Inference Engine Async API can improve overall frame rate of the application. While accelerator is busy with the inference, the application can continue doing things on the host rather than wait for the inference to complete. + +In the example below, inference is applied to the results of the video decoding. So it is possible to keep two parallel infer requests, and while the current is processed, the input frame for the next is being captured. This essentially hides the latency of capturing, so that the overall frame rate is rather determined only by the slowest part of the pipeline (decoding IR inference) and not by the sum of the stages. + +You can compare the pseudo-codes for the regular and async-based approaches: + +- In the regular way, the frame is captured with OpenCV and then immediately processed:
+```cpp +while(…) { + capture frame + populate CURRENT InferRequest + Infer CURRENT InferRequest //this call is synchronous + display CURRENT result +} +``` +![Intel® VTune™ screenshot](../img/vtune_regular.png) + +- In the "true" async mode, the `NEXT` request is populated in the main (application) thread, while the `CURRENT` request is processed:
+```cpp +while(…) { + capture frame + populate NEXT InferRequest + start NEXT InferRequest //this call is async and returns immediately + wait for the CURRENT InferRequest //processed in a dedicated thread + display CURRENT result + swap CURRENT and NEXT InferRequests +} +``` +![Intel® VTune™ screenshot](../img/vtune_async.png) + +The technique can be generalized to any available parallel slack. For example, you can do inference and simultaneously encode the resulting or previous frames or run further inference, like emotion detection on top of the face detection results. + +There are important performance caveats though: for example, the tasks that run in parallel should try to avoid oversubscribing the shared compute resources. If the inference is performed on the FPGA and the CPU is essentially idle, it makes sense to do things on the CPU in parallel. However, multiple infer requests can oversubscribe that. Notice that heterogeneous execution can implicitly use the CPU, refer to Heterogeneity. + +Also, if the inference is performed on the graphics processing unit (GPU), it can take little gain to do the encoding, for instance, of the resulting video, on the same GPU in parallel, because the device is already busy. + +Refer to the [Object Detection SSD Demo](@ref omz_demos_object_detection_demo_ssd_async_README) (latency-oriented Async API showcase) and [Benchmark App Sample](../../inference-engine/samples/benchmark_app/README.md) (which has both latency and throughput-oriented modes) for complete examples of the Async API in action. + +## Using Tools + +Whether you are tuning for the first time or doing advanced performance optimization, you need a a tool that provides accurate insights. Intel® VTune™ Amplifier gives you the tool to mine it and interpret the profiling data. + +Alternatively, you can gather the raw profiling data that samples report, the second chapter provides example of how to interpret these. + +### Intel® VTune™ Examples + +All major performance calls of the Inference Engine are instrumented with Instrumentation and Tracing Technology APIs. This allows viewing the Inference Engine calls on the Intel® VTune™ timelines and aggregations plus correlating them to the underlying APIs, like OpenCL. In turn, this enables careful per-layer execution breakdown. + +When choosing the Analysis type in Intel® VTune™ Amplifier, make sure to select the **Analyze user tasks, events, and counters** option: + +![](../img/vtune_option.jpg) + +See the [corresponding section in the Intel® VTune™ Amplifier User's Guide](https://software.intel.com/en-us/vtune-amplifier-help-task-analysis) for details. + +Example of Inference Engine calls: + +- On the Intel VTune Amplifier timeline. + Notice that `Task_runNOThrow` is an Async API wrapper and it is executed in a different thread and triggers the Intel MKL-DNN execution: + + ![](../img/vtune_timeline.png) + +- In the Intel VTune Amplifier **Top-down view**, grouped by the **Task Domain**. + Notice the `Task_runNoThrow` and `MKLDNN _INFER` that are bracketing the actual Intel MKL-DNN kernels execution: + + ![](../img/vtune_topdown_view.jpg) + +Similarly, you can use any GPU analysis in the Intel VTune Amplifier and get general correlation with Inference Engine API as well as the execution breakdown for OpenCL kernels. + +Just like with regular native application, further drill down in the counters is possible, however, this is mostly useful for optimizing custom kernels. 
Finally, with the Intel VTune Amplifier, the profiling is not limited to your user-level code (see the [corresponding section in the Intel® VTune™ Amplifier User's Guide](https://software.intel.com/en-us/vtune-amplifier-help-analyze-performance)). + +### Internal Inference Performance Counters + +Almost every sample (inspect command-line options for a specific sample with `-h`) supports a `-pc` command that outputs internal execution breakdown. Refer to the [samples code](../IE_DG/Samples_Overview.md) for the actual Inference Engine API behind that. + +Below is example of CPU plugin output for a network (since the device is CPU, the layers wall clock `realTime` and the `cpu` time are the same): + +``` +conv1 EXECUTED layerType: Convolution realTime: 706 cpu: 706 execType: jit_avx2 +conv2_1_x1 EXECUTED layerType: Convolution realTime: 137 cpu: 137 execType: jit_avx2_1x1 +fc6 EXECUTED layerType: Convolution realTime: 233 cpu: 233 execType: jit_avx2_1x1 +fc6_nChw8c_nchw EXECUTED layerType: Reorder realTime: 20 cpu: 20 execType: reorder +out_fc6 EXECUTED layerType: Output realTime: 3 cpu: 3 execType: unknown +relu5_9_x2 OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: undef +``` + +This contains layers name (as seen in IR), layers type and execution statistics. Notice the `OPTIMIZED_OUT`, which indicates that the particular activation was fused into adjacent convolution. Also, the `unknown` stays for the Inference Engine specific CPU (helper) primitives that are not part of the Intel MKL-DNN. + +Notice that there are some helper layers in the CPU execution breakdown, which were not presented in the original topology. These are automatically added by the plugin. For example, the `Reorder` re-packs the Intel MKL-DNN internal (blocked) layout to the regular plain NCHW (that the user expects as the output). As explained in the Few Device-Specific Tips, if your custom kernels introduces a lot of outstanding/expensive Reorders, consider blocked implementation for the kernels. + +Notice that in the heterogeneous cases, there will be additional information on which subgraph the statistics is about (the first subgraph is GPU, so its `cpu`/host time is really small compared to the actual `realTime`): + +``` +subgraph1: squeeze1x1 EXECUTED layerType: Convolution realTime: 227 cpu:3 execType: GPU +… +subgraph2: detection_out EXECUTED layerType: DetectionOutput realTime: 121 cpu:121 execType: unknown +… +``` + +As mentioned earlier, `unknown` here means CPU kernel with unknown (for example, not AVX2 or AVX512) acceleration path. +Since FPGA execution does not separate individual kernels, only bulk execution/data transfer statistics is available: + +``` +subgraph1: 1. input preprocessing (mean data/FPGA):EXECUTED layerType: preprocessing realTime: 129 cpu: 129 +subgraph1: 2. input transfer to DDR:EXECUTED layerType: realTime: 201 cpu: 0 +subgraph1: 3. FPGA execute time:EXECUTED layerType: realTime: 3808 cpu: 0 subgraph1: 4. output transfer from DDR:EXECUTED layerType: realTime: 55 cpu: 0 +subgraph1: 5. FPGA output postprocessing:EXECUTED layerType: realTime: 7 cpu: 7 +subgraph1: 6. softmax/copy: EXECUTED layerType: realTime: 2 cpu: 2 +subgraph2: out_prob: NOT_RUN layerType: Output realTime: 0 cpu: 0 +subgraph2: prob: EXECUTED layerType: SoftMax realTime: 10 cpu: 10 +Total time: 4212 microseconds +``` + +The `softmax/copy` is a glue layer that connects the FPGA subgraph to the CPU subgraph (and copies the data). 
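+The same breakdown is available programmatically. Below is a minimal sketch, assuming an `infer_request` that has already completed an inference and a network loaded with the performance counters enabled (the `KEY_PERF_COUNT` config key set to `YES`, which is what the samples' `-pc` option does); includes and error handling are omitted:
+
+```cpp
+// Retrieve the per-layer statistics: a map from the layer name (as seen in the IR)
+// to InferenceEngine::InferenceEngineProfileInfo
+auto perf_counts = infer_request.GetPerformanceCounts();
+for (const auto & entry : perf_counts) {
+    const auto & info = entry.second;
+    if (info.status == InferenceEngineProfileInfo::EXECUTED) {
+        std::cout << entry.first << " (" << info.exec_type << "): "
+                  << info.realTime_uSec << " us (wall-clock), "
+                  << info.cpu_uSec << " us (CPU)" << std::endl;
+    }
+}
+```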
+ +## See Also + +- [Inference Engine Developer Guide](https://software.intel.com/en-us/articles/OpenVINO-inferengine) +- [Model Optimizer Developer Guide](https://software.intel.com/en-us/articles/OpenVINO-ModelOptimizer) diff --git a/docs/resources/introduction.md b/docs/resources/introduction.md new file mode 100644 index 00000000000000..84d53409c92c4f --- /dev/null +++ b/docs/resources/introduction.md @@ -0,0 +1,21 @@ +# Overview of OpenVINO Resources {#openvino_docs_resources_introduction} + + +## Samples + +- [Inference Engine Samples](../IE_DG/Samples_Overview.md) +- [DL Streamer Samples](../IE_DG/Tools_Overview.md) + +## Demos + +- [Demos](@ref omz_demos_README) + + +## Additional Tools + +- [Tools for models calibration and accuracy measurement](../IE_DG/Tools_Overview.md) + +## Pre-Trained Models + +- [Intel's Pre-trained Models from Open Model Zoo](@ref omz_models_intel_index) +- [Public Pre-trained Models Available with OpenVINO™ from Open Model Zoo](@ref omz_models_public_index) \ No newline at end of file diff --git a/docs/security_guide/introduction.md b/docs/security_guide/introduction.md new file mode 100644 index 00000000000000..607a99bdadae2b --- /dev/null +++ b/docs/security_guide/introduction.md @@ -0,0 +1,7 @@ +# Introduction to OpenVINO™ Security {#openvino_docs_security_guide_introduction} + +Deploying deep learning models for OpenVINO™ may raise security and privacy issues. +Trained models are often valuable intellectual property and you may choose to protect them with encryption or other security tools. + +Actual security and privacy requirements depend on your unique deployment scenario. +This section provides general guidance on using OpenVINO tools and libraries securely. diff --git a/docs/security_guide/workbench.md b/docs/security_guide/workbench.md new file mode 100644 index 00000000000000..7d8b128cb1f123 --- /dev/null +++ b/docs/security_guide/workbench.md @@ -0,0 +1,36 @@ +# Deep Learning Workbench Security {#openvino_docs_security_guide_workbench} + +Deep Learning Workbench (DL Workbench) is a web application running within a Docker\* container. + +## Run DL Workbench + +Unless necessary, limit the connections to the DL Workbench to `localhost` (127.0.0.1), so that it +is only accessible from the machine the Docker container is built on: + +* The script [starting the DL Workbench from the + package](@ref workbench_docs_Workbench_DG_Install_from_Package) ensures that the container and the web + application are accessible only from the `localhost` by default. + +* When using `docker run` to [start the DL Workbench from Docker + Hub](@ref workbench_docs_Workbench_DG_Install_from_Docker_Hub), limit connections for the host IP 127.0.0.1. + For example, limit the connections for the host IP to the port `5665` with the `-p + 127.0.0.1:5665:5665` command . Refer to [Container + networking](https://docs.docker.com/config/containers/container-networking/#published-ports) for + details. + +## Authentication Security + +DL Workbench uses [authentication tokens](@ref workbench_docs_Workbench_DG_Authentication) to access the +application. The script starting the DL Workbench creates an authentication token each time the DL +Workbench starts. Anyone who has the authentication token can use the DL Workbench. + +When you finish working with the DL Workbench, log out to prevent the use of the DL Workbench from +the same browser session without authentication. 
+ +To invalidate the authentication token completely, [restart the DL +Workbench](@ref workbench_docs_Workbench_DG_Docker_Container). + +## Use TLS to Protect Communications + +[Configure Transport Layer Security (TLS)](@ref workbench_docs_Workbench_DG_Configure_TLS) to keep the +authentication token encrypted. diff --git a/inference-engine/ie_bridges/c/docs/api_overview.md b/inference-engine/ie_bridges/c/docs/api_overview.md index f41d77796227ab..908aa397b61e17 100644 --- a/inference-engine/ie_bridges/c/docs/api_overview.md +++ b/inference-engine/ie_bridges/c/docs/api_overview.md @@ -1,4 +1,4 @@ -# Overview of Inference Engine C* API +# Overview of Inference Engine C* API {#openvino_inference_engine_ie_bridges_c_docs_api_overview} > **NOTE**: It is a preview version of the Inference Engine C* API for evaluation purpose only. > Module structure and API itself may be changed in future releases. diff --git a/inference-engine/ie_bridges/c/samples/hello_classification/README.md b/inference-engine/ie_bridges/c/samples/hello_classification/README.md index 3109bbfd2f4ce4..671e26eb868c28 100644 --- a/inference-engine/ie_bridges/c/samples/hello_classification/README.md +++ b/inference-engine/ie_bridges/c/samples/hello_classification/README.md @@ -1,4 +1,4 @@ -# Hello Classification C Sample +# Hello Classification C Sample {#openvino_inference_engine_ie_bridges_c_samples_hello_classification_README} This topic describes how to run the Hello Classification C sample application. @@ -10,13 +10,13 @@ It demonstrates how to use the following Inference Engine C API in applications: There is also an API introduced to crop a ROI object and set it as input without additional memory re-allocation. To properly demonstrate this API, it is required to run several networks in pipeline which is out of scope of this sample. -> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). ## Running -To run the sample, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). +To run the sample, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). 
-> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). You can do inference of an image using a trained AlexNet network on a GPU using the following command: diff --git a/inference-engine/ie_bridges/c/samples/hello_nv12_input_classification/README.md b/inference-engine/ie_bridges/c/samples/hello_nv12_input_classification/README.md index 1ddd6866c221e9..929350691785f5 100644 --- a/inference-engine/ie_bridges/c/samples/hello_nv12_input_classification/README.md +++ b/inference-engine/ie_bridges/c/samples/hello_nv12_input_classification/README.md @@ -1,8 +1,8 @@ -# Hello NV12 Input Classification C Sample +# Hello NV12 Input Classification C Sample {#openvino_inference_engine_ie_bridges_c_samples_hello_nv12_input_classification_README} This topic describes how to run the Hello NV12 Input Classification sample application. The sample demonstrates how to use the new NV12 automatic input pre-processing API of the Inference Engine in your applications. -Refer to [Integrate the Inference Engine New Request API with Your Application](./docs/IE_DG/Integrate_with_customer_application_new_API.md) for details. +Refer to [Integrate the Inference Engine New Request API with Your Application](../../../../../docs/IE_DG/Integrate_with_customer_application_new_API.md) for details. ## How It Works @@ -30,16 +30,16 @@ ffmpeg -i cat.jpg -pix_fmt nv12 cat.yuv > model to work with RGB order, you need to reconvert your model using the Model Optimizer tool > with `--reverse_input_channels` argument specified. For more information about the argument, > refer to **When to Reverse Input Channels** section of -> [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> [Converting a Model Using General Conversion Parameters](../../../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). ## Running To run the sample, you can use public or pre-trained models. To download pre-trained models, use -the OpenVINO™ [Model Downloader](https://github.com/opencv/open_model_zoo/tree/master/model_downloader) +the OpenVINO™ [Model Downloader](@ref omz_tools_downloader_README) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). > **NOTE**: Before running the sample with a trained model, make sure the model is converted to the -> Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). 
You can perform inference on an NV12 image using a trained AlexNet network on CPU with the following command: ```sh diff --git a/inference-engine/ie_bridges/c/samples/object_detection_sample_ssd/README.md b/inference-engine/ie_bridges/c/samples/object_detection_sample_ssd/README.md index 70df34b46e40b7..882131d796199d 100644 --- a/inference-engine/ie_bridges/c/samples/object_detection_sample_ssd/README.md +++ b/inference-engine/ie_bridges/c/samples/object_detection_sample_ssd/README.md @@ -1,16 +1,16 @@ -# Object Detection C Sample SSD +# Object Detection C Sample SSD {#openvino_inference_engine_ie_bridges_c_samples_object_detection_sample_ssd_README} This topic demonstrates how to run the Object Detection C sample application, which does inference using object detection networks like SSD-VGG on Intel® Processors and Intel® HD Graphics. -> **NOTE:** This topic describes usage of C implementation of the Object Detection Sample SSD. For the C++* implementation, refer to [Object Detection C++* Sample SSD](./inference-engine/samples/object_detection_sample_ssd/README.md) and for the Python* implementation, refer to [Object Detection Python* Sample SSD](./inference-engine/ie_bridges/python/sample/object_detection_sample_ssd/README.md). +> **NOTE:** This topic describes usage of C implementation of the Object Detection Sample SSD. For the C++* implementation, refer to [Object Detection C++* Sample SSD](../../../../samples/object_detection_sample_ssd/README.md) and for the Python* implementation, refer to [Object Detection Python* Sample SSD](../../../python/sample/object_detection_sample_ssd/README.md). ## How It Works Upon the start-up the sample application reads command line parameters and loads a network and an image to the Inference Engine device. When inference is done, the application creates output images and outputs data to the standard output stream. -> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). ## Running @@ -37,9 +37,9 @@ Options: Running the application with the empty list of options yields the usage message given above and an error message. -To run the sample, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). +To run the sample, you can use public or pre-trained models. 
To download the pre-trained models, use the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). -> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). For example, to do inference on a CPU with the OpenVINO™ toolkit person detection SSD models, run one of the following commands: @@ -66,5 +66,5 @@ classes of the detected objects along with the respective confidence values and ## See Also -* [Model Optimizer](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) -* [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) +* [Model Optimizer](../../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) +* [Model Downloader](@ref omz_tools_downloader_README) diff --git a/inference-engine/ie_bridges/python/README.md b/inference-engine/ie_bridges/python/README.md index 6dbe6a0c00579a..68e9f943637b2e 100644 --- a/inference-engine/ie_bridges/python/README.md +++ b/inference-engine/ie_bridges/python/README.md @@ -1,4 +1,4 @@ -## Software Requirements +## Software Requirements {#openvino_inference_engine_ie_bridges_python_README} - [CMake\*](https://cmake.org/download/) 3.9 or later - Microsoft\* Visual Studio 2015 or later on Windows\* - gcc 4.8 or later on Linux diff --git a/inference-engine/ie_bridges/python/docs/api_overview.md b/inference-engine/ie_bridges/python/docs/api_overview.md index afb3c086bdfe58..85338cc8caa01f 100644 --- a/inference-engine/ie_bridges/python/docs/api_overview.md +++ b/inference-engine/ie_bridges/python/docs/api_overview.md @@ -1,4 +1,4 @@ -# Overview of Inference Engine Python* API +# Overview of Inference Engine Python* API {#openvino_inference_engine_ie_bridges_python_docs_api_overview} This API provides a simplified interface for Inference Engine functionality that allows you to: diff --git a/inference-engine/ie_bridges/python/sample/classification_sample/README.md b/inference-engine/ie_bridges/python/sample/classification_sample/README.md index 7812e07a4cfc12..1116e770bbbdd3 100644 --- a/inference-engine/ie_bridges/python/sample/classification_sample/README.md +++ b/inference-engine/ie_bridges/python/sample/classification_sample/README.md @@ -1,4 +1,4 @@ -# Image Classification Python* Sample +# Image Classification Python* Sample {#openvino_inference_engine_ie_bridges_python_sample_classification_sample_README} This topic demonstrates how to run the Image Classification sample application, which performs inference using image classification networks such as AlexNet and GoogLeNet. @@ -9,7 +9,7 @@ Upon the start-up, the sample application reads command line parameters and load Engine plugin. When inference is done, the application creates an output image and outputs data to the standard output stream. -> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. 
If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). ## Running @@ -46,9 +46,9 @@ Options: Running the application with the empty list of options yields the usage message given above. -To run the sample, you can use AlexNet and GoogLeNet or other image classification models. You can download the pre-trained models with the OpenVINO [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) or from [https://download.01.org/opencv/](https://download.01.org/opencv/). +To run the sample, you can use AlexNet and GoogLeNet or other image classification models. You can download the pre-trained models with the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or from [https://download.01.org/opencv/](https://download.01.org/opencv/). -> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). 
For example, to perform inference of an AlexNet model (previously converted to the Inference Engine format) on CPU, use the following command: @@ -66,6 +66,6 @@ For example, to get the top-5 results on GPU, run the following command: ``` ## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) -* [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) -* [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) +* [Using Inference Engine Samples](../../../../../docs/IE_DG/Samples_Overview.md) +* [Model Optimizer tool](../../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) +* [Model Downloader](@ref omz_tools_downloader_README) diff --git a/inference-engine/ie_bridges/python/sample/classification_sample_async/README.md b/inference-engine/ie_bridges/python/sample/classification_sample_async/README.md index 18ada6fecabc20..25dbfdf8cf8edd 100644 --- a/inference-engine/ie_bridges/python/sample/classification_sample_async/README.md +++ b/inference-engine/ie_bridges/python/sample/classification_sample_async/README.md @@ -1,9 +1,9 @@ -# Image Classification Python* Sample Async +# Image Classification Python* Sample Async {#openvino_inference_engine_ie_bridges_python_sample_classification_sample_async_README} This sample demonstrates how to run the Image Classification sample application with inference executed in the asynchronous mode. The sample demonstrates how to use the new Infer Request API of Inference Engine in applications. -Refer to [Integrate the Inference Engine New Request API with Your Application](./docs/IE_DG/Integrate_with_customer_application_new_API.md) for details. +Refer to [Integrate the Inference Engine New Request API with Your Application](../../../../../docs/IE_DG/Integrate_with_customer_application_new_API.md) for details. The sample demonstrates how to build and execute an inference request 10 times in the asynchronous mode on example of classifications networks. The asynchronous mode might increase the throughput of the pictures. @@ -21,7 +21,7 @@ After that, the application starts inference for the first infer request and wai When inference is done, the application outputs data to the standard output stream. -> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). 
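As a rough illustration of the asynchronous flow this sample describes, the sketch below creates several infer requests and overlaps their execution using the 2020-era Inference Engine Python API (`IECore`, `start_async`, `wait`). The model and image file names are placeholders, and property names such as `net.inputs` and `request.outputs` assume that API generation:

```python
import cv2
import numpy as np
from openvino.inference_engine import IECore

MODEL_XML, MODEL_BIN, IMAGE = "alexnet.xml", "alexnet.bin", "cat.jpg"  # placeholder files

ie = IECore()
net = ie.read_network(model=MODEL_XML, weights=MODEL_BIN)
input_name = next(iter(net.inputs))
output_name = next(iter(net.outputs))
n, c, h, w = net.inputs[input_name].shape

# Prepare one NCHW blob; cv2.imread already returns the BGR order the samples expect.
image = cv2.resize(cv2.imread(IMAGE), (w, h)).transpose((2, 0, 1))
image = image[np.newaxis, ...].astype(np.float32)

# Create several infer requests and start them all without waiting in between.
exec_net = ie.load_network(network=net, device_name="CPU", num_requests=4)
for request_id in range(len(exec_net.requests)):
    exec_net.start_async(request_id=request_id, inputs={input_name: image})

# Collect the results once each request completes.
for request_id, request in enumerate(exec_net.requests):
    request.wait(-1)  # block until this particular request is done
    probs = request.outputs[output_name].squeeze()
    top10 = np.argsort(probs)[-10:][::-1]
    print("request", request_id, "top-10 class ids:", top10)
```

Running a request 10 times, as the sample does, simply repeats the start/wait cycle for that request.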
## Running @@ -59,9 +59,9 @@ Options: Running the application with the empty list of options yields the usage message given above and an error message. -To run the sample, you can use AlexNet and GoogLeNet or other image classification models. You can download the pre-trained models with the OpenVINO [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) or from [https://download.01.org/opencv/](https://download.01.org/opencv/). +To run the sample, you can use AlexNet and GoogLeNet or other image classification models. You can download the pre-trained models with the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or from [https://download.01.org/opencv/](https://download.01.org/opencv/). -> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). You can do inference of an image using a trained AlexNet network on FPGA with fallback to CPU using the following command: @@ -75,4 +75,4 @@ By default, the application outputs top-10 inference results for each infer requ It also provides throughput value measured in frames per seconds. ## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) +* [Using Inference Engine Samples](../../../../../docs/IE_DG/Samples_Overview.md) diff --git a/inference-engine/ie_bridges/python/sample/hello_query_device/README.md b/inference-engine/ie_bridges/python/sample/hello_query_device/README.md index 24f10afd8fcbe9..28425b2faa4146 100644 --- a/inference-engine/ie_bridges/python/sample/hello_query_device/README.md +++ b/inference-engine/ie_bridges/python/sample/hello_query_device/README.md @@ -1,4 +1,4 @@ -# Hello Query Device Python* Sample +# Hello Query Device Python* Sample {#openvino_inference_engine_ie_bridges_python_sample_hello_query_device_README} This topic demonstrates how to run the Hello Query Device sample application, which queries Inference Engine devices and prints their metrics and default configuration values. The sample shows @@ -47,4 +47,4 @@ Available devices: RANGE_FOR_STREAMS: 6 ``` ## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) +* [Using Inference Engine Samples](../../../../../docs/IE_DG/Samples_Overview.md) diff --git a/inference-engine/ie_bridges/python/sample/object_detection_sample_ssd/README.md b/inference-engine/ie_bridges/python/sample/object_detection_sample_ssd/README.md index 7b9182538b5dcb..5ec822876128a5 100644 --- a/inference-engine/ie_bridges/python/sample/object_detection_sample_ssd/README.md +++ b/inference-engine/ie_bridges/python/sample/object_detection_sample_ssd/README.md @@ -1,9 +1,9 @@ -# Object Detection Python* Sample SSD +# Object Detection Python* Sample SSD {#openvino_inference_engine_ie_bridges_python_sample_object_detection_sample_ssd_README} This sample demonstrates how to run the Object Detection sample application. The sample demonstrates how to use the new Infer Request API of Inference Engine in applications. -Refer to [Integrate the Inference Engine New Request API with Your Application](./docs/IE_DG/Integrate_with_customer_application_new_API.md) for details. 
+Refer to [Integrate the Inference Engine New Request API with Your Application](../../../../../docs/IE_DG/Integrate_with_customer_application_new_API.md) for details. The sample demonstrates how to build and execute an inference request on example of object detection networks. Due to properties of SSD networks, this sample works correctly only on a batch of the size 1. For a greater number of images in a batch, network reshape is required. @@ -17,7 +17,7 @@ Then, the sample creates an inference request object and executes inference on i When inference is done, the application outputs data to the standard output stream and creates an output image with bounding boxes drawn atop the initial image. -> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). ## Running @@ -55,9 +55,9 @@ Options: Running the application with the empty list of options yields the usage message given above and an error message. -To run the sample, you can use RMNet_SSD or other object-detection models. You can download the pre-trained models with the OpenVINO [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) or from [https://download.01.org/opencv/](https://download.01.org/opencv/). +To run the sample, you can use RMNet_SSD or other object-detection models. You can download the pre-trained models with the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or from [https://download.01.org/opencv/](https://download.01.org/opencv/). -> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). You can do inference of an image using a trained RMNet_SSD network on FPGA with fallback to CPU using the following command: @@ -70,4 +70,4 @@ You can do inference of an image using a trained RMNet_SSD network on FPGA with By default, the application outputs all inference results and draws bounding boxes for inference results with an over 50% confidence. 
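The raw result of an SSD-like network in the IR is a `DetectionOutput` blob of shape `[1, 1, N, 7]`; the sketch below shows how such a blob is typically turned into pixel-space boxes with the 50% confidence cut-off mentioned above (the helper name and the fake blob are illustrative only, not part of the sample):

```python
import numpy as np

def parse_ssd_detections(detections, image_w, image_h, conf_threshold=0.5):
    """Convert a DetectionOutput blob of shape [1, 1, N, 7] into pixel-space boxes.

    Each of the N rows is [image_id, class_id, confidence, x_min, y_min, x_max, y_max],
    with the coordinates normalized to the [0, 1] range.
    """
    boxes = []
    for image_id, class_id, conf, x_min, y_min, x_max, y_max in detections.reshape(-1, 7):
        if image_id < 0 or conf < conf_threshold:
            continue  # image_id == -1 marks padding rows; low-confidence boxes are dropped
        boxes.append((int(class_id), float(conf),
                      int(x_min * image_w), int(y_min * image_h),
                      int(x_max * image_w), int(y_max * image_h)))
    return boxes

# A fake blob holding a single 90%-confidence detection of class 3 on a 640x480 image.
fake_blob = np.array([[[[0, 3, 0.9, 0.1, 0.2, 0.5, 0.8]]]], dtype=np.float32)
print(parse_ssd_detections(fake_blob, image_w=640, image_h=480))
```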
## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) +* [Using Inference Engine Samples](../../../../../docs/IE_DG/Samples_Overview.md) diff --git a/inference-engine/ie_bridges/python/sample/style_transfer_sample/README.md b/inference-engine/ie_bridges/python/sample/style_transfer_sample/README.md index 3cc1e39e86c347..ad1834829ad8f9 100644 --- a/inference-engine/ie_bridges/python/sample/style_transfer_sample/README.md +++ b/inference-engine/ie_bridges/python/sample/style_transfer_sample/README.md @@ -1,13 +1,13 @@ -# Neural Style Transfer Python* Sample +# Neural Style Transfer Python* Sample {#openvino_inference_engine_ie_bridges_python_sample_style_transfer_sample_README} This topic demonstrates how to run the Neural Style Transfer sample application, which performs inference of style transfer models. -> **NOTE**: The OpenVINO™ toolkit does not include a pre-trained model to run the Neural Style Transfer sample. A public model from the [Zhaw's Neural Style Transfer repository](https://github.com/zhaw/neural_style) can be used. Read the [Converting a Style Transfer Model from MXNet*](./docs/MO_DG/prepare_model/convert_model/mxnet_specific/Convert_Style_Transfer_From_MXNet.md) topic from the [Model Optimizer Developer Guide](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) to learn about how to get the trained model and how to convert it to the Inference Engine format (\*.xml + \*.bin). +> **NOTE**: The OpenVINO™ toolkit does not include a pre-trained model to run the Neural Style Transfer sample. A public model from the [Zhaw's Neural Style Transfer repository](https://github.com/zhaw/neural_style) can be used. Read the [Converting a Style Transfer Model from MXNet*](../../../../../docs/MO_DG/prepare_model/convert_model/mxnet_specific/Convert_Style_Transfer_From_MXNet.md) topic from the [Model Optimizer Developer Guide](../../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) to learn about how to get the trained model and how to convert it to the Inference Engine format (\*.xml + \*.bin). ## How It Works -> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). ## Running @@ -62,7 +62,7 @@ To perform inference of an image using a trained model of NST network on Intel® The application outputs an image (`out1.bmp`) or a sequence of images (`out1.bmp`, ..., `out.bmp`) which are redrawn in style of the style transfer model used for sample. 
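Converting a style transfer result back into a viewable file mostly amounts to reordering and clipping the output blob; a minimal sketch, assuming the network already produces values in the 0–255 range as in this sample (the function name, output size, and file name are placeholders), could look like this:

```python
import cv2
import numpy as np

def save_nst_output(output_blob, path="out1.bmp"):
    """Save one NCHW float output of a style transfer network as a BMP image."""
    image = output_blob.squeeze()               # drop the batch dimension -> C x H x W
    image = np.clip(image, 0, 255).astype(np.uint8)
    image = image.transpose((1, 2, 0))          # planar C x H x W -> interleaved H x W x C
    cv2.imwrite(path, image)

# Random data standing in for a real 3 x 224 x 224 network output.
save_nst_output(np.random.uniform(0, 255, (1, 3, 224, 224)).astype(np.float32))
```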
-## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) +## See Also +* [Using Inference Engine Samples](../../../../../docs/IE_DG/Samples_Overview.md) diff --git a/inference-engine/samples/benchmark_app/README.md b/inference-engine/samples/benchmark_app/README.md index 031534c2eaf300..b90c5b48a7103d 100644 --- a/inference-engine/samples/benchmark_app/README.md +++ b/inference-engine/samples/benchmark_app/README.md @@ -1,15 +1,15 @@ -# Benchmark C++ Tool +# Benchmark C++ Tool {#openvino_inference_engine_samples_benchmark_app_README} This topic demonstrates how to use the Benchmark C++ Tool to estimate deep learning inference performance on supported devices. Performance can be measured for two inference modes: synchronous (latency-oriented) and asynchronous (throughput-oriented). -> **NOTE:** This topic describes usage of C++ implementation of the Benchmark Tool. For the Python* implementation, refer to [Benchmark Python* Tool](./inference-engine/tools/benchmark_tool/README.md). +> **NOTE:** This topic describes usage of C++ implementation of the Benchmark Tool. For the Python* implementation, refer to [Benchmark Python* Tool](../../tools/benchmark_tool/README.md). ## How It Works Upon start-up, the application reads command-line parameters and loads a network and images/binary files to the Inference Engine plugin, which is chosen depending on a specified device. The number of infer requests and execution approach depend on the mode defined with the `-api` command-line parameter. -> **NOTE**: By default, Inference Engine samples, tools and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples, tools and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). If you run the application in the synchronous mode, it creates one infer request and executes the `Infer` method. If you run the application in the asynchronous mode, it creates as many infer requests as specified in the `-nireq` command-line parameter and executes the `StartAsync` method for each of them. If `-nireq` is not set, the application will use the default value for specified device. @@ -38,23 +38,23 @@ enable statistics dumping by setting the `-report_type` parameter to one of the Depending on the type, the report is stored to `benchmark_no_counters_report.csv`, `benchmark_average_counters_report.csv`, or `benchmark_detailed_counters_report.csv` file located in the path specified in `-report_folder`. 
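The difference between the two modes boils down to how many infer requests are kept in flight. A simplified sketch of both loops in the 2020-era Python API is shown below; the IR file names, iteration count, and request count are arbitrary placeholders, and real measurements should come from benchmark_app itself:

```python
from time import perf_counter
import numpy as np
from openvino.inference_engine import IECore

MODEL_XML, MODEL_BIN = "googlenet-v1.xml", "googlenet-v1.bin"  # placeholder IR files
NITER, NIREQ = 100, 4

ie = IECore()
net = ie.read_network(model=MODEL_XML, weights=MODEL_BIN)
input_name = next(iter(net.inputs))
dummy = np.zeros(net.inputs[input_name].shape, dtype=np.float32)  # filler input data

# Latency-oriented (sync) mode: one request, executed back to back.
sync_net = ie.load_network(network=net, device_name="CPU", num_requests=1)
start = perf_counter()
for _ in range(NITER):
    sync_net.infer({input_name: dummy})
print("sync avg latency, ms:", (perf_counter() - start) / NITER * 1000)

# Throughput-oriented (async) mode: NIREQ requests kept in flight at once.
async_net = ie.load_network(network=net, device_name="CPU", num_requests=NIREQ)
start = perf_counter()
for i in range(NITER):
    if i >= NIREQ:                                # reuse a request only after it has finished
        async_net.requests[i % NIREQ].wait(-1)
    async_net.start_async(request_id=i % NIREQ, inputs={input_name: dummy})
for request in async_net.requests:
    request.wait(-1)
print("async throughput, FPS:", NITER / (perf_counter() - start))
```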
-The application also saves executable graph information serialized to a XML file if you specify a path to it with the +The application also saves executable graph information serialized to an XML file if you specify a path to it with the `-exec_graph_path` parameter. ## Run the Tool -Notice that the benchmark_app usually produces optimal performance for any device out of the box. +Note that the benchmark_app usually produces optimal performance for any device out of the box. **So in most cases you don't need to play the app options explicitly and the plain device name is enough**, for example, for CPU: ```sh ./benchmark_app -m -i -d CPU ``` -But it is still may be non-optimal for some cases, especially for very small networks. More details can read in [Introduction to Performance Topics](./docs/IE_DG/Intro_to_Performance.md). +But it still may be non-optimal in some cases, especially for very small networks. More details can be found in [Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md). -As explained in the [Introduction to Performance Topics](./docs/IE_DG/Intro_to_Performance.md) section, for all devices, including new [MULTI device](./docs/IE_DG/supported_plugins/MULTI.md) it is preferable to use the FP16 IR for the model. +As explained in the [Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md) section, for all devices, including the new [MULTI device](../../../docs/IE_DG/supported_plugins/MULTI.md), it is preferable to use the FP16 IR for the model. Also if latency of the CPU inference on the multi-socket machines is of concern, please refer to the same -[Introduction to Performance Topics](./docs/IE_DG/Intro_to_Performance.md) document. +[Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md) document. Running the application with the `-h` option yields the following usage message: ``` @@ -82,7 +82,7 @@ Options: -nireq "" Optional. Number of infer requests. Default value is determined automatically for a device. -b "" Optional. Batch size value. If not specified, the batch size value is determined from Intermediate Representation. -stream_output Optional. Print progress as a plain text. When specified, an interactive progress bar is replaced with a multiline output. - -t Optional. Time in seconds to execute topology. + -t Optional. Time, in seconds, to execute topology. -progress Optional. Show progress bar (can affect performance measurement). Default values is "false". -shape Optional. Set shape for input. For example, "input1[1,3,224,224],input2[1,4]" or "[1,3,224,224]" in case of one input size. @@ -93,7 +93,7 @@ Options: Please note that although the automatic selection usually provides a reasonable performance, it still may be non-optimal for some cases, especially for very small networks. -nthreads "" Optional. Number of threads to use for inference on the CPU (including HETERO and MULTI cases). - -enforcebf16 Optional. Enforcing of floating point operations execution in bfloat16 precision where it is acceptable. + -enforcebf16 Optional. Enforcing of floating point operations execution in bfloat16 precision on platforms with native bfloat16 support. By default, this key sets "true" on platforms with native bfloat16 support and "false" for other platforms. Use "-enforcebf16=false" to disable this feature. -pin "YES"/"NO"/"NUMA" Optional. Enable threads->cores ("YES", default), threads->(NUMA)nodes ("NUMA") or completely disable ("NO") CPU threads pinning for CPU-involved inference. 
@@ -108,14 +108,14 @@ Options: Running the application with the empty list of options yields the usage message given above and an error message. -Application supports topologies with one or more inputs. If a topology is not data sensitive, you can skip the input parameter. In this case, inputs are filled with random values. -If a model has only image input(s), please a provide folder with images or a path to an image as input. -If a model has some specific input(s) (not images), please prepare a binary file(s), which is filled with data of appropriate precision and provide a path to them as input. +Application supports topologies with one or more inputs. If a topology is not data-sensitive, you can skip the input parameter. In this case, inputs are filled with random values. +If a model has only image input(s), please provide a folder with images or a path to an image as input. +If a model has some specific input(s) (not images), please prepare a binary file(s) that is filled with data of appropriate precision and provide a path to them as input. If a model has mixed input types, input folder should contain all required files. Image inputs are filled with image files one by one. Binary inputs are filled with binary inputs one by one. -To run the tool, you can use public or Intel's pre-trained models. To download the models, use the OpenVINO [Model Downloader](./tools/downloader/README.md) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). +To run the tool, you can use public or Intel's pre-trained models. To download the models, use the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). -> **NOTE**: Before running the tool with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> **NOTE**: Before running the tool with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). ## Examples of Running the Tool @@ -148,7 +148,7 @@ This section provides step-by-step instructions on how to run the Benchmark Tool ./benchmark_app -m /googlenet-v1.xml -d HETERO:FPGA,CPU -api async -i /deployment_tools/demo/car.png --progress true ``` -The application outputs the number of executed iterations, total duration of execution, latency and throughput. +The application outputs the number of executed iterations, total duration of execution, latency, and throughput. Additionally, if you set the `-report_type` parameter, the application outputs statistics report. If you set the `-pc` parameter, the application outputs performance counters. If you set `-exec_graph_path`, the application reports executable graph information serialized. All measurements including per-layer PM counters are reported in milliseconds. 
Below are fragments of sample output for CPU and FPGA devices: @@ -181,6 +181,6 @@ Below are fragments of sample output for CPU and FPGA devices: ``` ## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) -* [Model Optimizer](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) -* [Model Downloader](./tools/downloader/README.md) \ No newline at end of file +* [Using Inference Engine Samples](../../../docs/IE_DG/Samples_Overview.md) +* [Model Optimizer](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) +* [Model Downloader](@ref omz_tools_downloader_README) \ No newline at end of file diff --git a/inference-engine/samples/classification_sample_async/README.md b/inference-engine/samples/classification_sample_async/README.md index a92f978559be17..f41d00951576da 100644 --- a/inference-engine/samples/classification_sample_async/README.md +++ b/inference-engine/samples/classification_sample_async/README.md @@ -1,11 +1,11 @@ -# Image Classification C++ Sample Async +# Image Classification C++ Sample Async {#openvino_inference_engine_samples_classification_sample_async_README} This sample demonstrates how to run the Image Classification sample application with inference executed in the asynchronous mode. -> **NOTE:** This topic describes usage of C++ implementation of the Image Classification Sample Async. For the Python* implementation, refer to [Image Classification Python* Sample Async](./inference-engine/ie_bridges/python/sample/classification_sample_async/README.md). +> **NOTE:** This topic describes usage of C++ implementation of the Image Classification Sample Async. For the Python* implementation, refer to [Image Classification Python* Sample Async](../../ie_bridges/python/sample/classification_sample_async/README.md). The sample demonstrates how to use the new Infer Request API of Inference Engine in applications. -Refer to [Integrate the Inference Engine New Request API with Your Application](./docs/IE_DG/Integrate_with_customer_application_new_API.md) for details. +Refer to [Integrate the Inference Engine New Request API with Your Application](../../../docs/IE_DG/Integrate_with_customer_application_new_API.md) for details. The sample demonstrates how to build and execute an inference request 10 times in the asynchronous mode on example of classifications networks. The asynchronous mode might increase the throughput of the pictures. @@ -23,7 +23,7 @@ After that, the application starts inference for the first infer request and wai When inference is done, the application outputs data to the standard output stream. -> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. 
For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). ## Running @@ -49,9 +49,9 @@ Options: Running the application with the empty list of options yields the usage message given above and an error message. -To run the sample, use AlexNet and GoogLeNet or other public or pre-trained image classification models. To download the pre-trained models, use the OpenVINO [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). +To run the sample, use AlexNet and GoogLeNet or other public or pre-trained image classification models. To download the pre-trained models, use the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). -> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). You can do inference of an image using a trained AlexNet network on FPGA with fallback to CPU using the following command: ```sh @@ -63,6 +63,6 @@ You can do inference of an image using a trained AlexNet network on FPGA with fa By default the application outputs top-10 inference results for each infer request. ## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) -* [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) -* [Model Optimizer](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) +* [Using Inference Engine Samples](../../../docs/IE_DG/Samples_Overview.md) +* [Model Downloader](@ref omz_tools_downloader_README) +* [Model Optimizer](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) diff --git a/inference-engine/samples/hello_classification/README.md b/inference-engine/samples/hello_classification/README.md index 90833a58ea699c..5f744f18d229cc 100644 --- a/inference-engine/samples/hello_classification/README.md +++ b/inference-engine/samples/hello_classification/README.md @@ -1,7 +1,7 @@ -# Hello Classification C++ Sample +# Hello Classification C++ Sample {#openvino_inference_engine_samples_hello_classification_README} This topic describes how to run the Hello Infer Classification sample application. -The sample is simplified version of [Image Classification Sample Async](./inference-engine/samples/classification_sample_async/README.md) +The sample is simplified version of [Image Classification Sample Async](../classification_sample_async/README.md) and developed with support of UNICODE. It demonstrates how to use the following Inference Engine API in applications: * Synchronous Infer Request API @@ -10,18 +10,18 @@ It demonstrates how to use the following Inference Engine API in applications: There is also an API introduced to crop a ROI object and set it as input without additional memory re-allocation. To properly demonstrate this API, it is required to run several networks in pipeline which is out of scope of this sample. 
-Please refer to [Security Barrier Camera Demo](./demos/security_barrier_camera_demo/README.md), or -[Crossroad Camera Demo](./demos/crossroad_camera_demo/README.md) with an example of using of new crop ROI API. +Please refer to [Security Barrier Camera Demo](@ref omz_demos_security_barrier_camera_demo_README), or +[Crossroad Camera Demo](@ref omz_demos_crossroad_camera_demo_README) with an example of using of new crop ROI API. -Refer to [Integrate the Inference Engine New Request API with Your Application](./docs/IE_DG/Integrate_with_customer_application_new_API.md) for details. +Refer to [Integrate the Inference Engine New Request API with Your Application](../../../docs/IE_DG/Integrate_with_customer_application_new_API.md) for details. -> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). ## Running -To run the sample, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). +To run the sample, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). -> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). You can do inference of an image using a trained AlexNet network on a GPU using the following command: ```sh @@ -33,4 +33,4 @@ You can do inference of an image using a trained AlexNet network on a GPU using The application outputs top-10 inference results. 
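For comparison with the asynchronous samples, the synchronous Infer Request path used here is only a few lines in the 2020-era Python API. The sketch below is not the sample's own code; the IR and image file names are placeholders:

```python
import cv2
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="alexnet.xml", weights="alexnet.bin")  # placeholder IR
input_name, output_name = next(iter(net.inputs)), next(iter(net.outputs))
n, c, h, w = net.inputs[input_name].shape

blob = cv2.resize(cv2.imread("car.png"), (w, h)).transpose((2, 0, 1))
blob = blob[np.newaxis, ...].astype(np.float32)

exec_net = ie.load_network(network=net, device_name="GPU")  # or "CPU"
probs = exec_net.infer({input_name: blob})[output_name].squeeze()
for class_id in np.argsort(probs)[-10:][::-1]:               # top-10, best first
    print(class_id, probs[class_id])
```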
## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) +* [Using Inference Engine Samples](../../../docs/IE_DG/Samples_Overview.md) diff --git a/inference-engine/samples/hello_nv12_input_classification/README.md b/inference-engine/samples/hello_nv12_input_classification/README.md index b22dc2e077b219..3986bcbfa049b1 100644 --- a/inference-engine/samples/hello_nv12_input_classification/README.md +++ b/inference-engine/samples/hello_nv12_input_classification/README.md @@ -1,9 +1,9 @@ -# Hello NV12 Input Classification C++ Sample +# Hello NV12 Input Classification C++ Sample {#openvino_inference_engine_samples_hello_nv12_input_classification_README} This topic describes how to run the Hello NV12 Input Classification sample application. -The sample is a simplified version of the [Image Classification Sample Async](./inference-engine/samples/classification_sample_async/README.md). +The sample is a simplified version of the [Image Classification Sample Async](../classification_sample_async/README.md). It demonstrates how to use the new NV12 automatic input pre-processing API of the Inference Engine in your applications. -Refer to [Integrate the Inference Engine New Request API with Your Application](./docs/IE_DG/Integrate_with_customer_application_new_API.md) for details. +Refer to [Integrate the Inference Engine New Request API with Your Application](../../../docs/IE_DG/Integrate_with_customer_application_new_API.md) for details. ## How It Works @@ -31,16 +31,16 @@ ffmpeg -i cat.jpg -pix_fmt nv12 cat.yuv > model to work with RGB order, you need to reconvert your model using the Model Optimizer tool > with `--reverse_input_channels` argument specified. For more information about the argument, > refer to **When to Reverse Input Channels** section of -> [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> [Converting a Model Using General Conversion Parameters](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). ## Running To run the sample, you can use public or pre-trained models. To download pre-trained models, use -the OpenVINO™ [Model Downloader](https://github.com/opencv/open_model_zoo/tree/master/model_downloader) +the OpenVINO™ [Model Downloader](@ref omz_tools_downloader_README) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). > **NOTE**: Before running the sample with a trained model, make sure the model is converted to the -> Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). You can perform inference on an NV12 image using a trained AlexNet network on CPU with the following command: ```sh @@ -52,4 +52,4 @@ You can perform inference on an NV12 image using a trained AlexNet network on CP The application outputs top-10 inference results. 
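The NV12 file consumed by this sample is a raw dump with no header, so its geometry has to be supplied separately. The sketch below (plain NumPy, not the sample's code) shows how such a frame is laid out, using an assumed 640x480 resolution:

```python
import numpy as np

def read_nv12(path, width, height):
    """Split a raw NV12 frame into its Y (luma) and interleaved UV (chroma) planes.

    NV12 stores a full-resolution Y plane followed by a half-height plane of
    interleaved U/V samples, i.e. width * height * 3 / 2 bytes per frame.
    """
    data = np.fromfile(path, dtype=np.uint8, count=width * height * 3 // 2)
    y_plane = data[:width * height].reshape(height, width)
    uv_plane = data[width * height:].reshape(height // 2, width)
    return y_plane, uv_plane

# The width/height must match the size used when cat.yuv was produced with ffmpeg.
y, uv = read_nv12("cat.yuv", width=640, height=480)
print(y.shape, uv.shape)   # (480, 640) and (240, 640)
```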
## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) +* [Using Inference Engine Samples](../../../docs/IE_DG/Samples_Overview.md) diff --git a/inference-engine/samples/hello_query_device/README.md b/inference-engine/samples/hello_query_device/README.md index 0d796de305a14e..b884284b71fee7 100644 --- a/inference-engine/samples/hello_query_device/README.md +++ b/inference-engine/samples/hello_query_device/README.md @@ -1,8 +1,8 @@ -# Hello Query Device C++ Sample +# Hello Query Device C++ Sample {#openvino_inference_engine_samples_hello_query_device_README} -This topic demonstrates how to run the Hello Query Device sample application, which queries Inference Engine devices and prints their metrics and default configuration values. The sample shows how to use [Query Device API feature](./docs/IE_DG/InferenceEngine_QueryAPI.md). +This topic demonstrates how to run the Hello Query Device sample application, which queries Inference Engine devices and prints their metrics and default configuration values. The sample shows how to use [Query Device API feature](../../../docs/IE_DG/InferenceEngine_QueryAPI.md). > **NOTE:** This topic describes usage of C++ implementation of the Query Device Sample. -> For the Python* implementation, refer to [Hello Query Device Python* Sample](./inference-engine/ie_bridges/python/sample/hello_query_device/README.md) +> For the Python* implementation, refer to [Hello Query Device Python* Sample](../../ie_bridges/python/sample/hello_query_device/README.md) ## Running To see quired information, run the following: @@ -51,6 +51,6 @@ Available devices: ``` ## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) -* [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) -* [Model Optimizer](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) +* [Using Inference Engine Samples](../../../docs/IE_DG/Samples_Overview.md) +* [Model Downloader](@ref omz_tools_downloader_README) +* [Model Optimizer](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) diff --git a/inference-engine/samples/hello_reshape_ssd/README.md b/inference-engine/samples/hello_reshape_ssd/README.md index c33c97db67a2ae..c9291a866ab5e9 100644 --- a/inference-engine/samples/hello_reshape_ssd/README.md +++ b/inference-engine/samples/hello_reshape_ssd/README.md @@ -1,15 +1,15 @@ -# Hello Reshape SSD C++ Sample +# Hello Reshape SSD C++ Sample {#openvino_inference_engine_samples_hello_reshape_ssd_README} This topic demonstrates how to run the Hello Reshape SSD application, which does inference using object detection -networks like SSD-VGG. The sample shows how to use [Shape Inference feature](./docs/IE_DG/ShapeInference.md). +networks like SSD-VGG. The sample shows how to use [Shape Inference feature](../../../docs/IE_DG/ShapeInference.md). -> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. 
If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). ## Running -To run the sample, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). +To run the sample, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). -> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). You can use the following command to do inference on CPU of an image using a trained SSD network: ```sh @@ -23,6 +23,6 @@ of the detected objects along with the respective confidence values and the coor rectangles to the standard output stream. ## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) -* [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) -* [Model Optimizer](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) +* [Using Inference Engine Samples](../../../docs/IE_DG/Samples_Overview.md) +* [Model Downloader](@ref omz_tools_downloader_README) +* [Model Optimizer](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) diff --git a/inference-engine/samples/ngraph_function_creation_sample/README.md b/inference-engine/samples/ngraph_function_creation_sample/README.md index ae98173455a740..6833441cf09d8b 100644 --- a/inference-engine/samples/ngraph_function_creation_sample/README.md +++ b/inference-engine/samples/ngraph_function_creation_sample/README.md @@ -1,4 +1,4 @@ -# nGraph Function C++ Sample +# nGraph Function C++ Sample {#openvino_inference_engine_samples_ngraph_function_creation_sample_README} This sample demonstrates how to execute an inference using ngraph::Function to create a network. The sample uses the LeNet classifications network as an example. @@ -13,12 +13,12 @@ When the inference is done, the application outputs inference results to the sta > **NOTE**: This sample supports models with FP32 weights only. -The `lenet.bin` weights file was generated by the [Model Optimizer](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) +The `lenet.bin` weights file was generated by the [Model Optimizer](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) tool from the public LeNet model with the `--input_shape [64,1,28,28]` parameter specified. The original model is available in the [Caffe* repository](https://github.com/BVLC/caffe/tree/master/examples/mnist) on GitHub\*. 
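The Shape Inference feature that the Hello Reshape SSD sample above relies on is also exposed in the Python API. As a rough sketch (the IR name and target resolution are placeholders, and `net.reshape` assumes the 2020-era API), reshaping a network before loading it looks like this:

```python
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="ssd-vgg.xml", weights="ssd-vgg.bin")  # placeholder IR
input_name = next(iter(net.inputs))
print("original input shape:", net.inputs[input_name].shape)

# Shape Inference: recompute every intermediate shape for a new input resolution
# before the network is loaded to a device.
net.reshape({input_name: [1, 3, 544, 544]})
print("reshaped input shape:", net.inputs[input_name].shape)

exec_net = ie.load_network(network=net, device_name="CPU")
```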
-> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). ## Running @@ -53,6 +53,23 @@ For example, to do inference of an UByte image on a GPU run the following comman By default, the application outputs top-10 inference results for each inference request. +## Deprecation Notice + + + + + + + + + + +
Deprecation Begins: June 1, 2020
Removal Date: December 1, 2020
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* + ## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) +* [Using Inference Engine Samples](../../../docs/IE_DG/Samples_Overview.md) diff --git a/inference-engine/samples/object_detection_sample_ssd/README.md b/inference-engine/samples/object_detection_sample_ssd/README.md index 1b3ac679200329..a6acae7d3fcfb7 100644 --- a/inference-engine/samples/object_detection_sample_ssd/README.md +++ b/inference-engine/samples/object_detection_sample_ssd/README.md @@ -1,9 +1,9 @@ -# Object Detection C++ Sample SSD +# Object Detection C++ Sample SSD {#openvino_inference_engine_samples_object_detection_sample_ssd_README} This topic demonstrates how to run the Object Detection sample application, which does inference using object detection networks like SSD-VGG on Intel® Processors and Intel® HD Graphics. -> **NOTE:** This topic describes usage of C++ implementation of the Object Detection Sample SSD. For the Python* implementation, refer to [Object Detection Python* Sample SSD](./inference-engine/ie_bridges/python/sample/object_detection_sample_ssd/README.md). +> **NOTE:** This topic describes usage of C++ implementation of the Object Detection Sample SSD. For the Python* implementation, refer to [Object Detection Python* Sample SSD](../../ie_bridges/python/sample/object_detection_sample_ssd/README.md). ## How It Works @@ -11,7 +11,7 @@ Upon the start-up the sample application reads command line parameters and loads Engine device. When inference is done, the application creates an output image and outputs data to the standard output stream. -> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). ## Running @@ -36,9 +36,9 @@ Options: Running the application with the empty list of options yields the usage message given above and an error message. 
-To run the sample, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). +To run the sample, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). -> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). For example, to do inference on a CPU with the OpenVINO™ toolkit person detection SSD models, run one of the following commands: @@ -58,6 +58,6 @@ rectangles to the standard output stream. ## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) -* [Model Optimizer](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) -* [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) +* [Using Inference Engine Samples](../../../docs/IE_DG/Samples_Overview.md) +* [Model Optimizer](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) +* [Model Downloader](@ref omz_tools_downloader_README) diff --git a/inference-engine/samples/speech_libs_and_demos/Kaldi_SLM_conversion_tool.md b/inference-engine/samples/speech_libs_and_demos/Kaldi_SLM_conversion_tool.md index 6a2f5cf9fe0c00..a429f70577fbbe 100644 --- a/inference-engine/samples/speech_libs_and_demos/Kaldi_SLM_conversion_tool.md +++ b/inference-engine/samples/speech_libs_and_demos/Kaldi_SLM_conversion_tool.md @@ -1,4 +1,4 @@ -# Kaldi* Statistical Language Model Conversion Tool +# Kaldi* Statistical Language Model Conversion Tool {#openvino_inference_engine_samples_speech_libs_and_demos_Kaldi_SLM_conversion_tool} The Kaldi* Statistical Language Model (SLM) Conversion Tool is a command-line tool that converts [Kaldi](https://kaldi-asr.org/) language model resources to the format supported by the OpenVINO™ Speech Recognition Demos. diff --git a/inference-engine/samples/speech_libs_and_demos/Live_speech_recognition_demo.md b/inference-engine/samples/speech_libs_and_demos/Live_speech_recognition_demo.md index 8b47b2caff75e4..778f8574a9636f 100644 --- a/inference-engine/samples/speech_libs_and_demos/Live_speech_recognition_demo.md +++ b/inference-engine/samples/speech_libs_and_demos/Live_speech_recognition_demo.md @@ -1,4 +1,4 @@ -# Live Speech Recognition Demo +# Live Speech Recognition Demo {#openvino_inference_engine_samples_speech_libs_and_demos_Live_speech_recognition_demo} This demo provides a GUI interface for automatic speech recognition using selected OpenVINO™ Inference Engine plugin, OpenVINO™ Feature Extraction Library, and OpenVINO™ Decoder Library. @@ -8,13 +8,13 @@ The application transcribes audio from a WAV file and/or audio device. 
It suppor The software stack used by the demo is as follows: -![](./inference-engine/samples/speech_libs_and_demos/img/sw_components.png) +![](img/sw_components.png) ## Running The application main window looks like this: -![](./inference-engine/samples/speech_libs_and_demos/img/live_speech_recognition_demo_annotated.jpg) +![](img/live_speech_recognition_demo_annotated.jpg) Refer to the sections below for instructions for particular scenarios. diff --git a/inference-engine/samples/speech_libs_and_demos/Offline_speech_recognition_demo.md b/inference-engine/samples/speech_libs_and_demos/Offline_speech_recognition_demo.md index 2e9f656927cf20..71e1d693e1fa6c 100644 --- a/inference-engine/samples/speech_libs_and_demos/Offline_speech_recognition_demo.md +++ b/inference-engine/samples/speech_libs_and_demos/Offline_speech_recognition_demo.md @@ -1,4 +1,4 @@ -# Offline Speech Recognition Demo +# Offline Speech Recognition Demo {#openvino_inference_engine_samples_speech_libs_and_demos_Offline_speech_recognition_demo} This demo provides a command-line interface for automatic speech recognition using OpenVINO™. Components used by this executable: diff --git a/inference-engine/samples/speech_libs_and_demos/Speech_library.md b/inference-engine/samples/speech_libs_and_demos/Speech_library.md index d0507a43c6a38e..b407447156787a 100644 --- a/inference-engine/samples/speech_libs_and_demos/Speech_library.md +++ b/inference-engine/samples/speech_libs_and_demos/Speech_library.md @@ -1,4 +1,4 @@ -# Speech Library +# Speech Library {#openvino_inference_engine_samples_speech_libs_and_demos_Speech_library} ## Overview @@ -24,7 +24,7 @@ The pipeline consists of the following stages: 2. Neural acoustic scoring: the OpenVINO ™ Inference Engine transcribes the extracted features into a sequence of phonemes using a neural acoustic model 3. Language model decoding: the Intel® Speech Decoder turns the phonemes into text hypothesis. The decoding graph takes into account the grammar of the data, as well as the distribution and probabilities of contiguous specific words (n-grams) -![](./inference-engine/samples/speech_libs_and_demos/img/asr_pipeline.png) +![](img/asr_pipeline.png) ## Speech Library API @@ -35,11 +35,11 @@ The Speech Library API consists of simple routines: * Inform about new stable recognition result The flow is described below: -![](./inference-engine/samples/speech_libs_and_demos/img/speech_library_api.png) +![](img/speech_library_api.png) See `/data_processing/audio/speech_recognition/include/speech_library.h` for details about the API. -A great example on how to use the API is the source code of [offline speech recognition demo](./inference-engine/samples/speech_libs_and_demos/Offline_speech_recognition_demo.md). +A great example on how to use the API is the source code of [offline speech recognition demo](Offline_speech_recognition_demo.md). 
## Run Your Application diff --git a/inference-engine/samples/speech_libs_and_demos/Speech_libs_and_demos.md b/inference-engine/samples/speech_libs_and_demos/Speech_libs_and_demos.md index 7d988868182f43..f83cf089feb811 100644 --- a/inference-engine/samples/speech_libs_and_demos/Speech_libs_and_demos.md +++ b/inference-engine/samples/speech_libs_and_demos/Speech_libs_and_demos.md @@ -1,4 +1,4 @@ -# Speech Library and Speech Recognition Demos +# Speech Library and Speech Recognition Demos {#openvino_inference_engine_samples_speech_libs_and_demos_Speech_libs_and_demos} Starting with the 2020.1 release, OpenVINO™ provides a set of libraries and demos to demonstrate end-to-end speech recognition, as well as new acoustic and language models that can work with these demos. @@ -7,11 +7,11 @@ as postprocessing (decoding) to produce text from scores. Together with OpenVINO these libraries provide an end-to-end pipeline converting speech to text. This pipeline is demonstrated by the end-to-end demos: -![](./inference-engine/samples/speech_libs_and_demos/img/new_speech_demos.png) +![](img/new_speech_demos.png) -Note that the OpenVINO™ package also includes an [automatic speech recognition sample](./inference-engine/samples/speech_sample/README.md) demonstrating acoustic model inference based on Kaldi\* neural networks. The sample works with Kaldi ARK files only, so it does not cover an end-to-end speech recognition scenario (speech to text),requiring additional preprocessing (feature extraction) to get a feature vector from a speech signal, as well as postprocessing (decoding) to produce text from scores: +Note that the OpenVINO™ package also includes an [automatic speech recognition sample](../speech_sample/README.md) demonstrating acoustic model inference based on Kaldi\* neural networks. The sample works with Kaldi ARK files only, so it does not cover an end-to-end speech recognition scenario (speech to text),requiring additional preprocessing (feature extraction) to get a feature vector from a speech signal, as well as postprocessing (decoding) to produce text from scores: -![](./inference-engine/samples/speech_libs_and_demos/img/speech_sample.png) +![](img/speech_sample.png) The main purpose of the sample is to demonstrate a variety of features and options provided by OpenVINO™ for speech recognition neural networks. 
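To make the ARK-only workflow described above concrete, a typical invocation of the sample might look roughly like the sketch below. The model, feature-file names, and target device are illustrative assumptions rather than values taken from this document.

```sh
# Sketch only: score a Kaldi ARK feature file with a converted acoustic model (IR).
# wsj_dnn5b.xml and dev93_10.ark are placeholder file names;
# check `speech_sample -h` for the authoritative option list.
./speech_sample -d CPU \
                -m wsj_dnn5b.xml \
                -i dev93_10.ark \
                -o scores.ark
```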
@@ -24,20 +24,18 @@ Find new libraries, demos, and models at `/data_processing/audio/sp The package contains the following components: -* [Speech Library](./inference-engine/samples/speech_libs_and_demos/Speech_library.md), which includes a feature extractor and decoder +* [Speech Library](Speech_library.md), which includes a feature extractor and decoder -* [Offline Speech Recognition Demo](./inference-engine/samples/speech_libs_and_demos/Offline_speech_recognition_demo.md), which can process wave files with recorded speech +* [Offline Speech Recognition Demo](Offline_speech_recognition_demo.md), which can process wave files with recorded speech -* [Live Speech Recognition Demo](./inference-engine/samples/speech_libs_and_demos/Live_speech_recognition_demo.md), which showcases transcription from a microphone or speakers +* [Live Speech Recognition Demo](Live_speech_recognition_demo.md), which showcases transcription from a microphone or speakers -* [Kaldi Statistical Language Model Conversion Tool](./inference-engine/samples/speech_libs_and_demos/Kaldi_SLM_conversion_tool.md), which converts custom language models to use in the decoder +* [Kaldi Statistical Language Model Conversion Tool](Kaldi_SLM_conversion_tool.md), which converts custom language models to use in the decoder Additionally, [new acoustic and language models](http://download.01.org/opencv/2020/openvinotoolkit/2020.1/models_contrib/speech/kaldi/librispeech_s5/) to be used by new demos are located at [download.01.org](https://01.org/). ## Run Speech Recognition Demos with Pretrained Models -> **NOTE**: This section describes the script included to the OpenVINO™ 2020.1 release. For technical reasons, this script is not provided with the 2020.2 release, but it will be back in the next release. All demos and tools listed in the previous section are included to the 2020.2 release, so you can use the links above for the instructions on how to use them. - To download pretrained models and build all dependencies: * On Linux* OS or macOS*, use the shell script `/deployment_tools/demo/demo_speech_recognition.sh` @@ -82,8 +80,8 @@ Before running demonstration applications with custom models, follow the steps b 1. Build the Speech Library and demonstration application using the `demo_speech_recognition.sh/.bat` file mentioned in Run Speech Recognition Demos with Pretrained Models 2. Train acoustic and statistical language models using the Kaldi framework (if required) -3. [Convert the acoustic model](./docs/MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) using Model Optimizer for Kaldi -4. [Convert the language model](./inference-engine/samples/speech_libs_and_demos/Kaldi_SLM_conversion_tool.md) using the Kaldi toolkit and provided converter +3. [Convert the acoustic model](../../../docs/MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) using Model Optimizer for Kaldi +4. [Convert the language model](Kaldi_SLM_conversion_tool.md) using the Kaldi toolkit and provided converter 5. Create a configuration file that lists all the models required for recognition 6. Copy configuration file to `{OpenVINO build folder}/data_processing/audio/speech_recognition/models/{LANG}`. The demo models are trained for US English, so use `en-us` for the `{LANG}` folder name. 
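A minimal sketch of steps 3 and 6 above is shown below. The file names (`final.nnet`, `counts`, `speech_lib.cfg`) and the output directory are placeholders, and the exact Model Optimizer options should be verified against the Kaldi conversion guide linked in step 3.

```sh
# Step 3 (sketch): convert a Kaldi acoustic model to IR with the Model Optimizer.
# final.nnet and counts are placeholder file names.
python3 mo.py --framework kaldi \
              --input_model final.nnet \
              --counts counts \
              --remove_output_softmax \
              --output_dir ir/

# Step 6 (sketch): copy the configuration file next to the demo models.
cp speech_lib.cfg "{OpenVINO build folder}/data_processing/audio/speech_recognition/models/en-us/"
```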
@@ -98,7 +96,7 @@ In order to convert acoustic models, the following Kaldi files are required: - Counts file, `pdf.counts` (if used) - Feature transformation file, `final.feature_transform` (if used) -For conversion steps, follow [Converting a Kaldi* Model](./docs/MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md). +For conversion steps, follow [Converting a Kaldi* Model](../../../docs/MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md). > **NOTE**: Set the path to the XML file with the converted model in the configuration file. @@ -130,6 +128,6 @@ Model conversion from Kaldi requires the following steps: > **NOTE**: Put the paths to `cl.fst` and `labels.bin` files in the configuration file to use them with the Live Speech Recognition Demo Application. -See the [offline speech recognition demo documentation](./inference-engine/samples/speech_libs_and_demos/Offline_speech_recognition_demo.md) to learn about the configuration file format. +See the [offline speech recognition demo documentation](Offline_speech_recognition_demo.md) to learn about the configuration file format. -See [Kaldi* Statistical Language Model Conversion Tool](./inference-engine/samples/speech_libs_and_demos/Kaldi_SLM_conversion_tool.md) for more information on the conversion tool. +See [Kaldi* Statistical Language Model Conversion Tool](Kaldi_SLM_conversion_tool.md) for more information on the conversion tool. diff --git a/inference-engine/samples/speech_libs_and_demos/img/live_speech_recognition_demo_annotated.jpg b/inference-engine/samples/speech_libs_and_demos/img/live_speech_recognition_demo_annotated.jpg index 6759b6ad4c4e5b..d17c55374399a5 100644 Binary files a/inference-engine/samples/speech_libs_and_demos/img/live_speech_recognition_demo_annotated.jpg and b/inference-engine/samples/speech_libs_and_demos/img/live_speech_recognition_demo_annotated.jpg differ diff --git a/inference-engine/samples/speech_sample/README.md b/inference-engine/samples/speech_sample/README.md index 0785ee7c3f8dcc..cec758ff8c63f6 100644 --- a/inference-engine/samples/speech_sample/README.md +++ b/inference-engine/samples/speech_sample/README.md @@ -1,4 +1,4 @@ -# Automatic Speech Recognition C++ Sample +# Automatic Speech Recognition C++ Sample {#openvino_inference_engine_samples_speech_sample_README} This topic shows how to run the speech sample application, which demonstrates acoustic model inference based on Kaldi\* neural networks @@ -136,7 +136,7 @@ The following pre-trained models are available: * rm\_lstm4f * rm\_cnn4a\_smbr -All of them can be downloaded from [https://download.01.org/openvinotoolkit/models_contrib/speech/kaldi](https://download.01.org/openvinotoolkit/models_contrib/speech/kaldi) or using the OpenVINO [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) . +All of them can be downloaded from [https://download.01.org/openvinotoolkit/models_contrib/speech/kaldi](https://download.01.org/openvinotoolkit/models_contrib/speech/kaldi) or using the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) . ### Speech Inference @@ -154,7 +154,7 @@ scores (`wsj_dnn5b_smbr_dev93_scores_10.ark`) corresponding to the input feature file (`wsj_dnn5b_smbr_dev93_10.ark`) are assumed to be available for comparison. -> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). 
+> **NOTE**: Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). ## Sample Output @@ -202,6 +202,6 @@ cat out.txt | utils/int2sym.pl -f 2- words.txt | sed s:\::g | compute-wer ``` ## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) -* [Model Optimizer](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) -* [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) +* [Using Inference Engine Samples](../../../docs/IE_DG/Samples_Overview.md) +* [Model Optimizer](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) +* [Model Downloader](@ref omz_tools_downloader_README) diff --git a/inference-engine/samples/style_transfer_sample/README.md b/inference-engine/samples/style_transfer_sample/README.md index 402769f14d20f2..73f13a06890333 100644 --- a/inference-engine/samples/style_transfer_sample/README.md +++ b/inference-engine/samples/style_transfer_sample/README.md @@ -1,11 +1,11 @@ -# Neural Style Transfer C++ Sample +# Neural Style Transfer C++ Sample {#openvino_inference_engine_samples_style_transfer_sample_README} This topic demonstrates how to run the Neural Style Transfer sample application, which performs inference of style transfer models. -> **NOTE**: The OpenVINO™ toolkit does not include a pre-trained model to run the Neural Style Transfer sample. A public model from the [Zhaw's Neural Style Transfer repository](https://github.com/zhaw/neural_style) can be used. Read the [Converting a Style Transfer Model from MXNet*](./docs/MO_DG/prepare_model/convert_model/mxnet_specific/Convert_Style_Transfer_From_MXNet.md) topic from the [Model Optimizer Developer Guide](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) to learn about how to get the trained model and how to convert it to the Inference Engine format (\*.xml + \*.bin). +> **NOTE**: The OpenVINO™ toolkit does not include a pre-trained model to run the Neural Style Transfer sample. A public model from the [Zhaw's Neural Style Transfer repository](https://github.com/zhaw/neural_style) can be used. Read the [Converting a Style Transfer Model from MXNet*](../../../docs/MO_DG/prepare_model/convert_model/mxnet_specific/Convert_Style_Transfer_From_MXNet.md) topic from the [Model Optimizer Developer Guide](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) to learn about how to get the trained model and how to convert it to the Inference Engine format (\*.xml + \*.bin). -> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. 
For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). ## Running @@ -41,6 +41,6 @@ To perform inference of an image using a trained model of NST network on Intel® The application outputs an image (`out1.bmp`) or a sequence of images (`out1.bmp`, ..., `out.bmp`) which are redrawn in style of the style transfer model used for sample. ## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) -* [Model Optimizer](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) -* [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) +* [Using Inference Engine Samples](../../../docs/IE_DG/Samples_Overview.md) +* [Model Optimizer](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) +* [Model Downloader](@ref omz_tools_downloader_README) diff --git a/inference-engine/tools/benchmark_tool/README.md b/inference-engine/tools/benchmark_tool/README.md index 658f4df6dca960..42222c2abb42df 100644 --- a/inference-engine/tools/benchmark_tool/README.md +++ b/inference-engine/tools/benchmark_tool/README.md @@ -1,14 +1,14 @@ -# Benchmark Python* Tool +# Benchmark Python* Tool {#openvino_inference_engine_tools_benchmark_tool_README} This topic demonstrates how to run the Benchmark Python* Tool, which performs inference using convolutional networks. Performance can be measured for two inference modes: synchronous (latency-oriented) and asynchronous (throughput-oriented). -> **NOTE:** This topic describes usage of Python implementation of the Benchmark Tool. For the C++ implementation, refer to [Benchmark C++ Tool](./inference-engine/samples/benchmark_app/README.md). +> **NOTE:** This topic describes usage of Python implementation of the Benchmark Tool. For the C++ implementation, refer to [Benchmark C++ Tool](../../samples/benchmark_app/README.md). ## How It Works Upon start-up, the application reads command-line parameters and loads a network and images/binary files to the Inference Engine plugin, which is chosen depending on a specified device. The number of infer requests and execution approach depend on the mode defined with the `-api` command-line parameter. -> **NOTE**: By default, Inference Engine samples, tools and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). +> **NOTE**: By default, Inference Engine samples, tools and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md). 
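As a hedged illustration of the `--reverse_input_channels` option mentioned in the note above, reconversion could look like the following; the model file name and output directory are placeholders, and any other conversion options your model needs are omitted.

```sh
# Sketch only: regenerate the IR so that it expects RGB input instead of BGR.
# model.pb and ir/ are placeholder names.
python3 mo.py --input_model model.pb --reverse_input_channels --output_dir ir/
```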
### Synchronous API @@ -47,7 +47,7 @@ Notice that the benchmark_app usually produces optimal performance for any devic python3 benchmark_app.py -m <model> -i <input> -d CPU ``` -But it may still be non-optimal in some cases, especially for very small networks. More details can be found in [Introduction to Performance Topics](./docs/IE_DG/Intro_to_Performance.md). +But it may still be non-optimal in some cases, especially for very small networks. More details can be found in [Introduction to Performance Topics](../../../docs/IE_DG/Intro_to_Performance.md). Running the application with the `-h` or `--help` option yields the following usage message: @@ -129,9 +129,9 @@ If a model has only image input(s), please a provide folder with images or a pat If a model has some specific input(s) (not images), please prepare a binary file(s), which is filled with data of appropriate precision and provide a path to them as input. If a model has mixed input types, the input folder should contain all required files. Image inputs are filled with image files one by one. Binary inputs are filled with binary files one by one. -To run the tool, you can use public or Intel's pre-trained models. To download the models, use the OpenVINO [Model Downloader](./tools/downloader/README.md) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). +To run the tool, you can use public or Intel's pre-trained models. To download the models, use the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/). -> **NOTE**: Before running the tool with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> **NOTE**: Before running the tool with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).
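For instance, a throughput-oriented run against a converted IR could look roughly like this; the model and input paths are placeholders, and the tool's `--help` output remains the authoritative reference for the available options.

```sh
# Sketch only: asynchronous (throughput-oriented) benchmark run on CPU.
# model.xml and images/ are placeholder inputs.
python3 benchmark_app.py -m model.xml -i images/ -d CPU -api async -niter 100
```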
## Examples of Running the Tool @@ -195,6 +195,6 @@ Below are fragments of sample output for CPU and FPGA devices: ``` ## See Also -* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md) -* [Model Optimizer](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) -* [Model Downloader](./tools/downloader/README.md) +* [Using Inference Engine Samples](../../../docs/IE_DG/Samples_Overview.md) +* [Model Optimizer](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) +* [Model Downloader](@ref omz_tools_downloader_README) diff --git a/inference-engine/tools/compile_tool/README.md b/inference-engine/tools/compile_tool/README.md index e9a1eadd5be02f..fd4f0b943acb75 100644 --- a/inference-engine/tools/compile_tool/README.md +++ b/inference-engine/tools/compile_tool/README.md @@ -1,4 +1,4 @@ -# Compile Tool +# Compile Tool {#openvino_inference_engine_tools_compile_tool_README} The Compile tool is a C++ application that enables you to dump a loaded diff --git a/inference-engine/tools/cross_check_tool/README.md b/inference-engine/tools/cross_check_tool/README.md index 6ec1a873e0d238..a2f37929e4b1cd 100644 --- a/inference-engine/tools/cross_check_tool/README.md +++ b/inference-engine/tools/cross_check_tool/README.md @@ -1,4 +1,4 @@ -# Cross Check Tool +# Cross Check Tool {#openvino_inference_engine_tools_cross_check_tool_README} Cross Check Tool is a console application that enables comparing accuracy and performance metrics for two successive model inferences that are performed on two different supported Intel® devices or with different precisions. diff --git a/inference-engine/tools/vpu/vpu_compile/README.md b/inference-engine/tools/vpu/vpu_compile/README.md index 625f4346c06bcb..285b59bf5e9557 100644 --- a/inference-engine/tools/vpu/vpu_compile/README.md +++ b/inference-engine/tools/vpu/vpu_compile/README.md @@ -1,4 +1,4 @@ -# myriad_compile tool +# myriad_compile tool {#openvino_inference_engine_tools_vpu_vpu_compile_README} This topic demonstrates how to run the `myriad_compile` tool application, which intended to dump blob for `vpu` plugins of Inference Engine by configuration options. diff --git a/inference-engine/tools/vpu/vpu_profile/README.md b/inference-engine/tools/vpu/vpu_profile/README.md index f5e57fb40850e9..c21f267e09646f 100644 --- a/inference-engine/tools/vpu/vpu_profile/README.md +++ b/inference-engine/tools/vpu/vpu_profile/README.md @@ -1,4 +1,4 @@ -# vpu_profile tool +# vpu_profile tool {#openvino_inference_engine_tools_vpu_vpu_profile_README} This topic demonstrates how to run the `vpu_profile` tool application, which intended to get per layer or per stage performance statistics for vpu plugins of Inference Engine by configuration options. diff --git a/ngraph/changes.md b/ngraph/changes.md index 57212d90b9047d..32b851de73137e 100644 --- a/ngraph/changes.md +++ b/ngraph/changes.md @@ -1,5 +1,22 @@ # API Changes +## Deprecation Notice + + + + + + + + + + +
+<table>
+  <tr><td>Deprecation Begins</td><td>June 1, 2020</td></tr>
+  <tr><td>Removal Date</td><td>December 1, 2020</td></tr>
+</table>
+ +*Starting with the OpenVINO™ toolkit 2020.2 release, all of the features previously available through nGraph have been merged into the OpenVINO™ toolkit. As a result, all the features previously available through ONNX RT Execution Provider for nGraph have been merged with ONNX RT Execution Provider for OpenVINO™ toolkit.* + +*Therefore, ONNX RT Execution Provider for nGraph will be deprecated starting June 1, 2020 and will be completely removed on December 1, 2020. Users are recommended to migrate to the ONNX RT Execution Provider for OpenVINO™ toolkit as the unified solution for all AI inferencing on Intel® hardware.* + ## Op Definition * Every Op class must declare a `static constexpr NodeTypeInfo type_info{name, version}` in the class definition and define it in the .cpp file. See any op definition for an example. * The boolean function `is_type<T>` is for testing if a node is the op `T`.