Docs: model caching page update according to OpenVINO API 2.0 (#10981)

openvinotoolkit · Mar 16, 2022 · 7cea7dd
1 parent 2687f6f
Showing 9 changed files with 172 additions and 146 deletions.
diff --git a/docs/OV_Runtime_UG/Model_caching_overview.md b/docs/OV_Runtime_UG/Model_caching_overview.md
@@ -1,135 +1,120 @@
 # Model Caching Overview {#openvino_docs_IE_DG_Model_caching_overview}
 
-## Introduction (C++)
+## Introduction
 
-@sphinxdirective
-.. raw:: html
-
-    <div id="switcher-cpp" class="switcher-anchor">C++</div>
-@endsphinxdirective
-
-As described in the [OpenVINO™ Runtime User Guide](openvino_intro.md), a common application flow consists of the following steps:
+As described in [Integrate OpenVINO™ with Your Application](integrate_with_your_application.md), a common application flow consists of the following steps:
 
-1. **Create a Core object**: First step to manage available devices and read network objects
+1. **Create a Core object**: First step to manage available devices and read model objects
 
-2. **Read the Intermediate Representation**: Read an Intermediate Representation file into an object of the `InferenceEngine::CNNNetwork`
+2. **Read the Intermediate Representation**: Read an Intermediate Representation file into an `ov::Model` object
 
 3. **Prepare inputs and outputs**: If needed, manipulate precision, memory layout, size or color format
 
 4. **Set configuration**: Pass device-specific loading configurations to the device
 
-5. **Compile and Load Network to device**: Use the `InferenceEngine::Core::LoadNetwork()` method with a specific device
+5. **Compile and load the model to the device**: Use the `ov::Core::compile_model()` method with a specific device
 
-6. **Set input data**: Specify input blob
+6. **Set input data**: Specify input tensor
 
 7. **Execute**: Carry out inference and process results
 
 Step 5 can potentially perform several time-consuming device-specific optimizations and network compilations,
 and such delays can lead to a bad user experience on application startup. To avoid this, some devices offer
 import/export network capability, and it is possible to either use the [Compile tool](../../tools/compile_tool/README.md)
-or enable model caching to export compiled network automatically. Reusing cached networks can significantly reduce load network time.
+or enable model caching to export the compiled model automatically. Reusing a cached model can significantly reduce model compilation time.
 
-### Set "CACHE_DIR" config option to enable model caching
+### Set "cache_dir" config option to enable model caching
 
 To enable model caching, the application must specify a folder to store cached blobs, which is done like this:
 
-@snippet snippets/InferenceEngine_Caching0.cpp part0
-
-With this code, if the device specified by `LoadNetwork` supports import/export network capability, a cached blob is automatically created inside the `myCacheFolder` folder.
-CACHE_DIR config is set to the Core object. If the device does not support import/export capability, cache is not created and no error is thrown.
-
-Depending on your device, total time for loading network on application startup can be significantly reduced.
-Also note that the very first LoadNetwork (when cache is not yet created) takes slightly longer time to "export" the compiled blob into a cache file:
-
-![caching_enabled]
+@sphinxdirective
 
-### Even faster: use LoadNetwork(modelPath)
+.. tab:: C++
 
-In some cases, applications do not need to customize inputs and outputs every time. Such an application always
-call `cnnNet = ie.ReadNetwork(...)`, then `ie.LoadNetwork(cnnNet, ..)` and it can be further optimized.
-For these cases, the 2021.4 release introduces a more convenient API to load the network in a single call, skipping the export step:
+      .. doxygensnippet:: docs/snippets/ov_caching.cpp
+         :language: cpp
+         :fragment: [ov:caching:part0]
 
-@snippet snippets/InferenceEngine_Caching1.cpp part1
+.. tab:: Python
 
-With model caching enabled, total load time is even smaller, if ReadNetwork is optimized as well.
+      .. doxygensnippet:: docs/snippets/ov_caching.py
+         :language: python
+         :fragment: [ov:caching:part0]
 
-@snippet snippets/InferenceEngine_Caching2.cpp part2
+@endsphinxdirective
 
-![caching_times]
+With this code, if the device specified by `device_name` supports the model import/export capability, a cached blob is automatically created inside the `/path/to/cache/dir` folder.
+If the device does not support the import/export capability, the cache is not created and no error is thrown.
 
-### Advanced Examples
+Depending on your device, the total time for compiling the model on application startup can be significantly reduced.
+Also note that the very first `compile_model` call (when the cache is not yet created) takes slightly longer, because the compiled blob must be "exported" into a cache file:
 
-Not every device supports network import/export capability. For those that don't, enabling caching has no effect.
-To check in advance if a particular device supports model caching, your application can use the following code:
+![caching_enabled]
 
-@snippet snippets/InferenceEngine_Caching3.cpp part3
+### Even faster: use compile_model(modelPath)
 
-## Introduction (Python)
+In some cases, applications do not need to customize inputs and outputs every time. Such applications always
+call `model = core.read_model(...)`, then `core.compile_model(model, ..)`, and this sequence can be further optimized.
+For these cases, there is a more convenient API to compile the model in a single call, skipping the read step:
 
 @sphinxdirective
-.. raw:: html
 
-    <div id="switcher-python" class="switcher-anchor">Python</div>
-@endsphinxdirective
+.. tab:: C++
 
-As described in OpenVINO User Guide, a common application flow consists of the following steps:
+      .. doxygensnippet:: docs/snippets/ov_caching.cpp
+         :language: cpp
+         :fragment: [ov:caching:part1]
 
-1. **Create a Core Object**
-2. **Read the Intermediate Representation** - Read an Intermediate Representation file into an object of the [ie_api.IENetwork](api/ie_python_api/_autosummary/openvino.inference_engine.IENetwork.html)
-3. **Prepare inputs and outputs**
-4. **Set configuration** - Pass device-specific loading configurations to the device
-5. **Compile and Load Network to device** - Use the `IECore.load_network()` method and specify the target device
-6. **Set input data**
-7. **Execute the model** - Run inference
+.. tab:: Python
 
-Step #5 can potentially perform several time-consuming device-specific optimizations and network compilations, and such delays can lead to bad user experience on application startup. To avoid this, some devices offer Import/Export network capability, and it is possible to either use the [Compile tool](../../tools/compile_tool/README.md) or enable model caching to export the compiled network automatically. Reusing cached networks can significantly reduce load network time.
+      .. doxygensnippet:: docs/snippets/ov_caching.py
+         :language: python
+         :fragment: [ov:caching:part1]
 
-### Set the “CACHE_DIR” config option to enable model caching
+@endsphinxdirective
 
-To enable model caching, the application must specify the folder where to store cached blobs. It can be done using [IECore.set_config](api/ie_python_api/_autosummary/openvino.inference_engine.IECore.html#openvino.inference_engine.IECore.set_config).
+With model caching enabled, the total load time is even smaller if `read_model` is optimized as well.
 
-``` python
-from openvino.inference_engine import IECore
+@sphinxdirective
 
-ie = IECore()
-ie.set_config(config={"CACHE_DIR": path_to_cache}, device_name=device)
-net = ie.read_network(model=path_to_xml_file)
-exec_net = ie.load_network(network=net, device_name=device)
-```
+.. tab:: C++
 
-With this code, if a device supports the Import/Export network capability, a cached blob is automatically created inside the path_to_cache directory `CACHE_DIR` config is set to the Core object. If device does not support Import/Export capability, cache is just not created and no error is thrown
+      .. doxygensnippet:: docs/snippets/ov_caching.cpp
+         :language: cpp
+         :fragment: [ov:caching:part2]
 
-Depending on your device, total time for loading network on application startup can be significantly reduced. Please also note that very first [IECore.load_network](api/ie_python_api/_autosummary/openvino.inference_engine.IECore.html#openvino.inference_engine.IECore.load_network) (when the cache is not yet created) takes slightly longer time to ‘export’ the compiled blob into a cache file.
+.. tab:: Python
 
-![caching_enabled]
+      .. doxygensnippet:: docs/snippets/ov_caching.py
+         :language: python
+         :fragment: [ov:caching:part2]
 
+@endsphinxdirective
+
+![caching_times]
 
-### Even Faster: Use IECore.load_network(path_to_xml_file)
+### Advanced Examples
 
-In some cases, applications do not need to customize inputs and outputs every time. These applications always call [IECore.read_network](api/ie_python_api/_autosummary/openvino.inference_engine.IECore.html#openvino.inference_engine.IECore.read_network), then `IECore.load_network(model=path_to_xml_file)` and may be further optimized. For such cases, it's more convenient to load the network in a single call to `ie.load_network()`
-A model can be loaded directly to the device, with model caching enabled:
+Not every device supports the model import/export capability. For those that don't, enabling caching has no effect.
+To check in advance if a particular device supports model caching, your application can use the following code:
 
-``` python
-from openvino.inference_engine import IECore
+@sphinxdirective
 
-ie = IECore()
-ie.set_config(config={"CACHE_DIR" : path_to_cache}, device_name=device)
-ie.load_network(network=path_to_xml_file, device_name=device)
-```
+.. tab:: C++
 
-![caching_times]
+      .. doxygensnippet:: docs/snippets/ov_caching.cpp
+         :language: cpp
+         :fragment: [ov:caching:part3]
 
-### Advanced Examples
+.. tab:: Python
 
-Not every device supports network import/export capability, enabling of caching for such devices does not have any effect. To check in advance if a particular device supports model caching, your application can use the following code:
+      .. doxygensnippet:: docs/snippets/ov_caching.py
+         :language: python
+         :fragment: [ov:caching:part3]
 
-```python
-all_metrics = ie.get_metric(device_name=device, metric_name="SUPPORTED_METRICS")
-# Find the 'IMPORT_EXPORT_SUPPORT' metric in supported metrics
-allows_caching = "IMPORT_EXPORT_SUPPORT" in all_metrics
-```
+@endsphinxdirective
 
-> **NOTE**: The GPU plugin does not have the IMPORT_EXPORT_SUPPORT capability, and does not support model caching yet. However, the GPU plugin supports caching kernels (see the [GPU plugin documentation](supported_plugins/GPU.md)). Kernel caching for the GPU plugin can be accessed the same way as model caching: by setting the `CACHE_DIR` configuration key to a folder where the cache should be stored.
+> **NOTE**: The GPU plugin does not have the EXPORT_IMPORT capability, and does not support model caching yet. However, the GPU plugin supports caching kernels (see the [GPU plugin documentation](supported_plugins/GPU.md)). Kernel caching for the GPU plugin can be accessed the same way as model caching: by setting the `CACHE_DIR` configuration key to a folder where the cache should be stored.
 
 
 [caching_enabled]: ../img/caching_enabled.png
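
The updated page says that the first `compile_model` call is slightly slower while the cache is being written, and that later calls are faster. One way to observe this is a small timing helper. The following is a minimal sketch in Python, not part of the commit: `timed` is a hypothetical helper name, and it wraps any callable rather than an OpenVINO-specific API.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    Hypothetical helper for illustration; it is not an OpenVINO API.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Assuming the names used in `docs/snippets/ov_caching.py`, a call such as `compiled, seconds = timed(core.compile_model, model_path=xml_path, device_name=device_name)` could be run twice against the same cache directory to compare the first and second runs. The actual difference depends on the device and the model.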

diff --git a/docs/img/caching_enabled.png b/docs/img/caching_enabled.png
diff --git a/docs/img/caching_times.png b/docs/img/caching_times.png
diff --git a/docs/snippets/InferenceEngine_Caching0.cpp b/docs/snippets/InferenceEngine_Caching0.cpp
diff --git a/docs/snippets/InferenceEngine_Caching1.cpp b/docs/snippets/InferenceEngine_Caching1.cpp
diff --git a/docs/snippets/InferenceEngine_Caching2.cpp b/docs/snippets/InferenceEngine_Caching2.cpp
diff --git a/docs/snippets/InferenceEngine_Caching3.cpp b/docs/snippets/InferenceEngine_Caching3.cpp
diff --git a/docs/snippets/ov_caching.cpp b/docs/snippets/ov_caching.cpp
@@ -0,0 +1,69 @@
+#include <openvino/runtime/core.hpp>
+
+void part0() {
+    std::string modelPath = "/tmp/myModel.xml";
+    std::string device = "GNA";
+    ov::AnyMap config;
+//! [ov:caching:part0]
+ov::Core core;                                              // Step 1: create ov::Core object
+core.set_property(ov::cache_dir("/path/to/cache/dir"));     // Step 1b: Enable caching
+auto model = core.read_model(modelPath);                    // Step 2: Read Model
+//...                                                       // Step 3: Prepare inputs/outputs
+//...                                                       // Step 4: Set device configuration
+auto compiled = core.compile_model(model, device, config);  // Step 5: Compile model
+//! [ov:caching:part0]
+    if (!compiled) {
+        throw std::runtime_error("error");
+    }
+}
+
+void part1() {
+    std::string modelPath = "/tmp/myModel.xml";
+    std::string device = "GNA";
+    ov::AnyMap config;
+//! [ov:caching:part1]
+ov::Core core;                                                  // Step 1: create ov::Core object
+auto compiled = core.compile_model(modelPath, device, config);  // Step 2: Compile model by file path
+//! [ov:caching:part1]
+    if (!compiled) {
+        throw std::runtime_error("error");
+    }
+}
+
+void part2() {
+    std::string modelPath = "/tmp/myModel.xml";
+    std::string device = "GNA";
+    ov::AnyMap config;
+//! [ov:caching:part2]
+ov::Core core;                                                  // Step 1: create ov::Core object
+core.set_property(ov::cache_dir("/path/to/cache/dir"));         // Step 1b: Enable caching
+auto compiled = core.compile_model(modelPath, device, config);  // Step 2: Compile model by file path
+//! [ov:caching:part2]
+    if (!compiled) {
+        throw std::runtime_error("error");
+    }
+}
+
+void part3() {
+    std::string deviceName = "GNA";
+    ov::AnyMap config;
+    ov::Core core;
+//! [ov:caching:part3]
+// Get list of supported device capabilities
+std::vector<std::string> caps = core.get_property(deviceName, ov::device::capabilities);
+
+// Find 'EXPORT_IMPORT' capability in supported capabilities
+bool cachingSupported = std::find(caps.begin(), caps.end(), ov::device::capability::EXPORT_IMPORT) != caps.end();
+//! [ov:caching:part3]
+    if (!cachingSupported) {
+        throw std::runtime_error("GNA should support model caching");
+    }
+}
+
+int main() {
+    part0();
+    part1();
+    part2();
+    part3();
+    return 0;
+}
diff --git a/docs/snippets/ov_caching.py b/docs/snippets/ov_caching.py
@@ -0,0 +1,36 @@
+# Copyright (C) 2018-2022 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+#
+
+from openvino.runtime import Core
+
+device_name = 'GNA'
+xml_path = '/tmp/myModel.xml'
+# ! [ov:caching:part0]
+core = Core()
+core.set_property({'CACHE_DIR': '/path/to/cache/dir'})
+model = core.read_model(model=xml_path)
+compiled_model = core.compile_model(model=model, device_name=device_name)
+# ! [ov:caching:part0]
+
+assert compiled_model
+
+# ! [ov:caching:part1]
+core = Core()
+compiled_model = core.compile_model(model_path=xml_path, device_name=device_name)
+# ! [ov:caching:part1]
+
+assert compiled_model
+
+# ! [ov:caching:part2]
+core = Core()
+core.set_property({'CACHE_DIR': '/path/to/cache/dir'})
+compiled_model = core.compile_model(model_path=xml_path, device_name=device_name)
+# ! [ov:caching:part2]
+
+assert compiled_model
+
+# ! [ov:caching:part3]
+# Find 'EXPORT_IMPORT' capability in supported capabilities
+caching_supported = 'EXPORT_IMPORT' in core.get_property(device_name, 'OPTIMIZATION_CAPABILITIES')
+# ! [ov:caching:part3]
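
The capability check in `part3` can be combined with the `CACHE_DIR` setting from `part0`, so that caching is enabled only on devices that report `EXPORT_IMPORT`. The sketch below is not part of the commit. `enable_cache_if_supported` is a hypothetical helper name, and `core` is assumed to be any object exposing `get_property` and `set_property` with the call shapes used in the snippets above.

```python
def enable_cache_if_supported(core, device_name, cache_dir):
    """Set CACHE_DIR on `core` only if `device_name` reports EXPORT_IMPORT.

    Hypothetical helper for illustration. Returns True when caching was
    enabled and False when the device does not report the capability.
    """
    # Same query as ov:caching:part3
    caps = core.get_property(device_name, 'OPTIMIZATION_CAPABILITIES')
    if 'EXPORT_IMPORT' not in caps:
        return False
    # Same property dictionary as ov:caching:part0
    core.set_property({'CACHE_DIR': str(cache_dir)})
    return True
```

The check is optional: according to the updated page, setting the cache directory for a device without import/export support has no effect and raises no error. The page also notes that the GPU plugin uses the same `CACHE_DIR` key for kernel caching, so a guard like this one would skip kernel caching on GPU.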