Removed information about FPGA plugin #7474

Merged 1 commit on Sep 13, 2021.

docs/IE_DG/Glossary.md (1 change: 0 additions & 1 deletion)
@@ -19,7 +19,6 @@ Glossary {#openvino_docs_IE_DG_Glossary}
| ELU | Exponential Linear rectification Unit |
| FCN | Fully Convolutional Network |
| FP | Floating Point |
| FPGA | Field-Programmable Gate Array |
| GCC | GNU Compiler Collection |
| GPU | Graphics Processing Unit |
| HD | High Definition |

docs/IE_DG/InferenceEngine_QueryAPI.md (2 changes: 0 additions & 2 deletions)
@@ -29,8 +29,6 @@ The function returns list of available devices, for example:
```
MYRIAD.1.2-ma2480
MYRIAD.1.4-ma2480
FPGA.0
FPGA.1
CPU
GPU.0
GPU.1
```
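
As a minimal sketch of how such a device list is obtained at runtime (not part of this pull request, and assuming the 2021-era Inference Engine C++ API), the enumeration comes from a single `GetAvailableDevices()` call:

```cpp
#include <inference_engine.hpp>

#include <iostream>
#include <string>
#include <vector>

int main() {
    InferenceEngine::Core core;

    // Returns names such as "CPU", "GPU.0", "GPU.1", "MYRIAD.1.2-ma2480";
    // the exact list depends on the installed plugins and attached hardware.
    std::vector<std::string> devices = core.GetAvailableDevices();
    for (const std::string& device : devices) {
        std::cout << device << std::endl;
    }
    return 0;
}
```
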
docs/IE_DG/supported_plugins/HETERO.md (17 changes: 4 additions & 13 deletions)
@@ -23,7 +23,7 @@ If transmitting data from one part of a network to another part in heterogeneous
In this case, you can define heaviest part manually and set the affinity to avoid sending data back and forth many times during one inference.

## Annotation of Layers per Device and Default Fallback Policy
Default fallback policy decides which layer goes to which device automatically according to the support in dedicated plugins (FPGA, GPU, CPU, MYRIAD).
Default fallback policy decides which layer goes to which device automatically according to the support in dedicated plugins (GPU, CPU, MYRIAD).

Another way to annotate a network is to set affinity manually using <code>ngraph::Node::get_rt_info</code> with key `"affinity"`:
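
The snippet that follows this sentence in HETERO.md is collapsed in this view. A rough sketch of such manual annotation, assuming the 2021-era API and a placeholder model file `sample.xml`, could look like the following (every node is pinned to the CPU purely for illustration):

```cpp
#include <inference_engine.hpp>
#include <ngraph/ngraph.hpp>
#include <ngraph/variant.hpp>

#include <memory>
#include <string>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("sample.xml");  // placeholder model path
    auto function = network.getFunction();

    // Set the "affinity" runtime attribute on every node before loading
    // the network with the HETERO plugin.
    for (const auto& node : function->get_ops()) {
        node->get_rt_info()["affinity"] =
            std::make_shared<ngraph::VariantWrapper<std::string>>("CPU");
    }

    auto exec_network = core.LoadNetwork(network, "HETERO:GPU,CPU");
    return 0;
}
```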

@@ -46,25 +46,16 @@ If you rely on the default affinity distribution, you can avoid calling <code>In
During loading of the network to heterogeneous plugin, network is divided to separate parts and loaded to dedicated plugins.
Intermediate blobs between these sub graphs are allocated automatically in the most efficient way.

## Execution Precision
Precision for inference in heterogeneous plugin is defined by
* Precision of IR.
* Ability of final plugins to execute in precision defined in IR

Examples:
* If you want to execute GPU with CPU fallback with FP16 on GPU, you need to use only FP16 IR.
* If you want to execute on FPGA with CPU fallback, you can use any precision for IR. The execution on FPGA is defined by bitstream, the execution on CPU happens in FP32.

Samples can be used with the following command:

```sh
./object_detection_sample_ssd -m <path_to_model>/ModelSSD.xml -i <path_to_pictures>/picture.jpg -d HETERO:FPGA,CPU
./object_detection_sample_ssd -m <path_to_model>/ModelSSD.xml -i <path_to_pictures>/picture.jpg -d HETERO:GPU,CPU
```
where:
- `HETERO` stands for heterogeneous plugin
- `FPGA,CPU` points to fallback policy with priority on FPGA and fallback to CPU
- `GPU,CPU` points to fallback policy with priority on GPU and fallback to CPU

You can point more than two devices: `-d HETERO:FPGA,GPU,CPU`
You can point more than two devices: `-d HETERO:GPU,GPU,CPU`

## Analyzing Heterogeneous Execution
After enabling of <code>KEY_HETERO_DUMP_GRAPH_DOT</code> config key, you can dump GraphViz* `.dot` files with annotations of devices per layer.
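
For reference, enabling that key from code could look roughly like the sketch below (not taken from this PR; `sample.xml` is a placeholder model, and the string literal `"HETERO_DUMP_GRAPH_DOT"` is assumed to be the value behind `KEY_HETERO_DUMP_GRAPH_DOT`):

```cpp
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;

    // With this config key enabled, LoadNetwork drops annotated GraphViz .dot
    // files into the working directory showing the device assigned to each layer.
    core.SetConfig({{"HETERO_DUMP_GRAPH_DOT", "YES"}}, "HETERO");

    auto network = core.ReadNetwork("sample.xml");
    auto exec_network = core.LoadNetwork(network, "HETERO:GPU,CPU");
    return 0;
}
```
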
docs/doxygen/openvino_docs.xml (2 changes: 0 additions & 2 deletions)
@@ -27,11 +27,9 @@ limitations under the License.
<tab type="usergroup" title="Installation Guides" url=""><!--automatically generated-->
<tab type="usergroup" title="Linux" url="@ref openvino_docs_install_guides_installing_openvino_linux">
<tab type="user" title="Install Intel® Distribution of OpenVINO™ toolkit for Linux* OS" url="@ref openvino_docs_install_guides_installing_openvino_linux"/>
<tab type="user" title="[DEPRECATED] Install Intel® Distribution of OpenVINO™ toolkit for Linux with FPGA Support" url="@ref openvino_docs_install_guides_installing_openvino_linux_fpga"/>
</tab>
<tab type="usergroup" title="Windows" url="@ref openvino_docs_install_guides_installing_openvino_windows">
<tab type="user" title="Install Intel® Distribution of OpenVINO™ toolkit for Windows* 10" url="@ref openvino_docs_install_guides_installing_openvino_windows"/>
<tab type="user" title="[DEPRECATED] Install Intel® Distribution of OpenVINO™ toolkit for Windows* with FPGA support" url="@ref openvino_docs_install_guides_installing_openvino_windows_fpga"/>
</tab>
<tab type="user" title="macOS" url="@ref openvino_docs_install_guides_installing_openvino_macos"/>
<tab type="user" title="Raspbian OS" url="@ref openvino_docs_install_guides_installing_openvino_raspbian"/>

docs/install_guides/installing-openvino-docker-linux.md (8 changes: 0 additions & 8 deletions)
@@ -353,14 +353,6 @@ docker run -itu root:root --rm --device=/dev/ion:/dev/ion -v /var/tmp:/var/tmp <
/bin/bash -c "apt update && apt install sudo && deployment_tools/demo/demo_security_barrier_camera.sh -d HDDL -sample-options -no_show"
```

## Use a Docker* Image for FPGA

Intel will be transitioning to the next-generation programmable deep-learning solution based on FPGAs in order to increase the level of customization possible in FPGA deep-learning. As part of this transition, future standard releases (i.e., non-LTS releases) of Intel® Distribution of OpenVINO™ toolkit will no longer include the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA and the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA.

Intel® Distribution of OpenVINO™ toolkit 2020.3.X LTS release will continue to support Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA and the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA. For questions about next-generation programmable deep-learning solutions based on FPGAs, please talk to your sales representative or contact us to get the latest FPGA updates.

For instructions for previous releases with FPGA Support, see documentation for the [2020.4 version](https://docs.openvinotoolkit.org/2020.4/openvino_docs_install_guides_installing_openvino_docker_linux.html#use_a_docker_image_for_fpga) or lower.

## Troubleshooting

If you got proxy issues, please setup proxy settings for Docker. See the Proxy section in the [Install the DL Workbench from Docker Hub* ](@ref workbench_docs_Workbench_DG_Run_Locally) topic.

docs/install_guides/installing-openvino-linux-fpga.md (21 changes: 0 additions & 21 deletions)

This file was deleted.

docs/install_guides/installing-openvino-windows-fpga.md (21 changes: 0 additions & 21 deletions)

This file was deleted.

docs/optimization_guide/dldt_optimization_guide.md (24 changes: 3 additions & 21 deletions)
@@ -196,16 +196,6 @@ Since Intel® Movidius™ Myriad™ X Visual Processing Unit (Intel® Movidius

Intel® Vision Accelerator Design with Intel® Movidius™ VPUs requires keeping at least 32 inference requests in flight to fully saturate the device.
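
As a rough illustration of this "many requests in flight" pattern (a sketch only, not part of this diff, assuming the 2021-era API, a placeholder model `sample.xml`, and the HDDL device; input blob handling is omitted):

```cpp
#include <inference_engine.hpp>

#include <vector>

int main() {
    constexpr int kNumRequests = 32;  // per the guidance above for HDDL

    InferenceEngine::Core core;
    auto network = core.ReadNetwork("sample.xml");
    auto exec_network = core.LoadNetwork(network, "HDDL");

    // Create a pool of requests and keep them all in flight instead of
    // running a single synchronous Infer() loop.
    std::vector<InferenceEngine::InferRequest> requests;
    for (int i = 0; i < kNumRequests; ++i) {
        requests.push_back(exec_network.CreateInferRequest());
    }
    for (auto& request : requests) {
        // Real code would fill the input blobs here before each StartAsync().
        request.StartAsync();
    }
    for (auto& request : requests) {
        request.Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
    }
    return 0;
}
```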

### FPGA <a name="fpga"></a>

Below are listed the most important tips for the efficient usage of the FPGA:

- Just like for the Intel® Movidius™ Myriad™ VPU flavors, for the FPGA, it is important to hide the communication overheads by running multiple inference requests in parallel. For examples, refer to the [Benchmark App Sample](../../inference-engine/samples/benchmark_app/README.md).
- Since the first inference iteration with FPGA is always significantly slower than the subsequent ones, make sure you run multiple iterations (all samples, except GUI-based demos, have the `-ni` or 'niter' option to do that).
- FPGA performance heavily depends on the bitstream.
- Number of the infer request per executable network is limited to five, so “channel” parallelism (keeping individual infer request per camera/video input) would not work beyond five inputs. Instead, you need to mux the inputs into some queue that will internally use a pool of (5) requests.
- In most scenarios, the FPGA acceleration is leveraged through <a href="heterogeneity">heterogeneous execution</a> with further specific tips.

## Heterogeneity <a name="heterogeneity"></a>

Heterogeneous execution (constituted by the dedicated Inference Engine [“Hetero” plugin](../IE_DG/supported_plugins/HETERO.md)) enables to schedule a network inference to the multiple devices.
@@ -249,23 +239,15 @@ Every Inference Engine sample supports the `-d` (device) option.
For example, here is a command to run an [Object Detection Sample SSD Sample](../../inference-engine/samples/object_detection_sample_ssd/README.md):

```sh
./object_detection_sample_ssd -m <path_to_model>/ModelSSD.xml -i <path_to_pictures>/picture.jpg -d HETERO:FPGA,CPU
./object_detection_sample_ssd -m <path_to_model>/ModelSSD.xml -i <path_to_pictures>/picture.jpg -d HETERO:GPU,CPU
```

where:

- `HETERO` stands for Heterogeneous plugin.
- `FPGA,CPU` points to fallback policy with first priority on FPGA and further fallback to CPU.

You can point more than two devices: `-d HETERO:FPGA,GPU,CPU`.

### Heterogeneous Scenarios with FPGA <a name="heterogeneous-scenarios-fpga"></a>

As FPGA is considered as an inference accelerator, most performance issues are related to the fact that due to the fallback, the CPU can be still used quite heavily.
- Yet in most cases, the CPU does only small/lightweight layers, for example, post-processing (`SoftMax` in most classification models or `DetectionOutput` in the SSD*-based topologies). In that case, limiting the number of CPU threads with [`KEY_CPU_THREADS_NUM`](../IE_DG/supported_plugins/CPU.md) config would further reduce the CPU utilization without significantly degrading the overall performance.
- Also, if you are still using OpenVINO™ toolkit version earlier than R1 2019, or if you have recompiled the Inference Engine with OpenMP (say for backward compatibility), setting the `KMP_BLOCKTIME` environment variable to something less than default 200ms (we suggest 1ms) is particularly helpful. Use `KMP_BLOCKTIME=0` if the CPU subgraph is small.
- `GPU,CPU` points to fallback policy with first priority on GPU and further fallback to CPU.

> **NOTE**: General threading tips (see <a href="#note-on-app-level-threading">Note on the App-Level Threading</a>) apply well, even when the entire topology fits the FPGA, because there is still a host-side code for data pre- and post-processing.
You can point more than two devices: `-d HETERO:GPU,MYRIAD,CPU`.

### General Tips on GPU/CPU Execution <a name="tips-on-gpu-cpu-execution"></a>

docs/snippets/HETERO1.cpp (2 changes: 1 addition & 1 deletion)
@@ -12,7 +12,7 @@ auto function = network.getFunction();

// This example demonstrates how to perform default affinity initialization and then
// correct affinity manually for some layers
const std::string device = "HETERO:FPGA,CPU";
const std::string device = "HETERO:GPU,CPU";

// QueryNetworkResult object contains map layer -> device
InferenceEngine::QueryNetworkResult res = core.QueryNetwork(network, device, { });

docs/snippets/HETERO2.cpp (2 changes: 1 addition & 1 deletion)
@@ -5,7 +5,7 @@ using namespace InferenceEngine;
//! [part2]
InferenceEngine::Core core;
auto network = core.ReadNetwork("sample.xml");
auto executable_network = core.LoadNetwork(network, "HETERO:FPGA,CPU");
auto executable_network = core.LoadNetwork(network, "HETERO:GPU,CPU");
//! [part2]
return 0;
}
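
HETERO2.cpp stops at loading the network. As a usage sketch of what typically follows (not part of this diff, assuming the same placeholder `sample.xml` and reading back the model's first output after one synchronous inference):

```cpp
#include <inference_engine.hpp>

#include <string>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("sample.xml");
    auto executable_network = core.LoadNetwork(network, "HETERO:GPU,CPU");

    // One synchronous inference, then fetch the first output blob by name.
    InferenceEngine::InferRequest request = executable_network.CreateInferRequest();
    request.Infer();

    const std::string output_name = network.getOutputsInfo().begin()->first;
    InferenceEngine::Blob::Ptr output = request.GetBlob(output_name);
    (void)output;  // post-processing of the results would go here
    return 0;
}
```
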
docs/snippets/dldt_optimization_guide0.cpp (17 changes: 0 additions & 17 deletions)

This file was deleted.

inference-engine/ie_bridges/c/include/c_api/ie_c_api.h (2 changes: 1 addition & 1 deletion)
@@ -501,7 +501,7 @@ INFERENCE_ENGINE_C_API(IE_NODISCARD IEStatusCode) ie_core_get_config(const ie_co
* @brief Gets available devices for neural network inference.
* @ingroup Core
* @param core A pointer to ie_core_t instance.
* @param avai_devices The devices are returned as { CPU, FPGA.0, FPGA.1, MYRIAD }
* @param avai_devices The devices are returned as { CPU, GPU.0, GPU.1, MYRIAD }
* If there more than one device of specific type, they are enumerated with .# suffix
* @return Status code of the operation: OK(0) for success.
*/

@@ -9,7 +9,7 @@ from enum import Enum
supported_precisions = ['FP32', 'FP64', 'FP16', 'I64', 'U64', 'I32', 'U32',
'I16', 'I4', 'I8', 'U16', 'U4', 'U8', 'BOOL', 'BIN', 'BF16']

known_plugins = ['CPU', 'GPU', 'FPGA', 'MYRIAD', 'HETERO', 'HDDL', 'MULTI']
known_plugins = ['CPU', 'GPU', 'MYRIAD', 'HETERO', 'HDDL', 'MULTI']

layout_int_to_str_map = {0: 'ANY', 1: 'NCHW', 2: 'NHWC', 3: 'NCDHW', 4: 'NDHWC', 64: 'OIHW', 95: 'SCALAR', 96: 'C',
128: 'CHW', 192: 'HW', 193: 'NC', 194: 'CN', 200: 'BLOCKED'}

@@ -541,7 +541,7 @@ cdef class IECore:
def get_config(self, device_name: str, config_name: str):
return self.impl.getConfig(device_name.encode(), config_name.encode())

## A list of devices. The devices are returned as \[CPU, FPGA.0, FPGA.1, MYRIAD\].
## A list of devices. The devices are returned as \[CPU, GPU.0, GPU.1, MYRIAD\].
# If there are more than one device of a specific type, they all are listed followed by a dot and a number.
@property
def available_devices(self):

@@ -5,7 +5,7 @@ OpenVINO™ toolkit quickly deploys applications and solutions that emulate huma
OpenVINO™ toolkit:

- Enables CNN-based deep learning inference on the edge
- Supports heterogeneous execution across an Intel® CPU, Intel® Integrated Graphics, Intel® FPGA, Intel® Neural Compute Stick 2, and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs
- Supports heterogeneous execution across an Intel® CPU, Intel® Integrated Graphics, Intel® Neural Compute Stick 2, and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs
- Speeds time-to-market via an easy-to-use library of computer vision functions and pre-optimized kernels
- Includes optimized calls for computer vision standards, including OpenCV\* and OpenCL™

inference-engine/samples/benchmark_app/README.md (12 changes: 6 additions & 6 deletions)
@@ -140,7 +140,7 @@ To run the tool, you can use [public](@ref omz_models_group_public) or [Intel's]

## Examples of Running the Tool

This section provides step-by-step instructions on how to run the Benchmark Tool with the `googlenet-v1` public model on CPU or FPGA devices. As an input, the `car.png` file from the `<INSTALL_DIR>/deployment_tools/demo/` directory is used.
This section provides step-by-step instructions on how to run the Benchmark Tool with the `googlenet-v1` public model on CPU or GPU devices. As an input, the `car.png` file from the `<INSTALL_DIR>/deployment_tools/demo/` directory is used.

> **NOTE:** The Internet access is required to execute the following steps successfully. If you have access to the Internet through the proxy server only, please make sure that it is configured in your OS environment.

@@ -158,21 +158,21 @@ This section provides step-by-step instructions on how to run the Benchmark Tool
```sh
python3 mo.py --input_model <models_dir>/public/googlenet-v1/googlenet-v1.caffemodel --data_type FP32 --output_dir <ir_dir>
```
3. Run the tool with specifying the `<INSTALL_DIR>/deployment_tools/demo/car.png` file as an input image, the IR of the `googlenet-v1` model and a device to perform inference on. The following commands demonstrate running the Benchmark Tool in the asynchronous mode on CPU and FPGA devices:
3. Run the tool with specifying the `<INSTALL_DIR>/deployment_tools/demo/car.png` file as an input image, the IR of the `googlenet-v1` model and a device to perform inference on. The following commands demonstrate running the Benchmark Tool in the asynchronous mode on CPU and GPU devices:

* On CPU:
```sh
./benchmark_app -m <ir_dir>/googlenet-v1.xml -i <INSTALL_DIR>/deployment_tools/demo/car.png -d CPU -api async --progress true
```
* On FPGA:
* On GPU:
```sh
./benchmark_app -m <ir_dir>/googlenet-v1.xml -i <INSTALL_DIR>/deployment_tools/demo/car.png -d HETERO:FPGA,CPU -api async --progress true
./benchmark_app -m <ir_dir>/googlenet-v1.xml -i <INSTALL_DIR>/deployment_tools/demo/car.png -d GPU -api async --progress true
```

The application outputs the number of executed iterations, total duration of execution, latency, and throughput.
Additionally, if you set the `-report_type` parameter, the application outputs statistics report. If you set the `-pc` parameter, the application outputs performance counters. If you set `-exec_graph_path`, the application reports executable graph information serialized. All measurements including per-layer PM counters are reported in milliseconds.

Below are fragments of sample output for CPU and FPGA devices:
Below are fragments of sample output for CPU and GPU devices:

* For CPU:
```
@@ -189,7 +189,7 @@ Below are fragments of sample output for CPU and FPGA devices:
Throughput: 76.73 FPS
```

* For FPGA:
* For GPU:
```
[Step 10/11] Measuring performance (Start inference asynchronously, 5 inference requests using 4 streams for CPU, limits: 120000 ms duration)
Progress: [....................] 100% done
```