Az/custom annotation #233

Merged 42 commits on Dec 26, 2018

Commits
176d632
initial version of custom annotation application
Dec 10, 2018
98760d1
added readme for custom annotation
Dec 13, 2018
20b5d01
Update Readme
azhavoro Dec 13, 2018
b5d90eb
Update README
azhavoro Dec 13, 2018
fdb1d51
update README
Dec 13, 2018
145beea
Merge branch 'az/custom_annotation' of https://github.com/opencv/cvat…
Dec 13, 2018
e81bbbb
minor fixes
Dec 13, 2018
897ad63
custom annotation -> auto annotation
Dec 14, 2018
095bfbb
fixed typos
Dec 14, 2018
8d608db
remove unused method
Dec 14, 2018
039d70f
fixed indents
Dec 14, 2018
0f19d63
restricting usage of built-ins in user's code
Dec 14, 2018
95ac2d9
updted README
Dec 14, 2018
515aafb
fixed typos
Dec 17, 2018
ae3e565
fixed typo
Dec 17, 2018
96b57cc
fixed comments from Boris
Dec 17, 2018
f6a9563
switch from OpenCV to IE to infer directly
Dec 20, 2018
78d19a4
fixed some codacy issues
Dec 25, 2018
ba53010
updated README
Dec 25, 2018
f3e33a1
fix codacy issue
Dec 25, 2018
0240061
fix codacy issues
Dec 25, 2018
dd7ae04
fix codacy issues
Dec 25, 2018
284df75
updated readme
Dec 25, 2018
1d7204d
Update README.md
azhavoro Dec 25, 2018
565d442
Update README.md
azhavoro Dec 26, 2018
8c0c8fd
fixed some typos
Dec 26, 2018
c854dfb
fixed some typos
Dec 26, 2018
6dc611b
Moved information about components into root README.md
Dec 26, 2018
9142fd8
Merge remote-tracking branch 'origin/develop' into az/custom_annotation
Dec 26, 2018
b2805b2
Slightly improved documentation.
Dec 26, 2018
1daf222
Fix typo
nmanovic Dec 26, 2018
f0ff2f1
added public results class to interact with interpretation script
Dec 26, 2018
4f1dc0d
attributes support
Dec 26, 2018
e98080f
fixed codacy issues
Dec 26, 2018
4c20011
added several points for Point shape support
Dec 26, 2018
cd68501
added polylines and polygons support
Dec 26, 2018
0254757
rename add_point -> add_points
Dec 26, 2018
28d86e2
updated readme
Dec 26, 2018
8370bb7
Merge remote-tracking branch 'origin/develop' into az/custom_annotation
Dec 26, 2018
fde65d8
Update CHANGELOG.md
Dec 26, 2018
015571c
support OpenVINO R5
Dec 26, 2018
6cc79f5
Merge branch 'az/custom_annotation' of https://github.com/opencv/cvat…
Dec 26, 2018
19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,25 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
### Added
- OpenVINO auto annotation: it is possible to upload a custom model and annotate images automatically.

### Changed
-

### Deprecated
-

### Removed
-

### Fixed
-

### Security
-

## [0.3.0] - 2018-12-29
### Added
- Ability to copy Object URL and Frame URL via object context menu and player context menu respectively.
7 changes: 7 additions & 0 deletions README.md
@@ -126,6 +126,13 @@ volumes:
```
You can change the share device path to your actual share. For user convenience we have defined the environment variable $CVAT_SHARE_URL. This variable contains text (a URL, for example) which will be shown in the client share browser.

### Additional optional components

- [Support for Intel OpenVINO: auto annotation](components/openvino/README.md)
- [Analytics: management and monitoring of data annotation team](components/analytics/README.md)
- [TF Object Detection API: auto annotation](components/tf_annotation/README.md)
- [Support for NVIDIA GPUs](components/cuda/README.md)

## Questions

CVAT usage related questions or unclear concepts can be posted in our [Gitter chat](https://gitter.im/opencv-cvat) for **quick replies** from contributors and other users.
6 changes: 0 additions & 6 deletions components/README.md

This file was deleted.

2 changes: 2 additions & 0 deletions components/analytics/README.md
@@ -1,5 +1,7 @@
## Analytics for Computer Vision Annotation Tool (CVAT)

![](/cvat/apps/documentation/static/documentation/images/image097.jpg)

It is possible to proxy annotation logs from the client to ELK. To do that, run the command below:

### Build docker image
163 changes: 163 additions & 0 deletions cvat/apps/auto_annotation/README.md
@@ -0,0 +1,163 @@
## Auto annotation

### Description

The application will be enabled automatically if the OpenVINO™ component is
installed. It allows you to use custom models for auto annotation. Only models in
the OpenVINO™ toolkit format are supported. If you would like to annotate a
task with a custom model, please convert it to the intermediate representation
(IR) format via the model optimizer tool. See the [OpenVINO documentation](https://software.intel.com/en-us/articles/OpenVINO-InferEngine) for details.

### Usage

To annotate a task with a custom model you need to prepare 4 files:
1. __Model config__ (*.xml) - a text file with network configuration.
1. __Model weights__ (*.bin) - a binary file with trained weights.
1. __Label map__ (*.json) - a simple JSON file with a `label_map` dictionary-like
object that maps label numbers to string label names. Values in `label_map` must
exactly match the labels of the annotation task; otherwise objects with
mismatched labels will be ignored.
Example:
```json
{
  "label_map": {
    "0": "background",
    "1": "aeroplane",
    "2": "bicycle",
    "3": "bird",
    "4": "boat",
    "5": "bottle",
    "6": "bus",
    "7": "car",
    "8": "cat",
    "9": "chair",
    "10": "cow",
    "11": "diningtable",
    "12": "dog",
    "13": "horse",
    "14": "motorbike",
    "15": "person",
    "16": "pottedplant",
    "17": "sheep",
    "18": "sofa",
    "19": "train",
    "20": "tvmonitor"
  }
}
```
1. __Interpretation script__ (*.py) - a file used to convert the network's output
layer into a predefined structure which can be processed by CVAT. This code will be run
inside a restricted Python environment, but it is possible to use some
builtin functions like __str, int, float, max, min, range__.

Two variables are also available in the scope:

- __detections__ - a list of dictionaries with detections for each frame:
* __frame_id__ - frame number
* __frame_height__ - frame height
* __frame_width__ - frame width
* __detections__ - output np.ndarray (See [ExecutableNetwork.infer](https://software.intel.com/en-us/articles/OpenVINO-InferEngine#inpage-nav-11-6-3) for details).

- __results__ - an instance of a Python class that accumulates the converted results.
The following methods should be used to add shapes:
```python
# xtl, ytl, xbr, ybr - expected values are float or int
# label - expected value is int
# frame_number - expected value is int
# attributes - dictionary of attribute_name: attribute_value pairs, for example {"confidence": "0.83"}
add_box(self, xtl, ytl, xbr, ybr, label, frame_number, attributes=None)

# points - list of (x, y) pairs of float or int, for example [(57.3, 100), (67, 102.7)]
# label - expected value is int
# frame_number - expected value is int
# attributes - dictionary of attribute_name: attribute_value pairs, for example {"confidence": "0.83"}
add_points(self, points, label, frame_number, attributes=None)
add_polygon(self, points, label, frame_number, attributes=None)
add_polyline(self, points, label, frame_number, attributes=None)
```
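The exact-match rule for `label_map` values described in step 3 above can be sketched with a short standalone snippet (hypothetical illustration, not part of CVAT's code):

```python
# Hypothetical illustration of the label-matching rule: label_map values
# must exactly equal the task's labels; mismatched labels are ignored.
label_map = {"0": "background", "1": "aeroplane", "2": "bicycle"}
task_labels = {"aeroplane", "bicycle", "bird"}

# Keep only labels that exist in the task; collect the rest as ignored.
matched = {int(k): v for k, v in label_map.items() if v in task_labels}
ignored = sorted(v for v in label_map.values() if v not in task_labels)
print(matched)  # {1: 'aeroplane', 2: 'bicycle'}
print(ignored)  # ['background']
```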

### Examples

#### [Person-vehicle-bike-detection-crossroad-0078](https://github.com/opencv/open_model_zoo/blob/2018/intel_models/person-vehicle-bike-detection-crossroad-0078/description/person-vehicle-bike-detection-crossroad-0078.md) (OpenVINO toolkit)

__Note__: Model configuration (*.xml) and weights (*.bin) are available in the OpenVINO redistributable package.

__Task labels__: person vehicle non-vehicle

__label_map.json__:
```json
{
  "label_map": {
    "1": "person",
    "2": "vehicle",
    "3": "non-vehicle"
  }
}
```
__Interpretation script for SSD based networks__:
```python
def clip(value):
    return max(min(1.0, value), 0.0)

for frame_results in detections:
    frame_height = frame_results["frame_height"]
    frame_width = frame_results["frame_width"]
    frame_number = frame_results["frame_id"]

    for i in range(frame_results["detections"].shape[2]):
        confidence = frame_results["detections"][0, 0, i, 2]
        if confidence < 0.5:
            continue

        results.add_box(
            xtl=clip(frame_results["detections"][0, 0, i, 3]) * frame_width,
            ytl=clip(frame_results["detections"][0, 0, i, 4]) * frame_height,
            xbr=clip(frame_results["detections"][0, 0, i, 5]) * frame_width,
            ybr=clip(frame_results["detections"][0, 0, i, 6]) * frame_height,
            label=int(frame_results["detections"][0, 0, i, 1]),
            frame_number=frame_number,
            attributes={
                "confidence": "{:.2f}".format(confidence),
            },
        )
```
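An interpretation script can be tried outside CVAT by faking the injected `detections` and `results` variables with synthetic data. In the sketch below, the `Results` class and the SSD-like array are assumptions made only for this illustration (they mirror the documented `add_box` signature and the common SSD output layout `[image_id, label, confidence, xtl, ytl, xbr, ybr]`):

```python
import numpy as np

# Hypothetical stand-in for the `results` object CVAT injects into the
# interpretation script; it only records what add_box receives.
class Results:
    def __init__(self):
        self.boxes = []

    def add_box(self, xtl, ytl, xbr, ybr, label, frame_number, attributes=None):
        self.boxes.append({"xtl": xtl, "ytl": ytl, "xbr": xbr, "ybr": ybr,
                           "label": label, "frame_number": frame_number,
                           "attributes": attributes or {}})

# Fake SSD output of shape (1, 1, N, 7): rows of
# [image_id, label, confidence, xtl, ytl, xbr, ybr] in relative coordinates.
raw = np.array([[[
    [0, 1, 0.90, 0.10, 0.20, 0.30, 0.40],  # confident detection
    [0, 2, 0.30, 0.50, 0.50, 0.60, 0.60],  # below the 0.5 threshold
]]])
detections = [{"frame_id": 0, "frame_height": 480, "frame_width": 640,
               "detections": raw}]
results = Results()

# The interpretation script itself, unchanged.
def clip(value):
    return max(min(1.0, value), 0.0)

for frame_results in detections:
    frame_height = frame_results["frame_height"]
    frame_width = frame_results["frame_width"]
    frame_number = frame_results["frame_id"]

    for i in range(frame_results["detections"].shape[2]):
        confidence = frame_results["detections"][0, 0, i, 2]
        if confidence < 0.5:
            continue

        results.add_box(
            xtl=clip(frame_results["detections"][0, 0, i, 3]) * frame_width,
            ytl=clip(frame_results["detections"][0, 0, i, 4]) * frame_height,
            xbr=clip(frame_results["detections"][0, 0, i, 5]) * frame_width,
            ybr=clip(frame_results["detections"][0, 0, i, 6]) * frame_height,
            label=int(frame_results["detections"][0, 0, i, 1]),
            frame_number=frame_number,
            attributes={"confidence": "{:.2f}".format(confidence)},
        )

print(len(results.boxes))  # only the confident detection survives
```

Only one box is added: the second row fails the confidence check, and the first row's relative coordinates are clipped and scaled to pixel values (e.g. xtl = 0.10 × 640 = 64.0).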


#### [Landmarks-regression-retail-0009](https://github.com/opencv/open_model_zoo/blob/2018/intel_models/landmarks-regression-retail-0009/description/landmarks-regression-retail-0009.md) (OpenVINO toolkit)

__Note__: Model configuration (.xml) and weights (.bin) are available in the OpenVINO redistributable package.

__Task labels__: left_eye right_eye tip_of_nose left_lip_corner right_lip_corner

__label_map.json__:
```json
{
  "label_map": {
    "0": "left_eye",
    "1": "right_eye",
    "2": "tip_of_nose",
    "3": "left_lip_corner",
    "4": "right_lip_corner"
  }
}
```
__Interpretation script__:
```python
def clip(value):
    return max(min(1.0, value), 0.0)

for frame_results in detections:
    frame_height = frame_results["frame_height"]
    frame_width = frame_results["frame_width"]
    frame_number = frame_results["frame_id"]

    for i in range(0, frame_results["detections"].shape[1], 2):
        x = frame_results["detections"][0, i, 0, 0]
        y = frame_results["detections"][0, i + 1, 0, 0]

        results.add_points(
            points=[(clip(x) * frame_width, clip(y) * frame_height)],
            label=i // 2,  # see the label map and model output specification
            frame_number=frame_number,
        )
```
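This script can be exercised the same way with synthetic data. Here the `Results` class and the `(1, 10, 1, 1)` array of interleaved relative x/y coordinates are assumptions made for illustration only:

```python
import numpy as np

# Hypothetical stand-in for the injected `results` object, mirroring
# the documented add_points signature.
class Results:
    def __init__(self):
        self.points = []

    def add_points(self, points, label, frame_number, attributes=None):
        self.points.append({"points": points, "label": label,
                            "frame_number": frame_number})

# Fake landmarks output of shape (1, 10, 1, 1): interleaved relative
# x/y coordinates for the five facial points.
raw = np.linspace(0.1, 1.0, 10).reshape(1, 10, 1, 1)
detections = [{"frame_id": 0, "frame_height": 100, "frame_width": 200,
               "detections": raw}]
results = Results()

# The interpretation script itself, unchanged.
def clip(value):
    return max(min(1.0, value), 0.0)

for frame_results in detections:
    frame_height = frame_results["frame_height"]
    frame_width = frame_results["frame_width"]
    frame_number = frame_results["frame_id"]

    for i in range(0, frame_results["detections"].shape[1], 2):
        x = frame_results["detections"][0, i, 0, 0]
        y = frame_results["detections"][0, i + 1, 0, 0]

        results.add_points(
            points=[(clip(x) * frame_width, clip(y) * frame_height)],
            label=i // 2,
            frame_number=frame_number,
        )

print(len(results.points))  # one labeled point per landmark
```

Five labeled points come out, one per landmark, with labels 0 through 4 matching the label map above.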
8 changes: 8 additions & 0 deletions cvat/apps/auto_annotation/__init__.py
@@ -0,0 +1,8 @@

# Copyright (C) 2018 Intel Corporation
#
# SPDX-License-Identifier: MIT

from cvat.settings.base import JS_3RDPARTY

JS_3RDPARTY['dashboard'] = JS_3RDPARTY.get('dashboard', []) + ['auto_annotation/js/auto_annotation.js']
4 changes: 4 additions & 0 deletions cvat/apps/auto_annotation/admin.py
@@ -0,0 +1,4 @@

# Copyright (C) 2018 Intel Corporation
#
# SPDX-License-Identifier: MIT
11 changes: 11 additions & 0 deletions cvat/apps/auto_annotation/apps.py
@@ -0,0 +1,11 @@

# Copyright (C) 2018 Intel Corporation
#
# SPDX-License-Identifier: MIT

from django.apps import AppConfig


class AutoAnnotationConfig(AppConfig):
    name = "auto_annotation"

24 changes: 24 additions & 0 deletions cvat/apps/auto_annotation/image_loader.py
@@ -0,0 +1,24 @@

# Copyright (C) 2018 Intel Corporation
#
# SPDX-License-Identifier: MIT

import cv2

class ImageLoader():
    def __init__(self, image_list):
        self.image_list = image_list

    def __getitem__(self, i):
        return self.image_list[i]

    def __iter__(self):
        for imagename in self.image_list:
            yield imagename, self._load_image(imagename)

    def __len__(self):
        return len(self.image_list)

    @staticmethod
    def _load_image(path_to_image):
        return cv2.imread(path_to_image)
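`ImageLoader` implements the standard Python container protocol (`__getitem__`, `__iter__`, `__len__`), and iteration yields `(name, image)` pairs. A minimal sketch of that contract, with `cv2.imread` replaced by a string stub so the example needs no image files on disk:

```python
# Stubbed illustration of ImageLoader's container protocol; the string
# returned by _load_image stands in for the ndarray cv2.imread would give.
class StubImageLoader:
    def __init__(self, image_list):
        self.image_list = image_list

    def __getitem__(self, i):
        return self.image_list[i]

    def __iter__(self):
        for imagename in self.image_list:
            yield imagename, self._load_image(imagename)

    def __len__(self):
        return len(self.image_list)

    @staticmethod
    def _load_image(path_to_image):
        return "pixels-of-" + path_to_image  # stand-in for cv2.imread(...)

loader = StubImageLoader(["a.png", "b.png"])
print(len(loader))   # 2
print(loader[0])     # a.png
```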
5 changes: 5 additions & 0 deletions cvat/apps/auto_annotation/migrations/__init__.py
@@ -0,0 +1,5 @@

# Copyright (C) 2018 Intel Corporation
#
# SPDX-License-Identifier: MIT

59 changes: 59 additions & 0 deletions cvat/apps/auto_annotation/model_loader.py
@@ -0,0 +1,59 @@

# Copyright (C) 2018 Intel Corporation
#
# SPDX-License-Identifier: MIT

import json
import cv2
import os
import subprocess

from openvino.inference_engine import IENetwork, IEPlugin

class ModelLoader():
    def __init__(self, model, weights):
        self._model = model
        self._weights = weights

        IE_PLUGINS_PATH = os.getenv("IE_PLUGINS_PATH")
        if not IE_PLUGINS_PATH:
            raise OSError("Inference engine plugin path env not found in the system.")

        plugin = IEPlugin(device="CPU", plugin_dirs=[IE_PLUGINS_PATH])
        if self._check_instruction("avx2"):
            plugin.add_cpu_extension(os.path.join(IE_PLUGINS_PATH, "libcpu_extension_avx2.so"))
        elif self._check_instruction("sse4"):
            plugin.add_cpu_extension(os.path.join(IE_PLUGINS_PATH, "libcpu_extension_sse4.so"))
        else:
            raise Exception("Inference engine requires support of AVX2 or SSE4.")

        network = IENetwork.from_ir(model=self._model, weights=self._weights)
        supported_layers = plugin.get_supported_layers(network)
        not_supported_layers = [l for l in network.layers.keys() if l not in supported_layers]
        if len(not_supported_layers) != 0:
            raise Exception("The following layers are not supported by the plugin for the specified device {}:\n {}".
                format(plugin.device, ", ".join(not_supported_layers)))

        self._input_blob_name = next(iter(network.inputs))
        self._output_blob_name = next(iter(network.outputs))

        self._net = plugin.load(network=network, num_requests=2)
        input_type = network.inputs[self._input_blob_name]
        self._input_layout = input_type if isinstance(input_type, list) else input_type.shape

    def infer(self, image):
        _, _, h, w = self._input_layout
        in_frame = image if image.shape[:-1] == (h, w) else cv2.resize(image, (w, h))
        in_frame = in_frame.transpose((2, 0, 1))  # Change data layout from HWC to CHW
        return self._net.infer(inputs={self._input_blob_name: in_frame})[self._output_blob_name].copy()

    @staticmethod
    def _check_instruction(instruction):
        return instruction == str.strip(
            subprocess.check_output(
                "lscpu | grep -o \"{}\" | head -1".format(instruction), shell=True
            ).decode("utf-8"))

def load_label_map(labels_path):
    with open(labels_path, "r") as f:
        return json.load(f)["label_map"]
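The `infer` method resizes each frame to the network's input size and changes the data layout from HWC (OpenCV's format) to CHW (the Inference Engine's expected format). A small numpy sketch of that transpose, with a hypothetical frame size:

```python
import numpy as np

# Hypothetical 480x640 BGR frame in OpenCV's HWC layout.
image = np.zeros((480, 640, 3), dtype=np.uint8)

# Same layout change as in ModelLoader.infer: HWC -> CHW.
chw = image.transpose((2, 0, 1))
print(chw.shape)  # (3, 480, 640)
```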
4 changes: 4 additions & 0 deletions cvat/apps/auto_annotation/models.py
@@ -0,0 +1,4 @@

# Copyright (C) 2018 Intel Corporation
#
# SPDX-License-Identifier: MIT