Support for ICDAR dataset format (#2866)

Co-authored-by: Maxim Zhiltsov <[email protected]>
cvat-ai · Mar 26, 2021 · commit efad0b0
1 parent 30bf11f

Showing 8 changed files with 382 additions and 14 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -30,6 +30,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [Market-1501](https://www.aitribune.com/dataset/2018051063) format support (<https://github.com/openvinotoolkit/cvat/pull/2869>)
 - Ability of upload manifest for dataset with images (<https://github.com/openvinotoolkit/cvat/pull/2763>)
 - Annotations filters UI using react-awesome-query-builder (https://github.com/openvinotoolkit/cvat/issues/1418)
+- [ICDAR](https://rrc.cvc.uab.es/?ch=2) format support (<https://github.com/openvinotoolkit/cvat/pull/2866>)
 
 ### Changed
 

diff --git a/README.md b/README.md
@@ -65,6 +65,7 @@ For more information about supported formats look at the
 | [WIDER Face](http://shuoyang1213.me/WIDERFACE/)                               | X      | X      |
 | [VGGFace2](https://github.com/ox-vgg/vgg_face2)                               | X      | X      |
 | [Market-1501](https://www.aitribune.com/dataset/2018051063)                   | X      | X      |
+| [ICDAR13/15](https://rrc.cvc.uab.es/?ch=2)                                    | X      | X      |
 
 ## Deep learning serverless functions for automatic labeling
 

diff --git a/cvat/apps/dataset_manager/formats/README.md b/cvat/apps/dataset_manager/formats/README.md
@@ -23,6 +23,7 @@
   - [WIDER Face](#widerface)
   - [VGGFace2](#vggface2)
   - [Market-1501](#market1501)
+  - [ICDAR13/15](#icdar)
 
 ## How to add a new annotation format support<a id="how-to-add"></a>
 
@@ -817,17 +818,17 @@ Downloaded file: a zip archive of the following structure:
 ```bash
 # if we save images:
 taskname.zip/
-└── label1/
-    ├── label1_image1.jpg
-    └── label1_image2.jpg
+├── label1/
+|   ├── label1_image1.jpg
+|   └── label1_image2.jpg
 └── label2/
     ├── label2_image1.jpg
     ├── label2_image3.jpg
     └── label2_image4.jpg
 
 # if we keep only annotation:
 taskname.zip/
-└── <any_subset_name>.txt
+├── <any_subset_name>.txt
 └── synsets.txt
 
 ```
@@ -849,12 +850,12 @@ Downloaded file: a zip archive of the following structure:
 ```bash
 taskname.zip/
 ├── labelmap.txt # optional, required for non-CamVid labels
-└── <any_subset_name>/
-    ├── image1.png
-    └── image2.png
-└── <any_subset_name>annot/
-    ├── image1.png
-    └── image2.png
+├── <any_subset_name>/
+|   ├── image1.png
+|   └── image2.png
+├── <any_subset_name>annot/
+|   ├── image1.png
+|   └── image2.png
 └── <any_subset_name>.txt
 
 # labelmap.txt
@@ -974,3 +975,72 @@ s1 - sequence
 Uploaded file: a zip archive of the structure above
 
 - supported annotations: Label `market-1501` with attributes (`query`, `person_id`, `camera_id`)
+
+### [ICDAR13/15](https://rrc.cvc.uab.es/?ch=2)<a id="icdar" />
+
+#### ICDAR13/15 Dumper
+
+Downloaded file: a zip archive of the following structure:
+
+```bash
+# word recognition task
+taskname.zip/
+└── word_recognition/
+    └── <any_subset_name>/
+        ├── images/
+        |   ├── word1.png
+        |   └── word2.png
+        └── gt.txt
+# text localization task
+taskname.zip/
+└── text_localization/
+    └── <any_subset_name>/
+        ├── images/
+        |   ├── img_1.png
+        |   └── img_2.png
+        ├── gt_img_1.txt
+        └── gt_img_2.txt
+# text segmentation task
+taskname.zip/
+└── text_segmentation/
+    └── <any_subset_name>/
+        ├── images/
+        |   ├── 1.png
+        |   └── 2.png
+        ├── 1_GT.bmp
+        ├── 1_GT.txt
+        ├── 2_GT.bmp
+        └── 2_GT.txt
+```
+
+**Word recognition task**:
+
+- supported annotations: Label `icdar` with attribute `text`
+
+**Text localization task**:
+
+- supported annotations: Rectangles and Polygons with label `icdar`
+  and attribute `text`
+
+**Text segmentation task**:
+
+- supported annotations: Rectangles and Polygons with label `icdar`
+  and attributes `index`, `text`, `color`, `center`
+
+#### ICDAR13/15 Loader
+
+Uploaded file: a zip archive of the structure above
+
+**Word recognition task**:
+
+- supported annotations: Label `icdar` with attribute `text`
+
+**Text localization task**:
+
+- supported annotations: Rectangles and Polygons with label `icdar`
+  and attribute `text`
+
+**Text segmentation task**:
+
+- supported annotations: Rectangles and Polygons with label `icdar`
+  and attributes `index`, `text`, `color`, `center`
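The Localization loader maps each text region to a polygon or rectangle with label `icdar` and a `text` attribute. As a rough illustration of what such a region looks like on disk, here is a minimal sketch that assumes the ICDAR 2015 line layout `x1,y1,x2,y2,x3,y3,x4,y4,transcription` in each `gt_img_N.txt` file. `TextRegion`, `parse_icdar15_line` and `parse_gt_file` are hypothetical names; this is not how Datumaro's `icdar_text_localization` importer is implemented, and it does not handle the 4-value box lines used by ICDAR 2013.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextRegion:
    # Quadrilateral vertices in file order: (x1, y1) ... (x4, y4)
    points: List[Tuple[float, float]]
    # Transcription; in CVAT this would become the `text` attribute
    text: str


def parse_icdar15_line(line: str) -> TextRegion:
    """Parse one 'x1,y1,x2,y2,x3,y3,x4,y4,transcription' line (assumed layout)."""
    # Drop a leading UTF-8 BOM and the line ending, then split at most 8 times
    # so commas inside the transcription are kept intact.
    fields = line.lstrip("\ufeff").rstrip("\r\n").split(",", 8)
    if len(fields) != 9:
        raise ValueError(f"expected 8 coordinates and a transcription: {line!r}")
    coords = [float(v) for v in fields[:8]]
    # Pair even-indexed values (x) with odd-indexed values (y)
    points = list(zip(coords[0::2], coords[1::2]))
    return TextRegion(points=points, text=fields[8])


def parse_gt_file(content: str) -> List[TextRegion]:
    """Parse every non-blank line of a gt_img_N.txt file."""
    return [parse_icdar15_line(ln) for ln in content.splitlines() if ln.strip()]


if __name__ == "__main__":
    region = parse_icdar15_line("377,117,463,117,465,130,378,130,Genaxis Theatre")
    print(region.points)  # [(377.0, 117.0), (463.0, 117.0), (465.0, 130.0), (378.0, 130.0)]
    print(region.text)    # Genaxis Theatre
```

Splitting with `maxsplit=8` is the key design choice: the transcription is the only free-text field, so everything after the eighth comma belongs to it.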
diff --git a/cvat/apps/dataset_manager/formats/icdar.py b/cvat/apps/dataset_manager/formats/icdar.py
@@ -0,0 +1,131 @@
+# Copyright (C) 2021 Intel Corporation
+#
+# SPDX-License-Identifier: MIT
+
+import zipfile
+from tempfile import TemporaryDirectory
+
+from datumaro.components.dataset import Dataset
+from datumaro.components.extractor import (AnnotationType, Caption, Label,
+    LabelCategories, Transform)
+
+from cvat.apps.dataset_manager.bindings import (CvatTaskDataExtractor,
+    import_dm_annotations)
+from cvat.apps.dataset_manager.util import make_zip_archive
+
+from .registry import dm_env, exporter, importer
+
+
+class AddLabelToAnns(Transform):
+    def __init__(self, extractor, label):
+        super().__init__(extractor)
+
+        assert isinstance(label, str)
+        self._categories = {}
+        label_cat = self._extractor.categories().get(AnnotationType.label)
+        if not label_cat:
+            label_cat = LabelCategories()
+        self._label = label_cat.add(label)
+        self._categories[AnnotationType.label] = label_cat
+
+    def categories(self):
+        return self._categories
+
+    def transform_item(self, item):
+        annotations = item.annotations
+        for ann in annotations:
+            if ann.type in [AnnotationType.polygon,
+                    AnnotationType.bbox, AnnotationType.mask]:
+                ann.label = self._label
+        return item.wrap(annotations=annotations)
+
+class CaptionToLabel(Transform):
+    def __init__(self, extractor, label):
+        super().__init__(extractor)
+
+        assert isinstance(label, str)
+        self._categories = {}
+        label_cat = self._extractor.categories().get(AnnotationType.label)
+        if not label_cat:
+            label_cat = LabelCategories()
+        self._label = label_cat.add(label)
+        self._categories[AnnotationType.label] = label_cat
+
+    def categories(self):
+        return self._categories
+
+    def transform_item(self, item):
+        annotations = list(item.annotations)  # copy: don't mutate the source item
+        captions = [ann for ann in annotations
+            if ann.type == AnnotationType.caption]
+        for ann in captions:
+            annotations.append(Label(self._label,
+                attributes={'text': ann.caption}))
+            annotations.remove(ann)
+        return item.wrap(annotations=annotations)
+
+class LabelToCaption(Transform):
+    def transform_item(self, item):
+        annotations = list(item.annotations)  # copy: don't mutate the source item
+        anns = [p for p in annotations
+            if 'text' in p.attributes]
+        for ann in anns:
+            annotations.append(Caption(ann.attributes['text']))
+            annotations.remove(ann)
+        return item.wrap(annotations=annotations)
+
+@exporter(name='ICDAR Recognition', ext='ZIP', version='1.0')
+def _export_recognition(dst_file, task_data, save_images=False):
+    dataset = Dataset.from_extractors(CvatTaskDataExtractor(
+        task_data, include_images=save_images), env=dm_env)
+    dataset.transform(LabelToCaption)
+    with TemporaryDirectory() as temp_dir:
+        dataset.export(temp_dir, 'icdar_word_recognition', save_images=save_images)
+        make_zip_archive(temp_dir, dst_file)
+
+@importer(name='ICDAR Recognition', ext='ZIP', version='1.0')
+def _import_recognition(src_file, task_data):
+    with TemporaryDirectory() as tmp_dir:
+        zipfile.ZipFile(src_file).extractall(tmp_dir)
+        dataset = Dataset.import_from(tmp_dir, 'icdar_word_recognition', env=dm_env)
+        dataset.transform(CaptionToLabel, 'icdar')
+        import_dm_annotations(dataset, task_data)
+
+
+@exporter(name='ICDAR Localization', ext='ZIP', version='1.0')
+def _export_localization(dst_file, task_data, save_images=False):
+    dataset = Dataset.from_extractors(CvatTaskDataExtractor(
+        task_data, include_images=save_images), env=dm_env)
+    with TemporaryDirectory() as temp_dir:
+        dataset.export(temp_dir, 'icdar_text_localization', save_images=save_images)
+        make_zip_archive(temp_dir, dst_file)
+
+@importer(name='ICDAR Localization', ext='ZIP', version='1.0')
+def _import_localization(src_file, task_data):
+    with TemporaryDirectory() as tmp_dir:
+        zipfile.ZipFile(src_file).extractall(tmp_dir)
+
+        dataset = Dataset.import_from(tmp_dir, 'icdar_text_localization', env=dm_env)
+        dataset.transform(AddLabelToAnns, 'icdar')
+        import_dm_annotations(dataset, task_data)
+
+
+@exporter(name='ICDAR Segmentation', ext='ZIP', version='1.0')
+def _export_segmentation(dst_file, task_data, save_images=False):
+    dataset = Dataset.from_extractors(CvatTaskDataExtractor(
+        task_data, include_images=save_images), env=dm_env)
+    with TemporaryDirectory() as temp_dir:
+        dataset.transform('polygons_to_masks')
+        dataset.transform('boxes_to_masks')
+        dataset.transform('merge_instance_segments')
+        dataset.export(temp_dir, 'icdar_text_segmentation', save_images=save_images)
+        make_zip_archive(temp_dir, dst_file)
+
+@importer(name='ICDAR Segmentation', ext='ZIP', version='1.0')
+def _import_segmentation(src_file, task_data):
+    with TemporaryDirectory() as tmp_dir:
+        zipfile.ZipFile(src_file).extractall(tmp_dir)
+        dataset = Dataset.import_from(tmp_dir, 'icdar_text_segmentation', env=dm_env)
+        dataset.transform(AddLabelToAnns, 'icdar')
+        dataset.transform('masks_to_polygons')
+        import_dm_annotations(dataset, task_data)
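`CaptionToLabel` and `LabelToCaption` translate between Datumaro's word-recognition captions and CVAT tags: on import each caption becomes a tag of label `icdar` whose `text` attribute holds the word, and on export an annotation carrying a `text` attribute becomes a caption again. The sketch below shows that round trip without Datumaro, using hypothetical stand-in `Caption` and `Label` dataclasses (not the Datumaro classes). It builds new lists so the input is never modified, and it only converts `Label` objects on export, whereas `LabelToCaption` checks every annotation type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Caption:
    # Hypothetical stand-in for a word-recognition caption
    caption: str


@dataclass
class Label:
    # Hypothetical stand-in for a CVAT tag: label index plus attributes
    label: int
    attributes: Dict[str, str] = field(default_factory=dict)


Annotation = Union[Caption, Label]


def captions_to_labels(anns: List[Annotation], label_id: int) -> List[Annotation]:
    """Import direction: each Caption becomes a Label whose 'text' holds the word."""
    return [Label(label_id, {"text": a.caption}) if isinstance(a, Caption) else a
            for a in anns]


def labels_to_captions(anns: List[Annotation]) -> List[Annotation]:
    """Export direction: each Label with a 'text' attribute becomes a Caption."""
    return [Caption(a.attributes["text"])
            if isinstance(a, Label) and "text" in a.attributes else a
            for a in anns]


if __name__ == "__main__":
    words = [Caption("Tiredness"), Caption("kills")]
    tags = captions_to_labels(words, label_id=0)
    print(tags)
    print(labels_to_captions(tags) == words)  # True: the round trip is lossless
```

Because both directions agree on the attribute name `text`, a task exported as ICDAR Recognition and re-imported keeps its words unchanged.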
diff --git a/cvat/apps/dataset_manager/formats/registry.py b/cvat/apps/dataset_manager/formats/registry.py
@@ -98,3 +98,4 @@ def make_exporter(name):
 import cvat.apps.dataset_manager.formats.widerface
 import cvat.apps.dataset_manager.formats.vggface2
 import cvat.apps.dataset_manager.formats.market1501
+import cvat.apps.dataset_manager.formats.icdar
diff --git a/cvat/apps/dataset_manager/tests/test_formats.py b/cvat/apps/dataset_manager/tests/test_formats.py
@@ -285,6 +285,9 @@ def test_export_formats_query(self):
             'WiderFace 1.0',
             'VGGFace2 1.0',
             'Market-1501 1.0',
+            'ICDAR Recognition 1.0',
+            'ICDAR Localization 1.0',
+            'ICDAR Segmentation 1.0',
         })
 
     def test_import_formats_query(self):
@@ -306,6 +309,9 @@ def test_import_formats_query(self):
             'WiderFace 1.0',
             'VGGFace2 1.0',
             'Market-1501 1.0',
+            'ICDAR Recognition 1.0',
+            'ICDAR Localization 1.0',
+            'ICDAR Segmentation 1.0',
         })
 
     def test_exports(self):
@@ -319,7 +325,7 @@ def check(file_path):
 
             format_name = f.DISPLAY_NAME
             if format_name == "VGGFace2 1.0":
-                self.skipTest("Format does not support multiple shapes for one item")
+                self.skipTest("Format is disabled")
 
             for save_images in { True, False }:
                 images = self._generate_task_images(3)
@@ -349,6 +355,9 @@ def test_empty_images_are_exported(self):
             ('WiderFace 1.0', 'wider_face'),
             ('VGGFace2 1.0', 'vgg_face2'),
             ('Market-1501 1.0', 'market1501'),
+            ('ICDAR Recognition 1.0', 'icdar_word_recognition'),
+            ('ICDAR Localization 1.0', 'icdar_text_localization'),
+            ('ICDAR Segmentation 1.0', 'icdar_text_segmentation'),
         ]:
             with self.subTest(format=format_name):
                 if not dm.formats.registry.EXPORT_FORMATS[format_name].ENABLED: