Update ScanNet #21

Merged 3 commits on Jul 21, 2021
README.md (36 changes: 4 additions & 32 deletions)

@@ -4,6 +4,7 @@
# ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

**News**:
* :fire: July, 2021. We update `ScanNet` image preprocessing both [here](https://github.com/saic-vul/imvoxelnet/pull/21) and in [mmdetection3d](https://github.com/open-mmlab/mmdetection3d/pull/696).
* :fire: June, 2021. `ImVoxelNet` for `KITTI` is now [supported](https://github.com/open-mmlab/mmdetection3d/tree/master/configs/imvoxelnet) in [mmdetection3d](https://github.com/open-mmlab/mmdetection3d).

This repository contains the implementation of the monocular/multi-view 3D object detector ImVoxelNet, introduced in our paper:
@@ -38,7 +39,7 @@ We support three benchmarks based on the **SUN RGB-D** dataset.
you should follow the instructions in [sunrgbd](data/sunrgbd).
* For the [PerspectiveNet](https://papers.nips.cc/paper/2019/hash/b87517992f7dce71b674976b280257d2-Abstract.html)
benchmark with 30 object categories, the same instructions can be applied;
you only need to pass `--dataset sunrgbd_monocular` when running `create_data.py`.
you only need to set the `dataset` argument to `sunrgbd_monocular` when running `create_data.py` (a sketch of this invocation follows the list).
* The [Total3DUnderstanding](https://github.com/yinyunie/Total3DUnderstanding)
benchmark implies detecting objects of 37 categories along with camera pose and room layout estimation.
Download the preprocessed data as
@@ -49,38 +50,9 @@ We support three benchmarks based on the **SUN RGB-D** dataset.
python tools/data_converter/sunrgbd_total.py
```
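
A minimal sketch of that invocation, assuming the `--root-path`/`--out-dir`/`--extra-tag` conventions of the ScanNet command in [data/scannet](data/scannet) (the positional dataset name is our assumption, not confirmed by this PR):

```bash
python tools/create_data.py sunrgbd --root-path ./data/sunrgbd \
    --out-dir ./data/sunrgbd --extra-tag sunrgbd --dataset sunrgbd_monocular
```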

**ScanNet.** Please follow the instructions in [scannet](data/scannet).
Note that `create_data.py` works with point clouds, not RGB images; thus, you should do some preprocessing before running `create_data.py`.
1. First, you should obtain RGB images. We recommend using a script from [SensReader](https://github.com/ScanNet/ScanNet/tree/master/SensReader/python).
2. Then, copy the camera pose `.txt` files and `.jpg` images to the `scannet/sens_reader` folder.
3. Copy axis alignment matrix `.txt` files to the `scannet/txts` folder.
4. Move the results of `batch_load_scannet_data.py` to the `scannet/mmdetection3d` folder. Final directory structure:
```
scannet
├── sens_reader
│ ├── scans
│ │ ├── scene0000_00
│ │ │ ├── out
│ │ │ │ ├── frame-000001.color.jpg
│ │ │ │ ├── frame-000001.pose.txt
│ │ │ │ ├── frame-000002.color.jpg
│ │ │ │ ├── ...
│ │ ├── ...
├── txts
│ ├── scene0000_00.txt
│ ├── ...
├── mmdetection3d
│ ├── scene0000_00_bbox.npy
│ ├── scene0000_00_ins_label.npy
│ ├── scene0000_00_sem_label.npy
│ ├── scene0000_00_vert.npy
│ ├── scene0000_01_bbox.npy
│ ├── ...
```
Now, you may run `create_data.py` with `--dataset scannet_monocular`.

For **ScanNet**, please follow the instructions in [scannet](data/scannet).
For **KITTI** and **nuScenes**, please follow the instructions in [getting_started.md](docs/getting_started.md).
For `nuScenes`, set `--dataset nuscenes_monocular`.
For `nuScenes`, set the `dataset` argument to `nuscenes_monocular`.
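
By analogy, a hedged `nuScenes` example (the dataset name and paths are illustrative, not confirmed by this PR):

```bash
python tools/create_data.py nuscenes --root-path ./data/nuscenes \
    --out-dir ./data/nuscenes --extra-tag nuscenes --dataset nuscenes_monocular
```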

### Getting Started

data/scannet/README.md (32 changes: 27 additions & 5 deletions)

@@ -1,23 +1,30 @@
### Prepare ScanNet Data
### Prepare ScanNet Data for Indoor Detection or Segmentation Task

We follow the procedure in [votenet](https://github.com/facebookresearch/votenet/).

1. Download ScanNet v2 data [HERE](https://github.com/ScanNet/ScanNet). Link or move the 'scans' folder to this level of the directory.
1. Download ScanNet v2 data [HERE](https://github.com/ScanNet/ScanNet). Link or move the 'scans' folder to this level of the directory. If you are performing segmentation tasks and want to upload the results to the official [benchmark](http://kaldir.vc.in.tum.de/scannet_benchmark/), please also link or move the 'scans_test' folder to this directory.

2. In this directory, extract point clouds and annotations by running `python batch_load_scannet_data.py`. Add the `--max_num_point 50000` flag if you only use the ScanNet data for the detection task; it will downsample the scenes to fewer points.

3. In this directory, extract RGB images with poses by running `python extract_posed_images.py`. This step is optional; skip it if you don't plan to use multi-view RGB images. Add `--max-images-per-scene -1` to disable the per-scene image limit. ScanNet scenes contain up to 5000+ frames each, and extracting all the .jpg images requires about 2 TB of disk space, while the recommended 300 images per scene require less than 100 GB. For example, the multi-view 3D detector ImVoxelNet samples 50 and 100 images per training and test scene, respectively.

2. In this directory, extract point clouds and annotations by running `python batch_load_scannet_data.py`.
4. Enter the project root directory and generate training data by running

3. Enter the project root directory and generate training data by running
```bash
python tools/create_data.py scannet --root-path ./data/scannet --out-dir ./data/scannet --extra-tag scannet
```

The overall process can be achieved with the following script:

```bash
python batch_load_scannet_data.py
python extract_posed_images.py
cd ../..
python tools/create_data.py scannet --root-path ./data/scannet --out-dir ./data/scannet --extra-tag scannet
```
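
If you only need the detection task and want to cap disk usage, the same pipeline can be run with the optional flags from steps 2 and 3 (a sketch; the values are the ones recommended above):

```bash
# downsample each scene's point cloud for the detection-only setup
python batch_load_scannet_data.py --max_num_point 50000
# keep the recommended 300 posed images per scene; pass -1 to keep all frames
python extract_posed_images.py --max-images-per-scene 300
cd ../..
python tools/create_data.py scannet --root-path ./data/scannet --out-dir ./data/scannet --extra-tag scannet
```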

The directory structure after pre-processing should be as follows:

```
scannet
├── scannet_utils.py
@@ -26,11 +33,26 @@ scannet
├── scannet_utils.py
├── README.md
├── scans
├── scannet_train_instance_data
├── scans_test
├── scannet_instance_data
├── points
│ ├── xxxxx.bin
├── instance_mask
│ ├── xxxxx.bin
├── semantic_mask
│ ├── xxxxx.bin
├── seg_info
│ ├── train_label_weight.npy
│ ├── train_resampled_scene_idxs.npy
│ ├── val_label_weight.npy
│ ├── val_resampled_scene_idxs.npy
├── posed_images
│ ├── scenexxxx_xx
│ │ ├── xxxxxx.txt
│ │ ├── xxxxxx.jpg
│ │ ├── intrinsic.txt
├── scannet_infos_train.pkl
├── scannet_infos_val.pkl
├── scannet_infos_test.pkl

```
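
As a quick sanity check after preprocessing, the generated info files can be inspected with a one-liner (a sketch; the exact keys depend on the converter version):

```bash
# each *_infos_*.pkl is expected to hold one metadata entry per scene
python -c "import pickle; infos = pickle.load(open('data/scannet/scannet_infos_train.pkl', 'rb')); print(len(infos), sorted(infos[0]))"
```
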
data/scannet/batch_load_scannet_data.py (124 changes: 83 additions & 41 deletions)

@@ -16,58 +16,81 @@
from load_scannet_data import export
from os import path as osp

SCANNET_DIR = 'scans'
DONOTCARE_CLASS_IDS = np.array([])
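# OBJ_CLASS_IDS are NYU40 semantic ids covering the 18 ScanNet benchmark
# detection categories (cabinet, bed, chair, sofa, table, door, ...).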
OBJ_CLASS_IDS = np.array(
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 24, 28, 33, 34, 36, 39])


def export_one_scan(scan_name, output_filename_prefix, max_num_point,
label_map_file, scannet_dir):
def export_one_scan(scan_name,
output_filename_prefix,
max_num_point,
label_map_file,
scannet_dir,
test_mode=False):
mesh_file = osp.join(scannet_dir, scan_name, scan_name + '_vh_clean_2.ply')
agg_file = osp.join(scannet_dir, scan_name,
scan_name + '.aggregation.json')
seg_file = osp.join(scannet_dir, scan_name,
scan_name + '_vh_clean_2.0.010000.segs.json')
# includes axisAlignment info for the train set scans.
meta_file = osp.join(scannet_dir, scan_name, f'{scan_name}.txt')
mesh_vertices, semantic_labels, instance_labels, instance_bboxes, \
instance2semantic = export(mesh_file, agg_file, seg_file,
meta_file, label_map_file, None)

mask = np.logical_not(np.in1d(semantic_labels, DONOTCARE_CLASS_IDS))
mesh_vertices = mesh_vertices[mask, :]
semantic_labels = semantic_labels[mask]
instance_labels = instance_labels[mask]

num_instances = len(np.unique(instance_labels))
print(f'Num of instances: {num_instances}')

bbox_mask = np.in1d(instance_bboxes[:, -1], OBJ_CLASS_IDS)
instance_bboxes = instance_bboxes[bbox_mask, :]
print(f'Num of care instances: {instance_bboxes.shape[0]}')

N = mesh_vertices.shape[0]
if N > max_num_point:
choices = np.random.choice(N, max_num_point, replace=False)
mesh_vertices = mesh_vertices[choices, :]
semantic_labels = semantic_labels[choices]
instance_labels = instance_labels[choices]
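    # The updated export() also returns axis-aligned boxes and the 4x4
    # axis-align matrix; test scans have no annotations, so every
    # label-dependent step below is guarded by test_mode.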
mesh_vertices, semantic_labels, instance_labels, unaligned_bboxes, \
aligned_bboxes, instance2semantic, axis_align_matrix = export(
mesh_file, agg_file, seg_file, meta_file, label_map_file, None,
test_mode)

if not test_mode:
mask = np.logical_not(np.in1d(semantic_labels, DONOTCARE_CLASS_IDS))
mesh_vertices = mesh_vertices[mask, :]
semantic_labels = semantic_labels[mask]
instance_labels = instance_labels[mask]

num_instances = len(np.unique(instance_labels))
print(f'Num of instances: {num_instances}')

bbox_mask = np.in1d(unaligned_bboxes[:, -1], OBJ_CLASS_IDS)
unaligned_bboxes = unaligned_bboxes[bbox_mask, :]
bbox_mask = np.in1d(aligned_bboxes[:, -1], OBJ_CLASS_IDS)
aligned_bboxes = aligned_bboxes[bbox_mask, :]
assert unaligned_bboxes.shape[0] == aligned_bboxes.shape[0]
print(f'Num of care instances: {unaligned_bboxes.shape[0]}')

if max_num_point is not None:
max_num_point = int(max_num_point)
N = mesh_vertices.shape[0]
if N > max_num_point:
choices = np.random.choice(N, max_num_point, replace=False)
mesh_vertices = mesh_vertices[choices, :]
if not test_mode:
semantic_labels = semantic_labels[choices]
instance_labels = instance_labels[choices]

np.save(f'{output_filename_prefix}_vert.npy', mesh_vertices)
np.save(f'{output_filename_prefix}_sem_label.npy', semantic_labels)
np.save(f'{output_filename_prefix}_ins_label.npy', instance_labels)
np.save(f'{output_filename_prefix}_bbox.npy', instance_bboxes)


def batch_export(max_num_point, output_folder, train_scan_names_file,
label_map_file, scannet_dir):
if not test_mode:
np.save(f'{output_filename_prefix}_sem_label.npy', semantic_labels)
np.save(f'{output_filename_prefix}_ins_label.npy', instance_labels)
np.save(f'{output_filename_prefix}_unaligned_bbox.npy',
unaligned_bboxes)
np.save(f'{output_filename_prefix}_aligned_bbox.npy', aligned_bboxes)
np.save(f'{output_filename_prefix}_axis_align_matrix.npy',
axis_align_matrix)


def batch_export(max_num_point,
output_folder,
scan_names_file,
label_map_file,
scannet_dir,
test_mode=False):
if test_mode and not os.path.exists(scannet_dir):
# test data preparation is optional
return
if not os.path.exists(output_folder):
print(f'Creating new data folder: {output_folder}')
os.mkdir(output_folder)

train_scan_names = [line.rstrip() for line in open(train_scan_names_file)]
for scan_name in train_scan_names:
scan_names = [line.rstrip() for line in open(scan_names_file)]
for scan_name in scan_names:
print('-' * 20 + 'begin')
print(datetime.datetime.now())
print(scan_name)
@@ -78,7 +101,7 @@ def batch_export(max_num_point, output_folder, train_scan_names_file,
continue
try:
export_one_scan(scan_name, output_filename_prefix, max_num_point,
label_map_file, scannet_dir)
label_map_file, scannet_dir, test_mode)
except Exception:
print(f'Failed to export scan: {scan_name}')
print('-' * 20 + 'done')
@@ -88,14 +111,18 @@ def main():
parser = argparse.ArgumentParser()
parser.add_argument(
'--max_num_point',
default=50000,
default=None,
help='The maximum number of points.')
parser.add_argument(
'--output_folder',
default='./scannet_train_instance_data',
default='./scannet_instance_data',
help='output folder of the result.')
parser.add_argument(
'--scannet_dir', default='scans', help='scannet data directory.')
'--train_scannet_dir', default='scans', help='scannet train data directory.')
parser.add_argument(
'--test_scannet_dir',
default='scans_test',
help='scannet test data directory.')
parser.add_argument(
'--label_map_file',
default='meta_data/scannetv2-labels.combined.tsv',
@@ -104,10 +131,25 @@
'--train_scan_names_file',
default='meta_data/scannet_train.txt',
help='The path of the file that stores the train scan names.')
parser.add_argument(
'--test_scan_names_file',
default='meta_data/scannetv2_test.txt',
help='The path of the file that stores the test scan names.')
args = parser.parse_args()
batch_export(args.max_num_point, args.output_folder,
args.train_scan_names_file, args.label_map_file,
args.scannet_dir)
batch_export(
args.max_num_point,
args.output_folder,
args.train_scan_names_file,
args.label_map_file,
args.train_scannet_dir,
test_mode=False)
batch_export(
args.max_num_point,
args.output_folder,
args.test_scan_names_file,
args.label_map_file,
args.test_scannet_dir,
test_mode=True)


if __name__ == '__main__':
    main()
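
With the new arguments, a possible invocation covering both train and test splits (a sketch; the paths are the defaults from `main()`, and `--max_num_point 50000` is the value recommended for detection in data/scannet/README.md):

```bash
python batch_load_scannet_data.py \
    --max_num_point 50000 \
    --train_scannet_dir scans \
    --test_scannet_dir scans_test \
    --label_map_file meta_data/scannetv2-labels.combined.tsv
```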