A Faster Implementation of Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Please refer to Scan2Cap for the data preparation and setup details.
For submission to the Scan2Cap benchmark, run the following script to generate predictions:
python benchmark/predict.py --config outputs/XYZ_MULTIVIEW_NORMAL/VOTENET_SCAN2CAP/info.yaml --test_split test
Please compress benchmark_test.json as a .zip or .7z file and follow the instructions to upload your results.
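The compression step can be scripted. The snippet below is a minimal sketch using Python's standard zipfile module; the helper name compress_predictions and the location of benchmark_test.json are assumptions, so pass whatever path your run actually produced. It also assumes the JSON should sit at the root of the archive, which the benchmark instructions may or may not require.

```python
import zipfile
from pathlib import Path


def compress_predictions(json_path, zip_path=None):
    """Write json_path into a .zip archive and return the archive path.

    Hypothetical helper: not part of the Scan2Cap code base.
    """
    json_path = Path(json_path)
    # Default: benchmark_test.json -> benchmark_test.zip in the same folder.
    zip_path = Path(zip_path) if zip_path else json_path.with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname drops the directory part so the JSON sits at the archive root.
        zf.write(json_path, arcname=json_path.name)
    return zip_path
```

For example, compress_predictions("benchmark_test.json") writes benchmark_test.zip next to the input file.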
Before submitting test-set results to the official benchmark, you can also evaluate the performance on the val set. First, run the following script to generate the ground truths (GTs) for the val set:
python scripts/build_benchmark_gt.py --split val
NOTE: don't forget to change DATA_ROOT in scripts/build_benchmark_gt.py
Generate the predictions on the val set:
python benchmark/predict.py --config outputs/XYZ_MULTIVIEW_NORMAL/VOTENET_SCAN2CAP/info.yaml --test_split val
Evaluate the predictions on the val set:
python benchmark/eval.py --split val --path <path to predictions> --verbose
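The three val-set steps above can be chained so that a failure in one step stops the rest. The following is a rough sketch, not part of the repository: build_val_commands and run_val_pipeline are hypothetical helpers, and pred_path stands in for the prediction file path, which depends on where benchmark/predict.py writes its output in your setup. Only the script names and flags shown above are used.

```python
import subprocess
import sys

# Config path copied from the commands above; change it to match your run.
CONFIG = "outputs/XYZ_MULTIVIEW_NORMAL/VOTENET_SCAN2CAP/info.yaml"


def build_val_commands(pred_path, config=CONFIG):
    """Return the three val-set commands in the order they should run."""
    py = sys.executable  # run the scripts with the current interpreter
    return [
        [py, "scripts/build_benchmark_gt.py", "--split", "val"],
        [py, "benchmark/predict.py", "--config", config, "--test_split", "val"],
        [py, "benchmark/eval.py", "--split", "val", "--path", pred_path, "--verbose"],
    ]


def run_val_pipeline(pred_path, config=CONFIG):
    """Run each step in turn; check=True raises on the first failing step."""
    for cmd in build_val_commands(pred_path, config):
        subprocess.run(cmd, check=True)
```

Call run_val_pipeline from the repository root, since the script paths are relative.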
Run the following script to start end-to-end training of the Scan2Cap model using the multiview features and normals (for more training options, run scripts/train.py -h):
python scripts/train.py --config config/votenet_scan2cap.yaml
The trained model and the intermediate results will be dumped into outputs/<output_folder>. To evaluate the model (@0.5IoU), run the following script, changing <output_folder> accordingly. Note that the arguments must match the ones used for training:
python scripts/eval.py --config outputs/XYZ_MULTIVIEW_NORMAL/VOTENET_SCAN2CAP/info.yaml --eval_caption
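For readers unfamiliar with the @0.5IoU notation: a caption is only scored when its predicted box overlaps a ground-truth box with an intersection-over-union of at least 0.5. The sketch below illustrates that criterion for axis-aligned 3D boxes. It is an illustration under assumptions, not the evaluation code used by scripts/eval.py: the (cx, cy, cz, dx, dy, dz) box layout, the function names, and the inclusive threshold are all hypothetical choices.

```python
def box3d_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (cx, cy, cz, dx, dy, dz).

    Assumed layout: center coordinates followed by full side lengths.
    """
    inter = 1.0
    for axis in range(3):
        # Overlapping interval along this axis.
        lo = max(box_a[axis] - box_a[axis + 3] / 2, box_b[axis] - box_b[axis + 3] / 2)
        hi = min(box_a[axis] + box_a[axis + 3] / 2, box_b[axis] + box_b[axis + 3] / 2)
        # Clamp to zero when the boxes do not overlap on this axis.
        inter *= max(0.0, hi - lo)
    vol_a = box_a[3] * box_a[4] * box_a[5]
    vol_b = box_b[3] * box_b[4] * box_b[5]
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """True when the predicted box passes the IoU threshold (assumed inclusive)."""
    return box3d_iou(pred_box, gt_box) >= threshold
```

Two unit-offset boxes of side 2, such as (0, 0, 0, 2, 2, 2) and (1, 0, 0, 2, 2, 2), share a volume of 4 out of a union of 12, so their IoU is 1/3 and they would not count as a match at the 0.5 threshold.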
Evaluate the detection performance:
python scripts/eval.py --config outputs/XYZ_MULTIVIEW_NORMAL/VOTENET_SCAN2CAP/info.yaml --eval_detection
You can also evaluate the pretrained object detection backbone:
If you found our work helpful, please kindly cite our paper via:
@inproceedings{chen2021scan2cap,
title={Scan2Cap: Context-aware Dense Captioning in RGB-D Scans},
author={Chen, Zhenyu and Gholami, Ali and Nie{\ss}ner, Matthias and Chang, Angel X},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={3193--3203},
year={2021}
}
Scan2Cap is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Copyright (c) 2021 Dave Zhenyu Chen, Ali Gholami, Matthias Nießner, Angel X. Chang