English | 简体中文
Youquan Liu<sup>1,*</sup> · Lingdong Kong<sup>1,2,*</sup> · Jun Cen<sup>3</sup> · Runnan Chen<sup>4</sup> · Wenwei Zhang<sup>1,5</sup> · Liang Pan<sup>5</sup> · Kai Chen<sup>1</sup> · Ziwei Liu<sup>5</sup>

<sup>1</sup>Shanghai AI Laboratory&nbsp;&nbsp;<sup>2</sup>National University of Singapore&nbsp;&nbsp;<sup>3</sup>The Hong Kong University of Science and Technology&nbsp;&nbsp;<sup>4</sup>The University of Hong Kong&nbsp;&nbsp;<sup>5</sup>S-Lab, Nanyang Technological University
**Seal** is a versatile self-supervised learning framework capable of segmenting any automotive point clouds by leveraging off-the-shelf knowledge from vision foundation models (VFMs) and encouraging spatial and temporal consistency of that knowledge during representation learning.
- 🚀 **Scalability:** **Seal** directly distills knowledge from VFMs into point clouds, eliminating the need for annotations in either 2D or 3D during pretraining.
- ⚖️ **Consistency:** **Seal** enforces spatial and temporal relationships at both the camera-to-LiDAR and point-to-segment stages, facilitating cross-modal representation learning (a minimal sketch of the camera-to-LiDAR objective is given below).
- 🌈 **Generalizability:** **Seal** enables knowledge transfer in an off-the-shelf manner to downstream tasks involving diverse point clouds, including those from real/synthetic, low/high-resolution, large/small-scale, and clean/corrupted datasets.
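At the camera-to-LiDAR stage, the idea is a superpixel-driven contrastive objective: point features pooled within each VFM-derived superpoint are pulled toward the image features pooled within the corresponding superpixel. Below is a minimal PyTorch sketch of such a loss, not the exact implementation in this codebase; the pooling helper, feature shapes, and `temperature` value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def superpixel_contrastive_loss(point_feats, pixel_feats,
                                superpoint_ids, superpixel_ids,
                                temperature=0.07):
    """Sketch of a superpixel-driven contrastive distillation loss.

    point_feats:    (N, C) point embeddings from the 3D backbone.
    pixel_feats:    (M, C) pixel embeddings from the 2D (VFM-guided) branch.
    superpoint_ids: (N,) segment index per point, obtained by back-projecting
                    VFM superpixels onto the point cloud.
    superpixel_ids: (M,) segment index per pixel, aligned with superpoint_ids.
    """
    num_segments = int(superpoint_ids.max().item()) + 1

    def segment_mean(feats, ids):
        # Average-pool features inside each segment via index_add_
        # (segments missing from one modality stay zero in this sketch).
        sums = feats.new_zeros(num_segments, feats.size(1))
        counts = feats.new_zeros(num_segments)
        sums.index_add_(0, ids, feats)
        counts.index_add_(0, ids, torch.ones_like(ids, dtype=feats.dtype))
        return sums / counts.clamp(min=1).unsqueeze(1)

    q = F.normalize(segment_mean(point_feats, superpoint_ids), dim=1)
    k = F.normalize(segment_mean(pixel_feats, superpixel_ids), dim=1)

    # InfoNCE: each superpoint should match its own superpixel among all others.
    logits = q @ k.t() / temperature
    targets = torch.arange(num_segments, device=logits.device)
    return F.cross_entropy(logits, targets)
```

The point-to-segment stage applies a similar segment-level grouping across temporally adjacent scans to encourage temporal consistency; refer to the paper for the exact formulation.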
Demo 1 | Demo 2 | Demo 3 |
---|---|---|
Link | Link | Link |
- [2023.12] - We are hosting The RoboDrive Challenge at ICRA 2024. 🚙
- [2023.09] - **Seal** was selected as a ✨ spotlight ✨ at NeurIPS 2023.
- [2023.09] - **Seal** was accepted to NeurIPS 2023! 🎉
- [2023.07] - We release the code for generating semantic superpixels & superpoints with SLIC, SAM, and SEEM. More VFMs are on the way!
- [2023.06] - Our paper is available on arXiv; click here to check it out. Code will be available later!
- Installation
- Data Preparation
- Superpoint Generation
- Getting Started
- Main Result
- TODO List
- License
- Acknowledgement
- Citation
Please refer to INSTALL.md for the installation details.
nuScenes | SemanticKITTI | Waymo Open | ScribbleKITTI |
---|---|---|---|
RELLIS-3D | SemanticPOSS | SemanticSTF | DAPS-3D |
SynLiDAR | Synth4D | nuScenes-C | |
Please refer to DATA_PREPARE.md for details on preparing these datasets.
Raw Point Cloud | Semantic Superpoint | Groundtruth |
---|---|---|
Kindly refer to SUPERPOINT.md for details on generating the semantic superpixels & superpoints with vision foundation models.
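As a rough illustration of what this step involves, the sketch below runs scikit-image's SLIC on a camera image and lifts the segment labels onto the LiDAR points that project into it. `project_points_to_image` is a hypothetical helper (camera projection depends on each dataset's calibration), and the SLIC parameters are illustrative rather than the settings used in this codebase.

```python
import numpy as np
from skimage.io import imread
from skimage.segmentation import slic

def superpixels_to_superpoints(image_path, points, project_points_to_image):
    """Sketch: SLIC superpixels on a camera image, lifted to LiDAR superpoints.

    points: (N, 3) LiDAR points in the sensor frame.
    project_points_to_image: hypothetical helper returning (N, 2) pixel
        coordinates (u, v) and a boolean mask of points inside the image.
    """
    image = imread(image_path)

    # SLIC superpixels; n_segments and compactness are illustrative values.
    superpixels = slic(image, n_segments=150, compactness=10, start_label=0)

    # Transfer 2D segment labels to the 3D points that project into the image.
    uv, valid = project_points_to_image(points)
    superpoints = np.full(points.shape[0], -1, dtype=np.int64)  # -1 = unlabeled
    u = uv[valid, 0].astype(int)
    v = uv[valid, 1].astype(int)
    superpoints[valid] = superpixels[v, u]  # image indexing: row = v, col = u
    return superpixels, superpoints
```

Swapping SLIC for SAM or SEEM replaces the unsupervised segmentation with semantically coherent masks from a VFM, which is the setting this codebase targets.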
Kindly refer to GET_STARTED.md to learn more about the usage of this codebase.
Method | nuScenes (LP) | nuScenes (1%) | nuScenes (5%) | nuScenes (10%) | nuScenes (25%) | nuScenes (Full) | KITTI (1%) | Waymo (1%) | Synth4D (1%) |
---|---|---|---|---|---|---|---|---|---|
Random | 8.10 | 30.30 | 47.84 | 56.15 | 65.48 | 74.66 | 39.50 | 39.41 | 20.22 |
PointContrast | 21.90 | 32.50 | - | - | - | - | 41.10 | - | - |
DepthContrast | 22.10 | 31.70 | - | - | - | - | 41.50 | - | - |
PPKT | 35.90 | 37.80 | 53.74 | 60.25 | 67.14 | 74.52 | 44.00 | 47.60 | 61.10 |
SLidR | 38.80 | 38.30 | 52.49 | 59.84 | 66.91 | 74.79 | 44.60 | 47.12 | 63.10 |
ST-SLidR | 40.48 | 40.75 | 54.69 | 60.75 | 67.70 | 75.14 | 44.72 | 44.93 | - |
Seal 🦭 | 44.95 | 45.84 | 55.64 | 62.97 | 68.41 | 75.60 | 46.63 | 49.34 | 64.50 |
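Scores above are mIoU (%). "LP" denotes linear probing, where the pretrained 3D backbone is frozen and only a pointwise linear classifier is trained; the percentage columns instead fine-tune on the corresponding fraction of annotated scans. Below is a minimal PyTorch sketch of the linear-probing setup; the backbone interface, feature dimension, and class count are assumptions for illustration.

```python
import torch
import torch.nn as nn

def build_linear_probe(backbone, feat_dim=96, num_classes=16):
    """Freeze a pretrained 3D backbone; train only a pointwise linear head.

    feat_dim and num_classes are illustrative (e.g. 16 semantic classes for
    nuScenes LiDAR segmentation); real values depend on backbone and dataset.
    """
    for p in backbone.parameters():
        p.requires_grad = False  # linear probing: no backbone updates
    backbone.eval()

    head = nn.Linear(feat_dim, num_classes)  # pointwise classifier
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
    return head, optimizer

# Usage sketch: per-point logits from frozen features.
# with torch.no_grad():
#     feats = backbone(batch)   # (N, feat_dim) point features
# logits = head(feats)          # (N, num_classes)
```

Fine-tuning (the percentage columns) instead unfreezes the backbone and trains end-to-end on the given label fraction.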
Method | ScribbleKITTI (1%) | ScribbleKITTI (10%) | RELLIS-3D (1%) | RELLIS-3D (10%) | SemanticPOSS (Half) | SemanticPOSS (Full) | SemanticSTF (Half) | SemanticSTF (Full) | SynLiDAR (1%) | SynLiDAR (10%) | DAPS-3D (Half) | DAPS-3D (Full) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Random | 23.81 | 47.60 | 38.46 | 53.60 | 46.26 | 54.12 | 48.03 | 48.15 | 19.89 | 44.74 | 74.32 | 79.38 |
PPKT | 36.50 | 51.67 | 49.71 | 54.33 | 50.18 | 56.00 | 50.92 | 54.69 | 37.57 | 46.48 | 78.90 | 84.00 |
SLidR | 39.60 | 50.45 | 49.75 | 54.57 | 51.56 | 55.36 | 52.01 | 54.35 | 42.05 | 47.84 | 81.00 | 85.40 |
Seal 🦭 | 40.64 | 52.77 | 51.09 | 55.03 | 53.26 | 56.89 | 53.46 | 55.36 | 43.58 | 49.26 | 81.88 | 85.90 |
Init | Backbone | mCE ↓ | mRR ↑ | Fog | Wet Ground | Snow | Motion Blur | Beam Missing | Crosstalk | Incomplete Echo | Cross-Sensor |
---|---|---|---|---|---|---|---|---|---|---|---|
Random | PolarNet | 115.09 | 76.34 | 58.23 | 69.91 | 64.82 | 44.60 | 61.91 | 40.77 | 53.64 | 42.01 |
Random | CENet | 112.79 | 76.04 | 67.01 | 69.87 | 61.64 | 58.31 | 49.97 | 60.89 | 53.31 | 24.78 |
Random | WaffleIron | 106.73 | 72.78 | 56.07 | 73.93 | 49.59 | 59.46 | 65.19 | 33.12 | 61.51 | 44.01 |
Random | Cylinder3D | 105.56 | 78.08 | 61.42 | 71.02 | 58.40 | 56.02 | 64.15 | 45.36 | 59.97 | 43.03 |
Random | SPVCNN | 106.65 | 74.70 | 59.01 | 72.46 | 41.08 | 58.36 | 65.36 | 36.83 | 62.29 | 49.21 |
Random | MinkUNet | 112.20 | 72.57 | 62.96 | 70.65 | 55.48 | 51.71 | 62.01 | 31.56 | 59.64 | 39.41 |
PPKT | MinkUNet | 105.64 | 76.06 | 64.01 | 72.18 | 59.08 | 57.17 | 63.88 | 36.34 | 60.59 | 39.57 |
SLidR | MinkUNet | 106.08 | 75.99 | 65.41 | 72.31 | 56.01 | 56.07 | 62.87 | 41.94 | 61.16 | 38.90 |
Seal 🦭 | MinkUNet | 92.63 | 83.08 | 72.66 | 74.31 | 66.22 | 66.14 | 65.96 | 57.44 | 59.87 | 39.85 |
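Here, mCE (mean Corruption Error, lower is better) and mRR (mean Resilience Rate, higher is better) follow the Robo3D protocol: per-corruption error is normalized by a baseline model's error on the same corruption, and resilience measures how much of the clean-set score survives each corruption. A small sketch of the arithmetic, assuming scores are mIoU percentages:

```python
def mean_corruption_error(miou_corrupt, miou_baseline):
    """mCE: error under each corruption, normalized by a baseline model's
    error on the same corruption, averaged and scaled to percent.
    Both arguments are lists of per-corruption mIoU percentages."""
    ces = [(100.0 - m) / (100.0 - b)
           for m, b in zip(miou_corrupt, miou_baseline)]
    return 100.0 * sum(ces) / len(ces)

def mean_resilience_rate(miou_corrupt, miou_clean):
    """mRR: corrupted mIoU relative to the clean-set mIoU,
    averaged over corruptions and scaled to percent."""
    return 100.0 * sum(m / miou_clean for m in miou_corrupt) / len(miou_corrupt)
```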
- Initial release. 🚀
- Add license. See here for more details.
- Add video demos. 🎥
- Add installation details.
- Add data preparation details.
- Support semantic superpixel generation.
- Support semantic superpoint generation.
- Add evaluation details.
- Add training details.
If you find this work helpful, please consider citing our paper:
@inproceedings{liu2023segment,
title = {Segment Any Point Cloud Sequences by Distilling Vision Foundation Models},
author = {Liu, Youquan and Kong, Lingdong and Cen, Jun and Chen, Runnan and Zhang, Wenwei and Pan, Liang and Chen, Kai and Liu, Ziwei},
booktitle = {Advances in Neural Information Processing Systems},
year = {2023},
}
@misc{liu2023segment_any_point_cloud,
title = {The Segment Any Point Cloud Codebase},
author = {Liu, Youquan and Kong, Lingdong and Cen, Jun and Chen, Runnan and Zhang, Wenwei and Pan, Liang and Chen, Kai and Liu, Ziwei},
howpublished = {\url{https://github.com/youquanl/Segment-Any-Point-Cloud}},
year = {2023},
}
This work is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This work is developed based on the MMDetection3D codebase.
MMDetection3D is an open-source object detection toolbox based on PyTorch, aiming to be the next-generation platform for general 3D detection. It is part of the OpenMMLab project developed by MMLab.
Part of this codebase has been adapted from SLidR, Segment Anything, X-Decoder, OpenSeeD, Segment Everything Everywhere All at Once, LaserMix, and Robo3D.
❤️ We thank the authors of the above open-source repositories for their exceptional contributions!