XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation

Created by Ziyi Wang*, Yanbo Wang*, Xumin Yu, Jie Zhou, Jiwen Lu.

This repository is a PyTorch implementation of our NeurIPS 2024 paper XMask3D.

XMask3D is a framework for open vocabulary 3D semantic segmentation that improves fine-grained boundary delineation by aligning 3D features with a 2D-text embedding space at the mask level. Using a mask generator based on a pre-trained diffusion model, it enables precise textual control over dense pixel representations, enhancing the versatility of generated masks. By integrating 3D global features into a 2D denoising UNet, XMask3D adds 3D geometry awareness to mask generation. The resulting 2D masks align 3D representations with vision-language features, yielding competitive segmentation performance across benchmarks.
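To make the idea of mask-level alignment concrete, here is a minimal sketch in plain Python. It is not the repository's implementation: the function names (`mask_pool`, `cosine`, `classify_mask`), the two-dimensional toy features, and the use of simple average pooling are all assumptions chosen for illustration. It pools per-point features inside a binary mask and picks the category whose text embedding is most similar to the pooled vector.

```python
import math


def mask_pool(features, mask):
    """Average the feature vectors of the points selected by a binary mask."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("mask selects no points")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_mask(features, mask, text_embeddings):
    """Return the best-matching category name and all similarity scores."""
    pooled = mask_pool(features, mask)
    scores = {name: cosine(pooled, emb) for name, emb in text_embeddings.items()}
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): four points with 2-D features, two category embeddings.
point_features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
mask = [1, 1, 0, 0]
text_embeddings = {"chair": [1.0, 0.0], "table": [0.0, 1.0]}

label, scores = classify_mask(point_features, mask, text_embeddings)
print(label)
```

In a real pipeline the features would come from a 3D backbone and the text embeddings from a vision-language model such as CLIP; the sketch only shows why a category can be assigned per mask rather than per point.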

[arXiv]

Installation

Follow installation.md to install all required packages before running training or evaluation.

Data Preparation

For convenience, a download link for the processed dataset is provided here. To download the dataset, run the command below.

sh scripts/download_datasets.sh

Pre-trained Model Preparation

For this project, you will need the pre-trained CLIP model and the Stable Diffusion model. Due to the instability of official network links, we provide alternative download options below:

# CLIP ViT-Large Patch14
cd /path/to/your/workspace
wget -O openai.tar.gz https://cloud.tsinghua.edu.cn/f/3890f1df1c5248a7a6e8/?dl=1
tar -xzvf openai.tar.gz
# Stable Diffusion v1.3 Checkpoint
wget -O sd_model.tar.gz https://cloud.tsinghua.edu.cn/f/8dce9b137f574e6eb57c/?dl=1
tar -xzvf sd_model.tar.gz

Usage

Training

sh run/train.sh --exp_dir=<EXPERIMENT_DIRECTORY> --config=<CONFIG_FILE>

For example, to train on the ScanNet B15N4 benchmark, run:

sh run/train.sh --exp_dir=out/exp_b15n4 --config=config/scannet/xmask3d_scannet_B15N4.yaml
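The benchmark names used in this README (B15N4, B12N7, and so on) appear to follow the common open-vocabulary convention in which B<k>N<m> denotes k base (seen) categories and m novel (unseen) categories. The README does not state this, so treat it as an assumption. Under that reading, a small hypothetical helper such as `parse_benchmark` can split a name into its parts:

```python
import re


def parse_benchmark(name):
    """Split a name like 'B15N4' into base and novel category counts.

    Assumes the B<base>N<novel> convention; this helper is illustrative
    and is not part of the repository.
    """
    match = re.fullmatch(r"B(\d+)N(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized benchmark name: {name!r}")
    base, novel = int(match.group(1)), int(match.group(2))
    return {"base": base, "novel": novel, "total": base + novel}


print(parse_benchmark("B15N4"))
```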

Resume

sh run/resume.sh --exp_dir=<EXPERIMENT_DIRECTORY> --config=<CONFIG_FILE>

For example, to resume training from the last checkpoint on the ScanNet B15N4 benchmark, run:

sh run/resume.sh --exp_dir=out/exp_b15n4 --config=config/scannet/xmask3d_scannet_B15N4.yaml

Inference

sh run/infer.sh --exp_dir=<EXPERIMENT_DIRECTORY> --config=<CONFIG_FILE> --ckpt_name=<CKPT_NAME>

For example, to run inference using the checkpoint b15n4.pth.tar on the ScanNet B15N4 benchmark, execute the following command:

sh run/infer.sh --exp_dir=out/exp_b15n4 --config=config/scannet/xmask3d_scannet_B15N4.yaml --ckpt_name=b15n4.pth.tar

Checkpoint

Benchmark	hIoU / mIoU_b / mIoU_n	Download Link
ScanNet B15N4	70.0 / 69.8 / 70.2	[Tsinghua Cloud] [Google]
ScanNet B12N7	61.7 / 70.2 / 55.1	[Tsinghua Cloud] [Google]
ScanNet B10N9	55.7 / 76.5 / 43.8	[Tsinghua Cloud] [Google]
ScanNet B170N30	18.0 / 27.8 / 13.3	[Tsinghua Cloud] [Google]
ScanNet B150N50	15.5 / 24.4 / 11.4	[Tsinghua Cloud] [Google]
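The three metrics in the table are related: the reported hIoU values are consistent with the harmonic mean of mIoU_b (base categories) and mIoU_n (novel categories). The README does not define hIoU, so the formula below is an assumption, and `harmonic_iou` is a hypothetical helper name. The sketch recomputes hIoU from the other two columns as a sanity check:

```python
def harmonic_iou(miou_base, miou_novel):
    """Harmonic mean of base and novel mIoU (assumed definition of hIoU)."""
    total = miou_base + miou_novel
    if total == 0:
        return 0.0
    return 2 * miou_base * miou_novel / total


# (hIoU, mIoU_b, mIoU_n) as reported in the table above.
reported = {
    "B15N4": (70.0, 69.8, 70.2),
    "B12N7": (61.7, 70.2, 55.1),
    "B10N9": (55.7, 76.5, 43.8),
    "B170N30": (18.0, 27.8, 13.3),
    "B150N50": (15.5, 24.4, 11.4),
}

for name, (h, b, n) in reported.items():
    # Print the recomputed value next to the reported one for comparison.
    print(name, round(harmonic_iou(b, n), 1), h)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a model scores a high hIoU only when it does well on both base and novel categories.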

Citation

If you find our work useful in your research, please consider citing:

@article{wang2024xmask3d,
  title={XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation},
  author={Wang, Ziyi and Wang, Yanbo and Yu, Xumin and Zhou, Jie and Lu, Jiwen},
  journal={arXiv preprint arXiv:2411.13243},
  year={2024}
}
