Gyeongjin Kang
·
Jisang Yoo
·
Jihyeon Park
·
Seungtae Nam
·
Hyeonsoo Im
·
Sangheon Shin
·
Sangpil Kim
·
Eunbyung Park
To get started, create a virtual environment using Python 3.10:
conda create -n selfsplat python=3.10 -y
conda activate selfsplat
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
cd src/model/encoder/croco/croco_backbone/curope
python setup.py build_ext --inplace
You can find pre-trained checkpoints here; put them in the pretrained directory.
For the CroCo pretrained model, download it from here and put CroCo_V2_ViTLarge_BaseDecoder.pth in the checkpoints directory.
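Before training or evaluation, it can help to confirm the downloaded checkpoints are where the code expects them. The helper below is a small sketch (not part of the repository); the paths come from the instructions above and the evaluation commands later in this README.

```python
from pathlib import Path

# Checkpoint paths assumed from this README's instructions.
EXPECTED = [
    "pretrained/re10k.ckpt",
    "pretrained/acid.ckpt",
    "pretrained/dl3dv.ckpt",
    "checkpoints/CroCo_V2_ViTLarge_BaseDecoder.pth",
]

def missing_checkpoints(root: str = ".") -> list:
    """Return the expected checkpoint paths that are absent under `root`."""
    return [p for p in EXPECTED if not (Path(root) / p).exists()]
```

Run `missing_checkpoints()` from the project root; an empty list means everything is in place.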
Our code uses the same training datasets as pixelSplat. Below we quote pixelSplat's detailed instructions for obtaining the datasets.
pixelSplat was trained using versions of the RealEstate10k and ACID datasets that were split into ~100 MB chunks for use on server cluster file systems. Small subsets of the Real Estate 10k and ACID datasets in this format can be found here. To use them, simply unzip them into a newly created datasets folder in the project root directory.
If you would like to convert downloaded versions of the Real Estate 10k and ACID datasets to our format, you can use the scripts here. Reach out to us (pixelSplat) if you want the full versions of our processed datasets, which are about 500 GB and 160 GB for Real Estate 10k and ACID respectively.
For the DL3DV dataset, we follow DepthSplat, which provides detailed instructions on preparing DL3DV. We only used the 3K and 4K subsets in training.
The main entry point is src/main.py. Call it via:
python3 -m src.main +experiment=re10k
This configuration requires a single GPU with 80 GB of VRAM (A100 or H100). To reduce memory usage, you can change the batch size as follows:
python3 -m src.main +experiment=re10k data_loader.train.batch_size=1
Our code supports multi-GPU training. The above batch size is the per-GPU batch size.
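Since the batch size above is per GPU, the effective (global) batch size scales with the number of GPUs. A trivial sketch of this relationship (the function name is illustrative; only data_loader.train.batch_size is an actual config key):

```python
# Effective batch size under multi-GPU data-parallel training:
# each GPU processes `per_gpu_batch_size` samples per step, so the
# global batch is the per-GPU batch times the GPU count.
def effective_batch_size(per_gpu_batch_size: int, num_gpus: int) -> int:
    return per_gpu_batch_size * num_gpus
```

For example, data_loader.train.batch_size=1 on 4 GPUs gives an effective batch size of 4.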
To render frames from an existing checkpoint, run the following:
# Real Estate 10k
python3 -m src.main +experiment=re10k mode=test checkpointing.load=pretrained/re10k.ckpt
# ACID
python3 -m src.main +experiment=acid mode=test checkpointing.load=pretrained/acid.ckpt
# DL3DV
python3 -m src.main +experiment=dl3dv mode=test checkpointing.load=pretrained/dl3dv.ckpt
Our extrinsics are OpenCV-style camera-to-world matrices. This means that +Z is the camera look vector, +X is the camera right vector, and -Y is the camera up vector. Our intrinsics are normalized, meaning that the first row is divided by image width, and the second row is divided by image height.
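The conventions above can be illustrated with a short NumPy sketch (the values are made up for demonstration). In an OpenCV-style camera-to-world matrix, the rotation columns are the camera axes expressed in world coordinates (+X right, +Y down, +Z look), and normalized intrinsics can be converted back to pixel units by scaling the first row by the image width and the second by the image height:

```python
import numpy as np

# Identity camera-to-world: the camera sits at the world origin,
# looking down world +Z, with +X right and +Y down (OpenCV style).
c2w = np.eye(4)

def denormalize_intrinsics(K_norm: np.ndarray, width: int, height: int) -> np.ndarray:
    """Convert normalized intrinsics to pixel-unit intrinsics."""
    K = K_norm.copy()
    K[0] *= width   # first row (fx, skew, cx) was divided by width
    K[1] *= height  # second row (0, fy, cy) was divided by height
    return K

# Example normalized intrinsics (illustrative values).
K_norm = np.array([[0.5, 0.0, 0.5],
                   [0.0, 0.9, 0.5],
                   [0.0, 0.0, 1.0]])
K_px = denormalize_intrinsics(K_norm, width=640, height=360)
```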
This project is built on top of several outstanding repositories: pixelSplat, MVSplat, DepthSplat, CoPoNeRF, CroCo, UniMatch, and gsplat. We thank the original authors for open-sourcing their excellent work.