
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting

Gyeongjin Kang · Jisang Yoo · Jihyeon Park · Seungtae Nam · Hyeonsoo Im · Sangheon Shin · Sangpil Kim · Eunbyung Park

Installation

To get started, create a virtual environment using Python 3.10:

conda create -n selfsplat python=3.10 -y
conda activate selfsplat
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
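
Optionally, you can confirm that the CUDA 11.8 build of PyTorch was picked up (a minimal sanity check, not part of the repository):

import torch

print(torch.__version__)          # should end in +cu118
print(torch.version.cuda)         # should print 11.8
print(torch.cuda.is_available())  # should print True on a CUDA machine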

Install RoPE2D

cd src/model/encoder/croco/croco_backbone/curope
python setup.py build_ext --inplace
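
If the build succeeded, the extension should be importable from its build directory. A minimal check, assuming the compiled module is named curope after its directory (verify against the actual build output):

# run from src/model/encoder/croco/croco_backbone/
import curope  # raises ImportError if build_ext did not produce the extension
print("RoPE2D CUDA extension loaded from:", curope.__file__)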

Acquiring Pre-trained Checkpoints and Datasets

You can find pre-trained checkpoints here; put them in the pretrained directory.

For the CroCo pretrained model, download it from here and put CroCo_V2_ViTLarge_BaseDecoder.pth in the checkpoints directory.
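
To verify a download, you can inspect a checkpoint's top-level keys (a minimal sketch; the path pretrained/re10k.ckpt matches the evaluation commands below):

import torch

ckpt = torch.load("pretrained/re10k.ckpt", map_location="cpu")
print(list(ckpt.keys()))  # Lightning-style checkpoints typically contain "state_dict", "epoch", etc.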

RealEstate10K and ACID

Our code uses the same training datasets as pixelSplat. Below we quote pixelSplat's detailed instructions on getting the datasets.

pixelSplat was trained using versions of the RealEstate10k and ACID datasets that were split into ~100 MB chunks for use on server cluster file systems. Small subsets of the Real Estate 10k and ACID datasets in this format can be found here. To use them, simply unzip them into a newly created datasets folder in the project root directory.
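
Each chunk is a torch-serialized file. As a rough sanity check after unzipping (a sketch assuming pixelSplat's chunk format of a list of per-scene records; the chunk filename is illustrative):

import torch

chunk = torch.load("datasets/re10k/test/000000.torch")
print(len(chunk))        # number of scenes stored in this chunk
print(chunk[0].keys())   # per-scene fields such as camera parameters and encoded images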

If you would like to convert downloaded versions of the Real Estate 10k and ACID datasets to our format, you can use the scripts here. Reach out to us (pixelSplat) if you want the full versions of our processed datasets, which are about 500 GB and 160 GB for Real Estate 10k and ACID respectively.

DL3DV

For the DL3DV dataset we follow DepthSplat, which provides detailed instructions on preparing it. We only used the 3K and 4K subsets for training.

Running the Code

Training

The main entry point is src/main.py. Call it via:

python3 -m src.main +experiment=re10k

This configuration requires a single GPU with 80 GB of VRAM (A100 or H100). To reduce memory usage, you can change the batch size as follows:

python3 -m src.main +experiment=re10k data_loader.train.batch_size=1

Our code supports multi-GPU training; the batch size above is the per-GPU batch size, so the effective batch size scales with the number of GPUs.

Evaluation

To render frames from an existing checkpoint, run the following:

# Real Estate 10k
python3 -m src.main +experiment=re10k mode=test checkpointing.load=pretrained/re10k.ckpt

# ACID
python3 -m src.main +experiment=acid mode=test checkpointing.load=pretrained/acid.ckpt

# DL3DV
python3 -m src.main +experiment=dl3dv mode=test checkpointing.load=pretrained/dl3dv.ckpt

Camera Conventions

Our extrinsics are OpenCV-style camera-to-world matrices. This means that +Z is the camera look vector, +X is the camera right vector, and -Y is the camera up vector. Our intrinsics are normalized, meaning that the first row is divided by image width, and the second row is divided by image height.
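
To make the convention concrete, here is a minimal sketch (illustrative values, not code from this repository) of an OpenCV-style camera-to-world extrinsic and a normalized intrinsic matrix:

import numpy as np

# Camera-to-world extrinsic: the columns of R are the camera's +X (right),
# +Y (down, since -Y is up), and +Z (look) axes in world coordinates.
R = np.eye(3)                    # camera axes aligned with world axes
t = np.array([0.0, 0.0, 0.0])    # camera center in world coordinates
extrinsic = np.eye(4)
extrinsic[:3, :3] = R
extrinsic[:3, 3] = t

# Normalized intrinsics: the first row (fx, 0, cx) is divided by image width,
# the second row (0, fy, cy) by image height.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0   # pixel-space values (illustrative)
W, H = 640, 480
intrinsic = np.array([
    [fx / W, 0.0,    cx / W],
    [0.0,    fy / H, cy / H],
    [0.0,    0.0,    1.0   ],
])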

Acknowledgements

This project is built on top of several outstanding repositories: pixelSplat, MVSplat, DepthSplat, CoPoNeRF, CroCo, UniMatch, and gsplat. We thank the original authors for open-sourcing their excellent work.
