Paper | Project Page | Video
Official implementation of LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors
Hanyang Yu, Xiaoxiao Long and Ping Tan.
Abstract: We aim to address sparse-view reconstruction of a 3D scene by leveraging priors from large-scale vision models. While recent advancements such as 3D Gaussian Splatting (3DGS) have demonstrated remarkable success in 3D reconstruction, these methods typically necessitate hundreds of input images that densely capture the underlying scene, making them time-consuming and impractical for real-world applications. However, sparse-view reconstruction is inherently ill-posed and under-constrained, often resulting in inferior and incomplete outcomes. This is due to issues such as failed initialization, overfitting to input images, and a lack of detail. To mitigate these challenges, we introduce LM-Gaussian, a method capable of generating high-quality reconstructions from a limited number of images. Specifically, we propose a robust initialization module that leverages stereo priors to aid in the recovery of camera poses and the reliable initialization of point clouds. Additionally, a diffusion-based refinement is iteratively applied to incorporate image diffusion priors into the Gaussian optimization process to preserve intricate scene details. Finally, we utilize video diffusion priors to further enhance the rendered images for realistic visual effects. Overall, our approach significantly reduces the data acquisition requirements compared to previous 3DGS methods. We validate the effectiveness of our framework through experiments on various public datasets, demonstrating its potential for high-quality 360-degree scene reconstruction.
- Updated the RaDe-GS component and accelerated the training process.
- Fixed some known bugs.
Our method takes unposed sparse images as inputs. For example, we select 8 images from the Horse Scene to cover a 360-degree view. Initially, we utilize a Background-Aware Depth-guided Initialization Module to generate dense point clouds and camera poses (see Section IV-B). These variables act as the initialization for the Gaussian kernels. Subsequently, in the Multi-modal Regularized Gaussian Reconstruction Module (see Section IV-C), we collectively optimize the Gaussian network through depth, normal, and virtual-view regularizations. After this stage, we train a Gaussian Repair model capable of enhancing Gaussian-rendered new view images. These improved images serve as guides for the training network, iteratively restoring Gaussian details (see Section IV-D). Finally, we employ a scene enhancement module to further enhance the rendered images for realistic visual effects (see Section IV-E).
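The overview mentions picking 8 images that cover a 360-degree view. As a minimal sketch of one way to do that, the helper below subsamples a dense capture at even intervals and copies the result into the `data/{dataset_name}/train/images/` layout used by this repository. The function names `select_evenly` and `copy_sparse_views` are hypothetical and not part of LM-Gaussian, and the sketch assumes that sorting the file names reproduces the capture order around the scene.

```python
from pathlib import Path
import shutil

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def select_evenly(items, k):
    """Pick k items spread evenly over a closed loop of len(items) frames."""
    if k <= 0:
        raise ValueError("k must be positive")
    n = len(items)
    if k >= n:
        return list(items)
    # Integer spacing on a loop: the last pick never duplicates the first.
    return [items[(i * n) // k] for i in range(k)]


def copy_sparse_views(capture_dir, dataset_name, k=8, data_root="data"):
    """Copy k evenly spaced frames into <data_root>/<dataset_name>/train/images/."""
    # Assumption: file names sort in the order the frames were captured.
    frames = sorted(
        p for p in Path(capture_dir).iterdir() if p.suffix.lower() in IMAGE_EXTS
    )
    out_dir = Path(data_root) / dataset_name / "train" / "images"
    out_dir.mkdir(parents=True, exist_ok=True)
    chosen = select_evenly(frames, k)
    for src in chosen:
        shutil.copy2(src, out_dir / src.name)
    return chosen
```

For example, `copy_sparse_views("raw/horse", "horse_8", k=8)` would fill `data/horse_8/train/images/` with every tenth frame of an 80-frame capture; `raw/horse` is a placeholder path.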
- Support 2D-GS
- Support Scaffold-GS
- Add incremental test-pose alignment module
- Support controlnet-tile-sdxl-1.0
git clone https://github.com/hanyangyu1021/LMGaussian.git --recursive
- create an environment (LM-Gaussian is tested with CUDA 11.8 and Python 3.10.12)
conda env create --file environment.yml
conda activate lmgaussian
- install submodules
pip install submodules/diff-gaussian-rasterization
pip install submodules/simple-knn/
pip install submodules/minLoRA
# tetra-nerf for Marching Tetrahedra
cd submodules/tetra-triangulation
conda install cmake
conda install conda-forge::gmp
conda install conda-forge::cgal
cmake .
# you can specify your own cuda path
# export CPATH=/usr/local/cuda-11.8/targets/x86_64-linux/include:$CPATH
make
pip install -e .
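If the repository was cloned without `--recursive`, the submodule folders exist but are empty and the `pip install` steps above fail. The following is a small sketch of a check for that case. It uses the four submodule paths named in the installation steps; the function name `missing_submodules` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Submodule folders referenced by the installation steps.
SUBMODULES = (
    "submodules/diff-gaussian-rasterization",
    "submodules/simple-knn",
    "submodules/minLoRA",
    "submodules/tetra-triangulation",
)


def missing_submodules(repo_root="."):
    """Return the submodule folders that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for rel in SUBMODULES:
        path = root / rel
        # An empty folder usually means the clone was not recursive.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_submodules()
    if missing:
        print("Missing or empty submodules:", ", ".join(missing))
        print("Try: git submodule update --init --recursive")
    else:
        print("All submodules are present.")
```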
Put the unposed sparse images in the './data/{dataset_name}/train/images/' folder.
The Marigold checkpoints can be found at:
./Marigold/checkpoint/marigold-depth-lcm-v1-0/
and
./Marigold/checkpoint/marigold-normals-lcm-v0-1/
python Marigold/getmonodepthnormal.py -s data/horse_8
Download the DUSt3R checkpoint "DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth" and place it at './dust3r/checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth'.
python dust3r/coarse_initialization.py -s data/horse_8
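Before launching training, it can help to confirm that the inputs used by the preceding steps are in place. The sketch below checks the image folder and the Marigold and DUSt3R checkpoint paths named above. The function name `preflight` is hypothetical, and the sketch assumes the scene folder passed with `-s` (for example `data/horse_8`) contains the `train/images/` subfolder.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

# Checkpoint locations taken from the steps above.
REQUIRED = (
    "Marigold/checkpoint/marigold-depth-lcm-v1-0",
    "Marigold/checkpoint/marigold-normals-lcm-v0-1",
    "dust3r/checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth",
)


def preflight(scene_dir, repo_root="."):
    """Return a list of human-readable problems; an empty list means ready."""
    root = Path(repo_root)
    problems = []

    images = root / scene_dir / "train" / "images"
    n_images = 0
    if images.is_dir():
        n_images = sum(
            1 for p in images.iterdir() if p.suffix.lower() in IMAGE_EXTS
        )
    if n_images == 0:
        problems.append(f"no input images found in {images}")

    for rel in REQUIRED:
        if not (root / rel).exists():
            problems.append(f"missing checkpoint: {rel}")
    return problems


if __name__ == "__main__":
    for line in preflight("data/horse_8") or ["inputs look complete"]:
        print(line)
```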
python stage1_360.py -s data/horse_8 -r 2 --save outputs/horse_8
To set up the model, download the following checkpoints to the ./models folder:
Download the clip-vit-large-patch14 model to ./openai
python train_repairmodel.py --exp_name outputs/controlnet_finetune/horse_8 --prompt "any prompt describe the scene" --resolution 1 --gs_dir outputs/horse_8 --data_dir data/horse_8 --bg_white
python stage2_360.py -s data/horse_8 --exp_name outputs/controlnet_finetune/horse_8 --prompt "any prompt describe the scene" --bg_white --start_checkpoint "outputs/horse_8/chkpnt6000.pth"
python render_interpolate.py -s data/horse_8 --start_checkpoint outputs/horse_8/chkpnt30000.pth
Checkpoints can be found at:
Download the checkpoints to ./models/zeroscope_v2_XL/
python scene_enhance.py --model_path ./models/zeroscope_v2_XL --input_path outputs/horse_8/30000_render_video.mp4
bash scripts/run_num.sh
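The individual commands listed earlier can also be chained from Python. The following is a minimal sketch that rebuilds those commands for a given scene and either prints or runs them in order. It is not a reproduction of `scripts/run_num.sh`, whose contents are not shown here. The names `build_pipeline` and `run_pipeline` are hypothetical, and only scripts and flags that appear in this README are used. The scene enhancement step is left out because it needs the path of the rendered video.

```python
import shlex
import subprocess


def build_pipeline(scene="horse_8", prompt="any prompt describe the scene"):
    """Return the README commands, in order, as argument lists."""
    data = f"data/{scene}"
    out = f"outputs/{scene}"
    exp = f"outputs/controlnet_finetune/{scene}"
    return [
        ["python", "Marigold/getmonodepthnormal.py", "-s", data],
        ["python", "dust3r/coarse_initialization.py", "-s", data],
        ["python", "stage1_360.py", "-s", data, "-r", "2", "--save", out],
        ["python", "train_repairmodel.py", "--exp_name", exp,
         "--prompt", prompt, "--resolution", "1",
         "--gs_dir", out, "--data_dir", data, "--bg_white"],
        ["python", "stage2_360.py", "-s", data, "--exp_name", exp,
         "--prompt", prompt, "--bg_white",
         "--start_checkpoint", f"{out}/chkpnt6000.pth"],
        ["python", "render_interpolate.py", "-s", data,
         "--start_checkpoint", f"{out}/chkpnt30000.pth"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the chain at the first failing stage.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_pipeline(build_pipeline("horse_8"), dry_run=True)
```

Passing argument lists instead of one shell string keeps a multi-word prompt intact without manual quoting, and the dry run makes it easy to inspect the commands before committing GPU time.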
This work builds on many amazing research works and open-source projects. Thanks a lot to all the authors for sharing!
If you find our work useful in your research, please consider giving a star ⭐ and citing the following paper 📝.
@misc{yu2024lmgaussianboostsparseview3d,
title={LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors},
author={Hanyang Yu and Xiaoxiao Long and Ping Tan},
year={2024},
eprint={2409.03456},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2409.03456},
}