The official implementation of HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction
Authors: Shengji Tang, Weicai Ye, Peng Ye, Weihao Lin, Yang Zhou, Tao Chen and Wanli Ouyang.
2025-01-22 🏆HiSplat is accepted by ICLR 2025!
2025-01-03 🌟Add checkpoint of dataset ACID in Google Drive and Baidu Cloud.
2024-12-06 🌟Add the implementation for zero-shot testing on DTU and Replica in the 3-view setting.
Reconstructing 3D scenes from multiple viewpoints is a fundamental task in stereo vision. Recently, advances in generalizable 3D Gaussian Splatting have enabled high-quality novel view synthesis for unseen scenes from sparse input views by feed-forward predicting per-pixel Gaussian parameters without extra optimization. However, existing methods typically generate single-scale 3D Gaussians, which lack representation of both large-scale structure and texture details, resulting in mislocation and artefacts. In this paper, we propose a novel framework, HiSplat, which introduces a hierarchical manner in generalizable 3D Gaussian Splatting to construct hierarchical 3D Gaussians via a coarse-to-fine strategy. Specifically, HiSplat generates large coarse-grained Gaussians to capture largescale structures, followed by fine-grained Gaussians to enhance delicate texture details. To promote inter-scale interactions, we propose an Error Aware Module for Gaussian compensation and a Modulating Fusion Module for Gaussian repair. Our method achieves joint optimization of hierarchical representations, allowing for novel view synthesis using only two-view reference images. Comprehensive experiments on various datasets demonstrate that HiSplat significantly enhances reconstruction quality and cross-dataset generalization compared to prior singlescale methods. The corresponding ablation study and analysis of different-scale 3D Gaussians reveal the mechanism behind the effectiveness.
✅ Release basic code and checkpoints.
🔲 Release all checkpoints and more useful scripts.
🔲 Scripts for processing other datasets and private data.
To get started, you need to
1.Create an environment of HiSplat(Necessary)
2.Prepare the corresponding dataset(For training and testing; We also provide a tiny demo detaset from DTU and RealEstate10K)
3.Download the checkpoints(For training, download pretrained checkpoint of DinoV2 and Unimatch; For testing, download our checkpoint)
It is recommended to use Anaconda to create a virtual environment.
# create and initialize the virtual environment
conda create -n hisplat python=3.10
conda activate hisplat
# install pytorch
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
# install other requirements
pip install -r requirements.txt
For RealEstate10K and ACID, we use the same datasets as MVSPlat and PixelSplat. Below we quote the dataset preparation of them.
pixelSplat was trained using versions of the RealEstate10k and ACID datasets that were split into ~100 MB chunks for use on server cluster file systems. Small subsets of the Real Estate 10k and ACID datasets in this format can be found here. To use them, simply unzip them into a newly created
datasets
folder in the project root directory.
If you would like to convert downloaded versions of the Real Estate 10k and ACID datasets to our format, you can use the scripts here. Reach out to us (pixelSplat) if you want the full versions of our processed datasets, which are about 500 GB and 160 GB for Real Estate 10k and ACID respectively.
- Download the preprocessed DTU data dtu_training.rar.
- Convert DTU to chunks by running
python src/scripts/convert_dtu.py --input_dir PATH_TO_DTU --output_dir datasets/dtu
- [Optional] Generate the evaluation index by running
python src/scripts/generate_dtu_evaluation_index.py --n_contexts=N
, where N is the number of context views. (For N=2 and N=3, we have already provided our tested version under/assets
.)
For Replica, we follow Semantic-NeRF and use the provided pre-rendered Replica dataset. You can follow the scripts as below
- Download the pre-rendered Replica dataset and unzip them. We only need the pictures and camera poses in office 0-4 and room 0-2.
- Convert Replica to chunks by running
python src/scripts/convert_replica.py --input_dir PATH_TO_REPLICA --output_dir datasets/replica
Besides the offical link, we also provide link of Google Drive and Baidu Cloud for all checkpoints. Because HiSplat checkpoints includes the stat dict of optimizer, the checkpoints are about 1.04GB.
# download the pretrained weight of unimatch from official link
wget 'https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmdepth-scale1-resumeflowthings-scannet-5d9d7964.pth'
mv gmdepth-scale1-resumeflowthings-scannet-5d9d7964.pth ./checkpoints
# download the pretrained weight of DinoV2[DINOv2-B]
wget 'https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_pretrain.pth'
mv dinov2_vitb14_pretrain.pth ./checkpoints
# download our pretrained checkpoints and move it to ./checkpoints
mv hisplat_re10k.ckpt ./checkpoints
To validate the environment and show the performance, just run the following:
python demo.py +experiment=re10k mode=test output_dir=temp checkpointing.load=./checkpoints/hisplat_re10k.ckpt
It will output a video from two context images in demo/output.
To render novel views and compute evaluation metrics from a pretrained model,
-
get the pretrained models, and save them to
./checkpoints
-
run the following:
# Testing on RealEstate10K (input 2 views)
python -m src.main +experiment=re10k checkpointing.load=./hisplat_re10k.ckpt mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_re10k.json test.compute_scores=true output_dir=test_re10k
# Testing on ACID (input 2 views)
python -m src.main +experiment=acid checkpointing.load=./hisplat_acid.ckpt mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_acid.json test.compute_scores=true output_dir=test_acid
# Cross-dataset testing RealEstate10K -> DTU (input 2 or 3 views)
python -m src.main +experiment=dtu checkpointing.load=./hisplat_re10k.ckpt mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_dtu_nctx2.json test.compute_scores=true output_dir=test_dtu
python -m src.main +experiment=dtu checkpointing.load=./hisplat_re10k.ckpt mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_dtu_nctx3.json test.compute_scores=true output_dir=test_dtu dataset.view_sampler.num_context_views=3
# Cross-dataset testing RealEstate10K -> ACID (input 2 views)
python -m src.main +experiment=acid checkpointing.load=./hisplat_re10k.ckpt mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_acid.json test.compute_scores=true output_dir=test_acid
# Cross-dataset testing RealEstate10K -> Replica (input 2 or 3 views)
python -m src.main +experiment=replica checkpointing.load=./hisplat_re10k.ckpt mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_replica_nctx2.json test.compute_scores=true output_dir=test_replica
python -m src.main +experiment=replica checkpointing.load=./hisplat_re10k.ckpt mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_replica_nctx3.json test.compute_scores=true output_dir=test_replica dataset.view_sampler.num_context_views=3
Run the following:
# Train on RealEstate10K
python -m src.main +experiment=re10k data_loader.train.batch_size=2 device=auto output_dir=EXP_SAVING_PATH trainer.val_check_interval=3000
# Train on ACID
python -m src.main +experiment=acid data_loader.train.batch_size=2 device=auto output_dir=EXP_SAVING_PATH trainer.val_check_interval=3000
Our model is trained on 8 RTX4090. The training batch size of each card is 2 and the total batch size is 16. The peak training GPU memory is about 19.7GB. You can use less or more GPUs depending on the GPU memory.
@article{tang2024hisplat,
title={HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction},
author={Tang, Shengji and Ye, Weicai and Ye, Peng and Lin, Weihao and Zhou, Yang and Chen, Tao and Ouyang, Wanli},
journal={arXiv preprint arXiv:2410.06245},
year={2024}
}
This project is based on MVSplat, PixelSplat and MVSFormer++. We owe a great deal of thanks to these three projects for their outstanding contributions!