[Paper] [Project Page]
Updates:
- [Aug 6, 2024] To access the raw data for Figures 5, 7, and 10, please download our submission from arXiv by clicking the [Download source] button at https://arxiv.org/format/2306.15667. The submission includes plots_co3dv2.tex, plots_re10k.tex, and plots_co3dv2_suppl.tex. These files generate the figures from the data in csvs.tex and csvs_suppl.tex, which are also included in the LaTeX submission source. The split we used for Re10k can be found at re10k_test_1800.txt.
- [Apr 24, 2024] You may also be interested in VGGSfM, where a model similar to PoseDiffusion is used as the camera predictor. It also supports optimizing camera parameters through bundle adjustment.
- [Apr 24, 2024] Updated the checkpoint for the RealEstate10K dataset.
We provide a simple installation script that, by default, sets up a conda environment with Python 3.9, PyTorch 1.13, and CUDA 11.6.
source install.sh
You can download the model checkpoint trained on Co3D or RealEstate10K. Please note that the checkpoint for RealEstate10K was re-trained with an image size of 336, so you need to change image_size in the corresponding config. The predicted camera poses and focal lengths are defined in NDC coordinates.
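To use the predicted focal lengths with pixel-based tools, they have to be converted out of NDC. The sketch below assumes the PyTorch3D NDC convention, in which the shorter image side spans [-1, 1]. This README does not state the convention explicitly, so treat it as an assumption and verify it for your setup. The function name ndc_focal_to_pixels is illustrative and not part of this repository.

```python
def ndc_focal_to_pixels(focal_ndc: float, height: int, width: int) -> float:
    """Convert an NDC focal length to pixels.

    Assumption: PyTorch3D-style NDC, where the shorter image side maps to
    [-1, 1], so one NDC unit equals half of min(height, width) pixels.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    scale = min(height, width) / 2.0
    return focal_ndc * scale


if __name__ == "__main__":
    # Example with the 336-pixel image size used for the RealEstate10K checkpoint.
    print(ndc_focal_to_pixels(2.0, 336, 336))
```

For non-square images, only the shorter side determines the scale under this convention.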
python demo.py image_folder="samples/apple" ckpt="/PATH/TO/DOWNLOADED/CKPT"
You can experiment with your own data by specifying a different image_folder.
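If you have several image folders, the demo command can be generated per folder. This is a minimal sketch that only rebuilds the command shown above with a different image_folder value; the helper name build_demo_command and the folder names in the example are hypothetical.

```python
import shlex


def build_demo_command(image_folder: str, ckpt: str) -> list[str]:
    """Build the demo.py command line with key=value overrides."""
    return ["python", "demo.py", f"image_folder={image_folder}", f"ckpt={ckpt}"]


if __name__ == "__main__":
    ckpt = "/PATH/TO/DOWNLOADED/CKPT"
    # Hypothetical folders; replace with your own image directories.
    for folder in ["samples/apple", "my_data/scene_01"]:
        cmd = build_demo_command(folder, ckpt)
        # Print the command; use subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```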
On a Quadro GP100 GPU, the inference time for a 20-frame sequence is approximately 0.8 seconds without GGS and around 80 seconds with GGS (including 20 seconds for matching extraction).
You can choose to enable or disable GGS (or other settings) in ./cfgs/default.yaml.
We use Visdom by default for visualization. Ensure your Visdom settings are correctly configured to visualize the results accurately. However, Visdom is not necessary for running the model.
Start by following the instructions here to preprocess the annotations of the Co3D V2 dataset. This will significantly reduce data processing time during training.
Next, specify the paths for CO3D_DIR and CO3D_ANNOTATION_DIR in ./cfgs/default_train.yaml. CO3D_DIR should be set to the path where your downloaded Co3D dataset is located, while CO3D_ANNOTATION_DIR should point to the location of the annotation files generated by the preprocessing in step 1.
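Before launching training, it can help to confirm that both paths exist. The following is a small sketch, not part of the repository: the helper missing_dirs is hypothetical, and the paths in the example are placeholders for the values you put in ./cfgs/default_train.yaml.

```python
from pathlib import Path


def missing_dirs(paths: dict[str, str]) -> list[str]:
    """Return the names of entries whose path is not an existing directory."""
    return [name for name, path in paths.items() if not Path(path).is_dir()]


if __name__ == "__main__":
    # Placeholder paths; replace with the values from your config.
    paths = {
        "CO3D_DIR": "/PATH/TO/CO3D",
        "CO3D_ANNOTATION_DIR": "/PATH/TO/CO3D_ANNOTATION",
    }
    for name in missing_dirs(paths):
        print(f"{name} does not point to an existing directory: {paths[name]}")
```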
- For 1-GPU training:
python train.py
- For multi-GPU training, launch the training script with accelerate, e.g., to train on 8 GPUs (processes) on 1 node (machine):
accelerate launch --num_processes=8 --multi_gpu --num_machines=1 train.py
All configurations are specified in ./cfgs/default_train.yaml. Please note that we use Visdom to record logs.
Each training iteration should take around 1 to 3 seconds, depending on the device. You can check this by looking at sec/it in the log. The whole training should take around 2-3 days on 8 A100 GPUs.
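The sec/it value from the log can be turned into a rough wall-clock estimate. This is a minimal sketch, assuming you know the total number of iterations for your run; the function name and the iteration counts in the example are hypothetical and are not taken from this repository's configs.

```python
SECONDS_PER_DAY = 86400.0


def remaining_training_days(sec_per_it: float, done: int, total: int) -> float:
    """Estimate the remaining training time in days from the logged sec/it."""
    if sec_per_it < 0 or done < 0 or total < done:
        raise ValueError("expected sec_per_it >= 0 and 0 <= done <= total")
    return sec_per_it * (total - done) / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical numbers: 2 sec/it, 10,000 of 100,000 iterations finished.
    days = remaining_training_days(2.0, done=10_000, total=100_000)
    print(f"about {days:.1f} days remaining")
```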
NOTE: On some clusters, we found that the publicly released training code can be very slow when using multiple GPUs. This appears to happen because accelerate does not work well under some settings, which makes data loading very slow (or causes it to hang). The simplest solution is to remove accelerate (accelerator) from the code and launch training with PyTorch's own distributed training or PyTorch Lightning. This problem does not affect single-GPU training. If you observe a higher sec/it than stated above, please submit an issue or report your case here.
Please specify the paths CO3D_DIR, CO3D_ANNOTATION_DIR, and resume_ckpt in ./cfgs/default_test.yaml. The flag resume_ckpt refers to your downloaded model checkpoint.
python test.py
You can try different testing settings by adjusting num_frames, GGS.enable, and others in ./cfgs/default_test.yaml.
Thanks for the great implementations of denoising-diffusion-pytorch, guided-diffusion, hloc, and relpose.
See the LICENSE file for details about the license under which this code is made available.