
DepthLab: From Partial to Complete

This repository represents the official implementation of the paper titled "DepthLab: From Partial to Complete".


Zhiheng Liu* · Ka Leong Cheng* · Qiuyu Wang · Shuzhe Wang · Hao Ouyang · Bin Tan · Kai Zhu · Yujun Shen · Qifeng Chen · Ping Luo

We present DepthLab, a robust depth inpainting foundation model that can be applied to a variety of downstream tasks. Many tasks naturally come with partial depth information, such as (1) 3D Gaussian inpainting, (2) LiDAR depth completion, (3) sparse-view reconstruction with DUSt3R, and (4) text-to-scene generation. DepthLab leverages this known depth to produce better depth estimates, thereby enhancing performance in these downstream tasks, and we hope it motivates more related tasks to adopt it.


📢 News

  • 2024-12-25: Inference code and paper are released.
  • [To-do]: Release the training code to facilitate fine-tuning, allowing adaptation to different mask types in your downstream tasks.

🛠️ Setup

📦 Repository

Clone the repository (requires git):

git clone https://github.com/Johanan528/DepthLab.git
cd DepthLab

💻 Dependencies

Install with conda:

conda env create -f environment.yaml
conda activate DepthLab

📦 Checkpoints

Download the Marigold checkpoint here, the image encoder checkpoint here, and our checkpoints at Hugging Face. The downloaded checkpoint directory has the following structure:

.
`-- checkpoints
    |-- marigold-depth-v1-0
    |-- CLIP-ViT-H-14-laion2B-s32B-b79K
    `-- DepthLab
        |-- denoising_unet.pth
        |-- reference_unet.pth
        `-- mapping_layer.pth
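
If you prefer to script the download, here is a minimal sketch using the huggingface_hub library. The Marigold and DepthLab repository ids below are assumptions inferred from the directory names above; substitute the ids behind the official links if they differ.

from huggingface_hub import snapshot_download

# Assumed repository ids, inferred from the checkpoint directory names;
# replace them with the ids from the official links if they differ.
for repo_id, local_dir in [
    ("prs-eth/marigold-depth-v1-0", "checkpoints/marigold-depth-v1-0"),
    ("laion/CLIP-ViT-H-14-laion2B-s32B-b79K", "checkpoints/CLIP-ViT-H-14-laion2B-s32B-b79K"),
    ("Johanan528/DepthLab", "checkpoints/DepthLab"),
]:
    snapshot_download(repo_id=repo_id, local_dir=local_dir)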

🏃 Testing on your own cases

📷 Prepare images, masks, known depths

Prepare the following inputs:

  • Masks: PNG/JPG or NumPy (.npy), where black (0) marks the known regions and white (1) marks the regions to be predicted.
  • Known depths: NumPy (.npy).
  • Images: PNG/JPG.

We provide an example in the test_cases folder; see the preparation sketch below.
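
For reference, the NumPy inputs can be prepared along these lines. This is a minimal hypothetical sketch, the file names are placeholders, and only the 0/1 mask convention comes from the list above.

import numpy as np
from PIL import Image

image = Image.open("test_cases/example_rgb.png")  # placeholder path
w, h = image.size

# Known depth: a float map aligned with the image resolution.
known_depth = np.zeros((h, w), dtype=np.float32)
known_depth[h // 2:, :] = 2.0  # e.g. the lower half has known depth

# Mask: 0 = known region, 1 = region for DepthLab to predict.
mask = np.ones((h, w), dtype=np.uint8)
mask[h // 2:, :] = 0  # mark the lower half as known

np.save("test_cases/example_depth.npy", known_depth)
np.save("test_cases/example_mask.npy", mask)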

🎮 Run inference

cd scripts
bash infer.sh

You can find all results in output/in-the-wild_example. Enjoy!

⚙️ Inference settings

The default settings are tuned for the best results, but the behavior of the code can be customized:

  • --denoise_steps: Number of denoising steps per inference pass. For the original (DDIM) version, 20-50 steps are recommended.
  • --processing_res: The processing resolution. When the mask is sparse, as in depth completion scenarios, set processing_res to match the mask size so that resizing does not degrade the mask.
  • --normalize_scale: When the known depth cannot cover the global scale of the scene, reduce the normalization scale so the model can better predict the depth of distant objects.
  • --strength: When set to 1, the prediction relies entirely on the model itself. When set below 1, the model is partially guided by an interpolation of the known (masked) depth.
  • --blend: Whether to use blended diffusion, a technique commonly used in image inpainting.
  • --refine: Turn this on to refine a depth map from DUSt3R, or any other complete initial depth map.
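
As a reference, a customized run might look like the following. This is a hypothetical sketch: the Python entry point, the input-path arguments (omitted here), and the exact way boolean options such as --blend and --refine are toggled all depend on the actual scripts; only the option names come from the list above.

cd scripts
python infer.py \
    --denoise_steps 30 \
    --processing_res 768 \
    --normalize_scale 1.0 \
    --strength 1.0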

🌺 Acknowledgements

This project is developed on the codebases of Marigold and MagicAnimate. We appreciate their great work!

🎓 Citation

Please cite our paper:

@article{liu2024depthlab,
  title={DepthLab: From Partial to Complete},
  author={Liu, Zhiheng and Cheng, Ka Leong and Wang, Qiuyu and Wang, Shuzhe and Ouyang, Hao and Tan, Bin and Zhu, Kai and Shen, Yujun and Chen, Qifeng and Luo, Ping},
  journal={arXiv preprint arXiv:2412.18153},
  year={2024}
}