TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision
Official PyTorch implementation

Teaser image

Abstract: In this paper, we investigate an open research task of generating controllable 3D textured shapes from the given textual descriptions. Previous works either require ground truth caption labeling or extensive optimization time. To resolve these issues, we present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions. Specifically, based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates. Our constructed captions give high-level semantic supervision for generated 3D shapes. Further, in order to produce fine-grained textures and increase geometry diversity, we propose to adopt low-level image regularization to enable fake-rendered images to align with the real ones.
During the inference phase, our proposed model can generate 3D textured shapes from the given text without any additional optimization. We conduct extensive experiments to analyze each of our proposed components and show the efficacy of our framework in generating high-fidelity 3D textured and text-relevant shapes.
Teaser results (GIFs): "A red car" and "A brown chair".

Requirements

  • We recommend Linux for performance and compatibility reasons.
  • 8 high-end NVIDIA GPUs. We have done all testing and development using V100 or A100 GPUs.
  • 64-bit Python 3.8 and PyTorch 1.9.0. See https://pytorch.org for PyTorch install instructions.
  • CUDA toolkit 11.1 or later. (Why is a separate CUDA toolkit installation required? We use the custom CUDA extensions from the StyleGAN3 repo. Please see Troubleshooting.)
  • CLIP from the official repo.
  • We also recommend installing Nvdiffrast following the instructions from its official repo, and installing Kaolin.
  • We provide a script to install packages.
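
Before running the install script, it can help to confirm that the local environment matches the versions listed above. The following is a minimal sketch and not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the check assumes PyTorch exposes the usual `torch.__version__` and `torch.version.cuda` attributes. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which is only a proxy for the separately installed toolkit.

```python
import sys


def parse_version(text):
    """Turn a version string such as '1.9.0+cu111' into a tuple of ints."""
    core = text.split("+")[0]  # drop local build tags like '+cu111'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:  # keep only the leading digits of each component
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(found, minimum):
    """Return True when version string `found` is at least `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so '11.1' and '11.1.0' compare equal
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Collect readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] != (3, 8):
        problems.append("README lists Python 3.8, found %d.%d" % sys.version_info[:2])
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if parse_version(torch.__version__)[:2] != (1, 9):
        problems.append("README lists PyTorch 1.9.0, found " + torch.__version__)
    cuda = torch.version.cuda  # None for CPU-only builds
    if cuda is None or not meets_minimum(cuda, "11.1"):
        problems.append("README lists CUDA 11.1 or later, found " + str(cuda))
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the README"]:
        print(line)
```

A mismatch reported here is not necessarily fatal; the versions above are simply the ones the authors state they tested with.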

Preparation

Environment

bash install_taps3d.sh

Dataset

Please download the ShapeNetCore.v1 dataset from this link.

Render images

Please follow the instructions from GET3D to render the ShapeNet dataset.

Generate pseudo captions (optional)

Run the pseudo caption generation script:

bash generate_captions.sh IMG_ROOT 
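
As the abstract describes, pseudo captions are built by retrieving words relevant to a rendered image from the CLIP vocabulary and filling them into templates. The sketch below illustrates that retrieve-then-template idea on toy data and is not the code behind `generate_captions.sh`: the three-dimensional vectors stand in for CLIP embeddings, and the vocabulary, the template string, and the function names `cosine` and `build_pseudo_caption` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_pseudo_caption(image_vec, word_vecs, class_name, top_k=1,
                         template="a {words} {cls}"):
    """Rank candidate words by similarity to the image and fill a template."""
    ranked = sorted(word_vecs,
                    key=lambda w: cosine(image_vec, word_vecs[w]),
                    reverse=True)
    words = " ".join(ranked[:top_k])
    return template.format(words=words, cls=class_name)


# Toy stand-ins for CLIP text embeddings (hypothetical values).
toy_words = {
    "red": [0.9, 0.1, 0.0],
    "brown": [0.1, 0.9, 0.0],
    "wooden": [0.0, 0.7, 0.7],
}
# Toy stand-in for the CLIP embedding of one rendered image.
toy_image = [0.8, 0.2, 0.1]

print(build_pseudo_caption(toy_image, toy_words, "car"))  # a red car
```

In a real pipeline the vectors would come from CLIP's image and text encoders, and the candidate words would be drawn from its vocabulary rather than a hand-written dictionary.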

Train the model

Clone the code and necessary files:

cd YOUR_CODE_PATH
git clone git@github.com:nv-tlabs/GET3D.git
cd GET3D; mkdir cache; cd cache
wget https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/metrics/inception-2015-12-05.pkl

Train the model

cd YOUR_CODE_PATH 
export PYTHONPATH=$PWD:$PYTHONPATH

Download the unconditional pretrained model from GET3D.

python train.py --outdir OUTPUT --num_gpus NUM_GPUS --batch_size BATCH_SIZE --batch_gpu BATCH_GPU --network PRETRAINED_MODEL --seed 1 --snap 1000 --lr LR --lambda_global 1 --lambda_direction 0 --lambda_imgcos 1 --image_root IMG_ROOT --gen_class CLASS --mask_weight 0.05 --workers 8 --tex_weight 4 --geo_weight 0.02
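
The training command takes both a total batch size (`--batch_size`) and a per-GPU batch size (`--batch_gpu`). The code derives from StyleGAN3 by way of GET3D, and StyleGAN3's training script requires the total batch size to be a multiple of the number of GPUs times the per-GPU batch size. Whether TAPS3D enforces exactly the same rule is an assumption here. Under that assumption, the hypothetical helper below checks a combination before a job is launched and reports how many accumulation rounds each GPU would run per step; the numbers are illustrative, not recommended settings.

```python
def accumulation_rounds(batch_size, num_gpus, batch_gpu):
    """Return rounds per step, assuming StyleGAN3-style batch splitting."""
    if min(batch_size, num_gpus, batch_gpu) < 1:
        raise ValueError("all three values must be positive integers")
    per_round = num_gpus * batch_gpu  # samples processed in one round
    if batch_size % per_round != 0:
        raise ValueError(
            "batch_size=%d is not a multiple of num_gpus * batch_gpu = %d"
            % (batch_size, per_round)
        )
    return batch_size // per_round


print(accumulation_rounds(batch_size=32, num_gpus=8, batch_gpu=4))  # 1
print(accumulation_rounds(batch_size=32, num_gpus=4, batch_gpu=4))  # 2
```

A result above 1 means gradients are accumulated over several rounds, which trades speed for lower per-GPU memory use.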

Inference

Inference on a pretrained model for visualization

  • Inference can run on a single GPU with 16 GB of memory.

Generate samples:

python generate_samples.py --network TRAINED_MODEL --class_id CLASS --seed 0 --outdir save_inference_results/ --text INPUT_TEXT
  • To generate a textured mesh, add this option to the inference command: --inference_to_generate_textured_mesh 1
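
Text prompts usually contain spaces, so `INPUT_TEXT` has to reach the script as a single argument. The sketch below shows one way to assemble the inference command for several prompts using only the flags listed above. The function name `inference_command` and the example prompts are hypothetical, and `TRAINED_MODEL` and `CLASS` remain placeholders to be replaced with real values.

```python
import shlex


def inference_command(network, class_id, text, seed=0,
                      outdir="save_inference_results/", textured_mesh=False):
    """Build the argument list for generate_samples.py from the README flags."""
    args = [
        "python", "generate_samples.py",
        "--network", network,
        "--class_id", class_id,
        "--seed", str(seed),
        "--outdir", outdir,
        "--text", text,  # kept as one list item, so spaces are preserved
    ]
    if textured_mesh:
        args += ["--inference_to_generate_textured_mesh", "1"]
    return args


for prompt in ["a red car", "a brown chair"]:
    cmd = inference_command("TRAINED_MODEL", "CLASS", prompt)
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would launch the job directly without any shell quoting concerns.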

Broader Information

TAPS3D builds upon several previous works:

Citation

@inproceedings{wei2023taps3d,
  title={TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision},
  author={Wei, Jiacheng and Wang, Hao and Feng, Jiashi and Lin, Guosheng and Yap, Kim-Hui},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={16805--16815},
  year={2023}
}