This repository contains the code for the weakly-supervised shape completion method, called amortized maximum likelihood (AML), described in:
@article{Stutz2018ARXIV,
author = {David Stutz and Andreas Geiger},
title = {Learning 3D Shape Completion under Weak Supervision},
journal = {CoRR},
volume = {abs/1805.07290},
year = {2018},
url = {http://arxiv.org/abs/1805.07290},
}
If you use this code for your research, please cite the paper.
This work is an extension of [1] and davidstutz/daml-shape-completion. The extension improves visual quality of the completed shapes and increases their variety. Additionally, the experiments are based on improved benchmarks, see Data.
[1] David Stutz, Andreas Geiger.
Learning 3D Shape Completion from Laser Scan Data with Weak Supervision.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
The paper proposes a weakly-supervised approach to shape completion. In particular, a denoising variational auto-encoder (VAE) is trained to learn a shape prior - on a set of synthetic shapes from ShapeNet or ModelNet. The generative model (the decoder) is then fixed and a new recognition model (encoder) is trained to embed observations in the same latent shape space. The encoder can be trained in an unsupervised fashion
- as we know the object category, the approach can be described as weakly-supervised. In particular, the encoder predicts Gaussian distributions that match the prior on the latent space (a unit Gaussian) and simultaneously minimizes the loss between generated shape and observations. The overall approach can be described as amortized maximum likelihood (AML), as the encoder is trained to minimize a maximum likelihood loss. As shape representation, occupancy grids and signed distance functions are used.
In this repository we provide our implementation of the amortized maximum likelihood approach, two supervised baselines (including [5]), and two data-driven baselines (including [6]):
[5] Angela Dai, Charles Ruizhongtai Qi, Matthias Nießner.
Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis.
CoRR abs/1612.00101 (2016).
[6] Francis Engelmann, Jörg Stückler, Bastian Leibe:
Joint Object Pose Estimation and Shape Reconstruction in Urban Street Scenes Using 3D Shape Priors. GCPR 2016: 219-230
This paragraph briefly summarizes the difference of this code to the code corresponding to [1] which can be found in davidstutz/daml-shape-completion. First of all, the code has been adapted to the improvsed datasets as described in the paper and downloadable in Data. This is mainly due to an improved data pipeline allowing to obtain higher-quality watertight meshes through TSDF fusion (the corresponding implementation can be found at davidstutz/mesh-fusion). Additionally, datasets for higher resolutions are provided. Regarding the proposed approach, we improve noise handling and additionally try to enforce more variety and details in the completed shapes. To this end, the encoder trained for shape inference does not predict a deterministic maximum likelihood solution anymore, but predicts a Gaussian distribution (similar to the VAE shape prior). In the loss, the quadratic regularizer is then replaced by a Kullback-Leibler divergence which ensures that the encoder predicts a variety of different codes while still fitting the shapes to the observations. Finally, more comprehensive experiments on ModelNet have been conducted and additional baselines have been considered.
Why do we provide both repositories separately?
Our training procedure as well as the architectures and the training data changed significantly. In order to provide reproducible results, we provide the code and data separately, although the underlying code base might have some overlap.
LUA/Torch requirements:
- Torch (torch/distro recommended);
- deepmind/torch-hdf5;
- harningt/luajson;
- luafilesystem;
- clementfarabet/lua---nnx (already included in torch/distro);
- nicholas-leonard/cunnx;
- davidstutz/torch-volumetric-nnup (see the installation instructions in the repository).
Installing deepmind/torch-hdf5 might be tricky. After building orch-hdf5,
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec
cd ..
it might be necessary to adapt the configuration in case you installed
HDF5 locally. For example, when installing hdf5 locally in DB_PATH
,
torch/install/share/lua/5.1/hdf5/config.lua
might need
to be adapted as follows:
require('os')
db_path = os.getenv("DB_PATH")
hdf5._config = {
HDF5_INCLUDE_PATH = db_path .. "/hdf5/hdf5/include/",
HDF5_LIBRARIES = db_path .. "/hdf5/hdf5/lib/libhdf5_cpp.so;" .. db_path .. "/hdf5/hdf5/lib/libhdf5.so;/usr/lib/x86_64-linux-gnu/libpthread.so;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libdl.so;/usr/lib/x86_64-linux-gnu/libm.so"
}
Make sure that nnx
, cunnx
and the volumetric nearest neighbor
upsampling layer works by following the instructions in
davidstutz/torch-volumetric-nnup.
The remaining packages can easily be installed using luarocks. You can run
th check_requirements.lua
to check the packages listed above.
Pyton requirements:
- NumPy;
- h5py;
- PyMCubes (make sure to use the
voxel_center
branch).
For installing PyMCubes, follow the instructions here;
NumPy and h5py can be installed using pip
and might themselves
have dependencies.
We also include an implementation of the method by Engelmann et al. [1].
[1] Francis Engelmann, Jörg Stückler, Bastian Leibe:
Joint Object Pose Estimation and Shape Reconstruction in Urban Street Scenes Using 3D Shape Priors. GCPR 2016: 219-230
First, make sure to install:
- OpenCV 2.4.13.x;
- VTK 7.1.x;
- Ceres:
- Eigen3;
- GLog;
- GFlags;
- (Suitesparse);
The installation of Ceres might be a bit annoying; so we provide our
installation scripts for some of the dependencies in rw/dependencies
for further details.
These still need to be adapted (e.g. they assume that dependencies as well
as Ceres are installed in $WORK/dev-box
where $WORK
is a defined base directory).
Note that SuiteSparse is optional but significantly reduces runtime
(by roughly factor 2-4).
When all dependencies are installed, make sure to adapt the corresponding
CMake files in rw/cmake_modules
. This means removing NO_CMAKE_SYSTEM_PATH
if necessary and inserting the correct paths to the installations.
Then:
cd rw/external/viz
mkdir build
cd build
cmake ..
make
# make sure VIZ is built correctly
cd ../../
mkdir build
cd build
cmake ..
make
# make sure KittiShapePrior and ShapeNetShapePrior are built correctly
Also make sure to download the pre-trained PCA shape prior from VisualComputingInstitute/ShapePriors_GCPR16.
For the C++ implementation of the evaluation tool (mesh-to-mesh and point-to-mesh distances), follow the instructions here: davidstutz/mesh-evaluation; essentially, the tool requires:
- CMake;
- Boost;
- Eigen;
- OpenMP;
- C++11.
Make sure to adapt the corresponding CMake modules, then run:
cd mesh-evaluation
mkdir build
cd build
cmake ..
make
For building the ICP baseline, Eigen3, HDF5, Boost and OpenMP are required.
It might be necessary to adapt (or even add) the corresponding CMake
modules in icp/cmake/
. Then:
cd icp/
mkdir build
cd build
cmake ..
make
The data is derived from ShapeNet [1], KITTI [2], ModelNet [3] and Kinect [4]. For ShapeNet, two datasets, in the paper referred to as SN-clean and SN-noisy, were created. We also provide the raw, watertight models for ShapeNet and ModelNet.
Download links:
- SN-clean (2.0GB): Amazon AWS MPI-INF
- SN-noisy (1.5GB): Amazon AWS MPI-INF
- KITTI (5.6GB): Amazon AWS MPI-INF
- bathtubs (3.1GB): Amazon AWS MPI-INF
- chairs (3.8GB): Amazon AWS MPI-INF
- desks (3.8GB): Amazon AWS MPI-INF
- tables (3.6GB): Amazon AWS MPI-INF
- ModelNet10 (6.1GB): Amazon AWS MPI-INF
- ShapeNet Models (241.5MB): Amazon AWS MPI-INF
- ModelNet10 Models (3.5GB): Amazon AWS MPI-INF
Note that SN-clean and SN-noisy are different from the corresponding datasets in our CVPR'18 paper!
See Data for more details.
Make sure to cite [3] and [4] in addition to this paper when using the data.
[3] Andreas Geiger, Philip Lenz, Raquel Urtasun:
Are we ready for autonomous driving? The KITTI vision benchmark suite. CVPR 2012: 3354-3361
[4] Angel X. Chang, Thomas A. Funkhouser, Leonidas J. Guibas, Pat Hanrahan, Qi-Xing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, Fisher Yu:
ShapeNet: An Information-Rich 3D Model Repository. CoRR abs/1512.03012 (2015)
Models can be downloaded here:
- AML Models: Amazon AWS MPI-INF
- Dai et al. Models: Amazon AWS MPI-INF
- Sup (Supervised Baseline) Models: Amazon AWS MPI-INF
- DVAE (Shape Prior) Models: Amazon AWS MPI-INF
The downloaded ZIP-archive contains models for the approach by Dai et al.,
our supervised baseline, our shape prior and the AML approach. The
directory structure is named according to the configuration files. For example,
check vae/config/
.
The models have been saved using data/tools/lua/compress_model_dat.lua
in order to reduce their size.
For running a model, it is sufficient to extract the corresponding
.dat
file and place it in the correct base_directory
according
to the configuration file. For example, for vae/config/clean.low
,
create the directory vae/config/clean.low
and place the model file inside.
Then:
th 4_run.lua config/clean.low.json
More details are given for the training procedures, see below.
Make sure that the data has been downloaded and
all requirements are met, for example run
check_requirements.lua
and check_requirements.py
.
Note that not all cases have been tested and some configuration files have been copy-and-pasted from our experiments to ensure reproducibility.
The shape prior is trained using vae/1_train.lua
. However, vae/0_generate_codes.py
should
be run first to generate a set of random codes - this was mainly used for
reproducible experiments. For example:
python 0_generate_codes.py --code_size=10 --number=100
Then, the configuration files, for example vae/config/clean.low.json
should
be adapted. In particular, the data_directory
key needs to be set according
to the downloaded data. Then:
th 1_train.lua config/clean.low.json
By default, only view iterations are done for illustration. However, training parameters can be adjusted in the corresponding configuration file. The options include:
noise_level
: probability of Bernoulli noise and 2*standard deviation of Gaussian noise; if0
is used, a standard variational auto-encoder (not denoising) will be trained.centering
: if set totrue
, the data will be centered.optimizer
: the optimizer to be used; ADAM is recommended.learning_rate
: the initial learning rate.momentum
: the initial momentum; for ADAM this does not matter.weight_decay
: weight decay for training.batch_size
: batch sized for training.epochs
: number of epoch for training; one epoch includesN/batch_size
iterations.weight_initialization
: weight initialization method to use, seelib/th/WeightInitialization.lua
.decay_iterations
: in which steps to decay learning rate and momentum.decay_learning_rate
: factor for learning rate decay.decay_momentum
: factor for momentum decay.
Additional options are fixed in vae/1_train.lua
.
After training, vae/2_a_marching_cubes.py
can be used to convert the
SDFs predicted on the test set to triangular meshes. This tool requires
the PyMCubes installation from the voxel_center
branch in
davidstutz/PyMCubes; options
are:
usage: Read LTSDF file and run marching cubes. [-h] [--input INPUT]
[--output OUTPUT]
optional arguments:
-h, --help show this help message and exit
--input INPUT Input HDF5 file.
--output OUTPUT Output directory.
Similarly, vae/2_b_test.py
and vae/3_sanity_check.py
can be used
to evaluate and generate simple visualizations of the predictions.
Available options are:
usage: Evaluate predicted occupancy grids and LTSDFs. [-h]
[--predictions PREDICTIONS]
[--targets_occ TARGETS_OCC]
[--targets_sdf TARGETS_SDF]
[--results_file RESULTS_FILE]
optional arguments:
-h, --help show this help message and exit
--predictions PREDICTIONS
Predictions HDF5 file.
--targets_occ TARGETS_OCC
Ground truth occupancy grids as HDF5 file.
--targets_sdf TARGETS_SDF
Ground truth LTSDF as HDF5 file.
--results_file RESULTS_FILE
Results txt file.
usage: Visualize predictions. [-h] [--predictions PREDICTIONS]
[--targets_occ TARGETS_OCC]
[--targets_sdf TARGETS_SDF] [--randoms RANDOMS]
[--directory DIRECTORY]
[--n_observations N_OBSERVATIONS]
optional arguments:
-h, --help show this help message and exit
--predictions PREDICTIONS
Predictions HDF5 file.
--targets_occ TARGETS_OCC
Ground truth occupancy grids HDF5 file.
--targets_sdf TARGETS_SDF
Ground truth SDF HDF5 file.
--randoms RANDOMS Random predictions HDF5 file.
--directory DIRECTORY
Output directory.
--n_observations N_OBSERVATIONS
Shape inference using AML requires a pre-trained shape model. Therefore,
a model (usually prior_model.dat
) from the previous step is required.
This model needs to be copied manually into the corresponding
base directory. For training on the clean ShapeNet dataset (corresponding to
aml/config/clean.low.json
), the correct path would be
aml/clean.low/prior_model.dat
.
Afterwards, the shape inference model can be trained using
th 1_a_train.lua config/clean.low.json
The training parameters in the configuration files are similar to the ones described above, except for:
weights
: determines the relative weights applied to the occupancy point loss, the occupancy free space loss, the SDF point loss and the SDF free space loss in this order.weighted
: determines whether (for noisy cases), the free space statistics intraining_statistics
should be used.reinitialize_encoder
: whether the loaded encoder from the shape prior should be reinitialized or fine-tuned.
As above, the remaining Python tools are used for visualization and evaluation.
For KITTI, aml/1_b_traing.lua
needs to be used as ground truth shapes
are not available to monitor training.
The supervised baseline by Dai et al. [5] is included in dai/
.
Note that the model introduced in [5] needed to be adapted slightly for
higher resolutions as well as on ShapeNet and KITTI due to the non-cubic resolution
of 24 x 54 x 24
. The models can be found in dai/0_model.lua
.
For training, the configuration files in dai/config/
provide
the corresponding hyper parameters, which are very similar
to the parameters described above for the shape prior. Then,
training is started using
th 1_train.lua config/clean.low.json
Evaluation is done using the remaining tools in dai/
as also
illustrated above.
First, make sure that the work by Engelmann et al. [1] can be compiled as outlined in Installation.
Then, two command line tools are provided:
KittiShapePrior
for running the approach on KITTI; arguments are the input directory with point clouds as.txt
files, a.txt
file containing the correpsonding bounding boxes, and the output directory.ShapeNetShapePrior
for running the approach on ShapeNet ("clean" and "noisy"); arguments are the input directory containing the point clouds as.txt
files, and the output directory.
Running the approach on ShapeNet might look as follows:
./ShapeNetShapePrior /path/to/test_off_gt_5_48x64_24x54x24_clean output_directory
Subsequently, rw/tools/shapenet_marching_cubes.py
can be used to obtain
meshes from the predicted signed distance functions:
python shapenet_marching_cubes.py output_directory off_directory
For KITTI, the approach is similar; however, in addition to the input points, the bounding boxes are required:
./KittiShapePrior /path/to/bounding_boxes_txt_validation_gt_padding_corrected_1_24x54x24/ /path/to/bounding_boxes_validation_gt_padding_corrected_1_24x54x24.txt output_directory
Similarly, rw/tools/kitti_marching_cubes.py
expects the bounding boxes
as second argument as well.
The ICP baseline might be quite slow; therefore a simple example is inclided in
icp/test
. The tool is split into sampling the reference meshes and then
performing point-to-point icp given a partial point cloud:
# From within the icp/build directory:
../bin/sample ../test/off/ ../test/points.h5
../bin/icp ../test/txt/0.txt ../test/off/ ../test/points.h5 ../test/output/ ../test/out.log
So, the ICP baseline can be called for each point cloud in TXT format individually.
The original meshes included in the data downloads as well as the predicted meshes (of both the shape prior and shape inference) can be visualized using MeshLab.
Additional tools for visualization using Blender are provided in davidstutz/bpy-visualization-utils.
Evaluation can be done using the tool included in davidstutz/mesh-evaluation.
Licenses for source code and data corresponding to:
D. Stutz, A. Geiger. Learning 3D Shape Completion under Weak Supervision. International Journal of Computer Vision (2018).
Note that the source code and/or data is based on the following projects for which separate licenses apply:
- ShapeNet
- KITTI
- ModelNet
- Kinect
- pyrender
- pyfusion
- Tomas_Akenine-Möller
- Box-Ray Intersection
- Tomas Akenine-Möller Code
- griegler/pyrender
- christopherbatty/SDFGen
- High-Resolution Timer
- Tronic/cmake-modules
- dimatura/binvox-rw-py
- alextsui05/blender-off-addon
Copyright (c) 2018 David Stutz, Max-Planck-Gesellschaft
Please read carefully the following terms and conditions and any accompanying documentation before you download and/or use this software and associated documentation files (the "Software").
The authors hereby grant you a non-exclusive, non-transferable, free of charge right to copy, modify, merge, publish, distribute, and sublicense the Software for the sole purpose of performing non-commercial scientific research, non-commercial education, or non-commercial artistic projects.
Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, or production of other artefacts for commercial purposes.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
You understand and agree that the authors are under no obligation to provide either maintenance services, update services, notices of latent defects, or corrections of defects with regard to the Software. The authors nevertheless reserve the right to update, modify, or discontinue the Software at any time.
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. You agree to cite the corresponding papers (see above) in documents and papers that report on research using the Software.
Copyright (c) 2018 David Stutz, Max-Planck-Gesellschaft
Please read carefully the following terms and conditions and any accompanying documentation before you download and/or use the data (the "Dataset").
The authors grant you a non-exclusive, non-transferable, free of charge right: To download the Dataset and use it on computers owned, leased or otherwise controlled by you and/or your organisation; To use the Dataset for the sole purpose of performing non-commercial scientific research, non-commercial education, or non-commercial artistic projects.
Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, or production of other artefacts for commercial purposes.
Without prior written approval from the authors, the Dataset, in whole or in part, shall not be further distributed, published, copied, or disseminated in any way or form whatsoever, whether for profit or not. This includes further distributing, copying or disseminating to a different facility or organizational unit in the requesting university, organization, or company.
THE DATASET IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATASET OR THE USE OR OTHER DEALINGS IN THE DATASET.
You understand and agree that the authors are under no obligation to provide either maintenance services, update services, notices of latent defects, or corrections of defects with regard to the Dataset. The authors nevertheless reserve the right to update, modify, or discontinue the Dataset at any time.
You agree to cite the corresponding papers (see above) in documents and papers that report on research using the Dataset.