This package provides an implementation of the Geometric Manifold Component Estimator, or GEOMANCER, as described in Disentangling by Subspace Diffusion (2020), as well as information about the Stanford 3D Objects for Disentangling (S3O4D) dataset. GEOMANCER is a nonparametric algorithm for disentangling, somewhat similar in spirit to Laplacian Eigenmaps or Vector Diffusion Maps, except instead of producing an embedding for the data, it produces a set of subspaces around each data point, one subspace for each disentangled factor of variation in the data. This differs from more common algorithms for disentangling that originated in the deep learning community, such as the beta-VAE, TCVAE or FactorVAE, which learn a nonlinear embedding and probabilistic generative model of the data. GEOMANCER is intended for data where the individual factors of variation might be more than one dimensional, for instance 3D rotations. At the moment, GEOMANCER works best when some ground truth information about the metric in the data space is available, for instance knowledge of the "true" nearest neighbors around each point, and we do not recommend running GEOMANCER directly on unstructured data from high-dimensional spaces. We are providing the code here to enable the interested researcher to get some hands-on experience with the ideas around differential geometry, holonomy and higher-order graph connection Laplacians we explore in the paper.
To install the package locally in a new virtual environment run:
python3 -m venv geomancer
source geomancer/bin/activate
git clone https://github.com/deepmind/deepmind-research.git .
cd deepmind-research/geomancer
pip install -e .
To run, simply load or generate an array of data, and call the fit
function:
import numpy as np
import geomancer
# Generate data from a product of two spheres
data = []
for i in range(2):
foo = np.random.randn(1000, 3)
data.append(foo / np.linalg.norm(foo, axis=1, keepdims=True))
data = np.concatenate(data, axis=1)
# Run GEOMANCER. The underlying manifold is 4-dimensional.
components, spectrum = geomancer.fit(data, 4)
If ground truth information about the tangent spaces is available in a space
that is aligned with the data, then the performance can be evaluated using the
eval_aligned
function. If ground truth data is only available in an unaligned
space, for instance if the embedding used to generate the data is not the same
as the space in which the data is observed, then the eval_unaligned
function
can be used, which requires both the data and disentangled tangent vectors in
the ground truth space. Examples of both evaluation metrics are given in the
demo in train.py
.
The file train.py
runs GEOMANCER on a product of manifolds that can be
specified by the user. The number of data points to train on is given by the
--npts
flag, while the specification of the manifold is given by the
--specification
flag. The --rotate
flag specifies whether a random rotation
should be applied to the data. If false, eval_aligned
will be used to evaluate
the result. If true, eval_unaligned
will be used to evaluate the result.
For instance, to run on the product of the sphere in 2 and 4 dimensions and the special orthogonal group in 3 dimensions, run:
python3 train.py --specification='S^2','S^4','SO(3)' --npts=100000
This passes a list of strings as the manifold specification flag. Note that a manifold this large will require a large amount of data to work and may require hours or days to run. The default example should run in just a few minutes.
The demo plots 3 different outputs:
- The eigenvalue spectrum of the 2nd-order graph Laplacian. This should have a large gap in the spectrum at the eigenvalue equal to the number of submanifolds.
- The basis vectors for each disentangled subspace around one point.
- The ground truth basis vectors for the disentangled subspaces at the same
point. If
--rotate=False
, and GEOMANCER has sufficient data, each basis matrix should span the same subspace as the results in the second plot.
The data used in the "Stanford 3D Objects" section of the experimental results is available in TensorFlow Datasets. The data consists of 100,000 renderings each of the Bunny and Dragon objects from the Stanford 3D Scanning Repository. More objects may be added in the future, but only the Bunny and Dragon are used in the paper. Each object is rendered with a uniformly sampled illumination from a point on the 2-sphere, and a uniformly sampled 3D rotation. The true latent states are provided as NumPy arrays along with the images. The lighting is given as a 3-vector with unit norm, while the rotation is provided both as a quaternion and a 3x3 orthogonal matrix.
There are many similarities between S3O4D and existing ML benchmark datasets like NORB, 3D Chairs, 3D Shapes and many others, which also include renderings of a set of objects under different pose and illumination conditions. However, none of these existing datasets include the full manifold of rotations in 3D - most include only a subset of changes to elevation and azimuth. S3O4D images are sampled uniformly and independently from the full space of rotations and illuminations, meaning the dataset contains objects that are upside down and illuminated from behind or underneath. We believe that this makes S3O4D uniquely suited for research on generative models where the latent space has non-trivial topology, as well as for general manifold learning methods where the curvature of the manifold is important.
To load from TensorFlow Datasets, simply run:
import tensorflow_datasets as tfds
ds = tfds.load('s3o4d', split='bunny_train', shuffle_files=True)
for example in ds.take(1):
image, label, illumination, pose_mat, pose_quat = (
example['image'], example['label'], example['illumination'],
example['pose_mat'], example['pose_quat'])
where the split can be any of bunny_train
, dragon_train
, bunny_test
or
dragon_test
.
If you prefer to not have TensorFlow as a dependency for your project, and want
to download the data manually, you can find the raw data (as zipped JPEGs and
NumPy arrays) on Google Cloud.
To load the data for a given object, unzip images.zip
into a folder called
images
in the same directory as latents.npz
, and from inside that
directory run:
import numpy as np
from PIL import Image
with open('latents.npz', 'r') as f:
data = np.load(f)
illumination = data['illumination'] # lighting source position, a 3-vector
pose_quat = data['pose_quat'] # object pose (3D rotation as a quaternion)
pose_mat = data['pose_mat'] # object pose (3D rotation as a matrix)
def get_data(i):
"""Return data and latent given an index up to 100,000."""
img = np.array(Image.open(f'images/{i:05}.jpg'))
# Uses the matrix, not quaternion, representation,
# similarly to the experiments in the paper
latent = np.concatenate((illumination[i],
pose_mat[i].reshape(-1)))
return img, latent
img, latent = get_data(0)
To do the same train/test split as in TensorFlow Datasets, simply use the first 80,000 images for each object as training data and the last 20,000 as test.
If you use this code or the Stanford 3D Objects for Disentangling data in your work, we ask you to cite this paper:
@article{pfau2020disentangling,
title={Disentangling by Subspace Diffusion},
author={Pfau, David and Higgins, Irina and Botev, Aleksandar and Racani\`ere,
S{\'e}bastian},
journal={Advances in Neural Information Processing Systems (NeurIPS)},
year={2020}
}
This is not an official Google product.