Two modifications to obtain the derivative of the mean function (a minimal usage sketch follows the list):
- mu can now be used with diff from the Symbolic Math Toolbox to obtain the symbolic derivative
- dmu returns the value of the derivative of mu at XX directly
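A minimal sketch of the two options above, assuming mu is the mean-function handle returned by the training interface and that it accepts symbolic input (requires the Symbolic Math Toolbox); mu, dmu, and XX are the names from the note above:

    syms x
    mu_sym  = diff(mu(x));   % symbolic derivative of the mean function via diff
    dmu_val = dmu(XX);       % value of the derivative of mu evaluated directly at XX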
This repository contains the code necessary not only to reproduce the experiments presented in Scalable Gaussian Process Regression with Derivatives @ NIPS 2018, but also for anyone wishing to use our methods.
Directory Structure: our root directory contains the code/ and demos/ folders, which contain the research code written for our paper and the demo code to reproduce our results, respectively. demos/ contains a number of subdirectories, each corresponding to an experiment run in our paper; we detail these later in the readme. code/ has the following subdirectories:
bayesopt/
estimator/
gp/
kernels/
lik/
mvm/
testfunctions/
utils/
bayesopt/ contains code for our Bayesian optimization experiments. estimator/ contains code for stochastic trace estimation detailed in this paper. gp/ is the primary GPML-like interface for training a Gaussian process. kernels/ contains implementations of various kernels, including the D-SKI and D-SKIP kernels we developed. lik/ contains standard marginal log likelihood functions, and mvm/ contains fast matrix-vector multiplication subroutines. testfunctions/ contains standard global optimization test functions used to test our Bayesian optimization routines. Lastly, utils/ contains various other subroutines we use.
We use Matlab's new element-wise multiplication operators and thus require Matlab R2017a at a minimum.
To start, clone the repository and mex ski_diag.c, a key routine that extracts the diagonal of the D-SKI matrix needed when preconditioning. Mexing allows one to run C code from Matlab, but mexing ski_diag.c is not necessary if you do not plan on using D-SKI.
Before attempting any experiments, make sure to run startup.m, which simply adds all necessary directories and subdirectories to Matlab's path.
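A minimal setup sketch from the repository root in Matlab (the exact location of ski_diag.c inside code/ is not spelled out here, so adjust the path as needed):

    mex ski_diag.c   % compile the C routine extracting the D-SKI diagonal (skip if not using D-SKI)
    startup          % add all necessary directories and subdirectories to Matlab's path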
To train a basic GP with no gradients, run:
[mu, K] = gp(X, Y)
where the pair X, Y represents the training points and values, respectively. mu is a function handle predicting on new points, e.g. y_predict = mu(x_predict), and K is the dense kernel matrix. The same idea holds when gradients are available:
[mu, K] = gp_grad(X, Y, dY)
where dY contains the respective gradients of the points in X. Similar interfaces are available for the (D-)SKI and (D-)SKIP kernels, except that K is a function handle performing matrix-vector multiplication instead of a dense matrix.
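A minimal usage sketch of the interfaces above (variable names are placeholders; we assume X is n-by-d, Y is n-by-1, and dY holds the gradients at the points in X):

    [mu, K] = gp(X, Y);               % dense GP without gradients
    y_predict = mu(x_predict);        % predict at new points

    [mu_g, K_g] = gp_grad(X, Y, dY);  % GP with gradient observations

    % With the (D-)SKI / (D-)SKIP interfaces, K is a function handle, so a
    % matrix-vector product is written K(v) rather than K * v.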
Any approximate kernel's spectrum should closely match the spectrum of the original kernel. This experiment shows that our approximate kernels D-SKI and D-SKIP indeed possess this property. Run SpectrumError.m, located in /Demos/Spectrum/, to generate the following plot (figure 2 of section 4).
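A hypothetical sketch of the underlying comparison (the actual experiment is SpectrumError.m): on a small problem, materialize the approximate kernel by applying its MVM handle K to unit vectors and compare its eigenvalues with those of the exact dense kernel K_exact; both names are placeholders here:

    n = size(K_exact, 1);
    K_approx = zeros(n);
    for j = 1:n
        e = zeros(n, 1); e(j) = 1;
        K_approx(:, j) = K(e);                    % column j of the approximate kernel
    end
    K_approx = (K_approx + K_approx') / 2;        % symmetrize against round-off
    lambda_exact  = sort(eig(K_exact),  'descend');
    lambda_approx = sort(eig(K_approx), 'descend');
    semilogy(abs(lambda_exact - lambda_approx));  % eigenvalue-by-eigenvalue spectral error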
The cost of a matrix-vector multiply with a dense kernel matrix is quadratic, whereas the cost of a matrix-vector multiply with either D-SKI or D-SKIP is linear; this experiment verifies that scaling. Run ScalingComparison.m, located in /Demos/Scaling/, to generate the following plot (figure 3 of section 4).
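A hypothetical timing sketch of the same comparison as ScalingComparison.m, assuming K_dense is a dense kernel matrix and K is a D-SKI or D-SKIP MVM handle of matching size:

    v = randn(size(K_dense, 1), 1);
    tic; K_dense * v; t_dense = toc;   % O(n^2) work per dense product
    tic; K(v);        t_ski   = toc;   % O(n) work per D-SKI/D-SKIP product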
Possessing gradients allows one to perform dimensionality reduction via active subspace sampling. Run DimReduction.m, located in /Demos/ActiveSubspace, to generate the following plot (figure 4 and table 2 of section 4). For quick runtimes, the demo code uses far fewer points than the experiment presented in the paper. The conclusion remains the same: D-SKIP with active subspace sampling performs far better than the dense code.
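A hypothetical sketch of active subspace estimation from gradients (the demo's own implementation may differ in detail); dY is n-by-d with row i the gradient at X(i, :), and k is the target dimension:

    C = (dY' * dY) / size(dY, 1);     % empirical second-moment matrix of the gradients
    [V, lambda] = eig(C, 'vector');
    [~, order] = sort(lambda, 'descend');
    U = V(:, order(1:k));             % top-k eigenvectors span the dominant active subspace
    X_reduced = X * U;                % project the data before fitting the GP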
Preconditioning is key to accelerating the convergence of iterative methods; we found pivoted Cholesky to be an effective preconditioner. In general, the convergence of iterative methods depends on the spectrum of the kernel matrix: if there is spectral clustering or a large spectral gap, we expect fast convergence. In this experiment, we consider the SE kernel and plot the effectiveness of pivoted Cholesky in terms of both the lengthscale and regularization parameters. Run PrecondCompare.m, located in /Demos/Precond, to recreate the following plot (figure 8 in the appendix).
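A hypothetical sketch of how a low-rank factor can precondition the regularized kernel solve (K + sigma2*I) x = y; here L is an n-by-r factor such that L*L' approximates the kernel (e.g. from a pivoted Cholesky factorization), K is an MVM handle, and the preconditioner inverse is applied via the Woodbury identity:

    Afun = @(z) K(z) + sigma2 * z;                 % apply the regularized kernel
    r = size(L, 2);
    Minv = @(z) (z - L * ((sigma2 * eye(r) + L' * L) \ (L' * z))) / sigma2;
    x = pcg(Afun, y, 1e-6, 500, Minv);             % preconditioned conjugate gradients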
We apply our method to rough terrain reconstruction of Mount St. Helens from lidar data. Run MountStHelens.m, located in /Demos/Maps, to recreate the following plot (figure 5 in section 4).
We apply our method to implicit surface reconstruction. We reconstruct the Stanford bunny from noisy surface normals (i.e. gradients), which is difficult to do using traditional spline methods. Note that, as we are using roughly 40 thousand points and gradients, reconstruction will take a bit of time. Run bunny.m, located in /Demos/ImplicitBunny, to recreate the following plot (figure 8 in the appendix).
Dimensionality reduction with gradient information is possible by estimating the dominant active subspace. We embed two popular test functions for global optimization, the Ackley and Rastrigin functions, in higher dimensions, run scalable Bayesian optimization with dimensionality reduction, and see promising results. Run boComparison.m, located in /Demos/BayesOpt, to recreate the following plot (figure 8 in the appendix).
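A hypothetical sketch of embedding a low-dimensional test function in d dimensions so that the objective varies only along a low-dimensional subspace (the demo uses the functions in testfunctions/ and its exact embedding may differ); Q, d_low, and f_embedded are illustrative names:

    d_low = 2; d = 50;
    [Q, ~] = qr(randn(d, d_low), 0);          % random d_low-dimensional embedding
    rastrigin = @(z) 10 * numel(z) + sum(z.^2 - 10 * cos(2 * pi * z));
    f_embedded = @(x) rastrigin(Q' * x(:));   % x lives in R^d but only Q'*x matters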