Skip to content

Commit

Permalink
Initial commit.
Browse files Browse the repository at this point in the history
  • Loading branch information
tillahoffmann committed Sep 13, 2023
0 parents commit 06cfadc
Show file tree
Hide file tree
Showing 34 changed files with 8,203 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .dockerignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
.git
venv
21 changes: 21 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
.cook
.ipynb_checkpoints
*.hpp
*.html
*.ipynb
*.o
*.pdf
*.pkl
*.tmp
__pycache__
padding/exact
padding/padded
profile/fourier_centered
profile/fourier_non_centered
profile/graph_centered
profile/graph_non_centered
profile/standard_centered
profile/standard_non_centered
trees/trees
tube/tube
venv
6 changes: 6 additions & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
FROM python:3.10
WORKDIR /workdir
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
RUN python -m cmdstanpy.install_cmdstan --verbose --version 2.33.0

54 changes: 54 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# Reproduction materials for "Scalable Gaussian Process Inference with Stan"

This repository comprises code and data to reproduce the results and figures in the accompanying manuscript [Scalable Gaussian Process Inference with Stan](https://doi.org/10.48550/arXiv.2301.08836).

## Setting up a computing environment

A computing environment can either be set up directly on your machine by following the [steps below](#direct-installation), or you can reproduce the results in a [Docker container](https://en.wikipedia.org/wiki/Docker_(software)) (see [the following section](#containerized-runtime)).

### Direct installation

1. Ensure a recent version of Python is installed (>= 3.8). The code was tested with Python 3.8, 3.9, 3.10, and 3.11 on Ubuntu (see [here](https://github.com/onnela-lab/gptools/actions/workflows/main.yml) for details) and 3.10 on macOS.
2. Install the Python requirements by running `pip install -r requirements.txt` from the root directory.
3. Install `cmdstan` by running `python -m cmdstanpy.install_cmdstan --verbose --version=2.33.0`. Other versions of `cmdstan` may also work but have not been tested.
4. Verify the installation by running `python -m gptools.stan` which should print `The include path is ...`.

### Containerized runtime

To simplify reproduction, we also provide a `Dockerfile` to build a containerized runtime environment.

1. Ensure Docker is installed. Docker Desktop 4.22.1 (118664) with Docker Engine 24.0.5 was used to test the following steps.
2. Run `docker build -t gptools .` to build the Docker image.
3. Verify the installation by running `./in-docker.sh python -m gptools.stan` which should print `The include path is ...`.

For any of the subsequent commands, they can be executed by either running `[the command]` directly or `./in-docker.sh [the command]` to use the containerized runtime. In the following `[./in-docker.sh]` indicates the optional prefix to run commands in a container.

### Removing compiled Stan programs

If you switch between containerized and local runtime, you may need to remove compiled Stan programs because binaries are not interoperable. Run `[./in-docker.sh] cook exec rm-compiled` to do so.

## Running the experiments

Figures in the manuscript were generated using the containerized runtime, and all runtime estimates below are based on a 2020 Macbook Pro with M1 chip and 16 GB of memory running macOS 13.4 (22F66). All figures can be reproduced by running `[./in-docker.sh] cook exec figures` (see below for details).

### Applications

The folders `trees` and `tube` contain code and data to reproduce the two applied examples in the manuscript. Each example takes about five minutes to run. The figures can be reproduced by running `[./in-docker.sh] cook exec tube:fig trees:fig` and will be saved in the corresponding folder as png and pdf files.

### Profiling experiments

The folder `profile` contains code to reproduce profiling experiments, and running all experiments can take up to ten hours. The profiling figure can be reproduced by running `[./in-docker.sh] cook exec profile:fig`. If a reduced runtime (but more noisy results) are desired, run `FAST=true cook exec profile:fig` which takes about 90 minutes.

All experiments are seeded for reproducibility, but profiling experiments are subject to variability due to different hardware and competing processes running on the same machine. Despite seeding, results may also vary depending on the operating system and stdlib implementation.

### Kernel properties and effect of padding

The folders `kernels` and `padding` contain code to reproduce figures on the properties of different kernels and their spectral properties as well as the effect of padding on Fourier methods, respectively. The figures can be reproduced by running `[./in-docker.sh] cook exec kernels:fig padding:fig`; the latter takes about ten minutes to generate.

## Expected results

- `[./in-docker.sh] cook exec kernels:fig` ![](kernels/kernels.png)
- `[./in-docker.sh] cook exec padding:fig` ![](padding/padding.png)
- `[./in-docker.sh] cook exec profile:fig` ![](profile/profile.png)
- `[./in-docker.sh] cook exec trees:fig` ![](trees/trees.png)
- `[./in-docker.sh] cook exec tube:fig` ![](tube/tube.png)
1 change: 1 addition & 0 deletions in-docker.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
docker run --rm -it -e FAST -v `pwd`:/workdir gptools $@
6 changes: 6 additions & 0 deletions jss.mplstyle
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
figure.figsize : 6.1, 4.5
figure.dpi : 144
font.size : 10
legend.fontsize: small
image.origin: lower
animation.html: html5
87 changes: 87 additions & 0 deletions kernels/kernels.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.14.4
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---

```{code-cell} ipython3
from gptools.util.kernels import ExpQuadKernel, MaternKernel
from gptools.util.fft.fft1 import transform_irfft, evaluate_rfft_scale
import matplotlib as mpl
from matplotlib import pyplot as plt
import numpy as np
from pathlib import Path
mpl.style.use("../jss.mplstyle")
```

```{code-cell} ipython3
np.random.seed(9) # Seed picked for good legend positioning. Works for any though.
fig, axes = plt.subplots(2, 2)
length_scale = 0.2
kernels = {
"squared exp.": lambda period: ExpQuadKernel(1, length_scale, period),
"Matern ³⁄₂": lambda period: MaternKernel(1.5, 1, length_scale, period),
}
x = np.linspace(0, 1, 101, endpoint=False)
z = np.random.normal(0, 1, x.size)
for ax, (key, kernel) in zip(axes[1], kernels.items()):
value = kernel(None).evaluate(0, x[:, None])
line, = axes[0, 0].plot(x, value, ls="--")
rfft = kernel(1).evaluate_rfft([x.size])
value = np.fft.irfft(rfft, x.size)
axes[0, 1].plot(rfft, label=key)
axes[0, 0].plot(x, value, color=line.get_color())
for maxf, ls in [(x.size // 2 + 1, "-"), (5, "--"), (3, ":")]:
rfft_scale = evaluate_rfft_scale(cov_rfft=rfft, size=x.size)
rfft_scale[maxf:] = 0
f = transform_irfft(z, np.zeros_like(z), rfft_scale=rfft_scale)
ax.plot(x, f, ls=ls, color=line.get_color(), label=fr"$\xi_\max={maxf}$")
ax.set_xlabel("position $x$")
ax.set_ylabel(f"{key} GP $f$")
ax = axes[0, 0]
ax.set_ylabel("kernel $k(0,x)$")
ax.set_xlabel("position $x$")
ax.legend([
mpl.lines.Line2D([], [], ls="--", color="gray"),
mpl.lines.Line2D([], [], ls="-", color="gray"),
], ["standard", "periodic"], fontsize="small")
ax.text(0.05, 0.05, "(a)", transform=ax.transAxes)
ax.yaxis.set_ticks([0, 0.5, 1])
ax = axes[0, 1]
ax.set_yscale("log")
ax.set_ylim(1e-5, x.size)
ax.set_xlabel(r"frequency $\xi$")
ax.set_ylabel(r"Fourier kernel $\tilde k=\phi\left(k\right)$")
ax.legend(fontsize="small", loc="center right")
ax.text(0.95, 0.95, "(b)", transform=ax.transAxes, ha="right", va="top")
ax = axes[1, 0]
ax.legend(fontsize="small", loc="lower center")
ax.text(0.95, 0.95, "(c)", transform=ax.transAxes, ha="right", va="top")
ax = axes[1, 1]
ax.legend(fontsize="small", loc="lower center")
ax.sharey(axes[1, 0])
ax.text(0.95, 0.95, "(d)", transform=ax.transAxes, ha="right", va="top")
for ax in [axes[0, 0], *axes[1]]:
ax.xaxis.set_ticks([0, 0.5, 1])
fig.tight_layout()
fig.savefig("kernels.pdf", bbox_inches="tight")
fig.savefig("kernels.png", bbox_inches="tight")
```
Binary file added kernels/kernels.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
33 changes: 33 additions & 0 deletions padding/exact.stan
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
data {
int num_observations, observe_first;
int kernel;
array [num_observations] real x;
vector[num_observations] y;
real<lower=0> kappa, length_scale, epsilon, sigma;
}

transformed data {
cov_matrix[num_observations] cov;
if (kernel == 0) {
cov = gp_exp_quad_cov(x, sigma, length_scale);
} else if (kernel == 1) {
cov = gp_matern32_cov(x, sigma, length_scale);
} else {
reject("kernel=", kernel);
}
cov = add_diag(cov, epsilon);
matrix[num_observations, num_observations] chol = cholesky_decompose(cov);
}

parameters {
vector[num_observations] z;
}

transformed parameters {
vector[num_observations] f = chol * z;
}

model {
z ~ normal(0, 1);
y[:observe_first] ~ normal(f[:observe_first], kappa);
}
35 changes: 35 additions & 0 deletions padding/padded.stan
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
functions {
#include gptools/util.stan
#include gptools/fft1.stan
}

data {
int num_observations, padding, observe_first;
int kernel;
vector[num_observations] y;
real<lower=0> kappa, length_scale, epsilon, sigma;
}

transformed data {
int num = num_observations + padding;
vector[num %/% 2 + 1] cov_rfft;
if (kernel == 0) {
cov_rfft = gp_periodic_exp_quad_cov_rfft(num, sigma, length_scale, num);
} else if (kernel == 1) {
cov_rfft = gp_periodic_matern_cov_rfft(1.5, num, sigma, length_scale, num);
}
cov_rfft += epsilon;
}

parameters {
vector[num] z;
}

transformed parameters {
vector[num] f = gp_inv_rfft(z, zeros_vector(num), cov_rfft);
}

model {
z ~ normal(0, 1);
y[:observe_first] ~ normal(f[:observe_first], kappa);
}
Loading

0 comments on commit 06cfadc

Please sign in to comment.