Revisions. #3

Merged: 9 commits merged on Jan 10, 2024
6 changes: 6 additions & 0 deletions .github/workflows/main.yaml
@@ -15,6 +15,12 @@ jobs:
- uses: "actions/checkout@v3"
- name: Build Docker image.
run: docker build -t gptools .
- name: Verify notebooks are up to date.
run: docker run --rm -v `pwd`:/workdir gptools pytest -v
- name: Render the getting started notebook for Python.
run: docker run --rm -v `pwd`:/workdir gptools cook exec getting_started:run
- name: Render the getting started Rmarkdown for R.
run: docker run --rm -v `pwd`:/workdir gptools cook exec getting_started_R:run
- name: Generate figures (using ./in-docker.sh but we can't use `-it` in the Action).
run: docker run --rm -e FAST=true -v `pwd`:/workdir gptools cook exec figures
- name: Upload figures and reports as artifacts.
6 changes: 4 additions & 2 deletions .gitignore
@@ -1,16 +1,18 @@
.cook
.DS_Store
.ipynb_checkpoints
.Rhistory
*.hpp
*.html
*.ipynb
*.o
*.pdf
*.pkl
*.tmp
__pycache__
linear/linear
padding/exact
padding/padded
playground
profile/fourier_centered
profile/fourier_non_centered
profile/graph_centered
10 changes: 9 additions & 1 deletion Dockerfile
@@ -1,6 +1,14 @@
FROM python:3.10
# Install R and dependencies.
RUN apt-get update && apt-get install -y \
r-base \
r-cran-devtools \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /workdir
COPY setup.R .
RUN Rscript setup.R

# Install Python dependencies and compile cmdstan.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
RUN python -m cmdstanpy.install_cmdstan --verbose --version 2.33.0

27 changes: 18 additions & 9 deletions README.md
@@ -31,28 +31,37 @@ If you switch between containerized and local runtime, you may need to remove co

To ensure the reproducibility of these materials, the results are also computed as the output of a GitHub Action workflow [![Reproduction Materials](https://github.com/onnela-lab/gptools-reproduction-material/actions/workflows/main.yaml/badge.svg)](https://github.com/onnela-lab/gptools-reproduction-material/actions/workflows/main.yaml) with the `FAST=true` flag. Figures can be obtained by selecting a workflow run and downloading the `figures-reports` artifact.
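The workflow passes `FAST=true` to the container (`docker run -e FAST=true ...`). How each experiment consumes the flag is not shown in this diff; a minimal sketch, assuming hypothetical helpers `is_fast` and `sampler_settings` that read the environment variable and shrink iteration counts (the numbers are illustrative, not the repository's), could look like this:

```python
import os


def is_fast(environ=None):
    """Hypothetical helper: True if FAST is set to a truthy string such as 'true'."""
    environ = os.environ if environ is None else environ
    return environ.get("FAST", "").strip().lower() in {"1", "true", "yes"}


def sampler_settings(environ=None):
    """Hypothetical iteration counts; smaller (noisier but faster) when FAST is enabled."""
    if is_fast(environ):
        return {"iter_warmup": 100, "iter_sampling": 50}
    return {"iter_warmup": 500, "iter_sampling": 500}


# Usage: the returned dict could be unpacked into a sampler call as keyword arguments.
print(sampler_settings({"FAST": "true"}))
```

Passing the environment as an argument (defaulting to `os.environ`) keeps the helper easy to test without mutating the process environment.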

## Getting started

The `getting_started` folder contains a Python notebook and an Rmarkdown file that reproduce the results of the "Getting started" sections in the accompanying manuscript. HTML reports can be generated in the `getting_started` folder by running

- `[./in-docker.sh] cook exec getting_started:run` for Python
- `[./in-docker.sh] cook exec getting_started_R:run` for R

However, the folder is probably best suited to interactive exploration and experimentation as a way to get familiar with the package.

## Running the experiments

Figures in the manuscript were generated using the containerized runtime, and all runtime estimates below are based on a 2020 MacBook Pro with an M1 chip and 16 GB of memory running macOS 13.4 (22F66). All figures can be reproduced by running `[./in-docker.sh] cook exec figures` (see below for details). Figures are generated by executing Jupyter notebooks stored in markdown format; the notebooks can be opened directly in a standard Jupyter environment using the [`jupytext`](https://jupytext.readthedocs.io/en/latest/) extension. If you prefer, the folder for each experiment also contains a corresponding `*.ipynb` file. To open and use the notebooks, please set up a local computing environment as described above.

### Applications

The folders `trees` and `tube` contain code and data to reproduce the two applied examples in the manuscript. Each example takes about five minutes to run. The figures can be reproduced by running `[./in-docker.sh] cook exec tube:run trees:run` and will be saved in the corresponding folder as png and pdf files.

### Profiling experiments

The folder `profile` contains code to reproduce profiling experiments, and running all experiments can take up to ten hours. The profiling figure can be reproduced by running `[./in-docker.sh] cook exec profile:run`. If a shorter runtime (at the cost of noisier results) is desired, run `FAST=true cook exec profile:run`, which takes about 90 minutes.

All experiments are seeded for reproducibility, but profiling experiments are subject to variability due to different hardware and competing processes running on the same machine. Despite seeding, results may also vary depending on the operating system and stdlib implementation.
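Seeding fixes the random number stream, so the same code on the same platform produces the same draws; it does nothing to fix wall-clock time. The notebooks seed NumPy (`np.random.seed(...)`) and Stan (`seed=0`); the stdlib sketch below, with a hypothetical `draw_samples` helper, illustrates the same idea using an independent `random.Random` generator:

```python
import random


def draw_samples(seed, size=5):
    """Hypothetical helper: standard normal draws from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator; leaves the global state untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


# The same seed reproduces the same draws within one Python build ...
print(draw_samples(0) == draw_samples(0))
# ... while a different seed gives a different stream.
print(draw_samples(0) == draw_samples(1))
```

Bit-for-bit agreement is only expected for the same software stack; as noted above, different operating systems or library implementations may still change results.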

### Kernel properties and effect of padding

The folders `kernels` and `padding` contain code to reproduce figures on, respectively, the properties of different kernels (including their spectral properties) and the effect of padding on Fourier methods. The figures can be reproduced by running `[./in-docker.sh] cook exec kernels:run padding:run`; the latter takes about ten minutes to generate.
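Fourier methods treat the domain as periodic, so observations at opposite ends of the window are close in wrapped distance and become strongly correlated; padding the domain pushes them apart. A minimal sketch, assuming a squared-exponential kernel evaluated on the wrapped distance (a hypothetical `periodic_sq_exp` helper, not the gptools implementation):

```python
import math


def periodic_sq_exp(x1, x2, length_scale, period):
    """Squared-exponential kernel evaluated on the wrapped (periodic) distance."""
    d = abs(x1 - x2) % period
    d = min(d, period - d)  # shortest distance around the circle
    return math.exp(-0.5 * (d / length_scale) ** 2)


length_scale = 0.1
x_first, x_last = 0.0, 0.99  # end points of observations on [0, 1)

# Without padding the period equals the observation window, so the end points are
# only 0.01 apart on the circle and almost perfectly correlated.
unpadded = periodic_sq_exp(x_first, x_last, length_scale, period=1.0)
# Padding the domain to [0, 1.5) separates them by 0.51, removing the spurious correlation.
padded = periodic_sq_exp(x_first, x_last, length_scale, period=1.5)
print(f"unpadded: {unpadded:.4f}, padded: {padded:.2e}")
```

The amount of padding needed depends on the length scale: the wrapped gap should be several length scales wide for the correlation to become negligible.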

## Expected results

- `[./in-docker.sh] cook exec kernels:run` ![](kernels/kernels.png)
- `[./in-docker.sh] cook exec padding:run` ![](padding/padding.png)
- `[./in-docker.sh] cook exec profile:run` ![](profile/profile.png)
- `[./in-docker.sh] cook exec trees:run` ![](trees/trees.png)
- `[./in-docker.sh] cook exec tube:run` ![](tube/tube.png)
1 change: 1 addition & 0 deletions getting_started/.gitignore
@@ -0,0 +1 @@
getting_started
32 changes: 32 additions & 0 deletions getting_started/getting_started.Rmd
@@ -0,0 +1,32 @@
Install the packages. We explicitly specify the repos here so we don't get asked for the mirror when
rendering the Rmarkdown.

```{r}
install.packages(
"cmdstanr",
repos = c("https://mc-stan.org/r-packages/", "http://cran.us.r-project.org")
)
install.packages("gptoolsStan", repos=c("http://cran.us.r-project.org"))
```

Compile and run the model.

```{r}
library(cmdstanr)
library(gptoolsStan)

model <- cmdstan_model(
  stan_file="getting_started.stan",
  include_paths=gptools_include_path()
)
fit <- model$sample(
  data=list(n=100, sigma=1, length_scale=0.1, period=1),
  chains=1,
  iter_warmup=500,
  iter_sampling=50
)
f <- fit$draws("f")
dim(f)
```

Expected output: `[1] 50 1 100`
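cmdstanr's `fit$draws()` returns a draws array ordered iterations × chains × variables, which is why one chain of 50 post-warmup draws of the 100-element `f` prints as `[1] 50 1 100`. The Python notebook instead reports `(50, 100)` because cmdstanpy's `fit.f` stacks all chains into a single draws axis. A stdlib sketch of that relationship, with hypothetical `stack_chains` and `shape` helpers operating on nested lists:

```python
def stack_chains(draws):
    """Flatten an iterations x chains x variables nested list into draws x variables.

    All iterations of chain 0 come first, then chain 1, and so on.
    """
    n_iter = len(draws)
    n_chains = len(draws[0])
    return [draws[i][c] for c in range(n_chains) for i in range(n_iter)]


def shape(nested):
    """Return the shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)


# 50 iterations, 1 chain, 100 latent values, as in the example above.
draws = [[[0.0] * 100 for _ in range(1)] for _ in range(50)]
print(shape(draws))
print(shape(stack_chains(draws)))
```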
44 changes: 44 additions & 0 deletions getting_started/getting_started.ipynb
@@ -0,0 +1,44 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "d2b9dac2",
"metadata": {},
"outputs": [],
"source": [
">>> import cmdstanpy\n",
">>> from gptools.stan import get_include\n",
">>>\n",
">>> model = cmdstanpy.CmdStanModel(\n",
"... stan_file=\"getting_started.stan\",\n",
"... stanc_options={\"include-paths\": get_include()},\n",
"... )\n",
">>> fit = model.sample(\n",
"...     data={\"n\": 100, \"sigma\": 1, \"length_scale\": 0.1, \"period\": 1},\n",
"... chains=1,\n",
"... iter_warmup=500,\n",
"... iter_sampling=50,\n",
"... )\n",
">>> fit.f.shape"
]
},
{
"cell_type": "markdown",
"id": "34ffcd17",
"metadata": {},
"source": [
"Expected output: `(50, 100)`"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
31 changes: 31 additions & 0 deletions getting_started/getting_started.md
@@ -0,0 +1,31 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.15.1
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---

```{code-cell} ipython3
>>> import cmdstanpy
>>> from gptools.stan import get_include
>>>
>>> model = cmdstanpy.CmdStanModel(
... stan_file="getting_started.stan",
... stanc_options={"include-paths": get_include()},
... )
>>> fit = model.sample(
...     data={"n": 100, "sigma": 1, "length_scale": 0.1, "period": 1},
... chains=1,
... iter_warmup=500,
... iter_sampling=50,
... )
>>> fit.f.shape
```

Expected output: `(50, 100)`
22 changes: 22 additions & 0 deletions getting_started/getting_started.stan
@@ -0,0 +1,22 @@
functions {
#include gptools/util.stan
#include gptools/fft.stan
}

data {
int n;
real<lower=0> sigma, length_scale, period;
}

transformed data {
vector [n %/% 2 + 1] cov_rfft =
gp_periodic_exp_quad_cov_rfft(n, sigma, length_scale, period) + 1e-9;
}

parameters {
vector [n] f;
}

model {
f ~ gp_rfft(zeros_vector(n), cov_rfft);
}
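The `transformed data` block allocates `n %/% 2 + 1` Fourier coefficients because the discrete Fourier transform of a real, symmetric covariance vector is real, and its upper half mirrors the lower half; the added `1e-9` presumably keeps every coefficient strictly positive. A minimal stdlib sketch, assuming the kernel is a squared exponential evaluated on the wrapped distance of an evenly spaced periodic grid (an illustration only; `gp_periodic_exp_quad_cov_rfft` in gptools may compute the spectrum differently), with hypothetical helpers `wrapped_sq_exp_cov` and `naive_rfft`:

```python
import cmath
import math


def wrapped_sq_exp_cov(n, sigma, length_scale, period):
    """Covariance between grid point 0 and each of n evenly spaced points on a circle."""
    cov = []
    for j in range(n):
        d = period * min(j, n - j) / n  # wrapped distance to point 0
        cov.append(sigma ** 2 * math.exp(-0.5 * (d / length_scale) ** 2))
    return cov


def naive_rfft(values):
    """O(n^2) real-input DFT returning only the n // 2 + 1 non-redundant coefficients."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * j / n) for j, v in enumerate(values))
        for k in range(n // 2 + 1)
    ]


n = 100
cov = wrapped_sq_exp_cov(n, sigma=1.0, length_scale=0.1, period=1.0)
cov_rfft = naive_rfft(cov)
print(len(cov_rfft))
# The covariance is symmetric (cov[j] == cov[n - j]), so every coefficient is real
# up to floating-point round-off.
print(max(abs(c.imag) for c in cov_rfft))
```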
100 changes: 100 additions & 0 deletions kernels/kernels.ipynb
@@ -0,0 +1,100 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "031e9913",
"metadata": {},
"outputs": [],
"source": [
"from gptools.util.kernels import ExpQuadKernel, MaternKernel\n",
"from gptools.util.fft.fft1 import transform_irfft, evaluate_rfft_scale\n",
"import matplotlib as mpl\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"from pathlib import Path\n",
"\n",
"mpl.style.use(\"../jss.mplstyle\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "250db79b",
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(9) # Seed picked for good legend positioning. Works for any though.\n",
"fig, axes = plt.subplots(2, 2)\n",
"length_scale = 0.2\n",
"kernels = {\n",
" \"squared exp.\": lambda period: ExpQuadKernel(1, length_scale, period),\n",
" \"Matern ³⁄₂\": lambda period: MaternKernel(1.5, 1, length_scale, period),\n",
"}\n",
"\n",
"x = np.linspace(0, 1, 101, endpoint=False)\n",
"z = np.random.normal(0, 1, x.size)\n",
"\n",
"for ax, (key, kernel) in zip(axes[1], kernels.items()):\n",
" value = kernel(None).evaluate(0, x[:, None])\n",
" line, = axes[0, 0].plot(x, value, ls=\"--\")\n",
" rfft = kernel(1).evaluate_rfft([x.size])\n",
" value = np.fft.irfft(rfft, x.size)\n",
" axes[0, 1].plot(rfft, label=key)\n",
" axes[0, 0].plot(x, value, color=line.get_color())\n",
"\n",
" for maxf, ls in [(x.size // 2 + 1, \"-\"), (5, \"--\"), (3, \":\")]:\n",
" rfft_scale = evaluate_rfft_scale(cov_rfft=rfft, size=x.size)\n",
" rfft_scale[maxf:] = 0\n",
" f = transform_irfft(z, np.zeros_like(z), rfft_scale=rfft_scale)\n",
" ax.plot(x, f, ls=ls, color=line.get_color(), label=fr\"$\\xi_\\max={maxf}$\")\n",
"\n",
" ax.set_xlabel(\"position $x$\")\n",
" ax.set_ylabel(f\"{key} GP $f$\")\n",
"\n",
"ax = axes[0, 0]\n",
"ax.set_ylabel(\"kernel $k(0,x)$\")\n",
"ax.set_xlabel(\"position $x$\")\n",
"ax.legend([\n",
" mpl.lines.Line2D([], [], ls=\"--\", color=\"gray\"),\n",
" mpl.lines.Line2D([], [], ls=\"-\", color=\"gray\"),\n",
"], [\"standard\", \"periodic\"], fontsize=\"small\")\n",
"ax.text(0.05, 0.05, \"(a)\", transform=ax.transAxes)\n",
"ax.yaxis.set_ticks([0, 0.5, 1])\n",
"\n",
"ax = axes[0, 1]\n",
"ax.set_yscale(\"log\")\n",
"ax.set_ylim(1e-5, x.size)\n",
"ax.set_xlabel(r\"frequency $\\xi$\")\n",
"ax.set_ylabel(r\"Fourier kernel $\\tilde k=\\phi\\left(k\\right)$\")\n",
"ax.legend(fontsize=\"small\", loc=\"center right\")\n",
"ax.text(0.95, 0.95, \"(b)\", transform=ax.transAxes, ha=\"right\", va=\"top\")\n",
"\n",
"ax = axes[1, 0]\n",
"ax.legend(fontsize=\"small\", loc=\"lower center\")\n",
"ax.text(0.95, 0.95, \"(c)\", transform=ax.transAxes, ha=\"right\", va=\"top\")\n",
"\n",
"ax = axes[1, 1]\n",
"ax.legend(fontsize=\"small\", loc=\"lower center\")\n",
"ax.sharey(axes[1, 0])\n",
"ax.text(0.95, 0.95, \"(d)\", transform=ax.transAxes, ha=\"right\", va=\"top\")\n",
"\n",
"for ax in [axes[0, 0], *axes[1]]:\n",
" ax.xaxis.set_ticks([0, 0.5, 1])\n",
"\n",
"fig.tight_layout()\n",
"fig.savefig(\"kernels.pdf\", bbox_inches=\"tight\")\n",
"fig.savefig(\"kernels.png\", bbox_inches=\"tight\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
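The inner loop over `maxf` zeroes every Fourier coefficient from index `maxf` upward (`rfft_scale[maxf:] = 0`) before transforming back, which acts as a low-pass filter: smaller `maxf` gives smoother samples, and `maxf = x.size // 2 + 1` keeps everything. A minimal stdlib sketch of that truncation, using hypothetical naive transforms in place of NumPy's `rfft`/`irfft` and omitting both the covariance scaling from `evaluate_rfft_scale` and the way `transform_irfft` maps white noise to coefficients:

```python
import cmath
import math
import random


def naive_rfft(values):
    """O(n^2) real-input DFT returning the n // 2 + 1 non-redundant coefficients."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * j / n) for j, v in enumerate(values))
        for k in range(n // 2 + 1)
    ]


def naive_irfft(coeffs, n):
    """Inverse of naive_rfft for a real signal of length n."""
    out = []
    for j in range(n):
        total = coeffs[0].real
        for k in range(1, len(coeffs)):
            # Each positive frequency stands in for itself and its mirror image at n - k,
            # except the Nyquist frequency (2 * k == n), which has no separate partner.
            weight = 1 if 2 * k == n else 2
            total += weight * (coeffs[k] * cmath.exp(2j * math.pi * k * j / n)).real
        out.append(total / n)
    return out


def low_pass(values, maxf):
    """Zero every coefficient with index >= maxf, mirroring `rfft_scale[maxf:] = 0`."""
    coeffs = [c if k < maxf else 0j for k, c in enumerate(naive_rfft(values))]
    return naive_irfft(coeffs, len(values))


rng = random.Random(9)
n = 101  # same grid size as x.size in the notebook
z = [rng.gauss(0.0, 1.0) for _ in range(n)]

full = low_pass(z, n // 2 + 1)  # keeps every frequency, so it reproduces z
smooth = low_pass(z, 3)          # keeps only the constant and the two lowest frequencies
print(max(abs(a - b) for a, b in zip(full, z)))
```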
57 changes: 57 additions & 0 deletions linear/linear.ipynb
@@ -0,0 +1,57 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "eb2c936b",
"metadata": {},
"source": [
"# Linear regression example from Section 2 of the manuscript"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42ecaa12",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"np.random.seed(0)\n",
"n = 100\n",
"p = 3\n",
"X = np.random.normal(0, 1, (n, p))\n",
"theta = np.random.normal(0, 1, p)\n",
"sigma = np.random.gamma(2, 2)\n",
"y = np.random.normal(X @ theta, sigma)\n",
"\n",
"print(f\"coefficients: {theta}\")\n",
"print(f\"observation noise scale: {sigma}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "07069f61",
"metadata": {},
"outputs": [],
"source": [
"import cmdstanpy\n",
"\n",
"model = cmdstanpy.CmdStanModel(stan_file=\"linear.stan\")\n",
"fit = model.sample(data={\"n\": n, \"p\": p, \"X\": X, \"y\": y}, seed=0)\n",
"\n",
"print(fit.summary())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
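The notebook simulates $y = X\theta + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2)$ and fits it with Stan. As a quick check that does not depend on Stan, the ordinary least-squares estimate solves the normal equations $X^\top X \hat\theta = X^\top y$; under weak priors the posterior means should land nearby, although `linear.stan` is not shown here, so that is an assumption. A stdlib sketch with hypothetical helpers `solve` and `least_squares` (it uses Python's `random` module, so it will not reproduce the notebook's NumPy draws):

```python
import random


def solve(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(X, y):
    """Ordinary least-squares coefficients via the normal equations."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


rng = random.Random(0)
n, p = 100, 3
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
theta = [rng.gauss(0, 1) for _ in range(p)]
sigma = rng.gammavariate(2, 2)  # Gamma(shape=2, scale=2), as in the notebook
y = [sum(x * t for x, t in zip(row, theta)) + rng.gauss(0, sigma) for row in X]

print("true:     ", [round(t, 3) for t in theta])
print("estimated:", [round(t, 3) for t in least_squares(X, y)])
```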