Fast notebooks for CI (#390)
henrymoss authored Nov 1, 2021
1 parent cb51442 commit 1aff59b
Showing 32 changed files with 273 additions and 76 deletions.
15 changes: 15 additions & 0 deletions .github/workflows/develop-checks.yaml
Expand Up @@ -28,3 +28,18 @@ jobs:
python-version: 3.7
- run: pip install tox
- run: tox -e alltests

fulldocs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: 3.7
- run: pip install tox
- run: |
TEMP_DEB="$(mktemp)" &&
wget -O "$TEMP_DEB" 'https://github.com/jgm/pandoc/releases/download/2.10.1/pandoc-2.10.1-1-amd64.deb' &&
sudo dpkg -i "$TEMP_DEB"
rm -f "$TEMP_DEB"
- run: tox -e docs
3 changes: 1 addition & 2 deletions .github/workflows/quality-checks.yaml
Expand Up @@ -61,5 +61,4 @@ jobs:
wget -O "$TEMP_DEB" 'https://github.com/jgm/pandoc/releases/download/2.10.1/pandoc-2.10.1-1-amd64.deb' &&
sudo dpkg -i "$TEMP_DEB"
rm -f "$TEMP_DEB"
- run: tox -e docs

- run: tox -e quickdocs
1 change: 1 addition & 0 deletions common_build/types/constraints.txt
Expand Up @@ -2,4 +2,5 @@ mypy==0.910
mypy-extensions==0.4.3
toml==0.10.2
typed-ast==1.4.3
types-PyYAML==6.0.0
typing-extensions==3.7.4.3
1 change: 1 addition & 0 deletions common_build/types/requirements.txt
Expand Up @@ -13,6 +13,7 @@
# limitations under the License.

mypy
types-PyYAML
# pin typing-extensions so we're compatible with venvs that have tensorflow installed
typing-extensions~=3.7.4

27 changes: 25 additions & 2 deletions docs/README.md
Expand Up @@ -38,7 +38,7 @@ setting up a virtual environment as described above and running the following fr
which you can ignore.

```bash
sphinx-build -M html . _build -D exclude_patterns=_build,Thumbs.db,.DS_Store,notebooks
$ sphinx-build -M html . _build -D exclude_patterns=_build,Thumbs.db,.DS_Store,notebooks
```

The easiest way to test a specific notebook for *Python errors* is to run it with `python`
Expand All @@ -47,7 +47,30 @@ if you wish to build the documentation for a specific notebook, you can run some
the following command to exclude all other notebooks and the API documentation:

```bash
sphinx-build -M html . _build -D autoapi_dirs= -D exclude_patterns=_build,Thumbs.db,.DS_Store,notebooks/[a-su-z]*
$ sphinx-build -M html . _build -D autoapi_dirs= -D exclude_patterns=_build,Thumbs.db,.DS_Store,notebooks/[a-su-z]*
```

### Partial executions of the notebooks

For continuous integration, we save time by executing only dummy runs of the notebook
optimization loops (the notebooks are still executed in full after merging).
To do this locally, you can run the following:

```bash
$ tox -e quickdocs
```

These partial runs are managed by the
configuration in the `docs/notebooks/quickrun` subdirectory.
To run them in your own virtual environment, execute the following from the `docs`
subdirectory before building the documentation:
```bash
$ python notebooks/quickrun/quickrun.py
```
Note that this will modify the notebook files.
To revert them to what they were before, you can execute the following:
```bash
$ python notebooks/quickrun/quickrun.py --revert
```
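
The `quickrun.py` script and its configuration format are not shown in this excerpt, so the following is only a minimal sketch of one way a partial run could be produced: rewriting named assignments such as `num_steps = 100` to smaller values. The function `shrink_assignments` and the `overrides` mapping are hypothetical names chosen for illustration, not part of Trieste.

```python
import re
from typing import Dict


def shrink_assignments(source: str, overrides: Dict[str, str]) -> str:
    """Rewrite whole-line `name = value` assignments using `overrides`.

    Hypothetical helper: the real quickrun configuration may work differently.
    """
    for name, value in overrides.items():
        # Match an unindented line `name = ...`, but not a comparison `name == ...`.
        pattern = re.compile(rf"^{re.escape(name)}\s*=(?!=).*$", flags=re.MULTILINE)
        # A callable replacement avoids backslash interpretation in `value`.
        source = pattern.sub(lambda _match: f"{name} = {value}", source)
    return source


notebook_source = (
    "num_steps = 100\n"
    "result = bo.optimize(num_steps, initial_data, model)\n"
)
quick_source = shrink_assignments(notebook_source, {"num_steps": "2"})
print(quick_source)
```

A rewrite of this kind only works when the expensive settings live in named variables, which may be why the notebooks in this commit replace literals such as `10` and `2000` with `num_steps`, `batch_size` and `sample_size`.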

# License
6 changes: 5 additions & 1 deletion docs/notebooks/active_learning.pct.py
Expand Up @@ -8,7 +8,6 @@
# %matplotlib inline
import numpy as np
import tensorflow as tf
import pandas as pd

np.random.seed(1793)
tf.random.set_seed(1793)
Expand Down Expand Up @@ -168,3 +167,8 @@ def pred_var(x):
from util.plotting import plot_bo_points, plot_function_2d

plot_active_learning_query(result, bo_iter, num_initial_points, query_points, num_query)

# %% [markdown]
# ## LICENSE
#
# [Apache License 2.0](https://github.com/secondmind-labs/trieste/blob/develop/LICENSE)
5 changes: 2 additions & 3 deletions docs/notebooks/ask_tell_optimization.pct.py
Expand Up @@ -18,7 +18,6 @@
from trieste.models.gpflow.models import GaussianProcessRegression
from trieste.objectives import scaled_branin, SCALED_BRANIN_MINIMUM
from trieste.objectives.utils import mk_observer
from trieste.observer import OBJECTIVE
from trieste.space import Box

from util.plotting import plot_regret
Expand Down Expand Up @@ -129,7 +128,7 @@ def plot_ask_tell_regret(ask_tell_result):
# ## External experiment: storing optimizer state
#
# Now let's suppose you are optimizing a process that takes hours or even days to complete, e.g. a lab experiment or a hyperparameter optimization of a big machine learning model. This time you cannot even express the objective function in Python code. Instead you would like to ask Trieste what configuration to run next, go to the lab, perform the experiment, collect data, feed it back to Trieste and ask for the next configuration, and so on. It would be very convenient to be able to store intermediate optimization state to disk or database or other storage, so that your machine can be switched off while you are waiting for observation results.
#
#
# In this section we'll show how you could do it with Ask-Tell in Trieste. Of course we cannot perform a real physical experiment within this notebook, so we will just mimic it by using pickle to write the optimization state and read it back.
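
The save-and-restore idea described above can be sketched with the standard `pickle` module alone. `OptimizerState` below is a hypothetical stand-in for whatever state object the optimizer exposes; it is not a Trieste class, and the numbers are made up for illustration.

```python
import pickle
from dataclasses import dataclass, field
from typing import List


@dataclass
class OptimizerState:
    """Hypothetical stand-in for an optimizer's state (not a Trieste class)."""

    step: int = 0
    query_points: List[float] = field(default_factory=list)
    observations: List[float] = field(default_factory=list)


state = OptimizerState(step=3, query_points=[0.1, 0.5, 0.9], observations=[1.2, 0.4, 0.8])

# Serialize the state; these bytes could be written to disk or a database
# while the external experiment is running.
saved_bytes = pickle.dumps(state)

# Later, possibly after a restart, restore the state and record a new result.
restored = pickle.loads(saved_bytes)
restored.query_points.append(0.3)
restored.observations.append(0.6)
restored.step += 1
```

The restored object is an independent copy, so updating it leaves the original `state` untouched.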

# %%
Expand Down Expand Up @@ -159,4 +158,4 @@ def plot_ask_tell_regret(ask_tell_result):


# %% [markdown]
# A word of warning. This serialization technique is not guaranteed to work smoothly with every Tensorflow-based model, so apply to your own problems with caution.
# A word of warning. This serialization technique is not guaranteed to work smoothly with every Tensorflow-based model, so apply to your own problems with caution.
18 changes: 1 addition & 17 deletions docs/notebooks/asynchronous_greedy_multiprocessing.pct.py
@@ -1,16 +1,3 @@
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.12.0
# kernelspec:
# display_name: 'Python 3.7.5 64-bit (''.venv'': venv)'
# name: python3
# ---

# %% [markdown]
# # Asynchronous Bayesian optimization with Trieste
#
Expand All @@ -22,13 +9,10 @@
# silence TF warnings and info messages, only print errors
# https://stackoverflow.com/questions/35911252/disable-tensorflow-debugging-information
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
import tensorflow as tf

tf.get_logger().setLevel("ERROR")
import numpy as np

import time
import timeit

Expand Down Expand Up @@ -103,7 +87,7 @@ def build_model(data):
# To keep this notebook as reproducible as possible, we will only be using Python's multiprocessing package here. In this section we will explain our setup and define some common code to be used later.
#
# In both synchronous and asynchronous scenarios we will have a fixed set of worker processes performing observations. We will also have a main process responsible for the optimization process with Trieste. When Trieste suggests a new point, it is inserted into a points queue. One of the workers picks this point from the queue, performs the observation, and inserts the output into the observations queue. The main process then picks up the observation from the queue, at which point it either waits for the rest of the points in the batch to come back (synchronous scenario) or immediately suggests a new point (asynchronous scenario). This process continues either for a certain number of iterations or until we accumulate the necessary number of observations.
#
#
# The overall setup is illustrated in this diagram:
# ![multiprocessing setup](figures/async_bo.png)
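
The two-queue pattern described above can be sketched in a few lines. To keep the example self-contained, it uses threads as a stand-in for the worker processes, and the "suggestion" step is a hypothetical placeholder (evenly spaced points) rather than a call to Trieste; `objective`, `worker` and `run_async` are names invented for this sketch.

```python
import queue
import threading
from typing import List, Tuple


def objective(x: float) -> float:
    # Toy objective standing in for an expensive observation.
    return (x - 0.5) ** 2


def worker(points: queue.Queue, observations: queue.Queue) -> None:
    # Take points from the points queue until a None sentinel arrives.
    while True:
        x = points.get()
        if x is None:
            break
        observations.put((x, objective(x)))


def run_async(num_workers: int, num_observations: int) -> List[Tuple[float, float]]:
    points: queue.Queue = queue.Queue()
    observations: queue.Queue = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(points, observations))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()

    # Seed the points queue with one point per worker.
    submitted = 0
    while submitted < min(num_workers, num_observations):
        points.put(0.1 * submitted)
        submitted += 1

    collected: List[Tuple[float, float]] = []
    while len(collected) < num_observations:
        collected.append(observations.get())
        # Asynchronous scenario: suggest a new point as soon as one comes back.
        if submitted < num_observations:
            points.put(0.1 * submitted)
            submitted += 1

    # Tell every worker to stop, then wait for them to finish.
    for _ in workers:
        points.put(None)
    for w in workers:
        w.join()
    return collected


results = run_async(num_workers=3, num_observations=10)
```

A synchronous variant would instead wait until a whole batch of observations has been collected before putting the next batch of points on the queue.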

23 changes: 7 additions & 16 deletions docs/notebooks/asynchronous_nongreedy_batch_ray.pct.py
@@ -1,16 +1,3 @@
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.12.0
# kernelspec:
# display_name: 'Python 3.7.5 64-bit (''.venv'': venv)'
# name: python3
# ---

# %% [markdown]
# # Asynchronous batch Bayesian optimization
#
Expand All @@ -27,11 +14,11 @@
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
import tensorflow as tf
tf.get_logger().setLevel("ERROR")

import ray
import numpy as np
import time


# %% [markdown]
# Just as in the other [notebook on asynchronous optimization](asynchronous_greedy_multiprocessing.ipynb), we use the Branin function with delays.

Expand Down Expand Up @@ -121,7 +108,8 @@ def build_model(data):
from trieste.ask_tell_optimization import AskTellOptimizer

model = build_model(initial_data)
acquisition_function = BatchMonteCarloExpectedImprovement(sample_size=10000)
monte_carlo_sample_size = 10000
acquisition_function = BatchMonteCarloExpectedImprovement(sample_size=monte_carlo_sample_size)
async_rule = AsynchronousOptimization(acquisition_function, num_query_points=batch_size) # type: ignore
async_bo = AskTellOptimizer(search_space, initial_data, model, async_rule)

Expand Down Expand Up @@ -195,4 +183,7 @@ def launch_worker(x):
# %%
ray.shutdown() # "Undo ray.init()". Terminate all the processes started in this notebook.

# %%
# %% [markdown]
# ## LICENSE
#
# [Apache License 2.0](https://github.com/secondmind-labs/trieste/blob/develop/LICENSE)
25 changes: 14 additions & 11 deletions docs/notebooks/batch_optimization.pct.py
Expand Up @@ -7,8 +7,7 @@
# %%
import numpy as np
import tensorflow as tf
from util.plotting import create_grid, plot_acq_function_2d
from util.plotting_plotly import plot_function_plotly
from util.plotting import plot_acq_function_2d
import matplotlib.pyplot as plt
import trieste

Expand Down Expand Up @@ -84,9 +83,11 @@ def build_model(data):
from trieste.acquisition import BatchMonteCarloExpectedImprovement
from trieste.acquisition.rule import EfficientGlobalOptimization

batch_ei_acq = BatchMonteCarloExpectedImprovement(sample_size=1000, jitter=1e-5)
monte_carlo_sample_size = 1000
batch_ei_acq = BatchMonteCarloExpectedImprovement(sample_size=monte_carlo_sample_size, jitter=1e-5)
batch_size = 10
batch_ei_acq_rule = EfficientGlobalOptimization( # type: ignore
num_query_points=10, builder=batch_ei_acq)
num_query_points=batch_size, builder=batch_ei_acq)
points_chosen_by_batch_ei = batch_ei_acq_rule.acquire_single(search_space, model, dataset=initial_data)

# %% [markdown]
Expand All @@ -95,9 +96,10 @@ def build_model(data):
# %%
from trieste.acquisition import LocalPenalizationAcquisitionFunction

local_penalization_acq = LocalPenalizationAcquisitionFunction(search_space, num_samples=2000)
sample_size = 2000
local_penalization_acq = LocalPenalizationAcquisitionFunction(search_space, num_samples=sample_size)
local_penalization_acq_rule = EfficientGlobalOptimization( # type: ignore
num_query_points=10, builder=local_penalization_acq)
num_query_points=batch_size, builder=local_penalization_acq)
points_chosen_by_local_penalization = local_penalization_acq_rule.acquire_single(
search_space, model, dataset=initial_data)

Expand All @@ -107,9 +109,9 @@ def build_model(data):
# %%
from trieste.acquisition import GIBBON

gibbon_acq = GIBBON(search_space, grid_size = 2000)
gibbon_acq = GIBBON(search_space, grid_size = sample_size)
gibbon_acq_rule = EfficientGlobalOptimization( # type: ignore
num_query_points=10, builder=gibbon_acq)
num_query_points=batch_size, builder=gibbon_acq)
points_chosen_by_gibbon = gibbon_acq_rule.acquire_single(
search_space, model, dataset=initial_data)

Expand Down Expand Up @@ -174,7 +176,8 @@ def build_model(data):
batch_ei_rule = EfficientGlobalOptimization( # type: ignore
num_query_points=3, builder=batch_ei_acq
)
qei_result = bo.optimize(10, initial_data, model, acquisition_rule=batch_ei_rule)
num_steps = 10
qei_result = bo.optimize(num_steps, initial_data, model, acquisition_rule=batch_ei_rule)

# %% [markdown]
# then we repeat the same optimization with `LocalPenalizationAcquisitionFunction`...
Expand All @@ -184,7 +187,7 @@ def build_model(data):
num_query_points=3, builder=local_penalization_acq
)
local_penalization_result = bo.optimize(
10, initial_data, model, acquisition_rule=local_penalization_rule
num_steps, initial_data, model, acquisition_rule=local_penalization_rule
)

# %% [markdown]
Expand All @@ -195,7 +198,7 @@ def build_model(data):
num_query_points=3, builder=gibbon_acq
)
gibbon_result = bo.optimize(
10, initial_data, model, acquisition_rule=gibbon_rule
num_steps, initial_data, model, acquisition_rule=gibbon_rule
)

# %% [markdown]
8 changes: 4 additions & 4 deletions docs/notebooks/data_transformation.pct.py
@@ -1,9 +1,9 @@
# -*- coding: utf-8 -*-
# %% [markdown]
# # Data transformation with the help of Ask-Tell interface.

# %%
import os

import gpflow
import matplotlib.pyplot as plt
import numpy as np
Expand Down Expand Up @@ -120,10 +120,10 @@ def build_gp_model(data, x_std = 1.0, y_std = 0.1):
# We'll run the optimizer for 100 steps. Note: this may take a while!

# %%
num_acquisitions = 100
num_steps = 100

bo = trieste.bayesian_optimizer.BayesianOptimizer(observer, search_space)
result = bo.optimize(num_acquisitions, initial_data, model)
result = bo.optimize(num_steps, initial_data, model)
dataset = result.try_get_final_dataset()


Expand Down Expand Up @@ -200,7 +200,7 @@ def normalise(x, mean=None, std=None):
normalised_data = Dataset(query_points=x_sta, observations=y_sta)

dataset = initial_data
for step in range(num_acquisitions):
for step in range(num_steps):

if step == 0:
model = build_gp_model(normalised_data)