Merge #145
145: Docsgp r=lm2612 a=lm2612

I've started on the GP emulator docs to include: 

1. Brief introduction to what a Gaussian process is (could definitely be expanded)
2. Implementation: choice of GPJL or SKLJL, how to optimize hyperparameters and how to predict at new points
3. Prediction Type: data or latent variable. This part needs more info.
4. Kernels: How to change kernel, I've started with examples for different kernels in GPJL, not yet added SKLJL options.
5. Learning the noise: how to add option to learn noise, again could be expanded.

Co-authored-by: lm2612 <[email protected]>
bors[bot] and lm2612 authored Jul 15, 2022
2 parents 245b264 + b6fb5b0 commit 3ba3749
Showing 3 changed files with 147 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/make.jl
@@ -25,6 +25,7 @@ pages = [
"Calibrate" => "calibrate.md",
"Emulate" => "emulate.md",
"Examples" => examples,
"Gaussian Process" => "GaussianProcessEmulator.md",
"Package Design" => design,
"API" => api,
"Glossary" => "glossary.md",
131 changes: 131 additions & 0 deletions docs/src/GaussianProcessEmulator.md
@@ -0,0 +1,131 @@
# Gaussian Process Emulator

One type of `MachineLearningTool` we provide for emulation is a Gaussian process.
Gaussian processes generalize the Gaussian probability distribution from random variables to functions.
They are well suited to statistical emulation because their predictions come with both a mean and a covariance.
To build a Gaussian process, we first define a prior over functions by choosing a covariance function, or kernel.
The kernel describes how similar two outputs `(y_i, y_j)` are expected to be, given the similarity of their inputs `(x_i, x_j)`.
Kernels encode the functional form of these relationships and are defined by hyperparameters, which are usually unknown to the user at the outset.
To obtain the posterior Gaussian process, we condition on data using Bayes' theorem and optimize the hyperparameters of the kernel.
We can then predict a mean function and covariance at new data points.
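Concretely, in the standard zero-mean Gaussian process regression setting with training inputs `X`, training outputs `y`, kernel `k`, and observational noise variance `σ²`, the posterior predictive distribution at a new input `x_*` is Gaussian, with mean and covariance given by (Rasmussen and Williams, 2006):
```math
\begin{aligned}
\mu(x_*) &= k(x_*, X)\left[k(X, X) + \sigma^2 I\right]^{-1} y, \\
\Sigma(x_*) &= k(x_*, x_*) - k(x_*, X)\left[k(X, X) + \sigma^2 I\right]^{-1} k(X, x_*).
\end{aligned}
```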

A useful resource to learn about Gaussian processes is [Rasmussen and Williams (2006)](http://gaussianprocess.org/gpml/).


# User Interface

`CalibrateEmulateSample.jl` allows the Gaussian process emulator to be built using
either [`GaussianProcesses.jl`](https://stor-i.github.io/GaussianProcesses.jl/latest/)
or [`ScikitLearn.jl`](https://scikitlearnjl.readthedocs.io/en/latest/models/#scikitlearn-models).
To use `GaussianProcesses.jl`, define the package type as
```julia
gppackage = Emulators.GPJL()
```

To use `ScikitLearn.jl`, define the package type as
```julia
gppackage = Emulators.SKLJL()
```


Initialize a basic Gaussian Process with
```julia
gauss_proc = GaussianProcess(gppackage)
```

This initializes the prior Gaussian process.
We train the Gaussian process by passing `gauss_proc`, together with the data, into the `Emulator` struct and optimizing the hyperparameters,
as described [here](https://clima.github.io/CalibrateEmulateSample.jl/dev/emulate/#Typical-construction-from-Lorenz_example.jl).
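For example, a minimal sketch of this workflow (assuming `input_output_pairs` holds the training input–output pairs, constructed as in the Emulate documentation):
```julia
using CalibrateEmulateSample.Emulators

gppackage = Emulators.GPJL()
gauss_proc = GaussianProcess(gppackage)

# Combine the prior Gaussian process with the training data,
# then learn the kernel hyperparameters.
emulator = Emulator(gauss_proc, input_output_pairs)
optimize_hyperparameters!(emulator)
```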

# Prediction Type

You can specify the type of prediction when initializing the Gaussian process emulator.
The default is to predict the data, `YType()`, i.e., the latent function plus the observational noise.
To predict the underlying latent function instead, use the `FType()` prediction type:

```julia
gauss_proc = GaussianProcess(
    gppackage;
    prediction_type = FType())
```


# Kernels

The Gaussian process above assumes the default kernel: the Squared Exponential kernel, also called
the [Radial Basis Function (RBF)](https://en.wikipedia.org/wiki/Radial_basis_function_kernel).
A different type of kernel can be specified when the Gaussian process is initialized.
Read more about [kernel options here](https://www.cs.toronto.edu/~duvenaud/cookbook/).


## GPJL
For the `GaussianProcesses.jl` package, there is [a range of kernels to choose from](https://stor-i.github.io/GaussianProcesses.jl/latest/kernels).
For example,
```julia
using GaussianProcesses

# Create a Matern 3/2 kernel; the two arguments are placeholder (log) lengthscale and (log) sd values
my_kernel = GaussianProcesses.Mat32Iso(0., 0.)
gauss_proc = GaussianProcess(
    gppackage;
    kernel = my_kernel)
```
You do not need to provide useful hyperparameter values when you define the kernel, as these are learned in
`optimize_hyperparameters!(emulator)`.

You can also combine kernels, for example by adding them together:
```julia
using GaussianProcesses
kernel_1 = GaussianProcesses.Mat32Iso(0., 0.)  # Matern 3/2 kernel with placeholder (log) lengthscale and sd of 0
kernel_2 = GaussianProcesses.Lin(0.)           # Linear kernel with placeholder (log) lengthscale of 0
my_kernel = kernel_1 + kernel_2                # New additive composite kernel
gauss_proc = GaussianProcess(
    gppackage;
    kernel = my_kernel)
```
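Products of kernels are also supported by `GaussianProcesses.jl`; for example, reusing `kernel_1` and `kernel_2` from above:
```julia
my_kernel = kernel_1 * kernel_2  # Multiplicative (product) composite kernel
gauss_proc = GaussianProcess(
    gppackage;
    kernel = my_kernel)
```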

## SKLJL
Alternatively, if you are using the `ScikitLearn.jl` package, you can [find the list of kernels here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.gaussian_process).
These kernels require the following preamble:
```julia
using PyCall
using ScikitLearn
const pykernels = PyNULL()
function __init__()
    copy!(pykernels, pyimport("sklearn.gaussian_process.kernels"))
end
```
Then they are accessible, for example, as
```julia
my_kernel = pykernels.RBF(length_scale = 1)
gauss_proc = GaussianProcess(
    gppackage;
    kernel = my_kernel)
```
You can also combine multiple ScikitLearn kernels via linear operations in the same way as above.
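For example, a sketch combining two scikit-learn kernels through the explicit `Sum` kernel (`RBF` and `WhiteKernel` are standard scikit-learn kernels; `Sum(k1, k2)` is equivalent to `k1 + k2` in Python):
```julia
kernel_1 = pykernels.RBF(length_scale = 1)
kernel_2 = pykernels.WhiteKernel(noise_level = 1)
my_kernel = pykernels.Sum(kernel_1, kernel_2)  # Additive combination of the two kernels
gauss_proc = GaussianProcess(
    gppackage;
    kernel = my_kernel)
```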

# Learning the noise

Often it is useful to learn the observational noise in the data by adding a white noise kernel. This is controlled by the
Boolean keyword `noise_learn` when initializing the Gaussian process; the default is `true`.

```julia
gauss_proc = GaussianProcess(
    gppackage;
    noise_learn = true)
```

When `noise_learn` is `true`, an additional white noise kernel is added to the chosen kernel. This white noise acts
across all input values, including those of the training data.
The scale parameters of the white noise kernel are learned in `optimize_hyperparameters!(emulator)`.
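Schematically, the resulting covariance between two inputs `x_i` and `x_j` takes the form
```math
k_{\mathrm{total}}(x_i, x_j) = k(x_i, x_j) + \sigma_n^2 \, \delta(x_i, x_j),
```
where `k` is the chosen kernel, `σ_n` is the learned white-noise standard deviation, and `δ(x_i, x_j)` equals 1 when the two inputs coincide and 0 otherwise.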

You may not need to learn the noise if you already have a good estimate of it from your training data.
When `noise_learn` is `false`, a small regularization noise is instead added for numerical stability.
Its default value is `1e-3`, but it can be set through the optional argument `alg_reg_noise`:

```julia
gauss_proc = GaussianProcess(
    gppackage;
    noise_learn = false,
    alg_reg_noise = 1e-3)
```

15 changes: 15 additions & 0 deletions docs/src/emulate.md
@@ -26,6 +26,21 @@ emulator = Emulator(
```
The optional arguments above relate to the data processing.

### Emulator Training

The emulator is trained when we combine the machine learning tool and the data into the `Emulator` above.
For any machine learning tool, we must also optimize the hyperparameters:
```julia
optimize_hyperparameters!(emulator)
```
In the Lorenz example, this line learns the hyperparameters of the Gaussian process, which depend on the choice of [kernel](https://clima.github.io/CalibrateEmulateSample.jl/dev/GaussianProcessEmulator/#kernels).
Predictions at new inputs can then be made using
```julia
y, cov = Emulators.predict(emulator, new_inputs)
```
This returns both a mean value and a covariance.


## Data processing

Some effects of the following are outlined in a practical setting in the results and appendices of [Howland, Dunbar, Schneider, (2022)](https://doi.org/10.1029/2021MS002735).
