diff --git a/docs/index.md b/docs/index.md
index 2ee142c9f..7879bc8a3 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -4,7 +4,7 @@
 GPJax is a didactic Gaussian process (GP) library in JAX, supporting GPU acceleration
 and just-in-time compilation. We seek to provide a flexible API to enable researchers
 to rapidly prototype and develop new ideas.
 
-![Gaussian process posterior.](./_static/GP.svg)
+![Gaussian process posterior.](static/GP.svg)
 
 ## "Hello, GP!"
 
@@ -40,7 +40,7 @@ would write on paper, as shown below.
 
 !!! Install
 
-    GPJax can be installed via pip. See our [installation guide](https://docs.jaxgaussianprocesses.com/installation/) for further details.
+    GPJax can be installed via pip. See our [installation guide](installation.md) for further details.
 
     ```bash
     pip install gpjax
@@ -48,7 +48,7 @@ would write on paper, as shown below.
 
 !!! New
 
-    New to GPs? Then why not check out our [introductory notebook](https://docs.jaxgaussianprocesses.com/examples/intro_to_gps/) that starts from Bayes' theorem and univariate Gaussian distributions.
+    New to GPs? Then why not check out our [introductory notebook](_examples/intro_to_gps.md) that starts from Bayes' theorem and univariate Gaussian distributions.
 
 !!! Begin
 
diff --git a/docs/sharp_bits.md b/docs/sharp_bits.md
index be88beb0c..72aeb726b 100644
--- a/docs/sharp_bits.md
+++ b/docs/sharp_bits.md
@@ -60,7 +60,7 @@ learning rate is greater is than 0.03, we would end up with a negative variance
 We visualise this issue below where the red cross denotes the invalid lengthscale
 value that would be obtained, were we to optimise in the unconstrained parameter space.
 
-![](_static/step_size_figure.svg)
+![](static/step_size_figure.svg)
 
 A simple but impractical solution would be to use a tiny learning rate which would
 reduce the possibility of stepping outside of the parameter's support. However, this
@@ -70,7 +70,7 @@ subspace of the real-line onto the entire real-line. Here, gradient updates are
 applied in the unconstrained parameter space before transforming the value back to the
 original support of the parameters. Such a transformation is known as a bijection.
 
-![](_static/bijector_figure.svg)
+![](static/bijector_figure.svg)
 
 To help understand this, we show the effect of using a log-exp bijector in the above
 figure. We have six points on the positive real line that range from 0.1 to 3 depicted
@@ -81,8 +81,7 @@ value, we apply the inverse of the bijector, which is the exponential function i
 case. This gives us back the blue cross.
 
 In GPJax, we supply bijective functions using [Tensorflow Probability](https://www.tensorflow.org/probability/api_docs/python/tfp/substrates/jax/bijectors).
-In our [PyTrees doc](examples/pytrees.md) document, we detail how the user can define
-their own bijectors and attach them to the parameter(s) of their model.
+
 
 ## Positive-definiteness
 
@@ -91,8 +90,7 @@ their own bijectors and attach them to the parameter(s) of their model.
 ### Why is positive-definiteness important?
 
 The Gram matrix of a kernel, a concept that we explore more in our
-[kernels notebook](examples/constructing_new_kernels.py) and our [PyTree notebook](examples/pytrees.md), is a
-symmetric positive definite matrix. As such, we
+[kernels notebook](_examples/constructing_new_kernels.md), is a symmetric positive definite matrix. As such, we
 have a range of tools at our disposal to make subsequent operations on the covariance
 matrix faster. One of these tools is the Cholesky factorisation that uniquely
 decomposes any symmetric positive-definite matrix $\mathbf{\Sigma}$ by
@@ -158,7 +156,7 @@ for some problems, this amount may need to be increased.
 ## Slow-to-evaluate
 
 Famously, a regular Gaussian process model (as detailed in
-[our regression notebook](examples/regression.py)) will scale cubically in the number of data points.
+[our regression notebook](_examples/regression.md)) will scale cubically in the number of data points.
 Consequently, if you try to fit your Gaussian process model to a data set containing
 more than several thousand data points, then you will likely incur a significant
 computational overhead. In such cases, we recommend using Sparse Gaussian processes to
@@ -168,7 +166,7 @@ When the data contains less than around 50000 data points, we recommend using
 the collapsed evidence lower bound objective [@titsias2009] to optimise the parameters
 of your sparse Gaussian process model. Such a model will scale linearly in the number
 of data points and quadratically in the number of inducing points. We demonstrate its use
-in [our sparse regression notebook](examples/collapsed_vi.py).
+in [our sparse regression notebook](_examples/collapsed_vi.md).
 
 For data sets exceeding 50000 data points, even the sparse Gaussian process outlined
 above will become computationally infeasible. In such cases, we recommend using the
@@ -176,4 +174,4 @@ uncollapsed evidence lower bound objective [@hensman2013gaussian] that allows st
 mini-batch optimisation of the parameters of your sparse Gaussian process model. Such
 a model will scale linearly in the batch size and quadratically in the number of
 inducing points. We demonstrate its use in
-[our sparse stochastic variational inference notebook](examples/uncollapsed_vi.py).
+[our sparse stochastic variational inference notebook](_examples/uncollapsed_vi.md).
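
A note on the bijection trick described in the `docs/sharp_bits.md` hunks above: constrained parameters such as lengthscales are log-transformed into an unconstrained space, updated there, and mapped back with the exponential. The sketch below illustrates the idea using TensorFlow Probability's JAX substrate, which is the library the docs point to; the parameter value, learning rate and gradient are made up, and this is not GPJax's actual training loop.

```python
import jax.numpy as jnp
from tensorflow_probability.substrates import jax as tfp

tfb = tfp.bijectors
bijector = tfb.Exp()  # forward: exp (unconstrained -> positive), inverse: log

lengthscale = jnp.array(0.5)                   # a positive, constrained parameter
unconstrained = bijector.inverse(lengthscale)  # log-transform into unconstrained space

# Gradient step in the unconstrained space; the step can be large without
# ever leaving the parameter's support (illustrative values only).
learning_rate, gradient = 0.03, 4.0
unconstrained = unconstrained - learning_rate * gradient

# Map back onto the positive real line: positive by construction.
lengthscale = bijector.forward(unconstrained)
```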
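
Similarly, the positive-definiteness hunks refer to an amount of jitter that "may need to be increased". A minimal sketch of that stabilisation trick, assuming a hand-built RBF Gram matrix rather than GPJax's own kernel machinery:

```python
import jax.numpy as jnp

# Hand-built RBF Gram matrix (illustrative only).
x = jnp.linspace(0.0, 1.0, 5)[:, None]
gram = jnp.exp(-0.5 * (x - x.T) ** 2)

# Positive definite in exact arithmetic, but round-off can push the smallest
# eigenvalues slightly negative. A small diagonal "jitter" restores
# positive-definiteness before the Cholesky factorisation.
jitter = 1e-6  # increase this if the factorisation still produces NaNs
L = jnp.linalg.cholesky(gram + jitter * jnp.eye(gram.shape[0]))

print(jnp.isnan(L).any(), jnp.max(jnp.abs(L @ L.T - gram)))
```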
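
Finally, the scalings quoted in the sparse-GP paragraphs can be made concrete with a back-of-the-envelope comparison; the numbers below are illustrative, not benchmarks:

```python
# Orders of magnitude only, constants ignored: exact GP inference is cubic in n,
# the collapsed ELBO is O(n * m^2), and the uncollapsed ELBO is O(b * m^2) per step.
n, m, b = 50_000, 200, 128   # data points, inducing points, mini-batch size

exact = n**3                 # exact GP regression
collapsed = n * m**2         # collapsed ELBO [@titsias2009]
uncollapsed = b * m**2       # uncollapsed ELBO [@hensman2013gaussian], per mini-batch

print(f"exact / collapsed       ~ {exact / collapsed:,.0f}x")
print(f"collapsed / uncollapsed ~ {collapsed / uncollapsed:,.0f}x per optimisation step")
```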