Documentation improvements (inc. lengthscale explanation) and Matern12Kernel alias #213

Merged: 20 commits, Jan 9, 2021
Changes from 9 commits
10 changes: 5 additions & 5 deletions docs/src/create_kernel.md
@@ -2,9 +2,9 @@

KernelFunctions.jl already contains the most popular kernels, but you might want to make your own!

Here are a few ways depending on how complicated your kernel is :
Here are a few ways depending on how complicated your kernel is:

### SimpleKernel for kernels function depending on a metric
### SimpleKernel for kernel functions depending on a metric

If your kernel function is of the form `k(x, y) = f(d(x, y))` where `d(x, y)` is a `PreMetric`,
you can construct your custom kernel by defining `kappa` and `metric` for your kernel.
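As a minimal sketch (reconstructed from the surrounding context, not the exact contents of the collapsed diff region):

```julia
using KernelFunctions, Distances

# Sketch of a custom SimpleKernel: k(x, y) = exp(-d(x, y)), with d the
# squared Euclidean distance. `MyKernel` is illustrative, not part of the package.
struct MyKernel <: KernelFunctions.SimpleKernel end

KernelFunctions.kappa(::MyKernel, d2::Real) = exp(-d2)
KernelFunctions.metric(::MyKernel) = SqEuclidean()
```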
@@ -20,15 +20,15 @@ KernelFunctions.metric(::MyKernel) = SqEuclidean()
### Kernel for more complex kernels

If your kernel does not satisfy such a representation, all you need to do is define `(k::MyKernel)(x, y)` and inherit from `Kernel`.
For example we recreate here the `NeuralNetworkKernel`
For example, we recreate here the `NeuralNetworkKernel`:

```julia
struct MyKernel <: KernelFunctions.Kernel end

(::MyKernel)(x, y) = asin(dot(x, y) / sqrt((1 + sum(abs2, x)) * (1 + sum(abs2, y))))
```

Note that `BaseKernel` do not use `Distances.jl` and can therefore be a bit slower.
Note that the fallback implementation of the base `Kernel` evaluation does not use `Distances.jl` and can therefore be a bit slower.
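A usage sketch with arbitrary inputs (assuming the `MyKernel` definition above):

```julia
using KernelFunctions, LinearAlgebra  # LinearAlgebra provides `dot` used by MyKernel

k = MyKernel()
x, y = rand(3), rand(3)
k(x, y)  # pointwise evaluation

# Kernel matrix for 5 inputs of dimension 3, one observation per row.
X = RowVecs(rand(5, 3))
kernelmatrix(k, X)  # 5×5 matrix
```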

### Additional Options

@@ -37,7 +37,7 @@ Finally there are additional functions you can define to bring in more features:
- `KernelFunctions.dim(x::MyDataType)`: by default the dimension of the inputs will only be checked for vectors of type `AbstractVector{<:Real}`. If you want to check the dimensionality of your inputs, dispatch the `dim` function on your datatype. Note that `0` is the default.
- `dim` is called within `KernelFunctions.validate_inputs(x::MyDataType, y::MyDataType)`, which can instead be directly overloaded if you want to run special checks for your input types.
- `kernelmatrix(k::MyKernel, ...)`: you can redefine the various `kernelmatrix` functions to optimize the computations.
- `Base.print(io::IO, k::MyKernel)`: if you want to specialize the printing of your kernel
- `Base.print(io::IO, k::MyKernel)`: if you want to specialize the printing of your kernel.

KernelFunctions uses [Functors.jl](https://github.com/FluxML/Functors.jl) for specifying trainable kernel parameters
in a way that is compatible with the [Flux ML framework](https://github.com/FluxML/Flux.jl).
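For illustration, a hedged sketch of a kernel with a trainable parameter (the struct and field names here are made up, not a package API):

```julia
using KernelFunctions, Functors

# Hypothetical kernel with a trainable variance, stored in a one-element
# vector so the parameter can be updated in place during training.
struct MyVarianceKernel{T} <: KernelFunctions.Kernel
    σ²::Vector{T}
end

Functors.@functor MyVarianceKernel  # exposes σ² as a trainable field

(k::MyVarianceKernel)(x, y) = first(k.σ²) * exp(-sum(abs2, x - y))
```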
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -1,6 +1,6 @@
# KernelFunctions.jl

Model agnostic kernel functions compatible with automatic differentiation
Model-agnostic kernel functions compatible with automatic differentiation

**KernelFunctions.jl** is a general-purpose kernel package.
It aims to provide a flexible framework for creating and manipulating kernels.
71 changes: 40 additions & 31 deletions docs/src/kernels.md
@@ -4,7 +4,7 @@

# Base Kernels

These are the basic kernels without any transformation of the data. They are the building blocks of KernelFunctions
These are the basic kernels without any transformation of the data. They are the building blocks of KernelFunctions.


## Constant Kernels
@@ -86,21 +86,20 @@ The [`FBMKernel`](@ref) is defined as
k(x,x';h) = \frac{|x|^{2h} + |x'|^{2h} - |x-x'|^{2h}}{2},
```

where $h$ is the [Hurst index](https://en.wikipedia.org/wiki/Hurst_exponent#Generalized_exponent) and $0<h<1$.
where $h$ is the [Hurst index](https://en.wikipedia.org/wiki/Hurst_exponent#Generalized_exponent) and $0 < h < 1$.

## Gabor Kernel

The [`GaborKernel`](@ref) is defined as

```math
k(x,x'; l,p) =& h(x-x';l,p)\\
h(u;l,p) =& \exp\left(-\cos\left(\pi \sum_i \frac{u_i}{p_i}\right)\sum_i \frac{u_i^2}{l_i^2}\right),
```

```math
k(x,x'; l,p) = \exp\left(-\cos\left(\pi \sum_i \frac{x_i - x'_i}{p_i}\right)\sum_i \frac{(x_i - x'_i)^2}{l_i^2}\right),
```
where $l_i >0 $ is the lengthscale and $p_i>0$ is the period.
where $l_i > 0$ is the lengthscale and $p_i > 0$ is the period.

## Matern Kernels
## Matérn Kernels

### Matern Kernel
### General Matérn Kernel

The [`MaternKernel`](@ref) is defined as

@@ -110,15 +109,23 @@ The [`MaternKernel`](@ref) is defined as

where $\nu > 0$.

### Matern 3/2 Kernel
### Matérn 1/2 Kernel

The Matérn 1/2 kernel is defined as
```math
k(x,x') = \exp\left(-|x-x'|\right),
```
equivalent to the Exponential kernel. `Matern12Kernel` is an alias for [`ExponentialKernel`](@ref).
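A quick sketch of the alias in use:

```julia
using KernelFunctions

k = Matern12Kernel()
k isa ExponentialKernel  # true: Matern12Kernel is a const alias
k(0.0, 1.0) ≈ exp(-1.0)  # true, matching the definition above
```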

### Matérn 3/2 Kernel

The [`Matern32Kernel`](@ref) is defined as

```math
k(x,x') = \left(1+\sqrt{3}|x-x'|\right)\exp\left(-\sqrt{3}|x-x'|\right).
```

### Matern 5/2 Kernel
### Matérn 5/2 Kernel

The [`Matern52Kernel`](@ref) is defined as

@@ -128,7 +135,7 @@ The [`Matern52Kernel`](@ref) is defined as

## Neural Network Kernel

The [`NeuralNetworkKernel`](@ref) (as in the kernel for an infinitely wide neural network interpretated as a Gaussian process) is defined as
The [`NeuralNetworkKernel`](@ref) (as in the kernel for an infinitely wide neural network interpreted as a Gaussian process) is defined as

```math
k(x, x') = \arcsin\left(\frac{\langle x, x'\rangle}{\sqrt{(1+\langle x, x\rangle)(1+\langle x',x'\rangle)}}\right).
@@ -142,19 +149,23 @@ The [`PeriodicKernel`](@ref) is defined as
k(x,x';r) = \exp\left(-0.5 \sum_i \left(\sin(\pi(x_i - x'_i)) / r_i\right)^2\right),
```

where $r$ has the same dimension as $x$ and $r_i >0$.
where $r$ has the same dimension as $x$ and $r_i > 0$.

## Piecewise Polynomial Kernel

The [`PiecewisePolynomialKernel`](@ref) is defined as

```math
k(x,x'; P, V) =& \max(1 - r, 0)^{j + V} f(r, j),\\
r =& x^\top P x',\\
j =& \lfloor \frac{D}{2}\rfloor + V + 1,
```

where $x \in \mathbb{R}^D$, $V \in \{0,1,2,3\}$, and $P$ is a positive-definite matrix.
$f$ is a piecewise polynomial (see source code).

The [`PiecewisePolynomialKernel`](@ref) is defined for $x, x'\in \mathbb{R}^D$, a positive-definite matrix $P \in \mathbb{R}^{D \times D}$, and $V \in \{0,1,2,3\}$ as

```math
k(x,x'; P, V) = \max(1 - x^\top P x', 0)^{j + V} f_V(x^\top P x', j),
```

where $j = \lfloor \frac{D}{2}\rfloor + V + 1$ and the $f_V$ are polynomials defined as follows:

```math
\begin{aligned}
f_0(r, j) &= 1, \\
f_1(r, j) &= 1 + (j + 1) r, \\
f_2(r, j) &= 1 + (j + 2) r + ((j^2 + 4j + 3) / 3) r^2, \\
f_3(r, j) &= 1 + (j + 3) r + ((6 j^2 + 36j + 45) / 15) r^2 + ((j^3 + 9 j^2 + 23j + 15) / 15) r^3.
\end{aligned}
```

## Polynomial Kernels

@@ -166,7 +177,7 @@ The [`LinearKernel`](@ref) is defined as
k(x,x';c) = \langle x,x'\rangle + c,
```

where $c \in \mathbb{R}$
where $c \in \mathbb{R}$.

### Polynomial Kernel

@@ -176,7 +187,7 @@ The [`PolynomialKernel`](@ref) is defined as
k(x,x';c,d) = \left(\langle x,x'\rangle + c\right)^d,
```

where $c \in \mathbb{R}$ and $d>0$
where $c \in \mathbb{R}$ and $d>0$.


## Rational Quadratic
@@ -223,43 +234,41 @@ where $i\in\{-1,0,1,2,3\}$ and coefficients $a_i$, $b_i$ are fixed and residuals

### Transformed Kernel

The [`TransformedKernel`](@ref) is a kernel where input are transformed via a function `f`
The [`TransformedKernel`](@ref) is a kernel where inputs are transformed via a function `f`:

```math
k(x,x';f,\widetile{k}) = \widetilde{k}(f(x),f(x')),
k(x,x';f,\widetilde{k}) = \widetilde{k}(f(x),f(x')),
```

Where $\widetilde{k}$ is another kernel and $f$ is an arbitrary mapping.
where $\widetilde{k}$ is another kernel and $f$ is an arbitrary mapping.
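For example, a sketch wrapping a base kernel with a scaling transform:

```julia
using KernelFunctions

k̃ = SqExponentialKernel()
k = TransformedKernel(k̃, ScaleTransform(0.5))  # k(x, x') = k̃(0.5x, 0.5x')
k(1.0, 2.0) ≈ k̃(0.5, 1.0)  # true
```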

### Scaled Kernel

The [`ScaledKernel`](@ref) is defined as

```math
k(x,x';\sigma^2,\widetilde{k}) = \sigma^2\widetilde{k}(x,x')
k(x,x';\sigma^2,\widetilde{k}) = \sigma^2 \widetilde{k}(x,x'),
```

Where $\widetilde{k}$ is another kernel and $\sigma^2 > 0$.
where $\widetilde{k}$ is another kernel and $\sigma^2 > 0$.

### Kernel Sum

The [`KernelSum`](@ref) is defined as a sum of kernels
The [`KernelSum`](@ref) is defined as a sum of kernels:

```math
k(x, x'; \{k_i\}) = \sum_i k_i(x, x').
```

### KernelProduct
### Kernel Product

The [`KernelProduct`](@ref) is defined as a product of kernels
The [`KernelProduct`](@ref) is defined as a product of kernels:

```math
k(x,x';\{k_i\}) = \prod_i k_i(x,x').
```
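Both composites can also be built with the usual arithmetic operators; a short sketch:

```julia
using KernelFunctions

k1 = SqExponentialKernel()
k2 = LinearKernel()

ks = k1 + k2  # KernelSum
kp = k1 * k2  # KernelProduct
ks(1.0, 2.0) ≈ k1(1.0, 2.0) + k2(1.0, 2.0)  # true
```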

### Tensor Product

The [`TensorProduct`](@ref) is defined as :
The [`TensorProduct`](@ref) is defined as:

```math
k(x,x';\{k_i\}) = \prod_i k_i(x_i,x'_i)
```
15 changes: 9 additions & 6 deletions docs/src/metrics.md
@@ -1,16 +1,19 @@
# Metrics

KernelFunctions.jl relies on [Distances.jl](https://github.com/JuliaStats/Distances.jl) for computing the pairwise matrix.
To do so a distance measure is needed for each kernel. Two very common ones can already be used : `SqEuclidean` and `Euclidean`.
However all kernels do not rely on distances metrics respecting all the definitions. That's why additional metrics come with the package such as `DotProduct` (`<x,y>`) and `Delta` (`δ(x,y)`).
Note that every `SimpleKernel` must have a defined metric defined as :
`SimpleKernel` implementations rely on [Distances.jl](https://github.com/JuliaStats/Distances.jl) for efficiently computing the pairwise matrix.
This requires a distance measure or metric, such as the commonly used `SqEuclidean` and `Euclidean`.

The metric used by a given kernel type is specified as
```julia
KernelFunctions.metric(::CustomKernel) = SqEuclidean()
```

However, there are kernels that can be implemented efficiently using "metrics" that do not respect all the definitions expected by Distances.jl. For this reason, KernelFunctions.jl provides additional "metrics" such as `DotProduct` ($\langle x, y \rangle$) and `Delta` ($\delta(x,y)$).


## Adding a new metric

If you want to create a new distance just implement the following :
If you want to create a new "metric" just implement the following:

```julia
struct Delta <: Distances.PreMetric
```
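The diff truncates the rest of the implementation; a complete version might look like the following sketch (the evaluation and `result_type` methods are assumptions about the Distances.jl interface, not the package's exact source):

```julia
using Distances

# Sketch: a "metric" returning 1 for equal inputs and 0 otherwise.
struct Delta <: Distances.PreMetric end

(dist::Delta)(a, b) = a == b ? 1 : 0
Distances.result_type(::Delta, ::Type, ::Type) = Int
```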
10 changes: 5 additions & 5 deletions docs/src/transform.md
@@ -1,9 +1,9 @@
# Transform
# Input Transforms

`Transform` is the object that takes care of transforming the input data before distances are computed. It can be as standard as `IdentityTransform`, which returns the same input, or it can multiply the data by a scalar with `ScaleTransform` or by a vector with `ARDTransform`.
There is a more general `Transform`: `FunctionTransform` that uses a function and apply it on each vector via `mapslices`.
You can also create a pipeline of `Transform` via `TransformChain`. For example `LowRankTransform(rand(10,5))∘ScaleTransform(2.0)`.
There is a more general `Transform`: `FunctionTransform` that uses a function and applies it on each vector via `mapslices`.
You can also create a pipeline of `Transform` via `TransformChain`. For example, `LowRankTransform(rand(10,5))∘ScaleTransform(2.0)`.

One apply a transformation on a matrix or a vector via `KernelFunctions.apply(t::Transform,v::AbstractVecOrMat)`
A transformation `t` can be applied to a matrix or a vector `v` via `KernelFunctions.apply(t, v)`.

Check the list on the [API page](@ref Transforms)
Check the list on the [API page](@ref Transforms).
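For example, a sketch with `ScaleTransform`:

```julia
using KernelFunctions

t = ScaleTransform(2.0)
v = rand(3)
KernelFunctions.apply(t, v) ≈ 2.0 .* v  # true: elementwise scaling
```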
35 changes: 34 additions & 1 deletion src/basekernels/exponential.jl
@@ -20,10 +20,29 @@ iskroncompatible(::SqExponentialKernel) = true
Base.show(io::IO,::SqExponentialKernel) = print(io,"Squared Exponential Kernel")

## Aliases ##

"""
RBFKernel()

See [`SqExponentialKernel`](@ref)
"""
const RBFKernel = SqExponentialKernel

"""
GaussianKernel()

See [`SqExponentialKernel`](@ref)
"""
const GaussianKernel = SqExponentialKernel

"""
SEKernel()

See [`SqExponentialKernel`](@ref)
"""
const SEKernel = SqExponentialKernel


"""
ExponentialKernel()

@@ -42,9 +61,23 @@ iskroncompatible(::ExponentialKernel) = true

Base.show(io::IO, ::ExponentialKernel) = print(io, "Exponential Kernel")

## Alias ##
## Aliases ##

"""
LaplacianKernel()

See [`ExponentialKernel`](@ref)
"""
const LaplacianKernel = ExponentialKernel

"""
Matern12Kernel()

See [`ExponentialKernel`](@ref)
"""
const Matern12Kernel = ExponentialKernel


"""
GammaExponentialKernel(; γ = 2.0)

2 changes: 2 additions & 0 deletions src/basekernels/matern.jl
@@ -33,6 +33,8 @@ metric(::MaternKernel) = Euclidean()

Base.show(io::IO, κ::MaternKernel) = print(io, "Matern Kernel (ν = ", first(κ.ν), ")")

## Matern12Kernel = ExponentialKernel aliased in exponential.jl

"""
Matern32Kernel()

5 changes: 2 additions & 3 deletions src/basekernels/piecewisepolynomial.jl
@@ -2,13 +2,12 @@
PiecewisePolynomialKernel{V}(maha::AbstractMatrix)

Piecewise Polynomial covariance function with compact support, V = 0,1,2,3.
The kernel functions are 2v times continuously differentiable and the corresponding
processes are hence v times mean-square differentiable. The kernel function is:
The kernel functions are 2V times continuously differentiable and the corresponding
processes are hence V times mean-square differentiable. The kernel function is:
```math
κ(x, y) = max(1 - r, 0)^(j + V) * f(r, j) with j = floor(D / 2) + V + 1
```
where `r` is the Mahalanobis distance `mahalanobis(x, y)` with `maha` as the metric.

"""
struct PiecewisePolynomialKernel{V, A<:AbstractMatrix{<:Real}} <: SimpleKernel
maha::A
1 change: 0 additions & 1 deletion src/transform/lineartransform.jl
@@ -9,7 +9,6 @@ The second dimension of `A` must match the number of features of the target.

```julia-repl
julia> A = rand(10, 5)

julia> tr = LinearTransform(A)
```
"""