Commit 87e1f5d

Merge pull request #152 from JuliaGNI/rework_array_docs

Update of documentation (including docstrings) for custom array types

michakraus authored Jun 11, 2024
2 parents 7ce3ab9 + c527cc2
Showing 35 changed files with 1,333 additions and 259 deletions.
5 changes: 2 additions & 3 deletions Project.toml
@@ -10,7 +10,6 @@ ChainRules = "082447d4-558c-5d27-93f4-14fc19e9eca2"
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
ChainRulesTestUtils = "cdddcdb0-9152-4a09-a978-84456f9df70a"
Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
GeometricBase = "9a0b12b7-583b-4f04-aa1f-d8551b6addc9"
GeometricEquations = "c85262ba-a08a-430a-b926-d29770767bf2"
@@ -37,7 +36,6 @@ ChainRules = "1"
ChainRulesCore = "1"
ChainRulesTestUtils = "1"
Distances = "0.10"
Documenter = "0.27, 1"
ForwardDiff = "0.10"
GeometricBase = "0.10"
GeometricEquations = "0.16"
@@ -56,6 +54,7 @@ julia = "1.8"
[extras]
Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
ChainRulesTestUtils = "cdddcdb0-9152-4a09-a978-84456f9df70a"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
FiniteDifferences = "26cc04aa-876d-5657-8c51-4c34ba976000"
GeometricEquations = "c85262ba-a08a-430a-b926-d29770767bf2"
GeometricProblems = "18cb22b4-ad41-5c80-9c5f-710df63fbdc9"
@@ -65,4 +64,4 @@ SafeTestsets = "1bc83da4-3b8d-516f-aca4-4fe02f6d838f"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["ChainRulesTestUtils", "Random", "SafeTestsets", "Test", "GeometricProblems", "FiniteDifferences"]
test = ["Documenter", "ChainRulesTestUtils", "Random", "SafeTestsets", "Test", "GeometricProblems", "FiniteDifferences"]
9 changes: 5 additions & 4 deletions docs/make.jl
@@ -23,6 +23,8 @@ const html_format = Documenter.HTML(;
],
# specifies that we do not display the package name again (it's already in the logo)
sidebar_sitename = false,
+# we should get rid of this line again eventually. We will be able to do this once we get rid of library.md
+size_threshold = 262144,
)

# if platform is set to "none" then no output pdf is generated
@@ -48,7 +50,7 @@ end
function latex_graphics(path::String; label = nothing, caption = nothing, width = .5)
figure_width = "$(width)\\textwidth"
latex_label = isnothing(label) ? "" : "\\label{" * label * "}"
-latex_caption = isnothing(caption) ? "" : "\\caption{" * caption * "}"
+latex_caption = isnothing(caption) ? "" : "\\caption{" * string(Markdown.parse(caption))[1:end-2] * "}"
latex_string = """\\begin{figure}
\\includegraphics[width = """ * figure_width * "]{" * path * ".png}" *
latex_caption *
@@ -144,10 +146,9 @@ makedocs(;
"Riemannian Manifolds" => "manifolds/riemannian_manifolds.md",
"Homogeneous Spaces" => "manifolds/homogeneous_spaces.md",
],
"Arrays" => [
"Special Arrays" => [
"Symmetric and Skew-Symmetric Matrices" => "arrays/skew_symmetric_matrix.md",
"Stiefel Global Tangent Space" => "arrays/stiefel_lie_alg_horizontal.md",
"Grassmann Global Tangent Space"=> "arrays/grassmann_lie_alg_hor_matrix.md",
"Global Tangent Spaces" => "arrays/global_tangent_spaces.md",
],
"Optimizer Framework" => [
"Optimizers" => "Optimizer.md",
9 changes: 9 additions & 0 deletions docs/src/GeometricMachineLearning.bib
@@ -350,6 +350,15 @@ @inproceedings{feng1987symplectic
organization={Springer}
}

+@book{holm2009geometric,
+  title={Geometric mechanics and symmetry: from finite to infinite dimensions},
+  author={Holm, Darryl D and Schmah, Tanya and Stoica, Cristina},
+  volume={12},
+  year={2009},
+  publisher={Oxford University Press},
+  address={Oxford, UK}
+}

@article{ge1988approximation,
title={On the approximation of linear Hamiltonian systems},
author={Ge, Zhong and Feng, Kang},
2 changes: 1 addition & 1 deletion docs/src/Optimizer.md
@@ -1,6 +1,6 @@
# Optimizer

-In order to generalize neural network optimizers to [homogeneous spaces](manifolds/homogeneous_spaces.md), a class of manifolds we often encounter in machine learning, we have to find a [global tangent space representation](arrays/stiefel_lie_alg_horizontal.md) which we call $\mathfrak{g}^\mathrm{hor}$ here.
+In order to generalize neural network optimizers to [homogeneous spaces](@ref "Homogeneous Spaces"), a class of manifolds we often encounter in machine learning, we have to find a [global tangent space representation](@ref "Global Tangent Spaces") which we call $\mathfrak{g}^\mathrm{hor}$ here.

Starting from an element of the tangent space $T_Y\mathcal{M}$[^1], we need to perform two mappings to arrive at $\mathfrak{g}^\mathrm{hor}$, which we refer to by $\Omega$ and a red horizontal arrow:

2 changes: 1 addition & 1 deletion docs/src/architectures/linear_symplectic_transformer.md
@@ -3,7 +3,7 @@
The linear symplectic transformer consists of a combination of [linear symplectic attention](@ref "Linear Symplectic Attention") and [gradient](@ref "SympNet Gradient Layer") layers and is visualized below:

```@example
-Main.include_graphics("../tikz/linear_symplectic_transformer"; caption = raw"Visualization of the linear symplectic transformer architecutre. \texttt{n\_sympnet} refers to the number of SympNet layers (\texttt{n\_sympnet=2} in this figure) and \texttt{L} refers to the number of transformer blocks (\texttt{L=1} in this figure).", width = .3) # hide
+Main.include_graphics("../tikz/linear_symplectic_transformer"; caption = raw"Visualization of the linear symplectic transformer architecture. ``\mathtt{n\_sympnet}`` refers to the number of SympNet layers (``\mathtt{n\_sympnet}=2`` in this figure) and ``\mathtt{L}`` refers to the number of transformer blocks (``\mathtt{L=1}`` in this figure).", width = .3) # hide
```

## Library Functions
212 changes: 212 additions & 0 deletions docs/src/arrays/global_tangent_spaces.md
@@ -0,0 +1,212 @@
# Global Tangent Spaces

In `GeometricMachineLearning` standard neural network optimizers are generalized to [homogeneous spaces](@ref "Homogeneous Spaces") by leveraging the special structure of the tangent spaces of this class of manifolds. When we introduced homogeneous spaces we already noted that every tangent space ``T_Y\mathcal{M}`` to a homogeneous space is of the form:

```math
T_Y\mathcal{M} = \mathfrak{g} \cdot Y := \{AY: A\in{}\mathfrak{g}\}.
```

We then have a decomposition of ``\mathfrak{g}`` into a vertical part ``\mathfrak{g}^{\mathrm{ver}, Y}`` and a horizontal part ``\mathfrak{g}^{\mathrm{hor}, Y}`` and the horizontal part is isomorphic to ``T_Y\mathcal{M}``.

We now identify a special element ``E \in \mathcal{M}`` and designate the horizontal component ``\mathfrak{g}^{\mathrm{hor}, E}`` as our *global tangent space*. We will refer to this global tangent space by ``\mathfrak{g}^\mathrm{hor}``. We can now find a transformation from any ``\mathfrak{g}^{\mathrm{hor}, Y}`` to ``\mathfrak{g}^\mathrm{hor}`` and vice-versa (these spaces are isomorphic).

```@eval
Main.theorem(raw"Let ``A\in{}G`` an element such that ``AE = Y``. Then we have
" * Main.indentation * raw"```math
" * Main.indentation * raw"A^{-1}\cdot\mathfrak{g}^{\mathrm{hor},Y}\cdot{}A = \mathfrak{g}^\mathrm{hor},
" * Main.indentation * raw"```
" * Main.indentation * raw"i.e. for every element ``B\in\mathfrak{g}^\mathrm{hor}`` we can find a ``B^Y \in \mathfrak{g}^{\mathrm{hor},Y}`` s.t. ``B = A^{-1}B^YA`` (and vice-versa).")
```

```@eval
Main.proof(raw"We first show that for every ``B^Y\in\mathfrak{g}^{\mathrm{hor},Y}`` the element ``A^{-1}B^YA`` is in ``\mathfrak{g}^{\mathrm{hor}}``. First not that ``A^{-1}B^YA\in\mathfrak{g}`` by a fundamental theorem of Lie group theory (closedness of the Lie algebra under adjoint action). Now assume that ``A^{-1}B^YA`` is not fully contained in ``\mathfrak{g}^\mathrm{hor}``, i.e. it also has a vertical component. So we would lose information when performing ``A^{-1}B^YA \mapsto A^{-1}B^YAE = A^{-1}B^YY``, but this contradicts the fact that ``B^Y\in\mathfrak{g}^{\mathrm{hor},Y}.`` We now have to proof that for every ``B\in\mathfrak{g}^\mathrm{hor}`` we can find an element in ``\mathfrak{g}^{\mathrm{hor}, Y}`` such that this element is mapped to ``B``. By a argument similar to the one above we can show that ``ABA^{-1}\in\mathfrak{g}^\mathrm{hor, Y}`` and this element maps to ``B``. Proofing that the map is injective is now trivial.")
```

We should note that we have written all Lie group and Lie algebra actions as simple matrix multiplications, like ``AE = Y``. For some Lie groups and Lie algebras a different notation is required [holm2009geometric](@cite). These Lie groups are however not relevant for what we do in `GeometricMachineLearning`, and we stick to regular matrix notation.

## Global Sections

Note that the theorem above requires us to find an element ``A\in{}G`` such that ``AE = Y``. A mapping ``\lambda:\mathcal{M}\to{}G`` that assigns such an element to every ``Y`` is called a *global section*.

```@eval
Main.theorem(raw"We call a mapping from ``\lambda:\mathcal{M} \to G`` a homogeneous space to its associated Lie group a **global section** if it satisfies:
" * Main.indentation * raw"```math
" * Main.indentation * raw"\lambda(Y)E = Y,
" * Main.indentation * raw"```
" * Main.indentation * raw"where ``E`` is the distinct element of the homogeneous space.")
```

Note that global sections are in general not unique because the dimension of ``G`` is typically greater than that of ``\mathcal{M}``. Below we give examples of how to construct such a global section for the Stiefel and the Grassmann manifolds.
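
We can check the defining property ``\lambda(Y)E = Y`` numerically with the [`GlobalSection`](@ref) and [`StiefelProjection`](@ref) types that are discussed further below (a minimal sketch; converting the section to a `Matrix` is done here only for illustration):

```julia
using GeometricMachineLearning

N, n = 10, 5
Y = rand(StiefelManifold, N, n)
λY = GlobalSection(Y)        # a global section λ(Y) for Y
E = StiefelProjection(N, n)  # the distinct element E = [I; 0]
Matrix(λY) * E ≈ Y           # true up to floating-point error
```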

## The Global Tangent Space for the Stiefel Manifold

We now discuss the specific form of the global tangent space for the [Stiefel manifold](@ref "The Stiefel Manifold"). We choose the distinct element[^1] ``E`` to have an especially simple form (this matrix can be built by calling [`StiefelProjection`](@ref)):

[^1]: We already introduced this special matrix together with the Stiefel manifold.

```math
E = \begin{bmatrix}
\mathbb{I}_n \\
\mathbb{O}
\end{bmatrix}\in{}St(n, N).
```

Based on this, elements of the vector space ``\mathfrak{g}^{\mathrm{hor}, E} =: \mathfrak{g}^{\mathrm{hor}}`` are of the form:

```math
\begin{pmatrix}
A & B^T \\ B & \mathbb{O}
\end{pmatrix},
```

where ``A`` is a skew-symmetric matrix of size ``n\times{}n`` and ``B`` is an arbitrary matrix of size ``(N - n)\times{}n``.

Matrices of this type are implemented in `GeometricMachineLearning` under the name [`StiefelLieAlgHorMatrix`](@ref).

We can construct such a matrix by calling the constructor with e.g. a skew-symmetric matrix ``A`` and an arbitrary matrix ``B``:

```@example call_stiefel_lie_alg_hor_matrix_1
using GeometricMachineLearning # hide
N, n = 10, 4
A = rand(SkewSymMatrix, n)
```

```@example call_stiefel_lie_alg_hor_matrix_1
B = rand(N - n, n)
```

```@example call_stiefel_lie_alg_hor_matrix_1
B1 = StiefelLieAlgHorMatrix(A, B, N, n)
```

We can also call it with a matrix of shape ``N\times{}N``:

```@example call_stiefel_lie_alg_hor_matrix_1
B2 = Matrix(B1) # note that this does not have any special structure
StiefelLieAlgHorMatrix(B2, n)
```

Or we can call it with a matrix of shape ``N\times{}n``:

```@example call_stiefel_lie_alg_hor_matrix_1
E = StiefelProjection(N, n)
```

```@example call_stiefel_lie_alg_hor_matrix_1
B3 = B1 * E
StiefelLieAlgHorMatrix(B3, n)
```

We now demonstrate how to map from an element of ``\mathfrak{g}^{\mathrm{hor}, Y}`` to an element of ``\mathfrak{g}^\mathrm{hor}``:

```@example global_section
using GeometricMachineLearning # hide
N, n = 10, 5
Y = rand(StiefelManifold, N, n)
Δ = rgrad(Y, rand(N, n))
ΩΔ = GeometricMachineLearning.Ω(Y, Δ)
λY = GlobalSection(Y)
λY_mat = Matrix(λY)
round.(λY_mat' * ΩΔ * λY_mat; digits = 3)
```

Performing this computation directly is however very inefficient, and the user is strongly discouraged from calling `Matrix` on an instance of [`GlobalSection`](@ref). The better option is calling [`global_rep`](@ref):

```@example global_section
using GeometricMachineLearning: _round # hide
_round(global_rep(λY, Δ); digits = 3)
```

Internally `GlobalSection` calls the function [`GeometricMachineLearning.global_section`](@ref) which does the following for the Stiefel manifold:

```julia
using LinearAlgebra: qr  # shown for completeness; qr is the LinearAlgebra QR decomposition

A = randn(N, N - n)           # draw N - n new random columns (or the GPU equivalent)
A = A - Y * (Y' * A)          # remove the part spanned by the columns of Y
Y⊥ = qr(A).Q[1:N, 1:(N - n)]  # the first N - n columns of the Q factor (Y_⊥ in the text)
```

So we draw ``N - n`` new columns randomly, subtract the part that is spanned by the columns of ``Y`` and then perform a ``QR`` decomposition on the resulting matrix. The first ``N - n`` columns of the ``Q`` factor form a matrix that is orthogonal to ``Y``; it is typically referred to as ``Y_\perp`` [absil2004riemannian, absil2008optimization, bendokat2020grassmann](@cite). We can easily check that this ``Y_\perp`` is indeed orthogonal to ``Y``.
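
This can be verified numerically in plain Julia, independently of `GeometricMachineLearning` (a sketch that mirrors the algorithm above; the variable names are ours):

```julia
using LinearAlgebra: qr, norm, I

N, n = 10, 5
Y = qr(randn(N, n)).Q[1:N, 1:n]  # any matrix with orthonormal columns

A = randn(N, N - n)              # N - n random columns
A = A - Y * (Y' * A)             # remove the part spanned by the columns of Y
Y⊥ = qr(A).Q[1:N, 1:(N - n)]

norm(Y' * Y⊥), norm(Y⊥' * Y⊥ - I)  # both should be ≈ 0
```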

```@eval
Main.theorem(raw"The matrix ``Y_\perp`` constructed with the above algorithm satisfies
" * Main.indentation * raw"```math
" * Main.indentation * raw"Y^TY_\perp = \mathbb{O},
" * Main.indentation * raw"```
" * Main.indentation * raw"and
" * Main.indentation * raw"```math
" * Main.indentation * raw"(Y_\perp)^TY_\perp = \mathbb{I},
" * Main.indentation * raw"```
" * Main.indentation * raw"i.e. all the columns in the big matrix ``[Y, Y_\perp]\in\mathbb{R}^{N\times{}N}`` are mutually orthonormal and it therefore is an element of ``SO(N)``.")
```

```@eval
Main.proof(raw"The second property is trivially satisfied because the ``Q`` component of a ``QR`` decomposition is an orthogonal matrix. For the first property note that ``Y^TQR = \mathbb{O}`` is zero because we have subtracted the ``Y`` component from the matrix ``QR``. The matrix ``R\in\mathbb{R}^{N\times{}(N-n)}`` further has the property ``[R]_{ij} = 0`` for ``i > j`` and we have that
" * Main.indentation * raw"```math
" * Main.indentation * raw"(Y^TQ)R = [r_{11}(Y^TQ)_{1\bullet}, r_{12}(Y^TQ)_{1\bullet} + r_{22}(Y^TQ)_{2\bullet}, \ldots, \sum_{i=1}^{N-n}r_{i(N-n)}(Y^TQ)_{i\bullet}].
" * Main.indentation * raw"```
" * Main.indentation * raw"Now all the coefficients ``r_{ii}`` are non-zero because the matrix we performed the ``QR`` decomposition on has full rank and we can see that if ``(Y^TQ)R`` is zero ``Y^TQ`` also has to be zero.")
```

We now discuss the global tangent space for the Grassmann manifold. This is similar to the Stiefel case.

## Global Tangent Space for the Grassmann Manifold

In the case of the Grassmann manifold we construct the global tangent space with respect to the distinct element ``\mathcal{E}=\mathrm{span}(E)\in{}Gr(n,N)``, where ``E`` is again the same matrix as in the Stiefel case.

The tangent space ``T_\mathcal{E}Gr(n,N)`` can be represented through matrices of the form:

```math
\begin{pmatrix}
0 & \cdots & 0 \\
\cdots & \cdots & \cdots \\
0 & \cdots & 0 \\
b_{11} & \cdots & b_{1n} \\
\cdots & \cdots & \cdots \\
b_{(N-n)1} & \cdots & b_{(N-n)n}
\end{pmatrix}.
```
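
Matrices of this form arise by projecting an arbitrary ``\bar{B}\in\mathbb{R}^{N\times{}n}`` via ``\bar{B} \mapsto \bar{B} - EE^T\bar{B}`` (see the footnote below). A quick plain-Julia sketch (the concrete sizes are arbitrary):

```julia
using LinearAlgebra: I

N, n = 6, 2
E = vcat(Matrix(1.0I, n, n), zeros(N - n, n))  # E = [I; 0]
B̄ = rand(N, n)
Δ = B̄ - E * (E' * B̄)  # the first n rows of Δ vanish, matching the form above
```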

This representation is based on the identification ``T_\mathcal{E}Gr(n,N)\to{}T_E\mathcal{S}_E`` that was discussed in the section on the [Grassmann manifold](@ref "The Grassmann Manifold")[^2]. We use the following notation:

[^2]: We derived the following expression for the [Riemannian gradient of the Grassmann manifold](@ref "The Riemannian Gradient of the Grassmann Manifold"): ``\mathrm{grad}_\mathcal{Y}^{Gr}L = \nabla_Y{}L - YY^T\nabla_YL``. The tangent space to the element ``\mathcal{E}`` can thus be written as ``\bar{B} - EE^T\bar{B}``, where ``\bar{B}\in\mathbb{R}^{N\times{}n}``, and the matrices in this tangent space have the desired form.

```math
\mathfrak{g}^\mathrm{hor} = \mathfrak{g}^{\mathrm{hor},\mathcal{E}} = \left\{\begin{pmatrix} 0 & -B^T \\ B & 0 \end{pmatrix}: \text{$B$ arbitrary}\right\}.
```

This is equivalent to the horizontal component of ``\mathfrak{g}`` for the Stiefel manifold in the case where ``A`` is zero. This is a reflection of the rotational invariance of the Grassmann manifold: the skew-symmetric matrices ``A`` correspond to the orthogonal group ``O(n)``, which is factored out in the Grassmann manifold ``Gr(n,N)\simeq{}St(n,N)/O(n)``. In `GeometricMachineLearning` we thus treat the Grassmann manifold as being embedded in the Stiefel manifold. In [bendokat2020grassmann](@cite), viewing the Grassmann manifold as a quotient space of the Stiefel manifold is considered important for the feasibility of practical computations.
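
Such matrices are implemented in `GeometricMachineLearning` as [`GrassmannLieAlgHorMatrix`](@ref). A small sketch; we assume a constructor analogous to the `StiefelLieAlgHorMatrix(A, B, N, n)` one used above:

```julia
using GeometricMachineLearning

N, n = 6, 2
B = rand(N - n, n)
# assumed constructor, analogous to StiefelLieAlgHorMatrix(A, B, N, n):
B̂ = GrassmannLieAlgHorMatrix(B, N, n)
Matrix(B̂)  # the N×N matrix with vanishing A-block
```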

## Library Functions

```@docs; canonical=false
GeometricMachineLearning.AbstractLieAlgHorMatrix
StiefelLieAlgHorMatrix
StiefelLieAlgHorMatrix(::AbstractMatrix, ::Int)
GrassmannLieAlgHorMatrix
GrassmannLieAlgHorMatrix(::AbstractMatrix, ::Int)
GlobalSection
GeometricMachineLearning.global_section
global_rep
```


## References

```@bibliography
Pages = []
Canonical = false
absil2004riemannian
absil2008optimization
bendokat2020grassmann
brantner2023generalizing
```
34 changes: 0 additions & 34 deletions docs/src/arrays/grassmann_lie_alg_hor_matrix.md

This file was deleted.

