diff --git a/latest/.documenter-siteinfo.json b/latest/.documenter-siteinfo.json index 0ff1ca534..954ddde5e 100644 --- a/latest/.documenter-siteinfo.json +++ b/latest/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.10.2","generation_timestamp":"2024-04-15T15:39:46","documenter_version":"1.4.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.10.2","generation_timestamp":"2024-04-15T15:41:40","documenter_version":"1.4.0"}} \ No newline at end of file diff --git a/latest/Optimizer/index.html b/latest/Optimizer/index.html index 2aba991e3..97be2d39c 100644 --- a/latest/Optimizer/index.html +++ b/latest/Optimizer/index.html @@ -1,2 +1,2 @@ -Optimizers · GeometricMachineLearning.jl

Optimizer

In order to generalize neural network optimizers to homogeneous spaces, a class of manifolds we often encounter in machine learning, we have to find a global tangent space representation which we call $\mathfrak{g}^\mathrm{hor}$ here.

Starting from an element of the tangent space $T_Y\mathcal{M}$[1], we need to perform two mappings to arrive at $\mathfrak{g}^\mathrm{hor}$, which we refer to by $\Omega$ and a red horizontal arrow:

Here the mapping $\Omega$ is a horizontal lift from the tangent space onto the horizontal component of the Lie algebra at $Y$.

The red line maps the horizontal component at $Y$, i.e. $\mathfrak{g}^{\mathrm{hor},Y}$, to the horizontal component at $E$, i.e. $\mathfrak{g}^\mathrm{hor}$.

The $\mathrm{cache}$ stores information about previous optimization steps and depends on the optimizer. The elements of the $\mathrm{cache}$ are also in $\mathfrak{g}^\mathrm{hor}$. Based on this the optimizer (Adam in this case) computes a final velocity, which is the input of a retraction. Because this update is computed for $\mathfrak{g}^{\mathrm{hor}}\equiv{}T_Y\mathcal{M}$, we still need to perform a mapping, called apply_section here, that then finally updates the network parameters. The two red lines are described in global sections.

References

[15]
B. Brantner. Generalizing Adam To Manifolds For Efficiently Training Transformers, arXiv preprint arXiv:2305.16901 (2023).
  • 1In practice this is obtained by first using an AD routine on a loss function $L$, and then computing the Riemannian gradient based on this. See the section on the Stiefel manifold for an example of this.
+Optimizers · GeometricMachineLearning.jl

Optimizer

In order to generalize neural network optimizers to homogeneous spaces, a class of manifolds we often encounter in machine learning, we have to find a global tangent space representation which we call $\mathfrak{g}^\mathrm{hor}$ here.

Starting from an element of the tangent space $T_Y\mathcal{M}$[1], we need to perform two mappings to arrive at $\mathfrak{g}^\mathrm{hor}$, which we refer to by $\Omega$ and a red horizontal arrow:

Here the mapping $\Omega$ is a horizontal lift from the tangent space onto the horizontal component of the Lie algebra at $Y$.

The red line maps the horizontal component at $Y$, i.e. $\mathfrak{g}^{\mathrm{hor},Y}$, to the horizontal component at $E$, i.e. $\mathfrak{g}^\mathrm{hor}$.

The $\mathrm{cache}$ stores information about previous optimization steps and depends on the optimizer. The elements of the $\mathrm{cache}$ are also in $\mathfrak{g}^\mathrm{hor}$. Based on this the optimizer (Adam in this case) computes a final velocity, which is the input of a retraction. Because this update is computed for $\mathfrak{g}^{\mathrm{hor}}\equiv{}T_Y\mathcal{M}$, we still need to perform a mapping, called apply_section here, that then finally updates the network parameters. The two red lines are described in global sections.
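
Schematically, one optimization step chains together the maps described above (this is only a summary of the steps, not the exact implementation):

\[T_Y\mathcal{M} \overset{\Omega}{\longrightarrow} \mathfrak{g}^{\mathrm{hor},Y} \longrightarrow \mathfrak{g}^\mathrm{hor} \overset{\mathrm{cache,\ Adam}}{\longrightarrow} \mathfrak{g}^\mathrm{hor} \overset{\mathrm{retraction},\ \mathtt{apply\_section}}{\longrightarrow} \mathcal{M}.\]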

References

[15]
B. Brantner. Generalizing Adam To Manifolds For Efficiently Training Transformers, arXiv preprint arXiv:2305.16901 (2023).
  • 1In practice this is obtained by first using an AD routine on a loss function $L$, and then computing the Riemannian gradient based on this. See the section on the Stiefel manifold for an example of this.
diff --git a/latest/architectures/autoencoders/index.html b/latest/architectures/autoencoders/index.html index 635d6558b..c97bfe899 100644 --- a/latest/architectures/autoencoders/index.html +++ b/latest/architectures/autoencoders/index.html @@ -1,3 +1,3 @@ Variational Autoencoders · GeometricMachineLearning.jl

Variational Autoencoders

Variational autoencoders (Lee and Carlberg, 2020) train on the following set:

\[\mathcal{X}(\mathbb{P}_\mathrm{train}) := \{\mathbf{x}^k(\mu) - \mathbf{x}^0(\mu):0\leq{}k\leq{}K,\mu\in\mathbb{P}_\mathrm{train}\},\]

where $\mathbf{x}^k(\mu)\approx\mathbf{x}(t^k;\mu)$. Note that $\mathbf{0}\in\mathcal{X}(\mathbb{P}_\mathrm{train})$ as $k$ can also be zero.

The encoder $\Psi^\mathrm{enc}$ and decoder $\Psi^\mathrm{dec}$ are then trained on this set $\mathcal{X}(\mathbb{P}_\mathrm{train})$ by minimizing the reconstruction error:

\[|| \mathbf{x} - \Psi^\mathrm{dec}\circ\Psi^\mathrm{enc}(\mathbf{x}) ||\text{ for $\mathbf{x}\in\mathcal{X}(\mathbb{P}_\mathrm{train})$}.\]

Initial condition

No matter the parameter $\mu$, the initial condition in the reduced system is always $\mathbf{x}_{r,0}(\mu) = \mathbf{x}_{r,0} = \Psi^\mathrm{enc}(\mathbf{0})$.

Reconstructed solution

In order to arrive at the reconstructed solution one first has to decode the reduced state and then add the reference state:

\[\mathbf{x}^\mathrm{reconstr}(t;\mu) = \mathbf{x}^\mathrm{ref}(\mu) + \Psi^\mathrm{dec}(\mathbf{x}_r(t;\mu)),\]

where $\mathbf{x}^\mathrm{ref}(\mu) = \mathbf{x}(t_0;\mu) - \Psi^\mathrm{dec}\circ\Psi^\mathrm{enc}(\mathbf{0})$.

Symplectic reduced vector field

A symplectic vector field is one whose flow conserves the symplectic structure $\mathbb{J}$. This is equivalent[1] to there existing a Hamiltonian $H$ s.t. the vector field $X$ can be written as $X = \mathbb{J}\nabla{}H$.

If the full-order Hamiltonian is $H^\mathrm{full}\equiv{}H$ we can obtain another Hamiltonian on the reduced space by simply setting:

\[H^\mathrm{red}(\mathbf{x}_r(t;\mu)) = H(\mathbf{x}^\mathrm{reconstr}(t;\mu)) = H(\mathbf{x}^\mathrm{ref}(\mu) + \Psi^\mathrm{dec}(\mathbf{x}_r(t;\mu))).\]

The ODE associated to this Hamiltonian is also the one corresponding to Manifold Galerkin ROM (see (Lee and Carlberg, 2020)).

Manifold Galerkin ROM

Define the FOM ODE residual as:

\[r: (\mathbf{v}, \xi, \tau; \mu) \mapsto \mathbf{v} - f(\xi, \tau; \mu).\]

The reduced ODE is then defined to be:

\[\dot{\hat{\mathbf{x}}}(t;\mu) = \mathrm{arg\,{}min}_{\hat{\mathbf{v}}\in\mathbb{R}^p}|| r(\mathcal{J}(\hat{\mathbf{x}}(t;\mu))\hat{\mathbf{v}},\hat{\mathbf{x}}^\mathrm{ref}(\mu) + \Psi^\mathrm{dec}(\hat{\mathbf{x}}(t;\mu)),t;\mu) ||_2^2,\]

where $\mathcal{J}$ is the Jacobian of the decoder $\Psi^\mathrm{dec}$. This leads to:

\[\mathcal{J}(\hat{\mathbf{x}}(t;\mu))\hat{\mathbf{v}} - f(\hat{\mathbf{x}}^\mathrm{ref}(\mu) + \Psi^\mathrm{dec}(\hat{\mathbf{x}}(t;\mu)), t; \mu) \overset{!}{=} 0 \implies -\hat{\mathbf{v}} = \mathcal{J}(\hat{\mathbf{x}}(t;\mu))^+f(\hat{\mathbf{x}}^\mathrm{ref}(\mu) + \Psi^\mathrm{dec}(\hat{\mathbf{x}}(t;\mu)), t; \mu),\]

where $\mathcal{J}(\hat{\mathbf{x}}(t;\mu))^+$ is the pseudoinverse of $\mathcal{J}(\hat{\mathbf{x}}(t;\mu))$. Because $\mathcal{J}(\hat{\mathbf{x}}(t;\mu))$ is a symplectic matrix the pseudoinverse is the symplectic inverse (see (Peng and Mohseni, 2016)).

Furthermore, because $f$ is Hamiltonian, the vector field describing $\dot{\hat{\mathbf{x}}}(t;\mu)$ will also be Hamiltonian.

References

  • K. Lee and K. Carlberg. “Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders”. In: Journal of Computational Physics 404 (2020), p. 108973.
  • L. Peng and K. Mohseni. “Symplectic model reduction of Hamiltonian systems”. In: SIAM Journal on Scientific Computing 38.1 (2016), A1–A27.
  • 1Technically speaking the definitions are equivalent only for simply-connected manifolds, and hence in particular for vector spaces.
+\hat{\mathbf{v}} = \mathcal{J}(\hat{\mathbf{x}}(t;\mu))^+f(\hat{\mathbf{x}}^\mathrm{ref}(\mu) + \Psi^\mathrm{dec}(\hat{\mathbf{x}}(t;\mu)), t; \mu),\]

where $\mathcal{J}(\hat{\mathbf{x}}(t;\mu))^+$ is the pseudoinverse of $\mathcal{J}(\hat{\mathbf{x}}(t;\mu))$. Because $\mathcal{J}(\hat{\mathbf{x}}(t;\mu))$ is a symplectic matrix the pseudoinverse is the symplectic inverse (see (Peng and Mohseni, 2016)).
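
Recall that for a symplectic matrix $A\in\mathbb{R}^{2N\times{}2n}$, i.e. one satisfying $A^T\mathbb{J}_{2N}A = \mathbb{J}_{2n}$, the symplectic inverse (as used in (Peng and Mohseni, 2016)) is

\[A^+ = \mathbb{J}_{2n}^TA^T\mathbb{J}_{2N}, \quad\text{so that}\quad A^+A = \mathbb{I}_{2n}.\]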

Furthermore, because $f$ is Hamiltonian, the vector field describing $\dot{\hat{\mathbf{x}}}(t;\mu)$ will also be Hamiltonian.

References

  • K. Lee and K. Carlberg. “Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders”. In: Journal of Computational Physics 404 (2020), p. 108973.

diff --git a/latest/architectures/sympnet/index.html b/latest/architectures/sympnet/index.html index a4ec77f2d..1de2d8e31 100644 --- a/latest/architectures/sympnet/index.html +++ b/latest/architectures/sympnet/index.html @@ -81,4 +81,4 @@ \begin{pmatrix} q \\ K^T \mathrm{diag}(a)\sigma(Kq+b)+p - \end{pmatrix}.\]

The parameters of this layer are the scaling matrix $K\in\mathbb{R}^{m\times d}$, the bias $b\in\mathbb{R}^{m}$ and the scaling vector $a\in\mathbb{R}^{m}$. The name "gradient layer" has its origin in the fact that the expression $[K^T\mathrm{diag}(a)\sigma(Kq+b)]_i = \sum_jk_{ji}a_j\sigma(\sum_\ell{}k_{j\ell}q_\ell+b_j)$ is the gradient of a function $\sum_ja_j\tilde{\sigma}(\sum_\ell{}k_{j\ell}q_\ell+b_j)$, where $\tilde{\sigma}$ is the antiderivative of $\sigma$. We refer to the first dimension of $K$ as the upscaling dimension.

If we denote by $\mathcal{M}^G$ the set of gradient layers, a $G$-SympNet is a function of the form $\Psi=g_k \circ g_{k-1} \circ \cdots \circ g_0$ where $(g_i)_{0\leq i\leq k} \subset (\mathcal{M}^G)^k$. The index $k$ is again the number of hidden layers.

Further note here the different roles played by round and square brackets: the latter indicates a nonlinear operation as opposed to a regular vector or matrix.

Universal approximation theorems

In order to state the universal approximation theorem for both architectures we first need a few definitions:

Let $U$ be an open set of $\mathbb{R}^{2d}$, and let us denote by $\mathcal{SP}^r(U)$ the set of $C^r$ smooth symplectic maps on $U$. We now define a topology on $C^r(K, \mathbb{R}^n)$, the set of $C^r$-smooth maps from a compact set $K\subset\mathbb{R}^{n}$ to $\mathbb{R}^{n}$ through the norm

\[||f||_{C^r(K,\mathbb{R}^{n})} = \underset{|\alpha|\leq r}{\sum} \underset{1\leq i \leq n}{\max}\underset{x\in K}{\sup} |D^\alpha f_i(x)|,\]

where the differential operator $D^\alpha$ is defined by

\[D^\alpha f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1}...x_n^{\alpha_n}},\]

with $|\alpha| = \alpha_1 +...+ \alpha_n$.

Definition $\sigma$ is $r$-finite if $\sigma\in C^r(\mathbb{R},\mathbb{R})$ and $\int |D^r\sigma(x)|dx <+\infty$.

Definition Let $m,n,r\in \mathbb{N}$ with $m,n>0$ be given, $U$ an open set of $\mathbb{R}^m$, and $I,J\subset C^r(U,\mathbb{R}^n)$. We say $J$ is $r$-uniformly dense on compacta in $I$ if $J \subset I$ and for any $f\in I$, $\epsilon>0$, and any compact $K\subset U$, there exists $g\in J$ such that $||f-g||_{C^r(K,\mathbb{R}^{n})} < \epsilon$.

We can now state the universal approximation theorems:

Theorem (Approximation theorem for LA-SympNet) For any positive integer $r>0$ and open set $U\subset \mathbb{R}^{2d}$, the set of $LA$-SympNets is $r$-uniformly dense on compacta in $\mathcal{SP}^r(U)$ if the activation function $\sigma$ is $r$-finite.

Theorem (Approximation theorem for G-SympNet) For any positive integer $r>0$ and open set $U\subset \mathbb{R}^{2d}$, the set of $G$-SympNets is $r$-uniformly dense on compacta in $\mathcal{SP}^r(U)$ if the activation function $\sigma$ is $r$-finite.

There are many $r$-finite activation functions commonly used in neural networks, for example:

The universal approximation theorems state that we can, in principle, get arbitrarily close to any symplectomorphism defined on $\mathbb{R}^{2d}$. But this does not tell us anything about how to optimize the network. This can be done with any common neural network optimizer, and these optimizers always rely on a corresponding loss function.

Loss function

To train the SympNet, one needs data along a trajectory such that the model is trained to perform an integration. These data are $(Q,P)$ where $Q[i,j]$ (respectively $P[i,j]$) is the real number $q_j(t_i)$ (respectively $p_j(t_i)$), i.e. the $j$-th coordinate of the generalized position (respectively momentum) at the $i$-th time step. One also needs a loss function defined as:

\[Loss(Q,P) = \underset{i}{\sum} d(\Phi(Q[i,-],P[i,-]), [Q[i,-] P[i,-]]^T)\]

where $d$ is a distance on $\mathbb{R}^d$.

See the tutorial section for an introduction into using SympNets with GeometricMachineLearning.jl.

References

[1]
P. Jin, Z. Zhang, A. Zhu, Y. Tang and G. E. Karniadakis. SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems. Neural Networks 132, 166–179 (2020).
+ \end{pmatrix}.\]

The parameters of this layer are the scaling matrix $K\in\mathbb{R}^{m\times d}$, the bias $b\in\mathbb{R}^{m}$ and the scaling vector $a\in\mathbb{R}^{m}$. The name "gradient layer" has its origin in the fact that the expression $[K^T\mathrm{diag}(a)\sigma(Kq+b)]_i = \sum_jk_{ji}a_j\sigma(\sum_\ell{}k_{j\ell}q_\ell+b_j)$ is the gradient of a function $\sum_ja_j\tilde{\sigma}(\sum_\ell{}k_{j\ell}q_\ell+b_j)$, where $\tilde{\sigma}$ is the antiderivative of $\sigma$. We refer to the first dimension of $K$ as the upscaling dimension.
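
As a plain-Julia illustration of this map (a sketch only, not the library's gradient-layer implementation; σ, K, a and b are assumed to be given), the update acting on $p$ can be written as:

σ(x) = tanh(x)                      # any suitable activation function
# (q, p) ↦ (q, Kᵀ diag(a) σ(Kq + b) + p), cf. the formula above
gradient_layer(q, p, K, a, b) = (q, K' * (a .* σ.(K * q .+ b)) .+ p)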

If we denote by $\mathcal{M}^G$ the set of gradient layers, a $G$-SympNet is a function of the form $\Psi=g_k \circ g_{k-1} \circ \cdots \circ g_0$ where $(g_i)_{0\leq i\leq k} \subset (\mathcal{M}^G)^k$. The index $k$ is again the number of hidden layers.

Further note here the different roles played by round and square brackets: the latter indicates a nonlinear operation as opposed to a regular vector or matrix.

Universal approximation theorems

In order to state the universal approximation theorem for both architectures we first need a few definitions:

Let $U$ be an open set of $\mathbb{R}^{2d}$, and let us denote by $\mathcal{SP}^r(U)$ the set of $C^r$ smooth symplectic maps on $U$. We now define a topology on $C^r(K, \mathbb{R}^n)$, the set of $C^r$-smooth maps from a compact set $K\subset\mathbb{R}^{n}$ to $\mathbb{R}^{n}$ through the norm

\[||f||_{C^r(K,\mathbb{R}^{n})} = \underset{|\alpha|\leq r}{\sum} \underset{1\leq i \leq n}{\max}\underset{x\in K}{\sup} |D^\alpha f_i(x)|,\]

where the differential operator $D^\alpha$ is defined by

\[D^\alpha f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1}...x_n^{\alpha_n}},\]

with $|\alpha| = \alpha_1 +...+ \alpha_n$.

Definition $\sigma$ is $r$-finite if $\sigma\in C^r(\mathbb{R},\mathbb{R})$ and $\int |D^r\sigma(x)|dx <+\infty$.

Definition Let $m,n,r\in \mathbb{N}$ with $m,n>0$ be given, $U$ an open set of $\mathbb{R}^m$, and $I,J\subset C^r(U,\mathbb{R}^n)$. We say $J$ is $r$-uniformly dense on compacta in $I$ if $J \subset I$ and for any $f\in I$, $\epsilon>0$, and any compact $K\subset U$, there exists $g\in J$ such that $||f-g||_{C^r(K,\mathbb{R}^{n})} < \epsilon$.

We can now state the universal approximation theorems:

Theorem (Approximation theorem for LA-SympNet) For any positive integer $r>0$ and open set $U\subset \mathbb{R}^{2d}$, the set of $LA$-SympNets is $r$-uniformly dense on compacta in $\mathcal{SP}^r(U)$ if the activation function $\sigma$ is $r$-finite.

Theorem (Approximation theorem for G-SympNet) For any positive integer $r>0$ and open set $U\subset \mathbb{R}^{2d}$, the set of $G$-SympNets is $r$-uniformly dense on compacta in $\mathcal{SP}^r(U)$ if the activation function $\sigma$ is $r$-finite.

There are many $r$-finite activation functions commonly used in neural networks, for example:

The universal approximation theorems state that we can, in principle, get arbitrarily close to any symplectomorphism defined on $\mathbb{R}^{2d}$. But this does not tell us anything about how to optimize the network. This can be done with any common neural network optimizer, and these optimizers always rely on a corresponding loss function.

Loss function

To train the SympNet, one needs data along a trajectory such that the model is trained to perform an integration. These data are $(Q,P)$ where $Q[i,j]$ (respectively $P[i,j]$) is the real number $q_j(t_i)$ (respectively $p_j(t_i)$), i.e. the $j$-th coordinate of the generalized position (respectively momentum) at the $i$-th time step. One also needs a loss function defined as:

\[Loss(Q,P) = \underset{i}{\sum} d(\Phi(Q[i,-],P[i,-]), [Q[i,-] P[i,-]]^T)\]

where $d$ is a distance on $\mathbb{R}^d$.
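
A rough plain-Julia sketch of such a loss (assuming, as the integration setting suggests, that $\Phi$ applied to the $i$-th state is compared with the $(i+1)$-th state, and taking $d$ to be the Euclidean norm; this is not the library's loss implementation):

using LinearAlgebra

function sympnet_loss(Φ, Q, P)
    s = zero(eltype(Q))
    for i in 1:(size(Q, 1) - 1)
        q̂, p̂ = Φ(Q[i, :], P[i, :])                           # predicted next state
        s += norm(q̂ - Q[i + 1, :]) + norm(p̂ - P[i + 1, :])   # distance to the actual next state
    end
    return s
end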

See the tutorial section for an introduction into using SympNets with GeometricMachineLearning.jl.

References

[1]
P. Jin, Z. Zhang, A. Zhu, Y. Tang and G. E. Karniadakis. SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems. Neural Networks 132, 166–179 (2020).
diff --git a/latest/arrays/grassmann_lie_alg_hor_matrix/index.html b/latest/arrays/grassmann_lie_alg_hor_matrix/index.html index b81507c72..4c12893f5 100644 --- a/latest/arrays/grassmann_lie_alg_hor_matrix/index.html +++ b/latest/arrays/grassmann_lie_alg_hor_matrix/index.html @@ -9,4 +9,4 @@ a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{(N-n)1} & \cdots & a_{(N-n)n} -\end{pmatrix},\]

where we have used the identification $T_\mathcal{E}Gr(n,N)\to{}T_E\mathcal{S}_E$ that was discussed in the section on the Grassmann manifold. The Grassmann manifold can also be seen as the Stiefel manifold modulo an equivalence relation. This leads to the following (which is used for optimization):

\[\mathfrak{g}^\mathrm{hor} = \mathfrak{g}^{\mathrm{hor},\mathcal{E}} = \left\{\begin{pmatrix} 0 & -B^T \\ B & 0 \end{pmatrix}: \text{$B$ arbitrary}\right\}.\]

This is equivalent to the horizontal component of $\mathfrak{g}$ for the Stiefel manifold for the case when $A$ is zero. This is a reflection of the rotational invariance of the Grassmann manifold: the skew-symmetric matrices $A$ are connected to the group of rotations $O(n)$ which is factored out in the Grassmann manifold $Gr(n,N)\simeq{}St(n,N)/O(n)$.

+\end{pmatrix},\]

where we have used the identification $T_\mathcal{E}Gr(n,N)\to{}T_E\mathcal{S}_E$ that was discussed in the section on the Grassmann manifold. The Grassmann manifold can also be seen as the Stiefel manifold modulo an equivalence relation. This leads to the following (which is used for optimization):

\[\mathfrak{g}^\mathrm{hor} = \mathfrak{g}^{\mathrm{hor},\mathcal{E}} = \left\{\begin{pmatrix} 0 & -B^T \\ B & 0 \end{pmatrix}: \text{$B$ arbitrary}\right\}.\]

This is equivalent to the horizontal component of $\mathfrak{g}$ for the Stiefel manifold for the case when $A$ is zero. This is a reflection of the rotational invariance of the Grassmann manifold: the skew-symmetric matrices $A$ are connected to the group of rotations $O(n)$ which is factored out in the Grassmann manifold $Gr(n,N)\simeq{}St(n,N)/O(n)$.

diff --git a/latest/arrays/skew_symmetric_matrix/index.html b/latest/arrays/skew_symmetric_matrix/index.html index 190ba0d2f..32a1f152f 100644 --- a/latest/arrays/skew_symmetric_matrix/index.html +++ b/latest/arrays/skew_symmetric_matrix/index.html @@ -2,18 +2,18 @@ Symmetric and Skew-Symmetric Matrices · GeometricMachineLearning.jl

SymmetricMatrix and SkewSymMatrix

There are special implementations of symmetric and skew-symmetric matrices in GeometricMachineLearning.jl. They are implemented to work on GPU and for multiplication with tensors. The following image demonstrates how the data necessary for an instance of SkewSymMatrix are stored[1]:


So what is stored internally is a vector of size $n(n-1)/2$ for the skew-symmetric matrix and a vector of size $n(n+1)/2$ for the symmetric matrix. We can sample a random skew-symmetric matrix:

using GeometricMachineLearning # hide
 
 A = rand(SkewSymMatrix, 5)
5×5 SkewSymMatrix{Float64, Vector{Float64}}:
- 0.0       -0.716714  -0.652467  -0.399772  -0.845725
- 0.716714   0.0       -0.200171  -0.682448  -0.598291
- 0.652467   0.200171   0.0       -0.93225   -0.8443
- 0.399772   0.682448   0.93225    0.0       -0.213746
- 0.845725   0.598291   0.8443     0.213746   0.0

and then access the vector:

A.S
10-element Vector{Float64}:
- 0.7167143165606976
- 0.6524666372602955
- 0.20017076199256711
- 0.3997716348831375
- 0.682448461486373
- 0.9322500731717295
- 0.8457247408498944
- 0.5982912244637281
- 0.844300491850363
- 0.21374617967832232
  • 1It works similarly for SymmetricMatrix.
+ 0.0        -0.386481  -0.587248  -0.58316    -0.296752
+ 0.386481    0.0       -0.75987   -0.0395198  -0.791269
+ 0.587248    0.75987    0.0       -0.993098   -0.806084
+ 0.58316     0.0395198  0.993098   0.0        -0.144389
+ 0.296752    0.791269   0.806084   0.144389    0.0

and then access the vector:

A.S
10-element Vector{Float64}:
+ 0.3864805493023272
+ 0.5872480099511458
+ 0.7598695437592404
+ 0.5831596656088702
+ 0.039519799779951015
+ 0.9930982798738633
+ 0.296751627407034
+ 0.7912689001441062
+ 0.8060839030961772
+ 0.14438925368580213
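
As a quick sanity check (not part of the generated documentation output) one can verify the skew-symmetry and the storage size of the example above:

A' ≈ -A                    # true: A is skew-symmetric
length(A.S) == 5 * 4 ÷ 2   # true: only n(n-1)/2 = 10 entries are stored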
diff --git a/latest/arrays/stiefel_lie_alg_horizontal/index.html b/latest/arrays/stiefel_lie_alg_horizontal/index.html index ce3c74258..be8a470ff 100644 --- a/latest/arrays/stiefel_lie_alg_horizontal/index.html +++ b/latest/arrays/stiefel_lie_alg_horizontal/index.html @@ -5,4 +5,4 @@ \end{bmatrix},\]

where $A\in\mathbb{R}^{n\times{}n}$ is skew-symmetric and $B\in\mathbb{R}^{(N-n)\times{}n}$ is arbitrary. In GeometricMachineLearning the struct StiefelLieAlgHorMatrix implements elements of this form.

Theoretical background

Vertical and horizontal components

The Stiefel manifold $St(n, N)$ is a homogeneous space obtained from $SO(N)$ by setting two matrices, whose first $n$ columns coincide, equivalent. Another way of expressing this is:

\[A_1 \sim A_2 \iff A_1E = A_2E\]

for

\[E = \begin{bmatrix} \mathbb{I} \\ \mathbb{O}\end{bmatrix}.\]

Because $St(n,N)$ is a homogeneous space, we can take any element $Y\in{}St(n,N)$ and $SO(N)$ acts transitively on it, i.e. its action can produce any other element of $St(n,N)$. A similar statement is also true regarding the tangent spaces of $St(n,N)$, namely:

\[T_YSt(n,N) = \mathfrak{g}\cdot{}Y,\]

i.e. every tangent space can be expressed through an action of the associated Lie algebra.

The kernel of the mapping $\mathfrak{g}\to{}T_YSt(n,N), B\mapsto{}BY$ is referred to as $\mathfrak{g}^{\mathrm{ver},Y}$, the vertical component of the Lie algebra at $Y$. In the case $Y=E$ it is easy to see that elements belonging to $\mathfrak{g}^{\mathrm{ver},E}$ are of the following form:

\[\begin{bmatrix} \hat{\mathbb{O}} & \tilde{\mathbb{O}}^T \\ \tilde{\mathbb{O}} & C -\end{bmatrix},\]

where $\hat{\mathbb{O}}\in\mathbb{R}^{n\times{}n}$ is a "small" matrix and $\tilde{\mathbb{O}}\in\mathbb{R}^{(N-n)\times{}n}$ is a bigger one. $C\in\mathbb{R}^{(N-n)\times(N-n)}$ is a skew-symmetric matrix.

The orthogonal complement of the vertical component is referred to as the horizontal component and denoted by $\mathfrak{g}^{\mathrm{hor}, Y}$. It is isomorphic to $T_YSt(n,N)$ and this isomorphism can be found explicitly. In the case of the Stiefel manifold:

\[\Omega(Y, \cdot):T_YSt(n,N)\to\mathfrak{g}^{\mathrm{hor},Y},\, \Delta \mapsto (\mathbb{I} - \frac{1}{2}YY^T)\Delta{}Y^T - Y\Delta^T(\mathbb{I} - \frac{1}{2}YY^T)\]

For the special case $Y=E$ we write $\mathfrak{g}^{\mathrm{hor},E}=:\mathfrak{g}^\mathrm{hor}$; its elements are of the form described at the top of this page.

Special functions

You can also draw random elements from $\mathfrak{g}^\mathrm{hor}$ through e.g.

rand(CUDADevice(), StiefelLieAlgHorMatrix{Float32}, 10, 5)

In this example: $N=10$ and $n=5$.

+\end{bmatrix},\]

where $\hat{\mathbb{O}}\in\mathbb{R}^{n\times{}n}$ is a "small" matrix and $\tilde{\mathbb{O}}\in\mathbb{R}^{(N-n)\times{}n}$ is a bigger one. $C\in\mathbb{R}^{(N-n)\times(N-n)}$ is a skew-symmetric matrix.

The orthogonal complement of the vertical component is referred to as the horizontal component and denoted by $\mathfrak{g}^{\mathrm{hor}, Y}$. It is isomorphic to $T_YSt(n,N)$ and this isomorphism can be found explicitly. In the case of the Stiefel manifold:

\[\Omega(Y, \cdot):T_YSt(n,N)\to\mathfrak{g}^{\mathrm{hor},Y},\, \Delta \mapsto (\mathbb{I} - \frac{1}{2}YY^T)\Delta{}Y^T - Y\Delta^T(\mathbb{I} - \frac{1}{2}YY^T)\]

For the special case $Y=E$ we write $\mathfrak{g}^{\mathrm{hor},E}=:\mathfrak{g}^\mathrm{hor}$; its elements are of the form described at the top of this page.
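
A plain-array sketch of this horizontal lift (here $Y$ is an $N\times{}n$ matrix with orthonormal columns and $\Delta$ a tangent vector; this is not the StiefelLieAlgHorMatrix-based implementation in the library):

using LinearAlgebra

# Ω(Y, Δ) = (I - ½YYᵀ)ΔYᵀ - YΔᵀ(I - ½YYᵀ), cf. the formula above
Ω(Y, Δ) = (I - Y * Y' / 2) * Δ * Y' - Y * Δ' * (I - Y * Y' / 2)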

Special functions

You can also draw random elements from $\mathfrak{g}^\mathrm{hor}$ through e.g.

rand(CUDADevice(), StiefelLieAlgHorMatrix{Float32}, 10, 5)

In this example: $N=10$ and $n=5$.

diff --git a/latest/data_loader/TODO/index.html b/latest/data_loader/TODO/index.html index 3f53390e7..92db5f5b1 100644 --- a/latest/data_loader/TODO/index.html +++ b/latest/data_loader/TODO/index.html @@ -1,2 +1,2 @@ -DATA Loader TODO · GeometricMachineLearning.jl

DATA Loader TODO

  • [x] Implement @views instead of allocating a new array in every step.
  • [x] Implement sampling without replacement.
  • [x] Store information on the epoch and the current loss.
  • [x] Usually the training loss is computed over the entire data set, we are probably going to do this for one epoch via

\[loss_e = \frac{1}{|batches|}\sum_{batch\in{}batches}loss(batch).\]

Point 4 makes sense because the output of an AD routine is the value of the loss function as well as the pullback.

+DATA Loader TODO · GeometricMachineLearning.jl

DATA Loader TODO

  • [x] Implement @views instead of allocating a new array in every step.
  • [x] Implement sampling without replacement.
  • [x] Store information on the epoch and the current loss.
  • [x] Usually the training loss is computed over the entire data set, we are probably going to do this for one epoch via

\[loss_e = \frac{1}{|batches|}\sum_{batch\in{}batches}loss(batch).\]

Point 4 makes sense because the output of an AD routine is the value of the loss function as well as the pullback.

diff --git a/latest/data_loader/data_loader/index.html b/latest/data_loader/data_loader/index.html index fb2251045..d6cfa8cf1 100644 --- a/latest/data_loader/data_loader/index.html +++ b/latest/data_loader/data_loader/index.html @@ -1,28 +1,28 @@ Routines · GeometricMachineLearning.jl

Data Loader

DataLoader is a struct whose instances are created based on a tensor (or another input format) and is designed to make training convenient.

Constructor

The data loader can be called with various inputs:

  • A single vector: If the data loader is called with a single vector (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the second axis indicates parameter values and/or time steps and the system has a single degree of freedom (i.e. the system dimension is one).
  • A single matrix: If the data loader is called with a single matrix (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the first axis is assumed to indicate the degrees of freedom of the system and the second axis indicates parameter values and/or time steps.
  • A single tensor: If the data loader is called with a single tensor, then this is interpreted as an integration problem with the second axis indicating the time step and the third one indicating the parameters.
  • A tensor and a vector: This is a special case (MNIST classification problem). For the MNIST problem, for example, the input consists of $n_p$ matrices (first input argument) and $n_p$ integers (second input argument).
  • A NamedTuple with fields q and p: The NamedTuple contains (i) two matrices or (ii) two tensors.
  • An EnsembleSolution: The EnsembleSolution typically comes from GeometricProblems.

When we supply a single vector or a single matrix as input to DataLoader and further set autoencoder = false (keyword argument), then the data are stored as an integration problem and the second axis is assumed to indicate time steps.

The data loader can be called with various types of arrays as input, for example a snapshot matrix:

SnapshotMatrix = rand(Float32, 10, 100)
 
-dl = DataLoader(SnapshotMatrix)
DataLoader{Float32, Array{Float32, 3}, Nothing, :RegularData}(Float32[0.22285312; 0.24307334; … ; 0.6759369; 0.5519217;;; 0.20352274; 0.28339177; … ; 0.9727076; 0.7658595;;; 0.968267; 0.7961354; … ; 0.12518084; 0.76060724;;; … ;;; 0.9591879; 0.09975815; … ; 0.7455645; 0.8058178;;; 0.08448684; 0.6794417; … ; 0.67843664; 0.2641533;;; 0.7456144; 0.14503914; … ; 0.46142083; 0.4525925], nothing, 10, 1, 100, nothing, nothing)

or a snapshot tensor:

SnapshotTensor = rand(Float32, 10, 100, 5)
+dl = DataLoader(SnapshotMatrix)
DataLoader{Float32, Array{Float32, 3}, Nothing, :RegularData}(Float32[0.24799973; 0.3912878; … ; 0.64255303; 0.47915155;;; 0.24978966; 0.12909335; … ; 0.032393575; 0.18256313;;; 0.67669857; 0.53128225; … ; 0.6543872; 0.7989751;;; … ;;; 0.3812762; 0.29651302; … ; 0.027504563; 0.8203194;;; 0.090964735; 0.25212443; … ; 0.197765; 0.5558512;;; 0.75686467; 0.3287887; … ; 0.27345914; 0.38904423], nothing, 10, 1, 100, nothing, nothing)

or a snapshot tensor:

SnapshotTensor = rand(Float32, 10, 100, 5)
 
-dl = DataLoader(SnapshotTensor)
DataLoader{Float32, Array{Float32, 3}, Nothing, :TimeSeries}(Float32[0.7372873 0.5297716 … 0.30026257 0.34512484; 0.18515235 0.62638575 … 0.33359766 0.76147586; … ; 0.3167023 0.13998479 … 0.6190827 0.43921083; 0.8222499 0.9680691 … 0.3314203 0.038860917;;; 0.02583152 0.2619208 … 0.6834038 0.70247525; 0.7046808 0.49961072 … 0.78303885 0.78600234; … ; 0.009305418 0.8941876 … 0.9503127 0.24760056; 0.3240661 0.687296 … 0.9268558 0.79006183;;; 0.8235485 0.68454385 … 0.578376 0.29106402; 0.83938086 0.6670284 … 0.46135998 0.31197536; … ; 0.76915234 0.56561184 … 0.01469028 0.97325367; 0.11667901 0.3636588 … 0.24557573 0.20570123;;; 0.58321995 0.6568781 … 0.18913174 0.8958822; 0.9195444 0.70537513 … 0.10124564 0.31622803; … ; 0.019436717 0.9826356 … 0.43633884 0.0022076368; 0.25794375 0.0050197244 … 0.86520654 0.47125947;;; 0.03387499 0.5816525 … 0.72205895 0.12548625; 0.48261827 0.26107526 … 0.48584896 0.019511104; … ; 0.8815814 0.34440374 … 0.3523628 0.77762336; 0.74658465 0.07134575 … 0.39682704 0.6709598], nothing, 10, 100, 5, nothing, nothing)

Here the DataLoader has different properties :RegularData and :TimeSeries. This indicates that in the first case we treat all columns in the input tensor independently (this is mostly used for autoencoder problems), whereas in the second case we have time series-like data, which are mostly used for integration problems. We can also treat a problem with a matrix as input as a time series-like problem by providing an additional keyword argument: autoencoder=false:

SnapshotMatrix = rand(Float32, 10, 100)
+dl = DataLoader(SnapshotTensor)
DataLoader{Float32, Array{Float32, 3}, Nothing, :TimeSeries}(Float32[0.13292462 0.8675269 … 0.7886783 0.6382356; 0.22432995 0.43434638 … 0.82763267 0.41215235; … ; 0.29805195 0.04973328 … 0.23735166 0.6277334; 0.31186426 0.59075046 … 0.43199152 0.29078525;;; 0.614533 0.37017387 … 0.71245986 0.96971285; 0.98493093 0.031980693 … 0.2118206 0.86453134; … ; 0.2347545 0.7969141 … 0.030377686 0.53750414; 0.0451746 0.1492858 … 0.8383721 0.9667157;;; 0.07496083 0.7030878 … 0.41185933 0.8934395; 0.4459529 0.523878 … 0.27900404 0.4171574; … ; 0.9874781 0.93942124 … 0.611218 0.2650187; 0.24236047 0.13514596 … 0.4100657 0.44404012;;; 0.18803877 0.4530118 … 0.31370008 0.19260997; 0.5630235 0.97113204 … 0.31883442 0.536943; … ; 0.35017914 0.95842427 … 0.5171707 0.5879185; 0.44269866 0.81681085 … 0.78467906 0.037288547;;; 0.85101664 0.19772315 … 0.19121706 0.10227281; 0.40648496 0.049670577 … 0.88277376 0.54447365; … ; 0.9227476 0.87870425 … 0.14486456 0.39141983; 0.8475448 0.2717154 … 0.36061782 0.061712027], nothing, 10, 100, 5, nothing, nothing)

Here the DataLoader has different properties :RegularData and :TimeSeries. This indicates that in the first case we treat all columns in the input tensor independently (this is mostly used for autoencoder problems), whereas in the second case we have time series-like data, which are mostly used for integration problems. We can also treat a problem with a matrix as input as a time series-like problem by providing an additional keyword argument: autoencoder=false:

SnapshotMatrix = rand(Float32, 10, 100)
 
 dl = DataLoader(SnapshotMatrix; autoencoder=false)
 dl.input_time_steps
100

DataLoader can also be called with a NamedTuple that has q and p as keys.

In this case the field input_dim of DataLoader is interpreted as the sum of the $q$- and $p$-dimensions, i.e. if $q$ and $p$ both evolve on $\mathbb{R}^n$, then input_dim is $2n$.

SymplecticSnapshotTensor = (q = rand(Float32, 10, 100, 5), p = rand(Float32, 10, 100, 5))
 
-dl = DataLoader(SymplecticSnapshotTensor)
DataLoader{Float32, @NamedTuple{q::Array{Float32, 3}, p::Array{Float32, 3}}, Nothing, :TimeSeries}((q = Float32[0.63194525 0.48117453 … 0.42750108 0.7052233; 0.29683727 0.7666657 … 0.93470424 0.65966356; … ; 0.76579696 0.77377367 … 0.58042306 0.53537774; 0.68570423 0.34214318 … 0.6517937 0.7268162;;; 0.56836736 0.32606965 … 0.40225732 0.5918043; 0.5771421 0.98329556 … 0.6552729 0.6097527; … ; 0.18114781 0.8483613 … 0.091011465 0.7567567; 0.25187266 0.6789831 … 0.64611125 0.43449926;;; 0.50327224 0.97427225 … 0.34510094 0.21600366; 0.24288523 0.8388147 … 0.37709677 0.54414016; … ; 0.75977355 0.3065204 … 0.53313583 0.47730428; 0.09038824 0.93326914 … 0.020009875 0.61045855;;; 0.24906754 0.4860415 … 0.7999047 0.0420987; 0.25769067 0.12035012 … 0.80716753 0.10296035; … ; 0.14724338 0.53261405 … 0.06628728 0.56460816; 0.43888104 0.96188486 … 0.44307256 0.8584739;;; 0.93536085 0.6686914 … 0.78082573 0.5877933; 0.17444223 0.73573333 … 0.18575513 0.91286063; … ; 0.9622858 0.8713324 … 0.5640003 0.47448134; 0.38630593 0.19958311 … 0.1085515 0.7231936], p = Float32[0.42188752 0.5476018 … 0.27022403 0.042162895; 0.94916725 0.47921842 … 0.92821187 0.10728997; … ; 0.40955967 0.47208154 … 0.43827945 0.70578724; 0.9111329 0.14379501 … 0.9947228 0.13129443;;; 0.505744 0.47397435 … 0.32164508 0.62912714; 0.80585307 0.6837222 … 0.28863782 0.8620921; … ; 0.43336862 0.36984855 … 0.032850742 0.65122026; 0.7694597 0.05448234 … 0.47539407 0.61601883;;; 0.598147 0.0035421848 … 0.1798827 0.0069471; 0.96715087 0.4422326 … 0.33253896 0.7255499; … ; 0.5145125 0.021776497 … 0.7848872 0.5934332; 0.6444791 0.34651232 … 0.8601005 0.11753702;;; 0.9952162 0.53902924 … 0.37067795 0.4272086; 0.11934304 0.80591553 … 0.7042335 0.24562919; … ; 0.44097763 0.32638228 … 0.1363405 0.5713097; 0.5102197 0.77721447 … 0.77541935 0.33808726;;; 0.990427 0.022998571 … 0.87408257 0.016198039; 0.61876625 0.97801495 … 0.45116526 0.58369213; … ; 0.5273809 0.38570845 … 0.84813946 0.4014138; 0.8545336 0.84762865 … 0.8272887 0.62784296]), nothing, 20, 100, 5, nothing, nothing)
dl.input_dim
20

The Batch struct

Batch is a struct whose functor acts on an instance of DataLoader to produce a sequence of training samples for training for one epoch.

The Constructor

The constructor for Batch is called with:

  • batch_size::Int
  • seq_length::Int (optional)
  • prediction_window::Int (optional)

The first one of these arguments is required; it indicates the number of training samples in a batch. If we deal with time series data then we can additionally supply a sequence length and a prediction window as input arguments to Batch. These indicate the number of input vectors and the number of output vectors.

The functor

An instance of Batch can be called on an instance of DataLoader to produce a sequence of samples that contain all the input data, i.e. for training for one epoch. The output of applying batch::Batch to dl::DataLoader is a tuple of vectors of tuples. Each of these tuples contains two integers: the first is the time index and the second one is the parameter index.

matrix_data = rand(Float32, 2, 10)
+dl = DataLoader(SymplecticSnapshotTensor)
DataLoader{Float32, @NamedTuple{q::Array{Float32, 3}, p::Array{Float32, 3}}, Nothing, :TimeSeries}((q = Float32[0.15644282 0.7073246 … 0.8249736 0.4740212; 0.21305132 0.16514009 … 0.16166991 0.8499346; … ; 0.024424314 0.728299 … 0.2242353 0.38895804; 0.60622776 0.95988494 … 0.9982563 0.44332892;;; 0.5877351 0.0811286 … 0.20768261 0.46987653; 0.5586822 0.9591143 … 0.5811738 0.9609387; … ; 0.5536761 0.36311072 … 0.1073218 0.33813965; 0.3897627 0.17892796 … 0.43707442 0.28174007;;; 0.99351263 0.99489754 … 0.0545007 0.4126557; 0.44839823 0.43165177 … 0.48689526 0.9167457; … ; 0.9912741 0.34914166 … 0.31272775 0.6109176; 0.91831887 0.4854077 … 0.58914196 0.31581843;;; 0.9183417 0.9080335 … 0.89300394 0.2955643; 0.4026698 0.7506492 … 0.5082923 0.939201; … ; 0.4048438 0.08850813 … 0.9773625 0.7758554; 0.4620188 0.53673846 … 0.7752993 0.929549;;; 0.70152813 0.70411754 … 0.4844982 0.55630016; 0.09341776 0.08390343 … 0.43498808 0.8259233; … ; 0.9540665 0.71738315 … 0.31836736 0.3504672; 0.08056831 0.11129582 … 0.18075955 0.84167683], p = Float32[0.24897355 0.84406173 … 0.8821633 0.7332919; 0.35389954 0.33957994 … 0.44175637 0.04273534; … ; 0.018718898 0.59222263 … 0.0070593357 0.8794088; 0.45509583 0.14810872 … 0.0074183345 0.8188201;;; 0.65852267 0.79771835 … 0.29122263 0.29927695; 0.8663034 0.9410317 … 0.02499646 0.09458858; … ; 0.3977123 0.6167766 … 0.026077628 0.22049832; 0.6863993 0.5649373 … 0.7501161 0.7416841;;; 0.21843678 0.4145254 … 0.7728342 0.051957905; 0.3930192 0.3803016 … 0.6458817 0.8798668; … ; 0.73145735 0.21938437 … 0.4649971 0.09445888; 0.036848545 0.11338526 … 0.9082846 0.23102015;;; 0.14175218 0.8213072 … 0.6976974 0.7727129; 0.27857906 0.026842952 … 0.46511102 0.29197192; … ; 0.29707193 0.93260044 … 0.38173282 0.023017883; 0.36474448 0.86480606 … 0.36332452 0.16360503;;; 0.14413375 0.31174034 … 0.3052305 0.31921828; 0.7458084 0.9550317 … 0.039779007 0.16072929; … ; 0.41656 0.24584025 … 0.25366008 0.4624713; 0.21026474 0.75346506 … 0.7817601 0.96784276]), nothing, 20, 100, 5, nothing, nothing)
dl.input_dim
20

The Batch struct

Batch is a struct whose functor acts on an instance of DataLoader to produce a sequence of training samples for training for one epoch.

The Constructor

The constructor for Batch is called with:

  • batch_size::Int
  • seq_length::Int (optional)
  • prediction_window::Int (optional)

The first one of these arguments is required; it indicates the number of training samples in a batch. If we deal with time series data then we can additionally supply a sequence length and a prediction window as input arguments to Batch. These indicate the number of input vectors and the number of output vectors.
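
Putting all three arguments together, a call might look like the following (the values are purely illustrative):

batch = Batch(16, 5, 3)   # batch_size = 16, seq_length = 5, prediction_window = 3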

The functor

An instance of Batch can be called on an instance of DataLoader to produce a sequence of samples that contain all the input data, i.e. for training for one epoch. The output of applying batch::Batch to dl::DataLoader is a tuple of vectors of tuples. Each of these tuples contains two integers: the first is the time index and the second one is the parameter index.

matrix_data = rand(Float32, 2, 10)
 dl = DataLoader(matrix_data; autoencoder = true)
 
 batch = Batch(3)
-batch(dl)
([(1, 7), (1, 9), (1, 5)], [(1, 10), (1, 1), (1, 8)], [(1, 6), (1, 2), (1, 3)], [(1, 4)])

This also works if the data are in $qp$ form:

qp_data = (q = rand(Float32, 2, 10), p = rand(Float32, 2, 10))
+batch(dl)
([(1, 1), (1, 5), (1, 3)], [(1, 2), (1, 6), (1, 8)], [(1, 9), (1, 10), (1, 4)], [(1, 7)])

This also works if the data are in $qp$ form:

qp_data = (q = rand(Float32, 2, 10), p = rand(Float32, 2, 10))
 dl = DataLoader(qp_data; autoencoder = true)
 
 batch = Batch(3)
-batch(dl)
([(1, 5), (1, 1), (1, 9)], [(1, 10), (1, 8), (1, 4)], [(1, 7), (1, 2), (1, 3)], [(1, 6)])

In those two examples the autoencoder keyword was set to true (the default). This is why the first index was always 1. This changes if we set autoencoder = false:

qp_data = (q = rand(Float32, 2, 10), p = rand(Float32, 2, 10))
+batch(dl)
([(1, 1), (1, 6), (1, 7)], [(1, 9), (1, 8), (1, 2)], [(1, 10), (1, 5), (1, 3)], [(1, 4)])

In those two examples the autoencoder keyword was set to true (the default). This is why the first index was always 1. This changes if we set autoencoder = false:

qp_data = (q = rand(Float32, 2, 10), p = rand(Float32, 2, 10))
 dl = DataLoader(qp_data; autoencoder = false) # false is default
 
 batch = Batch(3)
-batch(dl)
([(8, 1), (5, 1), (7, 1)], [(2, 1), (9, 1), (3, 1)], [(6, 1), (1, 1), (4, 1)])

Specifically the routines do the following:

  1. $\mathtt{n\_indices}\leftarrow \mathtt{n\_params}\lor\mathtt{input\_time\_steps},$
  2. $\mathtt{indices} \leftarrow \mathtt{shuffle}(\mathtt{1:\mathtt{n\_indices}}),$
  3. $\mathcal{I}_i \leftarrow \mathtt{indices[(i - 1)} \cdot \mathtt{batch\_size} + 1 \mathtt{:} i \cdot \mathtt{batch\_size]}\text{ for }i=1, \ldots, (\mathrm{last} -1),$
  4. $\mathcal{I}_\mathtt{last} \leftarrow \mathtt{indices[}(\mathtt{n\_batches} - 1) \cdot \mathtt{batch\_size} + 1\mathtt{:end]}.$

Note that the routines are implemented in such a way that no index appears twice.

Sampling from a tensor

We can also sample tensor data.

qp_data = (q = rand(Float32, 2, 20, 3), p = rand(Float32, 2, 20, 3))
+batch(dl)
([(3, 1), (8, 1), (1, 1)], [(7, 1), (4, 1), (6, 1)], [(2, 1), (5, 1), (9, 1)])

Specifically the routines do the following:

  1. $\mathtt{n\_indices}\leftarrow \mathtt{n\_params}\lor\mathtt{input\_time\_steps},$
  2. $\mathtt{indices} \leftarrow \mathtt{shuffle}(\mathtt{1:\mathtt{n\_indices}}),$
  3. $\mathcal{I}_i \leftarrow \mathtt{indices[(i - 1)} \cdot \mathtt{batch\_size} + 1 \mathtt{:} i \cdot \mathtt{batch\_size]}\text{ for }i=1, \ldots, (\mathrm{last} -1),$
  4. $\mathcal{I}_\mathtt{last} \leftarrow \mathtt{indices[}(\mathtt{n\_batches} - 1) \cdot \mathtt{batch\_size} + 1\mathtt{:end]}.$

Note that the routines are implemented in such a way that no index appears twice.
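
A plain-Julia sketch of these four steps (not the library's internal implementation; n_indices and batch_size are assumed to be given, here with example values):

using Random

n_indices, batch_size = 10, 3                # step 1 (example values)
indices = shuffle(1:n_indices)               # step 2
n_batches = cld(n_indices, batch_size)
batches = [indices[(i - 1) * batch_size + 1:min(i * batch_size, n_indices)] for i in 1:n_batches]   # steps 3 and 4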

Sampling from a tensor

We can also sample tensor data.

qp_data = (q = rand(Float32, 2, 20, 3), p = rand(Float32, 2, 20, 3))
 dl = DataLoader(qp_data)
 
 # also specify sequence length here
 batch = Batch(4, 5)
-batch(dl)
([(10, 3), (2, 3), (6, 3), (9, 3)], [(7, 3), (8, 3), (4, 3), (11, 3)], [(5, 3), (1, 3), (3, 3), (10, 1)], [(2, 1), (6, 1), (9, 1), (7, 1)], [(8, 1), (4, 1), (11, 1), (5, 1)], [(1, 1), (3, 1), (10, 2), (2, 2)], [(6, 2), (9, 2), (7, 2), (8, 2)], [(4, 2), (11, 2), (5, 2), (1, 2)], [(3, 2)])

Sampling from a tensor is done the following way ($\mathcal{I}_i$ again denotes the batch indices for the $i$-th batch):

  1. $\mathtt{time\_indices} \leftarrow \mathtt{shuffle}(\mathtt{1:}(\mathtt{input\_time\_steps} - \mathtt{seq\_length} - \mathtt{prediction\_window})),$
  2. $\mathtt{parameter\_indices} \leftarrow \mathtt{shuffle}(\mathtt{1:n\_params}),$
  3. $\mathtt{complete\_indices} \leftarrow \mathtt{product}(\mathtt{time\_indices}, \mathtt{parameter\_indices}),$
  4. $\mathcal{I}_i \leftarrow \mathtt{complete\_indices[}(i - 1) \cdot \mathtt{batch\_size} + 1 : i \cdot \mathtt{batch\_size]}\text{ for }i=1, \ldots, (\mathrm{last} -1),$
  5. $\mathcal{I}_\mathrm{last} \leftarrow \mathtt{complete\_indices[}(\mathrm{last} - 1) \cdot \mathtt{batch\_size} + 1\mathtt{:end]}.$

This algorithm can be visualized the following way (here batch_size = 4):

Here the sampling is performed over the second axis (the time step dimension) and the third axis (the parameter dimension). Whereas each block has thickness 1 in the $x$ direction (i.e. pertains to a single parameter), its length in the $y$ direction is seq_length. In total we sample as many such blocks as the batch size specifies. By construction those blocks are never the same throughout a training epoch but may intersect each other!

+batch(dl)
([(10, 3), (7, 3), (4, 3), (9, 3)], [(2, 3), (11, 3), (3, 3), (1, 3)], [(8, 3), (5, 3), (6, 3), (10, 2)], [(7, 2), (4, 2), (9, 2), (2, 2)], [(11, 2), (3, 2), (1, 2), (8, 2)], [(5, 2), (6, 2), (10, 1), (7, 1)], [(4, 1), (9, 1), (2, 1), (11, 1)], [(3, 1), (1, 1), (8, 1), (5, 1)], [(6, 1)])

Sampling from a tensor is done the following way ($\mathcal{I}_i$ again denotes the batch indices for the $i$-th batch):

  1. $\mathtt{time\_indices} \leftarrow \mathtt{shuffle}(\mathtt{1:}(\mathtt{input\_time\_steps} - \mathtt{seq\_length} - \mathtt{prediction\_window})),$
  2. $\mathtt{parameter\_indices} \leftarrow \mathtt{shuffle}(\mathtt{1:n\_params}),$
  3. $\mathtt{complete\_indices} \leftarrow \mathtt{product}(\mathtt{time\_indices}, \mathtt{parameter\_indices}),$
  4. $\mathcal{I}_i \leftarrow \mathtt{complete\_indices[}(i - 1) \cdot \mathtt{batch\_size} + 1 : i \cdot \mathtt{batch\_size]}\text{ for }i=1, \ldots, (\mathrm{last} -1),$
  5. $\mathcal{I}_\mathrm{last} \leftarrow \mathtt{complete\_indices[}(\mathrm{last} - 1) \cdot \mathtt{batch\_size} + 1\mathtt{:end]}.$

This algorithm can be visualized the following way (here batch_size = 4):

Here the sampling is performed over the second axis (the time step dimension) and the third axis (the parameter dimension). Whereas each block has thickness 1 in the $x$ direction (i.e. pertains to a single parameter), its length in the $y$ direction is seq_length. In total we sample as many such blocks as the batch size specifies. By construction those blocks are never the same throughout a training epoch but may intersect each other!
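
A rough plain-Julia sketch of the five sampling steps above (with assumed example sizes; not the library internals):

using Random

input_time_steps, seq_length, prediction_window, n_params, batch_size = 20, 5, 1, 3, 4
time_indices = shuffle(1:(input_time_steps - seq_length - prediction_window))   # step 1
parameter_indices = shuffle(1:n_params)                                         # step 2
complete_indices = [(t, p) for p in parameter_indices for t in time_indices]    # step 3
n_batches = cld(length(complete_indices), batch_size)
batches = [complete_indices[(i - 1) * batch_size + 1:min(i * batch_size, length(complete_indices))] for i in 1:n_batches]   # steps 4 and 5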

diff --git a/latest/data_loader/snapshot_matrix/index.html b/latest/data_loader/snapshot_matrix/index.html index 3550e42a2..5cddbd9c8 100644 --- a/latest/data_loader/snapshot_matrix/index.html +++ b/latest/data_loader/snapshot_matrix/index.html @@ -5,4 +5,4 @@ \hat{u}_3(t_0) & \hat{u}_3(t_1) & \ldots & \hat{u}_3(t_f) \\ \ldots & \ldots & \ldots & \ldots \\ \hat{u}_{2N}(t_0) & \hat{u}_{2N}(t_1) & \ldots & \hat{u}_{2N}(t_f) \\ -\end{array}\right].\]

In the above example we store a matrix whose first axis is the system dimension (i.e. a state is an element of $\mathbb{R}^{2n}$) and the second dimension gives the time step.

The starting point for using the snapshot matrix as data for a machine learning model is that all the columns of $M$ live on a lower-dimensional solution manifold and we can use techniques such as POD and autoencoders to find this solution manifold. We also note that the second axis of $M$ does not necessarily indicate time but can also represent various parameters (including initial conditions). The second axis in the DataLoader struct is therefore saved in the field n_params.

Snapshot tensor

The snapshot tensor fulfills the same role as the snapshot matrix but has a third axis that describes different initial parameters (such as different initial conditions).

When drawing training samples from the snapshot tensor we also need to specify a sequence length (as an argument to the Batch struct). When sampling a batch from the snapshot tensor we sample over the starting point of the time interval (which is of length seq_length) and the third axis of the tensor (the parameters). The total number of batches in this case is $\lceil\mathtt{(dl.input\_time\_steps - batch.seq\_length) * dl.n\_params / batch.batch\_size}\rceil$.

+\end{array}\right].\]

In the above example we store a matrix whose first axis is the system dimension (i.e. a state is an element of $\mathbb{R}^{2n}$) and the second dimension gives the time step.

The starting point for using the snapshot matrix as data for a machine learning model is that all the columns of $M$ live on a lower-dimensional solution manifold and we can use techniques such as POD and autoencoders to find this solution manifold. We also note that the second axis of $M$ does not necessarily indicate time but can also represent various parameters (including initial conditions). The second axis in the DataLoader struct is therefore saved in the field n_params.

Snapshot tensor

The snapshot tensor fulfills the same role as the snapshot matrix but has a third axis that describes different initial parameters (such as different initial conditions).

When drawing training samples from the snapshot tensor we also need to specify a sequence length (as an argument to the Batch struct). When sampling a batch from the snapshot tensor we sample over the starting point of the time interval (which is of length seq_length) and the third axis of the tensor (the parameters). The total number of batches in this case is $\lceil\mathtt{(dl.input\_time\_steps - batch.seq\_length) * dl.n\_params / batch.batch\_size}\rceil$.
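
For illustration: with $\mathtt{dl.input\_time\_steps} = 100$, $\mathtt{batch.seq\_length} = 5$, $\mathtt{dl.n\_params} = 5$ and $\mathtt{batch.batch\_size} = 4$ (arbitrary example values) this gives $\lceil (100 - 5)\cdot{}5/4 \rceil = \lceil 118.75 \rceil = 119$ batches.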

diff --git a/latest/index.html b/latest/index.html index 85f2db0a0..1f69d75fd 100644 --- a/latest/index.html +++ b/latest/index.html @@ -1,2 +1,2 @@ -Home · GeometricMachineLearning.jl

Geometric Machine Learning

GeometricMachineLearning.jl implements various scientific machine learning models that aim at learning dynamical systems with geometric structure, such as Hamiltonian (symplectic) or Lagrangian (variational) systems.

Installation

GeometricMachineLearning.jl and all of its dependencies can be installed via the Julia REPL by typing

]add GeometricMachineLearning

Architectures

There are several architectures tailored towards problems in scientific machine learning implemented in GeometricMachineLearning.

Manifolds

GeometricMachineLearning supports putting neural network weights on manifolds. These include:

Special Neural Network Layer

Many layers have been adapted in order to be used for problems in scientific machine learning, including:

Tutorials

Tutorials for using GeometricMachineLearning are:

Reduced Order Modeling

A short description of the key concepts in reduced order modeling (where GeometricMachineLearning can be used) is given in:

+Home · GeometricMachineLearning.jl

Geometric Machine Learning

GeometricMachineLearning.jl implements various scientific machine learning models that aim at learning dynamical systems with geometric structure, such as Hamiltonian (symplectic) or Lagrangian (variational) systems.

Installation

GeometricMachineLearning.jl and all of its dependencies can be installed via the Julia REPL by typing

]add GeometricMachineLearning

Architectures

There are several architectures tailored towards problems in scientific machine learning implemented in GeometricMachineLearning.

Manifolds

GeometricMachineLearning supports putting neural network weights on manifolds. These include:

Special Neural Network Layer

Many layers have been adapted in order to be used for problems in scientific machine learning, including:

Tutorials

Tutorials for using GeometricMachineLearning are:

Reduced Order Modeling

A short description of the key concepts in reduced order modeling (where GeometricMachineLearning can be used) is given in:

diff --git a/latest/layers/attention_layer/index.html b/latest/layers/attention_layer/index.html index aa31292a2..e5bf3a7be 100644 --- a/latest/layers/attention_layer/index.html +++ b/latest/layers/attention_layer/index.html @@ -8,4 +8,4 @@ \vdots & \ddots & \vdots \\ \mathbb{O}_T & \cdots & \Lambda(Z) \end{array} -\right]\]

from the left onto the big vector.

Historical Note

Attention was used before, but always in connection with recurrent neural networks (see (Luong et al, 2015) and (Bahdanau et al, 2014)).

References

[20]
D. Bahdanau, K. Cho and Y. Bengio. Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).
[19]
M.-T. Luong, H. Pham and C. D. Manning. Effective approaches to attention-based neural machine translation, arXiv preprint arXiv:1508.04025 (2015).
+\right]\]

from the left onto the big vector.

Historical Note

Attention was used before, but always in connection with recurrent neural networks (see (Luong et al, 2015) and (Bahdanau et al, 2014)).

References

[20]
D. Bahdanau, K. Cho and Y. Bengio. Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).
[19]
M.-T. Luong, H. Pham and C. D. Manning. Effective approaches to attention-based neural machine translation, arXiv preprint arXiv:1508.04025 (2015).
diff --git a/latest/layers/multihead_attention_layer/index.html b/latest/layers/multihead_attention_layer/index.html index 5a5471cde..2050729cd 100644 --- a/latest/layers/multihead_attention_layer/index.html +++ b/latest/layers/multihead_attention_layer/index.html @@ -1,2 +1,2 @@ -Multihead Attention · GeometricMachineLearning.jl

Multihead Attention Layer

In order to arrive from the attention layer at the multihead attention layer we have to do a few modifications:

Note that these neural networks were originally developed for natural language processing (NLP) tasks and the terminology used here bears some resemblance to that field. The input to a multihead attention layer typically comprises three components:

  1. Values $V\in\mathbb{R}^{n\times{}T}$: a matrix whose columns are value vectors,
  2. Queries $Q\in\mathbb{R}^{n\times{}T}$: a matrix whose columns are query vectors,
  3. Keys $K\in\mathbb{R}^{n\times{}T}$: a matrix whose columns are key vectors.

Regular attention performs the following operation:

\[\mathrm{Attention}(Q,K,V) = V\mathrm{softmax}(\frac{K^TQ}{\sqrt{n}}),\]

where $n$ is the dimension of the vectors in $V$, $Q$ and $K$. The softmax activation function here acts column-wise, so it can be seen as a transformation $\mathrm{softmax}:\mathbb{R}^{T}\to\mathbb{R}^T$ with $[\mathrm{softmax}(v)]_i = e^{v_i}/\left(\sum_{j=1}^Te^{v_j}\right)$. The $K^TQ$ term is a similarity matrix between the queries and the keys.

The transformer contains a self-attention mechanism, i.e. it takes an input $X$ and then transforms it linearly to $V$, $Q$ and $K$, i.e. $V = P^VX$, $Q = P^QX$ and $K = P^KX$. What distinguishes the multihead attention layer from the singlehead attention layer is that there is not just one $P^V$, $P^Q$ and $P^K$, but there are several: one for each head of the multihead attention layer. After computing the individual values, queries and keys, and after applying the softmax, the outputs are then concatenated together in order to obtain again an array that is of the same size as the input array:

Here the various $P$ matrices can be interpreted as being projections onto lower-dimensional subspaces, hence the designation by the letter $P$. Because of this interpretation as projection matrices onto smaller spaces that should capture features in the input data it makes sense to constrain these elements to be part of the Stiefel manifold.

Computing Correlations in the Multihead-Attention Layer

The attention mechanism describes a reweighting of the "values" $V_i$ based on correlations between the "keys" $K_i$ and the "queries" $Q_i$. First note the structure of these matrices: they are all collections of $T$ $(N\div\mathtt{n\_heads})$-dimensional vectors, i.e. $V_i=[v_i^{(1)}, \ldots, v_i^{(T)}], K_i=[k_i^{(1)}, \ldots, k_i^{(T)}], Q_i=[q_i^{(1)}, \ldots, q_i^{(T)}]$. Those vectors have been obtained by applying the respective projection matrices onto the original input $I_i\in\mathbb{R}^{N\times{}T}$.

When performing the reweighting of the columns of $V_i$ we first compute the correlations between the vectors in $K_i$ and in $Q_i$ and store the results in a correlation matrix $C_i$:

\[ [C_i]_{mn} = \left(k_i^{(m)}\right)^Tq_i^{(n)}.\]

The columns of this correlation matrix are then rescaled with a softmax function, obtaining a matrix of probability vectors $\mathcal{P}_i$:

\[ [\mathcal{P}_i]_{\bullet{}n} = \mathrm{softmax}([C_i]_{\bullet{}n}).\]

Finally the matrix $\mathcal{P}_i$ is multiplied onto $V_i$ from the right, resulting in $T$ convex combinations of the $T$ vectors $v_i^{(m)}$ with $m=1,\ldots,T$:

\[ V_i\mathcal{P}_i = \left[\sum_{m=1}^{T}[\mathcal{P}_i]_{m,1}v_i^{(m)}, \ldots, \sum_{m=1}^{T}[\mathcal{P}_i]_{m,T}v_i^{(m)}\right].\]

With this we can now give a better interpretation of what the projection matrices $W_i^V$, $W_i^K$ and $W_i^Q$ should do: they map the original data to lower-dimensional subspaces. We then compute correlations between the representation in the $K$ and in the $Q$ basis and use this correlation to perform a convex reweighting of the vectors in the $V$ basis. These reweighted values are then fed into a standard feedforward neural network.

Because the main task of the $W_i^V$, $W_i^K$ and $W_i^Q$ matrices here is for them to find bases, it makes sense to constrain them onto the Stiefel manifold; they do not and should not have the maximum possible generality.

References

[21]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin. Attention is all you need. Advances in neural information processing systems 30 (2017).
+Multihead Attention · GeometricMachineLearning.jl

Multihead Attention Layer

In order to arrive from the attention layer at the multihead attention layer we have to do a few modifications:

Note that these neural networks were originally developed for natural language processing (NLP) tasks and the terminology used here bears some resemblance to that field. The input to a multihead attention layer typically comprises three components:

  1. Values $V\in\mathbb{R}^{n\times{}T}$: a matrix whose columns are value vectors,
  2. Queries $Q\in\mathbb{R}^{n\times{}T}$: a matrix whose columns are query vectors,
  3. Keys $K\in\mathbb{R}^{n\times{}T}$: a matrix whose columns are key vectors.

Regular attention performs the following operation:

\[\mathrm{Attention}(Q,K,V) = V\mathrm{softmax}(\frac{K^TQ}{\sqrt{n}}),\]

where $n$ is the dimension of the vectors in $V$, $Q$ and $K$. The softmax activation function here acts column-wise, so it can be seen as a transformation $\mathrm{softmax}:\mathbb{R}^{T}\to\mathbb{R}^T$ with $[\mathrm{softmax}(v)]_i = e^{v_i}/\left(\sum_{j=1}^{T}e^{v_j}\right)$. The $K^TQ$ term is a similarity matrix between the queries and the keys.

The transformer contains a self-attention mechanism, i.e. it takes an input $X$ and then transforms it linearly to $V$, $Q$ and $K$, i.e. $V = P^VX$, $Q = P^QX$ and $K = P^KX$. What distinguishes the multihead attention layer from the singlehead attention layer is that there is not just one $P^V$, $P^Q$ and $P^K$, but several: one for each head of the multihead attention layer. After computing the individual values, queries and keys, and after applying the softmax, the outputs are concatenated in order to obtain again an array that is of the same size as the input array:

Here the various $P$ matrices can be interpreted as being projections onto lower-dimensional subspaces, hence the designation by the letter $P$. Because of this interpretation as projection matrices onto smaller spaces that should capture features in the input data it makes sense to constrain these elements to be part of the Stiefel manifold.

Computing Correlations in the Multihead-Attention Layer

The attention mechanism describes a reweighting of the "values" $V_i$ based on correlations between the "keys" $K_i$ and the "queries" $Q_i$. First note the structure of these matrices: they are each a collection of $T$ $(N\div\mathtt{n\_heads})$-dimensional vectors, i.e. $V_i=[v_i^{(1)}, \ldots, v_i^{(T)}], K_i=[k_i^{(1)}, \ldots, k_i^{(T)}], Q_i=[q_i^{(1)}, \ldots, q_i^{(T)}]$. These vectors are obtained by applying the respective projection matrices to the original input $I_i\in\mathbb{R}^{N\times{}T}$.

When performing the reweighting of the columns of $V_i$ we first compute the correlations between the vectors in $K_i$ and in $Q_i$ and store the results in a correlation matrix $C_i$:

\[ [C_i]_{mn} = \left(k_i^{(m)}\right)^Tq_i^{(n)}.\]

The columns of this correlation matrix are then rescaled with a softmax function, obtaining a matrix of probability vectors $\mathcal{P}_i$:

\[ [\mathcal{P}_i]_{\bullet{}n} = \mathrm{softmax}([C_i]_{\bullet{}n}).\]

Finally the matrix $\mathcal{P}_i$ is multiplied onto $V_i$ from the right, resulting in $T$ convex combinations of the $T$ vectors $v_i^{(m)}$ with $m=1,\ldots,T$:

\[ V_i\mathcal{P}_i = \left[\sum_{m=1}^{T}[\mathcal{P}_i]_{m,1}v_i^{(m)}, \ldots, \sum_{m=1}^{T}[\mathcal{P}_i]_{m,T}v_i^{(m)}\right].\]

With this we can now give a better interpretation of what the projection matrices $W_i^V$, $W_i^K$ and $W_i^Q$ should do: they map the original data to lower-dimensional subspaces. We then compute correlations between the representation in the $K$ and in the $Q$ basis and use this correlation to perform a convex reweighting of the vectors in the $V$ basis. These reweighted values are then fed into a standard feedforward neural network.

Because the main task of the $W_i^V$, $W_i^K$ and $W_i^Q$ matrices here is for them to find bases, it makes sense to constrain them onto the Stiefel manifold; they do not and should not have the maximum possible generality.
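
Below is a minimal Julia sketch of the reweighting described above for a single head, using plain dense matrices and an illustrative `softmax_columns` helper; the projection matrices here are random and not constrained to the Stiefel manifold as they would be in GeometricMachineLearning.

```julia
# column-wise softmax: every column of C becomes a probability vector
softmax_columns(C) = hcat([exp.(c) ./ sum(exp.(c)) for c in eachcol(C)]...)

N, T, head_dim = 8, 4, 2                      # input dimension, sequence length, dimension per head
X = randn(N, T)                               # input time series

PV, PQ, PK = randn(head_dim, N), randn(head_dim, N), randn(head_dim, N)
V, Q, K = PV * X, PQ * X, PK * X              # values, queries and keys of one head

C = K' * Q                                    # correlation matrix: [C]ₘₙ = (k⁽ᵐ⁾)ᵀ q⁽ⁿ⁾
P = softmax_columns(C)                        # columns are convex weights
head_output = V * P                           # T convex combinations of the columns of V
```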

References

[21]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin. Attention is all you need. Advances in neural information processing systems 30 (2017).
diff --git a/latest/layers/volume_preserving_feedforward/index.html b/latest/layers/volume_preserving_feedforward/index.html index 22882d3f6..905bf61be 100644 --- a/latest/layers/volume_preserving_feedforward/index.html +++ b/latest/layers/volume_preserving_feedforward/index.html @@ -9,4 +9,4 @@ b_{21} & \ddots & & \vdots \\ \vdots & \ddots & \ddots & \vdots \\ b_{n1} & \cdots & b_{n(n-1)} & 1 -\end{pmatrix},\]

and the determinant of $J$ is 1, i.e. the map is volume-preserving.

Neural network architecture

Volume-preserving feedforward neural networks should be used as Architectures in GeometricMachineLearning. The constructor for them is:

The constructor is called with the following arguments:

The constructor produces the following architecture[2]:

Here LinearLowerLayer performs $x \mapsto x + Lx$ and NonLinearLowerLayer performs $x \mapsto x + \sigma(Lx + b)$. The activation function $\sigma$ is the fourth input argument to the constructor and tanh by default.

Note on Sympnets

As SympNets are symplectic maps, they also conserve phase space volume and therefore form a subcategory of volume-preserving feedforward layers.

+\end{pmatrix},\]

and the determinant of $J$ is 1, i.e. the map is volume-preserving.

Neural network architecture

Volume-preserving feedforward neural networks should be used as Architectures in GeometricMachineLearning. The constructor for them is:

The constructor is called with the following arguments:

The constructor produces the following architecture[2]:

Here LinearLowerLayer performs $x \mapsto x + Lx$ and NonLinearLowerLayer performs $x \mapsto x + \sigma(Lx + b)$. The activation function $\sigma$ is the fourth input argument to the constructor and tanh by default.
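
As a rough illustration (not the package's implementation), the two maps can be written out with a strictly lower-triangular weight matrix:

```julia
using LinearAlgebra

n = 4
L = tril(randn(n, n), -1)                    # strictly lower-triangular weights
b = randn(n)

linear_lower(x)    = x + L * x               # x ↦ x + Lx
nonlinear_lower(x) = x + tanh.(L * x .+ b)   # x ↦ x + σ(Lx + b) with σ = tanh

# both maps have unit lower-triangular Jacobians, so their determinant is 1
# and phase-space volume is preserved
x = randn(n)
y = nonlinear_lower(linear_lower(x))
```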

Note on Sympnets

As SympNets are symplectic maps, they also conserve phase space volume and therefore form a subcategory of volume-preserving feedforward layers.

diff --git a/latest/library/index.html b/latest/library/index.html index adadffa7b..280639dce 100644 --- a/latest/library/index.html +++ b/latest/library/index.html @@ -1,25 +1,25 @@ -Library · GeometricMachineLearning.jl

GeometricMachineLearning Library Functions

GeometricMachineLearning.AbstractRetractionType

AbstractRetraction is a type that comprises all retraction methods for manifolds. For every manifold layer one has to specify a retraction method that takes the layer and elements of the (global) tangent space.

source
GeometricMachineLearning.ActivationLayerPMethod

Performs:

\[\begin{pmatrix} +Library · GeometricMachineLearning.jl

GeometricMachineLearning Library Functions

GeometricMachineLearning.AbstractRetractionType

AbstractRetraction is a type that comprises all retraction methods for manifolds. For every manifold layer one has to specify a retraction method that takes the layer and elements of the (global) tangent space.

source
GeometricMachineLearning.AdamOptimizerWithDecayType

Defines the Adam Optimizer with weight decay.

Constructors

The default constructor takes as input:

  • n_epochs::Int
  • η₁: the learning rate at the start
  • η₂: the learning rate at the end
  • ρ₁: the decay parameter for the first moment
  • ρ₂: the decay parameter for the second moment
  • δ: the safety parameter
  • T (keyword argument): the type.

The second constructor is called with:

  • n_epochs::Int
  • T

... the rest are keyword arguments

source
GeometricMachineLearning.BFGSCacheType

The cache for the BFGS optimizer.

It stores an array for the previous time step B and the inverse of the Hessian matrix H.

It is important to note that setting up this cache already requires a derivative! This is not the case for the other optimizers.

source
GeometricMachineLearning.BFGSDummyCacheType

In order to initialize BFGSCache we first need gradient information. This is why we initially have this BFGSDummyCache until gradient information is available.

NOTE: we may not need this.

source
GeometricMachineLearning.BatchType

Batch is a struct whose functor acts on an instance of DataLoader to produce a sequence of training samples for training for one epoch.

The Constructor

The constructor for Batch is called with:

  • batch_size::Int
  • seq_length::Int (optional)
  • prediction_window::Int (optional)

The first one of these arguments is required; it indicates the number of training samples in a batch. If we deal with time series data then we can additionally supply a sequence length and a prediction window as input arguments to Batch. These indicate the number of input vectors and the number of output vectors.

The functor

An instance of Batch can be called on an instance of DataLoader to produce a sequence of samples that contain all the input data, i.e. for training for one epoch. The output of applying batch:Batch to dl::DataLoader is a tuple of vectors of integers. Each of these vectors contains two integers: the first is the time index and the second one is the parameter index.

source
GeometricMachineLearning.ClassificationType

Classification Layer that takes a matrix as an input and returns a vector that is used for MNIST classification.

It has the following arguments:

  • M: input dimension
  • N: output dimension
  • activation: the activation function

And the following optional argument:

  • average: If this is set to true, then the output is computed as $\frac{1}{N}\sum_{i=1}^N[input]_{\bullet{}i}$. If set to false (the default) it picks the last column of the input.
source
GeometricMachineLearning.ClassificationTransformerType

This is a transformer neural network for classification purposes. At the moment this is only used for training on MNIST, but can in theory be used for any classification problem.

It has to be called with a DataLoader that stores an input and an output tensor. The optional arguments are:

  • n_heads: The number of heads in the MultiHeadAttention (mha) layers. Default: 7.
  • n_layers: The number of transformer layers. Default: 16.
  • activation: The activation function. Default: softmax.
  • Stiefel: Whether the matrices in the mha layers are on the Stiefel manifold.
  • add_connection: Whether the input is appended to the output of the mha layer (skip connection).
source
GeometricMachineLearning.DataLoaderType

Data Loader is a struct that creates an instance based on a tensor (or different input format) and is designed to make training convenient.

Constructor

The data loader can be called with various inputs:

  • A single vector: If the data loader is called with a single vector (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the second axis indicates parameter values and/or time steps and the system has a single degree of freedom (i.e. the system dimension is one).
  • A single matrix: If the data loader is called with a single matrix (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the first axis is assumed to indicate the degrees of freedom of the system and the second axis indicates parameter values and/or time steps.
  • A single tensor: If the data loader is called with a single tensor, then this is interpreted as an integration problem with the second axis indicating the time step and the third one indicating the parameters.
  • A tensor and a vector: This is a special case (MNIST classification problem). For the MNIST problem for example the input are $n_p$ matrices (first input argument) and $n_p$ integers (second input argument).
  • A NamedTuple with fields q and p: The NamedTuple contains (i) two matrices or (ii) two tensors.
  • An EnsembleSolution: The EnsembleSolution typically comes from GeometricProblems.

When we supply a single vector or a single matrix as input to DataLoader and further set autoencoder = false (keyword argument), then the data are stored as an integration problem and the second axis is assumed to indicate time steps.

Fields of DataLoader

The fields of the DataLoader struct are the following:

  • input: The input data with axes (i) system dimension, (ii) number of time steps and (iii) number of parameters.
  • output: The tensor that contains the output (supervised learning) - this may be of type Nothing if the constructor is only called with one tensor (unsupervised learning).
  • input_dim: The dimension of the system, i.e. what is taken as input by a regular neural network.
  • input_time_steps: The length of the entire time series (length of the second axis).
  • n_params: The number of parameters that are present in the data set (length of third axis).
  • output_dim: The dimension of the output tensor (first axis). If output is of type Nothing, then this is also of type Nothing.
  • output_time_steps: The size of the second axis of the output tensor. If output is of type Nothing, then this is also of type Nothing.

The input and output fields of DataLoader

Even though the arguments to the Constructor may be vectors or matrices, internally DataLoader always stores tensors.
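
A small usage sketch based on the constructor variants listed above (the array sizes are placeholders and the exact keyword behaviour may differ between versions):

```julia
using GeometricMachineLearning

# a single matrix: interpreted as an autoencoder problem
dl₁ = DataLoader(randn(4, 200))

# the same data treated as a time series / integration problem
dl₂ = DataLoader(randn(4, 200); autoencoder = false)

# a tensor with axes (system dimension, time steps, parameters)
dl₃ = DataLoader(randn(4, 100, 20))

# a NamedTuple with fields q and p
dl₄ = DataLoader((q = randn(2, 200), p = randn(2, 200)))
```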

source
GeometricMachineLearning.DataLoaderMethod

Data Loader is a struct that creates an instance based on a tensor (or different input format) and is designed to make training convenient.

Constructor

The data loader can be called with various inputs:

  • A single vector: If the data loader is called with a single vector (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the second axis indicates parameter values and/or time steps and the system has a single degree of freedom (i.e. the system dimension is one).
  • A single matrix: If the data loader is called with a single matrix (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the first axis is assumed to indicate the degrees of freedom of the system and the second axis indicates parameter values and/or time steps.
  • A single tensor: If the data loader is called with a single tensor, then this is interpreted as an integration problem with the second axis indicating the time step and the third one indicating the parameters.
  • A tensor and a vector: This is a special case (MNIST classification problem). For the MNIST problem for example the input are $n_p$ matrices (first input argument) and $n_p$ integers (second input argument).
  • A NamedTuple with fields q and p: The NamedTuple contains (i) two matrices or (ii) two tensors.
  • An EnsembleSolution: The EnsembleSolution typically comes from GeometricProblems.

When we supply a single vector or a single matrix as input to DataLoader and further set autoencoder = false (keyword argument), then the data are stored as an integration problem and the second axis is assumed to indicate time steps.

source
GeometricMachineLearning.GSympNetType

GSympNet is called with a single input argument, the system dimension, or with an instance of DataLoader. Optional input arguments are:

  • upscaling_dimension::Int: The upscaling dimension of the gradient layer. See the documentation for GradientLayerQ and GradientLayerP for further explanation. The default is 2*dim.
  • nhidden::Int: The number of hidden layers (i.e. layers that are not input or output layers). The default is 2.
  • activation: The activation function that is applied. By default this is tanh.
  • init_upper::Bool: Initialize the gradient layer so that it first modifies the $q$-component. The default is true.
source
GeometricMachineLearning.GlobalSectionType

This implements global sections for the Stiefel manifold and the Symplectic Stiefel manifold.

In practice this is implemented using Householder reflections, with the auxiliary column vectors given by $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)^T$, where the $1$ sits in the $i$-th spot for $i$ in $(n+1)$ to $N$ (or with random columns).

Maybe consider dividing the output in the check functions by n!

Implement a general global section here!!!! Tₓ𝔐 → G×𝔤 !!!!!! (think about random initialization!)

source
GeometricMachineLearning.AdamOptimizerWithDecayType

Defines the Adam Optimizer with weight decay.

Constructors

The default constructor takes as input:

  • n_epochs::Int
  • η₁: the learning rate at the start
  • η₂: the learning rate at the end
  • ρ₁: the decay parameter for the first moment
  • ρ₂: the decay parameter for the second moment
  • δ: the safety parameter
  • T (keyword argument): the type.

The second constructor is called with:

  • n_epochs::Int
  • T

... the rest are keyword arguments

source
GeometricMachineLearning.BFGSCacheType

The cache for the BFGS optimizer.

It stores an array for the previous time step B and the inverse of the Hessian matrix H.

It is important to note that setting up this cache already requires a derivative! This is not the case for the other optimizers.

source
GeometricMachineLearning.BFGSDummyCacheType

In order to initialize BFGSCache we first need gradient information. This is why we initially have this BFGSDummyCache until gradient information is available.

NOTE: we may not need this.

source
GeometricMachineLearning.BatchType

Batch is a struct whose functor acts on an instance of DataLoader to produce a sequence of training samples for training for one epoch.

The Constructor

The constructor for Batch is called with:

  • batch_size::Int
  • seq_length::Int (optional)
  • prediction_window::Int (optional)

The first one of these arguments is required; it indicates the number of training samples in a batch. If we deal with time series data then we can additionally supply a sequence length and a prediction window as input arguments to Batch. These indicate the number of input vectors and the number of output vectors.

The functor

An instance of Batch can be called on an instance of DataLoader to produce a sequence of samples that contain all the input data, i.e. for training for one epoch. The output of applying batch:Batch to dl::DataLoader is a tuple of vectors of integers. Each of these vectors contains two integers: the first is the time index and the second one is the parameter index.
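
A brief sketch of how the functor is typically used together with DataLoader (sizes are placeholders):

```julia
using GeometricMachineLearning

data  = randn(5, 100, 20)        # axes: (system dimension, time steps, parameters)
dl    = DataLoader(data)
batch = Batch(32)                # batch size 32; optionally Batch(32, seq_length)

batches = batch(dl)              # a tuple of vectors; each vector holds a time index and a parameter index
```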

source
GeometricMachineLearning.ClassificationType

Classification Layer that takes a matrix as an input and returns a vector that is used for MNIST classification.

It has the following arguments:

  • M: input dimension
  • N: output dimension
  • activation: the activation function

And the following optional argument:

  • average: If this is set to true, then the output is computed as $\frac{1}{N}\sum_{i=1}^N[input]_{\bullet{}i}$. If set to false (the default) it picks the last column of the input.
source
GeometricMachineLearning.ClassificationTransformerType

This is a transformer neural network for classification purposes. At the moment this is only used for training on MNIST, but can in theory be used for any classification problem.

It has to be called with a DataLoader that stores an input and an output tensor. The optional arguments are:

  • n_heads: The number of heads in the MultiHeadAttention (mha) layers. Default: 7.
  • n_layers: The number of transformer layers. Default: 16.
  • activation: The activation function. Default: softmax.
  • Stiefel: Whether the matrices in the mha layers are on the Stiefel manifold.
  • add_connection: Whether the input is appended to the output of the mha layer (skip connection).
source
GeometricMachineLearning.DataLoaderType

Data Loader is a struct that creates an instance based on a tensor (or different input format) and is designed to make training convenient.

Constructor

The data loader can be called with various inputs:

  • A single vector: If the data loader is called with a single vector (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the second axis indicates parameter values and/or time steps and the system has a single degree of freedom (i.e. the system dimension is one).
  • A single matrix: If the data loader is called with a single matrix (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the first axis is assumed to indicate the degrees of freedom of the system and the second axis indicates parameter values and/or time steps.
  • A single tensor: If the data loader is called with a single tensor, then this is interpreted as an integration problem with the second axis indicating the time step and the third one indicating the parameters.
  • A tensor and a vector: This is a special case (MNIST classification problem). For the MNIST problem for example the input are $n_p$ matrices (first input argument) and $n_p$ integers (second input argument).
  • A NamedTuple with fields q and p: The NamedTuple contains (i) two matrices or (ii) two tensors.
  • An EnsembleSolution: The EnsembleSolution typically comes from GeometricProblems.

When we supply a single vector or a single matrix as input to DataLoader and further set autoencoder = false (keyword argument), then the data are stored as an integration problem and the second axis is assumed to indicate time steps.

Fields of DataLoader

The fields of the DataLoader struct are the following:

  • input: The input data with axes (i) system dimension, (ii) number of time steps and (iii) number of parameters.
  • output: The tensor that contains the output (supervised learning) - this may be of type Nothing if the constructor is only called with one tensor (unsupervised learning).
  • input_dim: The dimension of the system, i.e. what is taken as input by a regular neural network.
  • input_time_steps: The length of the entire time series (length of the second axis).
  • n_params: The number of parameters that are present in the data set (length of third axis).
  • output_dim: The dimension of the output tensor (first axis). If output is of type Nothing, then this is also of type Nothing.
  • output_time_steps: The size of the second axis of the output tensor. If output is of type Nothing, then this is also of type Nothing.

The input and output fields of DataLoader

Even though the arguments to the Constructor may be vectors or matrices, internally DataLoader always stores tensors.

source
GeometricMachineLearning.DataLoaderMethod

Data Loader is a struct that creates an instance based on a tensor (or different input format) and is designed to make training convenient.

Constructor

The data loader can be called with various inputs:

  • A single vector: If the data loader is called with a single vector (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the second axis indicates parameter values and/or time steps and the system has a single degree of freedom (i.e. the system dimension is one).
  • A single matrix: If the data loader is called with a single matrix (and no other arguments are given), then this is interpreted as an autoencoder problem, i.e. the first axis is assumed to indicate the degrees of freedom of the system and the second axis indicates parameter values and/or time steps.
  • A single tensor: If the data loader is called with a single tensor, then this is interpreted as an integration problem with the second axis indicating the time step and the third one indicating the parameters.
  • A tensor and a vector: This is a special case (MNIST classification problem). For the MNIST problem for example the input are $n_p$ matrices (first input argument) and $n_p$ integers (second input argument).
  • A NamedTuple with fields q and p: The NamedTuple contains (i) two matrices or (ii) two tensors.
  • An EnsembleSolution: The EnsembleSolution typically comes from GeometricProblems.

When we supply a single vector or a single matrix as input to DataLoader and further set autoencoder = false (keyword argument), then the data are stored as an integration problem and the second axis is assumed to indicate time steps.

source
GeometricMachineLearning.GSympNetType

GSympNet is called with a single input argument, the system dimension, or with an instance of DataLoader. Optional input arguments are:

  • upscaling_dimension::Int: The upscaling dimension of the gradient layer. See the documentation for GradientLayerQ and GradientLayerP for further explanation. The default is 2*dim.
  • nhidden::Int: The number of hidden layers (i.e. layers that are not input or output layers). The default is 2.
  • activation: The activation function that is applied. By default this is tanh.
  • init_upper::Bool: Initialize the gradient layer so that it first modifies the $q$-component. The default is true.
source
GeometricMachineLearning.GlobalSectionType

This implements global sections for the Stiefel manifold and the Symplectic Stiefel manifold.

In practice this is implemented using Householder reflections, with the auxiliary column vectors given by $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)^T$, where the $1$ sits in the $i$-th spot for $i$ in $(n+1)$ to $N$ (or with random columns).

Maybe consider dividing the output in the check functions by n!

Implement a general global section here!!!! Tₓ𝔐 → G×𝔤 !!!!!! (think about random initialization!)

source
GeometricMachineLearning.GradientLayerPMethod

The gradient layer that changes the $p$ component. It is of the form:

\[\begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \nabla{}V & \mathbb{I} \end{bmatrix},\]

with $V(p) = \sum_{i=1}^Ma_i\Sigma(\sum_jk_{ij}p_j+b_i)$, where $\Sigma$ is the antiderivative of the activation function $\sigma$ (one-layer neural network). We refer to $M$ as the upscaling dimension. Such layers are by construction symplectic.

source
GeometricMachineLearning.GradientLayerQMethod

The gradient layer that changes the $q$ component. It is of the form:

\[\begin{bmatrix} \mathbb{I} & \nabla{}V \\ \mathbb{O} & \mathbb{I} \end{bmatrix},\]

with $V(p) = \sum_{i=1}^Ma_i\Sigma(\sum_jk_{ij}p_j+b_i)$, where $\Sigma$ is the antiderivative of the activation function $\sigma$ (one-layer neural network). We refer to $M$ as the upscaling dimension. Such layers are by construction symplectic.

source
GeometricMachineLearning.GradientLayerQMethod

The gradient layer that changes the $q$ component. It is of the form:

\[\begin{bmatrix} \mathbb{I} & \nabla{}V \\ \mathbb{O} & \mathbb{I} \end{bmatrix},\]

with $V(p) = \sum_{i=1}^Ma_i\Sigma(\sum_jk_{ij}p_j+b_i)$, where $\Sigma$ is the antiderivative of the activation function $\sigma$ (one-layer neural network). We refer to $M$ as the upscaling dimension. Such layers are by construction symplectic.
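
The following sketch spells out the potential and its gradient for one such layer (illustrative code, not the package's implementation, with $K\in\mathbb{R}^{M\times{}n}$ and $a, b\in\mathbb{R}^M$):

```julia
n, M = 2, 6                          # half phase-space dimension and upscaling dimension
K, a, b = randn(M, n), randn(M), randn(M)

σ = tanh
Σ = x -> log(cosh(x))                # an antiderivative of tanh

V(p)  = sum(a .* Σ.(K * p .+ b))     # V(p) = Σᵢ aᵢ Σ(Σⱼ kᵢⱼ pⱼ + bᵢ)
∇V(p) = K' * (a .* σ.(K * p .+ b))   # its gradient

q, p = randn(n), randn(n)
q_new, p_new = q + ∇V(p), p          # the GradientLayerQ update: only q changes
```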

source
GeometricMachineLearning.LASympNetType

LASympNet is called with a single input argument, the system dimension, or with an instance of DataLoader. Optional input arguments are:

  • depth::Int: The number of linear layers that are applied. The default is 5.
  • nhidden::Int: The number of hidden layers (i.e. layers that are not input or output layers). The default is 2.
  • activation: The activation function that is applied. By default this is tanh.
  • init_upper_linear::Bool: Initialize the linear layer so that it first modifies the $q$-component. The default is true.
  • init_upper_act::Bool: Initialize the activation layer so that it first modifies the $q$-component. The default is true.
source
GeometricMachineLearning.LinearLayerPMethod

Equivalent to a left multiplication by the matrix:

\[\begin{pmatrix} \mathbb{I} & \mathbb{O} \\ B & \mathbb{I} \end{pmatrix},\]

where $B$ is a symmetric matrix.

source
GeometricMachineLearning.LASympNetType

LASympNet is called with a single input argument, the system dimension, or with an instance of DataLoader. Optional input arguments are:

  • depth::Int: The number of linear layers that are applied. The default is 5.
  • nhidden::Int: The number of hidden layers (i.e. layers that are not input or output layers). The default is 2.
  • activation: The activation function that is applied. By default this is tanh.
  • init_upper_linear::Bool: Initialize the linear layer so that it first modifies the $q$-component. The default is true.
  • init_upper_act::Bool: Initialize the activation layer so that it first modifies the $q$-component. The default is true.
source
GeometricMachineLearning.LowerTriangularType

A lower-triangular matrix is an $n\times{}n$ matrix that has ones on the diagonal and zeros in the upper triangle.

The data are stored in a vector $S$ similarly to SkewSymMatrix.

The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.

source
GeometricMachineLearning.OptimizerType

Optimizer struct that stores the 'method' (i.e. Adam with corresponding hyperparameters), the cache and the optimization step.

It takes as input an optimization method and the parameters of a network.

For technical reasons we first specify an OptimizerMethod that stores all the hyperparameters of the optimizer.

source
GeometricMachineLearning.OptimizerMethod

A functor for Optimizer. It is called with:

  • nn::NeuralNetwork
  • dl::DataLoader
  • batch::Batch
  • n_epochs::Int
  • loss

The last argument is a function through which Zygote differentiates. This argument is optional; if it is not supplied GeometricMachineLearning defaults to an appropriate loss for the DataLoader.
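
A compact training sketch combining the pieces above. The GSympNet architecture, the DataLoader call and the Optimizer functor call are taken from the docstrings on this page; the NeuralNetwork constructor signature, the CPU() backend, the hyperparameter-free AdamOptimizer() call and the nn.params field access are assumptions that may differ between versions.

```julia
using GeometricMachineLearning

dl   = DataLoader(randn(2, 100, 20))        # placeholder data: (dim, time steps, parameters)
arch = GSympNet(2)                          # a SympNet architecture for a two-dimensional system
nn   = NeuralNetwork(arch, CPU(), Float64)  # assumed constructor signature (architecture, backend, type)

o = Optimizer(AdamOptimizer(), nn.params)   # an OptimizerMethod plus the network parameters
average_loss = o(nn, dl, Batch(16), 100)    # train for 100 epochs with batch size 16
```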

source
GeometricMachineLearning.PSDLayerType

This is a PSD-like layer used for symplectic autoencoders. One layer has the following shape:

\[A = \begin{bmatrix} \Phi & \mathbb{O} \\ \mathbb{O} & \Phi \end{bmatrix},\]

where $\Phi$ is an element of the Stiefel manifold $St(n, N)$.

The constructor of PSDLayer is called by PSDLayer(M, N; retraction=retraction):

  • M is the input dimension.
  • N is the output dimension.
  • retraction is an instance of a struct with supertype AbstractRetraction. The only options at the moment are Geodesic() and Cayley().
source
GeometricMachineLearning.ReducedSystemType

ReducedSystem computes the reconstructed dynamics in the full system based on the reduced one. Optionally it can be compared to the FOM solution.

It can be called using the following constructor: ReducedSystem(N, n, encoder, decoder, fullvectorfield, reducedvectorfield, params, tspan, tstep, ics, projection_error) where

  • encoder: a function $\mathbb{R}^{2N}\mapsto{}\mathbb{R}^{2n}$
  • decoder: a (differentiable) function $\mathbb{R}^{2n}\mapsto\mathbb{R}^{2N}$
  • fullvectorfield: a (differentiable) mapping defined the same way as in GeometricIntegrators
  • reducedvectorfield: a (differentiable) mapping defined the same way as in GeometricIntegrators
  • params: a NamedTuple that parametrizes the vector fields (the same for fullvectorfield and reducedvectorfield)
  • tspan: a tuple (t₀, tₗ) that specifies start and end point of the time interval over which integration is performed.
  • tstep: the time step
  • ics: the initial condition for the big system.
  • projection_error: the error $||M - \mathcal{R}\circ\mathcal{P}(M)||$ where $M$ is the snapshot matrix; $\mathcal{P}$ and $\mathcal{R}$ are the reduction and reconstruction respectively.
source
GeometricMachineLearning.RegularTransformerIntegratorType

The regular transformer used as an integrator (multi-step method).

The constructor is called with the following arguments:

  • sys_dim::Int
  • transformer_dim::Int: the default is transformer_dim = sys_dim.
  • n_blocks::Int: The default is 1.
  • n_heads::Int: the number of heads in the multihead attention layer (default is n_heads = sys_dim)
  • L::Int: the number of transformer blocks (default is L = 2).
  • upscaling_activation: by default identity
  • resnet_activation: by default tanh
  • add_connection::Bool=true (keyword argument): if the input should be added to the output.
source
GeometricMachineLearning.SkewSymMatrixType

A SkewSymMatrix is a matrix $A$ s.t. $A^T = -A$.

If the constructor is called with a matrix as input it returns a skew-symmetric matrix via the projection $A \mapsto \frac{1}{2}(A - A^T)$. This is a projection defined via the canonical metric $\mathbb{R}^{n\times{}n}\times\mathbb{R}^{n\times{}n}\to\mathbb{R}, (A,B) \mapsto \mathrm{Tr}(A^TB)$.

The first index is the row index, the second one the column index.

The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.

source
GeometricMachineLearning.StiefelLieAlgHorMatrixType

StiefelLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric). The projection here is $\pi: S \to SE$ where

\[E = \begin{pmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{pmatrix}.\]

The matrix $E$ is implemented under StiefelProjection in GeometricMachineLearning.

An element of StiefelLieAlgHorMatrix takes the form:

\[\begin{pmatrix} A & B^T \\ B & \mathbb{O} \end{pmatrix},\]

where $A$ is skew-symmetric (this is SkewSymMatrix in GeometricMachineLearning).

source
GeometricMachineLearning.LowerTriangularType

A lower-triangular matrix is an $n\times{}n$ matrix that has ones on the diagonal and zeros in the upper triangle.

The data are stored in a vector $S$ similarly to SkewSymMatrix.

The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.

source
GeometricMachineLearning.OptimizerType

Optimizer struct that stores the 'method' (i.e. Adam with corresponding hyperparameters), the cache and the optimization step.

It takes as input an optimization method and the parameters of a network.

For technical reasons we first specify an OptimizerMethod that stores all the hyperparameters of the optimizer.

source
GeometricMachineLearning.OptimizerMethod

A functor for Optimizer. It is called with:

  • nn::NeuralNetwork
  • dl::DataLoader
  • batch::Batch
  • n_epochs::Int
  • loss

The last argument is a function through which Zygote differentiates. This argument is optional; if it is not supplied GeometricMachineLearning defaults to an appropriate loss for the DataLoader.

source
GeometricMachineLearning.PSDLayerType

This is a PSD-like layer used for symplectic autoencoders. One layer has the following shape:

\[A = \begin{bmatrix} \Phi & \mathbb{O} \\ \mathbb{O} & \Phi \end{bmatrix},\]

where $\Phi$ is an element of the Stiefel manifold $St(n, N)$.

The constructor of PSDLayer is called by PSDLayer(M, N; retraction=retraction):

  • M is the input dimension.
  • N is the output dimension.
  • retraction is an instance of a struct with supertype AbstractRetraction. The only options at the moment are Geodesic() and Cayley().
source
GeometricMachineLearning.ReducedSystemType

ReducedSystem computes the reconstructed dynamics in the full system based on the reduced one. Optionally it can be compared to the FOM solution.

It can be called using the following constructor: ReducedSystem(N, n, encoder, decoder, fullvectorfield, reducedvectorfield, params, tspan, tstep, ics, projection_error) where

  • encoder: a function $\mathbb{R}^{2N}\mapsto{}\mathbb{R}^{2n}$
  • decoder: a (differentiable) function $\mathbb{R}^{2n}\mapsto\mathbb{R}^{2N}$
  • fullvectorfield: a (differentiable) mapping defined the same way as in GeometricIntegrators
  • reducedvectorfield: a (differentiable) mapping defined the same way as in GeometricIntegrators
  • params: a NamedTuple that parametrizes the vector fields (the same for fullvectorfield and reducedvectorfield)
  • tspan: a tuple (t₀, tₗ) that specifies start and end point of the time interval over which integration is performed.
  • tstep: the time step
  • ics: the initial condition for the big system.
  • projection_error: the error $||M - \mathcal{R}\circ\mathcal{P}(M)||$ where $M$ is the snapshot matrix; $\mathcal{P}$ and $\mathcal{R}$ are the reduction and reconstruction respectively.
source
GeometricMachineLearning.RegularTransformerIntegratorType

The regular transformer used as an integrator (multi-step method).

The constructor is called with the following arguments:

  • sys_dim::Int
  • transformer_dim::Int: the default is transformer_dim = sys_dim.
  • n_blocks::Int: The default is 1.
  • n_heads::Int: the number of heads in the multihead attention layer (default is n_heads = sys_dim)
  • L::Int: the number of transformer blocks (default is L = 2).
  • upscaling_activation: by default identity
  • resnet_activation: by default tanh
  • add_connection::Bool=true (keyword argument): if the input should be added to the output.
source
GeometricMachineLearning.SkewSymMatrixType

A SkewSymMatrix is a matrix $A$ s.t. $A^T = -A$.

If the constructor is called with a matrix as input it returns a skew-symmetric matrix via the projection $A \mapsto \frac{1}{2}(A - A^T)$. This is a projection defined via the canonical metric $\mathbb{R}^{n\times{}n}\times\mathbb{R}^{n\times{}n}\to\mathbb{R}, (A,B) \mapsto \mathrm{Tr}(A^TB)$.

The first index is the row index, the second one the column index.

The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.

source
GeometricMachineLearning.StiefelLieAlgHorMatrixType

StiefelLieAlgHorMatrix is the horizontal component of the Lie algebra of skew-symmetric matrices (with respect to the canonical metric). The projection here is $\pi: S \to SE$ where

\[E = \begin{pmatrix} \mathbb{I}_{n} \\ \mathbb{O}_{(N-n)\times{}n} \end{pmatrix}.\]

The matrix $E$ is implemented under StiefelProjection in GeometricMachineLearning.

An element of StiefelLieAlgHorMatrix takes the form:

\[\begin{pmatrix} A & B^T \\ B & \mathbb{O} \end{pmatrix},\]

where $A$ is skew-symmetric (this is SkewSymMatrix in GeometricMachineLearning).

If the constructor is called with a big $N\times{}N$ matrix, then the projection is performed the following way:

\[\begin{pmatrix} A & B_1 \\ B_2 & C \end{pmatrix} \mapsto \begin{pmatrix} \mathrm{skew}(A) & -B_2^T \\ B_2 & \mathbb{O} \end{pmatrix}.\]

The operation $\mathrm{skew}:\mathbb{R}^{n\times{}n}\to\mathcal{S}_\mathrm{skew}(n)$ is the skew-symmetrization operation. This is equivalent to calling the constructor of SkewSymMatrix with an $n\times{}n$ matrix.

source
GeometricMachineLearning.StiefelProjectionType

An array that essentially does vcat(I(n), zeros(N-n, n)) with GPU support. It has three inner constructors. The first one is called with the following arguments:

  1. backend: backends as supported by KernelAbstractions.
  2. T::Type
  3. N::Integer
  4. n::Integer

The second constructor is called by supplying a matrix as input. The constructor will then extract the backend, the type and the dimensions of that matrix.

The third constructor is called by supplying an instance of StiefelLieAlgHorMatrix.

Technically this should be a subtype of StiefelManifold.

source
GeometricMachineLearning.SymmetricMatrixType

A SymmetricMatrix $A$ is a matrix with $A^T = A$.

If the constructor is called with a matrix as input it returns a symmetric matrix via the projection:

\[A \mapsto \frac{1}{2}(A + A^T).\]

This is a projection defined via the canonical metric $(A,B) \mapsto \mathrm{tr}(A^TB)$.

Internally the struct saves a vector $S$ of size $n(n+1)\div2$. The conversion is done the following way:

\[[A]_{ij} = \begin{cases} S[( (i-1) i ) \div 2 + j] & \text{if $i\geq{}j$}\\ S[( (j-1) j ) \div 2 + i] & \text{else}. \end{cases}\]

So $S$ stores a string of vectors taken from $A$: $S = [\tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_n]$ with $\tilde{a}_i = [[A]_{i1},[A]_{i2},\ldots,[A]_{ii}]$.

source
GeometricMachineLearning.SympNetLayerType

Implements the various layers from the SympNet paper: (https://www.sciencedirect.com/science/article/abs/pii/S0893608020303063). This is a super type of Gradient, Activation and Linear.

For the linear layer, the activation and the bias are left out, and for the activation layer $K$ and $b$ are left out!

source
GeometricMachineLearning.SymplecticPotentialType

SymplecticPotential(n)

Returns a symplectic matrix of size $2n\times{}2n$:

\[\begin{pmatrix} \mathbb{O} & \mathbb{I} \\ -\mathbb{I} & \mathbb{O} \end{pmatrix}.\]

The operation $\mathrm{skew}:\mathbb{R}^{n\times{}n}\to\mathcal{S}_\mathrm{skew}(n)$ is the skew-symmetrization operation. This is equivalent to calling the constructor of SkewSymMatrix with an $n\times{}n$ matrix.

source
GeometricMachineLearning.StiefelProjectionType

An array that essentially does vcat(I(n), zeros(N-n, n)) with GPU support. It has three inner constructors. The first one is called with the following arguments:

  1. backend: backends as supported by KernelAbstractions.
  2. T::Type
  3. N::Integer
  4. n::Integer

The second constructor is called by supplying a matrix as input. The constructor will then extract the backend, the type and the dimensions of that matrix.

The third constructor is called by supplying an instance of StiefelLieAlgHorMatrix.

Technically this should be a subtype of StiefelManifold.
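
In plain-array terms (ignoring the GPU/backend machinery), the object represents the following matrix:

```julia
using LinearAlgebra

N, n = 5, 2
E = vcat(Matrix{Float64}(I, n, n), zeros(N - n, n))   # vcat(I(n), zeros(N-n, n)); columns are orthonormal
```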

source
GeometricMachineLearning.SymmetricMatrixType

A SymmetricMatrix $A$ is a matrix with $A^T = A$.

If the constructor is called with a matrix as input it returns a symmetric matrix via the projection:

\[A \mapsto \frac{1}{2}(A + A^T).\]

This is a projection defined via the canonical metric $(A,B) \mapsto \mathrm{tr}(A^TB)$.

Internally the struct saves a vector $S$ of size $n(n+1)\div2$. The conversion is done the following way:

\[[A]_{ij} = \begin{cases} S[( (i-1) i ) \div 2 + j] & \text{if $i\geq{}j$}\\ + S[( (j-1) j ) \div 2 + i] & \text{else}. \end{cases}\]

So $S$ stores a string of vectors taken from $A$: $S = [\tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_n]$ with $\tilde{a}_i = [[A]_{i1},[A]_{i2},\ldots,[A]_{ii}]$.
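
The indexing rule can be checked with a small illustrative snippet (this mirrors the formula above and is not the package's implementation):

```julia
n = 3
A = [1.0 2.0 4.0; 2.0 3.0 5.0; 4.0 5.0 6.0]                   # a symmetric 3×3 matrix
S = [A[i, j] for i in 1:n for j in 1:i]                       # S = [ã₁, ã₂, ã₃], length n(n+1)÷2

entry(S, i, j) = i ≥ j ? S[(i - 1) * i ÷ 2 + j] : S[(j - 1) * j ÷ 2 + i]

@assert all(entry(S, i, j) == A[i, j] for i in 1:n, j in 1:n)
```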

source
GeometricMachineLearning.SympNetLayerType

Implements the various layers from the SympNet paper: (https://www.sciencedirect.com/science/article/abs/pii/S0893608020303063). This is a super type of Gradient, Activation and Linear.

For the linear layer, the activation and the bias are left out, and for the activation layer $K$ and $b$ are left out!

source
GeometricMachineLearning.UpperTriangularType

An upper-triangular matrix is an $n\times{}n$ matrix that has ones on the diagonal and zeros in the lower triangle.

The data are stored in a vector $S$ similarly to SkewSymMatrix.

The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.

source
GeometricMachineLearning.VolumePreservingAttentionType

Volume-preserving attention (single head attention)

Drawbacks:

  • the super fast activation is only implemented for sequence lengths of 2, 3, 4 and 5.
  • other sequence lengths only work on CPU for now (lu decomposition has to be implemented to work for tensors in parallel).

Constructor

The constructor is called with:

  • dim::Int: The system dimension
  • seq_length::Int: The sequence length to be considered. The default is zero, i.e. arbitrary sequence lengths; this works for all sequence lengths but doesn't apply the super-fast activation.
  • skew_sym::Bool (keyword argument): specifies whether the weight matrix is skew-symmetric or arbitrary (the default is false).

Functor

Applying a layer of type VolumePreservingAttention does the following:

  • First we perform the operation $X \mapsto X^T A X =: C$, where $X\in\mathbb{R}^{N\times\mathtt{seq\_length}}$ is a matrix containing time series data and $A$ is the skew-symmetric matrix associated with the layer.
  • In a second step we compute the Cayley transform of $C$; $\Lambda = \mathrm{Cayley}(C)$.
  • The output of the layer is then $X\Lambda$.
source
GeometricMachineLearning.VolumePreservingFeedForwardType

Realizes a volume-preserving neural network as a combination of VolumePreservingLowerLayer and VolumePreservingUpperLayer.

Constructor

The constructor is called with the following arguments:

  • sys_dim::Int: The system dimension.
  • n_blocks::Int: The number of blocks in the neural network (containing linear layers and nonlinear layers). Default is 1.
  • n_linear::Int: The number of linear VolumePreservingLowerLayers and VolumePreservingUpperLayers in one block. Default is 1.
  • activation: The activation function for the nonlinear layers in a block.
  • init_upper::Bool=false (keyword argument): Specifies if the first layer is lower or upper.
source
GeometricMachineLearning.VolumePreservingFeedForwardLayerType

Super-type of VolumePreservingLowerLayer and VolumePreservingUpperLayer. The layers do the following:

\[x \mapsto \begin{cases} \sigma(Lx + b) & \text{where $L$ is }\mathtt{LowerTriangular} \\ \sigma(Ux + b) & \text{where $U$ is }\mathtt{UpperTriangular}. \end{cases}\]

The functor can be applied to a vector, a matrix or a tensor.

Constructor

The constructors are called with:

  • sys_dim::Int: the system dimension.
  • activation=tanh: the activation function.
  • include_bias::Bool=true (keyword argument): specifies whether a bias should be used.
source
AbstractNeuralNetworks.update!Method

Optimization for an entire neural network with BFGS. What is different in this case is that we still have to initialize the cache.

If o.step == 1, then we initialize the cache.

source
Base.iterateMethod

This function computes a trajectory for a Transformer that has already been trained, for evaluation purposes.

It takes as input:

  • nn: a NeuralNetwork (that has been trained).
  • ics: initial conditions (a matrix in $\mathbb{R}^{2n\times\mathtt{seq\_length}}$ or NamedTuple of two matrices in $\mathbb{R}^{n\times\mathtt{seq\_length}}$)
  • n_points::Int=100 (keyword argument): The number of steps for which we run the prediction.
  • prediction_window::Int=size(ics.q, 2): The prediction window (i.e. the number of steps we predict into the future) is equal to the sequence length (i.e. the number of input time steps) by default.
source
Base.iterateMethod

This function computes a trajectory for a SympNet that has already been trained, for evaluation purposes.

It takes as input:

  • nn: a NeuralNetwork (that has been trained).
  • ics: initial conditions (a NamedTuple of two vectors)
source
Base.vecMethod

If vec is applied onto Triangular, then the output is the associated vector.

source
Base.vecMethod

If vec is applied onto SkewSymMatrix, then the output is the associated vector.

source
GeometricMachineLearning.GradientFunction

This is an old constructor and will be deprecated. For change_q=true it is equivalent to GradientLayerQ; for change_q=false it is equivalent to GradientLayerP.

If full_grad=false then ActivationLayer is called

source
GeometricMachineLearning.TransformerMethod

The architecture for a "transformer encoder" is essentially taken from arXiv:2010.11929, but with the difference that no layer normalization is employed. This is because we still need to find a generalization of layer normalization to manifolds.

The transformer is called with the following inputs:

  • dim: the dimension of the transformer
  • n_heads: the number of heads
  • L: the number of transformer blocks

In addition we have the following optional arguments:

  • activation: the activation function used for the ResNet (tanh by default)
  • Stiefel::Bool: if the matrices $P^V$, $P^Q$ and $P^K$ should live on a manifold (false by default)
  • retraction: which retraction should be used (Geodesic() by default)
  • add_connection::Bool: if the input should be added to the output after the MultiHeadAttention layer is used (true by default)
  • use_bias::Bool: If the ResNet should use a bias (true by default)
source
GeometricMachineLearning.accuracyMethod

Computes the accuracy (as opposed to the loss) of a neural network classifier.

It takes as input:

  • model::Chain
  • ps: parameters of the network
  • dl::DataLoader
source
GeometricMachineLearning.apply_layer_to_nt_and_return_arrayMethod

This function is used in the wrappers where the input to the SympNet layers is not a NamedTuple (as it should be) but an AbstractArray.

It converts the Array to a NamedTuple (via assign_q_and_p), then calls the SympNet routine(s) and converts back to an AbstractArray (with vcat).

source
GeometricMachineLearning.assign_batch_kernel!Method

Takes as input a batch tensor (to which the data are assigned), the whole data tensor and two vectors params and time_steps that include the specific parameters and time steps we want to assign.

Note that this assigns sequential data, e.g. for being processed by a transformer.

source
GeometricMachineLearning.assign_output_estimateMethod

The function assign_output_estimate is closely related to the transformer. It takes the last prediction_window columns of the output and uses them for the final prediction, i.e.

\[\mathbb{R}^{N\times\mathtt{pw}}\to\mathbb{R}^{N\times\mathtt{pw}},
\begin{bmatrix} z^{(1)}_1 & \cdots & z^{(T)}_1 \\ \cdots & \cdots & \cdots \\ z^{(1)}_n & \cdots & z^{(T)}_n \end{bmatrix} \mapsto
\begin{bmatrix} z^{(T - \mathtt{pw})}_1 & \cdots & z^{(T)}_1 \\ \cdots & \cdots & \cdots \\ z^{(T - \mathtt{pw})}_n & \cdots & z^{(T)}_n \end{bmatrix}\]

source
GeometricMachineLearning.UpperTriangularType

An upper-triangular matrix is an $n\times{}n$ matrix that has ones on the diagonal and zeros in the lower triangle.

The data are stored in a vector $S$ similarly to SkewSymMatrix.

The struct has two fields: S and n. The first stores all the entries of the matrix in a sparse fashion (in a vector) and the second is the dimension $n$ for $A\in\mathbb{R}^{n\times{}n}$.

source
GeometricMachineLearning.VolumePreservingAttentionType

Volume-preserving attention (single head attention)

Drawbacks:

  • the super fast activation is only implemented for sequence lengths of 2, 3, 4 and 5.
  • other sequence lengths only work on CPU for now (lu decomposition has to be implemented to work for tensors in parallel).

Constructor

The constructor is called with:

  • dim::Int: The system dimension
  • seq_length::Int: The sequence length to be considered. The default is zero, i.e. arbitrary sequence lengths; this works for all sequence lengths but doesn't apply the super-fast activation.
  • skew_sym::Bool (keyword argument): specifies whether the weight matrix is skew-symmetric or arbitrary (the default is false).

Functor

Applying a layer of type VolumePreservingAttention does the following:

  • First we perform the operation $X \mapsto X^T A X =: C$, where $X\in\mathbb{R}^{N\times\mathtt{seq\_length}}$ is a matrix containing time series data and $A$ is the skew-symmetric matrix associated with the layer.
  • In a second step we compute the Cayley transform of $C$; $\Lambda = \mathrm{Cayley}(C)$.
  • The output of the layer is then $X\Lambda$.
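
A rough sketch of these three steps with plain matrices (the Cayley transform convention and the random weight matrix here are illustrative; the package's optimized implementation differs):

```julia
using LinearAlgebra

cayley(C) = inv(I - C) * (I + C)      # one common convention for the Cayley transform

N, seq_length = 4, 3
A = randn(N, N); A = (A - A') / 2     # a skew-symmetric weight matrix
X = randn(N, seq_length)              # input time series

C = X' * A * X                        # step 1: seq_length × seq_length correlations
Λ = cayley(C)                         # step 2: Cayley transform of C
Y = X * Λ                             # step 3: output of the layer; det(Λ) = 1 since C is skew-symmetric
```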
source
GeometricMachineLearning.VolumePreservingFeedForwardType

Realizes a volume-preserving neural network as a combination of VolumePreservingLowerLayer and VolumePreservingUpperLayer.

Constructor

The constructor is called with the following arguments:

  • sys_dim::Int: The system dimension.
  • n_blocks::Int: The number of blocks in the neural network (containing linear layers and nonlinear layers). Default is 1.
  • n_linear::Int: The number of linear VolumePreservingLowerLayers and VolumePreservingUpperLayers in one block. Default is 1.
  • activation: The activation function for the nonlinear layers in a block.
  • init_upper::Bool=false (keyword argument): Specifies if the first layer is lower or upper.
source
GeometricMachineLearning.VolumePreservingFeedForwardLayerType

Super-type of VolumePreservingLowerLayer and VolumePreservingUpperLayer. The layers do the following:

\[x \mapsto \begin{cases} \sigma(Lx + b) & \text{where $L$ is }\mathtt{LowerTriangular} \\ \sigma(Ux + b) & \text{where $U$ is }\mathtt{UpperTriangular}. \end{cases}\]

The functor can be applied to a vector, a matrix or a tensor.

Constructor

The constructors are called with:

  • sys_dim::Int: the system dimension.
  • activation=tanh: the activation function.
  • include_bias::Bool=true (keyword argument): specifies whether a bias should be used.
source
AbstractNeuralNetworks.update!Method

Optimization for an entire neural network with BFGS. What is different in this case is that we still have to initialize the cache.

If o.step == 1, then we initialize the cache.

source
Base.iterateMethod

This function computes a trajectory for a Transformer that has already been trained, for evaluation purposes.

It takes as input:

  • nn: a NeuralNetwork (that has been trained).
  • ics: initial conditions (a matrix in $\mathbb{R}^{2n\times\mathtt{seq\_length}}$ or NamedTuple of two matrices in $\mathbb{R}^{n\times\mathtt{seq\_length}}$)
  • n_points::Int=100 (keyword argument): The number of steps for which we run the prediction.
  • prediction_window::Int=size(ics.q, 2): The prediction window (i.e. the number of steps we predict into the future) is equal to the sequence length (i.e. the number of input time steps) by default.
source
Base.iterateMethod

This function computes a trajectory for a SympNet that has already been trained, for evaluation purposes.

It takes as input:

  • nn: a NeuralNetwork (that has been trained).
  • ics: initial conditions (a NamedTuple of two vectors)
source
Base.vecMethod

If vec is applied onto Triangular, then the output is the associated vector.

source
Base.vecMethod

If vec is applied onto SkewSymMatrix, then the output is the associated vector.

source
GeometricMachineLearning.GradientFunction

This is an old constructor and will be deprecated. For change_q=true it is equivalent to GradientLayerQ; for change_q=false it is equivalent to GradientLayerP.

If full_grad=false then ActivationLayer is called

source
GeometricMachineLearning.TransformerMethod

The architecture for a "transformer encoder" is essentially taken from arXiv:2010.11929, but with the difference that no layer normalization is employed. This is because we still need to find a generalization of layer normalization to manifolds.

The transformer is called with the following inputs:

  • dim: the dimension of the transformer
  • n_heads: the number of heads
  • L: the number of transformer blocks

In addition we have the following optional arguments:

  • activation: the activation function used for the ResNet (tanh by default)
  • Stiefel::Bool: if the matrices $P^V$, $P^Q$ and $P^K$ should live on a manifold (false by default)
  • retraction: which retraction should be used (Geodesic() by default)
  • add_connection::Bool: if the input should be added to the output after the MultiHeadAttention layer is used (true by default)
  • use_bias::Bool: If the ResNet should use a bias (true by default)
source
GeometricMachineLearning.accuracyMethod

Computes the accuracy (as opposed to the loss) of a neural network classifier.

It takes as input:

  • model::Chain
  • ps: parameters of the network
  • dl::DataLoader
source
GeometricMachineLearning.apply_layer_to_nt_and_return_arrayMethod

This function is used in the wrappers where the input to the SympNet layers is not a NamedTuple (as it should be) but an AbstractArray.

It converts the Array to a NamedTuple (via assign_q_and_p), then calls the SympNet routine(s) and converts back to an AbstractArray (with vcat).

source
GeometricMachineLearning.assign_batch_kernel!Method

Takes as input a batch tensor (to which the data are assigned), the whole data tensor and two vectors params and time_steps that include the specific parameters and time steps we want to assign.

Note that this assigns sequential data, e.g. for being processed by a transformer.

source
GeometricMachineLearning.assign_output_estimateMethod

The function assign_output_estimate is closely related to the transformer. It takes the last prediction_window columns of the output and uses them for the final prediction, i.e.

\[\mathbb{R}^{N\times\mathtt{pw}}\to\mathbb{R}^{N\times\mathtt{pw}},
\begin{bmatrix} z^{(1)}_1 & \cdots & z^{(T)}_1 \\ \cdots & \cdots & \cdots \\ z^{(1)}_n & \cdots & z^{(T)}_n \end{bmatrix} \mapsto
\begin{bmatrix} z^{(T - \mathtt{pw})}_1 & \cdots & z^{(T)}_1 \\ \cdots & \cdots & \cdots \\ z^{(T - \mathtt{pw})}_n & \cdots & z^{(T)}_n \end{bmatrix}\]

source
GeometricMachineLearning.assign_q_and_pMethod

Allocates two new arrays q and p whose first dimension is half of that of the input x. This should also be supplied through the second argument N.

The output is a Tuple containing q and p.

source
GeometricMachineLearning.init_optimizer_cacheMethod

Wrapper for the functions setup_adam_cache, setup_momentum_cache, setup_gradient_cache, setup_bfgs_cache. These appear outside of optimizer_caches.jl because the OptimizerMethods first have to be defined.

source
GeometricMachineLearning.lossMethod

Wrapper if we deal with a neural network.

You can supply an instance of NeuralNetwork instead of the two arguments model (of type Union{Chain, AbstractExplicitLayer}) and parameters (of type Union{Tuple, NamedTuple}).

source
GeometricMachineLearning.lossMethod

Computes the loss for a neural network and a data set. The computed loss is

\[||output - \mathcal{NN}(input)||_F/||output||_F,\]

where $||A||_F := \sqrt{\sum_{i_1,\ldots,i_k}|a_{i_1,\ldots,i_k}|^2}$ is the Frobenius norm.

It takes as input:

  • model::Union{Chain, AbstractExplicitLayer}
  • ps::Union{Tuple, NamedTuple}
  • input::Union{Array, NamedTuple}
  • output::Union{Array, NamedTuple}
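
Written out by hand for plain arrays, the computed quantity is simply the relative Frobenius error (the names here are placeholders; `prediction` stands in for $\mathcal{NN}(input)$):

```julia
using LinearAlgebra

output     = randn(4, 10)
prediction = output + 0.01 * randn(4, 10)

relative_error = norm(output - prediction) / norm(output)   # ‖output − 𝒩𝒩(input)‖_F / ‖output‖_F
```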
source
GeometricMachineLearning.lossMethod

The autoencoder loss:

\[||output - \mathcal{NN}(input)||_F/||output||_F.\]

It takes as input:

  • model::Union{Chain, AbstractExplicitLayer}
  • ps::Union{Tuple, NamedTuple}
  • input::Union{Array, NamedTuple}
source
GeometricMachineLearning.metricMethod

Implements the canonical Riemannian metric for the Stiefel manifold:

\[g_Y: (\Delta_1, \Delta_2) \mapsto \mathrm{tr}(\Delta_1^T(\mathbb{I} - \frac{1}{2}YY^T)\Delta_2).\]

It is called with:

  • Y::StiefelManifold
  • Δ₁::AbstractMatrix
  • Δ₂::AbstractMatrix
source
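The formula above written out as a plain matrix expression, operating on the matrix representation of Y (a sketch, not the package's metric method):

using LinearAlgebra
canonical_metric(Y::AbstractMatrix, Δ₁::AbstractMatrix, Δ₂::AbstractMatrix) =
    tr(Δ₁' * (I - Y * Y' / 2) * Δ₂)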
GeometricMachineLearning.onehotbatchMethod

One-hot-batch encoding of a vector of integers: $input\in\{0,1,\ldots,9\}^\ell$. The output is a tensor of shape $10\times1\times\ell$.

\[0 \mapsto \begin{bmatrix} 1 & 0 & \ldots & 0 \end{bmatrix}.\]

In more abstract terms: $i \mapsto e_i$.

source
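A sketch of the described encoding: digits 0–9 map to rows 1–10 of a 10×1×ℓ tensor (not the package implementation; the function name is illustrative).

function onehot_sketch(input::AbstractVector{<:Integer})
    out = zeros(Float32, 10, 1, length(input))
    for (k, i) in enumerate(input)
        out[i + 1, 1, k] = 1     # 0 ↦ e₁, 1 ↦ e₂, …, 9 ↦ e₁₀
    end
    out
end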
GeometricMachineLearning.optimization_step!Method

Optimization for a single layer.

inputs:

  • o::Optimizer
  • d::Union{AbstractExplicitLayer, AbstractExplicitCell}
  • ps::NamedTuple: the parameters
  • C::NamedTuple: NamedTuple of the caches
  • dx::NamedTuple: NamedTuple of the derivatives (output of AD routine)

ps, C and dx must have the same keys.

source
GeometricMachineLearning.optimize_for_one_epoch!Method

Optimize for an entire epoch. For this you have to supply:

  • an instance of the optimizer.
  • the neural network model
  • the parameters of the model
  • the data (in form of DataLoader)
  • an instance of Batch that contains batch_size (and optionally seq_length)

With the optional argument:

  • the loss, which takes the model, the parameters ps and an instance of DataLoader as input.

The output of optimize_for_one_epoch! is the average loss over all batches of the epoch:

\[output = \frac{1}{\mathtt{steps\_per\_epoch}}\sum_{t=1}^\mathtt{steps\_per\_epoch}loss(\theta^{(t-1)}).\]

The loss values are obtained at no extra cost because any reverse differentiation routine returns two outputs: a pullback and the value of the function it is differentiating. In the case of Zygote: loss_value, pullback = Zygote.pullback(ps -> loss(ps), ps) (if the loss only depends on the parameters).

source
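The per-epoch average described above as a hedged sketch; the batches iterable and the omitted optimizer step are placeholders, only the Zygote.pullback call mirrors the text.

using Zygote

function epoch_average_loss(loss, ps, batches)
    total = 0.0
    for batch in batches
        loss_value, pullback = Zygote.pullback(p -> loss(p, batch), ps)
        dp = pullback(one(loss_value))[1]    # gradient with respect to the parameters
        # ... an optimization step using dp would go here ...
        total += loss_value
    end
    total / length(batches)
end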
GeometricMachineLearning.rgradMethod

Computes the Riemannian gradient for the Stiefel manifold given an element $Y\in{}St(N,n)$ and a matrix $\nabla{}L\in\mathbb{R}^{N\times{}n}$ (the Euclidean gradient). It computes the Riemannian gradient with respect to the canonical metric (see the documentation for the function metric for an explanation of this). The precise form of the mapping is:

\[\mathtt{rgrad}(Y, \nabla{}L) \mapsto \nabla{}L - Y(\nabla{}L)^TY\]

It is called with inputs:

  • Y::StiefelManifold
  • e_grad::AbstractMatrix: the Euclidean gradient (i.e. what was called $\nabla{}L$ above).
source
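The mapping above as a one-line matrix formula (a sketch of the formula, not the package's rgrad method):

riemannian_gradient(Y::AbstractMatrix, e_grad::AbstractMatrix) = e_grad - Y * (e_grad' * Y)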
GeometricMachineLearning.split_and_flattenMethod

split_and_flatten takes a tensor as input and produces another one as output (essentially rearranges the input data in an intricate way) so that it can easily be processed with a transformer.

The optional arguments are:

  • patch_length: by default this is 7.
  • number_of_patches: by default this is 16.
source
GeometricMachineLearning.tensor_mat_skew_sym_assignMethod

Takes as input:

  • Z::AbstractArray{T, 3}: A tensor that stores a bunch of time series.
  • A::AbstractMatrix: A matrix that is used to perform various scalar products.

For one of these time series the function performs the following computation:

\[ (z^{(i)}, z^{(j)}) \mapsto (z^{(i)})^TAz^{(j)} \text{ for } i > j.\]

The result of this is $n(n-1)\div2$ scalar products. These scalar products are written into a lower-triangular matrix and the final output of the function is a tensor of these lower-triangular matrices.

source
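A sketch of the computation for a single time series stored as the columns of a matrix Z; the package version works on the whole 3-tensor at once, and the names here are illustrative.

function skew_sym_products(Z::AbstractMatrix, A::AbstractMatrix)
    n = size(Z, 2)
    out = zeros(eltype(Z), n, n)
    for i in 2:n, j in 1:(i - 1)
        out[i, j] = Z[:, i]' * A * Z[:, j]   # (z⁽ⁱ⁾)ᵀ A z⁽ʲ⁾ for i > j
    end
    out                                      # strictly lower-triangular matrix
end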
GeometricMachineLearning.train!Function
train!(...)

Perform the training of a neural network on data using a given training method.

Different ways of use:

train!(neuralnetwork, data, optimizer = GradientOptimizer(1e-2), training_method; nruns = 1000, batch_size = default(data, type), showprogress = false )

Arguments

  • neuralnetwork::LuxNeuralNetwork : the neural network using LuxBackend
  • data : the data (see TrainingData)
  • optimizer = GradientOptimizer: the optimization method (see Optimizer)
  • training_method : specifies the loss function used
  • nruns : the number of iterations through the process (default: 1000)
  • batch_size : the size of the data batch used for each step
source
GeometricMachineLearning.train!Method
train!(neuralnetwork, data, optimizer, training_method; nruns = 1000, batch_size, showprogress = false )

Arguments

  • neuralnetwork::LuxNeuralNetwork : the neural network using LuxBackend
  • data::AbstractTrainingData : the data
source
GeometricMachineLearning.transformer_lossMethod

The transformer loss works similarly to the regular loss, but with the difference that $\mathcal{NN}(input)$ and $output$ may have different sizes.

It takes as input:

  • model::Union{Chain, AbstractExplicitLayer}
  • ps::Union{Tuple, NamedTuple}
  • input::Union{Array, NamedTuple}
  • output::Union{Array, NamedTuple}
source
diff --git a/latest/manifolds/basic_topology/index.html b/latest/manifolds/basic_topology/index.html index 5ef3fcc16..651243d90 100644 --- a/latest/manifolds/basic_topology/index.html +++ b/latest/manifolds/basic_topology/index.html @@ -1,2 +1,2 @@ -Concepts from General Topology · GeometricMachineLearning.jl

Basic Concepts of General Topology

On this page we discuss basic notions of topology that are necessary to define and work with manifolds. Here we largely omit concrete examples and only define concepts that are necessary for defining a manifold[1], namely the properties of being Hausdorff and second countable. For a wide range of examples and a detailed discussion of the theory see e.g. [5]. The theory presented here is also covered, at least in rudimentary form, in most differential geometry books such as [6] and [7].

Definition: A topological space is a set $\mathcal{M}$ for which we define a collection of subsets of $\mathcal{M}$, which we denote by $\mathcal{T}$ and call the open subsets. $\mathcal{T}$ further has to satisfy the following three conditions:

  1. The empty set and $\mathcal{M}$ belong to $\mathcal{T}$.
  2. Any union of an arbitrary number of elements of $\mathcal{T}$ again belongs to $\mathcal{T}$.
  3. Any intersection of a finite number of elements of $\mathcal{T}$ again belongs to $\mathcal{T}$.

Based on this definition of a topological space we can now define what it means to be Hausdorff: Definition: A topological space $\mathcal{M}$ is said to be Hausdorff if for any two points $x,y\in\mathcal{M}$ we can find two open sets $U_x,U_y\in\mathcal{T}$ s.t. $x\in{}U_x, y\in{}U_y$ and $U_x\cap{}U_y=\{\}$.

We now give the second definition that we need for defining manifolds, that of second countability: Definition: A topological space $\mathcal{M}$ is said to be second-countable if we can find a countable subcollection of $\mathcal{T}$ called $\mathcal{U}$ s.t. $\forall{}U\in\mathcal{T}$ and $x\in{}U$ we can find an element $V\in\mathcal{U}$ for which $x\in{}V\subset{}U$.

We now give a few definitions and results that are needed for the inverse function theorem which is essential for practical applications of manifold theory.

Definition: A mapping $f$ between topological spaces $\mathcal{M}$ and $\mathcal{N}$ is called continuous if the preimage of every open set is again an open set, i.e. if $f^{-1}\{U\}\in\mathcal{T}$ for $U$ open in $\mathcal{N}$ and $\mathcal{T}$ the topology on $\mathcal{M}$.

Definition: A closed set of a topological space $\mathcal{M}$ is one whose complement is an open set, i.e. $F$ is closed if $F^c\in\mathcal{T}$, where the superscript ${}^c$ indicates the complement. For closed sets we thus have the following three properties:

  1. The empty set and $\mathcal{M}$ are closed sets.
  2. Any union of a finite number of closed sets is again closed.
  3. Any intersection of an arbitrary number of closed sets is again closed.

Theorem: The definition of continuity is equivalent to the following, second definition: $f:\mathcal{M}\to\mathcal{N}$ is continuous if $f^{-1}\{F\}\subset\mathcal{M}$ is a closed set for each closed set $F\subset\mathcal{N}$.

Proof: First assume that $f$ is continuous according to the first definition but not the second. Then there is a closed set $F\subset\mathcal{N}$ for which $f^{-1}\{F\}$ is not closed, while $f^{-1}\{F^c\}$ is open (since $F^c$ is open). But $f^{-1}\{F^c\} = \{x\in\mathcal{M}:f(x)\not\in{}F\} = (f^{-1}\{F\})^c$ being open means that $f^{-1}\{F\}$ is closed, a contradiction. The implication of the first definition under assumption of the second can be shown analogously.

Theorem: The property of a set $F$ being closed is equivalent to the following statement: If a point $y$ is such that for every open set $U$ containing it we have $U\cap{}F\neq\{\}$ then this point is contained in $F$.

Proof: We first prove that if a set is closed then the statement holds. Consider a closed set $F$ and a point $y\not\in{}F$ s.t. every open set containing $y$ has nonempty intersection with $F$. But the complement $F^c$ also is such a set, which is a clear contradiction. Now assume the above statement for a set $F$ and further assume $F$ is not closed. Its complement $F^c$ is thus not open. Now consider the interior of this set: $\mathrm{int}(F^c):=\cup\{U:U\subset{}F^c\}$, i.e. the biggest open set contained within $F^c$. Hence there must be a point $y$ which is in $F^c$ but is not in its interior, else $F^c$ would be equal to its interior, i.e. would be open. We further must be able to find an open set $U$ that contains $y$ but is also contained in $F^c$, else $y$ would be an element of $F$. A contradiction.

Definition: An open cover of a topological space $\mathcal{M}$ is a (not necessarily countable) collection of open sets $\{U_i\}_{i\in\mathcal{I}}$ s.t. their union contains $\mathcal{M}$. A finite open cover is a collection of a finite number of open sets that cover $\mathcal{M}$. We say that an open cover is reducible to a finite cover if we can find a finite number of elements in the open cover whose union still contains $\mathcal{M}$.

Definition: A topological space $\mathcal{M}$ is called compact if every open cover is reducible to a finite cover.

Theorem: Consider a continuous function $f:\mathcal{M}\to\mathcal{N}$ and a compact set $K\subset\mathcal{M}$. Then $f(K)$ is also compact.

Proof: Consider an open cover of $f(K)$: $\{U_i\}_{i\in\mathcal{I}}$. Then $\{f^{-1}\{U_i\}\}_{i\in\mathcal{I}}$ is an open cover of $K$ and hence reducible to a finite cover $\{f^{-1}\{U_i\}\}_{i\in\{i_1,\ldots,i_n\}}$. But then $\{U_i\}_{i\in\{i_1,\ldots,i_n\}}$ also covers $f(K)$.

Theorem: A closed subset of a compact space is compact:

Proof: Call the closed set $F$ and consider an open cover of this set: $\{U_i\}_{i\in\mathcal{I}}$. Then this open cover combined with $F^c$ is an open cover for the entire compact space, hence reducible to a finite cover. Removing $F^c$ from this finite cover (if necessary) leaves a finite subcollection of $\{U_i\}_{i\in\mathcal{I}}$ that still covers $F$.

Theorem: A compact subset of a Hausdorff space is closed:

Proof: Consider a compact subset $K$. If $K$ is not closed, then there has to be a point $y\not\in{}K$ s.t. every open set containing $y$ intersects $K$. Because the surrounding space is Hausdorff we can find, for every $z\in{}K$, two open sets $U_z\ni{}z$ and $U_{z,y}\ni{}y$ with $U_z\cap{}U_{z,y}=\{\}$. The open cover $\{U_z\}_{z\in{}K}$ is then reducible to a finite cover $\{U_z\}_{z\in\{z_1, \ldots, z_n\}}$. The intersection $\cap_{z\in\{z_1, \ldots, z_n\}}U_{z,y}$ is then an open set that contains $y$ but has no intersection with $K$. A contradiction.

Theorem: If $\mathcal{M}$ is compact and $\mathcal{N}$ is Hausdorff, then the inverse of a continuous bijection $f:\mathcal{M}\to\mathcal{N}$ is again continuous, i.e. $f(V)$ is an open set in $\mathcal{N}$ for every $V\in\mathcal{T}$.

Proof: We can equivalently show that every closed set is mapped to a closed set. First consider a closed set $K\subset\mathcal{M}$. As a closed subset of a compact space it is compact, so its image $f(K)$ is again compact and hence closed because $\mathcal{N}$ is Hausdorff.

References

[7]
R. L. Bishop and S. I. Goldberg. Tensor Analysis on Manifolds (Dover Publications, 1980).
[6]
S. Lang. Fundamentals of differential geometry. Vol. 191 (Springer Science & Business Media, 2012).
[5]
S. Lipschutz. General Topology (McGraw-Hill Book Company, 1965).
  • 1Some authors (see e.g. [6]) do not require these properties. But since they constitute very weak restrictions and are always satisfied by the manifolds relevant for our purposes we require them here.
diff --git a/latest/manifolds/existence_and_uniqueness_theorem/index.html b/latest/manifolds/existence_and_uniqueness_theorem/index.html index 85f00fe9a..034748689 100644 --- a/latest/manifolds/existence_and_uniqueness_theorem/index.html +++ b/latest/manifolds/existence_and_uniqueness_theorem/index.html @@ -8,4 +8,4 @@ \end{aligned}\]

where we have used the triangle inequality in the first line. If we now let $m$ on the right-hand side first go to infinity then we get

\[\begin{aligned} |x_m-x_n| & \leq q^n|x_1 -x_0|\sum_{i=0}^{\infty}q^i = q^n|x_1 -x_0| \frac{1}{1-q}, \end{aligned}\]

proving that the sequence is Cauchy. Because $\mathbb{R}^N$ is a complete metric space we get that $(x_n)_{n\in\mathbb{N}}$ is a convergent sequence. We call the limit of this sequence $x^*$. This completes the proof of the Banach fixed-point theorem.
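The proof is constructive: iterating the contraction converges to $x^*$. A small Julia sketch of that iteration (the tolerance, iteration cap and function names are illustrative choices, not part of the theorem or the package):

using LinearAlgebra

function fixed_point(f, x₀; tol = 1e-12, maxiter = 10_000)
    x = x₀
    for _ in 1:maxiter
        x_new = f(x)
        norm(x_new - x) < tol && return x_new   # successive iterates are close enough
        x = x_new
    end
    x
end

# e.g. fixed_point(cos, 1.0) ≈ 0.7390851332151607, the fixed point of cos on [0, 1]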

diff --git a/latest/manifolds/grassmann_manifold/index.html b/latest/manifolds/grassmann_manifold/index.html index c940b04c8..705bd0db5 100644 --- a/latest/manifolds/grassmann_manifold/index.html +++ b/latest/manifolds/grassmann_manifold/index.html @@ -1,2 +1,2 @@ -Grassmann · GeometricMachineLearning.jl

Grassmann Manifold

(The description of the Grassmann manifold is based on that of the Stiefel manifold, so this should be read first.)

An element of the Grassmann manifold $G(n,N)$ is a vector subspace $\subset\mathbb{R}^N$ of dimension $n$. Each such subspace (i.e. element of the Grassmann manifold) can be represented by a full-rank matrix $A\in\mathbb{R}^{N\times{}n}$ and we identify two elements with the following equivalence relation:

\[A_1 \sim A_2 \iff \exists{}C\in\mathbb{R}^{n\times{}n}\text{ s.t. }A_1C = A_2.\]

The resulting manifold is of dimension $n(N-n)$. One can find a parametrization of the manifold in the following way: Because the matrix $A$ has full rank, there have to be $n$ independent rows in it: $i_1, \ldots, i_n$. For simplicity assume that $i_1 = 1, i_2=2, \ldots, i_n=n$ and call the $n\times{}n$ matrix made up by these rows $C$. Then the mapping to the coordinate chart is $AC^{-1}$ and the last $N-n$ rows are the coordinates.

We can also define the Grassmann manifold based on the Stiefel manifold since elements of the Stiefel manifold are already full-rank matrices. In this case we have the following equivalence relation (for $Y_1, Y_2\in{}St(n,N)$):

\[Y_1 \sim Y_2 \iff \exists{}C\in{}O(n)\text{ s.t. }Y_1C = Y_2.\]

The Riemannian Gradient

Obtaining the Riemannian gradient for the Grassmann manifold is slightly more difficult than it is in the case of the Stiefel manifold. However, since the Grassmann manifold can be obtained from the Stiefel manifold through an equivalence relation, we can use this as a starting point. In a first step we identify charts on the Grassmann manifold to make dealing with it easier. For this consider the following open cover of the Grassmann manifold (also see [8]):

\[\{\mathcal{U}_W\}_{W\in{}St(n, N)} \quad\text{where}\quad \mathcal{U}_W = \{\mathrm{span}(Y):\mathrm{det}(W^TY)\neq0\}.\]

We can find a canonical bijective mapping from the set $\mathcal{U}_W$ to the set $\mathcal{S}_W := \{Y\in\mathbb{R}^{N\times{}n}:W^TY=\mathbb{I}_n\}$:

\[\sigma_W: \mathcal{U}_W \to \mathcal{S}_W,\, \mathcal{Y}=\mathrm{span}(Y)\mapsto{}Y(W^TY)^{-1} =: \hat{Y}.\]

That $\sigma_W$ is well-defined is easy to see: Consider $YC$ with $C\in\mathbb{R}^{n\times{}n}$ non-singular. Then $YC(W^TYC)^{-1}=Y(W^TY)^{-1} = \hat{Y}$. With this isomorphism we can also find a representation of elements of the tangent space:

\[T_\mathcal{Y}\sigma_W: T_\mathcal{Y}Gr(n,N)\to{}T_{\hat{Y}}\mathcal{S}_W,\, \xi \mapsto (\xi_{\diamond{}Y} -\hat{Y}(W^T\xi_{\diamond{}Y}))(W^TY)^{-1}.\]

$\xi_{\diamond{}Y}$ is the representation of $\xi\in{}T_\mathcal{Y}Gr(n,N)$ for the point $Y\in{}St(n,N)$, i.e. $T_Y\pi(\xi_{\diamond{}Y}) = \xi$; because the map $\sigma_W$ does not care about the representation of $\mathrm{span}(Y)$ we can perform the variations in $St(n,N)$[1]:

\[\frac{d}{dt}Y(t)(W^TY(t))^{-1} = (\dot{Y}(0) - Y(W^TY)^{-1}W^T\dot{Y}(0))(W^TY)^{-1},\]

where $\dot{Y}(0)\in{}T_YSt(n,N)$. Also note that the representation of $\xi$ in $T_YSt(n,N)$ is not unique in general, but $T_\mathcal{Y}\sigma_W$ is still well-defined. To see this consider two curves $Y(t)$ and $\bar{Y}(t)$ for which we have $Y(0) = \bar{Y}(0) = Y$ and further $T\pi(\dot{Y}(0)) = T\pi(\dot{\bar{Y}}(0))$. This is equivalent to being able to find a $C(\cdot):(-\varepsilon,\varepsilon)\to{}O(n)$ for which $C(0)=\mathbb{I}$ s.t. $\bar{Y}(t) = Y(t)C(t)$. We thus have $\dot{\bar{Y}}(0) = \dot{Y}(0) + Y\dot{C}(0)$ and if we replace $\xi_{\diamond{}Y}$ above with the second term in the expression we get: $Y\dot{C}(0) - \hat{Y}W^T(Y\dot{C}(0)) = 0$. The parametrization of $T_\mathcal{Y}Gr(n,N)$ with $T_\mathcal{Y}\sigma_W$ is thus independent of the choice of $\dot{C}(0)$ and hence of $\xi_{\diamond{}Y}$ and is therefore well-defined.

Further note that we have $T_\mathcal{Y}\mathcal{U}_W = T_\mathcal{Y}Gr(n,N)$ because $\mathcal{U}_W$ is an open subset of $Gr(n,N)$. We thus can identify the tangent space $T_\mathcal{Y}Gr(n,N)$ with the following set (where we again have $\hat{Y}=Y(W^TY)^{-1}$):

\[T_{\hat{Y}}\mathcal{S}_W = \{(\Delta - Y(W^TY)^{-1}W^T\Delta)(W^TY)^{-1}: Y\in{}St(n,N)\text{ s.t. }\mathrm{span}(Y)=\mathcal{Y}\text{ and }\Delta\in{}T_YSt(n,N)\}.\]

If we now further take $W=Y$[2] then we get the identification:

\[T_\mathcal{Y}Gr(n,N) \equiv \{\Delta - YY^T\Delta: Y\in{}St(n,N)\text{ s.t. }\mathrm{span}(Y)=\mathcal{Y}\text{ and }\Delta\in{}T_YSt(n,N)\},\]

which is very easy to handle computationally (we simply store and change the matrix $Y$ that represents an element of the Grassmann manifold). The Riemannian gradient is then

\[\mathrm{grad}_\mathcal{Y}^{Gr}L = \mathrm{grad}_Y^{St}L - YY^T\mathrm{grad}_Y^{St}L = \nabla_Y{}L - YY^T\nabla_YL,\]

where $\nabla_Y{}L$ again is the Euclidean gradient as in the Stiefel manifold case.
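The Grassmann gradient formula above as a plain matrix expression, with Y the representative matrix and e_grad the Euclidean gradient (a sketch of the formula, not the package's method):

grassmann_rgrad(Y::AbstractMatrix, e_grad::AbstractMatrix) = e_grad - Y * (Y' * e_grad)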

  • 1I.e. $Y(t)\in{}St(n,N)$ for $t\in(-\varepsilon,\varepsilon)$. We also set $Y(0) = Y$.
  • 2We can pick any element $W$ to construct the charts for a neighborhood around the point $\mathcal{Y}\in{}Gr(n,N)$ as long as we have $\mathrm{det}(W^TY)\neq0$ for $\mathrm{span}(Y)=\mathcal{Y}$.
diff --git a/latest/manifolds/homogeneous_spaces/index.html b/latest/manifolds/homogeneous_spaces/index.html index c3d2871e6..ce31a1564 100644 --- a/latest/manifolds/homogeneous_spaces/index.html +++ b/latest/manifolds/homogeneous_spaces/index.html @@ -1,2 +1,2 @@ -Homogeneous Spaces · GeometricMachineLearning.jl

Homogeneous Spaces

Homogeneous spaces are manifolds $\mathcal{M}$ on which a Lie group $G$ acts transitively, i.e.

\[\forall X,Y\in\mathcal{M} \exists{}A\in{}G\text{ s.t. }AX = Y.\]

Now fix a distinguished element $E\in\mathcal{M}$. We can also establish an isomorphism between $\mathcal{M}$ and the quotient space $G/\sim$ with the equivalence relation:

\[A_1 \sim A_2 \iff A_1E = A_2E.\]

Note that this is independent of the chosen $E$.

The tangent spaces of $\mathcal{M}$ are of the form $T_Y\mathcal{M} = \mathfrak{g}\cdot{}Y$, i.e. they can be fully described through the Lie algebra $\mathfrak{g}$ of $G$. Based on this we can perform a splitting of $\mathfrak{g}$ into two parts:

  1. The vertical component $\mathfrak{g}^{\mathrm{ver},Y}$ is the kernel of the map $\mathfrak{g}\to{}T_Y\mathcal{M}, V \mapsto VY$, i.e. $\mathfrak{g}^{\mathrm{ver},Y} = \{V\in\mathfrak{g}:VY = 0\}.$

  2. The horizontal component $\mathfrak{g}^{\mathrm{hor},Y}$ is the orthogonal complement of $\mathfrak{g}^{\mathrm{ver},Y}$ in $\mathfrak{g}$. It is isomorphic to $T_Y\mathcal{M}$.

We will refer to the mapping from $T_Y\mathcal{M}$ to $\mathfrak{g}^{\mathrm{hor}, Y}$ by $\Omega$. If we have now defined a metric $\langle\cdot,\cdot\rangle$ on $\mathfrak{g}$, then this induces a Riemannian metric on $\mathcal{M}$:

\[g_Y(\Delta_1, \Delta_2) = \langle\Omega(Y,\Delta_1),\Omega(Y,\Delta_2)\rangle\text{ for $\Delta_1,\Delta_2\in{}T_Y\mathcal{M}$.}\]

Two examples of homogeneous spaces implemented in GeometricMachineLearning are the Stiefel and the Grassmann manifold.

References

  • Frankel, Theodore. The geometry of physics: an introduction. Cambridge university press, 2011.
diff --git a/latest/manifolds/inverse_function_theorem/index.html b/latest/manifolds/inverse_function_theorem/index.html index 0299ef3aa..bee7fec55 100644 --- a/latest/manifolds/inverse_function_theorem/index.html +++ b/latest/manifolds/inverse_function_theorem/index.html @@ -4,4 +4,4 @@ & \leq |\eta|^{-1}||F'(x)||^{-1}|F(H(\xi+\eta)) - G(H(\xi+\eta)) - F(H(\xi)) + G(x) - \eta| \\ & = |\eta|^{-1}||F'(x)||^{-1}|\xi + \eta - G(H(\xi+\eta)) - \xi + G(x) - \eta| \\ & = |\eta|^{-1}||F'(x)||^{-1}|G(H(\xi+\eta)) - G(H(\xi))|, -\end{aligned}\]

and this goes to zero as $\eta$ goes to zero, because $H$ is continuous and therefore $H(\xi+\eta)$ goes to $H(\xi)=x$ and the expression on the right goes to zero as well.

References

[6]
S. Lang. Fundamentals of differential geometry. Vol. 191 (Springer Science & Business Media, 2012).
diff --git a/latest/manifolds/manifolds/index.html b/latest/manifolds/manifolds/index.html index b75622a2f..8e9c9c16c 100644 --- a/latest/manifolds/manifolds/index.html +++ b/latest/manifolds/manifolds/index.html @@ -1,2 +1,2 @@ -General Theory on Manifolds · GeometricMachineLearning.jl

(Matrix) Manifolds

Manifolds are topological spaces that locally look like vector spaces. In the following we restrict ourselves to finite-dimensional manifolds. Definition: A finite-dimensional smooth manifold of dimension $n$ is a second-countable Hausdorff space $\mathcal{M}$ for which $\forall{}x\in\mathcal{M}$ we can find a neighborhood $U$ that contains $x$ and a corresponding homeomorphism $\varphi_U:U\cong{}W\subset\mathbb{R}^n$ where $W$ is an open subset. The homeomorphisms $\varphi_U$ are referred to as coordinate charts. If two such coordinate charts overlap, i.e. if $U_1\cap{}U_2\neq\{\}$, then the map $\varphi_{U_2}^{-1}\circ\varphi_{U_1}$ is $C^\infty$.

One example of a manifold that is also important for GeometricMachineLearning.jl is the Lie group[1] of orthonormal matrices $SO(N)$. Before we can prove that $SO(N)$ is a manifold we first need another definition and a theorem:

Definition: Consider a smooth mapping $g: \mathcal{M}\to\mathcal{N}$ from one manifold to another. A point $B\in\mathcal{N}$ is called a regular value of $g$ if $\forall{}A\in{}g^{-1}\{B\}$ the map $T_Ag:T_A\mathcal{M}\to{}T_{g(A)}\mathcal{N}$ is surjective.

Theorem: Consider a smooth map $g:\mathcal{M}\to\mathcal{N}$ from one manifold to another. Then the preimage of a regular value $B\in\mathcal{N}$ is a submanifold of $\mathcal{M}$. Furthermore the codimension of $g^{-1}\{B\}$ is equal to the dimension of $\mathcal{N}$ and the tangent space $T_A(g^{-1}\{B\})$ is equal to the kernel of $T_Ag$. This is known as the preimage theorem.

Proof:

Theorem: The group $SO(N)$ is a Lie group (i.e. has manifold structure). Proof: The vector space $\mathbb{R}^{N\times{}N}$ clearly has manifold structure. The group $SO(N)$ is equivalent to one of the level sets of the mapping $f:\mathbb{R}^{N\times{}N}\to\mathcal{S}(N), A\mapsto{}A^TA$, i.e. it is the component of $f^{-1}\{\mathbb{I}\}$ that contains $\mathbb{I}$. We still need to prove that $\mathbb{I}$ is a regular value of $f$, i.e. that for $A\in{}SO(N)$ the mapping $T_Af$ is surjective. This means that $\forall{}B\in\mathcal{S}(N), A\in{}SO(N)$ $\exists{}C\in\mathbb{R}^{N\times{}N}$ s.t. $C^TA + A^TC = B$. The element $C=\frac{1}{2}AB\in\mathbb{R}^{N\times{}N}$ satisfies this property.
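The surjectivity argument can be checked numerically: for an orthogonal $A$ and symmetric $B$, the choice $C = \frac{1}{2}AB$ indeed satisfies $C^TA + A^TC = B$. A short sketch (the size N = 4 is an arbitrary illustrative choice):

using LinearAlgebra

N = 4
A = Matrix(qr(randn(N, N)).Q)      # an orthogonal matrix, so AᵀA = I
B = Symmetric(randn(N, N))
C = A * B / 2
@assert C' * A + A' * C ≈ B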

With the definition above we can generalize the notion of an ordinary differential equation (ODE) on a vector space to an ordinary differential equation on a manifold:

Definition: An ODE on a manifold is a mapping that assigns to each element of the manifold $A\in\mathcal{M}$ an element of the corresponding tangent space $T_A\mathcal{M}$.

  • 1Lie groups are manifolds that also have a group structure, i.e. there is an operation $\mathcal{M}\times\mathcal{M}\to\mathcal{M},(a,b)\mapsto{}ab$ s.t. $(ab)c = a(bc)$ and $\exists{}e\in\mathcal{M}$ s.t. $ae = a$ $\forall{}a\in\mathcal{M}$.
diff --git a/latest/manifolds/stiefel_manifold/index.html b/latest/manifolds/stiefel_manifold/index.html index 034ee5888..167e4b093 100644 --- a/latest/manifolds/stiefel_manifold/index.html +++ b/latest/manifolds/stiefel_manifold/index.html @@ -2,4 +2,4 @@ Stiefel · GeometricMachineLearning.jl

Stiefel manifold

The Stiefel manifold $St(n, N)$ is the space (a homogeneous space) of all orthonormal frames in $\mathbb{R}^{N\times{}n}$, i.e. matrices $Y\in\mathbb{R}^{N\times{}n}$ s.t. $Y^TY = \mathbb{I}_n$. It can also be seen as the special orthonormal group $SO(N)$ modulo an equivalence relation: $A\sim{}B\iff{}AE = BE$ for

\[E = \begin{bmatrix} \mathbb{I}_n \\ \mathbb{O} \end{bmatrix}\in\mathcal{M},\]

which is the canonical element of the Stiefel manifold. In words: the first $n$ columns of $A$ and $B$ are the same.

The tangent space to the element $Y\in{}St(n,N)$ can easily be determined:

\[T_YSt(n,N)=\{\Delta:\Delta^TY + Y^T\Delta = 0\}.\]

The Lie algebra of $SO(N)$ is $\mathfrak{so}(N):=\{V\in\mathbb{R}^{N\times{}N}:V^T + V = 0\}$ and the canonical metric associated with it is simply $(V_1,V_2)\mapsto\frac{1}{2}\mathrm{Tr}(V_1^TV_2)$.

The Riemannian Gradient

For matrix manifolds (like the Stiefel manifold), the Riemannian gradient of a function can be easily determined computationally:

The Euclidean gradient of a function $L$ is equivalent to an element of the cotangent space $T^*_Y\mathcal{M}$ via:

\[\langle\nabla{}L,\cdot\rangle:T_Y\mathcal{M} \to \mathbb{R}, \Delta \mapsto \sum_{ij}[\nabla{}L]_{ij}[\Delta]_{ij} = \mathrm{Tr}(\nabla{}L^T\Delta).\]

We can then utilize the Riemannian metric on $\mathcal{M}$ to map the element from the cotangent space (i.e. $\nabla{}L$) to the tangent space. This element is called $\mathrm{grad}_{(\cdot)}L$ here. Explicitly, it is given by:

\[ \mathrm{grad}_YL = \nabla_YL - Y(\nabla_YL)^TY\]

rgrad

What was referred to as $\nabla{}L$ before can in practice be obtained with an AD routine. We then use the function rgrad to map this Euclidean gradient to an element of $T_YSt(n,N)$. This mapping has the property:

\[\mathrm{Tr}((\nabla{}L)^T\Delta) = g_Y(\mathtt{rgrad}(Y, \nabla{}L), \Delta) \quad \forall\Delta\in{}T_YSt(n,N)\]

and $g$ is the Riemannian metric.
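A quick numerical check (a sketch; the sizes are arbitrary illustrative choices) that the formula $\nabla_YL - Y(\nabla_YL)^TY$ indeed lands in $T_YSt(n,N)$:

using LinearAlgebra

N, n = 10, 3
Y = Matrix(qr(randn(N, n)).Q)[:, 1:n]   # a point on St(n, N): YᵀY = I
∇L = randn(N, n)                        # stand-in for an AD-computed Euclidean gradient
Δ = ∇L - Y * (∇L' * Y)
@assert norm(Δ' * Y + Y' * Δ) < 1e-10   # ΔᵀY + YᵀΔ = 0, so Δ ∈ T_Y St(n, N)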

diff --git a/latest/manifolds/submersion_theorem/index.html b/latest/manifolds/submersion_theorem/index.html index 5b932e71f..6a5dab830 100644 --- a/latest/manifolds/submersion_theorem/index.html +++ b/latest/manifolds/submersion_theorem/index.html @@ -1,2 +1,2 @@ -The Submersion Theorem · GeometricMachineLearning.jl
diff --git a/latest/optimizers/adam_optimizer/index.html b/latest/optimizers/adam_optimizer/index.html index 4998ae086..33f0ea37c 100644 --- a/latest/optimizers/adam_optimizer/index.html +++ b/latest/optimizers/adam_optimizer/index.html @@ -1,2 +1,2 @@ -Adam Optimizer · GeometricMachineLearning.jl

The Adam Optimizer

The Adam Optimizer is one of the most widely used (if not the most widely used) neural network optimizers. Like most modern neural network optimizers it contains a cache that is updated based on first-order gradient information; in a second step, the cache is used to compute a velocity estimate for updating the neural network weights.

Here we first describe the Adam algorithm for the case where all the weights are on a vector space and then show how to generalize this to the case where the weights are on a manifold.

All weights on a vector space

The cache of the Adam optimizer consists of first and second moments. The first moments $B_1$ store linear information about the current and previous gradients, and the second moments $B_2$ store quadratic information about current and previous gradients (all computed from a first-order gradient).

If all the weights are on a vector space, then we directly compute updates for $B_1$ and $B_2$:

  1. \[B_1 \gets ((\rho_1 - \rho_1^t)/(1 - \rho_1^t))\cdot{}B_1 + (1 - \rho_1)/(1 - \rho_1^t)\cdot{}\nabla{}L,\]

  2. \[B_2 \gets ((\rho_2 - \rho_2^t)/(1 - \rho_2^t))\cdot{}B_2 + (1 - \rho_2)/(1 - \rho_2^t)\cdot\nabla{}L\odot\nabla{}L,\]

    where $\odot:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}^n$ is the Hadamard product: $[a\odot{}b]_i = a_ib_i$. $\rho_1$ and $\rho_2$ are hyperparameters. Their defaults, $\rho_1=0.9$ and $\rho_2=0.99$, are taken from (Goodfellow et al., 2016, page 301). After having updated the cache (i.e. $B_1$ and $B_2$) we compute a velocity (step 3) with which the parameters $Y_t$ are then updated (step 4).

  3. \[W_t\gets -\eta{}B_1/\sqrt{B_2 + \delta},\]

  4. \[Y_{t+1} \gets Y_t + W_t,\]

Here $\eta$ (with default 0.01) is the learning rate and $\delta$ (with default $3\cdot10^{-7}$) is a small constant that is added for stability. The division, square root and addition in step 3 are performed element-wise.
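The four steps above for vector-space weights, written as one in-place update. This is a sketch with the quoted default hyperparameters, not the package's Optimizer implementation; the function name and argument layout are illustrative.

function adam_step!(Y, B₁, B₂, ∇L, t; ρ₁ = 0.9, ρ₂ = 0.99, η = 0.01, δ = 3e-7)
    @. B₁ = (ρ₁ - ρ₁^t) / (1 - ρ₁^t) * B₁ + (1 - ρ₁) / (1 - ρ₁^t) * ∇L     # step 1
    @. B₂ = (ρ₂ - ρ₂^t) / (1 - ρ₂^t) * B₂ + (1 - ρ₂) / (1 - ρ₂^t) * ∇L^2   # step 2
    @. Y += -η * B₁ / sqrt(B₂ + δ)                                          # steps 3 and 4 (element-wise)
    return Y
end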

Weights on manifolds

The problem with generalizing Adam to manifolds is that the Hadamard product $\odot$ as well as the other element-wise operations ($/$, $\sqrt{}$ and $+$ in step 3 above) lack a clear geometric interpretation. In GeometricMachineLearning we get around this issue by utilizing a so-called global tangent space representation.

References

[25]
I. Goodfellow, Y. Bengio and A. Courville. Deep learning (MIT press, Cambridge, MA, 2016).
+Adam Optimizer · GeometricMachineLearning.jl

The Adam Optimizer

The Adam Optimizer is one of the most widely (if not the most widely used) neural network optimizer. Like most modern neural network optimizers it contains a cache that is updated based on first-order gradient information and then, in a second step, the cache is used to compute a velocity estimate for updating the neural networ weights.

Here we first describe the Adam algorithm for the case where all the weights are on a vector space and then show how to generalize this to the case where the weights are on a manifold.

All weights on a vector space

The cache of the Adam optimizer consists of first and second moments. The first moments $B_1$ store linear information about the current and previous gradients, and the second moments $B_2$ store quadratic information about current and previous gradients (all computed from a first-order gradient).

If all the weights are on a vector space, then we directly compute updates for $B_1$ and $B_2$:

  1. \[B_1 \gets ((\rho_1 - \rho_1^t)/(1 - \rho_1^t))\cdot{}B_1 + (1 - \rho_1)/(1 - \rho_1^t)\cdot{}\nabla{}L,\]

  2. \[B_2 \gets ((\rho_2 - \rho_1^t)/(1 - \rho_2^t))\cdot{}B_2 + (1 - \rho_2)/(1 - \rho_2^t)\cdot\nabla{}L\odot\nabla{}L,\]

    where $\odot:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}^n$ is the Hadamard product: $[a\odot{}b]_i = a_ib_i$. $\rho_1$ and $\rho_2$ are hyperparameters. Their defaults, $\rho_1=0.9$ and $\rho_2=0.99$, are taken from (Goodfellow et al., 2016, page 301). After having updated the cache (i.e. $B_1$ and $B_2$) we compute a velocity (step 3) with which the parameters $Y_t$ are then updated (step 4).

  3. \[W_t\gets -\eta{}B_1/\sqrt{B_2 + \delta},\]

  4. \[Y_{t+1} \gets Y_t + W_t,\]

Here $\eta$ (with default 0.01) is the learning rate and $\delta$ (with default $3\cdot10^{-7}$) is a small constant that is added for stability. The division, square root and addition in step 3 are performed element-wise.
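To make steps 1–4 concrete, the following is a minimal sketch of one Adam update for weights stored as plain Julia arrays. The function name adam_step! and its signature are illustrative assumptions, not the interface of GeometricMachineLearning.

# One Adam update for weights on a vector space (illustrative sketch, not the package API).
function adam_step!(Y, B₁, B₂, ∇L, t; ρ₁ = 0.9, ρ₂ = 0.99, η = 0.01, δ = 3e-7)
    # steps 1 and 2: update the first and second moments
    @. B₁ = (ρ₁ - ρ₁^t) / (1 - ρ₁^t) * B₁ + (1 - ρ₁) / (1 - ρ₁^t) * ∇L
    @. B₂ = (ρ₂ - ρ₂^t) / (1 - ρ₂^t) * B₂ + (1 - ρ₂) / (1 - ρ₂^t) * ∇L * ∇L
    # step 3: element-wise velocity
    W = @. -η * B₁ / sqrt(B₂ + δ)
    # step 4: update the parameters
    Y .+= W
    return Y
end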

Weights on manifolds

The problem with generalizing Adam to manifolds is that the Hadamard product $\odot$ as well as the other element-wise operations ($/$, $\sqrt{}$ and $+$ in step 3 above) lack a clear geometric interpretation. In GeometricMachineLearning we get around this issue by utilizing a so-called global tangent space representation.

References

[25]
I. Goodfellow, Y. Bengio and A. Courville. Deep learning (MIT Press, Cambridge, MA, 2016).
diff --git a/latest/optimizers/bfgs_optimizer/index.html b/latest/optimizers/bfgs_optimizer/index.html index 7c1c106bb..c31b053ea 100644 --- a/latest/optimizers/bfgs_optimizer/index.html +++ b/latest/optimizers/bfgs_optimizer/index.html @@ -14,4 +14,4 @@ u^T\tilde{B}_{k-1}u - 1 & u^T\tilde{B}_{k-1}u \\ u_\perp^T\tilde{B}_{k-1}u & u_\perp^T(\tilde{B}_{k-1}-\tilde{B}_k)u_\perp \end{bmatrix}. -\end{aligned}\]

By a property of the Frobenius norm:

\[||\tilde{B}_{k-1} - \tilde{B}||^2_F = (u^T\tilde{B}_{k-1}u - 1)^2 + ||u^T\tilde{B}_{k-1}u_\perp||_F^2 + ||u_\perp^T\tilde{B}_{k-1}u||_F^2 + ||u_\perp^T(\tilde{B}_{k-1} - \tilde{B})u_\perp||_F^2.\]

We see that $\tilde{B}$ only appears in the last term, which should therefore be made zero. This then gives:

\[\tilde{B} = U\begin{bmatrix} 1 & 0 \\ 0 & u^T_\perp\tilde{B}_{k-1}u_\perp \end{bmatrix}U^T = uu^T + (\mathbb{I}-uu^T)\tilde{B}_{k-1}(\mathbb{I}-uu^T).\]

If we now map back to the original coordinate system, the ideal solution for $B_k$ is:

\[B_k = (\mathbb{I} - \frac{1}{y_{k-1}^Ts_{k-1}}y_{k-1}s_{k-1}^T)B_{k-1}(\mathbb{I} - \frac{1}{y_{k-1}^Ts_{k-1}}s_{k-1}y_{k-1}^T) + \frac{1}{y_{k-1}^Ts_{k-1}}y_{k-1}y_{k-1}^T.\]

What we need in practice however is not $B_k$, but its inverse $H_k$. This is because we need to find $s_{k-1}$ based on $y_{k-1}$. To get $H_k$ based on the expression for $B_k$ above we can use the Sherman-Morrison-Woodbury formula[3] to obtain:

\[H_{k} = H_{k-1} - \frac{H_{k-1}y_{k-1}y_{k-1}^TH_{k-1}}{y_{k-1}^TH_{k-1}y_{k-1}} + \frac{s_{k-1}s_{k-1}^T}{y_{k-1}^Ts_{k-1}}.\]

TODO: Example where this works well!

References

[9]
J. Nocedal and S. J. Wright. Numerical optimization (Springer Science+Business Media, 2006).
+\end{aligned}\]

By a property of the Frobenius norm:

\[||\tilde{B}_{k-1} - \tilde{B}||^2_F = (u^T\tilde{B}_{k-1}u - 1)^2 + ||u^T\tilde{B}_{k-1}u_\perp||_F^2 + ||u_\perp^T\tilde{B}_{k-1}u||_F^2 + ||u_\perp^T(\tilde{B}_{k-1} - \tilde{B})u_\perp||_F^2.\]

We see that $\tilde{B}$ only appears in the last term, which should therefore be made zero. This then gives:

\[\tilde{B} = U\begin{bmatrix} 1 & 0 \\ 0 & u^T_\perp\tilde{B}_{k-1}u_\perp \end{bmatrix}U^T = uu^T + (\mathbb{I}-uu^T)\tilde{B}_{k-1}(\mathbb{I}-uu^T).\]

If we now map back to the original coordinate system, the ideal solution for $B_k$ is:

\[B_k = (\mathbb{I} - \frac{1}{y_{k-1}^Ts_{k-1}}y_{k-1}s_{k-1}^T)B_{k-1}(\mathbb{I} - \frac{1}{y_{k-1}^Ts_{k-1}}s_{k-1}y_{k-1}^T) + \frac{1}{y_{k-1}^Ts_{k-1}}y_{k-1}y_{k-1}^T.\]

What we need in practice however is not $B_k$, but its inverse $H_k$. This is because we need to find $s_{k-1}$ based on $y_{k-1}$. To get $H_k$ based on the expression for $B_k$ above we can use the Sherman-Morrison-Woodbury formula[3] to obtain:

\[H_{k} = H_{k-1} - \frac{H_{k-1}y_{k-1}y_{k-1}^TH_{k-1}}{y_{k-1}^TH_{k-1}y_{k-1}} + \frac{s_{k-1}s_{k-1}^T}{y_{k-1}^Ts_{k-1}}.\]
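The update above can be written in a few lines of Julia. This is a minimal sketch for dense arrays, assuming $H$ is symmetric so that $y^TH = (Hy)^T$; the name bfgs_update_H is illustrative and not part of the package.

using LinearAlgebra: dot

# H ← H - (H y yᵀ H)/(yᵀ H y) + (s sᵀ)/(yᵀ s), with H assumed symmetric
function bfgs_update_H(H::AbstractMatrix, s::AbstractVector, y::AbstractVector)
    Hy = H * y
    H - (Hy * Hy') / dot(y, Hy) + (s * s') / dot(y, s)
end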

TODO: Example where this works well!

References

[9]
J. Nocedal and S. J. Wright. Numerical optimization (Springer Science+Business Media, 2006).
diff --git a/latest/optimizers/general_optimization/index.html b/latest/optimizers/general_optimization/index.html index d81924f8b..fe8650353 100644 --- a/latest/optimizers/general_optimization/index.html +++ b/latest/optimizers/general_optimization/index.html @@ -1,2 +1,2 @@ -General Optimization · GeometricMachineLearning.jl

Optimization for Neural Networks

Optimization for neural networks is (almost always) some variation on gradient descent. The most basic form of gradient descent is a discretization of the gradient flow equation:

\[\dot{\theta} = -\nabla_\theta{}L,\]

by means of an Euler time-stepping scheme:

\[\theta^{t+1} = \theta^{t} - h\nabla_{\theta^{t}}L,\]

where $h$ (the time step of the Euler scheme) is referred to as the learning rate.

This equation can easily be generalized to manifolds by replacing the Euclidean gradient $\nabla_{\theta^{t}}L$ with the Riemannian gradient $\mathrm{grad}_{\theta^{t}}L$, and by replacing the addition of $-h\nabla_{\theta^{t}}L$ with a retraction applied to $-h\mathrm{grad}_{\theta^{t}}L$.

+General Optimization · GeometricMachineLearning.jl

Optimization for Neural Networks

Optimization for neural networks is (almost always) some variation on gradient descent. The most basic form of gradient descent is a discretization of the gradient flow equation:

\[\dot{\theta} = -\nabla_\theta{}L,\]

by means of an Euler time-stepping scheme:

\[\theta^{t+1} = \theta^{t} - h\nabla_{\theta^{t}}L,\]

where $h$ (the time step of the Euler scheme) is referred to as the learning rate.
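As a minimal illustration of the Euclidean case, one such Euler (gradient descent) step can be written as follows; Zygote is the AD backend used in the tutorials, while the remaining names and the toy loss are placeholders.

using Zygote

# one explicit Euler step of the gradient flow θ̇ = -∇L(θ)
euler_step(θ, L, h) = θ .- h .* Zygote.gradient(L, θ)[1]

L(θ) = sum(abs2, θ) / 2      # toy loss
θ = randn(3)
θ = euler_step(θ, L, 0.1)    # one gradient descent step with learning rate h = 0.1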

This equation can easily be generalized to manifolds by replacing the Euclidean gradient $\nabla_{\theta^{t}}L$ with the Riemannian gradient $\mathrm{grad}_{\theta^{t}}L$, and by replacing the addition of $-h\nabla_{\theta^{t}}L$ with a retraction applied to $-h\mathrm{grad}_{\theta^{t}}L$.

diff --git a/latest/optimizers/manifold_related/cayley/index.html b/latest/optimizers/manifold_related/cayley/index.html index 850c90524..423b296b9 100644 --- a/latest/optimizers/manifold_related/cayley/index.html +++ b/latest/optimizers/manifold_related/cayley/index.html @@ -10,4 +10,4 @@ \begin{bmatrix} \mathbb{I} \\ \frac{1}{2}A \end{bmatrix} + \begin{bmatrix} \frac{1}{2}A \\ \frac{1}{4}A^2 - \frac{1}{2}B^TB \end{bmatrix} \right) - \right)\]

Note that for computational reasons we compute $\mathrm{Cayley}(C)E$ instead of just the Cayley transform (see the section on retractions).

+ \right)\]

Note that for computational reasons we compute $\mathrm{Cayley}(C)E$ instead of just the Cayley transform (see the section on retractions).

diff --git a/latest/optimizers/manifold_related/geodesic/index.html b/latest/optimizers/manifold_related/geodesic/index.html index 71c6ca9f0..b88fdc598 100644 --- a/latest/optimizers/manifold_related/geodesic/index.html +++ b/latest/optimizers/manifold_related/geodesic/index.html @@ -1,2 +1,2 @@ -Geodesic Retraction · GeometricMachineLearning.jl

Geodesic Retraction

General retractions are approximations of the exponential map. In GeometricMachineLearning we can, instead of using an approximation, solve the geodesic equation exactly (up to numerical error) by specifying Geodesic() as the argument of layers that have manifold weights.

+Geodesic Retraction · GeometricMachineLearning.jl

Geodesic Retraction

General retractions are approximations of the exponential map. In GeometricMachineLearning we can, instead of using an approximation, solve the geodesic equation exactly (up to numerical error) by specifying Geodesic() as the argument of layers that have manifold weights.

diff --git a/latest/optimizers/manifold_related/global_sections/index.html b/latest/optimizers/manifold_related/global_sections/index.html index 7bd8fd98d..d330db072 100644 --- a/latest/optimizers/manifold_related/global_sections/index.html +++ b/latest/optimizers/manifold_related/global_sections/index.html @@ -7,4 +7,4 @@ & = \begin{bmatrix} Y^T\Delta{}E^T \\ \bar{\lambda}\Delta{}E^T \end{bmatrix} + E\Delta^TYE^T - \begin{bmatrix}E\Delta^TY & E\Delta^T\bar{\lambda} \end{bmatrix} \\ & = EY^T\Delta{}E^T + E\Delta^TYE^T - E\Delta^TYE^T + \begin{bmatrix} \mathbb{O} \\ \bar{\lambda}\Delta{}E^T \end{bmatrix} - \begin{bmatrix} \mathbb{O} & E\Delta^T\bar{\lambda} \end{bmatrix} \\ & = EY^T\Delta{}E^T + \begin{bmatrix} \mathbb{O} \\ \bar{\lambda}\Delta{}E^T \end{bmatrix} - \begin{bmatrix} \mathbb{O} & E\Delta^T\bar{\lambda} \end{bmatrix}, -\end{aligned}\]

meaning that for an element of the horizontal component of the Lie algebra $\mathfrak{g}^\mathrm{hor}$ we store $A=Y^T\Delta$ and $B=\bar{\lambda}^T\Delta$.

Optimization

The output of global_rep is then used for all the optimization steps.

References

[24]
T. Frankel. The geometry of physics: an introduction (Cambridge university press, Cambridge, UK, 2011).
+\end{aligned}\]

meaning that for an element of the horizontal component of the Lie algebra $\mathfrak{g}^\mathrm{hor}$ we store $A=Y^T\Delta$ and $B=\bar{\lambda}^T\Delta$.
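A minimal sketch of how these two blocks can be assembled for a Stiefel element $Y$ and a tangent-like matrix $\Delta$; the completion $\bar{\lambda}$ (written λ_bar below) is computed here from a full QR decomposition, which is only one possible choice and not necessarily what global_rep does internally.

using LinearAlgebra

N, n = 6, 2
Y = Matrix(qr(randn(N, n)).Q)        # element of the Stiefel manifold St(n, N)
Δ = randn(N, n)                      # tangent-like matrix at Y

Q = qr(Y).Q * Matrix(I, N, N)        # full orthogonal factor; its first n columns span range(Y)
λ_bar = Q[:, n+1:end]                # a completion of Y to an orthonormal basis of ℝᴺ

A = Y' * Δ                           # n×n block stored for 𝔤ʰᵒʳ
B = λ_bar' * Δ                       # (N-n)×n block stored for 𝔤ʰᵒʳ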

Optimization

The output of global_rep is then used for all the optimization steps.

References

[24]
T. Frankel. The geometry of physics: an introduction (Cambridge university press, Cambridge, UK, 2011).
diff --git a/latest/optimizers/manifold_related/horizontal_lift/index.html b/latest/optimizers/manifold_related/horizontal_lift/index.html index 17cbcf9e5..0e5b4a4ec 100644 --- a/latest/optimizers/manifold_related/horizontal_lift/index.html +++ b/latest/optimizers/manifold_related/horizontal_lift/index.html @@ -1,2 +1,2 @@ -Horizontal Lift · GeometricMachineLearning.jl

The Horizontal Lift

For each element $Y\in\mathcal{M}$ we can perform a splitting $\mathfrak{g} = \mathfrak{g}^{\mathrm{hor}, Y}\oplus\mathfrak{g}^{\mathrm{ver}, Y}$, where the two subspaces are the horizontal and the vertical component of $\mathfrak{g}$ at $Y$ respectively. For homogeneous spaces: $T_Y\mathcal{M} = \mathfrak{g}\cdot{}Y$, i.e. every tangent space to $\mathcal{M}$ can be expressed through the application of the Lie algebra to the relevant element. The vertical component consists of those elements of $\mathfrak{g}$ which are mapped to the zero element of $T_Y\mathcal{M}$, i.e.

\[\mathfrak{g}^{\mathrm{ver}, Y} := \mathrm{ker}(\mathfrak{g}\to{}T_Y\mathcal{M}).\]

The orthogonal complement[1] of $\mathfrak{g}^{\mathrm{ver}, Y}$ is the horizontal component and is referred to by $\mathfrak{g}^{\mathrm{hor}, Y}$. This is naturally isomorphic to $T_Y\mathcal{M}$. For the Stiefel manifold the horizontal lift has the simple form:

\[\Omega(Y, V) = \left(\mathbb{I} - \frac{1}{2}YY^T\right)VY^T - YV^T\left(\mathbb{I} - \frac{1}{2}YY^T\right).\]

If the element $Y$ is the distinct element $E$, then the elements of $\mathfrak{g}^{\mathrm{hor},E}$ take a particularly simple form, see Global Tangent Space for a description of this.

  • 1The orthogonal complement is taken with respect to a metric defined on $\mathfrak{g}$. For the case of $G=SO(N)$ and $\mathfrak{g}=\mathfrak{so}(N) = \{A:A+A^T =0\}$ this metric can be chosen as $(A_1,A_2)\mapsto{}\frac{1}{2}\mathrm{Tr}(A_1^TA_2)$.
+Horizontal Lift · GeometricMachineLearning.jl

The Horizontal Lift

For each element $Y\in\mathcal{M}$ we can perform a splitting $\mathfrak{g} = \mathfrak{g}^{\mathrm{hor}, Y}\oplus\mathfrak{g}^{\mathrm{ver}, Y}$, where the two subspaces are the horizontal and the vertical component of $\mathfrak{g}$ at $Y$ respectively. For homogeneous spaces: $T_Y\mathcal{M} = \mathfrak{g}\cdot{}Y$, i.e. every tangent space to $\mathcal{M}$ can be expressed through the application of the Lie algebra to the relevant element. The vertical component consists of those elements of $\mathfrak{g}$ which are mapped to the zero element of $T_Y\mathcal{M}$, i.e.

\[\mathfrak{g}^{\mathrm{ver}, Y} := \mathrm{ker}(\mathfrak{g}\to{}T_Y\mathcal{M}).\]

The orthogonal complement[1] of $\mathfrak{g}^{\mathrm{ver}, Y}$ is the horizontal component and is referred to by $\mathfrak{g}^{\mathrm{hor}, Y}$. This is naturally isomorphic to $T_Y\mathcal{M}$. For the Stiefel manifold the horizontal lift has the simple form:

\[\Omega(Y, V) = \left(\mathbb{I} - \frac{1}{2}YY^T\right)VY^T - YV^T\left(\mathbb{I} - \frac{1}{2}YY^T\right).\]

If the element $Y$ is the distinct element $E$, then the elements of $\mathfrak{g}^{\mathrm{hor},E}$ take a particularly simple form, see Global Tangent Space for a description of this.
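A minimal sketch of the horizontal lift $\Omega$ above for dense arrays; the function name Ω_stiefel is illustrative and not necessarily the name used in the package.

using LinearAlgebra

# Ω(Y, V) = (I - ½YYᵀ)VYᵀ - YVᵀ(I - ½YYᵀ) for the Stiefel manifold
function Ω_stiefel(Y::AbstractMatrix, V::AbstractMatrix)
    P = I - Y * Y' / 2
    P * V * Y' - Y * V' * P
end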

  • 1The orthogonal complement is taken with respect to a metric defined on $\mathfrak{g}$. For the case of $G=SO(N)$ and $\mathfrak{g}=\mathfrak{so}(N) = \{A:A+A^T =0\}$ this metric can be chosen as $(A_1,A_2)\mapsto{}\frac{1}{2}\mathrm{Tr}(A_1^TA_2)$.
diff --git a/latest/optimizers/manifold_related/retractions/index.html b/latest/optimizers/manifold_related/retractions/index.html index b684e586e..bcae7ad9f 100644 --- a/latest/optimizers/manifold_related/retractions/index.html +++ b/latest/optimizers/manifold_related/retractions/index.html @@ -1,2 +1,2 @@ -Retractions · GeometricMachineLearning.jl

Retractions

Classical Definition

Classically, retractions are defined as smooth maps

\[R: T\mathcal{M}\to\mathcal{M}:(x,v)\mapsto{}R_x(v)\]

such that each curve $c(t) := R_x(tv)$ satisfies $c(0) = x$ and $c'(0) = v$.

In GeometricMachineLearning

Retractions are maps from the horizontal component of the Lie algebra $\mathfrak{g}^\mathrm{hor}$ to the respective manifold.

For optimization in neural networks (almost always first order) we solve a gradient flow equation

\[\dot{W} = -\mathrm{grad}_WL, \]

where $\mathrm{grad}_WL$ is the Riemannian gradient of the loss function $L$ evaluated at position $W$.

If we deal with Euclidean spaces (vector spaces), then the Riemannian gradient is just the result of an AD routine and the solution of the equation above can be approximated with $W^{t+1} \gets W^t - \eta\nabla_{W^t}L$, where $\eta$ is the learning rate.

For manifolds, after we obtained the Riemannian gradient (see e.g. the section on Stiefel manifold), we have to solve a geodesic equation. This is a canonical ODE associated with any Riemannian manifold.

The general theory of Riemannian manifolds is rather complicated, but for the neural networks treated in GeometricMachineLearning, we only rely on optimization of matrix Lie groups and homogeneous spaces, which is much simpler.

For Lie groups each tangent space is isomorphic to its Lie algebra $\mathfrak{g}\equiv{}T_\mathbb{I}G$. The geodesic map from $\mathfrak{g}$ to $G$, for matrix Lie groups with a bi-invariant Riemannian metric like $SO(N)$, is simply the application of the matrix exponential $\exp$. Alternatively this can be replaced by the Cayley transform (see (Absil et al., 2008)).

Starting from this basic map $\exp:\mathfrak{g}\to{}G$ we can build mappings for more complicated cases:

  1. General tangent space to a Lie group $T_AG$: The geodesic map for an element $V\in{}T_AG$ is simply $A\exp(A^{-1}V)$.

  2. Special tangent space to a homogeneous space $T_E\mathcal{M}$: For $V=BE\in{}T_E\mathcal{M}$ the exponential map is simply $\exp(B)E$.

  3. General tangent space to a homogeneous space $T_Y\mathcal{M}$ with $Y = AE$: For $\Delta=ABE\in{}T_Y\mathcal{M}$ the exponential map is simply $A\exp(B)E$. This is the general case which we deal with.

The general theory behind points 2. and 3. is discussed in chapter 11 of (O'Neill, 1983). The function retraction in GeometricMachineLearning performs $\mathfrak{g}^\mathrm{hor}\to\mathcal{M}$, which is the second of the above points. To get the third from the second point, we simply have to multiply with a matrix from the left. This step is done with apply_section and represented through the red vertical line in the diagram on the general optimizer framework.

Word of caution

The Lie group corresponding to the Stiefel manifold, $SO(N)$, has a bi-invariant Riemannian metric associated with it: $(B_1,B_2)\mapsto \mathrm{Tr}(B_1^TB_2)$. For other Lie groups (e.g. the symplectic group) the situation is slightly more difficult (see (Bendokat et al., 2021)).

References

  • Absil P A, Mahony R, Sepulchre R. Optimization algorithms on matrix manifolds[M]. Princeton University Press, 2008.

  • Bendokat T, Zimmermann R. The real symplectic Stiefel and Grassmann manifolds: metrics, geodesics and applications[J]. arXiv preprint arXiv:2108.12447, 2021.

  • O'Neill, Barrett. Semi-Riemannian geometry with applications to relativity. Academic press, 1983.

+Retractions · GeometricMachineLearning.jl

Retractions

Classical Definition

Classically, retractions are defined as smooth maps

\[R: T\mathcal{M}\to\mathcal{M}:(x,v)\mapsto{}R_x(v)\]

such that each curve $c(t) := R_x(tv)$ satisfies $c(0) = x$ and $c'(0) = v$.

In GeometricMachineLearning

Retractions are maps from the horizontal component of the Lie algebra $\mathfrak{g}^\mathrm{hor}$ to the respective manifold.

For optimization in neural networks (almost always first order) we solve a gradient flow equation

\[\dot{W} = -\mathrm{grad}_WL, \]

where $\mathrm{grad}_WL$ is the Riemannian gradient of the loss function $L$ evaluated at position $W$.

If we deal with Euclidean spaces (vector spaces), then the Riemannian gradient is just the result of an AD routine and the solution of the equation above can be approximated with $W^{t+1} \gets W^t - \eta\nabla_{W^t}L$, where $\eta$ is the learning rate.

For manifolds, after we obtained the Riemannian gradient (see e.g. the section on Stiefel manifold), we have to solve a geodesic equation. This is a canonical ODE associated with any Riemannian manifold.

The general theory of Riemannian manifolds is rather complicated, but for the neural networks treated in GeometricMachineLearning, we only rely on optimization of matrix Lie groups and homogeneous spaces, which is much simpler.

For Lie groups each tangent space is isomorphic to its Lie algebra $\mathfrak{g}\equiv{}T_\mathbb{I}G$. The geodesic map from $\mathfrak{g}$ to $G$, for matrix Lie groups with a bi-invariant Riemannian metric like $SO(N)$, is simply the application of the matrix exponential $\exp$. Alternatively this can be replaced by the Cayley transform (see (Absil et al., 2008)).

Starting from this basic map $\exp:\mathfrak{g}\to{}G$ we can build mappings for more complicated cases:

  1. General tangent space to a Lie group $T_AG$: The geodesic map for an element $V\in{}T_AG$ is simply $A\exp(A^{-1}V)$.

  2. Special tangent space to a homogeneous space $T_E\mathcal{M}$: For $V=BE\in{}T_E\mathcal{M}$ the exponential map is simply $\exp(B)E$.

  3. General tangent space to a homogeneous space $T_Y\mathcal{M}$ with $Y = AE$: For $\Delta=ABE\in{}T_Y\mathcal{M}$ the exponential map is simply $A\exp(B)E$. This is the general case which we deal with.

The general theory behind points 2. and 3. is discussed in chapter 11 of (O'Neill, 1983). The function retraction in GeometricMachineLearning performs $\mathfrak{g}^\mathrm{hor}\to\mathcal{M}$, which is the second of the above points. To get the third from the second point, we simply have to multiply with a matrix from the left. This step is done with apply_section and represented through the red vertical line in the diagram on the general optimizer framework.
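For point 2. this can be sketched directly with the matrix exponential; here $E$ is the distinct element of the Stiefel manifold and $B$ a generic skew-symmetric matrix, purely for illustration.

using LinearAlgebra

N, n = 5, 2
E = Matrix{Float64}(I, N, n)     # the distinct element E of the Stiefel manifold
C = randn(N, N)
B = (C - C') / 2                 # a skew-symmetric matrix, i.e. an element of 𝔰𝔬(N)

Y = exp(B) * E                   # point 2: geodesic retraction at E
# point 3: for a general Y = A*E one computes A * exp(B) * E instead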

Word of caution

The Lie group corresponding to the Stiefel manifold, $SO(N)$, has a bi-invariant Riemannian metric associated with it: $(B_1,B_2)\mapsto \mathrm{Tr}(B_1^TB_2)$. For other Lie groups (e.g. the symplectic group) the situation is slightly more difficult (see (Bendokat et al., 2021)).

References

  • Absil P A, Mahony R, Sepulchre R. Optimization algorithms on matrix manifolds[M]. Princeton University Press, 2008.

  • Bendokat T, Zimmermann R. The real symplectic Stiefel and Grassmann manifolds: metrics, geodesics and applications[J]. arXiv preprint arXiv:2108.12447, 2021.

  • O'Neill, Barrett. Semi-Riemannian geometry with applications to relativity. Academic press, 1983.

diff --git a/latest/reduced_order_modeling/autoencoder/index.html b/latest/reduced_order_modeling/autoencoder/index.html index 4e4449ad7..f0b4bb88f 100644 --- a/latest/reduced_order_modeling/autoencoder/index.html +++ b/latest/reduced_order_modeling/autoencoder/index.html @@ -1,2 +1,2 @@ -POD and Autoencoders · GeometricMachineLearning.jl

Reduced Order Modeling and Autoencoders

Reduced order modeling is a data-driven technique that exploits the structure of parametric PDEs to make solving those PDEs easier.

Consider a parametric PDE written in the form: $F(z(\mu);\mu)=0$, where $z(\mu)$ evolves on an infinite-dimensional Hilbert space $V$.

In modeling any PDE we have to choose a discretization (particle discretization, finite element method, ...) of $V$, which will be denoted by $V_h$.

Solution manifold

To any parametric PDE we associate a solution manifold:

\[\mathcal{M} = \{z(\mu):F(z(\mu);\mu)=0, \mu\in\mathbb{P}\}.\]

In the image above a 2-dimensional solution manifold is visualized as a sub-manifold in 3-dimensional space. In general the embedding space is an infinite-dimensional function space.

As an example of this consider the 1-dimensional wave equation:

\[\partial_{tt}^2q(t,\xi;\mu) = \mu^2\partial_{\xi\xi}^2q(t,\xi;\mu)\text{ on }I\times\Omega,\]

where $I = (0,1)$ and $\Omega=(-1/2,1/2)$. As initial condition for the first derivative we have $\partial_tq(0,\xi;\mu) = -\mu\partial_\xi{}q_0(\xi;\mu)$ and furthermore $q(t,\xi;\mu)=0$ on the boundary (i.e. $\xi\in\{-1/2,1/2\}$).

The solution manifold is a 1-dimensional submanifold:

\[\mathcal{M} = \{(t, \xi)\mapsto{}q(t,\xi;\mu)=q_0(\xi-\mu{}t;\mu):\mu\in\mathbb{P}\subset\mathbb{R}\}.\]

If we provide an initial condition $u_0$, a parameter instance $\mu$ and a time $t$, then $\xi\mapsto{}q(t,\xi;\mu)$ will be the momentary solution. If we consider the time evolution of $q(t,\xi;\mu)$, then it evolves on a two-dimensional submanifold $\bar{\mathcal{M}} := \{\xi\mapsto{}q(t,\xi;\mu):t\in{}I,\mu\in\mathbb{P}\}$.

General workflow

In reduced order modeling we aim to construct a mapping to a space that is close to this solution manifold. This is done through the following steps:

  1. Discretize the PDE.

  2. Solve the discretized PDE for a certain set of parameter instances $\mu\in\mathbb{P}$.

  3. Build a reduced basis with the data obtained from having solved the discretized PDE. This step consists of finding two mappings: the reduction $\mathcal{P}$ and the reconstruction $\mathcal{R}$.

The third step can be done with various machine learning (ML) techniques. Traditionally the most popular of these has been Proper orthogonal decomposition (POD), but in recent years autoencoders have also become a popular alternative (see (Fresca et al, 2021)).

References

[18]
S. Fresca, L. Dede’ and A. Manzoni. A comprehensive deep learning-based approach to reduced order modeling of nonlinear time-dependent parametrized PDEs. Journal of Scientific Computing 87, 1–36 (2021).
+POD and Autoencoders · GeometricMachineLearning.jl

Reduced Order Modeling and Autoencoders

Reduced order modeling is a data-driven technique that exploits the structure of parametric PDEs to make solving those PDEs easier.

Consider a parametric PDE written in the form: $F(z(\mu);\mu)=0$, where $z(\mu)$ evolves on an infinite-dimensional Hilbert space $V$.

In modeling any PDE we have to choose a discretization (particle discretization, finite element method, ...) of $V$, which will be denoted by $V_h$.

Solution manifold

To any parametric PDE we associate a solution manifold:

\[\mathcal{M} = \{z(\mu):F(z(\mu);\mu)=0, \mu\in\mathbb{P}\}.\]

In the image above a 2-dimensional solution manifold is visualized as a sub-manifold in 3-dimensional space. In general the embedding space is an infinite-dimensional function space.

As an example of this consider the 1-dimensional wave equation:

\[\partial_{tt}^2q(t,\xi;\mu) = \mu^2\partial_{\xi\xi}^2q(t,\xi;\mu)\text{ on }I\times\Omega,\]

where $I = (0,1)$ and $\Omega=(-1/2,1/2)$. As initial condition for the first derivative we have $\partial_tq(0,\xi;\mu) = -\mu\partial_\xi{}q_0(\xi;\mu)$ and furthermore $q(t,\xi;\mu)=0$ on the boundary (i.e. $\xi\in\{-1/2,1/2\}$).

The solution manifold is a 1-dimensional submanifold:

\[\mathcal{M} = \{(t, \xi)\mapsto{}q(t,\xi;\mu)=q_0(\xi-\mu{}t;\mu):\mu\in\mathbb{P}\subset\mathbb{R}\}.\]

If we provide an initial condition $u_0$, a parameter instance $\mu$ and a time $t$, then $\xi\mapsto{}q(t,\xi;\mu)$ will be the momentary solution. If we consider the time evolution of $q(t,\xi;\mu)$, then it evolves on a two-dimensional submanifold $\bar{\mathcal{M}} := \{\xi\mapsto{}q(t,\xi;\mu):t\in{}I,\mu\in\mathbb{P}\}$.

General workflow

In reduced order modeling we aim to construct a mapping to a space that is close to this solution manifold. This is done through the following steps:

  1. Discretize the PDE.

  2. Solve the discretized PDE for a certain set of parameter instances $\mu\in\mathbb{P}$.

  3. Build a reduced basis with the data obtained from having solved the discretized PDE. This step consists of finding two mappings: the reduction $\mathcal{P}$ and the reconstruction $\mathcal{R}$.

The third step can be done with various machine learning (ML) techniques. Traditionally the most popular of these has been Proper orthogonal decomposition (POD), but in recent years autoencoders have also become a popular alternative (see (Fresca et al, 2021)).
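For the POD case the two mappings of step 3 can be sketched in a few lines; the snapshot matrix S below is random placeholder data and all names are purely illustrative.

using LinearAlgebra

S = randn(100, 50)           # snapshot matrix: 100 = dim(V_h), 50 solved instances (placeholder data)
n = 5                        # reduced dimension

V = svd(S).U[:, 1:n]         # POD basis from the first n left singular vectors
𝒫(x) = V' * x                # reduction
ℛ(z) = V * z                 # reconstruction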

References

[18]
S. Fresca, L. Dede’ and A. Manzoni. A comprehensive deep learning-based approach to reduced order modeling of nonlinear time-dependent parametrized PDEs. Journal of Scientific Computing 87, 1–36 (2021).
diff --git a/latest/reduced_order_modeling/kolmogorov_n_width/index.html b/latest/reduced_order_modeling/kolmogorov_n_width/index.html index c680833e7..7624fbfe4 100644 --- a/latest/reduced_order_modeling/kolmogorov_n_width/index.html +++ b/latest/reduced_order_modeling/kolmogorov_n_width/index.html @@ -1,2 +1,2 @@ -Kolmogorov n-width · GeometricMachineLearning.jl

Kolmogorov $n$-width

The Kolmogorov $n$-width measures how well some set $\mathcal{M}$ (typically the solution manifold) can be approximated with a linear subspace:

\[d_n(\mathcal{M}) := \mathrm{inf}_{V_n\subset{}V;\mathrm{dim}V_n=n}\mathrm{sup}_{u\in\mathcal{M}}\mathrm{inf}_{v_n\in{}V_n}|| u - v_n ||_V,\]

with $\mathcal{M}\subset{}V$, where $V$ is a (typically infinite-dimensional) Banach space. For advection-dominated problems (among others) the decay of the Kolmogorov $n$-width is very slow, i.e. one has to pick $n$ very high in order to obtain useful approximations (see [12] and [13]).

In order to overcome this, techniques based on neural networks (see e.g. [14]) and optimal transport (see e.g. [13]) have been used.

References

[13]
T. Blickhan. A registration method for reduced basis problems using linear optimal transport, arXiv preprint arXiv:2304.14884 (2023).
[12]
C. Greif and K. Urban. Decay of the Kolmogorov N-width for wave problems. Applied Mathematics Letters 96, 216–222 (2019).
[14]
K. Lee and K. T. Carlberg. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. Journal of Computational Physics 404, 108973 (2020).
+Kolmogorov n-width · GeometricMachineLearning.jl

Kolmogorov $n$-width

The Kolmogorov $n$-width measures how well some set $\mathcal{M}$ (typically the solution manifold) can be approximated with a linear subspace:

\[d_n(\mathcal{M}) := \mathrm{inf}_{V_n\subset{}V;\mathrm{dim}V_n=n}\mathrm{sup}_{u\in\mathcal{M}}\mathrm{inf}_{v_n\in{}V_n}|| u - v_n ||_V,\]

with $\mathcal{M}\subset{}V$, where $V$ is a (typically infinite-dimensional) Banach space. For advection-dominated problems (among others) the decay of the Kolmogorov $n$-width is very slow, i.e. one has to pick $n$ very high in order to obtain useful approximations (see [12] and [13]).

In order to overcome this, techniques based on neural networks (see e.g. [14]) and optimal transport (see e.g. [13]) have been used.

References

[13]
T. Blickhan. A registration method for reduced basis problems using linear optimal transport, arXiv preprint arXiv:2304.14884 (2023).
[12]
C. Greif and K. Urban. Decay of the Kolmogorov N-width for wave problems. Applied Mathematics Letters 96, 216–222 (2019).
[14]
K. Lee and K. T. Carlberg. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. Journal of Computational Physics 404, 108973 (2020).
diff --git a/latest/reduced_order_modeling/projection_reduction_errors/index.html b/latest/reduced_order_modeling/projection_reduction_errors/index.html index 4011941fd..1ef304722 100644 --- a/latest/reduced_order_modeling/projection_reduction_errors/index.html +++ b/latest/reduced_order_modeling/projection_reduction_errors/index.html @@ -2,4 +2,4 @@ Projection and Reduction Error · GeometricMachineLearning.jl

Projection and Reduction Errors of Reduced Models

Two errors that are of great importance in reduced order modeling are the projection error and the reduction error. During training one typically aims to minimize the projection error, but for the actual application of the model the reduction error is often more important.

Projection Error

The projection error measures how well a reduced basis, represented by the reduction $\mathcal{P}$ and the reconstruction $\mathcal{R}$, can represent the data with which it is built. In mathematical terms:

\[e_\mathrm{proj}(\mu) := \frac{|| \mathcal{R}\circ\mathcal{P}(M) - M ||}{|| M ||},\]

where $||\cdot||$ is the Frobenius norm (one could also optimize for different norms).
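A minimal sketch of this computation for a POD basis built from a snapshot matrix M; the random data and all names are illustrative.

using LinearAlgebra

M = randn(100, 50)                    # snapshot matrix (placeholder data)
V = svd(M).U[:, 1:5]                  # reduced basis
𝒫(x) = V' * x                         # reduction
ℛ(z) = V * z                          # reconstruction

e_proj = norm(ℛ(𝒫(M)) - M) / norm(M)  # norm(::Matrix) is the Frobenius norm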

Reduction Error

The reduction error measures how far the reduced system diverges from the full-order system during integration (online stage). In mathematical terms (and for a single initial condition):

\[e_\mathrm{red}(\mu) := \sqrt{ \frac{\sum_{t=0}^K|| \mathbf{x}^{(t)}(\mu) - \mathcal{R}(\mathbf{x}^{(t)}_r(\mu)) ||^2}{\sum_{t=0}^K|| \mathbf{x}^{(t)}(\mu) ||^2} -},\]

where $\mathbf{x}^{(t)}$ is the solution of the FOM at point $t$ and $\mathbf{x}^{(t)}_r$ is the solution of the ROM (in the reduced basis) at point $t$. The reduction error, as opposed to the projection error, not only measures how well the solution manifold is represented by the reduced basis, but also measures how well the FOM dynamics are approximated by the ROM dynamics (via the induced vector field on the reduced basis).

+},\]

where $\mathbf{x}^{(t)}$ is the solution of the FOM at point $t$ and $\mathbf{x}^{(t)}_r$ is the solution of the ROM (in the reduced basis) at point $t$. The reduction error, as opposed to the projection error, not only measures how well the solution manifold is represented by the reduced basis, but also measures how well the FOM dynamics are approximated by the ROM dynamics (via the induced vector field on the reduced basis).
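A hedged sketch of the reduction error for precomputed trajectories; here X_fom is placeholder FOM data and X_rom is simply its projection, whereas in practice X_rom would come from integrating the reduced system.

using LinearAlgebra

N, n, K = 100, 5, 20
V = svd(randn(N, N)).U[:, 1:n]            # reduced basis (placeholder)
X_fom = randn(N, K + 1)                   # FOM trajectory, columns t = 0, …, K (placeholder)
X_rom = V' * X_fom                        # ROM trajectory in the reduced basis (placeholder)
ℛ(z) = V * z                              # reconstruction

e_red = sqrt(sum(norm(X_fom[:, j] - ℛ(X_rom[:, j]))^2 for j in axes(X_fom, 2)) /
             sum(norm(X_fom[:, j])^2 for j in axes(X_fom, 2)))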

diff --git a/latest/reduced_order_modeling/symplectic_autoencoder/index.html b/latest/reduced_order_modeling/symplectic_autoencoder/index.html index 395f44890..1d2c723cd 100644 --- a/latest/reduced_order_modeling/symplectic_autoencoder/index.html +++ b/latest/reduced_order_modeling/symplectic_autoencoder/index.html @@ -9,4 +9,4 @@ \hat{p}_2(t_0) & \hat{p}_2(t_1) & \ldots & \hat{p}_2(t_f) \\ \ldots & \ldots & \ldots & \ldots \\ \hat{p}_{N}(t_0) & \hat{p}_{N}(t_1) & \ldots & \hat{p}_{N}(t_f) \\ -\end{array}\right],\]

then $\Phi$ can be computed in a very straightforward manner:

  1. Rearrange the rows of the matrix $M$ such that we end up with an $N\times2(f+1)$ matrix: $\hat{M} := [M_q, M_p]$.
  2. Perform SVD: $\hat{M} = U\Sigma{}V^T$; set $\Phi\gets{}U\mathtt{[:,1:n]}$.

For details on the cotangent lift (and other methods for linear symplectic model reduction) consult [11].

Symplectic Autoencoders

PSD suffers from similar shortcomings as regular POD: it is a linear map and the approximation space $\tilde{\mathcal{M}}= \{\Psi^\mathrm{dec}(z_r)\in\mathbb{R}^{2N}:z_r\in\mathbb{R}^{2n}\}$ is strictly linear. For problems with slowly-decaying Kolmogorov $n$-width this leads to very poor approximations.

In order to overcome this difficulty we use neural networks, more specifically SympNets, together with cotangent lift-like matrices. The resulting architecture, the symplectic autoencoder, is shown in the following image:

So we alternate between SympNet and PSD layers. Because all the PSD layers are based on matrices $\Phi\in{}St(n,N)$ we have to optimize on the Stiefel manifold.

References

[10]
P. Buchfink, S. Glas and B. Haasdonk. Symplectic model reduction of Hamiltonian systems on nonlinear manifolds and approximation with weakly symplectic autoencoder. SIAM Journal on Scientific Computing 45, A289–A311 (2023).
[11]
L. Peng and K. Mohseni. Symplectic model reduction of Hamiltonian systems. SIAM Journal on Scientific Computing 38, A1–A27 (2016).
+\end{array}\right],\]

then $\Phi$ can be computed in a very straightforward manner:

  1. Rearrange the rows of the matrix $M$ such that we end up with an $N\times2(f+1)$ matrix: $\hat{M} := [M_q, M_p]$.
  2. Perform SVD: $\hat{M} = U\Sigma{}V^T$; set $\Phi\gets{}U\mathtt{[:,1:n]}$.

For details on the cotangent lift (and other methods for linear symplectic model reduction) consult [11].
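A minimal sketch of these two steps for a snapshot matrix that stacks the $\hat{q}$ rows on top of the $\hat{p}$ rows; the sizes and variable names are illustrative.

using LinearAlgebra

N, f, n = 50, 30, 4
M = randn(2N, f + 1)                    # snapshot matrix (placeholder): q-rows on top, p-rows below
Mhat = hcat(M[1:N, :], M[N+1:2N, :])    # step 1: rearrange into the N×2(f+1) matrix [M_q, M_p]
Φ = svd(Mhat).U[:, 1:n]                 # step 2: SVD and truncation to the first n left singular vectors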

Symplectic Autoencoders

PSD suffers from similar shortcomings as regular POD: it is a linear map and the approximation space $\tilde{\mathcal{M}}= \{\Psi^\mathrm{dec}(z_r)\in\mathbb{R}^{2N}:z_r\in\mathbb{R}^{2n}\}$ is strictly linear. For problems with slowly-decaying Kolmogorov $n$-width this leads to very poor approximations.

In order to overcome this difficulty we use neural networks, more specifically SympNets, together with cotangent lift-like matrices. The resulting architecture, the symplectic autoencoder, is shown in the following image:

So we alternate between SympNet and PSD layers. Because all the PSD layers are based on matrices $\Phi\in{}St(n,N)$ we have to optimize on the Stiefel manifold.

References

[10]
P. Buchfink, S. Glas and B. Haasdonk. Symplectic model reduction of Hamiltonian systems on nonlinear manifolds and approximation with weakly symplectic autoencoder. SIAM Journal on Scientific Computing 45, A289–A311 (2023).
[11]
L. Peng and K. Mohseni. Symplectic model reduction of Hamiltonian systems. SIAM Journal on Scientific Computing 38, A1–A27 (2016).
diff --git a/latest/references/index.html b/latest/references/index.html index a6612441b..d236b66c4 100644 --- a/latest/references/index.html +++ b/latest/references/index.html @@ -1,2 +1,2 @@ -References · GeometricMachineLearning.jl

References

[1]
P. Jin, Z. Zhang, A. Zhu, Y. Tang and G. E. Karniadakis. SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems. Neural Networks 132, 166–179 (2020).
[2]
E. Hairer, C. Lubich and G. Wanner. Geometric Numerical integration: structure-preserving algorithms for ordinary differential equations (Springer, 2006).
[3]
B. Leimkuhler and S. Reich. Simulating hamiltonian dynamics. No. 14 (Cambridge university press, 2004).
[4]
P. Jin, Z. Lin and B. Xiao. Optimal unit triangular factorization of symplectic matrices. Linear Algebra and its Applications (2022).
[5]
S. Lipschutz. General Topology (McGraw-Hill Book Company, 1965).
[6]
S. Lang. Fundamentals of differential geometry. Vol. 191 (Springer Science & Business Media, 2012).
[7]
R. L. Bishop and S. I. Goldberg. Tensor Analysis on Manifolds (Dover Publications, 1980).
[8]
P.-A. Absil, R. Mahony and R. Sepulchre. Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Applicandae Mathematica 80, 199–220 (2004).
[9]
J. Nocedal and S. J. Wright. Numerical optimization (Springer Science+Business Media, 2006).
[10]
P. Buchfink, S. Glas and B. Haasdonk. Symplectic model reduction of Hamiltonian systems on nonlinear manifolds and approximation with weakly symplectic autoencoder. SIAM Journal on Scientific Computing 45, A289–A311 (2023).
[11]
L. Peng and K. Mohseni. Symplectic model reduction of Hamiltonian systems. SIAM Journal on Scientific Computing 38, A1–A27 (2016).
[12]
C. Greif and K. Urban. Decay of the Kolmogorov N-width for wave problems. Applied Mathematics Letters 96, 216–222 (2019).
[13]
T. Blickhan. A registration method for reduced basis problems using linear optimal transport, arXiv preprint arXiv:2304.14884 (2023).
[14]
K. Lee and K. T. Carlberg. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. Journal of Computational Physics 404, 108973 (2020).
[15]
B. Brantner. Generalizing Adam To Manifolds For Efficiently Training Transformers, arXiv preprint arXiv:2305.16901 (2023).
[16]
T. Lin and H. Zha. Riemannian manifold learning. IEEE transactions on pattern analysis and machine intelligence 30, 796–809 (2008).
[17]
T. Blickhan. BrenierTwoFluids.jl, https://github.com/ToBlick/BrenierTwoFluids (2023).
[18]
S. Fresca, L. Dede’ and A. Manzoni. A comprehensive deep learning-based approach to reduced order modeling of nonlinear time-dependent parametrized PDEs. Journal of Scientific Computing 87, 1–36 (2021).
[19]
M.-T. Luong, H. Pham and C. D. Manning. Effective approaches to attention-based neural machine translation, arXiv preprint arXiv:1508.04025 (2015).
[20]
D. Bahdanau, K. Cho and Y. Bengio. Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).
[21]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin. Attention is all you need. Advances in neural information processing systems 30 (2017).
[22]
B. Brantner and M. Kraus. Symplectic autoencoders for Model Reduction of Hamiltonian Systems, arXiv preprint arXiv:2312.10004 (2023).
[23]
B. Brantner, G. de Romemont, M. Kraus and Z. Li. Structure-Preserving Transformers for Learning Parametrized Hamiltonian Systems, arXiv preprint arXiv:2312.11166 (2023).
[24]
T. Frankel. The geometry of physics: an introduction (Cambridge university press, Cambridge, UK, 2011).
[25]
I. Goodfellow, Y. Bengio and A. Courville. Deep learning (MIT press, Cambridge, MA, 2016).
[26]
P.-A. Absil, R. Mahony and R. Sepulchre. Optimization algorithms on matrix manifolds (Princeton University Press, Princeton, New Jersey, 2008).
[27]
T. Bendokat, R. Zimmermann and P.-A. Absil. A Grassmann manifold handbook: Basic geometry and computational aspects, arXiv preprint arXiv:2011.13699 (2020).
[28]
W. S. Moses, V. Churavy, L. Paehler, J. Hückelheim, S. H. Narayanan, M. Schanen and J. Doerfert. Reverse-Mode Automatic Differentiation and Optimization of GPU Kernels via Enzyme. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '21 (Association for Computing Machinery, New York, NY, USA, 2021).
[29]
M. Betancourt. A geometric theory of higher-order automatic differentiation, arXiv preprint arXiv:1812.11592 (2018).
[30]
J. Bolte and E. Pauwels. A mathematical model for automatic differentiation in machine learning. Advances in Neural Information Processing Systems 33, 10809–10819 (2020).
[31]
K. Jacobs. Discrete Stochastics (Birkhäuser Verlag, Basel, Switzerland, 1992).
+References · GeometricMachineLearning.jl

References

[1]
P. Jin, Z. Zhang, A. Zhu, Y. Tang and G. E. Karniadakis. SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems. Neural Networks 132, 166–179 (2020).
[2]
E. Hairer, C. Lubich and G. Wanner. Geometric Numerical integration: structure-preserving algorithms for ordinary differential equations (Springer, 2006).
[3]
B. Leimkuhler and S. Reich. Simulating hamiltonian dynamics. No. 14 (Cambridge university press, 2004).
[4]
P. Jin, Z. Lin and B. Xiao. Optimal unit triangular factorization of symplectic matrices. Linear Algebra and its Applications (2022).
[5]
S. Lipschutz. General Topology (McGraw-Hill Book Company, 1965).
[6]
S. Lang. Fundamentals of differential geometry. Vol. 191 (Springer Science & Business Media, 2012).
[7]
R. L. Bishop and S. I. Goldberg. Tensor Analysis on Manifolds (Dover Publications, 1980).
[8]
P.-A. Absil, R. Mahony and R. Sepulchre. Riemannian geometry of Grassmann manifolds with a view on algorithmic computation. Acta Applicandae Mathematica 80, 199–220 (2004).
[9]
J. Nocedal and S. J. Wright. Numerical optimization (Springer Science+Business Media, 2006).
[10]
P. Buchfink, S. Glas and B. Haasdonk. Symplectic model reduction of Hamiltonian systems on nonlinear manifolds and approximation with weakly symplectic autoencoder. SIAM Journal on Scientific Computing 45, A289–A311 (2023).
[11]
L. Peng and K. Mohseni. Symplectic model reduction of Hamiltonian systems. SIAM Journal on Scientific Computing 38, A1–A27 (2016).
[12]
C. Greif and K. Urban. Decay of the Kolmogorov N-width for wave problems. Applied Mathematics Letters 96, 216–222 (2019).
[13]
T. Blickhan. A registration method for reduced basis problems using linear optimal transport, arXiv preprint arXiv:2304.14884 (2023).
[14]
K. Lee and K. T. Carlberg. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. Journal of Computational Physics 404, 108973 (2020).
[15]
B. Brantner. Generalizing Adam To Manifolds For Efficiently Training Transformers, arXiv preprint arXiv:2305.16901 (2023).
[16]
T. Lin and H. Zha. Riemannian manifold learning. IEEE transactions on pattern analysis and machine intelligence 30, 796–809 (2008).
[17]
T. Blickhan. BrenierTwoFluids.jl, https://github.com/ToBlick/BrenierTwoFluids (2023).
[18]
S. Fresca, L. Dede’ and A. Manzoni. A comprehensive deep learning-based approach to reduced order modeling of nonlinear time-dependent parametrized PDEs. Journal of Scientific Computing 87, 1–36 (2021).
[19]
M.-T. Luong, H. Pham and C. D. Manning. Effective approaches to attention-based neural machine translation, arXiv preprint arXiv:1508.04025 (2015).
[20]
D. Bahdanau, K. Cho and Y. Bengio. Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).
[21]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin. Attention is all you need. Advances in neural information processing systems 30 (2017).
[22]
B. Brantner and M. Kraus. Symplectic autoencoders for Model Reduction of Hamiltonian Systems, arXiv preprint arXiv:2312.10004 (2023).
[23]
B. Brantner, G. de Romemont, M. Kraus and Z. Li. Structure-Preserving Transformers for Learning Parametrized Hamiltonian Systems, arXiv preprint arXiv:2312.11166 (2023).
[24]
T. Frankel. The geometry of physics: an introduction (Cambridge university press, Cambridge, UK, 2011).
[25]
I. Goodfellow, Y. Bengio and A. Courville. Deep learning (MIT press, Cambridge, MA, 2016).
[26]
P.-A. Absil, R. Mahony and R. Sepulchre. Optimization algorithms on matrix manifolds (Princeton University Press, Princeton, New Jersey, 2008).
[27]
T. Bendokat, R. Zimmermann and P.-A. Absil. A Grassmann manifold handbook: Basic geometry and computational aspects, arXiv preprint arXiv:2011.13699 (2020).
[28]
W. S. Moses, V. Churavy, L. Paehler, J. Hückelheim, S. H. Narayanan, M. Schanen and J. Doerfert. Reverse-Mode Automatic Differentiation and Optimization of GPU Kernels via Enzyme. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '21 (Association for Computing Machinery, New York, NY, USA, 2021).
[29]
M. Betancourt. A geometric theory of higher-order automatic differentiation, arXiv preprint arXiv:1812.11592 (2018).
[30]
J. Bolte and E. Pauwels. A mathematical model for automatic differentiation in machine learning. Advances in Neural Information Processing Systems 33, 10809–10819 (2020).
[31]
K. Jacobs. Discrete Stochastics (Birkhäuser Verlag, Basel, Switzerland, 1992).
diff --git a/latest/tikz/Makefile b/latest/tikz/Makefile index deb459382..a85607aa9 100644 --- a/latest/tikz/Makefile +++ b/latest/tikz/Makefile @@ -3,13 +3,8 @@ all: pdf $(MAKE) logo $(MAKE) clean -linux: pdf - $(MAKE) convert_with_pdftocairo - $(MAKE) logo - $(MAKE) clean - -mac: pdf - $(MAKE) convert_with_sips +latex: pdf + $(MAKE) convert_with_pdftocairo res=500 $(MAKE) logo $(MAKE) clean @@ -20,16 +15,10 @@ pdf: $(MYDIR)/*.tex xelatex -shell-escape $${file} ; \ done -# this is converting pdfs to pngs using sips (mac version) -convert_with_sips: $(MYDIR)/*.pdf - for file in $^ ; do \ - sips --setProperty format png --resampleHeightWidthMax 2000 $${file} --out $${file%.*}.png ; \ - done - -# this is converting pdfs to pngs using pdftocairo (linux version) +# this is converting pdfs to pngs using pdftocairo (with a fixed resolution for all images) convert_with_pdftocairo: $(MYDIR)/*.pdf for file in $^ ; do \ - pdftocairo -png -r 500 -transp -singlefile $${file} $${file%.*} ; \ + pdftocairo -png -r $(res) -transp -singlefile $${file} $${file%.*} ; \ done png:
diff --git a/latest/tutorials/grassmann_layer/99748321.svg b/latest/tutorials/grassmann_layer/7125e5d0.svg similarity index 65% rename from latest/tutorials/grassmann_layer/99748321.svg rename to latest/tutorials/grassmann_layer/7125e5d0.svg index fae1f7f5c..45bb45598 100644 (SVG plot markup omitted)
diff --git a/latest/tutorials/grassmann_layer/c5064922.svg b/latest/tutorials/grassmann_layer/8d219c66.svg similarity index 85% rename from latest/tutorials/grassmann_layer/c5064922.svg rename to latest/tutorials/grassmann_layer/8d219c66.svg index 20b494836..fbeb22422 100644 (SVG plot markup omitted)
diff --git a/latest/tutorials/grassmann_layer/12a0e472.svg b/latest/tutorials/grassmann_layer/bb166470.svg similarity index 66% rename from latest/tutorials/grassmann_layer/12a0e472.svg rename to latest/tutorials/grassmann_layer/bb166470.svg index 346cd52ae..9f8b8e7a5 100644 (SVG plot markup omitted)
diff --git a/latest/tutorials/grassmann_layer/index.html b/latest/tutorials/grassmann_layer/index.html index 7bd57c08b..964155927 100644 --- a/latest/tutorials/grassmann_layer/index.html +++ b/latest/tutorials/grassmann_layer/index.html @@ -2,7 +2,7 @@

Example of a Neural Network with a Grassmann Layer

Here we show how to implement a neural network that contains a layer whose weight is an element of the Grassmann manifold and where this might be useful.

To answer where we would need this, consider the following scenario.

Problem statement

We are given data in a big space $\mathcal{D}=[d_i]_{i\in\mathcal{I}}\subset\mathbb{R}^N$ and know these data live on an $n$-dimensional[1] submanifold[2] in $\mathbb{R}^N$. Based on these data we would now like to generate new samples from the distributions that produced our original data. This is where the Grassmann manifold is useful: each element $V$ of the Grassmann manifold is an $n$-dimensional subspace of $\mathbb{R}^N$ from which we can easily sample. We can then construct a (bijective) mapping from this space $V$ onto a space that contains our data points $\mathcal{D}$.

Example

Consider the following toy example: We want to sample from the graph of the (scaled) Rosenbrock function $f(x,y) = ((1 - x)^2 + 100(y - x^2)^2)/1000$ while pretending we do not know the function.

using Plots # assumed import for Surface, surface, plot and scatter3d! (may be hidden in the original docs)
rosenbrock(x::Vector) = ((1.0 - x[1]) ^ 2 + 100.0 * (x[2] - x[1] ^ 2) ^ 2) / 1000
 x, y = -1.5:0.1:1.5, -1.5:0.1:1.5
 z = Surface((x,y)->rosenbrock([x,y]), x, y)
-p = surface(x,y,z; camera=(30,20), alpha=.6, colorbar=false, xlims=(-1.5, 1.5), ylims=(-1.5, 1.5), zlims=(0.0, rosenbrock([-1.5, -1.5])))
Example block output

We now build a neural network whose task it is to map a product of two Gaussians $\mathcal{N}(0,1)\times\mathcal{N}(0,1)$ onto the graph of the Rosenbrock function where the range for $x$ and for $y$ is $[-1.5,1.5]$.

For computing the loss between the two distributions, i.e. $\Psi(\mathcal{N}(0,1)\times\mathcal{N}(0,1))$ and $f([-1.5,1.5], [-1.5,1.5])$ we use the Wasserstein distance[3].

using GeometricMachineLearning, Zygote, BrenierTwoFluid
+p = surface(x,y,z; camera=(30,20), alpha=.6, colorbar=false, xlims=(-1.5, 1.5), ylims=(-1.5, 1.5), zlims=(0.0, rosenbrock([-1.5, -1.5])))
Example block output

We now build a neural network whose task it is to map a product of two Gaussians $\mathcal{N}(0,1)\times\mathcal{N}(0,1)$ onto the graph of the Rosenbrock function where the range for $x$ and for $y$ is $[-1.5,1.5]$.

For computing the loss between the two distributions, i.e. $\Psi(\mathcal{N}(0,1)\times\mathcal{N}(0,1))$ and $f([-1.5,1.5], [-1.5,1.5])$ we use the Wasserstein distance[3].

using GeometricMachineLearning, Zygote, BrenierTwoFluid
 import Random # hide
 Random.seed!(123)
 
@@ -50,7 +50,7 @@
     loss_array[i] = val
     optimization_step!(optimizer, model, nn.params, dp)
 end
-plot(loss_array, xlabel="training step", label="loss")
Example block output

Now we plot a few points to check how well they match the graph:

const number_of_points = 35
+plot(loss_array, xlabel="training step", label="loss")
Example block output

Now we plot a few points to check how well they match the graph:

const number_of_points = 35
 
 coordinates = nn(randn(2, number_of_points))
-scatter3d!(p, [coordinates[1, :]], [coordinates[2, :]], [coordinates[3, :]], alpha=.5, color=4, label="mapped points")
Example block output
  • 1We may know $n$ exactly or approximately.
  • 2Problems and solutions related to this scenario are commonly summarized under the term manifold learning (see [16]).
  • 3The implementation of the Wasserstein distance is taken from [17].
diff --git a/latest/tutorials/linear_wave_equation/index.html b/latest/tutorials/linear_wave_equation/index.html index 4ef776910..1753120e1 100644 --- a/latest/tutorials/linear_wave_equation/index.html +++ b/latest/tutorials/linear_wave_equation/index.html @@ -9,4 +9,4 @@

\[h(s) = \begin{cases}
1 - \frac{3}{2}s^2 + \frac{3}{4}s^3 & \text{if } 0 \leq s \leq 1 \\
\frac{1}{4}(2 - s)^3 & \text{if } 1 < s \leq 2 \\
0 & \text{else.}
\end{cases}\]

Plotted on the relevant domain it looks like this:

Example block output

Taking the above function $h(s)$ as a starting point, the initial conditions for the linear wave equation are now constructed under the following considerations:

  • the initial condition (i.e. the shape of the wave) should depend on the parameter of the vector field, i.e. $u_0(\mu)(\omega) = h(s(\omega, \mu))$;
  • the solutions of the linear wave equation travel with speed $\mu$, and we should make sure that the wave does not touch the right boundary of the domain, i.e. $0.5$; the peak should therefore be sharper for higher values of $\mu$, as the wave travels faster;
  • the wave should start at the left boundary of the domain, i.e. at the point $-0.5$, so as to cover as much of the domain as possible.

Based on this we end up with the following choice of parametrized initial conditions:

\[u_0(\mu)(\omega) = h(s(\omega, \mu)), \quad s(\omega, \mu) = 20 \mu |\omega + \frac{\mu}{2}|.\]
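
As a quick sanity check, this construction can be evaluated directly. The following is a minimal sketch in plain Julia, independent of the package code of this tutorial; the spatial domain $[-0.5, 0.5]$ and the value $\mu = 0.5$ are chosen for illustration only.

function h(s)
    if 0 ≤ s ≤ 1
        return 1 - (3/2) * s^2 + (3/4) * s^3
    elseif 1 < s ≤ 2
        return (1/4) * (2 - s)^3
    else
        return 0.0
    end
end

s(ω, μ) = 20μ * abs(ω + μ/2)
u₀(ω, μ) = h(s(ω, μ))

ω_grid = range(-0.5, 0.5; length = 256)
u_init = [u₀(ω, 0.5) for ω in ω_grid]   # parametrized initial condition for μ = 0.5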

References

[10]
P. Buchfink, S. Glas and B. Haasdonk. Symplectic model reduction of Hamiltonian systems on nonlinear manifolds and approximation with weakly symplectic autoencoder. SIAM Journal on Scientific Computing 45, A289–A311 (2023).
[11]
L. Peng and K. Mohseni. Symplectic model reduction of Hamiltonian systems. SIAM Journal on Scientific Computing 38, A1–A27 (2016).
[12]
C. Greif and K. Urban. Decay of the Kolmogorov N-width for wave problems. Applied Mathematics Letters 96, 216–222 (2019).
  • 1This conserves the Hamiltonian structure of the system.
diff --git a/latest/tutorials/mnist_tutorial/index.html b/latest/tutorials/mnist_tutorial/index.html index 620845895..8fd2992fe 100644 --- a/latest/tutorials/mnist_tutorial/index.html +++ b/latest/tutorials/mnist_tutorial/index.html @@ -19,4 +19,4 @@

loss_array = optimizer_instance(nn, dl, batch, n_epochs)
 println("final test accuracy: ", accuracy(Ψᵉ, ps, dl_test), "\n")

It is instructive to play with n_layers, n_epochs and the Stiefel property.
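
A minimal way to do so is sketched below. It reuses only the calls that already appear above (optimizer_instance and accuracy) and assumes that nn, dl, batch, Ψᵉ, ps and dl_test are defined as in the earlier, elided setup of this tutorial; since nn keeps its parameters between calls, the loop amounts to cumulative training with intermediate accuracy checks.

for epoch_chunk in (1, 2, 2)                          # 1 epoch, then 2 more, then 2 more
    optimizer_instance(nn, dl, batch, epoch_chunk)    # continue training the same network
    println("test accuracy so far: ", accuracy(Ψᵉ, ps, dl_test))
end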

[15]
B. Brantner. Generalizing Adam To Manifolds For Efficiently Training Transformers, arXiv preprint arXiv:2305.16901 (2023).
+println("final test accuracy: ", accuracy(Ψᵉ, ps, dl_test), "\n")

It is instructive to play with n_layers, n_epochs and the Stiefel property.

[15]
B. Brantner. Generalizing Adam To Manifolds For Efficiently Training Transformers, arXiv preprint arXiv:2305.16901 (2023).
diff --git a/latest/tutorials/sympnet_tutorial/index.html b/latest/tutorials/sympnet_tutorial/index.html index 0060fcf62..4685c89b6 100644 --- a/latest/tutorials/sympnet_tutorial/index.html +++ b/latest/tutorials/sympnet_tutorial/index.html @@ -38,405 +38,371 @@

# perform training (returns array that contains the total loss for each training step)
g_loss_array = g_opt(g_nn, dl, batch, nepochs)
 la_loss_array = la_opt(la_nn, dl, batch, nepochs)

Progress: 100%|█████████████████████████████████████████| Time: 0:00:17
  TrainingLoss:  0.005070411268189842

Progress: 100%|█████████████████████████████████████████| Time: 0:00:19
  TrainingLoss:  0.011774418870212931

We can also plot the training errors against the epoch (here the $y$-axis is in log-scale):

using Plots
 p1 = plot(g_loss_array, xlabel="Epoch", ylabel="Training error", label="G-SympNet", color=3, yaxis=:log)
plot!(p1, la_loss_array, label="LA-SympNet", color=2)
Example block output

The train function changes the parameters of the neural networks and returns a vector containing the evolution of the loss function during training. The default values for the arguments ntraining and batch_size are $1000$ and $10$, respectively.

The training data data_q and data_p must be matrices in $\mathbb{R}^{n\times d}$, where $n$ is the number of time steps in the data and $d$ is half the dimension of the system, i.e. data_q[i,j] is $q_j(t_i)$, where $(t_1,...,t_n)$ are the times of the training data.
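
As a small illustration of this layout (the names data_q and data_p come from the text above, while the sine/cosine trajectory is made up for the example and is not the tutorial's data):

n, d = 200, 1                          # n time steps, d = half the dimension of the system
ts = range(0, 4π; length = n)
data_q = reshape(sin.(ts), n, d)       # data_q[i, j] = qⱼ(tᵢ)
data_p = reshape(cos.(ts), n, d)       # data_p[i, j] = pⱼ(tᵢ)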

Then we can make predictions. Let's compare the initial data with a prediction starting from the same phase space point, using the provided function Iterate_Sympnet:

ics = (q=qp_data.q[:,1], p=qp_data.p[:,1])
 
 steps_to_plot = 200
 
@@ -524,4 +490,4 @@
 using Plots
 p2 = plot(qp_data.q'[1:steps_to_plot], qp_data.p'[1:steps_to_plot], label="training data")
 plot!(p2, la_trajectory.q', la_trajectory.p', label="LA Sympnet")
plot!(p2, g_trajectory.q', g_trajectory.p', label="G Sympnet")
Example block output

We see that GSympNet gives an almost perfect match on the training data, whereas LASympNet cannot even properly replicate the training data. It also takes longer to train LASympNet.
