
Added docs for ROM, added docs for Cayley (and optimized routine), simplified calculation in attention layer. #76

Merged · 41 commits · Sep 20, 2023
f8018a3
Changed Adam Optimizer visualization to two-step process.
benedict-96 Sep 12, 2023
107e27f
Added description of multihead attention layer.
benedict-96 Sep 12, 2023
d658341
Got rid of the assign_matrix and assign_tensor functions. Hope this s…
benedict-96 Sep 12, 2023
7061da1
Option to use LogExpFunctions.softmax => doesn't seem to be working.
benedict-96 Sep 12, 2023
d460f00
Included this rescaling factor in the computation of the softmax.
benedict-96 Sep 12, 2023
86fc724
Now performing two plots (as in the paper).
benedict-96 Sep 12, 2023
4613a02
Added a tutorial and started describing the data loader.
benedict-96 Sep 13, 2023
955b005
Added the new files.
benedict-96 Sep 13, 2023
1e93672
Copied the Lux pipeline for now.
benedict-96 Sep 13, 2023
dd10a62
Started adding comments on the Cayley transform and the geodesic retr…
benedict-96 Sep 13, 2023
296247e
Added computation of cayley function we are using.
benedict-96 Sep 13, 2023
9a48c5b
Made Cayley retraction work.
benedict-96 Sep 13, 2023
0762657
Added an option to output the test accuracy (for image classification…
benedict-96 Sep 13, 2023
074aaed
Added new files for cayley and geodesic.
benedict-96 Sep 13, 2023
f5ccd9f
Added description of general retractions.
benedict-96 Sep 13, 2023
0e4d995
Added test for cayley retraction.
benedict-96 Sep 13, 2023
c388f1c
Added Cayley retraction to test.
benedict-96 Sep 13, 2023
5ee8f5d
Moved file to src.
benedict-96 Sep 13, 2023
81228e7
Changed path to tutorial.
benedict-96 Sep 13, 2023
596b4f5
Moved directory.
benedict-96 Sep 13, 2023
337d3a4
Fixed problem with Cayley transform.
benedict-96 Sep 13, 2023
4d6bdbc
Changed argument input name to lower case retraction.
benedict-96 Sep 13, 2023
6c1f3d9
Added a script to generate data as in the weakly symplectic paper.
benedict-96 Sep 14, 2023
2c4a71b
Added routine for assigning a batch in the matrix case.
benedict-96 Sep 14, 2023
b3a4d1d
Added assign_tensor file.
benedict-96 Sep 14, 2023
b84b8ba
Added new constructor for matrix input.
benedict-96 Sep 14, 2023
2f43c39
Added new data loader functionality for autoencoders.
benedict-96 Sep 15, 2023
db76b11
Fixed typos. Training data now correct.
benedict-96 Sep 15, 2023
e4487d7
Script for training symplectic autoencoder.
benedict-96 Sep 15, 2023
a4c77d2
Started adding description(s) of autoencoder and symplectic autoencod…
benedict-96 Sep 17, 2023
d9ec7f5
Moved files to new directory.
benedict-96 Sep 18, 2023
2324392
Added solution manifold.
benedict-96 Sep 18, 2023
23bf602
Moved file.
benedict-96 Sep 18, 2023
bdc9387
Fixed test (forgot to divide a few routines with the sqrt factor befo…
benedict-96 Sep 18, 2023
b63d60f
Started adding description to struct TrainingData because this was ca…
benedict-96 Sep 18, 2023
07136aa
Added kolmogorov n-width.
benedict-96 Sep 18, 2023
6fe36b3
Added PSD and started adding symplectic autoencoders.
benedict-96 Sep 18, 2023
94108d1
Added short description of Kolmogorov n-width.
benedict-96 Sep 18, 2023
d0e1e61
Added short description of snapshot matrix.
benedict-96 Sep 18, 2023
782778e
Images for symplectic autoencoder and mha.
benedict-96 Sep 18, 2023
2f51853
Added image and some description for symplectic autoencoder.
benedict-96 Sep 18, 2023
68 changes: 68 additions & 0 deletions .buildkite/pipeline.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
steps:
- label: ":julia: Julia {{matrix.julia}} + CUDA GPU"
plugins:
- JuliaCI/julia#v1:
version: "{{matrix.julia}}"
- JuliaCI/julia-test#v1:
test_args: "--quickfail"
- JuliaCI/julia-coverage#v1:
codecov: true
dirs:
- src
- ext
agents:
queue: "juliagpu"
cuda: "*"
env:
GROUP: "CUDA"
if: build.message !~ /\[skip tests\]/
timeout_in_minutes: 240
matrix:
setup:
julia:
- "1"
- "1.6"
- "nightly"
adjustments:
- with:
julia: "1.6"
soft_fail: true
- with:
julia: "nightly"
soft_fail: true
- label: "Documentation"
plugins:
- JuliaCI/julia#v1:
version: "1"
- JuliaCI/julia-coverage#v1:
codecov: true
dirs:
- src
- ext
command: |
julia --project --code-coverage=user --color=yes --threads=3 -e '
println("--- :julia: Instantiating project")
using Pkg
Pkg.instantiate()
Pkg.activate("docs")
Pkg.develop(PackageSpec(path=pwd()))
Pkg.instantiate()
println("+++ :julia: Building documentation")

Pkg.activate("docs")
include("docs/tutorials.jl")
Pkg.activate("docs")
include("docs/make.jl")'
agents:
queue: "juliagpu"
cuda: "*"
env:
DATADEPS_ALWAYS_ACCEPT: true
JULIA_DEBUG: "Documenter"
GKSwstype: "100" # https://discourse.julialang.org/t/generation-of-documentation-fails-qt-qpa-xcb-could-not-connect-to-display/60988
if: build.message !~ /\[skip docs\]/ && !build.pull_request.draft
timeout_in_minutes: 240

env:
SECRET_CODECOV_TOKEN: "jQ0BMTQgyZx7QGyU0Q2Ec7qB9mtE2q/tDu0FsfxvEG7/zOAGvXkyXrzIFFOQxvDoFcP+K2+hYZKMxicYdNqzr5wcxu505aNGN2GM3wyegAr+hO6q12bCFYx6qXzU9FLCCdeqINqn9gUSSOlGtWNFrbAlrTyz/D4Yo66TqBDzvaLL63FMnhCLaXW/zJt3hNuEAJaPY2O6Ze1rX2WZ3Y+i+s3uQ8aLImtoCJhPe8CRx+OhuYiTzGhynFfGntZ0738/1RN4gNM0S/hTC4gLE7XMVBanJpGh32rFaiDwW4zAyXKBrDkL3QA3MS1RvLTJxGJ085S16hCk0C4ddAhZCvIM9Q==;U2FsdGVkX1+bXdFeKMs5G79catOCyby2n07A2fg0FjVAvrjQLZ0yfvDS4paJiFikLkodho0khz2YALKb2Y0K6w=="
SECRET_DOCUMENTER_KEY: "iRC4P/r5o9pARB670eK9jPlKQKgkTMDAyvp2GbLG8WwLuT8T1VcWx/o4+ofGlzbTh5Z+LuFgPXfgqkjGuoWLcocHNm78xQMNMywB4rcLB2shqp8xG2vhglgnTBBS4EiyPAtVqGyi5AKmfF95PfkJvnI0Lqg5P/RWQvNGywLAR0Ikgr/lqocm2CvkFGbpMzpGxGvj76JYOusVeKvGAp698TXqPabSZR2oZQLfYnEZnaO8ivkqvMGQSXfgzoIMjCOrN1rSa84SWeI9BDeBslzDHwaYGlvjpfCyviiLtKj4t5Acl1gVE0qxxZxWuALIU6z+C1W8TbW7ZDCBUFs6UTIT+Q==;U2FsdGVkX1+/HSgg1skLszz835vSO6mEtXMhG62ohQQUc5opdo7kEIAG2wCoJPQrqGyaF9kKDVvrN5G2MdjUyaLBYlv90RzXhjTiMNFdgI3M4K500xKq3itY/aEL7hUSMRKxTos8u4xhdbRboY4rPcqgtCJ2LHEjNxmml/NfEo/8lk291rGoEYQLTvKP9cuo4enmEVVRhqmabBzt1MDz0m4c8RufJWW2Ni4osaKRkYPjl/ijJ38wvRUZIiyCX7uofh+3iCKWn0111q5xFhn256Pm79Cx2ZP+yTp9sMsVNMJZ3UJ5r18F3H+zFHWWQSoiWpHn2WNB/2VUEyt0Lp1LnogKru96P2oYkXi6kqrA+qlLISUUU7R7ggJU0IRS6MjSGDyVzlaZG8m+RmY0bmQKrDwSeq1JMGkBpjwPY1o4yOnFRB7Rj1bzToLtd2IFSa8x0a2dUSyL5pBlyWklzZCxPp05R53RNSOi2KfhNfdZU2H7xEj5+z2aV5OidzowXIyYH8FlusMdk3NAOsvTbmBGiwvN4Zub9Exli06ZwARu/oJHLRh+hgOErIJ7DoX6nPrAtofSy6Etydpt+c4HkVZtGPWFSTMNWIGNx2NB1IfveOTU60H5emQ7zow5grXz4VTczqvCIh2hoQdSR4Oplr6+tDDLhtcGGHchHt473o2ygQ1m1tg7oSvMN7jmkUV1N6GniQofmlbr8d5LK4i/QtfC5GHCKIg3ohRlDvuvvKzvVWofgHX3NhXFTKK/CWAIp76iOaCWJcI562SpKyn+pFqYKpatJ42WfF3VbNpJYVMYMai5BwAE2RyZ6FhHbsaHq/NXO/dRJwHeDm4Pc/LFlGFdzpdbuf+w2DoePc56PlNmKsLNlZVlwbWcExKttI8nz3Th3aHNNtbIbD9awf1RdDspudQrTPWkyEopDVm7TkOj/J891U5p24PF5dasIJR19Tqpic3LVJuBXYRbL/Z79VRjeE3wBGLTDdhzJMA8TrS+yMSCF80bIw/F44o4WbA3Ya425mph9MIt/a137osRKATYqbustmVW/LfIyVhuHCOCRQsqTyFU+ff6Tp0EE2i1du90wosr+UutXiubYphCmuKkZONPbiXjpW1CAi40iAwxfgOVqAl13y4FlUp4EiGS7hPBUbvvEXMqT3ssfL+mlideH/v08PQCRcyG03zcCjCTmjXCggqHd+eEXhnsNZ4PFKCKiN+znR5SW+/p+kJTaBrX2e/kMU6kzjwb4NyNmZie0hHSneVtwJ1FuXJk/Zph4quv5KugCCx21xb5pePqxfKRW5jtW6r2Rc7OSNN4BHjwAcj8fOVV+12Ak7//o8mRh0aveYfoEvjCdaI8OPfjduDGfmzPUvXiqV9kGpovdlDUATyoVa3l1CowJ5r8KDOD6Ps89OG7TV2c7Wzxq2FQVjMFXxv/4wMZR1F/0zyH+ofPLVZjK3039z35GD4uoOW9Uc7WSr4FbxxuCDwOXWgstuk3rk6ASZFSe7RIwE/Y16d/aqzI+LG8pHqaEdhg6o6Y6JxBYNQo/JoglUOHwD+N5g5n9vfBNzf0xTlE/r0yjO3LCHyWzCnWr3QdKgzm6EDyL8GO+yQIbtXtw6lRQB/UEZ+ayt175r08Yhe
y95IsPwLVDFRRlG6pYwmzTlQOEwvqDI8SDMWboU+jp6a5jrbaAmqiIkaoiIzrV1QDp1x+Sqj0veqN+RtcpXLawJevz8dm76H+Mmp1br61nwvGcBaOKukICVj3iLeeu5tV5NoEJznWPwveHrcarZtKvOOeJbydmNAz286i0F1ocX337dt17jIkRv9sHbfqAVapob+eT7F3N/UY99GWGDVbXzaruQwsuPPR6MbLolG6buHQaKX3OZ/zJqGWfEAHw5yJKoKNe8aSgY2DsoITqPlbNRQQmOIMuF8ffD8L1stD/P5Ohth5Nql2W+l6y87/nqxkJ9y4FFS4QzrMrl9ztugfsRoYyeSWRydLUHlTCv155VsGAxjCMBQg1rP99Smfd02EbCFlWlypIw/zem0LZ1zVuz/Wjb03n+dzi2GIKRlTrt6YMrGGAcKI+3Pf1D0rsDhXNkdFUjOeofUkDbBr/splYCKLucDHFVdN88XyaQoj2fBymNJ4BqvK64TVOLwPGAQvh/rHZ5PkJR3lMI4fg+Kxdl9/5xDjkD9aV+yRvfqVGodNW/qofq34nrdb3co1tZ4BxtSANKdJg3Fv6U0I4DOMVsJTeOn/918M31rif0rKAwnHAkeyQVbZyEsFoqxvE8gUFs1zTRwZJWlmY0xnuVcM8pOh6hULeYGiF57ZlbvymygYqObe58YgrChRnF4NhKIIYzuz7mOSKRXqF3Cr0LNYHcktUH9wrqISxiHbaUQceYZ1D0q8UfiayeK9yppMkltcDUL9M93xjTGJK8pVzARXn6ETuEsNTtLvbU/KMDY7bnVc7n08suLCk1YeJB/sn0wuTbPt+27NeYIG1YXBEE0dsgJW4z64489h71v4xws856gFOHZx0L/nkW7l328HA3jltbgJFl52mQHAJwUZrt5sJef/k7gsTdX1zQtjKN8lFjo4qpvJUpenmO9nT+Wty5cjohlETBos8CdSqj4SjEu7/UhDt52evt33EayoWJ8TjKd4VRFYCXnM6eGnSMDqUU5f7DxVjrwHnT26jtq9ijKTiAxls7fYjN8TGT/S3CHZZAK1u5gSbWfkFOcE+mioboNwDvuvysjL6de+bsc7r35w4hLFnPmKemcde4pNQfEnuelBFJqwYZbcAkhN8AmtqIWPXBw9n3eUx/TJgMFEIoB/frNDRbB0WJKdBkjdE1NVvAUl3jDnZbWjG6rqE+6UvyGqKBpd0FRYAfg3ss3hVB70uluULKUBVazlNIQlqX+qYEMBXaDIkxcftre8KYebQyJnxiOB5V+eELvm6L28bK4Xh2tpXzJL7aDlQnL8dRNvQdZgDL62EXYhrc3mz0I/p7br3KMcnei/LaPRAgcsW7WKLwzE5id6JnpOJj4VXdkX7IUB4xQjDRsGKxhjbklMVFA8g/801khNlwzU/IoXsHBgTs7yZoFX/oo4Jyp514hwqPlvJEgci0OHiSA6Mx3le2nUh0SQH+AzFJ2vi7Bn1a4psiuqd+vJJ1iuNw5CBCZlV+GO8sG93BBGnLzZDoRvkIMbzwESFP3JYZ/lKs29CB2Adobl9YbwP3he0I9cD0A/RPC70gzTdVEfL6T4iPUhBr1Bn3YlUPeC2QvCTbpKkxDsfzchuq/y0xlmL4E7Rdb+4TSMlViXfnc6aoD9vvPMWLJFF2qrxRLKhUTse5V6RoE+EVmHSiX0Vd7sd/bYp7asOC0b1xL+zjfJ5DSrtMA/P8L1p+CoLNXgVfgzCB3sCa+GLSLS2INsL1Qtnfkl8IGaMDeV+VAyHjY0HCj0l1X99f/RzD6TYrZAkLS8h1EM/JjomglhVG9/HTKS20BBJeos5ifrVd38rhONJy0HCP28pn4rCIyIE4bNG+1tEsHAg4FDYgh/OYuBsaGYgha9TGV5lGIxmVCECq3IPpkPN1CsLqv3KuDvNeH6XOOAzVtFj4VoIV6QgRLP8+94ZiiEDaPQxQ7BZoqrqFYrxWHDtEuon46VtQ3Nfq/1Rq/Hvsz
Jv6JE77w7qvKlxG9sXgxzCDRqNrG83cwY2hpDBr8U0hPMrEx977Weja1aG/rG6uirNBcY5qAAOLDo+9RvV1xqvWFF8SkT97tzNUHbzw8tuUlCT9m4rshCG+jBw59rpUZwW+eR1ih9qU7Nyr3oNgi/zmkORF1duym8VSfW5dxtRBIqxxM0oSWoHti+HSd0VLdHw8jRpbQddMBr1sjD1jIgp3w2dU4oEthzStKCPY2/lAWBm+1Es1okGhEM3I939DRcYOjfJnTCtJLJ9DTKycVDMerXvHnCgImZ0Oh4mtLF+63hn+9wUc56owFeNqs+NJHqmBBFX2uNr/Rj9mzYkRRPsYYSyCB7jIS+Z8Zall6W3dwLcsE3uw/oPKx5bJDAhnp7kZgzLC0zlS2D0ZcNZuW2uUtwhZJM6OOyV+FUFgizmpIQAQ8Nm6n/1yk0asB4jZFf221a9ZmzvUfWKmmIR7OxX3qBH9x2uMMhemv9LZdEHMcjTeIXRYciMLWUNeWagYhDgV1cRBGCDTh2EhHvYX7ZXfpsHjLOR+sAEr7uR3siitf/mRkiLfT2YBgTACKKoj05UuC8aknEV4T5bWiye+gKGioml5G/fWYHyHow37g6D84n0cBTWmI0oPlg+rqpeRLOeYaTeCXOtM/7M1FHuGvzmBnag2vhKY2tpjVrg2nI3p4SRlzTyoQkyMfRXN87v5nAheVcLgrYtkv9aX7R6VMZ1UIsxn62ZHFa2IR6skB/xw7RRuJY5r5FIWs1LqIQDaon5L4C4v9rnBxMYoUM"
14 changes: 14 additions & 0 deletions docs/make.jl
@@ -30,12 +30,26 @@ makedocs(;
"Horizontal Lift" => "optimizers/manifold_related/horizontal_lift.md",
"Global Sections" => "optimizers/manifold_related/global_sections.md",
"Retractions" => "optimizers/manifold_related/retractions.md",
"Geodesic Retraction" => "optimizers/manifold_related/geodesic.md",
"Cayley Retraction" => "optimizers/manifold_related/cayley.md",
"Adam Optimizer" => "optimizers/adam_optimizer.md",
],
"Special Neural Network Layers" => [
"Attention" => "layers/attention_layer.md",
"Multihead Attention" => "layers/multihead_attention_layer.md",
],
"Data Loader" =>[
"Routines" => "data_loader/data_loader.md",
"Snapshot matrix" => "data_loader/snapshot_matrix.md",
],
"Reduced Order Modelling" =>[
"POD and Autoencoders" => "reduced_order_modeling/autoencoder.md",
"PSD and Symplectic Autoencoders" => "reduced_order_modeling/symplectic_autoencoder.md",
"Kolmogorov n-width" => "reduced_order_modeling/kolmogorov_n_width.md",
],
"Tutorials" =>[
"MNIST" => "tutorials/mnist_tutorial.md",
],
"Library" => "library.md",
],
)
8 changes: 8 additions & 0 deletions docs/src/data_loader/data_loader.md
@@ -0,0 +1,8 @@
# Data Loader

`GeometricMachineLearning` provides flexible routines to load and manage data for training neural networks.
`DataLoader` has several constructors:

1. If provided with a tensor, then it assumes the first axis is the system dimension, the second axis is the dimension of the parameter space, and the third axis gives the time evolution of the system.

2. If provided with a tensor and a vector, it assumes the data are related to a classification task.
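
A minimal NumPy sketch (illustrative only, not the Julia API; all names here are invented) of the tensor convention in point 1:

```python
import numpy as np

# Axis convention from the docs: axis 0 = system dimension,
# axis 1 = parameter space, axis 2 = time evolution.
sys_dim, n_params, n_time = 4, 3, 10
data = np.zeros((sys_dim, n_params, n_time))

t = np.linspace(0.0, 1.0, n_time)
for p in range(n_params):
    # toy trajectory for parameter p: exponential decay (made-up data)
    data[:, p, :] = np.exp(-(p + 1) * t)[None, :]

assert data.shape == (sys_dim, n_params, n_time)
```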
15 changes: 15 additions & 0 deletions docs/src/data_loader/snapshot_matrix.md
@@ -0,0 +1,15 @@
# Snapshot matrix

The snapshot matrix stores solutions of the high-dimensional ODE (obtained from discretizing a PDE). This is then used to construct [reduced bases](../reduced_order_modeling/autoencoder.md) in a data-driven way. So (for a single parameter[^1]) the snapshot matrix takes the following form:

[^1]: If we deal with a parametrized PDE then there are **two stages** at which the snapshot matrix has to be processed: the offline stage and the online stage.

```math
M = \left[\begin{array}{c:c:c:c}
\hat{u}_1(t_0) & \hat{u}_1(t_1) & \quad\ldots\quad & \hat{u}_1(t_f) \\
\hat{u}_2(t_0) & \hat{u}_2(t_1) & \ldots & \hat{u}_2(t_f) \\
\hat{u}_3(t_0) & \hat{u}_3(t_1) & \ldots & \hat{u}_3(t_f) \\
\vdots & \vdots & \ddots & \vdots \\
\hat{u}_{2N}(t_0) & \hat{u}_{2N}(t_1) & \ldots & \hat{u}_{2N}(t_f) \\
\end{array}\right].
```
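
To make the construction concrete, here is a small NumPy sketch (with a made-up toy solution, not the package's code) that assembles a snapshot matrix and reads off a POD reduced basis from its SVD:

```python
import numpy as np

two_N, n_t = 8, 6                                    # 2N rows, t_0..t_f columns
solution = lambda tj: np.cos(tj * np.arange(two_N))  # toy full-order solution

# column j of M is the snapshot \hat{u}(t_j)
M = np.column_stack([solution(tj) for tj in np.linspace(0.0, 1.0, n_t)])

# data-driven reduced basis: first r left singular vectors of M
U, s, _ = np.linalg.svd(M, full_matrices=False)
r = 2
Phi = U[:, :r]                                       # Phi^T Phi = I_r

assert M.shape == (two_N, n_t)
assert np.allclose(Phi.T @ Phi, np.eye(r))
```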
Binary file modified docs/src/images/adam_optimizer.png
2 changes: 1 addition & 1 deletion docs/src/images/adam_optimizer.tex
@@ -24,7 +24,7 @@
\coordinate[right of=R3, xshift=.5cm] (addition);
\node[fit=(R2)(cache)(R3)(R4)(addition),draw, ultra thick, rounded corners, label=below:$\mathtt{optimization\_step(o, ::}\mathbb{R}^N\mathtt{,::}\mathbb{R}^N\mathtt{)}$] (optimization_step) {};

\draw[->] (R2) -- (R3) node[pos=.5, sloped, below] {Adam};
\draw[->] (R2) -- (cache) node[pos=.5, sloped, below] {Adam};
\draw[->] (cache) -- (R3) node[pos=.5, sloped, above] {Adam};
\draw[->] (R3) -- (R4) node[pos=.5, right] {Addition};
\draw[->, mgreen] (R1) -- (R2) node[pos=.5, left] {\color{mgreen}AD};
Binary file modified docs/src/images/general_optimization.pdf
Binary file modified docs/src/images/general_optimization.png
2 changes: 1 addition & 1 deletion docs/src/images/general_optimization.tex
@@ -18,7 +18,7 @@

\draw[->] (TYM) -- (ghorY) node[pos=.5, left] {$\Omega$};
\draw[->, mred] (ghorY) -- (ghor);
\draw[->] (ghor) -- (ghor2) node[pos=.5, sloped, below] {Adam};
\draw[->] (ghor) -- (cache) node[pos=.5, sloped, below] {Adam};
\draw[->] (cache) -- (ghor2) node[pos=.5, sloped, above] {Adam};
\draw[->] (ghor2) -- (M) node[pos=.5, right] {Retraction};
\end{tikzpicture}
Binary file modified docs/src/images/general_optimization_with_boundary.png
2 changes: 1 addition & 1 deletion docs/src/images/general_optimization_with_boundary.tex
@@ -31,7 +31,7 @@

\draw[->] (TYM) -- (ghorY) node[pos=.5, left] {$\Omega$};
\draw[->, mred] (ghorY) -- (ghor);
\draw[->] (ghor) -- (ghor2) node[pos=.5, sloped, below] {Adam};
\draw[->] (ghor) -- (cache) node[pos=.5, sloped, below] {Adam};
\draw[->] (cache) -- (ghor2) node[pos=.5, sloped, above] {Adam};
\draw[->] (ghor2) -- (M) node[pos=.5, right] {Retraction};
\draw[->, mgreen] (M2) -- (Euc) node[pos=.5, left] {\color{mgreen}AD};
Binary file added docs/src/images/mha.png
Binary file added docs/src/images/solution_manifold_2.png
Binary file added docs/src/images/symplectic_autoencoder.png
22 changes: 21 additions & 1 deletion docs/src/layers/multihead_attention_layer.md
@@ -1,7 +1,27 @@
# Multihead Attention Layer

In order to arrive from the [attention layer](attention_layer.md) at the **multihead attention layer** we only have to do a simple modification:
In order to arrive from the [attention layer](attention_layer.md) at the **multihead attention layer** we have to make a few modifications:

Note that these neural networks were originally developed for natural language processing (NLP) tasks and the terminology used here bears some resemblance to that field.
The input to a multihead attention layer typically comprises three components:

1. Values $V\in\mathbb{R}^{n\times{}T}$: a matrix whose columns are **value vectors**,
2. Queries $Q\in\mathbb{R}^{n\times{}T}$: a matrix whose columns are **query vectors**,
3. Keys $K\in\mathbb{R}^{n\times{}T}$: a matrix whose columns are **key vectors**.

Regular attention performs the following operation:

```math
\mathrm{Attention}(Q,K,V) = V\mathrm{softmax}(\frac{K^TQ}{\sqrt{n}}),
```

where $n$ is the dimension of the vectors in $V$, $Q$ and $K$. The softmax activation function here acts column-wise, so it can be seen as a transformation $\mathrm{softmax}:\mathbb{R}^{T}\to\mathbb{R}^T$ with $[\mathrm{softmax}(v)]_i = e^{v_i}/\left(\sum_{j=1}^Te^{v_j}\right)$. The $K^TQ$ term is a similarity matrix between the keys and the queries.
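
The formula above can be checked with a short NumPy sketch (random placeholder matrices; an illustration, not the package's implementation):

```python
import numpy as np

def softmax_cols(A):
    # column-wise softmax, stabilized by subtracting the per-column max
    E = np.exp(A - A.max(axis=0))
    return E / E.sum(axis=0)

n, T = 4, 7
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((n, T)) for _ in range(3))

weights = softmax_cols(K.T @ Q / np.sqrt(n))   # T x T matrix of attention weights
out = V @ weights                              # Attention(Q, K, V)

assert out.shape == (n, T)
assert np.allclose(weights.sum(axis=0), 1.0)   # each column is a probability vector
```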

The transformer contains a **self-attention mechanism**, i.e. it takes an input $X$ and transforms it linearly via $V = P^VX$, $Q = P^QX$ and $K = P^KX$. What distinguishes the multihead attention layer from the single-head attention layer is that there is not just one set of matrices $P^V$, $P^Q$ and $P^K$, but several: one for each **head** of the multihead attention layer. After the heads have computed their individual values, queries and keys and applied the softmax, their outputs are concatenated in order to obtain again an array of the same size as the input array:

![](../images/mha.png)

Here the various $P$ matrices can be interpreted as projections onto lower-dimensional subspaces, hence the letter $P$. Because these projections should **capture features in the input data**, it makes sense to constrain them to be elements of the Stiefel manifold.
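
The head-splitting and concatenation described above can be sketched as follows (NumPy pseudocode with invented names; the projections here are plain random matrices, not Stiefel elements):

```python
import numpy as np

def softmax_cols(A):
    E = np.exp(A - A.max(axis=0))
    return E / E.sum(axis=0)

def multihead_self_attention(X, PV, PQ, PK):
    heads = []
    for Pv, Pq, Pk in zip(PV, PQ, PK):
        V, Q, K = Pv @ X, Pq @ X, Pk @ X       # per-head values/queries/keys
        d = V.shape[0]
        heads.append(V @ softmax_cols(K.T @ Q / np.sqrt(d)))
    return np.vstack(heads)                    # concatenate: same size as X

n, T, n_heads = 8, 5, 4
d = n // n_heads                               # per-head (projected) dimension
rng = np.random.default_rng(2)
PV, PQ, PK = ([rng.standard_normal((d, n)) for _ in range(n_heads)] for _ in range(3))

X = rng.standard_normal((n, T))
assert multihead_self_attention(X, PV, PQ, PK).shape == X.shape
```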

## References
- Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
52 changes: 52 additions & 0 deletions docs/src/optimizers/manifold_related/cayley.md
@@ -0,0 +1,52 @@
# The Cayley Retraction

The Cayley transformation is one of the most popular retractions. For several matrix Lie groups it is a mapping from the Lie algebra $\mathfrak{g}$ onto the Lie group $G$.
The Cayley retraction reads:

```math
\mathrm{Cayley}(C) = \left(\mathbb{I} -\frac{1}{2}C\right)^{-1}\left(\mathbb{I} +\frac{1}{2}C\right).
```
This is easily checked to be a retraction, i.e. $\mathrm{Cayley}(\mathbb{O}) = \mathbb{I}$ and $\left.\frac{\partial}{\partial{}t}\right|_{t=0}\mathrm{Cayley}(tC) = C$.
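
Both retraction properties are easy to verify numerically; a NumPy sketch for a random skew-symmetric $C$ (the derivative check uses central differences):

```python
import numpy as np

def cayley(C):
    I = np.eye(C.shape[0])
    # (I - C/2)^{-1} (I + C/2) via a linear solve instead of an explicit inverse
    return np.linalg.solve(I - 0.5 * C, I + 0.5 * C)

rng = np.random.default_rng(3)
S = rng.standard_normal((5, 5))
C = S - S.T                                    # skew-symmetric

# Cayley(0) = I
assert np.allclose(cayley(np.zeros_like(C)), np.eye(5))

# d/dt Cayley(tC) |_{t=0} = C   (central finite difference)
eps = 1e-6
deriv = (cayley(eps * C) - cayley(-eps * C)) / (2 * eps)
assert np.allclose(deriv, C, atol=1e-8)
```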

What we need in practice is not the computation of the Cayley transform of an arbitrary matrix, but the Cayley transform of an element of $\mathfrak{g}^\mathrm{hor}$, the [global tangent space representation](../../arrays/stiefel_lie_alg_horizontal.md).

The elements of $\mathfrak{g}^\mathrm{hor}$ can be written as:

```math
C = \begin{bmatrix}
A & -B^T \\
B & \mathbb{O}
\end{bmatrix} = \begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} \begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \frac{1}{2}A & -B^T \end{bmatrix},
```

where the second expression exploits the sparse structure of the array, i.e. it is a multiplication of an $N\times2n$ with a $2n\times{}N$ matrix. We can hence use the **Sherman-Morrison-Woodbury formula** to obtain:

```math
(\mathbb{I} - \frac{1}{2}UV)^{-1} = \mathbb{I} + \frac{1}{2}U(\mathbb{I} - \frac{1}{2}VU)^{-1}V
```
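
This identity is easy to sanity-check numerically (random $U$, $V$ as placeholders); the point is that the left-hand side needs an $N\times{}N$ inverse while the right-hand side only needs a $2n\times2n$ one:

```python
import numpy as np

rng = np.random.default_rng(4)
N, two_n = 10, 4
U = rng.standard_normal((N, two_n))
V = rng.standard_normal((two_n, N))

lhs = np.linalg.inv(np.eye(N) - 0.5 * U @ V)                                # N x N inverse
rhs = np.eye(N) + 0.5 * U @ np.linalg.inv(np.eye(two_n) - 0.5 * V @ U) @ V  # 2n x 2n inverse

assert np.allclose(lhs, rhs)
```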

So the only term we need to invert is

```math
\mathbb{I} - \frac{1}{2}\begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \frac{1}{2}A & -B^T \end{bmatrix}\begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} =
\begin{bmatrix} \mathbb{I} - \frac{1}{4}A & - \frac{1}{2}\mathbb{I} \\ \frac{1}{2}B^TB - \frac{1}{8}A^2 & \mathbb{I} - \frac{1}{4}A \end{bmatrix}.
```

The whole Cayley transform is then:

```math
\begin{aligned}
\mathrm{Cayley}(C)E & = \left(\mathbb{I} + \frac{1}{2}\begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} \begin{bmatrix} \mathbb{I} - \frac{1}{4}A & - \frac{1}{2}\mathbb{I} \\ \frac{1}{2}B^TB - \frac{1}{8}A^2 & \mathbb{I} - \frac{1}{4}A \end{bmatrix}^{-1} \begin{bmatrix} \mathbb{I} & \mathbb{O} \\ \frac{1}{2}A & -B^T \end{bmatrix} \right)\left( E + \frac{1}{2}\begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix} \begin{bmatrix} \mathbb{I} \\ \frac{1}{2}A \end{bmatrix} \right) \\
& = E + \frac{1}{2}\begin{bmatrix} \frac{1}{2}A & \mathbb{I} \\ B & \mathbb{O} \end{bmatrix}\left(
\begin{bmatrix} \mathbb{I} \\ \frac{1}{2}A \end{bmatrix} +
\begin{bmatrix} \mathbb{I} - \frac{1}{4}A & - \frac{1}{2}\mathbb{I} \\ \frac{1}{2}B^TB - \frac{1}{8}A^2 & \mathbb{I} - \frac{1}{4}A \end{bmatrix}^{-1}\left(
\begin{bmatrix} \mathbb{I} \\ \frac{1}{2}A \end{bmatrix} +
\begin{bmatrix} \frac{1}{2}A \\ \frac{1}{4}A^2 - \frac{1}{2}B^TB \end{bmatrix}
\right)
\right).
\end{aligned}
```


Note that for computational reasons we compute $\mathrm{Cayley}(C)E$ instead of the full Cayley transform (see the section on [retractions](retractions.md)).
3 changes: 3 additions & 0 deletions docs/src/optimizers/manifold_related/geodesic.md
@@ -0,0 +1,3 @@
# Geodesic Retraction

General **retractions** are approximations of the exponential map. In `GeometricMachineLearning` we can, instead of using an approximation, solve the geodesic equation exactly (up to numerical error) by specifying `Geodesic()` as the argument of layers that have manifold weights.
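
To illustrate the difference (a NumPy sketch, not the Julia implementation): on the Lie group the exact geodesic is given by the matrix exponential, which a retraction such as Cayley only matches up to second order in $C$:

```python
import numpy as np

def expm_series(C, terms=30):
    # truncated power series for the matrix exponential (fine for small C)
    I = np.eye(C.shape[0])
    out, term = I.copy(), I.copy()
    for k in range(1, terms):
        term = term @ C / k
        out = out + term
    return out

def cayley(C):
    I = np.eye(C.shape[0])
    return np.linalg.solve(I - 0.5 * C, I + 0.5 * C)

rng = np.random.default_rng(6)
S = rng.standard_normal((4, 4))
C = 1e-2 * (S - S.T)                          # small skew-symmetric matrix

# exp(C) - Cayley(C) = C^3/12 + O(C^4): agreement up to second order
err = np.linalg.norm(expm_series(C) - cayley(C))
assert err < np.linalg.norm(C) ** 3
```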
11 changes: 11 additions & 0 deletions docs/src/optimizers/manifold_related/retractions.md
@@ -1,5 +1,16 @@
# Retractions

## Classical Definition
Classically, retractions are defined as smooth maps

```math
R: T\mathcal{M}\to\mathcal{M}:(x,v)\mapsto{}R_x(v)
```

such that each curve $c(t) := R_x(tv)$ satisfies $c(0) = x$ and $c'(0) = v$.

## In `GeometricMachineLearning`

Retractions are maps from the **horizontal component** of the Lie algebra $\mathfrak{g}^\mathrm{hor}$ to the respective manifold.

For optimization in neural networks (almost always first order) we solve a gradient flow equation