Merge pull request #150 from JuliaGNI/linear_symplectic_transformer
Linear symplectic transformer
Showing 54 changed files with 1,348 additions and 160 deletions.

# Linear Symplectic Transformer

The linear symplectic transformer consists of a combination of [linear symplectic attention](@ref "Linear Symplectic Attention") and [gradient](@ref "SympNet Gradient Layer") layers and is visualized below:

```@example
Main.include_graphics("../tikz/linear_symplectic_transformer"; caption = raw"Visualization of the linear symplectic transformer architecture. \texttt{n\_sympnet} refers to the number of SympNet layers (\texttt{n\_sympnet=2} in this figure) and \texttt{L} refers to the number of transformer blocks (\texttt{L=1} in this figure).", width = .3) # hide
```
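
The listing below is a hypothetical usage sketch rather than a definitive recipe: the positional arguments (system dimension and sequence length) and the keyword names `n_sympnet` and `L` mirror the figure caption, but the exact constructor signature should be taken from the docstring in the next section.

```julia
using GeometricMachineLearning

# Hypothetical sketch: argument names and order are assumptions that mirror
# the figure above; consult the docstring below for the actual signature.
arch = LinearSymplecticTransformer(2, 3; n_sympnet = 2, L = 1)  # 2d system, sequence length 3

# Wrapping the architecture in a NeuralNetwork allocates (random) parameters.
nn = NeuralNetwork(arch)
```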

## Library Functions

```@docs; canonical=false
LinearSymplecticTransformer
```

# Neural Network Integrators

In `GeometricMachineLearning` we can divide most neural network architectures (that are used for applications to physical systems) into two categories: autoencoders and integrators. *Integrator* in its most general form refers to an approximation of the flow of an ODE (see [the section on the existence and uniqueness theorem](@ref "The Existence-And-Uniqueness Theorem")) by a numerical scheme. Traditionally these numerical schemes were constructed by defining certain relationships between a known time step ``z^{(t)}`` and a future unknown one ``z^{(t+1)}`` [hairer2006geometric, leimkuhler2004simulating](@cite):

```math
f(z^{(t)}, z^{(t+1)}) = 0.
```

One usually refers to such a relationship as an "integration scheme". If this relationship can be reformulated as

```math
z^{(t+1)} = g(z^{(t)}),
```

then we refer to the scheme as *explicit*; if it cannot be reformulated in such a way, then we refer to it as *implicit*. Implicit schemes are typically more expensive to solve than explicit ones. The `Julia` library `GeometricIntegrators` [Kraus:2020:GeometricIntegrators](@cite) offers a wide variety of integration schemes, both implicit and explicit.
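
As a small illustration of this difference (not part of the library), consider the ODE ``\dot{z} = v(z)``: explicit Euler can be written directly as a map ``g``, while the implicit midpoint rule only defines a relationship ``f(z^{(t)}, z^{(t+1)}) = 0`` that has to be solved for ``z^{(t+1)}``, here with a simple fixed-point iteration.

```julia
v(z) = -z  # toy vector field for the ODE ż = v(z)
h = 0.1    # step size

# Explicit Euler: z^(t+1) = g(z^(t)) can be evaluated directly.
g_explicit(z) = z + h * v(z)

# Implicit midpoint: z^(t+1) = z^(t) + h * v((z^(t) + z^(t+1)) / 2) cannot be
# rearranged into an explicit map; here it is solved with a fixed-point iteration.
function g_implicit(z; iterations = 10)
    z_next = z
    for _ in 1:iterations
        z_next = z + h * v((z + z_next) / 2)
    end
    return z_next
end

z = [1.0, 2.0]
g_explicit(z), g_implicit(z)
```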

The neural network integrators in `GeometricMachineLearning` (the corresponding type is [`NeuralNetworkIntegrator`](@ref)) are all explicit integration schemes where the function ``g`` above is modeled with a neural network.

Neural networks, as an alternative to traditional methods, are employed because of (i) potentially superior performance and (ii) an ability to learn unknown dynamics from data.

## Multi-step methods

*Multi-step method* [feng1987symplectic, ge1988approximation](@cite) refers to schemes of the form[^1]:

[^1]: We again assume that all the steps up to and including ``t`` are known.

```math
f(z^{(t - \mathtt{sl} + 1)}, z^{(t - \mathtt{sl} + 2)}, \ldots, z^{(t)}, z^{(t + 1)}, \ldots, z^{(t + \mathtt{pw})}) = 0,
```
where `sl` is short for *sequence length* and `pw` is short for *prediction window*. In contrast to traditional single-step methods, `sl` and `pw` can be greater than 1. An explicit multi-step method has the following form:

```math
[z^{(t+1)}, \ldots, z^{(t+\mathtt{pw})}] = g(z^{(t - \mathtt{sl} + 1)}, \ldots, z^{(t)}).
```
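
To make the roles of `sl` and `pw` concrete, the following package-independent sketch rolls out such an explicit multi-step map: `g` consumes the last `sl` known steps and returns the next `pw` steps; here `g` is a trivial stand-in rather than a trained network.

```julia
# Generic rollout of an explicit multi-step method: `g` maps the last `sl`
# steps to the next `pw` steps. This is a sketch, not the library's API.
function rollout(g, initial_steps::Vector, sl::Int, pw::Int, n_iterations::Int)
    @assert length(initial_steps) == sl
    trajectory = copy(initial_steps)              # the first sl steps have to be known
    for _ in 1:n_iterations
        window = trajectory[(end - sl + 1):end]   # the last sl known steps
        append!(trajectory, g(window))            # g returns pw new steps
    end
    return trajectory
end

# Stand-in for a trained network with sl = 3 and pw = 2: it simply repeats the last step.
g_dummy(window) = fill(window[end], 2)

trajectory = rollout(g_dummy, [0.0, 0.5, 1.0], 3, 2, 4)   # 3 + 4 * 2 = 11 entries
```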

There are essentially two ways to construct multi-step methods with neural networks: the older one uses recurrent neural networks such as long short-term memory cells (LSTMs, [hochreiter1997long](@cite)) and the newer one uses transformer neural networks [vaswani2017attention](@cite). Both of these approaches have been successfully employed to learn multi-step methods (see [fresca2021comprehensive, lee2020model](@cite) for the former and [hemmasian2023reduced, solera2023beta, brantner2024volume](@cite) for the latter), but because the transformer architecture exhibits superior performance on modern hardware and can be imbued with geometric properties, it is recommended to always use a transformer-derived architecture when dealing with time series[^2].

[^2]: `GeometricMachineLearning` also has an LSTM implementation, but this may be deprecated in the future.

Explicit multi-step methods derived from the transformer are always subtypes of the type [`TransformerIntegrator`](@ref) in `GeometricMachineLearning`. The [standard transformer](@ref "Standard Transformer"), the [volume-preserving transformer](@ref "Volume-Preserving Transformer") and the [linear symplectic transformer](@ref "Linear Symplectic Transformer") are implemented.

## Library Functions

```@docs; canonical=false
NeuralNetworkIntegrator
TransformerIntegrator
```

## References

```@bibliography
Pages = []
Canonical = false
hairer2006geometric
leimkuhler2004simulating
Kraus:2020:GeometricIntegrators
feng1987symplectic
ge1988approximation
feng1998step
hochreiter1997long
vaswani2017attention
fresca2021comprehensive
lee2020model
hemmasian2023reduced
solera2023beta
brantner2024volume
```

# Standard Transformer

The transformer is a relatively modern neural network architecture [vaswani2017attention](@cite) that has come to dominate the field of natural language processing (NLP, [patwardhan2023transformers](@cite)) and has replaced the previously dominant long short-term memory cells (LSTMs, [hochreiter1997long](@cite)). Its success is due to a variety of factors:
- unlike LSTMs it consists of very simple building blocks and hence is easier to interpret mathematically,
- it is very flexible in its application and the data it is fed with do not have to conform to a rigid pattern,
- transformers utilize modern hardware (especially GPUs) very effectively.

The transformer architecture is sketched below:

```@example
Main.include_graphics("../tikz/transformer_encoder") # hide
```

It is nothing more than a combination of a [multihead attention layer](@ref "Multihead Attention") and a residual neural network[^1] (ResNet); a minimal sketch of such a block is given below.

[^1]: A ResNet is nothing more than a neural network to whose output we again add the input, i.e. every ResNet is of the form ``\mathrm{ResNet}(x) = x + \mathcal{NN}(x)``.
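
The sketch below (plain Julia, not the `GeometricMachineLearning` implementation) shows this structure for a single attention head: a self-attention step and a feedforward step, each wrapped in a residual connection ``x \mapsto x + \mathcal{NN}(x)``.

```julia
using LinearAlgebra

# Column-wise softmax used for the attention weights.
function softmax_cols(A)
    E = exp.(A .- maximum(A; dims = 1))
    return E ./ sum(E; dims = 1)
end

# One single-head transformer block (a sketch, not the library implementation):
# attention and feedforward are both applied as residual (ResNet-style) updates.
struct TransformerBlock{T}
    Wq::Matrix{T}  # query projection
    Wk::Matrix{T}  # key projection
    Wv::Matrix{T}  # value projection
    W::Matrix{T}   # feedforward weight
    b::Vector{T}   # feedforward bias
end

TransformerBlock(dim::Int) =
    TransformerBlock(randn(dim, dim), randn(dim, dim), randn(dim, dim), randn(dim, dim), randn(dim))

function (block::TransformerBlock)(Z::AbstractMatrix)
    # Z has size (dim, seq_length); every column is one time step.
    Q, K, V = block.Wq * Z, block.Wk * Z, block.Wv * Z
    A = softmax_cols(K' * Q ./ sqrt(size(Z, 1)))  # attention weights, size (seq_length, seq_length)
    Z = Z + V * A                                 # residual attention update
    return Z + tanh.(block.W * Z .+ block.b)      # residual feedforward update
end

Z = randn(4, 3)  # three time steps of a four-dimensional system
TransformerBlock(4)(Z)
```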

## Library Functions

```@docs; canonical=false
StandardTransformerIntegrator
```

# Volume-Preserving Feedforward Neural Network

## Neural network architecture

The constructor produces the following architecture[^1]:

[^1]: Based on the input arguments `n_linear` and `n_blocks`. In this example `init_upper` is set to false, which means that the first layer is of type *lower*, followed by a layer of type *upper*.

```@example
Main.include_graphics("../tikz/vp_feedforward") # hide
```

Here *LinearLowerLayer* performs ``x \mapsto x + Lx`` and *NonLinearLowerLayer* performs ``x \mapsto x + \sigma(Lx + b)``. The activation function ``\sigma`` is the fourth input argument to the constructor and `tanh` by default.
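
The following is a minimal sketch of these two maps (not the library code), under the assumption that ``L`` is strictly lower-triangular, which is what makes both maps volume-preserving.

```julia
using LinearAlgebra

# Sketch (not the library implementation) of the two layer types above,
# assuming L is strictly lower-triangular. The Jacobians I + L and
# I + Diagonal(σ'.(Lx + b)) * L are then lower-triangular with unit diagonal,
# so both maps have determinant one and preserve volume.
strictly_lower(A) = tril(A, -1)

linear_lower(x, L) = x + strictly_lower(L) * x                           # x ↦ x + Lx
nonlinear_lower(x, L, b; σ = tanh) = x + σ.(strictly_lower(L) * x .+ b)  # x ↦ x + σ(Lx + b)

dim = 4
x, L, b = randn(dim), randn(dim, dim), randn(dim)
y = nonlinear_lower(linear_lower(x, L), L, b)
```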

## Note on SympNets

As [SympNets](@ref "SympNet Architecture") are symplectic maps, they also conserve phase space volume and therefore form a subcategory of volume-preserving feedforward layers.

## Library Functions

```@docs; canonical=false
VolumePreservingFeedForward
```