Refactored Symbolic Neural Networks #16

Merged
merged 73 commits on Dec 4, 2024

Commits
01f9715
Started cleaning up repository and making things readable.
benedict-96 Oct 28, 2024
70dffa5
Added the githook similar to GML and GP.
benedict-96 Oct 30, 2024
672958b
Converted SymbolicNeuralNetwork to something that's readable and hope…
benedict-96 Oct 30, 2024
47b7c8f
Added a test for the new interface.
benedict-96 Oct 30, 2024
9925266
@view -> @views.
benedict-96 Oct 30, 2024
3e7fb50
Started expanding on documentation.
benedict-96 Oct 30, 2024
139a18f
Exporting HSNN now.
benedict-96 Oct 30, 2024
d621424
Tried to express Hamiltonian vector field through matrix multiplication.
benedict-96 Oct 30, 2024
5e0727d
Fixed typo.
benedict-96 Oct 30, 2024
6a4829a
Added parallelizing of functions and made script work (up to a point).
benedict-96 Oct 31, 2024
7ceb9dc
Attempt to make SymbolicNeuralNetworks work with Chains.
benedict-96 Nov 3, 2024
05d259d
add comments to and clean up buildsymbolic
michakraus Nov 5, 2024
19bc4fc
add comments to and clean up build_eval and build_hamiltonian
michakraus Nov 5, 2024
df4a9e7
Removed unnecessary return.
benedict-96 Nov 5, 2024
466c6db
Made HNN work.
benedict-96 Nov 6, 2024
ef2b13c
Removed utils file (there is a utils folder).
benedict-96 Nov 7, 2024
fd721e9
Finished initial refactoring.
benedict-96 Nov 7, 2024
ae1a5cc
Made build_function work and started writing docstrings.
benedict-96 Nov 9, 2024
e8a35d7
Finished docstrings.
benedict-96 Nov 11, 2024
b977291
Changed namings slightly.
benedict-96 Nov 13, 2024
03d1c87
Fixed script(s).
benedict-96 Nov 13, 2024
76f6a85
Including files at right spot (HNN/LNN after SNN).
benedict-96 Nov 13, 2024
b4c2d9d
Made SHNN a separate struct and added docstrings.
benedict-96 Nov 13, 2024
36395de
Added AbstractArray of BasicSymbolic to possible types in EqT (this c…
benedict-96 Nov 13, 2024
dc5ce72
SymbolicNeuralNetwork -> AbstractSymbolicNeuralNetwork.
benedict-96 Nov 13, 2024
209c365
Parallelized with mapreduce.
benedict-96 Nov 13, 2024
557d34e
Changed name and contents of test.
benedict-96 Nov 18, 2024
022fa0f
Commented out almost all tests.
benedict-96 Nov 18, 2024
6f8ccd3
Implemented SymbolicPullback.
benedict-96 Nov 18, 2024
648a68d
Started adding tests (and fixing things).
benedict-96 Nov 18, 2024
51082b8
Found problem with one test.
benedict-96 Nov 18, 2024
bbab67c
Modified test.
benedict-96 Nov 19, 2024
5d91645
Added plots and comparison to non-Hamiltonian neural network.
benedict-96 Nov 20, 2024
7ccbb16
Made pullback more legible and expanded on tests.
benedict-96 Nov 20, 2024
3cba750
Only testing some cases for now (until performance issues are fixed).
benedict-96 Nov 20, 2024
9ede565
Made generated function work.
benedict-96 Nov 25, 2024
556b2a1
Now testing more stuff.
benedict-96 Nov 25, 2024
1fee7a6
Removed unnecessary line.
benedict-96 Nov 25, 2024
5982d96
Made test quicker.
benedict-96 Nov 25, 2024
25aced5
Added symbolic input to SymbolicNeuralNetwork struct.
benedict-96 Nov 26, 2024
72e1f3f
Changed name of file for consistency with abstract neural networks.
benedict-96 Nov 26, 2024
5555e3f
Added derivative object.
benedict-96 Nov 26, 2024
870fa3a
Added extra files for symbolic parameters of specific layers in order…
benedict-96 Nov 26, 2024
cb1e8d2
Added types.
benedict-96 Nov 26, 2024
6ccbb00
Started making Jacobian work.
benedict-96 Nov 26, 2024
3de2fa2
Put additional routines into separate file.
benedict-96 Nov 26, 2024
a2e6371
Finished Jacobian and started fixing/working on Gradient.
benedict-96 Nov 27, 2024
f47a757
Docs aren't doing anything at the moment.
benedict-96 Nov 27, 2024
e16264c
Added docstrings and fixed functions for various cases.
benedict-96 Nov 28, 2024
25942b0
Added docstrings and fixed a number of typos.
benedict-96 Nov 28, 2024
a26fb20
Fixed problems with compilation and docstrings.
benedict-96 Nov 29, 2024
513197b
Removed all tests except docstring tests.
benedict-96 Nov 30, 2024
c601d0a
Added plot of Gaussian that is used for training.
benedict-96 Nov 30, 2024
7643a5b
Fixed snn script.
benedict-96 Dec 1, 2024
045d800
Added double derivative docs.
benedict-96 Dec 1, 2024
75ca456
Updated Readme.
benedict-96 Dec 2, 2024
838d817
Removed files that aren't used anymore.
benedict-96 Dec 2, 2024
881381c
Finished double derivative docs. Removed HNN example from docs.
benedict-96 Dec 2, 2024
29d0349
Removed files that aren't used anymore.
benedict-96 Dec 2, 2024
6dd3d93
Fixed latexify problem, but introduced type piracy.
benedict-96 Dec 2, 2024
492b7c3
Cleaned up tests.
benedict-96 Dec 2, 2024
2909f6b
Not using one of the tests in symbolic_gradient.jl for now.
benedict-96 Dec 2, 2024
ae7a5d6
Removed ChainRulesCore and KernelAbstractions (not needed).
benedict-96 Dec 2, 2024
916c62b
Merge branch 'main' into make_script_work
benedict-96 Dec 2, 2024
c5d5621
Not testing for 1.6 anymore.
benedict-96 Dec 2, 2024
a4a2e7f
Resolved conflict
benedict-96 Dec 3, 2024
24384bd
Updated docs.
benedict-96 Dec 3, 2024
05ee7b2
Minor fix.
michakraus Dec 3, 2024
ac49890
Remove unnecessary packages from docs Project.
michakraus Dec 3, 2024
d8512a3
Added script for comparing pullbacks and added an explanation for why…
benedict-96 Dec 3, 2024
0fcce26
Fix fix_create_array.
michakraus Dec 3, 2024
6e72a04
Extend pullback comparison script.
michakraus Dec 3, 2024
e09ee57
Updated docstring.
benedict-96 Dec 3, 2024
19 changes: 19 additions & 0 deletions .githooks/pre-push
@@ -0,0 +1,19 @@
# pre-push git hook that runs all tests before pushing

red='\033[0;31m'
green='\033[0;32m'
no_color='\033[0m'

reponame=$(basename `git rev-parse --show-toplevel`)


echo "\nRunning pre-push hook\n"
echo "Testing $reponame"
julia --project=@. -e "using Pkg; Pkg.test(\"SymbolicNeuralNetworks\")"

if [[ $? -ne 0 ]]; then
echo "\n${red}ERROR - Tests must pass before push!\n${no_color}"
exit 1
fi

echo "\n${green}Git hook was SUCCESSFUL!${no_color}\n"
1 change: 0 additions & 1 deletion .github/workflows/CI.yml
@@ -19,7 +19,6 @@ jobs:
fail-fast: false
matrix:
version:
- '1.6'
- '1.10'
- '^1.11.0-0'
os:
18 changes: 13 additions & 5 deletions Project.toml
@@ -5,22 +5,30 @@ version = "0.1.2"

[deps]
AbstractNeuralNetworks = "60874f82-5ada-4c70-bd1c-fa6be7711c8a"
KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
Latexify = "23fbe1c1-3f47-55db-b15f-69d7ec21a316"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
RuntimeGeneratedFunctions = "7e49a35a-f44a-4d26-94aa-eba1b4ca6b47"
SafeTestsets = "1bc83da4-3b8d-516f-aca4-4fe02f6d838f"
Symbolics = "0c5d862f-8b57-4792-8d23-62f2024744c7"

[compat]
AbstractNeuralNetworks = "0.1, 0.3, 0.4"
KernelAbstractions = "0.9"
AbstractNeuralNetworks = "0.3, 0.4"
Documenter = "1.8.0"
ForwardDiff = "0.10.38"
Latexify = "0.16.5"
RuntimeGeneratedFunctions = "0.5"
SafeTestsets = "0.1"
Symbolics = "5, 6"
Zygote = "0.6.73"
julia = "1.6"

[extras]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
Latexify = "23fbe1c1-3f47-55db-b15f-69d7ec21a316"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
SafeTestsets = "1bc83da4-3b8d-516f-aca4-4fe02f6d838f"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

[targets]
test = ["Test"]
test = ["Test", "ForwardDiff", "Random", "Documenter", "Latexify", "SafeTestsets", "Zygote"]
77 changes: 47 additions & 30 deletions README.md
@@ -6,51 +6,68 @@
[![Coverage](https://codecov.io/gh/JuliaGNI/SymbolicNeuralNetworks.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/JuliaGNI/SymbolicNeuralNetworks.jl)
[![PkgEval](https://JuliaCI.github.io/NanosoldierReports/pkgeval_badges/S/SymbolicNeuralNetworks.svg)](https://JuliaCI.github.io/NanosoldierReports/pkgeval_badges/S/SymbolicNeuralNetworks.html)

SymbolicNeuralNetworks.jl was created to take advantage of [Symbolics.jl](https://symbolics.juliasymbolics.org/stable/) for training neural networks by accelerating their evaluation and by simplifying the computation of some derivatives of the neural network that may be needed for loss functions. This package is based on [AbstractNeuralNetwork.jl](https://github.com/JuliaGNI/AbstractNeuralNetworks.jl) and can be applied to [GeometricMachineLearning.jl](https://github.com/JuliaGNI/GeometricMachineLearning.jl).
In a perfect world we probably would not need `SymbolicNeuralNetworks`. Its motivation mainly comes from [`Zygote`](https://github.com/FluxML/Zygote.jl)'s inability to handle second-order derivatives in a decent way[^1]. We also note that if [`Enzyme`](https://github.com/EnzymeAD/Enzyme.jl) matures further, there may be no need for `SymbolicNeuralNetworks` anymore in the future. For now (December 2024) `SymbolicNeuralNetworks` offers a good way to incorporate derivatives into the loss function.

To accelerate the evaluation of the neural network, we replace its evaluation method with code generated by [Symbolics.jl](https://symbolics.juliasymbolics.org/stable/), perform some optimizations on it, and generate the associated function with [RuntimeGeneratedFunctions.jl](https://github.com/SciML/RuntimeGeneratedFunctions.jl).
[^1]: In some cases it is possible to perform second-order differentiation with `Zygote`, but it is not entirely clear when this works and when it does not.
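A minimal sketch of the kind of nested derivative that motivates this (a toy scalar function standing in for a network; whether the nested call succeeds depends on the function and the `Zygote` version):

```julia
using Zygote

# toy scalar "network output" of a two-dimensional input
f(x) = sum(tanh.(x))

# first-order gradient: works reliably
∇f(x) = Zygote.gradient(f, x)[1]

# second order: differentiate through the first gradient again.
# Depending on the function this nested call may work, error, or be very slow,
# which is what SymbolicNeuralNetworks avoids by building derivatives symbolically.
second_order(x) = Zygote.gradient(y -> sum(∇f(y)), x)[1]

second_order([0.5, 0.8])
```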

One can easily symbolize a neural network, which creates another neural network, with the `symbolize` method
`SymbolicNeuralNetworks` was created to take advantage of [`Symbolics`](https://symbolics.juliasymbolics.org/stable/) for training neural networks by accelerating their evaluation and by simplifying the computation of arbitrary derivatives of the neural network. This package is based on [`AbstractNeuralNetwork`](https://github.com/JuliaGNI/AbstractNeuralNetworks.jl) and can be applied to [`GeometricMachineLearning`](https://github.com/JuliaGNI/GeometricMachineLearning.jl).

`SymbolicNeuralNetworks` creates a symbolic expression of the neural network, computes arbitrary combinations of derivatives and uses [`RuntimeGeneratedFunctions`](https://github.com/SciML/RuntimeGeneratedFunctions.jl) to compile a `Julia` function.

To create a symbolic neural network, we first design a `model` with [`AbstractNeuralNetwork`](https://github.com/JuliaGNI/AbstractNeuralNetworks.jl):
```julia
symbolize(neuralnet, dim)
using AbstractNeuralNetworks

c = Chain(Dense(2, 2, tanh), Linear(2, 1))
```
where `neuralnet` is a neural network in the framework of [AbstractNeuralNetwork.jl](https://github.com/JuliaGNI/AbstractNeuralNetworks.jl) and `dim` is the dimension of the input.

## Example
We now call `SymbolicNeuralNetwork`:

```Julia
```julia
using SymbolicNeuralNetworks
using GeometricMachineLearning
using Symbolics

@variables sx[1:2]
@variables nn(sx)[1:1]
Dx1 = Differential(sx[1])
Dx2 = Differential(sx[2])
vectorfield = [0 1; -1 0] * [Dx1(nn[1]), Dx2(nn[1])]
eqs = (x = sx, nn = nn, vectorfield = vectorfield)
nn = SymbolicNeuralNetwork(c)
```

arch = HamiltonianNeuralNetwork(2)
shnn = SymbolicNeuralNetwork(arch; eqs = eqs)
## Example

hnn = NeuralNetwork(arch, Float64)
fun_vectorfield = functions(shnn).vectorfield
```
We now train the neural network by using `SymbolicPullback`[^2]:

## Performance
[^2]: This example is discussed in detail in the docs.

Let us compare the performance of computing the vector field between SymbolicNeuralNetwork's version and Zygote's:
```Julia
using Zygote
```julia
pb = SymbolicPullback(nn)

using GeometricMachineLearning

ω∇ₓnn(x, params) = [0 1; -1 0] * Zygote.gradient(x->hnn(x, params)[1], x)[1]
# we generate the data and process them with `GeometricMachineLearning.DataLoader`
x_vec = -1.:.1:1.
y_vec = -1.:.1:1.
xy_data = hcat([[x, y] for x in x_vec, y in y_vec]...)
f(x::Vector) = exp.(-sum(x.^2))
z_data = mapreduce(i -> f(xy_data[:, i]), hcat, axes(xy_data, 2))

println("Comparison of performances between Zygote and SymbolicNeuralNetwork for ω∇ₓnn")
x = [0.5, 0.8]
@time ω∇ₓnn(x, hnn.params)[1]
@time fun_vectorfield(x, hnn.params)
dl = DataLoader(xy_data, z_data)

nn_cpu = NeuralNetwork(c, CPU())
o = Optimizer(AdamOptimizer(), nn_cpu)
n_epochs = 1000
batch = Batch(10)
o(nn_cpu, dl, batch, n_epochs, pb.loss, pb)
```

Let us look at another example: training a SympNet (an intrinsically structure-preserving architecture provided by [GeometricMachineLearning.jl](https://github.com/JuliaGNI/GeometricMachineLearning.jl)) on a harmonic oscillator whose data come from [GeometricProblems.jl](https://github.com/JuliaGNI/GeometricProblems.jl):
We can also train the neural network with `Zygote`-based[^3] automatic differentiation (AD):

[^3]: Note that here we can actually use `Zygote` without problems as it does not involve any complicated derivatives.

```julia
pb_zygote = GeometricMachineLearning.ZygotePullback(FeedForwardLoss())
o(nn_cpu, dl, batch, n_epochs, pb_zygote.loss, pb_zygote)
```

## Development

We are using git hooks, e.g., to enforce that all tests pass before pushing. In order to activate these hooks, the following command must be executed once:
```
git config core.hooksPath .githooks
```
6 changes: 6 additions & 0 deletions docs/Project.toml
@@ -1,3 +1,9 @@
[deps]
AbstractNeuralNetworks = "60874f82-5ada-4c70-bd1c-fa6be7711c8a"
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
GeometricMachineLearning = "194d25b2-d3f5-49f0-af24-c124f4aa80cc"
Latexify = "23fbe1c1-3f47-55db-b15f-69d7ec21a316"
SymbolicNeuralNetworks = "aed23131-dcd0-47ca-8090-d21e605652e3"
Symbolics = "0c5d862f-8b57-4792-8d23-62f2024744c7"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
9 changes: 9 additions & 0 deletions docs/make.jl
@@ -1,5 +1,9 @@
using SymbolicNeuralNetworks
using Documenter
using Latexify: LaTeXString

# taken from https://github.com/korsbo/Latexify.jl/blob/master/docs/make.jl
Base.show(io::IO, ::MIME"text/html", l::LaTeXString) = l.s

DocMeta.setdocmeta!(SymbolicNeuralNetworks, :DocTestSetup, :(using SymbolicNeuralNetworks); recursive=true)

@@ -13,9 +17,14 @@ makedocs(;
canonical="https://JuliaGNI.github.io/SymbolicNeuralNetworks.jl",
edit_link="main",
assets=String[],
mathengine = MathJax3()
),
pages=[
"Home" => "index.md",
"Tutorials" => [
"Vanilla Symbolic Neural Network" => "symbolic_neural_networks.md",
"Double Derivative" => "double_derivative.md",
],
],
)

103 changes: 103 additions & 0 deletions docs/src/double_derivative.md
@@ -0,0 +1,103 @@
# Arbitrarily Combining Derivatives

`SymbolicNeuralNetworks` can compute derivatives of arbitrary order of a neural network. For this we use two `struct`s:
1. [`SymbolicNeuralNetworks.Jacobian`](@ref) and
2. [`SymbolicNeuralNetworks.Gradient`](@ref).

!!! info "Terminology"
Whereas the name `Jacobian` is standard for the matrix whose entries consist of all partial derivatives of the output of a function, the name `Gradient` is typically not used the way it is used here. Normally a *gradient* collects all the partial derivatives of a scalar function. In `SymbolicNeuralNetworks` the `struct` `Gradient` computes all partial derivatives of a symbolic array with respect to all the parameters of a neural network. So if we compute the `Gradient` of a matrix, the corresponding routine returns *a matrix of neural network parameters*, each of which is the *standard gradient* of a matrix element. This can be written as:
```math
\mathtt{Gradient}\left( \begin{pmatrix} m_{11} & m_{12} & \cdots & m_{1m} \\ m_{21} & m_{22} & \cdots & m_{2m} \\ \vdots & \vdots & \vdots & \vdots \\ m_{n1} & m_{n2} & \cdots & m_{nm} \end{pmatrix} \right) = \begin{pmatrix} \nabla_{\mathbb{P}}m_{11} & \nabla_{\mathbb{P}}m_{12} & \cdots & \nabla_{\mathbb{P}}m_{1m} \\ \nabla_{\mathbb{P}}m_{21} & \nabla_{\mathbb{P}}m_{22} & \cdots & \nabla_{\mathbb{P}}m_{2m} \\ \vdots & \vdots & \vdots & \vdots \\ \nabla_{\mathbb{P}}m_{n1} & \nabla_{\mathbb{P}}m_{n2} & \cdots & \nabla_{\mathbb{P}}m_{nm} \end{pmatrix},
```
where ``\mathbb{P}`` denotes the parameters of the neural network. For computational and consistency reasons each element ``\nabla_\mathbb{P}m_{ij}`` is stored as `NeuralNetworkParameters`.

## Jacobian of a Neural Network

[`SymbolicNeuralNetworks.Jacobian`](@ref) differentiates a symbolic expression with respect to the input arguments of a neural network:

```@example jacobian_gradient
using AbstractNeuralNetworks
using SymbolicNeuralNetworks
using SymbolicNeuralNetworks: Jacobian, Gradient, derivative
using Latexify: latexify

c = Chain(Dense(2, 1, tanh; use_bias = false))
nn = SymbolicNeuralNetwork(c)
□ = Jacobian(nn)
# we show the derivative of the network output with respect to its input
derivative(□) |> latexify
```

Note that the output of `nn` is one-dimensional and we use the convention

```math
\square_{ij} = [\mathrm{jacobian}_{x}f]_{ij} = \frac{\partial}{\partial{}x_j}f_i,
```
so the output has shape ``\mathrm{output\_dim}\times\mathrm{input\_dim} = 1\times2``:

```@example jacobian_gradient
@assert size(derivative(□)) == (1, 2) # hide
size(derivative(□))
```

## Gradient of a Neural Network

As described above [`SymbolicNeuralNetworks.Gradient`](@ref) differentiates every element of the array-valued output with respect to the neural network parameters:

```@example jacobian_gradient
using SymbolicNeuralNetworks: Gradient

g = Gradient(nn)

derivative(g)[1].L1.W |> latexify
```

## Double Derivatives

We can easily differentiate a neural network twice by using [`SymbolicNeuralNetworks.Jacobian`](@ref) and [`SymbolicNeuralNetworks.Gradient`](@ref) together. We first use [`SymbolicNeuralNetworks.Jacobian`](@ref) to differentiate the network output with respect to its input:

```@example jacobian_gradient
using AbstractNeuralNetworks
using SymbolicNeuralNetworks
using SymbolicNeuralNetworks: Jacobian, Gradient, derivative
using Latexify: latexify

c = Chain(Dense(2, 1, tanh))
nn = SymbolicNeuralNetwork(c)
□ = Jacobian(nn)
# we show the derivative of the network output with respect to its input
derivative(□) |> latexify
```

We see that the output is a matrix of size ``\mathrm{output\_dim} \times \mathrm{input\_dim}``. We can further compute the gradients of all entries of this matrix with [`SymbolicNeuralNetworks.Gradient`](@ref):

```@example jacobian_gradient
g = Gradient(derivative(□), nn)
nothing # hide
```

So [`SymbolicNeuralNetworks.Gradient`](@ref) differentiates every element of the matrix with respect to all neural network parameters. In order to access the gradient of the first matrix element with respect to the bias `b` in the first layer, we write:

```@example jacobian_gradient
matrix_index = (1, 1)
layer = :L1
weight = :b
derivative(g)[matrix_index...][layer][weight] |> latexify
```

If we now want to obtain an executable `Julia` function we have to use [`build_nn_function`](@ref). We call this function on:

```math
x = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad W = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
```

```@example jacobian_gradient
built_function = build_nn_function(derivative(g), nn.params, nn.input)

x = [1., 0.]
ps = NeuralNetworkParameters((L1 = (W = [1. 0.; 0. 1.], b = [0., 0.]), ))
built_function(x, ps)[matrix_index...][layer][weight]
```

!!! info
With `SymbolicNeuralNetworks`, the `struct`s [`SymbolicNeuralNetworks.Jacobian`](@ref), [`SymbolicNeuralNetworks.Gradient`](@ref) and [`build_nn_function`](@ref) it is easy to build combinations of derivatives. This is much harder when using `Zygote`-based AD.
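Putting the pieces on this page together, a minimal end-to-end sketch (the same toy network and values as above, shown only as a recap) is:

```julia
using AbstractNeuralNetworks
using SymbolicNeuralNetworks
using SymbolicNeuralNetworks: Jacobian, Gradient, derivative

c = Chain(Dense(2, 1, tanh))
nn = SymbolicNeuralNetwork(c)

# differentiate the network output with respect to its input ...
□ = Jacobian(nn)
# ... and every entry of that Jacobian with respect to all parameters
g = Gradient(derivative(□), nn)

# compile the symbolic derivative into an executable Julia function
built_function = build_nn_function(derivative(g), nn.params, nn.input)

x = [1., 0.]
ps = NeuralNetworkParameters((L1 = (W = [1. 0.; 0. 1.], b = [0., 0.]), ))
built_function(x, ps)[1, 1][:L1][:b]
```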