Large reorganization of the paper (#93)
* Add figure for discrete adjoints

* [build-tex] Quantum application and comments on symbolic diff


Co-authored-by: frankschae <[email protected]>
Co-authored-by: facusapienza21 <facusapienza21>

* Add workflow dispatch

* add workflow dispatch

* [build-bib]

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* Updates from Overleaf

* [build-bib] Section on Symbolic Differentiation

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* [build-bib] Changes in AD VJP connection

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* Updates from Overleaf

* small changes in introduction

* updated figure on VJP

* Mention of Strassen's algorithm for matrix multiplication

* [build-bib] Reorganize scientific motivation section

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* [build-bib] Reorganization of adjoints and software + appendix on stepsizing

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* Small push to sync with main

---------

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: frankschae <[email protected]>
Co-authored-by: Author <[email protected]>
3 people authored Jan 23, 2024
1 parent 0c96e43 commit 3cd1bef
Showing 28 changed files with 22,628 additions and 16,707 deletions.
6 changes: 5 additions & 1 deletion .github/workflows/biblatex.yml
@@ -1,5 +1,9 @@
 name: Build BibLaTeX file
-on: [push]
+on:
+  push:
+    branches:
+      - main
+  workflow_dispatch:
 jobs:
   build_latex:
     runs-on: ubuntu-latest
1 change: 1 addition & 0 deletions .github/workflows/latex.yml
@@ -7,6 +7,7 @@ on:
     workflows: ["Build BibLaTeX file"]
     types:
       - completed
+  workflow_dispatch:
 jobs:
   build_latex:
     runs-on: ubuntu-latest
83 changes: 83 additions & 0 deletions tex/appendix/AD-norm.tex
@@ -0,0 +1,83 @@
\label{appendix:dual-number-solver}

In this section we consider certain errors that can show up when combining AD with a numerical solver.
Numerical solvers for differential equations usually define a scaled error estimate computed as \cite{hairer-solving-1, Rackauckas_Nie_2016}
\begin{equation}
\text{Err}_\text{scaled}^{n+1}
=
\left( \frac{1}{n} \sum_{i=1}^n \left( \frac{\text{err}_i^{n+1}}{atol + rtol \, \max (|u_i^{n}|, |\hat u_i^{n+1}|)} \right)^2 \right)^{\frac{1}{2}},
\label{eq:internal-norm-wrong}
\end{equation}
where $\text{err}_i^{n+1} = u_i^{n+1} - \hat u_i^{n+1}$ is the estimated local error.
Different strategies for computing the error exist; for example, they can be based on two approximations to the solution or on theoretical upper bounds provided by the numerical solver.
The choice of the norm $\| \cdot \|_2 / \sqrt{\text{length}(\cdot)}$ has been the standard for a long time \cite{Ranocha_Dalcin_Parsani_Ketcheson_2022}. It is based on the assumption that increasing the size of the system of ODEs should not affect the stepsize choice (imagine, for example, simply duplicating the equations), but other options can be considered \cite{hairer-solving-1}.
% cite Ranocha,
% Chris on https://discourse.julialang.org/t/default-norm-used-in-ode-error-control/70995/4 mention to this norm to be the defaul since 70s
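As a minimal illustration (using a hypothetical helper name, \texttt{scaled\_error}, that is not part of any solver API), the norm in Equation \eqref{eq:internal-norm-wrong} can be sketched in a few lines of Julia:
\begin{jllisting}
# Sketch of the scaled error estimate of Equation (internal-norm-wrong):
# each component of the local error is divided by its mixed tolerance
# before taking the root mean square over the n states.
function scaled_error(u, uhat; atol = 1e-6, rtol = 1e-3)
    err = u .- uhat                                    # estimated local error
    scale = atol .+ rtol .* max.(abs.(u), abs.(uhat))  # per-component tolerance
    return sqrt(sum(abs2, err ./ scale) / length(u))
end
\end{jllisting}
A proposed step is accepted whenever this quantity is at most one.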

The goal of a good stepsize controller is to pick $\Delta t_{n+1}$ as large as possible (so that the solver requires fewer total steps) while keeping $\text{Err}_\text{scaled} \leq 1$.
One of the most widely used methods is the proportional-integral controller (PIC), which updates the stepsize according to \cite{Ranocha_Dalcin_Parsani_Ketcheson_2022}
\begin{equation}
\Delta t_{n+1} = \eta \, \Delta t_n
\qquad
\eta = w_{n+1}^{\beta_1 / k} w_n^{\beta_2 / k} w_{n-1}^{\beta_3 / k}
\label{eq:PIC}
\end{equation}
with $w_{n+1} = 1 / \text{Err}_\text{scaled}^{n+1}$ the inverse of the scaled error estimate, $\beta_1$, $\beta_2$, $\beta_3$ the coefficients of the controller, and $k$ the order of the numerical method.
If the stepsize $\Delta t_{n+1}$ proposed in Equation \eqref{eq:PIC} to update from $u^{n+1}$ to $u^{n+2}$ does not satisfy $\text{Err}_\text{scaled}^{n+2} \leq 1$, a new smaller stepsize is proposed.
When $\eta < 1$ (which is the case for simple controllers with $\beta_2 = \beta_3 = 0$ whenever the step is rejected), Equation \eqref{eq:PIC} itself can be reused for this local update.
It is also common to restrict $\eta \in [\eta_\text{min}, \eta_\text{max}]$ so that the stepsize does not change abruptly \cite{hairer-solving-1}.
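The following sketch (with hypothetical names; the coefficient defaults are illustrative, not those of any particular solver) implements the update of Equation \eqref{eq:PIC} together with this clipping:
\begin{jllisting}
# Sketch of the PIC update of Equation (PIC).
# w = (w_{n+1}, w_n, w_{n-1}) holds the inverses of the last scaled
# error estimates; k is the order of the method. Setting beta2 = beta3 = 0
# recovers a simple proportional controller.
function propose_stepsize(dt, w; k, beta1 = 1.0, beta2 = 0.0, beta3 = 0.0,
                          eta_min = 0.2, eta_max = 10.0)
    eta = w[1]^(beta1 / k) * w[2]^(beta2 / k) * w[3]^(beta3 / k)
    eta = clamp(eta, eta_min, eta_max)   # avoid abrupt stepsize changes
    return eta * dt
end
\end{jllisting}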

When performing forward AD through a numerical solver, the error used by the stepsize controller needs to account both for the error in the numerical solution of the original ODE and for the error in the dual components carrying the values of the derivatives.
This relation between the numerical solver and AD was made explicit when we presented the connection between forward AD and the sensitivity equations (Section \ref{section:sensitivity-equation}, Equation \eqref{eq:sensitivity-equation-AD}).
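As a reminder of that connection (with the notation of that section), the dual components computed by forward AD coincide with the sensitivities $s = \partial u / \partial p$, which satisfy their own ODE,
\begin{equation*}
    \frac{ds}{dt} = \frac{\partial f}{\partial u} s + \frac{\partial f}{\partial p},
\end{equation*}
so their numerical error needs to be controlled just like that of the original states.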
To illustrate this, let us consider the simple ODE example from Section \ref{section:software-Forward-AD}, consisting of the system of equations
...
Notice that for $a = 1$ and initial condition $u_1(0) = u_2(0) = 1$ this ODE has the simple analytical solution $u_1(t) = u_2(t) = 1$ for all times $t$, since the right-hand side vanishes identically at this steady state, making the problem very simple to solve numerically.
The following code solves for the derivative with respect to the parameter $a$ using two different methods\footnote{Full code available at ...}.
The second method, which uses forward AD with dual numbers, declares the \texttt{internalnorm} argument according to Equation \eqref{eq:internal-norm-wrong}.
\begin{jllisting}
using SciMLSensitivity, OrdinaryDiffEq, Zygote, ForwardDiff

function fiip(du, u, p, t)
    du[1] = p[1] * u[1] - u[1] * u[2]
    du[2] = -p[1] * u[2] + u[1] * u[2]
end

p = [1.0]
u0 = [1.0; 1.0]
prob = ODEProblem(fiip, u0, (0.0, 10.0), p);

# Correct gradient, computed with continuous forward sensitivity analysis
grad0 = Zygote.gradient(p -> sum(solve(prob, Tsit5(), u0=u0, p=p, sensealg=ForwardSensitivity(), saveat=0.1, abstol=1e-12, reltol=1e-12)), p)
# grad0 = ([212.71042521681443],)

# Forward AD with the wrong internal norm, which ignores the dual components
grad1 = Zygote.gradient(p -> sum(solve(prob, Tsit5(), u0=u0, p=p, sensealg=ForwardDiffSensitivity(), saveat=0.1, internalnorm=(u,t) -> sum(abs2, u/length(u)), abstol=1e-12, reltol=1e-12)), p)
# grad1 = ([6278.15677493293],)
\end{jllisting}
The two methods give different answers because the error estimate used by the stepsize controller ignores the numerical error in the dual components.
In the latter case, the stepsize controller drastically underestimates the error, effectively treating $\text{err}_i^{n+1} = 0$, which makes the stepsize $\Delta t_{n+1}$ increase exponentially at every step.
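The mechanism can be seen directly at the level of dual numbers: a norm that reads only the primal value of a dual number discards the error information carried by its partials. A small sketch using \texttt{ForwardDiff} (illustrative, not solver code):
\begin{jllisting}
using ForwardDiff

# Dual number whose primal error is zero but whose derivative
# component carries a nonzero error
d = ForwardDiff.Dual(0.0, 0.3)

ForwardDiff.value(d)^2               # 0.0: a value-only norm reports no error
sum(abs2, ForwardDiff.partials(d))   # 0.09: the dual component is invisible to it
\end{jllisting}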
This can be fixed by instead considering
\begin{align}
\text{Err}_\text{scaled}^{n+1}
=
\Bigg[ \frac{1}{n(p+1)} \Bigg(
&\sum_{i=1}^n \left( \frac{u_i^{n+1} - \hat u_i^{n+1}}{atol + rtol \, \max (|u_i^{n}|, |\hat u_i^{n+1}|)} \right)^2 \nonumber \\
+
&\sum_{i=1}^n \sum_{j=1}^p
\left( \frac{s_{ij}^{n+1} - \hat s_{ij}^{n+1}}{atol + rtol \, \max (|s_{ij}^{n}|, |\hat s_{ij}^{n+1}|)} \right)^2
\Bigg) \Bigg]^{\frac{1}{2}},
\label{eq:internal-norm-correct}
\end{align}
which now gives the right answer
\begin{jllisting}
sse(x::Number) = x^2
sse(x::ForwardDiff.Dual) = sse(ForwardDiff.value(x)) + sum(sse, ForwardDiff.partials(x))

totallength(x::Number) = 1
totallength(x::ForwardDiff.Dual) = totallength(ForwardDiff.value(x)) + sum(totallength, ForwardDiff.partials(x))
totallength(x::AbstractArray) = sum(totallength,x)

grad3 = Zygote.gradient(p->sum(solve(prob, Tsit5(), u0=u0, p=p, sensealg = ForwardDiffSensitivity(), saveat = 0.1, internalnorm = (u,t) -> sqrt(sum(x->sse(x),u) / totallength(u)), abstol=abstol, reltol=reltol)), p)
# grad3 = ([212.71042521681392],)
\end{jllisting}
Notice that current implementations of forward AD inside \texttt{SciMLSensitivity.jl} already account for this, so there is no need to specify the internal norm manually.
5 changes: 4 additions & 1 deletion tex/appendix/lagrangian.tex
@@ -1,6 +1,9 @@
 \label{appendix:lagrangian}
 
-In this section we derive the adjoint equation for both discrete and continuous methods using the Lagrange multiplier trick.
+%by the means of the \emph{Lagrange multiplier method}\cite{Vadlamani.2020}.
+% The introduction of the adjoint variable allows to reduce the computational complexity of sensitivity methods, as we will explore in this section and later in Section \ref{section:computing-adjoints}.
+
+In this section we derive the adjoint equation for both discrete and continuous methods using the Lagrange multiplier trick \cite{Vadlamani_Xiao_Yablonovitch_2020}.
+Conceptually, the method is the same in the discrete and continuous cases, with the difference that we manipulate linear algebra objects in the former and continuous operators in the latter.
 
 For the continuous adjoint method, we proceed the same way by writing a new loss function $I(\theta)$, sometimes known as the \textit{Lagrangian}, identical to $L(\theta)$ as
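A standard way of writing this Lagrangian (notation assumed here: $f$ is the right-hand side of the ODE, $\lambda(t)$ the Lagrange multiplier, and $[t_0, t_1]$ the integration interval) is
\begin{equation*}
    I(\theta) = L(\theta) - \int_{t_0}^{t_1} \lambda(t)^T \left( \frac{du}{dt} - f(u, \theta, t) \right) dt,
\end{equation*}
which coincides with $L(\theta)$ whenever $u(t)$ satisfies the ODE, since the term in parentheses then vanishes identically.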