[build-bib, build-latex] Structural changes and polishing for final version (#98)

* accepted changes and bib style

* [build-bib] Fix problem with bibliography

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* [build-bib] Update biblatex_merger.py - ignore victor for bib compilation

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* Updates from Overleaf

* [build-bib] filter bibliography.bib

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* [build-bib] fix typo

* [build-bib] Delete tex/bibliography.bib

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* Updates from Overleaf

* [build-bib] remove victor file for bib compilation

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* [build-bib] Connection between forward AD and sensitivity equations

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* Updates from Overleaf

* Integrated edits and small changes


Co-authored-by: heimbach <[email protected]>
Co-authored-by: JordiBolibar <[email protected]>
Co-authored-by: facusapienza21 <[email protected]>

* [build-bib] trigger bib generation

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* [build-bib] Add complex-step differentiation comment on holomorphic assumption

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* Recommendation for chaotic systems


Co-authored-by: frankschae <[email protected]>

* [build-bib] comments and small changes in all manuscript

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* [build-bib] Consistency results more clean

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* [build-bib] Organization changes + symbolic + ecology section


Co-authored-by: vboussange <no-reply>
Co-authored-by: avik-pal <[email protected]>

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

* [build-bib] Structural changes and general typo correction

* 📁 Automated biblatex merge [build-latex]

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>

---------

Signed-off-by: My GitHub Actions Bot <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Author <[email protected]>
Co-authored-by: heimbach <[email protected]>
Co-authored-by: JordiBolibar <[email protected]>
Co-authored-by: frankschae <[email protected]>
Co-authored-by: avik-pal <[email protected]>
6 people authored Feb 29, 2024
1 parent e0c177b commit 41129e5
Showing 40 changed files with 21,011 additions and 47,990 deletions.
12 changes: 8 additions & 4 deletions .utils/biblatex_merger.py
@@ -50,6 +50,11 @@ def merge_bib(path):

# Filter .ipynb files
bib_files = list(filter(lambda x : '.ipynb' not in x, bib_files))

# Filter certain bib files out of the compilation process because they induce unknown errors
bib_files = list(filter(lambda x : 'victor' not in x, bib_files))
# bib_files = list(filter(lambda x : 'bibliography.bib' not in x, bib_files))

# Put output file at the end to avoid backslash plague
bib_files = sorted(bib_files, key = lambda x : 'bibliography.bib' in x)

@@ -61,7 +66,6 @@ def merge_bib(path):
filtered_bib_data = BibliographyData()

for file in tex_files:
with open(file, 'r') as f:
latex += f.read()

@@ -76,9 +80,9 @@

# Collect all bib data
for file in bib_files:
if file == "./tex/bibliography.bib":
continue
print("-----------> Processing bib file: ", file)
parser = bibtex.Parser()
bib_data.append(parser.parse_file(file))

44 changes: 25 additions & 19 deletions tex/appendix/AD-norm.tex
@@ -1,38 +1,44 @@
\label{appendix:dual-number-solver}

In this section, we consider certain errors that can potentially show up when combining AD with a numerical solver.
Numerical solvers for differential equations usually estimate internally a scaled error computed as \cite{hairer-solving-1, Rackauckas_Nie_2016}
\begin{equation}
 \text{Err}_\text{scaled}^{n+1}
 =
 \left( \frac{1}{n} \sum_{i=1}^n \left( \frac{\text{err}_i^{n+1}}{\mathfrak{abstol} + \mathfrak{reltol} \, \times \, M} \right)^2 \right)^{\frac{1}{2}},
 \label{eq:internal-norm-wrong}
\end{equation}
with $\mathfrak{abstol}$ and $\mathfrak{reltol}$ the absolute and relative solver tolerances (both customizable by the user), respectively; $M$ the maximum expected value of the numerical solution; and $\text{err}_i^{n+1}$ an estimate of the numerical error at step $n+1$.
Common choices for these include $M = \max (u_i^{n+1}, \hat u_i^{n+1})$ and $\text{err}_i^{n+1} = u_i^{n+1} - \hat u_i^{n+1}$, but they can vary between solvers.
Estimates of the local error $\text{err}_i^{n+1}$ can be based on two approximations to the solution obtained from Runge-Kutta pairs \cite{Ranocha_Dalcin_Parsani_Ketcheson_2022, hairer-solving-1}, or on theoretical upper bounds provided by the numerical solver.
The choice of the norm $\frac{1}{\sqrt n} \| \cdot \|_2$ for computing the total error $\text{Err}_\text{scaled}$, sometimes known as the Hairer norm, has been the standard for a long time \cite{Ranocha_Dalcin_Parsani_Ketcheson_2022}.
It is based on the assumption that a mere increase in the size of the system of ODEs (e.g., by simply duplicating the equations) should not affect the stepsize choice, although other options can be considered \cite{hairer-solving-1}.
% Chris on https://discourse.julialang.org/t/default-norm-used-in-ode-error-control/70995/4 mentions that this norm has been the default since the 70s

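As a minimal illustration of Equation \eqref{eq:internal-norm-wrong} (our own sketch, not part of the original manuscript; the function name and tolerance values are hypothetical), the scaled error can be computed as
\begin{jllisting}
using LinearAlgebra

# Sketch: tolerance-scaled error under the Hairer norm.
# u and û are two approximations of the solution at step n+1.
function scaled_error(u, û; abstol = 1e-6, reltol = 1e-3)
    err = u .- û                          # local error estimate
    M = max.(abs.(u), abs.(û))            # maximum expected solution values
    w = err ./ (abstol .+ reltol .* M)    # componentwise scaled residuals
    return norm(w) / sqrt(length(w))      # (1/√n)‖·‖₂, the Hairer norm
end
\end{jllisting}
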
The goal of a stepsize controller is to pick $\Delta t_{n+1}$ as large as possible (so that the solver requires fewer total steps) while at the same time keeping $\text{Err}_\text{scaled} \leq 1$.
One of the most used methods to achieve this is the proportional-integral controller (PIC), which updates the stepsize according to \cite{hairer-solving-2, Ranocha_Dalcin_Parsani_Ketcheson_2022}
\begin{equation}
 \Delta t_{n+1} = \eta \, \Delta t_n
 \qquad
 \eta = w_{n+1}^{\beta_1 / q} w_n^{\beta_2 / q} w_{n-1}^{\beta_3 / q}
 \label{eq:PIC}
\end{equation}
with $w_{n+1} = 1 / \text{Err}_\text{scaled}^{n+1}$ the inverse of the scaled error estimate; $\beta_1$, $\beta_2$, and $\beta_3$ numerical coefficients defined by the controller; and $q$ the order of the numerical solver.
If the stepsize $\Delta t_{n+1}$ proposed in Equation \eqref{eq:PIC} to update from $u^{n+1}$ to $u^{n+2}$ does not satisfy $\text{Err}_\text{scaled}^{n+2} \leq 1$, a new smaller stepsize is proposed.
When $\eta < 1$ (which is the case for simple controllers with $\beta_2 = \beta_3 = 0$), Equation \eqref{eq:PIC} can be used for the local update.
It is also common to restrict $\eta \in [\eta_\text{min}, \eta_\text{max}]$ so that the stepsize does not change abruptly \cite{hairer-solving-1}.

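A minimal sketch of this update rule (our own illustrative code; the controller coefficients shown are placeholders, not recommended values) is:
\begin{jllisting}
# Sketch of the PIC update in Equation \eqref{eq:PIC}.
# w = (w_{n+1}, w_n, w_{n-1}) are inverse scaled error estimates of the
# last steps and q is the order of the numerical solver.
function pic_stepsize(Δt, w, q; β = (0.7, -0.4, 0.0), ηmin = 0.2, ηmax = 10.0)
    η = w[1]^(β[1] / q) * w[2]^(β[2] / q) * w[3]^(β[3] / q)
    return clamp(η, ηmin, ηmax) * Δt   # restrict η to avoid abrupt changes
end
\end{jllisting}
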
When performing forward AD through a numerical solver, the error used in the stepsize controller naturally needs to account for both the errors induced in the numerical solution of the original ODE and the errors in the dual component carrying the value of the sensitivity.
This relation between the numerical solver and AD was made explicit when we presented the relationship between forward AD and the sensitivity equations (Section \ref{section:sensitivity-equation}, Equation \eqref{eq:sensitivity-equation-AD}).
To illustrate this, let us consider the simple ODE example from Section \ref{section:software-Forward-AD}, consisting of the system of equations
\begin{equation}
\begin{cases}
 \frac{du_1}{dt} = a u_1 - u_1 u_2 & \quad u_1(0) = 1 \\
 \frac{du_2}{dt} = - a u_2 + u_1 u_2 & \quad u_2(0) = 1.
\end{cases}
\end{equation}
Notice that for $a = 1$ this ODE admits the simple analytical solution $u_1(t) = u_2(t) = 1$ for all times $t$, making this problem very easy to solve numerically.
The following code solves for the derivative with respect to the parameter $a$ using two different methods\footnote{Full code available at ...}.
The second method, using forward AD with dual numbers, declares the \texttt{internalnorm} argument according to Equation \eqref{eq:internal-norm-wrong}.
\begin{jllisting}
using SciMLSensitivity, OrdinaryDiffEq, Zygote, ForwardDiff
@@ -55,16 +61,16 @@
# grad1 = ([6278.15677493293],)
\end{jllisting}
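Since most of the listing above is collapsed in this diff, the following is a minimal sketch of how such a comparison could be set up; the solver choice, tolerances, and loss function here are our assumptions rather than the manuscript's exact code.
\begin{jllisting}
using OrdinaryDiffEq, SciMLSensitivity, Zygote, ForwardDiff
import DiffEqBase

function f!(du, u, p, t)
    du[1] =  p[1] * u[1] - u[1] * u[2]
    du[2] = -p[1] * u[2] + u[1] * u[2]
end

u0, p, tspan = [1.0, 1.0], [1.0], (0.0, 10.0)

# Method 1: Zygote over the solver (sensitivity algorithm handled by
# SciMLSensitivity)
grad1 = Zygote.gradient(p) do p
    sol = solve(ODEProblem(f!, u0, tspan, p), Tsit5())
    sum(sol.u[end])
end

# Method 2: dual numbers pushed through the solver, with an internalnorm that
# (incorrectly) drops the dual component, as in Equation (internal-norm-wrong)
badnorm(u, t) = DiffEqBase.ODE_DEFAULT_NORM(ForwardDiff.value.(u),
                                            ForwardDiff.value(t))
grad2 = ForwardDiff.gradient(p) do p
    sol = solve(ODEProblem(f!, eltype(p).(u0), tspan, p), Tsit5(),
                internalnorm = badnorm)
    sum(sol.u[end])
end
\end{jllisting}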
The reason why the two methods give different answers is that the error estimation by the stepsize controller ignores the numerical errors in the dual component.
In the latter case, the local error estimate is drastically underestimated to $\text{err}_i^{n+1} = 0$, which makes the stepsize $\Delta t_{n+1}$ increase by a multiplicative factor at every step.
This can be fixed by instead considering a norm that accounts for both the primal and dual components in the forward pass,
\begin{align}
\text{Err}_\text{scaled}^{n+1}
=
\Bigg[ \frac{1}{n(p+1)} \Bigg(
&\sum_{i=1}^n \left( \frac{u_i^{n+1} - \hat u_i^{n+1}}{\mathfrak{abstol} + \mathfrak{reltol} \max (u_i^{n+1}, \hat u_i^{n+1})} \right)^2 \nonumber \\
+
&\sum_{i=1}^n \sum_{j=1}^p
\left( \frac{s_{ij}^{n+1} - \hat s_{ij}^{n+1}}{\mathfrak{abstol} + \mathfrak{reltol} \max (s_{ij}^{n+1}, \hat s_{ij}^{n+1})} \right)^2
\Bigg) \Bigg]^{\frac{1}{2}},
\label{eq:internal-norm-correct}
\end{align}
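A sketch of a norm implementing Equation \eqref{eq:internal-norm-correct} for \texttt{ForwardDiff.Dual} states (our own illustrative code; the helper name \texttt{dualnorm} is hypothetical) is:
\begin{jllisting}
using ForwardDiff
import DiffEqBase

# Sketch: measure both the primal value and all dual (sensitivity) components
# of the already tolerance-scaled residuals, then apply the Hairer norm.
function dualnorm(u::AbstractArray{<:ForwardDiff.Dual}, t)
    vals  = vec(ForwardDiff.value.(u))
    parts = reduce(vcat, collect.(vec(ForwardDiff.partials.(u))))
    return DiffEqBase.ODE_DEFAULT_NORM(vcat(vals, parts), ForwardDiff.value(t))
end
dualnorm(u, t) = DiffEqBase.ODE_DEFAULT_NORM(u, t)  # fallback for plain numbers

# sol = solve(prob, Tsit5(); internalnorm = dualnorm)
\end{jllisting}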
7 changes: 6 additions & 1 deletion tex/appendix/code.tex
@@ -1 +1,6 @@
% Appendix to mention the code provided in the GitHub repository.
\label{appedix:code}

\begin{itemize}
\item[$\clubsuit_1$] 323
\item[$\clubsuit_2$] 323
\end{itemize}
63 changes: 59 additions & 4 deletions tex/appendix/lagrangian.tex
@@ -3,14 +3,67 @@
%by the means of the \emph{Lagrange multiplier method}\cite{Vadlamani.2020}.
% The introduction of the adjoint variable allows to reduce the computational complexity of sensitivity methods, as we will explore in this section and later in Section \ref{section:computing-adjoints}.

In this section we derive the adjoint equation for both discrete and continuous methods using the Lagrangian formulation of the adjoint.
Following the analysis in \cite{Giles_Pierce_2000}, we present both approaches, although we prefer the duality viewpoint introduced in the main text since we believe it is more commonly used and easier to understand for newcomers.
It is important to mention that this is different from using the Lagrange multipliers approach, a common confusion in the literature \cite{Givoli_2021}.
Conceptually, the method is the same in both the discrete and continuous case, with the difference that we manipulate linear algebra objects for the former and continuous operators for the latter.

\subsubsection{Discrete adjoint}

Following the notation introduced in Section \ref{section:discrete-adjoint}, we first define the Lagrangian function $I(U, \theta)$ as
\begin{equation}
I(U, \theta) = L(U; \theta) + \lambda^T G(U; \theta),
\end{equation}
where $\lambda \in \R^{nk}$ is, in principle, any choice of vector.
Given that $G(U; \theta) = 0$ for every choice of the parameter $\theta$, we have $I(U, \theta) = L(U; \theta)$.
Then, the gradient of $L$ with respect to the vector parameter $\theta$ can be computed as
\begin{align}
\begin{split}
\frac{dL}{d\theta}
=
\frac{dI}{d\theta}
&=
\frac{\partial L}{\partial \theta} + \frac{\partial L}{\partial U} \frac{\partial U}{\partial \theta}
+
\lambda^T
\left( \frac{\partial G}{\partial \theta} + \frac{\partial G}{\partial U} \frac{\partial U}{\partial \theta} \right)
\\
&=
\frac{\partial L}{\partial \theta} + \lambda^T \frac{\partial G}{\partial \theta}
+
\left( \frac{\partial L}{\partial U} + \lambda^T \frac{\partial G}{\partial U} \right) \frac{\partial U}{\partial \theta}.
\end{split}
\label{eq:appendix-discrete-adjoint}
\end{align}
The important trick in the last step was to group all the terms involving the sensitivity $\frac{\partial U}{\partial \theta}$ together.
In order to avoid computing the sensitivity at all, we can define $\lambda$ as the vector that makes the last term on the right-hand side of Equation \eqref{eq:appendix-discrete-adjoint} vanish, which results in the same expressions we obtained in Equations \eqref{eq:adjoint-state-equation} and \eqref{eq:def_adjoint}, that is,
\begin{equation}
\frac{\partial L}{\partial U} = - \lambda^T \frac{\partial G}{\partial U}.
\end{equation}
Finally, the gradient can be computed as
\begin{equation}
\frac{dL}{d\theta}
=
\frac{\partial L}{\partial \theta} + \lambda^T \frac{\partial G}{\partial \theta}.
\end{equation}

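To make the procedure concrete, consider a minimal example (our own illustration, not code from the manuscript) with a linear constraint $G(U; \theta) = A(\theta) U - b = 0$ and loss $L(U) = \frac{1}{2} \|U\|^2$; the adjoint then requires a single extra linear solve:
\begin{jllisting}
using LinearAlgebra

A0, A1 = [4.0 1.0; 1.0 3.0], [1.0 0.0; 0.0 2.0]
b, θ = [1.0, 2.0], [0.5]

A(θ) = A0 + θ[1] * A1          # hypothetical parameter dependence
U = A(θ) \ b                   # forward solve: G(U; θ) = 0

# Adjoint equation: (∂G/∂U)ᵀ λ = -(∂L/∂U)ᵀ, with ∂G/∂U = A(θ) and ∂L/∂U = Uᵀ
λ = A(θ)' \ (-U)

# Gradient: dL/dθ = ∂L/∂θ + λᵀ (∂G/∂θ), here ∂L/∂θ = 0 and ∂G/∂θ = A1 * U
dLdθ = λ' * (A1 * U)

# Finite-difference check of the adjoint gradient
ε = 1e-6
Uε = A([θ[1] + ε]) \ b
@show dLdθ (0.5 * (Uε'Uε - U'U)) / ε
\end{jllisting}
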
\subsubsection{Continuous adjoint}

For the continuous adjoint method, we proceed the same way by defining the Lagrangian $I(\theta)$ as
\begin{equation}
I(\theta) = L(\theta) - \inttime \lambda(t)^T \left( \frac{du}{dt} - f(u, \theta, t) \right) dt
\end{equation}
where $\lambda(t) \in \mathbb R^n$ is any function.
It is important to mention that, in principle, there is no connection yet between $\lambda (t)$ and the Lagrange multiplier associated with the constraint \cite{Givoli_2021}.
Instead, the condition
\begin{equation}
\inttime \lambda(t)^T \left( \frac{du}{dt} - f(u, \theta, t) \right) dt = 0 \qquad \text{for all functions } \lambda : [t_0, t_1] \to \R^n
\end{equation}
corresponds to the weak formulation of the differential equation \eqref{eq:original_ODE} \cite{brezis2011functional}.
As long as the differential equation is satisfied, we have $I(\theta) = L(\theta)$ for all choices of the vector parameter $\theta$.
Now,
\begin{equation}
\frac{dL}{d\theta} = \frac{dI}{d\theta} =
\inttime \left( \frac{\partial h}{\partial \theta} + \frac{\partial h}{\partial u} \frac{\partial u}{\partial \theta} \right) dt
@@ -37,4 +90,6 @@
\end{equation}
with final condition $\lambda(t_1) = 0$.

It is easy to see that this derivation is equivalent to solving the Karush-Kuhn-Tucker (KKT) conditions.

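As an illustration of the continuous adjoint recipe (our own sketch, with an assumed problem and loss), consider $\frac{du}{dt} = \theta u$ with $L(\theta) = \int_{t_0}^{t_1} \frac{1}{2} u^2 \, dt$:
\begin{jllisting}
using OrdinaryDiffEq, QuadGK

θ, u0, t0, t1 = 0.3, 1.0, 0.0, 1.0

# Forward pass: du/dt = θ u
fwd = solve(ODEProblem((u, p, t) -> p * u, u0, (t0, t1), θ), Tsit5(),
            abstol = 1e-10, reltol = 1e-10)

# Adjoint pass, integrated backwards from λ(t1) = 0:
# dλ/dt = -(∂f/∂u)ᵀ λ - (∂h/∂u)ᵀ = -θ λ - u(t), with h = u²/2
adj = solve(ODEProblem((λ, p, t) -> -p * λ - fwd(t), 0.0, (t1, t0), θ), Tsit5(),
            abstol = 1e-10, reltol = 1e-10)

# Gradient: dL/dθ = ∫ (∂h/∂θ + λᵀ ∂f/∂θ) dt, with ∂h/∂θ = 0 and ∂f/∂θ = u
dLdθ, _ = quadgk(t -> adj(t) * fwd(t), t0, t1)
\end{jllisting}
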
\subsubsection{The adjoint from a functional analysis perspective}