\label{appendix:dual-number-solver}

In this section, we consider certain errors that can potentially show up when combining AD with a numerical solver.
Numerical solvers for differential equations usually define an error term computed as \cite{hairer-solving-1, Rackauckas_Nie_2016}
\begin{equation}
\text{Err}_\text{scaled}^{n+1}
=
\left( \frac{1}{n} \sum_{i=1}^n \left( \frac{\text{err}_i^{n+1}}{atol + rtol \max (u_i^{n+1}, \hat u_i^{n+1})} \right)^2 \right)^{\frac{1}{2}}
\label{eq:internal-norm-wrong}
\end{equation}
where $\text{err}_i^{n+1} = u_i^{n+1} - \hat u_i^{n+1}$ is the estimated local error.
Different strategies for computing this error exist: for example, it can be based on two approximations to the solution or on theoretical upper bounds provided by the numerical solver.
The choice of the norm $\| \cdot \|_2 / \text{length}(\cdot)$ has been the standard for a long time \cite{Ranocha_Dalcin_Parsani_Ketcheson_2022} and is based on the assumption that increasing the size of the system of ODEs should not affect the stepsize choice (imagine, for example, simply duplicating the equations), but other options can be considered \cite{hairer-solving-1}.
% cite Ranocha,
% Chris on https://discourse.julialang.org/t/default-norm-used-in-ode-error-control/70995/4 mention to this norm to be the defaul since 70s
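
As a concrete illustration, the scaled error in Equation \eqref{eq:internal-norm-wrong} can be sketched in a few lines of Julia. The function below is only a minimal sketch for illustration: the names \texttt{scaled\_error}, \texttt{u}, \texttt{uhat}, \texttt{atol}, and \texttt{rtol} are our own and do not correspond to any solver internals.
\begin{jllisting}
# Minimal sketch of the scaled error norm in the equation above.
# u and uhat are two numerical approximations of the solution at the new step.
function scaled_error(u, uhat; atol=1e-6, rtol=1e-3)
    n = length(u)
    s = 0.0
    for i in 1:n
        sc = atol + rtol * max(u[i], uhat[i])   # componentwise tolerance scaling
        s += ((u[i] - uhat[i]) / sc)^2
    end
    return sqrt(s / n)   # RMS of the tolerance-scaled local errors
end

# A step is accepted only when scaled_error(u, uhat) <= 1.
\end{jllisting}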

The goal of a good stepsize controller is to pick $\Delta t_{n+1}$ as large as possible (so that the solver requires fewer total steps) while keeping $\text{Err}_\text{scaled} \leq 1$.
One of the most widely used methods is the proportional-integral controller (PIC), which updates the stepsize according to \cite{Ranocha_Dalcin_Parsani_Ketcheson_2022}
\begin{equation}
\Delta t_{n+1} = \eta \, \Delta t_n
\qquad
\eta = w_{n+1}^{\beta_1 / k} w_n^{\beta_2 / k} w_{n-1}^{\beta_3 / k}
\label{eq:PIC}
\end{equation}
with $w_{n+1} = 1 / \text{Err}_\text{scaled}^{n+1}$ the inverse of the scaled error estimate, $\beta_1$, $\beta_2$, $\beta_3$ the coefficients of the controller, and $k$ a constant determined by the order of the method and of its embedded error estimator \cite{Ranocha_Dalcin_Parsani_Ketcheson_2022}.
If the stepsize $\Delta t_{n+1}$ proposed in Equation \eqref{eq:PIC} to update from $u^{n+1}$ to $u^{n+2}$ does not satisfy $\text{Err}_\text{scaled}^{n+2} \leq 1$, a new smaller stepsize is proposed.
When $\eta < 1$ (which is the case for simple controllers with $\beta_2 = \beta_3 = 0$), Equation \eqref{eq:PIC} itself can be used for this local update.
It is also common to restrict $\eta \in [\eta_\text{min}, \eta_\text{max}]$ so that the stepsize does not change abruptly \cite{hairer-solving-1}.
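
The update rule in Equation \eqref{eq:PIC} is simple enough to sketch directly. The following helper is purely illustrative: the function name, the default coefficients \texttt{beta}, the value of \texttt{k}, and the clamping bounds are our own choices, not those of any particular solver.
\begin{jllisting}
# Sketch of the PIC stepsize update with clamping of eta.
# w, w_prev, w_prev2 are inverse scaled errors from the last three steps.
function pic_stepsize(dt, w, w_prev, w_prev2; beta=(0.7, -0.4, 0.0), k=5,
                      eta_min=0.2, eta_max=10.0)
    eta = w^(beta[1]/k) * w_prev^(beta[2]/k) * w_prev2^(beta[3]/k)
    eta = clamp(eta, eta_min, eta_max)   # avoid abrupt changes in the stepsize
    return eta * dt
end
\end{jllisting}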

When performing forward AD on top of a numerical solver, the error used by the stepsize controller naturally needs to account both for the error in the numerical solution of the original ODE and for the error in the dual components carrying the value of the derivative.
This relation between the numerical solver and AD was made explicit when we presented the relationship between forward AD and the sensitivity equations (Section \ref{section:sensitivity-equation}, Equation \eqref{eq:sensitivity-equation-AD}).
To illustrate this, let us consider the simple ODE example from Section \ref{section:software-Forward-AD}, consisting of the system of equations
\begin{equation}
\frac{du_1}{dt} = a \, u_1 - u_1 u_2,
\qquad
\frac{du_2}{dt} = -a \, u_2 + u_1 u_2.
\end{equation}
Notice that for the parameter $a = 1$ and initial condition $u_1(0) = u_2(0) = 1$ used below, this ODE has the constant analytical solution $u_1(t) = u_2(t) = 1$ for all times $t$, making this problem very simple to solve numerically.
The following code solves for the derivative with respect to the parameter $a$ using two different methods\footnote{Full code available at ...}.
The second method, based on forward AD with dual numbers, declares the \texttt{internalnorm} argument according to Equation \eqref{eq:internal-norm-wrong}.
\begin{jllisting}
using SciMLSensitivity, OrdinaryDiffEq, Zygote, ForwardDiff

function fiip(du, u, p, t)
    du[1] = p[1] * u[1] - u[1] * u[2]
    du[2] = -p[1] * u[2] + u[1] * u[2]
end

p = [1.0]
u0 = [1.0; 1.0]
prob = ODEProblem(fiip, u0, (0.0, 10.0), p);

# Correct gradient computed using continuous forward sensitivity analysis
grad0 = Zygote.gradient(p->sum(solve(prob, Tsit5(), u0=u0, p=p, sensealg = ForwardSensitivity(), saveat = 0.1, abstol=1e-12, reltol=1e-12)), p)
# grad0 = ([212.71042521681443],)

# Forward AD with the wrong norm, which ignores errors in the dual components
grad1 = Zygote.gradient(p->sum(solve(prob, Tsit5(), u0=u0, p=p, sensealg = ForwardDiffSensitivity(), saveat = 0.1, internalnorm = (u,t) -> sum(abs2,u/length(u)), abstol=1e-12, reltol=1e-12)), p)
# grad1 = ([6278.15677493293],)
\end{jllisting}
The reason why the two methods give different answers is that the error estimate used by the stepsize controller ignores the numerical errors in the dual components.
In the latter case, the stepsize controller drastically underestimates the error, effectively treating $\text{err}_i^{n+1} = 0$ since the primal solution is constant, which makes the stepsize $\Delta t_{n+1}$ increase exponentially at every step.
This can be fixed by instead considering
\begin{align}
\text{Err}_\text{scaled}^{n+1}
=
\Bigg[ \frac{1}{n(p+1)} \Bigg(
&\sum_{i=1}^n \left( \frac{u_i^{n+1} - \hat u_i^{n+1}}{atol + rtol \max (u_i^{n+1}, \hat u_i^{n+1})} \right)^2 \nonumber \\
+
&\sum_{i=1}^n \sum_{j=1}^p
\left( \frac{s_{ij}^{n+1} - \hat s_{ij}^{n+1}}{atol + rtol \max (s_{ij}^{n+1}, \hat s_{ij}^{n+1})} \right)^2
\Bigg)
\Bigg]^{\frac{1}{2}},
\label{eq:internal-norm-correct}
\end{align}
which now gives the correct answer:
\begin{jllisting}
# Sum of squared entries, including both the primal value and all dual partials
sse(x::Number) = x^2
sse(x::ForwardDiff.Dual) = sse(ForwardDiff.value(x)) + sum(sse, ForwardDiff.partials(x))

# Total number of scalar entries carried by a number or array of dual numbers
totallength(x::Number) = 1
totallength(x::ForwardDiff.Dual) = totallength(ForwardDiff.value(x)) + sum(totallength, ForwardDiff.partials(x))
totallength(x::AbstractArray) = sum(totallength, x)

# Forward AD with an internal norm that also accounts for the dual components
grad3 = Zygote.gradient(p->sum(solve(prob, Tsit5(), u0=u0, p=p, sensealg = ForwardDiffSensitivity(), saveat = 0.1, internalnorm = (u,t) -> sqrt(sum(x->sse(x),u) / totallength(u)), abstol=1e-12, reltol=1e-12)), p)
# grad3 = ([212.71042521681392],)
\end{jllisting}
Notice that current implementations of forward AD inside \texttt{SciMLSensitivity.jl} already account for this, so there is no need to manually specify the internal norm.
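
For instance, under the same setup as above, simply omitting the \texttt{internalnorm} keyword should rely on the default dual-aware error norm and agree with \texttt{grad0}; the sketch below (the name \texttt{grad2} is our own) mirrors the earlier call.
\begin{jllisting}
# With current implementations there is no need to pass internalnorm:
# the default error norm already accounts for the dual components.
grad2 = Zygote.gradient(p -> sum(solve(prob, Tsit5(), u0=u0, p=p,
    sensealg = ForwardDiffSensitivity(), saveat = 0.1,
    abstol = 1e-12, reltol = 1e-12)), p)
\end{jllisting}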