Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finishing the linear solve and Krylov interfaces #1551

Closed
ChrisRackauckas opened this issue Dec 27, 2021 · 13 comments
Closed

Finishing the linear solve and Krylov interfaces #1551

ChrisRackauckas opened this issue Dec 27, 2021 · 13 comments

Comments

@ChrisRackauckas
Copy link
Member

I plan to finally finish the linear solver Krylov preconditioner etc. interfaces this week. The big breaking LinearSolve.jl change #1543 of course was a prerequisite to doing this correctly, and so now we're set to solve SciML/DifferentialEquations.jl#453 and SciML/DifferentialEquations.jl#521 at the same time. The test cases are:

The plan for this last little part is (mostly documenting for myself to remember the whole plan a few days from now 😅)

  • Add and document algorithm traits for LinearSolve.jl algorithms for needsconcreteA(alg) for whether a concrete A is required or whether only the A*x is used.
  • Internally to OrdinaryDiffEq, https://github.com/SciML/OrdinaryDiffEq.jl/blob/v6.0.3/src/derivative_utils.jl#L680 elseif jac_prototype === nothing && needsconcreteA(alg.linsolve), automatically choose J a JacVecOperator matching the autodiff setting. This would make it the same as the user choosing jac_prototype = JacVecOperator(), but now if they choose GMRES it will automatically make this swap and set autodiff appropriately. This is then inverted for a good reason: if the user wants to specify their own matrix-free operator or approximate Jacobian, then they just as before set jac_prototype and jac
  • Algorithms then get a precs argument for a function Pl,Pr = alg.precs(W,du,u,p,t,newW,solverdata) for building preconditioners which are dependent on the current W matrix. To handle the type-groundedness for the solver's init phase, newW will be passed as nothing to allow a separate dispatch for handling that phase. solverdata is kind of an escape hatch so future additions can be non-breaking. Currently it's just set to include gamma for the W = I - gamma*J since that will be needed for @tjdiamandis's randomized preconditioners, but that list of "things from the solver someone needs" can easily grow, and so it's non-breaking to add more little scalars to the struct in the future. dt, etc. who knows.
  • Note that if W is a WOperator, then W is given to the user in the preconditioner and the linear solver in the lazy form where they can W.mass_matrix, W.gamma, W.J and do special linear solvers. This solves solving for Implicit operators CliMA/ClimaCore.jl#41 without the need for keeping the Wfact! code around.

That will be relatively painless to do and we'll get fully general matrix-free preconditioned Krylov.

Mentioning this to @simonbyrne who is tracking this kind of stuff in CliMA/ClimaCore.jl#41

One last little tidbit, when digging around Sundials, the only thing that Sundials has that this doesn't is some weird pointer in what would be the equivalent to the precs function https://sundials.readthedocs.io/en/latest/cvode/Usage/index.html#c.CVLsPrecSetupFn

jcur – a pointer to a flag which should be set to SUNTRUE if Jacobian data was recomputed, or set to SUNFALSE if Jacobian data was not recomputed, but saved data was still reused.

That makes no sense if matrix-free though, and so Sundials usually just ignores it. So what's the case where you're doing not matrix-free Newton-Krylov and you want to tell the algorithm you modified the Jacobian in the preconditioner setup function to do... ? I don't see Sundials actually doing anything with it, and it doesn't make too much sense to me right now to have it. If someone updates the Jacobian, then surely they don't want you to exit/reject the step right before using it to calculate a new Jacobian, so it should not act like the normal use of the newW flag. The only thing it could be is for resetting isfresh for the A in the linear solve: so the user directly mutated W and wants it to know that it might need to refactorize before solving even though the algorithm didn't necessarily ask for a new W. But that can't be the case because this is in preconditioners and thus not for factorization algorithms, unless it's for mixed iterative+factorization algorithms. I'll need to ruminate on this a bit more. But it's not hard, the better way to do this would be to implement it in our form as Pl,Pr,Wmodified = alg.precs(W,du,u,p,t,newW,solverdata), so the user function needs to additionally return a boolean. It would be good then to not break the interface and have it now if it's necessary.

ChrisRackauckas added a commit that referenced this issue Jan 2, 2022
Basically solves #1551 sans any extra performance issues. The interface is simply that the user gives:

```julia
Pr, Pl = precs(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
```

in the associated algorithm definitions. This gives a slight generalization over the Sundials interface so it should be enough, sans the `jcur` thing that is still an ongoing question. The setup phase is just a different dispatch on this function, i.e.

```julia
Pr, Pl = precs(W,du,u,p,t,::Nothing,::Nothing,::Nothing,solverdata)
```

which is rather clean. `solverdata` is for backwards compatibility: it's going to not be documented for a bit but allow for slapping things like `gamma` or `dt` in there, and adding to the struct won't be breaking like changing the call signature would be.

This PR is currently untested and only implements it for one algorithm, and so getting this to merge will essentially just require adding proper tests of using preconditioners which require updated Jacobians, like incomplete LU-factorizations, showing that it improves the convergence of GMRES. And setting up the rest of the algorithms to have this preconditioner interface as well, which is the simple but tedious part.
@ChrisRackauckas
Copy link
Member Author

The purpose of jcur:

We need to know how "current" the preconditioner in case the Newton iteration fails to converge. Specifically, if the Newton iteration diverges with a "stale" preconditioner, then we will first call psetup with jok = SUNFALSE so that you can recompute things before we try again. However, if the Newton iteration fails to converge with a "current" preconditioner, then we have no recourse but to reduce the time step size and retry the time step.

This would have to hook into the Jacobian reuse stuff.

ChrisRackauckas added a commit that referenced this issue Jan 7, 2022
Basically solves #1551 sans any extra performance issues. The interface is simply that the user gives:

```julia
Pr, Pl = precs(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
```

in the associated algorithm definitions. This gives a slight generalization over the Sundials interface so it should be enough, sans the `jcur` thing that is still an ongoing question. The setup phase is just a different dispatch on this function, i.e.

```julia
Pr, Pl = precs(W,du,u,p,t,::Nothing,::Nothing,::Nothing,solverdata)
```

which is rather clean. `solverdata` is for backwards compatibility: it's going to not be documented for a bit but allow for slapping things like `gamma` or `dt` in there, and adding to the struct won't be breaking like changing the call signature would be.

This PR is currently untested and only implements it for one algorithm, and so getting this to merge will essentially just require adding proper tests of using preconditioners which require updated Jacobians, like incomplete LU-factorizations, showing that it improves the convergence of GMRES. And setting up the rest of the algorithms to have this preconditioner interface as well, which is the simple but tedious part.
@ChrisRackauckas
Copy link
Member Author

After #1556

using OrdinaryDiffEq, LinearSolve, Test

const N = 32
const xyd_brusselator = range(0,stop=1,length=N)
brusselator_f(x, y, t) = (((x-0.3)^2 + (y-0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.
limit(a, N) = a == N+1 ? 1 : a == 0 ? N : a
function brusselator_2d_loop(du, u, p, t)
  A, B, alpha, dx = p
  alpha = alpha/dx^2
  @inbounds for I in CartesianIndices((N, N))
    i, j = Tuple(I)
    x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
    ip1, im1, jp1, jm1 = limit(i+1, N), limit(i-1, N), limit(j+1, N), limit(j-1, N)
    du[i,j,1] = alpha*(u[im1,j,1] + u[ip1,j,1] + u[i,jp1,1] + u[i,jm1,1] - 4u[i,j,1]) +
                B + u[i,j,1]^2*u[i,j,2] - (A + 1)*u[i,j,1] + brusselator_f(x, y, t)
    du[i,j,2] = alpha*(u[im1,j,2] + u[ip1,j,2] + u[i,jp1,2] + u[i,jm1,2] - 4u[i,j,2]) +
                A*u[i,j,1] - u[i,j,1]^2*u[i,j,2]
    end
end
p = (3.4, 1., 10., step(xyd_brusselator))

function init_brusselator_2d(xyd)
  N = length(xyd)
  u = zeros(N, N, 2)
  for I in CartesianIndices((N, N))
    x = xyd[I[1]]
    y = xyd[I[2]]
    u[I,1] = 22*(y*(1-y))^(3/2)
    u[I,2] = 27*(x*(1-x))^(3/2)
  end
  u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob_ode_brusselator_2d = ODEProblem(brusselator_2d_loop,
                                     u0,(0.,5.0),p)

@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES()),save_everystep=false)
#  49.017548 seconds (798.97 M allocations: 21.856 GiB, 6.28% gc time, 12.82% compilation time)

using Symbolics
du0 = copy(u0)
jac = Symbolics.jacobian_sparsity((du,u)->brusselator_2d_loop(du,u,p,0.0),du0,u0)

prob_ode_brusselator_2d = ODEProblem(ODEFunction(brusselator_2d_loop,jac_prototype = float.(jac)),
                                     u0,(0.,11.5),p)

@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),concrete_jac=false),save_everystep=false);
# 100.195429 seconds (1.75 G allocations: 45.848 GiB, 4.98% gc time)

@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 5.442302 seconds (52.74 k allocations: 160.680 MiB, 0.09% gc time)

@time solve(prob_ode_brusselator_2d,TRBDF2(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 5.442302 seconds (52.74 k allocations: 160.680 MiB, 0.09% gc time)

@time solve(prob_ode_brusselator_2d,Rodas4P(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 7.422705 seconds (59.82 k allocations: 231.397 MiB, 0.29% gc time)

@time solve(prob_ode_brusselator_2d,Rodas4(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 7.422705 seconds (59.82 k allocations: 231.397 MiB, 0.29% gc time)

@time solve(prob_ode_brusselator_2d,TRBDF2(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 7.422705 seconds (59.82 k allocations: 231.397 MiB, 0.29% gc time)

using IncompleteLU
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 5.0)
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.541864 seconds (50.86 k allocations: 384.094 MiB, 3.17% gc time)

function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 200.0)
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d,Rodas4P(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.617162 seconds (51.02 k allocations: 387.520 MiB, 2.59% gc time)

@time solve(prob_ode_brusselator_2d,Rodas4(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.625991 seconds (51.17 k allocations: 390.754 MiB, 2.56% gc time)

@time solve(prob_ode_brusselator_2d,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.556428 seconds (47.99 k allocations: 320.761 MiB, 2.85% gc time)

using AlgebraicMultigrid
function algebraicmultigrid(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = aspreconditioner(ruge_stuben(W))
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.952956 seconds (111.62 k allocations: 1.000 GiB, 4.67% gc time)

@time solve(prob_ode_brusselator_2d,Rodas4(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 1.122485 seconds (98.24 k allocations: 908.294 MiB, 3.52% gc time)

@time solve(prob_ode_brusselator_2d,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 1.022904 seconds (81.73 k allocations: 694.942 MiB, 2.69% gc time)

@time solve(prob_ode_brusselator_2d,Rosenbrock23(),save_everystep=false);
# 1.127027 seconds (48.75 k allocations: 762.010 MiB, 0.85% gc time)

@time solve(prob_ode_brusselator_2d,Rodas4(),save_everystep=false);
# 1.026517 seconds (47.41 k allocations: 637.861 MiB, 0.78% gc time)

@time solve(prob_ode_brusselator_2d,Rodas5(),save_everystep=false);
# 0.778769 seconds (43.59 k allocations: 426.856 MiB, 0.65% gc time)

@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve = KLUFactorization()),save_everystep=false);
# 0.904355 seconds (61.25 k allocations: 528.112 MiB, 0.69% gc time)

@time solve(prob_ode_brusselator_2d,Rodas4(linsolve = KLUFactorization()),save_everystep=false);
# 0.745028 seconds (63.78 k allocations: 416.990 MiB, 0.89% gc time)

@time solve(prob_ode_brusselator_2d,Rodas5(linsolve = KLUFactorization()),save_everystep=false);
# 0.493103 seconds (56.56 k allocations: 269.619 MiB, 0.98% gc time)

@time solve(prob_ode_brusselator_2d,TRBDF2(linsolve = KLUFactorization()),save_everystep=false);
# 0.426342 seconds (57.53 k allocations: 219.330 MiB, 2.16% gc time)

@time solve(prob_ode_brusselator_2d,KenCarp4(linsolve = KLUFactorization()),save_everystep=false);
# 0.467652 seconds (67.39 k allocations: 227.136 MiB, 0.85% gc time)

@time solve(prob_ode_brusselator_2d,FBDF(linsolve = KLUFactorization()),save_everystep=false);
# 0.718509 seconds (76.77 k allocations: 403.669 MiB, 0.71% gc time)

@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve = UMFPACKFactorization()),save_everystep=false);
# 1.110053 seconds (48.79 k allocations: 762.014 MiB, 0.96% gc time)

@time solve(prob_ode_brusselator_2d,Rodas5(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.749630 seconds (43.60 k allocations: 426.858 MiB, 0.70% gc time)

@time solve(prob_ode_brusselator_2d,FBDF(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.941824 seconds (63.09 k allocations: 604.900 MiB, 0.98% gc time)

#=
import Pardiso
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve = MKLPardisoFactorize()),save_everystep=false);
#

@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve = MKLPardisoIterate()),save_everystep=false);
#
=#

using Sundials
@time solve(prob_ode_brusselator_2d,CVODE_BDF(linear_solver=:GMRES),save_everystep=false);
# 0.225442 seconds (58.38 k allocations: 2.846 MiB)

@ChrisRackauckas
Copy link
Member Author

ChrisRackauckas commented Jan 8, 2022

There's one last PR to really finish this up. Instead of having a JacVec form of Jacobian-free W, my plan is for that to be the WOperator standard when used in mul! or * form. The reason is because otherwise concrete_jac=true for the preconditioning, but by doing so W does not update as often. So the holy grail, W updating at each step but also having approximated concrete W's, requires mul! always uses the JacVec. And that makes sense anyways since JacVec will always be cheaper than the full matrix computation by virtue of being always a direct matrix-vector computation. That, plus making sure update_coefficients! is called and that it updates u appropriately, should fix the timings for the implicit methods. This does not effect the Rosenbrock methods since they are supposed to only use uprev anyways.

We still need the concrete_jac machinery since we do not always want to make the matrix. The matrix is then only ever generated if convert(AbstractMatrix,W) is called, but for the types to be correct it would need to build the concrete_form once, so we still need an easy way to keep the one that concretizes away from the one that never does.

And we should add the jcur bit before releasing and documenting.

@ChrisRackauckas
Copy link
Member Author

With #1561, we are almost done.

using OrdinaryDiffEq, LinearSolve, Test

const N = 32
const xyd_brusselator = range(0,stop=1,length=N)
brusselator_f(x, y, t) = (((x-0.3)^2 + (y-0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.
limit(a, N) = a == N+1 ? 1 : a == 0 ? N : a
function brusselator_2d_loop(du, u, p, t)
  A, B, alpha, dx = p
  alpha = alpha/dx^2
  @inbounds for I in CartesianIndices((N, N))
    i, j = Tuple(I)
    x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
    ip1, im1, jp1, jm1 = limit(i+1, N), limit(i-1, N), limit(j+1, N), limit(j-1, N)
    du[i,j,1] = alpha*(u[im1,j,1] + u[ip1,j,1] + u[i,jp1,1] + u[i,jm1,1] - 4u[i,j,1]) +
                B + u[i,j,1]^2*u[i,j,2] - (A + 1)*u[i,j,1] + brusselator_f(x, y, t)
    du[i,j,2] = alpha*(u[im1,j,2] + u[ip1,j,2] + u[i,jp1,2] + u[i,jm1,2] - 4u[i,j,2]) +
                A*u[i,j,1] - u[i,j,1]^2*u[i,j,2]
    end
end
p = (3.4, 1., 10., step(xyd_brusselator))

function init_brusselator_2d(xyd)
  N = length(xyd)
  u = zeros(N, N, 2)
  for I in CartesianIndices((N, N))
    x = xyd[I[1]]
    y = xyd[I[2]]
    u[I,1] = 22*(y*(1-y))^(3/2)
    u[I,2] = 27*(x*(1-x))^(3/2)
  end
  u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob_ode_brusselator_2d = ODEProblem(brusselator_2d_loop,
                                     u0,(0.,11.5),p)

using Symbolics
du0 = copy(u0)
jac = Symbolics.jacobian_sparsity((du,u)->brusselator_2d_loop(du,u,p,0.0),du0,u0)

prob_ode_brusselator_2d_sparse = ODEProblem(ODEFunction(brusselator_2d_loop,jac_prototype = float.(jac)),
                                            u0,(0.,11.5),p)

using ModelingToolkit
prob_ode_brusselator_2d_mtk = ODEProblem(modelingtoolkitize(prob_ode_brusselator_2d_sparse),[],(0.0,11.5),jac=true,sparse=true)

# Sparse and AD
@time sol = solve(prob_ode_brusselator_2d,TRBDF2(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 9.350326 seconds (2.06 M allocations: 190.429 MiB, 0.20% gc time)

@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 9.242326 seconds (2.09 M allocations: 195.962 MiB, 0.35% gc time)

@time solve(prob_ode_brusselator_2d,TRBDF2(linsolve=IterativeSolversJL_GMRES(),autodiff=false),save_everystep=false);
# 19.075056 seconds (4.34 M allocations: 277.909 MiB, 0.17% gc time)

@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),autodiff=false),save_everystep=false);
# 18.641695 seconds (4.37 M allocations: 282.333 MiB, 0.17% gc time)

# Different methods, no preconditioner

@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 6.231675 seconds (1.37 M allocations: 206.219 MiB, 0.28% gc time)

@time solve(prob_ode_brusselator_2d,Rodas4(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 8.095367 seconds (1.82 M allocations: 296.098 MiB, 0.38% gc time)

@time solve(prob_ode_brusselator_2d,Rodas4P(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 8.714918 seconds (1.96 M allocations: 299.474 MiB, 0.35% gc time)

@time solve(prob_ode_brusselator_2d,Rodas4P2(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 9.772534 seconds (2.20 M allocations: 319.374 MiB, 0.30% gc time)

@time solve(prob_ode_brusselator_2d,Rodas5(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 7.827665 seconds (1.76 M allocations: 276.399 MiB, 0.32% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),concrete_jac=false),save_everystep=false);
# 6.406100 seconds (1.40 M allocations: 211.752 MiB, 3.53% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 6.173799 seconds (1.40 M allocations: 212.254 MiB, 0.20% gc time)

@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 9.123308 seconds (2.09 M allocations: 196.480 MiB, 0.18% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas4P(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 8.711351 seconds (1.99 M allocations: 305.508 MiB, 0.30% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 8.151490 seconds (1.86 M allocations: 302.132 MiB, 0.30% gc time)

@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 9.089930 seconds (2.09 M allocations: 196.464 MiB, 0.09% gc time)

using IncompleteLU
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 5.0)
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.578485 seconds (93.01 k allocations: 385.764 MiB, 2.90% gc time)

function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 0.2)
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 11.473043 seconds (2.09 M allocations: 197.050 MiB, 0.11% gc time)

function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 200.0)
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rodas4P(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.627300 seconds (107.22 k allocations: 389.992 MiB, 1.89% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.639779 seconds (108.02 k allocations: 393.251 MiB, 1.44% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.564875 seconds (100.44 k allocations: 323.098 MiB, 1.79% gc time)

using AlgebraicMultigrid
function algebraicmultigrid(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = aspreconditioner(ruge_stuben(convert(AbstractMatrix,W)))
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.907391 seconds (126.34 k allocations: 1.001 GiB, 3.95% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 1.122485 seconds (98.24 k allocations: 908.294 MiB, 3.52% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 1.138373 seconds (119.22 k allocations: 909.016 MiB, 3.31% gc time)

@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 2.139822 seconds (171.27 k allocations: 428.200 MiB, 1.35% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(),save_everystep=false);
# 1.140436 seconds (48.75 k allocations: 762.010 MiB, 1.19% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas4(),save_everystep=false);
# 1.055678 seconds (47.45 k allocations: 637.863 MiB, 0.89% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas5(),save_everystep=false);
# 0.781087 seconds (43.60 k allocations: 426.857 MiB, 1.64% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = KLUFactorization()),save_everystep=false);
# 0.948284 seconds (62.59 k allocations: 528.187 MiB, 1.38% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve = KLUFactorization()),save_everystep=false);
# 0.768939 seconds (64.30 k allocations: 417.019 MiB, 2.38% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve = KLUFactorization()),save_everystep=false);
# 0.513310 seconds (57.34 k allocations: 269.663 MiB, 2.46% gc time)

@time solve(prob_ode_brusselator_2d_mtk,Rodas5(linsolve = KLUFactorization()),save_everystep=false);
# 0.498070 seconds (53.55 k allocations: 269.487 MiB)

@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve = KLUFactorization()),save_everystep=false);
# 0.421878 seconds (56.46 k allocations: 219.270 MiB, 1.86% gc time)

@time solve(prob_ode_brusselator_2d_mtk,TRBDF2(linsolve = KLUFactorization()),save_everystep=false);
# 0.405702 seconds (52.99 k allocations: 219.127 MiB)

@time solve(prob_ode_brusselator_2d_sparse,KenCarp4(linsolve = KLUFactorization()),save_everystep=false);
# 0.476274 seconds (67.56 k allocations: 227.145 MiB, 1.61% gc time)

@time solve(prob_ode_brusselator_2d_mtk,KenCarp4(linsolve = KLUFactorization()),save_everystep=false);
# 0.479085 seconds (63.95 k allocations: 227.024 MiB, 2.32% gc time)

@time solve(prob_ode_brusselator_2d_sparse,FBDF(linsolve = KLUFactorization()),save_everystep=false);
# 0.725433 seconds (77.00 k allocations: 403.682 MiB, 1.99% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = UMFPACKFactorization()),save_everystep=false);
# 1.133577 seconds (48.77 k allocations: 762.013 MiB, 0.88% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.774167 seconds (43.60 k allocations: 426.858 MiB, 1.22% gc time)

@time solve(prob_ode_brusselator_2d_mtk,Rodas5(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.779785 seconds (41.72 k allocations: 426.789 MiB, 2.16% gc time)

@time solve(prob_ode_brusselator_2d_sparse,FBDF(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.978541 seconds (63.08 k allocations: 604.900 MiB, 0.97% gc time)

#=
import Pardiso
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = MKLPardisoFactorize()),save_everystep=false);
#

@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = MKLPardisoIterate()),save_everystep=false);
#
=#

using Sundials
@time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES),save_everystep=false);
# 0.212607 seconds (58.38 k allocations: 2.846 MiB)

@time solve(prob_ode_brusselator_2d_mtk,CVODE_BDF(linear_solver=:KLU),save_everystep=false);
# 0.495496 seconds (1.36 k allocations: 5.795 MiB)

@ChrisRackauckas
Copy link
Member Author

ChrisRackauckas commented Jan 9, 2022

using OrdinaryDiffEq, LinearSolve, Test

const N = 32
const xyd_brusselator = range(0,stop=1,length=N)
brusselator_f(x, y, t) = (((x-0.3)^2 + (y-0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.
limit(a, N) = a == N+1 ? 1 : a == 0 ? N : a

function brusselator_2d_loop(du, u, p, t)
  A, B, alpha, dx = p
  alpha = alpha/dx^2
  @inbounds for I in CartesianIndices((N, N))
    i, j = Tuple(I)
    x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
    ip1, im1, jp1, jm1 = limit(i+1, N), limit(i-1, N), limit(j+1, N), limit(j-1, N)
    du[i,j,1] = alpha*(u[im1,j,1] + u[ip1,j,1] + u[i,jp1,1] + u[i,jm1,1] - 4u[i,j,1]) +
                B + u[i,j,1]^2*u[i,j,2] - (A + 1)*u[i,j,1] + brusselator_f(x, y, t)
    du[i,j,2] = alpha*(u[im1,j,2] + u[ip1,j,2] + u[i,jp1,2] + u[i,jm1,2] - 4u[i,j,2]) +
                A*u[i,j,1] - u[i,j,1]^2*u[i,j,2]
    end
end
p = (3.4, 1., 10., step(xyd_brusselator))

function init_brusselator_2d(xyd)
  N = length(xyd)
  u = zeros(N, N, 2)
  for I in CartesianIndices((N, N))
    x = xyd[I[1]]
    y = xyd[I[2]]
    u[I,1] = 22*(y*(1-y))^(3/2)
    u[I,2] = 27*(x*(1-x))^(3/2)
  end
  u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob_ode_brusselator_2d = ODEProblem(brusselator_2d_loop,
                                     u0,(0.,11.5),p)

using Symbolics
du0 = copy(u0)
jac = Symbolics.jacobian_sparsity((du,u)->brusselator_2d_loop(du,u,p,0.0),du0,u0)

prob_ode_brusselator_2d_sparse = ODEProblem(ODEFunction(brusselator_2d_loop,jac_prototype = float.(jac)),
                                            u0,(0.,11.5),p)

using ModelingToolkit
prob_ode_brusselator_2d_mtk = ODEProblem(modelingtoolkitize(prob_ode_brusselator_2d_sparse),[],(0.0,11.5),jac=true,sparse=true);

# Sparse and AD
@time sol = solve(prob_ode_brusselator_2d,TRBDF2(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 9.350326 seconds (2.06 M allocations: 190.429 MiB, 0.20% gc time)

@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 9.242326 seconds (2.09 M allocations: 195.962 MiB, 0.35% gc time)

@time solve(prob_ode_brusselator_2d,TRBDF2(linsolve=IterativeSolversJL_GMRES(),autodiff=false),save_everystep=false);
# 19.075056 seconds (4.34 M allocations: 277.909 MiB, 0.17% gc time)

@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),autodiff=false),save_everystep=false);
# 18.641695 seconds (4.37 M allocations: 282.333 MiB, 0.17% gc time)

# Different methods, no preconditioner

@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 0.891711 seconds (230.15 k allocations: 184.457 MiB, 5.10% gc time)

@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 6.231675 seconds (1.37 M allocations: 206.219 MiB, 0.28% gc time)

@time solve(prob_ode_brusselator_2d,Rodas4(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 8.095367 seconds (1.82 M allocations: 296.098 MiB, 0.38% gc time)

@time solve(prob_ode_brusselator_2d,Rodas4(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 1.215097 seconds (317.01 k allocations: 152.230 MiB, 3.26% gc time)

@time solve(prob_ode_brusselator_2d,Rodas4P(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 8.714918 seconds (1.96 M allocations: 299.474 MiB, 0.35% gc time)

@time solve(prob_ode_brusselator_2d,Rodas4P2(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 9.772534 seconds (2.20 M allocations: 319.374 MiB, 0.30% gc time)

@time solve(prob_ode_brusselator_2d,Rodas5(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 7.827665 seconds (1.76 M allocations: 276.399 MiB, 0.32% gc time)

@time solve(prob_ode_brusselator_2d,Rodas5(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 1.145810 seconds (301.72 k allocations: 116.390 MiB, 4.10% gc time)

@time solve(prob_ode_brusselator_2d,TRBDF2(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 0.993533 seconds (242.07 k allocations: 103.769 MiB, 2.48% gc time)

@time solve(prob_ode_brusselator_2d,KenCarp47(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 0.715011 seconds (172.53 k allocations: 30.978 MiB)

@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),concrete_jac=false),save_everystep=false);
# 6.406100 seconds (1.40 M allocations: 211.752 MiB, 3.53% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 6.173799 seconds (1.40 M allocations: 212.254 MiB, 0.20% gc time)

@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 9.123308 seconds (2.09 M allocations: 196.480 MiB, 0.18% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas4P(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 8.711351 seconds (1.99 M allocations: 305.508 MiB, 0.30% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 8.151490 seconds (1.86 M allocations: 302.132 MiB, 0.30% gc time)

@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 9.089930 seconds (2.09 M allocations: 196.464 MiB, 0.09% gc time)

using IncompleteLU
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 5.0)
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.578485 seconds (93.01 k allocations: 385.764 MiB, 2.90% gc time)

function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 10.0)
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.458363 seconds (118.22 k allocations: 236.974 MiB, 3.12% gc time)

function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 0.2)
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 11.473043 seconds (2.09 M allocations: 197.050 MiB, 0.11% gc time)

function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 200.0)
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rodas4P(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.627300 seconds (107.22 k allocations: 389.992 MiB, 1.89% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.639779 seconds (108.02 k allocations: 393.251 MiB, 1.44% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.564875 seconds (100.44 k allocations: 323.098 MiB, 1.79% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.519443 seconds (127.76 k allocations: 171.152 MiB)

function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 50.0)
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.237032 seconds (59.60 k allocations: 156.461 MiB)

@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.377022 seconds (66.31 k allocations: 292.416 MiB, 2.79% gc time)

@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.178704 seconds (61.76 k allocations: 61.385 MiB)

function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 200.0)
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,FBDF(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.309274 seconds (108.03 k allocations: 188.703 MiB)

using AlgebraicMultigrid
function algebraicmultigrid(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = aspreconditioner(ruge_stuben(convert(AbstractMatrix,W)))
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.907391 seconds (126.34 k allocations: 1.001 GiB, 3.95% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=KrylovJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.837808 seconds (131.66 k allocations: 923.665 MiB, 5.45% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 1.122485 seconds (98.24 k allocations: 908.294 MiB, 3.52% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 1.138373 seconds (119.22 k allocations: 909.016 MiB, 3.31% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=KrylovJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.942355 seconds (108.41 k allocations: 563.479 MiB, 2.71% gc time)

@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 2.139822 seconds (171.27 k allocations: 428.200 MiB, 1.35% gc time)

@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.466828 seconds (61.93 k allocations: 255.953 MiB, 2.74% gc time)

@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.378352 seconds (61.18 k allocations: 160.815 MiB, 4.92% gc time)

function algebraicmultigrid2(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = aspreconditioner(smoothed_aggregation(convert(AbstractMatrix,W)))
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid2,concrete_jac=true),save_everystep=false);
# 1.527112 seconds (144.66 k allocations: 666.361 MiB, 2.75% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid2,concrete_jac=true),save_everystep=false);
# 1.971150 seconds (155.53 k allocations: 629.689 MiB, 2.68% gc time)

@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid2,concrete_jac=true),save_everystep=false);
# 1.018097 seconds (96.01 k allocations: 281.463 MiB, 1.34% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(),save_everystep=false);
# 1.140436 seconds (48.75 k allocations: 762.010 MiB, 1.19% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas4(),save_everystep=false);
# 1.055678 seconds (47.45 k allocations: 637.863 MiB, 0.89% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas5(),save_everystep=false);
# 0.781087 seconds (43.60 k allocations: 426.857 MiB, 1.64% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = KLUFactorization()),save_everystep=false);
# 0.948284 seconds (62.59 k allocations: 528.187 MiB, 1.38% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve = KLUFactorization()),save_everystep=false);
# 0.768939 seconds (64.30 k allocations: 417.019 MiB, 2.38% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve = KLUFactorization()),save_everystep=false);
# 0.513310 seconds (57.34 k allocations: 269.663 MiB, 2.46% gc time)

@time solve(prob_ode_brusselator_2d_mtk,Rodas5(linsolve = KLUFactorization()),save_everystep=false);
# 0.498070 seconds (53.55 k allocations: 269.487 MiB)

@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve = KLUFactorization()),save_everystep=false);
# 0.421878 seconds (56.46 k allocations: 219.270 MiB, 1.86% gc time)

@time solve(prob_ode_brusselator_2d_mtk,TRBDF2(linsolve = KLUFactorization()),save_everystep=false);
# 0.405702 seconds (52.99 k allocations: 219.127 MiB)

@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve = KLUFactorization()),save_everystep=false);
# 0.333920 seconds (64.67 k allocations: 158.963 MiB)

@time solve(prob_ode_brusselator_2d_mtk,KenCarp4(linsolve = KLUFactorization()),save_everystep=false);
# 0.479085 seconds (63.95 k allocations: 227.024 MiB, 2.32% gc time)

@time solve(prob_ode_brusselator_2d_mtk,KenCarp47(linsolve = KLUFactorization()),save_everystep=false);
# 0.352764 seconds (62.23 k allocations: 158.895 MiB, 3.02% gc time)

@time solve(prob_ode_brusselator_2d_mtk,RadauIIA5(linsolve = KLUFactorization()),save_everystep=false);
# 7.843305 seconds (58.72 k allocations: 572.325 MiB, 0.21% gc time)

@time solve(prob_ode_brusselator_2d_sparse,FBDF(linsolve = KLUFactorization()),save_everystep=false);
# 0.725433 seconds (77.00 k allocations: 403.682 MiB, 1.99% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = UMFPACKFactorization()),save_everystep=false);
# 1.133577 seconds (48.77 k allocations: 762.013 MiB, 0.88% gc time)

@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.774167 seconds (43.60 k allocations: 426.858 MiB, 1.22% gc time)

@time solve(prob_ode_brusselator_2d_mtk,Rodas5(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.779785 seconds (41.72 k allocations: 426.789 MiB, 2.16% gc time)

@time solve(prob_ode_brusselator_2d_sparse,FBDF(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.978541 seconds (63.08 k allocations: 604.900 MiB, 0.97% gc time)

@time solve(prob_ode_brusselator_2d_sparse,RadauIIA5(linsolve = UMFPACKFactorization()),save_everystep=false);
# 8.086175 seconds (47.83 k allocations: 912.373 MiB, 0.31% gc time)

#=
import Pardiso
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = MKLPardisoFactorize()),save_everystep=false);
#

@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = MKLPardisoIterate()),save_everystep=false);
#
=#

using Sundials
@time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES),save_everystep=false);
# 0.212607 seconds (58.38 k allocations: 2.846 MiB)

@time solve(prob_ode_brusselator_2d_mtk,CVODE_BDF(linear_solver=:KLU),save_everystep=false);
# 0.495496 seconds (1.36 k allocations: 5.795 MiB)

@ChrisRackauckas
Copy link
Member Author

ChrisRackauckas commented Jan 9, 2022

Summary: fastest algorithm is:

@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.178704 seconds (61.76 k allocations: 61.385 MiB)

then

@time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES),save_everystep=false);
# 0.212607 seconds (58.38 k allocations: 2.846 MiB)

Finally, a winner.

IterativeSolvers.jl is terrible for performance.

@ChrisRackauckas
Copy link
Member Author

It looks like Krylov.jl is doing some performance nono's. Testing script:

using OrdinaryDiffEq, LinearSolve, Test

const N = 32
const xyd_brusselator = range(0,stop=1,length=N)
brusselator_f(x, y, t) = (((x-0.3)^2 + (y-0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.
limit(a, N) = a == N+1 ? 1 : a == 0 ? N : a

function brusselator_2d_loop(du, u, p, t)
  A, B, alpha, dx = p
  alpha = alpha/dx^2
  @inbounds for I in CartesianIndices((N, N))
    i, j = Tuple(I)
    x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
    ip1, im1, jp1, jm1 = limit(i+1, N), limit(i-1, N), limit(j+1, N), limit(j-1, N)
    du[i,j,1] = alpha*(u[im1,j,1] + u[ip1,j,1] + u[i,jp1,1] + u[i,jm1,1] - 4u[i,j,1]) +
                B + u[i,j,1]^2*u[i,j,2] - (A + 1)*u[i,j,1] + brusselator_f(x, y, t)
    du[i,j,2] = alpha*(u[im1,j,2] + u[ip1,j,2] + u[i,jp1,2] + u[i,jm1,2] - 4u[i,j,2]) +
                A*u[i,j,1] - u[i,j,1]^2*u[i,j,2]
    end
end
p = (3.4, 1., 10., step(xyd_brusselator))

function init_brusselator_2d(xyd)
  N = length(xyd)
  u = zeros(N, N, 2)
  for I in CartesianIndices((N, N))
    x = xyd[I[1]]
    y = xyd[I[2]]
    u[I,1] = 22*(y*(1-y))^(3/2)
    u[I,2] = 27*(x*(1-x))^(3/2)
  end
  u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob_ode_brusselator_2d = ODEProblem(brusselator_2d_loop,
                                     u0,(0.,11.5),p)

using Symbolics
du0 = copy(u0)
jac = Symbolics.jacobian_sparsity((du,u)->brusselator_2d_loop(du,u,p,0.0),du0,u0)

prob_ode_brusselator_2d_sparse = ODEProblem(ODEFunction(brusselator_2d_loop,jac_prototype = float.(jac)),
                                            u0,(0.,11.5),p)

@time solve(prob_ode_brusselator_2d,KenCarp47(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 0.715011 seconds (172.53 k allocations: 30.978 MiB)

using IncompleteLU
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 50.0)
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.178704 seconds (61.76 k allocations: 61.385 MiB)

using AlgebraicMultigrid
function algebraicmultigrid(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = aspreconditioner(ruge_stuben(convert(AbstractMatrix,W)))
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.378352 seconds (61.18 k allocations: 160.815 MiB, 4.92% gc time)

using Sundials
@time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES),save_everystep=false);
# 0.212607 seconds (58.38 k allocations: 2.846 MiB)

It points all to:

    mul!(w, A, p)                # w ← AN⁻¹vₖ
    MisI || mul!(q, M, w)        # q ← M⁻¹AN⁻¹vₖ
    for i = 1 : iter
      R[nr+i] = @kdot(n, V[i], q)    # hᵢₖ = qᵀvᵢ
      @kaxpy!(n, -R[nr+i], V[i], q)  # q ← q - hᵢₖvᵢ
    end

Forced BLAS is probably bad for OpenBLAS defaults with dot. And IncompleteLU is all stuck on:

"""
Applies in-place backward substitution with the U factor of F, under the assumptions:

1. U is stored transposed / row-wise
2. U has no lower-triangular elements stored
3. U has (nonzero) diagonal elements stored.
"""
function backward_substitution!(F::ILUFactorization, y::AbstractVector)
    U = F.U
    @inbounds for col = U.n : -1 : 1

        # Substitutions
        for idx = U.colptr[col + 1] - 1 : -1 : U.colptr[col] + 1
            y[col] -= U.nzval[idx] * y[U.rowval[idx]]
        end

        # Final answer for y[col]
        y[col] /= U.nzval[U.colptr[col]]
    end

    y
end

function backward_substitution!(v::AbstractVector, F::ILUFactorization, y::AbstractVector)
    v .= y
    backward_substitution!(F, v)
end

"""
Applies in-place forward substitution with the L factor of F, under the assumptions:

1. L is stored column-wise (unlike U)
2. L has no upper triangular elements
3. L has *no* diagonal elements
"""
function forward_substitution!(F::ILUFactorization, y::AbstractVector)
    L = F.L
    @inbounds for col = 1 : L.n - 1
        for idx = L.colptr[col] : L.colptr[col + 1] - 1
            y[L.rowval[idx]] -= L.nzval[idx] * y[col]
        end
    end

    y
end

function forward_substitution!(v::AbstractVector, F::ILUFactorization, y::AbstractVector)
    v .= y
    forward_substitution!(F, v)
end

which probably could use LoopVectorization magic. Looks like it's in @chriselrod territory now.

@ChrisRackauckas
Copy link
Member Author

Krylov.jl needs JuliaSmoothOptimizers/Krylov.jl#477 for ILU to work.

@ChrisRackauckas
Copy link
Member Author

ChrisRackauckas commented Jan 9, 2022

Versions with preconditioning in Sundials:

using OrdinaryDiffEq, LinearSolve, Test

const N = 32
const xyd_brusselator = range(0,stop=1,length=N)
brusselator_f(x, y, t) = (((x-0.3)^2 + (y-0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.
limit(a, N) = a == N+1 ? 1 : a == 0 ? N : a

function brusselator_2d_loop(du, u, p, t)
  A, B, alpha, dx = p
  alpha = alpha/dx^2
  @inbounds for I in CartesianIndices((N, N))
    i, j = Tuple(I)
    x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
    ip1, im1, jp1, jm1 = limit(i+1, N), limit(i-1, N), limit(j+1, N), limit(j-1, N)
    du[i,j,1] = alpha*(u[im1,j,1] + u[ip1,j,1] + u[i,jp1,1] + u[i,jm1,1] - 4u[i,j,1]) +
                B + u[i,j,1]^2*u[i,j,2] - (A + 1)*u[i,j,1] + brusselator_f(x, y, t)
    du[i,j,2] = alpha*(u[im1,j,2] + u[ip1,j,2] + u[i,jp1,2] + u[i,jm1,2] - 4u[i,j,2]) +
                A*u[i,j,1] - u[i,j,1]^2*u[i,j,2]
    end
end
p = (3.4, 1., 10., step(xyd_brusselator))

function init_brusselator_2d(xyd)
  N = length(xyd)
  u = zeros(N, N, 2)
  for I in CartesianIndices((N, N))
    x = xyd[I[1]]
    y = xyd[I[2]]
    u[I,1] = 22*(y*(1-y))^(3/2)
    u[I,2] = 27*(x*(1-x))^(3/2)
  end
  u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob_ode_brusselator_2d = ODEProblem(brusselator_2d_loop,
                                     u0,(0.,11.5),p)

using Symbolics
du0 = copy(u0)
jac = Symbolics.jacobian_sparsity((du,u)->brusselator_2d_loop(du,u,p,0.0),du0,u0)

prob_ode_brusselator_2d_sparse = ODEProblem(ODEFunction(brusselator_2d_loop,jac_prototype = float.(jac)),
                                            u0,(0.,11.5),p)

using ModelingToolkit
prob_ode_brusselator_2d_mtk = ODEProblem(modelingtoolkitize(prob_ode_brusselator_2d_sparse),[],(0.0,11.5),jac=true,sparse=true);

@time solve(prob_ode_brusselator_2d,KenCarp47(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 0.715011 seconds (172.53 k allocations: 30.978 MiB)

prob_ode_brusselator_2d_sparse = ODEProblem(ODEFunction(brusselator_2d_loop,jac_prototype = float.(jac)),
                                            u0,(0.,11.5),p)

using IncompleteLU
Base.eltype(::IncompleteLU.ILUFactorization{Tv,Ti}) where {Tv,Ti} = Tv

function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = ilu(convert(AbstractMatrix,W), τ = 50.0)
  else
    Pl = Plprev
  end
  Pl,nothing
end

@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.178704 seconds (61.76 k allocations: 61.385 MiB)

using AlgebraicMultigrid
function algebraicmultigrid(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    Pl = aspreconditioner(ruge_stuben(convert(AbstractMatrix,W)))
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.378352 seconds (61.18 k allocations: 160.815 MiB, 4.92% gc time)

function algebraicmultigrid2(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
  if newW === nothing || newW
    A = convert(AbstractMatrix,W)
    Pl = aspreconditioner(ruge_stuben(A, presmoother = AlgebraicMultigrid.Jacobi(rand(size(A,1))), postsmoother = AlgebraicMultigrid.Jacobi(rand(size(A,1)))))
  else
    Pl = Plprev
  end
  Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=algebraicmultigrid2,concrete_jac=true),save_everystep=false);
# 0.285271 seconds (65.71 k allocations: 170.192 MiB, 2.49% gc time)

using Sundials, LinearAlgebra
@time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES),save_everystep=false);
# 0.212607 seconds (58.38 k allocations: 2.846 MiB)

u0 = prob_ode_brusselator_2d_mtk.u0
p  = prob_ode_brusselator_2d_mtk.p
const jaccache = prob_ode_brusselator_2d_mtk.f.jac(u0,p,0.0)
const W = I - 1.0*jaccache

prectmp = ilu(W, τ = 50.0)
const preccache = Ref(prectmp)

function psetupilu(p, t, u, du, jok, jcurPtr, gamma)
  if jok
    prob_ode_brusselator_2d_mtk.f.jac(jaccache,u,p,t)
    jcurPtr[] = true

    # W = I - gamma*J
    @. W = -gamma*jaccache
    idxs = diagind(W)
    @. @view(W[idxs]) = @view(W[idxs]) + 1

    # Build preconditioner on W
    preccache[] = ilu(W, τ = 5.0)
  end
end

function precilu(z,r,p,t,y,fy,gamma,delta,lr)
  ldiv!(z,preccache[],r)
end

prectmp2 = aspreconditioner(ruge_stuben(W, presmoother = AlgebraicMultigrid.Jacobi(rand(size(W,1))), postsmoother = AlgebraicMultigrid.Jacobi(rand(size(W,1)))))
const preccache3 = Ref(prectmp2)
function psetupamg(p, t, u, du, jok, jcurPtr, gamma)
  if jok
    prob_ode_brusselator_2d_mtk.f.jac(jaccache,u,p,t)
    jcurPtr[] = true

    # W = I - gamma*J
    @. W = -gamma*jaccache
    idxs = diagind(W)
    @. @view(W[idxs]) = @view(W[idxs]) + 1

    # Build preconditioner on W
    preccache3[] = aspreconditioner(ruge_stuben(W, presmoother = AlgebraicMultigrid.Jacobi(rand(size(W,1))), postsmoother = AlgebraicMultigrid.Jacobi(rand(size(W,1)))))
  end
end

function precamg(z,r,p,t,y,fy,gamma,delta,lr)
  ldiv!(z,preccache3[],r)
end

@time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES,prec=precilu,psetup=psetupilu,prec_side=1),save_everystep=false);
# 0.095955 seconds (17.72 k allocations: 77.079 MiB)

@time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES,prec=precamg,psetup=psetupamg,prec_side=1),save_everystep=false);
# 0.146130 seconds (30.68 k allocations: 275.589 MiB, 7.51% gc time)

@ChrisRackauckas
Copy link
Member Author

Doing some upstream fixing haampie/IncompleteLU.jl#22 and JuliaSmoothOptimizers/Krylov.jl#477

@chriselrod
Copy link
Contributor

Could you give me any representative problems to play with for the forward and back substitution?

@ChrisRackauckas
Copy link
Member Author

See the script above. The 3 versions with preconditioners all seem to have a heavy amount of time in the ILU and AMG forward and back substitution

@ChrisRackauckas
Copy link
Member Author

Completed 🎉.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants