-
-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Finishing the linear solve and Krylov interfaces #1551
Comments
Basically solves #1551 sans any extra performance issues. The interface is simply that the user gives: ```julia Pr, Pl = precs(W,du,u,p,t,newW,Plprev,Prprev,solverdata) ``` in the associated algorithm definitions. This gives a slight generalization over the Sundials interface so it should be enough, sans the `jcur` thing that is still an ongoing question. The setup phase is just a different dispatch on this function, i.e. ```julia Pr, Pl = precs(W,du,u,p,t,::Nothing,::Nothing,::Nothing,solverdata) ``` which is rather clean. `solverdata` is for backwards compatibility: it's going to not be documented for a bit but allow for slapping things like `gamma` or `dt` in there, and adding to the struct won't be breaking like changing the call signature would be. This PR is currently untested and only implements it for one algorithm, and so getting this to merge will essentially just require adding proper tests of using preconditioners which require updated Jacobians, like incomplete LU-factorizations, showing that it improves the convergence of GMRES. And setting up the rest of the algorithms to have this preconditioner interface as well, which is the simple but tedious part.
The purpose of
This would have to hook into the Jacobian reuse stuff. |
Basically solves #1551 sans any extra performance issues. The interface is simply that the user gives: ```julia Pr, Pl = precs(W,du,u,p,t,newW,Plprev,Prprev,solverdata) ``` in the associated algorithm definitions. This gives a slight generalization over the Sundials interface so it should be enough, sans the `jcur` thing that is still an ongoing question. The setup phase is just a different dispatch on this function, i.e. ```julia Pr, Pl = precs(W,du,u,p,t,::Nothing,::Nothing,::Nothing,solverdata) ``` which is rather clean. `solverdata` is for backwards compatibility: it's going to not be documented for a bit but allow for slapping things like `gamma` or `dt` in there, and adding to the struct won't be breaking like changing the call signature would be. This PR is currently untested and only implements it for one algorithm, and so getting this to merge will essentially just require adding proper tests of using preconditioners which require updated Jacobians, like incomplete LU-factorizations, showing that it improves the convergence of GMRES. And setting up the rest of the algorithms to have this preconditioner interface as well, which is the simple but tedious part.
After #1556 using OrdinaryDiffEq, LinearSolve, Test
const N = 32
const xyd_brusselator = range(0,stop=1,length=N)
brusselator_f(x, y, t) = (((x-0.3)^2 + (y-0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.
limit(a, N) = a == N+1 ? 1 : a == 0 ? N : a
function brusselator_2d_loop(du, u, p, t)
A, B, alpha, dx = p
alpha = alpha/dx^2
@inbounds for I in CartesianIndices((N, N))
i, j = Tuple(I)
x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
ip1, im1, jp1, jm1 = limit(i+1, N), limit(i-1, N), limit(j+1, N), limit(j-1, N)
du[i,j,1] = alpha*(u[im1,j,1] + u[ip1,j,1] + u[i,jp1,1] + u[i,jm1,1] - 4u[i,j,1]) +
B + u[i,j,1]^2*u[i,j,2] - (A + 1)*u[i,j,1] + brusselator_f(x, y, t)
du[i,j,2] = alpha*(u[im1,j,2] + u[ip1,j,2] + u[i,jp1,2] + u[i,jm1,2] - 4u[i,j,2]) +
A*u[i,j,1] - u[i,j,1]^2*u[i,j,2]
end
end
p = (3.4, 1., 10., step(xyd_brusselator))
function init_brusselator_2d(xyd)
N = length(xyd)
u = zeros(N, N, 2)
for I in CartesianIndices((N, N))
x = xyd[I[1]]
y = xyd[I[2]]
u[I,1] = 22*(y*(1-y))^(3/2)
u[I,2] = 27*(x*(1-x))^(3/2)
end
u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob_ode_brusselator_2d = ODEProblem(brusselator_2d_loop,
u0,(0.,5.0),p)
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES()),save_everystep=false)
# 49.017548 seconds (798.97 M allocations: 21.856 GiB, 6.28% gc time, 12.82% compilation time)
using Symbolics
du0 = copy(u0)
jac = Symbolics.jacobian_sparsity((du,u)->brusselator_2d_loop(du,u,p,0.0),du0,u0)
prob_ode_brusselator_2d = ODEProblem(ODEFunction(brusselator_2d_loop,jac_prototype = float.(jac)),
u0,(0.,11.5),p)
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),concrete_jac=false),save_everystep=false);
# 100.195429 seconds (1.75 G allocations: 45.848 GiB, 4.98% gc time)
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 5.442302 seconds (52.74 k allocations: 160.680 MiB, 0.09% gc time)
@time solve(prob_ode_brusselator_2d,TRBDF2(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 5.442302 seconds (52.74 k allocations: 160.680 MiB, 0.09% gc time)
@time solve(prob_ode_brusselator_2d,Rodas4P(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 7.422705 seconds (59.82 k allocations: 231.397 MiB, 0.29% gc time)
@time solve(prob_ode_brusselator_2d,Rodas4(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 7.422705 seconds (59.82 k allocations: 231.397 MiB, 0.29% gc time)
@time solve(prob_ode_brusselator_2d,TRBDF2(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 7.422705 seconds (59.82 k allocations: 231.397 MiB, 0.29% gc time)
using IncompleteLU
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = ilu(convert(AbstractMatrix,W), τ = 5.0)
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.541864 seconds (50.86 k allocations: 384.094 MiB, 3.17% gc time)
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = ilu(convert(AbstractMatrix,W), τ = 200.0)
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d,Rodas4P(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.617162 seconds (51.02 k allocations: 387.520 MiB, 2.59% gc time)
@time solve(prob_ode_brusselator_2d,Rodas4(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.625991 seconds (51.17 k allocations: 390.754 MiB, 2.56% gc time)
@time solve(prob_ode_brusselator_2d,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.556428 seconds (47.99 k allocations: 320.761 MiB, 2.85% gc time)
using AlgebraicMultigrid
function algebraicmultigrid(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = aspreconditioner(ruge_stuben(W))
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.952956 seconds (111.62 k allocations: 1.000 GiB, 4.67% gc time)
@time solve(prob_ode_brusselator_2d,Rodas4(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 1.122485 seconds (98.24 k allocations: 908.294 MiB, 3.52% gc time)
@time solve(prob_ode_brusselator_2d,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 1.022904 seconds (81.73 k allocations: 694.942 MiB, 2.69% gc time)
@time solve(prob_ode_brusselator_2d,Rosenbrock23(),save_everystep=false);
# 1.127027 seconds (48.75 k allocations: 762.010 MiB, 0.85% gc time)
@time solve(prob_ode_brusselator_2d,Rodas4(),save_everystep=false);
# 1.026517 seconds (47.41 k allocations: 637.861 MiB, 0.78% gc time)
@time solve(prob_ode_brusselator_2d,Rodas5(),save_everystep=false);
# 0.778769 seconds (43.59 k allocations: 426.856 MiB, 0.65% gc time)
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve = KLUFactorization()),save_everystep=false);
# 0.904355 seconds (61.25 k allocations: 528.112 MiB, 0.69% gc time)
@time solve(prob_ode_brusselator_2d,Rodas4(linsolve = KLUFactorization()),save_everystep=false);
# 0.745028 seconds (63.78 k allocations: 416.990 MiB, 0.89% gc time)
@time solve(prob_ode_brusselator_2d,Rodas5(linsolve = KLUFactorization()),save_everystep=false);
# 0.493103 seconds (56.56 k allocations: 269.619 MiB, 0.98% gc time)
@time solve(prob_ode_brusselator_2d,TRBDF2(linsolve = KLUFactorization()),save_everystep=false);
# 0.426342 seconds (57.53 k allocations: 219.330 MiB, 2.16% gc time)
@time solve(prob_ode_brusselator_2d,KenCarp4(linsolve = KLUFactorization()),save_everystep=false);
# 0.467652 seconds (67.39 k allocations: 227.136 MiB, 0.85% gc time)
@time solve(prob_ode_brusselator_2d,FBDF(linsolve = KLUFactorization()),save_everystep=false);
# 0.718509 seconds (76.77 k allocations: 403.669 MiB, 0.71% gc time)
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve = UMFPACKFactorization()),save_everystep=false);
# 1.110053 seconds (48.79 k allocations: 762.014 MiB, 0.96% gc time)
@time solve(prob_ode_brusselator_2d,Rodas5(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.749630 seconds (43.60 k allocations: 426.858 MiB, 0.70% gc time)
@time solve(prob_ode_brusselator_2d,FBDF(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.941824 seconds (63.09 k allocations: 604.900 MiB, 0.98% gc time)
#=
import Pardiso
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve = MKLPardisoFactorize()),save_everystep=false);
#
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve = MKLPardisoIterate()),save_everystep=false);
#
=#
using Sundials
@time solve(prob_ode_brusselator_2d,CVODE_BDF(linear_solver=:GMRES),save_everystep=false);
# 0.225442 seconds (58.38 k allocations: 2.846 MiB) |
There's one last PR to really finish this up. Instead of having a We still need the And we should add the |
With #1561, we are almost done. using OrdinaryDiffEq, LinearSolve, Test
const N = 32
const xyd_brusselator = range(0,stop=1,length=N)
brusselator_f(x, y, t) = (((x-0.3)^2 + (y-0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.
limit(a, N) = a == N+1 ? 1 : a == 0 ? N : a
function brusselator_2d_loop(du, u, p, t)
A, B, alpha, dx = p
alpha = alpha/dx^2
@inbounds for I in CartesianIndices((N, N))
i, j = Tuple(I)
x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
ip1, im1, jp1, jm1 = limit(i+1, N), limit(i-1, N), limit(j+1, N), limit(j-1, N)
du[i,j,1] = alpha*(u[im1,j,1] + u[ip1,j,1] + u[i,jp1,1] + u[i,jm1,1] - 4u[i,j,1]) +
B + u[i,j,1]^2*u[i,j,2] - (A + 1)*u[i,j,1] + brusselator_f(x, y, t)
du[i,j,2] = alpha*(u[im1,j,2] + u[ip1,j,2] + u[i,jp1,2] + u[i,jm1,2] - 4u[i,j,2]) +
A*u[i,j,1] - u[i,j,1]^2*u[i,j,2]
end
end
p = (3.4, 1., 10., step(xyd_brusselator))
function init_brusselator_2d(xyd)
N = length(xyd)
u = zeros(N, N, 2)
for I in CartesianIndices((N, N))
x = xyd[I[1]]
y = xyd[I[2]]
u[I,1] = 22*(y*(1-y))^(3/2)
u[I,2] = 27*(x*(1-x))^(3/2)
end
u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob_ode_brusselator_2d = ODEProblem(brusselator_2d_loop,
u0,(0.,11.5),p)
using Symbolics
du0 = copy(u0)
jac = Symbolics.jacobian_sparsity((du,u)->brusselator_2d_loop(du,u,p,0.0),du0,u0)
prob_ode_brusselator_2d_sparse = ODEProblem(ODEFunction(brusselator_2d_loop,jac_prototype = float.(jac)),
u0,(0.,11.5),p)
using ModelingToolkit
prob_ode_brusselator_2d_mtk = ODEProblem(modelingtoolkitize(prob_ode_brusselator_2d_sparse),[],(0.0,11.5),jac=true,sparse=true)
# Sparse and AD
@time sol = solve(prob_ode_brusselator_2d,TRBDF2(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 9.350326 seconds (2.06 M allocations: 190.429 MiB, 0.20% gc time)
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 9.242326 seconds (2.09 M allocations: 195.962 MiB, 0.35% gc time)
@time solve(prob_ode_brusselator_2d,TRBDF2(linsolve=IterativeSolversJL_GMRES(),autodiff=false),save_everystep=false);
# 19.075056 seconds (4.34 M allocations: 277.909 MiB, 0.17% gc time)
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),autodiff=false),save_everystep=false);
# 18.641695 seconds (4.37 M allocations: 282.333 MiB, 0.17% gc time)
# Different methods, no preconditioner
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 6.231675 seconds (1.37 M allocations: 206.219 MiB, 0.28% gc time)
@time solve(prob_ode_brusselator_2d,Rodas4(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 8.095367 seconds (1.82 M allocations: 296.098 MiB, 0.38% gc time)
@time solve(prob_ode_brusselator_2d,Rodas4P(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 8.714918 seconds (1.96 M allocations: 299.474 MiB, 0.35% gc time)
@time solve(prob_ode_brusselator_2d,Rodas4P2(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 9.772534 seconds (2.20 M allocations: 319.374 MiB, 0.30% gc time)
@time solve(prob_ode_brusselator_2d,Rodas5(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 7.827665 seconds (1.76 M allocations: 276.399 MiB, 0.32% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),concrete_jac=false),save_everystep=false);
# 6.406100 seconds (1.40 M allocations: 211.752 MiB, 3.53% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 6.173799 seconds (1.40 M allocations: 212.254 MiB, 0.20% gc time)
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 9.123308 seconds (2.09 M allocations: 196.480 MiB, 0.18% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas4P(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 8.711351 seconds (1.99 M allocations: 305.508 MiB, 0.30% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 8.151490 seconds (1.86 M allocations: 302.132 MiB, 0.30% gc time)
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 9.089930 seconds (2.09 M allocations: 196.464 MiB, 0.09% gc time)
using IncompleteLU
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = ilu(convert(AbstractMatrix,W), τ = 5.0)
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.578485 seconds (93.01 k allocations: 385.764 MiB, 2.90% gc time)
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = ilu(convert(AbstractMatrix,W), τ = 0.2)
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 11.473043 seconds (2.09 M allocations: 197.050 MiB, 0.11% gc time)
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = ilu(convert(AbstractMatrix,W), τ = 200.0)
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rodas4P(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.627300 seconds (107.22 k allocations: 389.992 MiB, 1.89% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.639779 seconds (108.02 k allocations: 393.251 MiB, 1.44% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.564875 seconds (100.44 k allocations: 323.098 MiB, 1.79% gc time)
using AlgebraicMultigrid
function algebraicmultigrid(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = aspreconditioner(ruge_stuben(convert(AbstractMatrix,W)))
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.907391 seconds (126.34 k allocations: 1.001 GiB, 3.95% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 1.122485 seconds (98.24 k allocations: 908.294 MiB, 3.52% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 1.138373 seconds (119.22 k allocations: 909.016 MiB, 3.31% gc time)
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 2.139822 seconds (171.27 k allocations: 428.200 MiB, 1.35% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(),save_everystep=false);
# 1.140436 seconds (48.75 k allocations: 762.010 MiB, 1.19% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas4(),save_everystep=false);
# 1.055678 seconds (47.45 k allocations: 637.863 MiB, 0.89% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas5(),save_everystep=false);
# 0.781087 seconds (43.60 k allocations: 426.857 MiB, 1.64% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = KLUFactorization()),save_everystep=false);
# 0.948284 seconds (62.59 k allocations: 528.187 MiB, 1.38% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve = KLUFactorization()),save_everystep=false);
# 0.768939 seconds (64.30 k allocations: 417.019 MiB, 2.38% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve = KLUFactorization()),save_everystep=false);
# 0.513310 seconds (57.34 k allocations: 269.663 MiB, 2.46% gc time)
@time solve(prob_ode_brusselator_2d_mtk,Rodas5(linsolve = KLUFactorization()),save_everystep=false);
# 0.498070 seconds (53.55 k allocations: 269.487 MiB)
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve = KLUFactorization()),save_everystep=false);
# 0.421878 seconds (56.46 k allocations: 219.270 MiB, 1.86% gc time)
@time solve(prob_ode_brusselator_2d_mtk,TRBDF2(linsolve = KLUFactorization()),save_everystep=false);
# 0.405702 seconds (52.99 k allocations: 219.127 MiB)
@time solve(prob_ode_brusselator_2d_sparse,KenCarp4(linsolve = KLUFactorization()),save_everystep=false);
# 0.476274 seconds (67.56 k allocations: 227.145 MiB, 1.61% gc time)
@time solve(prob_ode_brusselator_2d_mtk,KenCarp4(linsolve = KLUFactorization()),save_everystep=false);
# 0.479085 seconds (63.95 k allocations: 227.024 MiB, 2.32% gc time)
@time solve(prob_ode_brusselator_2d_sparse,FBDF(linsolve = KLUFactorization()),save_everystep=false);
# 0.725433 seconds (77.00 k allocations: 403.682 MiB, 1.99% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = UMFPACKFactorization()),save_everystep=false);
# 1.133577 seconds (48.77 k allocations: 762.013 MiB, 0.88% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.774167 seconds (43.60 k allocations: 426.858 MiB, 1.22% gc time)
@time solve(prob_ode_brusselator_2d_mtk,Rodas5(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.779785 seconds (41.72 k allocations: 426.789 MiB, 2.16% gc time)
@time solve(prob_ode_brusselator_2d_sparse,FBDF(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.978541 seconds (63.08 k allocations: 604.900 MiB, 0.97% gc time)
#=
import Pardiso
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = MKLPardisoFactorize()),save_everystep=false);
#
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = MKLPardisoIterate()),save_everystep=false);
#
=#
using Sundials
@time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES),save_everystep=false);
# 0.212607 seconds (58.38 k allocations: 2.846 MiB)
@time solve(prob_ode_brusselator_2d_mtk,CVODE_BDF(linear_solver=:KLU),save_everystep=false);
# 0.495496 seconds (1.36 k allocations: 5.795 MiB) |
using OrdinaryDiffEq, LinearSolve, Test
const N = 32
const xyd_brusselator = range(0,stop=1,length=N)
brusselator_f(x, y, t) = (((x-0.3)^2 + (y-0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.
limit(a, N) = a == N+1 ? 1 : a == 0 ? N : a
function brusselator_2d_loop(du, u, p, t)
A, B, alpha, dx = p
alpha = alpha/dx^2
@inbounds for I in CartesianIndices((N, N))
i, j = Tuple(I)
x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
ip1, im1, jp1, jm1 = limit(i+1, N), limit(i-1, N), limit(j+1, N), limit(j-1, N)
du[i,j,1] = alpha*(u[im1,j,1] + u[ip1,j,1] + u[i,jp1,1] + u[i,jm1,1] - 4u[i,j,1]) +
B + u[i,j,1]^2*u[i,j,2] - (A + 1)*u[i,j,1] + brusselator_f(x, y, t)
du[i,j,2] = alpha*(u[im1,j,2] + u[ip1,j,2] + u[i,jp1,2] + u[i,jm1,2] - 4u[i,j,2]) +
A*u[i,j,1] - u[i,j,1]^2*u[i,j,2]
end
end
p = (3.4, 1., 10., step(xyd_brusselator))
function init_brusselator_2d(xyd)
N = length(xyd)
u = zeros(N, N, 2)
for I in CartesianIndices((N, N))
x = xyd[I[1]]
y = xyd[I[2]]
u[I,1] = 22*(y*(1-y))^(3/2)
u[I,2] = 27*(x*(1-x))^(3/2)
end
u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob_ode_brusselator_2d = ODEProblem(brusselator_2d_loop,
u0,(0.,11.5),p)
using Symbolics
du0 = copy(u0)
jac = Symbolics.jacobian_sparsity((du,u)->brusselator_2d_loop(du,u,p,0.0),du0,u0)
prob_ode_brusselator_2d_sparse = ODEProblem(ODEFunction(brusselator_2d_loop,jac_prototype = float.(jac)),
u0,(0.,11.5),p)
using ModelingToolkit
prob_ode_brusselator_2d_mtk = ODEProblem(modelingtoolkitize(prob_ode_brusselator_2d_sparse),[],(0.0,11.5),jac=true,sparse=true);
# Sparse and AD
@time sol = solve(prob_ode_brusselator_2d,TRBDF2(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 9.350326 seconds (2.06 M allocations: 190.429 MiB, 0.20% gc time)
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 9.242326 seconds (2.09 M allocations: 195.962 MiB, 0.35% gc time)
@time solve(prob_ode_brusselator_2d,TRBDF2(linsolve=IterativeSolversJL_GMRES(),autodiff=false),save_everystep=false);
# 19.075056 seconds (4.34 M allocations: 277.909 MiB, 0.17% gc time)
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),autodiff=false),save_everystep=false);
# 18.641695 seconds (4.37 M allocations: 282.333 MiB, 0.17% gc time)
# Different methods, no preconditioner
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 0.891711 seconds (230.15 k allocations: 184.457 MiB, 5.10% gc time)
@time solve(prob_ode_brusselator_2d,Rosenbrock23(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 6.231675 seconds (1.37 M allocations: 206.219 MiB, 0.28% gc time)
@time solve(prob_ode_brusselator_2d,Rodas4(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 8.095367 seconds (1.82 M allocations: 296.098 MiB, 0.38% gc time)
@time solve(prob_ode_brusselator_2d,Rodas4(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 1.215097 seconds (317.01 k allocations: 152.230 MiB, 3.26% gc time)
@time solve(prob_ode_brusselator_2d,Rodas4P(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 8.714918 seconds (1.96 M allocations: 299.474 MiB, 0.35% gc time)
@time solve(prob_ode_brusselator_2d,Rodas4P2(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 9.772534 seconds (2.20 M allocations: 319.374 MiB, 0.30% gc time)
@time solve(prob_ode_brusselator_2d,Rodas5(linsolve=IterativeSolversJL_GMRES()),save_everystep=false);
# 7.827665 seconds (1.76 M allocations: 276.399 MiB, 0.32% gc time)
@time solve(prob_ode_brusselator_2d,Rodas5(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 1.145810 seconds (301.72 k allocations: 116.390 MiB, 4.10% gc time)
@time solve(prob_ode_brusselator_2d,TRBDF2(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 0.993533 seconds (242.07 k allocations: 103.769 MiB, 2.48% gc time)
@time solve(prob_ode_brusselator_2d,KenCarp47(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 0.715011 seconds (172.53 k allocations: 30.978 MiB)
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),concrete_jac=false),save_everystep=false);
# 6.406100 seconds (1.40 M allocations: 211.752 MiB, 3.53% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 6.173799 seconds (1.40 M allocations: 212.254 MiB, 0.20% gc time)
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 9.123308 seconds (2.09 M allocations: 196.480 MiB, 0.18% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas4P(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 8.711351 seconds (1.99 M allocations: 305.508 MiB, 0.30% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 8.151490 seconds (1.86 M allocations: 302.132 MiB, 0.30% gc time)
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),concrete_jac=true),save_everystep=false);
# 9.089930 seconds (2.09 M allocations: 196.464 MiB, 0.09% gc time)
using IncompleteLU
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = ilu(convert(AbstractMatrix,W), τ = 5.0)
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.578485 seconds (93.01 k allocations: 385.764 MiB, 2.90% gc time)
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = ilu(convert(AbstractMatrix,W), τ = 10.0)
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.458363 seconds (118.22 k allocations: 236.974 MiB, 3.12% gc time)
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = ilu(convert(AbstractMatrix,W), τ = 0.2)
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 11.473043 seconds (2.09 M allocations: 197.050 MiB, 0.11% gc time)
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = ilu(convert(AbstractMatrix,W), τ = 200.0)
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rodas4P(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.627300 seconds (107.22 k allocations: 389.992 MiB, 1.89% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.639779 seconds (108.02 k allocations: 393.251 MiB, 1.44% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.564875 seconds (100.44 k allocations: 323.098 MiB, 1.79% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.519443 seconds (127.76 k allocations: 171.152 MiB)
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = ilu(convert(AbstractMatrix,W), τ = 50.0)
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.237032 seconds (59.60 k allocations: 156.461 MiB)
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.377022 seconds (66.31 k allocations: 292.416 MiB, 2.79% gc time)
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.178704 seconds (61.76 k allocations: 61.385 MiB)
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = ilu(convert(AbstractMatrix,W), τ = 200.0)
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,FBDF(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.309274 seconds (108.03 k allocations: 188.703 MiB)
using AlgebraicMultigrid
function algebraicmultigrid(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = aspreconditioner(ruge_stuben(convert(AbstractMatrix,W)))
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.907391 seconds (126.34 k allocations: 1.001 GiB, 3.95% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=KrylovJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.837808 seconds (131.66 k allocations: 923.665 MiB, 5.45% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 1.122485 seconds (98.24 k allocations: 908.294 MiB, 3.52% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 1.138373 seconds (119.22 k allocations: 909.016 MiB, 3.31% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=KrylovJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.942355 seconds (108.41 k allocations: 563.479 MiB, 2.71% gc time)
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 2.139822 seconds (171.27 k allocations: 428.200 MiB, 1.35% gc time)
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.466828 seconds (61.93 k allocations: 255.953 MiB, 2.74% gc time)
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.378352 seconds (61.18 k allocations: 160.815 MiB, 4.92% gc time)
function algebraicmultigrid2(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = aspreconditioner(smoothed_aggregation(convert(AbstractMatrix,W)))
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid2,concrete_jac=true),save_everystep=false);
# 1.527112 seconds (144.66 k allocations: 666.361 MiB, 2.75% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid2,concrete_jac=true),save_everystep=false);
# 1.971150 seconds (155.53 k allocations: 629.689 MiB, 2.68% gc time)
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=IterativeSolversJL_GMRES(),precs=algebraicmultigrid2,concrete_jac=true),save_everystep=false);
# 1.018097 seconds (96.01 k allocations: 281.463 MiB, 1.34% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(),save_everystep=false);
# 1.140436 seconds (48.75 k allocations: 762.010 MiB, 1.19% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas4(),save_everystep=false);
# 1.055678 seconds (47.45 k allocations: 637.863 MiB, 0.89% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas5(),save_everystep=false);
# 0.781087 seconds (43.60 k allocations: 426.857 MiB, 1.64% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = KLUFactorization()),save_everystep=false);
# 0.948284 seconds (62.59 k allocations: 528.187 MiB, 1.38% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas4(linsolve = KLUFactorization()),save_everystep=false);
# 0.768939 seconds (64.30 k allocations: 417.019 MiB, 2.38% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve = KLUFactorization()),save_everystep=false);
# 0.513310 seconds (57.34 k allocations: 269.663 MiB, 2.46% gc time)
@time solve(prob_ode_brusselator_2d_mtk,Rodas5(linsolve = KLUFactorization()),save_everystep=false);
# 0.498070 seconds (53.55 k allocations: 269.487 MiB)
@time solve(prob_ode_brusselator_2d_sparse,TRBDF2(linsolve = KLUFactorization()),save_everystep=false);
# 0.421878 seconds (56.46 k allocations: 219.270 MiB, 1.86% gc time)
@time solve(prob_ode_brusselator_2d_mtk,TRBDF2(linsolve = KLUFactorization()),save_everystep=false);
# 0.405702 seconds (52.99 k allocations: 219.127 MiB)
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve = KLUFactorization()),save_everystep=false);
# 0.333920 seconds (64.67 k allocations: 158.963 MiB)
@time solve(prob_ode_brusselator_2d_mtk,KenCarp4(linsolve = KLUFactorization()),save_everystep=false);
# 0.479085 seconds (63.95 k allocations: 227.024 MiB, 2.32% gc time)
@time solve(prob_ode_brusselator_2d_mtk,KenCarp47(linsolve = KLUFactorization()),save_everystep=false);
# 0.352764 seconds (62.23 k allocations: 158.895 MiB, 3.02% gc time)
@time solve(prob_ode_brusselator_2d_mtk,RadauIIA5(linsolve = KLUFactorization()),save_everystep=false);
# 7.843305 seconds (58.72 k allocations: 572.325 MiB, 0.21% gc time)
@time solve(prob_ode_brusselator_2d_sparse,FBDF(linsolve = KLUFactorization()),save_everystep=false);
# 0.725433 seconds (77.00 k allocations: 403.682 MiB, 1.99% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = UMFPACKFactorization()),save_everystep=false);
# 1.133577 seconds (48.77 k allocations: 762.013 MiB, 0.88% gc time)
@time solve(prob_ode_brusselator_2d_sparse,Rodas5(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.774167 seconds (43.60 k allocations: 426.858 MiB, 1.22% gc time)
@time solve(prob_ode_brusselator_2d_mtk,Rodas5(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.779785 seconds (41.72 k allocations: 426.789 MiB, 2.16% gc time)
@time solve(prob_ode_brusselator_2d_sparse,FBDF(linsolve = UMFPACKFactorization()),save_everystep=false);
# 0.978541 seconds (63.08 k allocations: 604.900 MiB, 0.97% gc time)
@time solve(prob_ode_brusselator_2d_sparse,RadauIIA5(linsolve = UMFPACKFactorization()),save_everystep=false);
# 8.086175 seconds (47.83 k allocations: 912.373 MiB, 0.31% gc time)
#=
import Pardiso
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = MKLPardisoFactorize()),save_everystep=false);
#
@time solve(prob_ode_brusselator_2d_sparse,Rosenbrock23(linsolve = MKLPardisoIterate()),save_everystep=false);
#
=#
using Sundials
@time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES),save_everystep=false);
# 0.212607 seconds (58.38 k allocations: 2.846 MiB)
@time solve(prob_ode_brusselator_2d_mtk,CVODE_BDF(linear_solver=:KLU),save_everystep=false);
# 0.495496 seconds (1.36 k allocations: 5.795 MiB)
|
Summary: fastest algorithm is: @time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.178704 seconds (61.76 k allocations: 61.385 MiB) then @time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES),save_everystep=false);
# 0.212607 seconds (58.38 k allocations: 2.846 MiB) Finally, a winner. IterativeSolvers.jl is terrible for performance. |
It looks like Krylov.jl is doing some performance nono's. Testing script: using OrdinaryDiffEq, LinearSolve, Test
const N = 32
const xyd_brusselator = range(0,stop=1,length=N)
brusselator_f(x, y, t) = (((x-0.3)^2 + (y-0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.
limit(a, N) = a == N+1 ? 1 : a == 0 ? N : a
function brusselator_2d_loop(du, u, p, t)
A, B, alpha, dx = p
alpha = alpha/dx^2
@inbounds for I in CartesianIndices((N, N))
i, j = Tuple(I)
x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
ip1, im1, jp1, jm1 = limit(i+1, N), limit(i-1, N), limit(j+1, N), limit(j-1, N)
du[i,j,1] = alpha*(u[im1,j,1] + u[ip1,j,1] + u[i,jp1,1] + u[i,jm1,1] - 4u[i,j,1]) +
B + u[i,j,1]^2*u[i,j,2] - (A + 1)*u[i,j,1] + brusselator_f(x, y, t)
du[i,j,2] = alpha*(u[im1,j,2] + u[ip1,j,2] + u[i,jp1,2] + u[i,jm1,2] - 4u[i,j,2]) +
A*u[i,j,1] - u[i,j,1]^2*u[i,j,2]
end
end
p = (3.4, 1., 10., step(xyd_brusselator))
function init_brusselator_2d(xyd)
N = length(xyd)
u = zeros(N, N, 2)
for I in CartesianIndices((N, N))
x = xyd[I[1]]
y = xyd[I[2]]
u[I,1] = 22*(y*(1-y))^(3/2)
u[I,2] = 27*(x*(1-x))^(3/2)
end
u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob_ode_brusselator_2d = ODEProblem(brusselator_2d_loop,
u0,(0.,11.5),p)
using Symbolics
du0 = copy(u0)
jac = Symbolics.jacobian_sparsity((du,u)->brusselator_2d_loop(du,u,p,0.0),du0,u0)
prob_ode_brusselator_2d_sparse = ODEProblem(ODEFunction(brusselator_2d_loop,jac_prototype = float.(jac)),
u0,(0.,11.5),p)
@time solve(prob_ode_brusselator_2d,KenCarp47(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 0.715011 seconds (172.53 k allocations: 30.978 MiB)
using IncompleteLU
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = ilu(convert(AbstractMatrix,W), τ = 50.0)
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.178704 seconds (61.76 k allocations: 61.385 MiB)
using AlgebraicMultigrid
function algebraicmultigrid(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = aspreconditioner(ruge_stuben(convert(AbstractMatrix,W)))
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.378352 seconds (61.18 k allocations: 160.815 MiB, 4.92% gc time)
using Sundials
@time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES),save_everystep=false);
# 0.212607 seconds (58.38 k allocations: 2.846 MiB) It points all to: mul!(w, A, p) # w ← AN⁻¹vₖ
MisI || mul!(q, M, w) # q ← M⁻¹AN⁻¹vₖ
for i = 1 : iter
R[nr+i] = @kdot(n, V[i], q) # hᵢₖ = qᵀvᵢ
@kaxpy!(n, -R[nr+i], V[i], q) # q ← q - hᵢₖvᵢ
end Forced BLAS is probably bad for OpenBLAS defaults with """
Applies in-place backward substitution with the U factor of F, under the assumptions:
1. U is stored transposed / row-wise
2. U has no lower-triangular elements stored
3. U has (nonzero) diagonal elements stored.
"""
function backward_substitution!(F::ILUFactorization, y::AbstractVector)
U = F.U
@inbounds for col = U.n : -1 : 1
# Substitutions
for idx = U.colptr[col + 1] - 1 : -1 : U.colptr[col] + 1
y[col] -= U.nzval[idx] * y[U.rowval[idx]]
end
# Final answer for y[col]
y[col] /= U.nzval[U.colptr[col]]
end
y
end
function backward_substitution!(v::AbstractVector, F::ILUFactorization, y::AbstractVector)
v .= y
backward_substitution!(F, v)
end
"""
Applies in-place forward substitution with the L factor of F, under the assumptions:
1. L is stored column-wise (unlike U)
2. L has no upper triangular elements
3. L has *no* diagonal elements
"""
function forward_substitution!(F::ILUFactorization, y::AbstractVector)
L = F.L
@inbounds for col = 1 : L.n - 1
for idx = L.colptr[col] : L.colptr[col + 1] - 1
y[L.rowval[idx]] -= L.nzval[idx] * y[col]
end
end
y
end
function forward_substitution!(v::AbstractVector, F::ILUFactorization, y::AbstractVector)
v .= y
forward_substitution!(F, v)
end which probably could use LoopVectorization magic. Looks like it's in @chriselrod territory now. |
Krylov.jl needs JuliaSmoothOptimizers/Krylov.jl#477 for ILU to work. |
Versions with preconditioning in Sundials: using OrdinaryDiffEq, LinearSolve, Test
const N = 32
const xyd_brusselator = range(0,stop=1,length=N)
brusselator_f(x, y, t) = (((x-0.3)^2 + (y-0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.
limit(a, N) = a == N+1 ? 1 : a == 0 ? N : a
function brusselator_2d_loop(du, u, p, t)
A, B, alpha, dx = p
alpha = alpha/dx^2
@inbounds for I in CartesianIndices((N, N))
i, j = Tuple(I)
x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
ip1, im1, jp1, jm1 = limit(i+1, N), limit(i-1, N), limit(j+1, N), limit(j-1, N)
du[i,j,1] = alpha*(u[im1,j,1] + u[ip1,j,1] + u[i,jp1,1] + u[i,jm1,1] - 4u[i,j,1]) +
B + u[i,j,1]^2*u[i,j,2] - (A + 1)*u[i,j,1] + brusselator_f(x, y, t)
du[i,j,2] = alpha*(u[im1,j,2] + u[ip1,j,2] + u[i,jp1,2] + u[i,jm1,2] - 4u[i,j,2]) +
A*u[i,j,1] - u[i,j,1]^2*u[i,j,2]
end
end
p = (3.4, 1., 10., step(xyd_brusselator))
function init_brusselator_2d(xyd)
N = length(xyd)
u = zeros(N, N, 2)
for I in CartesianIndices((N, N))
x = xyd[I[1]]
y = xyd[I[2]]
u[I,1] = 22*(y*(1-y))^(3/2)
u[I,2] = 27*(x*(1-x))^(3/2)
end
u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob_ode_brusselator_2d = ODEProblem(brusselator_2d_loop,
u0,(0.,11.5),p)
using Symbolics
du0 = copy(u0)
jac = Symbolics.jacobian_sparsity((du,u)->brusselator_2d_loop(du,u,p,0.0),du0,u0)
prob_ode_brusselator_2d_sparse = ODEProblem(ODEFunction(brusselator_2d_loop,jac_prototype = float.(jac)),
u0,(0.,11.5),p)
using ModelingToolkit
prob_ode_brusselator_2d_mtk = ODEProblem(modelingtoolkitize(prob_ode_brusselator_2d_sparse),[],(0.0,11.5),jac=true,sparse=true);
@time solve(prob_ode_brusselator_2d,KenCarp47(linsolve=KrylovJL_GMRES()),save_everystep=false);
# 0.715011 seconds (172.53 k allocations: 30.978 MiB)
prob_ode_brusselator_2d_sparse = ODEProblem(ODEFunction(brusselator_2d_loop,jac_prototype = float.(jac)),
u0,(0.,11.5),p)
using IncompleteLU
Base.eltype(::IncompleteLU.ILUFactorization{Tv,Ti}) where {Tv,Ti} = Tv
function incompletelu(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = ilu(convert(AbstractMatrix,W), τ = 50.0)
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=incompletelu,concrete_jac=true),save_everystep=false);
# 0.178704 seconds (61.76 k allocations: 61.385 MiB)
using AlgebraicMultigrid
function algebraicmultigrid(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
Pl = aspreconditioner(ruge_stuben(convert(AbstractMatrix,W)))
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=algebraicmultigrid,concrete_jac=true),save_everystep=false);
# 0.378352 seconds (61.18 k allocations: 160.815 MiB, 4.92% gc time)
function algebraicmultigrid2(W,du,u,p,t,newW,Plprev,Prprev,solverdata)
if newW === nothing || newW
A = convert(AbstractMatrix,W)
Pl = aspreconditioner(ruge_stuben(A, presmoother = AlgebraicMultigrid.Jacobi(rand(size(A,1))), postsmoother = AlgebraicMultigrid.Jacobi(rand(size(A,1)))))
else
Pl = Plprev
end
Pl,nothing
end
@time solve(prob_ode_brusselator_2d_sparse,KenCarp47(linsolve=KrylovJL_GMRES(),precs=algebraicmultigrid2,concrete_jac=true),save_everystep=false);
# 0.285271 seconds (65.71 k allocations: 170.192 MiB, 2.49% gc time)
using Sundials, LinearAlgebra
@time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES),save_everystep=false);
# 0.212607 seconds (58.38 k allocations: 2.846 MiB)
u0 = prob_ode_brusselator_2d_mtk.u0
p = prob_ode_brusselator_2d_mtk.p
const jaccache = prob_ode_brusselator_2d_mtk.f.jac(u0,p,0.0)
const W = I - 1.0*jaccache
prectmp = ilu(W, τ = 50.0)
const preccache = Ref(prectmp)
function psetupilu(p, t, u, du, jok, jcurPtr, gamma)
if jok
prob_ode_brusselator_2d_mtk.f.jac(jaccache,u,p,t)
jcurPtr[] = true
# W = I - gamma*J
@. W = -gamma*jaccache
idxs = diagind(W)
@. @view(W[idxs]) = @view(W[idxs]) + 1
# Build preconditioner on W
preccache[] = ilu(W, τ = 5.0)
end
end
function precilu(z,r,p,t,y,fy,gamma,delta,lr)
ldiv!(z,preccache[],r)
end
prectmp2 = aspreconditioner(ruge_stuben(W, presmoother = AlgebraicMultigrid.Jacobi(rand(size(W,1))), postsmoother = AlgebraicMultigrid.Jacobi(rand(size(W,1)))))
const preccache3 = Ref(prectmp2)
function psetupamg(p, t, u, du, jok, jcurPtr, gamma)
if jok
prob_ode_brusselator_2d_mtk.f.jac(jaccache,u,p,t)
jcurPtr[] = true
# W = I - gamma*J
@. W = -gamma*jaccache
idxs = diagind(W)
@. @view(W[idxs]) = @view(W[idxs]) + 1
# Build preconditioner on W
preccache3[] = aspreconditioner(ruge_stuben(W, presmoother = AlgebraicMultigrid.Jacobi(rand(size(W,1))), postsmoother = AlgebraicMultigrid.Jacobi(rand(size(W,1)))))
end
end
function precamg(z,r,p,t,y,fy,gamma,delta,lr)
ldiv!(z,preccache3[],r)
end
@time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES,prec=precilu,psetup=psetupilu,prec_side=1),save_everystep=false);
# 0.095955 seconds (17.72 k allocations: 77.079 MiB)
@time solve(prob_ode_brusselator_2d_sparse,CVODE_BDF(linear_solver=:GMRES,prec=precamg,psetup=psetupamg,prec_side=1),save_everystep=false);
# 0.146130 seconds (30.68 k allocations: 275.589 MiB, 7.51% gc time) |
Doing some upstream fixing haampie/IncompleteLU.jl#22 and JuliaSmoothOptimizers/Krylov.jl#477 |
Could you give me any representative problems to play with for the forward and back substitution? |
See the script above. The 3 versions with preconditioners all seem to have a heavy amount of time in the ILU and AMG forward and back substitution |
Completed 🎉. |
I plan to finally finish the linear solver Krylov preconditioner etc. interfaces this week. The big breaking LinearSolve.jl change #1543 of course was a prerequisite to doing this correctly, and so now we're set to solve SciML/DifferentialEquations.jl#453 and SciML/DifferentialEquations.jl#521 at the same time. The test cases are:
The plan for this last little part is (mostly documenting for myself to remember the whole plan a few days from now 😅)
needsconcreteA(alg)
for whether a concreteA
is required or whether only theA*x
is used.elseif jac_prototype === nothing && needsconcreteA(alg.linsolve)
, automatically chooseJ
a JacVecOperator matching the autodiff setting. This would make it the same as the user choosingjac_prototype = JacVecOperator()
, but now if they choose GMRES it will automatically make this swap and set autodiff appropriately. This is then inverted for a good reason: if the user wants to specify their own matrix-free operator or approximate Jacobian, then they just as before setjac_prototype
andjac
precs
argument for a functionPl,Pr = alg.precs(W,du,u,p,t,newW,solverdata)
for building preconditioners which are dependent on the currentW
matrix. To handle the type-groundedness for the solver's init phase,newW
will be passed asnothing
to allow a separate dispatch for handling that phase.solverdata
is kind of an escape hatch so future additions can be non-breaking. Currently it's just set to includegamma
for theW = I - gamma*J
since that will be needed for @tjdiamandis's randomized preconditioners, but that list of "things from the solver someone needs" can easily grow, and so it's non-breaking to add more little scalars to the struct in the future.dt
, etc. who knows.W
is aWOperator
, thenW
is given to the user in the preconditioner and the linear solver in the lazy form where they canW.mass_matrix
,W.gamma
,W.J
and do special linear solvers. This solves solving for Implicit operators CliMA/ClimaCore.jl#41 without the need for keeping theWfact!
code around.That will be relatively painless to do and we'll get fully general matrix-free preconditioned Krylov.
Mentioning this to @simonbyrne who is tracking this kind of stuff in CliMA/ClimaCore.jl#41
One last little tidbit, when digging around Sundials, the only thing that Sundials has that this doesn't is some weird pointer in what would be the equivalent to the
precs
function https://sundials.readthedocs.io/en/latest/cvode/Usage/index.html#c.CVLsPrecSetupFnThat makes no sense if matrix-free though, and so Sundials usually just ignores it. So what's the case where you're doing not matrix-free Newton-Krylov and you want to tell the algorithm you modified the Jacobian in the preconditioner setup function to do... ? I don't see Sundials actually doing anything with it, and it doesn't make too much sense to me right now to have it. If someone updates the Jacobian, then surely they don't want you to exit/reject the step right before using it to calculate a new Jacobian, so it should not act like the normal use of the
newW
flag. The only thing it could be is for resettingisfresh
for theA
in the linear solve: so the user directly mutatedW
and wants it to know that it might need to refactorize before solving even though the algorithm didn't necessarily ask for a newW
. But that can't be the case because this is in preconditioners and thus not for factorization algorithms, unless it's for mixed iterative+factorization algorithms. I'll need to ruminate on this a bit more. But it's not hard, the better way to do this would be to implement it in our form asPl,Pr,Wmodified = alg.precs(W,du,u,p,t,newW,solverdata)
, so the user function needs to additionally return a boolean. It would be good then to not break the interface and have it now if it's necessary.The text was updated successfully, but these errors were encountered: