Sparse Jacobians #416

Closed
ChrisRackauckas opened this issue Jul 5, 2018 · 11 comments

Comments

@ChrisRackauckas
Member

SciML/DifferentialEquations.jl#220 defines an interface for the whole of DiffEq. Implementing it well in OrdinaryDiffEq.jl will be an ordeal as well. We will need to change the W calculations to take into account possible structure, sparsity, or laziness:

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/derivative_utils.jl#L35

Specifically these lines:

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/derivative_utils.jl#L66-L73

We will probably want to handle SciML/DifferentialEquations.jl#259 and SciML/DifferentialEquations.jl#263 simultaneously by updating this one: https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/derivative_utils.jl#L82 so it has similar options to the in-place version.

Then we just need to make https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/caches/kencarp_kvaerno_caches.jl#L73-L74 build the correct cache depending on the jac_prototype. That should probably be a dispatching function in derivative_utils.jl that we then apply to every cache in the same way. In the caches, we will want the default linsolve to change depending on the type of J/W here: https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/caches/kencarp_kvaerno_caches.jl#L92 . So linsolve should probably be nothing in the algorithms; if it's nothing, use a smarter default, otherwise use the user override. For sparse it's probably smart to just use the sparse LU, and for matrix-free we can default to GMRES from IterativeSolvers.jl (maybe with a simple preconditioner?).
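A minimal sketch of that type-based default, with a hypothetical default_linsolve helper (the real version would also need to manage caches and preconditioners):

    using LinearAlgebra, SparseArrays
    using IterativeSolvers  # for gmres!

    # Dense W: LU-factorize and backsolve.
    default_linsolve(x, W::Matrix, b) = ldiv!(x, lu(W), b)

    # Sparse W: sparse LU (UMFPACK) via `lu` on a SparseMatrixCSC.
    default_linsolve(x, W::SparseMatrixCSC, b) = ldiv!(x, lu(W), b)

    # Matrix-free W (anything that only supports mul!): default to GMRES.
    default_linsolve(x, W, b) = gmres!(x, W, b)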

Then we will need to incorporate it into https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/nlsolve_utils.jl . We should have a dispatch that avoids the whole nonlinear solve when the function is linear. While u' = Au isn't much of an interesting case on its own, the IMEX algorithms applied to semilinear equations u' = Au + f(t,u) are one place where this will show up, so we might as well handle it. I don't think any more modifications will be needed there.
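For the semilinear case the short-circuit amounts to replacing the Newton iteration with a single linear solve; here is a standalone sketch (names are illustrative, not the nlsolve_utils.jl API):

    using LinearAlgebra

    # If the implicitly-treated part is a known linear operator A, the
    # stage equation (M - γA)z = b is linear: one solve, no iteration.
    solve_stage_linear(M, A, γ, b) = (M - γ * A) \ b

    A = [-1.0 0.5; 0.0 -2.0]
    z = solve_stage_linear(I, A, 0.1, [1.0, 2.0])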

Lastly, the exponential integrators need to make use of it. I think they are set up to just use the matmul, so the previous parts should already be enough to get them using the correct sparse/matrix-free Jacobians.

@YingboMa's and @MSeeker1340's work will make heavy use of this, and since this is a large change we may want to coordinate doing it in pieces.

@MSeeker1340
Contributor

I took a look at the code around the jacobian and W operator and I think it's best if we work out the big picture first before actual coding.

Currently, the related pieces are:

  • In the constructor for ODEProblem, there is a field for jac_prototype (which is currently unused) and mass_matrix.

  • The update function for the Jacobian is f.jac(J,u,p,t) for an ODEFunction f (see the example after this list). The analytic inverse of W, if available, is also presented this way, as f.invW or f.invW_t.

  • Nevertheless, the recommended way of updating the Jacobian/W is not to invoke f.jac/f.invW directly, but to go through the interface calc_J! and calc_W!. (ExpRK currently uses f.jac directly, but this is a temporary solution and we should switch to calc_J! instead.)

  • The Jacobian and the W operator are stored in the caches, as required by calc_J! and calc_W!. The construction of the caches is handled in alg_cache for the corresponding method.

  • Finally there's diffeq_nlsolve!, which currently takes in W as a black-box operator to be used by cache.linsolve. So the expected workflow is to first have calc_W! update the W operator in the cache and then pass it to diffeq_nlsolve!.
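To make the f.jac(J,u,p,t) convention concrete, here is a small self-contained example of an in-place RHS with a matching hand-written Jacobian (the commented line shows how it would be attached to an ODEFunction):

    # Lotka-Volterra RHS, in-place.
    function lotka!(du, u, p, t)
        du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
        du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
    end

    # Matching in-place Jacobian update: writes d(du)/du into J.
    function lotka_jac!(J, u, p, t)
        J[1,1] = p[1] - p[2]*u[2]
        J[1,2] = -p[2]*u[1]
        J[2,1] = p[4]*u[2]
        J[2,2] = -p[3] + p[4]*u[1]
        nothing
    end

    # f = ODEFunction(lotka!, jac = lotka_jac!)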

To summarize: information for the jacobian/W is stored in two places, the ODEProblem and its ODEFunction. It is processed by alg_cache to construct suitable jacobian/W objects and also used by calc_J! and calc_W! to update these objects. The integrators will then utilize the updated operators in their own way (use the Jacobian directly for exponential methods/use diffeq_nlsolve! for implicit methods).

Now onto the central issue: the flow of information from ODEProblem to alg_cache is missing, i.e. it is not present in the argument list of alg_cache. For starters, alg_cache would not know what prob.jac_prototype is, making it difficult for it to construct the correct J. Also, if we wish to construct W lazily as MM - gamma * J, then alg_cache would not know what the mass matrix should be. There are two solutions to the problem:

  1. We expand the argument list of alg_cache to include jac_prototype and mass_matrix, or the ODEProblem itself.

  2. We instead store all relevant information (jac_prototype and mass_matrix in particular) in the ODEFunction, which is accessible by alg_cache.

I can see advantages to both solutions. 1 requires fewer changes to the existing codebase, while 2 is in my opinion more logical because these should be considered part of ODEFunction anyway (it also makes ODEFunction the single source of information, and avoids deep levels of indirection like integrator.sol.prob.mass_matrix in calc_W!). Personally I'm more in favor of 2 but @ChrisRackauckas it's your call how this should be handled. @YingboMa I'd also like to hear your opinion on the matter.

(By the way, one convenience of 2 is that if an AbstractDiffEqLinearOperator is given as jac_prototype to the constructor of ODEFunction, then it can automatically set f.jac to be update_coefficients!.)
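A sketch of that convenience as hypothetical constructor logic, assuming DiffEqBase's operator interface is in scope:

    using DiffEqBase: AbstractDiffEqLinearOperator, update_coefficients!

    # If no jac is given but the prototype is an updatable operator,
    # default the Jacobian update to update_coefficients!.
    function default_jac(jac, jac_prototype)
        if jac === nothing && jac_prototype isa AbstractDiffEqLinearOperator
            return (J, u, p, t) -> update_coefficients!(J, u, p, t)
        end
        return jac
    end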

Some other issues that have come to my mind:

  • This is mentioned in Jacobian types DifferentialEquations.jl#220

    But then we'd have to have some way of knowing not to do

       for j in 1:length(u), i in 1:length(u)
           @inbounds W[i,j] = mass_matrix[i,j] - γ*J[i,j]
       end

    and instead build the lazy W operator. An option in the ODEProblem etc. could be lazy_jac, which would default to true for AbstractSparseMatrix or Array Jacobians.

    I think a more logical name would be lazyW, since this only affects how W is computed. Also, because this option needs to be known by alg_cache, it ties into the previous issue.

  • Why do we need a separate W parameter in diffeq_nlsolve! anyway? Wouldn't we just use cache.W in all cases?

  • We will probably want to handle Computing jacobians by finite differences with out of place functions DifferentialEquations.jl#259 and Complex woes with implicit solvers (out-of-place only) DifferentialEquations.jl#263 simultaneously by updating this one: https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/derivative_utils.jl#L82 so it has similar options to the in-place version.

    What about the other direction, i.e. have the in-place version also support ForwardDiff and possibly other approximations?

  • Let's consider how alg_cache should handle different jac_prototype types. If the prototype is a regular AbstractMatrix, then J = similar(jac_prototype) should be used. What if the prototype is an AbstractDiffEqLinearOperator? We can certainly just set J = jac_prototype, which should work in 99% of the cases (it is unlikely that the original prototype would be used elsewhere). Still, this leaves potential issues, so implementing a cloning/deep-copy interface for AbstractDiffEqLinearOperator might be necessary.
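A sketch of that dispatch with a hypothetical helper name:

    # Plain matrix prototype (dense or sparse): allocate matching storage.
    build_J(jac_prototype::AbstractMatrix) = similar(jac_prototype)

    # Operator prototype: reuse it directly. Works in most cases, but
    # aliases the user's object -- the cloning/deep-copy concern above.
    build_J(jac_prototype) = jac_prototype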

@ChrisRackauckas
Member Author

We can certainly just set J = jac_prototype, which should work in 99% of the cases (it is unlikely that the original prototype would be used elsewhere).

This isn't thread-safe, which will give issues with things like DiffEqMonteCarlo. We should keep thread-safety, especially with the coming threading changes. I wanted to address that first because I want it to be clear that this is a goal we want to keep in mind.

I can see advantages to both solutions. 1 requires fewer changes to the existing codebase, while 2 is in my opinion more logical because these should be considered part of ODEFunction anyway (it also makes ODEFunction the single source of information, and avoids deep levels of indirection like integrator.sol.prob.mass_matrix in calc_W!). Personally I'm more in favor of 2 but @ChrisRackauckas it's your call how this should be handled. @YingboMa I'd also like to hear your opinion on the matter.

I wouldn't use "fewer changes" as a metric. Our code base slapped this kind of support on as needed, and the whole Jacobian part is in need of a redesign. @YingboMa helped out a lot by refactoring it, so now only a small amount of code needs to be targeted (before there was a bunch of copy-paste programming... it grew fast...), and now we should use this structure to directly update it correctly.

(By the way, one convenience of 2 is that if an AbstractDiffEqLinearOperator is given as jac_prototype to the constructor of ODEFunction, then it can automatically set f.jac to be update_coefficients!.)

That's a very good argument for 2. Let's go with 2.

Well... another thing we should ask is whether having the DiffEqFunction as a separate type is a good idea. After going through with all of it, could it be better to slap all of those fields into the problem type itself? I don't want to dwell too much on details since it's always easy to move this stuff around, and I don't want to delay actual work due to bikeshedding.

One thing to keep in mind is the leanness of the inputs. I opened JuliaLang/julia#25107 to try to find a leaner way to give matrices than just giving the entire matrix, since that's super memory-inefficient if we are going to be copying. I don't think we have a better solution for now, but this is why this stuff was kept somewhat separate, in case the functions get copied more. We ended up copying functions less in DelayDiffEq.jl than in the original design, so it may not be an issue anymore.

I think a more logical name would be lazyW, since this only affects how W is computed. Also, because this option needs to be known by alg_cache, it ties into the previous issue.

Your composition solution gets rid of this whole issue, or at least changes the discussion. The question before was: how do we know when to build W, and how do we know when to generate a function that does W*x? But if we have generic operator composition, then we always build the composed W operator and give that to the linsolve routine (maybe with a cache as well), and then it handles it as necessary (for example, materializes the W and then factorizes). So if there's one W we might as well call it W.
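A sketch of that default dense path when handed the composed operator (Matrix(W) stands in for whatever materialization the operator type supports):

    using LinearAlgebra

    # Default routine: materialize the lazy W, factorize, backsolve.
    function dense_linsolve!(x, W, b)
        A = Matrix(W)        # materialize the composed operator
        ldiv!(x, lu!(A), b)  # in-place LU, then solve
        return x
    end

A matrix-free override would instead hand W straight to an iterative solver such as GMRES.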

Why do we need a separate W parameter in diffeq_nlsolve! anyway? Wouldn't we just use cache.W in all cases?

Good idea.

What about the other direction, i.e. have the in-place version also support ForwardDiff and possibly other approximations?

In-place already supports ForwardDiff.jl and DiffEqDiffTools.jl. That's what this is all for: https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/derivative_wrappers.jl .

@MSeeker1340
Contributor

This isn't thread-safe, which will give issues with things like DiffEqMonteCarlo. We should keep thread-safety, especially with the coming threading changes. I wanted to address that first because I want it to be clear that this is a goal we want to keep in mind.

That's a good point. Yes, I will tackle this first in DiffEqOperators, probably via deepcopy. Incidentally, this can give us a way of serializing AbstractDiffEqLinearOperator, which can prove useful in some situations.
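A minimal illustration of the deepcopy-based isolation, with a plain array standing in for the operator prototype:

    jac_prototype = zeros(2, 2)   # stand-in for an operator prototype
    copies = Vector{Matrix{Float64}}(undef, 4)
    Threads.@threads for i in 1:4
        local_J = deepcopy(jac_prototype)  # independent copy per task
        local_J .= i                       # mutate without racing others
        copies[i] = local_J
    end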

But if we have generic operator composition, then we always build the composed W operator and give that to the linsolve routine (maybe with a cache as well), and then it handles it as necessary (for example, materializes the W and then factorizes). So if there's one W we might as well call it W.

Oh, I see. I was under the impression that there might be cases where W is preferred to be calculated as just a dense matrix. Since lazy composition is preferred, there is indeed no need for lazyW.

@ChrisRackauckas
Member Author

Oh, I see. I was under the impression that there might be cases where W is preferred to be calculated as just a dense matrix. Since lazy composition is preferred, there is indeed no need for lazyW.

Well, it's only ever used to linear solve, so I think it's fine to hand the composed operator over to the users and let them make the choice for how to handle it, with the default routines doing the same thing as before, which is just materializing the matrix and then factorizing. To get to where we were before, we'd want to have an allocation-free way of materializing, but that sounds like a generally useful thing to plan for in the operator handling anyway.
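An allocation-free materialization could reuse the existing W loop, just targeting a preallocated cache; a hypothetical helper (assumes M and J support indexing, which holds for I and plain matrices):

    # Write M - γ*J into dest without allocating.
    function materialize!(dest::AbstractMatrix, M, γ, J)
        @inbounds for j in axes(dest, 2), i in axes(dest, 1)
            dest[i, j] = M[i, j] - γ * J[i, j]
        end
        return dest
    end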

@MSeeker1340
Contributor

Well... another thing we should ask is whether having the DiffEqFunction as a separate type is a good idea. After going through with all of it, could it be better to slap all of those fields into the problem type itself? I don't want to dwell too much on details since it's always easy to move this stuff around, and I don't want to delay actual work due to bikeshedding.

I can see where this is going. Now that every *DEProblem includes its own *DEFunction, it does raise the question of whether this separation of problem and function is necessary. I guess one advantage of having them separate is that a single instance of a *DEFunction can be used by multiple problems that have the same right-hand side but different u0 and/or tspan?

I looked into deepcopy and it does support AbstractDiffEqOperators out of the box. (It is still not 100% thread-safe if some update_func of an operator is a closure. But having update functions rely on outside variables other than (u,p,t) is bad practice anyway.)

(By the way, one convenience of 2 is that if an AbstractDiffEqLinearOperator is given as jac_prototype to the constructor of ODEFunction, then it can automatically set f.jac to be update_coefficients!.)

About this... I realized DiffEqOperators is not a requirement of DiffEqBase, so this is not possible. Of course, I could just add it to REQUIRE, but I'm not sure that's a good idea from a package-development standpoint. Instead, I can add an additional clause to calc_J! that checks if f.jac == nothing AND isa(f.jac_prototype, AbstractDiffEqLinearOperator), in which case it will invoke the update_coefficients! method. This is uglier than having f.jac automatically set, but it should get the job done.
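A sketch of that clause with a simplified calc_J!-style signature (hypothetical name; the real function also threads through the integrator and cache):

    using DiffEqBase: AbstractDiffEqLinearOperator, update_coefficients!

    function calc_J_sketch!(J, f, u, p, t)
        if f.jac !== nothing
            f.jac(J, u, p, t)                 # user-supplied analytic Jacobian
        elseif f.jac_prototype isa AbstractDiffEqLinearOperator
            update_coefficients!(J, u, p, t)  # operator updates itself
        else
            # fall back to ForwardDiff / finite differencing here
        end
        return J
    end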

@ChrisRackauckas
Member Author

(It is still not 100% thread-safe if some update_func of an operator is a closure. But having update functions rely on outside variables other than (u,p,t) is bad practice anyway.)

No, it will be thread-safe since deepcopy will make a copy of those internal variables.

About this... I realized DiffEqOperators is not a requirement of DiffEqBase, so this is not possible. Of course, I could just add it to REQUIRE, but I'm not sure that's a good idea from a package-development standpoint. Instead, I can add an additional clause to calc_J! that checks if f.jac == nothing AND isa(f.jac_prototype, AbstractDiffEqLinearOperator), in which case it will invoke the update_coefficients! method. This is uglier than having f.jac automatically set, but it should get the job done.

While having all of the extra operators defined in a separate library is nice, I'm not sure all of the structure should've been moved to a separate package.

@MSeeker1340
Contributor

Ah wait, update_coefficients! is actually defined in DiffEqBase itself, which means I can use it directly.

I can finally see why it is a good idea to define the abstract interface for the operators in DiffEqBase and not have DiffEqOperators be self-contained ;)

@YingboMa
Member

Personally I'm more in favor of 2 but @ChrisRackauckas it's your call how this should be handled. @YingboMa I'd also like to hear your opinion on the matter.

Yeah, I also like option 2.

Why do we need a separate W parameter in diffeq_nlsolve! anyway? Wouldn't we just use cache.W in all cases?

I would like the function diffeq_nlsolve! to have almost the same signature for both OrdinaryDiffEqConstantCache and OrdinaryDiffEqMutableCache.

@MSeeker1340
Contributor

Now that work on lazy W is starting, I'd like to discuss the variable dtgamma issue in more detail, which I kinda glossed over in the previous discussions.

Conceptually, the lazy W operator can be constructed as (for the moment let's keep it simple and assume both mass_matrix and J are AbstractDiffEqLinearOperators):

γ = DiffEqScalar(dtgamma)
W = mass_matrix - γ * J

If dtgamma is a constant then that's it: updating W is as simple as calling update_coefficients! on it. But what if dtgamma changes? I can think of a few solutions:

  1. We stick to the update_coefficients! interface, which means adding an update function to γ that reads p.dtgamma and changes its value accordingly. In this way, updating W is still just update_coefficients!. However, the problem is that when constructing the ODE problem, dtgamma is certainly not in the user-provided p, so we would need to tag it along. This can be difficult because we're not putting any type restriction on p. Furthermore, for this to work p would also need to be mutable.

  2. We break the interface and set the value of γ directly. Specifically, W is still constructed in the same way (with γ being a constant operator). In calc_W!, we call update_coefficients! not on W as a whole but on J. Then we do setval!(γ, dtgamma). This has the same effect on W as the first option. Note that for this to work we also need to store both γ and J in the cache. (It's true that we could access them by probing W's internals, but I don't think that is safe.)

  3. This is a slight variation on option 2. We define a special type (say WOperator) which has mass_matrix, dtgamma, and J as its fields. It behaves exactly like the previous composed operator, but has a special method (say setgamma!) that changes dtgamma. calc_W! still behaves the same way, but instead of update_coefficients! on J we call it on W directly, and instead of setval!(γ, dtgamma) we do setgamma!(W, dtgamma). The advantage of this is that only W as a whole needs to be stored in the cache; a minimal sketch follows this list.
    (By the way, another advantage of having a separate type is that we can keep mass_matrix as a regular matrix/I and don't need to wrap it in an operator, as is required for options 1 and 2.)
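A minimal sketch of option 3 (all names illustrative): the type holds the three pieces, setgamma! swaps out dtgamma, and the operator action applies M and J lazily without ever forming W:

    using LinearAlgebra

    mutable struct WOperator{TM,TJ}
        mass_matrix::TM   # can stay a plain matrix or I
        gamma::Float64
        J::TJ
    end

    setgamma!(W::WOperator, gamma) = (W.gamma = gamma; W)

    # Lazy action: out = M*v - gamma*(J*v).
    function LinearAlgebra.mul!(out, W::WOperator, v::AbstractVector)
        mul!(out, W.J, v)
        out .*= -W.gamma
        out .+= W.mass_matrix * v  # allocates; fine for a sketch
        return out
    end

    # Materialize when a factorization-based linsolve needs it.
    Base.Matrix(W::WOperator) = W.mass_matrix - W.gamma * Matrix(W.J)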

@ChrisRackauckas
Member Author

I don't like 1. Constructing the lazy W each time is cheap so I think that's fine, but making it its own special type could be useful since it will be user-facing in the linsolve interface. What kind of interface can we put on it and would it make it much easier for users to interact with than having it be a standard operator?

I think it would be cool if WOperator could have its own cache, so that factorization would be implemented as:

# Lazily allocate the dense cache for W on first use:
if !have_cache(W)
  allocate_cache!(W)
end

# Only re-materialize and re-factorize when W actually changed:
if matrix_updated
  materialize!(W) # Writes to the internal cache the actual W matrix
  p.Wfac = factorize!(W) # uses the internal cache
end

ldiv!(x,p.Wfac,b) # backsolve with the stored factorization

essentially this is saying "W has the possibility of having a dense cache associated with it. If it's not allocated, for my linsolve I'll need it, so allocate it". This should only allocate on the first iteration. Then on the next iterations we use it: "If W updates, then build the dense/sparse W matrix in memory (into the already-allocated cache) and factorize that". And then each call uses the factorization. Then matrix-free usage would just be like:

gmres(x,W,b)

and use it directly without factorizing.

Thoughts?

@MSeeker1340
Contributor

essentially this is saying "W has the possibility of having a dense cache associated with it. If it's not allocated, for my linsolve I'll need it, so allocate it". This should only allocate on the first iteration. Then on the next iterations we use it: "If W updates, then build the dense/sparse W matrix in memory (into the already-allocated cache) and factorize that". And then each call uses the factorization.

Interesting. Kinda reminds me of lazy constructs in languages such as C#.

But I think option 3 is the way to go by the looks of it. Having a dedicated type for W makes it way more versatile than generic composition, your allocate-as-needed cache being one example.

I will start with a purely lazy WOperator for the moment, and after that see if I can incorporate your cache idea into it.
