Next-generation nonlinear #846
Comments
Thanks @blegat, at a high level this sounds reasonable to me. Does this require breaking the NLP block into separate MOI constraints, or is there an intermediate step where we first address that point?
The first step does not require it, but resolving the whole issue does.
At a high level this seems fine. Don't forget that people want to query derivatives from a JuMP model without acting as a solver (http://www.juliaopt.org/JuMP.jl/v0.19.2/nlp/#Querying-derivatives-from-a-JuMP-model-1). It's not clear how that fits in with bridges. The big detail I'm worried about is the specification of the expression graph format. Using Julia expression graphs would be pretty inefficient because every term in the expression is a GC-tracked object.
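One way to avoid GC-tracked `Expr` trees, sketched here as a purely hypothetical layout (none of these types exist in MOI or JuMP), is to store the expression graph as a flat vector of immutable, index-based nodes, as AMPL-style evaluators do:

```julia
# Hypothetical sketch only: a flat expression tape. Because `Node` is an
# immutable bits type, a Vector{Node} is one contiguous allocation with no
# per-node GC tracking, unlike a tree of `Expr` objects where every
# subexpression is a separate heap object.
struct Node
    op::Int32    # operator id, e.g. 0 => variable, 1 => :+, 2 => :sin
    arg1::Int32  # for op 0: the variable index; otherwise first child node
    arg2::Int32  # second child node index, or 0 if unary / leaf
end

# sin(x[1] + x[2]) as a tape; the last node is the root.
tape = [
    Node(0, 1, 0),  # node 1: leaf, variable x[1]
    Node(0, 2, 0),  # node 2: leaf, variable x[2]
    Node(1, 1, 2),  # node 3: x[1] + x[2]
    Node(2, 3, 0),  # node 4: sin(node 3)
]

# A single forward pass evaluates the tape without walking a pointer tree.
function evaluate(tape, x)
    vals = similar(x, length(tape))
    for (i, n) in enumerate(tape)
        vals[i] = n.op == 0 ? x[n.arg1] :
                  n.op == 1 ? vals[n.arg1] + vals[n.arg2] :
                  sin(vals[n.arg1])
    end
    return vals[end]
end

evaluate(tape, [0.5, 1.0])  # sin(1.5)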
From the perspective of SCIP (consuming expression graphs directly), this sounds good to me. I understand we would not need any bridge then, and could easily port the
Is anyone currently working on this? If so, I'd be happy to pitch in and collaborate. Otherwise, I'll start taking a pass at it. Just let me know.
No one is working on this. But this is a massive project to undertake (on the order of months). We're working on getting funding to support a sustained development effort, but things are still uncertain due to COVID etc. It won't happen before JuMP 1.0. If you have development time for NLP, it's probably better to fix some of the outstanding issues in Ipopt etc.
Migrating Pavito, Pajarito, and Alpine to MOI would be good near-term NLP activities.
@odow Entirely understood. I'll keep pushing out fixes as I come across them in the meantime. That said, my interest in this is somewhat selfish. There's a family of nonsmooth NLP solvers which require generalized derivative information (lexicographic or Clarke type) that I'm getting interested in. So being able to specify a backend other than the standard AD scheme is of interest. I can do this currently by grabbing the NLPEvaluator from JuMP and manipulating it to construct a new evaluator. That said, this is a bit clunky (and prone to breaking when someone updates the AD in JuMP). I've also had a few people chat with me about wanting to extract computational tapes from JuMP models (for testing things akin to forward-reverse interval contractors and so on…). So, while this is definitely a long-term task, I'm interested in starting to lay the MOI foundation for it. @ccoffrin I'd be happy to do that if I ever have some genuinely free time 😊. That said, there are some research activities I'm looking at that may benefit from a more extensible NLP interface (which is motivating my interest in this).
This isn't going to change in the near-term.
The first change to the AD in JuMP will be to enable this feature :)
Awesome! Thanks for the clarification.
I started a Google doc outlining the high-level tasks required for a rewrite of NLP support in JuMP: https://docs.google.com/document/d/1SfwFrIhLyzpRdXyHKMFKv_XKeyJ6mi6kvhnUZ2LEEjM/edit?usp=sharing Comments and suggestions for a more detailed design are appreciated.
It has been on my TODO for literally 6 months to talk to you and work out how the ChainRules project fits into this story.
@oxinabox I think if the plan to make a more generic derivative backend works out, then ChainRules could be a part of that! The goal would be to have a more generic differentiation interface so users can pick solutions that are well suited to their specific use case.
@blegat @odow I was thinking of next steps for the CP sets, and I was wondering how to support nonlinear functions (like absolute value). Based on the current thoughts about this NLP issue, do you think adding
I don't see the harm in defining it. There are no real plans for other expression types. One option is AMPL-esque:
OK, thanks! At least such a solution could fit better with bridges, I think.
There are a few demos already well beyond that 😅. We need symbolic array support to be completed to make a few things nicer, though. @blegat added an MOI binding to GalacticOptim.jl, so we'll be making use of all this in the background.
@ChrisRackauckas I had initially not studied ModelingToolkit a lot, because it does not seem to be well equipped for discreteness. I have listed five pain points for now. Maybe they could be solved by using Symbolics.jl instead of ModelingToolkit.jl?

1. Counting

How do you make something like this work? This could be the way a user enters a constraint:

```julia
@parameters t σ ρ β
sum([t, σ] .~ [ρ, β])
```

For now, this code yields an error.

2. Global constraints for optimisation

Also, not all constraints are equality or inequality: you might have

3. Lack of differentiability for some functions

At its core, ModelingToolkit seems to be excellent for computing derivatives, but what about functions that do not have a derivative (at least not in a classical sense)? How does the package work when there is no way to compute a derivative? How strong is the differentiability hypothesis? (Also, I guess it would be useful to deal with functions that are not differentiable everywhere and to compute subgradients.)

4. Fancy rule-based rewriting

I didn't see a lot of ways to add new rules for rewriting expressions in ModelingToolkit.jl, only in Symbolics.jl. However, to get to an optimisation solver, it's not sufficient to simplify expressions; you have to reach a form that the targeted solver allows (like second-order cones and linear constraints). Is there a way to write this kind of transformation on top of ModelingToolkit.jl/Symbolics.jl, i.e. without throwing tens of thousands of lines of code at it? (For instance, MOI already has an excellent solution to this problem with bridges.) Also, you mention that GalacticOptim.jl has MOI support, but only for the current, legacy NLP interface: the current code does not seem to be able to extract information about the problem, like linear constraints. I bet that such a system would be more complex to write.

5. Not all parameters are optimisation variables

You might want to keep some values in your constraints variable, for instance to reoptimise quickly when just a few values change (https://github.com/jump-dev/ParametricOptInterface.jl/blob/master/docs/src/example.md). Maybe this can be solved by having the user specify which parameters are optimisation variables and which ones are not. The problem that arises with this kind of parameter is that the product of two optimisation variables is unlikely to be convex, but if one of them is fixed then the expression is linear. This kind of information should be available to the rewriting system.
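The bilinear-versus-affine point above can be seen directly in JuMP today: `fix` marks a variable as constant, which is exactly the kind of status information a rewriting system would need. A minimal sketch:

```julia
using JuMP

model = Model()
@variable(model, x)
@variable(model, p)  # plays the role of a "parameter"

# x * p is bilinear, hence nonconvex in general.
@expression(model, bilinear, x * p)

# If p is fixed, the same product is affine in x; a rewriting system that
# knows p's status could classify constraints involving it as linear.
fix(p, 2.0)  # p now behaves as a constant from the solver's viewpoint
```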
ModelingToolkit.jl is built on Symbolics.jl. Anything that Symbolics.jl can represent can go into the ModelingToolkit.jl types, which give it a modeling frontend.
The symbolic system is as confused as I am. What are you trying to do there? Build an equation summing the left and right sides? We probably could specialize
Yeah, we'd have to add that expression.
```julia
using Symbolics
@variables a::Int b::Bool
```
It uses the same rule expressions as the AD system, so it will handle
Good, because there's no such thing as a ModelingToolkit.jl expression. All expressions are Symbolics.jl expressions.
Yeah, this is rather trivial really. It's definitely a good application of E-graphs. I do have a student looking into this, though I think the main focus of GalacticOptim.jl right now is really on nonlinear support since that's the bulk of the use cases.
Look at the definition of `OptimizationProblem`. There are states (optimized for) and parameters (not optimized for).
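For reference, a minimal example of that states/parameters split, based on the GalacticOptim API at the time of this discussion (these names may have since moved to Optimization.jl):

```julia
using GalacticOptim, Optim

rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2

f  = OptimizationFunction(rosenbrock, GalacticOptim.AutoForwardDiff())
x0 = zeros(2)       # states: the variables being optimized
p  = [1.0, 100.0]   # parameters: held fixed during the solve

prob = OptimizationProblem(f, x0, p)
sol  = solve(prob, Optim.BFGS())  # minimizer approaches [1.0, 1.0]
```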
Thanks for your detailed answer! Regarding the counting question, here is what the same code does in plain Julia:

```julia
t = 1
σ = 2
ρ = 1
β = 4
[t, σ] .== [ρ, β]       # BitVector: [true, false]
sum([t, σ] .== [ρ, β])  # 1: one of the two equalities holds
```
Yes, that makes sense. Open an issue on Symbolics.jl?
Here it is: JuliaSymbolics/Symbolics.jl#322
Reposted from #1538. @frapac, from our discussion:

Modeling languages are great, but for many nonlinear problems you just want to provide an objective and constraints to the solvers with minimal hassle. In economics, in particular, the equations might be impossible to even write in existing modeling languages. In other cases, maybe you want to use all sorts of Symbolics.jl/ModelingToolkit.jl trickery, but in the end have a function which just operates on component arrays. Basically, you just want to call the optimizers with as native primitives as possible and without any DSL getting in the way. This is in the same spirit as https://github.com/jump-dev/MatrixOptInterface.jl, I believe.

In its current form, JuMP's NLP support is very scalar-centric, and this was one of the things that led to the creation of https://github.com/SciML/GalacticOptim.jl (note, however, that the hope was that the scope of GalacticOptim would eventually expand into far more interesting things, largely around applying problem reduction as well as other fancy AD things).

On the MOI side, there might not be that many things missing for a more direct pass-through. We would probably need dense Jacobians and Hessians, and I think that anything that needs to allocate will be a problem if it wants to do pass-through to GPU-style algorithms; it might also make pass-through of things like component arrays difficult. But that stuff could come later.

What might this look like to a user? First, it is useful to have an object to collect derivatives and make different AD backends convenient. I will describe the ones from GalacticOptim, but the gradient wrappers are something that could be standardized and shared in SciMLBase, or just made specific to JuMP/MOI. For example, https://github.com/SciML/GalacticOptim.jl/tree/master/src/function provides some wrappers for all sorts of AD. It could do something like the following. Write your Julia function (SciML allows passing in a parameter vector):

```julia
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
f = OptimizationFunction(rosenbrock)
```

which would dispatch to the default auto-differentiation. Or, to use Zygote or finite differences:

```julia
OptimizationFunction(rosenbrock, GalacticOptim.AutoZygote())
OptimizationFunction(rosenbrock, GalacticOptim.AutoFiniteDiff())
```

Or to pass in your own derivatives:

```julia
rosenbrock_grad(x, p) = 2 * p[1] * x[1]  # + WHATEVER IT IS
OptimizationFunction(rosenbrock, rosenbrock_grad)
```

Finally, if the problem is constrained, then the current GalacticOptim (which could be changed) can generate the Jacobians and Hessians as required. The current design has an optional `cons` keyword:

```julia
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
function con2_c(x, p)
    [x[1]^2 + x[2]^2,
     x[2] * sin(x[1]) - x[1]]
end
OptimizationFunction(rosenbrock, GalacticOptim.AutoForwardDiff(); cons = con2_c)
```

First, consider if there were MOI extensions instead. @frapac suggested something like:

```julia
model = MOI.nlpsolve(f, g!, h!, p, x_lb, x_ub, c_lb, c_ub)
attach_linear_constraints!(model, A, b)
attach_quadratic_constraints!(model, Q, q, d)
```

which I would hope could have a "differentiation wrapper" version:

```julia
p = # the parameters
fun = OptimizationFunction(rosenbrock, GalacticOptim.AutoForwardDiff(); cons = con2_c)
model = MOI.nlpsolve(fun, p, x_lb, x_ub, c_lb, c_ub)
attach_linear_constraints!(model, A, b)
attach_quadratic_constraints!(model, Q, q, d)
```

Note that I added parameters in the vector `p`. In fact, if this existed, then I think GalacticOptim could be extremely short and almost not exist. Or it could just be the

The last piece of GalacticOptim that might be worth considering is its role in auto-differentiation rules for the optimizers themselves. We want to be able to establish AD rules for optimizers themselves (e.g. https://en.wikipedia.org/wiki/Envelope_theorem), but that might be better discussed down the line. To be rrule-friendly, we would probably want to avoid the mutation of adding constraints ex post, and instead have them in the constructor of the problem type. Mutation makes for difficult rules.

From the "DSL" perspective, I think that JuMP might actually work with a few constraints on supported features, although I think I would prefer the MOI extension instead. In particular, if you want to provide a "vectorized" objective (and optionally a nonlinear constraint) then:
Otherwise, I think that the existing JuMP and MOI machinery around linear constraints would be perfectly fine. The user would only have a single vector variable to work with, but I think that is fine. Alternatively, if you wanted this to live in JuMP, then, to see possible usage in a one-variable-limited JuMP, let me adapt https://jump.dev/JuMP.jl/stable/tutorials/Nonlinear%20programs/rosenbrock/:

```julia
using JuMP
import Ipopt
import Test
import AutoDiffWrappersFromWherever

function example_rosenbrock()
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    # call wrappers discussed above
    rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
    obj = OptimizationFunction(rosenbrock, AutoFiniteDiff())
    @variable(model, x[1:2])  # or @passthrough_variable(model, x[1:2])
    @NLvector_objective(model, Min, obj)
    # ERRORS IF MODEL HAS MORE THAN ONE VARIABLE OR IF SCALAR
    optimize!(model)
    Test.@test termination_status(model) == MOI.LOCALLY_SOLVED
    Test.@test primal_status(model) == MOI.FEASIBLE_POINT
    Test.@test objective_value(model) ≈ 0.0 atol = 1e-10
    Test.@test value.(x) ≈ [1.0, 1.0]
    return
end
```
```julia
function example_constrained_rosenbrock()
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    # call wrappers discussed above
    rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
    function con2_c(x)
        [x[1]^2 + x[2]^2,
         x[2] * sin(x[1]) - x[1]]
    end
    obj = OptimizationFunction(rosenbrock, AutoFiniteDiff(), cons = con2_c)
    @variable(model, x[1:2])
    @NLvector_objective(model, Min, obj)
    # THIS ADDS THE CONSTRAINT AS WELL. HESSIAN OF LAGRANGIAN ATTACHED
    optimize!(model)
    Test.@test termination_status(model) == MOI.LOCALLY_SOLVED
    Test.@test primal_status(model) == MOI.FEASIBLE_POINT
    Test.@test objective_value(model) ≈ 0.0 atol = 1e-10
    Test.@test value.(x) ≈ [1.0, 1.0]
    return
end
```

What would
`ScalarNonlinearFunction` is merged. I'll hold off closing this until we add `VectorNonlinearFunction`.
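For readers landing here later: `ScalarNonlinearFunction` stores an expression as an operator head plus a vector of arguments, which may themselves be nested functions, variables, or constants. A small example against the MOI API (as of MOI 1.x):

```julia
import MathOptInterface as MOI

model = MOI.Utilities.Model{Float64}()
x = MOI.add_variable(model)

# Build log(x)^2 as a nested ScalarNonlinearFunction.
f = MOI.ScalarNonlinearFunction(
    :^,
    Any[MOI.ScalarNonlinearFunction(:log, Any[x]), 2.0],
)

# Set it as the objective, like any other MOI function type.
MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE)
MOI.set(model, MOI.ObjectiveFunction{typeof(f)}(), f)
```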
Closing as done. We now have
We discussed with @ccoffrin, @harshangrjn and @kaarthiksundar what the next-generation NLP support could look like. Here is a gist of the idea:

We create two new function types. First, `NonlinearExpressionFunction`, which will be used by JuMP to give the objective function and the constraints (as `NonlinearExpressionFunction`-in-`EqualTo`/`GreaterThan`/`LessThan`) to the MOI backend. Then, `SecondOrderBlackBoxFunction` will be what NLP solvers such as Ipopt support. But NLP solvers such as Alpine can directly support `NonlinearExpressionFunction`, as they exploit the expression structure.

We can then define the bridges `AffinetoSecondOrderBlackBox` and `QuadratictoSecondOrderBlackBoxBridge`, which transform respectively `ScalarAffineFunction` and `ScalarQuadraticFunction` into `SecondOrderBlackBoxFunction` (as is currently done in the Ipopt MOI wrapper, for instance).

The transformation from `NonlinearExpressionFunction` to `SecondOrderBlackBoxFunction` can be achieved by an `AutomaticDifferentiationBridge{ADBackend}`, which computes the callbacks using `ADBackend`.

We can move the current AD implementation from JuMP to `MOI.Utilities` into a `DefaultADBackend`, and `full_bridge_optimizer` adds `AutomaticDifferentiationBridge{DefaultADBackend}`. If the user wants to change the AD backend to `OtherADBackend`, there will be a JuMP function that internally removes `AutomaticDifferentiationBridge{DefaultADBackend}` and adds `AutomaticDifferentiationBridge{OtherADBackend}`.

@kaarthiksundar would be interested in working on this, and we thought a good first step is to define `SecondOrderBlackBoxFunction` and create the bridges `AffinetoSecondOrderBlackBox` and `QuadratictoSecondOrderBlackBoxBridge`.

Let us know what you think!
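To make the proposal concrete, here is a purely hypothetical sketch of what `SecondOrderBlackBoxFunction` could look like; the field names and signatures are illustrative, not a settled design:

```julia
import MathOptInterface as MOI

# Hypothetical: a scalar function given only through callbacks, i.e. the
# form a second-order NLP solver such as Ipopt ultimately consumes.
struct SecondOrderBlackBoxFunction <: MOI.AbstractScalarFunction
    n::Int                   # dimension of the input vector x
    eval_f::Function         # x -> f(x)
    eval_gradient::Function  # (g, x) -> fills g with ∇f(x) in place
    eval_hessian::Function   # (H, x) -> fills H with ∇²f(x) in place
end

# A bridge like QuadratictoSecondOrderBlackBoxBridge would then build these
# callbacks from a ScalarQuadraticFunction; e.g. for f(x) = ½xᵀQx + qᵀx:
#   eval_f        = x -> 0.5 * x' * Q * x + q' * x
#   eval_gradient = (g, x) -> (g .= Q * x .+ q)
#   eval_hessian  = (H, x) -> (H .= Q)
```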