RFC: NLP interface #32
I like it. Obviously it will work very nicely with IPOPT.
Sure, I'd be interested at least in chiming in. In many realistic problems, the Hessian and Jacobian aren't available as explicit matrices, but only as operators. So there's a need for evaluating Hessian-vector products, Jacobian-vector products, and transpose-Jacobian-vector products, i.e., H*v, J*v, and J'*u. IPOPT requires explicit matrices for everything because it's factorization-based, but for iterative solvers, it makes more sense to rely only on products.
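To make that concrete, here is a small illustrative sketch (not from any existing package) of a conjugate-gradient solve that touches the Hessian only through a product closure, never the matrix itself:

```julia
using LinearAlgebra

# Conjugate gradient on H*y = -g that accesses the Hessian only through a
# product closure hprod(v) = H*v; H is never formed or factorized.
function cg_hprod(hprod, g; iters = 10)
    y = zero(g)
    r = -g                      # residual of H*y = -g at y = 0
    p = copy(r)
    for _ in 1:iters
        Hp = hprod(p)           # the only access to H: one product per iteration
        α = dot(r, r) / dot(p, Hp)
        y += α * p
        rnew = r - α * Hp
        β = dot(rnew, rnew) / dot(r, r)
        p = rnew + β * p
        r = rnew
    end
    return y
end

# Usage, with an explicit matrix standing in for the abstract operator:
H = [4.0 1.0; 1.0 3.0]
g = [1.0, 2.0]
y = cg_hprod(v -> H * v, g)     # ≈ -(H \ g)
```

A factorization-based solver would need the entries of H here; the iterative solver only ever needs `hprod`.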
@dpo do you know of an open-source iterative solver we can look at that might use those sorts of things?
Just to chime in here, one of the reasons my constrained optimization branch for Optim looks so different, and therefore remains unmerged, is that these are issues for me too. I can't pretend to know that this approach is best, but I've generally settled into a pattern where I use conjugate gradient as my core algorithm and accumulate the gradient of the objective and of all the constraints (of which there can be as many as 10^7 individual constraints) into a single gradient vector for some specific value of the barrier coefficient; a sketch of that pattern follows below. Matrix algebra, even sparse matrix algebra, tends to be pretty dang slow under such circumstances. Why spend all your time fussing over the exact inversion of a huge sparse matrix (probably using preconditioned conjugate gradient) as an "inner" step during a single iteration, when you could instead be investing your CPU cycles making progress on the overall nonlinear conjugate gradient? So for the problems I'm interested in, I'm not yet a huge fan of techniques that rely on inversion/factorization as a core step for optimization.

As an extra but (now) much less important point, for my most important problem it took almost two weeks to analytically calculate and numerically implement the gradient. (It's a nasty, complicated expression, and I was doing this in the early days of Julia before we had AD.) Certainly, I'm leery of analytically calculating the second derivative too 😄. Now that we have AD I doubt this is such a concern, unless AD has substantially worse performance than hand-calculated expressions. I haven't tested.
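A minimal sketch of that accumulation pattern, assuming a log-barrier formulation with inequality constraints c_i(x) > 0 (illustrative only, not Optim's actual code):

```julia
# Gradient of the barrier objective f(x) - mu * sum(log(c_i(x))):
# everything is folded into one dense gradient vector, so the outer
# nonlinear CG never forms or factorizes a matrix.
function barrier_gradient!(g, x, grad_f!, constraints, mu)
    grad_f!(g, x)                      # g .= grad f(x)
    gc = similar(x)                    # scratch space for constraint gradients
    for (c, gradc!) in constraints     # each constraint supplies value and gradient
        gradc!(gc, x)
        g .-= (mu / c(x)) .* gc        # g .-= mu * grad c_i(x) / c_i(x)
    end
    return g
end
```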
We should also include @johnmyleswhite here.
To @IainNZ and @timholy: I think the main example of a general-purpose open-source optimization solver for nonconvex nonlinear programming that can be switched to use either direct or iterative linear algebra is Lancelot. Dominique knows better than I do, but I'm not aware of any other general-purpose solvers using iterative linear algebra that have demonstrated robustness and good performance on standard test sets. It's something a lot of people are working on, but most of the implementations that I know of are still experimental, problem-specific, or shoehorned into existing APIs.

To @mlubin: is the idea here to abstract out a general-purpose nonlinear programming API that can provide every possible evaluation routine that any solver or algorithm could ever need, with the modeling frameworks (for now mostly JuMP, but potentially others later) responsible for implementing as many of the API functions as possible for any given problem class? It's worth doing a survey of what's out there aside from Ipopt (which is great and I use it heavily, but its API is representative of interior point algorithms using direct linear algebra), and of what other classes of algorithms are under development. What about augmented Lagrangian methods, for example: have you taken a close look at the API for Lancelot? There's also Opt++ (https://software.sandia.gov/opt++/), which I've read a little about but not thoroughly evaluated, and TAO (http://www.mcs.anl.gov/research/projects/tao/), which I think still can't handle general constraints but uses PETSc for all its linear algebra.

How about active set or reduced space methods: would those tests and transformations have to be done at a solver-specific level, or can they be abstracted to a solver-independent interface here? Same with constraint projection or proximal operators, for classes of problems where those methods apply.

Would it be worth the added complexity of allowing linear and nonlinear constraints and/or variables to be provided separately, or indicated by the user (modeling framework) in some manner? Constraint linearity makes little algorithmic difference to Ipopt, but Minos and Snopt (and MINLP solvers like Bonmin and Couenne) can take advantage of it. Snopt typically beats Ipopt in number of iterations, but its SQP iterations are more expensive than Ipopt's interior point steps unless the function evaluations are very slow.

The number of times I heard "the prototype implementation is in Matlab" over the past week at SIOPT made me a little sad; here's hoping JuliaOpt can start fixing that problem. I did some evangelizing during my talk, and in conversations with Laurent El Ghaoui and others.
To be honest, in terms of being a platform to develop new optimization algorithms, until we have the equivalent of … The other thing we need to do is get that CUTEst wrapper figured out. I ran into some roadblocks in my own testing, and haven't had time to keep plugging away at it.
OT and mostly separate from this proposal, but you're quite right that you have to be able to keep a close eye on what's going on when writing a prototype. The part that makes me sad is how many of these Matlab prototypes will never get turned into something seriously useful, since the barrier to translating them into a real C++ implementation is so high and beyond the scope of most PhDs.
There are a lot of things in optimization that might be shared across solvers, but we'd go crazy trying to implement them all. So that suggests there is some threshold: in the extreme, if only one solver supports something, it's not ready to be abstracted. We just need to reach a consensus on where that threshold lies. The "IPOPT style", which we know has multiple solvers and in fact already has a Julia interface, is of course the natural place to start. I think your point about iterative solvers is also a good use case, although it still seems fairly cutting-edge. We've also been talking about MINLP solvers around here, which need/want an expression tree, but that's already at the point where maybe we are jumping the gun. A thought occurs, though: maybe providing a standardized interface encourages the development of new solvers by reducing effort? Hmmm...

Anyway, I doubt we are going to get it right the first time. It'd help to have more solvers wrapped in Julia before trying to guess a general interface for them, but maybe as a proxy we should assemble a little document that lists the solvers we'd be interested in one day wrapping, with a few words about each of their interfaces?
Definitely start somewhere, you can afford some rounds of refinement early on. Supporting matrix-free / iterative interfaces will be especially important for parallel algorithms, where the optimization community (with a few notable exceptions) is falling increasingly behind the times. Julia wants to be on the cutting edge, no?
Definitely!
Thanks for the feedback, @dpo, @timholy, @tkelman. I would say that a first-order design goal is to provide enough functionality to support the existing open-source and commercial nonlinear solvers that are used in practice; these are typically factorization-based. But I also think that this is an opportunity to provide a foundation for iterative-method-based solvers. We should definitely add methods for evaluating these matrix-vector products.

@tkelman, I was also thinking about specifying whether constraints are linear/quadratic/nonlinear. One can compute this via a depth-first search on the expression graph, so the information is available in principle. We should have helper routines for this (a sketch of such a helper follows below). Are there cases where expression graphs might not be available but we know that constraints are linear or quadratic?

To summarize, we should support the basic operations needed for both factorization-based and iterative solvers. Past that, I think it's reasonable to let the interface grow organically as demand arises for particular functionality, as has happened for the LP/MIP interface.
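As an illustration of the depth-first-search idea, here is a minimal sketch that classifies a scalar Julia expression by polynomial degree; the `exprdegree` helper and its conventions are hypothetical, not part of MathProgBase:

```julia
# Classify a scalar expression by polynomial degree via a depth-first search.
# Returns 0 (constant), 1 (linear), 2 (quadratic), or typemax(Int) (nonlinear).
# For simplicity, variables are bare symbols here; a real implementation would
# recognize whatever variable encoding the expression graph uses.
function exprdegree(ex)
    ex isa Number && return 0
    ex isa Symbol && return 1                 # a decision variable
    ex isa Expr && ex.head == :call || return typemax(Int)
    op, args = ex.args[1], ex.args[2:end]
    degs = map(exprdegree, args)
    any(d -> d == typemax(Int), degs) && return typemax(Int)
    op in (:+, :-) && return maximum(degs)
    op == :* && return sum(degs)              # degrees add under multiplication
    op == :^ && args[2] isa Integer && args[2] >= 0 && return degs[1] * args[2]
    return typemax(Int)                       # exp, log, sin, ... : nonlinear
end

isexprlinear(ex)    = exprdegree(ex) <= 1
isexprquadratic(ex) = exprdegree(ex) <= 2
```

For example, `exprdegree(:(3 * x + y^2))` returns 2, so that constraint would be tagged quadratic.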
@tkelman made lots of good points. Additional information that is useful when you develop new methods includes (you may already have some of those; I didn't check):
I also find it useful to be able to evaluate the Jacobian of the linear and nonlinear constraints separately. For instance, IPOPT could be more fine-grained and evaluate the Jacobian of the linear constraints only once, upon initialization. During the iterations, it would only need to re-evaluate the Jacobian of what's changed. For problems with many linear constraints, that could mean a substantial speedup (sketched below).

The commercial package Knitro uses an inexact Newton/CG approach and relies on Hessian-vector products, but it does need an explicit Jacobian. Lancelot is both a good and a bad example of the matrix-free family: it's true that internally it requires only products with the Hessian and Jacobian, but for some obscure reason it evaluates those as explicit matrices. A student of mine wrote an experimental matrix-free augmented Lagrangian: https://github.com/syarra/nlpy/blob/lancelot-dev/nlpy/optimize/solvers/auglag2.py. It's in the process of being cleaned up. For those who were at SIOPT, it's the code used in Joaquim Martins's plenary talk. I think there's also a matrix-free option in Algencan (http://www.ime.usp.br/~egbirgin/tango/codes.php#algencan).

Speaking for continuous optimization, I can say from experience that the API I exposed in AMPL.jl is sufficient for most cases, except that for some strange reason, AMPL doesn't provide products with the Jacobian and its transpose. It's only recently that I worked on a method that needs to perform those products.

Also, I'm very happy to help with all things CUTEst.
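A minimal sketch of how a solver might exploit that split, with `jac_linear`, `jac_nonlinear`, and `takestep` as caller-supplied closures (none of these names come from MathProgBase):

```julia
# Evaluate the linear-constraint Jacobian once, then re-evaluate only the
# nonlinear rows inside the iteration loop.
function iterate_with_cached_jacobian(jac_linear, jac_nonlinear, takestep, x0;
                                      maxiter = 100)
    Jlin = jac_linear(x0)             # fixed rows: computed once at initialization
    x = copy(x0)
    for _ in 1:maxiter
        Jnl = jac_nonlinear(x)        # the only Jacobian work per iteration
        x = takestep([Jlin; Jnl], x)  # full Jacobian, linear rows stacked on top
    end
    return x
end
```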
@mlubin What you're saying about expression trees makes me think back to my old Dr. AMPL, where the idea is to prove or disprove convexity. (The idea is quite different from CVX, where you build convex problems by construction.) The mechanisms are described in http://dx.doi.org/10.1287/ijoc.1090.0321 and http://dx.doi.org/10.1007/s10287-009-0101-z. It sounds like the same may be doable in Julia. Dr. Julia?
@dpo That would definitely be doable, and likely easier to implement since expression trees in Julia are first-class objects.
CC also @stevengj. We should be able to bring NLopt into the fold.
@dpo Ah yeah, I had forgotten about Algencan and didn't know the details of Knitro. Jacek Gondzio has some matrix-free solvers as well; the published results look good, but the up-to-date implementations are commercial. I thought Martins said he was using Snopt, but anyway there's some good stuff in NLPy (and your results showing MINRES does extra work on saddle-point systems; I'm running into exactly that issue right now).
Where do the expression graphs live? In MathProgBase? I rarely deal with black-box solvers, but that's the main example I can think of where the modeler may know more about the constraints than the code does.
The expression graphs may optionally be provided by the modeling level if they're available. With CasADi, you can specify a vector-valued constraint function whose most natural expression graph is matrix-based rather than scalar-based, so that's an example where the modeling system could determine that the model is linear or quadratic without constraint-wise scalar expression graphs necessarily being available.
Another example: if I were to port over the sparse nonlinear modeling library I put together in Matlab, you also wouldn't necessarily have an expression graph, but you would be able to determine linearity from the properties of the matrices I use to encode the problem. You could construct an expression graph if you wanted one, but it wouldn't usually be necessary. Not sure if it's worth doing for model construction (I'm fairly well sold that macros are faster than operator overloading, if more complicated to understand), but if I can get expression trees out of JuMP in a convenient format, I may take another look at transforming things into my representation for Jacobian/Hessian evaluation purposes.
NLopt currently requires the dense Jacobian for its gradient-based algorithms, mainly because that was the easiest lowest-common-denominator interface to implement. The CCSA/MMA and AUGLAG algorithms in NLopt could potentially be rewritten to use only matrix-vector products with the Jacobian and hence take advantage of sparsity. (I'm particularly fond of the CCSA algorithm: a surprisingly simple and robust globally convergent algorithm for general NLPs, though it doesn't exploit second-derivative information or quasi-Newton approximations thereof. If you're looking to implement something natively in Julia for NLPs, I would try out CCSA as a first pass.)
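For reference, calling NLopt's quadratic CCSA variant from Julia might look something like the sketch below; the problem data is made up for illustration:

```julia
using NLopt

# Minimize (x1 - 1)^2 + (x2 - 2)^2 subject to x1 + x2 <= 2.
function objective(x, grad)
    if length(grad) > 0
        grad[1] = 2 * (x[1] - 1)
        grad[2] = 2 * (x[2] - 2)
    end
    return (x[1] - 1)^2 + (x[2] - 2)^2
end

function constraint(x, grad)
    if length(grad) > 0
        grad[1] = 1.0
        grad[2] = 1.0
    end
    return x[1] + x[2] - 2          # feasible when <= 0
end

opt = Opt(:LD_CCSAQ, 2)             # CCSA with quadratic approximations
opt.xtol_rel = 1e-6
opt.min_objective = objective
inequality_constraint!(opt, constraint, 1e-8)
minf, minx, ret = optimize(opt, [0.0, 0.0])   # strictly feasible start
```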
No equality constraints means not general, for my purposes anyway.
@tkelman, that's true, it's not completely general NLP; the nature of the conservative approximations in CCSA (which finds a sequence of strictly feasible points) requires a non-empty interior of the feasible region (although linear equality constraints can be added fairly easily).
It's an illusion. Inequality constraints don't guarantee a non-empty feasible region (you probably mean a non-empty strict interior, but inequalities don't guarantee that either). You might get around the problem by penalizing constraint violation in the 1-norm: equality constraints c(x) = 0 are converted to -t ≤ c(x) ≤ t, and the new variable t is penalized in the objective (as the sum of its components).
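Written out, this is a standard elastic reformulation (with penalty parameter ρ > 0):

```latex
\min_x \; f(x) \;\; \text{s.t.} \;\; c(x) = 0
\qquad \longrightarrow \qquad
\min_{x,\,t} \; f(x) + \rho \sum_i t_i \;\; \text{s.t.} \;\; -t \le c(x) \le t,
```

so at a solution each t_i = |c_i(x)| and the penalty term equals ρ‖c(x)‖₁, while the reformulated problem has a strictly feasible interior.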
Sorry to be late chiming in:
Yes, I think I'll be all set if one implements an interface that works when provided with those operations.
@dpo, of course inequality constraints don't guarantee a strict interior (e.g. equality constraints can be trivially implemented via inequality constraints), but equality constraints guarantee that you don't have one, and hence they are rejected by the code. The nature of the CCSA algorithm would be entirely changed if you didn't have strictly feasible points; it would be an entirely different algorithm. Of course, you could have a nested algorithm with an increasing penalty in an outer loop and CCSA in the inner optimizations, but I tend to dislike such nested algorithms (including the augmented Lagrangian methods); you want something smarter that doesn't try to converge the inner optimization too accurately when the penalty is large.
👍
@jaeandersson, do you think it's reasonable to start off with only scalar expression graphs for each constraint? This seems more tractable given prior work like .nl files and OSiL.
(in case it wasn't already obvious) I'm a huge fan of OSiL as an interchange format - it's more verbose than AMPL nl, but way easier to parse, construct, and manipulate. I haven't dug into the internals of how JuMP's nonlinear support is currently implemented, but I imagine it should be possible to do a more-or-less direct translation.
I'm not too familiar with the OSiL format, but I imagine we would be able to generate it from JuMP. The only slightly weird syntax we have is …
OSiL has varargs sum and prod nodes. But yes, something in-memory and more compact than XML would be desirable for the purposes of MPB; it might not need to be anything more than Julia Expr objects in the end. Saving, loading, and converting between optimization formats could probably be developed outside of either MPB or JuMP at first, but it would be nice to have a unified format that can go through MPB for interfacing different solvers and/or modeling environments. If anyone here uses Pyomo, leveraging some of their well-developed conversion/import/export capabilities via PyCall could be worth looking into.
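For concreteness, a scalar constraint could live in memory as an ordinary Julia Expr; the x[i] encoding of decision variables below is an illustrative convention, not a settled one:

```julia
# The constraint x1^2 + x1*x2 <= 10 as a plain Julia expression tree,
# with decision variables written as x[i] references.
ex = :(x[1]^2 + x[1] * x[2] <= 10.0)

ex.head        # :call
ex.args[1]     # :<=  (the comparison at the root)
dump(ex)       # prints the full tree structure
```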
An OSiL writer would fit in here as an MPB "solver" that requests the expression trees and writes them out (and perhaps then calls a real solver with the output). Speaking of other languages, one design consideration we should keep in mind for this interface is to make it "easily" C-exportable for the day that Julia code can be compiled to standalone shared libraries.
@dpo @timholy: I've added Jacobian-vector products. Generating the indices of the linear constraints is now a one-liner. I can see that it could be useful to evaluate the Jacobian of the linear and nonlinear constraints separately, although the …
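The actual one-liner isn't preserved on this page, but a plausible reconstruction using a per-constraint linearity query (the name `isconstrlinear` and the `d`/`numconstr` placeholders are assumed here) would be:

```julia
# Hypothetical: collect indices of the linear constraints via a linearity query.
linear_idx = filter(i -> isconstrlinear(d, i), 1:numconstr)
```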
I'm forgetting, of course, that one could want to use the linear equalities to project into a feasible subspace, for example. My primary concern, however, is that the necessary information (linear/nonlinear flags, Jacobian-vector products, and so on) is available. I don't think we can hope to design the interface so that it perfectly matches the computational patterns of every algorithm, but all of the needed information should be obtainable by a simple transformation that, at least, won't become a speed bottleneck.
Yes, or imagine a penalty method that calls a sub-solver to minimize the penalty function. If your sub-solver is able to handle and satisfy linear constraints, there's hope that your penalty parameter won't become artificially large. I'd say you now have a fairly complete interface. You can't possibly predict what all algorithms might need, but that's fine. I never imagined I would ever need the dreaded …
It is definitely much easier. Maybe it could be enough, assuming that you support calling vector/matrix-valued expressions from the scalar expression graphs. It might, however, be hard or impossible to support really large-scale problems if everything is expanded in terms of scalar operations.
I would strongly advise you to coordinate this whole effort with the Pyomo people. I think it's obvious from the Python-Julia interop efforts that cooperation can make both projects stronger.
Returns the sparsity structure of the Jacobian matrix :math:`J_g(x) = \left[ \begin{array}{c} \nabla g_1(x) \\ \nabla g_2(x) \\ \vdots \\ \nabla g_m(x) \end{array}\right]` where :math:`g_i` is the :math:`i\text{th}` component of :math:`g`. The sparsity structure
is assumed to be independent of the point :math:`x`. Returns a tuple ``(I,J)``
where ``I`` contains the row indices and ``J`` contains the column indices of each
structurally nonzero element. These indices are not required to be sorted and can contain duplicates.
"are not required to be sorted" - they can be but don't have to be
This interface is sufficient for supporting continuous NLP solvers, so I'll merge this now. The expression tree interface will be addressed later.
Following #31, here's a proposal for a solver-independent interface to nonlinear programming in MathProgBase.
You can see the compiled documentation here: http://mathprogbasejl.readthedocs.org/en/nlp/nlp.html
@dpo, since you've written an interface to AMPL and have expressed interest in writing solvers in Julia, would you be interested in collaborating on this?
I've just written down what I think is the minimal interface needed to work with the existing solvers like IPOPT and MOSEK. Suggestions for additional functionality to add are welcome.
We might also consider adding support for:

…

although that's a lot to start out with!
CC: @IainNZ @joehuchette @juan-pablo-vielma @tkelman @ulfworsoe @jfsantos @r-gaia-cs @madeleineudell @timholy @jaeandersson