RFC: NLP interface #32
I like it. Obviously it will work very nicely with IPOPT.
Sure, I'd be interested at least in chiming in. In many realistic problems, the Hessian and Jacobian aren't available as explicit matrices, but only as operators. So there's a need for evaluating Hessian-vector products, Jacobian-vector products, and transpose-Jacobian-vector products, i.e., H*v, J*v, and J'*u. IPOPT requires explicit matrices for everything because it's factorization-based, but for iterative solvers, it makes more sense to rely only on products.
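To make that concrete, here is a small illustrative sketch (not from any existing package) of a conjugate-gradient solve that touches the Hessian only through a product closure, never the matrix itself:

```julia
using LinearAlgebra

# Conjugate gradient on H*y = -g that accesses the Hessian only through a
# product closure hprod(v) = H*v; H is never formed or factorized.
function cg_hprod(hprod, g; iters = 10)
    y = zero(g)
    r = -g                      # residual of H*y = -g at y = 0
    p = copy(r)
    for _ in 1:iters
        Hp = hprod(p)           # the only access to H: one product per iteration
        α = dot(r, r) / dot(p, Hp)
        y += α * p
        rnew = r - α * Hp
        β = dot(rnew, rnew) / dot(r, r)
        p = rnew + β * p
        r = rnew
    end
    return y
end

# Usage, with an explicit matrix standing in for the abstract operator:
H = [4.0 1.0; 1.0 3.0]
g = [1.0, 2.0]
y = cg_hprod(v -> H * v, g)     # ≈ -(H \ g)
```

A factorization-based solver would need the entries of H here; the iterative solver only ever needs `hprod`.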
@dpo do you know of an open-source iterative solver we can look at that might use those sorts of things?
Just to chime in here, one of the reasons my constrained optimization branch for Optim looks so different, and therefore remains unmerged, is that these are issues for me too. I can't pretend to know that this approach is best, but I've generally settled into a pattern where I use conjugate gradient as my core algorithm and accumulate the gradient of the objective and of all the constraints (of which there can be as many as 10^7 individual constraints) into a single gradient vector for some specific value of the barrier coefficient; a sketch of that pattern follows below. Matrix algebra, even sparse matrix algebra, tends to be pretty dang slow under such circumstances. Why spend all your time fussing over the exact inversion of a huge sparse matrix (probably using preconditioned conjugate gradient) as an "inner" step during a single iteration, when you could instead be investing your CPU cycles making progress on the overall nonlinear conjugate gradient? So for the problems I'm interested in, I'm not yet a huge fan of techniques that rely on inversion/factorization as a core step for optimization.

As an extra but (now) much less important point, for my most important problem it took almost two weeks to analytically calculate and numerically implement the gradient. (It's a nasty, complicated expression, and I was doing this in the early days of Julia before we had AD.) Certainly, I'm leery of analytically calculating the second derivative too 😄. Now that we have AD I doubt this is such a concern, unless AD has substantially worse performance than hand-calculated expressions. I haven't tested.
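A minimal sketch of that accumulation pattern, assuming a log-barrier formulation with inequality constraints c_i(x) > 0 (illustrative only, not Optim's actual code):

```julia
# Gradient of the barrier objective f(x) - mu * sum(log(c_i(x))):
# everything is folded into one dense gradient vector, so the outer
# nonlinear CG never forms or factorizes a matrix.
function barrier_gradient!(g, x, grad_f!, constraints, mu)
    grad_f!(g, x)                      # g .= grad f(x)
    gc = similar(x)                    # scratch space for constraint gradients
    for (c, gradc!) in constraints     # each constraint supplies value and gradient
        gradc!(gc, x)
        g .-= (mu / c(x)) .* gc        # g .-= mu * grad c_i(x) / c_i(x)
    end
    return g
end
```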
We should also include @johnmyleswhite here.
To @IainNZ and @timholy: I think the main example of a general-purpose open-source optimization solver for nonconvex nonlinear programming that can be switched to use either direct or iterative linear algebra is Lancelot. Dominique knows better than I do, but I'm not aware of any other general-purpose solvers using iterative linear algebra that have demonstrated robustness and good performance on standard test sets. It's something a lot of people are working on, but most of the implementations that I know of are still experimental, problem-specific, or shoehorned into existing APIs.

To @mlubin: is the idea here to abstract out a general-purpose nonlinear programming API that can provide every possible evaluation routine that any solver or algorithm could ever need, with the modeling frameworks (for now mostly JuMP, but potentially others later) responsible for implementing as many of the API functions as possible for any given problem class? It's worth doing a survey of what's out there aside from Ipopt (which is great and I use it heavily, but its API is representative of interior point algorithms using direct linear algebra), and of what other classes of algorithms are under development. What about augmented Lagrangian methods, for example: have you taken a close look at the API for Lancelot? There's also Opt++ (https://software.sandia.gov/opt++/), which I've read a little about but not thoroughly evaluated, and TAO (http://www.mcs.anl.gov/research/projects/tao/), which I think still can't handle general constraints but uses PETSc for all its linear algebra.

How about active set or reduced space methods: would those tests and transformations have to be done at a solver-specific level, or can they be abstracted to a solver-independent interface here? Same with constraint projection or proximal operators, for classes of problems where those methods apply.

Would it be worth the added complexity of allowing linear and nonlinear constraints and/or variables to be provided separately, or indicated by the user (modeling framework) in some manner? Constraint linearity makes little algorithmic difference to Ipopt, but Minos and Snopt (and MINLP solvers like Bonmin and Couenne) can take advantage of it. Snopt typically beats Ipopt in number of iterations, but its SQP iterations are more expensive than Ipopt's interior point steps unless the function evaluations are very slow.

The number of times I heard "the prototype implementation is in Matlab" over the past week at SIOPT made me a little sad; here's hoping JuliaOpt can start fixing that problem. I did some evangelizing during my talk, and in conversations with Laurent El Ghaoui and others.
To be honest, in terms of being a platform to develop new optimization algorithms, until we have the equivalent of … The other thing we need to do is get that CUTEst wrapper figured out. I ran into some roadblocks in my own testing, and haven't had time to keep plugging away at it.
OT and mostly separate from this proposal, but you're quite right that you have to be able to keep a close eye on what's going on when writing a prototype. The part that makes me sad is how many of these Matlab prototypes will never get turned into something seriously useful, since the barrier to translating them into a real C++ implementation is so high and beyond the scope of most PhDs.
There are a lot of things in optimization that might be shared across solvers, but we'd go crazy trying to implement them all. So that suggests there is some threshold: in the extreme, if only one solver supports something, it's not ready to be abstracted. We just need to reach a consensus on where that threshold lies. The "IPOPT style", which we know has multiple solvers and in fact already has a Julia interface, is of course the natural place to start. I think your point about iterative solvers is also a good use case, although it still seems fairly cutting-edge. We've also been talking about MINLP solvers around here, which need/want an expression tree, but that's already at the point where maybe we are jumping the gun. A thought occurs, though: maybe providing a standardized interface encourages the development of new solvers by reducing effort? Hmmm...

Anyway, I doubt we are going to get it right the first time. It'd help to have more solvers wrapped in Julia before trying to guess a general interface for them, but maybe as a proxy we should assemble a little document that lists the solvers we'd be interested in one day wrapping, with a few words about each of their interfaces?
Definitely start somewhere, you can afford some rounds of refinement early on. Supporting matrix-free / iterative interfaces will be especially important for parallel algorithms, where the optimization community (with a few notable exceptions) is falling increasingly behind the times. Julia wants to be on the cutting edge, no?
Definitely!
Thanks for the feedback, @dpo, @timholy, @tkelman. I would say that a first-order design goal is to provide enough functionality to support the existing open-source and commercial nonlinear solvers that are used in practice; these are typically factorization-based. But I also think that this is an opportunity to provide a foundation for iterative-method-based solvers. We should definitely add methods for evaluating these matrix-vector products.

@tkelman, I was also thinking about specifying whether constraints are linear/quadratic/nonlinear. One can compute this via a depth-first search on the expression graph, so the information is available in principle. We should have helper routines for this (a sketch of such a helper follows below). Are there cases where expression graphs might not be available but we know that constraints are linear or quadratic?

To summarize, we should support the basic operations needed for both factorization-based and iterative solvers. Past that, I think it's reasonable to let the interface grow organically as demand arises for particular functionality, as has happened for the LP/MIP interface.
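As an illustration of the depth-first-search idea, here is a minimal sketch that classifies a scalar Julia expression by polynomial degree; the `exprdegree` helper and its conventions are hypothetical, not part of MathProgBase:

```julia
# Classify a scalar expression by polynomial degree via a depth-first search.
# Returns 0 (constant), 1 (linear), 2 (quadratic), or typemax(Int) (nonlinear).
# For simplicity, variables are bare symbols here; a real implementation would
# recognize whatever variable encoding the expression graph uses.
function exprdegree(ex)
    ex isa Number && return 0
    ex isa Symbol && return 1                 # a decision variable
    ex isa Expr && ex.head == :call || return typemax(Int)
    op, args = ex.args[1], ex.args[2:end]
    degs = map(exprdegree, args)
    any(d -> d == typemax(Int), degs) && return typemax(Int)
    op in (:+, :-) && return maximum(degs)
    op == :* && return sum(degs)              # degrees add under multiplication
    op == :^ && args[2] isa Integer && args[2] >= 0 && return degs[1] * args[2]
    return typemax(Int)                       # exp, log, sin, ... : nonlinear
end

isexprlinear(ex)    = exprdegree(ex) <= 1
isexprquadratic(ex) = exprdegree(ex) <= 2
```

For example, `exprdegree(:(3 * x + y^2))` returns 2, so that constraint would be tagged quadratic.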
@tkelman made lots of good points. Additional information that is useful when you develop new methods includes (you may already have some of those; I didn't check):
I also find it useful to be able to evaluate the Jacobian of the linear and nonlinear constraints separately. For instance, IPOPT could be more fine-grained and evaluate the Jacobian of the linear constraints only once, upon initialization. During the iterations, it would only need to re-evaluate the Jacobian of what's changed. For problems with many linear constraints, that could mean a substantial speedup (sketched below).

The commercial package Knitro uses an inexact Newton/CG approach and relies on Hessian-vector products, but it does need an explicit Jacobian. Lancelot is both a good and a bad example of the matrix-free family: it's true that internally it requires only products with the Hessian and Jacobian, but for some obscure reason it evaluates those as explicit matrices. A student of mine wrote an experimental matrix-free augmented Lagrangian: https://github.com/syarra/nlpy/blob/lancelot-dev/nlpy/optimize/solvers/auglag2.py. It's in the process of being cleaned up. For those who were at SIOPT, it's the code used in Joaquim Martins's plenary talk. I think there's also a matrix-free option in Algencan (http://www.ime.usp.br/~egbirgin/tango/codes.php#algencan).

Speaking for continuous optimization, I can say from experience that the API I exposed in AMPL.jl is sufficient for most cases, except that for some strange reason, AMPL doesn't provide products with the Jacobian and its transpose. It's only recently that I worked on a method that needs to perform those products.

Also, I'm very happy to help with all things CUTEst.
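A minimal sketch of how a solver might exploit that split, with `jac_linear`, `jac_nonlinear`, and `takestep` as caller-supplied closures (none of these names come from MathProgBase):

```julia
# Evaluate the linear-constraint Jacobian once, then re-evaluate only the
# nonlinear rows inside the iteration loop.
function iterate_with_cached_jacobian(jac_linear, jac_nonlinear, takestep, x0;
                                      maxiter = 100)
    Jlin = jac_linear(x0)             # fixed rows: computed once at initialization
    x = copy(x0)
    for _ in 1:maxiter
        Jnl = jac_nonlinear(x)        # the only Jacobian work per iteration
        x = takestep([Jlin; Jnl], x)  # full Jacobian, linear rows stacked on top
    end
    return x
end
```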
@mlubin What you're saying about expression trees makes me think back to my old Dr. AMPL, where the idea is to prove or disprove convexity. (The idea is quite different from CVX, where you build convex problems by construction.) The mechanisms are described in http://dx.doi.org/10.1287/ijoc.1090.0321 and http://dx.doi.org/10.1007/s10287-009-0101-z. It sounds like the same may be doable in Julia. Dr. Julia?
@dpo That would definitely be doable, and likely easier to implement since expression trees in Julia are first-class objects.
CC also @stevengj. We should be able to bring NLopt into the fold.
@dpo Ah yeah, I had forgotten about Algencan and didn't know the details of Knitro. Jacek Gondzio has some matrix-free solvers as well; the published results look good, but the up-to-date implementations are commercial. I thought Martins said he was using Snopt, but anyway there's some good stuff in NLPy (and your results showing MINRES does extra work on saddle-point systems; I'm running into exactly that issue right now).
Where do the expression graphs live? In MathProgBase? I rarely deal with black-box solvers, but that's the main example I can think of where the modeler may know more about the constraints than the code does.
The expression graphs may optionally be provided by the modeling level if they're available. With CasADi, you can specify a vector-valued constraint function whose most natural expression graph is matrix-based rather than scalar-based, so that's an example where the modeling system could determine that the model is linear or quadratic without constraint-wise scalar expression graphs necessarily being available.
Another example: if I were to port over the sparse nonlinear modeling library I put together in Matlab, you also wouldn't necessarily have an expression graph, but you would be able to determine linearity from the properties of the matrices I use to encode the problem. You could construct an expression graph if you wanted one, but it wouldn't usually be necessary. Not sure if it's worth doing for model construction (I'm fairly well sold that macros are faster than operator overloading, if more complicated to understand), but if I can get expression trees out of JuMP in a convenient format, I may take another look at transforming things into my representation for Jacobian/Hessian evaluation purposes.
NLopt currently requires the dense Jacobian for its gradient-based algorithms, mainly because that was the easiest lowest-common-denominator interface to implement. The CCSA/MMA and AUGLAG algorithms in NLopt could potentially be rewritten to use only matrix-vector products with the Jacobian and hence take advantage of sparsity. (I'm particularly fond of the CCSA algorithm: a surprisingly simple and robust globally convergent algorithm for general NLPs, though it doesn't exploit second-derivative information or quasi-Newton approximations thereof. If you're looking to implement something natively in Julia for NLPs, I would try out CCSA as a first pass.)
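For reference, calling NLopt's quadratic CCSA variant from Julia might look something like the sketch below; the problem data is made up for illustration:

```julia
using NLopt

# Minimize (x1 - 1)^2 + (x2 - 2)^2 subject to x1 + x2 <= 2.
function objective(x, grad)
    if length(grad) > 0
        grad[1] = 2 * (x[1] - 1)
        grad[2] = 2 * (x[2] - 2)
    end
    return (x[1] - 1)^2 + (x[2] - 2)^2
end

function constraint(x, grad)
    if length(grad) > 0
        grad[1] = 1.0
        grad[2] = 1.0
    end
    return x[1] + x[2] - 2          # feasible when <= 0
end

opt = Opt(:LD_CCSAQ, 2)             # CCSA with quadratic approximations
opt.xtol_rel = 1e-6
opt.min_objective = objective
inequality_constraint!(opt, constraint, 1e-8)
minf, minx, ret = optimize(opt, [0.0, 0.0])   # strictly feasible start
```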
No equality constraints means not general, for my purposes anyway.
@tkelman, that's true, it's not completely general NLP; the nature of the conservative approximations in CCSA (which finds a sequence of strictly feasible points) requires a non-empty interior of the feasible region (although linear equality constraints can be added fairly easily).
It's an illusion. Inequality constraints don't guarantee a non-empty feasible region (you probably mean a non-empty strict interior, but inequalities don't guarantee that either). You might get around the problem by penalizing constraint violation in the 1-norm: equality constraints c(x) = 0 are converted to -t ≤ c(x) ≤ t, and the new variable t is penalized in the objective (as the sum of its components).
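Written out, this is a standard elastic reformulation (with penalty parameter ρ > 0):

```latex
\min_x \; f(x) \;\; \text{s.t.} \;\; c(x) = 0
\qquad \longrightarrow \qquad
\min_{x,\,t} \; f(x) + \rho \sum_i t_i \;\; \text{s.t.} \;\; -t \le c(x) \le t,
```

so at a solution each t_i = |c_i(x)| and the penalty term equals ρ‖c(x)‖₁, while the reformulated problem has a strictly feasible interior.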
Sorry to be late chiming in:
Yes, I think I'll be all set if one implements an interface that works when provided with those operations.
@dpo, of course inequality constraints don't guarantee a strict interior (e.g. equality constraints can be trivially implemented via inequality constraints), but equality constraints guarantee that you don't have one, and hence they are rejected by the code. The nature of the CCSA algorithm would be entirely changed if you didn't have strictly feasible points; it would be an entirely different algorithm. Of course, you could have a nested algorithm with an increasing penalty in an outer loop and CCSA in the inner optimizations, but I tend to dislike such nested algorithms (including the augmented Lagrangian methods); you want something smarter that doesn't try to converge the inner optimization too accurately when the penalty is large.
👍
@jaeandersson, do you think it's reasonable to start off with only scalar expression graphs for each constraint? This seems more tractable given prior work like .nl files and OSiL.
(in case it wasn't already obvious) I'm a huge fan of OSiL as an interchange format - it's more verbose than AMPL nl, but way easier to parse, construct, and manipulate. I haven't dug into the internals of how JuMP's nonlinear support is currently implemented, but I imagine it should be possible to do a more-or-less direct translation.
I'm not too familiar with the OSiL format, but I imagine we would be able to generate it from JuMP. The only slightly weird syntax we have is …
OSiL has varargs sum and prod nodes. But yes, something in-memory and more compact than XML would be desirable for the purposes of MPB; it might not need to be anything more than Julia Expr objects in the end. Saving, loading, and converting between optimization formats could probably be developed outside of either MPB or JuMP at first, but it would be nice to have a unified format that can go through MPB for interfacing different solvers and/or modeling environments. If anyone here uses Pyomo, leveraging some of their well-developed conversion/import/export capabilities via PyCall could be worth looking into.
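For concreteness, a scalar constraint could live in memory as an ordinary Julia Expr; the x[i] encoding of decision variables below is an illustrative convention, not a settled one:

```julia
# The constraint x1^2 + x1*x2 <= 10 as a plain Julia expression tree,
# with decision variables written as x[i] references.
ex = :(x[1]^2 + x[1] * x[2] <= 10.0)

ex.head        # :call
ex.args[1]     # :<=  (the comparison at the root)
dump(ex)       # prints the full tree structure
```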
An OSiL writer would fit in here as an MPB "solver" that requests the expression trees and writes them out (and perhaps then calls a real solver with the output). Speaking of other languages, one design consideration we should keep in mind for this interface is to make it "easily" C-exportable for the day that Julia code can be compiled to standalone shared libraries.
@dpo @timholy: I've added Jacobian-vector products. Generating the indices of the linear constraints is now a one-liner. I can see that it could be useful to evaluate the Jacobian of the linear and nonlinear constraints separately, although the …
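The actual one-liner isn't preserved on this page, but a plausible reconstruction using a per-constraint linearity query (the name `isconstrlinear` and the `d`/`numconstr` placeholders are assumed here) would be:

```julia
# Hypothetical: collect indices of the linear constraints via a linearity query.
linear_idx = filter(i -> isconstrlinear(d, i), 1:numconstr)
```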
I'm forgetting, of course, that one could want to use the linear equalities to project into a feasible subspace, for example. My primary concern, however, is that the necessary information (linear/nonlinear flags, Jacobian-vector products, and so on) is available. I don't think we can hope to design the interface so that it perfectly matches the computational patterns of every algorithm, but all of the needed information should be obtainable by a simple transformation that, at least, won't become a speed bottleneck.
Yes, or imagine a penalty method that calls a sub-solver to minimize the penalty function. If your sub-solver is able to handle and satisfy linear constraints, there's hope that your penalty parameter won't become artificially large. I'd say you now have a fairly complete interface. You can't possibly predict what all algorithms might need, but that's fine. I never imagined I would ever need the dreaded …
It is definitely much easier. Maybe it could be enough, assuming that you support calling vector/matrix-valued expressions from the scalar expression graphs. It might, however, be hard or impossible to support really large-scale problems if everything is expanded in terms of scalar operations.
I would strongly advise you to coordinate this whole effort with the Pyomo people. I think it's obvious from the Python-Julia interop efforts that cooperation can make both projects stronger.
Returns the sparsity structure of the Jacobian matrix :math:`J_g(x) = \left[ \begin{array}{c} \nabla g_1(x) \\ \nabla g_2(x) \\ \vdots \\ \nabla g_m(x) \end{array}\right]` where :math:`g_i` is the :math:`i\text{th}` component of :math:`g`. The sparsity structure
is assumed to be independent of the point :math:`x`. Returns a tuple ``(I,J)``
where ``I`` contains the row indices and ``J`` contains the column indices of each
structurally nonzero element. These indices are not required to be sorted and can contain duplicates.
"are not required to be sorted" - they can be but don't have to be
This interface is sufficient for supporting continuous NLP solvers, so I'll merge this now. The expression tree interface will be addressed later.
Following #31, here's a proposal for a solver-independent interface to nonlinear programming in MathProgBase.
You can see the compiled documentation here: http://mathprogbasejl.readthedocs.org/en/nlp/nlp.html
@dpo, since you've written an interface to AMPL and have expressed interest in writing solvers in Julia, would you be interested in collaborating on this?
I've just written down what I think is the minimal interface needed to work with the existing solvers like IPOPT and MOSEK. Suggestions for additional functionality to add are welcome.
We might also consider adding support for:

…

although that's a lot to start out with!
CC: @IainNZ @joehuchette @juan-pablo-vielma @tkelman @ulfworsoe @jfsantos @r-gaia-cs @madeleineudell @timholy @jaeandersson