
WIP: constrained optimization #50

Closed
wants to merge 29 commits into from

Conversation

@timholy (Contributor) commented Mar 11, 2014

This is not done yet, but I thought I should put it out there so people can comment sooner rather than later, if it looks like this is going in the wrong direction.

When finished, this should implement all the standard inequality constraints; equality constraints may come, but likely will be a little later (I have a pressing need for the inequalities, but not the equalities). The only algorithm I've implemented is a central-path-following interior-point solver---but I'd venture to say that if you could only pick one, interior-point is the way to go.

Using this, a version of nnls takes just two lines (`objective = linlsq(A, b); result = interior(objective, initial_x, bounds, method=:newton)`) and is 100x faster on a 50x50 problem than the existing implementation of nnls in Optim.

Left to do:

  • Add support for constraints to other linesearch algorithms
  • Implement value, gradient, and hessian for barrier functions of linear and nonlinear constraints
  • Add more tests

In addition to adding support for constrained optimization, this makes several relevant improvements to hz_linesearch, in particular ones that are smarter about some of the odd behaviors of floating-point numbers, and others that will make constrained optimization perform significantly better. Thanks to @JeffBezanson for his quick work on JuliaLang/julia#6097, making it feasible to debug some of the darker corners of hz_linesearch.

One final point: note that this changes how the problems directory is handled. I changed it because it wasn't obvious to me that we wanted this to be a regular part of loading Optim. Perhaps we should go one step further, and only load it when running tests (and not even have testpaths in Optim.jl).
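For readers unfamiliar with the approach, a central-path log-barrier iteration can be sketched in a few lines. This is a standalone toy, not this PR's API; `barrier_min`, `mu`, and `tol` are illustrative names:

```julia
# Toy central-path log-barrier solver for:  minimize (x - a)^2  s.t.  x >= 0.
# Each outer iteration minimizes  t*(x - a)^2 - log(x)  by safeguarded Newton,
# then increases t; with one constraint, 1/t bounds the duality gap.
function barrier_min(a; t = 1.0, mu = 10.0, tol = 1e-8, x = 1.0)
    while 1 / t > tol
        for _ in 1:50
            g = 2t * (x - a) - 1 / x      # gradient of the barrier objective
            h = 2t + 1 / x^2              # Hessian (always positive here)
            step = g / h
            while x - step <= 0           # backtrack to stay strictly feasible
                step /= 2
            end
            x -= step
            abs(g) < tol && break
        end
        t *= mu
    end
    return x
end

barrier_min(2.0)   # ≈ 2.0 (constraint inactive)
barrier_min(-1.0)  # ≈ 0.0 (constraint active; approaches the boundary)
```

The key property is that iterates stay strictly inside the feasible region at all times, which is what makes the line-search changes mentioned above matter.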

@mlubin (Contributor) commented Mar 11, 2014

Nice! How difficult would it be to support sparse Hessians and constraint Jacobians? We could then pretty easily use this as the first pure-Julia backend for JuMP.

@timholy (Author) commented Mar 11, 2014

As you might notice, I began a process of switching declarations over to AbstractMatrix and AbstractVector. So it should not be hard. The actual fill-in will of course be up to the user, just as the user currently has to write the objective function.

For the constraints, my thinking is that the user writes the functions so they add to an existing gradient/Hessian (one computed by the objective function), rather than having the algorithm provide a blank vector/matrix and then copy the results into the combined gradient/Hessian. (See the beginning of interior.jl.) That's precisely so we can better support sparsity (all that copying would destroy any advantage of sparsity).

@mlubin (Contributor) commented Mar 11, 2014

The typical interface for these sorts of solvers is to provide a callback that computes the Hessian of the Lagrangian directly, where the solver provides the weights to the user. Check out http://ipoptjl.readthedocs.org/en/latest/ipopt.html.

We're soon going to be developing a standardized interface for this in MathProgBase, as we have for linear programming.
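The callback shape being referred to can be sketched as follows (illustrative functions, not Ipopt.jl's actual signatures): the solver hands the user an objective scaling `sigma` and constraint multipliers `lambda`, and the user assembles the whole weighted sum in one call.

```julia
# Hypothetical Lagrangian-Hessian callback: the solver supplies the weights.
hess_f(x)  = [2.0 0.0; 0.0 2.0]    # ∇²f for f(x) = x[1]^2 + x[2]^2
hess_g1(x) = [0.0 1.0; 1.0 0.0]    # ∇²g₁ for g₁(x) = x[1]*x[2]

function lagrangian_hessian(x, sigma, lambda)
    # sigma * ∇²f(x) + Σᵢ lambdaᵢ * ∇²gᵢ(x), assembled in a single callback
    sigma * hess_f(x) + lambda[1] * hess_g1(x)
end

H = lagrangian_hessian([1.0, 1.0], 1.0, [0.5])
# H == [2.0 0.5; 0.5 2.0]
```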

@johnmyleswhite (Contributor)

This seems like a great start. I don't see anything that troubles me in a quick review. Will do a more detailed pass through tomorrow morning.

@timholy (Author) commented Mar 11, 2014

I can see passing Lagrange multipliers/coefficients to the user---that's indeed probably a better design than my attempt to hide it from the user. But there are some other things I don't understand (or don't like):

  • Why do you need to store the gradient of each constraint separately? (I.e., why do you even need the full Jacobian?) At least for interior-point problems without equality constraints, the only thing you need is the summed gradient of both the objective and the constraints. But perhaps other solvers benefit from having the full Jacobian? In which case we'd better plan for this.

  • Why should the user have to write a single function that incorporates all the constraints together with the objective? It seems to encourage better design to have the solver make the calls that accumulate each term separately, i.e.:

    objective_gradient!(x, g)                # This initializes g and fills it with values
    constraintA_gradient!(x, g, lambda1)     # Adds to g, doesn't replace g
    constraintB_gradient!(x, g, lambda2)
    ...
    

    The key advantage is that once I implement the functions for constraintA, I can re-use it again without modification for other problems, simply by saying something like

    add_nlineq!(constraints, DifferentiableFunction(constraintA_objective, constraintA_gradient!))
    

    In many fields there are a small handful of standard penalty functions or constraints, which can be used in conjunction with many different objective functions.

  • It seems inconsistent to provide the weights for computing the Hessian but not the Jacobian.

In your case, I recognize that you simply have to live with the design choices made by Ipopt, but I'm not sure we should mimic it entirely.
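The accumulation pattern described in the second bullet can be made concrete (a sketch with made-up names, not the PR's code): each term mutates the shared gradient in place, so the combined gradient is assembled without intermediate copies.

```julia
# f(x) = sum(x.^2); the objective initializes g, constraint terms add to it.
objective_gradient!(x, g) = (g .= 2 .* x; g)

# Log-barrier gradient for x .>= 0: adds -lambda ./ x instead of overwriting.
constraintA_gradient!(x, g, lambda) = (g .-= lambda ./ x; g)

x = [1.0, 2.0]
g = similar(x)
objective_gradient!(x, g)            # g == [2.0, 4.0]
constraintA_gradient!(x, g, 0.5)     # g == [1.5, 3.75]
```

The reuse argument is visible here: `constraintA_gradient!` knows nothing about the objective, so the same function can be attached to any problem with nonnegativity constraints.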

@mlubin (Contributor) commented Mar 11, 2014

Different interior-point approaches do require the full Jacobian. If there are equality constraints, the KKT system is typically formulated as

[ H A^T ]
[ A  0  ]

where A is the Jacobian of the equality constraints (I may have a transpose wrong here). For sparse problems it's reasonable to formulate the KKT conditions using the explicit inequalities instead of eliminating them from the system. I'm mostly familiar with primal-dual interior-point methods, so things might be slightly different with primal methods. Anyway, it doesn't make much sense to have a "summing" interface for inequality constraints and a separate interface for equality constraints. The interface should also try to be uniform across different algorithms.

It's reasonable to have an interface where the terms are summed separately. This is more difficult if you have a sparse Hessian, since you can't easily add to a sparse matrix (but the trick here is that solvers let you provide the matrix in triplet format, and duplicate terms are summed together). I don't have any objections to providing a convenient interface like this, as long as there's a way to write a small wrapper to transform Ipopt-style input to use this code.
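The triplet trick mentioned here is easy to demonstrate: `sparse(is, js, vs)` sums duplicate (i, j) entries, so the objective and each constraint can append their Hessian contributions to the triplet arrays independently.

```julia
using SparseArrays

# Objective contributes entries at (1,1) and (2,2); a constraint term
# contributes another entry at (1,1). sparse() sums the duplicates.
is = [1, 2, 1]
js = [1, 2, 1]
vs = [2.0, 2.0, 0.5]
H = sparse(is, js, vs, 2, 2)
# H[1,1] == 2.5 — the two (1,1) contributions were added together
```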

Note that almost all high-performance nonlinear solvers out there (like KNITRO, MOSEK) have an interface that looks like Ipopt's. They're certainly not user friendly, though they're meant to be used from modeling systems like AMPL, GAMS, and now JuMP that can automatically provide sparse derivatives.

@timholy (Author) commented Mar 11, 2014

I did specify in my (edited) comment problems without equality constraints, which as far as I could tell from a brief skim is the only kind of problem Ipopt handles. Suppose you add linear equality constraints; for such constraints, there's no need to ask the user to supply anything other than the matrix and rhs---all the derivatives can be handled internally, so again I don't see the need to trouble the user for the full Jacobian. Where one has to think about it is for nonlinear equality constraints, but my suspicion is that there one would be going with an augmented Lagrangian anyway, making it a somewhat different problem. I'll think about it more, however.

The solver here "looks" primal but it exploits duality---eps_gap is the duality gap. One advantage of this formulation (which is straight out of Hindi's tutorial, based in turn on Boyd & Vandenberghe) is that you can use it with methods that don't require the Hessian (cg, etc). That's already implemented, in fact. I was quite surprised that even for a nnls problem, where you have the whole matrix, cg was only ~10x slower than Newton's method.

@@ -138,6 +138,18 @@ function bfgs{T}(d::Union(DifferentiableFunction,
f_x_previous, f_x = f_x, d.fg!(x, gr)
f_calls, g_calls = f_calls + 1, g_calls + 1

x_converged,

Review comment (Contributor):

Does moving the convergence check here change the inputs? If so, were the previous inputs not correct?

@timholy (Author) replied:

The issue was that there were other exit points from the function (https://github.com/JuliaOpt/Optim.jl/blob/master/src/bfgs.jl#L149). So it was indicating it hadn't converged, but that test often gets triggered because it did converge.

@johnmyleswhite (Contributor)

Having read this more carefully, one question I have is why we should have a linlsq function in Optim. What's the use case for it?

@tkelman commented Mar 11, 2014

problems without equality constraints, which as far as I could tell from a brief skim is the only kind of problem Ipopt handles

Most of the serious large-scale solvers use a row-wise interface with a lower and an upper bound for each constraint, one of which can be +/- inf. Equality constraints have lb[i] = ub[i]. From https://projects.coin-or.org/Ipopt/wiki

   min     f(x)
x in R^n

s.t.       g_L <= g(x) <= g_U
           x_L <=  x   <= x_U

Or pick your favorite canonical form, I'm a fan of having just equality constraints and variable bounds, introducing slack variables for inequalities as needed.
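The conversion into that canonical form is mechanical; for a single row it looks like this (illustrative only):

```julia
# Row-wise inequality  0 <= g(x) <= 4  becomes the equality  g(x) - s == 0
# together with the simple bound  0 <= s <= 4  on a new slack variable s.
g(x) = x[1] + 2x[2]
residual(x, s) = g(x) - s

x = [1.0, 1.0]
s = g(x)                  # choose the slack to match: s = 3.0
# residual(x, s) == 0.0 and 0 <= s <= 4, so x stays feasible after conversion
```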

The KKT system has the Jacobians of both equality and inequality constraints in it; in the sparse non-convex case you don't want to eliminate anything ahead of time, since the sparse symmetric indefinite linear solver will do a better job of reducing fill-in. The zero block in the lower right isn't usually actually zero, either: an inertia perturbation is required to guarantee convergence properties.

When some constraint rows are linear, you could technically leave those nonzeros in the Jacobian untouched between iterations and save a few copies, but the savings there are minimal compared to the cost of the KKT factorization.

@timholy (Author) commented Mar 11, 2014

one question I have is why we should have a linlsq function in Optim

This creates an objective function for least squares, but then you can supplement this with constraints. Using this, nonnegative least-squares becomes a few lines:

objective = linlsq(A, b)
bounds = ConstraintsBox(zeros(eltype(x), length(x)), nothing)
results = interior(objective, initial_x, bounds, method=:newton)

As I mentioned above, I tested this on a particular 50x50 problem and found this version to be roughly 100x faster than the nnls implementation I originally wrote, which is currently in Optim.
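For reference, a `linlsq`-style helper presumably builds something like the following (a sketch; the actual scaling and in-place gradient convention in the PR may differ):

```julia
using LinearAlgebra

# Least-squares objective f(x) = 0.5 * ||A*x - b||^2 and its gradient.
function linlsq_sketch(A, b)
    f(x) = 0.5 * norm(A * x - b)^2
    g(x) = A' * (A * x - b)       # allocating gradient, for clarity
    return f, g
end

A = [1.0 0.0; 0.0 2.0]
b = [1.0, 4.0]
f, g = linlsq_sketch(A, b)
xstar = A \ b                      # unconstrained minimizer [1.0, 2.0]
# f(xstar) == 0.0 and g(xstar) == [0.0, 0.0]
```

Feeding such an objective to an interior-point routine with a nonnegativity box is what turns this into nnls.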

@mlubin (Contributor) commented Mar 11, 2014

linlsq seems a bit restrictive. What if you want to do L1 or L2 regularized least squares? It's okay to have as an internal helper, but should it be exported?

@johnmyleswhite (Contributor)

Building nnls in three lines of code is a really great example for Optim. But I guess I think nnls belongs in a separate package, since it's more of a use case for Optim than part of Optim per se.

@timholy (Author) commented Mar 12, 2014

Or pick your favorite canonical form, I'm a fan of having just equality constraints and variable bounds, introducing slack variables for inequalities as needed.

Got it, I hadn't thought through the implications of having two bounds on each (non)linear constraint carefully enough. That will work well for a Newton method, but I have to think through the implications for implementation with CG (which for my problems is the much more important algorithm)---it seems like unifying inequalities and equalities could result in troublesome behavior with respect to roundoff errors.

linlsq seems a bit restrictive. What if you want to do L1 or L2 regularized least squares? It's okay to have as an internal helper, but should it be exported?

But I guess I think nnls belongs in a separate package

OK, these can go.

Since this PR may take some redesign and I have deadlines, this may sit for a while.

@johnmyleswhite (Contributor)

Ok. I'll be excited to see where this heads whenever you pick it up again.

@timholy (Author) commented Oct 21, 2015

I'm back to being interested in this branch again. Reference: https://groups.google.com/forum/#!topic/julia-opt/TVmuXFWfeBM. In contrast, here I see no overhead from the solver, and the progress towards the minimum per iteration also seems better. I haven't yet measured it per function evaluation; that's likely to look a little less favorable, because I noticed that our linesearch not infrequently uses more evaluations than Ipopt. (There's some evidence that might be a good thing (p. 2164), but it seems likely to reduce the speed of optimization.)

I kinda wanted to get out of the business of writing my own optimization code, and maybe I still will if someone comes up with another solution. But I thought I'd at least ping this issue to say it may not be dead yet. Another thing about Optim that I like is that I can use Float32s (my 100GB disk file would be a 200GB disk file had I used Float64s), and I worry a little about the minimizer detecting "edges" among adjacent Float32 values if it's using Float64s as coordinates for the optimization.

Since I've just been playing with this, here are a couple of random API thoughts:

  • With barrier methods, you have a combined objective. With show_trace=true, it's nice to show just the part that corresponds to the user's objective. The architecture here doesn't yet make that easy, because what the "outer" optimizer sees is the total function penalty (user-objective + barrier terms).
  • The API here isn't yet as flexible as MathProgBase. However, I note that it takes substantially fewer "user" lines of code. (Making it more flexible might reduce that advantage, of course.)

@mlubin (Contributor) commented Oct 22, 2015

@timholy, I'm open to suggestions like JuliaOpt/MathProgBase.jl#87 on how to make the MPB API easier to both use and implement on the solver side.

This should cover cases where dphi < 0 "by a hair," i.e., where the search direction is almost orthogonal to the gradient. It's an extra layer of security beyond commit 50b9d60 to increase the likelihood that when we terminate, we really have reached a minimum.

@timholy (Author) commented Oct 22, 2015

Other than what I suggested already, I can't think of any other ways to improve on what you've done---at the end of the day you have to be able to interact with C code, and the API you've already designed is quite good for that. I just wanted to throw in a tweak or two.

But of course new ideas may crop up in the course of implementation.

@@ -89,7 +89,7 @@ macro cgtrace()
dt["g(x)"] = copy(gr)
dt["Current step size"] = alpha
end
-grnorm = norm(gr, Inf)
+grnorm = norm(gr[:], Inf)

Review comment (Contributor):

Can you use vecnorm(gr, Inf) here?

@timholy (Author) replied:

Yeah.

Thanks for the comment!

@timholy (Author) commented Jan 26, 2016

I am under the illusion that I'll soon have a few free days to work on Optim, and might want to put this into shape for merging. Any API thoughts about this PR would be welcome; I think I'll need to make quite a few changes.

@mlubin (Contributor) commented Jan 26, 2016

@timholy, I'd say choose the API that seems most natural to you, so long as it's possible to write a lightweight wrapper to take in MathProgBase input as well. There's no need for the native API to the solver to be exactly MathProgBase.

@timholy (Author) commented Jan 26, 2016

Aside from the integration into MathProgBase, what I wonder most about is the proper API for specifying constraints. Ipopt, for example, does not appear to have a special category for linear constraints, but I assume that would be a useful thing to specialize on. Likewise, #50 (comment) suggests two possible APIs for specifying equality or inequality constraints. As I think about it, I lean towards making everything an equality constraint and then specifying bounds on the slack variables, but any comments would be welcome.

From the user's perspective, the good news is that no matter how crappy an API I design, it will still look pretty when accessed via the remarkable JuMP 😄. But that's probably not something to shoot for.

@tkelman commented Jan 27, 2016

If you want the API to be appropriate for problems that may have general nonlinear equality constraints, then my personal favorite canonical form (equalities and bounds) hasn't changed over the last 2 years. Conversions between canonical forms are simple so the user-facing API can use something more general like the row-wise form even if the algorithm internally uses something different. For other classes of problems you may want the internal implementation to use a different form (and/or API) though.

In the Ipopt algorithm, linear constraints can at best save you a subset of Jacobian row evaluations, and those are typically not the bottleneck relative to the solution of the KKT system for the Newton step. You can set an option flag if all constraints happen to be linear, in which case the Jacobian will only be evaluated once. Whether it's worth the API complication of allowing situations in between, either via a linearity bitmask or separate inputs for linear vs. nonlinear constraints (Matlab fmincon style), depends mostly on whether you expect Jacobian evaluations to be expensive.

…ot eps_gap

eps_gap introduced an absolute scale for convergence, which seemed problematic. This should be more robust, as it monitors the characteristics of the solution. It should also avoid making changes to the barrier that no longer have a consequence for the solution; previous approaches introduced numerical stability problems when the barrier penalty became extreme.
Also cleans up a bit of the tracing & convergence-testing

@timholy (Author) commented Nov 8, 2016

Don't delete this branch just yet, though. People may be using it.

@pkofod (Member) commented Nov 8, 2016

Well, we could let it stay forever. It could be used as a "masterclass in git-fu"... I know I've tried to rebase this quite a few times, only to give up halfway through!

@timholy (Author) commented Nov 9, 2016

Probably a good idea. Certainly people in my lab are using it!
