Constrained optimization episode 2: revenge of the slack variables #303

timholy · 2016-11-08T15:32:20Z

No matter how the American election turns out, today we have something to celebrate: the beginning of what I hope will be a merge-worthy native Julia constrained optimization framework.

Long ago said that I would not work on hard-core optimization without a debugger (errors triggered 1 hour into an optimization run are too painful to fix otherwise), but @Keno has delivered, so now it's time for me to pony up. Besides, I've grown tired of maintaining my own fork of Optim and want access to all the goodies and cleanups (especially the cleanups) that others have contributed.

Rather than implementing the whole framework at once, I decided to try to keep things "simple" (if a ~~1600~~2000 line diff can be called simple) by implementing just one method. I picked the interior-point Newton method, since that seems to be the mostly widely-used and/or described method, and for which the iteration is relatively simple. There are also many, many bells and whistles one could add. But I want to get a basic implementation of something merged before going on to other things.

Compared to #50, there are almost too many differences to mention, as this was a clean start. A few of the highlights:

it's more in line with the MathProgBase interface (still not a perfect match, but pretty close), for example in how the bounds are specified and how the constraint hessian is requested.
it supports both equality and inequality constraints
it should exhibit far greater numerical stability near the edges of the feasible region (this has been a big problem with the approach in WIP: constrained optimization #50)

It's also worth briefly mentioning that this might have some nice features, though it needs real-world testing for us to know whether these will materialize. For example, unlike Ipopt this is guaranteed to roughly preserve the descent direction of the user's objective function (it will not choose such a high barrier coefficient that it pushes the solution into a different basin).

Testing are a mixed bag: there's quite a few tests for the low-level mathematics, but any overall tests (e.g., ones that involve a call to optimize) are a work in progress. I'm hoping to make use of CUTEst (JuliaSmoothOptimizers/CUTEst.jl#43 (comment)) and do a fairly thorough job, but that will take some time.

It's worth noting a few details:

argument order may be somewhat confused; in some places I adopt mutated-first, in others (where I'm trying to match some preexisting API) it's last. It's my understanding that Optim will eventually migrate to mutated-first, but perhaps this is too inconsistent now
I needed to customize the line search algorithm to account for constraints, so this goes against the migration to LineSearches.jl and reintroduces a linesearch method to Optim. Aside from simplicity in making updates (I don't want to be bothered with tagging another package just to make tests for a RFC pass), I kept this here because I was also trying to move towards a more sane linesearch API and so it doesn't fit with our classic functions. (It's based on the fact that anonymous functions are fast with julia 0.5 and higher---if people are unhappy with the overhead this causes on 0.4, tough. We have to write code for the future, not the past.)

CC @Cody-G, @dpo, @abelsiqueira.

Computes and tests the gradient, too. The Hessian will come later.

Based on the notion that we want to (largely) preserve the objective function's initial descent direction.

Also implements tracing

johnmyleswhite · 2016-11-08T15:48:42Z

Tim Holy for BDFL!

timholy · 2016-11-08T16:13:09Z

Noticed that @tkelman doesn't watch this repo, and that he provided very useful feedback on my previous attempt.

tkelman · 2016-11-08T19:19:13Z

is this a primal algorithm or primal-dual?

pkofod · 2016-11-08T21:06:43Z

Wonderful. We need more positive emoji-"like"-types I see.

argument order may be somewhat confused; in some places I adopt mutated-first, in others (where I'm trying to match some preexisting API) it's last. It's my understanding that Optim will eventually migrate to mutated-first, but perhaps this is too inconsistent now

Mutated first seems quite standard, so I'll be happy to go through the code and clean it up. Would it be annoying if I did this now?

it supports both equality and inequality constraints
👍

I'll have a closer look one of these days.

timholy · 2016-11-08T21:52:06Z

is this a primal algorithm or primal-dual?

I'm so glad you asked, because as I've learned more I've grown increasingly confused about this point. I form the KKT equations and solve them, so I suspect that many would call this primal-dual. However, the matrices in the primal-dual equations are asymmetric (e.g., the equation after eq. 5), and for generic nonlinear optimization that's problematic because you usually want to enforce positive-definiteness somewhere. So, at least some algorithms (e.g., Ipopt and this one) use a step of elimination to solve for certain constraints, which makes the reduced problem symmetric and amenable to being forced to be positive-definite (e.g., see Wächter & Biegler 2006, "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming," eq. 11). From what I can tell, this basically converts the primal-dual equation back into something that's virtually a primal one---the resulting equation is almost identical to what you'd get if you went the primal route in the first place.

Can you clear this up?

tkelman · 2016-11-08T22:30:46Z

I think you've partially misread the Ipopt paper. It doesn't force or require anything to be positive definite - if it did, it would do a very poor job of solving non convex problems. It's not really constraints that are being eliminated, but complementarity rows of the Newton step direction of the KKT conditions. The KKT linear system that Ipopt solves is symmetric, but indefinite. It requires using a sparse Bunch Kaufman factorization in order to obtain the inertia of the KKT system at each iteration. It does a perturbation if the inertia does not give n positive and m negative eigenvalues, where n is the number of primal variables and m is the number of constraints.

The main difference between primal and primal-dual interior point, if I'm remembering the literature right, is the treatment of the barrier term for inequalities. If you replace inequalities with primal objective barrier terms before posing the KKT conditions and forming the Newton system, you have a primal method and the barrier term enters the objective Hessian in a way that squares the slack. For a primal dual method, you state the KKT conditions before approximating the inequalities with barrier terms. Instead you relax the complementarity condition to equal the barrier parameter instead of exactly zero. The contribution to the objective Hessian due to the barrier terms (which here aren't actually used in the derivation, it's purely a descriptive analogy) then depends on both the slack and dual variables. This is equivalent at a KKT point, but gives a different path to get there from an initial guess. Typically primal dual follows the central path better, and gives a Newton system that isn't quite as badly conditioned.

codecov-io · 2016-11-08T22:43:34Z

Current coverage is 85.64% (diff: 72.41%)

Merging #303 into master will decrease coverage by 4.95%

@@             master       #303   diff @@
==========================================
  Files            27         31     +4   
  Lines          1565       2160   +595   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           1418       1850   +432   
- Misses          147        310   +163   
  Partials          0          0

Powered by Codecov. Last update ccc51d7...cada264

timholy · 2016-11-09T11:22:58Z

Tim Holy for BDFL!

I was planning to decline, but given the results of the last few hours I might take you up on that offer, if it's still open. 😯

@tkelman, thanks for diving in. I'm learning a lot by thinking through your points.

It's not really constraints that are being eliminated, but complementarity rows of the Newton step direction of the KKT conditions.

Agreed. I simply meant it in the sense that you've introduced additional variables, and they too are subject to "constraints" (the perturbed complementarity condition). Also agreed my choice of words could be confusing, since there are also user-supplied constraints that are in a distinct category.

It doesn't force or require anything to be positive definite - if it did, it would do a very poor job of solving non convex problems.
It does a perturbation if the inertia does not give n positive and m negative eigenvalues, where n is the number of primal variables and m is the number of constraints.

OK, let's apply that to a simple non-convex problem, f(x) = -x^2 + x^4 with no constraints, so we have n=1 and m=0. The second derivative (hessian) is negative over the region |x| <= 1/sqrt(6). Ipopt will "perturb" the hessian until it has 1 positive "eigenvalue" (and there's only one eigenvalue in this case). Now, in this case it's very clear that the negativity of the hessian has nothing to do with roundoff error, so the "perturbation" will be large, bigger than the hessian itself. That's what I mean by "forcing nonnegativity." (I've also argued elsewhere that this very common approach to generating nonnegativity---adding a positive diagonal until the matrix is just barely positive-definite---has serious weaknesses, because it throws away useful information about the length scales over which your function is likely to change, effectively swapping "clearly negative" directions in which the objective changes over short length scales with "nearly flat" directions that require big jumps to produce changes in the objective. In my opinion we have a much better approach available in PositiveFactorizations.jl. But that's a separate discussion.)

I agree, though, that you can't impose positive-definiteness to the total problem, and that the equality constraints are what prevent that. In this PR I only impose nonnegativity on the hessian for the objective + terms that come from the inequalities. I don't have a convergence proof yet, and I wonder if I should somehow project out the components of the equality constraints first (see the discussion in the Ipopt paper following eq. 12). That's totally doable (at least for dense problems), but before worrying too much about that, I thought I'd see how the simple version fared in tests.

As far as the relationship between the primal problem and the primal-dual problem, AFAICT here's the difference: adopting the problem and approximately the notation on this page (i.e., temporarily ignoring equality constraints), the Newton equation from following the primal route is

and the one from the dual route, after solving for Δλ and substituting it into the other equation, is

(Note the wikipedia page is a little misleading in calling W the hessian matrix of B, it's really the coefficient of Δx in the first-order expansion of their eq 5, which is not the same thing.) If you apply the complementarity condition to this, a short calculation shows that these are the same. So AFAICT the dual method is simply a "lagged primal" update, where the complementarity condition is not exactly satisfied at all stages of the update but is instead "frozen" at the base point (and even the base point might not satisfy it exactly). So to the extent that the dual method is better, it arises from the "ability" to not exactly satisfy the perturbed complementarity equation.

tkelman · 2016-11-09T12:46:46Z

Ah good point, the interpretation of ensuring positive-definiteness of the Hessian block "projected onto the null space of the constraint Jacobian" works. I keep forgetting about that way of looking at it because that's not how it's calculated, unless you look at the unconstrained case.

In my opinion we have a much better approach available in PositiveFactorizations.jl

With apologies for my skepticism, but I'd personally like to see a convergence proof make it through peer review, and/or demonstrated results across common test sets, before being totally sold on the approach. Pure Julia implementation, abstraction and API design is hard enough work that simultaneous innovation on the algorithm side is a bit beyond my own risk tolerance when comparing to known-quantity existing solvers, but that's coming from someone who isn't in academia any more. You must have a motivating application that you've been developing this for and using it on already, right? Would be interesting if you could point to any details about the kinds of problems where this has been working well, in terms of size, convexity, number of constraints, Hessian and Jacobian structure, etc.

Looks like all the Hessian and Jacobian fields here are dense matrices? I hope that's a planned-to-be-temporary limitation?

tkelman · 2016-11-09T13:01:48Z

Note the wikipedia page is a little misleading in calling W the hessian matrix of B, it's really the coefficient of Δx in the first-order expansion of their eq 5, which is not the same thing.

How do they differ?

timholy · 2016-11-09T13:15:58Z

With apologies for my skepticism, but I'd personally like to see a convergence proof make it through peer review, and/or demonstrated results across common test sets, before being totally sold on the approach.

I want those things too, and hopefully I'll get to it eventually. But surely you understand the problem? After correction, a Hessian [-k 0; 0 ϵ] becomes [ϵ 0; 0 k+2ϵ], which means that you're modeling the function as having strong curvature in x_2 and weak curvature in x_1 whereas the reality is that it has strong curvature in x_1 and weak curvature in x_2. So your Newton step will be huge in x_1, and then you'll be puzzled about why the function is behaving entirely differently from what you expected---now you have a huge backtrack on your hands. There's no world in which that's a good thing. That doesn't prove that my approach is better, of course, just that the common approach has obvious limitations and there's a ton of room for doing better.

How do they differ?

It doesn't yet have the JDJ^T term, that comes about by solving and substituting for Δλ.

timholy · 2016-11-09T13:26:22Z

Looks like all the Hessian and Jacobian fields here are dense matrices? I hope that's a planned-to-be-temporary limitation?

AFAICT that's true of all of Optim. Subject of another PR, and it doesn't have to be me. I use L-BFGS, so none of this is relevant anyway. I'm just trying to get something to pass muster, and on #50 people didn't like the fact that it wasn't like the conventional Newton approach. So I'm starting with Newton, even though I have no intention of using it for heavy-duty stuff.

tkelman · 2016-11-09T13:56:31Z

It doesn't yet have the JDJ^T term, that comes about by solving and substituting for Δλ.

You'll get that term from the gradient of A^T λ since A is a function of x, no?

I use L-BFGS, so none of this is relevant anyway

For constrained problems too?

timholy · 2016-11-09T14:25:40Z

You'll get that term from the gradient of A^T λ since A is a function of x, no?

No, that gives you the ∇∇c term.

For constrained problems too?

Yes.

timholy · 2016-11-12T16:25:49Z

OK, I'd say this is no longer WIP: it seems to be working quite well on many test problems. I used CUTEst.jl and selected problems of 100 variables or less (and 100 constraints or less) that had second derivatives. Out of 293 such problems, it errored on 4 (I haven't yet dug into why), failed to converge on 16, and converged on the rest. On 32 problems, it found a better minimum than the one recorded in the CUTEst database, sometimes by quite a lot (possibly an error in the database, of course). Code is https://github.com/timholy/OptimTests.jl. I don't know if you want to incorporate that here, or exactly how you want this tested, but at least now this has been tested pretty thoroughly even if the code isn't part of this PR.

This isn't perfect, but I think it's good enough to be useful to folks.

This was motivated by the observation that the deviation-based algorithm doesn't work when there's only one constraint. The predictor algorithm has a little trouble with this case too, but it's not nearly so severe (it *can* increase μ, you just have to prevent it from decreasing it to 0). Moreover, it seems a little more regular in its changes.

timholy · 2016-11-23T21:35:05Z

Status update: lots of changes. Here's how we compare in terms of objective value to Ipopt:

We agree on ~90% of problems, and each is better than the other on another 5%. Looks like a wash to me.

Our progress on a per-iteration basis compared to Ipopt (plots the cumulative histogram of the fraction of iterations on which we're ahead):
$fracbetter$
Perhaps ever-so-slightly worse than Ipopt, but in the same ballpark.

Total number of iterations:

Both were set to a maximum of 3000 iterations. We're similar, but we do have more problems that reached the maximum number of iterations without converging. In cases I've checked, the algorithm actually found the right minimum but it didn't detect that it had converged. Some of these problems also don't converge if you run them in the unconstrained suite, so this isn't very specific to this PR.

While it's been a lot of work to get this far, I'm pretty stoked that with ~1month of effort it looks like we've got something that (for dense problems) does not look too bad compared to the industry-standard solution. There are several places (particularly the linesearch) where we are also noteworthily simpler than Ipopt, which I think is a good thing. I haven't benchmarked for speed, but I'm not going to worry about that yet. I'll focus on cleanup and test coverage and then merge this, and then work on speed if my own practical problems require it.

anriseth · 2016-12-13T22:54:04Z

Status update on this PR @timholy ?

pkofod · 2016-12-16T08:07:42Z

I wonder why the number of iterations in fminbox differs between the linux and mac build...

pkofod · 2016-12-17T19:24:48Z

@timholy is there anything specific you need to clean up? Do tell if you need a hand, so we don't end up in another rebase from hell!

pkofod · 2016-12-19T14:56:21Z

When you're ready I suggest you squash some of the 40 commits. This will make handling of the merge conflicts easier I think. I'll be happy to do the actual rebasing once you've squashed it a bit. The funny thing is that the pr that introduced the conflict is almost orthogonal to what happens here, but I still end up with weird conflicts when trying to rebase. For example, /test/constraints.jl doesn't even exist in master, but I still get merge conflicts when rebasing. So odd.

pkofod · 2017-02-02T11:40:34Z

src/Optim.jl

@@ -17,8 +17,13 @@ module Optim
           Base.setindex!

    export optimize,


any reason ContraintBounds are not exported?

anriseth · 2017-03-06T22:11:47Z

src/types.jl

+                push!(σ, 1)
+                push!(b, li)
+            end
+            ui = u[i]


It seems like ui is already defined in line 472?

musm · 2017-05-31T17:31:23Z

would be awesome :)

pkofod · 2017-05-31T18:06:36Z

would be awesome :)

Unless @timholy decides to finish this himself (ABC, Tim, Always Be Closing!), I wouldn't say it's improbable that I merge some version of this during 2017. We've cleared some older issues recently, and after a more proper loosening of types, I think this is next on the ticket.

timholy · 2017-05-31T18:24:03Z

I'd be happy with that. I haven't merged it because the natural next step is to implement this for something besides Newton (and it's what my lab would actually use). I started looking into that, and decided I didn't have the month to spare to tackle that 😄.

pkofod · 2017-05-31T18:36:32Z

I haven't merged it because the natural next step is to implement this for something besides Newton (and it's what my lab would actually use).

What do you mean by the comment in parentheses? That your lab would use it if something besides Newton was implemented, or that your lab is using this as is?

timholy · 2017-05-31T18:38:00Z

No, we're using the old teh/constrained branch, which supports more methods but has a much worse implementation.

pkofod · 2017-05-31T18:47:10Z

No, we're using the old teh/constrained branch, which supports more methods but has a much worse implementation.

Alright, I follow. I think the first step will be up resolve conflicts (hopefully not as horrible as teh/constrained!!!), and merge this. Then we can always play around with other versions later.

anriseth · 2018-03-03T17:49:19Z

The functionality from this PR is now available at https://github.com/JuliaNLSolvers/ConstrainedOptim.jl

I encourage everyone who are interested to try it out and let us know how it works for you.

The plan is to add some more constrained optimization code that is lying around to ConstrainedOptim.jl, and then move the code back into Optim at some point in the future.

timholy added 9 commits November 4, 2016 05:14

Add constraints and parsing tests

034f2cf

Add pretty-printing of ConstraintBounds

0c7a70f

Implement barrier function and equality-constraints

8b1e03b

Computes and tests the gradient, too. The Hessian will come later.

Fixes for julia-0.4

1ba0abf

Add interior point Newton method state, setup of Newton update equation

7d2ac1e

Add interior-point Newton step update and backtracking linesearch

6441534

Add a principled initialization for μ and λ

50b1582

Based on the notion that we want to (largely) preserve the objective function's initial descent direction.

Add optimize for interior-point methods

7102957

Also implements tracing

Add BaseTestNext to test/REQUIRE

11b0470

johnmyleswhite self-assigned this Nov 8, 2016

timholy added 3 commits November 10, 2016 03:54

Fix state bugs in linesearch and initialization

2cf1421

Add isfeasible and isinterior

f4bb08a

Add more utilities and record more variables with extended_trace

4f55399

timholy changed the title ~~(RFC) Constrained optimization episode 2: revenge of the slack variables~~ Constrained optimization episode 2: revenge of the slack variables Nov 12, 2016

Adopt exact updating of slack terms and λI during linesearch

0e84777

timholy added 4 commits November 23, 2016 11:04

Introduce a primal-dual guard condition

39eb001

Fix tests

ad06e8a

Fix ambiguities, tests on julia 0.4

1cbad48

timholy mentioned this pull request Nov 27, 2016

Generalize Optim.jl to be an interface for nonlinear optimization #309

Closed

timholy mentioned this pull request Dec 22, 2016

Make LinSpace generic JuliaLang/julia#18777

Merged

pkofod reviewed Feb 2, 2017

View reviewed changes

src/Optim.jl

@@ -17,8 +17,13 @@ module Optim

Base.setindex!

export optimize,

Copy link

Member

pkofod Feb 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason ContraintBounds are not exported?

pkofod unassigned johnmyleswhite Feb 13, 2017

pkofod mentioned this pull request Mar 6, 2017

GSoC 2017: Constrained Optimisation #379

Closed

anriseth reviewed Mar 6, 2017

View reviewed changes

src/types.jl

push!(σ, 1)

push!(b, li)

end

ui = u[i]

Copy link

Contributor

anriseth Mar 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like ui is already defined in line 472?

pkofod self-assigned this May 31, 2017

timholy mentioned this pull request Jun 17, 2017

Manifold optimization #433

Closed

pkofod mentioned this pull request Sep 19, 2017

Fix some deprecation warnings #465

Merged

anriseth mentioned this pull request Oct 25, 2017

Citing Optim.jl #478

Closed

anriseth mentioned this pull request Dec 9, 2017

API for fg! #439

Closed

anriseth mentioned this pull request Feb 7, 2018

GSoC 2018 proposals #523

Closed

pkofod closed this May 29, 2018

raphaelchinchilla mentioned this pull request Jul 2, 2020

More details on IPNewton implementation #829

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Constrained optimization episode 2: revenge of the slack variables #303

Constrained optimization episode 2: revenge of the slack variables #303

timholy commented Nov 8, 2016 •

edited

Loading

johnmyleswhite commented Nov 8, 2016

timholy commented Nov 8, 2016

tkelman commented Nov 8, 2016

pkofod commented Nov 8, 2016

timholy commented Nov 8, 2016

tkelman commented Nov 8, 2016

codecov-io commented Nov 8, 2016 •

edited

Loading

timholy commented Nov 9, 2016

tkelman commented Nov 9, 2016 •

edited

Loading

tkelman commented Nov 9, 2016

timholy commented Nov 9, 2016

timholy commented Nov 9, 2016

tkelman commented Nov 9, 2016

timholy commented Nov 9, 2016

timholy commented Nov 12, 2016

timholy commented Nov 23, 2016 •

edited

Loading

anriseth commented Dec 13, 2016

pkofod commented Dec 16, 2016

pkofod commented Dec 17, 2016

pkofod commented Dec 19, 2016 •

edited

Loading

pkofod Feb 2, 2017

anriseth Mar 6, 2017

musm commented May 31, 2017

pkofod commented May 31, 2017

timholy commented May 31, 2017

pkofod commented May 31, 2017

timholy commented May 31, 2017

pkofod commented May 31, 2017

anriseth commented Mar 3, 2018

		@@ -17,8 +17,13 @@ module Optim
		Base.setindex!

		export optimize,

Constrained optimization episode 2: revenge of the slack variables #303

Constrained optimization episode 2: revenge of the slack variables #303

Conversation

timholy commented Nov 8, 2016 • edited Loading

johnmyleswhite commented Nov 8, 2016

timholy commented Nov 8, 2016

tkelman commented Nov 8, 2016

pkofod commented Nov 8, 2016

timholy commented Nov 8, 2016

tkelman commented Nov 8, 2016

codecov-io commented Nov 8, 2016 • edited Loading

Current coverage is 85.64% (diff: 72.41%)

timholy commented Nov 9, 2016

tkelman commented Nov 9, 2016 • edited Loading

tkelman commented Nov 9, 2016

timholy commented Nov 9, 2016

timholy commented Nov 9, 2016

tkelman commented Nov 9, 2016

timholy commented Nov 9, 2016

timholy commented Nov 12, 2016

timholy commented Nov 23, 2016 • edited Loading

anriseth commented Dec 13, 2016

pkofod commented Dec 16, 2016

pkofod commented Dec 17, 2016

pkofod commented Dec 19, 2016 • edited Loading

pkofod Feb 2, 2017

Choose a reason for hiding this comment

anriseth Mar 6, 2017

Choose a reason for hiding this comment

musm commented May 31, 2017

pkofod commented May 31, 2017

timholy commented May 31, 2017

pkofod commented May 31, 2017

timholy commented May 31, 2017

pkofod commented May 31, 2017

anriseth commented Mar 3, 2018

timholy commented Nov 8, 2016 •

edited

Loading

codecov-io commented Nov 8, 2016 •

edited

Loading

tkelman commented Nov 9, 2016 •

edited

Loading

timholy commented Nov 23, 2016 •

edited

Loading

pkofod commented Dec 19, 2016 •

edited

Loading