Constrained optimization episode 2: revenge of the slack variables #303
Conversation
Computes and tests the gradient, too. The Hessian will come later.
Based on the notion that we want to (largely) preserve the objective function's initial descent direction.
Also implements tracing
Tim Holy for BDFL!
Noticed that @tkelman doesn't watch this repo, and that he provided very useful feedback on my previous attempt.
Is this a primal algorithm or primal-dual?
Wonderful. We need more positive emoji-"like"-types I see.
Mutated first seems quite standard, so I'll be happy to go through the code and clean it up. Would it be annoying if I did this now?
I'll have a closer look one of these days.
I'm so glad you asked, because as I've learned more I've grown increasingly confused about this point. I form the KKT equations and solve them, so I suspect that many would call this primal-dual. However, the matrices in the primal-dual equations are asymmetric (e.g., the equation after eq. 5), and for generic nonlinear optimization that's problematic because you usually want to enforce positive-definiteness somewhere. So, at least some algorithms (e.g., Ipopt and this one) use a step of elimination to solve for certain constraints, which makes the reduced problem symmetric and amenable to being forced to be positive-definite (e.g., see Wächter & Biegler 2006, "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming," eq. 11). From what I can tell, this basically converts the primal-dual equation back into something that's virtually a primal one---the resulting equation is almost identical to what you'd get if you went the primal route in the first place. Can you clear this up?
I think you've partially misread the Ipopt paper. It doesn't force or require anything to be positive definite - if it did, it would do a very poor job of solving non-convex problems. It's not really constraints that are being eliminated, but complementarity rows of the Newton step direction of the KKT conditions. The KKT linear system that Ipopt solves is symmetric, but indefinite. It requires using a sparse Bunch-Kaufman factorization in order to obtain the inertia of the KKT system at each iteration. It does a perturbation if the inertia does not give n positive and m negative eigenvalues, where n is the number of primal variables and m is the number of constraints. The main difference between primal and primal-dual interior point, if I'm remembering the literature right, is the treatment of the barrier term for inequalities. If you replace inequalities with primal objective barrier terms before posing the KKT conditions and forming the Newton system, you have a primal method and the barrier term enters the objective Hessian in a way that squares the slack. For a primal-dual method, you state the KKT conditions before approximating the inequalities with barrier terms. Instead you relax the complementarity condition to equal the barrier parameter instead of exactly zero. The contribution to the objective Hessian due to the barrier terms (which here aren't actually used in the derivation, it's purely a descriptive analogy) then depends on both the slack and dual variables. This is equivalent at a KKT point, but gives a different path to get there from an initial guess. Typically primal-dual follows the central path better, and gives a Newton system that isn't quite as badly conditioned.
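The two points above (the primal-dual KKT system is unsymmetric until you eliminate the complementarity rows, and the eliminated system looks "virtually primal" with z/x in place of the primal barrier's μ/x²) can be checked numerically. Below is a NumPy sketch on a made-up toy problem min f(x) s.t. x ≥ 0 (hypothetical data, not code from this PR):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Toy data: symmetric positive-definite Hessian H, gradient g, strictly
# feasible primal/dual iterates x > 0, z > 0, and barrier parameter mu.
B = rng.standard_normal((n, n))
H = B @ B.T + np.eye(n)
g = rng.standard_normal(n)
x = rng.random(n) + 0.1
z = rng.random(n) + 0.1
mu = 0.1
X, Z = np.diag(x), np.diag(z)

# Unsymmetric primal-dual Newton system on the relaxed KKT conditions
#   g(x) - z = 0,  x_i z_i = mu:
#   [ H  -I ] [dx]   [ -(g - z)        ]
#   [ Z   X ] [dz] = [ -(x .* z - mu)  ]
K = np.block([[H, -np.eye(n)], [Z, X]])
rhs = np.concatenate([-(g - z), -(x * z - mu)])
dx_full = np.linalg.solve(K, rhs)[:n]

# Eliminate dz = X^{-1}(mu - x.*z) - X^{-1} Z dx.  The reduced system
#   (H + X^{-1} Z) dx = -(g - mu X^{-1} e)
# is symmetric, and is the primal barrier system with z_i/x_i in place
# of mu/x_i^2 (the two coincide on the central path, x_i z_i = mu).
Kred = H + np.diag(z / x)
dx_red = np.linalg.solve(Kred, -(g - mu / x))

print(np.allclose(Kred, Kred.T))     # True: reduced system is symmetric
print(np.allclose(dx_full, dx_red))  # True: identical Newton step
```

The elimination changes nothing about the step; it only exposes a symmetric (though in general still indefinite) system that a factorization like Bunch-Kaufman can exploit.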
Current coverage is 85.64% (diff: 72.41%)

```
@@           master    #303    diff @@
======================================
  Files          27      31      +4
  Lines        1565    2160    +595
  Methods         0       0
  Messages        0       0
  Branches        0       0
======================================
+ Hits         1418    1850    +432
- Misses        147     310    +163
  Partials        0       0
```
I was planning to decline, but given the results of the last few hours I might take you up on that offer, if it's still open. 😯 @tkelman, thanks for diving in. I'm learning a lot by thinking through your points.
Agreed. I simply meant it in the sense that you've introduced additional variables, and they too are subject to "constraints" (the perturbed complementarity condition). Also agreed my choice of words could be confusing, since there are also user-supplied constraints that are in a distinct category.
OK, let's apply that to a simple non-convex problem. I agree, though, that you can't impose positive-definiteness on the total problem, and that the equality constraints are what prevent that. In this PR I only impose nonnegativity on the Hessian for the objective + terms that come from the inequalities. I don't have a convergence proof yet, and I wonder if I should somehow project out the components of the equality constraints first (see the discussion in the Ipopt paper following eq. 12). That's totally doable (at least for dense problems), but before worrying too much about that, I thought I'd see how the simple version fared in tests. As far as the relationship between the primal problem and the primal-dual problem, AFAICT here's the difference: adopting the problem and approximately the notation on this page (i.e., temporarily ignoring equality constraints), the Newton equation from following the primal route is:
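The equation here was an image that did not survive extraction. For the bound-constrained model $\min f(x)$ subject to $x \ge 0$ with log-barrier $-\mu\sum_i \log x_i$ (equality constraints ignored, as stated), the primal-route Newton equation in its standard form is, as a reconstruction (not necessarily the exact notation of the linked page):

```latex
\bigl(\nabla^2 f(x) + \mu X^{-2}\bigr)\,\Delta x
  = -\bigl(\nabla f(x) - \mu X^{-1} e\bigr),
\qquad X = \operatorname{diag}(x),\quad e = (1,\dots,1)^T .
```

The primal-dual route instead yields $\nabla^2 f(x) + X^{-1} Z$ on the left-hand side, with $Z = \operatorname{diag}(z)$; the two coincide on the central path $x_i z_i = \mu$, which is presumably the near-identity referred to above.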
Ah good point, the interpretation of ensuring positive-definiteness of the Hessian block "projected onto the null space of the constraint Jacobian" works. I keep forgetting about that way of looking at it because that's not how it's calculated, unless you look at the unconstrained case.
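The inertia test and the projected-Hessian view are two sides of the same standard theorem: for a full-rank Jacobian $A$, the KKT matrix has inertia $(n, m, 0)$ exactly when $Z^T W Z \succ 0$ for any basis $Z$ of $\operatorname{null}(A)$. A small NumPy/SciPy illustration with toy data (not code from this PR):

```python
import numpy as np
from scipy.linalg import null_space

def inertia(M):
    # (#positive, #negative, #zero) eigenvalues of a symmetric matrix
    w = np.linalg.eigvalsh(M)
    tol = 1e-10 * max(1.0, np.abs(w).max())
    return (int((w > tol).sum()), int((w < -tol).sum()),
            int((np.abs(w) <= tol).sum()))

n, m = 4, 1
A = np.ones((m, n))        # Jacobian of a single constraint sum(x) = c
Z = null_space(A)          # basis for null(A), shape (n, n - m)

def kkt(W):
    return np.block([[W, A.T], [A, np.zeros((m, m))]])

# W positive definite on null(A)  <=>  KKT inertia is (n, m, 0).
W_good = np.eye(n)
print(inertia(kkt(W_good)))              # (4, 1, 0)

# Indefinite W with negative curvature inside null(A): the null-space
# direction (1, 0, 0, -1) gives v' W v = 1 - 5 = -4 < 0, so the inertia
# deviates from (n, m, 0), and a solver like Ipopt would perturb W.
W_bad = np.diag([1.0, 1.0, 1.0, -5.0])
print(inertia(kkt(W_bad)) == (n, m, 0))  # False
```

This is why the inertia check and "positive-definiteness projected onto the constraint null space" are interchangeable criteria, even though only the former is what's actually computed from the factorization.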
Apologies for my skepticism, but I'd personally like to see a convergence proof make it through peer review, and/or demonstrated results across common test sets, before being totally sold on the approach. Pure Julia implementation, abstraction and API design is hard enough work that simultaneous innovation on the algorithm side is a bit beyond my own risk tolerance when comparing to known-quantity existing solvers, but that's coming from someone who isn't in academia any more. You must have a motivating application that you've been developing this for and using it on already, right? Would be interesting if you could point to any details about the kinds of problems where this has been working well, in terms of size, convexity, number of constraints, Hessian and Jacobian structure, etc. Looks like all the Hessian and Jacobian fields here are dense matrices? I hope that's a planned-to-be-temporary limitation?
How do they differ?
I want those things too, and hopefully I'll get to it eventually. But surely you understand the problem? After correction, a Hessian
It doesn't yet have the
AFAICT that's true of all of Optim. Subject of another PR, and it doesn't have to be me. I use L-BFGS, so none of this is relevant anyway. I'm just trying to get something to pass muster, and on #50 people didn't like the fact that it wasn't like the conventional Newton approach. So I'm starting with Newton, even though I have no intention of using it for heavy-duty stuff.
You'll get that term from the gradient of
For constrained problems too?
No, that gives you the
Yes.
OK, I'd say this is no longer WIP: it seems to be working quite well on many test problems. I used CUTEst.jl and selected problems of 100 variables or less (and 100 constraints or less) that had second derivatives. Out of 293 such problems, it errored on 4 (I haven't yet dug into why), failed to converge on 16, and converged on the rest. On 32 problems, it found a better minimum than the one recorded in the CUTEst database, sometimes by quite a lot (possibly an error in the database, of course). Code is https://github.com/timholy/OptimTests.jl. I don't know if you want to incorporate that here, or exactly how you want this tested, but at least now this has been tested pretty thoroughly even if the code isn't part of this PR. This isn't perfect, but I think it's good enough to be useful to folks.
This was motivated by the observation that the deviation-based algorithm doesn't work when there's only one constraint. The predictor algorithm has a little trouble with this case too, but it's not nearly so severe (it *can* increase μ, you just have to prevent it from decreasing it to 0). Moreover, it seems a little more regular in its changes.
Status update on this PR, @timholy?
I wonder why the number of iterations in fminbox differs between the Linux and Mac builds...
@timholy is there anything specific you need to clean up? Do tell if you need a hand, so we don't end up in another rebase from hell!
When you're ready, I suggest you squash some of the 40 commits. This will make handling the merge conflicts easier, I think. I'll be happy to do the actual rebasing once you've squashed it a bit. The funny thing is that the PR that introduced the conflict is almost orthogonal to what happens here, but I still end up with weird conflicts when trying to rebase. For example,
```diff
@@ -17,8 +17,13 @@ module Optim
     Base.setindex!

 export optimize,
```

Any reason `ConstraintBounds` is not exported?
```diff
         push!(σ, 1)
         push!(b, li)
     end
     ui = u[i]
```

It seems like `ui` is already defined in line 472?
would be awesome :)
Unless @timholy decides to finish this himself (ABC, Tim, Always Be Closing!), I wouldn't say it's improbable that I merge some version of this during 2017. We've cleared some older issues recently, and after a more proper loosening of types, I think this is next on the ticket.
I'd be happy with that. I haven't merged it because the natural next step is to implement this for something besides Newton (and it's what my lab would actually use). I started looking into that, and decided I didn't have the month to spare to tackle that 😄.
What do you mean by the comment in parentheses? That your lab would use it if something besides Newton was implemented, or that your lab is using this as is?
No, we're using the old
Alright, I follow. I think the first step will be to resolve conflicts (hopefully not as horrible as teh/constrained!!!) and merge this. Then we can always play around with other versions later.
The functionality from this PR is now available at https://github.com/JuliaNLSolvers/ConstrainedOptim.jl. I encourage everyone who is interested to try it out and let us know how it works for you. The plan is to add some more constrained optimization code that is lying around to ConstrainedOptim.jl, and then move the code back into Optim at some point in the future.
No matter how the American election turns out, today we have something to celebrate: the beginning of what I hope will be a merge-worthy native Julia constrained optimization framework.
I said long ago that I would not work on hard-core optimization without a debugger (errors triggered 1 hour into an optimization run are too painful to fix otherwise), but @Keno has delivered, so now it's time for me to pony up. Besides, I've grown tired of maintaining my own fork of Optim and want access to all the goodies and cleanups (especially the cleanups) that others have contributed.
Rather than implementing the whole framework at once, I decided to try to keep things "simple" (if a 1600-2000 line diff can be called simple) by implementing just one method. I picked the interior-point Newton method, since that seems to be the most widely used and/or described method, and the one for which the iteration is relatively simple. There are also many, many bells and whistles one could add, but I want to get a basic implementation of something merged before going on to other things.

Compared to #50, there are almost too many differences to mention, as this was a clean start. A few of the highlights:
It's also worth briefly mentioning that this might have some nice features, though it needs real-world testing for us to know whether these will materialize. For example, unlike Ipopt this is guaranteed to roughly preserve the descent direction of the user's objective function (it will not choose such a high barrier coefficient that it pushes the solution into a different basin).
Testing is a mixed bag: there are quite a few tests for the low-level mathematics, but overall tests (e.g., ones that involve a call to `optimize`) are a work in progress. I'm hoping to make use of CUTEst (JuliaSmoothOptimizers/CUTEst.jl#43 (comment)) and do a fairly thorough job, but that will take some time.

It's worth noting a few details:
CC @Cody-G, @dpo, @abelsiqueira.