WIP: constrained optimization #50
Conversation
Nice! How difficult would it be to support sparse Hessians and constraint Jacobians? We could then pretty easily use this as the first pure-Julia backend for JuMP.
As you might notice, I began a process of switching declarations over to … For the constraints, my thinking is that the user writes the functions so they add to an existing gradient/Hessian (one computed by the objective function), rather than having the algorithm provide a blank vector/matrix and then copy the results into the combined gradient/Hessian. (See the beginning of …)
The typical interface for these sorts of solvers is to provide a callback to compute the Hessian of the Lagrangian directly, where the solver provides the weights to the user. Check out http://ipoptjl.readthedocs.org/en/latest/ipopt.html. We're soon going to be developing a standardized interface for this in MathProgBase, as we have for linear programming.
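A minimal sketch of what such a callback computes (the names here are illustrative, not Ipopt.jl's actual API; see the linked docs for that): the solver supplies the objective weight `sigma` and the constraint multipliers `lambda`, and the user returns the combined Hessian.

```julia
# Hypothetical sketch of an Ipopt-style "Hessian of the Lagrangian" callback.
# `hess_f(x)` and `hess_c(x, i)` stand in for user-supplied Hessian oracles.
function lagrangian_hessian(x, sigma, lambda, hess_f, hess_c)
    H = sigma * hess_f(x)            # sigma * ∇²f(x)
    for (i, lam) in enumerate(lambda)
        H += lam * hess_c(x, i)      # + lambdaᵢ * ∇²cᵢ(x)
    end
    return H
end
```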
This seems like a great start. I don't see anything that troubles me in a quick review. I'll do a more detailed pass tomorrow morning.
I can see passing Lagrange multipliers/coefficients to the user---that's indeed probably a better design than my attempt to hide it from the user. But there are some other things I don't understand (or don't like):
In your case, I recognize that you simply have to live with the design choices made by Ipopt, but I'm not sure we should mimic it entirely.
Different interior-point approaches do require the full Jacobian. If there are equality constraints, the KKT system is typically formulated as

$$\begin{bmatrix} H & J^T \\ J & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta \lambda \end{bmatrix} = -\begin{bmatrix} \nabla f(x) + J^T \lambda \\ c(x) \end{bmatrix}$$

where $H$ is the Hessian of the Lagrangian and $J$ is the Jacobian of the equality constraints $c(x) = 0$. It's reasonable to have an interface where the terms are summed separately. This is more difficult if you have a sparse Hessian, since you can't easily add to a sparse matrix (but the trick here is that solvers let you provide the matrix in triplet format, and duplicate terms are summed together). I don't have any objections to providing a convenient interface like this, as long as there's a way to write a small wrapper to transform Ipopt-style input to use this code. Note that almost all high-performance nonlinear solvers out there (like KNITRO, MOSEK) have an interface that looks like Ipopt's. They're certainly not user friendly, though they're meant to be used from modeling systems like AMPL, GAMS, and now JuMP that can automatically provide sparse derivatives.
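To make the triplet-format parenthetical concrete: Julia's `sparse` constructor already implements exactly this convention, summing duplicate entries in triplet (COO) input, so separately computed objective and constraint contributions can simply be concatenated.

```julia
using SparseArrays

# Duplicate (row, col) pairs in triplet input are summed, so objective and
# constraint contributions to the same Hessian entry can just be appended.
rows = [1, 2, 1]
cols = [1, 2, 1]
vals = [2.0, 4.0, 3.0]   # two contributions to entry (1, 1)
S = sparse(rows, cols, vals)
S[1, 1] == 5.0           # true: the duplicates were added together
```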
I did specify in my (edited) comment problems without equality constraints, which as far as I could tell from a brief skim is the only kind of problem Ipopt handles. Suppose you add linear equality constraints; for such constraints, there's no need to ask the user to supply anything other than the matrix and rhs---all the derivatives can be handled internally, so again I don't see the need to trouble the user for the full Jacobian. Where one has to think about it is for nonlinear equality constraints, but my suspicion is that there one would be going with an augmented Lagrangian anyway, making it a somewhat different problem. I'll think about it more, however. The solver here "looks" primal but it exploits duality…
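For the record, the reason linear constraints need nothing beyond the matrix and rhs (a standard observation, not specific to this PR):

$$c(x) = Ax - b \;\Rightarrow\; \nabla c(x) = A, \qquad \nabla^2 c_i(x) = 0 \text{ for all } i,$$

so the Jacobian is the constant matrix $A$, the constraint Hessians vanish from the Lagrangian, and the solver can fill in all derivatives itself.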
```diff
@@ -138,6 +138,18 @@ function bfgs{T}(d::Union(DifferentiableFunction,
     f_x_previous, f_x = f_x, d.fg!(x, gr)
     f_calls, g_calls = f_calls + 1, g_calls + 1

     x_converged,
```
Does moving the convergence check here change the inputs? If so, were the previous inputs not correct?
The issue was that there were other exit points from the function (https://github.com/JuliaOpt/Optim.jl/blob/master/src/bfgs.jl#L149). So it was reporting that it hadn't converged, even though that exit path often gets triggered precisely because it did converge.
Having read this more carefully, one question I have is why we should have a …
Most of the serious large-scale solvers use a row-wise interface with a lower and an upper bound for each constraint, one of which can be ±Inf. Equality constraints have `lb[i] == ub[i]`. From https://projects.coin-or.org/Ipopt/wiki: …
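A sketch of the row-wise convention being described (illustrative arrays only, not Ipopt's actual API):

```julia
# Row-wise bounds: lb[i] <= gᵢ(x) <= ub[i], with ±Inf for one-sided
# constraints and lb[i] == ub[i] encoding an equality.
lb = [0.0, -Inf, 1.0]    # g₁(x) ≥ 0;  g₂(x) ≤ 5;  g₃(x) = 1
ub = [Inf,  5.0, 1.0]
is_equality = lb .== ub  # Bool[false, false, true]
```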
Or pick your favorite canonical form; I'm a fan of having just equality constraints and variable bounds, introducing slack variables for inequalities as needed. The KKT system has the Jacobians of both equality and inequality constraints in it; in the sparse non-convex case you don't want to eliminate anything ahead of time, since the sparse symmetric indefinite linear solver will do a better job at reducing fill-in. The zero block in the lower right isn't usually actually zero; an inertia perturbation is required to guarantee convergence properties. When some constraint rows are linear, you could technically leave those nonzeros in the Jacobian untouched between iterations and save a few copies, but the savings there are minimal compared to the KKT factorization cost.
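Roughly, and glossing over the barrier terms, the perturbed KKT matrix referred to here looks like

$$\begin{bmatrix} H + \delta_w I & J^T \\ J & -\delta_c I \end{bmatrix},$$

with $\delta_w, \delta_c \ge 0$ increased until the factorization reports the correct inertia ($n$ positive and $m$ negative eigenvalues).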
This creates an objective function for least squares, but then you can supplement this with constraints. Using this, nonnegative least-squares becomes a few lines:
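The snippet being referred to appears in the PR description below; reproduced here for readability (`linlsq` and `interior` are this branch's functions):

```julia
objective = linlsq(A, b)
result = interior(objective, initial_x, bounds, method=:newton)
```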
As I mentioned above, I tested this on a particular 50x50 problem and found this version to be roughly 100x faster than the existing `nnls` implementation in Optim.
Got it, I hadn't thought through the implications of having two bounds on each (non)linear constraint carefully enough. That will work well for a Newton method, but I have to think through the implications for implementation with CG (which for my problems is the much more important algorithm)---it seems like unifying inequalities and equalities could result in troublesome behavior with respect to roundoff errors.
OK, these can go. Since this PR may take some redesign and I have deadlines, this may sit for a while.

Ok. I'll be excited to see where this heads whenever you pick it up again.
Also support keyword arguments
…oint

When the parameters are Float32, the return value of the function had better be, too, or we could be fooled about the precision.
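An illustration of the pitfall this commit guards against (my example, not the PR's code):

```julia
# A Float64 literal silently promotes a Float32 computation, so downstream
# logic sees more precision than the parameters actually carry.
f_bad(x)  = sum(abs2, x) / 2 + 1e-8            # 1e-8 is a Float64
f_good(x) = sum(abs2, x) / 2 + eps(eltype(x))  # stays in the input's type

x = rand(Float32, 4)
typeof(f_bad(x))    # Float64: we could be "fooled about the precision"
typeof(f_good(x))   # Float32
```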
Force-pushed from 50e0bee to 72f18ea.
I'm back to being interested in this branch again. Reference: https://groups.google.com/forum/#!topic/julia-opt/TVmuXFWfeBM. In contrast, here I see no overhead from the solver, and the progress towards the minimum per iteration also seems better. I haven't yet measured it per function evaluation; that's likely to look a little less favorable, because I noticed that our linesearch not infrequently uses more evaluations than Ipopt. (There's some evidence that might be a good thing (p. 2164), but it seems likely to reduce the speed of optimization.)

I kinda wanted to get out of the business of writing my own optimization code, and maybe I still will if someone comes up with another solution. But I thought I'd at least ping this issue to say it may not be dead yet. Another thing about Optim that I like is that I can use …

Since I've just been playing with this, here are a couple of random API thoughts:
@timholy, I'm open to suggestions like JuliaOpt/MathProgBase.jl#87 on how to make the MPB API easier to both use and implement on the solver side.
This should cover cases where dphi < 0 "by a hair," i.e., where the search direction is almost orthogonal to the gradient. It's an extra layer of security beyond commit 50b9d60 to increase the likelihood that when we terminate, we really have reached a minimum.
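A hypothetical rendering of the kind of guard described (names and tolerance are mine, not the commit's actual code):

```julia
using LinearAlgebra

# Treat a directional derivative dphi = dot(gr, s) that is negative only
# "by a hair", relative to the sizes of the gradient gr and the search
# direction s, as near-orthogonality, i.e. a sign of convergence rather
# than a usable descent direction.
function effectively_orthogonal(dphi, gr, s; rtol = sqrt(eps(Float64)))
    return dphi < 0 && abs(dphi) <= rtol * norm(gr) * norm(s)
end
```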
Other than what I suggested already, I can't think of any ways to improve on what you've done---at the end of the day you have to be able to interact with C code, and the API you've already designed is quite good for that. I just wanted to throw in a tweak or two. But of course new ideas may crop up in the course of implementation.
This no longer assumes that H is positive-definite, instead introducing a check and fix for cases where it does not already hold.
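One common way to implement such a check-and-fix (a sketch under my own assumptions; the PR's actual strategy may differ) is to retry a Cholesky factorization with an increasing diagonal shift:

```julia
using LinearAlgebra

# Shift H toward positive definiteness until Cholesky succeeds.
function posdef_factor(H::AbstractMatrix; tau0 = 1e-8)
    tau = 0.0
    while true
        F = cholesky(Symmetric(H) + tau * I; check = false)
        issuccess(F) && return F
        tau = tau == 0 ? tau0 : 10 * tau   # grow the shift geometrically
    end
end
```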
```diff
@@ -89,7 +89,7 @@ macro cgtrace()
             dt["g(x)"] = copy(gr)
             dt["Current step size"] = alpha
         end
-        grnorm = norm(gr, Inf)
+        grnorm = norm(gr[:], Inf)
```
Can you use `vecnorm(gr, Inf)` here?
Yeah.
Thanks for the comment!
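For context, a small illustration of the suggestion, in the Julia of this era (≤ 0.6, where `vecnorm` existed; in Julia 1.x this role passed to `norm`, with the matrix operator norm renamed to `opnorm`):

```julia
# `vecnorm` treats any array as a flat vector, so it matches
# `norm(gr[:], Inf)` without materializing the copy that `gr[:]` makes.
gr = randn(3, 3)
vecnorm(gr, Inf) == norm(gr[:], Inf)   # true: both are maximum(abs, gr)
```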
I am under the illusion that I'll soon have a few free days to work on Optim, and might want to put this into shape for merging. Any API thoughts about this PR would be welcome; I think I'll need to make quite a few changes.
@timholy, I'd say choose the API that seems most natural to you, so long as it's possible to write a lightweight wrapper to take in MathProgBase input as well. There's no need for the native API of the solver to be exactly MathProgBase.
Aside from the integration into MathProgBase, what I wonder most about is the proper API for specifying constraints. Ipopt, for example, does not appear to have a special category for linear constraints, but I assume that would be a useful thing to specialize on. Likewise, #50 (comment) suggests two possible APIs for specifying equality or inequality constraints. As I think about it, I lean towards making everything an equality constraint and then specifying bounds on the slack variables, but any comments would be welcome. From the user's perspective, the good news is that no matter how crappy an API I design, it will still look pretty when accessed via the remarkable JuMP 😄. But that's probably not something to shoot for.
If you want the API to be appropriate for problems that may have general nonlinear equality constraints, then my personal favorite canonical form (equalities and bounds) hasn't changed over the last 2 years. Conversions between canonical forms are simple, so the user-facing API can use something more general like the row-wise form even if the algorithm internally uses something different. For other classes of problems you may want the internal implementation to use a different form (and/or API), though.

In the Ipopt algorithm, linear constraints can at best save you a subset of Jacobian row evaluations, and those are typically not the bottleneck relative to the solution of the KKT system for the Newton step. You can set an option flag if all constraints happen to be linear, and then the Jacobian will only be evaluated once. Whether it's worth the API complication of allowing situations in between, either via a linearity bitmask or separate inputs for linear vs. nonlinear constraints (Matlab fmincon style), depends mostly on whether you expect Jacobian evaluations to be expensive.
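Spelled out, that favorite canonical form and the conversion from the row-wise form (standard textbook material, not specific to this PR):

$$\min_x\; f(x) \quad \text{s.t.}\quad c(x) = 0,\quad l \le x \le u,$$

where a row-wise constraint $lb_i \le g_i(x) \le ub_i$ becomes the equality $g_i(x) - s_i = 0$ plus the bound $lb_i \le s_i \le ub_i$ on a new slack variable $s_i$.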
…ot eps_gap

eps_gap introduced an absolute scale for convergence, which seemed problematic. This should be more robust, as it monitors the characteristics of the solution. It should also avoid making changes to the barrier that no longer have any consequence for the solution; previous approaches introduced numerical-stability problems when the barrier penalty became extreme.
Also cleans up a bit of the tracing & convergence-testing
Force-pushed from 7b0207c to 276c98d.
Don't delete this branch just yet, though. People may be using it.

Well, we could let it stay forever. It could be used as a "masterclass in git-fu"... I know I've tried to rebase this quite a few times, only to give up halfway through!

Probably a good idea. Certainly people in my lab are using it!
This is not done yet, but I thought I should put it out there so people can comment sooner rather than later, if it looks like this is going in the wrong direction.
When finished, this should implement all the standard inequality constraints; equality constraints may come, but likely will be a little later (I have a pressing need for the inequalities, but not the equalities). The only algorithm I've implemented is a central-path-following interior-point solver---but I'd venture to say that if you could only pick one, interior-point is the way to go.
Using this, a version of `nnls` (just two lines: `objective = linlsq(A, b); result = interior(objective, initial_x, bounds, method=:newton)`) is 100x faster on a 50x50 problem compared to the existing implementation of `nnls` in `Optim`.

Left to do: …

In addition to adding support for constrained optimization, this makes several relevant improvements to `hz_linesearch`, in particular ones that are smarter about some of the odd behaviors of floating-point numbers, and others that will make constrained optimization perform significantly better. Thanks to @JeffBezanson for his quick work on JuliaLang/julia#6097, making it feasible to debug some of the darker corners of `hz_linesearch`.

One final point: note that this changes how the `problems` directory is handled. I changed it because it wasn't obvious to me that we wanted this to be a regular part of loading `Optim`. Perhaps we should go one step further, and only load it when running tests (and not even have `testpaths` in `Optim.jl`).