Manifold optimization #433
Comments
Definitely interested, and I have thought about it; it just never became urgent enough. I'd also like to add the suggestion of a curved linesearch.
Sure, why not. By your description, I think this is very easy to implement in a way that does not affect the compiled code of our current algorithms for our current types of problems at all (empty fallbacks etc.). Is there some sort of intro reference for this? It is not my field at all!
How does this differ from general equality-constrained optimization?
It can differ because you can force the linesearch to stay within the manifold, instead of searching the full space but relaxing to the manifold via a constraint. But arguably the best way to do this is on the user's side: choosing coordinates which are themselves constrained to be in the manifold (which would then force this on the linesearch). Or you can project to the manifold with each change, adding a nonlinear solve between each step. This can be required though if the equation off the manifold is just not well-behaved.
That's well-known to be an inefficient approach for most problems, because it forces you to take tiny steps or perform a second-order correction to high precision. Of course there are cases where you can't evaluate your objective off the manifold, so there is a place for this, but it should be understood that we'll get a much bigger benefit from solving the general problem. Of course, projection methods are pretty simple in some ways, and so getting something simple merged might be a fast way to get some of the benefits of #303.
In DiffEq the callbacks are powerful enough that these kinds of projection methods can be just implemented as callbacks, and then there's a library of standard callback methods which includes a projection method. Maybe that's a good approach for Optim as well: instead of implementing a projection directly, allow the user to inject a function after each update, and have an example where this performs a manifold projection. This same interface can then serve a lot of other functions as well.
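For concreteness, a minimal sketch of what such a user-injected hook could look like (project_sphere!, toy_gradient_descent! and the post_update! keyword are made-up names for illustration, not an existing Optim or DiffEq API):

```julia
using LinearAlgebra

# Hypothetical hook: project the iterate back onto the unit sphere, ‖x‖ = 1.
project_sphere!(x) = (x ./= norm(x); x)

# Toy optimizer loop showing where such a hook would be called.
function toy_gradient_descent!(x, grad!, g; alpha = 0.1, iters = 100, post_update! = identity)
    for _ in 1:iters
        grad!(g, x)          # fill g with the gradient at x
        x .-= alpha .* g     # ordinary unconstrained step
        post_update!(x)      # user-supplied hook, e.g. a manifold projection
    end
    return x
end
```

The point is that the optimizer itself stays completely manifold-agnostic; all the geometry lives in the user-supplied hook.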
Eigenvalue problems and harmonic maps are examples where staying on the manifold tends to be advantageous, I think.
But in general I agree with @timholy. It's just that some of these special cases are incredibly important.
There's a report on ROPTLIB at http://www.math.fsu.edu/~whuang2/pdf/ROPTLIB_11WH_techrep.pdf that's recent and has references to the literature. The basic idea is that if you want to minimize a function defined on a sphere, instead of going in the direction of the gradient, you project the gradient onto the tangent space to the sphere at the point you're at, do a step, and then map ("retract") back to the sphere. There are subtleties in extending more complex algorithms like CG and BFGS (how do you combine tangent vectors at different points on the manifold? what Hessian should BFGS approximate?) and a whole lot of manifolds and variants. The sphere example is trivial, but the main interest in applications (at least from my perspective) is minimizing functionals on the space of matrices with orthogonal columns (the Stiefel manifold). Compared to general constrained optimization, the benefit is that, when the manifold is simple enough that you can do the projection analytically, you can basically treat it as an unconstrained optimization problem and re-use the same algorithms. The typical use case would be conjugate gradient or BFGS with orthogonality constraints, and I would imagine it's pretty hard to beat this kind of method with a general constrained optimization algorithm (but I don't have a good benchmark).
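To make the sphere example concrete, here is a minimal sketch (the function names and the toy diagonal test matrix are just for illustration, not from ROPTLIB or Optim) of Riemannian gradient descent for the Rayleigh quotient on the unit sphere:

```julia
using LinearAlgebra

retract(x) = x / norm(x)                    # map a point back onto the unit sphere
project_tangent(x, g) = g - dot(x, g) * x   # project g onto the tangent space at x

# Minimize f(x) = x' * A * x over the unit sphere, i.e. approximate the
# eigenvector associated with the smallest eigenvalue of the symmetric matrix A.
function sphere_gradient_descent(A, x0; alpha = 0.1, iters = 2000)
    x = retract(x0)
    for _ in 1:iters
        g  = 2 * (A * x)                    # Euclidean gradient of x'Ax
        rg = project_tangent(x, g)          # Riemannian gradient
        x  = retract(x - alpha * rg)        # step in the tangent direction, then retract
    end
    return x
end

A = Diagonal(0.1:0.1:1.0)                   # toy symmetric matrix with known spectrum
x = sphere_gradient_descent(A, randn(10))   # x should approach ±e_1
```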
Yes, if you can do analytic projection then this is a good strategy. I'd say go for it; @ChrisRackauckas is right about thinking about this primarily in terms of user-supplied functions for performing the projection.
Ah, okay! Well, I still think this will be a change that doesn't hurt anything, and if someone stands to gain by explicitly handling the "manifold constraint" using an analytical projection, then let's go for it.
I had a look at implementing this and it is slightly trickier than I thought it would be. In particular, line searches need to be aware of this, because the line search is not on f(x + alpha d) but on f(R(x + alpha d)), where R is a retraction (projection onto the manifold). I've thought of a few ways of doing this; they would all hook into the place where the trial iterate is updated, which currently looks something like

```julia
@simd for i in 1:n
    @inbounds x_scratch[i] = x[i] + alpha * s[i]
end
```

or simply x .+= alpha.*s.
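For illustration, one way to phrase the manifold-aware 1D problem is to hand the linesearch a closure phi(alpha) = f(R(x + alpha*s)) instead of letting it form x + alpha*s itself (make_linefunc and retract are hypothetical names here, just to sketch the idea):

```julia
using LinearAlgebra: norm

# Build the 1D function phi(alpha) = f(retract(x + alpha * s)) that a
# manifold-aware line search would minimize, reusing a scratch vector.
function make_linefunc(f, retract, x, s)
    x_scratch = similar(x)
    return function (alpha)
        @. x_scratch = x + alpha * s     # trial point in the ambient space
        return f(retract(x_scratch))     # evaluate the objective on the manifold
    end
end

# e.g. on the unit sphere:
# phi = make_linefunc(f, y -> y / norm(y), x, s)
# phi(0.5)
```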
Thoughts?
This should happen anyway, so even if painful it seems unquestionably the way to go.
Without further thinking about it, I quite like the first approach. This would only be feasible when the projections are relatively cheap anyhow, and otherwise one would switch to standard constrained optimisation, I think.
I don't like this; my current experiences with the string method suggest this will be awful.
Agree this will be bug-prone.
Are there any line search algorithms out there where this would not work? If not, then I agree that this makes sense. I would make sure evaluations of the derivative based on the gradient of the objective are propagated back to Optim somehow (I don't think that's very difficult).
The only case I can think of immediately that needs thinking about - and I am not even sure this is an exception - is non-monotone linesearch, because it depends on the history of the optimisation. Does anybody have experience with that?
I don't think non-monotone linesearch fits into the current API either; it's pretty much orthogonal. As far as I can tell, line search is exactly a 1D (approximate) minimization problem, and the linesearches redefine the 1D function (e.g. linefunc! in hagerzhang), so it would make more sense to just pass it a function (and the function can include some machinery to store the gradient evaluations, akin to what's done now).

About a manifold type, I see how it would work with updates like x = x + alpha * d, and even for x .+= d (one would just need to overload broadcast for +, pass it to the data inside x and d, and retract back), but how to do it for something like x .+= alpha.*d1 .+ d2? broadcast gets passed an arbitrary function, which would be hard to analyze. If that can be done it may not be a bad idea... there would just be a type for a point on a manifold, a type for a tangent vector (that stores its base point so it can do vector transport if needed), and the only thing Optim needs to know is that it should project the gradient immediately after computation, and that it should only do "legitimate" manifold operations (i.e. add tangent vectors to points, vecdot() tangent vectors between themselves, etc.). The user would just do optimize(obj_function, gradient_function!, Stiefel(input_data)) and everything would just work.

Basically #399 on steroids: optimization doesn't even need a Hilbert space, it only needs a Riemannian manifold with a retraction (x + d, where x is a point on the manifold and d a tangent vector to the manifold at x), a vector transport (dx + dy, where dx is a tangent vector at x and dy a tangent vector at y) and a projection onto the tangent space. Possibly the same interface could even be reused directly for DiffEq to solve differential equations on manifolds automatically, without callbacks. If that works, it would be pretty awesome ;-)
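As a rough sketch of the interface level this implies (the type and function names are invented here, not an existing API; the QR-based Stiefel retraction is only correct up to column signs, which a real implementation would fix):

```julia
using LinearAlgebra

abstract type Manifold end

struct Sphere  <: Manifold end
struct Stiefel <: Manifold end   # matrices with orthonormal columns

# Retraction: map a point of the ambient space back onto the manifold.
retract(::Sphere,  x::AbstractVector) = x / norm(x)
retract(::Stiefel, X::AbstractMatrix) = Matrix(qr(X).Q)        # thin Q factor

# Projection of an ambient gradient onto the tangent space at the current point.
project_tangent(::Sphere,  x, g) = g - dot(x, g) * x
project_tangent(::Stiefel, X, G) = G - X * ((X'G + G'X) / 2)   # G minus X * sym(X'G)

# The only two touch points an optimizer would need: call project_tangent right
# after the gradient is computed, and retract right after the iterate is updated.
```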
That's right, which is why
I like that. I think we should implement a prototype in a separate branch.
I see - I misunderstood. For what it's worth, I always thought doing everything in-place in those operations is overkill. But maybe I am wrong; maybe it's worth thinking about more carefully.
The original code didn't do things in-place and was noticeably slower. That was long ago in Julia history, but I'd want to see benchmarks before removing that optimization.
Then I suspect that the objectives I am used to are just significantly more costly than the ones you used for the benchmarks.
Anyhow, why can't one dispatch the broadcast on the types of the arguments?
Indeed, using multiple dispatch for optimizations is a large benefit of lowering dot-syntax to broadcast.
The problems that we have in UnconstrainedProblems are so cheap to evaluate that in-place updating makes a difference. Some of the larger problems in CUTEst will take much longer to evaluate the objective and gradients than to update the current iterate. However, I do think that we need to cater to both needs, as I suspect that Optim is used to optimize "cheap" functions repeatedly quite a lot, so there it would be a regression to remove in-place operations, even if only for updating the iterate.
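To put numbers on that tradeoff, a quick benchmark of the two update styles could look like this (using BenchmarkTools; the vector size and step value are arbitrary):

```julia
using BenchmarkTools

n = 1_000_000
x = randn(n); s = randn(n); alpha = 0.5

# In-place update of the iterate: no allocation.
@btime $x .+= $alpha .* $s;

# Out-of-place update: allocates a fresh vector on every call.
@btime $x + $alpha * $s;
```

Whether the difference matters then depends on how expensive the objective and gradient evaluations are relative to these updates.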
Thanks for this. If someone feels like some of the discussion should be continued, please open new issues for each topic (the discussion seemed to go a bit off-topic).
A number of problems involve minimizing a function on a Riemannian manifold defined by geometric constraints. Examples include the eigenvalue problem via minimization of the Rayleigh quotient under orthogonality constraints, nonlinear models in quantum chemistry (e.g. DFT), and several problems in matrix approximation. There are several toolboxes available: ROPTLIB has bindings to Julia, and there's https://github.com/NickMcNutt/ManifoldOptim.jl, but it's abandoned (I emailed the author). One issue with these efforts is that they basically duplicate code from a general unconstrained optimizer.
As far as I can tell, the adaptation of an unconstrained optimizer to Riemannian optimization is pretty easy (at least for first-order algorithms): basically, it amounts to projecting the gradient onto the tangent space immediately after computing it, and retracting iterates back to the manifold after each step. Therefore, an implementation in Optim.jl would not be very intrusive. There would be a Manifold type with subtypes (e.g. Flat, Sphere, Stiefel...), functions like retract(M::Manifold, x) implemented for every subtype, and a manifold option in Options, defaulting to Flat.
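Under that proposal, user-facing code could look roughly like this (hypothetical: the exact option name, defaults, and manifold constructors are precisely what a PR would have to settle; only the optimize(f, g!, x0, method, options) call pattern is the existing Optim API):

```julia
using Optim, LinearAlgebra

A = Diagonal(0.1:0.1:1.0)
f(x)     = dot(x, A * x)            # Rayleigh quotient; ‖x‖ = 1 enforced by the manifold
g!(G, x) = (G .= 2 .* (A * x))

x0 = normalize(randn(10))

# Same unconstrained API as today, plus the proposed manifold option (default Flat()).
res = optimize(f, g!, x0, ConjugateGradient(),
               Optim.Options(manifold = Sphere()))
```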
This issue is here to gauge interest: if I submit a PR implementing the bare bones of this (e.g. gradient descent and CG for the sphere and Stiefel manifolds, and tests for the linear eigenvalue problem), will it get merged? The only potential issue I can see is that it does involve more code to develop/test for every kind of optimizer, but not all of them have to support manifolds, and support can be added incrementally as interest arises.
(cc @cortner, who might be interested in the functionality)