Manifold optimization #435
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master     #435     +/-   ##
=========================================
- Coverage    90.7%     90.5%    -0.2%
=========================================
  Files          33        34       +1
  Lines        1570      1674     +104
=========================================
+ Hits         1424      1515      +91
- Misses        146       159      +13
Continue to review full report at Codecov.
Thanks! Will come back and review at a later point.
Very busy these days, sorry... I haven't forgotten this, and I appreciate the effort!
It's OK, I'm not in any hurry ;-)
I've been using this in real life in a pretty involved example (minimizing a functional on collections of unitary matrices of different sizes), so functionally at least this works.
Minor comments. I would love to hear from @cortner if he has any comments before we merge this (I know NLSolversBase needs a tag first).
src/Manifolds.jl
Outdated
type ManifoldObjective{T<:NLSolversBase.AbstractObjective} <: NLSolversBase.AbstractObjective
    manifold :: Manifold
The space around `::` seems quite unusual.
src/Manifolds.jl
Outdated
project_tangent!(S::Sphere,g,x) = (g .= g .- real(vecdot(x,g)).*x)

# N x n matrices such that X'X = I
# TODO: add more retractions, and support arbitrary inner product
Any chance you would want to open an issue for this? Just so we don't lose track of the TODO
Done, see #448
src/Manifolds.jl
Outdated
# N x n matrices such that X'X = I
# TODO: add more retractions, and support arbitrary inner product
abstract type Stiefel <: Manifold end
I know it's not done all over the code base, but a simple reference or explanation of what the "Stiefel manifold" is would be nice.
That's an important but pretty special manifold, no? What is the justification for having it as part of Optim?
Will document it. The main justification is that it is the one I need in my application ;-) More seriously, it's the basic manifold for doing this kind of algorithm on: it was the original motivation for the theory, many other manifolds (sphere, O(n), U(n)) are special cases, it's probably the most used in applications (at least that I know of) outside of the sphere, and it's a good template for implementing other manifolds.
There could be a Manifolds package living outside Optim, but it's a pretty short file so I would think this is fine, and people implementing other manifolds can just PR on Optim?
Good point about the special cases. Ok maybe leave this for now and discuss moving outside of Optim only when somebody complains
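For readers who, like the comment above, would appreciate a quick explanation: the Stiefel manifold is the set of N x n matrices X with X'X = I. Below is a small illustrative sketch; the function names, the Euclidean-metric tangent projection, and the SVD-based (polar) retraction are assumptions for the example, not necessarily what this PR implements.

```julia
using LinearAlgebra

# Stiefel manifold: N x n matrices X with orthonormal columns, i.e. X'X = I.
# Tangent space at X: {Z : X'Z + Z'X = 0}.

# Euclidean-metric projection of a direction G onto the tangent space at X.
# With n = 1 this reduces to the sphere projection quoted in the diff above.
project_tangent_stiefel(X, G) = G - X * ((X'G + G'X) / 2)

# Polar (SVD-based) retraction: the closest matrix with orthonormal columns.
function retract_stiefel(X)
    U, _, V = svd(X)
    return U * V'
end

# Quick check: a retracted perturbation of a Stiefel point is again on the manifold.
X = retract_stiefel(randn(6, 2))
Y = retract_stiefel(X + 0.1 * randn(6, 2))
@assert norm(Y' * Y - I) < 1e-10
```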
There are some conflicts you need to handle (the dot calls, I believe) before this can be merged. Also, as I would by no means be the person to add it, you should add some information in the documentation about this: something about what it does, how you use it (you need to add the fields in the constructors, for example), and so on. Past experience tells me that if the original PR doesn't do this, it may take a long time for the docs to catch up :) But overall it looks great! Very happy that you took your time to contribute this extension to Optim.
Will look at it ASAP.
Thanks, I think I've made @antoine-levitt wait long enough :)
docs/src/algo/complex.md
Outdated
The gradient of a complex-to-real function is defined as the only vector `g` such that `f(x+h) = f(x) + real(g' * h) + O(h^2)`. This is sometimes written `g = df/d(z*) = df/d(re(z)) + i df/d(im(z))`.

Because in general the gradient is not a holomorphic function of `z`, the Hessian is not a well-defined concept and second-order optimization algorithms are not applicable directly. To use second-order optimization, convert to real variables.
I don't understand this statement. Or rather, as I read it, it equally applies to the real case. I suggest giving a little more detail?
What is not clear exactly in the statement above? The point of this paragraph is that the Hessian of a C^n -> R function is a 2n x 2n matrix, which does not map onto n x n complex matrices.
why can't the hessian of a C^n -> R function be a C^{n x n} matrix? Is the issue that the function is complex -> real rather than complex -> complex? Do you then mean by "in general" that it is never a holomorphic function of z or do you mean sometimes it is not a holomorphic function of z? I am now wondering whether I got tripped up by your usage of the word "in general".
It certainly can be a C^{n x n} matrix (e.g. |z|^2) but it can also not be, when the gradient (a C^n -> C^n function) is not holomorphic. Is the new formulation clearer?
I love the implementation - just left a few trivial comments. (I had just one question - see the Conversation section.)
@antoine-levitt what is the purpose of
It's in the companion PR in NLSolversBase: JuliaNLSolvers/NLSolversBase.jl#14
@ChrisRackauckas moving the discussion here from #440,
Yes, and it does crash when not passed a gradient. Finite differences is JuliaMath/Calculus.jl#115, autodiff is JuliaDiff/ForwardDiff.jl#157
docs/src/algo/complex.md
Outdated
The gradient of a complex-to-real function is defined as the only vector `g` such that `f(x+h) = f(x) + real(g' * h) + O(h^2)`. This is sometimes written `g = df/d(z*) = df/d(re(z)) + i df/d(im(z))`.

The gradient of a C^n to R function is a C^n to C^n map. Even if it is differentiable when seen as a function of R^2n to R^2n, it might not be complex-differentiable. For instance, take f(z) = Re(z)^2. Then g(z) = 2 Re(z), which is not complex-differentiable (holomorphic). Therefore, the Hessian of a C^n to R function is not well-defined as a n x n complex matrix (only as a 2n x 2n real matrix), and therefore second-order optimization algorithms are not applicable directly. To use second-order optimization, convert to real variables.
This is clear to me now. But the sentence
Therefore, the Hessian of a C^n to R function is not well-defined as a n x n complex matrix (only as a 2n x 2n real matrix), and therefore second-order optimization algorithms are not applicable directly. To use second-order optimization, convert to real variables.
still reads as if this is always true, but it is only true in this example (and many others). Maybe you could add the "rarely" somewhere? But I think my original point was valid: if g were complex-differentiable then you can define the hessian as a C^{n x n} matrix. You are just arguing that this is "rare"? Correct? But is it more "rare" than the objective f being complex-differentiable?
I have no experience with this at all so will take your word for it - I just want the documentation to be clear. If you can add a reference, that would help as well.
I added an "in general".
I don't know how rare it is. Some simple functions do have a C^{n x n} matrix, e.g. |z|^2 and z' A z; others don't. I have encountered two cases where that happens: |z|^4 (the nonlinearity in the Gross-Pitaevskii equation) and Im(ln(z)) (used in the computation of Wannier functions).
I don't have a good reference for this. Even the definition of the gradient is somewhat non-standard (even though it's clearly the right thing to do for C^n to R optimization). Since half of quantum mechanics is minimizing complex-to-real energies (the other half being perturbation theory), the physics literature is littered with "dE/dz^*" with no details, so this might be covered in a physics textbook somewhere.
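Since the discussion above is somewhat abstract, here is a small hedged sketch (the function names are invented for the example and are not from the PR or its docs) that numerically checks the documented gradient convention on the non-holomorphic example f(z) = Re(z)^2 mentioned above:

```julia
using LinearAlgebra

# f: C^n -> R with f(z) = sum of Re(z_i)^2. In the convention of the docs,
# its gradient is g(z) = 2*Re(z), a real-valued (hence non-holomorphic) map.
f(z) = sum(abs2, real(z))
g(z) = 2 .* real(z)

z = randn(ComplexF64, 4)
h = 1e-6 .* randn(ComplexF64, 4)

# First-order expansion from the docs: f(z + h) ≈ f(z) + real(g(z)' * h) + O(|h|^2).
lhs = f(z + h) - f(z)
rhs = real(g(z)' * h)
@assert abs(lhs - rhs) < 1e-9   # remainder is O(|h|^2) ≈ 1e-12 here
```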
Any remaining obstacles to merging this?
Not much, I would say! Tests only fail because the base PR is not merged, right?
I think so, yes (at least they work locally).
I have no reason to delay merging.
JuliaLang/METADATA.jl#10875 yayay, we'll be there soon! Sorry for taking so long. It's really a cool addition. I have just been swamped with other stuff!
It was not as easy as we hoped to tag a new NLSolversBase. It seems like we are getting closer; this PR is good to go: JuliaLang/METADATA.jl#11123. Then what remains (correct me if I'm wrong):
@antoine-levitt I know this is not your fault: there were some problems with getting NLSolversBase tagged, but there should be nothing stopping us now! Let's get this merged, do you have time to get it up to speed?
I'm always confused by git and merges, but I think it's done. Let's see what CI has to say. Edit: still doesn't import the correct NLSolversBase apparently?
Yeah, I know... I'm on it... Will fix it later.
Well well well, that only took three months! :) Thanks for the PR and your patience. I'll just say this here, but it goes quite generally: if you have some interesting examples, please share them. I'm collecting a bunch of "minitutorials" for Optim as Jupyter notebooks, and I think it would be cool to show this off on a nice example or two.
Awesome, thanks! The example I put in the tests is quite nice and self-contained: it computes the first eigenvalues of a matrix by optimizing the Rayleigh quotient over orthogonality constraints. I have other examples, but they are way more involved.
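For context, here is a minimal sketch of that kind of Rayleigh-quotient example, simplified to a single eigenvector on the unit sphere (the `manifold` keyword and the `Optim.Sphere`/`ConjugateGradient` names follow the discussion above, but they are assumptions to be checked against the merged API; the actual test uses the Stiefel manifold for several eigenvectors at once):

```julia
using Optim, LinearAlgebra

# Find the smallest eigenvalue of a symmetric matrix A by minimizing the
# Rayleigh quotient x'Ax over the unit sphere. The constraint ||x|| = 1 is
# handled by the manifold, so f and g! are the plain unconstrained expressions.
n  = 10
A  = Matrix(Diagonal(1.0:n))        # eigenvalues 1, 2, ..., 10
f(x)     = dot(x, A * x)
g!(G, x) = (G .= 2 .* (A * x))

x0  = normalize(randn(n))
res = optimize(f, g!, x0, ConjugateGradient(manifold = Optim.Sphere()))

Optim.minimum(res)                  # ≈ 1.0, the smallest eigenvalue of A
```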
Follow-up from #433. This implements the last option in #433 (comment). Having manifold point and tangent vector types is a nice idea (and it works pretty well), but I got stuck on the typing of points and vectors (they are not Arrays, nor AbstractArrays, so what should Optim type them as?). Anyway, this is not too bad: it allows more fine-grained control over when projections and retractions are done, and it seems to work well. I fake an objective function for compatibility with line searches. Ideally, LineSearches should be redesigned to take a 1D function as input, but I'm not comfortable enough with the API to do that.
This is a pretty basic implementation, and not optimal by any means: I just project the gradient onto the tangent space and retract the iterate after each update. There are possibly either too many or too few projections. A proper Riemannian optimization person would possibly be horrified (no fancy vector transport or anything like that), but it works well enough in my tests, and this PR implements the infrastructure to make additional modifications possible. Only the spherical and Stiefel manifolds are implemented right now, but adding other manifolds/different retractions should be very easy. In the future, it would probably be better to split this into a separate Manifolds package, with only the Manifold and Flat types living in Optim (or NLSolversBase?).
I tried to make the modifications as unintrusive as possible compared to the unconstrained case. In the default case of a flat manifold, everything is a no-op, so there shouldn't be any change to the generated code.
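To make that description concrete, here is a self-contained sketch (illustrative only; the names and details are not taken from the PR's code) of the "project the gradient, step, retract" update, with the flat manifold reducing everything to ordinary unconstrained gradient descent:

```julia
using LinearAlgebra

abstract type Manifold end
struct Flat   <: Manifold end
struct Sphere <: Manifold end

# Flat manifold: both operations are no-ops, so the unconstrained case is unchanged.
project_tangent!(::Flat, g, x) = g
retract!(::Flat, x) = x

# Unit sphere: remove the radial component of the gradient, then renormalize the iterate.
project_tangent!(::Sphere, g, x) = (g .-= real(dot(x, g)) .* x)
retract!(::Sphere, x) = (x ./= norm(x))

# One step of the scheme described above: project, step, retract.
function manifold_step!(M::Manifold, x, g, α)
    project_tangent!(M, g, x)   # Riemannian gradient at x
    x .-= α .* g                # move along the (projected) direction
    retract!(M, x)              # map the iterate back onto the manifold
    return x
end
```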
There are some test failures on the counters, but they seem to be present on master already...