
Manifold optimization #435

Merged: 26 commits into JuliaNLSolvers:master, Sep 22, 2017
Conversation

antoine-levitt
Contributor

Follow-up from #433. This implements the last option from #433 (comment). Having manifold point and tangent vector types is a nice idea (and it works pretty well), but I got stuck on the typing of points and vectors (they are not Arrays, nor AbstractArrays, so what should Optim type them as?). Anyway, this is not too bad, allows more fine-grained control over when projections and retractions are done, and seems to work well. I fake an objective function for compatibility with line searches. Ideally, LineSearches should be redesigned to take a 1D function as input, but I'm not comfortable enough with the API to do that.

This is a pretty basic implementation, and not optimal by any means: I just project the gradient onto the tangent space and retract the iterate after each update. There may be either too many or too few projections. A proper Riemannian optimization person would possibly be horrified (no fancy vector transport or anything like that), but it works well enough in my tests, and this PR implements the infrastructure to make additional modifications possible. Only the sphere and Stiefel manifolds are implemented right now, but adding other manifolds/different retractions should be very easy. In the future, it would probably be better to split this into a separate Manifolds package, with only the Manifold and Flat types living in Optim (or NLSolversBase?).

I tried to make the modifications as unobtrusive as possible compared to the unconstrained case. In the default case of a flat manifold, everything is a no-op, so there shouldn't be any change to the generated code.

There are some test failures on the counters, but they seem to be present on master already...
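To make the scheme concrete, here is a minimal sketch of the project-then-retract step described above. The sphere projection matches the formula in this PR's src/Manifolds.jl; the `retract!` signature, the driver function, and the fixed step size are simplifying assumptions, not the PR's actual code.

```julia
using LinearAlgebra

struct Sphere end  # the unit sphere, ||x|| == 1

# Remove the radial component of g, leaving the tangential part at x.
project_tangent!(::Sphere, g, x) = (g .-= real(dot(x, g)) .* x)

# Map the iterate back onto the manifold after an unconstrained update.
retract!(::Sphere, x) = (x ./= norm(x))

# One projected-gradient step with a fixed step size α (hypothetical driver;
# Optim uses its usual line searches instead).
function manifold_step!(x, g, ∇f!, α, M)
    ∇f!(g, x)                  # Euclidean gradient at x
    project_tangent!(M, g, x)  # project it onto the tangent space at x
    x .-= α .* g               # take the unconstrained step...
    retract!(M, x)             # ...then retract back onto the manifold
    return x
end
```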

@codecov

codecov bot commented Jun 22, 2017

Codecov Report

Merging #435 into master will decrease coverage by 0.19%.
The diff coverage is 90.51%.


@@           Coverage Diff            @@
##           master    #435     +/-   ##
========================================
- Coverage    90.7%   90.5%   -0.2%     
========================================
  Files          33      34      +1     
  Lines        1570    1674    +104     
========================================
+ Hits         1424    1515     +91     
- Misses        146     159     +13
Impacted Files Coverage Δ
src/api.jl 95.16% <100%> (ø) ⬆️
src/univariate/types.jl 100% <100%> (ø) ⬆️
src/multivariate/solvers/first_order/bfgs.jl 97.61% <100%> (+0.32%) ⬆️
...e/solvers/first_order/momentum_gradient_descent.jl 100% <100%> (ø) ⬆️
src/multivariate/solvers/first_order/cg.jl 89.58% <100%> (+2.08%) ⬆️
...ltivariate/solvers/first_order/gradient_descent.jl 100% <100%> (ø) ⬆️
src/multivariate/solvers/first_order/l_bfgs.jl 98.3% <100%> (+0.22%) ⬆️
src/types.jl 82.14% <100%> (+0.32%) ⬆️
src/multivariate/solvers/constrained/fminbox.jl 81.1% <100%> (ø) ⬆️
...olvers/first_order/accelerated_gradient_descent.jl 100% <100%> (ø) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cded768...44a186b.

@pkofod
Member

pkofod commented Jun 22, 2017

Thanks! Will come back and review at a later point

@pkofod
Member

pkofod commented Jul 4, 2017

Very busy these days, sorry. I haven't forgotten this, and I appreciate the effort!

@antoine-levitt
Contributor Author

It's OK, I'm not in any hurry ;-)

@antoine-levitt
Contributor Author

I've been using this in real life in a pretty involved example (minimizing a functional on collections of unitary matrices of different sizes), so functionally at least this works.

@pkofod pkofod left a comment (Member)

Minor comments. I would love to hear from @cortner if he has any comments before we merge this (I know NLSolversBase needs a tag first).

src/Manifolds.jl Outdated


type ManifoldObjective{T<:NLSolversBase.AbstractObjective} <: NLSolversBase.AbstractObjective
manifold :: Manifold
Member

the space around :: seems quite unusual

src/Manifolds.jl Outdated
project_tangent!(S::Sphere,g,x) = (g .= g .- real(vecdot(x,g)).*x)

# N x n matrices such that X'X = I
# TODO: add more retractions, and support arbitrary inner product
Member

Any chance you would want to open an issue for this? Just so we don't lose track of the TODO

Contributor Author

Done, see #448

src/Manifolds.jl Outdated

# N x n matrices such that X'X = I
# TODO: add more retractions, and support arbitrary inner product
abstract type Stiefel <: Manifold end
Member

I know it's not done all over the code base, but a simple reference or explanation of what the "Stiefel manifold" is would be nice.

Contributor

That's an important but pretty special manifold, no? What is the justification for having it as part of Optim?

Contributor Author

Will document it. The main justification is that it is the one I need in my application ;-) More seriously, it's the basic manifold on which to do this kind of algorithm: it was the original motivation for the theory, many other manifolds (sphere, O(n), U(n)) are special cases, it's probably the most used in applications (at least that I know of) outside of the sphere, and it's a good template for implementing other manifolds.

There could be a Manifolds package living outside Optim, but it's a pretty short file so I would think this is fine, and people implementing other manifolds can just PR on Optim?

Contributor

Good point about the special cases. OK, maybe leave this for now and discuss moving it outside of Optim only when somebody complains.
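For readers unfamiliar with it: the Stiefel manifold is the set of N x n matrices X with orthonormal columns, X'X = I. One standard retraction onto it uses the polar factor of the thin SVD; a hedged sketch (the PR provides an SVD-based retraction, but the function name and standalone form here are illustrative):

```julia
using LinearAlgebra

# The closest matrix with orthonormal columns to X (in Frobenius norm)
# is U*V' from the thin SVD X = U*S*V'.
function retract_stiefel(X::AbstractMatrix)
    F = svd(X)
    return F.U * F.Vt   # satisfies (F.U * F.Vt)' * (F.U * F.Vt) ≈ I
end
```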

@pkofod pkofod requested a review from cortner July 21, 2017 19:58
@pkofod
Member

pkofod commented Jul 21, 2017

There are some conflicts you need to handle (the dot calls, I believe) before this can be merged.

Also, since I would by no means be the person to add it, you should add some information in the documentation about this: something about what it does, how you use it (you need to add the fields in the constructors, for example), and so on. Past experience tells me that if the original PR doesn't do this, it may take a long time for the docs to catch up :)

But overall it looks great! Very happy that you took the time to contribute this extension to Optim.
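For reference, end-user code along the lines the docs ended up describing looks roughly like this (a hedged sketch: it assumes the `manifold` keyword in the solver constructor and the exported `Optim.Sphere` type from this PR):

```julia
using Optim, LinearAlgebra

# Minimize the Rayleigh quotient x'Ax/2 over the unit sphere;
# the minimizer is an eigenvector for the smallest eigenvalue of A.
A = Matrix(Diagonal(1.0:5.0))
f(x)     = dot(x, A * x) / 2
g!(G, x) = (G .= A * x)
x0 = normalize(randn(5))

res = optimize(f, g!, x0, GradientDescent(manifold=Optim.Sphere()))
```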

@cortner
Contributor

cortner commented Jul 21, 2017

will look at it asap.

@pkofod
Member

pkofod commented Jul 21, 2017

will look at it asap.

Thanks, I think I've made @antoine-levitt wait long enough :)

@antoine-levitt antoine-levitt mentioned this pull request Jul 23, 2017

The gradient of a complex-to-real function is defined as the only vector `g` such that `f(x+h) = f(x) + real(g' * h) + O(h^2)`. This is sometimes written `g = df/d(z*) = df/d(re(z)) + i df/d(im(z))`.

Because in general the gradient is not a holomorphic function of `z`, the Hessian is not a well-defined concept and second-order optimization algorithms are not applicable directly. To use second-order optimization, convert to real variables.
Contributor

I don't understand this statement. Or rather, as I read it, it equally applies to the real case. I suggest giving a little more detail?

Contributor Author

What is not clear exactly in the statement above? The point of this paragraph is that the Hessian of a C^n -> R function is a 2n x 2n matrix, which does not map onto n x n complex matrices.

Contributor

Why can't the Hessian of a C^n -> R function be a C^{n x n} matrix? Is the issue that the function is complex -> real rather than complex -> complex? Do you then mean by "in general" that it is never a holomorphic function of z, or that it is sometimes not a holomorphic function of z? I am now wondering whether I got tripped up by your usage of the phrase "in general".

Contributor Author

It certainly can be a C^{n x n} matrix (e.g. for |z|^2), but it can also fail to be one, when the gradient (a C^n -> C^n function) is not holomorphic. Is the new formulation clearer?

@cortner cortner left a comment (Contributor)

I love the implementation; I just left a few trivial comments. (I had just one question, see the Conversation section.)

@cortner
Contributor

cortner commented Jul 25, 2017

@antoine-levitt what is the purpose of real_to_complex? I am guessing this converts real vectors to complex vectors, is a no-op if you start with complex vectors, and is also a no-op if the problem is real to begin with. But I didn't find the definition. Wherever it is defined, maybe add a docstring to it?

@antoine-levitt
Contributor Author

It's in the companion PR in NLSolversBase: JuliaNLSolvers/NLSolversBase.jl#14

@antoine-levitt
Contributor Author

antoine-levitt commented Jul 26, 2017

@ChrisRackauckas moving the discussion here from #440,

This only works if you specify the gradient, right?

Yes, and it does crash when not passed a gradient. Finite differences is JuliaMath/Calculus.jl#115, autodiff is JuliaDiff/ForwardDiff.jl#157


The gradient of a complex-to-real function is defined as the only vector `g` such that `f(x+h) = f(x) + real(g' * h) + O(h^2)`. This is sometimes written `g = df/d(z*) = df/d(re(z)) + i df/d(im(z))`.

The gradient of a C^n to R function is a C^n to C^n map. Even if it is differentiable when seen as a function of R^2n to R^2n, it might not be complex-differentiable. For instance, take f(z) = Re(z)^2. Then g(z) = 2 Re(z), which is not complex-differentiable (holomorphic). Therefore, the Hessian of a C^n to R function is not well-defined as a n x n complex matrix (only as a 2n x 2n real matrix), and therefore second-order optimization algorithms are not applicable directly. To use second-order optimization, convert to real variables.
Contributor

This is clear to me now. But the sentence

Therefore, the Hessian of a C^n to R function is not well-defined as a n x n complex matrix (only as a 2n x 2n real matrix), and therefore second-order optimization algorithms are not applicable directly. To use second-order optimization, convert to real variables. 

still reads as if this is always true, but it is only true in this example (and many others). Maybe you could add a "rarely" somewhere? But I think my original point was valid: if g were complex-differentiable then you could define the Hessian as a C^{n x n} matrix. You are just arguing that this is "rare", correct? But is it more "rare" than the objective f being complex-differentiable?

I have no experience with this at all, so I will take your word for it; I just want the documentation to be clear. If you can add a reference, that would help as well.

Contributor Author

I added a "in general".

I don't know how rare it is. Some simple functions do have a C^{n x n} Hessian, e.g. |z|^2 and z' A z; others don't. I have encountered two cases where the gradient is not holomorphic: |z|^4 (the nonlinearity in the Gross-Pitaevskii equation) and Im(ln(z)) (used in the computation of Wannier functions).

I don't have a good reference for this. Even the definition of the gradient is somewhat non-standard (even though it's clearly the right thing to do for C^n to R optimization). Since half of quantum mechanics is minimizing complex-to-real energies (the other half being perturbation theory), the physics literature is littered with "dE/dz^*" with no details, so this might be covered in a physics textbook somewhere.
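The convention is easy to check numerically; a hypothetical sketch for f(z) = sum(Re(z).^2), whose gradient g = 2 Re(z) is real-valued and hence not holomorphic:

```julia
using LinearAlgebra

# Verify f(x+h) ≈ f(x) + real(g'h) for the convention
# g = df/d(re(z)) + i*df/d(im(z)); here g(z) = 2*real(z).
f(z) = sum(real.(z) .^ 2)
g(z) = 2 .* real.(z)

x = randn(ComplexF64, 3)
h = 1e-6 * randn(ComplexF64, 3)

@assert isapprox(f(x + h) - f(x), real(dot(g(x), h)); atol=1e-10)
```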

@antoine-levitt
Contributor Author

Any remaining obstacles to merging this?

@pkofod
Member

pkofod commented Aug 16, 2017

Not much, I would say! Tests only fail because the base PR is not merged, right?

@antoine-levitt
Contributor Author

I think so, yes (at least they work locally)

@cortner
Contributor

cortner commented Aug 18, 2017

I have no reason to delay merging.

@pkofod
Member

pkofod commented Aug 23, 2017

JuliaLang/METADATA.jl#10875 yayay, we'll be there soon! Sorry for taking so long. It's really a cool addition. I have just been swamped with other stuff!

@anriseth
Contributor

anriseth commented Sep 12, 2017

It was not as easy as we hoped to tag a new NLSolversBase. It seems like we are getting closer; this PR is good to go: JuliaLang/METADATA.jl#11123

That leaves (correct me if I'm wrong):

  • Tagging NLSolversBase.jl 3.0
  • Fixing merge conflicts here
  • Tests pass && merge


@pkofod
Member

pkofod commented Sep 20, 2017

@antoine-levitt I know this is not your fault, but there were some problems with getting NLSolversBase tagged, but there should be nothing stopping us now! Let's get this merged, do you have time to get it up to speed?

@antoine-levitt
Contributor Author

antoine-levitt commented Sep 20, 2017

I'm always confused with git and merges, but I think it's done. Let's see what CI has to say.

edit: still doesn't import the correct NLSolversBase apparently?

@pkofod
Member

pkofod commented Sep 20, 2017

edit: still doesn't import the correct NLSolversBase apparently?

Yeah, I know. I'm on it; will fix it later.

@pkofod pkofod closed this Sep 21, 2017
@pkofod pkofod reopened this Sep 21, 2017
@pkofod
Member

pkofod commented Sep 22, 2017

Well well well, that only took three months! :) Thanks for the PR and your patience.

I'll just say this here, but it applies quite generally: if you have some interesting examples, please share them. I'm collecting a bunch of "minitutorials" for Optim as Jupyter notebooks, and I think it would be cool to show this off with a nice example or two.

@pkofod pkofod merged commit b4057ec into JuliaNLSolvers:master Sep 22, 2017
@antoine-levitt
Contributor Author

Awesome, thanks!

The example I put in the tests is quite nice and self-contained: it computes the first eigenvalues of a matrix by optimizing the Rayleigh quotient under orthogonality constraints. I have other examples, but they are much more involved.
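A condensed sketch in the spirit of that test (hedged: the matrix, sizes, and setup here are illustrative, not the test's actual code):

```julia
using Optim, LinearAlgebra

# Compute the k lowest eigenpairs of a symmetric matrix A by minimizing
# the block Rayleigh quotient tr(X'AX)/2 subject to X'X = I (Stiefel manifold).
n, k = 10, 2
A = Matrix(Diagonal(1.0:n))       # known spectrum 1, 2, ..., n
f(X)     = tr(X' * A * X) / 2
g!(G, X) = (G .= A * X)           # Euclidean gradient; Optim projects/retracts
X0 = Matrix(qr(randn(n, k)).Q)    # orthonormal starting point

res = optimize(f, g!, X0, ConjugateGradient(manifold=Optim.Stiefel()))
# 2 * Optim.minimum(res) ≈ 1 + 2, the sum of the two lowest eigenvalues.
```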
