Manifold optimization #435
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master     #435     +/-   ##
=========================================
- Coverage    90.7%     90.5%    -0.2%
=========================================
  Files          33        34       +1
  Lines        1570      1674     +104
=========================================
+ Hits         1424      1515      +91
- Misses        146       159      +13
Continue to review full report at Codecov.
Thanks! Will come back and review at a later point.
Very busy these days, sorry... I haven't forgotten this, and I appreciate the effort!
It's OK, I'm not in any hurry ;-)
I've been using this in real life in a pretty involved example (minimizing a functional on collections of unitary matrices of different sizes), so functionally at least this works.
Minor comments. I would love to hear from @cortner if he has any comments before we merge this (I know NLSolversBase needs a tag first).
src/Manifolds.jl
Outdated
type ManifoldObjective{T<:NLSolversBase.AbstractObjective} <: NLSolversBase.AbstractObjective
    manifold :: Manifold
The space around `::` seems quite unusual.
src/Manifolds.jl
Outdated
project_tangent!(S::Sphere,g,x) = (g .= g .- real(vecdot(x,g)).*x)

# N x n matrices such that X'X = I
# TODO: add more retractions, and support arbitrary inner product
Any chance you would want to open an issue for this? Just so we don't lose track of the TODO
Done, see #448
src/Manifolds.jl
Outdated
# N x n matrices such that X'X = I
# TODO: add more retractions, and support arbitrary inner product
abstract type Stiefel <: Manifold end
I know it's not done all over the code base, but a simple reference or explanation of what the "Stiefel manifold" is would be nice.
That's an important but pretty special manifold, no? What is the justification for having it as part of Optim?
Will document it. The main justification is that it is the one I need in my application ;-) More seriously, it's the basic manifold for doing this kind of algorithm on: it was the original motivation for the theory, many other manifolds (sphere, O(n), U(n)) are special cases, it's probably the most used in applications (at least that I know of) outside of the sphere, and it's a good template for implementing other manifolds.
There could be a Manifolds package living outside Optim, but it's a pretty short file so I would think this is fine, and people implementing other manifolds can just PR on Optim?
Good point about the special cases. Ok maybe leave this for now and discuss moving outside of Optim only when somebody complains
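For readers who, like the comment above, would appreciate a quick explanation: the Stiefel manifold is the set of N x n matrices X with X'X = I. Below is a small illustrative sketch; the function names, the Euclidean-metric tangent projection, and the SVD-based (polar) retraction are assumptions for the example, not necessarily what this PR implements.

```julia
using LinearAlgebra

# Stiefel manifold: N x n matrices X with orthonormal columns, i.e. X'X = I.
# Tangent space at X: {Z : X'Z + Z'X = 0}.

# Euclidean-metric projection of a direction G onto the tangent space at X.
# With n = 1 this reduces to the sphere projection quoted in the diff above.
project_tangent_stiefel(X, G) = G - X * ((X'G + G'X) / 2)

# Polar (SVD-based) retraction: the closest matrix with orthonormal columns.
function retract_stiefel(X)
    U, _, V = svd(X)
    return U * V'
end

# Quick check: a retracted perturbation of a Stiefel point is again on the manifold.
X = retract_stiefel(randn(6, 2))
Y = retract_stiefel(X + 0.1 * randn(6, 2))
@assert norm(Y' * Y - I) < 1e-10
```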
There are some conflicts you need to handle (the dot calls, I believe) before this can be merged. Also, as I would by no means be the person to add it, you should add some information in the documentation about this: something about what it does, how you use it (you need to add the fields in the constructors, for example), and so on. Past experience tells me that if the original PR doesn't do this, it may take a long time for the docs to catch up :) But overall it looks great! Very happy that you took your time to contribute this extension to Optim.
Will look at it ASAP.
Thanks, I think I've made @antoine-levitt wait long enough :)
docs/src/algo/complex.md
Outdated
The gradient of a complex-to-real function is defined as the only vector `g` such that `f(x+h) = f(x) + real(g' * h) + O(h^2)`. This is sometimes written `g = df/d(z*) = df/d(re(z)) + i df/d(im(z))`.

Because in general the gradient is not a holomorphic function of `z`, the Hessian is not a well-defined concept and second-order optimization algorithms are not applicable directly. To use second-order optimization, convert to real variables.
I don't understand this statement. Or rather, as I read it, it equally applies to the real case. I suggest giving a little more detail?
What is not clear exactly in the statement above? The point of this paragraph is that the Hessian of a C^n -> R function is a 2n x 2n matrix, which does not map onto n x n complex matrices.
why can't the hessian of a C^n -> R function be a C^{n x n} matrix? Is the issue that the function is complex -> real rather than complex -> complex? Do you then mean by "in general" that it is never a holomorphic function of z or do you mean sometimes it is not a holomorphic function of z? I am now wondering whether I got tripped up by your usage of the word "in general".
It certainly can be a C^{n x n} matrix (e.g. |z|^2) but it can also not be, when the gradient (a C^n -> C^n function) is not holomorphic. Is the new formulation clearer?
I love the implementation - just left a few trivial comments. (I had just one question - see the Conversation section.)
@antoine-levitt what is the purpose of
It's in the companion PR in NLSolversBase: JuliaNLSolvers/NLSolversBase.jl#14
@ChrisRackauckas moving the discussion here from #440,
Yes, and it does crash when not passed a gradient. Finite differences is JuliaMath/Calculus.jl#115, autodiff is JuliaDiff/ForwardDiff.jl#157
docs/src/algo/complex.md
Outdated
The gradient of a complex-to-real function is defined as the only vector `g` such that `f(x+h) = f(x) + real(g' * h) + O(h^2)`. This is sometimes written `g = df/d(z*) = df/d(re(z)) + i df/d(im(z))`.

The gradient of a C^n to R function is a C^n to C^n map. Even if it is differentiable when seen as a function of R^2n to R^2n, it might not be complex-differentiable. For instance, take f(z) = Re(z)^2. Then g(z) = 2 Re(z), which is not complex-differentiable (holomorphic). Therefore, the Hessian of a C^n to R function is not well-defined as a n x n complex matrix (only as a 2n x 2n real matrix), and therefore second-order optimization algorithms are not applicable directly. To use second-order optimization, convert to real variables.
This is clear to me now. But the sentence
Therefore, the Hessian of a C^n to R function is not well-defined as a n x n complex matrix (only as a 2n x 2n real matrix), and therefore second-order optimization algorithms are not applicable directly. To use second-order optimization, convert to real variables.
still reads as if this is always true, but it is only true in this example (and many others). Maybe you could add the "rarely" somewhere? But I think my original point was valid: if g were complex-differentiable then you can define the hessian as a C^{n x n} matrix. You are just arguing that this is "rare"? Correct? But is it more "rare" than the objective f being complex-differentiable?
I have no experience with this at all so will take your word for it - I just want the documentation to be clear. If you can add a reference, that would help as well.
I added an "in general".
I don't know how rare it is. Some simple functions do have a C^{n x n} matrix, e.g. |z|^2 and z' A z; others don't. I have encountered two cases where that happens: |z|^4 (the nonlinearity in the Gross-Pitaevskii equation) and Im(ln(z)) (used in the computation of Wannier functions).
I don't have a good reference for this. Even the definition of the gradient is somewhat non-standard (even though it's clearly the right thing to do for C^n to R optimization). Since half of quantum mechanics is minimizing complex-to-real energies (the other half being perturbation theory), the physics literature is littered with "dE/dz^*" with no details, so this might be covered in a physics textbook somewhere.
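Since the discussion above is somewhat abstract, here is a small hedged sketch (the function names are invented for the example and are not from the PR or its docs) that numerically checks the documented gradient convention on the non-holomorphic example f(z) = Re(z)^2 mentioned above:

```julia
using LinearAlgebra

# f: C^n -> R with f(z) = sum of Re(z_i)^2. In the convention of the docs,
# its gradient is g(z) = 2*Re(z), a real-valued (hence non-holomorphic) map.
f(z) = sum(abs2, real(z))
g(z) = 2 .* real(z)

z = randn(ComplexF64, 4)
h = 1e-6 .* randn(ComplexF64, 4)

# First-order expansion from the docs: f(z + h) ≈ f(z) + real(g(z)' * h) + O(|h|^2).
lhs = f(z + h) - f(z)
rhs = real(g(z)' * h)
@assert abs(lhs - rhs) < 1e-9   # remainder is O(|h|^2) ≈ 1e-12 here
```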
Any remaining obstacles to merging this?
Not much, I would say! Tests only fail because the base PR is not merged, right?
I think so, yes (at least they work locally).
I have no reason to delay merging.
JuliaLang/METADATA.jl#10875 yayay, we'll be there soon! Sorry for taking so long. It's really a cool addition. I have just been swamped with other stuff!
It was not as easy as we hoped to tag a new NLSolversBase. It seems like we are getting closer; this PR is good to go: JuliaLang/METADATA.jl#11123. Then what remains (correct me if I'm wrong):
@antoine-levitt I know this is not your fault: there were some problems with getting NLSolversBase tagged, but there should be nothing stopping us now! Let's get this merged, do you have time to get it up to speed?
I'm always confused by git and merges, but I think it's done. Let's see what CI has to say. Edit: still doesn't import the correct NLSolversBase apparently?
Yeah, I know... I'm on it... Will fix it later.
Well well well, that only took three months! :) Thanks for the PR and your patience. I'll just say this here, but it goes quite generally: if you have some interesting examples, please share them. I'm collecting a bunch of "minitutorials" for Optim as Jupyter notebooks, and I think it would be cool to show this off on a nice example or two.
Awesome, thanks! The example I put in the tests is quite nice and self-contained: it computes the first eigenvalues of a matrix by optimizing the Rayleigh quotient over orthogonality constraints. I have other examples, but they are way more involved.
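For context, here is a minimal sketch of that kind of Rayleigh-quotient example, simplified to a single eigenvector on the unit sphere (the `manifold` keyword and the `Optim.Sphere`/`ConjugateGradient` names follow the discussion above, but they are assumptions to be checked against the merged API; the actual test uses the Stiefel manifold for several eigenvectors at once):

```julia
using Optim, LinearAlgebra

# Find the smallest eigenvalue of a symmetric matrix A by minimizing the
# Rayleigh quotient x'Ax over the unit sphere. The constraint ||x|| = 1 is
# handled by the manifold, so f and g! are the plain unconstrained expressions.
n  = 10
A  = Matrix(Diagonal(1.0:n))        # eigenvalues 1, 2, ..., 10
f(x)     = dot(x, A * x)
g!(G, x) = (G .= 2 .* (A * x))

x0  = normalize(randn(n))
res = optimize(f, g!, x0, ConjugateGradient(manifold = Optim.Sphere()))

Optim.minimum(res)                  # ≈ 1.0, the smallest eigenvalue of A
```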
Follow-up from #433. This implements the last option in #433 (comment). Having manifold point and tangent vector types is a nice idea (and it works pretty well), but I got stuck on the typing of points and vectors (they are not Arrays, nor AbstractArrays, so what should Optim type them as?). Anyway, this is not too bad: it allows more fine-grained control over when projections and retractions are done, and it seems to work well. I fake an objective function for compatibility with line searches. Ideally, LineSearches should be redesigned to take a 1D function as input, but I'm not comfortable enough with the API to do that.
This is a pretty basic implementation, and not optimal by any means: I just project the gradient onto the tangent space and retract the iterate after each update. There are possibly either too many or too few projections. A proper Riemannian optimization person would possibly be horrified (no fancy vector transport or anything like that), but it works well enough in my tests, and this PR implements the infrastructure to make additional modifications possible. Only the spherical and Stiefel manifolds are implemented right now, but adding other manifolds/different retractions should be very easy. In the future, it would probably be better to split this into a separate Manifolds package, with only the Manifold and Flat types living in Optim (or NLSolversBase?).
I tried to make the modifications as unintrusive as possible compared to the unconstrained case. In the default case of a flat manifold, everything is a no-op, so there shouldn't be any change to the generated code.
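To make that description concrete, here is a self-contained sketch (illustrative only; the names and details are not taken from the PR's code) of the "project the gradient, step, retract" update, with the flat manifold reducing everything to ordinary unconstrained gradient descent:

```julia
using LinearAlgebra

abstract type Manifold end
struct Flat   <: Manifold end
struct Sphere <: Manifold end

# Flat manifold: both operations are no-ops, so the unconstrained case is unchanged.
project_tangent!(::Flat, g, x) = g
retract!(::Flat, x) = x

# Unit sphere: remove the radial component of the gradient, then renormalize the iterate.
project_tangent!(::Sphere, g, x) = (g .-= real(dot(x, g)) .* x)
retract!(::Sphere, x) = (x ./= norm(x))

# One step of the scheme described above: project, step, retract.
function manifold_step!(M::Manifold, x, g, α)
    project_tangent!(M, g, x)   # Riemannian gradient at x
    x .-= α .* g                # move along the (projected) direction
    retract!(M, x)              # map the iterate back onto the manifold
    return x
end
```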
There are some test failures on the counters, but they seem to be present on master already...