
Modified Newton using PositiveFactorizations #177

Merged: 1 commit merged into master from pkm/modnewt on Mar 31, 2016
Conversation

@pkofod (Member) commented on Mar 1, 2016

I have no idea where I pulled the other request from in #123. I made a branch here instead.

julia> using Optim

julia> prob=Optim.UnconstrainedProblems.examples["Himmelblau"]
Optim.UnconstrainedProblems.OptimizationProblem("Himmelblau",Optim.UnconstrainedProblems.himmelblau,Optim.UnconstrainedProblems.himmelblau_gradient!,Optim.UnconstrainedProblems.himmelblau_hessian!,[2.0,2.0],[3.0,2.0],true,true)

julia> ddf = TwiceDifferentiableFunction(prob.f, prob.g!,prob.h!)
Optim.TwiceDifferentiableFunction(Optim.UnconstrainedProblems.himmelblau,Optim.UnconstrainedProblems.himmelblau_gradient!,fg!,Optim.UnconstrainedProblems.himmelblau_hessian!)

julia> res = optimize(ddf, [0., 0.], ModifiedNewton())
Results of Optimization Algorithm
 * Algorithm: Modified Newton's Method
 * Starting Point: [0.0,0.0]
 * Minimum: [2.9999999999873177,2.0000000000110285]
 * Value of Function at Minimum: 0.000000
 * Iterations: 7
 * Convergence: true
   * |x - x'| < 1.0e-32: false
   * |f(x) - f(x')| / |f(x)| < 1.0e-08: false
   * |g(x)| < 1.0e-08: true
   * Exceeded Maximum Number of Iterations: false
 * Objective Function Calls: 33
 * Gradient Call: 33

julia> optimize(ddf, [0., 0.], Newton())
ERROR: Search direction is not a direction of descent; this error may indicate that user-provided derivatives are inaccurate. (dphia = 23.282051; dphib = 0.103816)
 in error at ./error.jl:21
 [inlined code] from printf.jl:145
 in secant2! at /home/pkm/Optim.jl/src/linesearch/hz_linesearch.jl:402
 in hz_linesearch! at /home/pkm/Optim.jl/src/linesearch/hz_linesearch.jl:321
 in hz_linesearch! at /home/pkm/Optim.jl/src/linesearch/hz_linesearch.jl:176 (repeats 10 times)
 in optimize at /home/pkm/Optim.jl/src/newton.jl:108
 in optimize at /home/pkm/Optim.jl/src/optimize.jl:119

julia> @assert norm(res.minimum - prob.solutions) < 1e-10

Thanks for the wonderful tool, @timholy!

I'm thinking we should really just do this by default, but that's of course something we need to discuss.

An inline review comment was left on these lines of the diff:

iteration += 1

# Search direction is always the negative gradient divided by
# the "closest" positive definite matrix to H. This step is

A reviewer (Contributor) commented:

"closest positive definite matrix to H" isn't quite accurate (the point of the discussion in #153 (comment)). More accurate would be to say "a matrix encoding the absolute values of the curvatures represented by H."

@pkofod (Member, Author) replied:

I copied your formulation, and directed interested people to #153
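
A minimal illustration of that point (my own sketch, not code from this PR; it assumes the cholesky(Positive, H) entry point of PositiveFactorizations):

using PositiveFactorizations, LinearAlgebra

H = [1.0 0.0; 0.0 -2.0]      # indefinite "Hessian" with curvatures +1 and -2
F = cholesky(Positive, H)    # factorization of a positive definite surrogate of H
F.L * F.L'                   # ≈ [1.0 0.0; 0.0 2.0]: the absolute values of the
                             # curvatures, not the nearest positive definite matrix to H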

@timholy (Contributor) commented on Mar 1, 2016

Like you, I don't see a good reason to have an algorithm that breaks on a large fraction of problems, so I agree this should just be the default for our Newton's method.

@pkofod (Member, Author) commented on Mar 1, 2016

Yes, it seems kind of odd. As long as we document that it does not just take the Hessian and perform -(H\gr), I think that would be preferable.
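
To make that documented difference concrete, a small sketch (mine, not from this PR; it assumes the cholesky(Positive, H) API of PositiveFactorizations and a made-up indefinite Hessian) comparing the plain step with the modified one:

using PositiveFactorizations, LinearAlgebra

H  = [1.0 0.0; 0.0 -2.0]                   # hypothetical indefinite Hessian
gr = [0.0, 1.0]
s_plain    = -(H \ gr)                     # plain Newton step
s_modified = -(cholesky(Positive, H) \ gr) # step against the positive definite surrogate
dot(s_plain, gr)                           # > 0: the plain step points uphill here
dot(s_modified, gr)                        # < 0: the modified step is a descent direction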

@pkofod (Member, Author) commented on Mar 1, 2016

Another test case. We're starting it at [2., 2.], which lies in a region where the Hessian is not positive definite, but it should be straightforward to go downhill from there. Newton fails, and ModifiedNewton successfully finds the minimizer [pi, pi].
[Attached image: surface plot of the Easom function]

using Plots, Optim, ForwardDiff
easom(x) = -cos(x[1])*cos(x[2])*exp(-((x[1]-pi)^2 + (x[2]-pi)^2))
default(size=(400,400), fc=:heat)
x, y = 1.2:0.1:4, 1.2:0.1:4
z = Surface((x,y)->easom([x,y]), x, y)
surface(x,y,z, linealpha = 0.3)

g_easom = ForwardDiff.gradient(easom)
h_easom = ForwardDiff.hessian(easom)

optimize(easom, (x, y) -> copy!(y, g_easom(x)), (x,y)->copy!(y, h_easom(x)), [2., 2.], Newton())
optimize(easom, (x, y) -> copy!(y, g_easom(x)), (x,y)->copy!(y, h_easom(x)), [2., 2.], ModifiedNewton())

which gives

julia> optimize(easom, (x, y) -> copy!(y, g_easom(x)), (x,y)->copy!(y, h_easom(x)), [2., 2.], Newton())
ERROR: Search direction is not a direction of descent; this error may indicate that user-provided derivatives are inaccurate. (dphia = 0.015869; dphib = 0.005063)
 in error at ./error.jl:21
 [inlined code] from printf.jl:145
 in secant2! at /home/pkm/Optim.jl/src/linesearch/hz_linesearch.jl:402
 in hz_linesearch! at /home/pkm/Optim.jl/src/linesearch/hz_linesearch.jl:321
 in hz_linesearch! at /home/pkm/Optim.jl/src/linesearch/hz_linesearch.jl:176 (repeats 10 times)
 in optimize at /home/pkm/Optim.jl/src/newton.jl:108
 in optimize at /home/pkm/Optim.jl/src/optimize.jl:137

julia> optimize(easom, (x, y) -> copy!(y, g_easom(x)), (x,y)->copy!(y, h_easom(x)), [2., 2.], ModifiedNewton())
Results of Optimization Algorithm
 * Algorithm: Modified Newton's Method
 * Starting Point: [2.0,2.0]
 * Minimum: [3.141592653589793,3.141592653589793]
 * Value of Function at Minimum: -1.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 1.0e-32: false
   * |f(x) - f(x')| / |f(x)| < 1.0e-08: true
   * |g(x)| < 1.0e-08: true
   * Exceeded Maximum Number of Iterations: false
 * Objective Function Calls: 26
 * Gradient Call: 26

(and no, I couldn't be bothered to find the gradient and Hessian by hand... :) )
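
One can check that the starting point really sits in a region of negative curvature; the following sketch (not from the PR, using the current two-argument ForwardDiff API rather than the 0.4-era one above) inspects the Hessian's eigenvalues at [2., 2.]:

using ForwardDiff, LinearAlgebra

easom(x) = -cos(x[1]) * cos(x[2]) * exp(-((x[1] - pi)^2 + (x[2] - pi)^2))
H22 = ForwardDiff.hessian(easom, [2.0, 2.0])
eigvals(H22)   # one eigenvalue is negative, so the Hessian is not positive definite
               # at [2, 2] and the plain Newton step need not point downhill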

@pkofod (Member, Author) commented on Mar 12, 2016

@johnmyleswhite, what is your stance on simply using PositiveFactorizations in our "regular" Newton's method, given the discussions above and elsewhere in the issues?

@johnmyleswhite (Contributor) commented:

I'll completely defer to Tim here, but it seems clear that we shouldn't default to a version of Newton's method that doesn't work on lots of problems.

@pkofod force-pushed the pkm/modnewt branch 4 times, most recently from cd650cc to 3178a1e on March 13, 2016 at 22:01.
@pkofod (Member, Author) commented on Mar 13, 2016

Okay; I switched gears and simply patched the original Newton's method. As it was, it was not useful for much more than illustrating the original idea behind the algorithm. In practice, there is really no good reason not to handle concave regions intelligently, and I think PositiveFactorizations is a sound approach. If someone is able to improve on this in any way in the future, that is great, but right now it would simply be lovely to get rid of that linesearch error in concave regions that we've had forever.
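
For anyone skimming later, the shape of the patched iteration is roughly as follows. This is a simplified sketch of my own, not the actual src/newton.jl; it assumes cholesky(Positive, H) from PositiveFactorizations, uses crude backtracking in place of hz_linesearch!, and the newton_sketch helper is hypothetical:

using PositiveFactorizations, LinearAlgebra

# Hypothetical helper (not Optim.jl code). g! and h! write the gradient
# and Hessian into their first argument.
function newton_sketch(f, g!, h!, x; iterations = 100, gtol = 1e-8)
    g = similar(x)
    H = Matrix{eltype(x)}(undef, length(x), length(x))
    for _ in 1:iterations
        g!(g, x)
        h!(H, x)
        norm(g) < gtol && break
        F = cholesky(Positive, H)   # positive definite even where H is not
        s = -(F \ g)                # hence always a descent direction
        alpha = 1.0                 # crude backtracking instead of a real line search
        while f(x + alpha * s) > f(x) && alpha > 1e-10
            alpha /= 2
        end
        x += alpha * s
    end
    return x
end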

@mlubin (Contributor) commented on Mar 13, 2016

Converging is better than not converging. 👍

@pkofod (Member, Author) commented on Mar 31, 2016

Since there are no objections, I'm merging this. Currently, it seems to be our best bet for ensuring positive definiteness, and it fixes something that has generated quite a few issues in the past.

@pkofod merged commit b73ace5 into master on Mar 31, 2016.
@pkofod deleted the pkm/modnewt branch on June 1, 2016 at 07:28.
@cossio (Contributor) commented on Jan 5, 2022

Isn't this the same as the "saddle-free Newton method" suggested in this paper: https://arxiv.org/abs/1406.2572?
