
Added a modified Newton's algorithm to handle both convex and concave… #123

Closed · wants to merge 1 commit

Conversation

@pkofod (Member) commented Jun 10, 2015

… regions.

This is an attempt to resolve some of the problems with Newton's method going in the wrong direction. Related to #104 and my discussion with @johnmyleswhite there. To see that it works with the original starting values, run

using Optim
prob = Optim.UnconstrainedProblems.examples["Himmelblau"]
ddf = TwiceDifferentiableFunction(prob.f, prob.g!, prob.h!)
res = Optim.modified_newton(ddf, [0.,0.])
@assert norm(res.minimum - prob.solutions) < 1e-2

@johnmyleswhite (Contributor)

This seems very promising. I've given you commit access to give you a little more freedom to work on projects like this, although do please exercise caution when merging changes. Ideally, we'll still do everything via explicit merge requests and review. I just want to lower friction slightly for you and show you that we appreciate the work you've been doing.

Do you have some academic citations to help me understand why this approach works?

@timholy (Contributor) commented Jun 10, 2015

It looks like it follows the time-honored tradition of forcing the Hessian to be positive-definite. Is there literature suggesting this is the best approach? I would have naively thought that adding something proportional to the diagonal would be a better strategy; if you think of each variable as having units, there's no guarantee they are scaled the same way.

@pkofod (Member, Author) commented Jun 10, 2015

You both raise interesting questions - I rather expected that. I just wanted to get a "globalized Newton's method" started, since I've seen the linesearch issue (caused by Newton's method producing an ascent direction) come up many times.

As @timholy mentions, this is simply the good old "force pos-def" trick of adding a number to the diagonal of the Hessian. I chose it because it is simple, and at the very least it serves as a proof of concept. The simplicity is a strength, but of course there are also drawbacks, as Nocedal and Wright mention:

"This strategy is quite simple and may be preferable to the modified factorization techniques described next, but it suffers from two drawbacks. The value of τ generated by this procedure may be unnecessarily large, which would bias the modified Newton direction too much toward the steepest descent direction. In addition, every value of τ_k requires a new factorization of A + τ_k*I, which can be quite expensive if several trial values are generated. (The symbolic factorization of A is performed only once, but the numerical factorization must be performed from scratch every time."

@timholy (Contributor) commented Jun 10, 2015

Of relevance to the considerations of symbolic/numerical factorization (which refers to sparse matrices): cholfact allows a shift keyword, which only supports A+τ*I for scalar τ. (In julia 0.3, this is cholfact(A, τ) rather than cholfact(A, shift=τ).) This argues for your strategy and, over the longer term, for some manual effort to exploit the separation between the analysis phase and the numerical computation phase.

@pkofod (Member, Author) commented Jun 10, 2015

I didn't know that, @timholy, although I can see it in the v0.4 docs. On v"0.4.0-dev+4986" I get

ERROR: unrecognized keyword argument "shift"

@timholy (Contributor) commented Jun 10, 2015

I was just going by what's in the docs, too, and only noticed this a few days ago myself. But I get:

julia> A = sprand(10,10,0.05)
10x10 sparse matrix with 5 Float64 entries:
        [1 ,  1]  =  0.497277
        [7 ,  1]  =  0.650883
        [3 ,  2]  =  0.626572
        [4 ,  5]  =  0.984572
        [2 ,  9]  =  0.123467

julia> B = A'*A
10x10 sparse matrix with 4 Float64 entries:
        [1 ,  1]  =  0.670933
        [2 ,  2]  =  0.392592
        [5 ,  5]  =  0.969383
        [9 ,  9]  =  0.0152441

julia> F = cholfact(B, shift=0.1)
Base.SparseMatrix.CHOLMOD.Factor{Float64,Int64}
type:          LLt
method: simplicial
maxnnz:         10
nnz:            10

julia> b = rand(10);

julia> F\b
10-element Array{Float64,1}:
 0.395839
 1.06682 
 2.31732 
 0.69799 
 0.245143
 2.6186  
 6.78997 
 6.38511 
 2.73964 
 0.411796

julia> full(B+0.1*I)\b
10-element Array{Float64,1}:
 0.395839
 1.06682 
 2.31732 
 0.69799 
 0.245143
 2.6186  
 6.78997 
 6.38511 
 2.73964 
 0.411796

which seems like it must have worked? (Note: only relevant for sparse matrices.)

@pkofod (Member, Author) commented Jun 10, 2015

I get it. I was trying to do it for a dense matrix. Why isn't that keyword available for dense matrices? @andreasnoack

I will see if I can find some literature (hopefully there are survey articles out there...), but I am open to trying different Hessian modifications instead of this simple version. On the other hand, I have a gut feeling that this might work reasonably well for small problems, though that is hard to say without some benchmarking. As I said earlier, I really did this PR to get things moving, so I'm open to suggestions (and willing to throw some time at it) :)

@timholy (Contributor) commented Jun 10, 2015

For sparse matrices, there are some opportunities that involve making decisions based on the pattern of sparsity, as these decisions amortize over different numerical values for those nonzero entries.

For dense matrices, there are no decisions to make: you just plug and chug. So there's no advantage in having such syntax for dense matrices (it's "all numerical").
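
In later versions of Julia (post-0.7 syntax, so not the 0.4 cholfact API discussed above), this split is exposed directly for sparse matrices: cholesky performs the symbolic analysis together with the first numerical factorization, and cholesky!(F, A; shift = tau) reuses the stored analysis while redoing only the numerical part for each trial shift. A small sketch:

using LinearAlgebra, SparseArrays

A = sprand(100, 100, 0.05)
B = A'A + 10I                     # symmetric positive definite test matrix
b = rand(100)
F = cholesky(B)                   # symbolic analysis + first numerical factorization
for tau in (0.1, 1.0, 10.0)
    cholesky!(F, B; shift = tau)  # reuses the analysis; numerical work only
    x = F \ b                     # solves (B + tau*I) x = b
end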

@pkofod (Member, Author) commented Jun 10, 2015

I guess it's just a matter of how often it would be used. As a shorthand, I guess it could still be there, although it doesn't add any numerical advantage.

@pkofod mentioned this pull request Dec 1, 2015
@pkofod (Member, Author) commented Feb 29, 2016

I can't exactly remember my plans with this. The fastest way to get a Newton's method that doesn't fail in concave regions would be to use https://github.com/timholy/PositiveFactorizations.jl. The clear con is that it adds a dependency. Have you played around with this since you made the package, @timholy? Are you still certain that it performs better? If it is indeed better, I would be happy to use it instead of the current "solution".

What do you say, @johnmyleswhite - is adding a dependency a big no-no?
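
For reference, a minimal sketch of what using the package looks like (current PositiveFactorizations syntax; H and g are made-up placeholders, not code from this PR):

using PositiveFactorizations, LinearAlgebra

H = [1.0 2.0; 2.0 1.0]     # an indefinite "Hessian": eigenvalues 3 and -1
g = [1.0, 1.0]             # a stand-in gradient
F = cholesky(Positive, H)  # never throws; factors a nearby positive definite matrix
d = -(F \ g)               # a guaranteed descent direction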

@timholy (Contributor) commented Feb 29, 2016

For me it's both fast and reliable. See the teh/constrained branch for an implementation that uses it.

@pkofod (Member, Author) commented Feb 29, 2016

Sounds good. If it's used in teh/constrained, I guess we can use it here as well. I just think we kind of need a Newton's method that doesn't throw a linesearch error in concave regions :) If the correction is close to zero for positive definite matrices, I actually think it should be our standard "Newton", but that is of course up for discussion.

@johnmyleswhite (Contributor)

I trust @timholy's judgment here.

@timholy (Contributor) commented Mar 1, 2016

> If the correction is close to zero for positive definite matrices, I actually think it should be our standard "Newton", but that is of course up for discussion.

It's not just "close to zero," it is zero, if the Hessian is positive definite.

Performance-wise, I seem to remember it was within a factor of 2 of LAPACK's Cholesky factorization, though of course this would be worth verifying independently. If you're looking for reasons not to make it the default, that factor of 2 (if it's there) is the only good reason I can think of. But it's worth pointing out that you can't efficiently find out whether the Hessian is positive definite until you start trying to Cholesky factor it. If it's not positive definite, then you throw an error, and have to start from scratch. In contrast, this algorithm just deals with problematic diagonals (those <= 0) as it goes; if it never encounters any, then it returns the standard Cholesky decomposition.
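
A sketch of that contrast (post-0.7 syntax; newton_direction is a made-up helper): the naive route may waste a full factorization and an exception before falling back, while the Positive factorization handles problematic pivots inline in a single pass:

using PositiveFactorizations, LinearAlgebra

function newton_direction(H, g)
    F = try
        cholesky(Symmetric(H))  # may throw PosDefException after O(n^3) work
    catch err
        err isa PosDefException || rethrow()
        cholesky(Positive, H)   # start over with the modified factorization
    end
    return -(F \ g)
end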

@pkofod (Member, Author) commented Mar 1, 2016

Okay, I must've read the README over there a bit too fast. Looks good. And a factor of 2 is probably no worse than going uphill :) The "fix" currently in this PR is most certainly very slow, so I'm not too worried about that. Orders of magnitude would of course be problematic for larger cases. I'll try it out and report back.

@pkofod (Member, Author) commented Mar 1, 2016

Closed in favour of #177

@pkofod closed this Mar 1, 2016