Added a modified Newton's algorithm to handle both convex and concave regions #123
Conversation
This is an attempt to resolve some of the problems with Newton's method going in the wrong direction. Related to #104 and my discussion with @johnmyleswhite there. To see that it works with the original starting values, run
This seems very promising. I've given you commit access to allow you a little more freedom to work on projects like this, although please do exercise caution when merging changes. Ideally, we'll still do everything via explicit merge requests and review; I just want to lower friction slightly for you and show you that we appreciate the work you've been doing. Do you have some academic citations to help me understand why this approach works?
It looks like it follows the time-honored tradition of forcing the Hessian to be positive definite. Is there literature suggesting this is the best approach? I would have naively thought that adding something proportional to the diagonal would be a better strategy; if you think of each variable as having units, there's no guarantee they are all scaled the same way.
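As a hypothetical illustration of that point (H and tau are made-up names here, not anything in this PR), compare a plain identity shift with a diagonal-proportional, Levenberg-Marquardt-style shift:

# Two variables whose curvatures live on wildly different scales
H = [1e6 0.0; 0.0 1e-6]
tau = 1.0
H + tau * eye(2)           # identity shift: swamps the second variable's curvature
H + tau * diagm(diag(H))   # diagonal-proportional shift: scales with each variable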
You both raise interesting questions - I kind of expected that. I just wanted to get a "globalized Newton's method" started, since I've seen the linesearch question (caused by Newton's method producing an ascent direction) many times. As @timholy mentions, this is simply the good old "force pos-def" trick of adding a number to the diagonal of the Hessian. I chose it because it was simple, and at the very least it serves as a proof of concept. The simplicity is a strength, but of course there are also drawbacks, as Nocedal and Wright mention: "This strategy is quite simple and may be preferable to the modified factorization techniques described next, but it suffers from two drawbacks. The value of τ generated by this procedure may be unnecessarily large, which would bias the modified Newton direction too much toward the steepest descent direction. In addition, every value of τ_k requires a new factorization of A + τ_k*I, which can be quite expensive if several trial values are generated. (The symbolic factorization of A is performed only once, but the numerical factorization must be performed from scratch every time.)"
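For concreteness, here is a minimal sketch of that strategy (illustrative names and trial-shift rule, not the exact code in this PR):

# Sketch of the "add tau to the diagonal until Cholesky succeeds" strategy
# described by Nocedal & Wright; H is the Hessian, g the gradient.
function modified_newton_direction(H, g, beta = 1e-3)
    n = size(H, 1)
    mindiag = minimum(diag(H))
    tau = mindiag > 0 ? 0.0 : -mindiag + beta   # initial trial shift
    while true
        try
            F = cholfact(H + tau * eye(n))      # refactorized for every trial tau
            return -(F \ g)                     # a guaranteed descent direction
        catch
            tau = max(2 * tau, beta)            # grow the shift and try again
        end
    end
end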
Of relevance to the considerations of symbolic/numerical factorization (which refers to sparse matrices): in the v0.4 docs, cholfact for sparse matrices accepts a shift keyword argument.
I didn't know that, @timholy, although I can see it in the v0.4 docs. On v"0.4.0-dev+4986" I get: ERROR: unrecognized keyword argument "shift"
I was just going by what's in the docs, too, and only noticed this a few days ago myself. But I get:
julia> A = sprand(10,10,0.05)
10x10 sparse matrix with 5 Float64 entries:
[1 , 1] = 0.497277
[7 , 1] = 0.650883
[3 , 2] = 0.626572
[4 , 5] = 0.984572
[2 , 9] = 0.123467
julia> B = A'*A
10x10 sparse matrix with 4 Float64 entries:
[1 , 1] = 0.670933
[2 , 2] = 0.392592
[5 , 5] = 0.969383
[9 , 9] = 0.0152441
julia> F = cholfact(B, shift=0.1)
Base.SparseMatrix.CHOLMOD.Factor{Float64,Int64}
type: LLt
method: simplicial
maxnnz: 10
nnz: 10
julia> b = rand(10);
julia> F\b
10-element Array{Float64,1}:
0.395839
1.06682
2.31732
0.69799
0.245143
2.6186
6.78997
6.38511
2.73964
0.411796
julia> full(B+0.1*I)\b
10-element Array{Float64,1}:
0.395839
1.06682
2.31732
0.69799
0.245143
2.6186
6.78997
6.38511
2.73964
0.411796
which seems like it must have worked? (Note: only relevant for sparse matrices.)
I get it. I was trying to do it for a dense matrix. Why isn't that keyword available for dense matrices? @andreasnoack, I will see if I can find some literature (hopefully there are survey articles out there...), but I am open to trying different Hessian modifications instead of this simple version. On the other hand, I have a gut feeling that this might work reasonably well for small problems. However, that is hard to say without some benchmarking. As I said earlier, I really did this PR to get things moving, so I'm open to suggestions (and willing to throw some time at it) :)
For sparse matrices, there are some opportunities that involve making decisions based on the pattern of sparsity, as these decisions amortize over different numerical values for those nonzero entries. For dense matrices, there are no decisions to make: you just plug and chug. So there's no advantage in having such syntax for dense matrices (it's "all numerical").
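In other words (a trivial sketch reusing B and b from the transcript above), for a dense matrix you just form the shifted matrix explicitly and factorize it:

Bd = full(B)                               # dense copy of the sparse B above
F = cholfact(Bd + 0.1 * eye(size(Bd, 1)))  # shift applied by hand
F \ b                                      # same solve as cholfact(B, shift=0.1) above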
I guess it's just a matter of how often it would be used. As a shorthand, it could still be there, although it doesn't add any numerical advantage.
I can't exactly remember my plans with this. The fastest way to get a Newton's method that doesn't fail in concave regions would be to use https://github.com/timholy/PositiveFactorizations.jl . The clear con would be adding a dependency. Have you played around with this since you made the package, @timholy? Are you still certain that it performs better? If it is indeed better, I would be happy to use it instead of the current "solution". What do you say @johnmyleswhite - is adding a dependency a big no-no?
For me it's both fast and reliable. See the README.
Sounds good. If it's used in
I trust @timholy's judgment here.
It's not just "close to zero," it is zero if the Hessian is positive definite. Performance-wise, I seem to remember it was within a factor of 2 of LAPACK's Cholesky factorization, though of course this would be worth verifying independently. If you're looking for reasons not to make it the default, that factor of 2 (if it's there) is the only good one I can think of. But it's worth pointing out that you can't efficiently find out whether the Hessian is positive definite until you start trying to Cholesky-factor it. If it's not positive definite, the factorization throws an error and you have to start from scratch. In contrast, this algorithm just deals with problematic diagonals (those <= 0) as it goes; if it never encounters any, it returns the standard Cholesky decomposition.
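For reference, here is a minimal sketch of what using it could look like (based on my reading of the PositiveFactorizations README; H and g are illustrative names):

using PositiveFactorizations

F = cholfact(Positive, H)   # succeeds even when H is indefinite; equals the
                            # ordinary Cholesky factorization when H is positive definite
d = -(F \ g)                # always a descent direction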
Okay, I must've read the README over there a bit too fast. Looks good. And a factor of 2 is probably no worse than going uphill :) The "fix" currently in this PR is most certainly very slow, so I'm not too worried about that. Orders of magnitude would of course be a bit problematic for larger cases. I'll try it out and report back.
Closed in favour of #177 |