
Create types for initial step length guess #70

Merged: 25 commits from alphaguess into master, Nov 10, 2017
Conversation

@anriseth (Collaborator) commented Nov 2, 2017

Fixes #69

Currently missing:

  • Documentation. I should list the different InitialGuess types in the README.
  • The extrapolation guess for L-BFGS implemented by @cortner.

I'm not exactly sure how to handle the if-statement in L-BFGS alphaguess, where it checks whether state.pseudo_iteration > 1. Was the intention here to not use extrapolation when L-BFGS is reset due to non-positive-definite Hessian approximation, or is it fine as long as state.f_x_previous is finite? (@cortner if you have time, can you give quick feedback on this, please?)

@anriseth anriseth changed the title Create types for initial step length guess [WIP/RFC] Create types for initial step length guess Nov 2, 2017
@anriseth (Collaborator, Author) commented Nov 2, 2017

I'll set up the Optim side whenever I've dealt with the extrapolation step.
Then we can tag LineSearches, so that Optim can update its alphaguess! functionality.

@@ -115,7 +132,7 @@ function _hagerzhang!(df,
phic, dphic = linefunc!(df, x, s, c, xtmp, true)
end
if !(isfinite(phic) && isfinite(dphic))
println("Warning: failed to achieve finite new evaluation point, using alpha=0")
Base.warn("Failed to achieve finite new evaluation point, using alpha=0")
Member:
why Base.?

Collaborator (Author):
I somehow had it in my head that warn was not exported :o



function (is::InitialHagerZhang)(state, dphi0, df)
if isnan(state.f_x_previous) && isnan(is.α0)
Member:
Is isnan(state.f_x_previous) used to check if this is the very first iteration? Alternatively, we could just pass the iteration number. It may be fine, I'm just curious if that is the purpose. If we keep it this way, you should probably add a comment here.

Collaborator (Author):
Yes, this was my hack to check whether it's the first iteration.

Member:
Alright, it would be nice if there was a comment just mentioning this.
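
For illustration, a minimal standalone demo of the NaN-sentinel pattern under discussion (not the package code): f_x_previous starts as NaN, so isnan detects the first iteration exactly once.

let f_x_previous = NaN
    for iter in 1:3
        # true only on the first pass, before any objective value is recorded
        first_iteration = isnan(f_x_previous)
        println("iteration $iter, first = $first_iteration")
        f_x_previous = float(iter)  # record this iteration's objective value
    end
end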

s::Array,
xtmp::Array,
lsr::LineSearchResults,
psi1::Real = convert(T,0.2),
Member:
Should these just be set in the function body? I mean, how are people going to set these, and more importantly: who is going to set these? :)

Member:
(I know you didn't do this, I'm just asking since we're here now)

Collaborator (Author):
Are you referring to x, s, xtmp and lsr, which are all in state from the Optim perspective,
or to psi1 etc. ?

I agree we can make some more invasive changes while we're at it :)

Member:
I'm talking about all the "optional" positional arguments. Not sure if the original author meant for these to be keywords or not, but since you can't set psi1 anywhere, it might as well just be a variable set in the function body.

Collaborator (Author):
The optional arguments can be set by the user when they call InitialHagerZhang(). The only option you can't set is the iterfinitemax. That was just me not thinking far enough, I'll add that as well.

Collaborator (Author):
Actually, the value of iterfinitemax seems to be determined by the type information, to keep the value of x finite.

So I think there is no reason for the user to set it; and if there were, editing the source code would be no more complicated than anything else.
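
As background for this thread, a generic illustration of the trade-off being debated (plain Julia, not the package API): optional positional arguments are settable, but every earlier optional argument must be supplied first, whereas keywords can be set independently.

f_pos(a, psi1 = 0.2, psi2 = 0.01) = (a, psi1, psi2)  # positional defaults
f_kw(a; psi1 = 0.2, psi2 = 0.01) = (a, psi1, psi2)   # keyword defaults

f_pos(1.0, 0.2, 0.05)   # must repeat psi1's default just to reach psi2
f_kw(1.0, psi2 = 0.05)  # psi2 set by name; psi1 keeps its default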

# TODO: deal with different types for the inputs
function _hzI0(x::Array{T},
gr::Array,
f_x::Real,
Member:
f_x::T?

Collaborator (Author):
I was wondering whether that would be better as well

Collaborator (Author):
I'll change it.

Strictly speaking, however, x and gr are in different spaces, so there may be cases where we want different types / eltypes here.

Member:
Yes, I know. Anyway, it shouldn't really matter.


function (is::InitialStatic)(state, dphi0, df)
state.alpha = is.alpha
if is.scaled == true && (ns = norm(state.s)) > zero(typeof(is.alpha))
Member:
you have typeof(is.alpha) in InitialStatic, so you might as well use it

julia> struct A{T} end

julia> function (a::A{T})(x) where T
           println(convert(T, x))
       end

julia> a=A{Float64}()
A{Float64}()

julia> a(3)
3.0

julia> a(BigInt(3))
3.0

so you can simply add the type parameter

function (is::InitialStatic{T})(state, dphi0, df) where T
...

state.alpha = is.alpha
if is.scaled == true && (ns = norm(state.s)) > zero(typeof(is.alpha))
# TODO: Type instability if there's a type mismatch between is.alpha and ns
state.alpha *= min(is.alpha, ns) / ns
Member:
In most cases eltype(state.s) will be typeof(is.alpha) though, but yeah...
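
Completing the suggestion above into a runnable sketch (StaticGuess is a hypothetical stand-in for InitialStatic): the T parameter lets the callable convert the norm once and avoid mixed-type arithmetic.

using LinearAlgebra: norm

struct StaticGuess{T}
    alpha::T
    scaled::Bool
end

function (is::StaticGuess{T})(s) where T
    alpha = is.alpha
    ns = T(norm(s))  # convert to T up front, avoiding the type instability
    if is.scaled && ns > zero(T)
        alpha *= min(is.alpha, ns) / ns
    end
    alpha
end

StaticGuess(1.0, true)([3.0, 4.0])  # 0.2 == min(1.0, 5.0) / 5.0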

codecov bot commented Nov 2, 2017

Codecov Report

Merging #70 into master will increase coverage by 0.59%.
The diff coverage is 64.64%.


@@            Coverage Diff            @@
##           master     #70      +/-   ##
=========================================
+ Coverage    61.9%   62.5%   +0.59%     
=========================================
  Files           7       8       +1     
  Lines         546     600      +54     
=========================================
+ Hits          338     375      +37     
- Misses        208     225      +17
Impacted Files        Coverage Δ
src/deprecate.jl      0% <0%> (ø)
src/initialguess.jl   100% <100%> (ø)
src/backtracking.jl   81.81% <33.33%> (-12.47%) ⬇️
src/hagerzhang.jl     56.15% <64.15%> (+2.3%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9140e08...ee18b9d.

@anriseth anriseth changed the title [WIP/RFC] Create types for initial step length guess Create types for initial step length guess Nov 5, 2017
@anriseth (Collaborator, Author) commented Nov 5, 2017

I'm satisfied with this now. @pkofod let me know if you're OK with the changes (and the sparse documentation).

@cortner (Contributor) commented Nov 6, 2017

Sorry, I am slow to respond. I am trying to remember what I did, but I can't find the original alphaguess implementation. Where should it be?

@anriseth (Collaborator, Author) commented Nov 6, 2017

> I am trying to remember what I did but can't find the original alphaguess implementation. Where should it be?

I think you are referring to https://github.com/JuliaNLSolvers/Optim.jl/blob/master/src/utilities/perform_linesearch.jl#L30

It seems most likely that this extrapolation should work fine even if the approximate Hessian is no longer SPD (which is when pseudo_iteration is reset). So as long as f_x_previous has a correct value, I think this makes sense.
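
In other words, the guard would reduce to a finiteness check; a minimal sketch of that idea (not the merged code):

# Extrapolate whenever the previous objective value is usable, rather than
# testing state.pseudo_iteration > 1.
use_extrapolation(f_x_previous::Real) = isfinite(f_x_previous)

use_extrapolation(NaN)   # false: no previous value yet, use the default guess
use_extrapolation(1.23)  # true: previous value available, extrapolate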

@cortner (Contributor) commented Nov 6, 2017

Looking at that code, my only concern now is that this guess does not impose an upper bound on alpha. Other than that it looks fine and has worked very well for me in the past.

Anyhow, your question was about state.pseudo_iteration > 1. I don't think I wrote that; if I did, then I don't remember. After a reset, I would normally use 1 for a well-scaled problem, or otherwise one of the heuristics based on |grad|. There are probably some of those already implemented in Optim.

@anriseth (Collaborator, Author) commented Nov 7, 2017

> Looking at that code, my only concern now is that this guess does not impose an upper bound on alpha.

Good point, I'll give the user an option to set a maximum (and minimum) alpha.

> After a reset, I would normally use 1 for a well-scaled problem, or otherwise one of the heuristics based on |grad|. There are probably some of those already implemented in Optim.

I see. I think it should be possible for users to compose the different Initial* objects in a way that lets them do that, as long as the information needed is stored in the state object, dphi0 or df.
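
A hypothetical sketch of such a composition (FallbackGuess is not part of the package; the three-argument call signature mirrors the Initial* callables in this PR):

struct FallbackGuess{A,B}
    primary::A   # e.g. an extrapolation-based guess
    fallback::B  # e.g. a static guess for the first iteration
end

# Dispatch on whether a previous objective value exists in the state.
(g::FallbackGuess)(state, dphi0, df) =
    isfinite(state.f_x_previous) ? g.primary(state, dphi0, df) :
                                   g.fallback(state, dphi0, df)

# Toy demo with a NamedTuple state and constant "guesses":
g = FallbackGuess((s, d, f) -> 2.0, (s, d, f) -> 1.0)
g((f_x_previous = NaN,), 0.0, nothing)  # 1.0: first iteration, fallback
g((f_x_previous = 0.5,), 0.0, nothing)  # 2.0: extrapolation path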

@@ -70,7 +71,7 @@ function (is::InitialQuadratic{T})(state, dphi0, df) where T
αguess = is.α0
else
αguess = 2.0 * (NLSolversBase.value(df) - state.f_x_previous) / dphi0
-    αguess = min(one(T), 1.01*αguess) # See Nocedal+Wright
+    αguess = min(is.αmax, 1.01*αguess) # See Nocedal + Wright, using is.αmax = 1.0
+    αguess = max(is.αmin, αguess)
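
For reference, a standalone sketch of the clamped quadratic rule in this hunk (αmin and αmax mirror the field names in the diff; this is not the package's exact code):

function quadratic_guess(f_x, f_x_previous, dphi0; αmin = 1e-12, αmax = 1.0)
    # Minimizer of the quadratic matching f_x_previous, f_x and the
    # directional derivative dphi0 (Nocedal & Wright, around eq. 3.60).
    αguess = 2 * (f_x - f_x_previous) / dphi0
    αguess = min(αmax, 1.01 * αguess)  # slight inflation, capped at αmax
    max(αmin, αguess)                  # keep the guess bounded away from 0
end

quadratic_guess(0.5, 1.0, -1.0)  # 1.0: raw guess 1.0 inflates past αmax, capped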
Contributor:
I don't think there should be a lower bound on the initial step-length. What if the problem is scaled such that 1e-2 is the optimal step-length? This would not be extraordinarily bad scaling at all. I generally prefer

aguess = max(0.25*aprevious, guess)

If you want amin in there, then the default should be much smaller, I think.

Collaborator (Author):
I've changed the minimum initial step length to 1e-12, which I hope is a default that is small enough for most problems. The proportional decrease can also be useful, thanks.

Regarding the default maximum step length, I set that to 1.0 as it was mentioned as useful for Newton and quasi-Newton problems in Nocedal + Wright (after eqn 3.60 in 2nd edition).

Contributor:
For Newton methods I agree. For quasi-Newton I strongly disagree. The premise that 1.0 is a good step is generally only true in the asymptotic regime, which in many problems you never hit. But maybe in practice this is not so bad, or at least not very often. Maybe leave this for now and see if it causes any trouble.

Collaborator (Author):
> The premise that 1.0 is a good step is generally only true in the asymptotic regime, which in many problems you never hit.

My impression is that part of L-BFGS's big success is that it tends to work very well with a step length of 1.0, as long as one implements the scaling of the inverse Hessian (as I am attempting to do in JuliaNLSolvers/Optim.jl#484).

Contributor:
Good point. Once you scale the inverse Hessian, I am happy!

else
αguess = 2.0 * (NLSolversBase.value(df) - state.f_x_previous) / dphi0
αguess = NaNMath.max(is.αmin, state.alpha*is.ρ, αguess)
αguess = NaNMath.min(is.αmax, 1.01*αguess) # See Nocedal + Wright, using is.αmax = 1.0
Contributor:
continuing from the previous conversation. Why not have a snap2one kind of thing here instead?

Collaborator (Author):
You have almost convinced me that this could be a just-as-good approach. It achieves something similar to what N+W try to do with min(1.01*αguess, αmax); however, I'm not sure exactly what the difference will be in practice.

I also see from your defaults in the Optim code that snap2one = (0.75, Inf), which would be the same as setting αmax = 1.0. Do you think those are good snap2one default values, or shall I use something else?

Over the last week, I've gotten a much better appreciation for how important a good step length guess is. So I'm happy to discuss these properly before implementing some damaging default parameters. (Over the long term, optimizing over default parameters on a large test set would be very neat)
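
For concreteness, a sketch of the snap2one idea as described in this thread (assumed semantics: a raw guess falling inside the bracket is snapped to a unit step; hypothetical helper, not Optim's actual code):

function snap2one(αguess, bracket = (0.75, Inf))
    lo, hi = bracket
    lo <= αguess <= hi ? one(αguess) : αguess
end

snap2one(0.9)  # 1.0: close enough to a unit step, take alpha = 1
snap2one(0.3)  # 0.3: keep the raw guess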

Contributor:
> I also see from your defaults in the Optim code that snap2one = (0.75, Inf), which would be the same as setting αmax = 1.0. Do you think those are good snap2one default values, or shall I use something else?

I'm not sure I set those default values, but either way this sounds ok.

> Over the last week, I've gotten a much better appreciation for how important a good step length guess is. So I'm happy to discuss these properly before implementing some damaging default parameters.

Yes I found that as well when I implemented my first optimisation code.

> (Over the long term, optimizing over default parameters on a large test set would be very neat)

I think that would be fantastic.

@cortner (Contributor) commented Nov 8, 2017

From my perspective I'm happy with this PR, so don't wait for me to merge it in case I don't have time to review again.

@anriseth (Collaborator, Author) commented Nov 9, 2017

This is ready now. @pkofod any final comments?

@anriseth anriseth merged commit 56155bd into master Nov 10, 2017
@pkofod pkofod deleted the alphaguess branch November 10, 2017 16:24
@pkofod (Member) commented Nov 10, 2017

Perfect. Good to see you updated NEWS and README as well!
