
Create types for initial step length guess #70

Merged: 25 commits from alphaguess into master, Nov 10, 2017
Conversation

@anriseth (Collaborator) commented Nov 2, 2017

Fixes #69

Currently missing:

  • Documentation. I should list the different InitialGuess types in the README.
  • The extrapolation guess for L-BFGS implemented by @cortner.

I'm not exactly sure how to handle the if-statement in L-BFGS alphaguess, where it checks whether state.pseudo_iteration > 1. Was the intention here to not use extrapolation when L-BFGS is reset due to non-positive-definite Hessian approximation, or is it fine as long as state.f_x_previous is finite? (@cortner if you have time, can you give quick feedback on this, please?)

@anriseth anriseth changed the title Create types for initial step length guess [WIP/RFC] Create types for initial step length guess Nov 2, 2017
@anriseth (Collaborator, Author) commented Nov 2, 2017

I'll set up the Optim side whenever I've dealt with the extrapolation step.
Then we can tag LineSearches, so that Optim can update its alphaguess! functionality.

@@ -115,7 +132,7 @@ function _hagerzhang!(df,
phic, dphic = linefunc!(df, x, s, c, xtmp, true)
end
if !(isfinite(phic) && isfinite(dphic))
println("Warning: failed to achieve finite new evaluation point, using alpha=0")
Base.warn("Failed to achieve finite new evaluation point, using alpha=0")
Member:
why Base.?

Collaborator (Author):
I somehow had it in my head that warn was not exported :o



function (is::InitialHagerZhang)(state, dphi0, df)
if isnan(state.f_x_previous) && isnan(is.α0)
Member:
Is isnan(state.f_x_previous) used to check if this is the very first iteration? Alternatively, we could just pass the iteration number. It may be fine, I'm just curious if that is the purpose. If we keep it this way, you should probably add a comment here.

Collaborator (Author):
Yes, this was my hack to check whether it's the first iteration.

Member:
Alright, it would be nice if there was a comment just mentioning this.
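
For illustration, a minimal standalone demo of the NaN-sentinel pattern under discussion (not the package code): f_x_previous starts as NaN, so isnan detects the first iteration exactly once.

let f_x_previous = NaN
    for iter in 1:3
        # true only on the first pass, before any objective value is recorded
        first_iteration = isnan(f_x_previous)
        println("iteration $iter, first = $first_iteration")
        f_x_previous = float(iter)  # record this iteration's objective value
    end
end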

s::Array,
xtmp::Array,
lsr::LineSearchResults,
psi1::Real = convert(T,0.2),
Member:
Should these just be set in the function body? I mean, how are people going to set these, and more importantly: who is going to set these? :)

Member:
(I know you didn't do this, I'm just asking since we're here now)

Collaborator (Author):
Are you referring to x, s, xtmp and lsr, which are all in state from the Optim perspective,
or to psi1 etc. ?

I agree we can make some more invasive changes while we're at it :)

Member:
I'm talking about all the "optional" positional arguments. Not sure if the original author meant for these to be keywords or not, but since you can't set psi1 anywhere, it might as well just be a variable set in the function body.

Collaborator (Author):
The optional arguments can be set by the user when they call InitialHagerZhang(). The only option you can't set is the iterfinitemax. That was just me not thinking far enough, I'll add that as well.

Collaborator (Author):
Actually, the value of iterfinitemax seems to be determined by the type information, to keep the value of x finite.

So I think there is no reason for the user to set it; and if there were, editing the source code would be no more complicated than anything else.
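
As background for this thread, a generic illustration of the trade-off being debated (plain Julia, not the package API): optional positional arguments are settable, but every earlier optional argument must be supplied first, whereas keywords can be set independently.

f_pos(a, psi1 = 0.2, psi2 = 0.01) = (a, psi1, psi2)  # positional defaults
f_kw(a; psi1 = 0.2, psi2 = 0.01) = (a, psi1, psi2)   # keyword defaults

f_pos(1.0, 0.2, 0.05)   # must repeat psi1's default just to reach psi2
f_kw(1.0, psi2 = 0.05)  # psi2 set by name; psi1 keeps its default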

# TODO: deal with different types for the inputs
function _hzI0(x::Array{T},
gr::Array,
f_x::Real,
Member:
f_x::T?

Collaborator (Author):
I was wondering whether that would be better as well

Collaborator (Author):
I'll change it.

Strictly speaking, however, x and gr are in different spaces, so there may be cases where we want different types / eltypes here.

Member:
Yes, I know. Anyway, it shouldn't really matter.


function (is::InitialStatic)(state, dphi0, df)
state.alpha = is.alpha
if is.scaled == true && (ns = norm(state.s)) > zero(typeof(is.alpha))
Member:
you have typeof(is.alpha) in InitialStatic, so you might as well use it

julia> struct A{T} end

julia> function (a::A{T})(x) where T
           println(convert(T, x))
       end

julia> a=A{Float64}()
A{Float64}()

julia> a(3)
3.0

julia> a(BigInt(3))
3.0

so you can simply add the type parameter

function (is::InitialStatic{T})(state, dphi0, df) where T
...

state.alpha = is.alpha
if is.scaled == true && (ns = norm(state.s)) > zero(typeof(is.alpha))
# TODO: Type instability if there's a type mismatch between is.alpha and ns
state.alpha *= min(is.alpha, ns) / ns
Member:
In most cases eltype(state.s) will be typeof(is.alpha) though, but yeah...
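
Completing the suggestion above into a runnable sketch (StaticGuess is a hypothetical stand-in for InitialStatic): the T parameter lets the callable convert the norm once and avoid mixed-type arithmetic.

using LinearAlgebra: norm

struct StaticGuess{T}
    alpha::T
    scaled::Bool
end

function (is::StaticGuess{T})(s) where T
    alpha = is.alpha
    ns = T(norm(s))  # convert to T up front, avoiding the type instability
    if is.scaled && ns > zero(T)
        alpha *= min(is.alpha, ns) / ns
    end
    alpha
end

StaticGuess(1.0, true)([3.0, 4.0])  # 0.2 == min(1.0, 5.0) / 5.0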

codecov bot commented Nov 2, 2017

Codecov Report

Merging #70 into master will increase coverage by 0.59%.
The diff coverage is 64.64%.


@@            Coverage Diff            @@
##           master     #70      +/-   ##
=========================================
+ Coverage    61.9%   62.5%   +0.59%     
=========================================
  Files           7       8       +1     
  Lines         546     600      +54     
=========================================
+ Hits          338     375      +37     
- Misses        208     225      +17
Impacted Files        Coverage Δ
src/deprecate.jl      0% <0%> (ø)
src/initialguess.jl   100% <100%> (ø)
src/backtracking.jl   81.81% <33.33%> (-12.47%) ⬇️
src/hagerzhang.jl     56.15% <64.15%> (+2.3%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9140e08...ee18b9d.

@anriseth anriseth changed the title [WIP/RFC] Create types for initial step length guess Create types for initial step length guess Nov 5, 2017
@anriseth (Collaborator, Author) commented Nov 5, 2017

I'm satisfied with this now. @pkofod let me know if you're OK with the changes (and the sparse documentation).

@cortner (Contributor) commented Nov 6, 2017

Sorry, I am slow to respond. I am trying to remember what I did, but I can't find the original alphaguess implementation. Where should it be?

@anriseth (Collaborator, Author) commented Nov 6, 2017

> I am trying to remember what I did but can't find the original alphaguess implementation. Where should it be?

I think you are referring to https://github.com/JuliaNLSolvers/Optim.jl/blob/master/src/utilities/perform_linesearch.jl#L30

It seems most likely that this extrapolation should work fine even if the approximate Hessian is no longer SPD (which is when pseudo_iteration is reset). So as long as f_x_previous has a correct value, I think this makes sense.
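
In other words, the guard would reduce to a finiteness check; a minimal sketch of that idea (not the merged code):

# Extrapolate whenever the previous objective value is usable, rather than
# testing state.pseudo_iteration > 1.
use_extrapolation(f_x_previous::Real) = isfinite(f_x_previous)

use_extrapolation(NaN)   # false: no previous value yet, use the default guess
use_extrapolation(1.23)  # true: previous value available, extrapolate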

@cortner (Contributor) commented Nov 6, 2017

Looking at that code, my only concern now is that this guess does not impose an upper bound on alpha. Other than that it looks fine and has worked very well for me in the past.

Anyhow, your question was about state.pseudo_iteration > 1. I don't think I wrote that; if I did, then I don't remember. After a reset, I would normally use 1 for a well-scaled problem, or otherwise one of the heuristics based on |grad|. There are probably some of those already implemented in Optim.

@anriseth (Collaborator, Author) commented Nov 7, 2017

> Looking at that code, my only concern now is that this guess does not impose an upper bound on alpha.

Good point, I'll give the user an option to set a maximum (and minimum) alpha.

> After a reset, I would normally use 1 for a well-scaled problem, or otherwise one of the heuristics based on |grad|. There are probably some of those already implemented in Optim.

I see. I think it should be possible for users to compose the different Initial* objects in a way that lets them do that, as long as the information needed is stored in the state object, dphi0 or df.
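
A hypothetical sketch of such a composition (FallbackGuess is not part of the package; the three-argument call signature mirrors the Initial* callables in this PR):

struct FallbackGuess{A,B}
    primary::A   # e.g. an extrapolation-based guess
    fallback::B  # e.g. a static guess for the first iteration
end

# Dispatch on whether a previous objective value exists in the state.
(g::FallbackGuess)(state, dphi0, df) =
    isfinite(state.f_x_previous) ? g.primary(state, dphi0, df) :
                                   g.fallback(state, dphi0, df)

# Toy demo with a NamedTuple state and constant "guesses":
g = FallbackGuess((s, d, f) -> 2.0, (s, d, f) -> 1.0)
g((f_x_previous = NaN,), 0.0, nothing)  # 1.0: first iteration, fallback
g((f_x_previous = 0.5,), 0.0, nothing)  # 2.0: extrapolation path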

@@ -70,7 +71,7 @@ function (is::InitialQuadratic{T})(state, dphi0, df) where T
αguess = is.α0
else
αguess = 2.0 * (NLSolversBase.value(df) - state.f_x_previous) / dphi0
-    αguess = min(one(T), 1.01*αguess) # See Nocedal+Wright
+    αguess = min(is.αmax, 1.01*αguess) # See Nocedal + Wright, using is.αmax = 1.0
+    αguess = max(is.αmin, αguess)
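
For reference, a standalone sketch of the clamped quadratic rule in this hunk (αmin and αmax mirror the field names in the diff; this is not the package's exact code):

function quadratic_guess(f_x, f_x_previous, dphi0; αmin = 1e-12, αmax = 1.0)
    # Minimizer of the quadratic matching f_x_previous, f_x and the
    # directional derivative dphi0 (Nocedal & Wright, around eq. 3.60).
    αguess = 2 * (f_x - f_x_previous) / dphi0
    αguess = min(αmax, 1.01 * αguess)  # slight inflation, capped at αmax
    max(αmin, αguess)                  # keep the guess bounded away from 0
end

quadratic_guess(0.5, 1.0, -1.0)  # 1.0: raw guess 1.0 inflates past αmax, capped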
Contributor:
I don't think there should be a lower bound on the initial step-length. What if the problem is scaled such that 1e-2 is the optimal step-length? This would not be extraordinarily bad scaling at all. I generally prefer

aguess = max(0.25*aprevious, guess)

If you want amin in there, then the default should be much smaller, I think.

Collaborator (Author):
I've changed the minimum initial step length to 1e-12, which I hope is a default that is small enough for most problems. The proportional decrease can also be useful, thanks.

Regarding the default maximum step length, I set that to 1.0 as it was mentioned as useful for Newton and quasi-Newton problems in Nocedal + Wright (after eqn 3.60 in 2nd edition).

Contributor:
For Newton methods I agree. For quasi-Newton I strongly disagree. The premise that 1.0 is a good step is generally only true in the asymptotic regime, which in many problems you never hit. But maybe in practice this is not so bad, or at least not very often. Maybe leave this for now and see if it causes any trouble.

Collaborator (Author):
> The premise that 1.0 is a good step is generally only true in the asymptotic regime, which in many problems you never hit.

My impression is that part of L-BFGS's big success is that it tends to work very well with a step length of 1.0, as long as one implements the scaling of the inverse Hessian (as I am attempting to do in JuliaNLSolvers/Optim.jl#484).

Contributor:
Good point. Once you scale the inverse Hessian, I am happy!

else
αguess = 2.0 * (NLSolversBase.value(df) - state.f_x_previous) / dphi0
αguess = NaNMath.max(is.αmin, state.alpha*is.ρ, αguess)
αguess = NaNMath.min(is.αmax, 1.01*αguess) # See Nocedal + Wright, using is.αmax = 1.0
Contributor:
continuing from the previous conversation. Why not have a snap2one kind of thing here instead?

Collaborator (Author):
You have almost convinced me that this could be a just-as-good approach. It achieves something similar to what N+W try to do with min(1.01*αguess, αmax); however, I'm not sure exactly what the difference will be in practice.

I also see from your defaults in the Optim code that snap2one = (0.75, Inf), which would be the same as setting αmax = 1.0. Do you think those are good snap2one default values, or shall I use something else?

Over the last week, I've gotten a much better appreciation for how important a good step length guess is. So I'm happy to discuss these properly before implementing some damaging default parameters. (Over the long term, optimizing over default parameters on a large test set would be very neat)
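
For concreteness, a sketch of the snap2one idea as described in this thread (assumed semantics: a raw guess falling inside the bracket is snapped to a unit step; hypothetical helper, not Optim's actual code):

function snap2one(αguess, bracket = (0.75, Inf))
    lo, hi = bracket
    lo <= αguess <= hi ? one(αguess) : αguess
end

snap2one(0.9)  # 1.0: close enough to a unit step, take alpha = 1
snap2one(0.3)  # 0.3: keep the raw guess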

Contributor:
> I also see from your defaults in the Optim code that snap2one = (0.75, Inf), which would be the same as setting αmax = 1.0. Do you think those are good snap2one default values, or shall I use something else?

I'm not sure I set those default values, but either way this sounds ok.

> Over the last week, I've gotten a much better appreciation for how important a good step length guess is. So I'm happy to discuss these properly before implementing some damaging default parameters.

Yes I found that as well when I implemented my first optimisation code.

> (Over the long term, optimizing over default parameters on a large test set would be very neat)

I think that would be fantastic.

@cortner (Contributor) commented Nov 8, 2017

From my perspective I'm happy with this PR, so don't wait for me to merge it in case I don't have time to review again.

@anriseth (Collaborator, Author) commented Nov 9, 2017

This is ready now. @pkofod any final comments?

@anriseth anriseth merged commit 56155bd into master Nov 10, 2017
@pkofod pkofod deleted the alphaguess branch November 10, 2017 16:24
@pkofod (Member) commented Nov 10, 2017

Perfect. Good to see you updated NEWS and README as well!
