Create types for initial step length guess #70
Conversation
I'll set up the Optim side whenever I've dealt with the extrapolation step.
src/hagerzhang.jl
Outdated
@@ -115,7 +132,7 @@ function _hagerzhang!(df,
         phic, dphic = linefunc!(df, x, s, c, xtmp, true)
     end
     if !(isfinite(phic) && isfinite(dphic))
-        println("Warning: failed to achieve finite new evaluation point, using alpha=0")
+        Base.warn("Failed to achieve finite new evaluation point, using alpha=0")
Why `Base.`?
I somehow had it in my head that `warn` was not exported :o
function (is::InitialHagerZhang)(state, dphi0, df)
    if isnan(state.f_x_previous) && isnan(is.α0)
Is `isnan(state.f_x_previous)` used to check if this is the very first iteration? Alternatively, we could just pass the iteration number. It may be fine; I'm just curious whether that is the purpose. If we keep it this way, you should probably add a comment here.
Yes, this was my hack to check whether it's the first iteration.
Alright, it would be nice if there was a comment just mentioning this.
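For illustration, a minimal self-contained sketch of the NaN-sentinel pattern under discussion; the struct and field names here are stand-ins, not the actual Optim state type:

```julia
mutable struct FirstIterState           # stand-in for the optimizer state
    f_x_previous::Float64
end

state = FirstIterState(NaN)             # before any iterate has been accepted
isfirstiter(st) = isnan(st.f_x_previous)

@assert isfirstiter(state)              # first iteration: no previous objective value
state.f_x_previous = 1.23               # an iterate has been accepted
@assert !isfirstiter(state)
```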
src/hagerzhang.jl
Outdated
               s::Array,
               xtmp::Array,
               lsr::LineSearchResults,
               psi1::Real = convert(T,0.2),
Should these just be set in the function body? I mean, how are people going to set these, and more importantly: who is going to set these? :)
(I know you didn't do this, I'm just asking since we're here now)
Are you referring to `x`, `s`, `xtmp`, and `lsr`, which are all in `state` from the Optim perspective, or to `psi1` etc.?
I agree we can do some more invasive changes as we're already doing this :)
I'm talking about all the "optional" positional arguments. Not sure if the original author meant for these to be keywords or not, but since you can't set `psi1` anywhere, it might as well just be a variable set in the function body.
The optional arguments can be set by the user when they call `InitialHagerZhang()`. The only option you can't set is `iterfinitemax`. That was just me not thinking far enough; I'll add that as well.
Actually, the value of `iterfinitemax` seems to be tied to the type information, to keep the value of `x` finite. So I think there is no reason for the user to set it, and if there is, it would be just as easy for them to edit the source code as to do anything else.
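For what it's worth, here is a sketch of how a type-derived default could look; this is my reading of the intent, not the package's actual code. Halving alpha more than -log2(eps(T)) times can no longer change `x + alpha*s` at precision `T`, so that gives a natural cap:

```julia
# Hypothetical helper: bound the "make the evaluation point finite"
# backtracking iterations by the number of halvings representable in T.
iterfinitemax(::Type{T}) where {T<:AbstractFloat} = ceil(Int, -log2(eps(T)))

iterfinitemax(Float64)  # 52
iterfinitemax(Float32)  # 23
```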
src/hagerzhang.jl
Outdated
# TODO: deal with different types for the inputs
function _hzI0(x::Array{T},
               gr::Array,
               f_x::Real,
`f_x::T`?
I was wondering whether that would be better as well
I'll change it. Strictly speaking, however, `x` and `gr` are in different spaces, so there may be cases where we want different types/eltypes here.
Yes, I know. Anyway, it shouldn't really matter.
src/initialguess.jl
Outdated
function (is::InitialStatic)(state, dphi0, df)
    state.alpha = is.alpha
    if is.scaled == true && (ns = norm(state.s)) > zero(typeof(is.alpha))
You have `typeof(is.alpha)` in `InitialStatic`, so you might as well use it:

julia> struct A{T} end

julia> function (a::A{T})(x) where T
           println(convert(T, x))
       end

julia> a = A{Float64}()
A{Float64}()

julia> a(3)
3.0

julia> a(BigInt(3))
3.0

So you can simply add the type parameter:

function (is::InitialStatic{T})(state, dphi0, df) where T
    ...
src/initialguess.jl
Outdated
    state.alpha = is.alpha
    if is.scaled == true && (ns = norm(state.s)) > zero(typeof(is.alpha))
        # TODO: Type instability if there's a type mismatch between is.alpha and ns
        state.alpha *= min(is.alpha, ns) / ns
In most cases `eltype(state.s)` will be `typeof(is.alpha)` though, but yeah...
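For concreteness, a sketch of how the parameterized functor could look once the type parameter is used; the `scaled` logic follows the diff above, while the state struct is a hypothetical stand-in:

```julia
using LinearAlgebra: norm

struct InitialStatic{T}
    alpha::T
    scaled::Bool
end

mutable struct ScaledState               # stand-in for Optim's state
    alpha::Float64
    s::Vector{Float64}
end

# Convert the step norm to T up front, so state.alpha stays type-stable
# even when eltype(state.s) differs from typeof(is.alpha).
function (is::InitialStatic{T})(state, dphi0, df) where T
    state.alpha = is.alpha
    if is.scaled
        ns = convert(T, norm(state.s))
        if ns > zero(T)
            state.alpha *= min(is.alpha, ns) / ns
        end
    end
    return state.alpha
end

st = ScaledState(0.0, [3.0, 4.0])           # norm(state.s) == 5.0
InitialStatic(1.0, true)(st, 0.0, nothing)  # 1.0 * min(1.0, 5.0) / 5.0 == 0.2
```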
Codecov Report

@@            Coverage Diff            @@
##           master     #70     +/-   ##
=========================================
+ Coverage    61.9%   62.5%   +0.59%
=========================================
  Files           7       8       +1
  Lines         546     600      +54
=========================================
+ Hits          338     375      +37
- Misses        208     225      +17

Continue to review full report at Codecov.
I'm satisfied with this now. @pkofod let me know if you're OK with the changes (and the sparse documentation).
Sorry I am slow to respond. I am trying to remember what I did but can't find the original …
I think you are referring to https://github.com/JuliaNLSolvers/Optim.jl/blob/master/src/utilities/perform_linesearch.jl#L30 — it seems most likely that this extrapolation should work fine even if the approximate Hessian is no longer SPD (which is when …
Looking at that code, my only concern now is that this guess does not impose an upper bound on alpha. Other than that it looks fine and has worked very well for me in the past. Anyhow, your question was about …
Good point, I'll give the user an option to set a maximum (and minimum) alpha.
I see. I think it should be possible for users to compose the different …
src/initialguess.jl
Outdated
@@ -70,7 +71,7 @@ function (is::InitialQuadratic{T})(state, dphi0, df) where T
         αguess = is.α0
     else
         αguess = 2.0 * (NLSolversBase.value(df) - state.f_x_previous) / dphi0
-        αguess = min(one(T), 1.01*αguess) # See Nocedal+Wright
+        αguess = min(is.αmax, 1.01*αguess) # See Nocedal + Wright, using is.αmax = 1.0
         αguess = max(is.αmin, αguess)
I don't think there should be a lower bound on the initial step length. What if the problem is scaled such that 1e-2 is the optimal step length? That would not be extraordinarily bad scaling at all. I generally prefer

    aguess = max(0.25*aprevious, aguess)

If you want `amin` in there, then the default should be much smaller, I think.
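As a self-contained sketch of that rule (function and argument names are illustrative, not the package's API):

```julia
# Quadratic-interpolation initial guess with a proportional lower bound:
# never let the new guess drop below a fraction ρ of the previous step.
function initial_guess(f_x, f_x_previous, dphi0, α_previous; ρ = 0.25)
    αguess = 2 * (f_x - f_x_previous) / dphi0  # minimizer of the quadratic model
    return max(ρ * α_previous, αguess)
end

initial_guess(1.0, 1.5, -10.0, 1.0)  # quadratic guess is 0.1 < 0.25, so returns 0.25
```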
I've changed the minimum initial step length to `1e-12`, which I hope is a default small enough for most problems. The proportional decrease can also be useful, thanks.

Regarding the default maximum step length, I set that to 1.0, as it is mentioned as useful for Newton and quasi-Newton methods in Nocedal + Wright (after eqn 3.60 in the 2nd edition).
For Newton methods I agree. For quasi-Newton I strongly disagree: the premise that 1.0 is a good step is generally only true in the asymptotic regime, which in many problems you never hit. But maybe in practice this is not as bad, or at least not very often. Maybe leave this for now and see if it causes any trouble.
> The premise that 1.0 is a good step is generally only true in the asymptotic regime, which in many problems you never hit.

My impression is that part of L-BFGS's big success is that it tends to work very well with a step length of 1.0, so long as one implements the scaling of the inverse Hessian (as I am attempting to do in JuliaNLSolvers/Optim.jl#484).
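For reference, the scaling in question is the standard initial inverse-Hessian factor from Nocedal & Wright (eqn 7.20); a minimal sketch:

```julia
using LinearAlgebra

# Initial inverse-Hessian scaling H0 = γI for L-BFGS, computed from the
# most recent update pair (s, y); this is what makes unit steps reasonable.
lbfgs_scaling(s, y) = dot(s, y) / dot(y, y)

s = [0.1, -0.2]; y = [0.05, -0.1]
lbfgs_scaling(s, y)  # 2.0
```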
Good point. Once you scale the inverse Hessian I am happy!
src/initialguess.jl
Outdated
    else
        αguess = 2.0 * (NLSolversBase.value(df) - state.f_x_previous) / dphi0
        αguess = NaNMath.max(is.αmin, state.alpha*is.ρ, αguess)
        αguess = NaNMath.min(is.αmax, 1.01*αguess) # See Nocedal + Wright, using is.αmax = 1.0
Continuing from the previous conversation: why not have a `snap2one` kind of thing here instead?
You have almost convinced me that this could be a just-as-good approach. It achieves something similar to what N+W try to do with `min(1.01*aguess, amax)`; however, I'm not sure exactly what the difference will be in practice.

I also see that your defaults in the Optim code are `snap2one = (0.75, Inf)`, which would be the same as setting `amax = 1.0`. Do you think those are good `snap2one` default values, or shall I use something else?

Over the last week, I've gotten a much better appreciation for how important a good step length guess is, so I'm happy to discuss these properly before implementing some damaging default parameters. (Over the long term, optimizing over default parameters on a large test set would be very neat.)
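For reference, my understanding of the `snap2one` idea is a rule like the following; the name and default interval come from the Optim discussion above, while the implementation details are assumed:

```julia
# If the extrapolated guess already lies inside the interval, snap it to
# exactly 1.0, since unit steps are often ideal for (quasi-)Newton methods.
snap2one(αguess, interval = (0.75, Inf)) =
    first(interval) <= αguess <= last(interval) ? one(αguess) : αguess

snap2one(0.9)  # 1.0 (snapped)
snap2one(0.5)  # 0.5 (unchanged)
```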
> I also see that your defaults in the Optim code are snap2one = (0.75, Inf), which would be the same as setting amax = 1.0. Do you think those are good snap2one default values, or shall I use something else?

I'm not sure I set those default values, but either way this sounds OK.

> Over the last week, I've gotten a much better appreciation for how important a good step length guess is, so I'm happy to discuss these properly before implementing some damaging default parameters.

Yes, I found that as well when I implemented my first optimisation code.

> (Over the long term, optimizing over default parameters on a large test set would be very neat.)

I think that would be fantastic.
From my perspective I'm happy with this PR, so don't wait for me to merge it in case I don't have time to review again.

This is ready now. @pkofod any final comments?

Perfect. Good to see you updated NEWS and README as well!
Fixes #69
Currently missing:
I'm not exactly sure how to handle the if-statement in the L-BFGS `alphaguess`, where it checks whether `state.pseudo_iteration > 1`. Was the intention here to not use extrapolation when L-BFGS is reset due to a non-positive-definite Hessian approximation, or is it fine as long as `state.f_x_previous` is finite? (@cortner, if you have time, can you give quick feedback on this, please?)