Backtracking line search with interpolation #7

Merged (4 commits into master, Oct 10, 2016)

Conversation

@cortner (Contributor) commented Oct 9, 2016

This PR adds a second backtracking line search, `interpbacktrack_linesearch!`, which performs an interpolation step instead of the `a *= rho` update. This is highly efficient for some problems; see QCG and QLBFGS in the following figure (from my own research):

[Figure: benchmark comparing QCG and QLBFGS against other solvers]

It also outperforms the HZ and MT line searches on some more standard model problems. In general it is no worse than standard backtracking, which I left in only for the sake of compatibility. I would actually recommend removing standard backtracking altogether and replacing it with interpbacktrack.
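As a rough illustration of the idea (a minimal Python sketch, not the package's Julia implementation; the function and parameter names here are made up), each failed Armijo test triggers a trial step chosen by minimizing the quadratic interpolant of phi(a) = f(x + a*s) through phi(0), phi'(0), and phi(alpha), with a safeguard so alpha still shrinks by at least a fixed factor:

```python
def interp_backtrack(f, x, s, gxp, alpha=1.0, c1=1e-4, rho=0.5, maxiter=50):
    """Backtracking with quadratic interpolation.

    gxp is the directional derivative f'(x).s, assumed negative (descent).
    Returns a step length satisfying the Armijo sufficient-decrease test.
    """
    f_x = f(x)
    for _ in range(maxiter):
        f_trial = f(x + alpha * s)
        # Armijo sufficient-decrease condition
        if f_trial <= f_x + c1 * alpha * gxp:
            return alpha
        # Minimizer of the quadratic through phi(0), phi'(0), phi(alpha)
        alpha1 = -(gxp * alpha) / (2.0 * ((f_trial - f_x) / alpha - gxp))
        # Safeguard: never shrink below a factor min(0.25, rho) of alpha
        alpha = max(alpha1, alpha * min(0.25, rho))
    return alpha
```

For c1 < 1/2 the interpolated step always lies below alpha/(2*(1 - c1)) when the Armijo test fails, so the loop makes genuine progress while avoiding the fixed-factor pessimism of plain `a *= rho`.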

@codecov-io commented Oct 9, 2016

Current coverage is 48.67% (diff: 94.73%)

Merging #7 into master will increase coverage by 1.13%

@@             master         #7   diff @@
==========================================
  Files             7          7          
  Lines           591        606    +15   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            281        295    +14   
- Misses          310        311     +1   
  Partials          0          0          

Powered by Codecov. Last update 90d99c3...880e13d

# provided that c1 < 1/2; the backtrack_condition at the beginning
# of the function guarantees at least a backtracking factor rho.
alpha1 = - (gxp * alpha) / ( 2.0 * ((f_x_scratch - f_x)/alpha - gxp) )
alpha = max(alpha1, alpha * min(0.25, rho)) # avoid minuscule steps
Collaborator:

Is there a heuristic for choosing 0.25 over other numbers?

Contributor Author:

Only personal experience - do you want to make it a parameter?

Collaborator (@anriseth), Oct 9, 2016:

Yes, I think a rhomin parameter defaulted to 0.25 can be useful.
EDIT: I'm not sure if rhomin is the best name :p

Contributor Author:

I'll call the parameter mindecrease

Contributor Author:

mindecfact

@anriseth anriseth mentioned this pull request Oct 9, 2016
c1 = backtrack_condition
end
if rho <= 0.25
warn("rho <= 0.25; revert to standard backtracking")
Collaborator:

It seems to me that we only revert back to standard backtracking if
alpha1 < alpha * min(0.25, rho). Is that correct, or have I misunderstood?

Contributor Author:

interpbacktrack ensures that at each step `alpha` is decreased by at least a factor `rho`. So if `rho < 0.25` (or `rho <= mindecrease`), then it will be standard backtracking with factor `rho`.
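A small numeric illustration of this safeguard (a hypothetical Python helper with made-up values; the real code is the `alpha = max(...)` line under review). When the Armijo test fails badly, the interpolant suggests a tiny step and the floor takes over; when it fails mildly, the interpolated step is used directly:

```python
def safeguarded_step(alpha, f_x, f_trial, gxp, rho=0.5):
    # Quadratic-interpolant minimizer, as in the line under review
    alpha1 = -(gxp * alpha) / (2.0 * ((f_trial - f_x) / alpha - gxp))
    # Floor: never shrink by more than a factor min(0.25, rho)
    return max(alpha1, alpha * min(0.25, rho))

# Armijo failed badly: the interpolant suggests 0.005, the floor returns 0.25
step_floored = safeguarded_step(1.0, f_x=1.0, f_trial=100.0, gxp=-1.0)

# Armijo failed mildly: the interpolated step (1/3) exceeds the floor
# and is used as-is
step_interp = safeguarded_step(1.0, f_x=1.0, f_trial=1.5, gxp=-1.0)
```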

@anriseth (Collaborator) commented Oct 9, 2016

Thanks! I have found the current backtracking algorithm to perform poorly, so this is great.
Is there a source describing the approach that we can reference, in case people want to better understand the idea behind the interpolation step?

As an alternative to removing backtracking, could we make the interp flag true by default?
I think we should also make a NEWS.md, similar to Optim, to keep track of these changes.

@cortner (Contributor Author) commented Oct 9, 2016

The problem with just setting `interp=true` is that it then becomes awkward to pass backtracking as an option.


@cortner (Contributor Author) commented Oct 9, 2016

Funnily enough, I've not seen this anywhere, hence I mentioned it in an appendix of a paper where I used it. But that certainly doesn't mean others haven't done it; it is an extremely simple and natural idea. Maybe ask Nick Gould if you run into him at some point?

P.S.: I meant I'd feel a bit embarrassed citing my own paper for this. But if you want I can add more documentation.

if interp # this means we are coming from interpbacktrack_linesearch!
backtrack_condition = 1.0 - 1.0/(2*rho) # want guaranteed backtrack factor
if c1 >= backtrack_condition
warning("""The Armijo constant c1 is too large; replacing it with
Collaborator:

Should this be `warn`? I couldn't find a `warning` function in Base on Julia 0.5.

Contributor Author:

Right, thank you for catching this.


# Store inner product of search direction and gradient
gxp = vecdot(gr_scratch, s)
# read f_x and slope from LineSearchResults
Collaborator (@anriseth), Oct 9, 2016:

This is useful. It seems like Optim always passes in an up-to-date LineSearchResults.
@KristofferC: Will taking f_x and gxp from lsr break NLsolve's use of line searches, or is this fine?

Contributor Author:

Btw, I think this needs to be cleaned up in other linesearches as well. According to @pkofod only HZ uses this so far.

Contributor:

It should probably be OK. Might make our hack for this unnecessary in NLsolve.jl.

@anriseth (Collaborator) commented Oct 9, 2016

I had a look in Nocedal and Wright, Numerical Optimization.
This algorithm is described as an enhancement to standard backtracking in Chapter 3, the section on Interpolation (page 56 in my edition).

They go one step further: in the first iteration they use quadratic interpolation, and thereafter a cubic interpolation through the two previous alpha values, i.e. f(0), f'(0), f(α_k), f(α_{k-1}).
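The cubic enhancement mentioned above can be sketched as follows (a hedged Python illustration of the Nocedal & Wright formula, not anything in this PR; names are made up). Given phi(0), phi'(0) and the two most recent trial values phi(a0), phi(a1), one fits the cubic c·a³ + b·a² + phi'(0)·a + phi(0) and takes its minimizer as the next trial step:

```python
import math

def cubic_step(phi0, dphi0, a0, phi_a0, a1, phi_a1):
    """Minimizer of the cubic interpolating phi(0)=phi0, phi'(0)=dphi0,
    phi(a0)=phi_a0, phi(a1)=phi_a1 (Nocedal & Wright, Ch. 3)."""
    d = a0**2 * a1**2 * (a1 - a0)
    # Residuals of phi(a) relative to its first-order model at 0
    r0 = phi_a0 - phi0 - dphi0 * a0
    r1 = phi_a1 - phi0 - dphi0 * a1
    # Cubic and quadratic coefficients of the interpolant
    c = (a0**2 * r1 - a1**2 * r0) / d
    b = (-(a0**3) * r1 + a1**3 * r0) / d
    # Stationary point of the cubic (discriminant is positive when
    # dphi0 < 0 and c > 0)
    return (-b + math.sqrt(b**2 - 3.0 * c * dphi0)) / (3.0 * c)
```

As a sanity check, interpolating a function that is itself a cubic recovers its exact minimizer, since the four conditions determine the cubic uniquely.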

@cortner (Contributor Author) commented Oct 10, 2016

Great, I didn't realise they discussed this.

I specifically don’t want to use the cubic though. I’ve found this quadratic interpolation to be extremely robust. But if we can introduce the linesearch options, then we can just let the user choose which interpolation to use.


@cortner (Contributor Author) commented Oct 10, 2016

I added the reference now. It would be good to add the cubic interpolation as well. Hopefully we can do that in a separate pull request after moving to LineSearchOptions.

@anriseth (Collaborator) commented:
Great, I'll merge later today.

@anriseth anriseth merged commit b7818e4 into JuliaNLSolvers:master Oct 10, 2016
anriseth pushed a commit that referenced this pull request Dec 23, 2016
* added a backtracking line search with interpolation

* mindecfact and some documentation

* changed `warning` to `warn`

* added reference
4 participants