
New NLSolversBase #27

Merged
merged 14 commits into from
Dec 18, 2017

Conversation

@pkofod (Member) commented Dec 11, 2017

What
This PR brings in the OnceDifferentiable constructors from Optim and the DifferentiableMultivariateFunction type (now also just OnceDifferentiable) into NLSolversBase.

Why
I wanted to get NLsolve up-to-date, and one thing that annoyed me was that *DifferentiableMultivariateFunction has an annoying name: the special thing about it is that it's a multivalued function (Optim objectives are also multivariate), but we can just have dispatch handle that; it's a very long name, and the sparse versions are even longer. I think it will make Optim and NLsolve seem much more like something coming from the same org if we avoid type/functionality duplication. Pulling it here is needed in order to have

using Optim, NLsolve

play nice (as they're exported).

As I said, I think it's very valuable to share types among Optim, NLsolve, and even LsqFit (LM would be using the same structure as NLsolve, for example) in the future where it makes sense. Constructors need only be defined in NLSolversBase if they are not type restricted; only autodiff functionality is defined in each package (mainly to avoid unnecessary dependencies).

It allows for much better sharing of conventions and ideas between the packages, and we only have to test this stuff in one place.

This is Julia; we don't need a separate type for each Jacobian type, each gradient type, or each array type in general.

What's new
The syntax is different now as we need to handle both R^N->R and R^N->R^N cases. We used to seed with x, but obviously we need to seed F as well now, and not through the type name. This is handled with positional arguments.

OnceDifferentiable(f, F::Real, x)
OnceDifferentiable(f, g!, F::Real, x)
OnceDifferentiable(f, g!, F::Real, G, x)
OnceDifferentiable(f, g!, fg!, F::Real, G, x)
OnceDifferentiable(f!, F::AbstractArray, x)
OnceDifferentiable(f!, j!, F::AbstractArray, x)
OnceDifferentiable(f!, j!, F::AbstractArray, J, x)
OnceDifferentiable(f!, j!, fj!, F::AbstractArray, J, x)
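To make the seeding concrete, here is a usage sketch. It assumes the constructors above and an `f!(F, x)` argument order for the mutating function; the exact order was still in flux at this point, so treat it as illustrative only.

```julia
using NLSolversBase

# R^N -> R (Optim-style objective): seed the output cache with a scalar F.
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
od_scalar = OnceDifferentiable(rosenbrock, 0.0, zeros(2))

# R^N -> R^N (NLsolve-style system): seed with an array F instead.
function f!(F, x)
    F[1] = x[1]^2 - 2.0
    F[2] = x[2] - x[1]
    F
end
od_system = OnceDifferentiable(f!, zeros(2), zeros(2))
```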

What's missing
This required some restructuring, but I actually think it works quite well. I have PRs for Optim and NLsolve as well (that I will make after this one). Those PRs work and have passing tests, but will break people's code quite hard if they use OnceDifferentiable, DifferentiableMultivariateFunction, and especially the sparse multivariate variants. The great thing is that they used to be called different things, so we can actually deprecate this quite smoothly!

Another positional...
... isn't that annoying? Well, no, I don't really think so. I think of these types as two things: internal mechanisms for passing around caches and objective-related functions, and a mechanism for advanced users to keep caches between repeated optimizations/solves. If you really need and want that, I'm sure you can handle writing everything out.

If it isn't clear, it is not my intention that the normal user who just minimizes something once should ever see these types. We should only promote these as advanced tools. That's why I intend to bring back the autodiff keyword in Optim. My guess is that most people use OnceDifferentiable to get access to the ForwardDiff functionality, and exposing an implementation detail just to have users do this is a mistake I made when we moved to OnceDifferentiable. So I am undoing my mistake.

**Worklist**

  • Improve README.md with description of constructors
  • Deprecate old constructors
  • Update NonDifferentiable constructors to handle reshapes (Twice as well?)
  • Should initial evaluations be handled differently?
  • Move tests here from Optim and NLsolve
  • Receive feedback (please)
  • ...
  • Rule the world

edit: I know something is weird in the OnceDifferentiable main constructor, but it's fixed.

codecov bot commented Dec 11, 2017

Codecov Report

Merging #27 into master will decrease coverage by 13.87%.
The diff coverage is 74.78%.


@@             Coverage Diff             @@
##           master      #27       +/-   ##
===========================================
- Coverage   84.41%   70.54%   -13.88%     
===========================================
  Files           3        8        +5     
  Lines          77      129       +52     
===========================================
+ Hits           65       91       +26     
- Misses         12       38       +26
Impacted Files Coverage Δ
src/objective_types/twicedifferentiablehv.jl 0% <0%> (ø)
src/NLSolversBase.jl 100% <100%> (ø)
src/objective_types/twicedifferentiable.jl 100% <100%> (ø)
src/objective_types/oncedifferentiable.jl 61.53% <61.53%> (ø)
src/objective_types/nondifferentiable.jl 72.72% <72.72%> (ø)
src/objective_types/abstract.jl 72.72% <72.72%> (ø)
src/interface.jl 89.55% <92.06%> (-4.74%) ⬇️
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cee18b6...d6b8393.

@pkofod (Member, Author) commented Dec 13, 2017

At some point I had reshape-redefinitions of all the array (non-vector) input constructors, just as we had in NLsolve. However, this is unfortunate if the user creates an NDifferentiableArray and wants to use that to calculate something in his or her own dimensions (pass matrix, get matrix - the previous approach would require the user to pass a vector and reshape the output).
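A minimal sketch of what the old reshape approach forced on users (function names here are hypothetical, not from the package): a matrix-domain objective had to be flattened at the boundary.

```julia
# User's objective naturally defined on matrices.
f_matrix(X) = sum(abs2, X)

n, m = 2, 3
# The reshape approach in effect made the user do this at the boundary:
f_flat(x) = f_matrix(reshape(x, n, m))

X = [1.0 2.0 3.0; 4.0 5.0 6.0]
f_flat(vec(X)) == f_matrix(X)  # same value, but the user juggles shapes
```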

@pkofod pkofod force-pushed the newdf branch 3 times, most recently from a7e5f05 to 2f21779 Compare December 14, 2017 19:45
@pkofod (Member, Author) commented Dec 16, 2017

Might as well rename this package NDifferentiables.jl...

edit: this is not an actual suggestion

@cortner commented Dec 17, 2017

I like this change

@cortner commented Dec 17, 2017

(I mean the PR, not sure about renaming.)

f # objective
df # (partial) derivative of objective
fdf # objective and (partial) derivative of objective
F::TF # cache for f output

should f, df, fdf have type parameters? Otherwise this causes a type instability?


same for TwiceDifferentiable

@pkofod (Member, Author):

They could, but they weren't. It will require recompilation for each new instance which might be an annoyance if people create many anonymous functions. We could allow for both though.

Personally, I don't care - my objectives are very expensive. This was more targeted towards the use-case of many cheap optimisations. But you are right, if one has to recompile each time then this is not helpful.

@pkofod (Member, Author):

Exactly. It's fine to compile for say each input type (Float64, Vector ...) but I do get the impression that a lot of people are iteratively/repeatedly creating anonymous functions that they minimize in fractions of seconds (but maybe thousands of times), and this would seriously hurt them. As I said, we can have a special type for the non-default type if necessary.

Anyway, the fact that you cannot infer the output type isn't really too bad, as we always know exactly what the type of the cache variable is. This means that even if the function ends up spitting out Float32's, it'll automatically be promoted to Float64's, and the majority of the code will be compiled under the assumption that it is indeed Float64s.
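The trade-off being discussed can be illustrated with a toy sketch (these are not the actual NLSolversBase definitions):

```julia
# Fully parametric: concrete field types, so calls are inferable, but the
# struct (and any solver specialized on it) recompiles per function type.
struct ParamObjective{Tf, Tdf, TF}
    f::Tf
    df::Tdf
    F::TF       # cache for f output
end

# Abstractly typed callables: one compiled solver serves every objective;
# only the `obj.f(x)` call sites pay a dynamic dispatch.
struct PlainObjective{TF}
    f::Function
    df::Function
    F::TF       # the cache is still concretely typed
end

obj = PlainObjective{Float64}(x -> sum(abs2, x), x -> 2 .* x, 0.0)
# Converting the result to the cache's type restores inferability downstream:
fx = convert(Float64, obj.f([1.0, 2.0]))
```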

@cortner commented Dec 17, 2017

As far as I can see the type restrictions are sufficiently general to allow Optim.jl:#399 to be eventually implemented?

@cortner commented Dec 17, 2017

Finally, if I see correctly, this still requires that I pass in f, g! rather than f, g? Probably this isn't really decided yet, cf Optim.jl:#439, but it seems a good moment to discuss it. It could be a separate PR.

@pkofod (Member, Author) commented Dec 17, 2017

As far as I can see the type restrictions are sufficiently general to allow Optim.jl:#399 to be eventually implemented?

Yes, there should be quite wide input possibilities. It's only some of the convenience constructors that have restricted annotations (to capture most normal users' use cases).

@pkofod (Member, Author) commented Dec 17, 2017

Finally, if I see correctly, this still requires that I pass in f, g! rather than f, g? Probably this isn't really decided yet, cf Optim.jl:#439, but it seems a good moment to discuss it. It could be a separate PR.

Yes. I didn't realize it was such a big "feature request" to have this baked in. I guess I figured people would just do the copying themselves.

@cortner commented Dec 17, 2017

Personally, I find it quite annoying to write out the copying. It could really be hidden so easily in NLSolversBase.jl

@pkofod (Member, Author) commented Dec 17, 2017

It could really be hidden so easily in NLSolversBase.jl

Only because you (unexpectedly to me) don't use the storage G for doing in-place operations, but rather construct a new array that you then copy into G. We can make a constructor that takes care of it in the case of g not being mutating.
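Such a convenience constructor could wrap a non-mutating g in a mutating shim along these lines (a sketch; the `(G, x)` argument order is an assumption):

```julia
g(x) = 2 .* x                      # out-of-place gradient
g! = (G, x) -> (G .= g(x); G)      # mutating wrapper: copy into the storage

G = zeros(3)
g!(G, [1.0, 2.0, 3.0])             # G now holds [2.0, 4.0, 6.0]
```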

@cortner commented Dec 17, 2017

Only because you (unexpectedly to me) don't use the storage G for doing in-place operations, but rather construct a new array that you then copy into G.

it just doesn't cost anything

We can make a constructor that takes care of it in the case of g not being mutating.

that's really all I am hoping for

@pkofod (Member, Author) commented Dec 17, 2017

I'm not sure it's true that allocations are always "free", but if the majority of users want simple f(x), g(x), h(x) default interfaces I'm 💯 comfortable having that as the default

@cortner commented Dec 17, 2017

I was only referring to my use case.

@pkofod (Member, Author) commented Dec 17, 2017

I was only referring to my use case.

My bad, I read it as a general observation. But sure, I do realize that the more work being done per call, the smaller the advantage. But I agree, both versions should be easy to use.

@pkofod pkofod merged commit 0372693 into master Dec 18, 2017
@pkofod pkofod deleted the newdf branch December 18, 2017 11:43
@anriseth (Contributor) left a comment

Great, all unified and ready to go 👍 Sorry about the late feedback...

I remember there was a discussion about caching (on a lower level) on gitter or in some issue. Is that still up for grabs?

end


function TwiceDifferentiable(f, g!, h!, x::AbstractVector, F = real(zero(eltype(x))), G = similar(x))
@anriseth (Contributor):

Does this mean that our x-input needs to be a Vector if we don't supply fg!?

[0,], [0,], [0,])
end

function TwiceDifferentiableHV(f, fdf, h, x::AbstractVector{T}) where T
@anriseth (Contributor):

AbstractVector vs AbstractArray?

@pkofod (Member, Author):

You're right. I was lazy and just wanted to move it here. It's still unexported in Optim so we can do whatever we want :) But yes, that should be generalized (and tested on the generalization!)

@pkofod (Member, Author) commented Dec 22, 2017

I remember there was a discussion about caching (on a lower level) on gitter or in some issue.

Cache of what? I can't remember the specific discussion.

@anriseth (Contributor):

Cache of what? I can't remember the specific discussion.

I believe in the sense of memoization. It was either @cortner or @ChrisRackauckas who were talking about this, but I may have misinterpreted what they were requesting.

@pkofod (Member, Author) commented Dec 22, 2017

I wonder if we can cater to people who want that in a sufficiently generic fashion, or if that should just be part of their actual objective function definition, probably easiest as a callable type. We have it in the sense of the "last" x used, but not further back ofc.

[edit]
Also, I doubt that you will often end up in the exact same x many times given the methods here.
[/edit]

I should give a small remark about the fact that I had considered splitting the caches out from the NDifferentiable functions. I do not intend to do that anymore. I tried it, and it just complicated the API too much. The NDifferentiable was basically useless without the caches, and just ended up as a glorified tuple. So my new view is that people should just ignore the NDifferentiable-stuff UNLESS they want the caches specifically.

@cortner commented Dec 23, 2017

We have it in the sense of the "last" x used, but not further back ofc.

I think that might have been the point. If you keep more history then it would remove the need to store intermediate values in, say, linesearches. But I'm not convinced this makes code so much more readable . . .

@pkofod (Member, Author) commented Dec 23, 2017

I think that might have been the point. If you keep more history then it would remove the need to store intermediate values in, say, linesearches. But I'm not convinced this makes code so much more readable . . .

We could do something along those lines, I just wasn't aware of that discussion or forgot. Shouldn't be difficult to implement; we just use a queue instead of a vector/scalar for the x's and f(x)'s.
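A rough sketch of what that could look like (all names hypothetical, not part of the package): keep a fixed-length queue of recent (x, f(x)) pairs and check it before re-evaluating.

```julia
struct EvalHistory{Tx, Tf}
    xs::Vector{Tx}
    fs::Vector{Tf}
    maxlen::Int
end
EvalHistory(::Type{Tx}, ::Type{Tf}, maxlen::Int) where {Tx, Tf} =
    EvalHistory{Tx, Tf}(Tx[], Tf[], maxlen)

function record!(h::EvalHistory, x, fx)
    push!(h.xs, copy(x)); push!(h.fs, fx)
    if length(h.xs) > h.maxlen       # drop the oldest entry
        popfirst!(h.xs); popfirst!(h.fs)
    end
    h
end

# Return the cached value for x, or `nothing` if it fell out of history.
function lookup(h::EvalHistory, x)
    i = findlast(==(x), h.xs)
    i === nothing ? nothing : h.fs[i]
end
```

Whether a linesearch actually revisits the same x bit-for-bit often enough to pay for the lookups is the open question.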
