Multidimensional arrays #399

Closed
pkofod opened this issue Apr 21, 2017 · 30 comments

@pkofod
Member

pkofod commented Apr 21, 2017

Fix up (L-)BFGS, NelderMead, and PSO code such that we can pass multidimensional arrays as input.

@cortner
Contributor

cortner commented May 22, 2017

How about making the input an abstract type? At the end of the day, the state of the optimisation just needs to be an element of a Hilbert space.

@cortner
Contributor

cortner commented May 22, 2017

Also see @ChrisRackauckas's comments in #326. I really see no barrier to admitting a general abstract type as the state of optimisation, only advantages.

@pkofod
Member Author

pkofod commented May 22, 2017

How abstract? AbstractArray?

@cortner
Contributor

cortner commented May 22, 2017

Not even an AbstractArray. You actually need elements of a Hilbert space over some type T, i.e., you need x + y, t * x, dot(x, y) and I think that is it already. (I'm not even sure it needs to be a real Hilbert space?)

This is really not a necessity, BUT: it might actually make the code more readable, and it could have some unexpected applications. E.g., if I minimise over a space of functions, then I don't have to represent functions in subsequent optimisation steps w.r.t. the same basis.
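
A minimal sketch (not Optim's code) of what this buys: a steepest-descent step and a convergence test written against only those three operations, so x can be a Vector, a Matrix, or any coefficient object from a function space.

    using LinearAlgebra: dot

    # Hypothetical: works for any type defining x + y, t * x, and dot(x, y).
    steepest_step(x, g, t) = x + (-t) * g      # uses only x + y and t * x
    converged(g, tol) = dot(g, g) <= tol^2     # uses only dot(x, y)

    steepest_step([1.0, 2.0], [0.5, 0.5], 0.1)  # -> [0.95, 1.95]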

@ChrisRackauckas
Contributor

> This is really not a necessity, BUT: it might actually make the code more readable, and it could have some unexpected applications. E.g., if I minimise over a space of functions, then I don't have to represent functions in subsequent optimisation steps w.r.t. the same basis.

Yes, ApproxFun would be a very interesting application which takes the generality to the max here.

> Not even an AbstractArray. You actually need elements of a Hilbert space, i.e., you need x + y, t * x, dot(x, y) and I think that is it already.

From BFGS, it looks like you need A_mul_B!, scale!, indexing, *, +, and vecdot. So if the indexing operation were replaced with a broadcast, then this all boils down to *, +, vecdot, and broadcast overloads for the type (guaranteed by indexing, but not necessarily needing indexing). I believe Fun types may already satisfy this.

@cortner
Contributor

cortner commented May 22, 2017

Everything in BFGS-type algorithms can in principle be reduced to the operations I mentioned, even if the current implementation possibly doesn't. The problem here is that LBFGS stores the outer product x * y' instead of the operator it defines.
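
For illustration, a hedged sketch of the standard two-loop recursion, which applies that operator using only dot, +, -, and scalar * (assuming a nonempty history; this is not Optim's implementation):

    using LinearAlgebra: dot

    # Apply the L-BFGS inverse-Hessian approximation to a gradient g without
    # ever forming the outer product x * y'. s and y hold the step and
    # gradient-difference histories (assumed nonempty).
    function lbfgs_direction(g, s, y)
        m = length(s)
        α = zeros(m)
        q = copy(g)
        for i in m:-1:1
            α[i] = dot(s[i], q) / dot(y[i], s[i])
            q = q - α[i] * y[i]
        end
        γ = dot(s[m], y[m]) / dot(y[m], y[m])   # initial scaling H₀ = γ * I
        r = γ * q
        for i in 1:m
            β = dot(y[i], r) / dot(y[i], s[i])
            r = r + (α[i] - β) * s[i]
        end
        return -r                               # quasi-Newton search direction
    end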

@ChrisRackauckas
Contributor

> Everything in BFGS-type algorithms can in principle be reduced to the operations I mentioned, even if the current implementation possibly doesn't.

Mathematically yes. But in reality, what I was trying to point out is that you're missing that either indexing or broadcast needs to be defined. Right now that's using indexing, hence an AbstractArray subtype. But that definitely has an easy change to broadcasting, in which case an AbstractArray isn't needed. Other than that we agree, since A_mul_B! and vecdot have generic fallbacks that use *, +, and hopefully broadcast.

@cortner
Contributor

cortner commented May 22, 2017

And in addition, CG or L-BFGS or a Hessian-free TR or the various accelerated steepest descent methods really just need the small set I mentioned.

@pkofod
Member Author

pkofod commented May 23, 2017

Okay, I'm sold. If we need to do something that sacrifices performance for arrays, we can just have tightly and loosely typed methods.

@cortner
Contributor

cortner commented May 23, 2017

is there a performance benchmark suite? I'd be quite interested to do a prototype (say, CG or LBFGS) and confirm that the strict typing we use at the moment is not needed at all.

@pkofod
Member Author

pkofod commented May 23, 2017

> is there a performance benchmark suite? I'd be quite interested to do a prototype (say, CG or LBFGS) and confirm that the strict typing we use at the moment is not needed at all.

Not tagged, but the code can be found at OptimTests.jl... I should really get that fixed up and merged (will do after Optim is tagged).

@ChrisRackauckas
Contributor

I would be heavily surprised if, when you did this, the machine code changed one bit. It would only change if you change the indexing to broadcasting, in which case it's already pretty well documented what the (current) cost is there.

@cortner
Contributor

cortner commented May 23, 2017

I agree, but there is lots of stuff like this in the code:

        @simd for i in 1:n
            @inbounds state.y[i] = gradient(d, i) - state.g_previous[i]
        end

which I think is appalling, so this will also be a good opportunity to get rid of it.

@pkofod
Member Author

pkofod commented May 23, 2017

> which I think is appalling, so this will also be a good opportunity to get rid of it.

Agreed, and the time to calculate y .= gradient(d) .- state.g_previous would never dominate ldivs or matrix multiplications anyway.
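
For reference, the quoted loop collapses to one fused broadcast; a self-contained sketch with hypothetical names, assuming the full gradient array is available:

    # No explicit indexing, so this also works for array types that only
    # define broadcast.
    update_y!(y, g, g_prev) = (y .= g .- g_prev)

    update_y!(zeros(3), [1.0, 2.0, 3.0], [0.5, 0.5, 0.5])  # -> [0.5, 1.5, 2.5]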

@schlichtanders

schlichtanders commented May 30, 2017

Stumbled upon the following (is this related?) when trying to optimize a sum over a multidimensional Rosenbrock function:

f(x) = sum((1.0 .- x[1]).^2 .+ 100.0 .* (x[2] .- x[1].^2).^2)

I get the following error

optimize(f, [[12.0,1.0],[1.0,2.0]])
ERROR: MethodError: no method matching zero(::Type{Array{Float64,1}})
Closest candidates are:
  zero(::Type{Base.LibGit2.Oid}) at libgit2\oid.jl:88
  zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuildItem}) at pkg/resolve/versionweight.jl:80
  zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuild}) at pkg/resolve/versionweight.jl:120
  ...
 in initial_state(::Optim.NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}, ::Optim.Options{Void}, ::#f, ::Array{Array{Float64,1},1}) at C:\Users\s.sahm\.julia\v0.5\Optim\src\nelder_mead.jl:132
 in optimize(::Function, ::Array{Array{Float64,1},1}, ::Optim.NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}, ::Optim.Options{Void}) at C:\Users\s.sahm\.julia\v0.5\Optim\src\optimize.jl:172
 in #optimize#74(::Array{Any,1}, ::Function, ::#f, ::Array{Array{Float64,1},1}) at C:\Users\s.sahm\.julia\v0.5\Optim\src\optimize.jl:24
 in optimize(::#f, ::Array{Array{Float64,1},1}) at C:\Users\s.sahm\.julia\v0.5\Optim\src\optimize.jl:23

My final goal is to regard the subcomponents x[1], x[2], ... as completely independent, i.e. for instance x::Array{Array{T,1},1}. Is this possible with Optim.jl in general?
And is this possible with autodiff in particular?

thanks a lot

@pkofod
Member Author

pkofod commented May 30, 2017

I don't quite get what you're trying to do. For example, do you realize that your provided point cannot be evaluated by f?

julia> f( [[12.0,1.0],[1.0,2.0]])
ERROR: DimensionMismatch("Cannot multiply two vectors")
Stacktrace:
 [1] power_by_squaring(::Array{Float64,1}, ::Int64) at ./intfuncs.jl:166
 [2] f(::Array{Array{Float64,1},1}) at ./REPL[161]:1

@ChrisRackauckas
Contributor

Supporting arrays of arrays is a whole different ballgame. You need to make sure most internal calls will automatically be recursive. The repo

https://github.com/JuliaDiffEq/RecursiveArrayTools.jl

contains the tools used in DiffEq to support this kind of thing, but there's a substantial amount of "being careful with types" that has to be done to get recursive arrays working.

@pkofod
Member Author

pkofod commented May 30, 2017

> You need to make sure most internal calls will automatically be recursive.

You also need to make sure you agree on what it's supposed to mean... I mean, in the example above I'm a bit confused.

@schlichtanders

@pkofod I am sorry for the confusion; it should be
f(x) = sum((1.0 .- x[1]).^2 .+ 100.0 .* (x[2] .- x[1].^2).^2), and I have now also changed it above.
The error stays the same.

I would like to have an argument of type x::Array{Array{T,1},1} supported, because when using the alternative of a single flat vector, I would need to know the shapes of my individual arguments in advance. If I define a custom function, I can have multiple parameters and auto-differentiate the complete function with respect to all of them. Such multiple independent parameters are kind of what I wanted to regain with x::Array{Array{T,1},1}.
Such a setting allows, for instance, easily changing the sizes of parameters during iteration.

It would be really helpful if I can have something like multiple parameters (i.e. arguments) to my optimizable function. Is this still this issue, or rather something else? Looking forward to your thoughts.
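
For what it's worth, a hedged sketch of the flat-vector workaround being asked about here (hypothetical flatten/unflatten helpers, not part of Optim's API), which shows exactly the shape bookkeeping involved:

    # Concatenate independent parameter blocks into one flat vector and
    # rebuild them inside the objective.
    flatten(xs) = reduce(vcat, xs)

    function unflatten(x, lens)
        out, i = Vector{typeof(x)}(), 1
        for n in lens
            push!(out, x[i:i+n-1])
            i += n
        end
        return out
    end

    f_nested(xs) = sum((1.0 .- xs[1]).^2 .+ 100.0 .* (xs[2] .- xs[1].^2).^2)
    f_flat(x) = f_nested(unflatten(x, [2, 2]))

    f_flat(flatten([[12.0, 1.0], [1.0, 2.0]]))  # same value as the nested call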

@timholy
Contributor

timholy commented May 31, 2017

If each array in the "outer" container is of the same size (as in this example), you almost surely want to use StaticArrays and make it a Vector{SVector{L,T}}. At least the call to zero will succeed if you do that; you might have to make some additional changes, though.
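
A quick check of why the zero call from the trace above succeeds with static arrays (a sketch, assuming current StaticArrays behavior):

    using StaticArrays

    zero(SVector{2,Float64})                       # SVector(0.0, 0.0)
    zero([SVector(12.0, 1.0), SVector(1.0, 2.0)])  # element-wise zeros: works
    # zero(Vector{Float64})                        # MethodError, as in the trace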

@pkofod
Member Author

pkofod commented May 31, 2017

> It would be really helpful if I can have something like multiple parameters (i.e. arguments) to my optimizable function. Is this still this issue, or rather something else? Looking forward to your thoughts.

I think it is this issue, yes.

@cortner
Contributor

cortner commented Oct 18, 2017

CC @ettersi

@pkofod
Member Author

pkofod commented Oct 18, 2017

Does @ettersi have a use case here?

@ChrisRackauckas
Contributor

I see I never answered about the recursive arrays. Let me be a little more clear about that.

If you do something like a .= b with arrays of arrays, then it's like a[i] = b[i], and thus the elements of the two arrays of arrays will now share references. The same happens with copy!. So you have to be very careful here. recursivecopy! from RecursiveArrayTools.jl and the VectorOfArray broadcast implementation take care of these kinds of issues. What @timholy suggests is another way of handling it, since SArrays are immutable and thus you don't need recursion: a[i] = b[i] will set a[i] to the static array value of b[i] rather than copy a reference. So in short, if everything is generic enough, Vector{SVector{L,T}} should already work without any changes, while Vector{Vector{T}} would need a few extra changes, but the tools to do this are already created and well-tested.
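
A REPL sketch of the reference-sharing pitfall described above:

    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    a .= b            # copies the *references* to b's inner arrays
    a[1][1] = 99.0
    b[1][1]           # 99.0: a and b now share their inner vectors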

I'm not sure the full mutability makes too much sense for an optimization problem; diffeq does it because things "change over time", but I'm not sure that has much meaning here. Still, it would be something nice to handle correctly down the road.

As for other array types, things like GPUArrays (and support for internal multithreading via JLArrays) come free if you use A_mul_B! and change your loops to broadcasting as above. In DiffEq we found this has essentially no cost as long as you avoid JuliaLang/julia#22255, so I would recommend just changing every element-wise loop whenever you have time. Looking at the code, it shouldn't be too difficult to get this all working with GPUArrays, which is nice because it's different from what other packages have.

The "marketing" is like this. Most optimization packages allow you to "use" GPUs because you can GPU the call to the objective function. However, when setting this up the user always has to write a step that updates the parameters on the GPU given the update from the optimizer, i.e. there is a GPU load/unload required. But if you set everything up to work with GPUArrays, then internally everything important in the optimizer can stay on the GPU, leading to an optimizer that compiles itself to be "copy-free" and requires no extra transferring of arrays to/from the GPU. I think this is a huge win and something that can be really hammered home in a blog post, and I could see that getting a lot of interest.

One thing that is missing from the last point, though, is that GPUArrays don't fully support matrix factorizations and \ yet, so you may want to do all of these changes but then only test them out for now on gradient descent.
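
As a sketch of the broadcasting point (hypothetical helper, not Optim code): an update written only with broadcasts contains no element-wise indexing, so an array type living on a GPU can execute it without moving data to the host:

    # Broadcast-only gradient-descent update; x and g can be plain Arrays or
    # any GPU array type implementing broadcast.
    function gd_update!(x, g, α)
        x .-= α .* g
        return x
    end

    gd_update!([1.0, 2.0], [0.5, 0.5], 0.1)  # -> [0.95, 1.95]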

I can start over the weekend helping get this all set up. Maybe if you guys are also set up with GPUArrays, my "please help me get factorizations working with GPUArrays!" call will be stronger.

@ettersi

ettersi commented Oct 18, 2017

My use-case is optimising an (approximate) lattice of atoms, which I store as a Matrix{SVector{2,Float64}}. Currently, when I call

optimize(
         total_energyy, 
         forces!, 
         y, 
         ConjugateGradient()
        )

I get the following error:

ERROR: AssertionError: typeof(f_x) == T
 in initial_state(::Optim.ConjugateGradient{Void,Optim.##29#31,LineSearches.#hagerzhang!}, ::Optim.Options{Void}, ::Optim.OnceDifferentiable, ::Array{StaticArrays.SVector{2,Float64},2}) at /home/simon/.julia/v0.5/Optim/src/cg.jl:105
 in optimize(::Optim.OnceDifferentiable, ::Array{StaticArrays.SVector{2,Float64},2}, ::Optim.ConjugateGradient{Void,Optim.##29#31,LineSearches.#hagerzhang!}, ::Optim.Options{Void}) at /home/simon/.julia/v0.5/Optim/src/optimize.jl:172
 in optimize(::T.##6#9{CartesianIndex{2}}, ::T.##7#10{CartesianIndex{2}}, ::Array{StaticArrays.SVector{2,Float64},2}, ::Optim.ConjugateGradient{Void,Optim.##29#31,LineSearches.#hagerzhang!}) at /home/simon/.julia/v0.5/Optim/src/optimize.jl:58

I haven't really dug into the details of this error yet, though. It could well be that there is some subtle issue in my code, or that the issue is that I am still on 0.5.

@pkofod
Member Author

pkofod commented Oct 18, 2017

Being on v0.5 will at least mean that you're not getting the latest updates, but supporting static array types is high on my priority list.

@ChrisRackauckas
Contributor

Vectors of static arrays will have other issues, though, since indexing gets you one static array each time, so defining a Jacobian and factorizing it won't work very well; that makes it work with some algorithms but not many. You'll want to wrap the Vector{SArray} in something like RecursiveArrayTools' VectorOfArray to really fix that (because then it will have proper indexing, so you can vec it to get a view of the whole thing as a Vector and treat it likewise), unless Optim does a bunch of special handling.
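
A hedged sketch of that wrapping, assuming RecursiveArrayTools' documented indexing where the last index selects the inner array:

    using RecursiveArrayTools, StaticArrays

    v = VectorOfArray([SVector(1.0, 2.0), SVector(3.0, 4.0)])
    v[1, 2]  # 3.0: element 1 of inner array 2, via matrix-style indexing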

@ettersi

ettersi commented Oct 19, 2017

Solved the problem using reinterpret. It works kind of all right, although a solution without this extra typing would of course be nice. But I understand that this is open-source code, and if I want a feature I would have to do it myself.
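
For reference, a minimal sketch of that reinterpret workaround (hypothetical lattice shape; on current Julia versions reinterpret returns a flat view sharing the lattice's memory):

    using StaticArrays

    y = fill(SVector(0.0, 0.0), 4, 4)   # Matrix{SVector{2,Float64}} lattice
    x = reinterpret(Float64, vec(y))    # flat 32-element Float64 view of y
    x[1] = 1.5                          # writes through to y[1, 1][1]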

@pkofod
Member Author

pkofod commented Oct 19, 2017

You don't necessarily have to do it yourself. A precise description of what you want goes a long way :)

@pkofod
Member Author

pkofod commented Aug 18, 2020

Fixed the original issue; if anything else we can do comes up, we can open a new issue.

@pkofod pkofod closed this as completed Aug 18, 2020