
Dynamically creating parameterized objects is slow? #29887

Closed
KristofferC opened this issue Nov 1, 2018 · 2 comments

Labels
performance Must go faster

Comments

@KristofferC
Member

Background: ForwardDiff specializes on the length of the tuples containing the partial derivatives. If one does not preallocate a "cache", this length is computed dynamically, which is apparently extremely expensive; see e.g. JuliaDiff/ForwardDiff.jl#373, which improved the performance of a quite complicated computation by a factor of 5x just by caching the construction of these dynamically parameterized objects.

Here is a benchmark replicating that issue:

const DEFAULT_CHUNK_THRESHOLD = 10

struct Chunk{N} end

# Construct a Chunk whose type parameter N is known only at runtime.
function f()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Chunk{N}()
end

# Avoid the dynamic construction by looking up a precomputed instance.
const Chunks = [Chunk{i}() for i in 1:DEFAULT_CHUNK_THRESHOLD]

function f2()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Chunks[N]
end

julia> using BenchmarkTools

julia> @btime f()
  3.277 μs (0 allocations: 0 bytes)
Chunk{4}()

julia> @btime f2()
  16.268 ns (0 allocations: 0 bytes)
Chunk{6}()

3.277 μs seems very long just to construct an empty struct?

Secondly, it seems not all objects are equal. Using a Val instead of our own Chunk:

# Same dynamic construction, but with Base's Val instead of Chunk.
function g()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Val{N}()
end

julia> @btime g()
  2.089 μs (0 allocations: 0 bytes)
Val{9}()

So using a Val is beneficial (roughly a 1.5x speedup over Chunk).

Of course, the (arguably knee-jerk) reaction is to say that one shouldn't dynamically construct these things. But the trick with Julia is to find a part of your computational graph that it is beneficial to specialize on. If dynamic dispatch is this expensive, then that subgraph has to be much larger before specializing pays off, which is unfortunate. I feel like doing forward-mode AD with a length-10 input vector to compute a gradient should be a large enough graph to specialize on.
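
For concreteness, the pattern in question is a function barrier: the dynamically constructed Chunk{N} is passed into a kernel that specializes on N, so the one-time dispatch cost is amortized over the kernel's work. A minimal sketch, where heavy_kernel and run_dynamic are hypothetical stand-ins (not from ForwardDiff) for the specialized subgraph and its driver:

# Hypothetical sketch of the function-barrier pattern discussed above.
# The ~μs cost of constructing Chunk{N} dynamically is paid once; it only
# pays off if heavy_kernel does enough specialized work to amortize it.
function heavy_kernel(::Chunk{N}, x) where {N}
    s = zero(eltype(x))
    for i in 1:N            # N is a compile-time constant inside the kernel
        s += x[i]^2
    end
    return s
end

function run_dynamic(x)
    N = min(length(x), DEFAULT_CHUNK_THRESHOLD)
    chunk = Chunk{N}()               # dynamic construction (the slow part)
    return heavy_kernel(chunk, x)    # dynamic dispatch into specialized code
end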

Perhaps this is just a dup of #21730...

@KristofferC KristofferC added the performance Must go faster label Nov 1, 2018
@JeffBezanson
Member

Yes, I suspect it is a dup of #21730. Returning Chunk{N} instead of Chunk{N}() speeds it up 10x, so the time is in the constructor call. Another quasi-cheating solution is to write Chunk{N}.instance. I think we can mostly fix #21730 and speed it up a lot, but it's unlikely to get as fast as custom code.
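
For concreteness, a minimal sketch of the two variants mentioned above, assuming the Chunk definition from the benchmark (the names f_type and f_instance are mine; both exploit the fact that Chunk{N} is a singleton type):

# Variant 1: return the type object itself; no constructor call is made.
function f_type()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Chunk{N}
end

# Variant 2: the "quasi-cheating" route: fetch the unique instance stored
# in the singleton type's `instance` field instead of calling Chunk{N}().
function f_instance()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Chunk{N}.instance
end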

@KristofferC
Member Author

Closing as dup of #21730 then.
