Background: ForwardDiff specializes on the length of the tuples containing the partial derivatives. If one does not preallocate a "cache", this length is computed dynamically. This is apparently extremely expensive; see e.g. the PR JuliaDiff/ForwardDiff.jl#373, which improved the performance of a quite complicated computation by a factor of 5 just by caching the construction of these dynamic objects.
Here is a benchmark replicating that issue:

```julia
const DEFAULT_CHUNK_THRESHOLD = 10

struct Chunk{N} end

# Constructs a Chunk whose type parameter is only known at runtime.
function f()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Chunk{N}()
end

const Chunks = [Chunk{i}() for i in 1:DEFAULT_CHUNK_THRESHOLD]
# Looks up a preconstructed Chunk instead of constructing one.
function f2()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Chunks[N]
end
```
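The numbers quoted below can be reproduced along these lines (a minimal sketch; the use of BenchmarkTools' `@btime` is an assumption, as the original harness is not shown):

```julia
using BenchmarkTools

@btime f()   # slow path: dynamically constructs Chunk{N}()
@btime f2()  # fast path: indexes into the preallocated Chunks table
```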
3.27 microseconds seems very long for this?

Secondly, it seems not all objects are equal: using a `Val` instead of our own `Chunk` is beneficial (a 1.5x speedup); see the sketch below.
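A sketch of the `Val`-based variant, assuming the same dynamic pattern as `f()` (the name `f_val` is illustrative):

```julia
# Like f(), but returns Val{N}() instead of our own Chunk{N}().
# Reported ~1.5x faster than f() above.
function f_val()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Val{N}()
end
```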
Of course, the (arguably knee-jerk) reaction is to say that one shouldn't dynamically construct these things. But the trick with Julia is to find a part of your computational graph that is beneficial to specialize on. If dynamic dispatch is expensive, then that subgraph must be much larger for specialization to pay off, which is unfortunate. I feel like doing forward-mode AD with a length-10 input vector to compute a gradient should be a large enough graph to specialize on.

Perhaps this is just a dup of #21730...

Yes, I suspect it is a dup of #21730. Returning `Chunk{N}` instead of `Chunk{N}()` speeds it up 10x, so the time is in the constructor call. Another quasi-cheating solution is to write `Chunk{N}.instance`. I think we can mostly fix #21730 and speed it up a lot, but it's unlikely to get as fast as custom code.
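A minimal sketch of the two workarounds mentioned (the function names `f_type` and `f_instance` are illustrative):

```julia
# Return the type itself rather than constructing an instance;
# this skips the dynamically dispatched constructor call
# (reported ~10x faster above).
function f_type()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Chunk{N}
end

# The "quasi-cheating" variant: fetch the singleton instance
# directly via .instance instead of calling the constructor.
function f_instance()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Chunk{N}.instance
end
```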