Background: ForwardDiff specializes on the length of the tuples containing the partial derivatives. If one does not preallocate a "cache", this length is computed dynamically. This is apparently extremely expensive; see e.g. the PR JuliaDiff/ForwardDiff.jl#373, which improved the performance of a quite complicated computation by a factor of 5 just by caching the construction of these dynamic objects.
Here is a benchmark replicating that issue:

```julia
const DEFAULT_CHUNK_THRESHOLD = 10

struct Chunk{N} end

# Constructs a Chunk whose type parameter is only known at runtime.
function f()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Chunk{N}()
end

const Chunks = [Chunk{i}() for i in 1:DEFAULT_CHUNK_THRESHOLD]
# Looks up a preconstructed Chunk instead of constructing one.
function f2()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Chunks[N]
end
```
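The numbers quoted below can be reproduced along these lines (a minimal sketch; the use of BenchmarkTools' `@btime` is an assumption, as the original harness is not shown):

```julia
using BenchmarkTools

@btime f()   # slow path: dynamically constructs Chunk{N}()
@btime f2()  # fast path: indexes into the preallocated Chunks table
```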
3.27 microseconds seems very long for this?

Secondly, it seems not all objects are equal: using a `Val` instead of our own `Chunk` is beneficial (a 1.5x speedup); see the sketch below.
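A sketch of the `Val`-based variant, assuming the same dynamic pattern as `f()` (the name `f_val` is illustrative):

```julia
# Like f(), but returns Val{N}() instead of our own Chunk{N}().
# Reported ~1.5x faster than f() above.
function f_val()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Val{N}()
end
```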
Of course, the (arguably knee-jerk) reaction is to say that one shouldn't dynamically construct these things. But the trick with Julia is to find a part of your computational graph that is beneficial to specialize on. If dynamic dispatch is expensive, then that subgraph must be much larger for specialization to pay off, which is unfortunate. I feel like doing forward-mode AD with a length-10 input vector to compute a gradient should be a large enough graph to specialize on.

Perhaps this is just a dup of #21730...

Yes, I suspect it is a dup of #21730. Returning `Chunk{N}` instead of `Chunk{N}()` speeds it up 10x, so the time is in the constructor call. Another quasi-cheating solution is to write `Chunk{N}.instance`. I think we can mostly fix #21730 and speed it up a lot, but it's unlikely to get as fast as custom code.
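A minimal sketch of the two workarounds mentioned (the function names `f_type` and `f_instance` are illustrative):

```julia
# Return the type itself rather than constructing an instance;
# this skips the dynamically dispatched constructor call
# (reported ~10x faster above).
function f_type()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Chunk{N}
end

# The "quasi-cheating" variant: fetch the singleton instance
# directly via .instance instead of calling the constructor.
function f_instance()
    N = rand(1:DEFAULT_CHUNK_THRESHOLD)
    return Chunk{N}.instance
end
```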