Unexpected allocations with SVectors #518
I wrote another benchmark comparing the integrator interface with the solve one and, if I didn't make any mistakes, I have shown that the problem is also present with the integrator interface. The relevant part from the above:
@btime solve($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false);
# 789.911 μs (38974 allocations: 1.21 MiB)
# 355.283 μs (17439 allocations: 463.05 KiB)
# 1.438 ms (108191 allocations: 2.76 MiB)
function integ_benchmark(prob; args...)
integ = init(prob; args...)
while integ.t < prob.tspan[2]
step!(integ)
end
end
@btime integ_benchmark($prob1, alg=Vern9(), abstol=1e-14, reltol=1e-14)
@btime integ_benchmark($prob2, alg=DPRKN12(), abstol=1e-14, reltol=1e-14)
@btime integ_benchmark($prob2, alg=KahanLi8(), dt=1e-2, maxiters=1e10)
# 897.680 μs (40428 allocations: 1.55 MiB)
# 379.758 μs (17909 allocations: 536.55 KiB)
# 1.563 ms (111196 allocations: 3.06 MiB)
tspan = (0., 100.)
prob1 = ODEProblem(ż, z0, tspan, p)
prob2 = DynamicalODEProblem(ṗ, q̇, p0, q0, tspan, p)
@btime solve($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false);
# 7.940 ms (386014 allocations: 11.80 MiB)
# 3.480 ms (173907 allocations: 4.43 MiB)
# 17.513 ms (1080083 allocations: 27.47 MiB)
@btime integ_benchmark($prob1, alg=Vern9(), abstol=1e-14, reltol=1e-14)
@btime integ_benchmark($prob2, alg=DPRKN12(), abstol=1e-14, reltol=1e-14)
@btime integ_benchmark($prob2, alg=KahanLi8(), dt=1e-2, maxiters=1e10)
# 8.980 ms (400491 allocations: 15.50 MiB)
# 3.749 ms (178553 allocations: 4.83 MiB)
# 18.589 ms (1110082 allocations: 29.61 MiB)
Note that these benchmarks were done on a different (slower) machine than the first ones.
julia> versioninfo()
Julia Version 1.0.1
Commit 0d713926f8 (2018-09-29 19:05 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, haswell)
I am not sure why the benchmarks for the integrator interface give slower timings than the ones for the solve interface. I hope that the benchmark function did not introduce any (grave) performance problems.
Edit: updated the link to point to the relevant file version. I tried some modifications (see master), but I am not sure if I got it right.
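One thing that may contribute to the gap, offered as an assumption rather than a diagnosis: the integ_benchmark calls above do not pass save_everystep=false, while the solve calls do, so the integrator runs may be storing every step. A minimal sketch of a like-for-like variant follows. The Lorenz right-hand side and the name integ_run are stand-ins chosen for illustration, not part of the original benchmark.

```julia
using OrdinaryDiffEq
using StaticArrays

# Stand-in right-hand side (assumption: any out-of-place SVector function works here).
lorenz(u, p, t) = SVector(
    10.0 * (u[2] - u[1]),
    u[1] * (28.0 - u[3]) - u[2],
    u[1] * u[2] - (8 / 3) * u[3],
)

# Hypothetical variant of the benchmark function: the algorithm is positional and
# all keyword arguments (including save_everystep) are forwarded to init.
function integ_run(prob, alg; kwargs...)
    integ = init(prob, alg; kwargs...)
    while integ.t < prob.tspan[2]
        step!(integ)
    end
    return integ
end

prob = ODEProblem(lorenz, SVector(1.0, 0.0, 0.0), (0.0, 10.0))

# Same saving options for both interfaces, so the comparison is like for like.
integ = integ_run(prob, Vern9(); save_everystep=false)
sol = solve(prob, Vern9(); save_everystep=false)
println("final time (integrator): ", integ.t)
println("saved points (solve):    ", length(sol.u))
```

Wrapping both calls in @btime with the same keyword arguments would then show whether the remaining difference comes from init or from the stepping loop.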
I opened Julia with
julia> @timev ż(z0, p, 1.);
0.000006 seconds (5 allocations: 208 bytes)
elapsed time (ns): 6038
bytes allocated: 208
pool allocs: 5
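A caveat on reading these numbers, as far as I understand it: @timev at the REPL works on non-constant globals, so the call goes through dynamic dispatch and the result gets boxed, which can account for a few small allocations even when the function itself allocates nothing. A minimal sketch of measuring inside a function instead is below. The helper name allocated_bytes and the rounded initial values are assumptions made for illustration.

```julia
using StaticArrays

# Same right-hand side as in the thread, written without the macros.
function ż(z, p, t)
    A, B, D = p
    p₀, p₂ = z[1], z[2]
    q₀, q₂ = z[3], z[4]
    return SVector{4}(
        -A * q₀ - 3 * B / √2 * (q₂^2 - q₀^2) - D * q₀ * (q₀^2 + q₂^2),
        -q₂ * (A + 3 * √2 * B * q₀ + D * (q₀^2 + q₂^2)),
        A * p₀,
        A * p₂,
    )
end

# Hypothetical helper: measuring inside a function gives the arguments
# concrete types, so global-scope boxing is not counted.
function allocated_bytes(f::F, z, p, t) where {F}
    f(z, p, t)                    # warm-up call
    return @allocated f(z, p, t)  # bytes allocated by one call
end

z0 = SVector(10.0, -5.0, 0.0, -4.0)   # rounded values, for illustration only
p = (A = 1, B = 0.55, D = 0.4)
println(allocated_bytes(ż, z0, p, 1.0), " bytes allocated per call")
```

If this reports allocations for the derivative function alone, the function is the place to look. Otherwise the stepping code is the more likely source.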
If changing the timespan doesn't change the allocations in
I added another set of benchmarks with the With
@btime init($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false)
@btime step_integ(integ1, $tspan[2]) setup=(integ1=init($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false))
@btime init($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false)
@btime step_integ(integ2, $tspan[2]) setup=(integ2=init($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false))
@btime init($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false)
@btime step_integ(integ3, $tspan[2]) setup=(integ3=init($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false))
# 4.335 μs (88 allocations: 18.45 KiB)
# 3.354 μs (223 allocations: 6.98 KiB)
# 4.258 μs (95 allocations: 11.13 KiB)
# 1.114 μs (69 allocations: 1.80 KiB)
# 2.825 μs (79 allocations: 6.03 KiB)
# 120.168 μs (10810 allocations: 281.53 KiB)
and when I increase to
@btime init($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false)
@btime step_integ(integ1, $tspan[2]) setup=(integ1=init($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false))
@btime init($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false)
@btime step_integ(integ2, $tspan[2]) setup=(integ2=init($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false))
@btime init($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false)
@btime step_integ(integ3, $tspan[2]) setup=(integ3=init($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false))
# 4.384 μs (86 allocations: 18.34 KiB)
# 5.858 ms (385920 allocations: 11.78 MiB)
# 4.300 μs (94 allocations: 11.03 KiB)
# 308.408 μs (19312 allocations: 502.92 KiB)
# 2.825 μs (78 allocations: 5.94 KiB)
# 14.674 ms (1080000 allocations: 27.47 MiB)
What I find strange is that in the first case the timings are suspiciously small compared with the rest of the benchmarks, and in the second case they explode in the case of
Okay, that rules out the possibility of something wrong in init. We'd need to check the stepping and the derivative function. There is a possibility that stepping has a problem like JuliaLang/julia#22255
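A rough way to check the stepping, sketched under assumptions: record the bytes allocated by individual step! calls from inside a function. The Lorenz right-hand side is a stand-in and step_allocations is a hypothetical helper name. Only init and step! come from the integrator interface used in this thread.

```julia
using OrdinaryDiffEq
using StaticArrays

# Stand-in out-of-place right-hand side returning an SVector.
lorenz(u, p, t) = SVector(
    10.0 * (u[2] - u[1]),
    u[1] * (28.0 - u[3]) - u[2],
    u[1] * u[2] - (8 / 3) * u[3],
)

# Hypothetical helper: bytes allocated by each of the next `n` steps.
function step_allocations(integ, n)
    bytes = Vector{Int}(undef, n)
    for i in 1:n
        bytes[i] = @allocated step!(integ)
    end
    return bytes
end

prob = ODEProblem(lorenz, SVector(1.0, 0.0, 0.0), (0.0, 100.0))
integ = init(prob, Vern9(); save_everystep=false)
step!(integ)                          # warm-up step so compilation is not counted
bytes = step_allocations(integ, 20)
println("bytes allocated per step: ", bytes)
```

A constant nonzero entry per step would point at the stepping code or the derivative call, while all zeros would suggest the allocations come from somewhere outside the step itself.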
using OrdinaryDiffEq
using StaticArrays
using BenchmarkTools
using Profile
@inline function ż(z, p, t)
@inbounds begin
@assert z isa SVector
A, B, D = p
p₀, p₂ = z[1], z[2]
q₀, q₂ = z[3], z[4]
return SVector{4}(
-A * q₀ - 3 * B / √2 * (q₂^2 - q₀^2) - D * q₀ * (q₀^2 + q₂^2),
-q₂ * (A + 3 * √2 * B * q₀ + D * (q₀^2 + q₂^2)),
A * p₀,
A * p₂
)
end
end
q0 = SVector{2}([0.0, -4.363920590485035])
p0 = SVector{2}([10.923918825236079, -5.393598858645495])
z0 = vcat(p0, q0)
p = (A=1,B=0.55,D=0.4)
tspan = (0., 1000.)
prob1 = ODEProblem(ż, z0, tspan, p)
solve(prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@timev solve(prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false);
Profile.clear_malloc_data()
solve(prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false);
exit()
When I set
Coverage.MallocInfo(16, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 276)
Coverage.MallocInfo(16, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 277)
Coverage.MallocInfo(16, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 281)
Coverage.MallocInfo(32, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 278)
Coverage.MallocInfo(80, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 174)
Coverage.MallocInfo(80, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 404)
Coverage.MallocInfo(96, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 378)
Coverage.MallocInfo(240, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 147)
Coverage.MallocInfo(320, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 116)
Coverage.MallocInfo(336, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 178)
Coverage.MallocInfo(400, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 6)
Coverage.MallocInfo(480, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/integrators/type.jl.mem", 2)
Coverage.MallocInfo(624, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/perform_step/verner_rk_perform_step.jl.mem", 640)
Coverage.MallocInfo(848, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 62)
Coverage.MallocInfo(3264, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/alg_utils.jl.mem", 361)
Coverage.MallocInfo(3328, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/interp_func.jl.mem", 4)
Coverage.MallocInfo(7232, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/integrators/type.jl.mem", 130)
and with
Coverage.MallocInfo(16, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 276)
Coverage.MallocInfo(16, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 277)
Coverage.MallocInfo(16, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 281)
Coverage.MallocInfo(32, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 278)
Coverage.MallocInfo(80, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 174)
Coverage.MallocInfo(80, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 404)
Coverage.MallocInfo(96, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 378)
Coverage.MallocInfo(240, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 147)
Coverage.MallocInfo(320, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 116)
Coverage.MallocInfo(336, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 178)
Coverage.MallocInfo(400, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 6)
Coverage.MallocInfo(480, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/integrators/type.jl.mem", 2)
Coverage.MallocInfo(624, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/perform_step/verner_rk_perform_step.jl.mem", 640)
Coverage.MallocInfo(848, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 62)
Coverage.MallocInfo(3264, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/alg_utils.jl.mem", 361)
Coverage.MallocInfo(3328, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/interp_func.jl.mem", 4)
Coverage.MallocInfo(7232, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/integrators/type.jl.mem", 130)
They are exactly identical. So, there isn't anything extra allocated with a longer time span within the
How are the allocations in
@YingboMa, if there are allocations in
I started investigating this again and I observed something quite strange. For a simpler user function (lorenz) the problem disappears.
using OrdinaryDiffEq
using BenchmarkTools
using StaticArrays
function lorenz(u,p,t)
du1 = 10.0*(u[2]-u[1])
du2 = u[1]*(28.0-u[3]) - u[2]
du3 = u[1]*u[2] - (8/3)*u[3]
return SVector{3}(du1,du2,du3)
end
@inbounds @inline function ż(z, p, t)
A, B, D = p
p₀, p₂ = z[SVector{2}(1:2)]
q₀, q₂ = z[SVector{2}(3:4)]
return SVector{4}(
-A * q₀ - 3 * B / √2 * (q₂^2 - q₀^2) - D * q₀ * (q₀^2 + q₂^2),
-q₂ * (A + 3 * √2 * B * q₀ + D * (q₀^2 + q₂^2)),
A * p₀,
A * p₂
)
end
u0 = @SVector [1.0,0.0,0.0]
u = vcat(u0,u0)
p = (A=1,B=0.55,D=0.4)
q0 = SVector{2}([0.0, -4])
p0 = SVector{2}([10, -5])
z0 = vcat(p0, q0)
tspan1 = (0.0,10.0)
prob1_ok = ODEProblem(lorenz,u0,tspan1)
prob1_notok = ODEProblem(ż,z0,tspan1,p)
@btime solve($prob1_ok, Vern9(), save_everystep=false) # 49.999 μs (40 allocations: 9.89 KiB)
@btime solve($prob1_notok, Vern9(), save_everystep=false) # 58.199 μs (3170 allocations: 107.92 KiB)
tspan2 = (0.0,100.0)
prob2_ok = ODEProblem(lorenz,u0,tspan2)
prob2_notok = ODEProblem(ż,z0,tspan2,p)
@btime solve($prob2_ok, Vern9(), save_everystep=false) # 543.900 μs (40 allocations: 9.89 KiB)
@btime solve($prob2_notok, Vern9(), save_everystep=false) # 450.700 μs (25810 allocations: 815.42 KiB)
I tried a couple of other functions and it looks like it's somehow related to how complicated the user function is (number of operations?). The Henon system is not sufficient to trigger the problem:
henon(z, p, t) = SVector(
-z[3] * (1 + 2z[4]),
-z[4] - (z[3]^2 - z[4]^2),
z[1],
z[2]
)
but if I extend lorenz or henon by writing the equations twice, I can reproduce the problem. For extending I used something like this
and I don't think it introduces problems.
Indeed, extending doesn't impact allocations. Using
function lorenz2(u,p,t)
du1 = 10.0*(u[2]-u[1])
du2 = u[1]*(28.0-u[3]) - u[2]
du3 = u[1]*u[2] - (8/3)*u[3]
du4 = 10.0*(u[2+3]-u[1+3])
du5 = u[1+3]*(28.0-u[3+3]) - u[2+3]
du6 = u[1+3]*u[2+3] - (8/3)*u[3+3]
return SVector{6}(du1,du2,du3,du4,du5,du6)
end
yields the same number of allocations (and reproduces the problem). Since the
u0 = @SVector [1.0,0.0,0.0]
u = vcat(u0,u0)
tspan1 = (0.0,10.0)
prob1_2lm = ODEProblem(lorenz2,u,tspan1)
@btime solve($prob1_2lm, Vern9(), save_everystep=false)
# @inline 49.900 μs (200 allocations: 16.69 KiB)
# @noinline 53.000 μs (200 allocations: 16.69 KiB)
tspan2 = (0.0,100.0)
prob2_2lm = ODEProblem(lorenz2,u,tspan2)
@btime solve($prob2_2lm, Vern9(), save_everystep=false)
# @inline 536.101 μs (1796 allocations: 79.03 KiB)
# @noinline 571.701 μs (1796 allocations: 79.03 KiB)
@YingboMa I updated your script above to
using StaticArrays
using Profile
using BenchmarkTools
using OrdinaryDiffEq
function lorenz(u,p,t)
du1 = 10.0*(u[2]-u[1])
du2 = u[1]*(28.0-u[3]) - u[2]
du3 = u[1]*u[2] - (8/3)*u[3]
return SVector{3}(du1,du2,du3)
end
function lorenz2(u,p,t)
du1 = 10.0*(u[2]-u[1])
du2 = u[1]*(28.0-u[3]) - u[2]
du3 = u[1]*u[2] - (8/3)*u[3]
du4 = 10.0*(u[2+3]-u[1+3])
du5 = u[1+3]*(28.0-u[3+3]) - u[2+3]
du6 = u[1+3]*u[2+3] - (8/3)*u[3+3]
return SVector{6}(du1,du2,du3,du4,du5,du6)
end
const u0 = @SVector [1.0,0.0,0.0]
const u = vcat(u0,u0)
const tspan = (0.0,10.0)
# const prob = ODEProblem(lorenz,u0,tspan)
const prob = ODEProblem(lorenz2,u,tspan)
@time solve(prob, Tsit5(), save_everystep=false);
@time solve(prob, Tsit5(), save_everystep=false);
Profile.clear_malloc_data()
@timev solve(prob, Tsit5(), save_everystep=false);
exit()
and tried to debug with
Linear increase of allocations is worrisome. Where are all of the other allocs? That gist only displays 144 of like 700
I used the Coverage script and checked the top 5 memory allocation spots and I think I found something. The error estimator has a linear increase in allocations. This contradicts @YingboMa's earlier post, so I would appreciate it if someone could try to replicate my findings in order to double check and make sure I didn't miss something. TLDR:
with
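For anyone trying to replicate this, here is a sketch of how the largest allocation sites can be pulled out of the .mem files. It assumes Julia was started with --track-allocation=user and that the test script (with Profile.clear_malloc_data() after the warm-up run) has already been executed. The directory below and the helper name top_allocations are hypothetical.

```julia
using Coverage

# Hypothetical helper: the `n` largest allocation sites recorded in the
# *.mem files found under `dir`.
function top_allocations(dir::AbstractString, n::Integer)
    infos = analyze_malloc([dir])
    sorted = sort(infos; by = info -> info.bytes, rev = true)
    return sorted[1:min(n, length(sorted))]
end

# Assumed location of the tracked package source; adjust to the local checkout.
src = joinpath(homedir(), ".julia", "dev", "OrdinaryDiffEq", "src")
if isdir(src)
    for info in top_allocations(src, 5)
        println(info.bytes, " bytes at ", info.filename, ":", info.linenumber)
    end
end
```

Running this once per time span and comparing the two lists line by line shows which source lines grow with the integration time.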
Looks like it's fixed by SciML/DiffEqBase.jl#348
I noticed that with out-of-place integration with SVectors the allocations increase with the integration time with save_everystep=false. MWE:
and increasing the integration time
I also included the timings for the full solution for comparison.
(See http://nbviewer.jupyter.org/github/SebastianM-C/Benchmarks/blob/master/parallel.ipynb?flush_cache=true for more details)