Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected allocations with SVectors #518

Closed
SebastianM-C opened this issue Oct 7, 2018 · 15 comments · Fixed by SciML/DiffEqBase.jl#348
Closed

Unexpected allocations with SVectors #518

SebastianM-C opened this issue Oct 7, 2018 · 15 comments · Fixed by SciML/DiffEqBase.jl#348

Comments

@SebastianM-C
Copy link
Contributor

I noticed that with out-of-place integration with SVectors the allocations increase with the integration time with save_everystep=false. MWE:

using DifferentialEquations
using StaticArrays
using BenchmarkTools

@inbounds @inline function (z, p, t)
    A, B, D = p
    p₀, p₂ = z[SVector{2}(1:2)]
    q₀, q₂ = z[SVector{2}(3:4)]

    return SVector{4}(
        -A * q₀ - 3 * B / 2 * (q₂^2 - q₀^2) - D * q₀ * (q₀^2 + q₂^2),
        -q₂ * (A + 3 * 2 * B * q₀ + D * (q₀^2 + q₂^2)),
        A * p₀,
        A * p₂
    )
end

@inbounds @inline function (p, q, params, t)
    A, B, D = params
    dp1 = -A * q[1] - 3 * B / 2 * (q[2]^2 - q[1]^2) - D * q[1] * (q[1]^2 + q[2]^2)
    dp2 = -q[2] * (A + 3 * 2 * B * q[1] + D * (q[1]^2 + q[2]^2))
    return SVector{2}(dp1, dp2)
end

@inbounds @inline function (p, q, params, t)
    params.A * p
end

q0 = SVector{2}([0.0, -4.363920590485035])
p0 = SVector{2}([10.923918825236079, -5.393598858645495])
z0 = vcat(p0, q0)
p = (A=1,B=0.55,D=0.4)

tspan = (0., 10.)

prob1 = ODEProblem(ż, z0, tspan, p)
prob2 = DynamicalODEProblem(ṗ, q̇, p0, q0, tspan, p)

@btime solve($prob1, Vern9(), abstol=1e-14, reltol=1e-14);
@btime solve($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14);
@btime solve($prob2, KahanLi8(), dt=1e-2, maxiters=1e10);

# 629.188 μs (39944 allocations: 1.55 MiB)
# 276.165 μs (17752 allocations: 534.27 KiB)
# 1.291 ms (110195 allocations: 3.05 MiB)

@btime solve($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false);

# 592.659 μs (38972 allocations: 1.20 MiB)
# 264.475 μs (17438 allocations: 462.94 KiB)
# 1.223 ms (108190 allocations: 2.76 MiB)

and increasing the integration time

tspan = (0., 100.)
prob1 = ODEProblem(ż, z0, tspan, p)
prob2 = DynamicalODEProblem(ṗ, q̇, p0, q0, tspan, p)

@btime solve($prob1, Vern9(), abstol=1e-14, reltol=1e-14);
@btime solve($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14);
@btime solve($prob2, KahanLi8(), dt=1e-2, maxiters=1e10);

# 6.333 ms (395669 allocations: 15.43 MiB)
# 2.758 ms (177005 allocations: 4.80 MiB)
# 15.447 ms (1100082 allocations: 29.46 MiB)

@btime solve($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false);


@btime solve($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false);

# 5.910 ms (386012 allocations: 11.80 MiB)
# 2.650 ms (173906 allocations: 4.43 MiB)
# 14.698 ms (1080082 allocations: 27.47 MiB)

I also included the timings for the full solution for comparison.
(See http://nbviewer.jupyter.org/github/SebastianM-C/Benchmarks/blob/master/parallel.ipynb?flush_cache=true for more details)

@SebastianM-C
Copy link
Contributor Author

SebastianM-C commented Oct 8, 2018

I wrote another benchmark comparing the integrator interface with the solve one and if I didn't make any mistakes I have showed that the problem is also present with the integrator interface.
See: https://github.com/SebastianM-C/Benchmarks/blob/e808d3d6f3e049627f0fa117795fa1cf4aff1823/integ_vs_solve.ipynb

The relevant part from the above:

@btime solve($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false);
# 789.911 μs (38974 allocations: 1.21 MiB)
# 355.283 μs (17439 allocations: 463.05 KiB)
# 1.438 ms (108191 allocations: 2.76 MiB)

function integ_benchmark(prob; args...)
    integ = init(prob; args...)
    while integ.t < prob.tspan[2]
        step!(integ)
    end
end

@btime integ_benchmark($prob1, alg=Vern9(), abstol=1e-14, reltol=1e-14)
@btime integ_benchmark($prob2, alg=DPRKN12(), abstol=1e-14, reltol=1e-14)
@btime integ_benchmark($prob2, alg=KahanLi8(), dt=1e-2, maxiters=1e10)
# 897.680 μs (40428 allocations: 1.55 MiB)
# 379.758 μs (17909 allocations: 536.55 KiB)
# 1.563 ms (111196 allocations: 3.06 MiB)

tspan = (0., 100.)
prob1 = ODEProblem(ż, z0, tspan, p)
prob2 = DynamicalODEProblem(ṗ, q̇, p0, q0, tspan, p)

@btime solve($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@btime solve($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false);
# 7.940 ms (386014 allocations: 11.80 MiB)
# 3.480 ms (173907 allocations: 4.43 MiB)
# 17.513 ms (1080083 allocations: 27.47 MiB)

@btime integ_benchmark($prob1, alg=Vern9(), abstol=1e-14, reltol=1e-14)
@btime integ_benchmark($prob2, alg=DPRKN12(), abstol=1e-14, reltol=1e-14)
@btime integ_benchmark($prob2, alg=KahanLi8(), dt=1e-2, maxiters=1e10)
# 8.980 ms (400491 allocations: 15.50 MiB)
# 3.749 ms (178553 allocations: 4.83 MiB)
# 18.589 ms (1110082 allocations: 29.61 MiB)

Note that those benchmarks were done on a different (slower) machine compared with the first ones

julia> versioninfo()
Julia Version 1.0.1
Commit 0d713926f8 (2018-09-29 19:05 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, haswell)

I am not sure why the benchmarks for the integrator interface give slower timings than the ones for the solve interface. I hope that the benchmark function did not introduce any (grave) performance problems.

Edit: updated the link to point to the relevant file version. I tried some modifications (see master), but I am not sure if I got it right.

@YingboMa
Copy link
Member

YingboMa commented Oct 9, 2018

I opened Julia with --track-allocation=user --inline=no to analyze this issue, however, the only allocation that I found was produced by the derivative function itself.

julia> @timev (z0, p, 1.);
  0.000006 seconds (5 allocations: 208 bytes)
elapsed time (ns): 6038
bytes allocated:   208
pool allocs:       5

@ChrisRackauckas
Copy link
Member

If changing the timespan doesn't change the allocations in init when doing save_everystep=false, then I don't know if there's anything we can point to? We should check this.

@SebastianM-C
Copy link
Contributor Author

I added another set of benchmarks with the init and step! separated. I am not sure if I didn't make any mistakes since I get some strange results.
See https://github.com/SebastianM-C/Benchmarks/blob/694db1bef9340c12b292714691ce74d8425dac42/integ_vs_solve.ipynb

With tspan = (0.,10.) I get

@btime init($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false)
@btime step_integ(integ1, $tspan[2]) setup=(integ1=init($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false))
@btime init($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false)
@btime step_integ(integ2, $tspan[2]) setup=(integ2=init($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false))
@btime init($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false)
@btime step_integ(integ3, $tspan[2]) setup=(integ3=init($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false))
# 4.335 μs (88 allocations: 18.45 KiB)
# 3.354 μs (223 allocations: 6.98 KiB)
# 4.258 μs (95 allocations: 11.13 KiB)
# 1.114 μs (69 allocations: 1.80 KiB)
# 2.825 μs (79 allocations: 6.03 KiB)
# 120.168 μs (10810 allocations: 281.53 KiB)

and when I increase to tspan=(0.,100.)

@btime init($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false)
@btime step_integ(integ1, $tspan[2]) setup=(integ1=init($prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false))
@btime init($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false)
@btime step_integ(integ2, $tspan[2]) setup=(integ2=init($prob2, DPRKN12(), abstol=1e-14, reltol=1e-14, save_everystep=false))
@btime init($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false)
@btime step_integ(integ3, $tspan[2]) setup=(integ3=init($prob2, KahanLi8(), dt=1e-2, maxiters=1e10, save_everystep=false))
# 4.384 μs (86 allocations: 18.34 KiB)
# 5.858 ms (385920 allocations: 11.78 MiB)
# 4.300 μs (94 allocations: 11.03 KiB)
# 308.408 μs (19312 allocations: 502.92 KiB)
# 2.825 μs (78 allocations: 5.94 KiB)
# 14.674 ms (1080000 allocations: 27.47 MiB)

What I find strange is that in the first case the timings are suspiciously small compared with the rest of the benchmarks and in the second case they explode in the case of Vern9.

@ChrisRackauckas
Copy link
Member

ChrisRackauckas commented Oct 10, 2018

Okay, that rules out the possibility of something wrong in init. We'd need to check the stepping and the derivative function. There is a possibility that stepping has a problem like JuliaLang/julia#22255

@YingboMa
Copy link
Member

using OrdinaryDiffEq
using StaticArrays
using BenchmarkTools
using Profile

@inline function (z, p, t)
    @inbounds begin
        @assert z isa SVector
        A, B, D = p
        p₀, p₂ = z[1], z[2]
        q₀, q₂ = z[3], z[4]

        return SVector{4}(
            -A * q₀ - 3 * B / 2 * (q₂^2 - q₀^2) - D * q₀ * (q₀^2 + q₂^2),
            -q₂ * (A + 3 * 2 * B * q₀ + D * (q₀^2 + q₂^2)),
            A * p₀,
            A * p₂
        )
    end
end

q0 = SVector{2}([0.0, -4.363920590485035])
p0 = SVector{2}([10.923918825236079, -5.393598858645495])
z0 = vcat(p0, q0)
p = (A=1,B=0.55,D=0.4)

tspan = (0., 1000.)

prob1 = ODEProblem(ż, z0, tspan, p)

solve(prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false);
@timev solve(prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false);
Profile.clear_malloc_data()
solve(prob1, Vern9(), abstol=1e-14, reltol=1e-14, save_everystep=false);
exit()

When I set tspan = (0., 100.) I got the allocation info

 Coverage.MallocInfo(16, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 276)
 Coverage.MallocInfo(16, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 277)
 Coverage.MallocInfo(16, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 281)
 Coverage.MallocInfo(32, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 278)
 Coverage.MallocInfo(80, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 174)
 Coverage.MallocInfo(80, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 404)
 Coverage.MallocInfo(96, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 378)
 Coverage.MallocInfo(240, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 147)
 Coverage.MallocInfo(320, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 116)
 Coverage.MallocInfo(336, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 178)
 Coverage.MallocInfo(400, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 6)
 Coverage.MallocInfo(480, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/integrators/type.jl.mem", 2)
 Coverage.MallocInfo(624, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/perform_step/verner_rk_perform_step.jl.mem", 640)
 Coverage.MallocInfo(848, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 62)
 Coverage.MallocInfo(3264, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/alg_utils.jl.mem", 361)
 Coverage.MallocInfo(3328, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/interp_func.jl.mem", 4)
 Coverage.MallocInfo(7232, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/integrators/type.jl.mem", 130)

, with tspan=(0., 1000.), I got

 Coverage.MallocInfo(16, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 276)
 Coverage.MallocInfo(16, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 277)
 Coverage.MallocInfo(16, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 281)
 Coverage.MallocInfo(32, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 278)
 Coverage.MallocInfo(80, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 174)
 Coverage.MallocInfo(80, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 404)
 Coverage.MallocInfo(96, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 378)
 Coverage.MallocInfo(240, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 147)
 Coverage.MallocInfo(320, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 116)
 Coverage.MallocInfo(336, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 178)
 Coverage.MallocInfo(400, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 6)
 Coverage.MallocInfo(480, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/integrators/type.jl.mem", 2)
 Coverage.MallocInfo(624, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/perform_step/verner_rk_perform_step.jl.mem", 640)
 Coverage.MallocInfo(848, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/solve.jl.mem", 62)
 Coverage.MallocInfo(3264, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/alg_utils.jl.mem", 361)
 Coverage.MallocInfo(3328, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/interp_func.jl.mem", 4)
 Coverage.MallocInfo(7232, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/integrators/type.jl.mem", 130)

They are exactly identical.

So, there isn't anything extra allocated with a longer time span within the OrdinaryDiffEq.jl package. Closing.

@ChrisRackauckas
Copy link
Member

How are the allocations in Coverage.MallocInfo(624, "/home/scheme/.julia/dev/OrdinaryDiffEq/src/perform_step/verner_rk_perform_step.jl.mem", 640) not dependent on the number of steps?

@ChrisRackauckas
Copy link
Member

@YingboMa , if there's allocations in perform_step!, how is it not dependent on the number of steps?

@SebastianM-C
Copy link
Contributor Author

SebastianM-C commented Oct 7, 2019

I started investigating this again and I observed something quite strange. For a simpler user function (lorenz) the problem disappears.
MWE:

using OrdinaryDiffEq
using BenchmarkTools
using StaticArrays

function lorenz(u,p,t)
  du1 = 10.0*(u[2]-u[1])
  du2 = u[1]*(28.0-u[3]) - u[2]
  du3 = u[1]*u[2] - (8/3)*u[3]

  return SVector{3}(du1,du2,du3)
end

@inbounds @inline function (z, p, t)
    A, B, D = p
    p₀, p₂ = z[SVector{2}(1:2)]
    q₀, q₂ = z[SVector{2}(3:4)]

    return SVector{4}(
        -A * q₀ - 3 * B / 2 * (q₂^2 - q₀^2) - D * q₀ * (q₀^2 + q₂^2),
        -q₂ * (A + 3 * 2 * B * q₀ + D * (q₀^2 + q₂^2)),
        A * p₀,
        A * p₂
    )
end

u0 = @SVector [1.0,0.0,0.0]
u = vcat(u0,u0)
p = (A=1,B=0.55,D=0.4)

q0 = SVector{2}([0.0, -4])
p0 = SVector{2}([10, -5])
z0 = vcat(p0, q0)

tspan1 = (0.0,10.0)
prob1_ok = ODEProblem(lorenz,u0,tspan1)
prob1_notok = ODEProblem(ż,z0,tspan1,p)
@btime solve($prob1_ok, Vern9(), save_everystep=false) # 49.999 μs (40 allocations: 9.89 KiB)
@btime solve($prob1_notok, Vern9(), save_everystep=false) # 58.199 μs (3170 allocations: 107.92 KiB)

tspan2 = (0.0,100.0)
prob2_ok = ODEProblem(lorenz,u0,tspan2)
prob2_notok = ODEProblem(ż,z0,tspan2,p)

@btime solve($prob2_ok, Vern9(), save_everystep=false) # 543.900 μs (40 allocations: 9.89 KiB)
@btime solve($prob2_notok, Vern9(), save_everystep=false) # 450.700 μs (25810 allocations: 815.42 KiB)

@SebastianM-C
Copy link
Contributor Author

I tried a couple of other functions and it looks like it's somehow related to how complicated the user function is (number of operations?). The Henon system is not sufficient to trigger the problem

henon(z, p, t) = SVector(
        -z[3] * (1 + 2z[4]),
        -z[4] - (z[3]^2 - z[4]^2),
        z[1],
        z[2]
    )

but if I extend lorenz or henon by writing the equations twice, I can reproduce the problem. For extending I used something like this

function extend2(f, ::SVector{M}) where M
    @inbounds function eom(u,p,t)
        idx1 = SVector{M}(1:M)
        idx2 = SVector{M}(M+1:2M)
        vcat(f(u[idx1],p,t),
             f(u[idx2],p,t))
    end
end

and I don't think it introduces problems.

@SebastianM-C
Copy link
Contributor Author

Indeed, extending doesn't impact allocations. Using

function lorenz2(u,p,t)
    du1 = 10.0*(u[2]-u[1])
    du2 = u[1]*(28.0-u[3]) - u[2]
    du3 = u[1]*u[2] - (8/3)*u[3]

    du4 = 10.0*(u[2+3]-u[1+3])
    du5 = u[1+3]*(28.0-u[3+3]) - u[2+3]
    du6 = u[1+3]*u[2+3] - (8/3)*u[3+3]
  
    return SVector{6}(du1,du2,du3,du4,du5,du6)
end

yields the same number of allocations (and reproduces the problem).

Since the henon system doesn't present the problem, but the one for does, I am inclined to say that the problem is not related to the number of equations (or length of the SVector), but to something like the number of operations.
I tried @inline and @noinline with lorenz2 and I doesn't influence allocations

u0 = @SVector [1.0,0.0,0.0]
u = vcat(u0,u0)
tspan1 = (0.0,10.0)
prob1_2lm = ODEProblem(lorenz2,u,tspan1)
@btime solve($prob1_2lm, Vern9(), save_everystep=false)
# @inline 49.900 μs (200 allocations: 16.69 KiB)
# @noinline 53.000 μs (200 allocations: 16.69 KiB)
tspan2 = (0.0,100.0)
prob2_2lm = ODEProblem(lorenz2,u,tspan2)
@btime solve($prob2_2lm, Vern9(), save_everystep=false)
# @inline 536.101 μs (1796 allocations: 79.03 KiB)
# @noinline 571.701 μs (1796 allocations: 79.03 KiB)

@SebastianM-C
Copy link
Contributor Author

@YingboMa I updated your script above to

using StaticArrays
using Profile
using BenchmarkTools
using OrdinaryDiffEq

function lorenz(u,p,t)
  du1 = 10.0*(u[2]-u[1])
  du2 = u[1]*(28.0-u[3]) - u[2]
  du3 = u[1]*u[2] - (8/3)*u[3]

  return SVector{3}(du1,du2,du3)
end

function lorenz2(u,p,t)
    du1 = 10.0*(u[2]-u[1])
    du2 = u[1]*(28.0-u[3]) - u[2]
    du3 = u[1]*u[2] - (8/3)*u[3]

    du4 = 10.0*(u[2+3]-u[1+3])
    du5 = u[1+3]*(28.0-u[3+3]) - u[2+3]
    du6 = u[1+3]*u[2+3] - (8/3)*u[3+3]

    return SVector{6}(du1,du2,du3,du4,du5,du6)
end

const u0 = @SVector [1.0,0.0,0.0]
const u = vcat(u0,u0)
const tspan = (0.0,10.0)
# const prob = ODEProblem(lorenz,u0,tspan)
const prob = ODEProblem(lorenz2,u,tspan)

@time solve(prob, Tsit5(), save_everystep=false);

@time solve(prob, Tsit5(), save_everystep=false);
Profile.clear_malloc_data()
@timev solve(prob, Tsit5(), save_everystep=false);
exit()

and tried to debug with --track-allocations=user, but it seems quite tricky due to inlining.
I wrote a very detailed log here: https://gist.github.com/SebastianM-C/578e46a7ccff39dabd4aeeac964c099c

@ChrisRackauckas
Copy link
Member

Linear increase of allocations is worrisome. Where are all of the other allocs? That gist only displays 144 of like 700

@SebastianM-C
Copy link
Contributor Author

I used the Coverage script and checked the top 5 memory allocation spots and I think I found something. The error estimator has a linear increase in allocations.
I was confused at first that in solve! the perform_step! function showed 0 allocations, but actually in the .mem file you can see them.
I updated my findings here: https://github.com/SebastianM-C/Benchmarks/blob/7687fcddde85f908d238e9347b1e4c470e4ef3f1/debug_log_518.md

This contradicts @YingboMa 's earlier post, so I would appreciate if someone could try to replicate my findings in order to double check and make sure I didn't miss something.

TLDR:
with tspan = (0.0,10.0)

- @muladd function perform_step!(integrator, cache::Tsit5ConstantCache, repeat_step=false)
0   @unpack t,dt,uprev,u,f,p = integrator
0   @unpack c1,c2,c3,c4,c5,c6,a21,a31,a32,a41,a42,a43,a51,a52,a53,a54,a61,a62,a63,a64,a65,a71,a72,a73,a74,a75,a76,btilde1,btilde2,btilde3,btilde4,btilde5,btilde6,btilde7 = cache
0   k1 = integrator.fsalfirst
0   a = dt*a21
0   k2 = f(uprev+a*k1, p, t+c1*dt)
0   k3 = f(uprev+dt*(a31*k1+a32*k2), p, t+c2*dt)
0   k4 = f(uprev+dt*(a41*k1+a42*k2+a43*k3), p, t+c3*dt)
0   k5 = f(uprev+dt*(a51*k1+a52*k2+a53*k3+a54*k4), p, t+c4*dt)
0   g6 = uprev+dt*(a61*k1+a62*k2+a63*k3+a64*k4+a65*k5)
0   k6 = f(g6, p, t+dt)
0   u = uprev+dt*(a71*k1+a72*k2+a73*k3+a74*k4+a75*k5+a76*k6)
0   integrator.fsallast = f(u, p, t+dt); k7 = integrator.fsallast
0   integrator.destats.nf += 6
-   if typeof(integrator.alg) <: CompositeAlgorithm
-     g7 = u
-     # Hairer II, page 22
-     integrator.eigen_est = integrator.opts.internalnorm(k7 - k6,t)/integrator.opts.internalnorm(g7 - g6,t)
-   end
0   if integrator.opts.adaptive
0     utilde = dt*(btilde1*k1 + btilde2*k2 + btilde3*k3 + btilde4*k4 + btilde5*k5 + btilde6*k6 + btilde7*k7)
0     atmp = calculate_residuals(utilde, uprev, u, integrator.opts.abstol, integrator.opts.reltol,integrator.opts.internalnorm,t)
1648     integrator.EEst = integrator.opts.internalnorm(atmp,t)
-   end
0   integrator.k[1] = k1
0   integrator.k[2] = k2
0   integrator.k[3] = k3
0   integrator.k[4] = k4
0   integrator.k[5] = k5
0   integrator.k[6] = k6
0   integrator.k[7] = k7
0   integrator.u = u
- end

with tspan = (0.0,20.0)

- @muladd function perform_step!(integrator, cache::Tsit5ConstantCache, repeat_step=false)
0   @unpack t,dt,uprev,u,f,p = integrator
0   @unpack c1,c2,c3,c4,c5,c6,a21,a31,a32,a41,a42,a43,a51,a52,a53,a54,a61,a62,a63,a64,a65,a71,a72,a73,a74,a75,a76,btilde1,btilde2,btilde3,btilde4,btilde5,btilde6,btilde7 = cache
0   k1 = integrator.fsalfirst
0   a = dt*a21
0   k2 = f(uprev+a*k1, p, t+c1*dt)
0   k3 = f(uprev+dt*(a31*k1+a32*k2), p, t+c2*dt)
0   k4 = f(uprev+dt*(a41*k1+a42*k2+a43*k3), p, t+c3*dt)
0   k5 = f(uprev+dt*(a51*k1+a52*k2+a53*k3+a54*k4), p, t+c4*dt)
0   g6 = uprev+dt*(a61*k1+a62*k2+a63*k3+a64*k4+a65*k5)
0   k6 = f(g6, p, t+dt)
0   u = uprev+dt*(a71*k1+a72*k2+a73*k3+a74*k4+a75*k5+a76*k6)
0   integrator.fsallast = f(u, p, t+dt); k7 = integrator.fsallast
0   integrator.destats.nf += 6
-   if typeof(integrator.alg) <: CompositeAlgorithm
-     g7 = u
-     # Hairer II, page 22
-     integrator.eigen_est = integrator.opts.internalnorm(k7 - k6,t)/integrator.opts.internalnorm(g7 - g6,t)
-   end
0   if integrator.opts.adaptive
0     utilde = dt*(btilde1*k1 + btilde2*k2 + btilde3*k3 + btilde4*k4 + btilde5*k5 + btilde6*k6 + btilde7*k7)
0     atmp = calculate_residuals(utilde, uprev, u, integrator.opts.abstol, integrator.opts.reltol,integrator.opts.internalnorm,t)
3616     integrator.EEst = integrator.opts.internalnorm(atmp,t)
-   end
0   integrator.k[1] = k1
0   integrator.k[2] = k2
0   integrator.k[3] = k3
0   integrator.k[4] = k4
0   integrator.k[5] = k5
0   integrator.k[6] = k6
0   integrator.k[7] = k7
0   integrator.u = u
- end

@ChrisRackauckas
Copy link
Member

Looks like it's fixed by SciML/DiffEqBase.jl#348

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants