
MultiScaleModels Profiling #9

Closed
ChrisRackauckas opened this issue Jan 21, 2017 · 7 comments

@ChrisRackauckas (Member)

I optimized the ODE solvers around these types a bit. As a test case, I took a linear system on the embryo from the tests. The embryo has 4 layers and a total length of 13, so it uses very small leaf nodes. This means that, with a cheap problem and a deep hierarchy, it should accentuate any differences and give an upper bound on the cost of using a MultiScaleModel.

I then benchmarked the same problem solved with MultiScaleModels against solving it with the state as a regular length-13 contiguous array:

prob = ODEProblem(f,em,(0.0,1500.0))
@benchmark sol1 = solve(prob,Tsit5(),save_timeseries=false)
prob = ODEProblem(f,em[:],(0.0,1500.0))
@benchmark sol2 = solve(prob,Tsit5(),save_timeseries=false)

to get the results:

BenchmarkTools.Trial: 
  memory estimate:  230.28 kb
  allocs estimate:  10920
  --------------
  minimum time:     16.469 ms (0.00% GC)
  median time:      16.942 ms (0.00% GC)
  mean time:        17.033 ms (0.46% GC)
  maximum time:     26.385 ms (33.77% GC)
  --------------
  samples:          294
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

BenchmarkTools.Trial: 
  memory estimate:  196.37 kb
  allocs estimate:  10409
  --------------
  minimum time:     7.500 ms (0.00% GC)
  median time:      7.846 ms (0.00% GC)
  mean time:        7.907 ms (0.51% GC)
  maximum time:     13.730 ms (42.01% GC)
  --------------
  samples:          632
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

This puts the upper bound for the cost of using MultiScaleModels at just over 2x. But this is without saving: saving the timeseries is more costly with a MultiScaleModel. Measuring the cost with saving at every step:

prob = ODEProblem(f,em,(0.0,1500.0))
@benchmark sol1 = solve(prob,Tsit5())
prob = ODEProblem(f,em[:],(0.0,1500.0))
@benchmark sol2 = solve(prob,Tsit5())
BenchmarkTools.Trial: 
  memory estimate:  3.20 mb
  allocs estimate:  43891
  --------------
  minimum time:     26.454 ms (0.00% GC)
  median time:      27.738 ms (0.00% GC)
  mean time:        28.962 ms (4.21% GC)
  maximum time:     39.190 ms (27.02% GC)
  --------------
  samples:          173
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

BenchmarkTools.Trial: 
  memory estimate:  1.52 mb
  allocs estimate:  21060
  --------------
  minimum time:     9.414 ms (0.00% GC)
  median time:      9.910 ms (0.00% GC)
  mean time:        10.252 ms (3.11% GC)
  maximum time:     18.033 ms (35.72% GC)
  --------------
  samples:          488
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

This puts the upper bound at around 3x, so the maximal cost of the abstraction is about 2x-3x for ODEs. It's actually lower for SDEs and DDEs because more of the calculation in those domains is spent on interpolation and noise generation, which can mostly avoid the costs of MultiScaleModels. So the final product is one where the abstraction cost is less than the performance difference between OrdinaryDiffEq and other packages, meaning that using MultiScaleModels in OrdinaryDiffEq should still be slightly faster than using other packages with contiguous arrays. I think this counts as well optimized, so my goal will now be to make this all compatible with the solvers for stiff equations, since that will make a large impact.

@zahachtah I think you might be interested in these results.

@ChrisRackauckas (Member, Author)

v0.6 DiffEq got a lot faster while this stayed the same:

prob = ODEProblem(f,em,(0.0,1500.0))
@benchmark sol1 = solve(prob,Tsit5(),save_everystep=false)
prob = ODEProblem(f,em[:],(0.0,1500.0))
@benchmark sol2 = solve(prob,Tsit5(),save_everystep=false)


BenchmarkTools.Trial: 
  memory estimate:  24.44 KiB
  allocs estimate:  323
  --------------
  minimum time:     393.737 μs (0.00% GC)
  median time:      432.086 μs (0.00% GC)
  mean time:        452.956 μs (0.88% GC)
  maximum time:     4.695 ms (89.11% GC)
  --------------
  samples:          10000
  evals/sample:     1


BenchmarkTools.Trial: 
  memory estimate:  103.53 KiB
  allocs estimate:  1774
  --------------
  minimum time:     27.596 ms (0.00% GC)
  median time:      28.951 ms (0.00% GC)
  mean time:        28.998 ms (0.00% GC)
  maximum time:     32.709 ms (0.00% GC)
  --------------
  samples:          173
  evals/sample:     1

prob = ODEProblem(f,em,(0.0,1500.0))
@benchmark sol1 = solve(prob,Tsit5())
prob = ODEProblem(f,em[:],(0.0,1500.0))
@benchmark sol2 = solve(prob,Tsit5())

BenchmarkTools.Trial: 
  memory estimate:  13.31 MiB
  allocs estimate:  123635
  --------------
  minimum time:     93.224 ms (0.00% GC)
  median time:      102.820 ms (6.55% GC)
  mean time:        100.983 ms (4.83% GC)
  maximum time:     110.067 ms (12.34% GC)
  --------------
  samples:          50
  evals/sample:     1


BenchmarkTools.Trial: 
  memory estimate:  1.21 MiB
  allocs estimate:  6499
  --------------
  minimum time:     939.406 μs (0.00% GC)
  median time:      1.068 ms (0.00% GC)
  mean time:        1.208 ms (7.40% GC)
  maximum time:     4.706 ms (52.20% GC)
  --------------
  samples:          4123
  evals/sample:     1

I don't know what to say about that. It could probably use some optimizations. Or the "full broadcast" integrators would probably do well, since this is using the indexing versions right now.

The major change that could have caused this is that the "non-user-facing cache variables" now also match the type that the user gives. Before, they were transformed into contiguous arrays since they were not shown to the user. That had some weird side effects, though; for example, it slows down "add_daughter" types of events. So it's somewhat a wash... broadcast will likely be the savior here.

@zahachtah

Do you anticipate that 0.7 or 1.0 will improve the 2-3x slower performance?

@ChrisRackauckas (Member, Author)

It'll fix it. The detail is that linear indexing is slow because it has to do a binary search, but the built-in broadcast is fast. A v0.6 bug prevented broadcast in most ODE/SDE methods: SciML/OrdinaryDiffEq.jl#106

However, this bug was fixed in Julia v0.7: JuliaLang/julia#22255 . So in the v0.7 upgrade we will be making all of those algorithms internally use broadcasting, which will get rid of the slow indexing that MultiScaleArrays is hitting (it also has other nice side effects: for example, all of the RK methods will be GPU-compatible!).
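The indexing cost can be sketched with a toy nested container (hypothetical names and types; MultiScaleArrays.jl's real hierarchy is richer): linear indexing must first search for the node that owns an element, while a structure-aware traversal walks the tree directly.

```julia
# Toy one-level hierarchy standing in for a MultiScaleArray
# (hypothetical names; MultiScaleArrays.jl's real types differ).
struct ToyModel
    nodes::Vector{Vector{Float64}}
    ends::Vector{Int}                 # cumulative node lengths
end
ToyModel(nodes) = ToyModel(nodes, cumsum(length.(nodes)))

# Linear indexing must first locate the owning node -- a binary search:
function linindex(m::ToyModel, i::Int)
    node = searchsortedfirst(m.ends, i)
    offset = node == 1 ? 0 : m.ends[node - 1]
    return m.nodes[node][i - offset]
end

# A structure-aware in-place map walks the tree directly, with no
# per-element search -- this is what broadcast-based internals exploit:
structmap!(f, m::ToyModel) = (foreach(v -> map!(f, v, v), m.nodes); m)
```

Here every `linindex` call pays for a `searchsortedfirst`, which is the per-element cost that broadcasting avoids.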

With ArrayPartitions from RecursiveArrayTools.jl we found that grouping small broadcasts can actually be faster than contiguous loops because of smart cache handling, so there is a chance this can do extremely well. One issue this library will still have is that copying a MultiScaleArray is more expensive than copying a standard Array, so if you have a lot of saving going on, that will be more expensive. But all together, I wouldn't be surprised if MultiScaleArrays ends up a small cost (1.3x?) or just a wash (<1.1x performance loss). The 2x-3x shouldn't exist after these changes. I am very very very happy that compiler change happened in Base.
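The copying cost mentioned above can be illustrated with plain nested arrays (a toy stand-in, not the actual MultiScaleArray layout): a nested state needs one allocation per node when saved, while a flat contiguous state copies with a single allocation.

```julia
# Copying a nested state allocates once per inner node...
nested = [[1.0, 2.0], [3.0, 4.0, 5.0]]
nested_copy = [copy(v) for v in nested]   # one allocation per inner array

# ...while a flat, contiguous state copies with a single allocation:
flat = [1.0, 2.0, 3.0, 4.0, 5.0]
flat_copy = copy(flat)

# Either way the saved copy must be independent of the live state:
nested_copy[1][1] = 99.0                  # does not touch `nested`
```

With per-step saving, the per-node allocations are paid at every accepted step, which is why saving is where the nested layout costs the most.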

@zahachtah

Oh, that's very cool! I may at times have very large arrays (10,000+ elements); do you think threading the ODE solve will be efficient in 0.7 as well?

@ChrisRackauckas (Member, Author)

Yes, for arrays of that size it would be a good idea. I want to create a package which makes broadcast multithreaded via a wrapper. That's pretty simple on v0.7, so I was waiting on that as the solution.
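A minimal sketch of such a wrapper, using only Base threading on a flat array (hypothetical function name; the proposed package would hook this into broadcast itself):

```julia
# Minimal multithreaded elementwise map, standing in for the proposed
# broadcast wrapper (a sketch, not a real package API).
function tmap!(f, dest::AbstractVector, src::AbstractVector)
    Threads.@threads for i in eachindex(dest, src)
        @inbounds dest[i] = f(src[i])
    end
    return dest
end

u = collect(1.0:5.0)
du = similar(u)
tmap!(x -> 2x, du, u)   # du is now [2.0, 4.0, 6.0, 8.0, 10.0]
```

Threading only pays off when each chunk has enough work to amortize task startup, which is why it targets the 10,000+ element case rather than the small leaf nodes above.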

@zahachtah commented Jul 17, 2018 via email

@ChrisRackauckas (Member, Author)

Or just have tuple types for holding the nodes: #26 . TypeSortedCollections.jl will work similarly, and ArrayPartition works as well. These all require indexing with literals or broadcasting to get type-stable operations, though.
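The tuple idea can be sketched as follows (a toy illustration, not the actual design in #26): heterogeneous node types held in a tuple stay type-stable as long as they are accessed with literal indices or mapped/broadcast over.

```julia
# Heterogeneous node types held in a tuple (toy illustration):
nodes = ([1.0, 2.0], Int16[3, 4, 5])

# A literal index is type-stable: the compiler knows nodes[1] is a
# Vector{Float64} and nodes[2] is a Vector{Int16}.
first_node = nodes[1]

# `map` over a tuple unrolls per element, so it is type-stable too:
doubled = map(n -> n .* 2, nodes)

# In contrast, nodes[i] with a runtime `i` infers to a Union type,
# which is why literal indexing or broadcasting is required.
```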
