MultiScaleModels Profiling #9
v0.6 DiffEq got a lot faster while this stayed the same:

```julia
prob = ODEProblem(f,em,(0.0,1500.0))
@benchmark sol1 = solve(prob,Tsit5(),save_everystep=false)

prob = ODEProblem(f,em[:],(0.0,1500.0))
@benchmark sol2 = solve(prob,Tsit5(),save_everystep=false)
```
```
BenchmarkTools.Trial:
  memory estimate:  24.44 KiB
  allocs estimate:  323
  --------------
  minimum time:     393.737 μs (0.00% GC)
  median time:      432.086 μs (0.00% GC)
  mean time:        452.956 μs (0.88% GC)
  maximum time:     4.695 ms (89.11% GC)
  --------------
  samples:          10000
  evals/sample:     1

BenchmarkTools.Trial:
  memory estimate:  103.53 KiB
  allocs estimate:  1774
  --------------
  minimum time:     27.596 ms (0.00% GC)
  median time:      28.951 ms (0.00% GC)
  mean time:        28.998 ms (0.00% GC)
  maximum time:     32.709 ms (0.00% GC)
  --------------
  samples:          173
  evals/sample:     1
```

```julia
prob = ODEProblem(f,em,(0.0,1500.0))
@benchmark sol1 = solve(prob,Tsit5())

prob = ODEProblem(f,em[:],(0.0,1500.0))
@benchmark sol2 = solve(prob,Tsit5())
```
```
BenchmarkTools.Trial:
  memory estimate:  13.31 MiB
  allocs estimate:  123635
  --------------
  minimum time:     93.224 ms (0.00% GC)
  median time:      102.820 ms (6.55% GC)
  mean time:        100.983 ms (4.83% GC)
  maximum time:     110.067 ms (12.34% GC)
  --------------
  samples:          50
  evals/sample:     1

BenchmarkTools.Trial:
  memory estimate:  1.21 MiB
  allocs estimate:  6499
  --------------
  minimum time:     939.406 μs (0.00% GC)
  median time:      1.068 ms (0.00% GC)
  mean time:        1.208 ms (7.40% GC)
  maximum time:     4.706 ms (52.20% GC)
  --------------
  samples:          4123
  evals/sample:     1
```

I don't know what to say about that. It probably could use some optimizations. Or the "full broadcast" integrators would probably do well, since this is using the indexing versions right now. The major change that could have caused this is that the "non-user facing cache variables" now also match the type that the user gives. Before, they were transformed into contiguous arrays since they were not shown to the user. That had some weird side effects, though; for example, it slows down "add_daughter" types of events. So it's somewhat a wash... broadcast will likely be the savior here.
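To make that cache change concrete, here is a hedged sketch (not the actual OrdinaryDiffEq cache code; `u` stands for the user's state, e.g. the embryo):

```julia
# Hypothetical illustration of the two caching strategies; names are made up.

# Current behavior: internal caches mirror the user's type, so a MultiScaleArray
# state gives MultiScaleArray caches. Structure-changing events (adding a daughter
# cell) stay natural, but elementwise writes go through the slower hierarchical
# linear indexing.
cache_matching = similar(u)

# Previous behavior: caches were flattened into a contiguous Vector. Elementwise
# writes were fast, but the cache no longer mirrored the user's structure, which
# made "add_daughter"-style events awkward and slow.
cache_contiguous = Vector{eltype(u)}(undef, length(u))
```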
Do you anticipate 0.7 or 1.0 to improve the 2-3x slower performance?
It'll fix it. The detail is that linear indexing is slow because it has to do a binary search, but the built-in broadcast is fast. A v0.6 bug prevented broadcast in most ODE/SDE methods: SciML/OrdinaryDiffEq.jl#106. However, this bug was fixed in Julia v0.7: JuliaLang/julia#22255. So in the v0.7 upgrade we will be making all of those algorithms internally use broadcasting, which will get rid of the slow indexing that MultiScaleArrays is hitting (and it also has other nice side effects; for example, all of the RK methods will be GPU-compatible!). With ArrayPartitions from RecursiveArrayTools.jl we found that grouping small broadcasts can actually be faster than contiguous loops because of smart cache handling, so there is a chance this can do extremely well.

One issue this library will still have is that copying a MultiScaleArray is more expensive than copying a standard Array, so if you have a lot of saving going on that will be more expensive. But all together, I wouldn't be surprised if MultiScaleArrays ends up as a small cost (1.3x?) or just a wash (<1.1x performance loss). The 2x-3x shouldn't exist after these changes. I am very very very happy that compiler change happened in Base.
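A hedged sketch of the difference being described (hypothetical names; `u`, `du`, and `tmp` stand for MultiScaleArray-style states inside an integrator step):

```julia
# Hypothetical sketch of an integrator's inner update; not code from OrdinaryDiffEq.

# Indexing version: each u[i] on a MultiScaleArray has to search down the hierarchy
# to find the right leaf, so this loop is much slower than the same loop on a Vector.
function update_indexed!(tmp, u, du, dt)
    @inbounds for i in eachindex(u)
        tmp[i] = u[i] + dt * du[i]
    end
    return tmp
end

# Broadcast version: the broadcast recurses into the leaves and runs contiguous
# inner loops on each one, avoiding the per-element search entirely.
function update_broadcast!(tmp, u, du, dt)
    @. tmp = u + dt * du
    return tmp
end
```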
Oh, that's very cool! I may at times have very large arrays (10000+); do you think threading the ODE will be efficient too in 0.7?
Yes, for arrays of that size it would be a good idea. I want to create a package which makes broadcast multithreaded via a wrapper. That's pretty simple on v0.7, so I was waiting on that as the solution.
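A full wrapper type that hooks into broadcasting is more involved, but the core idea can be sketched with a plain helper (hypothetical name, not an existing package API): split the elementwise work across threads.

```julia
# Hypothetical helper illustrating the idea behind multithreaded broadcasting:
# apply an elementwise function `g` from `src` into `dest`, splitting the index
# range over the available threads (start Julia with JULIA_NUM_THREADS > 1).
function tmap!(g, dest::AbstractVector, src::AbstractVector)
    Threads.@threads for i in eachindex(dest, src)
        @inbounds dest[i] = g(src[i])
    end
    return dest
end

# Usage sketch: a 10000+ element state where each element's update is independent.
u  = rand(10_000)
du = similar(u)
tmap!(x -> 0.5 * x, du, u)
```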
Hi again. I read this thread (https://discourse.julialang.org/t/agent-based-modeling-in-julia/12431/9) and realized it's relevant to my case. I have different types as leaves in MultiScaleArrays. I haven't benchmarked yet, but I guess I would have issues with mixed types? Would https://github.com/tkoolen/TypeSortedCollections.jl solve that?
Or just have tuple types for holding the nodes: #26. TypeSortedCollections.jl will work similarly. ArrayPartition works as well. These all require indexing with literals or broadcasting to get type-stable operations, though.
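For example, with ArrayPartition from RecursiveArrayTools.jl (a sketch with made-up leaf values), broadcasting and literal partition access are type-stable, while generic linear indexing across mixed element types is not:

```julia
using RecursiveArrayTools  # provides ArrayPartition

# Two "leaves" with different element types, e.g. Float64 states and Int counters.
x = ArrayPartition([1.0, 2.0, 3.0], [1, 2])

# Broadcasting recurses into each partition, so it stays type-stable even though
# the element types differ across the partitions.
y = x .+ 1

# Accessing a partition with a literal index is also type-stable:
first_leaf = x.x[1]   # ::Vector{Float64}

# Generic linear indexing works, but x[i] may return a Float64 or an Int depending
# on i, so code written this way is type-unstable and slower.
v = x[4]              # == 1, the first element of the second partition
```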
I optimized the ODE solvers around these guys a bit. As a test case, I took a linear system on the embryo from the tests. The embryo has 4 layers and a total length of 13, so it uses very small leaf nodes. This means that, with a cheap problem and a deep hierarchy, it should accentuate any differences and give an upper bound on the cost of using a MultiScaleModel.
I then benchmarked the same problem solved with MultiScaleModels against solving it with a regular length-13 contiguous array:
to get the results:
This puts an upper bound for the cost of using MultiScaleModels at just over 2x. But this is without saving: saving the timeseries is more costly with an MMM. Measuring the cost with saving every step:
This puts an upper bound around 3x. So the maximal cost of the abstraction is about 2x-3x for ODEs. It's actually lower for SDEs and DDEs because more of the calculation in those domains is spent on the interpolation and noise generation, which are mostly able to avoid the costs of MMMs. So the final product is something where the abstraction cost is less than the performance difference between OrdinaryDiffEq and other packages, meaning using MMMs in OrdinaryDiffEq should still be slightly faster than using other packages with contiguous arrays. I think this counts as well optimized, so my goal will now be to make this all compatible with the stiff equation solvers, since that will make a large impact.
@zahachtah I think you might be interested in these results.
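For reference, a minimal sketch of the kind of comparison being described, reusing the `f`/`em` setup from the snippets earlier in this thread (assumes they are defined as in the package tests):

```julia
using OrdinaryDiffEq, BenchmarkTools

# `f` is the linear right-hand side and `em` is the embryo MultiScaleModel from the
# tests; `em[:]` flattens the same values into a contiguous Vector.
prob_mmm  = ODEProblem(f, em,    (0.0, 1500.0))
prob_flat = ODEProblem(f, em[:], (0.0, 1500.0))

# Upper bound on the abstraction cost without saving the timeseries:
@benchmark solve($prob_mmm,  Tsit5(), save_everystep=false)
@benchmark solve($prob_flat, Tsit5(), save_everystep=false)

# And with saving every step, where copying the MultiScaleModel adds extra cost:
@benchmark solve($prob_mmm,  Tsit5())
@benchmark solve($prob_flat, Tsit5())
```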