Don't broadcast out-of-place #235

ChrisRackauckas · 2018-01-05T13:42:19Z

It seems that broadcast is harmful to most uses of the out-of-place format. It drastically slows down things like static arrays and ArrayFire arrays that are made to be used out-of-place. Since we should prioritize these for this path, it makes more sense for this path to be non-broadcasting. The reason we put broadcasting there in the first place was for fusion, but fusion is only required if one thinks that temporary allocations are bad for their type, which is opposite of the assumption that the out-of-place branch should be making. If one needs broadcasting, they should be using the route for mutables anyways.

This reverts some hard work by @devmotion, we probably should've checked more into this first. I'm sorry about that but still appreciate your hard work. But the net result is from

779.570 μs (807 allocations: 28.69 KiB)

to

19.320 μs (487 allocations: 19.94 KiB)

so it's huge for

using StaticArrays, OrdinaryDiffEq
using BenchmarkTools

α=1
β=1
f(t,u) = α*u
g(t,u) = β*u
dt = 1//2^(4)
tspan = (0.0,1.0)
u0 = @SVector [2.0,3.0]
prob = ODEProblem(f,u0,(0.0,1.0))

and puts us on similar grounds with BridgeDiffEq.jl which was made specifically for fixed timestep static arrays, so this is the "true optimization" which we were missing. (16.979 μs (698 allocations: 20.98 KiB) for Bridge. Most of the difference in the profiling is now just building the integrator type and other small setup bits which are a constant factor). It also evades #106 for out-of-place which will cleanup our code quite a bit. I'll get to work on getting this package-wide changed.

The text was updated successfully, but these errors were encountered:

ChrisRackauckas · 2018-01-05T14:14:57Z

@rveltz this should give similar speedups on ArrayFire arrays. Do you have any test equations you'd like us to run?

ChrisRackauckas · 2018-01-05T14:19:42Z

BTW @Datseris, this makes static array paths finally faster.

f2(t,u,du) = (du.=α*u)
g2(t,u,du) = (du.=β*u)
u0 = [2.0,3.0]
prob = ODEProblem(f2,u0,(0.0,1.0))
@btime sol1 = solve(prob,BS3(),dt=dt,adaptive=false)

gives

24.005 μs (281 allocations: 20.55 KiB)

Datseris · 2018-01-05T14:58:16Z

Let me know when this is operational in a stable version and then I will benchmark with "real" multidimensional systems. The difference of 5 μs is not big enough at the moment to make it compelling, plus the in-place version is always scalable: the performance will be "nice" regardless of dimensionallity. Can the same be said about static arrays?

Normally you would use static arrays and expect to get orders of magnitude better performance for small dimensionality. Here it is not even close to a single order of magnitude and the dimension is a mere 2, the lowest possible.

P.S.: I was always thinking that there is so much crazy amount of work put into supporting both in-place and out of place versions, but never really understood why,... Why even bother to support the out-of-place?

ChrisRackauckas · 2018-01-05T15:07:02Z

the performance will be "nice" regardless of dimensionallity. Can the same be said about static arrays?

No, static arrays will not scale. That's their limitation.

Normally you would use static arrays and expect to get orders of magnitude better performance for small dimensionality.

orders? No. Maybe an order of magnitude, but if that's true then either the mutable code could be faster or it's using a static array fast path. Static array fast paths, like the fast QR, is so nice when you get it though!

Why even bother to support the out-of-place?

Well, it's really not that much work since it's already written. Probably <5% of my time is spent working on them. Also, they are easier to get working, so when I am developing a new algorithm I always write the out of place version first, so why not keep it? They are more efficient when applicable, and they can be easier for users to do some things efficiently. For example, you don't have to use reinit with out-of-place because there's essentially no cache variables. It also makes weird stuff like resizing easier if it doesn't need to be applied to caches. So it's a nice touch that users who are interested in small systems can use to get maybe and extra 2x-3x out in the best of cases.

Datseris · 2018-01-05T15:09:51Z

Fair enough.

I do have a comment about the speedup. You can indeed get more than one order in the case of 2 and 3 dimensions, this is what I realized when working with discrete systems. Of course at my use case that was because of the absurdly fast QR method native for 2 and 3D static matrices, which made my function crazy fast.

But its probably fair to say that expecting more than one order in a differential equation setting is asking too much.

EDIT: Let me know though when this is in a stable release, and I will test it just so that I can help the ecosystem.

ChrisRackauckas · 2018-01-05T15:13:39Z

But its probably fair to say that expecting more than one order in a differential equation setting is asking too much.

Yes, for a non-stiff diffeq solver it's literally the difference of

@. tmp = a1*k1 + k2*k2

with vectors vs

tmp = a1*k1 + a2*k2

with static arrays, and both are quite optimized in these cases. It's the "harder parts", which would only show up in stiff solvers (because of linear solvers, that difference in the QR is huge!) or in the user's f, which would make a difference. If your f is much faster with static arrays, you'll see it with the diffeq code as well. The tests I posted had a trivial f though.

ChrisRackauckas · 2018-01-05T16:32:40Z

Done here and StochasticDiffEq.jl. Wasn't bad at all 👍

ChrisRackauckas changed the title ~~Dpm~~ Don't broadcast out-of-place Jan 5, 2018

ChrisRackauckas closed this as completed Jan 5, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't broadcast out-of-place #235

Don't broadcast out-of-place #235

ChrisRackauckas commented Jan 5, 2018 •

edited

Loading

ChrisRackauckas commented Jan 5, 2018

ChrisRackauckas commented Jan 5, 2018

Datseris commented Jan 5, 2018

ChrisRackauckas commented Jan 5, 2018

Datseris commented Jan 5, 2018 •

edited

Loading

ChrisRackauckas commented Jan 5, 2018 •

edited

Loading

ChrisRackauckas commented Jan 5, 2018

Don't broadcast out-of-place #235

Don't broadcast out-of-place #235

Comments

ChrisRackauckas commented Jan 5, 2018 • edited Loading

ChrisRackauckas commented Jan 5, 2018

ChrisRackauckas commented Jan 5, 2018

Datseris commented Jan 5, 2018

ChrisRackauckas commented Jan 5, 2018

Datseris commented Jan 5, 2018 • edited Loading

ChrisRackauckas commented Jan 5, 2018 • edited Loading

ChrisRackauckas commented Jan 5, 2018

ChrisRackauckas commented Jan 5, 2018 •

edited

Loading

Datseris commented Jan 5, 2018 •

edited

Loading

ChrisRackauckas commented Jan 5, 2018 •

edited

Loading