Adding many-dimensional arrays is slow #20163
The core problem here is that Julia's type-inference machinery basically punts when functions have more than 15 arguments, due to this constant. There have been attempts to set it much higher (#15080), but that causes problems elsewhere. Your linear-indexing trick should work, though (EDIT: for regular arrays, not for ReshapedArrays; I didn't catch that part initially). You can manually check that they are of compatible size with a line like `@boundscheck indices(x) == indices(y) || throw(DimensionMismatch("x and y must have the same indices, got $(indices(x)) and $(indices(y))"))`. You should also return …
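A minimal sketch of the pattern described above, assuming a plain `Array` destination (`indices` is the 0.5/0.6-era name; on Julia 1.x it is `axes`):

```julia
function addlinear!(dest, x, y)
    # Hypothetical helper: check shapes once, up front, rather than per element.
    @boundscheck indices(dest) == indices(x) == indices(y) ||
        throw(DimensionMismatch("dest, x and y must have the same indices"))
    # Linear indexing keeps the hot loop free of N-dimensional index tuples.
    @inbounds @simd for i in 1:length(x)
        dest[i] = x[i] + y[i]
    end
    return dest
end
```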
I see. Interesting! But, as a workaround, couldn't the add function for Arrays just treat the fast-linear-indexing case specially, without involving any functions with many arguments? Does that slow the low-dim cases down too much? Anyway, I tried your suggestions and found something interesting:

```julia
function myplus1(x, y)
    @boundscheck length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length, got $(length(x)) and $(length(y))"))
    res = zeros(promote_type(eltype(x), eltype(y)), size(x))
    @inbounds @simd for j in 1:length(x)
        res[j] = x[j] + y[j]
    end
    res
end

function myplus2(x, y)
    @boundscheck length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length, got $(length(x)) and $(length(y))"))
    res = zeros(promote_type(eltype(x), eltype(y)), length(x))
    @inbounds @simd for j in 1:length(x)
        res[j] = x[j] + y[j]
    end
    reshape(res, size(x))
end
```

```julia
@time a + a                 # 0.000511 seconds (6 allocations: 2.000 MB)
@time res = myplus1(ar,ar)  # 0.035502 seconds (1.05 M allocations: 21.988 MB, 6.36% gc time)
@time res = myplus1(a,a)    # 0.000647 seconds (6 allocations: 2.000 MB)
@time res = myplus2(ar,ar)  # 0.000676 seconds (18 allocations: 2.002 MB)
@time res = myplus2(a,a)    # 0.000647 seconds (6 allocations: 2.000 MB)
```

These numbers are about the lowest I could get on this machine by repeating a number of times. So indeed, linear indexing does seem to fix things, but not for the destination array! If `res` has many dimensions I see the same behaviour as I did in my original report. If it is instead a vector, which I reshape afterwards, I see similar performance to the 1-dimensional case.

PS: Julia 0.5.0 is the same. Also, I deliberately relaxed the bounds checking to check the linear sizes only.
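For reference, `a` and `ar` are never defined in the quoted snippet; a plausible setup matching the 2 MB figures above (my assumption, not taken from the original post) is:

```julia
# Hypothetical setup: an 18-dimensional Array with 2^18 Float64 elements (2 MB),
# plus a genuine Base.ReshapedArray of the same shape. reshape on a plain Array
# just returns another Array, so the ReshapedArray is forced by reshaping a range.
dims = ntuple(_ -> 2, 18)
a  = rand(dims...)                 # Array{Float64,18}
ar = reshape(1.0:2.0^18, dims)     # Base.ReshapedArray wrapping a range
typeof(ar) <: Base.ReshapedArray   # true
```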
Yes, that's another symptom of the same problem. The issue is that inference gives up trying to figure out what type the `zeros(..., size(x))` call will return, so `res` is not known to be a concrete array type and the fill loop becomes slow.
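To make that concrete, a small illustration (mine, not from the thread): the dimensionality of the result of `zeros` is encoded in the length of the dims argument, which is exactly the kind of information inference stops tracking precisely for long tuples on 0.5/0.6.

```julia
# With a short dims tuple the result type is known exactly:
zeros(Float64, (2, 2, 2))            # Array{Float64,3}

# With an 18-element dims tuple (as produced by size(x) in myplus1), inference
# may no longer pin down Array{Float64,18}, leaving `res` non-concrete:
zeros(Float64, ntuple(_ -> 2, 18))

# myplus2 sidesteps this by allocating a vector (the length is just an Int)
# and reshaping once at the end, outside the hot loop:
zeros(Float64, 2^18)                 # Vector{Float64}
```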
Oh dear. Yes, I see. Well, I guess I know what to watch out for now! Is there an obvious reason why the limit can't simply be raised?
Definitely 😄. The main reason is that there are many places where longer tuples make inference and compile times blow up, which is the trouble the attempt in #15080 ran into.

You may be winning the prize for the highest array dimension yet reported in what I'm assuming is "real code." (Congratulations!) So you're challenging the system in new and exciting ways.
😄 It is indeed real code! The task is sparse diagonalization of a quantum spin Hamiltonian: each array dimension corresponds to one spin-1/2 site (hence size 2), so an 18-site system gives 18-dimensional arrays.
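For readers unfamiliar with this use case, a sketch of how such arrays arise (an illustration with assumed names, not code from the thread): the state of `n` spin-1/2 sites has `2^n` amplitudes, and it is often convenient to view that vector as an `n`-dimensional array with one axis per site.

```julia
n = 18
psi  = randn(2^n)                        # state vector with 2^18 amplitudes
psit = reshape(psi, ntuple(_ -> 2, n))   # one size-2 dimension per spin
size(psit)                               # (2, 2, ..., 2): an 18-tuple
```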
Interesting. Addressing this very generally is not a simple problem, but there is conceivably a way forward. Very briefly, the engine that underlies much of Julia's array handling is the processing of tuples---the compiler understands a lot about tuples, so there is really quite an impressive amount of "computation" that can be done at compile time (rather than run time) if you implement operations in terms of tuples. There are a couple of different "dialects" of tuple processing; for example, to add all the elements in a tuple you could have

```julia
foo1(t) = _foo1(0, t...)
_foo1(output, a, b...) = _foo1(output + a, b...)
_foo1(output) = output
```

versus

```julia
foo2(t) = _foo2(0, t)
_foo2(output, a) = _foo2(output + a[1], Base.tail(a))
_foo2(output, ::Tuple{}) = output
```

The first blows up the number of arguments, but the second (in principle) does not. The reason I say "in principle" is that it relies on inference being willing to follow the recursion through `Base.tail`. The rub is that the same tuple-length limits come into play there, so very long tuples can still defeat it.
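A quick usage note (my addition): both dialects compute the same result, and for short tuples both are effectively unrolled at compile time.

```julia
t = (1, 2, 3, 4)
foo1(t)   # 10, via the splatting/varargs dialect
foo2(t)   # 10, via the Base.tail recursion dialect
```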
@amilsted can't you just reshape the arrays locally to vectors before adding them?
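A minimal sketch of that workaround (my illustration; `x` and `y` are assumed to be same-shaped arrays): flatten to vectors, add, and reshape the result back.

```julia
# Flatten, add (hitting the fast one-dimensional method), then restore the shape.
# For plain Arrays, vec/reshape share the underlying data, so this is cheap.
addflat(x, y) = reshape(vec(x) + vec(y), size(x))
```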
Hi @tknopp. Yes, that would seem to be one work-around strategy. It's not an insurmountable problem by any means; it just caught me by surprise. I think I remember reading about the large-tuples limitation back when I tried Julia for the first time, but I had not noticed that this was the problem here, perhaps because I rarely display or type out the size tuples of these arrays. Is getting Julia to warn about crossing the tuple-length threshold an option?
You are pushing the limits of Julia's type system, so the best thing seems to be making oneself familiar with those limits and introducing simple workarounds. Regarding the warning, there is the `@code_warntype` macro, which at least shows you where inference has given up.
Yes, I should use `@code_warntype` more often.
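For anyone following along, a hedged sketch of that kind of check, assuming the `myplus1`/`myplus2` functions and the `a`/`ar` arrays from earlier in the thread:

```julia
# In the timings above the slow case was myplus1(ar, ar); @code_warntype on that
# call should flag `res` (and everything computed from it) as a non-concrete
# type, while myplus2(ar, ar) comes out cleanly inferred.
@code_warntype myplus1(ar, ar)
@code_warntype myplus2(ar, ar)
```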
@timholy, if I remember correctly you indeed increased this limit at some point (the attempt in #15080), and ran into the problems mentioned above. Currently it's 15 …
Very insightful, @Jutho, to come up with a principled way to set these parameters. That seems very compelling to me. I'll have to defer to @vtjnash, @JeffBezanson, or others to say what's feasible.
Also, could it be set to a different value for REPL use, to make it more responsive? That would likely mitigate some of the compilation-time problems.
Is this still slow on v0.7? What about #22370 (comment)?
Thanks for the reference, @andyferris; that is indeed amazing. It will be a long time before we can run exact diagonalization of a quantum spin system with 9001 spins :-).
The current (1.8.3) situation seems better.
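A hedged sketch of how one might re-check this on a recent Julia (the names and the use of BenchmarkTools are my assumptions):

```julia
using BenchmarkTools

dims = ntuple(_ -> 2, 18)
a  = rand(dims...)                 # Array{Float64,18}
ar = reshape(1.0:2.0^18, dims)     # Base.ReshapedArray wrapping a range

@btime $a + $a         # plain Array addition
@btime $a .+ $a        # broadcasting over plain Arrays
@btime $ar .+ $ar      # broadcasting over ReshapedArrays
```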
Hi,

I often want to add together two arrays with N small dimensions, where e.g. N=18 and each dimension has size 2. Initially, I was unknowingly adding ReshapedArray objects with these dimensions, which is very slow, at least when done naively using `+`. I see a similar slowdown with just Arrays if I try to use broadcasting. Could these just be extreme cases of the following simple case? I thought it might be due to linear indexing not being used, but my own attempt to use it was not much better, and obviously skips some safety checks.
This is with

```
Julia Version 0.6.0-dev.1230
Commit ce6a0c3 (2016-11-10 21:09 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, sandybridge)
```
Julia 0.5.0 performs similarly, except that `a .+ a` and `a + a` switch speeds, with `a .+ a` being faster (by hundreds of ns) on 0.5.0.