gc weirdness #8055
Comments
I have no insight to give here, but it would be interesting to see whether the behaviour goes away if you use #5227.
It is strange that your replacement only reduces the memory allocation by such a tiny fraction. The second code should not allocate anything in these particular lines, whereas the first one clearly does. What happens to this comparison if you comment out all the code corresponding to […]?
Also make sure that your loop is well typed (use code_typed). If it isn't, preallocating doesn't help much.
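For illustration only, a minimal sketch of the kind of check being suggested; the function below is made up, not code from this report:

```julia
# Hypothetical function with a simple loop; code_typed takes the function
# and a tuple of argument types and returns the typed AST.
function sum_loop(v::Vector{Float64})
    s = 0.0
    for k = 1:length(v)
        s += v[k]
    end
    return s
end

# Inspect the output for local variables inferred as Any or Union types,
# which indicate type instability inside the loop.
code_typed(sum_loop, (Vector{Float64},))
```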
Maybe I should have been clearer that the timing and memory consumption are for running the whole program, and that the tiny part I've shown is the locally most memory-consuming line. The reduction in bytes allocated matches the measurements with --track-allocation and amounts to a healthy 10% lower memory consumption for the whole program. Based on earlier optimizations I've made on that code I had expected total time and gc time to follow suit, but here they go in totally the opposite direction. I haven't really learnt how to read the output of code_typed, but the TypeCheck package didn't come up with any complaints.
I understood this, but it is correct that my remark regarding the "only 10%" is meaningless given that I don't know the rest of the code. Hence my suggestion to just run the code with everything corresponding to […] commented out and check the effect of these lines alone. This should be possible for a given initial x; you would just be running 100 iterations of filling f with the same data. Given that […] corresponds to 1000 more lines of code, could it be that these lines contain something that kills the type stability? For example, you could be using the variable name k at some point in your code to represent something different than an Int. This would kill the type stability for k (I think) and result in a lot of overhead for the loop variable of the second code, which is not present in the first code.
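The kind of reuse being described might look like the sketch below. This is a made-up illustration, not the reported code, and it relies on the Julia 0.3 scoping rules discussed here, in which a for-loop variable shares the enclosing function's scope:

```julia
# Under 0.3 scoping, `k` below is a single function-local variable: it is
# an Int while used as the loop index and a Float64 after the later
# assignment, so inference gives it a Union type and the loop pays for it.
function reuse_k(v::Vector{Float64})
    s = 0.0
    for k = 1:length(v)
        s += v[k]
    end
    k = s / 2          # same name, now bound to a Float64
    return k
end
```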
I'd agree with Keno that a type problem is the most likely explanation. TypeCheck is limited to functions with declared types, so there are a lot of loop variable type problems it misses. Glad you like --track-allocation.
A few updates: inspection of code_typed found a type problem elsewhere in the same function, but fixing that had no impact on this issue. Switching to 0.3.0-rc4 (still Windows 32 bit) made no difference to this behavior either. There is something platform specific to this, however: with 0.3.0-rc4 on Linux 64 bit the issue cannot be seen. There the reduction in bytes allocated is accompanied by a reduction in total time and in gc fraction.
Windows 64 bit also behaves well, so it seems to be a 32 bit issue. |
Linux 32 bit misbehaves too, so this definitely looks like a 32 bit issue. Since nobody else will have any chance to reproduce it, I'll close this issue. I can pick it up again if there's something specific to try that doesn't require too much effort.
This is going to be a terrible bug report and should possibly have gone on a mailing list, but the effect is interesting. Skip the background section if you want to get to the meat of the issue.
Background
The code in question is about 1000 lines, ported from Matlab. After fixing some early problems with type instability and globals, the speed went from 10 times slower than Matlab to somewhat faster. Further optimizations, mostly avoidance of temporary arrays and devectorization, brought it to 4 times faster than Matlab. Profiling and @time indicate that much time is still spent on garbage collection. Using --track-allocation from #7464 (great tool!) I identified the code line using the most memory, and rewriting it did in fact reduce the amount of memory used. However, this caused a 10% slowdown overall and substantially more time spent in gc. Unfortunately the code is entirely proprietary, and considering the global nature of gc I suspect that the problem will just go away if I try to minimize it. Thus this very vague description.
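For context, a minimal sketch of the measurement workflow referred to above; the script name, function, and numbers are placeholders, not the actual code:

```julia
# Run the script with per-line allocation tracking, e.g.
#   julia --track-allocation=user script.jl
# After the process exits, per-line allocation counts are written to a
# .mem file next to each source file.

function work(n)
    s = 0.0
    for k = 1:n
        s += sqrt(k)          # stand-in for the real computation
    end
    return s
end

work(10)                      # warm up so compilation is not measured
@time work(10^7)              # reports elapsed time, bytes allocated, gc time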
Issue
Before optimization (jit warmup taken care of):
In order to reduce memory consumption, the construction on that line, a vectorized expression that allocates temporaries, is changed to an explicit loop (with loop variable k) filling a preallocated array f in place; a hedged sketch of both variants follows the description of x below.
Here x is an instance of a mutable type where x.v is a Vector{Float64} and x.p is a Matrix{Int}, in this case of sizes 1436 and 818560×2 respectively. After this optimization the memory consumption does go down, but total time and the fraction of time spent in gc both increase substantially. This is reproducible with only small variations in timings and gc fractions.
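Since the actual lines are not shown, here is a hedged sketch of what the two variants could look like. The field names match the description of x above, but the indexing pattern, the function names, and the assumption that x.p holds indices into x.v are illustrative guesses, not the reported code:

```julia
# Julia 0.3-era definition; on current Julia this would be `mutable struct`.
type Data
    v::Vector{Float64}   # length 1436 in the report
    p::Matrix{Int}       # size 818560×2 in the report
end

# First variant: vectorized indexing.  Both the column slice x.p[:, 1]
# and the gather x.v[...] allocate fresh arrays on every call.
fill_vectorized(x::Data) = x.v[x.p[:, 1]]

# Second variant: f is allocated once by the caller and filled in place
# with an explicit loop, so these lines should not allocate at all.
function fill_devectorized!(f::Vector{Float64}, x::Data)
    for k = 1:size(x.p, 1)
        f[k] = x.v[x.p[k, 1]]
    end
    return f
end
```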
Any suggestions as to what is going on? Is it reasonable for the gc to behave in this way? Is there some effective way for me to dig deeper into the problem?