WIP: statistical allocation profiling #31915
Conversation
Julia has a mature statistical profiler. It sets a timer that captures a backtrace when it is triggered. By the law of large numbers, this gives insight into where an algorithm spends its time, without noticeably slowing the program down. By comparison, finding out where the allocations are happening is quite a bit more cumbersome: it requires starting Julia with a specific command line switch, code execution is _much_ slower, and after program exit, the results are scattered over the file system.

This pull request is an attempt at bringing the ergonomics of statistical _runtime_ profiling to allocations: "statistical allocation profiling". Similar to how, in the former case, `Profile.init` configures a delay between backtraces, this branch adds an option to specify the fraction of allocations that capture a backtrace.

Example usage:

```julia
using Profile
Profile.init(alloc_rate = 0.01)

doublefibonacci(n) = if n <= 2
    return [1, 1]
else
    return doublefibonacci(n - 1) .+ doublefibonacci(n - 2)
end

@profile for i = 1:1000; doublefibonacci(15); end

Profile.print() # but better to use e.g. ProfileView or StatProfilerHTML
```

State of this commit:

- Linux support only
- not thread-safe
- no attempt at a friendly human interface; as it is, the `Profile.init` API almost encourages a linear combination of runtime and allocation profiling. That makes no sense at all.

I'm sending this as a WIP early so I can get feedback before investing time in productionizing this. What do you think?
See also #31534 (I haven't yet looked into either much to compare).
@vtjnash thanks for the reference! I wasn't aware of that one. From skimming the other PR, it looks like the main differences are:
Why is this not just a display feature? The profile already contains backtraces that include the allocation function. The only job should be to find those functions in the backtraces; it should not involve changing allocation code.
Because that's scaled by time spent, not by number of allocations. It's the latter that is the objective of this PR.
This was an oversight in the previous commit.
Interesting. I like the tunable runtime overhead. While the number of allocations is probably what I'd use this for most, sometimes one might want more information about the size of allocations. Using this approach, could one indirectly get that via an option to trigger every

One option worth considering is to collaborate with @staticfloat to finish #31534, and perhaps integrate the tunable runtime overhead of this approach.
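The size-weighted variant hinted at above could look roughly like the following sketch. This is not the PR's code; the names, the threshold, and the `record_backtrace` hook are illustrative assumptions. The idea is to take a backtrace whenever a running byte counter crosses the next threshold, so samples are weighted by bytes allocated rather than by allocation count.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only: names and constants are assumptions, not the PR's API.
 * Instead of sampling every Nth allocation, accumulate allocation sizes and
 * record a backtrace whenever the running byte count crosses the next threshold. */
static uint64_t bytes_allocated = 0;
static uint64_t byte_sample_interval = 1 << 20;   /* e.g. one sample per MiB */
static uint64_t next_byte_sample = 1 << 20;

static void record_backtrace(void) { /* stand-in for the real profiler hook */ }

static inline void count_alloc_bytes(size_t sz)
{
    bytes_allocated += sz;
    if (bytes_allocated >= next_byte_sample) {
        next_byte_sample = bytes_allocated + byte_sample_interval;
        record_backtrace();
    }
}
```

Each sample then stands for roughly `byte_sample_interval` bytes, so a few large allocations show up as readily as many small ones.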
```
@@ -1108,6 +1116,8 @@ JL_DLLEXPORT jl_value_t *jl_gc_pool_alloc(jl_ptls_t ptls, int pool_offset,
        jl_gc_safepoint_(ptls);
    }
    gc_num.poolalloc++;
    if(gc_statprofile_sample_rate && rand() < gc_statprofile_sample_rate)
```
Following @timholy's comment on adjustable overhead: this implementation calls the RNG on every alloc. Hence, even if the sample rate is close to zero, the overhead does not converge to zero.

An alternative would be something like `if (gc_num.poolalloc++ == gc_num.next_pool_sample) { gc_num.next_pool_sample += gc_statprofile_pool_inverse_rate; jl_profile_record_trace(NULL); }`. With `gc_num.next_pool_sample = 0`, this would trigger on the next wrap-around, i.e. never, and with `gc_statprofile_pool_inverse_rate` large it would trigger very rarely. We would pay only a single predicted branch on allocs we don't want to sample.

Similar treatment could be applied to the `gc_num.bigalloc`, `gc_num.allocd`, etc. counters. We should probably randomize the increment in order to avoid biases in loops whose period is close to commensurable with the inverse rate. While a Poisson distribution of the gaps (as your code provides) is statistically nicer, something like `1 + ((inverse_rate * rand_uint16()) >> 15)` is probably good enough.
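Spelled out as a self-contained sketch (the counter names and the `record_backtrace` stand-in are illustrative, not the identifiers the PR actually uses), the randomized-increment version of that scheme could look like this:

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch of the randomized-increment counter scheme described above.
 * Unsampled allocations pay only a single predicted branch; the RNG runs
 * only when a sample is actually taken. */
static uint64_t pool_alloc_count = 0;
static uint64_t next_pool_sample = 1;          /* 0 would mean "trigger on wrap-around", i.e. never */
static uint64_t pool_inverse_rate = 10000;     /* mean gap between samples, in allocations */

static void record_backtrace(void) { /* stand-in for jl_profile_record_trace(NULL) */ }

static inline void count_pool_alloc(void)
{
    if (++pool_alloc_count == next_pool_sample) {
        /* Randomize the gap (mean roughly pool_inverse_rate) so that loops whose
         * period is commensurable with the rate do not bias the samples.
         * rand() is guaranteed to provide at least 15 random bits. */
        uint64_t gap = 1 + ((pool_inverse_rate * (uint64_t)(rand() & 0x7fff)) >> 14);
        next_pool_sample = pool_alloc_count + gap;
        record_backtrace();
    }
}
```

To expose the same knob as `Profile.init(alloc_rate = ...)`, the inverse rate would presumably just be `round(1 / alloc_rate)`.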
That's a great point. I'll run some timings to see how RNG overhead compares to the allocation itself. If it's significant, I'll investigate the right scheme to use here. If not, there's probably value in keeping Poisson.
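For what it's worth, a rough standalone way to get such timings might look like the sketch below (an assumption on my part, not part of the PR). A small `malloc`/`free` pair is only a stand-in for the pool allocator's fast path, which is cheaper, so the relative cost of `rand()` inside `jl_gc_pool_alloc` will be higher than the ratio measured here.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Standalone microbenchmark sketch: per-call cost of rand() versus a small
 * malloc/free pair. Treat the malloc number as an upper bound on what the
 * pool allocator's fast path costs. */
static double elapsed_ns(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
}

int main(void)
{
    const int N = 10 * 1000 * 1000;
    struct timespec t0, t1;
    volatile uint64_t sink = 0;    /* keep the loops from being optimized away */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        sink += (uint64_t)rand();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("rand():          %.2f ns/call\n", elapsed_ns(t0, t1) / N);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++) {
        void *p = malloc(32);
        sink += (uintptr_t)p;
        free(p);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("malloc/free(32): %.2f ns/pair\n", elapsed_ns(t0, t1) / N);

    return (int)(sink & 1);
}
```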
@timholy thanks for the comments. I'll be glad to work together on combining these pull requests. @staticfloat what do you think?
@tkluck, thanks again for this. It was extremely useful in JuliaImages/ImageFiltering.jl#94 (comment); highly recommended for anyone else who wants to debug something similar. I am looking forward to whatever form this ends up taking!
Superseded by #42768? :)