-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scala 2 forwardport: -Yprofile-trace
#19897
Conversation
3eeaa3d
to
8c55116
Compare
@keynmol petition here |
YES YES YES PLEASE |
880e2a7
to
70523b7
Compare
@nicolasstucki I think you we're doing the original backport of |
-Yprofile-trace
-Yprofile-trace
16795d6
to
6eb6a23
Compare
cc05d6a
to
72d5e0e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a first quite-quick pass by just reading code. My only concern is the potential cost of the new *on
operations, but that might be insignificant. Otherwise looks good to me at first sight.
I now need to experiment with the option to be able to provide further feedback.
ctx.profiler.onCompletion(sym, associatedFile)(body) | ||
} | ||
|
||
override def complete(root: SymDenotation)(using Context): Unit = profileCompletion(root) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked that the associatedFile
callback above doesn't run when profiling is disabled ✅
But I am still not 100% convinced that is zero cost—it seems that calls to default implementations of beforeCompletion
and afterCompilation
will still exist after inlining? This would just be 2 method calls and a pair allocation, so maybe harmless in terms for performance?
Unfortunately, the benchmarks bot is currently down, so we can't run the benchmarks now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The allocations have been reduced by preallocating empty (default) outputs. This should reduce any overhead when -Yprofile-trace is disabled. The 2 additional calls to before/after completion might still be present, but maybe JVM can optimize these?
Do you maybe know if the benchmarks bot is working now? We could run it now, after the rebase, to ensure the overhead is not significant when profile-tracing is disabled.
def beforePhase(phase: Phase): ProfileSnap | ||
|
||
def afterPhase(phase: Phase, profileBefore: ProfileSnap): Unit | ||
inline def onPhase[T](phase: Phase)(inline body: T): T = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above for all on*
methods; we now allocate a pair on each call. I don't now if that's significant, but maybe we could just extract (TracedEventId.Empty, Profiler.emptySnap)
and so on to constants, in order to avoid the allocations?
|
||
import scala.collection.mutable | ||
|
||
object ChromeTrace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't read this class and FileUtils
in details. Should we have a few unit tests for these, or is it not worth it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've ported the FileUtilsTest from Scala 2 repo and added a ChromeTraceTest to check the outputs and structure of tracing files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I took so long to come back to this! I finally tried compiling "hello world" with tracing enabled this morning. Here is what I did:
scalac -Yprofile-enabled -Yprofile-trace trace.json hello.scala
I was then able to visualize its easily using https://ui.perfetto.dev/:
Is this the expected workflow? It seems to work well, even if Perfetto warned me that it would prefer a trace in protobuf format.
else | ||
val completionName= this.completionName(root, associatedFile) | ||
val event = TracedEventId(associatedFile.name) | ||
chromeTrace.traceDurationEventStart(Category.Completion.name, "↯", colour = "thread_state_sleeping") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct that this always generate 3 events with the same timing information for each completion? Is this to ease filtering by file and symbol? What was the rational for the ↯
events; do we really need them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, 3 events would be always generated. I'm not sure what the purpose of the ↯
events was. If I had to guess I'd say that it only exists to inform about the bounds of IO operations, which might make some sense. All other operations are CPU-bound. The 2 remaining symbols might be used to ease filtering, especially because the associated file might not always be present.
We could remove the ↯
and associated file events, but initially I wanted to make the traces compliant with the outputs of Scala 2.
chromeTrace.traceCounterEvent("jitCompilationTime", "jitCompilationTime", initialSnap.totalJITCompilationTime, processWide = true) | ||
chromeTrace.traceCounterEvent("userTime", "userTime", initialSnap.userTimeNanos, processWide = false) | ||
chromeTrace.traceCounterEvent("cpuTime", "cpuTime", initialSnap.cpuTimeNanos, processWide = false) | ||
chromeTrace.traceCounterEvent("idleTime", "idleTime", initialSnap.idleTimeNanos, processWide = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we record these metrics only in afterUnit
? This means that they are only available after the first unit processed during typer
, so we don't know how much memory was used for parsing or for typing for example. Couldn't we sample them also before/after each phase, and before each unit? Or would that be too expensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've refactored this part out. Now it's invoked in all 4 before/after phase/unit combinations. This should not impose a significant overhead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
72d5e0e
to
f078a0a
Compare
I needed to rebase the PR which is not good news for the reviewing. The improvements after the first review round start with 592a892 |
Adapt PresentationCompiler to always set (by default noop) profiler
…unId instead of 2
…ns and to match the store non-nullable signature
…-enabled. Allow to define dependencies in String/Phase settings (previously unused)
…by appending GC events after profiling is done.
…ems with scala2-library-bootstrapped compilation
dc9766c
to
718af3a
Compare
test performance please |
performance test scheduled: 1 job(s) in queue, 1 running. |
Performance test finished successfully: Visit https://dotty-bench.epfl.ch/19897/ to see the changes. Benchmarks is based on merging with main (3097a84) |
context.profiler
which could have beennull
, now it's initially a NoOp Profiler-Yprofile-enabled
dependent tasks, now we get an error if-Yprofile-trace
is set without-Yprofile-enabled