Releases: discoveryjs/cpupro
0.5.1
- Added transformation from `parent` to `children` for call tree nodes in `.cpuprofile` files if needed (#5)
- Implemented exclusion of ending no-sample time. For certain profiles, the time from the last sample until `endTime` can be significant, indicating the conclusion of the profiling session or adjustments from excluding idle samples at the end. This time is now excluded from the `Profiling time`, which is used for computing time percentages
- Fixed double rendering of the page after the profile data is loaded
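The first two changes above can be sketched as follows. This is a minimal illustration, not CPUpro's code: it assumes the usual `.cpuprofile` shape (`nodes` with `id` and either `children` or `parent`, plus `startTime`, `endTime` and `timeDeltas`, where the first delta is relative to `startTime`), and the helpers `ensureChildren` and `profilingTime` are hypothetical names.

```javascript
// Hypothetical helpers illustrating the two 0.5.1 processing changes; not CPUpro's code.

// Ensure every node has a `children` array of child ids.
function ensureChildren(nodes) {
  if (nodes.some((node) => Array.isArray(node.children))) {
    // Tree is already described via `children`; leaf nodes may omit the field
    for (const node of nodes) {
      if (!Array.isArray(node.children)) node.children = [];
    }
    return nodes;
  }

  // Otherwise rebuild `children` from each node's `parent` id
  const byId = new Map();
  for (const node of nodes) {
    node.children = [];
    byId.set(node.id, node);
  }
  for (const node of nodes) {
    const parentNode = node.parent === undefined ? undefined : byId.get(node.parent);
    if (parentNode !== undefined) parentNode.children.push(node.id);
  }
  return nodes;
}

// Profiling time measured up to the last sample rather than up to `endTime`.
// Assumption: timeDeltas[0] is relative to startTime, each next delta to the previous sample.
function profilingTime(profile) {
  let lastSampleTime = profile.startTime;
  for (const delta of profile.timeDeltas) lastSampleTime += delta;
  // Clamp so the result never exceeds the recorded session length
  return Math.min(lastSampleTime, profile.endTime) - profile.startTime;
}

const profile = {
  nodes: [{ id: 1 }, { id: 2, parent: 1 }, { id: 3, parent: 1 }],
  startTime: 0,
  endTime: 1000,
  samples: [2, 3, 2],
  timeDeltas: [100, 200, 100],
};

ensureChildren(profile.nodes);
console.log(profile.nodes[0].children); // node 1 now lists children 2 and 3
console.log(profilingTime(profile)); // 400, whereas endTime - startTime would be 1000
```

With the trailing 600µs excluded, percentages are computed against 400µs of sampled time instead of the full 1000µs session.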
0.5.0 – Performance, Reworked UI, New formats, Deno
This release of CPUpro introduces significant updates, including performance enhancements, a redesigned user interface, and expanded format and runtime support. The time needed to load and process extremely large profiles has been reduced dramatically, making CPUpro highly efficient for analyzing complex, long-running scripts. The user interface has been thoroughly revamped to offer a more intuitive and responsive experience across various features and views. Support for new profile formats and for the Deno runtime has been added, expanding the tool's versatility for modern development environments.
Performance
CPUpro has been entirely re-engineered to optimize the preprocessing of profiles on load as well as subsequent computations. This redesign enables it to handle massive profiles (exceeding 100MB) several times faster than the other tools measured below. CPUpro is currently the best option for analyzing intense, long-running scripts that generate extensive CPU profiles, such as webpack build profiles or prolonged browser sessions, which can last minutes or even tens of minutes.
The table below shows the time to load and first render profiles of varying sizes across different tools:
| Profile size | Samples / call tree nodes | Profile type | CPUpro v0.5 | CPUpro v0.4 | Chromium DevTools | speedscope |
|---|---|---|---|---|---|---|
| 33MB | 215k / 120k | V8 cpuprofile | 0.5s | 0.8s | 4.6s | 6.5s |
| 113MB | 625k / 62k | Chromium Profile | 1.3s | 1.6s | 10.6s | 12.4s |
| 114MB | 739k / 446k | V8 cpuprofile | 1.3s | 2.6s | 12.3s | 18.5s |
| 239MB | 11.6M / 489k | V8 cpuprofile | 2.8s | 11.3s | 48s | Out of memory (after 23s) |
| 277MB | 127k / 35k | Chromium Profile | 1.9s | 2.2s | 4.2s | Out of memory (after 30s) |
| 418MB | 897k / 1.86M | V8 cpuprofile | 4.6s | 8.7s | Out of memory (after 36s) | Out of memory (after 49s) |
| 2GB | 7.3M / 7.28M | V8 cpuprofile | 27.1s | Out of memory (after 57s) | Invalid string length (after 20s) | Out of memory (after 43s) |

Chrome 124 / MacBook Pro 13-inch, M1, 2020
As the table shows, the time depends not only on the profile size but also on its format, the number of samples and the size of the call tree (some profiles contain millions of samples and nodes). Notably, the Chromium Profile format, which includes extensive additional data besides the CPU profile, tends to load faster than a `.cpuprofile` file of similar size. Some tools struggle with large profiles: they hit the heap size limit (4GB) and crash with "Out of memory" errors, which is particularly frustrating when a lengthy load yields no result. CPUpro avoids these pitfalls thanks to its new optimizations and can now load and process even 2GB profiles.
When comparing loading times between CPUpro versions 0.4 and 0.5, the difference looks less impressive, because a significant portion of the time is spent on loading and parsing JSON, which remains unchanged. However, if we isolate processing and initial rendering, where the main optimization efforts were concentrated, the new version is roughly 1.7 to 11.7 times faster:
| Profile size | Samples / call tree nodes | Profile type | Load data & parse | CPUpro v0.5 (computations + render) | CPUpro v0.4 (computations + render) | Delta |
|---|---|---|---|---|---|---|
| 33MB | 215k / 120k | V8 cpuprofile | 0.3s | 0.16s | 0.52s | 3.1x |
| 113MB | 625k / 62k | Chromium Profile | 1.1s | 0.21s | 0.64s | 3.0x |
| 114MB | 739k / 446k | V8 cpuprofile | 0.9s | 0.37s | 1.48s | 4.0x |
| 239MB | 11.6M / 489k | V8 cpuprofile | 2.2s | 0.79s | 9.21s | 11.7x |
| 277MB | 127k / 35k | Chromium Profile | 1.9s | 0.15s | 0.24s | 1.7x |
| 418MB | 897k / 1.86M | V8 cpuprofile | 3.6s | 1.12s | 4.26s | 3.6x |
| 2GB | 7.3M / 7.28M | V8 cpuprofile | 22.1s | 4.98s | – | – |

Chrome 124 / MacBook Pro 13-inch, M1, 2020
The speedup was achieved by switching to linear memory (TypedArrays) for representing the tree and storing computation results, despite the increased number and complexity of computations added since v0.4. Most calculation algorithms are implemented as simple loops without recursion or complex branching. Experiments with WebAssembly for some calculations yielded up to a 2x speedup in JavaScriptCore (Safari) and SpiderMonkey (Firefox), bringing their execution times in line with V8, where performance did not change. Remarkably, the new algorithms allow V8 to optimize the JavaScript code to match the efficiency of WebAssembly, which was unexpected.
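To illustrate the general idea of a tree in linear memory with loop-only algorithms, here is a minimal sketch. The layout is an assumption, not CPUpro's actual data structure: nodes are stored in parallel TypedArrays ordered so that every parent precedes its children, which lets total (self + nested) time be accumulated in one backward loop without recursion. `computeTotalTime`, `parent` and `selfTime` are hypothetical names.

```javascript
// Hypothetical layout: node 0 is the root, parent[i] < i for every other node,
// i.e. nodes are stored so that each parent precedes its descendants.
function computeTotalTime(parent, selfTime) {
  // Start from a copy of the self times (one Float64 slot per node)
  const total = new Float64Array(selfTime);
  // Walk backwards: when node i is visited, all of its descendants have larger
  // indices and have already been added into total[i], so it can be pushed up.
  for (let i = total.length - 1; i > 0; i--) {
    total[parent[i]] += total[i];
  }
  return total;
}

// root(0) -> a(1) -> b(2), and root(0) -> c(3)
const parent = new Uint32Array([0, 0, 1, 0]);
const selfTime = new Float64Array([1, 2, 3, 4]);
const totalTime = computeTotalTime(parent, selfTime);
console.log(Array.from(totalTime)); // [ 10, 5, 3, 4 ]
```

The loop body has no branches or calls, which is the kind of code engines can optimize well, and the buffers live outside the regular object heap.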
Adopting TypedArrays has also drastically reduced heap memory usage. Modern browsers typically offer up to 4GB of heap space, and exceeding this limit can crash the browser tab (and, accordingly, the app). CPUpro uses the heap mainly for loading and parsing JSON and during the initial stages of data processing; after that, most data is managed using TypedArrays. Their buffers are stored in so-called "external memory", which is limited only by the system's available memory, significantly lowering the risk of "Out of memory" crashes. Even so, external memory usage stays modest, since CPUpro consumes memory sparingly:
| Profile size | Samples / call tree nodes | Profile type | CPUpro v0.5 | CPUpro v0.4 | Chromium DevTools | speedscope |
|---|---|---|---|---|---|---|
| 33MB | 215k / 120k | V8 cpuprofile | 8MB (external: 20MB) | 97MB | 752MB | 916MB |
| 113MB | 625k / 62k | Chromium Profile | 7MB (external: 17MB) | 61MB | 1063MB | 466MB |
| 114MB | 739k / 446k | V8 cpuprofile | 8MB (external: 155MB) | 324MB | 1803MB | 2001MB |
| 239MB | 11.6M / 489k | V8 cpuprofile | 12MB (external: 92MB) | 463MB | 3877MB | Out of memory |
| 277MB | 127k / 35k | Chromium Profile | 8MB (external: 9MB) | 34MB | 488MB | Out of memory |
| 418MB | 897k / 1.86M | V8 cpuprofile | 18MB (external: 233MB) | 1387MB | Out of memory | Out of memory |
| 2GB | 7.3M / 7.28M | V8 cpuprofile | 22MB (external: 866MB) | Out of memory | Invalid string length | Out of memory |

Data collected after loading the profile and calling the garbage collector.
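The split between heap and external memory shown in the table can be observed in Node.js, where `process.memoryUsage()` reports `heapUsed` separately from `arrayBuffers` (memory backing ArrayBuffers and TypedArrays). This is a minimal illustration only, assuming Node.js; browsers expose these figures differently, and exact numbers depend on the engine. `measureTypedArrayAllocation` is a hypothetical helper.

```javascript
// Allocate a large TypedArray and compare how heap and ArrayBuffer memory change.
function measureTypedArrayAllocation(length) {
  const before = process.memoryUsage();
  const data = new Float64Array(length); // backing store is allocated outside the JS heap
  const after = process.memoryUsage();
  return {
    data, // keep a reference so the buffer stays alive while results are inspected
    heapDelta: after.heapUsed - before.heapUsed,
    arrayBuffersDelta: after.arrayBuffers - before.arrayBuffers,
  };
}

const { data, heapDelta, arrayBuffersDelta } = measureTypedArrayAllocation(10_000_000);
console.log(`requested: ${data.byteLength} bytes`);
console.log(`heapUsed delta: ${heapDelta} bytes`);
console.log(`arrayBuffers delta: ${arrayBuffersDelta} bytes`);
```

The 80MB buffer shows up almost entirely under `arrayBuffers`, while `heapUsed` barely moves, which is why TypedArray-heavy data does not push the heap toward its limit.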
After loading the profile and performing the initial calculations, CPUpro is ready for rapid recalculation of timings and on-demand data sampling, e.g. when filters change. This made it possible to introduce new complex views that were previously impractical because of prolonged calculations (many seconds) and UI freezes. Most views have also been optimized to react almost instantly to filter changes, keeping the experience smooth even with large profiles.
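As a rough sketch of why recalculation on a filter change can be fast when data lives in TypedArrays: recomputing self time for a selected time range is a single linear pass over the samples. The arrays `sampleNodes`, `sampleTimestamps` and `sampleDurations` and the function `selfTimeInRange` are hypothetical, not CPUpro's internal structures.

```javascript
// Recompute per-node self time for samples whose timestamp lies in [rangeStart, rangeEnd).
function selfTimeInRange(nodeCount, sampleNodes, sampleTimestamps, sampleDurations, rangeStart, rangeEnd) {
  const selfTime = new Float64Array(nodeCount); // one zero-filled slot per call tree node
  for (let i = 0; i < sampleNodes.length; i++) {
    const timestamp = sampleTimestamps[i];
    if (timestamp >= rangeStart && timestamp < rangeEnd) {
      // Attribute the sample's duration to the node that was on top of the stack
      selfTime[sampleNodes[i]] += sampleDurations[i];
    }
  }
  return selfTime;
}

const sampleNodes = new Uint32Array([1, 2, 1, 3]);
const sampleTimestamps = new Float64Array([0, 10, 20, 30]);
const sampleDurations = new Float64Array([10, 10, 10, 10]);
const selfTime = selfTimeInRange(4, sampleNodes, sampleTimestamps, sampleDurations, 10, 30);
console.log(Array.from(selfTime)); // [ 0, 10, 10, 0 ]
```

Because the pass allocates only one small result array and touches each sample once, it can be rerun on every filter change without noticeable delay.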
The optimizations in speed and memory efficiency do more than improve profile loading and UI responsiveness; they also unlock new capabilities. This is crucial for features such as profile comparison, which requires loading at least two profiles and can therefore double both computation time and memory usage. With these challenges addressed, the stage is set for future enhancements, including profile comparison.
User interface
The user interface has undergone a significant redesign. The start page is now more compact and gives a clearer overview of how the V8 engine operates. It features a timeline categorized by work type and function clustering tables, followed by a flamechart.
Other pages have also been reworked to be more informative. Each page now includes:
- A timeline that not only displays self time but also nested time, with the distribution of nested time by categories.
- A new section titled "Nested Time Distribution" that offers insights into the distribution of nested time in a hierarchical format, from a package to a function.
- A basic flamechart displaying all frames related to the current subject (category, package, module, or function) as root frames.
The timeline now has a tooltip with expanded details and supports range selection, which was previously missing when focusing on specific segments of work.
The flamechart is now faster and smoother. It adds new selection capabilities and a detailed information block for the selected or zoomed frame.
The welcome page has been redesigned as well and now offers example profiles in various formats to try.
New formats, runtimes, and registries
Support for new formats has been introduced:
- V8 log converted into JSON with the `--preprocess` op...
0.4.0
- Report
  - Extracted regular expression into a separate area `regexp`
  - Fixed edge cases when `scriptId` is not a number
  - Added ancestor call sites on a function page
  - Added function grouping on a function page (enabled by default)
  - Added timeline split by areas on the default page
  - Improved display of function subtrees
  - Fixed processing of evaled functions (call frames with `evalmachine` prefixes)
- CLI:
  - Added support for loading jsonxl files
- API:
  - Profile (result of `profileEnd()`):
    - Renamed methods:
      - `writeToFile()` -> `writeToFileAsync()`
      - `writeToFileSync()` -> `writeToFile()`
      - `writeJsonxlToFileSync()` -> `writeJsonxlToFile()`
    - Changed `writeToFileAsync()`, `writeToFile()` and `writeJsonxlToFile()` methods to return a destination file path
    - Added `writeReport()` method as an alias for `report.writeToFile()`
  - `profileEnd().report`
    - Renamed `writeToFile()` -> `writeToFileAsync()` and `writeToFileSync()` -> `writeToFile()` (however, at the moment both are sync)
    - Changed `open()` method to return a destination file path
  - Capture (result of `profile()`):
    - Added `onEnd(callback)` method to add a callback that is called once capturing is finished; the callback can take a profiling result argument
    - Added `writeToFile()`, `writeJsonxlToFile()`, `writeReport()` and `openReport()` methods to call the corresponding methods once capturing is finished
  - Changed `profile()` to return an active capture for a name, if any, instead of creating a new one
  - Changed `profile()` to subscribe to process exit to end profiling (`process.on('exit', () => profileEnd())`)
  - Added `writeToFile()`, `writeJsonxlToFile()`, `writeReport()` and `openReport()` methods that start `profile()` and call the corresponding method, i.e. `writeReport()` is the same as `profile().writeReport()`
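The capture semantics described above (reusing an active capture for the same name, running `onEnd()` callbacks when capturing finishes, and ending on process exit) can be modelled with a small registry. This is a hypothetical sketch that records no real CPU profile and does not use cpupro; `createCaptureRegistry`, `end()` and the `{ name }` result object are illustrative stand-ins.

```javascript
// Hypothetical model of name-keyed captures. It mirrors the semantics described
// in the 0.4.0 notes but does not use cpupro or record a real CPU profile.
function createCaptureRegistry() {
  const active = new Map(); // capture name -> active capture

  function profile(name = "default") {
    // Reuse an active capture for this name instead of creating a new one
    const existing = active.get(name);
    if (existing !== undefined) {
      return existing;
    }

    const callbacks = [];
    const capture = {
      name,
      // Register a callback to run once capturing is finished
      onEnd(callback) {
        callbacks.push(callback);
        return capture;
      },
      end() {
        active.delete(name);
        const result = { name }; // stand-in for a real profiling result
        for (const callback of callbacks) {
          callback(result);
        }
        return result;
      },
    };

    active.set(name, capture);
    // End the capture automatically when the process exits, if it is still active
    process.once("exit", () => profileEnd(name));
    return capture;
  }

  function profileEnd(name = "default") {
    const capture = active.get(name);
    return capture === undefined ? undefined : capture.end();
  }

  return { profile, profileEnd };
}

const { profile, profileEnd } = createCaptureRegistry();
const endedNames = [];
const first = profile("build");
const second = profile("build"); // same capture object as `first`
first.onEnd((result) => endedNames.push(result.name));
profileEnd("build"); // runs the onEnd callback with { name: "build" }
```

Keying captures by name lets independent parts of a program call `profile()` without starting overlapping profiling sessions.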
0.3.0
- Used jsonxl binary and gzip encodings for report data, which makes report generation much faster and reports much smaller (up to 50 times)
- Added `writeJsonxlToFileSync()` method to profile
- Added `build/*.html` and `package.json` to exports
- Report
  - Bumped `discoveryjs` to `1.0.0-beta.73`
  - Enabled embed API for integrations
  - Reworked `flamechart` for performance and reliability; it's a little more compact now
  - Added badges for function references
  - Updated segments timeline
  - Fixed Windows path processing
  - New page badges
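The size gain from gzip encoding can be checked with Node's built-in `zlib` module. This sketch covers only gzip: the jsonxl binary encoding that CPUpro also uses is not shown, and the synthetic profile below is made-up placeholder data, so real compression ratios will differ. `syntheticProfile` is a hypothetical helper.

```javascript
const zlib = require("zlib");

// Build a synthetic, repetitive profile-like object (placeholder data, not a real profile)
function syntheticProfile(sampleCount) {
  const samples = [];
  const timeDeltas = [];
  for (let i = 0; i < sampleCount; i++) {
    samples.push(i % 50);
    timeDeltas.push(100 + (i % 7));
  }
  return { startTime: 0, endTime: sampleCount * 100, samples, timeDeltas };
}

const raw = Buffer.from(JSON.stringify(syntheticProfile(100_000)), "utf8");
const gzipped = zlib.gzipSync(raw);
console.log(`JSON: ${raw.length} bytes, gzip: ${gzipped.length} bytes`);

// Decoding restores the original JSON exactly
const restored = JSON.parse(zlib.gunzipSync(gzipped).toString("utf8"));
```

Profile data is highly repetitive (node ids and near-constant sampling intervals), which is why generic compression shrinks it so effectively.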
0.2.1 – Boosted flame chart performance and fixes
- Added count badges and tweaked numeric captions
- Reworked `flamechart` view to improve performance, especially on large datasets (eliminated double "renders" in some cases, a lot of unnecessary computations and other optimisations)
- Changed behaviour in `flamechart` when clicking on an already selected frame: the previously selected frame with a lower depth is selected
- Fixed `flamechart`'s view height updating when stack depth grows on zoom
- Fixed processing of profiles when call frame `scriptId` is a non-numeric string
- Bumped `discoveryjs` to 1.0.0-beta.65
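One way to make call frame processing robust to non-numeric `scriptId` values is to avoid numeric conversion and key scripts by the string form of the id. This is a hypothetical sketch, not CPUpro's code; it assumes that a numeric id and its string form (e.g. `42` and `"42"`) refer to the same script, and `createScriptIndex` is an illustrative name.

```javascript
// Map arbitrary scriptId values (numbers or strings) to dense numeric indices.
function createScriptIndex() {
  const indexById = new Map(); // String(scriptId) -> dense index

  return function scriptIndex(scriptId) {
    const key = String(scriptId); // assumption: 42 and "42" denote the same script
    let index = indexById.get(key);
    if (index === undefined) {
      index = indexById.size;
      indexById.set(key, index);
    }
    return index;
  };
}

const scriptIndex = createScriptIndex();
const a = scriptIndex(42); // 0
const b = scriptIndex("42"); // 0, same script
const c = scriptIndex("script-a"); // 1, whereas Number("script-a") would be NaN
```

Dense indices can then be used directly as offsets into TypedArrays, regardless of how the profile encodes its script ids.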
0.2.0 – Support for Chromium profile format & flame charts
- Added support for Chromium Developer Tools profile format (Trace Event Format)
- Added flame chart on index page
- Fixed time deltas processing
- Fixed total time computation for areas, packages, modules and functions
- Fixed module path processing
- Reworked aggregations for areas, packages, modules and functions
0.1.1
- Added missing `bin` field
- Renamed profile recording method `end()` to `profileEnd()` to avoid confusion
- Fixed a crash in viewer when an element in `nodes` doesn't contain a `children` field, e.g. when DevTools protocol is used
- Fixed file module path normalization in viewer
- Removed modification of `startTime` and `endTime` in recorded profile
- Exposed `createReport()` method
0.1.0 – Hello world
- Initial release