Move 'internalBinding('heap_utils').createHeapDump()' to user land #23328

paulrutter · 2018-10-08T07:45:32Z

This feature request comes from the discussion in nodejs/diagnostics#239; ideally there would be a public API in Node.js for creating heapdumps.
There are several modules (node-heapdump, v8-profiler, node-oom-heapdump, etc.) that offer this functionality, but having it in the public API would make Node.js more mature.

@joyeecheung mentioned a way to generate a heapdump without the help of an external module, but it seems a bit convoluted.

Phase 2 would be to provide a way to hook into V8's isolate->SetOOMErrorHandler(OnOOMError); from JS code, so a user can define custom actions when an out-of-memory error occurs. I will create a separate feature request for that.


joyeecheung · 2018-10-08T08:11:31Z

BTW, this is also available as require('internal/test/heap').createJSHeapDump() (used by our heapdump tests).

I think it makes sense to expose it through something like require('v8').takeHeapSnapshot().

cc @addaleax @nodejs/v8 WDYT?

vmarchaud · 2018-10-08T08:21:30Z

I followed the discussion in the diagnostics group, and @joyeecheung said that it's possible through the inspector module (link).
I believe it's already a public API, right?

joyeecheung · 2018-10-08T08:28:58Z

> I believe it's already a public API, right ?

@vmarchaud Sort of, to get the heap snapshot through the inspector API the user needs to create an inspector session, which brings more overhead than just calling TakeHeapSnapshot().
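For comparison, the inspector route looks roughly like the following minimal sketch, adapted from the heap snapshot pattern in the Node.js `inspector` documentation. The helper name `takeSnapshotViaInspector` is hypothetical. It shows the session setup, the `HeapProfiler.addHeapSnapshotChunk` handler and the explicit cleanup that a dedicated API would hide.

```javascript
const inspector = require("inspector");
const fs = require("fs");

// Take a heap snapshot through an in-process inspector session and write it
// to `filename`. The snapshot arrives as string chunks through the
// HeapProfiler.addHeapSnapshotChunk notification.
function takeSnapshotViaInspector(filename, callback) {
  const session = new inspector.Session();
  session.connect();

  const fd = fs.openSync(filename, "w");
  session.on("HeapProfiler.addHeapSnapshotChunk", (message) => {
    fs.writeSync(fd, message.params.chunk);
  });

  session.post("HeapProfiler.takeHeapSnapshot", null, (err) => {
    // Release the session and the file descriptor whether or not the call
    // failed; forgetting this is exactly the kind of resource leak at stake.
    session.disconnect();
    fs.closeSync(fd);
    callback(err);
  });
}
```

Every caller has to repeat this bookkeeping, which is the argument for a single built-in call.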

paulrutter · 2018-10-08T08:30:27Z

I'm not convinced that leveraging the inspector API should be needed to create a heapdump. Yes, it works, but it's not easy and quite error-prone. One could easily forget to close resources for example.

Also, the inspector API has no backpressure mechanism, which could lead to memory issues.

vmarchaud · 2018-10-08T08:35:18Z

I see, but my point would be that if we expose a public API to make a HeapProfile, why not for the CPU Profile too?

I think it's more of a debugging design question: should we expose every Inspector method through a high-level method because it's "not easy" or "error-prone"?

Personally, I believe we should better document those methods in the inspector module, with clear examples, rather than expose another API.
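As an illustration of the "document it with clear examples" approach, a CPU profile can be recorded through the same kind of in-process session. This is a sketch assuming the DevTools protocol methods `Profiler.enable`, `Profiler.start` and `Profiler.stop`; `profileCpu` is a hypothetical helper name.

```javascript
const inspector = require("inspector");

// Record a CPU profile while `work()` runs, using an in-process inspector
// session. `profileCpu` is a hypothetical helper used for illustration.
function profileCpu(work, callback) {
  const session = new inspector.Session();
  session.connect();

  session.post("Profiler.enable", (err) => {
    if (err) {
      session.disconnect();
      return callback(err);
    }
    session.post("Profiler.start", (err) => {
      if (err) {
        session.disconnect();
        return callback(err);
      }
      work();
      session.post("Profiler.stop", (err, result) => {
        session.disconnect();
        // result.profile is a DevTools CPU profile object (nodes, samples, ...).
        callback(err, err ? undefined : result.profile);
      });
    });
  });
}
```

The resulting object can be written out with `JSON.stringify(profile)` to a `.cpuprofile` file and opened in Chrome DevTools.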

paulrutter · 2018-10-08T08:38:53Z

I see your point. A CPU profile API would be welcome as well, now that you mention it ;-).

Apart from the backpressure issue mentioned in my previous post, which makes the inspector API less than ideal, I guess it's a matter of taste.

vmarchaud · 2018-10-08T09:14:47Z

For the backpressure issue, I believe it's better for us to fix it upstream (in this case, V8) than to write another implementation in Node?

I agree that it's a matter of taste, but I would argue that the core goal is to expose the API; if it's too complex, a userland package could simplify it (which is easier to maintain and release).

I believe core made a similar decision with domain: a low-level API is now available through async_hooks, and domain is deprecated (I believe it will move to a userland package sometime in the future).

joyeecheung · 2018-10-08T12:17:58Z

> I see, but my point would be that if we expose a public API to make a HeapProfile, why not for the CPU Profile too ?
> I think it's more a debugging design question : should we expose all methods of the Inspector through high level method because it's "not easy" or "error-prone" ?

That is a good point, though I personally find the inspector API a bit strange for taking profiles; it makes much more sense in the debugger/runtime domains. For one thing, if you take profiles through the inspector API, objects/functions from the inspector module show up in them. It isn't much noise, but it is still noise. V8 could probably provide a way for us to mark those as hidden, but exposing the raw methods lets users avoid the many hoops the inspector jumps through to call them in the backend, and they are already part of the V8 API anyway (and quite stable, actually).

richardlau · 2018-10-08T13:18:34Z

cc @nodejs/diagnostics

addaleax · 2018-10-09T02:38:55Z

@joyeecheung Sure, I’d be okay with making this public if that’s what people think is a good idea. One question might be whether we want to present them with parsed JSON or the “raw” stream we get from V8?

joyeecheung · 2018-10-09T17:05:10Z

@addaleax Maybe providing an option like { parse: true } would be enough? (I am terrible at naming things so feel free to propose other ideas)

paulrutter · 2018-10-09T17:05:47Z

A stream might be better, as it doesn't require the whole json blob to be retained in memory.
Better yet: a stream that contains the parsed json chunks, as these are received from V8.

bmeck · 2018-10-09T17:11:07Z

Note that any decently sized heap dump is going to be too large to load into memory as a JS object, unless you are willing to put millions of objects on the heap and have the GC deal with them. DevTools libraries use typed arrays and views/short-lived objects precisely because a plain object representation is unusable for anything non-trivial.

jkrems · 2018-10-09T17:18:51Z

I think making parsing built-in might be a potential future improvement but it might make more sense as a userland-first feature (given that streaming JSON parsing isn't a thing core does yet). A stream of Buffer chunks would be a safe-ish API that people could build on top of.
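A stream of Buffer chunks is also what Node.js later exposed as `v8.getHeapSnapshot()`, which returns a `stream.Readable`. A minimal sketch of building on that interface from user land, using the hypothetical helper `saveSnapshot`, pipes the chunks straight to a file without joining or parsing them:

```javascript
const v8 = require("v8");
const fs = require("fs");
const { pipeline } = require("stream");

// Stream the heap snapshot to disk chunk by chunk; nothing is joined into
// one big string or parsed into JS objects along the way.
function saveSnapshot(filename, callback) {
  pipeline(v8.getHeapSnapshot(), fs.createWriteStream(filename), callback);
}
```

Parsing, if needed at all, can then happen offline in a separate process.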

joyeecheung · 2018-10-09T17:24:28Z

> A stream might be better, as it doesn't require the whole json blob to be retained in memory.

The V8 API currently only supports JSON as the serialization format for heap snapshots. If we needed to implement a serializer ourselves, I'd prefer we defer that to user land, or ask V8 to implement other serialization formats instead of doing it ourselves.
Also, the snapshot is already a blob in the memory (from C++ side) after it's generated, probably doesn't make that much difference whether we expose it to JS land in a stream or not...

> Better yet: a stream that contains the parsed json chunks, as these are received from V8.

Are you talking about how the inspector API emits chunks of heap snapshot? ~~If so, again I think this should be available in the V8 API first before we start to expose something similar to users.~~

EDIT: oh, this is already available, we just need to implement another v8::OutputStream that emits chunks to JS instead of buffering them together.

jkrems · 2018-10-09T19:34:10Z

I think the stream interface is already part of the public API (lifted from V8HeapProfilerAgentImpl):

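```cpp
// m_isolate and resolver come from V8HeapProfilerAgentImpl; MyOutputStream
// stands for any v8::OutputStream subclass that receives the chunks.
v8::HeapProfiler* profiler = m_isolate->GetHeapProfiler();
const v8::HeapSnapshot* snapshot = profiler->TakeHeapSnapshot(nullptr, &resolver);

MyOutputStream stream;  // `MyOutputStream stream();` would declare a function
snapshot->Serialize(&stream);  // calls WriteAsciiChunk() per chunk, then EndOfStream()
```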

Which should give us chunks of raw bytes I think. Actually, the current implementation in heap_utils.cc goes out of its way to assemble the stream into one buffer and then turn it into a string afaict.

sam-github · 2018-10-09T22:59:26Z

It can cause a lot of problems to return a heap snapshot in JS, since that uses heap. That doesn't mean it is never useful, but I hope it won't be the only way to get a heap snapshot in pure JS. IIRC, the underlying V8 APIs allow setting a stream output destination, so the data can be written directly to disk without filling up memory. If we add a wrapper around the inspector, it would be useful to have similar functionality.

addaleax · 2018-10-10T00:59:15Z

@jkrems @sam-github The “issue” with that is that it’s still all happening synchronously… I think we’d want to change that in upstream V8 before exposing a real streaming interface?

jkrems · 2018-10-10T01:52:06Z

@addaleax Ah, sorry. Missed that part of your point. :) My vote would be to still expose it as a stream from day one. Even if it gets emitted in one synchronous burst of chunks, we should save on copying the data into strings (and/or parsing) and the interface wouldn't need to change if it becomes an asynchronous stream in the future.

Not sure though if an asynchronous stream would be likely - maybe in the presence of the sampling heap profiler where the VM can continue running..? Anyhow, I think the point I care about is that I believe the initial interface should be Buffer chunks [details may vary] and not parsed or strings. Because my guess is that most people would process the data offline, not in a process they want to investigate.

sam-github · 2018-10-10T04:30:10Z

Can you do an async heap snapshot? If JS is still executing, the heap is mutating... how can you snapshot a heap that is being changed? Seems a hard, but maybe not impossible, problem.

With what we have now though, I think an API use case that is important is one in which no new js objects (even buffers) need to be allocated in order to retrieve the heap snapshot and write it to a file, which is possible (IIRC) with the sync streaming API exposed now.

As another use case, if there were some way, as @jkrems says, to expose a streaming API for use in Node that wouldn't have to change if the underlying implementation became a true stream, that would be great, but it might involve difficult predictions of the future.

addaleax · 2018-10-10T04:35:31Z

@sam-github I think what currently happens is that heap snapshot creation and serialization are two different steps. The creation step does not support streaming and is fully synchronous (and should be, to avoid JS heap mutations while it’s happening), the serialization step is the streaming part.

That’s why I think async streaming could be doable if we want it to be – we don’t have to worry about heap allocations during streaming, because that’s always happening after the snapshot was taken.

joyeecheung · 2018-10-10T06:08:32Z

Just so we are clear: when talking about streaming, are we talking about the JS ReadableStreams? Or just some general mechanism like events emitted in the inspector API?
If we are talking about ReadableStreams, I don't think we can implement that somehow purely in C++ just so we can stay away from heap allocations? If we are not talking about ReadableStreams, what stream interface makes sense for the API? Taking callbacks from JS and passing the data in WriteAsciiChunk wrapped as buffers to them?

addaleax · 2018-10-10T15:23:43Z

@joyeecheung I think either would work fine? I was thinking about stream.Readable, but we could also do something simpler

paulrutter · 2018-10-15T18:43:41Z

Is there anything I can do to move this forward? Or do we need more discussion on this topic? Thanks for the insights so far.

addaleax · 2018-10-17T10:53:54Z

@paulrutter I guess the main issue here is still the question of what API format we want… If we are okay with a stream.Readable (which I think would fulfill everybody’s requirements), then the next step would be to look into adding pull stream support to the relevant V8 API… that might be quite a bit of work, but possibly worth it?

If we want something that requires no V8 changes, we need some kind of synchronous API instead.

Trott · 2018-11-21T00:40:38Z

Is anyone working on moving this forward? Should it be put on the Diagnostics WG's agenda or something? Labeled help wanted? Something else?

cjihrig · 2019-05-02T15:14:09Z

Closing, as I believe this is done. Please reopen if I'm wrong.

paulrutter · 2019-05-02T20:21:53Z

Thanks! In which major versions will it land?

As this API is now available, the next step could be to have a way to hook into V8 isolate->SetOOMErrorHandler(OnOOMError); from JS code.
Is this something to create a separate request for?

cjihrig · 2019-05-02T21:41:44Z

v8.writeHeapSnapshot() originally shipped in v11.13.0. I'm not sure what, if any, backporting plans there are (@BethGriggs might know).
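For reference, a minimal sketch of calling that API; `dumpHeapTo` is a hypothetical helper. The file name argument is optional (Node.js generates one when it is omitted), and the call returns the name of the file it wrote:

```javascript
const v8 = require("v8");
const path = require("path");
const os = require("os");

// v8.writeHeapSnapshot() is synchronous: it blocks the event loop while the
// snapshot is generated and written to disk, then returns the file name.
function dumpHeapTo(dir) {
  const target = path.join(dir, `manual-${process.pid}.heapsnapshot`);
  return v8.writeHeapSnapshot(target);
}
```

Because the data goes straight to the file, no JS string holding the whole snapshot is created.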

I think exposing an OOM handler would be a separate request. Wouldn't its usefulness from JS be fairly limited though in an OOM situation?

paulrutter · 2019-05-03T07:21:27Z

> v8.writeHeapSnapshot() originally shipped in v11.13.0. I'm not sure what, if any, backporting plans there are (@BethGriggs might know).

Ok, good to know.

> I think exposing an OOM handler would be a separate request. Wouldn't its usefulness from JS be fairly limited though in an OOM situation?

I agree, but going by @joyeecheung's response, a signal would be preferable to directly creating a heapdump, so the developer can decide what to do with it.

cjihrig · 2019-05-03T12:51:30Z

> would rather have a signal than directly create a heapdump

You can create a heap snapshot via signal using the --heapsnapshot-signal CLI flag.
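A sketch of exercising that flag from a parent process, assuming a POSIX platform where `SIGUSR2` is available. `snapshotChildOnSignal` is a hypothetical helper, and treating an unchanged file size across two polls as "finished writing" is a heuristic, not a documented guarantee:

```javascript
const { spawn } = require("child_process");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Start a child with --heapsnapshot-signal, wait until it reports it is
// running, send the signal, and resolve with the path of the snapshot it
// writes into its working directory.
function snapshotChildOnSignal(signal = "SIGUSR2", timeoutMs = 10000) {
  const cwd = fs.mkdtempSync(path.join(os.tmpdir(), "heapsig-"));
  const child = spawn(
    process.execPath,
    [`--heapsnapshot-signal=${signal}`, "-e",
      "console.log('ready'); setInterval(() => {}, 1000);"],
    { cwd, stdio: ["ignore", "pipe", "inherit"] }
  );

  return new Promise((resolve, reject) => {
    child.once("error", reject);
    child.stdout.once("data", () => {
      child.kill(signal); // ask the child to write a heap snapshot
      const deadline = Date.now() + timeoutMs;
      let lastSize = -1;
      const timer = setInterval(() => {
        const name = fs.readdirSync(cwd).find((f) => f.endsWith(".heapsnapshot"));
        const size = name ? fs.statSync(path.join(cwd, name)).size : -1;
        const done = size > 0 && size === lastSize; // unchanged since last poll
        lastSize = size;
        if (done || Date.now() > deadline) {
          clearInterval(timer);
          child.kill(); // SIGTERM; the child would otherwise run forever
          if (done) resolve(path.join(cwd, name));
          else reject(new Error("timed out waiting for heap snapshot"));
        }
      }, 200);
    });
  });
}
```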

paulrutter · 2019-05-03T12:55:06Z

That's not my point; I want a heapdump when an out-of-memory error occurs.
There is no signal for that, I suppose?

addaleax · 2019-05-03T13:32:41Z

@paulrutter No, there currently is no signal or similar, and we can’t really execute JS from a real OOM handler. I do remember heapdump-on-OOM being discussed at the last @nodejs/diagnostics summit, but I can’t remember whether that was feasible or not.

(Either way, we should either re-open this issue or open a new one, discussions on closed ones tend to get lost easily. I’d prefer opening a new one.)

paulrutter · 2019-05-03T14:22:56Z

@addaleax I've created #27552 as a follow-up issue.

shaiacs · 2021-09-23T11:14:45Z

I'm unable to create heap dumps when the heap is large (e.g. 2 GB and more). When I try to create the dump, the process memory keeps growing to 3 to 4 times the size of the original heap until the process eventually crashes.
I've tried both v8.writeHeapSnapshot and the heapdump package with Node 16 (latest v8 version), running on Ubuntu 18.04.6.
Is this known behaviour, or am I doing something wrong?

To reproduce I'm using the following code:
```javascript
const express = require("express");
const v8 = require("v8");

const app = express();
const port = 8790;
global.bigMap = {};

let counter = 0;

// Each call retains 1000 more large strings, so the heap keeps growing.
app.get("/memUsage", (req, res) => {
  console.log("Getting used memory.");
  for (let i = 0; i < 1000; i++) {
    global.bigMap[counter++] = new Array(100000000).join("a");
  }
  res.status(200).send("Used memory: " + (process.memoryUsage().rss / 1024 / 1024) + " MB.");
});

app.get("/heapDump", (req, res) => {
  console.log("Creating heap dump.");
  try {
    // Synchronous: blocks the event loop until the whole snapshot is written.
    const dumpFileName = v8.writeHeapSnapshot();
    console.log("Finished writing heap dump to %s.", dumpFileName);
    res.status(200).send("Finished writing heap dump to " + dumpFileName);
  } catch (e) {
    // JSON.stringify(e) prints "{}" for Error objects, so log the Error itself
    // and still answer the request instead of leaving it hanging.
    console.error("Error dumping heap:", e);
    res.status(500).send("Error dumping heap: " + e.message);
  }
});

app.get("/", (req, res) => {
  res.send("Hello World!");
});

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`);
});
```

I simply call /memUsage in a loop until the memory consumption reaches a few GB:
```sh
for i in {1..2000}; do curl http://localhost:8790/memUsage; done
```
And then try to perform a heap dump:
```sh
curl http://localhost:8790/heapDump
```

paulrutter · 2021-09-23T11:45:11Z

Not sure if it's the same issue, but i came across this a few times as well.
See https://github.com/blueconic/node-oom-heapdump#memory-usage.
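Until memory usage during snapshotting improves, one defensive option is a rough pre-check before calling `v8.writeHeapSnapshot()`. This is only a sketch: `hasRoomForSnapshot` is hypothetical, and the default factor of 4 is taken from the 3 to 4x growth reported above rather than from any documented V8 figure. Note that `os.freemem()` reports host memory and may not reflect container memory limits.

```javascript
const v8 = require("v8");
const os = require("os");

// Rough guard before snapshotting a large heap. ASSUMPTION: `factor` is a
// guessed safety multiplier based on the growth reported in this thread.
function hasRoomForSnapshot(factor = 4) {
  const usedHeapBytes = v8.getHeapStatistics().used_heap_size;
  const neededBytes = usedHeapBytes * factor;
  const freeBytes = os.freemem();
  return { ok: neededBytes < freeBytes, usedHeapBytes, neededBytes, freeBytes };
}
```

A caller could log a warning and skip the dump when `ok` is false, instead of risking a crash mid-snapshot.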

shaiacs · 2021-09-23T12:03:55Z

@paulrutter thanks. This might indeed be the issue. Doesn't look like there's a solution though.

paulrutter mentioned this issue Oct 8, 2018: Post-mortem out of memory analysis nodejs/diagnostics#239 (closed)

joyeecheung added the "v8 engine" (Issues and PRs related to the V8 dependency) and "inspector" (Issues and PRs related to the V8 inspector protocol) labels Oct 8, 2018

vmarchaud mentioned this issue Dec 6, 2018: Tier of Support: V8 Sampling Heap Profiler current state nodejs/diagnostics#259 (closed)

joyeecheung mentioned this issue Mar 7, 2019: Heapdump generation as part of Node core nodejs/diagnostics#279 (closed)

cjihrig closed this as completed May 2, 2019

paulrutter mentioned this issue May 3, 2019: Create heapdump on out of memory #27552 (closed)
