Move 'internalBinding('heap_utils').createHeapDump()' to user land #23328

Closed
paulrutter opened this issue Oct 8, 2018 · 37 comments
Labels
inspector Issues and PRs related to the V8 inspector protocol v8 engine Issues and PRs related to the V8 dependency.

Comments

@paulrutter

paulrutter commented Oct 8, 2018

This feature request comes from the discussion in nodejs/diagnostics#239; ideally there would be a public API in Node.js for creating heap dumps.
There are several modules around (node-heapdump, v8-profiler, node-oom-heapdump, etc.) that offer this functionality, but it would make Node.js more mature if this were available in the public API.

@joyeecheung mentioned a way to generate a heapdump without the help of an external module, but this seems a bit convoluted.

Phase 2 would be to have a way to hook into V8's isolate->SetOOMErrorHandler(OnOOMError); from JS code, so a user can define custom actions when an out-of-memory error occurs. I will create a separate feature request for that one.

@joyeecheung
Member

joyeecheung commented Oct 8, 2018

BTW this is also available as require('internal/test/heap').createJSHeapDump() (for our heapdump tests)

I think it makes sense to expose it through something like require('v8').takeHeapSnapshot().

cc @addaleax @nodejs/v8 WDYT?

@vmarchaud
Contributor

I followed the discussion in the diagnostics group, and @joyeecheung said that it's possible through the inspector module (link).
I believe it's already a public API, right?

@joyeecheung
Member

> I believe it's already a public API, right?

@vmarchaud Sort of. To get the heap snapshot through the inspector API, the user needs to create an inspector session, which brings more overhead than just calling TakeHeapSnapshot().
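
For reference, the inspector-based route described above can look roughly like the following. This is a minimal sketch modeled on the example in the Node.js inspector documentation; takeSnapshotViaInspector and the output path are hypothetical names chosen for illustration, not part of any API.

```javascript
const inspector = require('inspector');
const fs = require('fs');

// Hypothetical helper (not a Node.js API): takes a heap snapshot through an
// in-process inspector session and writes the received chunks to `filename`.
function takeSnapshotViaInspector(filename, callback) {
  const session = new inspector.Session();
  const fd = fs.openSync(filename, 'w');
  session.connect();

  // The snapshot is delivered as a series of string chunks.
  session.on('HeapProfiler.addHeapSnapshotChunk', (message) => {
    fs.writeSync(fd, message.params.chunk);
  });

  session.post('HeapProfiler.takeHeapSnapshot', null, (err) => {
    // Both the session and the file descriptor have to be released by hand.
    session.disconnect();
    fs.closeSync(fd);
    callback(err, filename);
  });
}
```

Compared with a single function call, the session setup, the chunk listener and the manual cleanup are the overhead being discussed here.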

@paulrutter
Author

paulrutter commented Oct 8, 2018

I'm not convinced that leveraging the inspector API should be needed to create a heapdump. Yes, it works, but it's not easy and it's quite error-prone. One could easily forget to close resources, for example.

Also, the inspector API has no backpressure mechanism, which could lead to memory issues.

@vmarchaud
Contributor

I see, but my point would be that if we expose a public API to make a HeapProfile, why not for the CPU Profile too?

I think it's more of a debugging design question: should we expose all methods of the Inspector through high-level methods because it's "not easy" or "error-prone"?

I personally believe we should document those methods of the inspector module better, with clear examples, rather than expose another API.

@paulrutter
Author

I see your point. A CPU profile would be welcome as well, now that we come to speak of it ;-).

Apart from the backpressure issue mentioned in my previous post, which makes using the inspector API less than ideal, I guess it's a matter of taste.

@vmarchaud
Contributor

For the backpressure issue, I believe it's better for us to fix the issue upstream (in this case V8) than to write another implementation in Node?

I agree that it's a matter of taste, but I would personally argue that the core goal is to expose the API, and if it's too complex, a userland package could simplify it (which is easier to maintain/release).

I believe core made a similar decision with domain: a low-level API is now available through async_hooks, and domain is now deprecated (which I believe will move to a userland package sometime in the future).

@joyeecheung
Member

joyeecheung commented Oct 8, 2018

> I see, but my point would be that if we expose a public API to make a HeapProfile, why not for the CPU Profile too?
> I think it's more of a debugging design question: should we expose all methods of the Inspector through high-level methods because it's "not easy" or "error-prone"?

That is a good point, though I personally find the inspector API a bit strange for taking profiles; it makes much more sense in the debugger/runtime domains. For one thing, if you take the profiles through the inspector API, objects/functions from the inspector module will show up in the profiles. It's not significant noise, but it is still noise. V8 could probably provide a way for us to mark those as hidden, but exposing the raw methods allows users to avoid the many hoops that the inspector jumps through in order to call them in the backend, and they are already part of the V8 API anyway (and quite stable, actually).

joyeecheung added the v8 engine and inspector labels Oct 8, 2018
@richardlau
Member

cc @nodejs/diagnostics

@addaleax
Member

addaleax commented Oct 9, 2018

@joyeecheung Sure, I’d be okay with making this public if that’s what people think is a good idea. One question might be whether we want to present them with parsed JSON or the “raw” stream we get from V8?

@joyeecheung
Member

@addaleax Maybe providing an option like { parse: true } would be enough? (I am terrible at naming things so feel free to propose other ideas)

@paulrutter
Author

A stream might be better, as it doesn't require the whole JSON blob to be retained in memory.
Better yet: a stream that contains the parsed JSON chunks, as these are received from V8.

@bmeck
Member

bmeck commented Oct 9, 2018

Note that any decently sized heap dump is going to be too large to put into memory as a JS Object, unless you want millions of objects to hit the GC and their size added to the heap. DevTools libraries use typed arrays and views/short-lived objects exactly because a plain object representation is unusable for anything non-trivial.

@jkrems
Contributor

jkrems commented Oct 9, 2018

I think making parsing built-in might be a potential future improvement but it might make more sense as a userland-first feature (given that streaming JSON parsing isn't a thing core does yet). A stream of Buffer chunks would be a safe-ish API that people could build on top of.

@joyeecheung
Member

joyeecheung commented Oct 9, 2018

> A stream might be better, as it doesn't require the whole JSON blob to be retained in memory.

The V8 API supports JSON as the only serialization format for heap snapshots right now. If we need to implement a serializer ourselves, I'd prefer we defer that to user land, or ask V8 to implement other serialization formats instead of doing it ourselves.
Also, the snapshot is already a blob in memory (on the C++ side) after it's generated, so it probably doesn't make that much difference whether we expose it to JS land as a stream or not...

> Better yet: a stream that contains the parsed JSON chunks, as these are received from V8.

Are you talking about how the inspector API emits chunks of heap snapshot? If so, again I think this should be available in the V8 API first before we start to expose something similar to users.

EDIT: oh, this is already available, we just need to implement another v8::OutputStream that emits chunks to JS instead of buffering them together.

@jkrems
Contributor

jkrems commented Oct 9, 2018

I think the stream interface is already part of the public API (lifted from V8HeapProfilerAgentImpl):

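v8::HeapProfiler* profiler = m_isolate->GetHeapProfiler();
// TakeHeapSnapshot() returns a pointer to a const snapshot.
const v8::HeapSnapshot* snapshot = profiler->TakeHeapSnapshot(0, &resolver);

// MyOutputStream stands in for a v8::OutputStream subclass.
MyOutputStream stream;  // not `stream()`, which would declare a function
snapshot->Serialize(&stream);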

Which should give us chunks of raw bytes I think. Actually, the current implementation in heap_utils.cc goes out of its way to assemble the stream into one buffer and then turn it into a string afaict.

@sam-github
Contributor

Returning a heap snapshot to JS can cause a lot of problems, since that itself uses heap memory. That doesn't mean it is never useful, but I hope it won't be the only way to get a heap snapshot in pure JS. IIRC, the underlying V8 APIs allow setting a stream output destination, so the data can be written directly to disk and not fill up memory. If we add a wrapper around inspector, it would be useful to have similar functionality.

@addaleax
Member

@jkrems @sam-github The “issue” with that is that it’s still all happening synchronously… I think we’d want to change that in upstream V8 before exposing a real streaming interface?

@jkrems
Contributor

jkrems commented Oct 10, 2018

@addaleax Ah, sorry. Missed that part of your point. :) My vote would be to still expose it as a stream from day one. Even if it gets emitted in one synchronous burst of chunks, we should save on copying the data into strings (and/or parsing) and the interface wouldn't need to change if it becomes an asynchronous stream in the future.

Not sure though if an asynchronous stream would be likely - maybe in the presence of the sampling heap profiler where the VM can continue running..? Anyhow, I think the point I care about is that I believe the initial interface should be Buffer chunks [details may vary] and not parsed or strings. Because my guess is that most people would process the data offline, not in a process they want to investigate.

@sam-github
Contributor

Can you do an async heap snapshot? If JS is still being executed, the heap is mutating... how can you snapshot a heap that is being changed? Seems a hard but maybe not impossible problem.

With what we have now though, I think an important API use case is one in which no new JS objects (even buffers) need to be allocated in order to retrieve the heap snapshot and write it to a file, which is possible (IIRC) with the sync streaming API exposed now.

As another use case, if there were some way, as @jkrems says, to expose a streaming API for use in-node that wouldn't have to change if the underlying implementation became a true stream, that would be great, but it might involve difficult predictions of the future.

@addaleax
Member

@sam-github I think what currently happens is that heap snapshot creation and serialization are two different steps. The creation step does not support streaming and is fully synchronous (and should be, to avoid JS heap mutations while it’s happening); the serialization step is the streaming part.

That’s why I think async streaming could be doable if we want it to be – we don’t have to worry about heap allocations during streaming, because that’s always happening after the snapshot was taken.

@joyeecheung
Member

Just so we are clear: when talking about streaming, are we talking about the JS ReadableStreams? Or just some general mechanism like events emitted in the inspector API?
If we are talking about ReadableStreams, I don't think we can implement that somehow purely in C++ just so we can stay away from heap allocations? If we are not talking about ReadableStreams, what stream interface makes sense for the API? Taking callbacks from JS and passing the data in WriteAsciiChunk wrapped as buffers to them?

@addaleax
Member

@joyeecheung I think either would work fine? I was thinking about stream.Readable, but we could also do something simpler.

@paulrutter
Author

Is there anything I can do to move this forward? Or do we need more discussion on this topic? Thanks for the insights up until now.

@addaleax
Member

@paulrutter I guess the main issue here is still the question of what API format we want… If we are okay with a stream.Readable (which I think would fulfill everybody’s requirements), then the next step would be to look into adding pull stream support to the relevant V8 API… that might be quite a bit of work, but possibly worth it?

If we want something that requires no V8 changes, we need some kind of synchronous API instead.

@Trott
Member

Trott commented Nov 21, 2018

Is anyone working on moving this forward? Should it be put on the Diagnostics WG's agenda or something? Labeled help wanted? Something else?

@cjihrig
Contributor

cjihrig commented May 2, 2019

Closing, as I believe this is done. Please reopen if I'm wrong.

cjihrig closed this as completed May 2, 2019
@paulrutter
Author

Thanks! In which major versions will it land?

As this API is now available, the next step could be to have a way to hook into V8's isolate->SetOOMErrorHandler(OnOOMError); from JS code.
Is this something to create a separate request for?

@cjihrig
Contributor

cjihrig commented May 2, 2019

v8.writeHeapSnapshot() originally shipped in v11.13.0. I'm not sure what, if any, backporting plans there are (@BethGriggs might know).
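
A minimal sketch of the API mentioned above, assuming Node.js v11.13.0 or later. v8.writeHeapSnapshot() writes synchronously to a file and returns its name; v8.getHeapSnapshot(), documented alongside it, returns a Readable stream containing the serialized snapshot, which is close to the chunk stream discussed earlier in the thread. The file names and the streamSnapshotTo helper are assumptions made for illustration.

```javascript
const v8 = require('v8');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Synchronous variant: the snapshot is serialized to the given file and the
// file name is returned.
const file = v8.writeHeapSnapshot(
  path.join(os.tmpdir(), `example-${process.pid}.heapsnapshot`)
);
console.log('snapshot written to', file);

// Streaming variant (hypothetical helper): pipe the snapshot stream into a
// destination of your choice, here a file.
function streamSnapshotTo(target, done) {
  v8.getHeapSnapshot()
    .pipe(fs.createWriteStream(target))
    .on('finish', () => done(null, target))
    .on('error', done);
}
```

The snapshot itself is still generated synchronously in both variants; only the delivery of the serialized data differs.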

I think exposing an OOM handler would be a separate request. Wouldn't its usefulness from JS be fairly limited though in an OOM situation?

@paulrutter
Author

paulrutter commented May 3, 2019

> v8.writeHeapSnapshot() originally shipped in v11.13.0. I'm not sure what, if any, backporting plans there are (@BethGriggs might know).

Ok, good to know.

> I think exposing an OOM handler would be a separate request. Wouldn't its usefulness from JS be fairly limited though in an OOM situation?

I agree, but judging from @joyeecheung's response, a signal would be preferable to creating a heapdump directly, so the developer can decide what to do with it.

@cjihrig
Contributor

cjihrig commented May 3, 2019

> would rather have a signal than directly create a heapdump

You can create a heap snapshot via signal using the --heapsnapshot-signal CLI flag.
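
The flag is passed on the command line, e.g. node --heapsnapshot-signal=SIGUSR2 app.js, after which sending SIGUSR2 to the process writes a snapshot to the working directory. Where the flag cannot be set, a rough in-process approximation (not what the flag does internally) is to install a signal handler that calls v8.writeHeapSnapshot(). installSnapshotSignal is a hypothetical helper name used only for this sketch.

```javascript
const v8 = require('v8');

// Hypothetical helper: writes a heap snapshot to the working directory each
// time the process receives `signal`. Returns the handler so it can be removed.
function installSnapshotSignal(signal = 'SIGUSR2') {
  const handler = () => {
    const file = v8.writeHeapSnapshot();
    console.log(`heap snapshot written to ${file}`);
  };
  process.on(signal, handler);
  return handler;
}

// Usage (POSIX): call installSnapshotSignal() at startup, then trigger a
// snapshot from a shell with `kill -USR2 <pid>`.
```

Unlike the CLI flag, this handler runs as ordinary JS on the event loop, so it only fires once the loop is free to process the signal.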

@paulrutter
Author

That's not my point; I would want a heapdump when an out-of-memory error occurs.
There is no signal for that, I suppose?

@addaleax
Member

addaleax commented May 3, 2019

@paulrutter No, there currently is no signal or similar, and we can’t really execute JS from a real OOM handler. I do remember heapdump-on-OOM being discussed at the last @nodejs/diagnostics summit, but I can’t remember whether that was feasible or not.

(Either way, we should either re-open this issue or open a new one, discussions on closed ones tend to get lost easily. I’d prefer opening a new one.)

@paulrutter
Author

@addaleax I've created #27552 as a follow-up issue.

@shaiacs

shaiacs commented Sep 23, 2021

I'm unable to create heap dumps when the heap is large (e.g. 2 GB or more). When trying to create the dump, the process memory just keeps growing to 3 or 4 times the size of the original heap until it eventually crashes.
I'm trying with either v8.writeHeapSnapshot or the heapdump package with Node 16, latest v8 version, running on Ubuntu 18.04.6.
Is this a known behaviour or am I doing something wrong?

To reproduce I'm using the following code:
```javascript
const express = require("express");
const v8 = require("v8");
const app = express();
const port = 8790;
global.bigMap = {};

let counter = 0;

// Each call retains 1000 large strings to inflate the heap.
app.get("/memUsage", (req, res) => {
  console.log("Getting used memory.");
  for (let i = 0; i < 1000; i++) {
    global.bigMap[counter++] = new Array(100000000).join("a");
  }
  res.status(200).send("Used memory: " + (process.memoryUsage().rss / 1024 / 1024) + " MB.");
});

app.get("/heapDump", (req, res) => {
  console.log("Creating heap dump.");
  try {
    const dumpFileName = v8.writeHeapSnapshot();
    console.log("Finished writing heap dump to %s.", dumpFileName);
    res.status(200).send("Finished writing heap dump to " + dumpFileName);
  } catch (e) {
    // JSON.stringify() turns an Error into "{}", so log the error itself.
    console.error("Error dumping heap:", e);
    res.status(500).send("Error dumping heap: " + e.message);
  }
});

app.get("/", (req, res) => {
  res.send("Hello World!");
});

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`);
});
```

I simply call /memUsage in a loop until the memory consumption reaches a few GBs:
for i in {1..2000}; do curl http://localhost:8790/memUsage; done
And then try to perform a heap dump:
curl http://localhost:8790/heapDump

@paulrutter
Author

Not sure if it's the same issue, but I came across this a few times as well.
See https://github.com/blueconic/node-oom-heapdump#memory-usage.

@shaiacs

shaiacs commented Sep 23, 2021

@paulrutter thanks. This might indeed be the issue. Doesn't look like there's a solution though.
