Post-mortem out of memory analysis #239

paulrutter · 2018-10-04T07:53:30Z

Hi,

While reading the readme on https://github.com/nodejs/diagnostics, one of the domains is "Heap and Memory Analysis", although i cannot find anything about post-mortem out of memory analysis.

In our usecase, we run several hundred nodejs processes (in a Docker container), which are restricted to run with very low memory (+- 80MB). When these go out of memory, we want to be able to analyze what's on the heap when it happened.

For that purpose, we created "node-oom-heapdump" (https://github.com/blueconic/node-oom-heapdump), which does exactly that.

Is post-mortem out of memory analysis planned to be included in the Node.js core?

joyeecheung · 2018-10-04T07:58:55Z

Not in node core, but I tend to use https://github.com/nodejs/llnode to diagnose coredumps produced during OOM (it probably needs a few tweaks to generate coredumps in docker when the process crashes though). You can at least see the stacktrace (C++ and JS) when the OOM happens which tend to be useful (unless the process is not running the code that causes the problem when it crashes). You can also scan the heap and find the types with the most instances which is also helpful most of the times.

paulrutter · 2018-10-04T08:08:14Z

Thanks!

I have seen that one in the past but found it to be a bit too low-level for my purposes.
Of course, it might provide more insight, but the threshold to be able to use it seems to be a bit higher then just including our own module (which creates .heapsnapshot files which can be loaded in Chrome DevTools).
I would prefer something easier; just like Java does, it would be nice to have some kind of command line flag which automatically creates those heapdumps on out of memory (https://docs.oracle.com/javase/7/docs/webnotes/tsg/TSG-VM/html/clopts.html).
Any ideas on this?

joyeecheung · 2018-10-04T08:33:16Z

@paulrutter There is an experimental JS API of llnode https://github.com/nodejs/llnode/blob/master/JSAPI.md which allows you to install it as a normal C++ Node.js addon and use the JS API to access info in the core dump (it's not yet completed, of course).

There is also https://github.com/nodejs/node-report which can dump a summary when the process crashes, probably a bit closer to the Java heapdump experience. The problem with heapdumps is that they only include information from the VM's managed heap, but sometimes the cause may come from the native layer, so it's sometimes inevitable to dig into there.

joyeecheung · 2018-10-04T08:38:37Z

Also, node-report is being integrated into core nodejs/node#22712 the current implementation allows you to use command line flags to configure when the report will be generated.

paulrutter · 2018-10-04T08:43:14Z

Thanks, that's very helpful.
I understand that a heapdump alone sometimes might not be enough, although for our usecases it usually is (we run pluggable JS code which usually is the cause for mem leaks).

I'll take a look into node-report to see if that's helpful 👍

mmarchini · 2018-10-04T09:41:50Z

Another useful tool is https://github.com/v8/sampling-heap-profiler, which shows allocation per function call. This tool is also more suited for production scenarios because taking heapdumps is expensive (high memory consumption) and slow, but it shouldn't be a problem if the process keeps memory usage around 80MB.

paulrutter · 2018-10-04T10:45:10Z

Thanks, that's another one not known to me until now.
Memory usage for generating heapdumps is not an issue in case of OoM, because the process is lost anyway. For on-demand heapdumps; those are created from another process, so memory consumption is not a risk for the process being analyzed (in node-oom-heapdump, other modules like v8-profiler run in the same process).

mhdawson · 2018-10-04T20:49:18Z

@paulrutter please let me know your experiences with node-report. Would like to hear about what worked well as well as other information that might be useful in node-report.

In my session coming up at NodeConfEU one of the demo's included is using nore-report to help when you have an OOM.

paulrutter · 2018-10-05T07:21:38Z

Hi @mhdawson,

I tried node-report yesterday (the fatalerror.js example included in the source), and although this is definitely helpful information in case of an OoM, i still miss information about what the heap contained at the time.
If a heapdump (v8::Isolate::GetCurrent()->GetHeapProfiler()->TakeHeapSnapshot();) was created as well, that would give more insight. Of course, as @joyeecheung mentioned, this only shows the VM managed heap and no native info.

mhdawson · 2018-10-05T17:54:17Z

@paulrutter taking a heap snapshot is also one of the core diagnostics capabilities that I believe should be in core. We have a PR working to get node-report into core itself (in a bit different form). After that I want to look at the best practice for getting a heapdump.

@richardlau at one point I had thought node-report was going to enable triggering a heap dump as well but maybe that was https://github.com/RuntimeTools/appmetrics instead.

You can get a heapdump with the https://www.npmjs.com/package/heapdump module and also through the inspector APIs (I've not had time to validate the later myself quite)

paulrutter · 2018-10-05T18:00:40Z

Thanks!

For the latter, we now created our own module called node-oom-heapdump, which works very well for us. But it would be nice to have this functionality in the core like you mentioned. Just a matter of time, i guess?

I could help on getting a pull request for node-report, if that would help?

richardlau · 2018-10-05T18:54:39Z

@mhdawson I have a feeling it was appmetrics and that ended up using the heapdump module.

mhdawson · 2018-10-05T19:50:19Z

@paulrutter you mean a PR to add heapdump trigger support in node-report or something else?

joyeecheung · 2018-10-05T19:50:41Z

By the way, the heap snapshot doesn't seem to work on larger heaps: https://bugs.chromium.org/p/chromium/issues/detail?id=826697

mhdawson · 2018-10-05T19:53:48Z

@paulrutter if you are interested we might collaborate on the path of having a heapdump solution in core. I'm quite interested in that but have just not had time to make progress. One thing people have mentioned is that it is possible through the inspector API so first step is to understand that. If its sufficient then documenting it as part of the diagnostics group's best practices and if not making the case for what we think should be in core. Adding @gireeshpunathil as another interested party.

joyeecheung · 2018-10-05T19:53:51Z

also through the inspector APIs (I've not had time to validate the later myself quite)

@mhdawson That is certainly possible. This is the snippet I used to generate the heap snapshots for the screenshots in nodejs/node#23072

See snippet

'use strict';

const http2 = require('http2');
const inspector = require('inspector');
const session = new inspector.Session();
session.connect();
const fs = require('fs');

function createSnapshot() {
  let buf = '';
  session.on('HeapProfiler.addHeapSnapshotChunk', ({
    method, params: { chunk }
  }) => {
    console.log('addHeapSnapshotChunk', chunk.length);
    buf += chunk;
  });
  session.on('HeapProfiler.reportHeapSnapshotProgress', (progress) => {
    console.log('reportHeapSnapshotProgress', progress);
  });
  session.post('HeapProfiler.takeHeapSnapshot', {
    reportProgress: true
  }, () => {
    console.log('Writing snapshot');
    fs.writeFileSync('./heap.heapsnapshot', buf);
  });
}

const server = http2.createServer();
server.on('stream', (stream) => {
  stream.respondWithFile(__filename);
});
server.listen(0, () => {
  const client = http2.connect(`http://localhost:${server.address().port}`);
  const req = client.request();

  req.on('response', () => {
    createSnapshot();
  });

  req.resume();
  req.on('end', () => {
    client.close();
    server.close();
  });
  req.end();
});

mhdawson · 2018-10-05T19:57:16Z

@joyeecheung the key thing for me is that you should be able to trigger heapdumps without having change your code. What I thought was being suggested was to connect through the external inspector interface and request a heapdump. The potential issue with that is whether it is acceptable/practical to have that enabled/accessible security in production.

One other question, for your example above did you need to enable the inspector through a command line option or is simply requiring like you show enough?

joyeecheung · 2018-10-05T22:08:27Z

@mhdawson Without code changes we will probably have to achieve this through certain IPC mechanisms, AFAIK user land tend to use signals for this (so does node-report).

The snippet above just needs to be run with the node executable without any flags to produce a heapdump.

paulrutter · 2018-10-06T11:56:50Z

To be clear; we're talking about two separate things here:
1.creating a heapdump should be core functionality without having to depend on userland packages. I'm all in favor of this.
2. Creating a heapdump when an out of memory situation occurs, should be a core function as well (afaik), possibly behind a command line flag. This could use the functionality specified in bullet 1 to do so, with help of the SetOOMErrorHandler in V8.

I would be happy to collaborate on both topics. I could possibly get some time on this within the company i work for, because it benefits them as well.

Let me know if my summary checks out for you as well and if so, go from there.

mmarchini · 2018-10-06T12:54:29Z

I like these ideas but I'm still concerned about performance issues and memory footprint while generating heap dumps. AFAIK, taking a heapdump of large heaps takes too much time, uses at least the same amount of memory already allocated for the heap and can't be done in a separate thread (e.g.: it is a blocking operation which will stop the event loop). These problems won't be a problem when dealing with smaller heaps, but most Node.js users will experience OOM only after they're using more than 1Gb (which is the default max-old-space-size setting).

I think we need to solve these problems before integrating this into core.

cc @hashseed @ofrobots @bmeurer any insights on this?

paulrutter · 2018-10-06T13:05:29Z

That certainly is an issue for bullet 1, but not for 2. In case of OoM, blocking the event loop for a heapdump doesn't sound like an issue to me? The process is dead anyway.
Memory is a concern because it required roughly two times the configured heap size.

joyeecheung · 2018-10-06T19:31:51Z

1.creating a heapdump should be core functionality without having to depend on user land packages. I'm all in favor of this.

This is already possible in JS land through the inspector API. if the request is for something that directly calls into heap_profiler->TakeHeapSnapshot() without creating an inspector session, there is already an internal API internalBinding('heap_utils').createHeapDump()that does this for testing heap dumps, I would suggest to open an issue in the nodejs/node repo to request moving that to user land and see if collaborators are comfortable with that.

For bullet 2 I would perfer something more general that allows users to configure actions in OOM other than just taking an heapdump - but again we can only find out how comfortable collaborators feel about this with an issue in the core repo, since that's where the code will land.

paulrutter · 2018-10-07T05:53:53Z

Ok, that is clarifying! That indeed is what i mean for bullet 1. Just an API for creating heapdumps. Will test next week and create an issue to move that to a more public API, or document it somewhere.

For bullet 2, making it configurable would make it more flexibele, i agree. Maybe an ability to hook into SetOOMErrorHandler from JS code would then be enough? Something like process.on('fatal_error') or something similar?

joyeecheung · 2018-10-07T12:30:10Z

or document it somewhere.

It’s an internal API for testing so I believe documenting it (without moving it to the public API surface) is not really on the table.

paulrutter · 2018-10-08T07:46:21Z

I created nodejs/node#23328 for creating heapdumps in core.
Let's see what the responses are before filing an issue for bullet 2.

paulrutter · 2019-05-03T14:24:46Z

And the follow up issue here: nodejs/node#27552

paulrutter mentioned this issue Oct 8, 2018

Move 'internalBinding('heap_utils').createHeapDump()' to user land nodejs/node#23328

Closed

paulrutter mentioned this issue May 3, 2019

Create heapdump on out of memory nodejs/node#27552

Closed

paulrutter closed this as completed May 3, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Post-mortem out of memory analysis #239

Post-mortem out of memory analysis #239

paulrutter commented Oct 4, 2018

joyeecheung commented Oct 4, 2018 •

edited

Loading

paulrutter commented Oct 4, 2018

joyeecheung commented Oct 4, 2018

joyeecheung commented Oct 4, 2018

paulrutter commented Oct 4, 2018

mmarchini commented Oct 4, 2018

paulrutter commented Oct 4, 2018

mhdawson commented Oct 4, 2018

paulrutter commented Oct 5, 2018

mhdawson commented Oct 5, 2018

paulrutter commented Oct 5, 2018

richardlau commented Oct 5, 2018

mhdawson commented Oct 5, 2018

joyeecheung commented Oct 5, 2018

mhdawson commented Oct 5, 2018

joyeecheung commented Oct 5, 2018 •

edited

Loading

mhdawson commented Oct 5, 2018

joyeecheung commented Oct 5, 2018

paulrutter commented Oct 6, 2018

mmarchini commented Oct 6, 2018

paulrutter commented Oct 6, 2018

joyeecheung commented Oct 6, 2018 •

edited

Loading

paulrutter commented Oct 7, 2018 •

edited

Loading

joyeecheung commented Oct 7, 2018 •

edited

Loading

paulrutter commented Oct 8, 2018

paulrutter commented May 3, 2019

Post-mortem out of memory analysis #239

Post-mortem out of memory analysis #239

Comments

paulrutter commented Oct 4, 2018

joyeecheung commented Oct 4, 2018 • edited Loading

paulrutter commented Oct 4, 2018

joyeecheung commented Oct 4, 2018

joyeecheung commented Oct 4, 2018

paulrutter commented Oct 4, 2018

mmarchini commented Oct 4, 2018

paulrutter commented Oct 4, 2018

mhdawson commented Oct 4, 2018

paulrutter commented Oct 5, 2018

mhdawson commented Oct 5, 2018

paulrutter commented Oct 5, 2018

richardlau commented Oct 5, 2018

mhdawson commented Oct 5, 2018

joyeecheung commented Oct 5, 2018

mhdawson commented Oct 5, 2018

joyeecheung commented Oct 5, 2018 • edited Loading

mhdawson commented Oct 5, 2018

joyeecheung commented Oct 5, 2018

paulrutter commented Oct 6, 2018

mmarchini commented Oct 6, 2018

paulrutter commented Oct 6, 2018

joyeecheung commented Oct 6, 2018 • edited Loading

paulrutter commented Oct 7, 2018 • edited Loading

joyeecheung commented Oct 7, 2018 • edited Loading

paulrutter commented Oct 8, 2018

paulrutter commented May 3, 2019

joyeecheung commented Oct 4, 2018 •

edited

Loading

joyeecheung commented Oct 5, 2018 •

edited

Loading

joyeecheung commented Oct 6, 2018 •

edited

Loading

paulrutter commented Oct 7, 2018 •

edited

Loading

joyeecheung commented Oct 7, 2018 •

edited

Loading