VM: Extreme memory growth #3113

Closed
arthurschreiber opened this issue Sep 29, 2015 · 58 comments
Labels
memory: Issues and PRs related to the memory management or memory footprint.
performance: Issues and PRs related to the performance of Node.js.
vm: Issues and PRs related to the vm subsystem.

Comments

@arthurschreiber

It looks like context objects are not collected by the GC until the max memory limit is reached.

'use strict';

var vm = require('vm');

for (var i = 0; i < 10000000; i++) {
  vm.runInNewContext('(_$_v.length-1)', {
    '_$_v': [
      { category: 'reference',
        author: 'Nigel Rees',
        title: 'Sayings of the Century',
        price: 8.95 },
      { category: 'fiction',
        author: 'Evelyn Waugh',
        title: 'Sword of Honour',
        price: 12.99 },
      { category: 'fiction',
        author: 'Herman Melville',
        title: 'Moby Dick',
        isbn: '0-553-21311-3',
        price: 8.99 },
      { category: 'fiction',
        author: 'J. R. R. Tolkien',
        title: 'The Lord of the Rings',
        isbn: '0-395-19395-8',
        price: 22.99 }
    ]
  });

  if (i % 10 === 0) {
    console.log("After", i, "iterations:", parseInt(process.memoryUsage().heapTotal / 1024 / 1024), "MB");
  }
}
$ node test.js
After 0 iterations: 9 MB
After 10 iterations: 20 MB
After 20 iterations: 30 MB
After 30 iterations: 46 MB
After 40 iterations: 52 MB
After 50 iterations: 60 MB
After 60 iterations: 69 MB
After 70 iterations: 95 MB
After 80 iterations: 101 MB
After 90 iterations: 107 MB
After 100 iterations: 114 MB
After 110 iterations: 120 MB
After 120 iterations: 126 MB
After 130 iterations: 137 MB
After 140 iterations: 142 MB
After 150 iterations: 154 MB
After 160 iterations: 159 MB
After 170 iterations: 166 MB
After 180 iterations: 178 MB
After 190 iterations: 184 MB
After 200 iterations: 191 MB
After 210 iterations: 203 MB
After 220 iterations: 208 MB
After 230 iterations: 221 MB
After 240 iterations: 227 MB
After 250 iterations: 234 MB
After 260 iterations: 245 MB
After 270 iterations: 253 MB
After 280 iterations: 265 MB
After 290 iterations: 270 MB
After 300 iterations: 277 MB
After 310 iterations: 288 MB
After 320 iterations: 296 MB
After 330 iterations: 302 MB
After 340 iterations: 313 MB
After 350 iterations: 320 MB
After 360 iterations: 333 MB
After 370 iterations: 339 MB
After 380 iterations: 346 MB
After 390 iterations: 357 MB
After 400 iterations: 363 MB
After 410 iterations: 365 MB
After 420 iterations: 365 MB
After 430 iterations: 365 MB
After 440 iterations: 365 MB
After 450 iterations: 365 MB
After 460 iterations: 367 MB
After 470 iterations: 370 MB
After 480 iterations: 376 MB
After 490 iterations: 385 MB
After 500 iterations: 392 MB

If I specify --max-old-space-size, I can stop the memory growth.

$ node --max-old-space-size=32 test2.js
After 0 iterations: 9 MB
After 10 iterations: 20 MB
After 20 iterations: 39 MB
After 30 iterations: 44 MB
After 40 iterations: 10 MB
After 50 iterations: 20 MB
After 60 iterations: 30 MB
After 70 iterations: 47 MB
After 80 iterations: 17 MB
After 90 iterations: 25 MB
After 100 iterations: 31 MB
After 110 iterations: 47 MB
After 120 iterations: 15 MB
After 130 iterations: 26 MB

If I rewrite the code to reuse the context object, memory usage and performance are a lot better:

'use strict';

var vm = require('vm');

var script = vm.createScript('(_$_v.length-1)');
var context = vm.createContext();

for (var i = 0; i < 10000000; i++) {
  context['_$_v'] = [
    { category: 'reference',
      author: 'Nigel Rees',
      title: 'Sayings of the Century',
      price: 8.95 },
    { category: 'fiction',
      author: 'Evelyn Waugh',
      title: 'Sword of Honour',
      price: 12.99 },
    { category: 'fiction',
      author: 'Herman Melville',
      title: 'Moby Dick',
      isbn: '0-553-21311-3',
      price: 8.99 },
    { category: 'fiction',
      author: 'J. R. R. Tolkien',
      title: 'The Lord of the Rings',
      isbn: '0-395-19395-8',
      price: 22.99 }
  ];

  script.runInContext(context);

  if (i % 10000 === 0) {
    console.log("After", i, "iterations:", parseInt(process.memoryUsage().heapTotal / 1024 / 1024), "MB");
  }
}
$ node test.js
After 0 iterations: 9 MB
After 10000 iterations: 11 MB
After 20000 iterations: 11 MB
After 30000 iterations: 12 MB
After 40000 iterations: 13 MB
After 50000 iterations: 13 MB
After 60000 iterations: 13 MB
After 70000 iterations: 13 MB
After 80000 iterations: 13 MB
After 90000 iterations: 13 MB
After 100000 iterations: 13 MB
After 110000 iterations: 14 MB
After 120000 iterations: 14 MB
After 130000 iterations: 15 MB
After 140000 iterations: 16 MB
After 150000 iterations: 17 MB
After 160000 iterations: 17 MB
After 170000 iterations: 18 MB
After 180000 iterations: 19 MB
After 190000 iterations: 20 MB
After 200000 iterations: 20 MB
After 210000 iterations: 21 MB
After 220000 iterations: 22 MB
After 230000 iterations: 22 MB
After 240000 iterations: 23 MB
After 250000 iterations: 23 MB
After 260000 iterations: 23 MB
After 270000 iterations: 23 MB
After 280000 iterations: 23 MB
After 290000 iterations: 23 MB
After 300000 iterations: 23 MB
After 310000 iterations: 23 MB
After 320000 iterations: 23 MB
After 330000 iterations: 23 MB
After 340000 iterations: 23 MB
After 350000 iterations: 23 MB
After 360000 iterations: 23 MB
After 370000 iterations: 23 MB
After 380000 iterations: 19 MB
After 390000 iterations: 19 MB
After 400000 iterations: 19 MB
After 410000 iterations: 19 MB
After 420000 iterations: 19 MB
After 430000 iterations: 19 MB
After 440000 iterations: 19 MB
After 450000 iterations: 19 MB
After 460000 iterations: 19 MB
After 470000 iterations: 19 MB
After 480000 iterations: 19 MB
After 490000 iterations: 19 MB
After 500000 iterations: 19 MB
After 510000 iterations: 20 MB
After 520000 iterations: 20 MB
After 530000 iterations: 21 MB
After 540000 iterations: 22 MB

I understand why performance is a lot better (re-using the same object is obviously faster than creating new ones), but why is the memory profile so different?

@silverwind added the vm and memory labels on Sep 29, 2015
@ChALkeR
Member

ChALkeR commented Sep 29, 2015

The fact that the memory growth stops when --max-old-space-size is used means that those objects are collectable by the GC; the GC probably just isn't being triggered in this situation. I'll take a look in an hour.

@bnoordhuis
Member

What you're seeing is more or less the expected behavior. Garbage collection is expensive and garbage collecting VM contexts even more so, so V8 will generally prefer to grow the heap instead of sweeping it.

If you start node with --expose_gc and call gc() every n iterations, memory usage should be much more stable.
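
For example, a minimal sketch along those lines, reusing the loop from the test case above and assuming node was started with --expose-gc:

'use strict';

var vm = require('vm');

for (var i = 0; i < 1000; i++) {
  vm.runInNewContext('(_$_v.length-1)', { '_$_v': [1, 2, 3] });

  // Force a full collection every 100 iterations; gc() only exists when
  // node is started with --expose-gc.
  if (i % 100 === 0 && typeof gc === 'function') {
    gc();
    console.log("After", i, "iterations:",
                Math.round(process.memoryUsage().heapTotal / 1024 / 1024), "MB");
  }
}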

@arthurschreiber
Author

@bnoordhuis This is still quite unexpected, especially as we didn't have this problem with v0.10. I understand that the internals of the vm module changed quite a bit, but it still feels like something is quite off if memory usage suddenly grows to over 1 GB where it was pretty stable at around 150 MB before.

I understand that I could either expose the gc or limit the max heap size, but both solutions feel like some sort of hack to me.

@bnoordhuis
Member

It is what it is. There is not much we (node.js) can do here; GC heuristics are predominantly a V8 affair. We can't really hint to V8 that now would be a good time to collect garbage either, because node.js doesn't really know when the VM context is well and truly dead.

@arthurschreiber
Author

Oh well, that's unfortunate. :( I worked around this issue for now by re-using the same context instead of recreating a new context. Not perfect, but better than using up all available memory on the system.

@ChALkeR
Member

ChALkeR commented Sep 29, 2015

The code triggers only scavenges, not mark-sweeps (until the total RSS reaches the limit).
The memory is reclaimed by mark-sweeps; scavenges are ineffective here.
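
This can be observed by running the test case with V8's GC tracing enabled; each collection is then logged as either a Scavenge or a Mark-sweep:

$ node --trace_gc test.js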

@ChALkeR
Member

ChALkeR commented Sep 29, 2015

It's also worth noting that, for some reason, --expose-gc (without calling gc() manually) makes the situation worse when --max-old-space-size is specified.

  1. node --max-old-space-size=32 test.js — works (checked up to 1000 iterations).
  2. node --max-old-space-size=32 --expose-gc test.js — crashes after about 30-40 iterations.
  3. node --max-old-space-size=100 --expose-gc test.js — crashes after about 140 iterations.
  4. node --max-old-space-size=200 --expose-gc test.js — crashes after about 280 iterations.

test.js is exactly the same in all four cases.

@arthurschreiber
Author

@ChALkeR What I've seen in our app is that even after running for more than 48 hours, none of the memory that is used up by the contexts is ever really freed again. Basically, memory usage grows to some limit around 1 GB and then stays there forever. Not sure why no mark-sweep is triggered and why these context objects are not cleaned up.

@ChALkeR
Member

ChALkeR commented Sep 29, 2015

none of the memory that is used up by the contexts is ever really freed again. Basically, memory usage grows to some limit around 1GB and then stays there forever.

What do you mean by «not freed again»? It's actually freed when a mark-sweep is triggered and re-used for further allocations. If your heapUsed stays constant over a lot of operations, that generally means that GC is triggered too often, frees too little space, and that the freed space is immediately consumed. That situation is suboptimal; it usually means big GC pauses and lags as a result.

If your RSS does not go down when something is freed — that's normal even for plain C/C++ programs.
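
A minimal way to watch the two numbers separately, using only process.memoryUsage():

// heapUsed drops after a mark-sweep; rss usually stays high because the
// freed pages are kept around and re-used for further allocations.
setInterval(function () {
  var m = process.memoryUsage();
  console.log('rss:', Math.round(m.rss / 1024 / 1024), 'MB,',
              'heapUsed:', Math.round(m.heapUsed / 1024 / 1024), 'MB,',
              'heapTotal:', Math.round(m.heapTotal / 1024 / 1024), 'MB');
}, 1000);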

@ChALkeR
Member

ChALkeR commented Sep 29, 2015

A side note: vm.runInNewContext(…) is not required here; the problem can be reproduced with just vm.createContext() calls.

@ChALkeR
Member

ChALkeR commented Sep 29, 2015

@indutny Maybe ContextifyContext constructor and destructor should call AdjustAmountOfExternalAllocatedMemory?

@ChALkeR
Member

ChALkeR commented Sep 29, 2015

That's about 1 MiB for each (empty) ContextifyContext for me.

@arthurschreiber
Author

@ChALkeR One thing I noticed:

The contextify module used NanContextDisposedNotification https://github.com/brianmcd/contextify/blob/1ed0631b435f6301986cc0890c36a29d5c55de97/src/contextify.cc#L31

Provide a GC hint that the context has gone away. Without this call it does not seem that the collector will touch the context until under extreme stress.

I don't see any call to ContextDisposedNotification in node_contextify.cc. Might this be related?
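
For context, a hypothetical sketch of what that hint would look like if it were added to the ContextifyContext destructor (this is not the actual node_contextify.cc code):

  ~ContextifyContext() {
    context_.Reset();
    proxy_global_.Reset();
    sandbox_.Reset();
    // Hint to V8 that a context has gone away so its GC heuristics can
    // schedule a full collection sooner.
    env_->isolate()->ContextDisposedNotification();
  }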

@arthurschreiber
Author

The v8 source says the following about ContextDisposedNotification:

Optional notification that a context has been disposed. V8 uses these notifications to guide the GC heuristic. Returns the number of context disposals - including this one - since the last time V8 had a chance to clean up.

Looks like this call was somehow lost during the merge of the contextify module?

@ChALkeR
Member

ChALkeR commented Sep 29, 2015

Just for the reference:

diff --git a/src/node_contextify.cc b/src/node_contextify.cc
index 6520355..d21fc80 100644
--- a/src/node_contextify.cc
+++ b/src/node_contextify.cc
@@ -83,10 +83,13 @@ class ContextifyContext {
     proxy_global_.SetWeak(this, WeakCallback<Object, kProxyGlobal>);
     proxy_global_.MarkIndependent();
     references_++;
+
+    env->isolate()->AdjustAmountOfExternalAllocatedMemory(5 * 1024 * 1024);
   }


   ~ContextifyContext() {
+    env_->isolate()->AdjustAmountOfExternalAllocatedMemory(-5 * 1024 * 1024);
     context_.Reset();
     proxy_global_.Reset();
     sandbox_.Reset();

is just enough stress for the GC to clean things up and keep the memory usage stable there.
I'm not proposing that as a solution atm; it's just a test to see what's going on here.

@ChALkeR
Member

ChALkeR commented Sep 29, 2015

@arthurschreiber It could be related, but you can't solve this problem by modifying just the destructor.
The destructor does not get called until the GC collects that object, the GC doesn't collect that object until a full mark-sweep, and those aren't currently triggered in your testcase.

@ChALkeR
Member

ChALkeR commented Sep 29, 2015

/cc @domenic @brianmcd @indutny.

@arthurschreiber
Author

@ChALkeR Are you sure? My C++ is not that great, but I think the destructor is invoked here: https://github.com/nodejs/node/blob/master/src/node_contextify.cc#L337.

@bnoordhuis
Member

I don't see any call to ContextDisposedNotification in the node_contextify.cc.

Isolate::ContextDisposedNotification() isn't very effective in node.js. You can try it if you want; you'll probably see roughly similar memory usage and worse overall performance.

I looked into it a while ago and IIRC, it's because the v8::Context can stay alive after the ContextifyContext itself has been destroyed; embedded weak references in code objects and such keep it alive. It doesn't help that ContextifyContext does a complicated dance with multiple weakly persistent objects.

@ChALkeR
Member

ChALkeR commented Sep 30, 2015

WeakCallback is indeed triggered by scavenges, while the ~ContextifyContext is triggered only by mark-sweeps.

Looking at lines 336-337, WeakCallback is triggered three times (for the number of weak references):

  1. First triggered with context->references_ === 3 for the kSandbox weak reference — by the first and only scavenge.
  2. Then with context->references_ === 2 for the kContext weak reference — only during a mark-sweep.
  3. Then (immediately) with context->references_ === 1 for the kProxyGlobal weak reference — during the same mark-sweep as 2.

@arthurschreiber
Author

I tried sticking ContextDisposedNotification calls into different places, and they didn't seem to have any effect whatsoever. :(

@Fishrock123
Contributor

cc @domenic again

@domenic
Contributor

domenic commented Oct 7, 2015

At this point @ChALkeR has a better grasp on this than I do.

@ChALkeR
Member

ChALkeR commented Oct 7, 2015

For reference, reduced testcase:

'use strict';

var vm = require('vm');
for (var i = 0; i < 10000000; i++) {
  if (i % 10 === 0) {
    console.log("After", i, "iterations:", parseInt(process.memoryUsage().heapTotal / 1024 / 1024), "MB");
  }
  vm.createContext();
}

@rafinskipg

+1, we have faced the same problem while executing a precompiled vm.Script via runInNewContext and runInContext.

It went pretty badly for us. If anyone could provide some hint about where this is located in V8, we would like to investigate (we don't know C, but we'll do our best).

@dhendo

dhendo commented Nov 18, 2015

I'm seeing the same kind of issue: from the heap dumps, it looks like I've got vm sandboxes piling up from calls to runInNewContext. A simple test case with very little going on in the script benefits from --expose-gc and calling gc().

We can't really hint to V8 that now would be a good time to collect garbage either, because node.js doesn't really know when the VM context is well and truly dead.

Would it be possible to expose some way of marking/disposing of a sandbox/context as unwanted, e.g. unref() or .dispose()?

@bnoordhuis
Member

Would it be possible to expose some way of marking/disposing of a sandbox/context as unwanted, e.g. unref() or .dispose()?

That wouldn't make a difference. As long as there are still references (from your own code or e.g. from weak embedded pointers in code objects), the context can't be reclaimed by the garbage collector.

@indutny
Member

indutny commented Nov 22, 2015

With #3962 it still leaks memory, but at least it collects it a bit faster (partly because allocation takes much less time).

@ChALkeR added the performance label on Feb 16, 2016
@asaworld

asaworld commented Mar 3, 2016

Just as an FYI, since it is suggested many times in here: if you add the --expose-gc flag, the gc function is added to the context when it is created in runInNewContext. The gc must reference the context, and now the context references the gc, so it leaks as a result of this flag being set. Setting context.gc = null after the context is created solves this. I updated #3249 as well.
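
A rough sketch of that workaround (assuming node was started with --expose-gc; the gc property name follows the description above):

'use strict';
var vm = require('vm');

var sandbox = { value: 1 };
vm.runInNewContext('value + 1', sandbox);

// With --expose-gc, a gc function ends up reachable from the new context and
// the context keeps a reference back, as described above; clearing it breaks
// that cycle so the context can be collected.
sandbox.gc = null;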

@rvagg
Member

rvagg commented Mar 15, 2016

Has anyone done any work on comparing v0.10 to v4 and v5 and even master (perhaps v8 4.9 is interesting now)?

I'm hearing that this issue is holding up some users from migrating to v4+ since it's such a big deal. @nodejs/v8, I'm wondering if this is something on V8's radar at all. How do they use new contexts in Chromium, or is that functionality entirely for embedders? I.e., would Chromium feel any pain from this change from their user base, or is it on us to provide feedback about what we're experiencing and try to find a resolution with them?

@ofrobots
Contributor

I can reproduce this with v4 and v5, but it seems to be fixed on master (or to be more precise, master + recent V8 lkgr). Once I have wrapped up a few things I can try taking a look at what fixed it.

@ofrobots
Contributor

It seems like I fixed this issue recently when I simplified the logic of how node_contextify was keeping references to the V8 context (relevant commits ebbbc5a...66566df). Next I can take a look at whether these are easily backportable.

@rvagg
Member

rvagg commented Mar 16, 2016

Accidentally fixing something, nice! Thanks for looking into this @ofrobots!

@ofrobots
Contributor

These back-port to v4.x with a rather simple merge conflict resolution. I have verified that the problem gets fixed in v4.x after the backport.

@nodejs/lts: what would be the process to get this backported? These 4 commits (ebbbc5a...66566df) need to be backported, but I don't want to tag the issue (#5392) with the lts-watch tag.

@ofrobots
Contributor

For some reason I thought that the issue contained more commits than just these 4. That is not the case, so I am just going to tag the issue.

@rvagg
Member

rvagg commented Mar 16, 2016

I have test builds showing up here: https://nodejs.org/download/test/v4.3.2-test20160316bdc07626e8/ using https://github.com/rvagg/io.js/tree/5392-backport where I have those 4 commits applied on top of the current v4.x. I'd appreciate it if those having trouble with this problem could test it out and report back.

@ofrobots
Contributor

It turns out the above 4 commits are not functionally complete – an additional commit (#5800) is needed to avoid a crash. (@rvagg, would you mind putting together another build?)

I would really like to get some independent verification that this memory growth issue is indeed fixed. In my experimentation, the reduced test case from @ChALkeR above does seem to be fixed.

@rvagg
Member

rvagg commented Mar 20, 2016

New builds with the extra commit are available here: https://nodejs.org/download/test/v4.3.2-test2016032010820d8910/

Please test, folks!

@cin210

cin210 commented Mar 29, 2016

Using the build provided by @rvagg in development benchmarks dramatically reduces the memory footprint of creating and tearing down VMs over time. This fix is a huge boon for our team! Thank you!

@indieisaconcept

I'm looking to do a test with 5.9.1 later this week. I've seen good results locally but need to kick the tyres more with a % of production traffic to be certain.

@rvagg
Member

rvagg commented Mar 29, 2016

If things turn out to be good with this in 5.9.1, then I don't see any reason they won't be applied to a v4 release soonish. Given that it turned out to be not quite as simple (regression in 5.9.0), it might be something we want to sit on for a little longer to ensure that the functionality is stable in v5 before we have confidence to pull it back. / @nodejs/lts

@MylesBorins
Contributor

+1 to that @rvagg

I would love to see these changes backported, but would also like to see them mature a little longer in v5.

@indieisaconcept

Finally had a chance to test this. I was able to do an initial non-prod performance test, and the metrics look good: I saw an overall drop in CPU, and memory usage is reduced. I'm hoping to do a realistic prod test in the next couple of days, but so far I am very happy with the results.

Thanks for looking into this.

[metrics screenshot]

I should clarify that those peaks are expected in my particular use case. That is actually when node-vm is executing.

@eljefedelrodeodeljefe
Contributor

Cool. So this can be closed, right?

@ChALkeR
Member

ChALkeR commented Apr 14, 2016

Closing this as it's not reproducible anymore and seems to be fixed.

The --expose-gc issue is tracked in #3249.

@rvagg
Member

rvagg commented May 19, 2016

PR to get this into v4.x: #6871

@rvagg
Member

rvagg commented May 25, 2016

Released in v4.4.5 folks: https://nodesource.com/blog/node-js-v4-4-5-release-brief/
