-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
memory leak in shadow realms #47353
Comments
@nodejs/realm |
Could this memory leak cause performance regression when we run URL tests? I'm seeing a different result in node benchmark compare results, compared to microbenchmarks with mitata. |
I think we would eventually need some way to "know" when a ShadowRealm can be collected / closed / destroyed to dispose of any connected libuv handles or other native resources. |
I don't think so, this is only relevant in shadow realms, and I don't think we have URLs benchmarks in shadow realms (it's only supported starting from two weeks ago and not even released, and still it's behind |
ShadowRealm is currently behind the runtime flag
If BaseObjects are weak and no active HandleWraps or ReqWraps, the ShadowRealms' context should be unreachable and can be GC-ed. However, as pointed out by @joyeecheung as option 2, this needs to avoid making BaseObjects strong. This can be particularly a problem when we need to nest BaseObjects with c++ member fields by v8 global handles as v8's garbage collector is unable to infer the references between c++ objects. As an alternative, the references can be saved as object internal fields, or as v8's cppgc |
I also noticed that the current machinery for managing BaseObjects does not work well with the idea that "a context will become unreachable before the BaseObjects are destructed". For example I am trying to make BindingData weak, and I notice the they get deleted first via the cleanup hooks (as part of the destruction of the Realm) and then via the weak callbacks, whereas in the principal realm we simply assume that weak callbacks, if ever called, must always be called before the cleanup hooks (and we already have guard in the code for double-deletion in this case, but not the other one). There are probably more assumption about this throughout the code base. This also means, IIUC, none of the BaseObjects destructors should access the context or even modify JS states (not even setting the internal fields, that also crashes when the context is unreachable). This can be particularly problematic for complex objects like AsyncWraps (who...calls destroy hooks during destruction. Ouch). |
I now wonder if it's too late to make the ShadowRealms API actively dispose themselves (e.g. |
From my point of view, it would be tough for us to deal with any native/asynchronous resources without a way to dispose of the realm deliberately. With the current semantics, if we open a socket inside a ShadowRealm, and the wrapper object is collected, what should we do? There are three options:
With a |
I think the ability of opening a socket is probably out of the scope in ShadowRealms ;). But even in those cases, those wrappers are usually weak so would not hold the context alive. The more problematic ones are e.g. caches (e.g. for module imports and serialization, which should be available in shadow realms?), we may need strong references to them to guarantee idempotence. The |
Yeah, modules are supposed to be supported in the shadow realm and can be imported with I think modules are a bit different from |
There is actually a leak. The test doesn't exercise the right path to create a substantial enough object graph (e.g. accessing something that results in the loading of a binding). This does something more complicated in the test and moves it to known_issues until we find a fix. PR-URL: #47355 Refs: #47353 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]>
There is actually a leak. The test doesn't exercise the right path to create a substantial enough object graph (e.g. accessing something that results in the loading of a binding). This does something more complicated in the test and moves it to known_issues until we find a fix. PR-URL: #47355 Refs: #47353 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]>
There is actually a leak. The test doesn't exercise the right path to create a substantial enough object graph (e.g. accessing something that results in the loading of a binding). This does something more complicated in the test and moves it to known_issues until we find a fix. PR-URL: #47355 Refs: #47353 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]>
I have sort of a fix here: https://github.com/joyeecheung/node/tree/weak-binding - but it's more like a band-aid for the binding data, which are special in that a) they are limited and one-per-realm b) they are alive as long as the context is still alive c) it's fine to actively and deterministically destroy them only in the cleanup hook. These characteristics aren't true for other BaseObjects, though. So that isn't really a solution to the issue here. But at least it works for binding data in the shadow realms (which are the most likely BaseObjects users can run into in shadow realms or now), and it kind of makes sense on its own already, even not for shadow realms. |
There is actually a leak. The test doesn't exercise the right path to create a substantial enough object graph (e.g. accessing something that results in the loading of a binding). This does something more complicated in the test and moves it to known_issues until we find a fix. PR-URL: #47355 Refs: #47353 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]>
I think that only solves part of the problem. Even if we manage to make all the references to the BaseObject wrappers weak, there really is no guarantee that the weak callbacks of the wrappers are always fired before the weak callback of the context is fired (becoming unreachable first doesn't mean their weak callback always get called first, V8 explicitly does not guarantee that). So if the destructors of the BaseObjects assume that the context is still valid (e.g. by getting anything from the wrapper, or calling any JS function, or even accessing the internal aligned pointer - this happens to way too many BaseObjects), we have to actively complete all the destructions of BaseObjects before the context becomes unreachable (which in turn means we must be the one controlling the timing of the context destruction). And that's what I was seeing when I tried to make the binding data weak -the realm weak callback got called first before the binding data weak callback got called. So the best solution is to just make them weak without a callback and simply rely on the cleanup queue to destruct them. But that's a very unique solution that only applies to binding data (they are fixed and one-per-realm so we don't get leaks by doing this). |
If BaseObjects' weak callback gets called first, their persistent handles should be reset first at Line 61 in 7efae93
However, if the context's weak callback gets called first, it is unsafe to clean up tracked base objects in the first pass weak callback. Yet it is still feasible to schedule a second pass weak callback and run the cleanup hooks to destroy all tracked base objects: https://github.com/legendecas/node/tree/shadow-realm/binding-data. In this way, the base object's weak callbacks are invoked before the realm's second-pass weak callback and their persistent handles are reset before calling their destructor. This aligns with the existing plain weak BaseObject finalization behavior. AsyncWrap's destructor calls into JavaScript. However, AsyncWrap should keep the realm reachable. If an AsyncWrap is un-refed, it should avoid calling any V8 APIs in the destructor with the flag |
I think in the case, the BaseObject's weak callbacks only happen to be invoked before the Realm's second-pass callback. V8 explicitly does not guarantee the order of the weak callback invocations - it actually doesn't even guarantee that these callbacks would ever be called...and is also part of the reason why we have the cleanup queue to get rid of any BaseObjects that aren't destructed via the weak callback as the last resort (arguably it's rarer to see weak callbacks never get called, so we could get away with relying on them to reclaim resources, but relying on an assumed order is a bit too far IMHO). |
@joyeecheung thanks for the clarification. I'm afraid my comment above may be not clear. If we defer the draining of the realms' cleanup hooks to a safe point to access V8 API (i.e. outside of weak callbacks), we can avoid depending on the order of the weak callbacks of BaseObjects and v8::Context. If the realm's cleanup hooks are drained in the weak callback, BaseObject's (whose weak callbacks are pending to be invoked) destructor tries to create an instance handle of the wrapper object and set the internal fields. This violates V8's weak callback API contract and leads to a crash. The second-pass callback is a way that shows how we defer draining the realm cleanup hooks. It can also be replaced with |
I think as long as we use |
Actually if the realm weak callback is called first, (I also noticed that technically speaking BaseObject's weak callbacks should be made two-passed and shouldn't actually touch V8 in the first pass weak callback too...and we should make the cleanup queue for BaseObjects a two-pass process as well to avoid interleaving weak callbacks with cleanup hooks - which might happen if the BaseObject destructors allocates any memory somehow - I am not sure how probably that is, but left a TODO here) |
There is actually a leak. The test doesn't exercise the right path to create a substantial enough object graph (e.g. accessing something that results in the loading of a binding). This does something more complicated in the test and moves it to known_issues until we find a fix. PR-URL: nodejs#47355 Refs: nodejs#47353 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]>
There is actually a leak. The test doesn't exercise the right path to create a substantial enough object graph (e.g. accessing something that results in the loading of a binding). This does something more complicated in the test and moves it to known_issues until we find a fix. PR-URL: #47355 Refs: #47353 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]>
The binding data holds references to the AliasedBuffers directly from their wrappers which already ensures that the AliasedBuffers won't be accessed when the wrappers are GC'ed. So we can just make the global references to the AliasedBuffers weak. This way we can simply deserialize the typed arrays when deserialize the binding data and avoid the extra Object::Set() calls. It also eliminates the caveat in the JS land where aliased buffers must be dynamically read from the binding. PR-URL: #47354 Refs: #47353 Reviewed-By: Chengzhong Wu <[email protected]>
The binding data holds references to the AliasedBuffers directly from their wrappers which already ensures that the AliasedBuffers won't be accessed when the wrappers are GC'ed. So we can just make the global references to the AliasedBuffers weak. This way we can simply deserialize the typed arrays when deserialize the binding data and avoid the extra Object::Set() calls. It also eliminates the caveat in the JS land where aliased buffers must be dynamically read from the binding. PR-URL: #47354 Refs: #47353 Reviewed-By: Chengzhong Wu <[email protected]>
There is actually a leak. The test doesn't exercise the right path to create a substantial enough object graph (e.g. accessing something that results in the loading of a binding). This does something more complicated in the test and moves it to known_issues until we find a fix. PR-URL: #47355 Refs: #47353 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]>
There is actually a leak. The test doesn't exercise the right path to create a substantial enough object graph (e.g. accessing something that results in the loading of a binding). This does something more complicated in the test and moves it to known_issues until we find a fix. PR-URL: nodejs#47355 Refs: nodejs#47353 Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Chengzhong Wu <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]>
Version
all
Platform
all
Subsystem
shadow-realm
What steps will reproduce the bug?
How often does it reproduce? Is there a required condition?
Always
What is the expected behavior? Why is that the expected behavior?
Shouldn't crash from OOM
What do you see instead?
Crash from OOM
Additional information
See analysis #47339 (comment). This was introduced in #46809. The current memory management of shadow realms requires that there should be no strong global references in the graph, but we currently have many of them in the code base - among them are the binding data and the aliased arrays associated with the encoding binding, which is lazily created when
TextEncoder
is accessed, for example.I think we have at least two options:
Or, we could use some help from V8 to get notified about the realms being unreachable, and release the realms in some callback instead of in a weak callback of the context. It's not clear to me what's enough for us at this point, though.
The text was updated successfully, but these errors were encountered: