Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src,doc: Experimental support for SEA #42334

Closed
wants to merge 10 commits into from

Conversation

mhdawson
Copy link
Member

Signed-off-by: Michael Dawson [email protected]

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/startup
  • @nodejs/tsc

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Mar 14, 2022
@mhdawson
Copy link
Member Author

@igorklopov, @jesec, @styfle could you look at what is documented/the proof of concept to confirm this is what we talked about in the next-10 mini-summit and that it's what is needed. If so would be great if you could help fill in the windows/osx versions.

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

mhdawson added 3 commits July 13, 2022 09:47
Signed-off-by: Michael Dawson <[email protected]>
@mhdawson
Copy link
Member Author

mhdawson commented Jul 13, 2022

Pushed some additional commits to

  1. search the binary if there is no added segment. This is based on the code in @jesec's branch. It will let us experiment with both approaches at the same time. It would also let us add some tests more easily.
  2. add exposing the pointer to the binary data. Not really sure if this is how we'd want to do it since although @jesec indicates it's used by the virtual file system I'm thinking something in the C++ layer will need to wrap the pointer. I'm thinking it might make more sense to expose is as buffer?
  3. Change the eye catcher to a GUID in order to reduce the change of an accidental collision to as close as possible to 0.

@RaisinTen
Copy link
Contributor

Btw, I have a working implementation of SEA in my branch that uses postject main...RaisinTen:node:sea. It also implements the VFS (asar in this case)!

The major problem I can think of is that the bootstrap script assumes that I'm using asar as the vfs, so we should probably find a way to allow users to configure it - perhaps by exposing an interface that folks can implement to hook in their custom vfs implementations? Or maybe some other approach that I couldn't think of?

@RaisinTen
Copy link
Contributor

add exposing the pointer to the binary data. Not really sure if this is how we'd want to do it since although @jesec indicates it's used by the virtual file system I'm thinking something in the C++ layer will need to wrap the pointer. I'm thinking it might make more sense to expose is as buffer?

I tried to expose the embedded data as a buffer on process._singleExecutableApplicationCode in main...RaisinTen:node:sea#diff-6c1386cbc6e4a11941649af31a0257d5c3e40b5f41223cdddf87ab2e984b705aR549 but I think we should find a way to expose this in a way without adding new fields on process.

@jesec
Copy link
Member

jesec commented Jul 14, 2022

Btw, I have a working implementation of SEA in my branch that uses postject main...RaisinTen:node:sea. It also implements the VFS (asar in this case)!

The major problem I can think of is that the bootstrap script assumes that I'm using asar as the vfs, so we should probably find a way to allow users to configure it - perhaps by exposing an interface that folks can implement to hook in their custom vfs implementations? Or maybe some other approach that I couldn't think of?

Or simply treat VFS as out-of-scope and let the packager utility / embedded script do it.

Implementing VFS inside Node.js does not help on its own, since a packager utility is still necessary after all. It is complicated to override Node.js APIs. Plus, there are many edge cases to handle and changes could easily become semver-major. It would significantly increase the maintenance burden for Node.js without tangible benefits (unless something like "node compile" is implemented along with it).

@jesec
Copy link
Member

jesec commented Jul 14, 2022

Pushed some additional commits to

  1. search the binary if there is no added segment. This is based on the code in @jesec's branch. It will let us experiment with both approaches at the same time. It would also let us add some tests more easily.
  2. add exposing the pointer to the binary data. Not really sure if this is how we'd want to do it since although @jesec indicates it's used by the virtual file system I'm thinking something in the C++ layer will need to wrap the pointer. I'm thinking it might make more sense to expose is as buffer?
  3. Change the eye catcher to a GUID in order to reduce the change of an accidental collision to as close as possible to 0.

I am a bit concerned about the case that the payload could be huge. It is not efficient to prematurely store the payload in memory.

@mhdawson
Copy link
Member Author

mhdawson commented Jul 14, 2022

I am a bit concerned about the case that the payload could be huge. It is not efficient to prematurely store the payload in memory.

I had wondered about your comment about 4096 being big enough. In the case of using an additional segment it will already be in memory so perhaps less of a concern.

Was the offset you were exposing an offset into the file instead of memory?

@mhdawson
Copy link
Member Author

Or simply treat VFS as out-of-scope and let the packager utility / embedded script do it.

I believe we should keep it as out-of-scope at least to start. Having the SEA support be minimal I think is best and we can then possibly adjust that based on feedback later on.

@RaisinTen
Copy link
Contributor

Or simply treat VFS as out-of-scope and let the packager utility / embedded script do it.

But wouldn't that promote monkey-patching in Node.js? I'm not sure if that's necessarily a good thing that we want to promote.

Implementing VFS inside Node.js does not help on its own, since a packager utility is still necessary after all. It is complicated to override Node.js APIs. Plus, there are many edge cases to handle and changes could easily become semver-major. It would significantly increase the maintenance burden for Node.js without tangible benefits (unless something like "node compile" is implemented along with it).

I totally agree with your points but if we support SEA in node, it would mean that we should at least support a way to implement a VFS, even if we don't provide a complete VFS implementation out of the box. The reason is that SEA won't be usable if node can't make sense of the data that was embedded in it. I can think of 2 solutions here:

  • officially support monkey patching fs and other components that get transitively monkey patched, like require() in this case
  • provide an interface that users can override to hook in their own VFS implementations

@RaisinTen
Copy link
Contributor

RaisinTen commented Jul 15, 2022

In the case of using an additional segment it will already be in memory so perhaps less of a concern.

The current approach allocates a v8::BackingStore in main...RaisinTen:node:sea#diff-6c1386cbc6e4a11941649af31a0257d5c3e40b5f41223cdddf87ab2e984b705aR537-R538 that has the same size as the embedded data and then copying the embedded data to it before passing it to the JS layer, so I think this could be problematic.

I wonder if there's a way to prevent this extra allocation. 🤔

@mhdawson
Copy link
Member Author

@RaisinTen If you look at what is in this PR, if isntead of providing the address we provided a buffer created on the native side by passing in the pointer to the binary data it would avoid the copy in a similar manner to now it avoids copying any of the parameters/scripts passed. ie when its in a segement it just uses the memory already loaded when the binary was run.

@mhdawson
Copy link
Member Author

@RaisinTen in particular if you look at the Node-api code here:

napi_status NAPI_CDECL napi_create_external_buffer(napi_env env,

API docs
https://nodejs.org/api/n-api.html#napi_create_external_buffer

That should be creating a buffer without allocating/copying.

It does still mean that if the addition is large it will be loaded into memory. Having the binary data come from a read to the exe on disk might still have an advantage if that is done as needed as opposed to pre-loaded.

@mhdawson
Copy link
Member Author

provide an interface that users can override to hook in their own VFS implementations

This may be being discussed as part of the loaders effort. -> #41076

@RaisinTen
Copy link
Contributor

thanks @mhdawson, removed the extra allocation here by passing a no-op FreeCallback :)

Local<Value> buffer =
    Buffer::New(
        isolate,
        const_cast<char*>(single_executable_application_code),
        single_executable_application_size,
        [](char* data, void* hint) {},
        nullptr)
        .ToLocalChecked();

@mhdawson
Copy link
Member Author

@RaisinTen it would be good to work towards a PR that we might land.

Any chance you could PR changes you think are needed to this PR? I think at very least your changes to provide access to a buffer instead of the address would make sense. It would also be a good way to discuss any others and start heading towards 1 shared patch versus us each working on possibly different ones.

@RaisinTen
Copy link
Contributor

I would be happy to send a PR for this but last week I had arranged a meeting between the Postject team and @jesec where we found out that there are some areas that need some more research, like these ones:

VFS:

  • all of the VFS implementations we were aware of depends on monkey-patching the fs module. Not shipping a full-blown VFS implementation or at least a way to implement it using hooks as you have already mentioned might promote more monkey-patching which might not be a good idea. Another good reason to revamp the way a VFS is implemented in userland is that, it's hard for the userland implementations to stay in sync with the API changes in node. For example, the asar implementation in electron is outdated - https://github.com/electron/electron/blob/06a00b74e817a61f20e2734d50d8eb7bc9b099f6/lib/asar/fs-wrapper.ts.
  • what factors should be considered while choosing a VFS?
  • do we even need to have different VFS implementations or do all use cases point towards a single kind of VFS?

Postject:

  • Since it currently depends on lief which has a significant amount of native code, how should we approach the problem of making it more portable? Do we have to manually rewrite it in JS? Compile it into WASM? Build it from scratch? Something else?
  • Should Postject also support deleting some segments from a binary? That might solve the problem of stripping large portions of the binary that aren't needed for a particular use case. Might be relevant for Node.js needs to lose weight #43811.

... etc.

I believe, having a team in node where we can discuss everything about SEA would be the right thing to do here. I've DM'ed you to learn about the process for creating a team and I'll follow that.

@RaisinTen
Copy link
Contributor

Any chance you could PR changes you think are needed to this PR?

@mhdawson done - #45038 :)

@mhdawson
Copy link
Member Author

mhdawson commented Feb 8, 2023

Closing in favor of #45038

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.