Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs hooks #63

Closed
bengl opened this issue Mar 6, 2020 · 11 comments
Closed

fs hooks #63

bengl opened this issue Mar 6, 2020 · 11 comments

Comments

@bengl
Copy link
Member

bengl commented Mar 6, 2020

This was initially brought up in the 2020-02-21 meeting.

The proposal here is for adding a way to add "hooks" to fs, allowing user code to alter its behavior. fs operations would be intercepted, giving an opportunity to do things like change paths, provide buffers or streams corresponding to filenames, etc.

There are a number of reasons why folks may want to alter the behaviour of fs. Some I can think of quickly:

  1. Tools that build up a fake directory structure and access packed files or files that are located elsewhere. Examples: yarn --pnp/yarn@2, tink.
  2. On-the-fly transpiling. While loader hooks give an avenue for ESM, this doesn't (yet) help for CJS modules (IIRC). Also, not all transpilation is strictly for code to run in the Node.js runtime.
  3. Policy enforcement tools may want to prevent access to certain files or change their contents on the fly to obfuscate contents.
  4. Mad-science things like ipfs can be used more transparently.
  5. APMs.

A way to intercept fs would enable these sorts of use cases and probably a bunch of others I couldn't think of. This could be implemented as hooks, similar to async_hooks. We would, of course, want this to have zero (or nearly zero) overhead when not enabled.

Potentially Related: #48

Prior art

FUSE exists today fitting a similar use case at the system level, and could be used as a template for building our own API.

Alternatives existing today

In order to do anything like this today, folks are shimming fs directly. While this is generally workable, it's certainly not trivial to do it correctly, and it often results in rather hard-to-read implementations.

@ljharb
Copy link
Member

ljharb commented Mar 6, 2020

There should also be a way to prevent second-run code from doing this; currently i can freeze fs or save a reference to the functions i need.

@wesleytodd
Copy link
Member

Another use cases which would help with, bundling applications. Like pkg or #3.

Also, similar to PnP/tink, things like Yeoman and Gulp also had a synthetic filesystem implementation. To the point of "not trivial to do it correctly, and it often results in rather hard-to-read implementations", both of these projects had a lot to be desired, and as an end-user I had many issues with their fs implementations (maybe they have gotten better since then, I used them primarily ~5 years ago).

@vdeturckheim
Copy link
Member

Thanks @boneskull for the heads up!

This is not totally the same as the Audit Hooks initiative, but there might be some overlap.
In the current goals, audit hooks do not allow rewriting things on the fly but cover at least item (3):

Policy enforcement tools may want to prevent access to certain files or change their contents on the fly to obfuscate contents.

Also, it aims at providing more hooks than fs only and to be also usable by module authors.

Would it make sense to discuss this further if the goals are not too orthogonals? I am working on a PoC for audit hooks starting with fs (but will be soon followed by eval hooking): nodejs/node@master...vdeturckheim:audit_hooks

@arcanis
Copy link

arcanis commented Apr 18, 2020

I couldn't attend to the meeting yesterday btw, but my main feedback regarding custom fs layers is that it's difficult to test them. First you need to write the tests, of course, but then you need to keep them up-to-date with whatever new functions or options Node releases. This is where the main problem lies: if you forget to implement a property, the option will just be silently ignored and the code will misbehave.

There are two ways I could see this being improved:

  • First, extracting the fs testsuites outside of the Node codebase, in a way that would make them usable by third-party layers, would help ensure the quality of layers.

  • Second, the fs API could become more defensive and throw exceptions when receiving unsupported options (instead of silently ignoring them). It wouldn't help with compatibility, but would make such problems much easier to debug, leading to faster fixes.

Also one funny thing I discovered during my journey: it's currently impossible to fully intercept the fs calls that Node runs during the commonjs module resolution. This is because Node doesn't actually use fs.statSync, it uses a unique builtin primitive which we can't overload.

@theKashey
Copy link

Just yesterday we went into a similar problem for "module federation" - download a "module" and execute it as a string, which is kind of FUSE case, with remote file system.

  • download file from another source
  • create a Module
  • put that module into the module cache
  • require that file. 💥Bang! the module resolution algorithm is not able to find the module.

So we fixed the problem by patching Module. _resolveFilename to return a fileName if it is found in the cache. The "root cause" was not the calls to the file system, which was not able to find a virtual file, but the resolution algorithms itself.

Let's check what actually could be used

  • Tools that build up a fake directory structure and access packed files or files that are located elsewhere. Examples: yarn --pnp/yarn@2, tink.

module resolution.

  • On-the-fly transpiling. While loader hooks give an avenue for ESM, this doesn't (yet) help for CJS modules (IIRC). Also, not all transpilation is strictly for code to run in the Node.js runtime.

module resolution or even extension resolvers, like how babel/tsc currently working

  • Policy enforcement tools may want to prevent access to certain files or change their contents on the fly to obfuscate contents.

module resolution

So all needs is a loader hook working for ESM as well as CSJ modules.

@vdeturckheim
Copy link
Member

Quick status here as I won't be able to join tonight's meeting.
I have been side tracked a bit by an ES module topic related to APM/RASP.
Right now, I have a prototype working on fs on my branch but I am not happy with the API quite yet. Also, I am trying to consider ways to do it in a way that does not add up too much complexity to the existing code.

Right now it works by monkeypatching internals of the fs module (for instance bindings.open) when a hook is registered (the goal is to have 0 impact when no hook is enabled). Then, it forwards the value of arguments to the hook. A hook is registered by using symbols exposed by the fs module.

This works but I am afraid the API is hard to understand (and therefore use right). Also, It does not allow to "make fs lie" (call the callback function with an arbitrary value and prevent the unpatched call). This could be done by having the hook responding with a given Symbol (Symbol('skip') for instance). But this seems to be something we want to build with caution.
The adavantage is that a similar approach actually would work for everything we would like to make possible to hook into.

@boneskull
Copy link
Collaborator

Ref: nodejs/diagnostics#180

@Qard
Copy link
Member

Qard commented May 29, 2020

There's likely to be some overlap with nodejs/diagnostics#180, so it might be worth unifying the concepts.

@Qard
Copy link
Member

Qard commented Jun 1, 2020

So at the tooling working group meeting there was discussion about having a meeting for discussing how the fs hooks, policies, and diagnostics channel efforts are related. Is someone arranging that or do we need someone to lead that effort?

@vdeturckheim
Copy link
Member

@Qard good idea. I won't have time to schedule anything before next week however.

@bengl
Copy link
Member Author

bengl commented Jul 10, 2020

Discussion on this has generally been taken over by the diagnostic hooks meeting and the summit, and the effort is taking place in the Diagnostics WG (nodejs/diagnostics#401) so I'll close this issue.

@bengl bengl closed this as completed Jul 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants