src: add require transform pipeline #12349
This feature simply passes the content and filename through a transformer function which could be used for things like applying AST transforms to the source before it is run.
lib/internal/bootstrap_node.js
Outdated
@@ -474,6 +478,7 @@
}

NativeModule._source = process.binding('natives');
NativeModule.preloadModuleWrap = null;
Personally I would set it to a null transform (a => a) and remove the if in line 552.
I like polymorphism 😊
I just went with the null so there's no extra call in the vast majority of cases when it's not in use. Probably not a big deal either way though.
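The trade-off in this exchange can be sketched side by side. This is only an illustration under assumptions: `hookA`, `hookB`, `prepareWithNullCheck` and `prepareWithIdentity` are hypothetical names, not code from the PR; only the `null` default and the `a => a` identity come from the thread.

```javascript
// Option 1: null sentinel (the PR's choice). One comparison, and no
// function call at all while no hook is installed.
const hookA = { preloadModuleWrap: null };
function prepareWithNullCheck(content, filename) {
  if (hookA.preloadModuleWrap !== null) {
    content = hookA.preloadModuleWrap(content, filename);
  }
  return content;
}

// Option 2: identity default (the reviewer's suggestion). No branch at the
// call site, but every load pays for one function call.
const hookB = { preloadModuleWrap: (content) => content };
function prepareWithIdentity(content, filename) {
  return hookB.preloadModuleWrap(content, filename);
}
```

Both variants return the content unchanged until a hook is assigned, so the difference is purely about call-site shape and per-load cost.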
lib/module.js
Outdated
@@ -537,6 +537,10 @@ var resolvedArgv;
// the file.
// Returns exception, if any.
Module.prototype._compile = function(content, filename) {
  if (NativeModule.preloadModuleWrap !== null) {
see 1
I like this "middleware" approach, it's what made |
Or maybe it should be |
lacks docs so hard to understand how this would be used
lacks motivation in the PR description, some examples of how it would be used, and why this way is better would be helpful
Yep, no docs yet because:
For an example, there's https://github.com/nodejs/node/pull/12349/files#diff-38ba9d7bed72af8741677acd3cb7ec2c which adds an extra line to the text before it gets parsed. More advanced uses would be to feed it into an AST parser and make modifications to inject code coverage or instrumentation hooks, then regenerate the code text to pass along to the parser. As for why it's better: currently transpilers, code coverage tools and almost every APM agent make extensive patches to the require function. APM is especially problematic because of the need for instrumenting core. Currently most APM providers monkey-patch everything at runtime, which can be incredibly fragile. If one wanted to try AST transform based instrumentation on userland modules they could patch
I totally understand the desire to avoid documenting something that might not happen, but in the absence of docs, it's also hard to get feedback.
A good reason to use on('moduleLoading') |
@sam-github Fair point. I'll write something up for it today. @refack An event emitter would likely just complicate things. You can't return from an emitter, so it'd need separate channel to propagate the changed values back and I feel like that'd just get messy fast. |
@sam-github I've added some documentation, if you'd like to have another look. |
Hmm...I'm wondering if it might be a good balance of performance and flexibility to instead have content = transforms.reduce((content, transform) => {
return transform(content, filename)
}, content) |
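A runnable sketch of the `reduce`-based pipeline suggested above follows. The registry and function names (`transforms`, `addTransform`, `removeTransform`, `applyTransforms`) mirror those that appear later in the PR, but this standalone version and the two example transforms are assumptions made for illustration; it does not hook into `require()`.

```javascript
// Hypothetical standalone registry; in the PR this state lives inside core.
const transforms = [];

function addTransform(transform) {
  transforms.push(transform);
}

function removeTransform(transform) {
  const index = transforms.indexOf(transform);
  if (index !== -1) transforms.splice(index, 1);
}

// Feed the output of each transform into the next, in registration order.
function applyTransforms(content, filename) {
  return transforms.reduce(
    (current, transform) => transform(current, filename),
    content
  );
}

// Example transforms (illustrative only).
const addBanner = (content, filename) =>
  `// loaded from ${filename}\n${content}`;
const addStrict = (content) => `'use strict';\n${content}`;

addTransform(addBanner);
addTransform(addStrict);

const output = applyTransforms('module.exports = 1;', 'example.js');
console.log(output);
```

Because each transform returns its result, no side channel is needed to pass the modified text along, which is the objection raised against a plain event emitter.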
ironically so var e = {filename, source}
this.emit('moduleLoading', e)
var ret = e.source //works |
Yep, I'm aware. I hadn't thought of the object modifying approach...modifying an object in code that doesn't own it seems a bit questionable to me though, to be honest. 😟 |
It's not a trick, it's a valid and common pattern, like express's middleware, or DOM events ( |
3fd4099 to 5df01fe
I refactored to use How does it look? |
@nodejs/diagnostics |
If I remember correctly |
I don't think it was ever possible to patch built-in modules with As for why it was made private, I don't think it was ever strictly intended to be "public" in the first place. It just happened to not use an underscore in the early days of node, so people just assumed it was fair game to mess with it. Also, |
    return next;
  }, content);
}
So you reimplemented EventEmitter, only with an explicit return. Do you really feel it's worth the duplication?
You could instead just wrap it, and win!
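const EventEmitter = require('events');
var ee = new EventEmitter();
// Maps each user transform to its listener wrapper so it can be removed later.
var tMap = new WeakMap();
function applyTransforms(content, filename) {
  var e = {content, filename};
  // Listeners run synchronously and mutate e.content in place.
  ee.emit('moduleLoading', e);
  return e.content;
}
function addTransform(transform) {
  var t = (e) => { e.content = transform(e.content, e.filename); };
  // WeakMap exposes set(), not add().
  tMap.set(transform, t);
  ee.addListener('moduleLoading', t);
}
function removeTransform(transform) {
  ee.removeListener('moduleLoading', tMap.get(transform));
}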
Event emitters are much bigger and more complicated than this needs.
That event emitter implementation has a lot more closures and object allocations. Event emitters also inherently have a much more complicated lookup scheme.
As require is a major startup hot-path, it's best not to put event emitters in the middle of that. Keep in mind this code would typically run many thousands of times during app startup.
premature optimization is the root of all evil...
IMHO code duplication is worse than a minor performance impact. I'm sure the actual compile is orders of magnitude heavier.
Also EE are super fast when there are no listeners.
Let's test and see...
Certainly code duplication is a thing to avoid generally, but the code to wrap an event emitter around it has just as much surface area as the custom implementation itself, without even taking into account all the extra stuff running inside the event emitter implementation. It's also a lot more difficult for someone unfamiliar with the code to understand at first glance. Were this something more complicated, an event emitter implementation would definitely make more sense, but it has little advantage here.
Also, the event emitter implementation is 2.5-3 times slower and has 11 times the memory footprint. It's not premature optimization, it's carefully considered and diligently measured optimization. 😉
Empirical measurement wins (even named my first company Empeeric)!
Changes were discussed, but PR is not yet complete...
I just wanted to elaborate a bit on my own personal reasoning for this PR: My use for this is as an alternative to monkey-patch-based performance monitoring instrumentation. Typically in a performance monitoring agent, it will apply instrumentation by wrapping functions in closures that add behaviour before running the original function and before running the original callback. This creates many, many closures for every function that is wrapped. In a heavily instrumented environment, the extra performance impact at high load can become problematic. There's also a lot of risk in wrapping code at runtime due to the potential for shifting, added or removed parameters, function.length checks, accidentally triggering getters/setters, inability to reliably inspect internal behaviour programmatically to inform decisions on where to place instrumentation hooks, etc. My hope is to provide a very usable alternative to the typical monkey-patching style through AST manipulation. This would eliminate most code behaviour quirks and would remove all those closures, taking a lot of extra indirection out of potentially numerous hot-paths in the execution of a user's app. This would enable much safer and lower impact real-time production monitoring. |
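One of the runtime-wrapping risks named above, `function.length` checks, can be shown in a few lines. This is a minimal sketch, assuming a generic closure-based wrapper; `wrap`, `handler` and `calls` are hypothetical names and do not come from any particular APM agent.

```javascript
// A typical runtime wrapper: run a hook, then call the original function.
function wrap(original, before) {
  return function wrapped(...args) {
    before(args);
    return original.apply(this, args);
  };
}

// A function whose declared arity (3) callers may inspect.
function handler(req, res, next) {
  return 'handled';
}

const calls = [];
const instrumented = wrap(handler, (args) => calls.push(args.length));

// The wrapper behaves the same when called...
instrumented('req', 'res');

// ...but the declared arity is lost, which can break code that branches
// on fn.length.
console.log(handler.length);      // 3
console.log(instrumented.length); // 0
```

A source-level transform would insert the hook inside `handler` itself, so the function identity and arity seen by callers stay unchanged and no extra closure is allocated per wrapped function.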
@thefourtheye Oh, good catch. I'll see if I can find some time to fix it later tonight (visiting family for Easter right now). Any thoughts on the concept though? |
5df01fe to 22e91bc
Fixed the missing semi-colons and type info in the docs. |
lib/internal/module.js
Outdated
const transforms = [];

function addTransform(transform) {
  transforms.push(transform);
Assert that transform is a function here.
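The requested check could look roughly like the following. This is a sketch, not the PR's eventual code: it throws a plain `TypeError`, and the error message wording is an assumption.

```javascript
const transforms = [];

function addTransform(transform) {
  // Reject non-functions at registration time, so a bad argument fails here
  // instead of later, during some unrelated require() call.
  if (typeof transform !== 'function') {
    throw new TypeError('"transform" argument must be a function');
  }
  transforms.push(transform);
}
```

Validating on registration keeps the hot path in the compile step free of per-call type checks on the transform itself.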
doc/api/globals.md
Outdated
@@ -245,6 +245,55 @@ added: v0.3.0
Use the internal `require()` machinery to look up the location of a module,
but rather than loading the module, just return the resolved filename.

### require.addTransform(transform)

* `transform` {function} A function to use to transform module text given the
{Function}
doc/api/globals.md
Outdated
### require.removeTransform(transform)

* `transform` {function} A function previously given to `require.addTransform()`
Ditto
return transforms.reduce((content, transform) => {
  const next = transform(content, filename);
  if (typeof next !== 'string') {
    throw new Error('Module transforms must return the modified content');
Is there a way to make it known which module transform is faulty?
Could always attach the failing function to the error object before throwing.
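Attaching the failing function to the error might be sketched as below. The property names `transform` and `filename` on the error object are hypothetical, and `applyTransforms` takes the transform list as a parameter here only to keep the example self-contained.

```javascript
function applyTransforms(transforms, content, filename) {
  return transforms.reduce((current, transform) => {
    const next = transform(current, filename);
    if (typeof next !== 'string') {
      const err = new Error(
        'Module transforms must return the modified content'
      );
      // Hypothetical properties: let the caller see which transform
      // misbehaved and for which file.
      err.transform = transform;
      err.filename = filename;
      throw err;
    }
    return next;
  }, content);
}
```

A caller catching the error can then inspect `err.transform.name` (when the function is named) to report the faulty transform.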
I'm okay with the idea, and I recall some third-party module on npm doing something similar. However:
Perhaps there should be some way to alter the extension too, which could change the behaviour of the pipeline? Not sure how that'd work exactly... |
22e91bc to cc6083d
This is a simple pipeline to enable applying transform functions to loaded JavaScript files for various purposes such as transpiling, creating test mocks, recording code coverage or applying custom instrumentation.
cc6083d to 7f3a29a
```js
require.addTransform((content, filepath) => {
filename?
Seems like it would "just work" if the transforms did all the work themselves - both generating the sourcemap (obviously), but also reading and interpreting any incoming sourcemaps. The onus is on the transform to generate the sourcemap data anyway; it couldn't be done outside the transform. I could imagine a couple of ways sourcemap handling might be made easier for transforms, but it seems pretty clear we'd have to play with this quite a bit to figure out how to do it right in practice. E.g., I'd bet you don't want "full" sourcemap support in production, but might want enough for stack trace generation. But as a general "extensibility" concern, I wonder if the incoming parameters and outgoing response of the transform should be ... more extendable. E.g., have the transform return
I'm really excited about this. Hopefully, we can add it without any performance impact
@@ -537,6 +537,8 @@ var resolvedArgv;
// the file.
// Returns exception, if any.
Module.prototype._compile = function(content, filename) {
  content = internalModule.applyTransforms(content, filename);
I wonder what the performance impact is here.
@@ -245,6 +245,49 @@ added: v0.3.0
Use the internal `require()` machinery to look up the location of a module,
but rather than loading the module, just return the resolved filename.

### require.addTransform(transform)
Does this work on builtin modules?
I can't really say that I'm a fan of this, largely because I do not see why it needs to be in core. Perhaps I just need to understand the use cases more. There are also questions about how this would interplay with ES6 modules, debugging, monkey patching, and so on. An EPS might be a good way to describe the feature and use cases in depth while you're working on this, so that we can better understand it. |
I had the same thought that an EPS would be a good way to doc/explain why it needs to be in core. Having said that, I also understand the desire to get feedback before deciding to do that, and code is often the easiest way to do that. So just to say, if you do want to propose adding it, it's a significant enough addition to warrant an EPS, but it is interesting to discuss in advance as well. What you have looks interesting to me...
Yep, I agree with the EPS suggestion. I just wasn't sure yet exactly what the shape of a proposal would be, so I put together some code that solves my particular need as a proof-of-concept. I don't mind if this PR gets rejected, just wanted to spark some conversation. 👍 |
Since this has been referred to the EP process, I'm going to remove the |
@Qard I know that this is blocked by the EPS but as there is no progress in that thread for a long time - is this something you still want to follow up on? |
I still want the feature myself, but it doesn't seem like there's much interest in it. Maybe worth reevaluating after ESM has been out for a bit. |
@Qard ok, I am not certain what we should do with the PR in the meanwhile. Would you want to keep it open or is it fine to close it? |
Closing. I'll re-evaluate later to see how it applies with ESM stuff now existing. |
This feature simply passes module text content and filename through a transformer function which could be used for things like applying AST transforms to the source before it is compiled.
This is just an idea I'm playing with, I could use some feedback. Is this something people want? Is this a reasonable approach?
The ability to intercept the text content would make it easier to do several things like transpiling code, injecting code coverage hooks or applying custom instrumentation.
Checklist
make -j4 test (UNIX), or vcbuild test (Windows) passes
Affected core subsystem(s)
src