-
Notifications
You must be signed in to change notification settings - Fork 43
Rediscussion for out-of-order CJS exec? #509
Comments
I seem to remember a PR where you implemented a "golden middle": during the binding phase, Node assumes that the exports are as the imports defined them, leaves the execution to the execution phase, and if during the execution it finds that the imports were not the same as the exports, throws an exception. That would sound to me like an (almost) perfect solution. |
@giltayar yes this was the dynamic modules proposal that we put so much time into at TC39 (https://github.com/nodejs/dynamic-modules). The solution, implemented pretty much exactly as you describe, hits on some quite deep complexities:
The dynamic modules solution was to ban export star of commonjs modules for (1) and to patch the namespace for (2), but this approach was shot down at TC39 after multiple attempts at consensus. The simple solution being discussed here can be implemented in a PR tomorrow, and doesn't come with anything close to these complexities. |
So this proposal determined the names of the CommonJS exports by looking at what was imported, e.g. in I remember there was opposition to evaluating the CommonJS in the instantiation phase, and strong opposition to evaluating CommonJS code twice; but what about just parsing it in one of these pre-evaluation phases? We would miss any CommonJS exports that are dynamically named, but perhaps that's okay? That could be a caveat we explain in the docs as a limitation of importing CommonJS into ESM; hopefully that edge case is rare enough that it shouldn't come up much. |
Is there a way to reuse the results of the parse to minimize the perf hit of that? |
I'm not a fan of parsing CJS modules, because it's heuristic, and there would be no rule we could write in the documentation that says when it will work and when not. Worse: due to a change in the parsing code (which will happen), a module that worked in one version of Node, will stop working in another. Too non-deterministic for my taste. |
True, and from TC39's perspective, they may be right. But if you're anyway deeming CJS to be "out of TC39 scope" (which I'm OK with!) then you can definitely go the "dynamic modules" route, which has the added benefit of making the execution order intuitive to the developer: I believe execution order should be "serial" and not an interleaving (which it will be if we execute CJS in the binding phase), whereby first the CJS in the tree will be executed and then the ESM. |
The problem is that dynamic module is the route that does need TC39 support because it requires actual support in the VM (correct me if I'm wrong, Guy). Early execution or something like a top-level-parse would both be possible if we say "CJS exists outside of the standardized module system and its execution isn't considered module execution". Afaik the current state is:
My personal take: I think partial parse would be a reasonable compromise and JDD's The bad news is that interop-require conflated |
To elaborate on this point: All the more advanced solutions have issues once somebody wants to write a userland loader. If I want to serve CJS from HTTP/archive/compiled-on-the-fly, I may have to reproduce the core behavior. Assuming the core behavior is "parse the source and then apply a specific heuristic to determine likely named exports", that quickly becomes a challenge. The same is true for any kind of static analysis tool (compiler, linter, bundler, ...). It's not a blocker but I wanted to call it out. |
We have discussed this many times, at length. I do not see a way forward and do not want to take this trade off given previous discussions.
I think this might be a bit downplayed. You import not just the CJS shallowly but all of its dependencies as well. This means that:
I agree that named import being able to pull from CJS is valuable, but I do not see the research into the affects that this causes to quickly jump on things. Additionally, this trade off doesn't seem to focus on anything except the ability to have named imports from faux modules as CJS. I would like to understand why none of the above are important vs that fact. It would also be helpful to understand why the consumer needs this and the author is in a situation that updates would be too painful to make. Currently transpiled CJS can be loaded into our ESM implementation, and you can use the full interface it provides, albeit with manually reproducing what a transpiler would do to interface with it. Things like https://github.com/addaleax/gen-esm-wrapper are providing facades for build time. Understanding how the transpiled workflows are unable to maintain build time tooling for purposes like this would also be helpful. Right now I do see that people using their transpilers are having problems when they try to stop using them; but, I don't think they need to stop using them. I do see that other problems with dropping transpilers exist (async loading, mocking, magic variables, etc.) and think the best approach at least in the short term is to continue transpiling, and think that trade offs should match transpiler behavior more than this would. It would be a breaking change from transpiler behavior as well. This trade off does not necessarily have a way out in the future since it would be another breaking change to undo this oddity if we introduce it. If we land such a change I would also never want to revert it. We would forever not match historical transpiler semantics nor match ESM semantics of order of evaluation. Currently, the semantics are somewhat easy to understand, even if they do not provide all of the features of faux modules. I value the ability to understand/teach how my program works, and introducing implicit and transitive preemption is far from desirable to me. Additional mitigations for performance like late binding also make this even worse for console based debugging and match transpilers potentially even less. Providing named imports from CJS does not solve a variety of other issues with ESM adoption, and I am unclear on how a variety of other upcoming and existing problems wouldn't be made even worse by taking this trade off. Things like standardized JSON modules are in talks in both TC39 and WHATWG and are being looked at as only providing a default export. Providing named imports from CJS and not from JSON would just be more confusion rather than having at least a semi-uniform approach for integrating values that are not designed for ESM. I do not want to re-discuss all of these issues unless we can discuss the trade offs on both directions. |
I forgot that this also may violate things like tick ordering of queued tasks, but I assume that is understood by the nature of this hoisting. |
@bmeck Does your take above apply to any of the options from @jkrems’ comment, or the overall idea of executing CommonJS out of order? I'm not clear on which proposal you're pointing out problems with. Looking at Jan's options, it sounds like the last one (Default Only) is what we have now; Early Execution is probably disqualified for all the reasons you listed in your comment about it cementing CommonJS primacy; Dynamic Modules is a no-go without TC39 buy-in; and that leaves (Partial) Parse. I don't think the parse option has all the timing and other issues you're discussing, does it? Aside from the parse possibly missing some names, and the issue with |
@GeoffreyBooth parsing does not have the issues of timing, but does have heuristic concerns. See the output of transpilers like: this with |
So others who have thought about this more can propose better solutions, but what I had in mind for a parsing algorithm would be to try to keep it simple enough that it's straightforward to understand. Like:
And . . . that's it? If there are any other really common ways of declaring exports, we could extend our logic to include them, but the idea would be that the docs describe this algorithm in plain English and say that CommonJS exports defined such that they're discoverable via this algorithm will be importable into ESM; and others will not be. So yes, the transpiler output of |
Prior art: https://github.com/rollup/plugins/blob/master/packages/commonjs/src/transform.js |
Regarding perf hit of a double parse. It is my understanding that the fetch / parse / link phase can be done asynchronously. As CJS modules will all be terminal nodes in the graph. While I don't think we are doing this yet we could move the parsing / validation off the main thread. I also think, but could be wrong, they there might be wiggle room in the spec to begin the execution phase before the validation is finished here... This might be a bit more controversial though. I'm also curious if we could potentially code cache the result of the parsing to reuse later... if V8 does not yet have an API for something like that it might be something we could ask about. So, with that said, I think there is likely room to explore the double parse option and ways to architect it so that the performance hit is not significant. Is there a reason it is a non-starter from a spec perspective? |
I do not have any significant objections to parsing but would like to see more numbers around it. To my knowledge parsing to form the export list does not violate any spec expectations (indeed this is what ESM does and across modules no less!), but doing evaluation would be a change. If parsing is done up front, I am unsure if we would need the execution to actually be hoisted. For a variety of runtime dependent stuff like the babel output I linked, if it is setup to be cross module you might be able to fully form the heuristic graph even for |
I have previously always been against parsing for the extra overhead but can possibly get behind it if we do it lazily only on the ESM/CJS boundary. I think it would be important that the parsing is done lazily because if we apply it to all CJS modules loaded that means even Node.js users who don't use ES modules will pay this cost - that is we should remove the injection into the ESM registry to do lazy snapshotting. @bmeck I know you have had a problem with the idea of lazy snapshotting before, would you be able to compromise on that? |
If evaluation order is preserved I can live with it. Might still be
surprising but our current CJS cache in the translator also has a similar
gotcha if you try hard enough.
…On Mon, May 4, 2020, 3:15 PM Guy Bedford ***@***.***> wrote:
I have previously always been against parsing for the extra overhead but
can possibly get behind it if we do it lazily only on the ESM/CJS boundary.
I think it would be important that the parsing is done lazily because if
we apply it to all CJS modules loaded that means even Node.js users who
don't use ES modules will pay this cost - that is we should remove the
injection into the ESM registry to do lazy snapshotting.
@bmeck <https://github.com/bmeck> I know you have had a problem with the
idea of lazy snapshotting before, would you be able to compromise on that?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#509 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABZJI44BT22F65PGFPU4CTRP4O57ANCNFSM4MYNSHUA>
.
|
Great, that sounds like it could well be a possibility then to me, with analysis as @GeoffreyBooth describes in #509 (comment). |
I know we use Acorn for parsing here and there in our internals; is it possible to use V8 itself to parse JavaScript source code and return an abstract syntax tree back to Node? On top of that, could we then submit that AST back into V8 when it’s time to evaluate that code? Rather than asking V8 to parse the source again. If both of these are possible, we could avoid a double parse. Even if only the first is possible, it would be preferable to using Acorn both for speed and to ensure that the parsed AST we work with matches what V8 evaluates. |
V8 does not expose any AST like structure after it parses. The AST is internal to V8 and not a public API. |
/cc @nodejs/v8 who might have opinions |
I'm against the cjs parsing idea. I know of a lot of cjs code that dynamically builds exports which this would fail on. Even if we do go forward with it, we will need to make sure we parse using the exact behaviour of the parser in the engine we use, which acorn doesn't fulfill. Since people also embed node within other engines (for example graaljs), I don't think we can ever reach that. |
Could this be a "perfect is the enemy of good" situation?
If we can statically analyze some exports with minimal performance overhead
is there a reason to not do it?
…On Mon, May 4, 2020 at 6:23 PM Gus Caplan ***@***.***> wrote:
I'm against the cjs parsing idea. I know of a lot of cjs code that
dynamically builds exports which this would fail on. Even if we do go
forward with it, we will need to make sure we parse using the exact
behaviour of the parser in the engine we use, which acorn doesn't fulfill.
Since people also embed node within other engines (for example graaljs), I
don't think we can ever reach that.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#509 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AADZYV6AS4AJ3DACFI7V7NDRP4565ANCNFSM4MYNSHUA>
.
|
@MylesBorins since we know the pattern exists, we are creating a breaking change in cjs where there previously was none (dynamic assignment vs static assignment), so I would not go so far as to say this is "good" |
I wouldn’t call it “breaking” when those values are still reachable from the default export. And I would assume that the entire module.exports object would be on the default export in those cases. |
Is it worth mentioning that JS autocomplete/autoimport in vscode, VS, webstorm, and a bunch of other editors, is essentially predicated on the idea that "most JS cjs module exports are statically analyzable"? And that all of those go out the window when you use more dynamic export patterns, too? Don't get me wrong, I'd still rather see the dynamic modules solution happen via some means, but I wouldn't say cjs static analysis is a wholly untested concept. |
@jkrems if you restructure your cjs module to use dynamically created properties and someone using esm depends on named exports it is by definition a breaking change. the graph will not validate. the application will exit with an error. |
As a slight aside, I think we should avoid calling things/people the "enemy" or implying that something is inferior by nature (good vs perfect). I'm not clear on those statements on how they are actually making any claims about the arguments at hand and it is a bit confusing to read. If such a catch phrase is used can we explain the positions of why something is a better trade off more clearly and without implying that other trade offs are in some way evil? |
@bmeck it actually matches the definition of guess:
The suggestion is that we guess the properties of
For any CJS module, |
It does not claim to do so nor does it claim to do so completely, it only claims to create bindings if assignments in the special forms listed are found. In particular it was stated:
I assume you are claiming this to be guesswork based upon "without sufficient information to be sure of being correct", however if we are only concerned with static analysis it seems we do not fulfill your definition of guesswork. Can you clarify why only obtaining the static values might not be considered correct/still be considered guesswork if our stated goal is to only add support for those special forms?
So to be absolutely clear, since |
Because I can still import the other forms, even if the modules group tries to pretend they don't exist.
|
Can you clarify this? I'm unclear on how hoisting the special forms having 2 ways to access them makes the non-special form still being available a problem. Are you stating if we did not allow access to non-static analysis members of CJS modules it would not be a problem since we would be limited to only having one form of access and not a super/subset problem for a special form vs normal form? We could take an alternate stance that since we cannot identify runtime based values we could ban anything non-static. This is actually an interesting potential alternative trade-off we could pursue since
Yes, but I am unclear on how it create creates the breaking change, can you elaborate? Right now it seems that this would not be a breaking change any any way except adding bindings that did not previously exist.
I do not believe so, they could use the REPL to see by importing a namespace and reflection, they could always use the normal form, they could import a namespace, or use docs as well. More pointedly, the same problem is true for all of ESM bindings that are exported and CJS itself to some extent; coders need to know how they can use dependencies by some means of understanding the module being loaded in both CJS and ESM and this is not changed by any algorithm. Do you have an example of where you need to know if this is available as an export in a way that is true for this algorithm but are unable to reproduce a similar issue if you never knew the format of the dependency? |
It's worth noting that since If you narrow it down to only worrying about own string data properties of |
Because when someone requires/imports a module they aren't thinking "ok now i'm going to import a module that is written in commonjs where the commonjs source uses static assignments to module.exports". They're thinking "ok now i'm going to import express".
// module 1.0.0
module.exports = {
a: require('./x/a'),
b: require('./x/b'),
etc
}; // module 1.0.1
for (const name of readdirSync('./x')) {
module.exports[name] = require(`./x/${name}`);
}
If we assume there is documentation, because without documentation you have to guess no matter what you are importing: The documentation for the above module says |
@ljharb that problem is bigger than cjs (wasm exports, for example) so i'm not sure its worth worrying about that much in this context. |
If they are importing an ESM module don't they have the same issue of needing to know what bindings are exported? I'm confused on why needing to know the exported bindings of a CJS module is problematic but ESM requires it.
Thanks for the example. So your concern is that people may accidentally create breakage by moving to more dynamic forms of code in some non-major version. We have seen some issues of this with how
I don't understand the point about it being contradiction, if Node asserts a clear definition about when something will be flagged as an exported binding in CJS and code goes against it there is nothing we can do. Stating that creating exports using a static form flagging binding names is quite similar to how the export statement also flags binding names. It might be good to have a lint rule for this, and for module authors to test their changes, but it doesn't seem like something that is unable to be understood, tested, or even done through feature detection.
I don't understand this comparison, this algorithm is defining a special form of static syntax. It defines what the named exports for ESM are by using that form. As @weswigham points out there is precedent and tooling relying on similar behavior. Comparing WASM and Source Text Module Records also define their exports for ESM using static syntax. How is using a static syntax in CJS different from the fact that ESM also uses static syntax? |
Isn't the entire point of this issue to figure out what to do about all the cjs that won't be updated, or will be updated but won't be transitioned to support esm? If someone is updating their package and wants to support esm (reads our documentation, for example) we should tell them to use an esm wrapper file. @addaleax even created a tool to do it for you: https://github.com/addaleax/gen-esm-wrapper. We shouldn't tell them to write their cjs code in a way such that node can tease details from the ast. |
Given that, it seems like the real concern is existing packages, not new ones, since the new ones should be using an esm wrapper. Would it be worth partially surveying the npm registry itself, to do a comparison between the runtime "named exports" and those static analysis derived? |
FWIW I've opened a PR to improve error messages for the lack of named exports from CJS modules as a way to improve experience in the mean time. |
Yes, and no. From my point of you it is more importantly about the feasibility of being able to port to ESM from faux modules. If you must port an entire dependency graph, that is much more painful that porting parts of your subgraph. For example, if there is a module graph as follows:
The path to removing transpilation is to get A and B off of a transpiler. Since A and B and C are authored by different people this introduces some ordering concerns. C porting to ESM
All of this summarizes as updating C to be real ESM is quite breaking for consumers of C. Therefore the only safe path for C is to keep implementation in CJS and wrap CJS in ESM. B porting to ESM
A porting to ESM
By alleviating some of the back and forth of porting named imports we can at the least allow for fewer steps in porting implementation code over to ESM for top level apps. Additionally, we can avoid dependencies with their own dependencies from needing to wait on leaf dependencies to upgrade to keep tree shaking as well as reducing code churn. |
@bmeck The consumer breakage concern means that named imports don't actually matter; anyone who wants to avoid breaking people will keep their implementation in CJS, with an ESM wrapper (or, if it's stateless, duplicated, with or without a build tool) for the foreseeable future. The thing that would impact what you're talking about is |
So to summarize, the ultimate ask here is to enable named exports from CommonJS into ESM, e.g. One option that seems to still potentially be viable is to attempt a static analysis of the CommonJS package to determine its exports that way, with the understanding that dynamically-named exports and unconventionally defined exports will be missed. There are concerns to this approach around that it doesn't necessarily catch all CommonJS exports, even if we document that it isn't trying to support all possible CommonJS exports; and that what exports are discovered might change over time if we change the static analysis algorithm. The one other option I remember ever being discussed was double execution of CommonJS modules, with the initial execution during the linking (?) phase being only for the purpose of determining the exports. This was dismissed because of the possibility of side effects, e.g. a file being created or deleted twice or text being printed to the console twice, etc. However it occurs to me that there might be a way around that problem: what if there was a way to evaluate CommonJS code without causing side effects? Like if you could evaluate a CommonJS package but sandbox its access to the file system or network, and throw away its Even that has a caveat in that some packages might behave differently in such an environment versus the unrestricted access-to-everything environment that Node normally provides, so there would still be the possibility of inconsistency between the exports available in an all-CommonJS environment and the ones discovered for use in ESM. Ultimately I think the way to settle this is for someone to write the code for either of these approaches, and try it out on the top 1000 or so npm packages, comparing the detected named exports via this new method against what's found via Ultimately the question isn't whether a detection method is good enough, or if we're willing to accept less than 100% accuracy; it's a question of which user experience is the least bad of our options. The current user experience, where only the default export is supported from CommonJS packages, is pretty bad and people are complaining about it. Would a different user experience that supplied named exports but with exceptions be worse? (Obviously, it depends on the exceptions and how many there are and how common each exception is.) I think the only way to reasonably answer this question is to write the code and run it on as wide a dataset as possible, so that we can make an informed decision. Having the data would also provide us with a good answer to users when they come asking why they can only get a default export, or why dynamic exports aren't supported, or why ESM has whatever incompatibility they find as compared with pure CommonJS or a transpiler environment. |
If ESM modules were to also be allowed to execute before full module graph linking, would that make on-import CJS execution more palatable? |
I mean, arguably they already can, as they may have been cached by another module graph which executed prior to the currently resolving one. |
Right, that's (one of the reasons) why I ask. |
It should be noted that a few years ago while google was trying to ship ESM and testing the viability of it in real world productions workflows in the browser, they wanted to do out of order evaluation of purely ESM. They have a document at https://docs.google.com/document/d/1MJK0zigKbH4WFCKcHsWwFAzpU_DZppEAOpYJlIW7M7E/view ; however, as the document explains they chose not to do so for a variety of reasons. This also came up again this year with a similar result. |
I think the problem is less that CJS may execute before ESM but that it always executes before any ESM file may execute in the graph. That means that certain kinds of code (bootstrapping, polyfills) can only ever be written as ESM if there's not a single CJS file involved in the entire program. Which is... a tall order. |
Removing from module agenda for right now, feel free to re add |
@guybedford are we ok closing this? |
I'd like to reopen the discussion of out-of-order CJS exec one last time, given how much criticism there has been over the lack of named exports for CommonJS in ESM.
I can revive a former PR for this which is still sitting around somewhere no problem.
The two "problematic semantics" as far as this group is concerned are:
I have no problem with either of these semantics and never have.
Will other members of this group finally reconsider their positions in the name of practicality for users?
This scenario is definitely one of the biggest catches for users that we can still change. It could be worth re-hashing this old number even if that's a small possibility.
The text was updated successfully, but these errors were encountered: