This repository has been archived by the owner on Feb 19, 2018. It is now read-only.

Proposal: Ecosystem: CoffeeScript in Prettier #92

Closed
GeoffreyBooth opened this issue Nov 25, 2017 · 19 comments

Comments

@GeoffreyBooth
Collaborator

An oft-requested improvement to the CoffeeScript ecosystem is support for the language in Prettier. Our own @lydell is also a maintainer of that project, so I asked him what would be required to make it happen. He boiled it down to two major tasks:

Produce a detailed abstract syntax tree (AST)

Something would need to be able to produce a JSON representation of the nodes of the abstract syntax tree (AST). An AST is a representation of all the parts of syntax of a program, like AssignmentExpression; the site astexplorer.net has great examples. You can see a simplified version of CoffeeScript’s AST by running coffee --nodes test.coffee. A fuller version can be seen by going to http://asaayers.github.io/clfiddle/ and clicking the AST tab, then one of the nodes in the tree.

Since the CoffeeScript compiler itself already has the --nodes option, it seems logical to me to extend it to produce this JSON-based output. Currently the Node API for the coffeescript module doesn’t support a nodes option, so we could add one, and have its output be plain JavaScript objects that could be JSON.stringify’ed.

That wouldn’t be the end of the job, however. We would also need to ensure that this AST output is complete, with the same amount of information as the original source code, such that you could reconstruct the original source using nothing but this AST. In the CoffeeScript compiler, some simplifications are made at the lexer stage, before the nodes get generated: numbers lose their original 0x, 0o or 0b prefix (if any), whitespace is lost in multiline strings, multiline regexes are turned into a RegExp() call, etc. These changes would need to be refactored to happen in nodes.coffee, or additional detail about the node would need to be saved as a property on the node (like we currently tack on the source maps location data or comments). The goal is that this JSON representation of the source code could then be used to output new source code, formatted as Prettier deems it should be formatted. Which leads us to:
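To make the "save additional detail as a property on the node" option concrete, here is a minimal JavaScript sketch. The node shape, the `extra.raw` property name and the `printNumber` helper are all assumptions made up for illustration; they are not what the compiler emits today.

```javascript
// Hypothetical node for the CoffeeScript source `0xFF`.
// `value` is the simplified form; `extra.raw` (an assumed property name)
// keeps the original spelling so a printer could reproduce the source.
const hexNode = {
  type: 'NumberLiteral',
  value: '255',
  extra: { raw: '0xFF' },
};

// Prefer the preserved spelling when present; otherwise fall back to the value.
function printNumber(node) {
  return node.extra && node.extra.raw !== undefined ? node.extra.raw : node.value;
}

console.log(printNumber(hexNode)); // 0xFF
console.log(printNumber({ type: 'NumberLiteral', value: '42' })); // 42
```

The same pattern would presumably apply to the other lossy cases listed above, such as multiline strings and regexes.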

Write a CoffeeScript code generator

Once a JSON version of the AST is available, we’ll need some function that takes it as input and produces a string of CoffeeScript source code as output. You’ve probably seen one of these already: js2coffee takes an AST produced by a JavaScript parser and creates CoffeeScript source code from those nodes. The function that does this is called a code generator, and js2coffee’s is here. With dependencies, it’s over a thousand lines of code. There’s one other CoffeeScript code generator that I’m aware of, cscodegen produced by the CoffeeScriptRedux effort, but it hasn’t been updated since 2012.

Prettier is itself a code generator. If it were to support CoffeeScript, a new code generator would need to be written as part of Prettier itself. Within the Prettier codebase, the code generators for supported languages are in src/printer*.js. One code generator supports all of JavaScript plus TypeScript and Flow, and it’s plain printer.js. It’s 5,000 lines of code. Writing a similar generator for CoffeeScript might not be much simpler, but you would be able to use js2coffee and cscodegen’s codebases as reference (not to mention Prettier’s JavaScript code generator) so you’re not starting from scratch.

So . . .

I would be willing to tackle the first task, outputting a detailed JSON AST, if one or more volunteers were up for the second task. Does anyone desire CoffeeScript support in Prettier strongly enough to invest the time in writing a quality CoffeeScript code generator?

@GeoffreyBooth
Collaborator Author

By the way, either of these tasks is also an investment in the extensibility of CoffeeScript in the future. The js2coffee project was possible in the first place because JavaScript has several excellent parsers that produce detailed ASTs. If js2coffee’s CoffeeScript code generation part could be replaced with a better code generator, js2coffee would be able to support the latest CoffeeScript features (and be more adaptable to future improvements). Coffeelint would be capable of greater things if it had a better AST to work with. And on and on.

CoffeeScriptRedux got so many things right, it’s a shame it never got completed. One of its insights was that code generation should be its own module that took an AST as input. (It supported both cscodegen to generate CoffeeScript and escodegen to generate JavaScript.) This is also how Babel works, with babel-generator taking an AST and producing JavaScript. This modularity is one of the keys to Babel’s success, and the growth of the ecosystem around it. If the CoffeeScript compiler produced an AST compatible with Babel, the CoffeeScript compiler could outsource the JavaScript code generation to that and therefore jettison nodes.coffee’s 4,000 lines, a quarter of CoffeeScript’s entire codebase!

@lydell

lydell commented Nov 25, 2017

Well summarized!

I forgot to mention that every src/printer*.js file in Prettier basically just defines a single function with a big switch statement in it – with one case for every AST node type. In other words, all the function is doing is saying “Here is how you print a number, here is how you print a string, here is how you print an array, here is how you print an if statement, etc.”. Some cases call this function recursively, such as the array case for every item of the array.
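As a rough illustration of that shape, here is a minimal sketch of a switch-based printer in JavaScript. The node type names follow CoffeeScript's node classes, but the exact property names and the `print` function itself are assumptions for illustration. Prettier's real printers also build an intermediate document that handles line breaking, rather than returning strings directly as this sketch does.

```javascript
// Hypothetical printer: one case per node type, recursing into child nodes.
function print(node) {
  switch (node.type) {
    case 'Block':
      // Print each expression on its own line.
      return node.expressions.map(print).join('\n');
    case 'Assign':
      return `${print(node.variable)} = ${print(node.value)}`;
    case 'Arr':
      // The array case calls print() recursively for every item.
      return `[${node.objects.map(print).join(', ')}]`;
    case 'IdentifierLiteral':
    case 'NumberLiteral':
      // Leaf nodes print their stored text as-is.
      return node.value;
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

// Hand-written AST for the source `answers = [42, 43]`.
const ast = {
  type: 'Block',
  expressions: [
    {
      type: 'Assign',
      variable: { type: 'IdentifierLiteral', value: 'answers' },
      value: {
        type: 'Arr',
        objects: [
          { type: 'NumberLiteral', value: '42' },
          { type: 'NumberLiteral', value: '43' },
        ],
      },
    },
  ],
};

console.log(print(ast)); // answers = [42, 43]
```

Hitting the `default` branch is one way to discover which node types, or which missing details on a node, still need work on the parser side.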

One way to go about this would be to start writing a src/printer-coffeescript.js file, and see where you bump into problems. Then, go and improve the CoffeeScript parser around all those problems.

@vendethiel

CSR did get a lot right, yes. We probably need Concrete Syntax Tree, but our lexer/rewriter code is... hairy to say the least.

@GeoffreyBooth
Collaborator Author

What’s a “concrete” syntax tree?

@vendethiel

https://eli.thegreenplace.net/2009/02/16/abstract-vs-concrete-syntax-trees/

A CST is very similar to an AST, but it keeps more parse information around (and doesn't remove seemingly useless nodes).

@GeoffreyBooth
Collaborator Author

What I was thinking we would do is generate an AST as similar to Babel's as possible. Then we can crib code from the main JavaScript code path in Prettier to parse it. It'll also be useful for working with other tools to have as “standard” of an AST as possible. We would add extra properties to the AST nodes to preserve the info we would need to generate CoffeeScript again from the tree.

Are you interested in helping tackle this?

@vendethiel

The project then becomes pretty much "rewrite the compiler" ... or "upgrade CSR to support all the things CS2 now supports", which is an insane amount of work.
Removing JS code generation from the CS compiler is a very noble goal, but a tremendously time-consuming one.

@GeoffreyBooth
Collaborator Author

That's more ambitious than I had in mind. I was thinking of just doing what is proposed at the top of this thread: create a way for the compiler to output an AST as JSON, similar to how it currently outputs nodes data as text via --nodes; and create a printer file in Prettier for CoffeeScript, similar to its existing ones for JavaScript, TypeScript, Markdown and so on.

@vendethiel

We're discarding too much info imho. Just for implicit objects etc.

@GeoffreyBooth
Collaborator Author

Right, that's what would need to be added to the nodes as extra info. Stuff like whether a boolean was written as true versus on or yes, etc. would all need to be added to the AST.

@GeoffreyBooth
Collaborator Author

Okay, I’ve taken the first few baby steps in getting CoffeeScript to produce an AST. Check out this branch, then create a test.coffee at the root of the repo with whatever CoffeeScript code you want to see an AST of, then run:

coffee -e "console.log require('util').inspect require('./lib/coffeescript').compile(require('fs').readFileSync('./test.coffee').toString(), nodes: yes), {colors: yes, depth: 10}"

You should see pretty-printed JSON like this:

{ type: 'Block',
  loc: { start: { line: 0, column: 0 }, end: { line: 0, column: 10 } },
  expressions:
   [ { type: 'Assign',
       loc: { start: { line: 0, column: 0 }, end: { line: 0, column: 10 } },
       variable:
        { type: 'IdentifierLiteral',
          loc: { start: { line: 0, column: 0 }, end: { line: 0, column: 5 } },
          value: 'answer' },
       value:
        { type: 'NumberLiteral',
          loc: { start: { line: 0, column: 9 }, end: { line: 0, column: 10 } },
          value: '42' } } ] }

This is the same output as the CLI’s --nodes, in JSON form, with:

  • the node class names as type, which is how the Babel AST has them
  • the location data structured the way Babel does it
  • any primitive (boolean, number, string) properties included, as these are serializable as is into JSON
  • children included recursively

This was all done by adding just one method on the base node class, and for many nodes this is all the data we need. For more complicated nodes, the next step is to override this method to add additional serializable properties to flesh out the objects for those nodes; and in some of those cases, we’ll have to reach back to the lexer to make sure that currently-discarded data is forwarded along into nodes.coffee. Eventually, all objects for all node types should contain complete enough data that we can recreate the original source from this AST. There’s a ways to go to get from here to there, but it’s very doable. The above took less than 50 lines of code.
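For readers who want a feel for what such a base-class method might look like, here is a minimal JavaScript sketch of the four points listed above. It is not the code from the branch: the class layout, the `toPlainObject` name and the constructors are assumptions for illustration. The `first_line`/`first_column`/`last_line`/`last_column` keys mirror the compiler's location data format.

```javascript
// Hypothetical base class standing in for the one in nodes.coffee.
class Base {
  // Convert this node into a plain, JSON-serializable object.
  toPlainObject() {
    // The node class name becomes `type`.
    const out = { type: this.constructor.name };
    if (this.locationData) {
      // Restructure the location data into a start/end `loc` object.
      const { first_line, first_column, last_line, last_column } = this.locationData;
      out.loc = {
        start: { line: first_line, column: first_column },
        end: { line: last_line, column: last_column },
      };
    }
    for (const [key, value] of Object.entries(this)) {
      if (key === 'locationData') continue;
      if (['boolean', 'number', 'string'].includes(typeof value)) {
        out[key] = value; // primitives serialize as-is
      } else if (value instanceof Base) {
        out[key] = value.toPlainObject(); // single child node
      } else if (Array.isArray(value)) {
        // List of child nodes; non-node entries are skipped in this sketch.
        out[key] = value.filter((v) => v instanceof Base).map((v) => v.toPlainObject());
      }
    }
    return out;
  }
}

class Literal extends Base {
  constructor(value, locationData) {
    super();
    this.value = value;
    this.locationData = locationData;
  }
}
class IdentifierLiteral extends Literal {}
class NumberLiteral extends Literal {}

class Assign extends Base {
  constructor(variable, value, locationData) {
    super();
    this.variable = variable;
    this.value = value;
    this.locationData = locationData;
  }
}

// Small helper for single-line location data.
const at = (first_column, last_column) =>
  ({ first_line: 0, first_column, last_line: 0, last_column });

// Nodes for the source `answer = 42`.
const tree = new Assign(
  new IdentifierLiteral('answer', at(0, 5)),
  new NumberLiteral('42', at(9, 10)),
  at(0, 10)
);
const plain = tree.toPlainObject();
console.log(JSON.stringify(plain, null, 2));
```

Node classes that need more than this generic treatment would override the method and add their own serializable properties.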

@rattrayalex
Contributor

rattrayalex commented Jan 6, 2018

This is exciting. I may be able to help with some of the prettier parts.

Could we write a test suite to ensure the ast being generated this way is always accurate?

@GeoffreyBooth
Collaborator Author

@rattrayalex I would love the help. I’ve started a branch in a local copy of the Prettier repo, I’ll clean it up and post it soon and give you access.

We can certainly write tests around this. We could add a test/nodes.coffee in the CoffeeScript repo that calls CoffeeScript.compile(someCode, nodes: yes) and compares the response to some expected object. We should probably strip out stuff like the loc keys before comparing, so the tests don’t break for reasons we don’t care about as the compiler evolves over time.

@rattrayalex
Contributor

rattrayalex commented Jan 6, 2018 via email

@GeoffreyBooth
Collaborator Author

@rattrayalex I’ve created a repo of my Prettier fork and branch here, and added you to it. If anyone else would like to contribute, please let me know and I would be happy to add you. In my fork, the default branch is coffeescript, so you can work in other branches and submit pull requests against coffeescript.

There was some major reorganization of the Prettier codebase in the last few months since I initially started my fork, so some of my work needed to be thrown out; but I had only just barely gotten started anyway. Look in the src/language-coffeescript folder, the files in there are where we’ll want to build out CoffeeScript support. See src/language-js and src/language-vue as points of reference.

@GeoffreyBooth
Collaborator Author

Once you’ve checked out the Prettier CoffeeScript branch and run yarn to install dependencies, you can see the CoffeeScript code path in action by creating a test.coffee at the root of the repo and then running:

./bin/prettier.js --parser coffeescript test.coffee

Currently I’m just printing the AST, but you have to start somewhere 😄. This is using my nodes branch, so updates to that branch should affect this; you’ll probably want to link the CoffeeScript module installed by Prettier to a local copy, so you can develop in the two in tandem.

@rattrayalex
Contributor

rattrayalex commented Jan 29, 2018

Coming back to this, I might be able to help a bit in March but probably not earlier than that 😕 any progress in the last few weeks?

@GeoffreyBooth
Collaborator Author

No, but I hope to have some time before March. I’ll push commits into both branches. Feel free to work in those repos, either in the same branches or other branches we can merge into them.

@coffeescriptbot
Collaborator

Migrated to jashkenas/coffeescript#4984
