Refactor `Literal` into several subtypes #4198

lydell · 2016-02-02T19:03:13Z

Previously, the parser created Literal nodes for many things. This resulted in
information loss. Instead of being able to check the node type, we had to use
regexes to tell the different types of Literals apart. That was a bit like
parsing literals twice: Once in the lexer, and once (or more) in the compiler.
It also caused problems, such as `this` (in backticks, i.e. embedded JavaScript)
and plain this being indistinguishable (fixes #2009).

Instead of returning new Literal in the grammar, subtypes of it are now
returned, such as NumberLiteral, StringLiteral and IdentifierLiteral. new
Literal by itself is only used to represent code chunks that fit no category.
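
A minimal JavaScript sketch of the idea, not the actual nodes.coffee code. The class names mirror the ones above; the two `describe…` helpers and the regexes in the "before" variant are hypothetical and only illustrate the difference between re-parsing a value and checking a node type.

```javascript
// Hypothetical stand-ins for the node classes named above.
class Literal {
  constructor(value) { this.value = value; }
}
class NumberLiteral extends Literal {}
class StringLiteral extends Literal {}
class IdentifierLiteral extends Literal {}

// Before: every node is a plain Literal, so its kind has to be re-derived
// from the text (illustrative regexes, not the ones CoffeeScript used).
function describeByRegex(node) {
  if (/^["']/.test(node.value)) return "string";
  if (/^\d/.test(node.value)) return "number";
  return "unknown";
}

// After: the grammar already picked the subtype, so a type check suffices.
function describeByType(node) {
  if (node instanceof StringLiteral) return "string";
  if (node instanceof NumberLiteral) return "number";
  if (node instanceof IdentifierLiteral) return "identifier";
  return "unknown";
}

console.log(describeByRegex(new Literal('"a"')));       // string
console.log(describeByType(new NumberLiteral("0x02"))); // number
console.log(describeByType(new Literal("x")));          // unknown
```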

StringWithInterpolations has been added as a subtype of Parens, and
RegexWithInterpolations as a subtype of Call. This makes it easier for other
programs to make use of CoffeeScript's "AST" (nodes). For example, it is now
possible to distinguish between "a #{b} c" and "a " + b + " c". Fixes #4192.

SuperCall has been added as a subtype of Call.

Note, though, that some information is still lost, especially in the lexer. For
example, there is no way to distinguish a heredoc from a regular string, or a
heregex without interpolations from a regular regex.

After the new subtypes were added, they were taken advantage of, removing most
regexes in nodes.coffee. SIMPLENUM (which matches non-hex integers) had to be
kept, though, because such numbers need special handling in JavaScript (for
example in 1..toString()).
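
To illustrate why plain integers need the extra care, here is a small JavaScript sketch. The `SIMPLE_INT` pattern and the `accessOnNumber` helper are assumptions made for this example; the real SIMPLENUM regex and the code that uses it may look different.

```javascript
// Assumed pattern for "non-hex integer" literals; the real SIMPLENUM may differ.
const SIMPLE_INT = /^[+-]?\d+$/;

// Hypothetical helper: emit a property access on a number literal.
function accessOnNumber(literal, property) {
  // `1.toString()` is a syntax error in JavaScript because `1.` is read as
  // a decimal number, so a plain integer needs an extra dot.
  const separator = SIMPLE_INT.test(literal) ? ".." : ".";
  return literal + separator + property;
}

console.log(accessOnNumber("1", "toString()"));    // 1..toString()
console.log(accessOnNumber("0x10", "toString()")); // 0x10.toString()
console.log(accessOnNumber("1.5", "toString()"));  // 1.5.toString()
```

Hexadecimal and fractional literals are left alone because the dot after them cannot be mistaken for a decimal point.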

An especially nice hack to get rid of was using new String() for the token
value for reserved identifiers (to be able to set a property on them which could
survive through the parser). Now it's a good old regular string.

In range literals, slices, splices and for loop steps when number literals
are involved, CoffeeScript can do some optimizations, such as precomputing the
value of, say, 5 - 3 (outputting 2 instead of 5 - 3 literally). As a side
bonus, this now also works with hexadecimal number literals, such as 0x02.
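
A rough JavaScript sketch of such precomputation. `foldDifference` is a hypothetical helper, and using `Number()` for the conversion is an assumption here (a later comment in this thread mentions `Number()` replacing a `parseNum()` helper, but not exactly where).

```javascript
// Hypothetical helper: fold a subtraction of two number literal tokens
// at compile time and return the text to emit.
function foldDifference(leftLiteral, rightLiteral) {
  // Number() understands decimal literals as well as 0x, 0b and 0o
  // prefixes; parseFloat() would stop at the "x" and return 0 for "0x02".
  return String(Number(leftLiteral) - Number(rightLiteral));
}

console.log(foldDifference("5", "3"));       // 2
console.log(foldDifference("0x05", "0x02")); // 3
console.log(parseFloat("0x02"));             // 0
```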

Finally, this also improves the output of coffee --nodes:

# Before:
$ bin/coffee -ne 'while true
  "#{a}"
  break'
Block
  While
    Value
      Bool
    Block
      Value
        Parens
          Block
            Op +
              Value """"
              Value
                Parens
                  Block
                    Value "a" "break"

# After:
$ bin/coffee -ne 'while true
  "#{a}"
  break'
Block
  While
    Value BooleanLiteral: true
    Block
      Value
        StringWithInterpolations
          Block
            Op +
              Value StringLiteral: ""
              Value
                Parens
                  Block
                    Value IdentifierLiteral: a
      StatementLiteral: break

vendethiel · 2016-02-02T20:16:54Z

Looks pretty good on the surface.

Can you please add a test with the new (fixed) behavior of `this` in backticks (not getting rewritten to _this in a bound function)?

lydell · 2016-02-02T20:49:17Z

I actually thought about adding a test, but then I chose not to because I wasn’t sure whether that would lock us into implementation details. I was thinking something like this:

nonce = {}
fn = null
(->
  fn = => `this`
).call null
eq nonce, fn.call nonce

But let’s say we wanted to compile => to ES2015 => – then the test wouldn’t work anymore. But I can add a test if that’s wanted (but not tonight).
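
For readers following along, the concern can be sketched in plain JavaScript. The closure below only approximates the shape of CoffeeScript 1.x output for a bound function; the variable names are made up for the example.

```javascript
const outer = { name: "outer" };
const nonce = { name: "nonce" };
let closureStyle = null;
let arrowStyle = null;

(function () {
  // Closure style: the outer `this` is captured in a variable, while an
  // embedded JavaScript `this` is left alone and therefore stays dynamic.
  const _this = this;
  closureStyle = function () { return [_this, this]; };

  // ES2015 arrow function: every `this` in the body is lexical.
  arrowStyle = () => this;
}).call(outer);

console.log(closureStyle.call(nonce)[0] === outer); // true
console.log(closureStyle.call(nonce)[1] === nonce); // true
console.log(arrowStyle.call(nonce) === outer);      // true
```

With an arrow function there is no dynamic `this` left to observe, which is why the proposed test would stop passing under that compilation strategy.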

bjmiller · 2016-02-02T21:09:24Z

This may be a little out of scope, but I don't think that we ever want to compile CS => to ES6 arrow functions. The semantics are very different, and they would be breaking changes for each other. There may need to be, unfortunately, another symbol. :-/

connec · 2016-02-02T23:08:54Z

src/nodes.coffee

-  isAssignable: NO
-  isComplex: NO
+  compileNode: (o) ->
+      code = if o.scope.method?.bound then o.scope.method.context else @value


This function is indented too much, I think.

lydell · 2016-02-03T06:38:06Z

Fixed indentation, added test … and also removed parseNum() (turns out Number() does the job).

(Regarding the test: What if we decide to compile to ES5 some day? (foo) => → function(foo) {}.bind(this)? :) But whatever, we'll think about that then.)

lydell · 2016-02-09T11:28:31Z

@michaelficarra Do you have any opinion on this PR? (Asking since you 👍ed #4192.)

michaelficarra · 2016-02-09T15:50:54Z

src/nodes.coffee

-  isComplex: NO
-  compileNode: -> [@makeCode "null"]
+class exports.Null extends Literal
+  constructor: -> super 'null'


The ThisLiteral constructor is on two lines and this is on one line. Be consistent.

michaelficarra · 2016-02-09T16:14:43Z

Agreed with @vendethiel about super having its own node. But I think it should be done as special case subclasses of Call and Access. That's the only way super is used after all.

lydell · 2016-02-09T16:18:30Z

Special case of Call and Access? Mind elaborating on that?

michaelficarra · 2016-02-09T16:28:39Z

Sorry, I had ECMAScript super on the mind. You're right, just Call.

lydell · 2016-02-09T17:10:25Z

I've fixed most things now. Would be happy for some feedback regarding IdentifierLiteral and SuperCall.

michaelficarra · 2016-02-09T17:15:17Z

test/functions.coffee

+  nonceB = {}
+  fn = null
+  (->
+    fn = => [this is nonceA, `this` is nonceB]


fn = => this is nonceA and `this` is nonceB

And then you can just ok it below.

michaelficarra · 2016-02-09T17:23:22Z

src/grammar.coffee

  ]

  # Alphanumerics are separated from the other **Literal** matchers because
  # they can also serve as keys in object literals.
  AlphaNumeric: [
-    o 'NUMBER',                                 -> new Literal $1
+    o 'NUMBER',                                 -> new NumberLiteral $1


We might want to separate InfinityLiteral from NumberLiteral. Whenever parseFloat returns an infinity value, we can render it simply as 2e308.

$ bin/coffee -ne 'Infinity'
Block
  Value IdentifierLiteral: Infinity

Hehe.

CoffeeScript actually allows Infinity = 0. (That's not valid in strict mode.) Not sure if it's intentional.

Don't really know what to do now.

Well, we don't really do "strict"... #2337 has been open for a while.

But we have test/strict.coffee. #1547

@lydell I'm not talking about a reference to the Infinity global, I'm talking about infinity values (such as the initial value of the Infinity global or the value created by the literals 2e308 or 500 9s in a row).

Is this what you mean?

exports.NumberLiteral = class NumberLiteral extends Literal
  constructor: (@value) ->
    return new InfinityLiteral if parseFloat(@value) is Infinity

exports.InfinityLiteral = class InfinityLiteral extends Literal
  constructor: ->
    super '2e308'

Or should we do the parseFloat check already in the lexer and output an INFINITY token?

The latter please. Sorry for the delay here.

Done. (See the next line.)

lydell · 2016-03-05T15:41:56Z

test/numbers.coffee

+
+test "Infinity", ->
+  eq Infinity, CoffeeScript.eval "0b#{Array(1024 + 1).join('1')}"
+  eq Infinity, CoffeeScript.eval "0o#{Array(342 + 1).join('7')}"


The above two lines actually failed before adding InfinityLiteral. Invalid JS was generated: 0xInfinity.
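
One plausible way such output can arise, sketched in JavaScript. `toHexLiteral` is a hypothetical reconstruction of a binary/octal-to-hex rewrite, not the actual lexer code.

```javascript
// Hypothetical reconstruction: rewrite a binary or octal literal as hex.
function toHexLiteral(literal) {
  const binary = /^0b([01]+)$/.exec(literal);
  const octal = /^0o([0-7]+)$/.exec(literal);
  // parseInt() yields Infinity once the value exceeds the double range,
  // and Infinity.toString(16) is the string "Infinity".
  if (binary) return "0x" + parseInt(binary[1], 2).toString(16);
  if (octal) return "0x" + parseInt(octal[1], 8).toString(16);
  return literal;
}

const huge = "0b" + "1".repeat(1024);
console.log(toHexLiteral("0b101")); // 0x5
console.log(toHexLiteral(huge));    // 0xInfinity
console.log(Number(huge) === Infinity, 2e308 === Infinity); // true true
```

Emitting `2e308` for any literal whose value is infinite sidesteps the problem, since that literal itself evaluates to Infinity.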

Awesome 😄.

lydell · 2016-03-05T15:42:52Z

InfinityLiteral is done. I’ll now continue with improving IdentifierLiteral.

michaelficarra · 2016-03-05T15:56:30Z

@lydell Maybe submit it as a separate PR? This should be good to merge right now, right?

Previously, the parser created `Literal` nodes for many things. This resulted in information loss. Instead of being able to check the node type, we had to use regexes to tell the different types of `Literal`s apart. That was a bit like parsing literals twice: Once in the lexer, and once (or more) in the compiler. It also caused problems, such as `` `this` `` and `this` being indistinguishable (fixes jashkenas#2009).

Instead of returning `new Literal` in the grammar, subtypes of it are now returned, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new Literal` by itself is only used to represent code chunks that fit no category. (While mentioning `NumberLiteral`, there's also `InfinityLiteral` now, which is a subtype of `NumberLiteral`.)

`StringWithInterpolations` has been added as a subtype of `Parens`, and `RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other programs to make use of CoffeeScript's "AST" (nodes). For example, it is now possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192.

`SuperCall` has been added as a subtype of `Call`.

Note, though, that some information is still lost, especially in the lexer. For example, there is no way to distinguish a heredoc from a regular string, or a heregex without interpolations from a regular regex. Binary and octal number literals are indistinguishable from hexadecimal literals.

After the new subtypes were added, they were taken advantage of, removing most regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be kept, though, because such numbers need special handling in JavaScript (for example in `1..toString()`).

An especially nice hack to get rid of was using `new String()` for the token value for reserved identifiers (to be able to set a property on them which could survive through the parser). Now it's a good old regular string.

In range literals, slices, splices and for loop steps when number literals are involved, CoffeeScript can do some optimizations, such as precomputing the value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side bonus, this now also works with hexadecimal number literals, such as `0x02`.

Finally, this also improves the output of `coffee --nodes`:

# Before:
$ bin/coffee -ne 'while true
  "#{a}"
  break'
Block
  While
    Value
      Bool
    Block
      Value
        Parens
          Block
            Op +
              Value """"
              Value
                Parens
                  Block
                    Value "a" "break"

# After:
$ bin/coffee -ne 'while true
  "#{a}"
  break'
Block
  While
    Value BooleanLiteral: true
    Block
      Value
        StringWithInterpolations
          Block
            Op +
              Value StringLiteral: ""
              Value
                Parens
                  Block
                    Value IdentifierLiteral: a
      StatementLiteral: break

lydell · 2016-03-05T16:09:34Z

Ok, I’ll do it in a separate PR. Yes, I consider this PR good to merge.

vendethiel · 2016-03-05T16:13:07Z

👍

Refactor `Literal` into several subtypes

michaelficarra · 2016-03-05T16:20:36Z

Thanks. Great work, @lydell.

lydell · 2016-03-05T21:28:50Z

The follow-up PR became three: #4219, #4220 and #4221.

connec reviewed Feb 2, 2016

lydell force-pushed the node-types branch from f256b7e to f94726f Compare February 3, 2016 06:35

michaelficarra reviewed Feb 9, 2016

lydell force-pushed the node-types branch from f94726f to 2a4f1cd Compare February 9, 2016 17:08

michaelficarra reviewed Feb 9, 2016

lydell force-pushed the node-types branch from 2a4f1cd to 0918db9 Compare February 9, 2016 17:21

michaelficarra reviewed Feb 9, 2016

lydell force-pushed the node-types branch 2 times, most recently from 5495920 to e6519e7 Compare February 9, 2016 17:51

lydell force-pushed the node-types branch from e6519e7 to ce82018 Compare March 5, 2016 15:38

lydell reviewed Mar 5, 2016

lydell force-pushed the node-types branch from ce82018 to 021d2e4 Compare March 5, 2016 16:08

michaelficarra added a commit that referenced this pull request Mar 5, 2016

Merge pull request #4198 from lydell/node-types

8afb7cc

Refactor `Literal` into several subtypes

michaelficarra merged commit 8afb7cc into jashkenas:master Mar 5, 2016

lydell deleted the node-types branch March 5, 2016 16:29

vendethiel mentioned this pull request Sep 2, 2016

Embedded javascript this in fat-arrow function should not compile to _this #4302

Closed

GeoffreyBooth mentioned this pull request Sep 20, 2016

[CS2] Output ES2015 arrow functions, default parameters, rest parameters #4311

Merged

lydell mentioned this pull request Sep 26, 2016

Class bodies shouldn't reference arguments in 1.11.0 #4320

Closed
