Expand string interpolation expressions into dedicated AST #4192

flying-sheep · 2016-01-30T11:32:10Z

Hi, as seen in #3831, string interpolation like "A #{a} B #{b}" is lexed as if it was 'A '+a+' B '+b.

as mentioned in juliankrispel/decaf#2, there are use cases for having the semantic information survive lexing.

Therefore it would be awesome if we could introduce a new StringInterpolation AST node which is later converted to a concatenation expression.

The text was updated successfully, but these errors were encountered:

michaelficarra · 2016-01-30T16:54:25Z

👍. This is how I did it in the CoffeeScriptRedux compiler.

jashkenas · 2016-01-31T01:54:59Z

IIRC, that would also help out with some other tricky issues. Like the old implicit call rewriter kludge.

lydell · 2016-01-31T06:28:50Z

@jashkenas I think that particular issue was fixed in commit 76c076d ;) But I do agree that information is lost in many places in the parser.

string interpolation like "A #{a} B #{b}" is lexed as if it was 'A '+a+' B '+b

Just a FYI: this is not true. But it is true that the AST looks the same. grammar.coffee:150.

juliankrispel · 2016-01-31T06:42:33Z

awesomesauce! This would be super helpful

lydell · 2016-02-02T07:48:21Z

@juliankrispel I'm working on this (and have come very far, all tests are currently passing): https://github.com/lydell/coffee-script/tree/node-types

juliankrispel · 2016-02-02T09:02:27Z

@lydell omg this is amazing. Tbh I'm not sure what I'm looking at but it's amazing nonetheless :D

flying-sheep · 2016-02-02T09:18:13Z

i hope you’re looking at lydell/coffee-script@5e54fc4, which is mostly new, more specific Node classes.

one of which being StringWithInterpolations, i.e. what we need.

changes like this tell me that doing the other stuff also is a very good idea!

vendethiel · 2016-02-02T09:59:42Z

It should also fix some other long-standing bugs like this

Previously, the parser created `Literal` nodes for many things. This resulted in information loss. Instead of being able to check the node type, we had to use regexes to tell the different types of `Literal`s apart. That was a bit like parsing literals twice: Once in the lexer, and once (or more) in the compiler. It also caused problems, such as `` `this` `` and `this` being indistinguishable (fixes jashkenas#2009). Instead returning `new Literal` in the grammar, subtypes of it are now returned instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new Literal` by itself is only used to represent code chunks that fit no category. `StringWithInterpolations` has been added as a subtype of `Parens`, and `RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other programs to make use of CoffeeScript's "AST" (nodes). For example, it is now possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192. Note, though, that some information is still lost, especially in the lexer. For example, there is no way to distinguish a heredoc from a regular string, or a heregex without interpolations from a regular regex. After the new subtypes were added, they were taken advantage of, removing most regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be kept, though, because such numbers need special handling in JavaScript (for example in `1..toString()`). An especially nice hack to get rid of was using `new String()` for the token value for reserved identifiers (to be able to set a property on them which could survive through the parser). Now it's a good old regular string. In range literals, slices, splices and for loop steps when number literals are involved, CoffeeScript can do some optimizations, such as precomputing the value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side bonus, this now also works with hexadecimal number literals, such as `0x02`. Finally, this also improves the output of `coffee --nodes`: # Before: $ bin/coffee -ne 'while true "#{a}" break' Block While Value Bool Block Value Parens Block Op + Value """" Value Parens Block Value "a" "break" # After: $ bin/coffee -ne 'while true "#{a}" break' Block While Value Bool: true Block Value StringWithInterpolations Block Op + Value StringLiteral: "" Value Parens Block Value IdentifierLiteral: a StatementLiteral: break

Previously, the parser created `Literal` nodes for many things. This resulted in information loss. Instead of being able to check the node type, we had to use regexes to tell the different types of `Literal`s apart. That was a bit like parsing literals twice: Once in the lexer, and once (or more) in the compiler. It also caused problems, such as `` `this` `` and `this` being indistinguishable (fixes jashkenas#2009). Instead returning `new Literal` in the grammar, subtypes of it are now returned instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new Literal` by itself is only used to represent code chunks that fit no category. `StringWithInterpolations` has been added as a subtype of `Parens`, and `RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other programs to make use of CoffeeScript's "AST" (nodes). For example, it is now possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192. `SuperCall` has been added as a subtype of `Call`. However, I didn't manage to decouple the two, so most of the `super` logic still lives in `Call`. Note, though, that some information is still lost, especially in the lexer. For example, there is no way to distinguish a heredoc from a regular string, or a heregex without interpolations from a regular regex. After the new subtypes were added, they were taken advantage of, removing most regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be kept, though, because such numbers need special handling in JavaScript (for example in `1..toString()`). An especially nice hack to get rid of was using `new String()` for the token value for reserved identifiers (to be able to set a property on them which could survive through the parser). Now it's a good old regular string. In range literals, slices, splices and for loop steps when number literals are involved, CoffeeScript can do some optimizations, such as precomputing the value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side bonus, this now also works with hexadecimal number literals, such as `0x02`. Finally, this also improves the output of `coffee --nodes`: # Before: $ bin/coffee -ne 'while true "#{a}" break' Block While Value Bool Block Value Parens Block Op + Value """" Value Parens Block Value "a" "break" # After: $ bin/coffee -ne 'while true "#{a}" break' Block While Value Bool: true Block Value StringWithInterpolations Block Op + Value StringLiteral: "" Value Parens Block Value IdentifierLiteral: a StatementLiteral: break

Previously, the parser created `Literal` nodes for many things. This resulted in information loss. Instead of being able to check the node type, we had to use regexes to tell the different types of `Literal`s apart. That was a bit like parsing literals twice: Once in the lexer, and once (or more) in the compiler. It also caused problems, such as `` `this` `` and `this` being indistinguishable (fixes jashkenas#2009). Instead returning `new Literal` in the grammar, subtypes of it are now returned instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new Literal` by itself is only used to represent code chunks that fit no category. `StringWithInterpolations` has been added as a subtype of `Parens`, and `RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other programs to make use of CoffeeScript's "AST" (nodes). For example, it is now possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192. `SuperCall` has been added as a subtype of `Call`. Note, though, that some information is still lost, especially in the lexer. For example, there is no way to distinguish a heredoc from a regular string, or a heregex without interpolations from a regular regex. After the new subtypes were added, they were taken advantage of, removing most regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be kept, though, because such numbers need special handling in JavaScript (for example in `1..toString()`). An especially nice hack to get rid of was using `new String()` for the token value for reserved identifiers (to be able to set a property on them which could survive through the parser). Now it's a good old regular string. In range literals, slices, splices and for loop steps when number literals are involved, CoffeeScript can do some optimizations, such as precomputing the value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side bonus, this now also works with hexadecimal number literals, such as `0x02`. Finally, this also improves the output of `coffee --nodes`: # Before: $ bin/coffee -ne 'while true "#{a}" break' Block While Value Bool Block Value Parens Block Op + Value """" Value Parens Block Value "a" "break" # After: $ bin/coffee -ne 'while true "#{a}" break' Block While Value Bool: true Block Value StringWithInterpolations Block Op + Value StringLiteral: "" Value Parens Block Value IdentifierLiteral: a StatementLiteral: break

Previously, the parser created `Literal` nodes for many things. This resulted in information loss. Instead of being able to check the node type, we had to use regexes to tell the different types of `Literal`s apart. That was a bit like parsing literals twice: Once in the lexer, and once (or more) in the compiler. It also caused problems, such as `` `this` `` and `this` being indistinguishable (fixes jashkenas#2009). Instead returning `new Literal` in the grammar, subtypes of it are now returned instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new Literal` by itself is only used to represent code chunks that fit no category. `StringWithInterpolations` has been added as a subtype of `Parens`, and `RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other programs to make use of CoffeeScript's "AST" (nodes). For example, it is now possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192. `SuperCall` has been added as a subtype of `Call`. Note, though, that some information is still lost, especially in the lexer. For example, there is no way to distinguish a heredoc from a regular string, or a heregex without interpolations from a regular regex. After the new subtypes were added, they were taken advantage of, removing most regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be kept, though, because such numbers need special handling in JavaScript (for example in `1..toString()`). An especially nice hack to get rid of was using `new String()` for the token value for reserved identifiers (to be able to set a property on them which could survive through the parser). Now it's a good old regular string. In range literals, slices, splices and for loop steps when number literals are involved, CoffeeScript can do some optimizations, such as precomputing the value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side bonus, this now also works with hexadecimal number literals, such as `0x02`. Finally, this also improves the output of `coffee --nodes`: # Before: $ bin/coffee -ne 'while true "#{a}" break' Block While Value Bool Block Value Parens Block Op + Value """" Value Parens Block Value "a" "break" # After: $ bin/coffee -ne 'while true "#{a}" break' Block While Value BooleanLiteral: true Block Value StringWithInterpolations Block Op + Value StringLiteral: "" Value Parens Block Value IdentifierLiteral: a StatementLiteral: break

Previously, the parser created `Literal` nodes for many things. This resulted in information loss. Instead of being able to check the node type, we had to use regexes to tell the different types of `Literal`s apart. That was a bit like parsing literals twice: Once in the lexer, and once (or more) in the compiler. It also caused problems, such as `` `this` `` and `this` being indistinguishable (fixes jashkenas#2009). Instead returning `new Literal` in the grammar, subtypes of it are now returned instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new Literal` by itself is only used to represent code chunks that fit no category. (While mentioning `NumberLiteral`, there's also `InfinityLiteral` now, which is a subtype of `NumberLiteral`.) `StringWithInterpolations` has been added as a subtype of `Parens`, and `RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other programs to make use of CoffeeScript's "AST" (nodes). For example, it is now possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192. `SuperCall` has been added as a subtype of `Call`. Note, though, that some information is still lost, especially in the lexer. For example, there is no way to distinguish a heredoc from a regular string, or a heregex without interpolations from a regular regex. Binary and octal number literals are indistinguishable from hexadecimal literals. After the new subtypes were added, they were taken advantage of, removing most regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be kept, though, because such numbers need special handling in JavaScript (for example in `1..toString()`). An especially nice hack to get rid of was using `new String()` for the token value for reserved identifiers (to be able to set a property on them which could survive through the parser). Now it's a good old regular string. In range literals, slices, splices and for loop steps when number literals are involved, CoffeeScript can do some optimizations, such as precomputing the value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side bonus, this now also works with hexadecimal number literals, such as `0x02`. Finally, this also improves the output of `coffee --nodes`: # Before: $ bin/coffee -ne 'while true "#{a}" break' Block While Value Bool Block Value Parens Block Op + Value """" Value Parens Block Value "a" "break" # After: $ bin/coffee -ne 'while true "#{a}" break' Block While Value BooleanLiteral: true Block Value StringWithInterpolations Block Op + Value StringLiteral: "" Value Parens Block Value IdentifierLiteral: a StatementLiteral: break

flying-sheep mentioned this issue Jan 30, 2016

Template strings rainforestapp/decaf#2

Open

lydell mentioned this issue Feb 2, 2016

Refactor Literal into several subtypes #4198

Merged

michaelficarra closed this as completed in #4198 Mar 5, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expand string interpolation expressions into dedicated AST #4192

Expand string interpolation expressions into dedicated AST #4192

flying-sheep commented Jan 30, 2016

michaelficarra commented Jan 30, 2016

jashkenas commented Jan 31, 2016

lydell commented Jan 31, 2016

juliankrispel commented Jan 31, 2016

lydell commented Feb 2, 2016

juliankrispel commented Feb 2, 2016

flying-sheep commented Feb 2, 2016

vendethiel commented Feb 2, 2016

Expand string interpolation expressions into dedicated AST #4192

Expand string interpolation expressions into dedicated AST #4192

Comments

flying-sheep commented Jan 30, 2016

michaelficarra commented Jan 30, 2016

jashkenas commented Jan 31, 2016

lydell commented Jan 31, 2016

juliankrispel commented Jan 31, 2016

lydell commented Feb 2, 2016

juliankrispel commented Feb 2, 2016

flying-sheep commented Feb 2, 2016

vendethiel commented Feb 2, 2016