Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand string interpolation expressions into dedicated AST #4192

Closed
flying-sheep opened this issue Jan 30, 2016 · 8 comments · Fixed by #4198
Closed

Expand string interpolation expressions into dedicated AST #4192

flying-sheep opened this issue Jan 30, 2016 · 8 comments · Fixed by #4198

Comments

@flying-sheep
Copy link

Hi, as seen in #3831, string interpolation like "A #{a} B #{b}" is lexed as if it was 'A '+a+' B '+b.

as mentioned in juliankrispel/decaf#2, there are use cases for having the semantic information survive lexing.

Therefore it would be awesome if we could introduce a new StringInterpolation AST node which is later converted to a concatenation expression.

@michaelficarra
Copy link
Collaborator

👍. This is how I did it in the CoffeeScriptRedux compiler.

@jashkenas
Copy link
Owner

IIRC, that would also help out with some other tricky issues. Like the old implicit call rewriter kludge.

@lydell
Copy link
Collaborator

lydell commented Jan 31, 2016

@jashkenas I think that particular issue was fixed in commit 76c076d ;) But I do agree that information is lost in many places in the parser.

string interpolation like "A #{a} B #{b}" is lexed as if it was 'A '+a+' B '+b

Just a FYI: this is not true. But it is true that the AST looks the same. grammar.coffee:150.

@juliankrispel
Copy link

awesomesauce! This would be super helpful

@lydell
Copy link
Collaborator

lydell commented Feb 2, 2016

@juliankrispel I'm working on this (and have come very far, all tests are currently passing): https://github.com/lydell/coffee-script/tree/node-types

@juliankrispel
Copy link

@lydell omg this is amazing. Tbh I'm not sure what I'm looking at but it's amazing nonetheless :D

@flying-sheep
Copy link
Author

i hope you’re looking at lydell/coffee-script@5e54fc4, which is mostly new, more specific Node classes.

one of which being StringWithInterpolations, i.e. what we need.

changes like this tell me that doing the other stuff also is a very good idea!

@vendethiel
Copy link
Collaborator

It should also fix some other long-standing bugs like this

lydell added a commit to lydell/coffee-script that referenced this issue Feb 2, 2016
Previously, the parser created `Literal` nodes for many things. This resulted in
information loss. Instead of being able to check the node type, we had to use
regexes to tell the different types of `Literal`s apart. That was a bit like
parsing literals twice: Once in the lexer, and once (or more) in the compiler.
It also caused problems, such as `` `this` `` and `this` being indistinguishable
(fixes jashkenas#2009).

Instead returning `new Literal` in the grammar, subtypes of it are now returned
instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new
Literal` by itself is only used to represent code chunks that fit no category.

`StringWithInterpolations` has been added as a subtype of `Parens`, and
`RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other
programs to make use of CoffeeScript's "AST" (nodes). For example, it is now
possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192.

Note, though, that some information is still lost, especially in the lexer. For
example, there is no way to distinguish a heredoc from a regular string, or a
heregex without interpolations from a regular regex.

After the new subtypes were added, they were taken advantage of, removing most
regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be
kept, though, because such numbers need special handling in JavaScript (for
example in `1..toString()`).

An especially nice hack to get rid of was using `new String()` for the token
value for reserved identifiers (to be able to set a property on them which could
survive through the parser). Now it's a good old regular string.

In range literals, slices, splices and for loop steps when number literals
are involved, CoffeeScript can do some optimizations, such as precomputing the
value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side
bonus, this now also works with hexadecimal number literals, such as `0x02`.

Finally, this also improves the output of `coffee --nodes`:

    # Before:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value
          Bool
        Block
          Value
            Parens
              Block
                Op +
                  Value """"
                  Value
                    Parens
                      Block
                        Value "a" "break"

    # After:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value Bool: true
        Block
          Value
            StringWithInterpolations
              Block
                Op +
                  Value StringLiteral: ""
                  Value
                    Parens
                      Block
                        Value IdentifierLiteral: a
          StatementLiteral: break
lydell added a commit to lydell/coffee-script that referenced this issue Feb 3, 2016
Previously, the parser created `Literal` nodes for many things. This resulted in
information loss. Instead of being able to check the node type, we had to use
regexes to tell the different types of `Literal`s apart. That was a bit like
parsing literals twice: Once in the lexer, and once (or more) in the compiler.
It also caused problems, such as `` `this` `` and `this` being indistinguishable
(fixes jashkenas#2009).

Instead returning `new Literal` in the grammar, subtypes of it are now returned
instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new
Literal` by itself is only used to represent code chunks that fit no category.

`StringWithInterpolations` has been added as a subtype of `Parens`, and
`RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other
programs to make use of CoffeeScript's "AST" (nodes). For example, it is now
possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192.

Note, though, that some information is still lost, especially in the lexer. For
example, there is no way to distinguish a heredoc from a regular string, or a
heregex without interpolations from a regular regex.

After the new subtypes were added, they were taken advantage of, removing most
regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be
kept, though, because such numbers need special handling in JavaScript (for
example in `1..toString()`).

An especially nice hack to get rid of was using `new String()` for the token
value for reserved identifiers (to be able to set a property on them which could
survive through the parser). Now it's a good old regular string.

In range literals, slices, splices and for loop steps when number literals
are involved, CoffeeScript can do some optimizations, such as precomputing the
value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side
bonus, this now also works with hexadecimal number literals, such as `0x02`.

Finally, this also improves the output of `coffee --nodes`:

    # Before:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value
          Bool
        Block
          Value
            Parens
              Block
                Op +
                  Value """"
                  Value
                    Parens
                      Block
                        Value "a" "break"

    # After:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value Bool: true
        Block
          Value
            StringWithInterpolations
              Block
                Op +
                  Value StringLiteral: ""
                  Value
                    Parens
                      Block
                        Value IdentifierLiteral: a
          StatementLiteral: break
lydell added a commit to lydell/coffee-script that referenced this issue Feb 9, 2016
Previously, the parser created `Literal` nodes for many things. This resulted in
information loss. Instead of being able to check the node type, we had to use
regexes to tell the different types of `Literal`s apart. That was a bit like
parsing literals twice: Once in the lexer, and once (or more) in the compiler.
It also caused problems, such as `` `this` `` and `this` being indistinguishable
(fixes jashkenas#2009).

Instead returning `new Literal` in the grammar, subtypes of it are now returned
instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new
Literal` by itself is only used to represent code chunks that fit no category.

`StringWithInterpolations` has been added as a subtype of `Parens`, and
`RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other
programs to make use of CoffeeScript's "AST" (nodes). For example, it is now
possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192.

`SuperCall` has been added as a subtype of `Call`. However, I didn't manage to
decouple the two, so most of the `super` logic still lives in `Call`.

Note, though, that some information is still lost, especially in the lexer. For
example, there is no way to distinguish a heredoc from a regular string, or a
heregex without interpolations from a regular regex.

After the new subtypes were added, they were taken advantage of, removing most
regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be
kept, though, because such numbers need special handling in JavaScript (for
example in `1..toString()`).

An especially nice hack to get rid of was using `new String()` for the token
value for reserved identifiers (to be able to set a property on them which could
survive through the parser). Now it's a good old regular string.

In range literals, slices, splices and for loop steps when number literals
are involved, CoffeeScript can do some optimizations, such as precomputing the
value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side
bonus, this now also works with hexadecimal number literals, such as `0x02`.

Finally, this also improves the output of `coffee --nodes`:

    # Before:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value
          Bool
        Block
          Value
            Parens
              Block
                Op +
                  Value """"
                  Value
                    Parens
                      Block
                        Value "a" "break"

    # After:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value Bool: true
        Block
          Value
            StringWithInterpolations
              Block
                Op +
                  Value StringLiteral: ""
                  Value
                    Parens
                      Block
                        Value IdentifierLiteral: a
          StatementLiteral: break
lydell added a commit to lydell/coffee-script that referenced this issue Feb 9, 2016
Previously, the parser created `Literal` nodes for many things. This resulted in
information loss. Instead of being able to check the node type, we had to use
regexes to tell the different types of `Literal`s apart. That was a bit like
parsing literals twice: Once in the lexer, and once (or more) in the compiler.
It also caused problems, such as `` `this` `` and `this` being indistinguishable
(fixes jashkenas#2009).

Instead returning `new Literal` in the grammar, subtypes of it are now returned
instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new
Literal` by itself is only used to represent code chunks that fit no category.

`StringWithInterpolations` has been added as a subtype of `Parens`, and
`RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other
programs to make use of CoffeeScript's "AST" (nodes). For example, it is now
possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192.

`SuperCall` has been added as a subtype of `Call`. However, I didn't manage to
decouple the two, so most of the `super` logic still lives in `Call`.

Note, though, that some information is still lost, especially in the lexer. For
example, there is no way to distinguish a heredoc from a regular string, or a
heregex without interpolations from a regular regex.

After the new subtypes were added, they were taken advantage of, removing most
regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be
kept, though, because such numbers need special handling in JavaScript (for
example in `1..toString()`).

An especially nice hack to get rid of was using `new String()` for the token
value for reserved identifiers (to be able to set a property on them which could
survive through the parser). Now it's a good old regular string.

In range literals, slices, splices and for loop steps when number literals
are involved, CoffeeScript can do some optimizations, such as precomputing the
value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side
bonus, this now also works with hexadecimal number literals, such as `0x02`.

Finally, this also improves the output of `coffee --nodes`:

    # Before:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value
          Bool
        Block
          Value
            Parens
              Block
                Op +
                  Value """"
                  Value
                    Parens
                      Block
                        Value "a" "break"

    # After:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value Bool: true
        Block
          Value
            StringWithInterpolations
              Block
                Op +
                  Value StringLiteral: ""
                  Value
                    Parens
                      Block
                        Value IdentifierLiteral: a
          StatementLiteral: break
lydell added a commit to lydell/coffee-script that referenced this issue Feb 9, 2016
Previously, the parser created `Literal` nodes for many things. This resulted in
information loss. Instead of being able to check the node type, we had to use
regexes to tell the different types of `Literal`s apart. That was a bit like
parsing literals twice: Once in the lexer, and once (or more) in the compiler.
It also caused problems, such as `` `this` `` and `this` being indistinguishable
(fixes jashkenas#2009).

Instead returning `new Literal` in the grammar, subtypes of it are now returned
instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new
Literal` by itself is only used to represent code chunks that fit no category.

`StringWithInterpolations` has been added as a subtype of `Parens`, and
`RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other
programs to make use of CoffeeScript's "AST" (nodes). For example, it is now
possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192.

`SuperCall` has been added as a subtype of `Call`.

Note, though, that some information is still lost, especially in the lexer. For
example, there is no way to distinguish a heredoc from a regular string, or a
heregex without interpolations from a regular regex.

After the new subtypes were added, they were taken advantage of, removing most
regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be
kept, though, because such numbers need special handling in JavaScript (for
example in `1..toString()`).

An especially nice hack to get rid of was using `new String()` for the token
value for reserved identifiers (to be able to set a property on them which could
survive through the parser). Now it's a good old regular string.

In range literals, slices, splices and for loop steps when number literals
are involved, CoffeeScript can do some optimizations, such as precomputing the
value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side
bonus, this now also works with hexadecimal number literals, such as `0x02`.

Finally, this also improves the output of `coffee --nodes`:

    # Before:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value
          Bool
        Block
          Value
            Parens
              Block
                Op +
                  Value """"
                  Value
                    Parens
                      Block
                        Value "a" "break"

    # After:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value Bool: true
        Block
          Value
            StringWithInterpolations
              Block
                Op +
                  Value StringLiteral: ""
                  Value
                    Parens
                      Block
                        Value IdentifierLiteral: a
          StatementLiteral: break
lydell added a commit to lydell/coffee-script that referenced this issue Feb 9, 2016
Previously, the parser created `Literal` nodes for many things. This resulted in
information loss. Instead of being able to check the node type, we had to use
regexes to tell the different types of `Literal`s apart. That was a bit like
parsing literals twice: Once in the lexer, and once (or more) in the compiler.
It also caused problems, such as `` `this` `` and `this` being indistinguishable
(fixes jashkenas#2009).

Instead returning `new Literal` in the grammar, subtypes of it are now returned
instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new
Literal` by itself is only used to represent code chunks that fit no category.

`StringWithInterpolations` has been added as a subtype of `Parens`, and
`RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other
programs to make use of CoffeeScript's "AST" (nodes). For example, it is now
possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192.

`SuperCall` has been added as a subtype of `Call`.

Note, though, that some information is still lost, especially in the lexer. For
example, there is no way to distinguish a heredoc from a regular string, or a
heregex without interpolations from a regular regex.

After the new subtypes were added, they were taken advantage of, removing most
regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be
kept, though, because such numbers need special handling in JavaScript (for
example in `1..toString()`).

An especially nice hack to get rid of was using `new String()` for the token
value for reserved identifiers (to be able to set a property on them which could
survive through the parser). Now it's a good old regular string.

In range literals, slices, splices and for loop steps when number literals
are involved, CoffeeScript can do some optimizations, such as precomputing the
value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side
bonus, this now also works with hexadecimal number literals, such as `0x02`.

Finally, this also improves the output of `coffee --nodes`:

    # Before:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value
          Bool
        Block
          Value
            Parens
              Block
                Op +
                  Value """"
                  Value
                    Parens
                      Block
                        Value "a" "break"

    # After:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value BooleanLiteral: true
        Block
          Value
            StringWithInterpolations
              Block
                Op +
                  Value StringLiteral: ""
                  Value
                    Parens
                      Block
                        Value IdentifierLiteral: a
          StatementLiteral: break
lydell added a commit to lydell/coffee-script that referenced this issue Mar 5, 2016
Previously, the parser created `Literal` nodes for many things. This resulted in
information loss. Instead of being able to check the node type, we had to use
regexes to tell the different types of `Literal`s apart. That was a bit like
parsing literals twice: Once in the lexer, and once (or more) in the compiler.
It also caused problems, such as `` `this` `` and `this` being indistinguishable
(fixes jashkenas#2009).

Instead returning `new Literal` in the grammar, subtypes of it are now returned
instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new
Literal` by itself is only used to represent code chunks that fit no category.
(While mentioning `NumberLiteral`, there's also `InfinityLiteral` now, which is
a subtype of `NumberLiteral`.)

`StringWithInterpolations` has been added as a subtype of `Parens`, and
`RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other
programs to make use of CoffeeScript's "AST" (nodes). For example, it is now
possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192.

`SuperCall` has been added as a subtype of `Call`.

Note, though, that some information is still lost, especially in the lexer. For
example, there is no way to distinguish a heredoc from a regular string, or a
heregex without interpolations from a regular regex. Binary and octal number
literals are indistinguishable from hexadecimal literals.

After the new subtypes were added, they were taken advantage of, removing most
regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be
kept, though, because such numbers need special handling in JavaScript (for
example in `1..toString()`).

An especially nice hack to get rid of was using `new String()` for the token
value for reserved identifiers (to be able to set a property on them which could
survive through the parser). Now it's a good old regular string.

In range literals, slices, splices and for loop steps when number literals
are involved, CoffeeScript can do some optimizations, such as precomputing the
value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side
bonus, this now also works with hexadecimal number literals, such as `0x02`.

Finally, this also improves the output of `coffee --nodes`:

    # Before:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value
          Bool
        Block
          Value
            Parens
              Block
                Op +
                  Value """"
                  Value
                    Parens
                      Block
                        Value "a" "break"

    # After:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value BooleanLiteral: true
        Block
          Value
            StringWithInterpolations
              Block
                Op +
                  Value StringLiteral: ""
                  Value
                    Parens
                      Block
                        Value IdentifierLiteral: a
          StatementLiteral: break
lydell added a commit to lydell/coffee-script that referenced this issue Mar 5, 2016
Previously, the parser created `Literal` nodes for many things. This resulted in
information loss. Instead of being able to check the node type, we had to use
regexes to tell the different types of `Literal`s apart. That was a bit like
parsing literals twice: Once in the lexer, and once (or more) in the compiler.
It also caused problems, such as `` `this` `` and `this` being indistinguishable
(fixes jashkenas#2009).

Instead returning `new Literal` in the grammar, subtypes of it are now returned
instead, such as `NumberLiteral`, `StringLiteral` and `IdentifierLiteral`. `new
Literal` by itself is only used to represent code chunks that fit no category.
(While mentioning `NumberLiteral`, there's also `InfinityLiteral` now, which is
a subtype of `NumberLiteral`.)

`StringWithInterpolations` has been added as a subtype of `Parens`, and
`RegexWithInterpolations` as a subtype of `Call`. This makes it easier for other
programs to make use of CoffeeScript's "AST" (nodes). For example, it is now
possible to distinguish between `"a #{b} c"` and `"a " + b + " c"`. Fixes jashkenas#4192.

`SuperCall` has been added as a subtype of `Call`.

Note, though, that some information is still lost, especially in the lexer. For
example, there is no way to distinguish a heredoc from a regular string, or a
heregex without interpolations from a regular regex. Binary and octal number
literals are indistinguishable from hexadecimal literals.

After the new subtypes were added, they were taken advantage of, removing most
regexes in nodes.coffee. `SIMPLENUM` (which matches non-hex integers) had to be
kept, though, because such numbers need special handling in JavaScript (for
example in `1..toString()`).

An especially nice hack to get rid of was using `new String()` for the token
value for reserved identifiers (to be able to set a property on them which could
survive through the parser). Now it's a good old regular string.

In range literals, slices, splices and for loop steps when number literals
are involved, CoffeeScript can do some optimizations, such as precomputing the
value of, say, `5 - 3` (outputting `2` instead of `5 - 3` literally). As a side
bonus, this now also works with hexadecimal number literals, such as `0x02`.

Finally, this also improves the output of `coffee --nodes`:

    # Before:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value
          Bool
        Block
          Value
            Parens
              Block
                Op +
                  Value """"
                  Value
                    Parens
                      Block
                        Value "a" "break"

    # After:
    $ bin/coffee -ne 'while true
      "#{a}"
      break'
    Block
      While
        Value BooleanLiteral: true
        Block
          Value
            StringWithInterpolations
              Block
                Op +
                  Value StringLiteral: ""
                  Value
                    Parens
                      Block
                        Value IdentifierLiteral: a
          StatementLiteral: break
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants