Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Template string syntax highlighting (release-1.4) #1477

Closed
derekcicerone-zz opened this issue Dec 13, 2014 · 13 comments
Closed

Template string syntax highlighting (release-1.4) #1477

derekcicerone-zz opened this issue Dec 13, 2014 · 13 comments
Labels
Bug A bug in TypeScript Fixed A PR has been merged for this issue

Comments

@derekcicerone-zz
Copy link

I'm playing with the new support for template strings in release-1.4 and seeing potential issues with syntax highlighting.

In a single-line template string it looks like the string is classified as a keyword.

In a multi-line template string like this:

var template = `/hello/
/world/`;

It looks like the "/hello/" is classified as a keyword and the "/world/" as a regexp.

Sorry if this is just not implemented yet. I looked for an issue first but didn't see one.

@mhegazy
Copy link
Contributor

mhegazy commented Dec 14, 2014

The issue here is that the there are two classifiers, one lexical and one syntactic.

The lexical classifier is ment to be fast and stateless, something you run on the UI thread as the user is typing and get a result in ten milliseconds or so. The way this works is by only scanning, and looking at the lexim kind. This works well most of the time, but it can not get context sensitive constructs that require parsing, such as "var" in an object literal property name, without knowing you are inside an object literal it can be a keyword.

The other classifier is syntactic. This is something that you run on a background thread with a lesser time limitation, say less than a 100 ms so you can afford to parse, and inspect the parse tree. This one is the accurate one as it had all the needed information.

We are now adding incremental parsing support that should make syntactic parsing on an active file fast enough to be your only classifier you need.

@CyrusNajmabadi
Copy link
Contributor

Also, we intend to have a dedicated thread for doing syntactic classification. That should help make sure you get correct classification in very reasonable times.

@mhegazy
Copy link
Contributor

mhegazy commented Dec 14, 2014

Forgot to add the details:

The lexical classifier on ts.Classifier interface, accessible through:
createClassifier(...).getClassificationsForLine()

The syntactic classifier is on the ts.LanguageService interface accessible through:
createLanguageservice(...).getSyntacticClassifications()

For completeness, though not related to this issue, there is a third kind of classifiers: semantic classifier. ts.LanguageService.getSemanticClassifications, this one will give you more information about semantic classification for identifiers, e.g are they classes, interfaces, modules.. etc. this one uses the semantic state of the program along with the parse tree to figure out what identifiers really mean.

@mhegazy mhegazy added By Design Deprecated - use "Working as Intended" or "Design Limitation" instead and removed By Design Deprecated - use "Working as Intended" or "Design Limitation" instead labels Dec 14, 2014
@derekcicerone-zz
Copy link
Author

Cool, thanks for the context (I have a much better sense of what each classifier does now). I'm currently calling getClassificationsForLine() (this is for the TypeScript plugin). I'm kinda curious, why are template strings treated differently from old school strings by the lexical classifier? Is there something specific about template strings that makes them different from normal strings in terms of syntax highlighting?

@DanielRosenwasser
Copy link
Member

Hey @derekcicerone, I actually had some work going on with the lexical classifier for template strings (as template strings were mostly my recent work), and for reasons @mhegazy mentioned, the experience was not very up to par.

Just to elaborate for the sake of documentation, there are actually 4 types of template literal tokens: TemplateHead, TemplateMiddle, TemplateTail, and NoSubstitutionTemplateLiteral. The last one is a basic literal that looks like a string. The other three are components of a TemplateExpression. As soon as we have a TemplateHead, we start parsing out these TemplateSpans, which are pairs of Expressions with either a TemplateMiddle or TemplateTail. This is entirely motivated by the ES6 spec, and I encourage you to take a look into it. Just to make sure we're on the same page:

Syntax Kind Example
NoSubstitutionTemplateLiteral Hello world
TemplateHead ``abc ${`
TemplateMiddle } def ${
TemplateTail `} ghi ``
TemplateExpression abc ${ 0 } def ${ 1 } ghi

So for all practical purposes in our lexical classifier, all we can really do is recognize a TemplateHead and a NoSubstitutionTemplateLiteral. This is because in certain context-sensitive positions, the lexer needs to try to deliver tokens with a different "lexical goal" in mind (which we achieve with our rescan* family of functions), but those two tokens are completely accountable for, because no other token starts with a backtick.

While you can see that the lexical classifier uses some basic heuristics to enhance the classifications it delivers per token, we try to keep it as simple as possible while still delivering a decent experience. Knowing how to differentiate between a CloseBraceToken and the beginning of a TemplateMiddle or TemplateTail is a little complex - we'd really rather not maintain a stack across lines (which is also very error-prone given the way in which the lexical classifier operates).

So even so, you might say "Well why can't I at least get my TemplateHeads and NoSubstitutionTemplateLiterals?" Well, problematically when you have a TemplateTail, you end on a backtick. For example:

`Hello${ ' ' }world!`;

Here's the stream of tokens you'd get for that line without rescanning: TemplateHead, StringLiteral, CloseBraceToken, Identifier ("world"), ExclamationToken, NoSubstitutionTemplateLiteral.

Wait, what? NoSubstitutionTemplateLiteral? Yup! Because we can't contextually know that the CloseBraceToken should have been the TemplateTail, and so you now have a template string at the end of the line. Now the rest of your file is temporarily "poisoned" to be classified as a template string, much in the same way that a /* will turn your entire file into a comment - except that we'd be _wrong_ in this instance.

Given that we're aiming to improve our syntactic classifier to subsume the need for fast-acting lexical classifications, it's not worth agitating our users by making the entire file flash to a different color for them when you enter what should be a TemplateTail.

Hope that answers your question!

@derekcicerone-zz
Copy link
Author

Whoa, that's really cool - thanks for the thorough explanation!

@DanielRosenwasser
Copy link
Member

@derekcicerone, you may be interested in my comment on #1698 (comment).

In short, you can get very basic support if you cherry pick off e42ce9c.

@derekcicerone-zz
Copy link
Author

Ah cool, thanks for the tip!

@DanielRosenwasser
Copy link
Member

A fix is now in master through #2026.

@DanielRosenwasser DanielRosenwasser added Fixed A PR has been merged for this issue Bug A bug in TypeScript and removed By Design Deprecated - use "Working as Intended" or "Design Limitation" instead labels Feb 13, 2015
@derekcicerone-zz
Copy link
Author

Awesome, thanks for the fix! I will check this out shortly.

Related question: what is the proper design for using the 3 classifiers? It seems like the lexical classifier can be used synchronously but I'm guessing the semantic/syntactic classifiers need to be used asynchronously. I can add those to some existing async logic for showing occurrences, finding errors as the user is typing etc. I was wondering though, do I call both the syntactic and semantic classifiers? Only one? I'm not really clear on when each is supposed to be used. Any advice would be greatly appreciated!

@derekcicerone-zz
Copy link
Author

This looks great! Much better! Thanks!

@DanielRosenwasser
Copy link
Member

@basarat might also be interested in this for TypeStrong/atom-typescript#71.

You can use the lexical classifier synchronously, and to ensure speed, you probably should, but it ultimately depends on your use case and platform - it's a matter of fine-tuning.

Ideally, you'll have the other two classifiers running concurrently as well (or you can let them take over entirely at some point). As long as you prioritize the classifications of tokens in a certain way such that some can be appropriately overwritten by others, the ordering won't matter in practice. Syntactic classifications should always overwrite lexical classifications, and semantic classifications will overwrite some syntactic classifications (but only for identifiers).

The semantic classifier can be run on the same thread as any other semantically-aware feature (e.g. completions lists), and the syntactic classifier can also be run on the same thread as any syntactically aware feature - though you may want to play around with this; syntactic classifications should also be as fast as possible, so we actually run it on a separate thread.

Glad to hear it's working well for you so far - let me know if you have any other questions/suggestions!

@derekcicerone-zz
Copy link
Author

Cool, thanks for the explanation!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Bug A bug in TypeScript Fixed A PR has been merged for this issue
Projects
None yet
Development

No branches or pull requests

4 participants