Accept underscores in unicode escapes #43716

MaloJaffre · 2017-08-07T15:45:16Z

I don't know if this needs an RFC, but at least the impl is here!

rust-highfive · 2017-08-07T15:45:28Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

kennytm · 2017-08-07T17:31:21Z

src/libsyntax/parse/lexer/mod.rs

    ///
-    /// At this point, we have already seen the \ and the u, the { is the current character. We
-    /// will read at least one digit, and up to 6, and pass over the }.
+    /// At this point, we have already seen the `\` and the `u`, the `{` is the current character. We


The CI is unhappy with this line because it is too long 🙂

[00:03:14] tidy error: /checkout/src/libsyntax/parse/lexer/mod.rs:968: line longer than 100 chars
[00:03:15] some tidy checks failed

kennytm · 2017-08-07T17:33:19Z

issue-43692.rs should be a run-pass test, not a compile-fail test. It should check (with assert_eq!) that '\u{10__FFFF}' and "\u{10_F0FF}foo\u{1_0_0_0}" are equal to their no-underscore counterparts.
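A minimal sketch of the kind of check described above. The function name `check_underscore_escapes` is hypothetical and is not taken from the PR's issue-43692.rs; only the literals come from the comment.

```rust
/// Hypothetical helper: panics if an escape written with underscores
/// differs from its no-underscore counterpart.
fn check_underscore_escapes() {
    // A char literal with a doubled underscore inside the escape.
    assert_eq!('\u{10__FFFF}', '\u{10FFFF}');
    // A string literal mixing two escapes with ordinary text.
    assert_eq!("\u{10_F0FF}foo\u{1_0_0_0}", "\u{10F0FF}foo\u{1000}");
}
```

Calling `check_underscore_escapes()` from a test's `main` would make the test fail at run time, rather than at compile time, if the lexer ever treated the two spellings differently.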

arielb1 · 2017-08-08T11:00:54Z

r? @petrochenkov

petrochenkov · 2017-08-09T22:18:28Z

@SimonSapin what do you think about this?

Also ping @rust-lang/lang
This probably doesn't need a full RFC, but will certainly need an FCP.

SimonSapin · 2017-08-09T22:33:35Z

Making numeric escape sequences consistent with integer literals makes sense to me. 👍

joshtriplett · 2017-08-09T23:27:00Z

Please don't allow prefixes or suffixes, but otherwise this seems like a great idea.

petrochenkov · 2017-08-12T19:08:18Z

@MaloJaffre
Could you somehow share the lexing code between unicode escapes and normal hexadecimal literals to ensure the rules are identical?
For example, scan_digits can be reused for unicode escapes.
Unicode escapes are more restrictive, but the restrictions could be enforced after a unicode escape is lexed (this can also give better error reporting and recovery).
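A rough sketch of the "scan permissively, then enforce restrictions" idea, written as standalone code rather than against the real lexer. Everything here is an assumption for illustration: `EscapeError`, `scan_hex_digits` and `parse_unicode_escape_body` are hypothetical names, and the function works on the text between `{` and `}` instead of on the lexer's character stream. The restrictions checked are the ones mentioned in this thread: no leading `_`, one to six hex digits, at most 10FFFF, and no surrogates.

```rust
/// Reasons a `\u{...}` body can be rejected (illustrative names only).
#[derive(Debug, PartialEq)]
enum EscapeError {
    Empty,
    LeadingUnderscore,
    InvalidChar(char),
    TooManyDigits,
    OutOfRange,
    Surrogate,
}

/// Step 1: permissive scan, in the spirit of a shared digit scanner.
/// Collects hex digits, skips `_`, and remembers the first invalid char.
fn scan_hex_digits(body: &str) -> (String, Option<char>) {
    let mut digits = String::new();
    let mut first_invalid = None;
    for c in body.chars() {
        if c == '_' {
            continue;
        }
        if c.is_ascii_hexdigit() {
            digits.push(c);
        } else if first_invalid.is_none() {
            first_invalid = Some(c);
        }
    }
    (digits, first_invalid)
}

/// Step 2: enforce the escape-specific restrictions after scanning.
fn parse_unicode_escape_body(body: &str) -> Result<char, EscapeError> {
    if body.starts_with('_') {
        return Err(EscapeError::LeadingUnderscore);
    }
    let (digits, first_invalid) = scan_hex_digits(body);
    // Report invalid characters before emptiness, so a body like `#`
    // is not described as an empty escape.
    if let Some(c) = first_invalid {
        return Err(EscapeError::InvalidChar(c));
    }
    if digits.is_empty() {
        return Err(EscapeError::Empty);
    }
    if digits.len() > 6 {
        return Err(EscapeError::TooManyDigits);
    }
    let value = u32::from_str_radix(&digits, 16).map_err(|_| EscapeError::OutOfRange)?;
    if value > 0x10FFFF {
        return Err(EscapeError::OutOfRange);
    }
    if (0xD800..=0xDFFF).contains(&value) {
        return Err(EscapeError::Surrogate);
    }
    char::from_u32(value).ok_or(EscapeError::OutOfRange)
}
```

Because the scan itself never fails, each restriction gets its own error variant, which is one way the separate validation step can lead to more specific messages.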

MaloJaffre · 2017-08-13T18:01:06Z

Thanks for the suggestion @petrochenkov.
I've also rebased on master.

Edit: Travis failure looks spurious (workers failed to start)

petrochenkov · 2017-08-17T15:33:33Z

src/libsyntax/parse/lexer/mod.rs

+        loop {
+            match self.ch {
+                Some('}') => {
+                    if valid && count == 0 {


if count == 0 would give the same result

No, because in the case of \u{#} we don't want to say that the escape is empty, so we check that there were no invalid characters before.

Ah, right, this is in a loop, okay then.

petrochenkov · 2017-08-17T15:36:06Z

src/libsyntax/parse/lexer/mod.rs

+                        self.err_span_char(start_bpos,
+                                           self.pos,
+                                           "invalid character in unicode escape",
+                                           c);


This error can now be reported a lot of times in case of unterminated unicode escapes.
It probably should be reported only the first time.

petrochenkov · 2017-08-17T15:43:55Z

src/libsyntax/parse/mod.rs

+                            diag.struct_span_err(span, "invalid unicode character escape")
+                                .help("unicode escape must be at most 10FFFF")
+                                .emit();
+                                None


I think you can avoid changing the return type to Option here and just return something like the replacement character U+FFFD.
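As an illustration of that suggestion, a small standalone sketch. The helper `char_or_replacement` and the `errors` vector are hypothetical stand-ins for the parser's real diagnostics machinery, which is not shown in this thread.

```rust
/// Hypothetical helper: convert a scanned code point to a `char`.
/// On failure it records an error and falls back to U+FFFD, so the
/// return type can stay `char` instead of becoming `Option<char>`.
fn char_or_replacement(value: u32, errors: &mut Vec<String>) -> char {
    match char::from_u32(value) {
        Some(c) => c,
        None => {
            // `char::from_u32` rejects surrogates and values above 10FFFF.
            errors.push(format!("invalid unicode character escape: {:X}", value));
            '\u{FFFD}'
        }
    }
}
```

The error is still reported, but callers keep receiving a usable `char` and can continue processing the rest of the literal.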

@MaloJaffre
Could you also squash commits after updating the PR?

Thanks for the review @petrochenkov!

Ok, I will shortly do another round of changes and squash everything.

petrochenkov · 2017-08-17T15:52:43Z

Implementation LGTM, modulo comments.

@rfcbot fcp merge

petrochenkov · 2017-08-17T15:55:59Z

I have no rights for @rfcbot, could someone start an FCP?

Fixes rust-lang#43692.

MaloJaffre · 2017-08-17T18:08:46Z

@petrochenkov Done.
I've also added a more precise help message for surrogates.

Edit: Travis failure looks spurious (OSX jobs failed to start).

MaloJaffre · 2017-08-25T21:26:53Z

Friendly ping @nikomatsakis, to start an FCP, if there are no concerns about the implementation.

petrochenkov · 2017-08-25T21:44:13Z

@MaloJaffre
There's one more thing that I forgot about: this needs a feature gate (unless the lang team decides it doesn't), #![feature(unicode_escape_underscores)] or something.
See 50ecee2 for an example of how to add it.
"Parse session" is available from the lexer, so it shouldn't be a problem (I think).

SimonSapin · 2017-08-25T21:54:59Z

I think this is fine without a feature gate. (Though I’m not in any team that would make that decision.)

aturon · 2017-08-25T22:01:33Z

@rfcbot fcp merge

rfcbot · 2017-08-25T22:32:45Z

Team member @aturon has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

pnkfelix · 2017-08-30T23:06:19Z

src/libsyntax/parse/lexer/mod.rs

-            self.bump();
-            count += 1;
+        if let Some('_') = self.ch {
+            // disallow leading `_`


Do we need a compile-fail test checking that a leading _ is disallowed?

There is already a parse-fail test that checks that; do I need to move it to compile-fail?

rfcbot · 2017-09-01T22:29:24Z

🔔 This is now entering its final comment period, as per the review above. 🔔

rfcbot · 2017-09-11T22:31:20Z

The final comment period is now complete.

petrochenkov · 2017-09-11T23:01:59Z

The final comment period is now complete.

@bors r+

bors · 2017-09-11T23:02:00Z

📌 Commit d4e0e52 has been approved by petrochenkov

bors · 2017-09-12T01:25:29Z

⌛ Testing commit d4e0e52 with merge 11f64d8...

Accept underscores in unicode escapes

Fixes #43692.

I don't know if this needs an RFC, but at least the impl is here!

bors · 2017-09-12T04:14:02Z

☀️ Test successful - status-appveyor, status-travis
Approved by: petrochenkov
Pushing 11f64d8 to master...

chris-morgan · 2017-09-20T08:29:30Z

It is worth noting that most syntax highlighters will need updating to support this. (I just did Vim.)

We need something like a mailing list for syntax highlighters where syntax changes can be announced.

Regular expression highlighters will now need something like \\u\{(?:\x_*){1,6}\}.
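For highlighters that do not use regular expressions, the same shape can be matched by hand. The sketch below is an assumption-laden illustration: `match_unicode_escape` is a hypothetical function, and it reads `\x` in the pattern above as "one hex digit", which is how Vim's regex syntax defines it.

```rust
/// Hypothetical matcher: if `s` starts with a unicode escape, return the
/// escape's length in bytes. It accepts "one hex digit followed by any
/// number of underscores", repeated 1 to 6 times, between `\u{` and `}`.
fn match_unicode_escape(s: &str) -> Option<usize> {
    if !s.starts_with("\\u{") {
        return None;
    }
    let bytes = s.as_bytes();
    let mut i = 3; // skip the three bytes of `\u{`
    let mut digits = 0;
    while digits < 6 && i < bytes.len() && bytes[i].is_ascii_hexdigit() {
        digits += 1;
        i += 1;
        // Underscores may only follow a digit, so a leading `_` never matches.
        while i < bytes.len() && bytes[i] == b'_' {
            i += 1;
        }
    }
    if digits >= 1 && i < bytes.len() && bytes[i] == b'}' {
        Some(i + 1)
    } else {
        None
    }
}
```

Like the pattern, this only recognises the lexical shape; it does not check that the value is at most 10FFFF or that it is not a surrogate.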

Rust gained this in rust-lang/rust#43716.

Changes in Rust highlighting:
* Add missing types (keywords) and missing integer suffixes: `i128` & `u128`
  * https://doc.rust-lang.org/std/primitive.i128.html
  * https://doc.rust-lang.org/std/primitive.u128.html
* Fix: Allow underscores in unicode escapes: rust-lang/rust#43716

Sources:
* Rust documentation: https://doc.rust-lang.org/
* Rust in Vim: https://github.com/rust-lang/rust.vim/blob/master/syntax/rust.vim
* Rust in ACE: https://github.com/ajaxorg/ace/blob/master/lib/ace/mode/rust_highlight_rules.js

rust-highfive assigned nikomatsakis Aug 7, 2017

MaloJaffre force-pushed the _-in-literals branch from 47416d3 to 7d1fa02 August 7, 2017 15:47

kennytm reviewed Aug 7, 2017

arielb1 assigned petrochenkov and unassigned nikomatsakis Aug 8, 2017

arielb1 added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) Aug 8, 2017

MaloJaffre force-pushed the _-in-literals branch from 14085a8 to 0bac86c August 13, 2017 18:00

petrochenkov reviewed Aug 17, 2017

Accept underscores in unicode escapes

d4e0e52

Fixes rust-lang#43692.

MaloJaffre force-pushed the _-in-literals branch from 0bac86c to d4e0e52 August 17, 2017 18:04

petrochenkov approved these changes Aug 17, 2017

petrochenkov removed the S-waiting-on-author label (Status: This is awaiting some action (such as code changes or more information) from the author.) Aug 20, 2017

rfcbot added the proposed-final-comment-period label (Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off.) Aug 25, 2017

pnkfelix reviewed Aug 30, 2017

rfcbot added the final-comment-period label (In the final comment period and will be merged soon unless new substantive objections are raised.) and removed the proposed-final-comment-period label (Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off.) Sep 1, 2017

bors added a commit that referenced this pull request Sep 12, 2017

Auto merge of #43716 - MaloJaffre:_-in-literals, r=petrochenkov

11f64d8

bors merged commit d4e0e52 into rust-lang:master Sep 12, 2017

MaloJaffre deleted the _-in-literals branch September 12, 2017 05:08

chris-morgan added a commit to rust-lang/rust.vim that referenced this pull request Sep 20, 2017

Support underscores in Unicode escapes

32d5688

Rust gained this in rust-lang/rust#43716.

ehuss mentioned this pull request Sep 25, 2017

Add grammar for char, string, byte, and raw literals rust-lang/reference#121

Merged

dtolnay mentioned this pull request Dec 25, 2017

Fails to lex underscore in unicode escapes dtolnay/proc-macro2#37

Closed
