
Implement tokenization for some items in proc_macro #43230

Merged: 4 commits merged into rust-lang:master from the more-tokenstream branch on Jul 28, 2017

Conversation

alexcrichton
Member

This PR is a partial implementation of #43081 aimed at preserving span information in attribute-like procedural macros. Currently all attribute-like macros lose span information on the input token stream as soon as it is iterated over, because the compiler cannot losslessly tokenize an AST node. This PR takes the strategy of saving off a list of tokens in particular AST nodes so that a lossless tokenized version can be returned. There are a few limitations with this PR, however, so the old fallback remains in place.
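
For readers skimming the thread, here is a minimal sketch of the caching-with-fallback strategy the description refers to. Everything in it (`Item`, `to_token_stream`, `stringify_and_relex`, and `Vec<String>` standing in for a real `TokenStream`) is an illustrative assumption, not the PR's actual code:

```rust
// Illustrative sketch only: an AST item that optionally carries the tokens it
// was parsed from, plus the lossless-or-fallback conversion used when handing
// tokens to a procedural macro.
#[derive(Clone, Debug)]
struct Item {
    name: String,
    // `Some(..)` only when the parser saved the original tokens for this item.
    tokens: Option<Vec<String>>,
}

fn to_token_stream(item: &Item) -> Vec<String> {
    match &item.tokens {
        // Preferred path: the cached tokens keep their original spans.
        Some(tokens) => tokens.clone(),
        // Fallback path: pretty-print and re-lex, which loses span information.
        None => stringify_and_relex(item),
    }
}

fn stringify_and_relex(item: &Item) -> Vec<String> {
    // A real implementation would pretty-print the AST and run the lexer;
    // splitting a rendered string stands in for that here.
    format!("fn {}() {{}}", item.name)
        .split_whitespace()
        .map(str::to_owned)
        .collect()
}

fn main() {
    let cached = Item {
        name: "f".to_string(),
        tokens: Some(vec!["fn".into(), "f".into(), "(".into(), ")".into(), "{".into(), "}".into()]),
    };
    let plain = Item { name: "g".to_string(), tokens: None };
    println!("{:?}", to_token_stream(&cached)); // lossless, from the cache
    println!("{:?}", to_token_stream(&plain));  // span-less fallback
}
```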

@rust-highfive
Collaborator

r? @pnkfelix

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Member Author

r? @jseyfried

cc @nrc

There are a few comments in the code, particularly around handling inner attributes, which I'd be very curious to hear others' thoughts on! I'm not sure that this is the best solution long-term, but I figured it'd be good to at least put this up for review!

@rust-highfive rust-highfive assigned jseyfried and unassigned pnkfelix Jul 14, 2017
@kennytm
Member

kennytm commented Jul 14, 2017

Several compile-fail tests failed.

```
[00:40:29] failures:
[00:40:29]     [compile-fail] compile-fail/import-prefix-macro-1.rs
[00:40:29]     [compile-fail] compile-fail/issue-20616-2.rs
[00:40:29]     [compile-fail] compile-fail/issue-39616.rs
[00:40:29]     [compile-fail] compile-fail/privacy/restricted/tuple-struct-fields/test.rs
[00:40:29]     [compile-fail] compile-fail/privacy/restricted/tuple-struct-fields/test2.rs
[00:40:29]     [compile-fail] compile-fail/privacy/restricted/tuple-struct-fields/test3.rs
[00:40:29]     [compile-fail] compile-fail/self-vs-path-ambiguity.rs
[00:40:29]
[00:40:29] test result: FAILED. 2683 passed; 7 failed; 13 ignored; 0 measured; 0 filtered out
```

@shepmaster shepmaster added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 14, 2017
@arielb1
Contributor

arielb1 commented Jul 18, 2017

ping @jseyfried for a review.

@alexcrichton
Member Author

r? @nrc

@rust-highfive rust-highfive assigned nrc and unassigned jseyfried Jul 22, 2017
@jseyfried
Contributor

Reviewed, r=me once @nrc gets a chance to peruse (fn collect_tokens could use a second pair of eyes).

I believe we can improve the LastToken data model, but deferring that to another PR is fine. I think the best way forward is to extend the Cursor API to allow efficient slicing.
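
For context on `collect_tokens` and the `LastToken` model mentioned above, here is a hedged, self-contained sketch of how such a collector can work. The names (`Cursor`, `bump`, `collect_tokens`) and the `String` token type are simplified stand-ins, not the parser's real API:

```rust
// Illustrative only: a token source that can switch into "collecting" mode.
enum LastToken {
    // While collecting tokens for an item, every consumed token is appended here.
    Collecting(Vec<String>),
    // Otherwise only the most recently consumed token is remembered.
    Was(Option<String>),
}

struct Cursor {
    tokens: Vec<String>,
    pos: usize,
    last: LastToken,
}

impl Cursor {
    // Consume the next token, recording it according to the current mode.
    fn bump(&mut self) -> Option<String> {
        let tok = self.tokens.get(self.pos).cloned()?;
        self.pos += 1;
        match &mut self.last {
            LastToken::Collecting(buf) => buf.push(tok.clone()),
            LastToken::Was(prev) => *prev = Some(tok.clone()),
        }
        Some(tok)
    }

    // Run `f`, returning both its result and every token it consumed.
    fn collect_tokens<T>(&mut self, f: impl FnOnce(&mut Self) -> T) -> (T, Vec<String>) {
        let prev = std::mem::replace(&mut self.last, LastToken::Collecting(Vec::new()));
        let result = f(self);
        let collected = match std::mem::replace(&mut self.last, prev) {
            LastToken::Collecting(buf) => buf,
            LastToken::Was(_) => Vec::new(),
        };
        (result, collected)
    }
}

fn main() {
    let mut cursor = Cursor {
        tokens: vec!["fn".into(), "f".into(), "(".into(), ")".into(), "{".into(), "}".into()],
        pos: 0,
        last: LastToken::Was(None),
    };
    let (_result, collected) = cursor.collect_tokens(|c| while c.bump().is_some() {});
    println!("collected: {:?}", collected);
}
```

In this toy version every consumed token is copied into the collection buffer; a `Cursor` API that supports efficient slicing, as suggested above, would avoid that per-token copying.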

@alexcrichton
Member Author

Friendly triage ping for you @nrc!


```rust
match nt.0 {
    Nonterminal::NtItem(ref item) => {
        tokens = item.tokens.as_ref();
```
Contributor

Since `item.tokens` doesn't include outer attributes, shouldn't we add them back in here?
For example, consider:

```rust
#[my_proc_macro]
#[some_attribute]
fn f() {}
```

Today, `my_proc_macro` sees `#[some_attribute] fn f() {}`; after this PR, I think it would just see `fn f() {}`.

Member Author

Ah ok, excellent point! I wasn't exactly sure how this worked. I think for now that'll have to fall back to the stringification path, but I'll implement that and add a test.

Contributor

Could we just use stringification to get the attributes' tokens and then prepend them to the item's real tokens?
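
A rough sketch of that suggestion, with invented names and whitespace-split strings standing in for real attribute and token types (this is not the code that eventually landed):

```rust
// Illustrative only: re-lex the outer attributes' stringified form and prepend
// those tokens to the item's cached tokens, so a proc macro still sees
// `#[some_attribute] fn f() {}` rather than just `fn f() {}`.
fn tokens_with_outer_attrs(attrs: &[String], item_tokens: &[String]) -> Vec<String> {
    let mut out = Vec::new();
    for attr in attrs {
        // In the compiler this step would be "stringify the attribute, then lex it";
        // the attribute tokens lose their spans, but the item's tokens keep theirs.
        out.extend(attr.split_whitespace().map(str::to_owned));
    }
    out.extend(item_tokens.iter().cloned());
    out
}

fn main() {
    let attrs = vec!["# [ some_attribute ]".to_string()];
    let item: Vec<String> =
        vec!["fn".into(), "f".into(), "(".into(), ")".into(), "{".into(), "}".into()];
    println!("{:?}", tokens_with_outer_attrs(&attrs, &item));
}
```

Note that the commit messages later in the thread indicate the fallback still remained for some attribute cases, especially inner attributes, so treat this purely as a picture of the suggestion.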

Member Author

Another excellent point!

@nrc
Member

nrc commented Jul 28, 2017

So, I thought I understood inner attributes, but I recently found out that I don't :-s So I'm not sure I'll be helpful on those things.

I feel like adding tokens directly to `Item` is somehow wrong. I think it would be better to add them to the `Span`. I suppose the former is more practical for now since we're only adding them to items, not all nodes. We might also replace the span with tokens, so I'm not sure which is best in the long term. Which is to say, this is probably fine in the short term.

@nrc nrc left a comment (Member)

r+ with jseyfried's attribute comment addressed and the extra comment.

```rust
#[derive(Clone)]
enum LastToken {
    // Accumulates every token consumed while tokens are being collected for an item.
    Collecting(Vec<TokenTree>),
    // Otherwise just remembers the most recently consumed token.
    Was(Option<TokenTree>),
}
```
Member

Could you comment on how this enum is used please?

This commit adds a new field to the `Item` AST node in libsyntax to optionally contain the original token stream that the item itself was parsed from. This is currently `None` everywhere but is intended for use later with procedural macros.

This partly resolves the `FIXME` located in `src/libproc_macro/lib.rs` when interpreting interpolated tokens. All instances of `ast::Item` which have a list of tokens attached to them now use that list of tokens to losslessly get converted into a `TokenTree` instead of going through stringification and losing span information.

cc rust-lang#43081

This test currently fails because the tokenization of an AST item during the expansion of a procedural macro attribute round-trips through strings, losing span information.

This is then later used by `proc_macro` to generate a new `proc_macro::TokenTree` which preserves span information. Unfortunately this isn't a bullet-proof approach, as it doesn't handle the case when there are still other attributes on the item, especially inner attributes.

Despite this, the intention here is to solve the primary use case for procedural attributes: attached to functions as outer attributes, likely bare. In this situation we should now be able to yield a lossless stream of tokens to preserve span information.
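
To connect the commit messages above with the `match nt.0` fragment quoted earlier in the review, here is a self-contained sketch of that dispatch. `Nonterminal`, `NtItem`, and the item and token types are simplified stand-ins, not libsyntax's real definitions:

```rust
// Simplified stand-ins for libsyntax types; illustrative only.
struct Item {
    tokens: Option<Vec<String>>,
}

enum Nonterminal {
    NtItem(Item),
    Other,
}

// Prefer an interpolated item's cached tokens; `None` means the caller has to
// fall back to stringifying and re-lexing the AST node.
fn cached_tokens(nt: &Nonterminal) -> Option<&Vec<String>> {
    match nt {
        Nonterminal::NtItem(item) => item.tokens.as_ref(),
        Nonterminal::Other => None,
    }
}

fn main() {
    let nt = Nonterminal::NtItem(Item {
        tokens: Some(vec!["fn".into(), "f".into(), "(".into(), ")".into(), "{".into(), "}".into()]),
    });
    println!("{:?}", cached_tokens(&nt));
}
```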
@alexcrichton
Member Author

@bors: r=nrc,jseyfried

@bors
Contributor

bors commented Jul 28, 2017

📌 Commit 4886ec8 has been approved by nrc,jseyfried

@alexcrichton
Member Author

@bors: r=nrc,jseyfried

@bors
Contributor

bors commented Jul 28, 2017

💡 This pull request was already approved, no need to approve it again.

@bors
Contributor

bors commented Jul 28, 2017

📌 Commit 4886ec8 has been approved by nrc,jseyfried

@bors
Contributor

bors commented Jul 28, 2017

⌛ Testing commit 4886ec8 with merge 126321e...

bors added a commit that referenced this pull request Jul 28, 2017
Implement tokenization for some items in proc_macro

This PR is a partial implementation of #43081 aimed at preserving span information in attribute-like procedural macros. Currently all attribute-like macros lose span information on the input token stream as soon as it is iterated over, because the compiler cannot losslessly tokenize an AST node. This PR takes the strategy of saving off a list of tokens in particular AST nodes so that a lossless tokenized version can be returned. There are a few limitations with this PR, however, so the old fallback remains in place.
@bors
Contributor

bors commented Jul 28, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: nrc,jseyfried
Pushing 126321e to master...

@bors bors merged commit 4886ec8 into rust-lang:master Jul 28, 2017
@alexcrichton alexcrichton deleted the more-tokenstream branch August 22, 2017 19:44
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Apr 10, 2018
This commit makes us even more pessimistic about using the cached `TokenStream` inside of an AST node. As a reminder, the `proc_macro` API requires taking an arbitrary AST node and transforming it back into a `TokenStream` to hand off to a procedural macro. Such functionality isn't actually implemented in rustc today, so the way `proc_macro` works today is to stringify an AST node and then reparse it into a list of tokens.

This strategy unfortunately loses all span information, so we try to avoid it whenever possible. As implemented in rust-lang#43230, some AST nodes have a `TokenStream` cache representing the tokens they were originally parsed from. This `TokenStream` cache, however, has turned out to not always reflect the current state of the item when it's being tokenized. For example, `#[cfg]` processing or macro expansion could modify the state of an item. Consequently we've seen a number of bugs (rust-lang#48644 and rust-lang#49846) related to using this stale cache.

This commit tweaks the usage of the cached `TokenStream` to compare it to our lossy stringification of the token stream. If the tokens that make up the cache and the stringified token stream are the same, then we return the cached version (which has correct span information). If they differ, however, we return the stringified version, as the cache has been invalidated and we simply haven't detected it yet.

Closes rust-lang#48644
Closes rust-lang#49846
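
The check described above can be pictured with a short sketch; the helper name (`select_tokens`) and the string-based tokens are assumptions for illustration, not rustc's actual code:

```rust
// Illustrative only: trust the cached tokens only if they still agree with the
// pretty-printed-and-relexed tokens; otherwise the cache is considered stale.
fn select_tokens(cached: Option<Vec<String>>, stringified: Vec<String>) -> Vec<String> {
    match cached {
        // The cache still matches the item: keep it, since it carries real spans.
        Some(c) if c == stringified => c,
        // The cache is missing or stale (e.g. `#[cfg]` processing or macro
        // expansion changed the item): fall back to the span-less tokens.
        _ => stringified,
    }
}

fn main() {
    let fresh: Vec<String> =
        vec!["fn".into(), "f".into(), "(".into(), ")".into(), "{".into(), "}".into()];
    // A stale cache that still contains an attribute which `#[cfg]` processing removed.
    let stale: Vec<String> = vec![
        "#".into(), "[".into(), "cfg(unix)".into(), "]".into(),
        "fn".into(), "f".into(), "(".into(), ")".into(), "{".into(), "}".into(),
    ];
    println!("{:?}", select_tokens(Some(fresh.clone()), fresh.clone())); // cached wins
    println!("{:?}", select_tokens(Some(stale), fresh));                 // fallback wins
}
```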
kennytm added a commit to kennytm/rust that referenced this pull request Apr 14, 2018
…r=nrc

proc_macro: Avoid cached TokenStream more often

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Jul 22, 2018
Ever plagued by rust-lang#43081, the compiler can return surprising spans in situations related to procedural macros. This is exhibited by rust-lang#47983, where a procedural macro invoked in a nested item context would fail to have correct span information.

While rust-lang#43230 provided a "hack" to cache the token stream used for each item in the compiler, it's not a full-blown solution. This commit continues to extend this "hack" a bit further so that it works for nested items.

Previously the parser's `parse_item` method would collect the tokens for an item into a cache on the item itself. It turned out, however, that nested items were parsed through the `parse_item_` method, so they didn't receive similar treatment. To remedy this situation the hook for collecting tokens was moved into `parse_item_` instead of `parse_item`.

Afterwards the token collection scheme was updated to support nested collection of tokens. This is implemented by tracking `TokenStream` tokens instead of `TokenTree` tokens, allowing items to be collected into streams at intermediate layers and interleaved in the upper layers.

All in all, this...

Closes rust-lang#47983
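
A hedged sketch of the nested-collection idea follows; the `Collector` type and its methods are invented for illustration and only approximate the parser change described above (the real change tracks `TokenStream`s, not plain strings):

```rust
// Illustrative only: keep one buffer per item currently being collected, so a
// token consumed while parsing a nested item is also recorded for every
// enclosing item that is still collecting.
struct Collector {
    buffers: Vec<Vec<String>>,
}

impl Collector {
    fn new() -> Self {
        Collector { buffers: Vec::new() }
    }

    // Record a consumed token in every in-progress collection.
    fn consume(&mut self, tok: &str) {
        for buf in &mut self.buffers {
            buf.push(tok.to_string());
        }
    }

    // Collect the tokens consumed while `f` runs; nesting is allowed.
    fn collect<T>(&mut self, f: impl FnOnce(&mut Self) -> T) -> (T, Vec<String>) {
        self.buffers.push(Vec::new());
        let result = f(self);
        let tokens = self.buffers.pop().expect("buffer pushed above");
        (result, tokens)
    }
}

fn main() {
    let mut c = Collector::new();
    let (_, outer) = c.collect(|c| {
        for t in ["mod", "m", "{"] {
            c.consume(t);
        }
        // A nested item collected while the outer `mod` is still being collected.
        let (_, inner) = c.collect(|c| {
            for t in ["fn", "f", "(", ")", "{", "}"] {
                c.consume(t);
            }
        });
        println!("inner: {:?}", inner);
        c.consume("}");
    });
    println!("outer: {:?}", outer);
}
```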
bors added a commit that referenced this pull request Jul 24, 2018
rustc: Implement tokenization of nested items
