Implement tokenization for some items in proc_macro #43230
Conversation
r? @pnkfelix (rust_highfive has picked a reviewer for you, use r? to override)
r? @jseyfried cc @nrc

There are a few comments in the code, particularly around handling inner attributes, which I'd be very curious to hear others' thoughts on! I'm not sure that this is the best solution long-term, but I figured it'd be good to at least put this up for review!
Several compile-fail tests failed.
Force-pushed from 8124ba5 to 8c31595
Force-pushed from 8c31595 to b26b7f1
ping @jseyfried for a review.
r? @nrc
Reviewed, r=me once @nrc gets a chance to peruse (I believe we can improve the …
Friendly triage ping for you @nrc!
src/libproc_macro/lib.rs
Outdated
match nt.0 {
    Nonterminal::NtItem(ref item) => {
        tokens = item.tokens.as_ref();
Since `item.tokens` doesn't include outer attributes, shouldn't we add them back in here? For example, consider:

#[my_proc_macro]
#[some_attribute]
fn f() {}

Today, `my_proc_macro` sees `#[some_attribute] fn f() {}`; after this PR, I think it would just see `fn f() {}`.
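For readers who want to see this distinction first-hand, here is a minimal attribute macro that just prints the tokens it receives. This is a sketch using the public `proc_macro` API; the crate setup is the usual proc-macro crate layout and is not part of this PR:

```rust
// lib.rs of a proc-macro crate (Cargo.toml: [lib] proc-macro = true).
extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn my_proc_macro(_attr: TokenStream, item: TokenStream) -> TokenStream {
    // The macro's own attribute is consumed before expansion, so with the
    // example above this prints something like `#[some_attribute] fn f() {}`.
    println!("{}", item);
    item
}
```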
Ah ok, excellent point! I wasn't exactly sure how this worked. I think for now that'll have to fall back to the stringification path, but I'll implement that and add a test.
Could we just use stringification to get the attributes' tokens and then prepend them to the item's real tokens?
Another excellent point!
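For illustration only, the suggestion above can be sketched with today's public `proc_macro` API: stringify the attribute, re-parse the text into a `TokenStream`, and prepend it to the item's real tokens. The helper name is hypothetical; this is not the PR's in-compiler implementation.

```rust
extern crate proc_macro;
use proc_macro::TokenStream;

// Hypothetical helper: re-parse attribute text into tokens and prepend them
// to the item's tokens. Only the attribute's spans are synthetic; the item's
// tokens keep whatever spans they already carry.
fn prepend_attr(attr_text: &str, item: TokenStream) -> TokenStream {
    let mut out: TokenStream = attr_text
        .parse()
        .expect("attribute text should re-parse into tokens");
    out.extend(item);
    out
}
```

Inside an attribute macro one would call something like `prepend_attr("#[some_attribute]", item)` once per outer attribute that isn't part of the cached token stream.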
So, I thought I understood inner attributes, but I recently found out that I don't :-s So I'm not sure I'll be helpful on those things. I feel like adding …
r+ with jseyfried's attribute comment addressed and the extra comment.
src/libsyntax/parse/parser.rs
Outdated
#[derive(Clone)]
enum LastToken {
    Collecting(Vec<TokenTree>),
    Was(Option<TokenTree>),
Could you comment on how this enum is used please?
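For readers following along, here is one plausible way to document the enum, inferred from how token collection is described elsewhere in this PR rather than taken from the author's code:

```rust
#[derive(Clone)]
enum LastToken {
    // While `collect_tokens` is active, every token the parser consumes is
    // buffered here so the item's original token stream can be rebuilt.
    Collecting(Vec<TokenTree>),
    // Otherwise only the most recently consumed token (if any) is
    // remembered, so a new collection can start from the current position.
    Was(Option<TokenTree>),
}
```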
This commit adds a new field to the `Item` AST node in libsyntax to optionally contain the original token stream that the item itself was parsed from. This is currently `None` everywhere but is intended for use later with procedural macros.
This partly resolves the `FIXME` located in `src/libproc_macro/lib.rs` when interpreting interpolated tokens. All instances of `ast::Item` which have a list of tokens attached to them now use that list of tokens to be losslessly converted into a `TokenTree` instead of going through stringification and losing span information. cc rust-lang#43081
This test currently fails because the tokenization of an AST item during the expansion of a procedural macro attribute round-trips through strings, losing span information.
This is then later used by `proc_macro` to generate a new `proc_macro::TokenTree` which preserves span information. Unfortunately this isn't a bullet-proof approach, as it doesn't handle the case where there are still other attributes on the item, especially inner attributes. Despite this, the intention here is to solve the primary use case for procedural attributes: outer attributes attached to functions, likely bare. In this situation we should now be able to yield a lossless stream of tokens that preserves span information.
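Concretely, the change described in these commit messages amounts to something like the following sketch; the surrounding fields are abbreviated from libsyntax and may not match the real definition exactly, and only `tokens` is new:

```rust
// Approximate shape of `ast::Item` after this change (sketch, not verbatim).
pub struct Item {
    pub ident: Ident,
    pub attrs: Vec<Attribute>,
    pub id: NodeId,
    pub node: ItemKind,
    pub vis: Visibility,
    pub span: Span,
    // Original tokens this item was parsed from, if any. Starts out `None`
    // everywhere; once populated, `proc_macro` can hand out these tokens
    // instead of stringifying and re-parsing the item.
    pub tokens: Option<TokenStream>,
}
```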
Force-pushed from b26b7f1 to 4886ec8
@bors: r=nrc,jseyfried
📌 Commit 4886ec8 has been approved by |
@bors: r=nrc,jseyfried
💡 This pull request was already approved, no need to approve it again.
📌 Commit 4886ec8 has been approved by |
Implement tokenization for some items in proc_macro

This PR is a partial implementation of #43081 targeted towards preserving span information in attribute-like procedural macros. Currently all attribute-like macros will lose span information with the input token stream if it's iterated over, due to the inability of the compiler to losslessly tokenize an AST node. This PR takes the strategy of saving off a list of tokens in particular AST nodes to return a lossless tokenized version. There are a few limitations with this PR, however, so the old fallback remains in place.
☀️ Test successful - status-appveyor, status-travis
proc_macro: Avoid cached TokenStream more often

This commit adds even more pessimization to use the cached `TokenStream` inside of an AST node. As a reminder, the `proc_macro` API requires taking an arbitrary AST node and transforming it back into a `TokenStream` to hand off to a procedural macro. Such functionality isn't actually implemented in rustc today, so the way `proc_macro` works today is that it stringifies an AST node and then reparses it for a list of tokens. This strategy unfortunately loses all span information, so we try to avoid it whenever possible.

Implemented in rust-lang#43230, some AST nodes have a `TokenStream` cache representing the tokens they were originally parsed from. This `TokenStream` cache, however, has turned out to not always reflect the current state of the item when it's being tokenized. For example `#[cfg]` processing or macro expansion could modify the state of an item. Consequently we've seen a number of bugs (rust-lang#48644 and rust-lang#49846) related to using this stale cache.

This commit tweaks the usage of the cached `TokenStream` to compare it to our lossy stringification of the token stream. If the tokens that make up the cache and the stringified token stream are the same, then we return the cached version (which has correct span information). If they differ, however, then we return the stringified version, as the cache has been invalidated and we just haven't figured that out.

Closes rust-lang#48644
Closes rust-lang#49846
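In rough pseudocode, the fallback described above looks like the following; the helper names `reparse_from_string` and `streams_equal_ignoring_spans` are illustrative stand-ins, not the compiler's actual functions:

```rust
// Sketch of the described logic: prefer the cached token stream only when it
// still matches a fresh stringify-and-reparse of the item. Otherwise the
// cache is stale (e.g. after #[cfg] stripping or macro expansion) and the
// re-parsed tokens are used instead.
fn item_to_token_stream(item: &ast::Item) -> TokenStream {
    // Lossy path: pretty-print the item and re-parse it (spans are lost).
    let reparsed = reparse_from_string(&pprust::item_to_string(item));

    if let Some(ref cached) = item.tokens {
        // Compare token-by-token, ignoring spans.
        if streams_equal_ignoring_spans(cached, &reparsed) {
            return cached.clone(); // cache still accurate: keep real spans
        }
    }
    reparsed
}
```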
rustc: Implement tokenization of nested items

Ever plagued by #43081, the compiler can return surprising spans in situations related to procedural macros. This is exhibited by #47983, where whenever a procedural macro is invoked in a nested item context it would fail to have correct span information. While #43230 provided a "hack" to cache the token stream used for each item in the compiler, it's not a full-blown solution.

This commit continues to extend this "hack" a bit more to work for nested items. Previously in the parser the `parse_item` method would collect the tokens for an item into a cache on the item itself. It turned out, however, that nested items were parsed through the `parse_item_` method, so they didn't receive similar treatment. To remedy this situation the hook for collecting tokens was moved into `parse_item_` instead of `parse_item`.

Afterwards the token collection scheme was updated to support nested collection of tokens. This is implemented by tracking `TokenStream` tokens instead of `TokenTree` to allow for collecting items into streams at intermediate layers and having them interleaved in the upper layers.

All in all, this...

Closes #47983
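The nested-collection idea can be illustrated with a small, self-contained toy (not rustc's code): each in-flight collection gets its own buffer, and when an inner collection finishes, its stream is spliced into the enclosing buffer so the outer item's tokens still include those of the nested item.

```rust
// Toy model of nested token collection; types and names are made up.
#[derive(Clone, Debug)]
enum Tok {
    Ident(&'static str),
    OpenBrace,
    CloseBrace,
}

type Stream = Vec<Tok>;

struct Collector {
    // One buffer per collection in progress; the innermost is last.
    stack: Vec<Stream>,
}

impl Collector {
    fn new() -> Self {
        Collector { stack: Vec::new() }
    }

    fn start(&mut self) {
        self.stack.push(Vec::new());
    }

    fn push_token(&mut self, tok: Tok) {
        // Record the token in the innermost active collection.
        if let Some(buf) = self.stack.last_mut() {
            buf.push(tok);
        }
    }

    fn finish(&mut self) -> Stream {
        let inner = self.stack.pop().expect("finish() without start()");
        // Splice the finished inner stream into the enclosing collection so
        // an outer item's tokens still include the nested item's tokens.
        if let Some(outer) = self.stack.last_mut() {
            outer.extend(inner.iter().cloned());
        }
        inner
    }
}

fn main() {
    let mut c = Collector::new();
    c.start();                        // begin collecting the outer `mod m { .. }`
    c.push_token(Tok::Ident("mod"));
    c.push_token(Tok::Ident("m"));
    c.push_token(Tok::OpenBrace);
    c.start();                        // begin collecting the nested `fn f`
    c.push_token(Tok::Ident("fn"));
    c.push_token(Tok::Ident("f"));
    let inner = c.finish();           // tokens of the nested item alone
    c.push_token(Tok::CloseBrace);
    let outer = c.finish();           // outer item still sees all six tokens
    println!("inner = {:?}", inner);
    println!("outer = {:?}", outer);
    assert_eq!(inner.len(), 2);
    assert_eq!(outer.len(), 6);
}
```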