What is a trailing expression in a block exactly? #61733
Macros used in the examples:

macro empty() {}
macro expr() { 0 }
macro stmt() { 0 ; }
macro stmt_expr() { 0 ; 1 }

and equivalent attribute macros producing the same tokens.

Definitions:

"Token-based expansion" - macro invocation tokens are replaced by tokens produced by the macro, without knowing anything about the AST-based context of that macro invocation (whether it's a macro in item position, or in expression position, etc). Having an expansion model like this is necessary if we want to perform eager expansion.

0 . id 2 stmt!() ; bar +

To expand it we replace it with its produced tokens without having any idea about the context:

0 . id 2 stmt!() ; bar +
=>
0 . id 2 0 ; ; bar +

"AST-based expansion" - macro invocation knows its AST node kind (whether it's a macro expression, or a macro item, or something else) and its produced tokens (+ perhaps some other neighbouring tokens) are immediately (re-)parsed as one or multiple AST nodes of that kind, then the new nodes replace the old node.

stmt!(); // Statement kind
=>
0 ; // Produced tokens, parsed as a statement `0;`
=>
0; // The parsed statement replaces the previous statement `stmt!();`
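As a rough illustration of the token-based definition, the toy model below treats tokens as plain strings and replaces each invocation with the tokens listed for the four example macros. The names `produced_tokens` and `expand_tokens` are made up for this sketch, and nothing here reflects how rustc actually stores or expands token streams.

```rust
/// Tokens produced by the four example macros of this issue.
/// Hypothetical helper: a macro invocation is modelled as a single string token.
fn produced_tokens(invocation: &str) -> Option<Vec<&'static str>> {
    match invocation {
        "empty!()" => Some(vec![]),
        "expr!()" => Some(vec!["0"]),
        "stmt!()" => Some(vec!["0", ";"]),
        "stmt_expr!()" => Some(vec!["0", ";", "1"]),
        _ => None,
    }
}

/// Token-based expansion: every known invocation is replaced by its produced
/// tokens, looking at nothing but the invocation token itself (no AST context).
fn expand_tokens(input: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    for &tok in input {
        match produced_tokens(tok) {
            // Splice the produced tokens in place of the invocation.
            Some(produced) => out.extend(produced.into_iter().map(String::from)),
            // Any other token is copied through unchanged.
            None => out.push(tok.to_string()),
        }
    }
    out
}
```

Calling `expand_tokens(&["0", ".", "id", "2", "stmt!()", ";", "bar", "+"])` and joining the result with spaces reproduces the `0 . id 2 0 ; ; bar +` stream from the definition, including the doubled semicolon.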
fn main() {
    { 2 }
    empty!();
}

Token-based expansion:
AST-based expansion:
Compiler's behavior:

fn main() {
    { 2 }
    expr!();
}

Token-based expansion:
AST-based expansion:
Compiler's behavior:

fn main() {
    { 2 }
    stmt!();
}

Token-based expansion:
AST-based expansion:
Compiler's behavior:

fn main() {
    { 2 }
    stmt_expr!();
}

Token-based expansion:
AST-based expansion:
Compiler's behavior:

fn main() {
    { 2 }
    empty!()
}

Token-based expansion:
AST-based expansion 1 (
AST-based expansion 2 (
Compiler's behavior:

fn main() {
    { 2 }
    expr!()
}

Token-based expansion:
AST-based expansion 1 (
AST-based expansion 2 (
Compiler's behavior:

fn main() {
    { 2 }
    stmt!()
}

Token-based expansion:
AST-based expansion 1 (
AST-based expansion 2 (
Compiler's behavior:

fn main() {
    { 2 }
    stmt_expr!()
}

Token-based expansion:
AST-based expansion 1 (
AST-based expansion 2 (
Compiler's behavior:

For attribute macros, the token-based and AST-based interpretations are the same as for fn-like macros, so I'm just going to list the compiler's actual behavior.
fn main() {
    { () } // trailing expression
    #[cfg(FALSE)]
    0; // not trailing expression
}

fn main() {
    { () } // trailing expression
    #[cfg(FALSE)]
    0 // not trailing expression
}

fn main() {
    { () } // not trailing expression
    #[cfg(TRUE)]
    0; // not trailing expression
}

fn main() {
    { () } // not trailing expression
    #[cfg(TRUE)]
    0 // trailing expression
}

fn main() {
    { () } // trailing expression
    #[empty]
    0; // not trailing expression
}

fn main() {
    { () } // trailing expression
    #[empty]
    0 // not trailing expression
}

fn main() {
    { () } // not trailing expression
    #[expr]
    0; // trailing expression (the attribute transforms the whole statement, including semicolon)
       // so it's equivalent to `expr!( 0 ; )`
}

fn main() {
    { () } // not trailing expression
    #[expr]
    0 // trailing expression
}

fn main() -> u8 {
    { () } // not trailing expression
    #[stmt]
    0; // not trailing expression
}

fn main() -> () {
    { () } // not trailing expression
    #[stmt]
    0 // error: macro expansion ignores token `;` and any following
}

fn main() -> () {
    { () } // not trailing expression
    #[stmt_expr]
    0; // trailing expression (the attribute transforms the whole statement, including semicolon)
       // so it's equivalent to `stmt_expr!( 0 ; )`
}

fn main() -> () {
    { () } // not trailing expression
    #[stmt_expr]
    0 // error: macro expansion ignores token `;` and any following
}

Proposed resolution:
Non-eager expansion

When doing a regular macro expansion we have a partially built AST with some nodes in it being unexpanded macros. In the token-based model macro expansion produces a token stream that needs to be converted into AST somehow.

macro foo() { 2 + 3 }
fn main() { 1 * foo!() }
=> expand and reparse the whole crate =>
macro foo() { 2 + 3 }
fn main() { 1 * 2 + 3 } // Oh, wait

For multiple reasons, starting with "operator priority hygiene" (as in the example above), and ending with performance, we want to reparse as few tokens as possible.

Proposal 1: reparse context includes only tokens from the macro invocation's AST node.

// Example: macro invocation in a statement node
// <reparse_context>
#[inert_attrs] // optional
// <invocation>
foo!()
// </invocation>
; // optional
// </reparse_context>

So, we are going to reparse the tokens produced by the macro together with their "closest environment". The reparse context tokens are reparsed as multiple statements and the original statement node is replaced with them.

Proposal 2: treat the trailing semicolon-less macro invocation as a statement rather than an expression like it's sometimes treated now. So its produced tokens could be parsed as multiple statements as well.

Proposal 3: introduce an empty statement. We need it in the token model.

Eager expansion

0 . id 2 foo!() ; bar +

Reparse context either consists of the invocation only, or there is a programmatic way to mark some neighbouring tokens as belonging to it.

Examples:

What changes and what continues working

See the next comments
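The "operator priority hygiene" concern can be observed on stable Rust. The sketch below assumes a `macro_rules!` stand-in for the `macro foo()` shown above, and it multiplies by 2 rather than 1 because `1 * (2 + 3)` and `1 * 2 + 3` both happen to equal 5, which would hide the difference. The function names are illustrative only.

```rust
// `macro_rules!` stand-in for `macro foo() { 2 + 3 }`.
macro_rules! foo {
    () => { 2 + 3 };
}

// The invocation sits in expression position, so its output is parsed as a
// single expression node: this evaluates as 2 * (2 + 3).
fn ast_based() -> i32 {
    2 * foo!()
}

// What re-parsing the flat token stream `2 * 2 + 3` would mean instead.
fn reparse_everything() -> i32 {
    2 * 2 + 3
}
```

Here `ast_based()` yields 10 while `reparse_everything()` yields 7, which is the kind of surprise that keeping the reparse context small is meant to avoid.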
What stops working:

fn main() {
    // `0` inside `stmt!()` is no longer a trailing expression
    // Fixes https://github.com/rust-lang/rust/issues/33953
    stmt!()
}

What starts working:

fn main() {
    // fn main() {}
    empty!()
}

fn main() {
    // fn main() { 0; 1 }, trailing
    stmt_expr!()
}

fn main() {
    // fn main() { 0; }, non-trailing
    #[stmt]
    0
}

fn main() {
    // fn main() { 0; 1 }, trailing
    #[stmt_expr]
    0
}

The reason is that semicolon-less macro statements can now expand into multiple statements.
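For comparison, a semicolon-terminated macro statement can already expand into several statements on stable Rust. The sketch below uses a hypothetical `two_lets!` macro written with `macro_rules!` (not one of the macros from this issue) to show one statement node being replaced by two.

```rust
// Hypothetical macro: one invocation produces two `let` statements.
macro_rules! two_lets {
    ($a:ident, $b:ident) => {
        let $a = 1;
        let $b = 2;
    };
}

fn sum_of_bindings() -> i32 {
    // One macro statement in, two statements out; both bindings stay
    // visible to the code that follows because the names come from here.
    two_lets!(x, y);
    x + y
}
```

The proposal would extend the same multi-statement treatment to the semicolon-less form at the end of a block.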
Interesting cases that work now and keep working:

fn foo() -> u8 {
    { 0 } // <- trailing expression
    empty!();
}

In the "reparse everything" model this would expand into
We can introduce the rule that all empty statements are thrown away when determining the trailing expression to make

fn foo() -> u8 {
    { 0 } // <- trailing?
    fn bar() {}
}

, but that's not strictly necessary for backward compatibility.
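The rule about throwing away empty statements can be sketched as a small toy model. Everything below (the `Stmt` enum, `trailing_expr`, the `ignore_empty` flag) is hypothetical and only mirrors the wording of this comment; it is not rustc's AST or its actual logic.

```rust
/// Toy statement kinds, just enough to model the question in this issue.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stmt {
    /// Expression without a trailing semicolon, e.g. `{ 0 }`.
    Expr,
    /// Expression followed by `;`, e.g. `0;`.
    Semi,
    /// An item such as `fn bar() {}`.
    Item,
    /// A lone `;`, e.g. what `empty!();` leaves behind in the token model.
    Empty,
}

/// Index of the statement treated as the block's trailing expression, if any.
/// With `ignore_empty`, empty statements are discarded before the last
/// remaining statement is inspected.
fn trailing_expr(stmts: &[Stmt], ignore_empty: bool) -> Option<usize> {
    stmts
        .iter()
        .enumerate()
        .filter(|(_, s)| !(ignore_empty && **s == Stmt::Empty))
        .last()
        .filter(|(_, s)| **s == Stmt::Expr)
        .map(|(i, _)| i)
}
```

For `[Stmt::Expr, Stmt::Empty]`, the model of `{ 0 } ;`, the block keeps `{ 0 }` as its trailing expression only when `ignore_empty` is true. A trailing item, as in `[Stmt::Expr, Stmt::Item]`, yields no trailing expression under either setting in this model.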
@petrochenkov: Assuming that it doesn't cause much (or any) breakage, I think it would be better to not ignore an empty statement when determining the trailing expression. That is:

fn foo() -> u8 {
    { 0 }
    empty!(); //~ ERROR: mismatched types
}

would fail to typecheck, since we have a trailing (statement) expression of

There are a couple of reasons I think we should prefer this:

fn trailing_item() -> bool {
    { true }
    fn inner() {}
}

fn trailing_stmt() -> bool {
    { true }
    let a = 1;
}

fn trailing_semi() -> bool {
    { true }
    ;
}

Of course, these are parsed somewhat differently (the newline in

If the user writes:

fn foo() {
    { 0 }
    empty!(); //~ ERROR: mismatched types
}

Then someone reading the code will get the impression that this function returns a value of

Of course, doing this is a breaking change, since this currently compiles:

macro_rules! empty {
    () => { }
}

fn foo() -> bool {
    { true }
    empty!();
}

@petrochenkov: Assuming you don't object to the idea, I'll do a Crater run to get an idea of how much breakage this might cause. With a bit of effort, I think I could come up with a future-incompatibility lint that would fire on any functions that would have their behavior changed by this (closures are a different story). During lowering, we would mark 'former weird trailing expressions' like
@Aaron1011
One consequence of the proposed resolution in #61733 (comment):

As described in #61733 (comment), we will properly handle trailing semicolons in

fn main() {
    macro_rules! a {
        ($e:expr) => { $e; }
    }
    a!(true)
}

This code will continue to compile (note the semicolon after

fn main() {
    macro_rules! a {
        ($e:expr) => { $e; }
    }
    a!(true);
}

However, what should happen to this code is less clear:

fn main() {
    macro_rules! a {
        ($e:expr) => { $e; }
    }
    let _val = a!(true);
}

This will expand to

I think rejecting this code is most consistent with the idea of a 'reparse context'. If we switch to token-based expansion (e.g. not constructing intermediate AST nodes), then allowing this would require reparsing arbitrarily many preceding tokens. For example, we could have

However, this may be somewhat surprising to users,
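The uncontroversial statement-position case can be tried on stable Rust. This is a small sketch that reuses the `a!` macro from the comment with a side-effecting argument so the effect is observable; the function name `statement_position` and the counter are illustrative additions, not part of the original discussion.

```rust
// Same shape as the macro in the comment: the expansion ends in `;`.
macro_rules! a {
    ($e:expr) => { $e; };
}

fn statement_position() -> u32 {
    let mut counter = 0;
    // Statement position: the expansion `counter += 1;` is itself a statement,
    // so the semicolon produced by the macro has an obvious meaning.
    a!(counter += 1);
    a!(counter += 1);
    counter
}
```

The `let _val = a!(true);` form discussed above is deliberately left out, since how it should behave is exactly the open question.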
See rust-lang#61733 (comment)

We now preserve the trailing semicolon in a macro invocation, even if the macro expands to nothing. As a result, the following code no longer compiles:

```rust
macro_rules! empty {
    () => { }
}

fn foo() -> bool { //~ ERROR mismatched
    { true } //~ ERROR mismatched
    empty!();
}
```

Previously, `{ true }` would be considered the trailing expression, even though there's a semicolon in `empty!();`

This makes macro expansion more token-based.
…expr, r=petrochenkov Treat trailing semicolon as a statement in macro call
Is it determined syntactically or semantically?
Before or after macro expansion?
Answering these questions is necessary to specify expansion of macros (stable fn-like ones or unstable attribute ones) in expression and statement positions.
The current implementation is sometimes inconsistent.
Below I'll be dumping some code examples expanded using different expansion models in the hope of coming up with some rules that are both self-consistent and backward compatible.
cc #33953