Allow keywords to overlap identifiers #568

AntonyBlakey · 2023-08-19T10:27:19Z

This is valid code in 0.4.21:

contract ClientReceipt {
    event Deposit();
    function deposit() public payable {
        uint256 emit;
        emit Deposit();
    }
}

The key point is that the `emit` statement is enabled from 0.4.21, but `emit` is still valid as an identifier. We currently cannot handle this, and it makes it impossible to compute over the versioning of tokens vs. identifiers.

Xanewok · 2023-09-04T21:18:54Z

https://docs.soliditylang.org/en/v0.8.21/grammar.html#a4.SolidityParser.identifier

In 0.8.21, `from`, `error`, `revert` and `global` can also overlap with identifiers:

pragma solidity ^0.8.21;

contract Contract {
    error MyError();

    function deposit() public payable {
        uint256 error;
        uint256 from;
        uint256 revert;
        uint256 global;

        revert MyError();
    }
}

This allows using the `constructor` keyword in 0.4.22, but some keywords are reserved too eagerly: for example, `leave` in Yul is treated as reserved since 0.6.0, whereas it is actually reserved only from 0.7.1. See NomicFoundation#568.
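To make the "enabled" vs. "reserved" distinction concrete, here is a minimal sketch that tracks the two version thresholds separately. The `Keyword` struct, the `Usage` enum, and the `usage`/`describe` functions are hypothetical names made up for illustration and are not Slang's actual definitions. The `leave` versions come from the note above; the `emit` reservation version (0.5.0) is an assumption used only to fill in the example.

```rust
/// Hypothetical (major, minor, patch) triple; tuples compare lexicographically.
type Version = (u32, u32, u32);

/// Hypothetical keyword entry that tracks two thresholds separately.
struct Keyword {
    text: &'static str,
    /// First version in which the word can be used as a keyword.
    enabled_from: Version,
    /// First version in which the word can no longer be an identifier.
    /// `None` means it is never reserved (purely contextual).
    reserved_from: Option<Version>,
}

#[derive(Debug, PartialEq)]
enum Usage {
    IdentifierOnly,
    KeywordOrIdentifier,
    KeywordOnly,
    /// Reserved, but not yet usable as a keyword.
    ReservedUnused,
}

fn usage(kw: &Keyword, version: Version) -> Usage {
    let enabled = version >= kw.enabled_from;
    let reserved = kw.reserved_from.map_or(false, |from| version >= from);
    match (enabled, reserved) {
        (false, false) => Usage::IdentifierOnly,
        (true, false) => Usage::KeywordOrIdentifier,
        (true, true) => Usage::KeywordOnly,
        (false, true) => Usage::ReservedUnused,
    }
}

fn describe(kw: &Keyword, version: Version) -> String {
    format!("{}: {:?}", kw.text, usage(kw, version))
}

// Versions for `leave` are taken from the note above.
const LEAVE: Keyword = Keyword {
    text: "leave",
    enabled_from: (0, 6, 0),
    reserved_from: Some((0, 7, 1)),
};

// `enabled_from` is from the issue; `reserved_from` is an assumption.
const EMIT: Keyword = Keyword {
    text: "emit",
    enabled_from: (0, 4, 21),
    reserved_from: Some((0, 5, 0)),
};
```

With this split, `usage(&EMIT, (0, 4, 21))` reports that the word may be either a keyword or an identifier, which is exactly the overlap the original example in this issue relies on.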

Part of #638

### Outline

The new definition is used to construct the grammar definition model from the DSL v1, which is then used by the codegen to create the parser and rule/token kinds, like before.

The rough translation done here is to translate every struct/enum/repeated/separated/precedence expression as a rule/non-terminal kind (with the exception of choice nodes, which are inlined, as per #650 (comment)); the rest (i.e. trivia, fragment, token, keywords) are translated as "old" tokens/terminals.

### More context

In general, this requires more polish, a decision on which rules we are comfortable leaving as they are, and fixes for leftover issues such as the explicit `PrecedenceExpression#rule_name` that was added to fit the old DSL v1 model/codegen better.

I specifically didn't want to touch the v1 codegen/model yet, because it helped to be able to migrate incrementally to the new definition, and I wanted to limit the changes done in a single PR so that it's easier to review. We can modify the codegen/model to fit the v2 model better once we fully migrate and remove the old DSL v1.

The translation/construction is done at build "run-time" rather than inside the newly introduced definition "compiler" proc macro. There's no strong reason for that, other than that it was quicker for me to work on (quicker dev cycle) and the logic is just plain Rust code. This can be moved inside the proc macro later on, if we decide that it's better that way.

### Differences

Because the DSL v2 takes a more named and structured approach (as it's designed to be a model for the strongly typed CST/AST), translating it 1-1 to our current parser structure is neither possible nor even desirable. Some rules were newly introduced, and for some of them it doesn't make much sense to replicate the old behaviour.

1. At the moment, it's impossible to specify an optional *separated* item, so the following ones are introduced as wrappers:
   - `TupleValue` (used in `TupleValuesList`)
   - `TupleMemberDeconstruction` (used in `TupleMembersList`)
2. Some sequences in the structs are optional, but these are now introduced and named, rather than inlined as before:
   - `IndexAccessEnd`
   - `ImportAlias`
   - `UsingAlias`
   - `VariableDeclarationValue`
3. ~Previously inlined sequence parsers now have to be named~ (fixed with #657)
   - `NumericExpression` is now split into two, now named, choices: `HexNumberExpression` and `DecimalNumberExpression`

Moreover, the following was done to bring back the old CST shape as much as possible:

- Some of the new rules were renamed back to the old ones, e.g. some repeated nodes have the `List` suffix again
- The (Yul)Statement rules are outlined (as in v0/v1) and re-introduced as a struct wrapper
- `ArgumentsDeclaration` is outlined (as in v0/v1) as well
- `FunctionCallOptions` is outlined again as a `(NamedArgs NamedArgs*)` rather than `NamedArgs | NamedArgs+` to better match the old CST in the singular case (but differs from the old CST in the multiple case)
- #653

This was done to unify the definitions where possible and to help with reviewing the CST changes by minimizing the shape differences. Once we move off v0 (#637) and v1 (#638), we will be free to change the final shape, since the old definitions will no longer tie us down.

### Outstanding issues

- [x] Separate outstanding FIXMEs that need to be done after the migration into a task list
  * LeadingTrivia in v2 is using the v1 definition for now
  * SourceUnit is hacked to always be followed by Leading Trivia (copies v1; trivia model in v2 is a bit different)
  * Clean up `PrecedenceExpression#rule_name` and adapt the codegen model to fit the v2 definition better
  * Stop leaking identifiers by adapting either v1 or v2 models
  * Keyword trie inclusion should be reworked to not require synthetic rules over all keywords (v1 model) and to properly allow keywords to overlap identifiers (#568)
- [x] Fix the Prettier error caused by newly possible nested, empty rules:

```
[error] crates/solidity/testing/snapshots/cst_output/TupleExpression/empty/generated/0.4.11-success.yml: SyntaxError: All collection items must start at the same column (11:9)
[error] 9 | - TupleExpression (Rule): # 0..3 "( )"
[error] 10 | - OpenParen (Token): "(" # 0..1
[error] > 11 | - TupleValuesList (Rule): [] # 1..1
[error] | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[error] > 12 | - TupleValue (Rule): [] # 1..1
[error] | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[error] > 13 | - CloseParen (Token): ")" # 2..3
[error] | ^
[error] 14 |
```

Closes #568

There is still one outstanding issue where we return a `Vec<TokenKind>` from `next_token`; I'd like to return a more specialized type and ideally pass it on the stack (2x2 bytes), rather than on the heap (extra 3x8 bytes for the Vec handle + indirection). We should name it better and properly show that we can return at most 2 token kinds (single token kind or identifier + kw combo).

To do:

- [x] Return tokens from `next_token` via stack

Apart from that, I think this is a more correct approach than #598, especially accounting for the new keyword definition format in DSL v2.

The main change is that we check the keyword trie and, additionally, the (newly introduced) compound keyword scanners only after the token has been lexed as an identifier. For each context, we collect the Identifier scanners used by the keywords and attempt promotion there.

The existing lexing performance is not impacted from what I've seen when running the sanctuary tests, and I can verify (incl. CST tests) that we now properly parse source that uses contextual keywords (e.g. `from`) and that the compound keywords (e.g. `ufixedMxN`) are properly versioned.

This adapts the existing `codegen_grammar` interface that's a leftover from DSL v1; I did that to work on finishing #638. Once this is merged and we properly parse contextual keywords, I'll move on to clean it up and reduce the parser codegen indirection (right now we go from v2 -> v1 model -> code generator -> Tera templates; I'd like to at least cut out the v1 model and/or simplify visiting v2 from the existing `CodeGenerator`).

Please excuse the WIP comments in the middle; the first and the last ones should make sense when reviewing. I can simplify this a bit for review, if needed.
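The promotion step described in this PR (lex as an identifier first, then consult the keyword table and return at most two token kinds by value) could be sketched roughly as follows. This is an illustration under assumptions, not the code from #723: `Scanned`, `lookup_keyword`, `promote_identifier` and `accepts` are hypothetical names, a plain `match` stands in for the keyword trie, `(0, 0, 0)` stands in for "always enabled", and the 0.5.0 reservation of `emit` is assumed for the example.

```rust
/// Hypothetical (major, minor, patch) triple; tuples compare lexicographically.
type Version = (u32, u32, u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Identifier,
    EmitKeyword,
    FromKeyword,
}

/// At most two token kinds, returned by value instead of in a `Vec`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scanned {
    /// Exactly one kind applies.
    Single(TokenKind),
    /// The lexeme is an identifier that can also act as this keyword.
    IdentifierOrKeyword(TokenKind),
}

/// Stand-in for the keyword trie: (kind, enabled_from, reserved_from).
fn lookup_keyword(lexeme: &str) -> Option<(TokenKind, Version, Option<Version>)> {
    match lexeme {
        // Reservation version is an assumption for this sketch.
        "emit" => Some((TokenKind::EmitKeyword, (0, 4, 21), Some((0, 5, 0)))),
        // Contextual keyword: never reserved in this sketch.
        "from" => Some((TokenKind::FromKeyword, (0, 0, 0), None)),
        _ => None,
    }
}

/// Called only after `lexeme` has already been lexed as an identifier.
fn promote_identifier(lexeme: &str, version: Version) -> Scanned {
    match lookup_keyword(lexeme) {
        None => Scanned::Single(TokenKind::Identifier),
        Some((kind, enabled_from, reserved_from)) => {
            let reserved = reserved_from.map_or(false, |from| version >= from);
            if reserved {
                Scanned::Single(kind)
            } else if version >= enabled_from {
                Scanned::IdentifierOrKeyword(kind)
            } else {
                Scanned::Single(TokenKind::Identifier)
            }
        }
    }
}

impl Scanned {
    /// Parser side: does the scanned token satisfy the expected kind?
    fn accepts(self, expected: TokenKind) -> bool {
        match self {
            Scanned::Single(kind) => kind == expected,
            Scanned::IdentifierOrKeyword(kind) => {
                expected == TokenKind::Identifier || kind == expected
            }
        }
    }
}
```

In this shape, a parser that expects an identifier in `uint256 emit;` and one that expects the keyword in `emit Deposit();` can both succeed on the same scanned token in 0.4.21, because `promote_identifier("emit", (0, 4, 21))` yields the two-kind variant.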

github-project-automation bot added this to Slang - 2023 Q3 Aug 19, 2023

Xanewok mentioned this issue Sep 19, 2023

Implement support for contextual keywords #598

Closed

Xanewok added this to Slang - 2023 Q4 Nov 2, 2023

Xanewok removed this from Slang - 2023 Q3 Nov 2, 2023

OmarTawfik assigned Xanewok Nov 8, 2023

Xanewok mentioned this issue Nov 12, 2023

Use DSL v2 for parser codegen #650

Merged

2 tasks

Xanewok moved this to In Progress in Slang - 2023 Q4 Dec 19, 2023

Xanewok mentioned this issue Dec 27, 2023

Implement (contextual) keywords and use their versioning from v2 #723

Merged

1 task

Xanewok closed this as completed in #723 Jan 8, 2024
