Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slang #1032

Merged
merged 162 commits into from
Sep 18, 2024
Merged

Slang #1032

merged 162 commits into from
Sep 18, 2024

Conversation

Janther
Copy link
Contributor

@Janther Janther commented Aug 15, 2024

This is a proposal for a version 2 of this plugin.

It has been rewritten leveraging the power of Nomic Foundation's Slang to provide a much more scalable and better supported parser.

While we are still in beta we have disabled browser support until we can make sure we can fully provide this functionality.

The coverage has been reduced as well since slang's AST is much more detailed than solidity-parser's and while all our tests still pass, they don't provide every possible scenario.

Using with the old parser is still possible but a deprecated warning will be logged.

Copy link

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did an initial pass and left a couple of comments. Thanks for working on this. Exciting stuff!

README.md Outdated Show resolved Hide resolved
| StrictAstNode
| Comment
| Identifier
| YulIdentifier
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OmarTawfik It would be beneficial if Identifier and YulIdentifier were in the types.
For example:

export declare class EnumDefinition {
    readonly cst: NonterminalNode;
    private readonly fetch;
    constructor(cst: NonterminalNode);
    get enumKeyword(): TerminalNode;
    get name(): TerminalNode;
    get openBrace(): TerminalNode;
    get members(): EnumMembers;
    get closeBrace(): TerminalNode;
}

I care about getting the location information of name but not from enumKeyword, openBrace and closeBrace.
I went through every TerminalNode in the AST and plucked out the Identifiers and YulIdentifiers but if you make a change in the AST I have no Type security that will trigger an error on my side if I need to adapt. So I would appreciate if it would be possible to deliver something like:

export declare class EnumDefinition {
    readonly cst: NonterminalNode;
    private readonly fetch;
    constructor(cst: NonterminalNode);
    get enumKeyword(): TerminalNode;
    get name(): Identifier;
    get openBrace(): TerminalNode;
    get members(): EnumMembers;
    get closeBrace(): TerminalNode;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion!

but if you make a change in the AST I have no Type security that will trigger an error on my side

To make sure I understand you correctly, do you mean getNodeMetadata().offsets?
Is that because you are storing offsets as an array, and accessing it by a numeric index? Would it help if it was changed to be an object/dict with the field name as keys? for example offsets.name here won't break if the position of name was changed relative to its siblings.

It would be beneficial if Identifier and YulIdentifier were in the types.

How would that help in this case? if they were non-terminal nodes? how would this be tracked otherwise?

One other way you can use Slang API if you want, so that you don't have to do the manual tracking of offsets yourself, is to use cursors.
For example, in getNodeMetadata(), instead of iterating over ast.cst.children(), you can ast.cst.createCursor(), and then use cursor.goToNextSibling() to iterate, and cursor.textOffset() or cursor.textRange() will be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 using a Map with the cst as the key could really improve this aspect of the implementation.
it would even render the passing of an offset moot since the we are already passing the AST to the constructors.
I'll experiment with this and see how it behaves.

Copy link
Contributor Author

@Janther Janther Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unable to use the Node as the key to the Map because the Node object I get from iterating over cst.children() is not the same as when I check ast.cst.

const source = `
pragma solidity ^0.8.26;
`;

const language = new Language('0.8.26');
const parsed = new SourceUnit(
  language.parse(NonterminalKind.SourceUnit, source).tree()
);
parsed.cst.children()[0]; // NonterminalNode { kind: 'SourceUnitMembers', textLength: { utf8: 26, utf16: 26, line: 2, column: 0 }, type: 'Nonterminal', children: λ, createCursor: λ, toJSON: λ, unparse: λ }
parsed.members.cst;       // NonterminalNode { kind: 'SourceUnitMembers', textLength: { utf8: 26, utf16: 26, line: 2, column: 0 }, type: 'Nonterminal', children: λ, createCursor: λ, toJSON: λ, unparse: λ }

parsed.cst.children()[0] === parsed.members.cst; // false

It probably happens because the cst node is not reused but reinstantiated either by Rust, TypeScript, or the communication between Rust and TypeScript.

at the moment I can't use names in an object/dict in the current solution since when I'm iterating over there cst.children() I have no clue about the ast object.

I made a proof of concept in a different branch and even if the tests don't pass because of the described issue, the change impacts greatly in a good way cleaning up a lot.

If you could make the cst point to the same object or maybe add an id field to Terminal and Nonterminal Nodes, let me know so I can make a PR with these changes?

By the way, the idea to have Identifier and YulIdentifier as a specific type is partly related to the offsets array but also related to having clarity to which Terminal Nodes can have comments attached to them on the Prettier side.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. The id already exists in the Rust API, and I would be happy to add it in TypeScript. It should be a simple change.
When I'm done with WASM, I will also be able to expose the field names themselves as a strongly-typed EdgeLabel enum, so you would be able to refer to and persist the name of get name(): Identifier field as EdgeLabel.Name.

@fvictorio
Copy link
Member

I didn't do an in-depth review of the PR; this is just the result of manually testing it and skimming the code for a couple of hours. If there are things you would like me to review in depth, please point them out to me, because I don't think I'll be able to do a line-by-line review of the whole PR.

Also, leaving inline comments in a PR this big is impossible (GitHub's UI sucks for big diffs), so I will just list my comments here.

Pushed commits

I added a couple of commits. The first one clears the current line when printing a warning. Without this, if you do prettier Foo.sol and get a warning, the line says Foo.sol[prettier-solidity] 'solidity-parse' has been deprecated, please use 'slang-solidity'. The Foo.sol part is a temporary write that prettier does to stdout, and the warning messes with the cleanup of that temporary warning.

The other commit just makes src/index.ts clearer, by using explicit names for antlr and slang variables.

Comments.sol tests

These tests are only run with the ANTLR parser, and the run-format-test.js file has a comment that says // TODO: finish Comments. Does this mean that there's something pending around comments when using the Slang parser?

And, as an aside, all the Slang-related code in run-format-test.js doesn't seem to make sense? shouldTestSlang will always return false, so all the code used in that scenario seems like dead code.

Defaulting to the latest supported version

I mentioned this already before, but defaulting to the latest Solidity version supported by slang won't work for projects with multiple versions of Solidity, which is a common thing for projects that have been around for some years.

I don't know what's the right solution here. I kind of feel that Slang should export some helper utility to let you infer the version to use for a given code (by getting the pragmas doing what solidity-analyzer does, and then returning the latest supported version that matches the pragmas). But there might be better solutions.

Other comments

  • There are some commented out lines in CI.yml
  • The options.parentParser trick used in the SourceUnit nodes is, if I remember correctly, done to avoid printing empty trailing lines in Solidity snippets within a markdown file. Can we add a comment explaining that? Otherwise it's a bit obscure.
  • Out of scope for the Slang refactor, but I think we should use tsx instead of ts-node.

@Janther
Copy link
Contributor Author

Janther commented Sep 17, 2024

I didn't do an in-depth review of the PR; this is just the result of manually testing it and skimming the code for a couple of hours. If there are things you would like me to review in depth, please point them out to me, because I don't think I'll be able to do a line-by-line review of the whole PR.

The only thing that pops up is the slang-comments section which is new.

Comments.sol tests

These tests are only run with the ANTLR parser, and the run-format-test.js file has a comment that says // TODO: finish Comments. Does this mean that there's something pending around comments when using the Slang parser?

These were some extremely edge cases with the positioning of comments where honestly comments should not go. I pushed a commit making these scenarios, not the same as with the antlr parser but closer to what prettier does with JS in a similar situation.

And, as an aside, all the Slang-related code in run-format-test.js doesn't seem to make sense? shouldTestSlang will always return false, so all the code used in that scenario seems like dead code.

yeah this was dead code (just removed it) that was helpful when I was slowly implementing new printers and had a proper list of tests that needed to be tested with both parsers.

Defaulting to the latest supported version

I mentioned this already before, but defaulting to the latest Solidity version supported by slang won't work for projects with multiple versions of Solidity, which is a common thing for projects that have been around for some years.

I don't know what's the right solution here. I kind of feel that Slang should export some helper utility to let you infer the version to use for a given code (by getting the pragmas doing what solidity-analyzer does, and then returning the latest supported version that matches the pragmas). But there might be better solutions.

This is planned but not in the scope of this PR.

Other comments

  • There are some commented out lines in CI.yml

While we slang doesn't support WASM, the build process is broken and the CI step for browser should be skipped.
Once they start supporting it, the CI comes right back.

  • The options.parentParser trick used in the SourceUnit nodes is, if I remember correctly, done to avoid printing empty trailing lines in Solidity snippets within a markdown file. Can we add a comment explaining that? Otherwise it's a bit obscure.

Done.

  • Out of scope for the Slang refactor, but I think we should use tsx instead of ts-node.

Will look into this. The main thing would be that it would need to be compatible with jest-light-runner

@Janther Janther merged commit 0fded41 into v2 Sep 18, 2024
7 checks passed
@Janther Janther deleted the slang branch September 18, 2024 08:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants