Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

challenges of working with the Solidity AST #1099

Closed
0xalpharush opened this issue Sep 19, 2024 · 1 comment
Closed

challenges of working with the Solidity AST #1099

0xalpharush opened this issue Sep 19, 2024 · 1 comment
Assignees

Comments

@0xalpharush
Copy link

0xalpharush commented Sep 19, 2024

Slang has the opportunity to improve from solc's mistakes and limitations to enable the next generation of analysis frameworks. What follows is a brief list of edge cases and nuisances I recall encountering while working on Slither that I can think of off-hand. It may be useful to consult the issues and PR's linked (or searching the github tracker) in addition to using test cases from Slither that have been compiled to fix challenging issues in supporting different versions or quirky cases.

Tuple expressions are difficult to work with and not straightforward

It would be nice if Slang represented tuples as fully typed with their indices so that there is no need for record keeping of initialization or elided variables.

There are 2 cases we need to handle:

  1. Initializing multiple vars with multiple individual values:
                    var (a, b) = (1,2);

In this case, we convert to the following code:

                    var a = 1;
                    var b = 2;
  1. Initializing multiple vars with any other method (function call, skipped args, etc):
                    var (a, , c) = f();

In this case, we convert to the following code:

                    var a;
                    var c;
                    (a, , c) = f();

Changes in syntax that introduce ambiguity

For example, between versions .value() is either a builtin/intrinsic (.value() member pre 0.7 crytic/slither#1923) or a function call thereafter. Unifying the call options across versions would be nice as well (value, salt, etc)

Missing control flow information in the AST

Logical operators like && and || as well as ternary operators do not reflect their laziness/ eagerness or short-circuiting behavior. Personally, I would like to see these lifted to more a structured control flow that can be uniformly analyzed such as an if-else statement.

Additionally, implicit named returns are not reflected in the AST and one must analyze where the exit points are in the CFG and add a return node artificially. (crytic/slither#1880)

Ambiguities caused by explicit arguments to events, structs, functions, etc

Solidity supports explicit passing arguments as {b: ... c: ..., a: ...} in addition to the more common implicit, positional option (a, b, c). Similar to tuples this can cause issues and requires additional record keeping that may not be in the AST (crytic/slither#1949)

Lack of info on truncation/casting behavior in Solidity AST

For instance, across versions the shifting behavior changes and it's totally implicit (https://docs.soliditylang.org/en/latest/070-breaking-changes.html). Additionally, the type conversion rules for constants are implicit and it's not always clear what type the frontend considers literals and what type conversion rules it follows to perform typecasts (crytic/slither#1931, crytic/slither#1688). Representing every type cast as an operation in the AST would be valuable.

Import/export of using for directives and aliases causes ambiguities

Unlike other definitions, solidity does not report the bindings of user defined operators to types (notably global ones) in its exportedSymbols field of the AST. This makes it difficult to track which references are in scope (directly or transitively). I'm sure this could be improved in Slither but older versions of solc had unreliable reference ID's.
(https://github.com/crytic/slither/blob/2c792b2b73c6c1fbbf5464bd1f9fc8ccedf0c0bf/slither/core/scope/scope.py#L69-L71)

Additionally, aliased imports further complicate resolving references as an aliased import can be imported and aliased again. Slither performs a late lookup that is not ideal to be able to find symbols. Likely, it could use reference ID's but I'm not sure if there's limitations for older versions... (crytic/slither#2166) https://github.com/crytic/slither/blob/2c792b2b73c6c1fbbf5464bd1f9fc8ccedf0c0bf/slither/solc_parsing/expressions/find_variable.py#L141-L144
https://github.com/crytic/slither/blob/2c792b2b73c6c1fbbf5464bd1f9fc8ccedf0c0bf/slither/visitors/slithir/expression_to_slithir.py#L602-L617

Cyclic imports and top level definitions

Solidity allows cycles in the import graph and this can make it difficult to determine the order to parse AST nodes as there's no topological ordering. Slither has had to incrementally figure out the right order (often new versions had very little complex code available until the behavior was clarified by developers writing more). For example, top level errors can be referenced that have not been analyzed yet and thus their signature is not known yet
https://github.com/crytic/slither/blob/2c792b2b73c6c1fbbf5464bd1f9fc8ccedf0c0bf/slither/solc_parsing/expressions/find_variable.py#L149-L165

Making sure scoping rules are accurate

Backporting fixes

Solc has not backported fixes to my knowledge and this makes it difficult as a third-party tool dev when user's are stuck on a version with a bug. Ideally, fixes would be backported

Many of the challenges are probably mitigated by having a really great reference-declaration resolver and API. Some of the issues are probably poor abstractions in Slither that were hard to change while maintaining backward compatibility and supporting solc's legacy AST, so it may be that many of these are not applicable to Slang. As a principle, any semantic info that would be needed to write a compiler or reference interpreter for the langauge should be included in the AST (https://www.youtube.com/watch?v=rBfSs2yipoE&t=1364s). I hope this is useful and best of luck!

@OmarTawfik
Copy link
Collaborator

OmarTawfik commented Sep 25, 2024

Thank you so much for the feedback! this is really helpful!

Tuple expressions are difficult to work with and not straightforward

Ambiguities caused by explicit arguments to events, structs, functions, etc

The Slang CST/AST matches the input source code 1:1, without any modifications, so tuples/arguments/etc.. will keep the original indices as they exist in input. In the future, when we ship the type system, it will have direct links to the same syntax nodes, along with their locations, so that users don't have to build that again. Here are examples of the CST nodes produced for tuples today: [1] [2].

Changes in syntax that introduce ambiguity

Our CST is designed to match the source 100%. So while future releases of Slang might add/change some syntax elements, each single release of Slang is designed to represent the entirety of Solidity, across all Solidity language versions. Therefore, Slang users need to worry about one set of types/inputs, and expect them to be stable regardless of which version of Solidity they are analyzing. If Solidity syntax itself changes, then both old/new syntaxes will be represented in our tree, and both will exist in our types/API, so that your code can handle both.

Moving forward, as we work on our backend/type system, these version differences will indeed be unified into a single API. For example, named functions, unnamed functions, receive/fallback functions, and other function variations can be unified into a single interface, regardless of whether you are analyzing old or new Solidity syntax.

Import/export of using for directives and aliases causes ambiguities

Our Bindings API (in progress) allows using cursors (pointers to specific tree nodes) to resolve references or definitions, and navigate that graph freely. Cursors will have explicit source locations, and not numeric IDs, and allow bi-directional lookups.

Cyclic imports and top level definitions

Our Compilation API (in progress) allows building the entire project from any node, and we will crawl all import statements and load imported files transitively, until the entire graph is built. It will be able to detect cycles and handle them.

Missing control flow information in the AST

Lack of info on truncation/casting behavior in Solidity AST

Making sure scoping rules are accurate

Backporting fixes

These are valuable pieces of feedback, and we will definitely consider when completing our backend. Thank you!

@github-project-automation github-project-automation bot moved this from ⏳ Todo to ✅ Done in Slang - 2024 H2 Nov 19, 2024
@OmarTawfik OmarTawfik self-assigned this Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
Development

No branches or pull requests

2 participants