-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
challenges of working with the Solidity AST #1099
Comments
Thank you so much for the feedback! this is really helpful!
The Slang CST/AST matches the input source code 1:1, without any modifications, so tuples/arguments/etc.. will keep the original indices as they exist in input. In the future, when we ship the type system, it will have direct links to the same syntax nodes, along with their locations, so that users don't have to build that again. Here are examples of the CST nodes produced for tuples today: [1] [2].
Our CST is designed to match the source 100%. So while future releases of Slang might add/change some syntax elements, each single release of Slang is designed to represent the entirety of Solidity, across all Solidity language versions. Therefore, Slang users need to worry about one set of types/inputs, and expect them to be stable regardless of which version of Solidity they are analyzing. If Solidity syntax itself changes, then both old/new syntaxes will be represented in our tree, and both will exist in our types/API, so that your code can handle both. Moving forward, as we work on our backend/type system, these version differences will indeed be unified into a single API. For example, named functions, unnamed functions, receive/fallback functions, and other function variations can be unified into a single interface, regardless of whether you are analyzing old or new Solidity syntax.
Our
Our
These are valuable pieces of feedback, and we will definitely consider when completing our backend. Thank you! |
Slang has the opportunity to improve from solc's mistakes and limitations to enable the next generation of analysis frameworks. What follows is a brief list of edge cases and nuisances I recall encountering while working on Slither that I can think of off-hand. It may be useful to consult the issues and PR's linked (or searching the github tracker) in addition to using test cases from Slither that have been compiled to fix challenging issues in supporting different versions or quirky cases.
Tuple expressions are difficult to work with and not straightforward
It would be nice if Slang represented tuples as fully typed with their indices so that there is no need for record keeping of initialization or elided variables.
There are 2 cases we need to handle:
In this case, we convert to the following code:
In this case, we convert to the following code:
Changes in syntax that introduce ambiguity
For example, between versions
.value()
is either a builtin/intrinsic (.value() member pre 0.7 crytic/slither#1923) or a function call thereafter. Unifying the call options across versions would be nice as well (value, salt, etc)Missing control flow information in the AST
Logical operators like
&&
and||
as well as ternary operators do not reflect their laziness/ eagerness or short-circuiting behavior. Personally, I would like to see these lifted to more a structured control flow that can be uniformly analyzed such as an if-else statement.Additionally, implicit named returns are not reflected in the AST and one must analyze where the exit points are in the CFG and add a return node artificially. (crytic/slither#1880)
Ambiguities caused by explicit arguments to events, structs, functions, etc
Solidity supports explicit passing arguments as
{b: ... c: ..., a: ...}
in addition to the more common implicit, positional option(a, b, c)
. Similar to tuples this can cause issues and requires additional record keeping that may not be in the AST (crytic/slither#1949)Lack of info on truncation/casting behavior in Solidity AST
For instance, across versions the shifting behavior changes and it's totally implicit (https://docs.soliditylang.org/en/latest/070-breaking-changes.html). Additionally, the type conversion rules for constants are implicit and it's not always clear what type the frontend considers literals and what type conversion rules it follows to perform typecasts (crytic/slither#1931, crytic/slither#1688). Representing every type cast as an operation in the AST would be valuable.
Import/export of using for directives and aliases causes ambiguities
Unlike other definitions, solidity does not report the bindings of user defined operators to types (notably global ones) in its
exportedSymbols
field of the AST. This makes it difficult to track which references are in scope (directly or transitively). I'm sure this could be improved in Slither but older versions of solc had unreliable reference ID's.(https://github.com/crytic/slither/blob/2c792b2b73c6c1fbbf5464bd1f9fc8ccedf0c0bf/slither/core/scope/scope.py#L69-L71)
Additionally, aliased imports further complicate resolving references as an aliased import can be imported and aliased again. Slither performs a late lookup that is not ideal to be able to find symbols. Likely, it could use reference ID's but I'm not sure if there's limitations for older versions... (crytic/slither#2166) https://github.com/crytic/slither/blob/2c792b2b73c6c1fbbf5464bd1f9fc8ccedf0c0bf/slither/solc_parsing/expressions/find_variable.py#L141-L144
https://github.com/crytic/slither/blob/2c792b2b73c6c1fbbf5464bd1f9fc8ccedf0c0bf/slither/visitors/slithir/expression_to_slithir.py#L602-L617
Cyclic imports and top level definitions
Solidity allows cycles in the import graph and this can make it difficult to determine the order to parse AST nodes as there's no topological ordering. Slither has had to incrementally figure out the right order (often new versions had very little complex code available until the behavior was clarified by developers writing more). For example, top level errors can be referenced that have not been analyzed yet and thus their signature is not known yet
https://github.com/crytic/slither/blob/2c792b2b73c6c1fbbf5464bd1f9fc8ccedf0c0bf/slither/solc_parsing/expressions/find_variable.py#L149-L165
Making sure scoping rules are accurate
Backporting fixes
Solc has not backported fixes to my knowledge and this makes it difficult as a third-party tool dev when user's are stuck on a version with a bug. Ideally, fixes would be backported
Many of the challenges are probably mitigated by having a really great reference-declaration resolver and API. Some of the issues are probably poor abstractions in Slither that were hard to change while maintaining backward compatibility and supporting solc's legacy AST, so it may be that many of these are not applicable to Slang. As a principle, any semantic info that would be needed to write a compiler or reference interpreter for the langauge should be included in the AST (https://www.youtube.com/watch?v=rBfSs2yipoE&t=1364s). I hope this is useful and best of luck!
The text was updated successfully, but these errors were encountered: