TON Static Analyzer #436

byakuren-hijiri · 2024-02-10T11:40:14Z

Summary

Develop a modular, extensible static analyzer to enhance the security and reliability of TON smart contracts.

Context

Static analyzers detect potential vulnerabilities in contracts without executing them.

A new tool offers:

Vulnerability Detection: Mitigates risks by introducing security analysis to catch potential issues.
Enhanced Code Quality: Encourages code that follows recommended best practices.
Streamlined Auditor Workflow: Provides a tool for more efficient smart contract reviews.
CI/CD Integration: Automates security checks in the development process for early problem-spotting.

From a business perspective, a static analyzer offers these benefits:

Competitive Edge: Sets the TON ecosystem apart with a focus on security.
User Trust: Builds confidence by protecting smart contracts.
Ecosystem Strength: Shows users a mature development environment.
Improved Development: Gives developers better tools to work more efficiently.
Lower Risk: Helps avoid costly security incidents.
Community Interest: The tool's extensible nature and community-driven development model make it attractive to the TON security community, encouraging collaboration on security efforts.

Milestones

Implement Internals:

Internal representation
Analyzer pipeline extensible with custom lints similar to Slither
A module to solve dataflow problems
Configuration that allows users to manage lints and suppress warnings
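The sketch below shows what an extensible pipeline with configurable lints could look like. It is written in TypeScript. Every name in it is hypothetical and chosen only for illustration: `Detector`, `ContractIR`, `Config`, `runPipeline`, and the `"detector:file:line"` suppression format. None of them is the API of an existing tool. The toy detector matches statement text for `newAddress(0, 0)` as a stand-in for a zero-address construction. A real lint would work on the IR instead.

```typescript
// Hypothetical types for illustration only; not an existing tool's API.
interface Warning {
  detector: string;
  file: string;
  line: number;
  message: string;
}

// Stand-in for the analyzer's IR: functions with their statements as plain text.
interface ContractIR {
  file: string;
  functions: { name: string; statements: { line: number; text: string }[] }[];
}

// Every lint implements the same interface, so a driver can load it dynamically.
interface Detector {
  readonly id: string;
  check(ir: ContractIR): Warning[];
}

// Configuration: which detectors run and which warnings are suppressed.
interface Config {
  enabledDetectors: string[];
  suppressions: string[]; // hypothetical format: "detectorId:file:line"
}

// Toy lint: flags statements that textually construct the zero address.
class ZeroAddressDetector implements Detector {
  readonly id = "ZeroAddress";
  check(ir: ContractIR): Warning[] {
    const warnings: Warning[] = [];
    for (const fn of ir.functions) {
      for (const st of fn.statements) {
        if (st.text.indexOf("newAddress(0, 0)") >= 0) {
          warnings.push({
            detector: this.id,
            file: ir.file,
            line: st.line,
            message: `Zero address used in ${fn.name}`,
          });
        }
      }
    }
    return warnings;
  }
}

// Driver: run only the enabled detectors, then drop suppressed warnings.
function runPipeline(ir: ContractIR, detectors: Detector[], config: Config): Warning[] {
  const result: Warning[] = [];
  for (const d of detectors) {
    if (config.enabledDetectors.indexOf(d.id) < 0) continue;
    for (const w of d.check(ir)) {
      const key = `${w.detector}:${w.file}:${w.line}`;
      if (config.suppressions.indexOf(key) < 0) result.push(w);
    }
  }
  return result;
}
```

All detectors sit behind one interface. This means third-party modules can be added without touching the driver, in the spirit of Slither's detector plugins.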

Prepare Infrastructure:

Write unit tests and set up CI
Set up a documentation generator

Implement analyses to demonstrate the capabilities:

Read-only variables: Read-only variables and fields can typically be replaced with constants for optimization purposes.
Variable is never accessed: Unused and write-only variables should be reported, as the developer may have forgotten to implement the intended logic.
Using zero address: Occurrences of the zero address in a contract are often problematic, so suspicious cases should be highlighted.
Unbounded loop operations: When iterating over a collection, the loop's upper bound should typically be the length of that collection.
Divide before multiply: Performing division before multiplication can lead to precision loss and give the developer unexpected results.
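The divide-before-multiply check can be illustrated with a TypeScript sketch over a hypothetical toy expression AST, not the analyzer's real IR. The sketch does two things. First, it flags any multiplication that has a division as a direct operand. Second, it evaluates both orderings with floor division to show the precision loss. Operands are assumed to be non-negative integers.

```typescript
// Toy expression AST for illustration; the analyzer's real IR is richer.
type BinOp = "+" | "-" | "*" | "/";
type Expr =
  | { kind: "num"; value: number }
  | { kind: "var"; name: string }
  | { kind: "bin"; op: BinOp; left: Expr; right: Expr };

const num = (value: number): Expr => ({ kind: "num", value });
const v = (name: string): Expr => ({ kind: "var", name });
const bin = (op: BinOp, left: Expr, right: Expr): Expr => ({ kind: "bin", op, left, right });

function isDivision(e: Expr): boolean {
  return e.kind === "bin" && e.op === "/";
}

// Collects every multiplication that has a division as a direct operand.
function findDivideBeforeMultiply(e: Expr, found: Expr[] = []): Expr[] {
  if (e.kind !== "bin") return found;
  if (e.op === "*" && (isDivision(e.left) || isDivision(e.right))) found.push(e);
  findDivideBeforeMultiply(e.left, found);
  findDivideBeforeMultiply(e.right, found);
  return found;
}

// Integer evaluation with floor division (non-negative operands assumed).
function evalInt(e: Expr, env: { [name: string]: number }): number {
  if (e.kind === "num") return e.value;
  if (e.kind === "var") return env[e.name];
  const l = evalInt(e.left, env);
  const r = evalInt(e.right, env);
  if (e.op === "+") return l + r;
  if (e.op === "-") return l - r;
  if (e.op === "*") return l * r;
  return Math.floor(l / r);
}
```

Consider `bin("*", bin("/", v("amount"), num(100)), v("fee"))`, that is, `(amount / 100) * fee`. It is flagged. With amount = 250 and fee = 3 it evaluates to 6. The reordered `(amount * fee) / 100` is not flagged and gives 7.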

Publish documentation and report grant results

References

Leading smart contract ecosystems provide security tooling for developers and auditors. An outstanding example is Ethereum's Slither, a community-driven tool known for its extensibility and open development model. It is highly customizable for auditors and keeps up with recent vulnerability disclosures, providing up-to-date lints for new findings. Inspired by such successes, the goal of this grant is to strengthen the TON ecosystem with a similar tool.

Estimate suggested reward

9000 USD in TON equivalent
Estimated time to finish: 8 weeks

anton-trunov · 2024-02-10T11:46:12Z

The plan looks really awesome! I'm all in favor of this proposal. It would be great to integrate the analyzer with https://github.com/tact-lang/tact-vscode after it's functional.

seriybeliy11 · 2024-02-10T12:18:43Z

This is a great idea. We would like to help you with the implementation, if there is a possibility, contact me at tg: @oocovo

Gusarich · 2024-02-11T19:50:42Z

This will pair well with #395

byakuren-hijiri · 2024-02-12T10:40:39Z

I want to apply for this grant, as I have extensive expertise in compilers and static program analysis. I bring hands-on experience in creating security tools for other blockchain projects, understand the typical security vulnerabilities of smart contracts, and know the approaches to automatically detecting bugs, as well as their applicability and limitations.

I can provide references and look forward to discussing the details privately with TON.

delovoyhomie · 2024-02-19T06:35:32Z

@byakuren-hijiri, thank you for your interesting bounty.

We want to split the bounty into several milestones, and as a first step, develop an MVP and then add other needs.

Let's start by forming a detailed list of rules that the analyzer will process.

byakuren-hijiri · 2024-02-19T14:58:17Z

Hey @delovoyhomie,

Thank you for looking at this.

"We want to split the bounty into several milestones, and as a first step, develop an MVP and then add other needs."

That is exactly the idea. The current grant application presents an MVP. Future grant applications will aim to improve its capabilities.
Before discussing further development plans, let's consider the technical context in greater detail.

Context

A static analyzer is a program that reasons about another program's behavior without executing it. A simplified pipeline of a static analyzer includes the following steps:

Parse the target program and represent it in one or more intermediate representations (IRs)
Extract constraints from the IR, using graph algorithms, that express the expected behavior of the program
Solve these constraints
Report violations of these constraints

The analyzer proposed in this grant application implements this pipeline, with additional usability enhancements for developers.
Let's break it down into actionable tasks for a more detailed explanation:

Develop an IR. This enables us to represent the contract in a format suitable for analysis. An IR essential to implementing the analyses is the Control Flow Graph (CFG), a directed graph that shows the possible execution order of statements. In general it is not acyclic, since loops introduce back edges.
Develop an analyzer module. It uses IR as an input and extracts constraints from it to verify the security properties of the contract. This process involves implementing classic dataflow algorithms in our code.
Develop a constraint solver. The solver processes the IR and the constraints generated by the analyzer, and solves them to detect any violations. Considering industry experience and the needs of the project, the solver in this MVP will use Datalog-based analyses. This approach has proven itself in smart contract analysis [brent2020] and in broader program analyzer implementations [smaragdakis2010], and it is covered in textbooks on the subject. We will use the Soufflé Datalog implementation, which is specifically designed for analysis problems and offers efficient parallel execution.
Develop several analyses. The MVP will include several analyses to showcase its capabilities. This task is straightforward with the internal components developed in the previous steps.
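To make the CFG concrete, here is a TypeScript sketch for a small loop: `i = 0; while (i < n) { i = i + 1 }; return i`. The `CfgNode` shape is a hypothetical simplification. A depth-first search from the entry reports back edges, which exist only when the reachable graph contains a cycle. `toDot` emits a Graphviz DOT description, of the kind a CFG dump might produce.

```typescript
// Hypothetical, simplified CFG node: one statement per node, ids equal array indices.
interface CfgNode {
  id: number;
  stmt: string;
  succs: number[];
}

// CFG for: i = 0; while (i < n) { i = i + 1 } return i;
const cfg: CfgNode[] = [
  { id: 0, stmt: "i = 0", succs: [1] },
  { id: 1, stmt: "i < n", succs: [2, 3] }, // loop header: enter body or exit
  { id: 2, stmt: "i = i + 1", succs: [1] }, // back edge to the header
  { id: 3, stmt: "return i", succs: [] },
];

// DFS from the entry; an edge to a node still on the DFS stack is a back edge.
// A non-empty result means the graph reachable from the entry has a cycle.
function findBackEdges(nodes: CfgNode[], entry: number): [number, number][] {
  const state = new Map<number, "onStack" | "done">();
  const back: [number, number][] = [];
  const visit = (id: number): void => {
    state.set(id, "onStack");
    for (const s of nodes[id].succs) {
      const st = state.get(s);
      if (st === "onStack") back.push([id, s]);
      else if (st === undefined) visit(s);
    }
    state.set(id, "done");
  };
  visit(entry);
  return back;
}

// Graphviz DOT output for visual inspection of the CFG.
function toDot(nodes: CfgNode[]): string {
  const lines = ["digraph CFG {"];
  for (const n of nodes) {
    lines.push(`  n${n.id} [label="${n.stmt.replace(/"/g, '\\"')}"];`);
    for (const s of n.succs) lines.push(`  n${n.id} -> n${s};`);
  }
  lines.push("}");
  return lines.join("\n");
}
```

For this loop, `findBackEdges(cfg, 0)` finds the single edge from `i = i + 1` back to the header. That edge is why dataflow solvers over a CFG iterate to a fixpoint rather than making one topological pass.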

The additional steps highlighted in the grant application require developing a driver, the execution pipeline, and a modular architecture that enables third-party developers to write their own analyses.

All these steps are necessary to develop an MVP and cannot be omitted. After implementing these, we will have the core of the analyzer and will be able to extend its capabilities. Let's discuss further steps for its improvement.

Further plans

To make development more transparent and trackable, further improvements will be divided into several grant applications. This also allows us to collaborate with developers to receive feedback and focus on more important tasks when needed.

Each of these milestones will either improve the tool's usability or add more security checks, making it more effective:

1. Address taint analysis problems
This enables us to detect typical problems with the use of untrusted input and insufficient access control. Ideally, it also requires improvements to the IR, since such analyses are simpler to implement on SSA form.

2. Address TON-specific problems
TON contracts have their own design features inherited from the TVM design. Expressing the subtle issues related to transaction ordering dependence and cell behavior is important for improving security.

3. Address cross-contracts interaction problems
To resolve these issues, it is important to emulate an execution environment in which several contracts interact.

4. Increase the overall number of lints
Some lints can be implemented directly on top of the analyzer's internals. These are mostly classic and well known, yet still important to implement. Good sources of such lints are Slither's detectors and a recent paper that categorizes machine-auditable smart contract bugs.

5. Improve developer experience
Integration with text editors via LSP, support for standard analyzer output formats such as SARIF, and user interfaces for analyzer reports are examples of tasks that improve developer experience and make the tool more useful.

6. Writing educational content
Tutorials, documentation, and API usage examples, which are crucial for the adoption of the analyzer by third parties.
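The first item, taint analysis, is easiest to picture on an SSA-like representation. In SSA every variable is defined exactly once, so taint can be tracked per variable name. The TypeScript sketch below uses a hypothetical instruction set: `source`, `const`, `op`, `phi`, and `sink`. It iterates to a fixpoint so that phi operands arriving over loop back edges are handled. It only shows propagation. Sanitizers and access-control checks are deliberately left out.

```typescript
// Hypothetical SSA-like instructions: each variable (def) is assigned exactly once.
type Instr =
  | { kind: "source"; def: string } // value taken from untrusted input
  | { kind: "const"; def: string } // trusted constant
  | { kind: "op"; def: string; uses: string[] } // any computation over its operands
  | { kind: "phi"; def: string; uses: string[] } // merge of values at a join point
  | { kind: "sink"; name: string; uses: string[] }; // security-sensitive operation

// Returns the names of sinks reached by tainted values.
function taintedSinks(program: Instr[]): string[] {
  const tainted = new Set<string>();
  let changed = true;
  // Fixpoint iteration: a phi may use a value defined later in program order (loop back edge).
  while (changed) {
    changed = false;
    for (const ins of program) {
      if (ins.kind === "sink" || ins.kind === "const") continue;
      const isTainted = ins.kind === "source" || ins.uses.some((u) => tainted.has(u));
      if (isTainted && !tainted.has(ins.def)) {
        tainted.add(ins.def);
        changed = true;
      }
    }
  }
  const reached: string[] = [];
  for (const ins of program) {
    if (ins.kind === "sink" && ins.uses.some((u) => tainted.has(u))) reached.push(ins.name);
  }
  return reached;
}
```

Here is an example. An untrusted `amount_1` flows through `total_1 = amount_1 + fee_1` into a value-transfer sink. That sink is reported. A sink that only uses the constant `fee_1` is not.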

That's essentially what this is about. I would be glad to answer any questions and explain the technical details further if needed.

hacker-volodya · 2024-02-19T15:04:03Z

@byakuren-hijiri for which language do you want to write an analyzer? Tact, FunC, or TVM bytecode?

byakuren-hijiri · 2024-02-19T15:18:27Z

@hacker-volodya,

"for which language do you want to write an analyzer? Tact, FunC, or TVM bytecode?"

The target language of this MVP is Tact, as this is the most straightforward way to get a working project as quickly as possible.

Then, it will be extended to support FunC. This task is not complicated from a technical perspective but is time-consuming since it involves writing the FunC frontend and modifying the IR.

TVM bytecode is out of scope for now. It might be useful for specific tasks like symbolic execution, but implementing this requires more effort due to the non-standard design of TVM. So, my position is that while it is an interesting target, it is not practical to start with it, as it will take more time at the beginning.

byakuren-hijiri · 2024-02-24T10:12:23Z

In terms of milestones, they might be organized as follows:

Analyzer's Pipeline and IR: Develop the driver with an extensible pipeline architecture, including all necessary interfaces for further implementation, configuration management, and IR. Prepare a test suite and documentation generator.
Analyzer and Solver: Develop modules responsible for collecting constraints from the user's code and solving them using Soufflé as described above.
Lints: Develop 5 lints as mentioned above to demonstrate the capabilities of the tool.

Each milestone is expected to take up to 3 weeks and requires 3000 USD in TON equivalent. As a result, the static analyzer described in this grant will be developed, presenting the fully-functional MVP.

byakuren-hijiri · 2024-03-20T14:12:50Z

I would like to refer to the latest grant application, #489, as it proposes a similar security tool with a different approach.
Here is a short summary:

Difference	#436	#489
Approach	dataflow fixed-point analysis	symbolic execution
Similar tools	Slither	Oyente, MAIAN, Mythril
Performance	Instant feedback. By design, it cannot find some kinds of vulnerabilities.	Generally slower, depending on the implementation and the targeted vulnerabilities. For example, emulating the blockchain state to track transactions and state changes (necessary for some LTL properties) might be too slow, whereas symbolic execution of a single contract's bytecode could be faster.
Extensibility	Modular architecture, extensible with user-defined modules. This approach has proven itself in Slither. Although most auditors don't publish their custom detectors, a few examples are available. Third parties could reuse the IR to implement other devtools, write custom lints, or experiment with new analysis techniques.	According to the grant application, it will be extensible. Explaining symbolic execution to the community might be somewhat more challenging, but it could offer the same benefits if a clear and well-documented API is provided for the internals.
Integrability	Writing an LSP server and a VSCode plugin is quite straightforward. The approach allows running the tool in the background, giving the developer instant feedback.	Developer tooling can be integrated with such a tool, but symbolic execution generally takes longer to run. It depends on the implementation, but a better use case for such a tool might be CI/CD or long-running tasks that provide more comprehensive analysis and find errors that classical dataflow algorithms cannot.
Language support	Tact is the primary target, and FunC will be supported. Adding a new language requires effort, but as a source-code analyzer, the tool has more context about the contract. This makes it possible to provide more security checks and to share that information with third-party plugins.	The tool works with TVM bytecode, so it supports all TVM-based languages. Relating a found error back to the source language to provide context is also challenging and requires effort.
Coverage and limitations	Dataflow analysis is a lightweight approach with its own pros and cons. A carefully crafted analyzer provides a low false-positive rate and can detect specific problems in contracts, for example patterns behind previously found vulnerabilities. But by design, it cannot find some categories of bugs, and it will produce false negatives where it lacks enough information.	The symbolic execution approach is able to find the more complicated bugs described in the grant application. But addressing specific source-code patterns is out of scope for such a tool, since it requires more information about the source code and a suitable IR.
Team and development	As a solo developer (9 years of experience; 6 in program analysis), it is important for me to make the tool reusable and extendable by the community to facilitate its maintenance later.	Developed by a professional team with proven experience.

The tool suggested in this grant is more lightweight, easy-to-use, and will be community-driven. The tool suggested in #489 provides more comprehensive analysis, which makes it a good candidate to find tricky errors, especially when integrated in CI/CD.

Thus, as to which tool should be implemented: both, since they complement each other and will strengthen the ecosystem's security in different aspects.

@korifey please let me know if you have any feedback on this, especially regarding the aspects of the tool I mentioned. I am discussing the approach in general, based on my experience and the grant description. The aim is to clarify the difference for the TON grant team and to suggest supporting both grants.

korifey · 2024-04-01T07:51:49Z

@byakuren-hijiri Thanks for a very detailed comparison. Both source-code and bytecode analyzers have their own niches. The same goes for the technology: dataflow-based and symbolic-execution-based tools.
E.g. running symbolic execution in IDE in online mode is quite challenging as you mentioned.
So I believe both tools will contribute to TON ecosystem.

byakuren-hijiri · 2024-04-15T13:58:44Z

I am on track according to the proposed roadmap, and the first milestone has been completed in two weeks as expected:

"Analyzer's Pipeline and IR: Develop the driver with an extensible pipeline architecture, including all necessary interfaces for further implementation, configuration management, and IR. Prepare a test suite and documentation generator."

The project is now available at: https://github.com/nowarp/misti/.

This version includes everything planned:

A driver with an extensible pipeline capable of dynamically loading custom detector modules. For instance, see the example detector.
A configuration file for managing the detectors in use.
IR; currently, Tact is supported.
CFG builder and lattice interfaces for further analysis.
CI and test suite.
The API documentation is available at https://nowarp.github.io/misti/ and will later be expanded to include tutorials and additional usage information.

So, the infrastructure and driver are now ready, and I'm beginning to implement the analyzer's main logic as outlined in the plan.

byakuren-hijiri · 2024-07-01T14:36:13Z

I'm currently finalizing the last two lints and polishing the final steps to improve everything before the release.

Milestone 2 is fully covered:

"Analyzer and Solver: Develop modules responsible for collecting constraints from the user's code and solving them using Soufflé as described above."

The detectors that define dataflow problems and collect constraints are available in the repository.
The Souffle API has been developed and is used to solve dataflow constraints. Here is an example of its use in one of the lints: ReadOnlyVariables.check. It first creates a Soufflé program by generating the relevant declarations, rules, and facts, and then calls the Soufflé executor to solve them. The API is fully functional for the static analyzer's needs. I plan to extract it into a separate library, since it might be useful for the community, and to add a declarative version of the API to make its descriptions more concise.
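This TypeScript sketch shows the general shape of generating a Soufflé program, with declarations, a rule, and facts, for a read-only variables check. The relation names `varDecl`, `varAssign`, and `readOnly` are hypothetical, as are the helpers `fact` and `readOnlyProgram`. They do not reflect the actual Misti Souffle bindings. The emitted text uses standard Soufflé syntax: `.decl`, `.output`, and stratified negation with `!`.

```typescript
// Hypothetical helpers for illustration; not the API of Misti's Souffle bindings.

// A Soufflé fact; symbol constants are double-quoted strings.
// JSON.stringify escaping is assumed adequate for plain identifiers.
function fact(relation: string, ...values: string[]): string {
  return `${relation}(${values.map((v) => JSON.stringify(v)).join(", ")}).`;
}

// Builds a complete Soufflé program for a read-only variables check:
// a variable is read-only if it is declared but never assigned afterwards.
function readOnlyProgram(declared: string[], assigned: string[]): string {
  const lines: string[] = [
    ".decl varDecl(v: symbol)",
    ".decl varAssign(v: symbol)",
    ".decl readOnly(v: symbol)",
    ".output readOnly",
    "readOnly(v) :- varDecl(v), !varAssign(v).",
  ];
  for (const name of declared) lines.push(fact("varDecl", name));
  for (const name of assigned) lines.push(fact("varAssign", name));
  return lines.join("\n");
}
```

Take `readOnlyProgram(["owner", "fee", "counter"], ["counter"])` as an example. Soufflé would derive `readOnly` for `owner` and `fee` but not for `counter`. Saved as, say, `readonly.dl` and run with `souffle -D out readonly.dl`, the result would be written to `out/readOnly.csv`.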

Milestone 3 is partially covered:

"Lints: Develop 5 lints as mentioned above to demonstrate the capabilities of the tool."

The available lints are:
There is also an additional example lint, Implicit Init, which demonstrates how to use the analyzer as a library.
The UnboundLoopsLint is a work in progress. I have already defined the dataflow problem, so it could be finished easily. However, I would first like to implement a more generic dataflow equation solver based on Souffle, because it will be useful later for more complicated lints.
The same applies to the DivideBeforeMultiply lint: I would like to use the generic solver described above for it as well, for consistency.

Thus, it could be finished quite fast, but I would like to make it more generic and reusable from the first release, so I'm still working on it.

byakuren-hijiri · 2024-08-06T02:27:20Z

@delovoyhomie I have finished and published the first version of the static analyzer, which implements the functionality stated in this grant application.

"Lints: Develop 5 lints as mentioned above to demonstrate the capabilities of the tool."

The third milestone is finished and the lints are implemented. Here are links to the documentation (with motivation and examples) and to the corresponding implementations:

Key Contributions

Developed a source-level static analyzer for TON with a modular and extensible architecture:
- The analyzer provides a few built-in lints as described above.
- It is possible to write custom detectors to solve specific problems in a codebase or to implement custom automated checks for auditing companies.
- Users can configure the analyzer.
- It provides an API to solve problems using Souffle Datalog and implements classical data-flow analysis based on the monotone framework and the worklist solver.
Prepared the documentation:
- Created a static website: nowarp.github.io [source]
- Published the documentation and tutorial for Misti: nowarp.github.io/tools/misti/docs
- Created an API reference for developers: nowarp.github.io/tools/misti/api. It is automatically generated, since the source code is fully covered with docstring comments.
- Each warning contains a reference to the documentation:
Developed a comprehensive test suite:
- Unit tests that cover corner-cases to avoid regressions in development.
- Created a prototype of the test suite that downloads contracts from verifier.ton.org and other known open-source contracts and runs the analyzer on them. Currently, it is in the private repo and is available for the Tact team. It will be useful to test the Tact compiler in the future.
- Next step is using tact-check – the grammar-aware fuzzer for Tact which is currently available for the Tact team only.
Set up the processes needed to maintain and develop it as an open-source project:
- All the development is based on GitHub issues, milestones and follows the classic git-flow.
- Several issues labeled good first issue are available for new contributors.
Developed Souffle bindings for TypeScript. They will be moved to a separate repository later.
Highlighted possible improvements to the Tact compiler API:
- Tact compiler development is moving towards a modular architecture with an API for third-party tools. As one of the first tools in that ecosystem, Misti makes it easy to find new ways to improve the API, which benefits both the tool and the entire compiler ecosystem.
- See: Tact frontend API: Improvements for tooling and issues labeled api in the Tact repo.
Tools useful for auditors
- As a security tool, Misti addresses common needs auditors have when understanding and analyzing source code.
- Currently, it implements the CFG dump:
- Import and callgraph support will be added later.
Insights on Tact security:
- Gathered feedback and gained experience on potential Tact security issues, which is crucial for the ongoing secure language design and improvements in tooling.
- This resulted in new ideas about the detectors we need: new detector issues.
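The data-flow analysis based on the monotone framework and the worklist solver, mentioned above, can be sketched as follows. This TypeScript sketch instantiates the framework for liveness, a backward may-analysis. It is not Misti's implementation. The `FlowNode` shape is hypothetical, and node ids are assumed to equal array indices. Stores whose value is not live afterwards are reported, which is the core of a write-only variable check.

```typescript
// Hypothetical CFG node annotated with the variables it writes (defs) and reads (uses).
// Node ids are assumed to equal array indices.
interface FlowNode {
  id: number;
  defs: string[];
  uses: string[];
  succs: number[];
}

// Liveness as a monotone-framework instance: backward direction, union as join,
// transfer in = uses ∪ (out − defs), solved with a worklist starting from empty sets.
function liveness(nodes: FlowNode[]): Set<string>[] {
  const preds: number[][] = nodes.map(() => []);
  for (const n of nodes) for (const s of n.succs) preds[s].push(n.id);
  const liveIn = nodes.map(() => new Set<string>());
  const liveOut = nodes.map(() => new Set<string>());
  const worklist = nodes.map((n) => n.id);
  while (worklist.length > 0) {
    const id = worklist.pop()!;
    const n = nodes[id];
    // Join: union of the successors' live-in sets.
    const out = new Set<string>();
    for (const s of n.succs) liveIn[s].forEach((x) => out.add(x));
    liveOut[id] = out;
    // Transfer function.
    const inSet = new Set<string>(n.uses);
    out.forEach((x) => {
      if (n.defs.indexOf(x) < 0) inSet.add(x);
    });
    // Sets only grow (monotone transfer from the bottom element), so a size change detects any change.
    if (inSet.size !== liveIn[id].size) {
      liveIn[id] = inSet;
      for (const p of preds[id]) worklist.push(p);
    }
  }
  return liveOut;
}

// A store is dead if the variable is not live right after the node that writes it.
function deadStores(nodes: FlowNode[]): { node: number; variable: string }[] {
  const liveOut = liveness(nodes);
  const result: { node: number; variable: string }[] = [];
  for (const n of nodes) {
    for (const d of n.defs) {
      if (!liveOut[n.id].has(d)) result.push({ node: n.id, variable: d });
    }
  }
  return result;
}

// x = 1; y = 2; while (x < 10) { x = x + 1 } return x;
const example: FlowNode[] = [
  { id: 0, defs: ["x"], uses: [], succs: [1] },
  { id: 1, defs: ["y"], uses: [], succs: [2] },
  { id: 2, defs: [], uses: ["x"], succs: [3, 4] },
  { id: 3, defs: ["x"], uses: ["x"], succs: [2] },
  { id: 4, defs: [], uses: ["x"], succs: [] },
];
```

On `example`, only the store to `y` at node 1 is reported, since `y` is never read. The stores to `x` stay live through the loop's back edge.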

Acknowledgments

My thanks to the following people who helped make this release possible:

Anton Trunov and Daniil Sedov: Assistance with testing, discussions on the analyzer implementation, and valuable input on new detector ideas and their design.
Novus Nota: Insightful discussions on tooling and compiler API, and support from the Tact compiler side.

Next steps

The next desired steps in the development include:

v0.2: Fixes for issues that arise, new detectors for Tact, community engagement, and keeping up with Tact development (some of the issues are blocked and depend on compiler API updates).
v0.3: A release focused on tooling support and optimizations. The main goal will be to develop an LSP server and integrate it with the supported editors/IDEs.
v1.0: Complete FunC support and the final version of the IR that supports both. This major release will guarantee a stable API for all internals, which third parties can rely on.

delovoyhomie · 2024-08-06T10:20:46Z

@byakuren-hijiri thank you for the contribution!

To accurately recognize your valuable contributions in our repository, we kindly request you to submit a Pull Request to the Hall of Fame file, providing the wallet address and a link to the bounty with the number.

Please follow these steps:

Fork the repository (if you haven't already).
Edit the Hall of Fame file, commit, and push your changes.
Create a Pull Request from your fork to the main repository, providing the wallet address and a link to the bounty with the number (for example, Pull Request Article: Generation of block random seed #136).
For reference on what your entry should look like, please see the examples of past merged pull requests.
Also, please follow the Questbook proposal stage in accordance with the bounty guidelines.

byakuren-hijiri added the Developer Tool label (Related to tools or utilities used by developers) on Feb 10, 2024

anton-trunov mentioned this issue on Mar 19, 2024: TON Static Analysis based on Symbolic Execution #489 (Closed)

delovoyhomie added the Approved label (This proposal is approved by the committee) on Mar 30, 2024

delovoyhomie assigned byakuren-hijiri on Mar 30, 2024

anton-trunov mentioned this issue on Apr 26, 2024: Incremental and error-resilient parser tact-lang/tact#286 (Open)

delovoyhomie mentioned this issue on Jun 14, 2024: CONTINUOUS AUDIT OF TON SMART CONTRACTS #603 (Closed)

byakuren-hijiri mentioned this issue on Aug 6, 2024: Add Misti to Hall of Fame #748 (Open)

jubnzv mentioned this issue on Aug 29, 2024: Upgrade Misti with Advanced Tact Detectors #777 (Open)
