Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TON Static Analyzer #436

Open
byakuren-hijiri opened this issue Feb 10, 2024 · 15 comments
Open

TON Static Analyzer #436

byakuren-hijiri opened this issue Feb 10, 2024 · 15 comments
Assignees
Labels
Approved This proposal is approved by the committee Developer Tool Related to tools or utilities used by developers

Comments

@byakuren-hijiri
Copy link

Summary

Develop a modular, extensible static analyzer to enhance the security and reliability of TON smart contracts.

Context

Static analyzers detect potential vulnerabilities in contracts without executing them.

A new tool offers:

  • Vulnerability Detection: Mitigates risks by introducing security analysis to catch potential issues.
  • Enhanced Code Quality: Promotes code to follow recommended best-practices.
  • Streamlined Auditor Workflow: Provides a tool for more efficient smart contract reviews.
  • CI/CD Integration: Automates security checks in the development process for early problem-spotting.

From a business perspective, a static analyzer offers these benefits:

  • Competitive Edge: Sets the TON ecosystem apart with a focus on security.
  • User Trust: Builds confidence by protecting smart contracts.
  • Ecosystem Strength: Shows users a mature development environment.
  • Improved Development: Gives developers better tools to work more efficiently.
  • Lower Risk: Helps avoid costly security incidents.
  • Community Interest: The tool's extensible nature and community-driven development model make it attractive to the TON security community, encouraging collaboration on security efforts.

Milestones

  1. Implement Internals:
  • Internal representation
  • Analyzer pipeline extensible with custom lints similar to Slither
  • A module to solve dataflow problems
  • Configuration that allows to manage lints and suppress warnings
  1. Prepare Infrastructure:
  • Write unit tests and setup CI
  • Setup documentation generator
  1. Implement analyses to demonstrate the capabilities:
  • Read-only variables: Read only variables and fields typically could be replaced with constants for optimization purposes.
  • Variable is never accessed: Unused and write-only variables should be reported, as the developer potentially forgot to implement the intended logic.
  • Using zero address: Occurrence of the zero address in a contract is typically problematic, therefore the suspicious cases must be highlighted.
  • Unbounded loop operations: When iterating over a collection, the upper-limit the loop should typically use the length of that collection.
  • Divide before multiply: Performing division before multiplication can lead to precision loss and give unexpected results to the developer.
  1. Publish documentation and report grant results

References

Leading smart contract ecosystems provide security tooling for developers and auditors. An outstanding example is Ethereum's Slither, a community-driven tool known for its extensibility and open development model. It is highly customizable for auditors and follows up the recent vulnerability disclosures, providing up-to-date lints for new findings. Inspired by successful cases, the goal of this grant is to strengthen the TON ecosystem with a similar tool.

Estimate suggested reward

9000 USD in TON equivalent
Estimated time to finish: 8 weeks

@byakuren-hijiri byakuren-hijiri added the Developer Tool Related to tools or utilities used by developers label Feb 10, 2024
@anton-trunov
Copy link

The plan looks really awesome! I'm all in favor of this proposal. It would be great to integrate the analyzer with https://github.com/tact-lang/tact-vscode after it's functional.

@seriybeliy11
Copy link
Contributor

This is a great idea. We would like to help you with the implementation, if there is a possibility, contact me at tg: @oocovo

@Gusarich
Copy link
Contributor

This will work well in pair with #395

@byakuren-hijiri
Copy link
Author

I want to apply to this grant, as I have extensive expertise in the Compilers and Static Program Analysis domain. I bring hands-on experience in creating security tools for other blockchain projects, understand the typical security vulnerabilities of smart contracts, and know the approaches to automatically detect bugs as well as their applicability and limitations.

I can provide references and look forward to discuss the details privately with TON.

@delovoyhomie
Copy link
Collaborator

@byakuren-hijiri, thank you for your interesting bounty.

We want to split the bounty into several milestones, and as a first step, develop an MVP and then add other needs.

Let's start by forming a detailed list of rules that the analyzer will process.

@byakuren-hijiri
Copy link
Author

Hey @delovoyhomie,

Thank you for looking at this.

We want to split the bounty into several milestones, and as a first step, develop an MVP and then add other needs.

The idea exactly in this. The current grant application presents an MVP. Future grant applications will aim to improve its capabilities.
Before discussing further development plans, let's consider the technical context in greater detail.

Context

A static analyzer is a program that reason about other program's behavior without executing it. A simplified pipeline of a static analyzer includes the following steps:

  • Read the other program and represent it in form of different intermediate representations (IR)
  • Extract constraints from the IR using specific algorithms on graphs to express the supposed behavior of the program
  • Solve these constraints
  • Report violations of these constraints

The analyzer proposed in this grant application implements this pipeline with added usability enhanced for developers.
Let's it break down the actionable tasks of it for a more detailed explanation:

  1. Develop an IR. This enables us to represent the contract in the analyzer in a format suitable for analysis. An important IR essential to implementing the analyses is a Control Flow Graph, which is DAG that shows execution order of the statements.
  2. Develop an analyzer module. It uses IR as an input and extracts constraints from it to verify the security properties of the contract. This process involves implementing classic dataflow algorithms in our code.
  3. Develop a constraints solver. The solver processes the IR and constraints generated by the analyzer, and solves them to detect any violation. Considering industry experience and the needs of the project, the solver used in this MVP will utilize Datalog-based analyses. This approach has recommended itself in smart-contracts analysis [brent2020] and broader program analyzers implementations [smaragdakis2010] and is considered in text books on this subject. We will utilize the Soufflé Datalog implementation which is specifically designed for analysis problems and offers an effective parallel runtime execution.
  4. Develop several analyses. The MVP will include several analyses to showcase its capabilities. This task is straightforward with the internal components developed in the previous steps.

The additional steps highlighted in the grant application requires developing a driver, the execution pipeline and a modular architecture to enable third-party developers write their own analyses.

All these steps are necessary to develop an MVP and cannot be omitted. After implementing these, we will have the core of the analyzer and will be able to extend its capabilities. Let's discuss further steps for its improvement.

Further plans

To make development more transparent and trackable, further improvements will be divided into several grant applications. This also allows us to collaborate with developers to receive feedback and focus on more important tasks when needed.

Each of these milestones will improve either tool' usability or add more security checks making it more effective:

1. Address taint analysis problems
This enables us to detect typical problems with using of untrusted input and insufficient access controls. Ideally, it requires improvements in the IR as well, as it is simpler to work with the SSA form.

2. Address TON-specific problems
TON contracts have their own design features inherited from the TVM design. Expressing the subtle issues related to transaction ordering dependence and cell behavior is important for improving security.

3. Address cross-contracts interaction problems
To resolve these issues, it is important to emulate the execution environment with interaction between a few contracts.

4. Increase the overall number of lints
Some lints should be implemented based on the analyzer's internals. These are mostly classic and well-known, yet it is important to implement them. The good sources of examples of such the lints are slither's detectors and the recent paper that provides categorization of machine-auditable smart contracts bugs.

5. Improve developers experience
Integration to text editors via LSP, support of standard analyzer's formats like SARIF, developing of user interfaces for analyzer's reports are examples of tasks to improve developer's experience and make the tool more useful.

6. Writing educational content
Tutorials, documentation, and API usage examples which are crucial for improving the analyzer to be adapted by third parties.


That's essentially what this is about. Would be glad to answer any questions and explain the technical details further if needed.

@hacker-volodya
Copy link
Contributor

@byakuren-hijiri for which language do you want to write an analyzer? Tact, FunC, or TVM bytecode?

@byakuren-hijiri
Copy link
Author

@hacker-volodya,

for which language do you want to write an analyzer? Tact, FunC, or TVM bytecode?

The target language of this MVP is Tact, as this is the most straightforward way to get a working project as quickly as possible.

Then, it will be extended to support FunC. This task is not complicated from a technical perspective but is time-consuming since it involves writing the FunC frontend and modifying the IR.

TVM bytecode is out of scope for now. It might be useful for specific tasks like symbolic execution, but implementing this requires more effort due to the non-standard design of TVM. So, my position is that while it is an interesting target, it is not practical to start with it, as it will take more time at the beginning.

@byakuren-hijiri
Copy link
Author

In terms of milestones, they might be organized as follows:

  1. Analyzer's Pipeline and IR: Develop the driver with an extensible pipeline architecture, including all necessary interfaces for further implementation, configuration management, and IR. Prepare a test suite and documentation generator.
  2. Analyzer and Solver: Develop modules responsible for collecting constraints from the user's code and solving them using Soufflé as described above.
  3. Lints: Develop 5 lints as mentioned above to demonstrate the capabilities of the tool.

Each milestone is expected to take up to 3 weeks and requires 3000 USD in TON equivalent. As a result, the static analyzer described in this grant will be developed, presenting the fully-functional MVP.

@byakuren-hijiri
Copy link
Author

I would like refer to the latest grant application in #489, as it suggests a similar security tool but with a different approach.
Here is a short summary:

Difference #436 #489
Approach dataflow fixed-point analysis symbolic execution
Similar tools Slither Oyente, MAIAN, Mythril
Performance Instant feedback. By design it cannot find some kinds of vulnerabilities. In general slower, which depends on the implementation and the vulnerabilities targeted. For example, emulating the blockchain state to track transactions and state changes (necessary, for example, for some LTL-properties) might be too slow, but symbolic execution on a single contract's bytecode could be faster.
Extensibility Modular architecture; extensible with user-defined modules. This approach has recommended itself in Slither. Although most auditors don't publish their custom detectors, there are few examples available. IR could be reused by third-parties to implement other devtools, write custom lints, or experiment with new analysis techniques. According to the grant application, it will be extensible. Explaining symbolic execution to the community might be somewhat more challenging, but it could have the same benefits if a clear and well-documented API is provided for the internals.
Integrability Writing an LSP and the VSCode plugin is quite straightforward. The used approach allows to execute the tool in the background giving the developer instant feedback. It is possible to integrate developer tooling with such a tool, but symbolic execution in general takes more time to execute. It depends on the implementation, but the better use-case for such a tool might be CI/CD or long-running tasks to provide more comprehensive analysis and find errors which cannot be found using classical dataflow algorithms.
Language support Tact is the primary target, and FunC will be supported. Adding a new language requires efforts, but being a source-code analyzer, the tool has more context about the contract, which enables to provide more security checks and share that information to third-party plugins. The tool works with TVM bytecode, therefore it supports all TVM-based languages. Integrating with these languages to provide context about an error found is also challenging and requires effort.
Coverage and limitations Dataflow-analysis is a lightweight approach that has its own pros and cons. A carefully crafted analyzer provides low false-positive rate, can detect specific problems in contracts addressing previously found vulnerabilities (for example). But by design cannot find some categories of bugs, and will provide false-negatives in the cases it doesn't have enough information. The symbolic execution approach is able to find more complicated bugs described in the grant application. But addressing specific patterns in the source code is out of the scope of such a tool, since it requires more information about the source code and the suitable IR.
Team and development As a solo developer (9 years of experience; 6 in program analysis), it is important for me to make the tool reusable and extendable by the community to facilitate its maintenance later. Developed by a professional team with proven experience.

The tool suggested in this grant is more lightweight, easy-to-use, and will be community-driven. The tool suggested in #489 provides more comprehensive analysis, which makes it a good candidate to find tricky errors, especially when integrated in CI/CD.

Thus, determining in which tool should be implemented: both of them, as they complement each other and will strengthen the ecosystem from a security perspective in different aspects.

@korifey please let me know if you have any feedback on this, especially regarding certain aspects of the tool I mentioned. I am discussing the approach in general, based on my experience and the grant description to clarify the difference to the TON grant team and suggest supporting both grants.

@delovoyhomie delovoyhomie added the Approved This proposal is approved by the committee label Mar 30, 2024
@korifey
Copy link

korifey commented Apr 1, 2024

@byakuren-hijiri Thanks for a very detailed comparision. Both source-code and bytecode analyzers have their own niches. The same is about technology: dataflow-based and symbolic execution based tools.
E.g. running symbolic execution in IDE in online mode is quite challenging as you mentioned.
So I believe both tools will contribute to TON ecosystem.

@byakuren-hijiri
Copy link
Author

I am on track according to the proposed roadmap, and the first milestone has been completed in two weeks as expected:

  1. Analyzer's Pipeline and IR: Develop the driver with an extensible pipeline architecture, including all necessary interfaces for further implementation, configuration management, and IR. Prepare a test suite and documentation generator.

The project is now available at: https://github.com/nowarp/misti/.

This version includes everything planned:

  • A driver with an extensible pipeline capable of dynamically loading custom detector modules. For an example, see the example detector.
  • A configuration file for managing the detectors in use.
  • IR, currently Tact is supported.
  • CFG builder and lattice interfaces for further analysis.
  • CI and test suite.
  • The API documentation is available at https://nowarp.github.io/misti/ and will later be expanded to include tutorials and additional usage information.

So, the infrastructure and driver is now ready, and I'm beginning to work on implementing the analyzer's main logic as outlined in the plan.

@byakuren-hijiri
Copy link
Author

I'm currently finalizing the last two lints and polishing the final steps to improve everything before the release.

The milestone 2 is fully covered:

  1. Analyzer and Solver: Develop modules responsible for collecting constraints from the user's code and solving them using Soufflé as described above.

The milestone 3 is partially covered:

  1. Lints: Develop 5 lints as mentioned above to demonstrate the capabilities of the tool.

Thus, it could be finished quite fast, but I would like to make it more generic and reusable from the first release, so I'm still working on it.

@byakuren-hijiri
Copy link
Author

@delovoyhomie I have finished and published the first version of the static analyzer, which implements the functionality stated in this grant application.

  1. Lints: Develop 5 lints as mentioned above to demonstrate the capabilities of the tool.

The third milestone is finished and the lints are implemented. Here are the references to the documentation with motivation and example and the appropriate implementation:

Key Contributions

  • Developed the source-level static analyzer for TON with modular and extendable architecture:
    • The analyzer provides a few built-in lints as described above.
    • It is possible to write custom detectors to solve specific problems in the codebase or to implemented custom automatic checks for auditor companies.
    • The user could configure the analyzer.
    • It provides an API to solve problems using Souffle Datalog and implements classical data-flow analysis based on the monotone framework and the worklist solver.
  • Prepared the documentation:
  • Developed a comprehensive test suite:
    • Unit tests that cover corner-cases to avoid regressions in development.
    • Created a prototype of the test suite that downloads contracts from verifier.ton.org and other known open-source contracts and runs the analyzer on them. Currently, it is in the private repo and is available for the Tact team. It will be useful to test the Tact compiler in the future.
    • Next step is using tact-check – the grammar-aware fuzzer for Tact which is currently available for the Tact team only.
  • Setup processes needed to maintain and develop it as an open-source project:
  • Developed Souffle bindings for TypeScript. They will be moved to a separate repository later.
  • Highlighted possible improvements on tact compiler API:
    • Tact compiler development is moving towards creating a modular architecture with an API for third-party tools. Being one of the first tools in that ecosystem, it is easy to find new ways to improve the API, which is beneficial for both the tool and the entire compiler ecosystem.
    • See: Tact frontend API: Improvements for tooling and issues labeled api in the Tact repo.
  • Tools useful for auditors
    • Being a security tool, Misti addresses the common problems auditors needed to understand and analyze the source code.
    • Currently, it implements the CFG dump:
      image
    • Import and callgraph support will be added later.
  • Insights on Tact security:
    • Gathered feedback and gained experience on potential Tact security issues, which is crucial for the ongoing secure language design and improvements in tooling.
    • This resulted in new ideas about the detectors we need: new detector issues.

Acknowledgments

My thanks to the following people who helped make this release possible:

  • Anton Trunov and Daniil Sedov: Assistance with testing, discussions on the analyzer implementation, and valuable input on new detector ideas and their design.
  • Novus Nota: Insightful discussions on tooling and compiler API, and support from the Tact compiler side.

Next steps

The next desired steps in the development include:

  • v0.2: Fixes for issues that arise, new detectors for Tact, community engagement, and following to the Tact development (some of the issues are blocked and depend on compiler API updates).
  • v0.3: A release focused on tooling support and optimizations. The main goal will be to develop an LSP server and integrate it with the supported editors/IDEs.
  • v1.0: Complete FunC support and the final version of the IR that supports both. This major release will guarantee a stable API for all internals, which third parties can rely on.

@delovoyhomie
Copy link
Collaborator

@byakuren-hijiri thank you for the contribution!

To accurately recognize your valuable contributions in our repository, we kindly request you to submit a Pull Request to the Hall of Fame file, providing the wallet address and a link to the bounty with the number.

Please follow these steps:

  1. Fork the repository (if you haven't already).

  2. Edit the Hall of Fame file, commit, and push your changes.

  3. Create a Pull Request from your fork to the main repository, providing the wallet address and a link to the bounty with the number (for example, Pull Request Article: Generation of block random seed #136).
    For reference on what your entry should look like, please see the examples of past merged pull requests.

  4. And please follow the questbook proposal stage in accordance with the bounty guideline

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Approved This proposal is approved by the committee Developer Tool Related to tools or utilities used by developers
Projects
None yet
Development

No branches or pull requests

7 participants