aptos-labs/tree-sitter-move-on-aptos

Treesitter Semgrep Move on Aptos (BETA)

Semgrep integration and Move Tree-sitter grammar

Project Structure

Most files in this repo are auto-generated by tree-sitter. The only files you need to care about are:

  • grammar.js: the main grammar rules for the Move programming language;
  • src/scanner.c: the external scanner used in grammar.js. Currently, it is used to scan block (document) comments and line document comments. It’s unlikely you will need to update it or add new scanners;
  • batch-test.py: a Python script for testing the grammar. It will recursively scan the given paths and test files ending with .move against the grammar. Usage: python3 batch-test.py <PATH> [ ... <PATH> ]. You should run tree-sitter generate each time you modify the grammar before testing.
  • .github/workflows/test-on-repo.yaml: GitHub Workflow configurations.
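The batch-testing behavior described above (recursively scan the given paths and test every file ending with .move) can be sketched roughly as follows. This is not the actual batch-test.py; the function names find_move_files, parses_cleanly, and run_batch are hypothetical, and the sketch assumes that tree-sitter parse exits with a non-zero status when a file fails to parse.

```python
import subprocess
from pathlib import Path


def find_move_files(paths):
    """Collect every file ending in .move under the given paths, recursively."""
    found = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            found.extend(sorted(p for p in path.rglob("*.move") if p.is_file()))
        elif path.is_file() and path.suffix == ".move":
            found.append(path)
    return found


def parses_cleanly(move_file):
    """Run `tree-sitter parse` on one file and report whether it exited with 0.

    Assumption: a parse error makes the command exit with a non-zero status.
    """
    result = subprocess.run(
        ["tree-sitter", "parse", str(move_file)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def run_batch(paths):
    """Return the .move files under `paths` that did not parse cleanly."""
    return [f for f in find_move_files(paths) if not parses_cleanly(f)]
```

Calling run_batch with one or more directories returns the list of failing files, which is empty when every file parses. As with batch-test.py itself, run tree-sitter generate first so the parser reflects the current grammar.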

Setting up the Environment

Before contributing to the grammar rules, install and configure tree-sitter. A good way to do so is to follow the Getting Started section of the tree-sitter documentation.

Using a Node version manager to install the Node.js runtime is recommended.

By the time you have finished, you should have these installed and configured:

  • Node.js (ideally installed through a version manager);
  • A working C compiler (for macOS users, this ships with the Xcode Command Line Tools);
  • tree-sitter, installed through either cargo or npm. Make sure that tree-sitter can be found within $PATH;
  • (Optional) Rust compiler and Cargo.

Additionally, you may also want to install Python for batch testing the rules.

Before using tree-sitter for the first time, run tree-sitter init-config inside the repo to initialize it.
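A quick way to confirm that the tools listed in this section can be found within $PATH is a small check like the one below. It is only a sketch: the helper name missing_tools is hypothetical, and the executable names (in particular cc for the C compiler and python3 for Python) are assumptions that may differ on your system.

```python
import shutil

# Executable names are assumptions; adjust them to match your system.
REQUIRED_TOOLS = ["node", "tree-sitter", "cc"]
OPTIONAL_TOOLS = ["cargo", "python3"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on $PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"missing (required): {tool}")
    for tool in missing_tools(OPTIONAL_TOOLS):
        print(f"missing (optional): {tool}")
```

The which parameter exists only so the lookup can be replaced in tests; by default the check uses shutil.which, which searches $PATH the same way the shell does.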

Writing the Rules

Most likely, grammar.js is the only file that requires modification. Updates to src/scanner.c are rarely needed.

To learn how to write the tree-sitter grammar DSL, see the tree-sitter documentation.

In addition, a few sources you may find useful:

  1. https://github.com/tree-sitter/tree-sitter-rust: Rust’s tree-sitter grammars.

  2. https://github.com/tree-sitter/tree-sitter-javascript: JavaScript’s tree-sitter grammars.

  3. third_party/move/move-compiler/src/parser/syntax.rs: Move’s top-down parser, the de facto grammar reference.

    Be aware that the documentation within syntax.rs (especially the doc comments before a parsing method) may be incomplete or wrong. Always read the code itself for reference.

    Also, when contributing to this repo, be sure to pull aptos-core periodically to pick up new language features.

After you finish coding, run npm run format to format your code.

Finally, run tree-sitter generate to check:

  1. whether grammar.js contains any syntax errors;
  2. whether the rules contain any conflicts. The tree-sitter documentation is a good reference for resolving conflicts.

Testing the Grammar

To test the grammar on an individual file, run:

$ tree-sitter parse ${MOVE_FILE}

Some useful flags for debugging:

  • -d: show parsing debug log;
  • -D: produce the log.html file with debugging parsing graphs.

After execution, the parse tree is printed to standard output. An error message may appear on the last line; use its line and column numbers to jump to the location of the error. Both line and column numbers start from 0.
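Because the reported positions are zero-based while most editors count from 1, a small conversion helps when jumping to an error. The sketch below assumes positions are printed in the form [row, column], as in the sample string; the function name editor_position and the sample text are illustrative only.

```python
import re

# Assumption: positions appear as "[row, column]" with zero-based numbers.
POSITION = re.compile(r"\[(\d+), (\d+)\]")


def editor_position(message):
    """Return the first position in `message` as a one-based (line, column).

    Returns None when the message contains no "[row, column]" pair.
    """
    match = POSITION.search(message)
    if match is None:
        return None
    row, column = int(match.group(1)), int(match.group(2))
    return row + 1, column + 1


# Illustrative sample, not captured from a real run.
sample = "(ERROR [12, 4] - [12, 10])"
print(editor_position(sample))  # prints (13, 5)
```

Only the start of the range is converted here, since that is usually where you want the cursor to land.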

Remember to test the rule on a larger scale using batch-test.py.

Submitting the Code

Before committing, run:

npm run format
tree-sitter generate

Remember to include all updated generated code in your git commit.
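If you prefer to run both commands in one go, a minimal wrapper could look like the sketch below. The name first_failed_step is hypothetical and nothing like it ships with this repo; it simply runs the two commands in order and stops at the first one that fails.

```python
import subprocess

# The two commands from this section, in the order they should run.
STEPS = [
    ["npm", "run", "format"],
    ["tree-sitter", "generate"],
]


def first_failed_step(steps=STEPS, run=subprocess.run):
    """Run each command in order; return the first that fails, or None.

    With the default runner, a missing executable raises FileNotFoundError.
    """
    for command in steps:
        completed = run(command)
        if completed.returncode != 0:
            return command
    return None
```

A return value of None means both steps succeeded, at which point the regenerated files are ready to be staged. The run parameter is there so the commands can be stubbed out in tests.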