Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: find scoped identifiers in Java #504

Merged
merged 5 commits into from
Sep 13, 2024
Merged

fix: find scoped identifiers in Java #504

merged 5 commits into from
Sep 13, 2024

Conversation

morgante
Copy link
Contributor

@morgante morgante commented Sep 13, 2024

Fixes #503

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new test for validating Java import statement parsing.
    • Added support for Java import statements in code snippet generation.
    • Implemented Python and Go bindings for the Tree-sitter Java parser.
    • Enhanced Java grammar with improved Unicode escape sequence recognition.
  • Bug Fixes

    • Improved error handling in module loading for Java bindings.
  • Documentation

    • Added configuration files for editor settings and build processes.
  • Chores

    • Updated package dependencies and project metadata for better compatibility.

@morgante morgante requested a review from a team as a code owner September 13, 2024 02:40
Copy link
Contributor

coderabbitai bot commented Sep 13, 2024

Walkthrough

Walkthrough

The changes include the addition of a new test function for parsing Java import statements, updates to the Java language implementation for snippet context strings, and the introduction of various configuration files such as .editorconfig, .gitattributes, and .gitignore. Additionally, new bindings for Node.js, Go, and Python have been implemented, along with modifications to the build scripts and grammar definitions. These changes collectively enhance the Java language support within the Tree-sitter framework.

Changes

Files Change Summary
crates/core/src/test.rs, crates/language/src/java.rs Added a test function for Java import parsing and updated snippet context strings to include Java import statements.
resources/language-metavariables/tree-sitter-java/.editorconfig, resources/language-metavariables/tree-sitter-java/.gitattributes, resources/language-metavariables/tree-sitter-java/.gitignore Introduced configuration files to enforce coding style guidelines, manage file attributes, and ignore unnecessary files in version control.
resources/language-metavariables/tree-sitter-java/binding.gyp, resources/language-metavariables/tree-sitter-java/bindings/node/binding.cc, resources/language-metavariables/tree-sitter-java/bindings/go/binding.go, resources/language-metavariables/tree-sitter-java/bindings/python/binding.c Implemented bindings for Node.js, Go, and Python, including necessary dependencies and build configurations.
resources/language-metavariables/tree-sitter-java/grammar.js, resources/language-metavariables/tree-sitter-java/src/grammar.json Modified regex patterns for Unicode escape sequences in the Java grammar to allow broader matching criteria.
resources/language-metavariables/tree-sitter-java/setup.py, resources/language-metavariables/tree-sitter-java/pyproject.toml Introduced build configuration files for Python packaging and specified project metadata.

Assessment against linked issues

Objective Addressed Explanation
Improved Java metavariable support (#503)
Support for parsing specific import patterns (#503)

Tip

OpenAI O1 model for chat
  • We have deployed OpenAI's latest O1 model for chat.
  • OpenAI claims that this model has superior reasoning capabilities than their GPT-4o model.
  • Please share any feedback with us in the discussions post.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Outside diff range and nitpick comments (1)
resources/language-metavariables/tree-sitter-java/bindings/node/index.d.ts (1)

21-25: Consider using a more specific type for the language property.

The Language type is correctly defined. However, the language property is of type unknown, which means it can be of any type. This can lead to potential type safety issues.

Consider using a more specific type for the language property based on the expected value. If the type is not known, consider using any instead of unknown.

Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6dd1842 and 4f91c19.

Files ignored due to path filters (1)
  • crates/wasm-bindings/wasm_parsers/tree-sitter-java.wasm is excluded by !**/*.wasm
Files selected for processing (28)
  • crates/core/src/test.rs (1 hunks)
  • crates/language/src/java.rs (1 hunks)
  • resources/language-metavariables/tree-sitter-java/.editorconfig (1 hunks)
  • resources/language-metavariables/tree-sitter-java/.gitattributes (1 hunks)
  • resources/language-metavariables/tree-sitter-java/.gitignore (1 hunks)
  • resources/language-metavariables/tree-sitter-java/binding.gyp (1 hunks)
  • resources/language-metavariables/tree-sitter-java/bindings/c/tree-sitter-java.h (1 hunks)
  • resources/language-metavariables/tree-sitter-java/bindings/c/tree-sitter-java.pc.in (1 hunks)
  • resources/language-metavariables/tree-sitter-java/bindings/go/binding.go (1 hunks)
  • resources/language-metavariables/tree-sitter-java/bindings/go/binding_test.go (1 hunks)
  • resources/language-metavariables/tree-sitter-java/bindings/go/go.mod (1 hunks)
  • resources/language-metavariables/tree-sitter-java/bindings/node/binding.cc (1 hunks)
  • resources/language-metavariables/tree-sitter-java/bindings/node/index.d.ts (1 hunks)
  • resources/language-metavariables/tree-sitter-java/bindings/node/index.js (1 hunks)
  • resources/language-metavariables/tree-sitter-java/bindings/python/tree_sitter_java/init.py (1 hunks)
  • resources/language-metavariables/tree-sitter-java/bindings/python/tree_sitter_java/init.pyi (1 hunks)
  • resources/language-metavariables/tree-sitter-java/bindings/python/tree_sitter_java/binding.c (1 hunks)
  • resources/language-metavariables/tree-sitter-java/bindings/rust/build.rs (1 hunks)
  • resources/language-metavariables/tree-sitter-java/grammar.js (1 hunks)
  • resources/language-metavariables/tree-sitter-java/package.json (1 hunks)
  • resources/language-metavariables/tree-sitter-java/pyproject.toml (1 hunks)
  • resources/language-metavariables/tree-sitter-java/setup.py (1 hunks)
  • resources/language-metavariables/tree-sitter-java/src/grammar.json (1 hunks)
  • resources/language-metavariables/tree-sitter-java/src/tree_sitter/alloc.h (1 hunks)
  • resources/language-metavariables/tree-sitter-java/src/tree_sitter/array.h (1 hunks)
  • resources/language-metavariables/tree-sitter-java/src/tree_sitter/parser.h (7 hunks)
  • resources/language-metavariables/tree-sitter-java/test/highlight/types.java (1 hunks)
  • resources/metavariable-grammars/java-metavariable-grammar.js (1 hunks)
Files skipped from review due to trivial changes (7)
  • resources/language-metavariables/tree-sitter-java/.editorconfig
  • resources/language-metavariables/tree-sitter-java/.gitattributes
  • resources/language-metavariables/tree-sitter-java/.gitignore
  • resources/language-metavariables/tree-sitter-java/bindings/c/tree-sitter-java.h
  • resources/language-metavariables/tree-sitter-java/bindings/c/tree-sitter-java.pc.in
  • resources/language-metavariables/tree-sitter-java/bindings/go/go.mod
  • resources/language-metavariables/tree-sitter-java/pyproject.toml
Additional comments not posted (40)
resources/language-metavariables/tree-sitter-java/bindings/python/tree_sitter_java/__init__.pyi (1)

1-1: LGTM!

The Python binding stub file for the language function is correctly defined, following the appropriate stub file conventions and Tree-sitter Python binding patterns.

resources/language-metavariables/tree-sitter-java/bindings/python/tree_sitter_java/__init__.py (1)

1-5: LGTM!

The Python package initialization file follows the standard structure and correctly exports the language object from the _binding module. The use of the __all__ variable to specify the public interface is a good practice.

resources/language-metavariables/tree-sitter-java/bindings/node/index.js (2)

1-2: LGTM!

The code segment is correctly using the path module to get the root path. The root path is being used to load the module using node-gyp-build in the next code segment.


3-3: Excellent refactor!

The code segment is correctly using node-gyp-build to load the module from the root path. This change simplifies the module loading logic by eliminating the need for multiple try-catch blocks and reduces the complexity of error handling.

resources/language-metavariables/tree-sitter-java/bindings/go/binding.go (1)

1-13: LGTM!

The Go binding code for the Tree-sitter Java grammar looks good. It follows the standard pattern for creating a binding, imports the necessary packages, and correctly returns the language pointer from the Language() function.

The code is simple, clear, and follows best practices. I don't see any issues or areas for improvement.

resources/language-metavariables/tree-sitter-java/bindings/go/binding_test.go (1)

10-15: LGTM!

The test function is correctly loading the Java grammar and checking if it loaded successfully. The test function is also correctly reporting an error if the language failed to load.

resources/language-metavariables/tree-sitter-java/bindings/node/index.d.ts (3)

1-4: LGTM!

The BaseNode type is correctly defined with appropriate property types.


6-10: LGTM!

The ChildNode type is correctly defined with appropriate property types.


12-19: LGTM!

The NodeInfo union type is correctly defined with appropriate object types and property types.

resources/language-metavariables/tree-sitter-java/bindings/rust/build.rs (1)

13-15: LGTM!

The addition of the -utf-8 flag for the MSVC target environment is a good practice to ensure proper handling of UTF-8 encoded source files. This change enhances compatibility and aligns with the overall goal of improving the build process.

resources/language-metavariables/tree-sitter-java/bindings/node/binding.cc (5)

1-1: LGTM!

The inclusion of the napi.h header file is necessary for using the N-API framework.


3-3: LGTM!

The typedef for TSLanguage struct is necessary for using it in the binding code.


5-5: LGTM!

The extern C function declaration is necessary for linking the Tree-sitter Java language parser with the binding code.


7-10: LGTM!

The LANGUAGE_TYPE_TAG is used for type safety in N-API. The hash value is unique and specific to the Tree-sitter Java language parser.


12-20: LGTM!

The Init function correctly exports the language name and a new external reference to the TSLanguage structure. The type tag for the language is set correctly using LANGUAGE_TYPE_TAG. The module registration macro has been updated correctly to align with the N-API conventions.

resources/language-metavariables/tree-sitter-java/binding.gyp (3)

5-7: LGTM!

The addition of the node-addon-api dependency using the dynamic command is correct and necessary for building the Node.js binding.


12-12: LGTM!

The inclusion of the bindings/node/binding.cc source file is correct and necessary for compiling the Node.js binding implementation.


17-26: LGTM!

The refinement of the compiler flags based on the operating system is a positive change. It ensures compliance with the C11 standard and sets the appropriate encoding for Windows. This enhances cross-platform compatibility and maintainability.

resources/language-metavariables/tree-sitter-java/bindings/python/tree_sitter_java/binding.c (1)

1-27: LGTM!

The Python binding code for the Tree-sitter Java grammar looks good:

  • It correctly includes the necessary Python headers.
  • The tree_sitter_java function is declared to return the Tree-sitter language for Java.
  • The _binding_language function correctly returns the pointer to the Tree-sitter Java language as a Python long object.
  • The methods array and module struct are correctly defined to create the _binding Python module with the language method.
  • The PyInit__binding function correctly initializes and returns the _binding Python module.

No changes needed.

resources/language-metavariables/tree-sitter-java/test/highlight/types.java (2)

1-9: LGTM!

The enum declaration and constants are syntactically correct and align with the PR objective of improving Java language support in Tree-sitter. The code comments are consistent with the expected highlighting for the enum type and constants.


11-36: LGTM!

The class declaration, constructor, and method are syntactically correct and align with the PR objective of improving Java language support in Tree-sitter. The code comments are consistent with the expected highlighting for types, methods, variables, and constants.

The usage of scoped identifiers like one.two.Three and Material.DENIM aligns with the linked issue objective of improving Java metavariable support.

resources/language-metavariables/tree-sitter-java/src/tree_sitter/alloc.h (1)

1-54: LGTM!

The new alloc.h header file is well-structured and follows best practices. The use of macros to allow clients to override allocation functions provides flexibility. The code is clear, concise, and easy to understand.

Some key observations:

  • The file uses header guards to prevent multiple inclusions.
  • The extern "C" block ensures C++ compatibility.
  • If the TREE_SITTER_REUSE_ALLOCATOR macro is defined, it uses client-provided allocation functions.
  • If the macro is not defined, it defaults to standard C allocation functions.

Overall, the code looks good and is ready to be merged.

resources/language-metavariables/tree-sitter-java/package.json (7)

6-6: LGTM!

The addition of the "types" field improves the developer experience by providing TypeScript type definitions for the package.


11-18: LGTM!

The addition of the "files" field helps reduce the package size by specifying which files and directories should be included in the published package. The listed files and directories appear to be relevant and necessary for the package's functionality.


26-28: LGTM!

The update to the "dependencies" section, replacing "nan" with "node-addon-api" and "node-gyp-build", is a positive change. It suggests a shift towards using more modern and efficient tools for building native Node.js addons, which is likely to improve the package's compatibility and performance.


29-31: LGTM!

The addition of the "peerDependencies" section, specifying "tree-sitter" as a peer dependency with version "^0.21.0", is a good practice. It ensures that the package is used with a compatible version of the "tree-sitter" library, helping prevent potential version conflicts and ensuring the package works as expected.


32-35: LGTM!

The addition of the "peerDependenciesMeta" section, marking the "tree_sitter" peer dependency as optional, provides flexibility for users who may not need the full functionality provided by the "tree-sitter" library. This change allows the package to be used in a wider range of scenarios while still recommending the use of "tree-sitter" for full functionality.


38-39: LGTM!

The update to the "devDependencies" section, replacing "tree-sitter-cli" with "prebuildify", suggests an improvement in the package's build process. The addition of "prebuildify" indicates that prebuilt binaries will be created for the package, which can enhance the installation experience for users by reducing the need to compile the package from source on their machines.


45-47: LGTM!

The expansion of the "scripts" section with new commands "build-test", "install", and "prebuildify" is a positive change. These additions enhance the build process and provide more flexibility for developers working with the package. The "build-test" script offers a convenient way to ensure the package builds successfully and passes tests, while the "install" script simplifies the installation process. The "prebuildify" script, likely used with the "prebuildify" development dependency, facilitates the creation of prebuilt binaries for the package.

resources/language-metavariables/tree-sitter-java/setup.py (1)

1-60: LGTM!

The Python setup script follows the standard structure and conventions. The custom build and bdist_wheel commands are implemented correctly. The package configuration options in the setup function are appropriate for the Tree-sitter grammar package. The C extension module is configured with the necessary source files, compile arguments, and other options. There are no apparent issues or inconsistencies in the code.

crates/language/src/java.rs (1)

65-65: LGTM!

The addition of the ("import ", ";") tuple to the snippet_context_strings method is a valuable change that aligns with the PR objective of improving Java metavariable support for import statements. This tuple correctly represents the context for Java import statements, enabling the parser to handle them appropriately when generating code snippets.

resources/language-metavariables/tree-sitter-java/src/tree_sitter/parser.h (5)

17-17: LGTM!

The reintroduction of the TSStateId typedef is necessary for the correct functioning of the parser and ensures that the type is consistently defined for state identification.


89-92: LGTM!

The introduction of the TSCharacterRange struct is a useful addition to the parser implementation. It enhances the ability to manage character ranges effectively within the parsing logic by providing start and end fields.


133-149: LGTM!

The introduction of the set_contains inline function is a valuable addition to the parser implementation. It optimizes performance when dealing with multiple character ranges by employing a binary search approach to efficiently check for containment. This change improves the efficiency of the parsing logic.


155-159: LGTM!

The addition of preprocessor directives to define the UNUSED macro differently for MSVC and other compilers improves the portability of the code. The UNUSED macro is used to suppress warnings for unused variables, and the preprocessor directives ensure that the macro is defined correctly for different compilers.


180-190: LGTM!

The modifications to the SMALL_STATE and REDUCE macros improve the clarity and maintainability of the code.

The SMALL_STATE macro is modified to ensure proper parentheses around the subtraction operation, preventing potential issues with operator precedence.

The REDUCE macro is revised to include additional parameters for dynamic precedence and production ID, enhancing its functionality and flexibility in parsing actions.

These changes contribute to the overall robustness and readability of the parser implementation.

Also applies to: 209-209, 240-248

resources/language-metavariables/tree-sitter-java/src/tree_sitter/array.h (1)

1-290: LGTM!

The generic array implementation in this header file is well-designed and follows best practices. The code is clean, well-documented, and provides a nice set of macros for easy usage. The static inline functions are efficient and handle memory management safely. The binary search implementation is based on a good reference from Rust's standard library. The code is also portable across different compilers and platforms.

Some key highlights:

  • Consistent naming convention for macros, functions, and variables.
  • Clean separation of public macros and private functions.
  • Efficient implementation of array operations using static inline functions.
  • Proper use of assertions and memory management functions for safety.
  • Portable code with conditional compilation and pragma directives.

Overall, this is a high-quality code for a generic array implementation in C. Great job!

resources/language-metavariables/tree-sitter-java/grammar.js (1)

184-184: LGTM!

The change to the regular expression pattern in the escape_sequence rule improves the flexibility of the parser in recognizing valid Unicode escape sequences. By removing the specific quantifier {1,3} for octal digits and allowing for a broader match of one or more hexadecimal digits, the parser can potentially handle a wider range of input without altering the fundamental logic of the grammar.

resources/metavariable-grammars/java-metavariable-grammar.js (1)

184-184: LGTM! The change improves compatibility with Java source code.

The updated regex pattern /u[0-9a-fA-F]+/ for Unicode escape sequences aligns with the Java Language Specification, which supports Unicode escape sequences both with and without curly braces.

Removing the curly braces {} from the pattern allows the grammar to correctly match and parse Unicode escape sequences in the non-braced form, such as \u1F600, in addition to the braced form \u{1F600}.

This change improves the grammar's compatibility with Java source code that uses the non-braced form of Unicode escape sequences.

crates/core/src/test.rs (1)

8843-8884: Looks good!

This is a well-structured test case that covers the parsing of Java import statements based on the defined patterns. It distinguishes between Spring-specific imports and generic imports correctly.

The test input includes a good variety of import statements, and the expected output aligns with the pattern logic.

@@ -1437,7 +1437,7 @@
},
{
"type": "PATTERN",
"value": "u{[0-9a-fA-F]+}"
"value": "u[0-9a-fA-F]+"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid Unicode escape sequence pattern.

The modified pattern "value": "u[0-9a-fA-F]+" is not a valid format for Unicode escape sequences in Java. According to the Java Language Specification, Unicode escape sequences must be of the form \u{NNNN} (with curly braces) or \uNNNN (without curly braces), where NNNN is a sequence of 4 hex digits.

Removing the curly braces from the original pattern "value": "u{[0-9a-fA-F]+}" changes the syntax and makes it invalid. The original pattern correctly matches Unicode escape sequences with 1 or more hex digits enclosed in curly braces.

Please revert this change and keep the original pattern to ensure compatibility with the Java specification.

@morgante morgante merged commit 50690a3 into main Sep 13, 2024
10 checks passed
@morgante morgante deleted the java-support branch September 13, 2024 04:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improved Java metavariable support
1 participant