Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync to 1.2: adding in filter for the partition state rows scan #16077

Merged
merged 15 commits into from
May 14, 2024

Conversation

gouhongshen
Copy link
Contributor

@gouhongshen gouhongshen commented May 13, 2024

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue ##15775

What this PR does / why we need it:

support in filter to fast filter rows.

@matrix-meow matrix-meow added the size/M Denotes a PR that changes [100,499] lines label May 13, 2024
@mergify mergify bot requested a review from sukki37 May 13, 2024 14:51
@matrix-meow
Copy link
Contributor

@gouhongshen Thanks for your contributions!

Here are review comments for file pkg/vm/engine/disttae/logtailreplay/rows_iter.go:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan.

Body:

The body of the pull request provides information on the type of PR (Improvement), the related issue (#15775), and the purpose of the changes, which is to support a filter for fast filtering rows.

Changes in pkg/vm/engine/disttae/logtailreplay/rows_iter.go:

  1. Code Optimization:

    • The addition of the primaryIndex field in the primaryKeyIter struct seems to be a useful optimization to avoid unnecessary copying of the primary index.
    • The introduction of the phase type and constants (scan, seek, judge) along with the Move method in PrimaryKeyMatchSpec enhances the readability and maintainability of the code.
    • The refactoring of the Move method in PrimaryKeyMatchSpec to handle different matching scenarios (Exact, Prefix, ExactIn) improves code organization and clarity.
  2. Security Issue:

    • The use of the first variable in the Move methods of PrimaryKeyMatchSpec could lead to unexpected behavior in a concurrent environment. It should be handled carefully to ensure thread safety. Consider using a mutex or atomic operations to manage the first variable.
  3. Code Quality:

    • The MinMax function in PrimaryKeyMatchSpec is incomplete and returns an empty PrimaryKeyMatchSpec{}. This should be implemented according to the intended functionality to avoid confusion and potential bugs.
  4. Optimization:

    • The ExactIn function in PrimaryKeyMatchSpec could be optimized further by refactoring the logic within the Move method to improve performance and readability. Consider simplifying the logic and reducing the number of nested conditions for better maintainability.
  5. Consistency:

    • Ensure consistent naming conventions and formatting throughout the file to maintain code readability and consistency.

Suggestions for Improvement:

  1. Security Enhancement:

    • Address the potential thread safety issue related to the first variable by implementing a thread-safe solution such as using synchronization mechanisms like mutexes or atomic operations.
  2. Completeness:

    • Implement the logic for the MinMax function in PrimaryKeyMatchSpec to ensure that all functions are fully functional and consistent.
  3. Optimization:

    • Refactor the ExactIn function to improve the efficiency and readability of the code, possibly by simplifying the logic and reducing nested conditions.
  4. Consistency:

    • Ensure consistent naming conventions, formatting, and code style throughout the file to enhance code maintainability and readability.

By addressing the security issue, completing the MinMax function, optimizing the ExactIn function, and ensuring code consistency, the pull request can be further improved in terms of security, functionality, and maintainability.

Here are review comments for file pkg/vm/engine/disttae/reader.go:

Pull Request Review:

Title and Body:

The title of the pull request is clear and informative, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan. The body of the pull request provides additional context by mentioning the issue it addresses and the need for the changes. It also specifies the type of PR as an improvement.

Changes in pkg/vm/engine/disttae/reader.go:

  1. Redundant Code Removal:

    • Lines 53 and 81: The removal of mixin.columns.rowidPos = -1 is repeated in the reset and tryUpdateColumns functions. This indicates redundancy and can be removed from both places to avoid confusion and unnecessary code duplication.
  2. Variable Renaming:

    • Lines 581 and 583: The variable encodedPrimaryKey has been renamed to pkVal. While this change may be acceptable, it's essential to ensure that the new variable name accurately reflects its purpose and usage throughout the codebase.
  3. Logic Improvement:

    • Lines 581 and 583: The condition len(r.pkVal) > 0 is used to check the length of pkVal. This change seems reasonable, but it's crucial to ensure that this logic aligns with the intended functionality and doesn't introduce any unintended consequences.
  4. Security Concern (Potential):

    • Line 581: When handling sensitive data like primary keys (pkVal), it's crucial to consider potential security implications. Ensure that the handling of pkVal is done securely to prevent any data leakage or vulnerabilities. Consider encryption or secure handling mechanisms if necessary.
  5. Optimization Suggestion:

    • Line 581: Instead of directly accessing the length of pkVal multiple times, consider storing the length in a variable if it's used repeatedly. This can improve code readability and potentially optimize performance by avoiding redundant calculations.
  6. Code Readability:

    • Line 589: The use of logtailreplay.Prefix(r.pkVal) should be well-documented to clarify its purpose and ensure that future developers understand its functionality without ambiguity.
  7. Consistency Check:

    • Throughout the changes, ensure consistency in variable naming, formatting, and commenting to maintain code readability and maintainability.

Overall Suggestions:

  • Review the security implications of handling sensitive data like primary keys (pkVal) and implement necessary security measures.
  • Remove redundant code snippets to improve code clarity and maintainability.
  • Ensure that variable names accurately reflect their purpose to avoid confusion.
  • Consider optimizing code by storing frequently used values in variables to enhance performance.
  • Document any complex or critical logic to aid future maintenance and understanding.

By addressing these points, the pull request can be enhanced in terms of code quality, security, and maintainability.

Here are review comments for file pkg/vm/engine/disttae/txn_table.go:

Pull Request Review:

Title and Body:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan. The body of the pull request provides relevant information about the type of PR and the issue it fixes, which is helpful for context.

Changes in pkg/vm/engine/disttae/txn_table.go:

  1. Function Naming and Parameters:

    • The function makeEncodedPK has been renamed to tryExtractPKFilter, which is more descriptive of its purpose.
    • The parameters of the NewReader and newMergeReader functions have been refactored to include pkValue and fnType, which improves readability and clarity of the code.
  2. Security Concern - Potential SQL Injection:

    • The function `tryExtract

Here are review comments for file pkg/vm/engine/disttae/types.go:

Pull Request Review:

Title and Body:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan. The body of the pull request provides information about the type of PR, the related issue, and the purpose of the changes. It would be beneficial to provide more detailed information about the specific improvements made in this PR to enhance clarity.

Changes in types.go:

  1. Redundant Code Removal:

    • The removal of rowidPos int from the withFilterMixin struct is a positive change as it eliminates redundant code that was not being used. This simplifies the structure and reduces unnecessary complexity.
  2. Variable Renaming:

    • Renaming encodedPrimaryKey to pkVal in the blockMergeReader struct is a good practice for improving code readability. However, it would be beneficial to ensure that the new variable name accurately reflects its purpose and usage in the context of the codebase.
  3. Consistency in Comments:

    • It is recommended to update the comments in the code to reflect the changes accurately. For example, the comment mentioning rowidPos should be removed or updated to reflect the removal of the variable.

Suggestions for Improvement:

  1. Documentation Update:

    • It would be helpful to update the code comments and documentation to reflect the changes made in this pull request. This ensures that future developers understand the purpose and functionality of the modified code.
  2. Variable Naming Clarity:

    • Ensure that the variable names (pkPos, pkVal) are clear and descriptive to avoid confusion and improve code maintainability.
  3. Code Optimization:

    • Consider reviewing other parts of the codebase for similar redundant variables or structures that can be removed to streamline the code and improve its efficiency.
  4. Security Consideration:

    • While the changes in this pull request do not introduce security vulnerabilities, it is essential to conduct a thorough security review of the entire codebase to identify and address any potential security risks.

Overall, the changes in this pull request are focused on code cleanup and minor optimizations. By addressing the suggestions mentioned above and ensuring consistency in code quality across the project, the codebase can be further improved in terms of readability, maintainability, and efficiency.

Here are review comments for file test/distributed/cases/disttae_filters/ranges_filters/ranges_filters.result:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan.

Body:

The body of the pull request provides information about the type of PR (Improvement), the related issue (#15775), and the purpose of the changes, which is to support filtering to fast filter rows.

Changes:

The changes in the pull request involve modifying a file named ranges_filters.result in the test/distributed/cases/disttae_filters/ranges_filters/ directory. However, the diff provided does not show any actual code changes, only a file rename from ranges_filters.result to disttae_filters/ranges_filters/ranges_filters.result.

Feedback:

  1. File Path Typo:

    • The file path in the diff seems to have a typo. It shows a change from test/distributed/cases/ranges_filters/ranges_filters.result to test/distributed/cases/disttae_filters/ranges_filters/ranges_filters.result. The correct directory name should be dist**r**ibuted instead of dist**t**ae_filters. This typo needs to be corrected.
  2. Missing Code Changes:

    • The diff provided does not show any actual code changes within the file. It is important to ensure that the changes made in the codebase are included in the pull request diff for review.
  3. Security Concerns:

    • Since the diff provided does not show the actual code changes, it is not possible to assess any potential security implications. It is crucial to review the actual code modifications to ensure that no security vulnerabilities are introduced.

Suggestions:

  1. Correct File Path Typo:

    • Update the file path in the diff to reflect the correct directory name distributed instead of disttae_filters.
  2. Include Code Changes:

    • Ensure that the actual code changes are included in the pull request diff for review. This will allow for a thorough assessment of the modifications made.
  3. Security Review:

    • Conduct a detailed review of the code changes to identify and address any potential security concerns or vulnerabilities that may have been introduced.

By addressing the file path typo, including the actual code changes in the diff, and conducting a security review, the pull request can be improved in terms of accuracy, completeness, and security considerations.

Here are review comments for file test/distributed/cases/disttae_filters/ranges_filters/ranges_filters.sql:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan.

Body:

The body of the pull request provides information about the type of PR, the related issue, and the purpose of the changes. It specifies that this PR is an improvement and addresses issue #15775 by adding support for a filter to fast filter rows.

Changes:

The changes in the pull request involve modifying the file test/distributed/cases/ranges_filters/ranges_filters.sql. However, the diff provided does not show any actual code changes, only a file path change from ranges_filters to disttae_filters. This seems to be a typo in the file path.

Feedback and Suggestions:

  1. File Path Typo:

    • The file path in the diff seems to have a typo. It changes from ranges_filters to disttae_filters, which might be incorrect. It is recommended to double-check the file path changes and ensure they are accurate.
  2. Code Changes:

    • Since the diff does not show any actual code changes, it is essential to verify if the intended changes have been correctly included in the pull request. If there are code changes related to adding a filter for partition state rows scan, ensure they are visible in the diff.
  3. Optimization:

    • If the changes involve adding a new filter for faster row filtering, consider providing comments or documentation within the code to explain the purpose and functionality of the filter. This will help maintain code clarity and assist other developers in understanding the implementation.
  4. Testing:

    • After verifying the code changes, ensure that appropriate testing is conducted to validate the functionality of the added filter. Adding test cases to cover different scenarios will help ensure the reliability of the new feature.
  5. Collaboration:

    • Collaborate with team members or reviewers to confirm the correctness of the file path changes and the presence of the expected code modifications. This collaborative effort can help in catching any oversights or errors before merging the pull request.

By addressing the file path typo, verifying the code changes, adding documentation, conducting thorough testing, and collaborating with team members, the quality and effectiveness of the pull request can be enhanced.

Here are review comments for file test/distributed/cases/disttae_filters/reader_filters/partition_state_primary_key_index_filter.result:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan.

Body:

The body of the pull request provides relevant information about the type of PR (Improvement), the linked issue (#15775), and the purpose of the changes, which is to support a filter for fast filtering rows.

Changes:

  1. The changes in the file partition_state_primary_key_index_filter.result involve creating, inserting, selecting, and dropping tables (t1, t2, t3, t4) in a test scenario.
  2. Various SQL queries are performed on these tables to test filtering based on primary keys and indexes.
  3. The test includes operations like selecting rows based on specific values, deleting rows, inserting new rows, and checking the results.
  4. The test script ends with dropping tables and the test database.

Feedback and Suggestions:

  1. Security Issue - SQL Injection Risk:

    • The use of dynamic SQL queries with execute using in the test script (execute s1 using @a,@a,@a,@a, @b,@b,@b,@b, @c,@c,@c,@c, @d,@d,@d,@d, @e,@e,@e,@e;) poses a security risk for SQL injection attacks. It's recommended to use parameterized queries to prevent this vulnerability.
  2. Optimization:

    • Consider optimizing the test script by breaking it into smaller, more focused tests. This can improve readability, maintainability, and ease of debugging.
  3. End of File Indicator:

    • The file seems to lack a newline at the end. It's good practice to have a newline at the end of files to adhere to common conventions.
  4. Consistency in SQL Formatting:

    • Ensure consistent formatting of SQL queries throughout the script for better readability and maintainability.
  5. Documentation:

    • Consider adding comments or documentation within the test script to explain the purpose of each test case and the expected outcomes.
  6. Cleanup Operations:

    • Ensure that all cleanup operations (like dropping tables and databases) are performed correctly to avoid leaving unnecessary artifacts in the test environment.
  7. Testing Edge Cases:

    • It might be beneficial to include test cases that cover edge scenarios to ensure robustness and reliability of the filtering functionality.
  8. Automated Testing:

    • Consider automating these test cases to run as part of the continuous integration (CI) process to catch any regressions early.

By addressing the security issue, optimizing the test script, ensuring consistency, adding documentation, and improving testing coverage, the quality and reliability of the codebase can be enhanced.

Here are review comments for file test/distributed/cases/disttae_filters/reader_filters/partition_state_primary_key_index_filter.sql:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan.

Body:

The body of the pull request provides information about the type of PR (Improvement), the linked issue (#15775), and the purpose of the changes, which is to support a filter for fast filtering rows.

Changes:

  1. The changes in the partition_state_primary_key_index_filter.sql file involve creating, inserting data into, and querying multiple tables (t1, t2, t3, t4) with various scenarios and conditions.
  2. The script includes operations like creating databases, tables, inserting data, selecting data based on primary keys and indexes, using IN clause, preparing statements for insert, and dropping tables and databases.

Feedback and Suggestions:

  1. Security Concerns:

    • The script contains hardcoded values for sensitive operations like database credentials (create database, drop database). It's important to avoid exposing sensitive information in scripts. Consider using environment variables or secure storage mechanisms for such data.
    • Avoid using plain text passwords or sensitive information directly in scripts to prevent security vulnerabilities.
  2. Code Optimization:

    • Consider parameterizing the script to make it more reusable and secure. This can help prevent SQL injection attacks and improve performance.
    • Break down the script into smaller, more manageable parts or functions for better readability and maintainability.
    • Use transactions where necessary to ensure data consistency and integrity during operations like inserts and deletes.
  3. Testing:

    • Ensure comprehensive testing of the script with different data scenarios to validate its functionality and performance.
    • Consider adding comments to explain the purpose of each section or operation in the script for better understanding by other developers.
  4. Cleanup:

    • Remove unnecessary operations or queries that do not directly contribute to the testing or demonstration of the filter functionality to keep the script focused and concise.
    • Avoid using @ignore annotations without a clear reason, as it may lead to confusion or oversight during code reviews or testing.
  5. Documentation:

    • Provide inline comments or a separate documentation section to explain the purpose of the script, the expected outcomes of each operation, and any dependencies or prerequisites required for running the script successfully.

By addressing the security concerns, optimizing the code for performance and readability, thorough testing, and improving documentation, the pull request can be enhanced to ensure a more secure, efficient, and maintainable codebase.

@mergify mergify bot merged commit 7a589fc into matrixorigin:1.2-dev May 14, 2024
18 of 19 checks passed
@aylei aylei mentioned this pull request Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement size/M Denotes a PR that changes [100,499] lines
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants