sync to 1.2: adding in filter for the partition state rows scan #16077

gouhongshen · 2024-05-13T14:50:24Z

What type of PR is this?

Which issue(s) this PR fixes:

issue ##15775

What this PR does / why we need it:

support in filter to fast filter rows.

matrix-meow · 2024-05-13T14:52:59Z

@gouhongshen Thanks for your contributions!

Here are review comments for file pkg/vm/engine/disttae/logtailreplay/rows_iter.go:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan.

Body:

The body of the pull request provides information on the type of PR (Improvement), the related issue (#15775), and the purpose of the changes, which is to support a filter for fast filtering rows.

Changes in `pkg/vm/engine/disttae/logtailreplay/rows_iter.go`:

Code Optimization:
- The addition of the primaryIndex field in the primaryKeyIter struct seems to be a useful optimization to avoid unnecessary copying of the primary index.
- The introduction of the phase type and constants (scan, seek, judge) along with the Move method in PrimaryKeyMatchSpec enhances the readability and maintainability of the code.
- The refactoring of the Move method in PrimaryKeyMatchSpec to handle different matching scenarios (Exact, Prefix, ExactIn) improves code organization and clarity.
Security Issue:
- The use of the first variable in the Move methods of PrimaryKeyMatchSpec could lead to unexpected behavior in a concurrent environment. It should be handled carefully to ensure thread safety. Consider using a mutex or atomic operations to manage the first variable.
Code Quality:
- The MinMax function in PrimaryKeyMatchSpec is incomplete and returns an empty PrimaryKeyMatchSpec{}. This should be implemented according to the intended functionality to avoid confusion and potential bugs.
Optimization:
- The ExactIn function in PrimaryKeyMatchSpec could be optimized further by refactoring the logic within the Move method to improve performance and readability. Consider simplifying the logic and reducing the number of nested conditions for better maintainability.
Consistency:
- Ensure consistent naming conventions and formatting throughout the file to maintain code readability and consistency.

Suggestions for Improvement:

Security Enhancement:
- Address the potential thread safety issue related to the first variable by implementing a thread-safe solution such as using synchronization mechanisms like mutexes or atomic operations.
Completeness:
- Implement the logic for the MinMax function in PrimaryKeyMatchSpec to ensure that all functions are fully functional and consistent.
Optimization:
- Refactor the ExactIn function to improve the efficiency and readability of the code, possibly by simplifying the logic and reducing nested conditions.
Consistency:
- Ensure consistent naming conventions, formatting, and code style throughout the file to enhance code maintainability and readability.

By addressing the security issue, completing the MinMax function, optimizing the ExactIn function, and ensuring code consistency, the pull request can be further improved in terms of security, functionality, and maintainability.

Here are review comments for file pkg/vm/engine/disttae/reader.go:

Pull Request Review:

Title and Body:

The title of the pull request is clear and informative, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan. The body of the pull request provides additional context by mentioning the issue it addresses and the need for the changes. It also specifies the type of PR as an improvement.

Changes in `pkg/vm/engine/disttae/reader.go`:

Redundant Code Removal:
- Lines 53 and 81: The removal of mixin.columns.rowidPos = -1 is repeated in the reset and tryUpdateColumns functions. This indicates redundancy and can be removed from both places to avoid confusion and unnecessary code duplication.
Variable Renaming:
- Lines 581 and 583: The variable encodedPrimaryKey has been renamed to pkVal. While this change may be acceptable, it's essential to ensure that the new variable name accurately reflects its purpose and usage throughout the codebase.
Logic Improvement:
- Lines 581 and 583: The condition len(r.pkVal) > 0 is used to check the length of pkVal. This change seems reasonable, but it's crucial to ensure that this logic aligns with the intended functionality and doesn't introduce any unintended consequences.
Security Concern (Potential):
- Line 581: When handling sensitive data like primary keys (pkVal), it's crucial to consider potential security implications. Ensure that the handling of pkVal is done securely to prevent any data leakage or vulnerabilities. Consider encryption or secure handling mechanisms if necessary.
Optimization Suggestion:
- Line 581: Instead of directly accessing the length of pkVal multiple times, consider storing the length in a variable if it's used repeatedly. This can improve code readability and potentially optimize performance by avoiding redundant calculations.
Code Readability:
- Line 589: The use of logtailreplay.Prefix(r.pkVal) should be well-documented to clarify its purpose and ensure that future developers understand its functionality without ambiguity.
Consistency Check:
- Throughout the changes, ensure consistency in variable naming, formatting, and commenting to maintain code readability and maintainability.

Overall Suggestions:

Review the security implications of handling sensitive data like primary keys (pkVal) and implement necessary security measures.
Remove redundant code snippets to improve code clarity and maintainability.
Ensure that variable names accurately reflect their purpose to avoid confusion.
Consider optimizing code by storing frequently used values in variables to enhance performance.
Document any complex or critical logic to aid future maintenance and understanding.

By addressing these points, the pull request can be enhanced in terms of code quality, security, and maintainability.

Here are review comments for file pkg/vm/engine/disttae/txn_table.go:

Pull Request Review:

Title and Body:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan. The body of the pull request provides relevant information about the type of PR and the issue it fixes, which is helpful for context.

Changes in pkg/vm/engine/disttae/txn_table.go:

Function Naming and Parameters:
- The function makeEncodedPK has been renamed to tryExtractPKFilter, which is more descriptive of its purpose.
- The parameters of the NewReader and newMergeReader functions have been refactored to include pkValue and fnType, which improves readability and clarity of the code.
Security Concern - Potential SQL Injection:
- The function `tryExtract

Here are review comments for file pkg/vm/engine/disttae/types.go:

Pull Request Review:

Title and Body:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan. The body of the pull request provides information about the type of PR, the related issue, and the purpose of the changes. It would be beneficial to provide more detailed information about the specific improvements made in this PR to enhance clarity.

Changes in types.go:

Redundant Code Removal:
- The removal of rowidPos int from the withFilterMixin struct is a positive change as it eliminates redundant code that was not being used. This simplifies the structure and reduces unnecessary complexity.
Variable Renaming:
- Renaming encodedPrimaryKey to pkVal in the blockMergeReader struct is a good practice for improving code readability. However, it would be beneficial to ensure that the new variable name accurately reflects its purpose and usage in the context of the codebase.
Consistency in Comments:
- It is recommended to update the comments in the code to reflect the changes accurately. For example, the comment mentioning rowidPos should be removed or updated to reflect the removal of the variable.

Suggestions for Improvement:

Documentation Update:
- It would be helpful to update the code comments and documentation to reflect the changes made in this pull request. This ensures that future developers understand the purpose and functionality of the modified code.
Variable Naming Clarity:
- Ensure that the variable names (pkPos, pkVal) are clear and descriptive to avoid confusion and improve code maintainability.
Code Optimization:
- Consider reviewing other parts of the codebase for similar redundant variables or structures that can be removed to streamline the code and improve its efficiency.
Security Consideration:
- While the changes in this pull request do not introduce security vulnerabilities, it is essential to conduct a thorough security review of the entire codebase to identify and address any potential security risks.

Overall, the changes in this pull request are focused on code cleanup and minor optimizations. By addressing the suggestions mentioned above and ensuring consistency in code quality across the project, the codebase can be further improved in terms of readability, maintainability, and efficiency.

Here are review comments for file test/distributed/cases/disttae_filters/ranges_filters/ranges_filters.result:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan.

Body:

The body of the pull request provides information about the type of PR (Improvement), the related issue (#15775), and the purpose of the changes, which is to support filtering to fast filter rows.

Changes:

The changes in the pull request involve modifying a file named ranges_filters.result in the test/distributed/cases/disttae_filters/ranges_filters/ directory. However, the diff provided does not show any actual code changes, only a file rename from ranges_filters.result to disttae_filters/ranges_filters/ranges_filters.result.

Feedback:

File Path Typo:
- The file path in the diff seems to have a typo. It shows a change from test/distributed/cases/ranges_filters/ranges_filters.result to test/distributed/cases/disttae_filters/ranges_filters/ranges_filters.result. The correct directory name should be dist**r**ibuted instead of dist**t**ae_filters. This typo needs to be corrected.
Missing Code Changes:
- The diff provided does not show any actual code changes within the file. It is important to ensure that the changes made in the codebase are included in the pull request diff for review.
Security Concerns:
- Since the diff provided does not show the actual code changes, it is not possible to assess any potential security implications. It is crucial to review the actual code modifications to ensure that no security vulnerabilities are introduced.

Suggestions:

Correct File Path Typo:
- Update the file path in the diff to reflect the correct directory name distributed instead of disttae_filters.
Include Code Changes:
- Ensure that the actual code changes are included in the pull request diff for review. This will allow for a thorough assessment of the modifications made.
Security Review:
- Conduct a detailed review of the code changes to identify and address any potential security concerns or vulnerabilities that may have been introduced.

By addressing the file path typo, including the actual code changes in the diff, and conducting a security review, the pull request can be improved in terms of accuracy, completeness, and security considerations.

Here are review comments for file test/distributed/cases/disttae_filters/ranges_filters/ranges_filters.sql:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan.

Body:

The body of the pull request provides information about the type of PR, the related issue, and the purpose of the changes. It specifies that this PR is an improvement and addresses issue #15775 by adding support for a filter to fast filter rows.

Changes:

The changes in the pull request involve modifying the file test/distributed/cases/ranges_filters/ranges_filters.sql. However, the diff provided does not show any actual code changes, only a file path change from ranges_filters to disttae_filters. This seems to be a typo in the file path.

Feedback and Suggestions:

File Path Typo:
- The file path in the diff seems to have a typo. It changes from ranges_filters to disttae_filters, which might be incorrect. It is recommended to double-check the file path changes and ensure they are accurate.
Code Changes:
- Since the diff does not show any actual code changes, it is essential to verify if the intended changes have been correctly included in the pull request. If there are code changes related to adding a filter for partition state rows scan, ensure they are visible in the diff.
Optimization:
- If the changes involve adding a new filter for faster row filtering, consider providing comments or documentation within the code to explain the purpose and functionality of the filter. This will help maintain code clarity and assist other developers in understanding the implementation.
Testing:
- After verifying the code changes, ensure that appropriate testing is conducted to validate the functionality of the added filter. Adding test cases to cover different scenarios will help ensure the reliability of the new feature.
Collaboration:
- Collaborate with team members or reviewers to confirm the correctness of the file path changes and the presence of the expected code modifications. This collaborative effort can help in catching any oversights or errors before merging the pull request.

By addressing the file path typo, verifying the code changes, adding documentation, conducting thorough testing, and collaborating with team members, the quality and effectiveness of the pull request can be enhanced.

Here are review comments for file test/distributed/cases/disttae_filters/reader_filters/partition_state_primary_key_index_filter.result:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan.

Body:

The body of the pull request provides relevant information about the type of PR (Improvement), the linked issue (#15775), and the purpose of the changes, which is to support a filter for fast filtering rows.

Changes:

The changes in the file partition_state_primary_key_index_filter.result involve creating, inserting, selecting, and dropping tables (t1, t2, t3, t4) in a test scenario.
Various SQL queries are performed on these tables to test filtering based on primary keys and indexes.
The test includes operations like selecting rows based on specific values, deleting rows, inserting new rows, and checking the results.
The test script ends with dropping tables and the test database.

Feedback and Suggestions:

Security Issue - SQL Injection Risk:
- The use of dynamic SQL queries with execute using in the test script (execute s1 using @a,@a,@a,@a, @b,@b,@b,@b, @c,@c,@c,@c, @d,@d,@d,@d, @e,@e,@e,@e;) poses a security risk for SQL injection attacks. It's recommended to use parameterized queries to prevent this vulnerability.
Optimization:
- Consider optimizing the test script by breaking it into smaller, more focused tests. This can improve readability, maintainability, and ease of debugging.
End of File Indicator:
- The file seems to lack a newline at the end. It's good practice to have a newline at the end of files to adhere to common conventions.
Consistency in SQL Formatting:
- Ensure consistent formatting of SQL queries throughout the script for better readability and maintainability.
Documentation:
- Consider adding comments or documentation within the test script to explain the purpose of each test case and the expected outcomes.
Cleanup Operations:
- Ensure that all cleanup operations (like dropping tables and databases) are performed correctly to avoid leaving unnecessary artifacts in the test environment.
Testing Edge Cases:
- It might be beneficial to include test cases that cover edge scenarios to ensure robustness and reliability of the filtering functionality.
Automated Testing:
- Consider automating these test cases to run as part of the continuous integration (CI) process to catch any regressions early.

By addressing the security issue, optimizing the test script, ensuring consistency, adding documentation, and improving testing coverage, the quality and reliability of the codebase can be enhanced.

Here are review comments for file test/distributed/cases/disttae_filters/reader_filters/partition_state_primary_key_index_filter.sql:

Pull Request Review:

Title:

The title of the pull request is clear and concise, indicating that the changes are related to syncing to version 1.2 and adding a filter for partition state rows scan.

Body:

The body of the pull request provides information about the type of PR (Improvement), the linked issue (#15775), and the purpose of the changes, which is to support a filter for fast filtering rows.

Changes:

The changes in the partition_state_primary_key_index_filter.sql file involve creating, inserting data into, and querying multiple tables (t1, t2, t3, t4) with various scenarios and conditions.
The script includes operations like creating databases, tables, inserting data, selecting data based on primary keys and indexes, using IN clause, preparing statements for insert, and dropping tables and databases.

Feedback and Suggestions:

Security Concerns:
- The script contains hardcoded values for sensitive operations like database credentials (create database, drop database). It's important to avoid exposing sensitive information in scripts. Consider using environment variables or secure storage mechanisms for such data.
- Avoid using plain text passwords or sensitive information directly in scripts to prevent security vulnerabilities.
Code Optimization:
- Consider parameterizing the script to make it more reusable and secure. This can help prevent SQL injection attacks and improve performance.
- Break down the script into smaller, more manageable parts or functions for better readability and maintainability.
- Use transactions where necessary to ensure data consistency and integrity during operations like inserts and deletes.
Testing:
- Ensure comprehensive testing of the script with different data scenarios to validate its functionality and performance.
- Consider adding comments to explain the purpose of each section or operation in the script for better understanding by other developers.
Cleanup:
- Remove unnecessary operations or queries that do not directly contribute to the testing or demonstration of the filter functionality to keep the script focused and concise.
- Avoid using @ignore annotations without a clear reason, as it may lead to confusion or oversight during code reviews or testing.
Documentation:
- Provide inline comments or a separate documentation section to explain the purpose of the script, the expected outcomes of each operation, and any dependencies or prerequisites required for running the script successfully.

By addressing the security concerns, optimizing the code for performance and readability, thorough testing, and improving documentation, the pull request can be enhanced to ensure a more secure, efficient, and maintainable codebase.

XuPeng-SH and others added 14 commits May 13, 2024 22:45

remove 1

687f429

update

813c9a3

in-memory-reader in expr

46d9b97

exact in, tpcc not found, not finish

d9fa631

tpcc done

dce56e8

serial_full(a,b) done

5323b11

in done

c88284c

fix len(vec) > 1, pkval should be nil

8ff1759

fix sca

4d7953f

fix bvt

b16b798

fix bvt

5292913

add more bvt in case

ff707a6

fix bvt

2bd2e52

fix bvt

9611425

gouhongshen requested review from reusee, triump2020, XuPeng-SH, aressu1985 and heni02 as code owners May 13, 2024 14:50

matrix-meow added the size/M Denotes a PR that changes [100,499] lines label May 13, 2024

mergify bot requested a review from sukki37 May 13, 2024 14:51

mergify bot added the kind/enhancement label May 13, 2024

XuPeng-SH approved these changes May 13, 2024

View reviewed changes

reusee approved these changes May 14, 2024

View reviewed changes

heni02 approved these changes May 14, 2024

View reviewed changes

sukki37 approved these changes May 14, 2024

View reviewed changes

Merge branch '1.2-dev' into local-1.2-dev

058359a

mergify bot merged commit 7a589fc into matrixorigin:1.2-dev May 14, 2024
18 of 19 checks passed

aylei mentioned this pull request Jun 5, 2024

1.2 dev #16688

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sync to 1.2: adding in filter for the partition state rows scan #16077

sync to 1.2: adding in filter for the partition state rows scan #16077

gouhongshen commented May 13, 2024 •

edited

Loading

matrix-meow commented May 13, 2024

sync to 1.2: adding in filter for the partition state rows scan #16077

sync to 1.2: adding in filter for the partition state rows scan #16077

Conversation

gouhongshen commented May 13, 2024 • edited Loading

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

matrix-meow commented May 13, 2024

Pull Request Review:

Title:

Body:

Changes in pkg/vm/engine/disttae/logtailreplay/rows_iter.go:

Suggestions for Improvement:

Pull Request Review:

Title and Body:

Changes in pkg/vm/engine/disttae/reader.go:

Overall Suggestions:

Pull Request Review:

Title and Body:

Changes in pkg/vm/engine/disttae/txn_table.go:

Pull Request Review:

Title and Body:

Changes in types.go:

Suggestions for Improvement:

Pull Request Review:

Title:

Body:

Changes:

Feedback:

Suggestions:

Pull Request Review:

Title:

Body:

Changes:

Feedback and Suggestions:

Pull Request Review:

Title:

Body:

Changes:

Feedback and Suggestions:

Pull Request Review:

Title:

Body:

Changes:

Feedback and Suggestions:

gouhongshen commented May 13, 2024 •

edited

Loading

Changes in `pkg/vm/engine/disttae/logtailreplay/rows_iter.go`:

Changes in `pkg/vm/engine/disttae/reader.go`: