Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix scope.ParallelRun bug which cause hung when error happens. #16452

Merged
merged 6 commits into from
May 29, 2024

Conversation

m-schen
Copy link
Contributor

@m-schen m-schen commented May 28, 2024

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #16439

What this PR does / why we need it:

修复了一个在pipeline执行前发生错误会导致其他已启动的pipeline(如各种带有收数据功能的pipeline)卡死的问题。


PR Type

Bug fix


Description

  • Refactored the ParallelRun function in pkg/sql/compile/scope.go to handle different pipeline types (JOIN, LOAD, SCAN) by introducing helper functions buildJoinParallelRun, buildLoadParallelRun, and buildScanParallelRun.
  • Removed the remote parameter from the ParallelRun function and updated its calls accordingly.
  • Reorganized import statements in pkg/sql/compile/scope.go.

Changes walkthrough 📝

Relevant files
Bug fix
scope.go
Refactor ParallelRun and add helper functions for different pipeline
types.

pkg/sql/compile/scope.go

  • Refactored ParallelRun to handle different pipeline types (JOIN, LOAD,
    SCAN).
  • Added helper functions buildJoinParallelRun, buildLoadParallelRun, and
    buildScanParallelRun.
  • Removed the remote parameter from ParallelRun.
  • Updated import statements and reorganized them.
  • +364/-318
    scopeRemoteRun.go
    Remove `remote` parameter from `ParallelRun` call.             

    pkg/sql/compile/scopeRemoteRun.go

    • Removed the remote parameter from the ParallelRun call.
    +1/-1     

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    Copy link

    PR Review 🔍

    ⏱️ Estimated effort to review [1-5]

    4, because the PR involves complex refactoring of the ParallelRun function with multiple helper functions and conditions. It requires a deep understanding of the existing codebase and the specific functionalities being refactored.

    🧪 Relevant tests

    No

    ⚡ Possible issues

    Possible Bug: The refactoring introduces multiple helper functions like buildJoinParallelRun, buildLoadParallelRun, and buildScanParallelRun. There is a risk of introducing bugs if these functions do not handle all edge cases or maintain the original logic accurately.

    Performance Concern: The changes in how parallelism is handled (e.g., determining the number of CPUs to use) could impact performance. It's crucial to ensure that performance does not degrade, especially in different deployment environments that may have varying hardware configurations.

    🔒 Security concerns

    No

    @matrix-meow matrix-meow added the size/L Denotes a PR that changes [500,999] lines label May 28, 2024
    @mergify mergify bot added the kind/bug Something isn't working label May 28, 2024
    …pipeline start.
    
    add parallel run 2
    
    rewrite ParallelRun
    Copy link

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Best practice
    Add error logging for the ParallelRun function to aid in debugging

    Consider adding a log statement or error handling mechanism to capture and log the error
    returned by s.ParallelRun(c). This will help in debugging and understanding the cause of
    the error.

    pkg/sql/compile/scopeRemoteRun.go [215-216]

     err = s.ParallelRun(c)
    -if err == nil {
    +if err != nil {
    +    log.Errorf("ParallelRun failed: %v", err)
    +    return err
    +}
     
    Suggestion importance[1-10]: 8

    Why: The suggestion correctly identifies a missing error handling mechanism for the s.ParallelRun(c) method, which is crucial for debugging and maintaining robustness in error scenarios.

    8
    Possible issue
    Add a nil check for the context before setting it recursively

    Ensure that the context c.ctx is not nil before calling SetContextRecursively to prevent
    potential runtime panics.

    pkg/sql/compile/scopeRemoteRun.go [213]

    -s.SetContextRecursively(c.ctx)
    +if c.ctx != nil {
    +    s.SetContextRecursively(c.ctx)
    +} else {
    +    log.Warn("Context is nil, skipping SetContextRecursively")
    +}
     
    Suggestion importance[1-10]: 7

    Why: Adding a nil check before using the context can prevent runtime panics, which is a good practice for ensuring the stability of the application. However, the PR does not indicate issues with c.ctx being nil, so the suggestion is preventive rather than corrective.

    7

    @m-schen m-schen changed the title fix scope.ParallelRun to avoid hung when error happens. fix scope.ParallelRun bug which cause hung when error happens. May 28, 2024
    @m-schen
    Copy link
    Contributor Author

    m-schen commented May 28, 2024

    修改后的 parallel run 执行流程:

    1. 根据是否为join, 是否为scan等,生成可并行执行的pipeline。
    2. 执行pipeline.

    其中第一部分如果有任意报错,将会执行pipeline.cleanup确保其接受者等可以正常退出。

    @mergify mergify bot merged commit 16fb37e into matrixorigin:main May 29, 2024
    17 of 18 checks passed
    mergify bot pushed a commit that referenced this pull request May 30, 2024
    #16453)
    
    修复一个当pipeline启动前发生错误,导致pipeline hung住的bug.
    
    
    ___
    
    ### **PR Type**
    Bug fix
    
    
    ___
    
    ### **Description**
    - Refactored the `ParallelRun` function in `pkg/sql/compile/scope.go` to handle different pipeline types (JOIN, LOAD, SCAN) and removed the `remote` parameter.
    - Added new helper functions: `buildJoinParallelRun`, `buildLoadParallelRun`, and `buildScanParallelRun` to modularize the code.
    - Updated import statements in `pkg/sql/compile/scope.go` to remove duplicates and organize imports.
    - Updated the call to `ParallelRun` in `pkg/sql/compile/scopeRemoteRun.go` to match the new function signature.
    
    
    
    ___
    
    
    
    ### **Changes walkthrough** 📝
    <table><thead><tr><th></th><th align="left">Relevant files</th></tr></thead><tbody><tr><td><strong>Enhancement
    </strong></td><td><table>
    <tr>
    <td>
    <details>
    <summary><strong>scope.go</strong><dd><code>Refactor ParallelRun function and add helper functions for different </code><br><code>pipeline types.</code></dd></summary>
    <hr>
    
    pkg/sql/compile/scope.go
    
    <li>Removed the <code>remote</code> parameter from the <code>ParallelRun</code> function.<br> <li> Refactored the <code>ParallelRun</code> function to handle different pipeline types <br>(JOIN, LOAD, SCAN).<br> <li> Added new helper functions: <code>buildJoinParallelRun</code>, <br><code>buildLoadParallelRun</code>, and <code>buildScanParallelRun</code>.<br> <li> Updated import statements to remove duplicates and organize imports.<br>
    
    
    </details>
    
    
    </td>
    <td><a href="https://github.com/matrixorigin/matrixone/pull/16453/files#diff-1a84d311b058e390f8c70e84f5e0c59dbd42736a85d022e2a283d926d43f5bd6">+354/-308</a></td>
    
    </tr>
    
    <tr>
    <td>
    <details>
    <summary><strong>scopeRemoteRun.go</strong><dd><code>Update ParallelRun call to match new function signature.</code>&nbsp; </dd></summary>
    <hr>
    
    pkg/sql/compile/scopeRemoteRun.go
    
    <li>Updated the call to <code>ParallelRun</code> to remove the <code>remote</code> parameter.<br>
    
    
    </details>
    
    
    </td>
    <td><a href="https://github.com/matrixorigin/matrixone/pull/16453/files#diff-e597c9151caad687e737d3b97366e5451df63a8c21c9623500cbc737552445a0">+1/-1</a>&nbsp; &nbsp; &nbsp; </td>
    
    </tr>
    </table></td></tr></tr></tbody></table>
    
    ___
    
    > 💡 **PR-Agent usage**:
    >Comment `/help` on the PR to get a list of all available PR-Agent tools and their descriptions
    
    Approved by: @ouyuanning, @sukki37
    @aylei aylei mentioned this pull request Jun 5, 2024
    @m-schen m-schen deleted the fix-16439 branch June 18, 2024 02:34
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    Bug fix kind/bug Something isn't working Review effort [1-5]: 4 size/L Denotes a PR that changes [500,999] lines
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    4 participants