[Gluten-1.2] Port #10652 to Branch-1.2 for Let HashProbe keep track of memory consumption when listing join results (#10652) #495

Merged: 1 commit, Sep 18, 2024

Conversation

zsmj2017

Summary:
Hash probe currently has limited memory control when extracting results from the hash table. When a small number of large rows from the build side are frequently joined with the left side, the total extracted size explodes, causing HashProbe to use a large amount of memory. Because the process of filling output is not in a spillable state, this often causes OOM. This PR computes the total size when listing join results in hash probe if any variable-size columns from the build side are going to be extracted, and stops listing further results once the maximum size is reached. This helps keep hash probe memory usage within a confined limit.
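
For readers unfamiliar with the idea, here is a minimal sketch of size-bounded result listing. It is not the Velox code: `BuildRow`, `ListCursor`, and `listJoinResults` are hypothetical names, and the row size is approximated by the length of a single variable-size column. It only illustrates accumulating the extracted size and stopping once the limit is reached.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a build-side row with one variable-size column.
struct BuildRow {
  int64_t key;
  std::string payload;
};

// Hypothetical cursor that remembers where listing stopped, so the next
// call resumes from that position instead of starting over.
struct ListCursor {
  size_t next = 0;
};

// Lists matching build-side rows until either maxRows rows were emitted or
// the accumulated variable-size bytes reach maxBytes. The row that crosses
// the limit is still emitted, so every call makes progress.
std::vector<const BuildRow*> listJoinResults(
    const std::vector<BuildRow>& matches,
    ListCursor& cursor,
    size_t maxRows,
    size_t maxBytes) {
  assert(maxRows > 0 && maxBytes > 0);
  std::vector<const BuildRow*> out;
  size_t totalBytes = 0;
  while (cursor.next < matches.size() && out.size() < maxRows) {
    const BuildRow& row = matches[cursor.next++];
    out.push_back(&row);
    totalBytes += row.payload.size();
    if (totalBytes >= maxBytes) {
      break; // Size limit reached: stop listing, resume on the next call.
    }
  }
  return out;
}
```

The caller keeps invoking `listJoinResults` with the same cursor until it returns an empty batch, so each output batch stays near the byte limit instead of growing with the number of matches.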

Pull Request resolved: facebookincubator#10652

Reviewed By: xiaoxmeng

Differential Revision: D60771773

Pulled By: tanjialiang

fbshipit-source-id: 2cb8c58ba795a0aa1df0485b58e4f6d0100be8f8
(cherry picked from commit 82e5492)

@kecookier

@zhouyuan Can you help review this?

@zhztheplayer
Collaborator

This patch causes a performance regression in the latest Gluten. Are you already aware of that? @kecookier

@kecookier

kecookier commented Aug 29, 2024

@zhztheplayer Thanks, I wasn't aware of this yet. Let's hold off for now and consider merging after we have resolved it.

cc @zsmj2017

@weiting-chen
Collaborator

The PR passed 333 Velox tests and failed 1: ParquetTableScanTest.timestampFilter, which reports an empty Parquet file.
That failure is related to facebookincubator#4680 and not to this PR.
Approved to merge.

@weiting-chen weiting-chen merged commit 9e22c2e into oap-project:branch-1.2 Sep 18, 2024
@zsmj2017 zsmj2017 deleted the branch-1.2-10652 branch September 19, 2024 03:12