
Let HashProbe keep track of memory consumption when listing join results #10652

Status: Closed (1 commit, merged)

Conversation

@tanjialiang (Contributor) commented Aug 2, 2024:

Hash probe currently has limited memory control when extracting results from the hash table. When a small number of large rows on the build side is frequently joined with the probe side, the total extracted size can explode, causing HashProbe to use a large amount of memory. The output-filling stage is not in a spillable state, so this often leads to OOMs.
This PR computes the total size while listing join results in hash probe whenever variable-size columns are to be extracted from the build side, and stops listing once a maximum size is reached. This helps confine hash-probe memory usage to a set limit.

@facebook-github-bot added the "CLA Signed" label on Aug 2, 2024.

@tanjialiang changed the title from "Temp HashProb PR" to "[WIP] Temp HashProb PR" on Aug 5, 2024.
@facebook-github-bot (Contributor) commented:
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@tanjialiang force-pushed the hash_probe branch 2 times, most recently from 43b1e46 to 8e118f0 on August 5, 2024.

@tanjialiang changed the title from "[WIP] Temp HashProb PR" to "Let HashProbe keep track of memory consumption when listing join results" on Aug 5, 2024.
@tanjialiang tanjialiang marked this pull request as ready for review August 5, 2024 21:24

```cpp
uint64_t totalBytes{0};
for (const auto& column : columns) {
  if (!rows_->columnTypes()[column]->isFixedWidth()) {
    totalBytes += rows_->variableSizeAt(row, column);
```
@Yuhta (Contributor) commented Aug 6, 2024:
I am a little worried about the performance implications of this line. Usually we don't load the row container memory while listing the join results, so that memory is cold and reading from it takes a long time. Is it possible to do some row-size estimation based on the total size in the row container and adjust the number of listed rows instead? It is less accurate but will perform better.

@tanjialiang (Author) replied Aug 6, 2024:
Thanks @Yuhta. Yes, there will be some regression, but only in the list-results part. Right after listing results we do a memory copy, which is considerably more expensive than listing, so the regression is less significant overall. Row-size estimation based on the total size in the row container might not work in this case, because we don't know which build-side rows the probe side will match. It could happen that all probe-side rows match a few very large build-side rows (hence significant skew).

@Yuhta (Contributor) replied:
We could try using the average row size of the build side. Skew on the build side is hard to solve in this case, but do we run into it in real workloads?

@tanjialiang (Author) replied:
Yes, a real production workload led to this improvement.

@Yuhta (Contributor) commented Aug 6, 2024:
How about going over the row container once to get the maximum row size and using that to adjust the maximum number of output rows? That way we don't need to stride through the memory twice for each probe, and string columns stay on the fast path.

```cpp
}
if (varSizeColumns.empty() && !hasDuplicates_) {
```
(Contributor) commented:
There is another regression here.

@tanjialiang (Author) replied:
Could you clarify this part a bit? Is it the row-count limit in the fast path? Maybe we can run a shadow benchmark to see how big the impact on overall performance is.

(Contributor) replied:
This disables the fast path when a string column is present. The overall impact is probably small, since hash join is not the majority of the computation, but individual queries can be hit badly.

(Contributor) commented:
You mean listJoinResultsNoDuplicates is the fast path compared with the duplicate case? Thanks!

```cpp
template <bool ignoreNullKeys>
int32_t HashTable<ignoreNullKeys>::listJoinResults(
    const std::vector<vector_size_t>& listColumns,
```
(Contributor) commented:
Can we put listColumns in JoinResultIterator?


```diff
  int32_t numOut = 0;
- auto maxOut = inputRows.size();
+ auto maxOut = std::min(
```
(Contributor) suggested: `const auto maxOut`

@xiaoxmeng (Contributor) left a comment:
@tanjialiang thanks for the update, modulo minor comments.

Inline review comments were left on velox/exec/RowContainer.h, velox/exec/HashTable.h, velox/exec/HashProbe.cpp, and velox/exec/HashTable.cpp (all resolved).

@tanjialiang force-pushed the hash_probe branch 2 times, most recently from b0d3392 to d6d068a on August 11, 2024.

@xiaoxmeng (Contributor) left a comment:
@tanjialiang thanks for the update!

Inline review comments were left on velox/exec/tests/RowContainerTest.cpp and velox/exec/HashTable.h (resolved).
```diff
  int32_t numOut = 0;
- auto maxOut = inputRows.size();
+ const auto maxOut = std::min(
```
(Contributor) commented:
Now this function only applies when (1) the table has no rows with duplicate join keys, and (2) all listed columns are fixed-size? Shall we rename it:

s/listJoinResultsNoDuplicates/listJoinResultsWithFixSizeColumnsAndNoDuplicateJoinKeys/

@tanjialiang (Author) replied:
Maybe just call it listJoinResultsFastPath and let comments do the explanation.

Inline review comments were left on velox/exec/HashTable.cpp (resolved).

@xiaoxmeng (Contributor) left a comment:
@tanjialiang thanks for the iterations. LGTM!

Inline review comments were left on velox/exec/benchmarks/HashJoinListResultBenchmark.cpp, velox/exec/HashProbe.cpp, and velox/exec/HashTable.h (all resolved).

@tanjialiang force-pushed the hash_probe branch 2 times, most recently from a3013ee to fb39a3a on August 12, 2024.

@facebook-github-bot:
@tanjialiang merged this pull request in 82e5492.


Conbench analyzed the 1 benchmark run on commit 82e54926.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

jinchengchenghh added a commit to jinchengchenghh/velox that referenced this pull request Aug 15, 2024
@jinchengchenghh (Contributor) commented:
Our TPC-DS performance regressed by nearly 5%. Do we need to reconsider this feature? All the small queries slow down, and q24a and q24b slow down severely. After reverting this change, performance recovered. CC @zhouyuan @FelixYBW

query  08_14 time (s)  08_13 time (s)  difference  percentage
q1 8.71 9.65 0.933 110.70%
q2 7.52 7.36 -0.159 97.88%
q3 2.49 2.69 0.198 107.94%
q4 48.51 47.43 -1.079 97.77%
q5 6.05 5.85 -0.200 96.70%
q6 2.40 3.14 0.734 130.55%
q7 3.98 4.06 0.088 102.21%
q8 3.15 2.28 -0.877 72.20%
q9 13.32 12.04 -1.274 90.44%
q10 8.27 8.36 0.097 101.17%
q11 26.99 26.55 -0.436 98.38%
q12 1.81 1.00 -0.804 55.53%
q13 3.76 4.57 0.815 121.69%
q14a 31.70 31.37 -0.326 98.97%
q14b 29.85 29.91 0.054 100.18%
q15 2.05 2.64 0.593 128.94%
q16 6.45 5.29 -1.158 82.05%
q17 3.81 3.85 0.042 101.09%
q18 5.93 5.30 -0.637 89.26%
q19 1.76 1.73 -0.030 98.31%
q20 0.97 0.88 -0.082 91.52%
q21 0.59 0.59 -0.005 99.17%
q22 3.16 2.15 -1.006 68.17%
q23a 56.83 55.69 -1.146 97.98%
q23b 70.50 69.36 -1.142 98.38%
q24a 87.19 70.75 -16.432 81.15%
q24b 80.95 64.90 -16.047 80.18%
q25 3.26 3.34 0.085 102.60%
q26 1.88 1.85 -0.029 98.45%
q27 2.73 1.96 -0.768 71.90%
q28 16.38 16.21 -0.174 98.94%
q29 5.87 6.05 0.181 103.08%
q30 4.10 3.67 -0.433 89.45%
q31 4.87 4.65 -0.217 95.54%
q32 1.51 0.92 -0.586 61.08%
q33 1.60 1.51 -0.084 94.73%
q34 1.84 1.99 0.149 108.12%
q35 4.99 5.04 0.059 101.17%
q36 1.79 2.53 0.735 140.99%
q37 2.65 2.52 -0.129 95.14%
q38 9.83 9.73 -0.095 99.03%
q39a 2.76 2.62 -0.140 94.93%
q39b 2.41 2.35 -0.058 97.60%
q40 2.62 2.24 -0.380 85.49%
q41 0.35 0.36 0.015 104.45%
q42 0.46 0.39 -0.073 84.16%
q43 1.82 1.70 -0.126 93.09%
q44 5.62 5.06 -0.553 90.16%
q45 2.20 2.20 -0.005 99.75%
q46 2.36 2.50 0.142 106.00%
q47 10.18 9.37 -0.811 92.04%
q48 2.47 2.46 -0.008 99.66%
q49 3.70 3.50 -0.207 94.42%
q50 16.71 17.36 0.649 103.88%
q51 5.65 5.68 0.028 100.49%
q52 0.57 0.60 0.032 105.61%
q53 1.76 1.08 -0.675 61.62%
q54 2.49 2.43 -0.059 97.63%
q55 0.56 0.55 -0.008 98.60%
q56 1.39 1.39 -0.000 99.98%
q57 6.68 5.87 -0.808 87.91%
q58 1.99 1.73 -0.257 87.10%
q59 3.64 3.36 -0.283 92.22%
q60 2.19 1.99 -0.203 90.73%
q61 1.78 1.82 0.045 102.52%
q62 2.70 2.48 -0.218 91.92%
q63 1.08 1.82 0.747 169.46%
q64 25.77 25.35 -0.428 98.34%
q65 10.76 10.24 -0.514 95.22%
q66 2.03 2.31 0.280 113.81%
q67 63.05 63.32 0.276 100.44%
q68 2.07 1.98 -0.097 95.32%
q69 4.47 4.88 0.408 109.12%
q70 4.80 4.52 -0.276 94.25%
q71 1.78 1.71 -0.065 96.36%
q72 19.42 18.22 -1.200 93.82%
q73 1.46 1.49 0.024 101.64%
q74 16.75 16.16 -0.587 96.49%
q75 18.99 18.97 -0.018 99.91%
q76 5.86 5.76 -0.105 98.20%
q77 1.13 1.24 0.110 109.70%
q78 32.34 32.77 0.433 101.34%
q79 2.91 2.69 -0.217 92.53%
q80 7.73 8.42 0.685 108.86%
q81 4.18 4.20 0.022 100.52%
q82 5.61 5.18 -0.433 92.29%
q83 0.98 0.96 -0.027 97.29%
q84 1.95 2.21 0.258 113.21%
q85 4.52 4.31 -0.210 95.35%
q86 1.92 1.65 -0.272 85.82%
q87 10.15 9.91 -0.234 97.70%
q88 15.03 12.08 -2.945 80.40%
q89 1.76 1.73 -0.031 98.23%
q90 1.33 1.76 0.428 132.15%
q91 1.83 1.74 -0.098 94.66%
q92 0.80 0.79 -0.007 99.09%
q93 22.93 23.05 0.124 100.54%
q94 8.05 8.30 0.247 103.07%
q95 54.02 53.64 -0.380 99.30%
q96 1.90 1.70 -0.202 89.39%
q97 9.91 9.80 -0.111 98.88%
q98 1.60 1.61 0.010 100.63%
q99 5.35 5.41 0.059 101.09%
total 1043.70 994.73 -48.966 95.31%

@Yuhta (Contributor) commented Aug 15, 2024:
@tanjialiang Can you add a check comparing the average and maximum build row sizes, and only do the per-row size estimation if the max size is larger than, say, 2x the average? TPC benchmarks are important externally and we want to keep them state of the art.

@tanjialiang (Author) commented:
@jinchengchenghh This is an important fix for Meta-internal traffic that prevents certain queries from OOMing. I will patch in a fast path for the feature to improve performance, as @Yuhta suggested.

@jinchengchenghh (Contributor):
Thanks very much. @tanjialiang @Yuhta

@zhouyuan (Contributor):
@tanjialiang @Yuhta thanks for the quick turnaround. Please also note the benchmark results were obtained with TPC-DS SF3000. We haven't done a larger-scale test yet, but based on experience the performance drop may be bigger in that case.

thanks, -yuan

zsmj2017 pushed a commit to zsmj2017/velox that referenced this pull request Aug 23, 2024
…lts (facebookincubator#10652)


Pull Request resolved: facebookincubator#10652

Reviewed By: xiaoxmeng

Differential Revision: D60771773

Pulled By: tanjialiang

fbshipit-source-id: 2cb8c58ba795a0aa1df0485b58e4f6d0100be8f8
(cherry picked from commit 82e5492)
weiting-chen pushed a commit to oap-project/velox that referenced this pull request Sep 18, 2024
…lts (facebookincubator#10652) (#495)


Pull Request resolved: facebookincubator#10652

Reviewed By: xiaoxmeng

Differential Revision: D60771773

Pulled By: tanjialiang

fbshipit-source-id: 2cb8c58ba795a0aa1df0485b58e4f6d0100be8f8
(cherry picked from commit 82e5492)

Co-authored-by: Jialiang Tan <[email protected]>
NEUpanning pushed a commit to NEUpanning/velox that referenced this pull request Oct 10, 2024
…m-01' into 'rebase-upstream-1.2.x-vcpkg'

Let HashProbe keep track of memory consumption when listing join results (facebookincubator#10652)


PR Link: https://dev.sankuai.com/code/repo-detail/data/velox/pr/42