Let HashProbe keep track of memory consumption when listing join results #10652
Conversation
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed 43b1e46 to 8e118f0 (Compare)
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
uint64_t totalBytes{0};
for (const auto& column : columns) {
  if (!rows_->columnTypes()[column]->isFixedWidth()) {
    totalBytes += rows_->variableSizeAt(row, column);
I am a little worried about the performance implication of this line. Usually we don't load the row container memory while listing the join results, so the memory is cold and reading from it is very slow. Is it possible to do some row size estimation based on the total size in the row container and adjust the number of listed rows smartly? It is less accurate but will perform better.
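For illustration, a minimal sketch of the kind of estimation being suggested; the inputs (total row-container bytes, row count, a per-batch byte budget) and all names are assumptions, not the actual Velox API:

```cpp
#include <algorithm>
#include <cstdint>

// Sketch only: clamp the number of rows listed per batch using an average row
// size derived from the row container's total footprint. All names here are
// illustrative assumptions, not Velox APIs.
int32_t estimateMaxRowsToList(
    uint64_t containerBytes,  // total bytes held by the build-side row container
    uint64_t containerRows,   // number of rows stored in the container
    uint64_t maxListBytes,    // byte budget for one listing batch
    int32_t requestedMaxRows) {
  if (containerRows == 0 || maxListBytes == 0) {
    return requestedMaxRows;
  }
  const uint64_t avgRowBytes =
      std::max<uint64_t>(1, containerBytes / containerRows);
  const uint64_t budgetRows = std::max<uint64_t>(1, maxListBytes / avgRowBytes);
  return static_cast<int32_t>(
      std::min<uint64_t>(static_cast<uint64_t>(requestedMaxRows), budgetRows));
}
```

The trade-off, as discussed below, is that an average can badly under-estimate output size when build-side row sizes are skewed.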
Thanks @Yuhta, yeah, there is going to be some regression, but only in the list-results part. Right after listing results we do a memory copy, which is considerably more expensive than the listing itself, making the regression less significant overall. Row size estimation based on the total size in the row container might not work in this case because we don't know which build-side rows the probe side is going to match. It could happen that all probe-side rows match a few very large build-side rows (hence significant skew).
We can try to use the average row size of the build side. Skew on the build side is hard to solve in this case, but do we run into that in real workloads?
Yeah, a real production workload led to this improvement.
How about going over the row container to get the maximum row size and using that to adjust the max number of rows out? That way we don't need to stride through the memory twice for each probe, and string columns stay on the fast path.
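A sketch of that alternative, assuming the per-row sizes can be collected in a single pass over the build-side rows once the build finishes (the collection step and all names are placeholders, not the actual change):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch only: cap rows per output batch so that even a batch made entirely of
// maximum-size build rows stays within the byte budget. 'rowBytes' would come
// from one pass over the row container after the build finishes.
int32_t maxRowsOutFromMaxRowSize(
    const std::vector<uint64_t>& rowBytes, // per-row size of each build row
    uint64_t maxListBytes,                 // byte budget for one output batch
    int32_t requestedMaxRows) {
  uint64_t maxRowBytes = 1;
  for (const auto bytes : rowBytes) {
    maxRowBytes = std::max(maxRowBytes, bytes);
  }
  const uint64_t cap = std::max<uint64_t>(1, maxListBytes / maxRowBytes);
  return static_cast<int32_t>(
      std::min<uint64_t>(static_cast<uint64_t>(requestedMaxRows), cap));
}
```

This keeps listing itself on the fast path, at the cost of smaller batches whenever a single very large build row exists.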
velox/exec/HashTable.cpp
Outdated
}
if (varSizeColumns.empty() && !hasDuplicates_) {
There is another regression here
Could you clarify this part a bit? Is it the row count limit in the fast path?
Maybe we can run a shadow benchmark to see how big an impact it has on overall performance?
This disables the fast path when a string column is present. The overall impact is probably small, as hash join is not the majority of the computation, but on individual queries this can be bad.
You mean listJoinResultsNoDuplicates is the fast path compared with the duplicate case? Thanks!
velox/exec/HashTable.cpp
Outdated
template <bool ignoreNullKeys>
int32_t HashTable<ignoreNullKeys>::listJoinResults(
    const std::vector<vector_size_t>& listColumns,
Can we put listColumns in JoinResultIterator?
velox/exec/HashTable.cpp
Outdated
  int32_t numOut = 0;
- auto maxOut = inputRows.size();
+ auto maxOut = std::min(
const auto maxOut
@tanjialiang thanks for the update % minors.
Force-pushed bf87f1f to 02dac6a (Compare)
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed b0d3392 to d6d068a (Compare)
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@tanjialiang thanks for the update!
  int32_t numOut = 0;
- auto maxOut = inputRows.size();
+ const auto maxOut = std::min(
Now this function only applies to the case where (1) the table has no rows with duplicate join keys, and (2) all list columns are fixed size?
Shall we rename to
s/listJoinResultsNoDuplicates/listJoinResultsWithFixSizeColumnsAndNoDuplicateJoinKeys/
Maybe just call it listJoinResultsFastPath and let comments do the explanation.
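For reference, a rough sketch of the dispatch that name would describe; every identifier below is an assumption based on the snippets quoted in this thread, not the merged code:

```cpp
#include <cstdint>

// Stand-ins so the sketch is self-contained; the real functions list matches
// out of the hash table.
int32_t listJoinResultsFastPath() { return 0; }
int32_t listJoinResultsTrackingSize() { return 0; }

// The fast path skips per-row size accounting, so it is only taken when no
// build rows share join keys and every listed column has a fixed width.
int32_t listJoinResults(bool hasDuplicates, bool allListColumnsFixedWidth) {
  if (!hasDuplicates && allListColumnsFixedWidth) {
    return listJoinResultsFastPath();
  }
  return listJoinResultsTrackingSize();
}
```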
Force-pushed d6d068a to 4ea4855 (Compare)
Force-pushed 4ea4855 to 04dec5e (Compare)
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed 04dec5e to 3920fec (Compare)
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@tanjialiang thanks for the iterations. LGTM!
Force-pushed 3920fec to 4db78bd (Compare)
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed a3013ee to fb39a3a (Compare)
Force-pushed fb39a3a to 89aa01e (Compare)
@tanjialiang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@tanjialiang merged this pull request in 82e5492.
Conbench analyzed the 1 benchmark run on this commit. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
…oin results (facebookincubator#10652)" This reverts commit 82e5492.
Our TPCDS performance regressed by nearly 5%; do we need to reconsider this feature?
@tanjialiang Can you add a check to compare the average vs. max build row size and only do the per-row size estimation if the max size is larger than, say, 2x the average size? TPC benchmarks are important externally and we want to keep them state of the art.
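A sketch of that guard, assuming the average and maximum build row sizes are already tracked at build time (the names and the 2x factor are placeholders):

```cpp
#include <cstdint>

// Sketch only: enable per-row size tracking during listing only when the build
// row sizes are skewed enough that the average badly under-estimates the
// largest rows. Names and the default factor are placeholders.
bool shouldTrackPerRowSize(
    uint64_t avgBuildRowBytes,
    uint64_t maxBuildRowBytes,
    double skewFactor = 2.0) {
  if (avgBuildRowBytes == 0) {
    return false;
  }
  return static_cast<double>(maxBuildRowBytes) >
      skewFactor * static_cast<double>(avgBuildRowBytes);
}
```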
@jinchengchenghh This is an important fix for Meta internal traffic that prevents certain queries from OOMing. I will patch a fast path for the feature to improve the performance, as @Yuhta mentioned.
Thanks very much. @tanjialiang @Yuhta
@tanjialiang @Yuhta thanks for the quick turnaround. Please also note the benchmark results are from TPCDS SF3000. We haven't done a larger-scale test yet, but based on experience the performance drop may be bigger in that case. Thanks, -yuan
…lts (facebookincubator#10652)
Summary: Hash probe currently has limited memory control when extracting results from the hash table. When a small number of large rows from the build side is frequently joined with the probe side, the total extracted size can explode, making HashProbe use a large amount of memory. The output-filling stage is not in a spillable state, so this often causes OOM. This PR computes the total size while listing join results in hash probe whenever any variable-size columns are going to be extracted from the build side. It stops listing further rows once the maximum size is reached, which confines hash probe memory usage to a set limit.
Pull Request resolved: facebookincubator#10652
Reviewed By: xiaoxmeng
Differential Revision: D60771773
Pulled By: tanjialiang
fbshipit-source-id: 2cb8c58ba795a0aa1df0485b58e4f6d0100be8f8
(cherry picked from commit 82e5492)
…lts (facebookincubator#10652) (#495)
Summary: Hash probe currently has limited memory control when extracting results from the hash table. When a small number of large rows from the build side is frequently joined with the probe side, the total extracted size can explode, making HashProbe use a large amount of memory. The output-filling stage is not in a spillable state, so this often causes OOM. This PR computes the total size while listing join results in hash probe whenever any variable-size columns are going to be extracted from the build side. It stops listing further rows once the maximum size is reached, which confines hash probe memory usage to a set limit.
Pull Request resolved: facebookincubator#10652
Reviewed By: xiaoxmeng
Differential Revision: D60771773
Pulled By: tanjialiang
fbshipit-source-id: 2cb8c58ba795a0aa1df0485b58e4f6d0100be8f8
(cherry picked from commit 82e5492)
Co-authored-by: Jialiang Tan <[email protected]>
…m-01' into 'rebase-upstream-1.2.x-vcpkg'
Let HashProbe keep track of memory consumption when listing join results (facebookincubator#10652)
Summary: Hash probe currently has limited memory control when extracting results from the hash table. When a small number of large rows from the build side is frequently joined with the probe side, the total extracted size can explode, making HashProbe use a large amount of memory. The output-filling stage is not in a spillable state, so this often causes OOM. This PR computes the total size while listing join results in hash probe whenever any variable-size columns are going to be extracted from the build side. It stops listing further rows once the maximum size is reached, which confines hash probe memory usage to a set limit.
PR Link: https://dev.sankuai.com/code/repo-detail/data/velox/pr/42
Hash probe currently has limited memory control when extracting results from the hash table. When a small number of large rows from the build side is frequently joined with the probe side, the total extracted size can explode, making HashProbe use a large amount of memory. The output-filling stage is not in a spillable state, so this often causes OOM.
This PR computes the total size while listing join results in hash probe whenever any variable-size columns are going to be extracted from the build side. It stops listing further rows once the maximum size is reached, which confines hash probe memory usage to a set limit.
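In outline, the behavior described above looks roughly like the following; the iterator callbacks, the per-row size estimate, and the byte budget are illustrative assumptions rather than the exact merged implementation.

```cpp
#include <cstdint>
#include <vector>

// Outline of size-capped listing: accumulate an estimated output size while
// collecting matches and stop early once the byte budget is reached. The
// remaining matches are picked up by the next batch.
struct JoinMatch {
  int32_t probeRow;
  const void* buildRow;
};

template <typename NextMatch, typename EstimateRowBytes>
int32_t listJoinResultsWithCap(
    NextMatch nextMatch,               // fills a JoinMatch, returns false when exhausted
    EstimateRowBytes estimateRowBytes, // size of variable-width build columns for a row
    uint64_t maxListBytes,             // byte budget for one output batch
    int32_t maxOutRows,
    std::vector<JoinMatch>& out) {
  uint64_t listedBytes = 0;
  JoinMatch match{};
  while (static_cast<int32_t>(out.size()) < maxOutRows && nextMatch(match)) {
    out.push_back(match);
    listedBytes += estimateRowBytes(match.buildRow);
    if (listedBytes >= maxListBytes) {
      break; // Stop listing; this batch is already at its memory limit.
    }
  }
  return static_cast<int32_t>(out.size());
}
```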