
[fix](spill) Avoid releasing resources while spill tasks are executing #32783

Merged
merged 1 commit into from
Mar 27, 2024


@mrhhsg (Member) commented Mar 25, 2024

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@mrhhsg (Member, Author) commented Mar 25, 2024

run buildall

@github-actions github-actions bot (Contributor) left a comment

clang-tidy made some suggestions

return Status::OK();
}

ThreadPoolToken* PartitionedHashJoinProbeLocalState::_get_or_create_token(

warning: method '_get_or_create_token' can be made static [readability-convert-member-functions-to-static]

be/src/pipeline/exec/partitioned_hash_join_probe_operator.h:70:

-     ThreadPoolToken* _get_or_create_token(ThreadPool* thread_pool,
+     static ThreadPoolToken* _get_or_create_token(ThreadPool* thread_pool,
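For context, this check fires when a member function's body never uses `this` (no data-member access, no non-static member calls). A minimal illustration with a hypothetical Greeter class, not Doris code:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class used only to illustrate the clang-tidy check.
class Greeter {
public:
    explicit Greeter(std::string name) : name_(std::move(name)) {}

    // Reads the data member name_, so it must stay a non-static member.
    std::string greet() const { return "hello, " + name_; }

    // Uses only its parameters and never touches `this`: this is the shape
    // readability-convert-member-functions-to-static reports, so it is
    // declared static here and can be called without an instance.
    static int add(int a, int b) { return a + b; }

private:
    std::string name_;
};
```

Accepting the suggestion is a judgment call: if a later revision of the function starts reading members, the warning goes away on its own.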

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.26% (8733/24770)
Line Coverage: 27.05% (71520/264355)
Region Coverage: 26.29% (37107/141127)
Branch Coverage: 23.19% (18981/81838)
Coverage Report: http://coverage.selectdb-in.cc/coverage/7923a433db71731439d0162208552ae820947768_7923a433db71731439d0162208552ae820947768/report/index.html

Status PartitionedHashJoinProbeLocalState::close(RuntimeState* state) {
RETURN_IF_ERROR(JoinProbeLocalState::close(state));
LOG(INFO) << "here begin to shutdown all thread pool tokens.";

This can't be done this way: it may cause the pipeline worker thread to hang.

Status PartitionedHashJoinProbeLocalState::close(RuntimeState* state) {
RETURN_IF_ERROR(JoinProbeLocalState::close(state));
LOG(INFO) << "here begin to shutdown all thread pool tokens.";
for (auto& token : _spilling_tokens) {
token.second->shutdown();

In the pipelineX engine, we should use a dependency to signal that all IO has finished.
If you want to cancel the IO thread, there is no need to wait: use a weak pointer to the task execution context.
You could refer to async_result_Writer.
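A minimal sketch of the pattern the reviewer describes, assuming hypothetical TaskContext and FinishDependency types. They stand in for the task execution context and a pipelineX dependency; they are not the Doris classes or the AsyncResultWriter code. Instead of close() blocking on thread pool tokens, the scheduler polls is_ready(), and each IO job holds the context only through a weak_ptr, so a cancelled query is never waited on and never dereferenced after it is freed:

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the task execution context that owns the
// resources a spill job writes to. Not the Doris class.
struct TaskContext {
    std::mutex mu;
    std::vector<int> spilled_rows;
};

// Hypothetical stand-in for a pipelineX finish dependency. The operator is
// only finalized once is_ready() is true, i.e. no IO job is still running,
// so close() never has to block a pipeline worker thread.
class FinishDependency {
public:
    void add_task() { running_.fetch_add(1, std::memory_order_acq_rel); }
    void task_done() { running_.fetch_sub(1, std::memory_order_acq_rel); }
    bool is_ready() const { return running_.load(std::memory_order_acquire) == 0; }

private:
    std::atomic<int> running_{0};
};

// Builds one IO job. Everything is captured by value: the weak_ptr does not
// keep the query alive, and the shared_ptr keeps the dependency valid until
// the job has signalled it.
std::function<void()> make_spill_job(std::weak_ptr<TaskContext> weak_ctx,
                                     std::shared_ptr<FinishDependency> dep, int row) {
    dep->add_task();  // counted before the job is handed to any pool
    return [weak_ctx, dep, row] {
        if (auto ctx = weak_ctx.lock()) {
            std::lock_guard<std::mutex> guard(ctx->mu);
            ctx->spilled_rows.push_back(row);
        }
        // Signal completion even when the context is gone (query cancelled).
        dep->task_done();
    };
}
```

Counting the task inside make_spill_job, before the job is queued, avoids a window in which is_ready() reports true while a job has been submitted but has not started yet.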

@doris-robot

TPC-H: Total hot run time: 37790 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7923a433db71731439d0162208552ae820947768, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17631	4191	4073	4073
q2	2107	164	153	153
q3	10577	1138	1203	1138
q4	10239	782	770	770
q5	7465	2978	2931	2931
q6	208	124	121	121
q7	1028	568	559	559
q8	9336	2017	1993	1993
q9	7203	6548	6539	6539
q10	8416	3479	3536	3479
q11	432	222	217	217
q12	416	195	189	189
q13	17811	2862	2832	2832
q14	248	202	203	202
q15	489	465	463	463
q16	496	368	363	363
q17	939	513	598	513
q18	7114	6378	6413	6378
q19	2382	1425	1411	1411
q20	587	250	242	242
q21	3623	2931	2936	2931
q22	354	308	293	293
Total cold run time: 109101 ms
Total hot run time: 37790 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4082	4080	4083	4080
q2	334	227	232	227
q3	2960	2851	2828	2828
q4	1839	1529	1513	1513
q5	5303	5311	5344	5311
q6	192	114	117	114
q7	2252	1839	1810	1810
q8	3147	3288	3283	3283
q9	8679	8715	8690	8690
q10	3800	3717	3768	3717
q11	544	448	444	444
q12	702	580	550	550
q13	16969	2877	2840	2840
q14	272	254	248	248
q15	483	473	461	461
q16	459	405	412	405
q17	1712	1475	1441	1441
q18	7446	7206	7041	7041
q19	1603	1518	1543	1518
q20	1919	1694	1707	1694
q21	4939	4753	4751	4751
q22	509	424	453	424
Total cold run time: 70145 ms
Total hot run time: 53390 ms

@doris-robot

TPC-DS: Total hot run time: 185681 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7923a433db71731439d0162208552ae820947768, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	934	382	357	357
query2	6650	1987	1832	1832
query3	6709	218	210	210
query4	31949	21673	21840	21673
query5	4409	434	406	406
query6	330	179	197	179
query7	5084	307	309	307
query8	542	184	187	184
query9	10455	2269	2268	2268
query10	1119	267	249	249
query11	15039	14260	14178	14178
query12	139	95	95	95
query13	1638	416	468	416
query14	12820	11136	11335	11136
query15	270	191	193	191
query16	8238	259	255	255
query17	1981	595	552	552
query18	2115	293	281	281
query19	326	160	160	160
query20	98	86	89	86
query21	207	128	127	127
query22	5017	4794	4763	4763
query23	33876	32491	32579	32491
query24	10488	2815	2815	2815
query25	590	361	359	359
query26	1190	150	154	150
query27	2391	336	343	336
query28	7101	1868	1818	1818
query29	884	609	601	601
query30	300	148	147	147
query31	947	735	723	723
query32	92	55	55	55
query33	756	244	243	243
query34	1095	477	484	477
query35	818	605	609	605
query36	1029	873	900	873
query37	105	76	81	76
query38	3507	3387	3420	3387
query39	1485	1442	1414	1414
query40	214	115	114	114
query41	50	48	47	47
query42	105	96	97	96
query43	472	442	444	442
query44	1192	718	707	707
query45	297	263	264	263
query46	1123	700	706	700
query47	1893	1834	1815	1815
query48	458	349	345	345
query49	1112	338	341	338
query50	762	372	373	372
query51	6711	6641	6604	6604
query52	105	88	92	88
query53	349	281	282	281
query54	310	241	237	237
query55	87	78	78	78
query56	241	230	229	229
query57	1207	1134	1130	1130
query58	233	204	207	204
query59	2809	2656	2551	2551
query60	277	245	244	244
query61	115	113	113	113
query62	654	446	438	438
query63	306	282	280	280
query64	5218	4089	4140	4089
query65	3059	3040	3001	3001
query66	881	401	379	379
query67	15163	14665	14966	14665
query68	8766	525	523	523
query69	642	383	398	383
query70	1239	1186	1129	1129
query71	539	273	271	271
query72	6945	2799	2519	2519
query73	766	318	331	318
query74	6944	6292	6467	6292
query75	4153	2882	2847	2847
query76	5377	893	855	855
query77	655	255	258	255
query78	10902	10235	10127	10127
query79	9579	518	525	518
query80	1779	378	365	365
query81	533	211	216	211
query82	360	199	196	196
query83	218	143	147	143
query84	290	84	75	75
query85	1058	317	310	310
query86	362	308	284	284
query87	3720	3535	3509	3509
query88	4855	2271	2279	2271
query89	484	367	374	367
query90	2047	176	175	175
query91	171	141	134	134
query92	57	55	47	47
query93	6150	491	484	484
query94	1339	175	174	174
query95	444	335	320	320
query96	606	264	267	264
query97	3064	2899	2867	2867
query98	238	212	214	212
query99	1179	911	913	911
Total cold run time: 313587 ms
Total hot run time: 185681 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 7923a433db71731439d0162208552ae820947768 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       22.1 seconds inserted 10000000 Rows, about 452K ops/s

@mrhhsg mrhhsg changed the title [fix](spill) Avoid releasing resources while spill tasks are executing [fix](spill) Add finish dependency to avoid releasing resources while spill tasks are executing Mar 25, 2024

@github-actions github-actions bot (Contributor) left a comment

clang-tidy made some suggestions

@@ -152,12 +156,21 @@ Status PartitionedHashJoinProbeLocalState::close(RuntimeState* state) {
return Status::OK();
}

void PartitionedHashJoinProbeLocalState::_decrease_spilling_tasks_count() {

warning: method '_decrease_spilling_tasks_count' can be made static [readability-convert-member-functions-to-static]

be/src/pipeline/exec/partitioned_hash_join_probe_operator.h:72:

-     void _decrease_spilling_tasks_count();
+     static void _decrease_spilling_tasks_count();

@mrhhsg (Member, Author) commented Mar 25, 2024

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.26% (8738/24781)
Line Coverage: 27.05% (71562/264509)
Region Coverage: 26.29% (37116/141175)
Branch Coverage: 23.18% (18981/81868)
Coverage Report: http://coverage.selectdb-in.cc/coverage/71fdaa400b97809435ee54a6a0ef163b0b464c2a_71fdaa400b97809435ee54a6a0ef163b0b464c2a/report/index.html

@mrhhsg mrhhsg changed the title [fix](spill) Add finish dependency to avoid releasing resources while spill tasks are executing [fix](spill) Avoid releasing resources while spill tasks are executing Mar 26, 2024

clang-tidy review says "All clean, LGTM! 👍"

@mrhhsg (Member, Author) commented Mar 26, 2024

run buildall


clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.26% (8739/24782)
Line Coverage: 27.04% (71555/264668)
Region Coverage: 26.28% (37121/141273)
Branch Coverage: 23.18% (18985/81914)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b012d311c2c75f181ff74622a3428f6f763c3587_b012d311c2c75f181ff74622a3428f6f763c3587/report/index.html

@doris-robot

TPC-H: Total hot run time: 37951 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b012d311c2c75f181ff74622a3428f6f763c3587, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17594	5023	4169	4169
q2	2101	156	150	150
q3	10589	1153	1216	1153
q4	10231	825	751	751
q5	7487	3002	2984	2984
q6	207	124	126	124
q7	1028	575	567	567
q8	9331	2001	1993	1993
q9	7331	6582	6554	6554
q10	8432	3470	3540	3470
q11	435	236	224	224
q12	406	201	189	189
q13	17805	2874	2861	2861
q14	250	201	205	201
q15	521	471	472	471
q16	491	374	375	374
q17	936	525	625	525
q18	7203	6465	6479	6465
q19	4500	1375	1437	1375
q20	546	269	252	252
q21	3500	2819	2992	2819
q22	341	280	285	280
Total cold run time: 111265 ms
Total hot run time: 37951 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4183	4127	4083	4083
q2	326	230	230	230
q3	2992	2866	2855	2855
q4	1841	1574	1577	1574
q5	5297	5316	5306	5306
q6	191	116	119	116
q7	2247	1876	1817	1817
q8	3167	3286	3277	3277
q9	8703	8711	8721	8711
q10	3778	3755	3788	3755
q11	566	453	450	450
q12	716	546	551	546
q13	16927	2839	2834	2834
q14	277	252	259	252
q15	489	462	462	462
q16	469	436	427	427
q17	1725	1497	1489	1489
q18	7418	7187	7219	7187
q19	1612	1550	1528	1528
q20	1913	1709	1711	1709
q21	4747	4559	4653	4559
q22	507	478	454	454
Total cold run time: 70091 ms
Total hot run time: 53621 ms

@doris-robot

TPC-DS: Total hot run time: 181096 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b012d311c2c75f181ff74622a3428f6f763c3587, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	920	378	350	350
query2	6547	2018	1794	1794
query3	6712	213	215	213
query4	31559	21491	21385	21385
query5	4316	392	392	392
query6	270	181	172	172
query7	4622	312	289	289
query8	227	167	173	167
query9	9501	2284	2251	2251
query10	556	241	245	241
query11	15506	14138	14273	14138
query12	143	93	86	86
query13	1658	415	415	415
query14	10072	8025	7616	7616
query15	264	195	208	195
query16	8237	263	269	263
query17	1962	580	554	554
query18	2104	289	280	280
query19	353	167	159	159
query20	90	88	94	88
query21	198	127	125	125
query22	4993	4875	4878	4875
query23	33305	32665	32563	32563
query24	10825	2922	2889	2889
query25	602	379	379	379
query26	1206	156	159	156
query27	2606	351	359	351
query28	7526	1922	1899	1899
query29	890	661	628	628
query30	300	151	149	149
query31	972	738	717	717
query32	95	60	59	59
query33	777	262	241	241
query34	1057	470	482	470
query35	832	609	601	601
query36	1034	872	878	872
query37	121	62	62	62
query38	3533	3444	3388	3388
query39	1502	1432	1423	1423
query40	212	107	107	107
query41	48	46	44	44
query42	105	92	97	92
query43	489	457	456	456
query44	1224	729	744	729
query45	277	235	256	235
query46	1101	734	721	721
query47	1914	1827	1811	1811
query48	452	373	363	363
query49	1089	352	332	332
query50	756	376	370	370
query51	6691	6477	6651	6477
query52	100	89	85	85
query53	342	274	270	270
query54	319	246	223	223
query55	79	84	78	78
query56	238	215	219	215
query57	1211	1145	1143	1143
query58	222	208	201	201
query59	2671	2583	2635	2583
query60	274	253	248	248
query61	94	96	96	96
query62	667	452	452	452
query63	299	287	279	279
query64	5464	3942	3990	3942
query65	3077	3059	2994	2994
query66	882	382	381	381
query67	15167	14886	14667	14667
query68	6336	534	542	534
query69	607	370	374	370
query70	1232	1180	1113	1113
query71	484	265	270	265
query72	6921	2833	2682	2682
query73	722	322	335	322
query74	7858	6457	6490	6457
query75	3356	2207	2178	2178
query76	4330	935	895	895
query77	615	273	263	263
query78	10806	10065	10080	10065
query79	8413	533	526	526
query80	1891	389	378	378
query81	537	224	217	217
query82	1600	86	83	83
query83	280	149	149	149
query84	283	78	77	77
query85	1630	312	309	309
query86	487	315	310	310
query87	3716	3498	3492	3492
query88	5068	2393	2379	2379
query89	517	380	390	380
query90	1956	173	173	173
query91	176	135	139	135
query92	63	47	47	47
query93	7015	510	496	496
query94	1205	176	176	176
query95	432	329	332	329
query96	617	274	273	273
query97	2659	2462	2496	2462
query98	229	209	205	205
query99	1208	867	922	867
Total cold run time: 306706 ms
Total hot run time: 181096 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit b012d311c2c75f181ff74622a3428f6f763c3587 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.7 seconds inserted 10000000 Rows, about 729K ops/s

@@ -244,8 +244,16 @@ Status PartitionedAggSinkLocalState::revoke_memory(RuntimeState* state) {
}
}
}};

auto execution_context_ = state->get_task_execution_context();

This is an ordinary local variable, not a member, so it should not end with _; just name it execution_context.

status = ExecEnv::GetInstance()->spill_stream_mgr()->get_async_task_thread_pool()->submit_func(
[this, &parent, state] {
[this, &parent, state, execution_context_] {

The lambda has to capture the variable by value (a copy), not by reference,
because the referenced object may be destroyed before the lambda runs.
You could refer to this PR:
https://github.com/apache/doris/pull/32132/files
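To illustrate the lifetime problem, here is a minimal sketch with a hypothetical deferred task queue (g_pending) and a hypothetical QueryContext; neither is Doris code. The queued task runs after submit_spill() has returned, so capturing the local weak_ptr by reference would leave a dangling reference, while capturing it by value gives the task its own copy:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical deferred queue: tasks run after the submitting function has
// returned, which is the same situation as a thread-pool submit_func.
std::vector<std::function<std::string()>> g_pending;

// Hypothetical query context, used only for this illustration.
struct QueryContext {
    std::string id;
};

void submit_spill(const std::shared_ptr<QueryContext>& ctx) {
    std::weak_ptr<QueryContext> weak_ctx = ctx;  // a local on this stack frame
    // WRONG: [&weak_ctx] would capture a reference to this stack variable,
    // which no longer exists once submit_spill returns.
    // RIGHT: [weak_ctx] copies the weak_ptr into the lambda object itself.
    g_pending.emplace_back([weak_ctx] {
        auto locked = weak_ctx.lock();
        return locked ? "spill for " + locked->id : std::string("cancelled");
    });
}
```

The copied weak_ptr still lets the task detect that the query has gone away, which is the check the patch performs with execution_context_.lock().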

auto execution_context = execution_context_.lock();
if (!execution_context) {
// FIXME: return status is meaningless?
return Status::Cancelled("Cancelled");

Should print log here.
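A minimal sketch of what logging on this path could look like, assuming a hypothetical SpillResult enum in place of Doris's Status and a std::ostream in place of LOG(INFO). The query id is passed in by value because, once lock() fails, the context can no longer be asked for it:

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, simplified result type; Doris uses its own Status class.
enum class SpillResult { kOk, kCancelled };

// Hypothetical context type, used only for this illustration.
struct TaskContext {
    std::string query_id;
};

// Body of a spill task. When the context has already been released there is
// nothing useful left to do, so the task records why before bailing out.
// `log` stands in for LOG(INFO); any std::ostream (e.g. std::ostringstream) works.
SpillResult run_spill_task(const std::weak_ptr<TaskContext>& weak_ctx,
                           const std::string& query_id, std::ostream& log) {
    auto ctx = weak_ctx.lock();
    if (!ctx) {
        log << "spill task for query " << query_id
            << " skipped: execution context already released\n";
        return SpillResult::kCancelled;
    }
    // ... write blocks to the spill stream here ...
    return SpillResult::kOk;
}
```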

auto execution_context_ = state->get_task_execution_context();
_shared_state_holder = _shared_state->shared_from_this();
return spill_io_pool->submit_func(
[execution_context_, state, &build_spilling_stream, &mutable_block, this] {

Here too, please capture by value (a copy) rather than by reference in the lambda.

struct PartitionedHashJoinSharedState : public HashJoinSharedState {
struct PartitionedHashJoinSharedState
: public HashJoinSharedState,
public std::enable_shared_from_this<PartitionedHashJoinSharedState> {
std::vector<std::unique_ptr<vectorized::MutableBlock>> partitioned_build_blocks;

Please add enable_factory_creator to this object to forbid creating it through a raw pointer.
Also, use class instead of struct; I am not sure whether enable_factory_creator has any effect on a struct.
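The underlying idea can be sketched in plain C++, assuming a hypothetical SharedState class; this is not Doris's enable_factory_creator, whose exact definition is not shown here. The constructor is private and the only way to obtain an instance is a factory that returns a shared_ptr, so shared_from_this() (which the patch calls on the shared state) always has an owning shared_ptr to attach to:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical illustration only, not the Doris macro or shared-state type.
class SharedState : public std::enable_shared_from_this<SharedState> {
public:
    // The only way to create an instance. std::make_shared cannot call a
    // private constructor, so the shared_ptr is built from `new` here.
    static std::shared_ptr<SharedState> create_shared() {
        return std::shared_ptr<SharedState>(new SharedState());
    }

    // Safe because every instance is owned by a shared_ptr. On an object not
    // owned by any shared_ptr, shared_from_this() throws std::bad_weak_ptr
    // (C++17 and later).
    std::shared_ptr<SharedState> holder() { return shared_from_this(); }

    std::vector<int> partitioned_build_blocks;

private:
    SharedState() = default;
};
```

With this layout, `SharedState s;` or `new SharedState()` outside the class no longer compiles. On the struct question: in C++ a struct and a class differ only in default member and base-class access (public for struct, private for class), so whether a macro-based version behaves the same inside a struct depends on which access specifiers the macro itself emits.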


clang-tidy review says "All clean, LGTM! 👍"

@mrhhsg (Member, Author) commented Mar 27, 2024

run buildall


PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 27, 2024

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 37902 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 21e0d4ec700cae8a8f61e1e3cff079bd0334877f, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17620	4208	4096	4096
q2	2104	159	152	152
q3	10579	1161	1211	1161
q4	10226	782	701	701
q5	7434	3037	2968	2968
q6	206	125	122	122
q7	1036	576	561	561
q8	9340	2023	2014	2014
q9	7256	6597	6578	6578
q10	8462	3481	3554	3481
q11	434	216	219	216
q12	445	198	195	195
q13	17808	2832	2862	2832
q14	250	214	205	205
q15	502	467	465	465
q16	491	373	372	372
q17	954	537	630	537
q18	7083	6424	6395	6395
q19	2941	1454	1495	1454
q20	571	268	274	268
q21	3559	2871	2836	2836
q22	359	293	319	293
Total cold run time: 109660 ms
Total hot run time: 37902 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4084	4063	4060	4060
q2	326	230	229	229
q3	2978	2882	2814	2814
q4	1804	1543	1566	1543
q5	5341	5341	5326	5326
q6	197	115	118	115
q7	2233	1888	1900	1888
q8	3166	3310	3279	3279
q9	8702	8679	8658	8658
q10	3748	3727	3817	3727
q11	552	445	447	445
q12	747	543	538	538
q13	16938	2843	2833	2833
q14	283	242	255	242
q15	494	473	466	466
q16	477	426	421	421
q17	1723	1528	1469	1469
q18	7394	7264	6987	6987
q19	1600	1500	1580	1500
q20	1909	1752	1715	1715
q21	4849	4580	4788	4580
q22	516	443	460	443
Total cold run time: 70061 ms
Total hot run time: 53278 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.25% (8741/24796)
Line Coverage: 27.02% (71551/264760)
Region Coverage: 26.27% (37122/141303)
Branch Coverage: 23.17% (18981/81926)
Coverage Report: http://coverage.selectdb-in.cc/coverage/21e0d4ec700cae8a8f61e1e3cff079bd0334877f_21e0d4ec700cae8a8f61e1e3cff079bd0334877f/report/index.html

@doris-robot

TPC-DS: Total hot run time: 181801 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 21e0d4ec700cae8a8f61e1e3cff079bd0334877f, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	927	368	353	353
query2	6520	2090	1930	1930
query3	6707	207	213	207
query4	31752	21301	21682	21301
query5	4344	403	397	397
query6	275	185	195	185
query7	4631	293	300	293
query8	222	165	174	165
query9	9342	2307	2339	2307
query10	562	250	256	250
query11	14778	14264	14185	14185
query12	133	91	89	89
query13	1628	423	420	420
query14	10124	7697	7589	7589
query15	238	190	199	190
query16	8226	269	257	257
query17	1970	581	546	546
query18	2108	292	289	289
query19	349	158	160	158
query20	94	86	89	86
query21	207	127	128	127
query22	4998	4846	4781	4781
query23	33664	32759	32801	32759
query24	10805	2861	2913	2861
query25	584	367	358	358
query26	1169	154	151	151
query27	2975	357	350	350
query28	7343	1897	1845	1845
query29	853	641	608	608
query30	301	147	149	147
query31	953	721	753	721
query32	95	54	56	54
query33	754	248	248	248
query34	1055	470	475	470
query35	835	599	610	599
query36	1027	880	896	880
query37	118	65	64	64
query38	3589	3494	3422	3422
query39	1466	1428	1492	1428
query40	227	109	110	109
query41	49	48	44	44
query42	106	99	98	98
query43	493	451	462	451
query44	1161	723	728	723
query45	274	257	279	257
query46	1116	683	689	683
query47	1925	1836	1862	1836
query48	435	360	360	360
query49	1113	334	338	334
query50	769	371	371	371
query51	6768	6652	6627	6627
query52	105	88	92	88
query53	347	274	277	274
query54	307	232	227	227
query55	80	77	76	76
query56	244	218	218	218
query57	1222	1130	1177	1130
query58	223	201	204	201
query59	2787	2712	2565	2565
query60	275	237	243	237
query61	95	94	91	91
query62	681	455	462	455
query63	301	274	272	272
query64	5309	4078	4076	4076
query65	3064	3023	3047	3023
query66	861	355	364	355
query67	15387	14818	14836	14818
query68	5335	507	518	507
query69	579	380	384	380
query70	1190	1199	1173	1173
query71	425	267	271	267
query72	6378	2879	2700	2700
query73	695	317	312	312
query74	8039	6455	6463	6455
query75	2994	2219	2212	2212
query76	3482	866	835	835
query77	393	272	262	262
query78	11108	10112	10161	10112
query79	5664	527	537	527
query80	1914	392	400	392
query81	567	218	216	216
query82	987	94	90	90
query83	288	162	162	162
query84	305	94	81	81
query85	2162	393	380	380
query86	498	312	290	290
query87	3762	3487	3485	3485
query88	4745	2321	2305	2305
query89	473	369	370	369
query90	2007	177	173	173
query91	171	140	140	140
query92	61	51	50	50
query93	4927	499	483	483
query94	1245	175	176	175
query95	428	325	324	324
query96	594	271	275	271
query97	2634	2459	2489	2459
query98	231	216	203	203
query99	1216	957	942	942
Total cold run time: 298942 ms
Total hot run time: 181801 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 21e0d4ec700cae8a8f61e1e3cff079bd0334877f with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      33 seconds loaded 861443392 Bytes, about 24 MB/s
Insert into select:       13.8 seconds inserted 10000000 Rows, about 724K ops/s

@jacktengg (Contributor)

run external

@yiguolei (Contributor)

run buildall


clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 38359 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 22cbfc7031eb39793562e048782d97c878b1a532, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	18255	4376	4245	4245
q2	2760	167	159	159
q3	11908	1136	1253	1136
q4	10641	776	820	776
q5	7670	3028	2983	2983
q6	202	125	125	125
q7	1016	600	609	600
q8	9335	2006	2007	2006
q9	7177	6627	6641	6627
q10	8413	3515	3578	3515
q11	424	224	213	213
q12	369	200	203	200
q13	17798	2850	2876	2850
q14	232	195	203	195
q15	500	466	470	466
q16	493	384	384	384
q17	981	541	574	541
q18	7192	6488	6455	6455
q19	1580	1477	1435	1435
q20	549	268	276	268
q21	3681	2999	2895	2895
q22	361	298	285	285
Total cold run time: 111537 ms
Total hot run time: 38359 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4139	4083	4090	4083
q2	327	234	227	227
q3	2946	2803	2843	2803
q4	1863	1615	1588	1588
q5	5314	5325	5350	5325
q6	199	119	120	119
q7	2236	1841	1864	1841
q8	3185	3273	3278	3273
q9	8736	8664	8693	8664
q10	3808	3812	3741	3741
q11	546	434	445	434
q12	723	541	536	536
q13	16883	2854	2858	2854
q14	279	254	245	245
q15	495	454	460	454
q16	456	428	432	428
q17	1749	1505	1492	1492
q18	7420	7133	7030	7030
q19	1610	1524	1545	1524
q20	1925	1724	1684	1684
q21	4719	4546	4661	4546
q22	509	441	466	441
Total cold run time: 70067 ms
Total hot run time: 53332 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.25% (8741/24796)
Line Coverage: 27.02% (71549/264775)
Region Coverage: 26.27% (37124/141314)
Branch Coverage: 23.17% (18983/81934)
Coverage Report: http://coverage.selectdb-in.cc/coverage/22cbfc7031eb39793562e048782d97c878b1a532_22cbfc7031eb39793562e048782d97c878b1a532/report/index.html

@doris-robot

TPC-DS: Total hot run time: 181245 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 22cbfc7031eb39793562e048782d97c878b1a532, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	938	364	356	356
query2	6541	2011	1884	1884
query3	6706	212	209	209
query4	31822	21332	21550	21332
query5	4257	401	395	395
query6	264	177	173	173
query7	4621	294	291	291
query8	231	179	170	170
query9	9630	2327	2301	2301
query10	567	237	248	237
query11	17209	14171	14201	14171
query12	137	85	87	85
query13	1633	409	402	402
query14	9906	7552	7461	7461
query15	248	186	217	186
query16	8097	263	264	263
query17	1951	570	554	554
query18	2071	301	284	284
query19	239	155	161	155
query20	93	90	92	90
query21	204	129	131	129
query22	5040	4857	4772	4772
query23	33651	32659	32870	32659
query24	11787	2854	2921	2854
query25	662	373	388	373
query26	1789	159	154	154
query27	3040	356	358	356
query28	7769	1867	1884	1867
query29	1047	620	625	620
query30	311	151	162	151
query31	979	741	750	741
query32	98	59	54	54
query33	772	255	255	255
query34	1081	483	487	483
query35	866	626	648	626
query36	1028	897	914	897
query37	272	63	63	63
query38	3488	3431	3437	3431
query39	1469	1458	1417	1417
query40	298	120	112	112
query41	51	49	47	47
query42	104	93	98	93
query43	490	460	449	449
query44	1217	736	723	723
query45	280	269	263	263
query46	1121	698	704	698
query47	1947	1860	1821	1821
query48	446	371	353	353
query49	1243	347	338	338
query50	759	376	369	369
query51	6831	6670	6579	6579
query52	106	96	94	94
query53	342	281	267	267
query54	314	235	221	221
query55	80	79	85	79
query56	235	215	215	215
query57	1220	1135	1123	1123
query58	231	202	192	192
query59	2850	2684	2541	2541
query60	259	244	239	239
query61	93	92	92	92
query62	655	455	458	455
query63	302	266	270	266
query64	6314	4130	3901	3901
query65	3170	3045	3021	3021
query66	1431	377	352	352
query67	15599	15224	15098	15098
query68	5907	511	513	511
query69	595	376	385	376
query70	1207	1110	1182	1110
query71	451	269	256	256
query72	6460	2712	2545	2545
query73	721	325	318	318
query74	7977	6383	6322	6322
query75	3164	2244	2207	2207
query76	4277	876	843	843
query77	617	268	260	260
query78	10821	10169	10130	10130
query79	6816	519	525	519
query80	1119	389	388	388
query81	511	222	213	213
query82	1569	90	84	84
query83	216	146	148	146
query84	283	84	82	82
query85	1566	368	367	367
query86	484	316	313	313
query87	3789	3505	3569	3505
query88	4876	2324	2291	2291
query89	478	366	365	365
query90	2027	174	176	174
query91	172	136	140	136
query92	61	51	46	46
query93	5351	501	482	482
query94	1122	179	174	174
query95	426	326	345	326
query96	587	271	267	267
query97	2720	2474	2467	2467
query98	227	211	216	211
query99	1218	909	955	909
Total cold run time: 308317 ms
Total hot run time: 181245 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 22cbfc7031eb39793562e048782d97c878b1a532 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       13.7 seconds inserted 10000000 Rows, about 729K ops/s

@yiguolei yiguolei merged commit 94d745b into apache:master Mar 27, 2024
24 of 28 checks passed
Jibing-Li added a commit that referenced this pull request Mar 29, 2024
* [fix](merge cloud) Fix cloud be set be tag map (#32864)

* [chore] Add gavinchou to collaborators (#32881)

* [chore](show) support statement to show views from table (#32358)

MySQL [test]> show views;
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
| t2_view        |
+----------------+
2 rows in set (0.00 sec)

MySQL [test]> show views like '%t1%';
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
+----------------+
1 row in set (0.01 sec)

MySQL [test]> show views where create_time > '2024-03-18';
+----------------+
| Tables_in_test |
+----------------+
| t2_view        |
+----------------+
1 row in set (0.02 sec)

* [Enhancement](ranger) Disable some permission operations when Ranger or LDAP are enabled (#32538)

Disable some permission operations when Ranger or LDAP are enabled.

* [chore](ci) exclude unstable trino_connector case (#32892)

Co-authored-by: stephen <[email protected]>

* [fix](Nereids) NPE when create table with implicit index type (#32893)

* [improvement](mtmv) Support more join types for query rewriting by materialized view (#32685)

This rewriting pattern is supported for multi-table joins, and the supported join types are as follows:

INNER JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
LEFT SEMI JOIN
RIGHT SEMI JOIN
LEFT ANTI JOIN
RIGHT ANTI JOIN

* [Serde](Variant) support arrow serialization for variant type (#32780)

* [fix](multicatalog) fix no data error when read hive table on cosn (#32815)

Currently, when reading a Hive table on cosn, Doris returns an empty result even though the table has data.
Iceberg on cosn works correctly.
The cause is misuse of cosn's file system: according to cosn's documentation, its fs.cosn.impl should be org.apache.hadoop.fs.CosFileSystem.

* [fix](nereids)EliminateGroupByConstant should replace agg's output after removing constant group by keys (#32878)

* [Fix](executor)Fix regression test for test_active_queries/test_backend_active_tasks #32899

* [fix](iceberg) fix iceberg catalog bug and p2 test cases (#32898)

1. Fix iceberg catalog bug

    PR #30198 changed the logic of `IcebergHMSExternalCatalog.java`
    to get locationUrl by calling the Hive metastore's `getCatalog()` method.
    But this method only exists in Hive 3+, so it fails when using Hive 2.x.

    I temporarily removed this logic, because it is only used for Iceberg table writing,
    which is still under development. We will rethink this logic later.

2. Fix test cases

    Some P2 test cases were missing `order_qt`. Also, because the output format of floating-point
    types has changed, some results in `out` files needed to be regenerated.

* [revert](jni) revert part of #32455 (#32904)

* [fix](spill) Avoid releasing resources while spill tasks are executing (#32783)

* [chore](log) print query id before logging profile in be.INFO (#32922)

* [fix](grace-exit) Stop incorrectly of reportwork cause heap use after free #32929

* [improvement](decommission be) decommission check replica num (#32748)

* [fix](arrow-flight) Fix reach limit of connections error (#32911)

Fix the "Reach limit of connections" error.
In fe.conf, arrow_flight_token_cache_size must be less than qe_max_connection/2. Arrow Flight SQL is a stateless protocol, so connections are usually not actively closed; when a bearer token is evicted from the cache, its ConnectContext is unregistered.

Fix ConnectContext.command not being reset to COM_SLEEP in time, which resulted in frequent connection kills after query timeout.

Fix the bearer token eviction log and exception.

TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH

* [bugfix](cloud) few variables not initialized (#32868)

Uninitialized variables in ../../cloud/src/recycler/meta_checker.cpp
could cause an uninitialized memory read.

* [fix](arrow-flight) Fix arrow flight sql compatibility with JDK 17 and upgrade arrow to 15.0.2 (#32796)

Add `--add-opens=java.base/java.nio=ALL-UNNAMED`, see: https://arrow.apache.org/docs/java/install.html#java-compatibility
When Groovy uses a Flight SQL connection to execute the query `SUM(MAX(c1) OVER (PARTITION BY))`, it reports the error "AGGREGATE clause must not contain analytic expressions", but executing the same query from Java with jdbc::arrow-flight-sql works.
Groovy does not support printing the Arrow array type; it throws IndexOutOfBoundsException.
"arrow_flight_sql" does not support two-phase read.
Run the regression group with: `./run-regression-test.sh --run --clean -g arrow_flight_sql`

* [fix](spill) SpillStream's writer may not have been finalized (#32931)

* [improvement](spill) Disable DistinctStreamingAgg when spill is enabled (#32932)

* [Improve](inverted_index) update clucene and improve array inverted index writer (#32436)

* [Performance](exec) replace SipHash in function by XXHash (#32919)

* [feature](agg) add aggregate function sum0 (#32541)

* [improvement](mtmv) Support to get tables in materialized view when collecting table in plan (#32797)

Support to get tables in materialized view when collecting table in plan

Given a materialized view defined as follows:

```
create materialized view mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 1
PROPERTIES ('replication_num' = '1')
as
select
  t1.c1,
  t3.c2
from
  table1 t1
  inner join table3 t3 on t1.c1 = t3.c2
```

When collecting tables from the following plan, we get [table1, table3, table2]: mv1 is expanded into its base tables.

```
SELECT
  mv1.*,
  uuid()
FROM
  mv1 LEFT SEMI
  JOIN table2 ON mv1.c1 = table2.c1
WHERE
  mv1.c1 IN (
    SELECT
      c1
    FROM
      table2
  )
  OR mv1.c1 < 10
```
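
A minimal sketch of the expansion described above, assuming the plan's table references and each materialized view's base tables are already available as plain names (the class and method names are hypothetical, not the Nereids implementation). Materialized views are replaced recursively by their base tables, and a `LinkedHashSet` keeps first-seen order while dropping duplicates.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MvTableExpansionSketch {
    // Hypothetical: mvDefinitions maps an MV name to the tables its defining query reads.
    static Set<String> collectBaseTables(List<String> planTables, Map<String, List<String>> mvDefinitions) {
        Set<String> result = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String table : planTables) {
            expand(table, mvDefinitions, result, new HashSet<>());
        }
        return result;
    }

    private static void expand(String table, Map<String, List<String>> mvDefinitions,
                               Set<String> out, Set<String> visiting) {
        List<String> baseTables = mvDefinitions.get(table);
        if (baseTables == null) { // not an MV: it is already a base table
            out.add(table);
            return;
        }
        if (!visiting.add(table)) { // guard against cyclic definitions
            return;
        }
        for (String base : baseTables) {
            expand(base, mvDefinitions, out, visiting);
        }
        visiting.remove(table);
    }

    public static void main(String[] args) {
        Map<String, List<String>> defs = Map.of("mv1", List.of("table1", "table3"));
        // The plan above references mv1 and table2 (twice).
        System.out.println(collectBaseTables(List.of("mv1", "table2", "table2"), defs));
        // prints [table1, table3, table2]
    }
}
```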

* [enhance](mtmv)support olap table partition column is null (#32698)

* [enhancement](cloud) add table version to cloud (#32738)

Add table version to cloud.

In FE:
Get: if FE is in cloud mode, get the table version from the meta service.
Update: on drop/replace temp partition and commit transaction operations.

In meta service:
Add: on create index; the initial value is 1.
Remove: by the recycler.
Update: on commit/drop partition RPC and commit txn RPC; the version is incremented atomically.
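
The lifecycle above can be sketched as an in-memory map from table id to an `AtomicLong`. This is a hypothetical stand-in for the meta service, not its real storage or RPC interface; it only encodes the stated rules: initialize to 1 on create index, increment atomically on commit/drop partition and commit txn, and remove via the recycler.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class TableVersionSketch {
    private final ConcurrentHashMap<Long, AtomicLong> versions = new ConcurrentHashMap<>();

    // Add: creating the index initializes the table version to 1.
    void onCreateIndex(long tableId) {
        versions.putIfAbsent(tableId, new AtomicLong(1));
    }

    // Update: commit/drop partition and commit txn RPCs increment the version atomically.
    long onVersionBump(long tableId) {
        AtomicLong version = versions.get(tableId);
        if (version == null) {
            throw new IllegalStateException("no version for table " + tableId);
        }
        return version.incrementAndGet();
    }

    // Get: an FE in cloud mode would read this value from the meta service.
    // Returns -1 (a hypothetical sentinel) when the table has no version entry.
    long getVersion(long tableId) {
        AtomicLong version = versions.get(tableId);
        return version == null ? -1 : version.get();
    }

    // Remove: the recycler deletes the version entry.
    void recycle(long tableId) {
        versions.remove(tableId);
    }

    public static void main(String[] args) {
        TableVersionSketch store = new TableVersionSketch();
        store.onCreateIndex(10L);
        System.out.println(store.getVersion(10L)); // prints 1
        store.onVersionBump(10L);                  // e.g. after a commit txn RPC
        System.out.println(store.getVersion(10L)); // prints 2
        store.recycle(10L);
        System.out.println(store.getVersion(10L)); // prints -1
    }
}
```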

* [fix](cloud) schema change from not null to null (#32913)

1. Use equals instead of == for type comparison.
2. Resize the null bitmap to the size of the ref column.
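
To illustrate why `equals` rather than `==` matters for type comparison in Java, here is a minimal sketch using a hypothetical `ColumnType` class (not the FE's real type class): two separately constructed but identical types are different references, so `==` reports them as different while `equals` does not.

```java
import java.util.Objects;

public class TypeEqualitySketch {
    // Hypothetical stand-in for a column type; not Doris's actual Type class.
    static final class ColumnType {
        final String name;
        final int length;

        ColumnType(String name, int length) {
            this.name = name;
            this.length = length;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ColumnType)) return false;
            ColumnType other = (ColumnType) o;
            return length == other.length && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, length);
        }
    }

    public static void main(String[] args) {
        // Two separately constructed objects describing the same type,
        // e.g. the old and new column types in a NOT NULL -> NULL schema change.
        ColumnType oldType = new ColumnType("VARCHAR", 32);
        ColumnType newType = new ColumnType("VARCHAR", 32);
        System.out.println(oldType == newType);      // prints false: different references
        System.out.println(oldType.equals(newType)); // prints true: same type
    }
}
```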

* [feature](Nereids): add ColumnPruningPostProcessor. (#32800)

* [case](rowpolicy) fix row policy already exists (#32880)

* [fix](pipeline) fix using the wrong row desc when the origin block is cleared (#32803)

* [fix](Nereids) support variant column with index when create table (#32948)

* [opt](Nereids) support create table with variant type (#32953)

* [test](insert-overwrite) Add insert overwrite auto detect concurrency cases (#32935)

* [fix](compile) fe cannot compile in idea (#32955)

* [enhancement](plsql) Support select * from routines (#32866)

Support showing PL/SQL procedures using `select * from routines`.

* [fix](trino-connector) fix `NoClassDefFoundError` of hudi `Utils` class (#32846)

Due to the change in PR #32455, the `trino-connector-scanner` package can no longer access the `hudi_scanner` package, so a `NoClassDefFoundError` is thrown.

We need to write a separate `Utils` class.

* [exec](column) change some complex column move to noexcept (#32954)

* [Enhancement](data skew) extend show data skew (#32732)

* [chore](test) let suite compatible with Nereids (#32964)

* Support identical column names in different indexes. (#32792)

* Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470)

* [fix](merge-iterator) fix NOT_IMPLEMENTED_ERROR when read next block view (#32961)

* [improvement](executor)Add tag property for workload group #32874

* [fix](auth) unify workload and resource permission logic (#32907)

- `Grant resource` can no longer grant global `usage_priv`
- Use `grant resource %` instead of `grant resource *`

Before the change:
```
grant usage_priv on resource * to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: Usage_priv 
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: NULL
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 
```
After the change:
```
grant usage_priv on resource '%' to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: NULL
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: %: Usage_priv 
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 

```

---------

Co-authored-by: yujun <[email protected]>
Co-authored-by: Gavin Chou <[email protected]>
Co-authored-by: xy720 <[email protected]>
Co-authored-by: yongjinhou <[email protected]>
Co-authored-by: Dongyang Li <[email protected]>
Co-authored-by: stephen <[email protected]>
Co-authored-by: morrySnow <[email protected]>
Co-authored-by: seawinde <[email protected]>
Co-authored-by: lihangyu <[email protected]>
Co-authored-by: Yulei-Yang <[email protected]>
Co-authored-by: starocean999 <[email protected]>
Co-authored-by: wangbo <[email protected]>
Co-authored-by: Mingyu Chen <[email protected]>
Co-authored-by: Jerry Hu <[email protected]>
Co-authored-by: zhiqiang <[email protected]>
Co-authored-by: Xinyi Zou <[email protected]>
Co-authored-by: Vallish Pai <[email protected]>
Co-authored-by: amory <[email protected]>
Co-authored-by: HappenLee <[email protected]>
Co-authored-by: Jensen <[email protected]>
Co-authored-by: zhangdong <[email protected]>
Co-authored-by: Yongqiang YANG <[email protected]>
Co-authored-by: jakevin <[email protected]>
Co-authored-by: Mryange <[email protected]>
Co-authored-by: zclllyybb <[email protected]>
Co-authored-by: Tiewei Fang <[email protected]>
Co-authored-by: Xin Liao <[email protected]>