[fix](scanner) Fix incorrect _max_thread_num in scanner context when many queries are running. #41273

Merged (2 commits) on Oct 11, 2024

@zhiqiang-hhhh zhiqiang-hhhh commented Sep 25, 2024

  1. Minor refactor of the scanner constructor: the calculation of _max_thread_num is moved to the init method.
  2. The expected value of _max_thread_num is changed. There is no need to submit too many scan tasks to the scan scheduler, since the number of threads is limited.
  3. The calculation of _max_bytes_in_queue is changed. _max_bytes_in_queue for each scan instance is limited to 100MB by default.
mysql [tpch]>select count(*) from supplier;
--------------
select count(*) from supplier
--------------

+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.04 sec)

mysql [tpch]>select count(*) from revenue0;
--------------
select count(*) from revenue0
--------------

+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.19 sec)

To illustrate the effect, we need to create many scanners, so we lower the minimum number of rows per scanner:

set global experimental_parallel_scan_min_rows_per_scanner=29715

The default value is 2097152. With the lower value, the number of scanners becomes almost equal to experimental_parallel_scan_max_scanners_count, which is 48.

Let's use mysqlslap to run a concurrent test.

Current master:

[hezhiqiang@VM-10-8-centos be_1]$ mysqlslap -hxxxx -uroot -Pyyyy  --create-schema=tpch -c 20 -i 5 -q "select     s_suppkey,     s_name,     s_address,     s_phone,     total_revenue from     supplier,     revenue0 where     s_suppkey = supplier_no     and total_revenue = (         select             max(total_revenue)         from             revenue0     ) order by     s_suppkey;"
Benchmark
	Average number of seconds to run all queries: 12.480 seconds
	Minimum number of seconds to run all queries: 12.159 seconds
	Maximum number of seconds to run all queries: 12.843 seconds
	Number of clients running queries: 20
	Average number of queries per client: 1

[hezhiqiang@VM-10-8-centos be_1]$ mysqlslap -hyyyy -uroot -Pyyyy  --create-schema=tpch -c 25 -i 5 -q "select     s_suppkey,     s_name,     s_address,     s_phone,     total_revenue from     supplier,     revenue0 where     s_suppkey = supplier_no     and total_revenue = (         select             max(total_revenue)         from             revenue0     ) order by     s_suppkey;"
mysqlslap: Cannot run query select     s_suppkey,     s_name,     s_address,     s_phone,     total_revenue from     supplier,     revenue0 where     s_suppkey = supplier_no     and total_revenue = (         select             max(total_revenue)         from             revenue0     ) order by     s_suppkey; 
ERROR : errCode = 2, detailMessage = (10.16.10.8)[TOO_MANY_TASKS]Failed to submit scanner to scanner pool reason:Thread pool Scan_normal is at capacity (192/192 tasks running, 102400/102400 tasks queued)|type:0

After this PR:

[hezhiqiang@VM-10-8-centos lib]$ mysqlslap -hxxx -uroot -Pxxx  --create-schema=tpch -c 50 -i 5 -q "select     s_suppkey,     s_name,     s_address,     s_phone,     total_revenue from     supplier,     revenue0 where     s_suppkey = supplier_no     and total_revenue = (         select             max(total_revenue)         from             revenue0     ) order by     s_suppkey;"
Benchmark
	Average number of seconds to run all queries: 31.520 seconds
	Minimum number of seconds to run all queries: 30.164 seconds
	Maximum number of seconds to run all queries: 34.131 seconds
	Number of clients running queries: 50
	Average number of queries per client: 1

The maximum concurrency increased: master already fails at 25 concurrent clients, while this PR handles 50.
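As a rough cross-check of the two successful mysqlslap runs above, the sketch below converts the reported averages into throughput. It assumes each client issues exactly one query per iteration (as the "Average number of queries per client: 1" lines report); the helper name `queries_per_second` is made up for illustration.

```python
def queries_per_second(clients: int, avg_seconds: float, queries_per_client: int = 1) -> float:
    """Throughput of one mysqlslap iteration: total queries / average wall time."""
    return clients * queries_per_client / avg_seconds


# Numbers copied from the mysqlslap output above.
master_qps = queries_per_second(20, 12.480)  # master, 20 clients
pr_qps = queries_per_second(50, 31.520)      # this PR, 50 clients

print(f"master: {master_qps:.2f} q/s, PR: {pr_qps:.2f} q/s")
```

Both runs come out at roughly 1.6 queries per second, so the gain shown by this test is that 50 concurrent clients now complete instead of failing with TOO_MANY_TASKS, not higher throughput.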

In the sequential query test, performance does not decrease, so submit_many_scan_tasks_for_potential_performance_issue can be removed in the future.

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@zhiqiang-hhhh zhiqiang-hhhh changed the title from "[]Refactor scanner" to "[fix](scanner) Fix incorrect _max_thread_num in scanner context when many queries are running." Sep 25, 2024
@zhiqiang-hhhh

run buildall

@zhiqiang-hhhh

Almost the same as #40569.

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.29% (9625/25808)
Line Coverage: 28.71% (79682/277509)
Region Coverage: 28.13% (41195/146443)
Branch Coverage: 24.77% (20991/84744)
Coverage Report: http://coverage.selectdb-in.cc/coverage/194bff621cbc6becdb9a841c6dd739ec307bbed6_194bff621cbc6becdb9a841c6dd739ec307bbed6/report/index.html

@zhiqiang-hhhh

run buildall

@zhiqiang-hhhh

run buildall

@doris-robot

TPC-H: Total hot run time: 41063 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c27cf187971c1d967d45faae7a5ec0bd3b10074a, data reload: false

------ Round 1 ----------------------------------
q1	17576	7502	7207	7207
q2	2010	286	278	278
q3	12264	1048	1152	1048
q4	10560	781	764	764
q5	7747	2974	2951	2951
q6	244	150	149	149
q7	1029	626	620	620
q8	9367	2015	2068	2015
q9	6605	6460	6401	6401
q10	6972	2257	2346	2257
q11	448	252	250	250
q12	416	215	224	215
q13	17774	2931	2993	2931
q14	238	208	205	205
q15	563	526	527	526
q16	639	591	565	565
q17	989	589	650	589
q18	7398	6669	6613	6613
q19	1395	1100	1115	1100
q20	493	206	198	198
q21	3969	3220	3195	3195
q22	1102	1016	986	986
Total cold run time: 109798 ms
Total hot run time: 41063 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7185	7224	7201	7201
q2	322	236	224	224
q3	3107	2980	3030	2980
q4	2095	1906	1814	1814
q5	5780	5751	5797	5751
q6	234	142	149	142
q7	2232	1811	1801	1801
q8	3395	3490	3466	3466
q9	8958	8899	8835	8835
q10	3591	3532	3551	3532
q11	582	497	499	497
q12	831	601	603	601
q13	11452	3197	3159	3159
q14	318	277	282	277
q15	581	522	510	510
q16	703	641	644	641
q17	1846	1660	1618	1618
q18	8284	7806	7470	7470
q19	1725	1622	1546	1546
q20	2097	1867	1840	1840
q21	5554	5414	5432	5414
q22	1133	1053	1052	1052
Total cold run time: 72005 ms
Total hot run time: 60371 ms
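The per-query rows above appear to hold the query name, one cold run, two hot runs, and the faster of the two hot runs (all in ms), with the totals summing the cold column and the best-hot column. That reading is inferred from the numbers, not stated by the bot. A small sketch that checks it on the first three Round 1 rows:

```python
# First three Round 1 rows, copied from the table above.
rows = """\
q1 17576 7502 7207 7207
q2 2010 286 278 278
q3 12264 1048 1152 1048
"""

cold_total = 0
hot_total = 0
for line in rows.splitlines():
    name, cold, hot1, hot2, best = line.split()
    cold, hot1, hot2, best = map(int, (cold, hot1, hot2, best))
    # Assumed meaning of the last column: the faster of the two hot runs.
    assert best == min(hot1, hot2), name
    cold_total += cold
    hot_total += best

print(cold_total, hot_total)  # partial totals for these three rows only
```

Running the same loop over all 22 rows of a round should reproduce the reported "Total cold run time" and "Total hot run time" if the column interpretation is right.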

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.31% (9635/25825)
Line Coverage: 28.70% (79707/277694)
Region Coverage: 28.13% (41215/146525)
Branch Coverage: 24.76% (20991/84778)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c27cf187971c1d967d45faae7a5ec0bd3b10074a_c27cf187971c1d967d45faae7a5ec0bd3b10074a/report/index.html

@doris-robot

TPC-DS: Total hot run time: 191933 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c27cf187971c1d967d45faae7a5ec0bd3b10074a, data reload: false

query1	913	391	408	391
query2	6264	1982	1963	1963
query3	8706	192	199	192
query4	33657	23570	24199	23570
query5	4129	496	483	483
query6	277	170	162	162
query7	4202	305	309	305
query8	277	209	220	209
query9	8444	2660	2655	2655
query10	477	278	290	278
query11	17752	15630	15336	15336
query12	149	99	97	97
query13	1542	412	405	405
query14	9595	7223	7468	7223
query15	260	170	180	170
query16	7741	479	477	477
query17	1634	630	619	619
query18	1923	323	321	321
query19	218	176	167	167
query20	123	110	120	110
query21	212	108	130	108
query22	4912	4480	4427	4427
query23	34914	33986	33790	33790
query24	10945	3020	3029	3020
query25	629	421	410	410
query26	1067	163	168	163
query27	2444	297	297	297
query28	6458	2412	2373	2373
query29	818	442	424	424
query30	265	158	151	151
query31	1028	820	778	778
query32	101	54	55	54
query33	765	291	304	291
query34	921	491	488	488
query35	891	724	751	724
query36	1118	941	917	917
query37	167	92	82	82
query38	3974	3881	3907	3881
query39	1463	1418	1420	1418
query40	200	97	98	97
query41	51	49	50	49
query42	118	96	97	96
query43	510	450	466	450
query44	1207	791	774	774
query45	196	165	172	165
query46	1123	799	788	788
query47	1932	1841	1853	1841
query48	427	365	374	365
query49	932	419	422	419
query50	882	420	439	420
query51	7143	6972	6784	6784
query52	101	92	89	89
query53	260	183	180	180
query54	974	463	461	461
query55	79	77	74	74
query56	274	283	265	265
query57	1233	1096	1081	1081
query58	252	249	270	249
query59	3102	2892	2699	2699
query60	299	275	269	269
query61	106	103	105	103
query62	933	672	670	670
query63	217	184	184	184
query64	4061	631	628	628
query65	3263	3178	3159	3159
query66	961	302	331	302
query67	15847	15539	15493	15493
query68	4645	574	562	562
query69	571	287	295	287
query70	1197	1150	1099	1099
query71	462	270	278	270
query72	7331	3986	3999	3986
query73	761	335	336	335
query74	9601	8987	9000	8987
query75	4224	2758	2691	2691
query76	3641	1301	1145	1145
query77	587	334	321	321
query78	10554	9609	9662	9609
query79	2179	594	600	594
query80	3113	465	449	449
query81	603	241	240	240
query82	702	143	137	137
query83	314	134	135	134
query84	292	76	84	76
query85	1651	296	299	296
query86	470	302	312	302
query87	4540	4366	4324	4324
query88	3179	2395	2356	2356
query89	409	289	293	289
query90	2248	185	185	185
query91	176	143	141	141
query92	71	47	51	47
query93	2471	601	581	581
query94	1263	307	295	295
query95	358	251	249	249
query96	613	277	274	274
query97	3213	3173	3095	3095
query98	219	204	193	193
query99	1631	1301	1304	1301
Total cold run time: 301347 ms
Total hot run time: 191933 ms

@doris-robot

ClickBench: Total hot run time: 32.74 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c27cf187971c1d967d45faae7a5ec0bd3b10074a, data reload: false

query1	0.05	0.04	0.04
query2	0.06	0.02	0.03
query3	0.23	0.06	0.06
query4	1.64	0.09	0.10
query5	0.50	0.51	0.51
query6	1.15	0.74	0.72
query7	0.02	0.02	0.01
query8	0.04	0.02	0.03
query9	0.56	0.53	0.51
query10	0.57	0.55	0.57
query11	0.13	0.11	0.11
query12	0.13	0.11	0.11
query13	0.60	0.59	0.59
query14	2.76	2.81	2.89
query15	0.93	0.86	0.87
query16	0.39	0.38	0.38
query17	1.02	1.02	1.02
query18	0.18	0.18	0.18
query19	1.91	1.82	2.00
query20	0.01	0.01	0.01
query21	15.35	0.55	0.55
query22	3.26	4.14	2.80
query23	17.40	0.97	0.98
query24	2.63	0.29	0.13
query25	0.18	0.07	0.07
query26	0.17	0.14	0.13
query27	0.03	0.04	0.04
query28	12.29	1.02	0.98
query29	12.53	3.22	3.20
query30	0.25	0.06	0.06
query31	2.89	0.38	0.37
query32	3.28	0.46	0.46
query33	2.94	2.96	3.02
query34	15.47	4.44	4.39
query35	4.45	4.46	4.46
query36	0.69	0.48	0.47
query37	0.08	0.05	0.05
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.15	0.12	0.13
query41	0.08	0.02	0.03
query42	0.03	0.02	0.02
query43	0.03	0.02	0.03
Total cold run time: 107.14 s
Total hot run time: 32.74 s

@zhiqiang-hhhh

run buildall

@doris-robot

TPC-H: Total hot run time: 40778 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c843245b629963449c6ef68b08c71939ae69941f, data reload: false

------ Round 1 ----------------------------------
q1	18022	7782	7306	7306
q2	2789	157	152	152
q3	11858	1179	1205	1179
q4	10588	761	772	761
q5	8392	2975	2929	2929
q6	246	153	150	150
q7	971	619	599	599
q8	9352	1969	1936	1936
q9	6677	6381	6379	6379
q10	6932	2291	2272	2272
q11	432	248	245	245
q12	398	216	216	216
q13	17781	3002	2998	2998
q14	241	214	225	214
q15	574	530	501	501
q16	627	586	594	586
q17	986	599	508	508
q18	7355	6472	6692	6472
q19	1395	1017	1047	1017
q20	482	208	212	208
q21	4009	3132	3132	3132
q22	1098	1018	1019	1018
Total cold run time: 111205 ms
Total hot run time: 40778 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7348	7209	7255	7209
q2	324	224	232	224
q3	2892	2760	2707	2707
q4	2058	1671	1686	1671
q5	5472	5493	5435	5435
q6	225	141	141	141
q7	2130	1711	1733	1711
q8	3244	3390	3384	3384
q9	8515	8486	8538	8486
q10	3471	3442	3423	3423
q11	581	486	468	468
q12	751	590	564	564
q13	6595	2989	3015	2989
q14	290	262	264	262
q15	554	520	512	512
q16	673	612	644	612
q17	1775	1553	1565	1553
q18	7869	7500	7492	7492
q19	1658	1535	1560	1535
q20	2057	1808	1807	1807
q21	5437	5079	5206	5079
q22	1130	1022	1016	1016
Total cold run time: 65049 ms
Total hot run time: 58280 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.30% (9627/25813)
Line Coverage: 28.71% (79693/277618)
Region Coverage: 28.13% (41191/146450)
Branch Coverage: 24.75% (20977/84758)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c843245b629963449c6ef68b08c71939ae69941f_c843245b629963449c6ef68b08c71939ae69941f/report/index.html

@doris-robot

TPC-DS: Total hot run time: 190996 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c843245b629963449c6ef68b08c71939ae69941f, data reload: false

query1	962	383	381	381
query2	6516	2017	2081	2017
query3	6701	208	232	208
query4	33881	23379	23298	23298
query5	4469	481	474	474
query6	269	181	163	163
query7	4622	319	303	303
query8	270	216	216	216
query9	9467	2651	2636	2636
query10	456	292	286	286
query11	17879	15066	15153	15066
query12	151	99	97	97
query13	1632	421	420	420
query14	10609	7543	7063	7063
query15	262	171	180	171
query16	7909	473	447	447
query17	1666	578	557	557
query18	2129	311	325	311
query19	349	148	151	148
query20	124	106	107	106
query21	206	107	107	107
query22	4460	4271	4255	4255
query23	34780	33987	33906	33906
query24	11156	2795	2810	2795
query25	647	401	394	394
query26	1384	159	159	159
query27	2853	301	288	288
query28	7977	2393	2385	2385
query29	869	418	416	416
query30	308	158	152	152
query31	1011	793	837	793
query32	98	55	55	55
query33	759	286	289	286
query34	983	500	498	498
query35	867	718	752	718
query36	1107	957	959	957
query37	157	87	86	86
query38	4040	3876	3867	3867
query39	1485	1443	1421	1421
query40	283	100	95	95
query41	49	48	46	46
query42	115	94	92	92
query43	529	493	475	475
query44	1228	813	791	791
query45	202	163	166	163
query46	1148	726	720	720
query47	1914	1841	1818	1818
query48	451	367	371	367
query49	1129	397	390	390
query50	838	400	398	398
query51	6976	6871	6983	6871
query52	99	86	85	85
query53	261	188	182	182
query54	986	465	451	451
query55	78	75	80	75
query56	294	250	249	249
query57	1245	1113	1073	1073
query58	239	252	231	231
query59	3137	3193	3081	3081
query60	302	262	275	262
query61	112	108	108	108
query62	872	662	669	662
query63	223	193	188	188
query64	4402	744	720	720
query65	3265	3239	3180	3180
query66	1481	320	335	320
query67	15914	15362	15545	15362
query68	4035	593	567	567
query69	444	304	305	304
query70	1177	1063	1127	1063
query71	351	272	288	272
query72	6406	4279	4174	4174
query73	768	346	343	343
query74	9979	8931	9080	8931
query75	3394	2666	2685	2666
query76	2876	910	984	910
query77	497	297	291	291
query78	10524	9696	9620	9620
query79	2591	606	615	606
query80	1785	461	442	442
query81	582	244	242	242
query82	672	137	142	137
query83	243	139	144	139
query84	260	84	76	76
query85	2069	299	290	290
query86	508	282	306	282
query87	4371	4339	4262	4262
query88	3934	2374	2359	2359
query89	411	289	293	289
query90	2071	189	191	189
query91	180	144	142	142
query92	62	50	55	50
query93	2016	545	551	545
query94	1065	303	303	303
query95	365	259	268	259
query96	630	277	273	273
query97	3244	3154	3108	3108
query98	213	209	193	193
query99	1615	1318	1308	1308
Total cold run time: 302580 ms
Total hot run time: 190996 ms

@doris-robot

ClickBench: Total hot run time: 32.26 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c843245b629963449c6ef68b08c71939ae69941f, data reload: false

query1	0.04	0.05	0.04
query2	0.07	0.03	0.03
query3	0.23	0.07	0.06
query4	1.65	0.11	0.10
query5	0.50	0.50	0.53
query6	1.14	0.73	0.71
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.57	0.50	0.49
query10	0.54	0.57	0.56
query11	0.14	0.10	0.10
query12	0.13	0.10	0.10
query13	0.61	0.59	0.59
query14	2.72	2.71	2.70
query15	0.89	0.83	0.82
query16	0.38	0.39	0.36
query17	1.06	1.03	1.04
query18	0.23	0.22	0.22
query19	1.90	1.87	1.96
query20	0.01	0.01	0.01
query21	15.36	0.60	0.58
query22	2.54	2.10	1.86
query23	16.86	0.89	0.97
query24	3.08	0.48	2.07
query25	0.28	0.13	0.04
query26	0.42	0.13	0.13
query27	0.05	0.04	0.04
query28	10.43	1.11	1.06
query29	12.59	3.27	3.26
query30	0.25	0.06	0.06
query31	2.97	0.38	0.38
query32	3.26	0.46	0.46
query33	2.98	3.00	3.05
query34	17.04	4.41	4.41
query35	4.50	4.52	4.54
query36	0.67	0.48	0.47
query37	0.08	0.06	0.05
query38	0.05	0.03	0.03
query39	0.03	0.02	0.02
query40	0.16	0.12	0.12
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.02
Total cold run time: 106.61 s
Total hot run time: 32.26 s
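The two benchmark rounds in this thread ran on two different commits of this PR (c27cf18… and c843245…), not on master versus the PR, so they only show run-to-run stability. A quick sketch of the relative change in the reported hot totals (the helper name `percent_change` is just for illustration):

```python
# Hot-run totals reported by doris-robot: (first round, second round).
hot_totals = {
    "TPC-H (ms)": (41063, 40778),
    "TPC-DS (ms)": (191933, 190996),
    "ClickBench (s)": (32.74, 32.26),
}


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


changes = {name: percent_change(a, b) for name, (a, b) in hot_totals.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```

All three totals move by less than 1.5% between the two commits.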

HappenLee previously approved these changes Sep 30, 2024

@HappenLee HappenLee left a comment

LGTM

@github-actions github-actions bot added the approved label Sep 30, 2024

PR approved by at least one committer and no changes requested.


PR approved by anyone and no changes requested.

@github-actions github-actions bot removed the approved label Oct 8, 2024
@github-actions github-actions bot left a comment

clang-tidy made some suggestions

be/src/vec/exec/scan/scanner_context.cpp
@zhiqiang-hhhh

run buildall

2 similar comments
@zhiqiang-hhhh

run buildall

@zhiqiang-hhhh

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.42% (9663/25826)
Line Coverage: 28.68% (80164/279482)
Region Coverage: 28.12% (41448/147416)
Branch Coverage: 24.72% (21109/85400)
Coverage Report: http://coverage.selectdb-in.cc/coverage/f5855b7090e0b8619c2bd3555b7d7a948f52dd86_f5855b7090e0b8619c2bd3555b7d7a948f52dd86/report/index.html

@zhiqiang-hhhh

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.42% (9663/25826)
Line Coverage: 28.68% (80157/279482)
Region Coverage: 28.11% (41442/147414)
Branch Coverage: 24.71% (21105/85398)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b67039d5dbc8657e11e5ba460b6cc9d44083cfdb_b67039d5dbc8657e11e5ba460b6cc9d44083cfdb/report/index.html

@github-actions github-actions bot removed the approved label Oct 11, 2024
@github-actions github-actions bot left a comment

clang-tidy made some suggestions

be/src/vec/exec/scan/scanner_context.cpp
@zhiqiang-hhhh

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.41% (9661/25828)
Line Coverage: 28.68% (80151/279507)
Region Coverage: 28.11% (41452/147461)
Branch Coverage: 24.71% (21112/85424)
Coverage Report: http://coverage.selectdb-in.cc/coverage/74c20dc03f13565a0d734ec46d3e1f0ecf41c574_74c20dc03f13565a0d734ec46d3e1f0ecf41c574/report/index.html

@HappenLee HappenLee left a comment

LGTM

@github-actions github-actions bot added the approved label Oct 11, 2024

PR approved by at least one committer and no changes requested.

@yiguolei yiguolei merged commit 3b18b1f into apache:master Oct 11, 2024
25 of 28 checks passed
@zhiqiang-hhhh zhiqiang-hhhh deleted the refactor-scanner branch October 12, 2024 09:04
zhiqiang-hhhh added a commit to zhiqiang-hhhh/doris that referenced this pull request Oct 17, 2024
…many queries are running. (apache#41273)
yiguolei pushed a commit that referenced this pull request Oct 18, 2024
yiguolei pushed a commit that referenced this pull request Oct 18, 2024
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.7-merged dev/3.0.3-merged reviewed