[Opt](Iceberg) handle count pushdown in fe side #34928

Merged
merged 1 commit into from
May 17, 2024

Conversation

zhangbutao
Contributor

Proposed changes

An Iceberg count can be treated as a metadata operation, so it can be answered on the FE side. To do that, the count pushdown has to be passed to the FE as well, and there is then no need to initialize the Iceberg splits.

This optimization can save a lot of time when running Iceberg count statements.
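
For illustration only, here is a minimal sketch of how a count could be answered from snapshot metadata alone. It is not the code in this PR. The class and method names are hypothetical, and the snapshot summary is modeled as a plain `Map<String, String>` instead of the Iceberg API so the example stays self-contained. The three keys are the standard Iceberg snapshot summary properties. The sketch assumes the query has no filter, that each position delete removes exactly one distinct live row, and that the count must fall back to a regular scan when equality deletes are present.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper, not the Doris implementation.
final class IcebergCountSketch {
    // Standard Iceberg snapshot summary property names.
    static final String TOTAL_RECORDS = "total-records";
    static final String TOTAL_POSITION_DELETES = "total-position-deletes";
    static final String TOTAL_EQUALITY_DELETES = "total-equality-deletes";

    /**
     * Returns the row count derived from the snapshot summary, or empty
     * when the metadata is not enough and a real scan is required.
     * Assumption: the query is an unfiltered COUNT(*).
     */
    static OptionalLong countFromSummary(Map<String, String> summary) {
        String records = summary.get(TOTAL_RECORDS);
        if (records == null) {
            return OptionalLong.empty(); // summary does not carry the total
        }
        long equalityDeletes =
                Long.parseLong(summary.getOrDefault(TOTAL_EQUALITY_DELETES, "0"));
        if (equalityDeletes > 0) {
            // Equality deletes cannot be resolved without reading data files.
            return OptionalLong.empty();
        }
        long positionDeletes =
                Long.parseLong(summary.getOrDefault(TOTAL_POSITION_DELETES, "0"));
        // Assumption: every position delete targets a distinct live row.
        return OptionalLong.of(Long.parseLong(records) - positionDeletes);
    }

    public static void main(String[] args) {
        Map<String, String> summary = Map.of(
                TOTAL_RECORDS, "100",
                TOTAL_POSITION_DELETES, "10",
                TOTAL_EQUALITY_DELETES, "0");
        System.out.println(countFromSummary(summary));
    }
}
```

When the helper returns an empty result, the planner would keep the normal path of creating splits and counting on the BE side.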

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion on the project's developer mailing list by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@zhangbutao
Contributor Author

cc @morningman

@zhangbutao
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40857 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a0c78380bc94bdfe29b49ca6908aeb9173b84a62, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17699	4343	4290	4290
q2	2028	193	190	190
q3	10566	1314	1206	1206
q4	10814	802	882	802
q5	7829	2684	2669	2669
q6	217	136	137	136
q7	1048	604	609	604
q8	9425	2118	2073	2073
q9	9675	6669	6602	6602
q10	9413	3691	3690	3690
q11	459	247	237	237
q12	413	221	213	213
q13	17765	2975	2983	2975
q14	261	213	220	213
q15	509	469	474	469
q16	484	384	378	378
q17	962	671	740	671
q18	8018	7488	7458	7458
q19	2806	1555	1510	1510
q20	659	300	300	300
q21	5021	3893	3969	3893
q22	342	290	278	278
Total cold run time: 116413 ms
Total hot run time: 40857 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4335	4273	4235	4235
q2	370	270	272	270
q3	3003	2738	2721	2721
q4	1863	1611	1619	1611
q5	5257	5235	5256	5235
q6	215	126	125	125
q7	2210	1860	1855	1855
q8	3188	3322	3326	3322
q9	8308	8287	8316	8287
q10	3880	3715	3645	3645
q11	572	485	480	480
q12	732	602	586	586
q13	16198	2959	2974	2959
q14	289	263	267	263
q15	525	472	460	460
q16	462	406	436	406
q17	1752	1499	1458	1458
q18	7768	7479	7446	7446
q19	1641	1505	1490	1490
q20	1955	1776	1756	1756
q21	5047	4885	4873	4873
q22	567	487	482	482
Total cold run time: 70137 ms
Total hot run time: 53965 ms

@doris-robot

TPC-DS: Total hot run time: 186375 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a0c78380bc94bdfe29b49ca6908aeb9173b84a62, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	912	391	372	372
query2	6472	2662	2423	2423
query3	6669	220	228	220
query4	23015	21569	21107	21107
query5	4232	428	423	423
query6	265	171	176	171
query7	4584	318	284	284
query8	236	188	184	184
query9	8568	2391	2396	2391
query10	430	255	246	246
query11	14886	14150	14190	14150
query12	137	93	92	92
query13	1641	384	365	365
query14	9801	8572	6858	6858
query15	257	172	165	165
query16	8070	278	262	262
query17	1837	574	568	568
query18	2078	290	278	278
query19	214	160	157	157
query20	95	89	90	89
query21	198	131	135	131
query22	5061	4859	4856	4856
query23	34609	33680	33397	33397
query24	12167	2922	2809	2809
query25	652	369	362	362
query26	1727	155	154	154
query27	3067	326	349	326
query28	7705	2062	2049	2049
query29	1007	616	604	604
query30	277	176	182	176
query31	1000	774	744	744
query32	97	52	54	52
query33	737	245	254	245
query34	1079	492	489	489
query35	816	683	688	683
query36	1079	919	942	919
query37	258	72	71	71
query38	2877	2788	2800	2788
query39	1625	1556	1559	1556
query40	284	126	123	123
query41	47	41	45	41
query42	105	97	100	97
query43	608	577	569	569
query44	1238	733	750	733
query45	270	245	248	245
query46	1074	726	727	726
query47	1940	1888	1906	1888
query48	379	314	302	302
query49	1178	403	401	401
query50	780	393	394	393
query51	6809	6803	6776	6776
query52	112	91	92	91
query53	345	282	279	279
query54	882	431	421	421
query55	76	72	75	72
query56	240	217	214	214
query57	1252	1142	1150	1142
query58	222	219	197	197
query59	3762	3065	3168	3065
query60	256	225	244	225
query61	91	84	89	84
query62	676	476	483	476
query63	304	285	282	282
query64	9769	7413	7363	7363
query65	3145	3101	3090	3090
query66	1385	368	333	333
query67	15342	15002	14980	14980
query68	4492	526	538	526
query69	473	295	310	295
query70	1202	1146	1168	1146
query71	409	265	271	265
query72	7157	2581	2344	2344
query73	705	324	323	323
query74	6557	6123	6104	6104
query75	3525	2643	2654	2643
query76	2939	1010	1061	1010
query77	401	270	271	270
query78	10635	10051	10113	10051
query79	2429	515	527	515
query80	1076	438	436	436
query81	523	242	243	242
query82	805	100	96	96
query83	247	168	185	168
query84	237	87	88	87
query85	1598	266	264	264
query86	524	325	290	290
query87	3283	3119	3048	3048
query88	4023	2356	2395	2356
query89	470	378	386	378
query90	1980	192	190	190
query91	126	96	98	96
query92	62	46	47	46
query93	1737	509	496	496
query94	1199	179	186	179
query95	403	297	301	297
query96	591	268	271	268
query97	3178	3028	3023	3023
query98	240	230	219	219
query99	1171	898	880	880
Total cold run time: 287405 ms
Total hot run time: 186375 ms

@doris-robot

ClickBench: Total hot run time: 30.43 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a0c78380bc94bdfe29b49ca6908aeb9173b84a62, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.04	0.05
query4	1.68	0.08	0.08
query5	0.50	0.49	0.49
query6	1.13	0.72	0.71
query7	0.02	0.02	0.01
query8	0.04	0.04	0.04
query9	0.54	0.49	0.49
query10	0.55	0.55	0.54
query11	0.16	0.12	0.12
query12	0.15	0.11	0.12
query13	0.60	0.59	0.60
query14	0.78	0.77	0.78
query15	0.83	0.80	0.80
query16	0.37	0.38	0.35
query17	1.02	1.01	0.98
query18	0.23	0.24	0.27
query19	1.73	1.74	1.71
query20	0.02	0.01	0.01
query21	15.70	0.68	0.65
query22	4.56	7.36	1.89
query23	18.19	1.30	1.25
query24	1.74	0.24	0.20
query25	0.13	0.09	0.08
query26	0.27	0.17	0.17
query27	0.08	0.08	0.08
query28	13.41	1.02	1.01
query29	13.19	3.28	3.25
query30	0.24	0.06	0.06
query31	2.86	0.40	0.39
query32	3.28	0.48	0.47
query33	2.83	2.84	2.80
query34	17.27	4.43	4.46
query35	4.49	4.50	4.50
query36	0.65	0.45	0.46
query37	0.18	0.15	0.16
query38	0.15	0.15	0.14
query39	0.04	0.04	0.04
query40	0.17	0.14	0.15
query41	0.09	0.05	0.04
query42	0.05	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 110.31 s
Total hot run time: 30.43 s

@zhangbutao
Contributor Author

run buildall

@zhangbutao zhangbutao force-pushed the handle_iceberg_count_in_fe branch 2 times, most recently from 20824cd to 0fbe478 Compare May 16, 2024 02:26
@zhangbutao
Contributor Author

run buildall

@zhangbutao
Contributor Author

run buildall

@github-actions github-actions bot added the approved label May 17, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@morrySnow morrySnow merged commit e4df4b4 into apache:master May 17, 2024
26 of 28 checks passed
dataroaring pushed a commit that referenced this pull request May 26, 2024
An Iceberg count can be treated as a metadata operation, so it can be answered on the FE side.
To do that, the count pushdown has to be passed to the FE as well, and there is then no need to initialize the Iceberg splits.
This optimization can save a lot of time when running Iceberg count statements.
yiguolei pushed a commit that referenced this pull request Jul 16, 2024
morningman pushed a commit that referenced this pull request Aug 2, 2024
dataroaring pushed a commit that referenced this pull request Aug 11, 2024
dataroaring pushed a commit that referenced this pull request Aug 16, 2024
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.5-merged dev/3.0.0-merged reviewed
6 participants