[improvement](mtmv) Support grouping_sets rewrite when query rewrite by materialized view #35976

Closed

@seawinde seawinde commented Jun 6, 2024

Proposed changes

This PR is replaced by #36056.

This PR depends on #35897.

Support rewriting GROUPING SETS, CUBE, and ROLLUP queries by materialized view when the materialized view's group by fields contain all the group by fields in the query.
For example, given the following materialized view definition:

    CREATE MATERIALIZED VIEW mv_1
    BUILD IMMEDIATE REFRESH AUTO ON MANUAL
    DISTRIBUTED BY RANDOM BUCKETS 2
    PROPERTIES ('replication_num' = '1')
    AS
    select
        o_orderstatus,
        o_orderdate,
        o_orderpriority,
        sum(o_totalprice) as sum_total,
        max(o_totalprice) as max_total,
        min(o_totalprice) as min_total,
        count(*) as count_all,
        bitmap_union(
            to_bitmap(
                case when o_shippriority > 1
                and o_orderkey IN (1, 3) then o_custkey else null end
            )
        ) as bitmap_union_basic
    from
        orders
    group by
        o_orderstatus,
        o_orderdate,
        o_orderpriority;

The following query can be rewritten successfully by the materialized view above:

    select o_orderstatus, o_orderdate, o_orderpriority,
       grouping_id(o_orderstatus, o_orderdate, o_orderpriority),
       grouping_id(o_orderstatus, o_orderdate),
       grouping(o_orderdate),
       sum(o_totalprice),
       max(o_totalprice),
       min(o_totalprice),
       count(*),
       count(distinct case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)
       from orders
       group by
       GROUPING SETS ((o_orderstatus, o_orderdate), (o_orderpriority), (o_orderstatus), ());
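
The containment condition described above can be sketched in Java as follows. This is only an illustration: `GroupingSetsCoverage` and `coveredByMv` are hypothetical names, not classes or methods from this PR, and the real rewrite works on plan expressions rather than column-name strings.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not the Doris implementation.
final class GroupingSetsCoverage {
    private GroupingSetsCoverage() {
    }

    /** Returns true when every column of every query grouping set is also an MV group by column. */
    static boolean coveredByMv(Set<String> mvGroupBy, List<Set<String>> queryGroupingSets) {
        for (Set<String> groupingSet : queryGroupingSets) {
            if (!mvGroupBy.containsAll(groupingSet)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> mvGroupBy = Set.of("o_orderstatus", "o_orderdate", "o_orderpriority");
        // Grouping sets of the example query; the empty set is the grand total ().
        List<Set<String>> querySets = List.of(
                Set.of("o_orderstatus", "o_orderdate"),
                Set.of("o_orderpriority"),
                Set.of("o_orderstatus"),
                Set.<String>of());
        System.out.println(coveredByMv(mvGroupBy, querySets));
        // o_clerk is not a group by column of mv_1, so this set is not covered.
        System.out.println(coveredByMv(mvGroupBy, List.of(Set.of("o_clerk"))));
    }
}
```

Running `main` prints `true` for the grouping sets of the example query and `false` for the set that uses a column outside the materialized view's group by list.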

If the query's group by fields are a subset of the materialized view's group by fields and the query's aggregate functions extend RollupTrait, the query can also be rewritten successfully, as in the following query.
This also applies to CUBE and ROLLUP.

       select o_orderstatus, o_orderdate,
       grouping_id(o_orderstatus, o_orderdate),
       grouping(o_orderdate),
       sum(o_totalprice),
       max(o_totalprice),
       min(o_totalprice),
       count(*),
       count(distinct case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)
       from orders
       group by
       GROUPING SETS ((o_orderstatus, o_orderdate), (o_orderdate),());
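
To illustrate why the aggregates must be able to roll up, here is a minimal Java sketch that re-aggregates pre-aggregated mv_1 rows for one grouping set: `sum` is re-summed, `max` and `min` are re-applied, and `count(*)` becomes a sum of `count_all`. All names (`MvRollupSketch`, `MvRow`, `Agg`, `rollUp`) are hypothetical, `o_totalprice` is simplified to `long`, and the bitmap column is left out; this is not the code in this PR.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical types, not Doris classes.
final class MvRollupSketch {
    /** One pre-aggregated row of mv_1 (bitmap column omitted for brevity). */
    static final class MvRow {
        final String status;
        final String date;
        final String priority;
        final long sumTotal;
        final long maxTotal;
        final long minTotal;
        final long countAll;

        MvRow(String status, String date, String priority,
                long sumTotal, long maxTotal, long minTotal, long countAll) {
            this.status = status;
            this.date = date;
            this.priority = priority;
            this.sumTotal = sumTotal;
            this.maxTotal = maxTotal;
            this.minTotal = minTotal;
            this.countAll = countAll;
        }
    }

    /** Rolled-up aggregates for one output group. */
    static final class Agg {
        long sum;
        long max = Long.MIN_VALUE;
        long min = Long.MAX_VALUE;
        long count;
    }

    /** Re-aggregates MV rows for one grouping set; dropped keys are reported as null. */
    static Map<List<String>, Agg> rollUp(List<MvRow> rows,
            boolean keepStatus, boolean keepDate, boolean keepPriority) {
        Map<List<String>, Agg> result = new LinkedHashMap<>();
        for (MvRow row : rows) {
            List<String> key = Arrays.asList(
                    keepStatus ? row.status : null,
                    keepDate ? row.date : null,
                    keepPriority ? row.priority : null);
            Agg agg = result.computeIfAbsent(key, k -> new Agg());
            agg.sum += row.sumTotal;                       // sum rolls up as sum of sums
            agg.max = Math.max(agg.max, row.maxTotal);     // max rolls up as max of maxes
            agg.min = Math.min(agg.min, row.minTotal);     // min rolls up as min of mins
            agg.count += row.countAll;                     // count(*) rolls up as sum of counts
        }
        return result;
    }
}
```

Calling `rollUp(rows, true, true, false)` corresponds to the grouping set `(o_orderstatus, o_orderdate)`, and `rollUp(rows, false, false, false)` to the grand total `()`. An aggregate that cannot be merged from partial results this way would not qualify for the rewrite.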

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@seawinde

seawinde commented Jun 6, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 39936 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit be6fe8a95294fd1cd29ace4e680e6952c00d9ca5, data reload: false

------ Round 1 ----------------------------------
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	18062	4475	4342	4342
q2	2624	193	192	192
q3	11482	1230	1090	1090
q4	10880	853	760	760
q5	7505	2708	2663	2663
q6	226	137	137	137
q7	958	623	613	613
q8	9638	2072	2103	2072
q9	9034	6500	6459	6459
q10	9139	3736	3695	3695
q11	460	250	239	239
q12	523	221	224	221
q13	17768	2992	2979	2979
q14	273	220	221	220
q15	504	469	475	469
q16	517	375	378	375
q17	954	699	655	655
q18	8041	7444	7399	7399
q19	4719	1473	1414	1414
q20	667	316	317	316
q21	4956	3287	4014	3287
q22	390	345	339	339
Total cold run time: 119320 ms
Total hot run time: 39936 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	4348	4225	4197	4197
q2	382	257	270	257
q3	3011	2728	2738	2728
q4	1841	1630	1566	1566
q5	5230	5271	5299	5271
q6	214	128	127	127
q7	2072	1679	1720	1679
q8	3171	3285	3312	3285
q9	8309	8406	8278	8278
q10	3892	3693	3632	3632
q11	584	479	479	479
q12	770	590	610	590
q13	17340	2982	2987	2982
q14	300	284	258	258
q15	539	484	489	484
q16	471	403	423	403
q17	1784	1512	1476	1476
q18	7675	7560	7332	7332
q19	1736	1612	1579	1579
q20	1996	1786	1787	1786
q21	4805	4676	4718	4676
q22	634	529	530	529
Total cold run time: 71104 ms
Total hot run time: 53594 ms

@doris-robot

TPC-DS: Total hot run time: 168980 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit be6fe8a95294fd1cd29ace4e680e6952c00d9ca5, data reload: false

query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
query1	931	378	368	368
query2	6449	2388	2314	2314
query3	6651	212	210	210
query4	20799	17257	17129	17129
query5	4115	469	480	469
query6	244	160	153	153
query7	4593	299	293	293
query8	329	290	281	281
query9	8518	2425	2388	2388
query10	444	289	277	277
query11	10561	9963	9918	9918
query12	125	83	81	81
query13	1628	368	370	368
query14	10299	6984	7516	6984
query15	233	183	181	181
query16	7807	268	269	268
query17	1543	523	512	512
query18	1866	270	287	270
query19	194	151	158	151
query20	89	83	82	82
query21	212	135	127	127
query22	4486	4047	4073	4047
query23	33592	32982	33082	32982
query24	12090	2792	2860	2792
query25	666	372	360	360
query26	1787	151	151	151
query27	2959	317	325	317
query28	7254	2025	2071	2025
query29	1096	610	607	607
query30	269	147	145	145
query31	955	745	726	726
query32	87	52	51	51
query33	761	280	275	275
query34	976	460	465	460
query35	755	632	646	632
query36	1094	947	915	915
query37	198	72	80	72
query38	2824	2756	2744	2744
query39	885	797	790	790
query40	282	124	125	124
query41	56	55	52	52
query42	124	127	97	97
query43	583	558	574	558
query44	1246	730	730	730
query45	196	172	166	166
query46	1079	739	713	713
query47	1826	1774	1778	1774
query48	371	295	294	294
query49	1200	403	417	403
query50	788	389	406	389
query51	6754	6656	6736	6656
query52	99	97	88	88
query53	357	284	291	284
query54	983	448	436	436
query55	76	74	73	73
query56	280	247	249	247
query57	1128	1034	1082	1034
query58	244	264	244	244
query59	3341	3255	3020	3020
query60	315	267	271	267
query61	91	89	94	89
query62	649	437	435	435
query63	325	289	288	288
query64	9867	2230	1750	1750
query65	3151	3072	3112	3072
query66	1342	329	332	329
query67	15650	15073	14904	14904
query68	4918	546	550	546
query69	571	429	383	383
query70	1157	1057	1098	1057
query71	437	273	266	266
query72	7044	2751	2550	2550
query73	764	324	332	324
query74	5982	5474	5475	5474
query75	3564	2650	2671	2650
query76	3399	963	845	845
query77	661	304	295	295
query78	10294	9888	9626	9626
query79	2160	527	507	507
query80	2057	468	450	450
query81	594	213	221	213
query82	1019	101	103	101
query83	292	168	168	168
query84	267	87	83	83
query85	1562	297	264	264
query86	475	315	323	315
query87	3285	3078	3089	3078
query88	4200	2376	2357	2357
query89	481	383	377	377
query90	1783	198	193	193
query91	128	100	96	96
query92	61	48	48	48
query93	2260	516	502	502
query94	1210	193	193	193
query95	407	317	380	317
query96	588	265	264	264
query97	3180	2988	2984	2984
query98	227	201	193	193
query99	1132	830	843	830
Total cold run time: 278926 ms
Total hot run time: 168980 ms

@doris-robot

ClickBench: Total hot run time: 30.17 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit be6fe8a95294fd1cd29ace4e680e6952c00d9ca5, data reload: false

query	cold_s	hot1_s	hot2_s
query1	0.03	0.03	0.03
query2	0.08	0.04	0.03
query3	0.23	0.05	0.05
query4	1.67	0.09	0.08
query5	0.51	0.49	0.50
query6	1.12	0.72	0.74
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.54	0.49	0.50
query10	0.56	0.54	0.55
query11	0.15	0.11	0.12
query12	0.15	0.12	0.12
query13	0.60	0.60	0.61
query14	0.76	0.78	0.78
query15	0.84	0.81	0.81
query16	0.39	0.36	0.37
query17	1.01	0.96	0.97
query18	0.22	0.25	0.26
query19	1.87	1.71	1.70
query20	0.01	0.01	0.01
query21	15.62	0.67	0.66
query22	4.08	7.79	1.63
query23	18.23	1.34	1.26
query24	2.06	0.22	0.22
query25	0.15	0.09	0.07
query26	0.27	0.18	0.17
query27	0.08	0.08	0.08
query28	13.29	1.01	1.00
query29	13.21	3.28	3.26
query30	0.24	0.07	0.06
query31	2.87	0.40	0.39
query32	3.27	0.50	0.46
query33	2.81	2.88	2.87
query34	17.10	4.40	4.40
query35	4.46	4.43	4.51
query36	0.65	0.48	0.46
query37	0.18	0.15	0.16
query38	0.15	0.13	0.15
query39	0.04	0.03	0.04
query40	0.18	0.14	0.14
query41	0.09	0.05	0.04
query42	0.06	0.05	0.05
query43	0.04	0.03	0.04
Total cold run time: 109.94 s
Total hot run time: 30.17 s

return true;
}
// not supported if both query and view have group sets, or if the view has them and the query doesn't
if ((queryHasGroupSets && viewHasGroupSets) || (!queryHasGroupSets && viewHasGroupSets)) {

Suggested change
- if ((queryHasGroupSets && viewHasGroupSets) || (!queryHasGroupSets && viewHasGroupSets)) {
+ if (viewHasGroupSets) {
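
The suggested simplification holds because the two branches of the original condition differ only in `queryHasGroupSets`, so that flag cancels out. A small Java check over all four input combinations, written only to illustrate the point (`GroupSetsConditionCheck` is a hypothetical class, not part of this PR), could look like this:

```java
// Illustrative check only; not part of the PR.
final class GroupSetsConditionCheck {
    private GroupSetsConditionCheck() {
    }

    /** The condition as written in the diff under review. */
    static boolean original(boolean queryHasGroupSets, boolean viewHasGroupSets) {
        return (queryHasGroupSets && viewHasGroupSets) || (!queryHasGroupSets && viewHasGroupSets);
    }

    /** The condition proposed in the review suggestion. */
    static boolean simplified(boolean queryHasGroupSets, boolean viewHasGroupSets) {
        return viewHasGroupSets;
    }

    /** Compares both conditions on every combination of the two flags. */
    static boolean equivalentForAllInputs() {
        boolean[] values = {false, true};
        for (boolean queryHasGroupSets : values) {
            for (boolean viewHasGroupSets : values) {
                if (original(queryHasGroupSets, viewHasGroupSets)
                        != simplified(queryHasGroupSets, viewHasGroupSets)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equivalentForAllInputs());
    }
}
```

Running `main` prints `true`, which confirms that the original condition depends only on `viewHasGroupSets`.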

Comment on lines +278 to +280
if (!queryHasGroupSets && !viewHasGroupSets) {
return true;
}

remove this if

@seawinde seawinde closed this Jun 7, 2024