[opt](serde) Optimize the filling of fixed values into block columns without repeated deserialization. #37377

Merged (5 commits) on Jul 9, 2024

hubgeter (Contributor) commented Jul 6, 2024

Proposed changes

When querying a partitioned table, the value of a partition column is fixed for every row read from a given partition. We can therefore deserialize the value once and then insert it into the block repeatedly, instead of deserializing it again for each row.
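
The idea can be illustrated with a minimal Python sketch. This is not the Doris implementation (which is C++ in the backend); `CountingParser`, `fill_per_row`, and `fill_once` are hypothetical names used only to contrast per-row deserialization with deserialize-once-then-fill.

```python
from typing import Callable, List


class CountingParser:
    """Hypothetical helper: wraps a parse function and counts its calls."""

    def __init__(self, parse: Callable[[str], object]) -> None:
        self._parse = parse
        self.calls = 0

    def __call__(self, raw: str) -> object:
        self.calls += 1
        return self._parse(raw)


def fill_per_row(raw: str, rows: int, parse: Callable[[str], object]) -> List[object]:
    """Baseline: deserialize the same partition value once for every row."""
    return [parse(raw) for _ in range(rows)]


def fill_once(raw: str, rows: int, parse: Callable[[str], object]) -> List[object]:
    """Optimized: deserialize once, then repeat the value for every row."""
    value = parse(raw)
    return [value] * rows


if __name__ == "__main__":
    slow, fast = CountingParser(int), CountingParser(int)
    # Both strategies produce the same column contents.
    assert fill_per_row("1", 4096, slow) == fill_once("1", 4096, fast)
    print(slow.calls, fast.calls)
```

Both functions return the same column, but the baseline calls the parser once per row (4096 times here) while the optimized version calls it once. The more expensive the parse step, the larger the saving.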

In Hive:
CREATE TABLE parquet_partition_tb (
    col1 STRING,
    col2 INT,
    col3 DOUBLE
) PARTITIONED BY (
    partition_col1 STRING,
    partition_col2 INT
)
STORED AS PARQUET;

insert into parquet_partition_tb partition (partition_col1="hello", partition_col2=1) values ("word", 2, 2.3);

insert into parquet_partition_tb partition (partition_col1="hello", partition_col2=1)
select col1, col2, col3 from parquet_partition_tb where partition_col1="hello" and partition_col2=1;

Repeat the `insert into ... select ...` statement several times; each run doubles the number of rows in the partition.


In Doris:
Before:
mysql>  select count(partition_col1) from parquet_partition_tb;
+-----------------------+
| count(partition_col1) |
+-----------------------+
|              33554432 |
+-----------------------+
1 row in set (3.24 sec)

mysql>  select count(partition_col2) from parquet_partition_tb;
+-----------------------+
| count(partition_col2) |
+-----------------------+
|              33554432 |
+-----------------------+
1 row in set (3.34 sec)


After:
mysql>  select count(partition_col1) from parquet_partition_tb ;
+-----------------------+
| count(partition_col1) |
+-----------------------+
|              33554432 |
+-----------------------+
1 row in set (0.79 sec)

mysql> select count(partition_col2) from parquet_partition_tb;
+-----------------------+
| count(partition_col2) |
+-----------------------+
|              33554432 |
+-----------------------+
1 row in set (0.51 sec)

Summary:

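Test SQL: `select count(partition_col) from tbl;`
Number of rows: 33554432

| Partition column type | Before (s) | After (s) |
|---|---|---|
| boolean | 3.96 | 0.47 |
| tinyint | 3.39 | 0.47 |
| smallint | 3.14 | 0.50 |
| int | 3.34 | 0.51 |
| bigint | 3.61 | 0.51 |
| float | 4.59 | 0.51 |
| double | 4.60 | 0.55 |
| decimal(5,2) | 3.96 | 0.61 |
| date | 5.80 | 0.52 |
| timestamp | 7.68 | 0.52 |
| string | 3.24 | 0.79 |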
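
To make the gains easier to compare across types, the before/after timings in the summary can be turned into speedup ratios. The following is a small sketch; the `TIMINGS` dictionary simply restates the numbers above, and `speedups` is a hypothetical helper name.

```python
# Timings in seconds, copied from the summary: (before, after).
TIMINGS = {
    "boolean": (3.96, 0.47),
    "tinyint": (3.39, 0.47),
    "smallint": (3.14, 0.50),
    "int": (3.34, 0.51),
    "bigint": (3.61, 0.51),
    "float": (4.59, 0.51),
    "double": (4.60, 0.55),
    "decimal(5,2)": (3.96, 0.61),
    "date": (5.80, 0.52),
    "timestamp": (7.68, 0.52),
    "string": (3.24, 0.79),
}


def speedups(timings):
    """Return before/after ratio for each partition column type."""
    return {name: before / after for name, (before, after) in timings.items()}


if __name__ == "__main__":
    # Print types from smallest to largest speedup.
    for name, ratio in sorted(speedups(TIMINGS).items(), key=lambda kv: kv[1]):
        print(f"{name:<13} {ratio:5.1f}x")
```

With these numbers the ratio ranges from about 4.1x (string) to about 14.8x (timestamp). The types that were slowest before the change (date, timestamp) benefit the most.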

Issue Number: close #xxx

hubgeter (Contributor Author) commented Jul 6, 2024

run buildall

github-actions bot (Contributor) commented Jul 6, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40091 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d4bc530af61f63b9da46d9c9a668f6562b3d9102, data reload: false

------ Round 1 ----------------------------------
q1	17636	4689	4369	4369
q2	2023	196	188	188
q3	10478	1210	1147	1147
q4	10180	785	935	785
q5	7494	2710	2634	2634
q6	225	138	141	138
q7	966	608	615	608
q8	9242	2116	2132	2116
q9	9124	6538	6538	6538
q10	8940	3734	3757	3734
q11	449	240	242	240
q12	435	236	235	235
q13	17769	3008	2997	2997
q14	266	236	243	236
q15	540	484	486	484
q16	513	376	374	374
q17	994	646	668	646
q18	8081	7504	7332	7332
q19	7782	1513	1405	1405
q20	659	325	329	325
q21	4949	3225	3898	3225
q22	399	341	335	335
Total cold run time: 119144 ms
Total hot run time: 40091 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4454	4292	4300	4292
q2	383	276	260	260
q3	3108	2922	2943	2922
q4	1967	1737	1784	1737
q5	5527	5506	5418	5418
q6	235	138	135	135
q7	2287	1868	1824	1824
q8	3310	3493	3448	3448
q9	8679	8883	8815	8815
q10	4103	3711	3786	3711
q11	596	500	515	500
q12	818	643	622	622
q13	16013	3177	3197	3177
q14	326	277	283	277
q15	539	489	492	489
q16	463	423	443	423
q17	1858	1558	1508	1508
q18	8092	8128	7834	7834
q19	1787	1825	1652	1652
q20	2251	1861	1869	1861
q21	5046	5006	4836	4836
q22	604	552	554	552
Total cold run time: 72446 ms
Total hot run time: 56293 ms
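
The benchmark tables do not label their columns. The Round 1 numbers above are consistent with reading each row as: query, cold run, two hot runs, and the faster of the two hot runs, with the totals being the sums of the cold column and the last column. This reading is inferred from the arithmetic, not stated by the bot; the sketch below (with the hypothetical helper `totals`) checks it against the Round 1 rows.

```python
# Round 1 rows copied from the TPC-H result above (times in ms).
ROUND_1 = """\
q1 17636 4689 4369 4369
q2 2023 196 188 188
q3 10478 1210 1147 1147
q4 10180 785 935 785
q5 7494 2710 2634 2634
q6 225 138 141 138
q7 966 608 615 608
q8 9242 2116 2132 2116
q9 9124 6538 6538 6538
q10 8940 3734 3757 3734
q11 449 240 242 240
q12 435 236 235 235
q13 17769 3008 2997 2997
q14 266 236 243 236
q15 540 484 486 484
q16 513 376 374 374
q17 994 646 668 646
q18 8081 7504 7332 7332
q19 7782 1513 1405 1405
q20 659 325 329 325
q21 4949 3225 3898 3225
q22 399 341 335 335
"""


def totals(table: str):
    """Sum the cold column and the best-hot column of a result table."""
    cold_total = hot_total = 0
    for line in table.splitlines():
        _, cold, hot1, hot2, best = line.split()
        # Assumed meaning of the last column: the faster of the two hot runs.
        assert int(best) == min(int(hot1), int(hot2))
        cold_total += int(cold)
        hot_total += int(best)
    return cold_total, hot_total


if __name__ == "__main__":
    print(totals(ROUND_1))
```

For the Round 1 rows this yields 119144 ms cold and 40091 ms hot, matching the totals reported by the bot.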

@doris-robot

TPC-DS: Total hot run time: 172692 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d4bc530af61f63b9da46d9c9a668f6562b3d9102, data reload: false

query1	917	390	368	368
query2	7546	2288	2435	2288
query3	6637	209	216	209
query4	27666	17125	17209	17125
query5	3682	461	500	461
query6	313	177	169	169
query7	4607	294	295	294
query8	337	282	288	282
query9	8500	2360	2353	2353
query10	562	299	280	280
query11	11914	9859	10036	9859
query12	118	83	81	81
query13	1636	389	368	368
query14	10054	7043	7812	7043
query15	236	187	184	184
query16	7477	302	303	302
query17	1355	549	512	512
query18	1934	275	265	265
query19	198	157	160	157
query20	89	82	82	82
query21	207	134	127	127
query22	4301	4133	3928	3928
query23	33932	33400	33435	33400
query24	11135	2882	2897	2882
query25	622	394	382	382
query26	1159	160	155	155
query27	2330	324	330	324
query28	7256	2112	2116	2112
query29	914	618	639	618
query30	264	153	151	151
query31	998	764	734	734
query32	99	55	54	54
query33	737	309	308	308
query34	963	496	507	496
query35	758	637	640	637
query36	1140	993	1006	993
query37	143	80	91	80
query38	2918	2887	2766	2766
query39	923	854	832	832
query40	216	133	132	132
query41	54	53	51	51
query42	116	101	101	101
query43	597	566	574	566
query44	1253	740	758	740
query45	197	171	158	158
query46	1078	753	724	724
query47	1890	1755	1795	1755
query48	384	297	308	297
query49	845	413	419	413
query50	789	400	393	393
query51	6871	6728	6811	6728
query52	111	94	94	94
query53	352	289	285	285
query54	884	449	456	449
query55	75	76	76	76
query56	288	264	265	264
query57	1123	1029	1047	1029
query58	245	251	265	251
query59	3437	3136	3229	3136
query60	316	272	275	272
query61	99	95	98	95
query62	604	438	450	438
query63	325	322	289	289
query64	9472	2158	1661	1661
query65	3223	3182	3119	3119
query66	735	329	336	329
query67	15442	14911	14993	14911
query68	4481	536	535	535
query69	578	454	365	365
query70	1187	1081	1096	1081
query71	392	285	291	285
query72	6953	5653	5653	5653
query73	756	320	324	320
query74	5918	5502	5467	5467
query75	3374	2703	2673	2673
query76	2287	962	926	926
query77	632	314	305	305
query78	9481	8918	8880	8880
query79	3198	530	524	524
query80	2367	467	471	467
query81	603	220	225	220
query82	1386	115	113	113
query83	309	174	172	172
query84	273	98	88	88
query85	1356	329	379	329
query86	465	309	313	309
query87	3305	3041	3078	3041
query88	4257	2459	2449	2449
query89	496	392	373	373
query90	1741	190	193	190
query91	132	105	103	103
query92	60	52	52	52
query93	4417	505	506	505
query94	1114	212	213	212
query95	404	323	322	322
query96	605	277	270	270
query97	3172	3032	3020	3020
query98	214	198	193	193
query99	1161	848	833	833
Total cold run time: 284672 ms
Total hot run time: 172692 ms

@doris-robot

ClickBench: Total hot run time: 30.23 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d4bc530af61f63b9da46d9c9a668f6562b3d9102, data reload: false

query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.23	0.06	0.05
query4	1.66	0.08	0.07
query5	0.48	0.47	0.48
query6	1.14	0.72	0.73
query7	0.02	0.01	0.02
query8	0.06	0.04	0.04
query9	0.56	0.49	0.49
query10	0.53	0.55	0.54
query11	0.15	0.11	0.11
query12	0.15	0.13	0.12
query13	0.59	0.59	0.58
query14	0.77	0.77	0.76
query15	0.84	0.81	0.81
query16	0.37	0.37	0.36
query17	1.00	1.04	0.96
query18	0.22	0.24	0.26
query19	1.78	1.67	1.71
query20	0.01	0.01	0.02
query21	15.39	0.75	0.64
query22	4.06	7.74	1.74
query23	18.30	1.41	1.23
query24	2.20	0.22	0.23
query25	0.16	0.09	0.08
query26	0.30	0.21	0.20
query27	0.46	0.23	0.23
query28	13.15	1.02	1.00
query29	12.61	3.28	3.26
query30	0.25	0.06	0.06
query31	2.86	0.38	0.38
query32	3.29	0.48	0.46
query33	2.93	2.86	2.85
query34	16.78	4.37	4.36
query35	4.42	4.38	4.46
query36	0.64	0.47	0.47
query37	0.20	0.15	0.15
query38	0.15	0.14	0.14
query39	0.04	0.03	0.03
query40	0.16	0.13	0.13
query41	0.09	0.05	0.05
query42	0.05	0.04	0.05
query43	0.05	0.04	0.04
Total cold run time: 109.22 s
Total hot run time: 30.23 s

hubgeter (Contributor Author) commented Jul 7, 2024

run buildall

github-actions bot (Contributor) commented Jul 7, 2024

clang-tidy review says "All clean, LGTM! 👍"

hubgeter (Contributor Author) commented Jul 7, 2024

run buildall

github-actions bot (Contributor) commented Jul 7, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40231 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 23a7dc5059cd3c23630ded99e6048845875e7fe0, data reload: false

------ Round 1 ----------------------------------
q1	17604	4415	4333	4333
q2	2018	194	187	187
q3	10454	1146	1079	1079
q4	10179	823	897	823
q5	7508	2694	2667	2667
q6	223	134	136	134
q7	972	596	605	596
q8	9223	2098	2143	2098
q9	8826	6565	6515	6515
q10	9004	3752	3741	3741
q11	451	246	246	246
q12	441	230	222	222
q13	17767	3012	3011	3011
q14	275	223	232	223
q15	534	486	477	477
q16	505	390	374	374
q17	973	699	681	681
q18	8107	7543	7423	7423
q19	5625	1538	1557	1538
q20	699	338	337	337
q21	4939	3187	3869	3187
q22	400	339	340	339
Total cold run time: 116727 ms
Total hot run time: 40231 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4428	4252	4277	4252
q2	387	272	268	268
q3	2952	2845	2844	2844
q4	2034	1707	1763	1707
q5	5689	5534	5466	5466
q6	225	136	129	129
q7	2202	1898	1851	1851
q8	3305	3462	3463	3462
q9	8878	8894	8910	8894
q10	4175	3795	3793	3793
q11	604	495	508	495
q12	821	645	654	645
q13	16384	3179	3262	3179
q14	309	294	309	294
q15	521	491	489	489
q16	503	449	440	440
q17	1880	1530	1507	1507
q18	8126	8119	7895	7895
q19	1794	1684	1635	1635
q20	2148	1883	1838	1838
q21	5091	4794	4868	4794
q22	691	585	576	576
Total cold run time: 73147 ms
Total hot run time: 56453 ms

@doris-robot

TPC-DS: Total hot run time: 173854 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 23a7dc5059cd3c23630ded99e6048845875e7fe0, data reload: false

query1	925	379	373	373
query2	6464	2523	2479	2479
query3	6633	204	216	204
query4	27778	17439	17168	17168
query5	3606	490	484	484
query6	254	168	159	159
query7	4591	290	286	286
query8	344	302	303	302
query9	8772	2371	2364	2364
query10	584	308	290	290
query11	11510	10242	10145	10145
query12	126	88	82	82
query13	1660	400	381	381
query14	10114	7990	7432	7432
query15	252	188	192	188
query16	7833	328	327	327
query17	1783	564	545	545
query18	2009	289	285	285
query19	209	154	155	154
query20	89	84	85	84
query21	214	131	134	131
query22	4315	4217	4008	4008
query23	34194	33904	33667	33667
query24	11130	2913	2861	2861
query25	603	437	413	413
query26	711	162	159	159
query27	2312	328	327	327
query28	5745	2185	2122	2122
query29	912	664	651	651
query30	260	154	152	152
query31	986	760	747	747
query32	105	58	64	58
query33	765	325	318	318
query34	957	485	503	485
query35	833	640	643	640
query36	1162	916	971	916
query37	149	79	85	79
query38	2963	2869	2859	2859
query39	883	829	844	829
query40	200	122	122	122
query41	53	50	53	50
query42	111	97	97	97
query43	579	554	576	554
query44	1228	734	737	734
query45	201	159	160	159
query46	1082	728	703	703
query47	1864	1816	1795	1795
query48	372	307	292	292
query49	852	415	417	415
query50	776	384	380	380
query51	6872	6852	6683	6683
query52	107	95	93	93
query53	360	281	283	281
query54	898	453	447	447
query55	74	71	73	71
query56	281	287	272	272
query57	1142	1073	1069	1069
query58	256	252	255	252
query59	3615	3248	3346	3248
query60	323	279	287	279
query61	99	95	94	94
query62	620	447	437	437
query63	322	289	296	289
query64	9154	2162	1660	1660
query65	3191	3224	3110	3110
query66	757	324	324	324
query67	15837	15006	15123	15006
query68	5602	533	534	533
query69	691	458	344	344
query70	1136	1133	1061	1061
query71	451	286	277	277
query72	8607	5645	5255	5255
query73	802	321	334	321
query74	6104	5565	5480	5480
query75	4304	2625	2688	2625
query76	3906	959	986	959
query77	672	325	296	296
query78	10342	9659	8816	8816
query79	8597	518	514	514
query80	1188	481	471	471
query81	592	229	223	223
query82	775	104	103	103
query83	290	166	166	166
query84	278	90	85	85
query85	1354	310	308	308
query86	343	323	371	323
query87	3312	3125	3094	3094
query88	4608	2475	2454	2454
query89	540	399	385	385
query90	1839	188	188	188
query91	133	104	103	103
query92	59	49	50	49
query93	6416	516	508	508
query94	956	213	212	212
query95	401	320	318	318
query96	605	271	268	268
query97	3167	3026	3033	3026
query98	213	195	205	195
query99	1177	824	844	824
Total cold run time: 295421 ms
Total hot run time: 173854 ms

@doris-robot

ClickBench: Total hot run time: 31 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 23a7dc5059cd3c23630ded99e6048845875e7fe0, data reload: false

query1	0.04	0.03	0.04
query2	0.08	0.04	0.03
query3	0.23	0.06	0.05
query4	1.67	0.06	0.06
query5	0.52	0.48	0.49
query6	1.13	0.73	0.74
query7	0.02	0.02	0.02
query8	0.05	0.04	0.04
query9	0.55	0.49	0.49
query10	0.56	0.54	0.54
query11	0.16	0.11	0.11
query12	0.15	0.12	0.13
query13	0.59	0.58	0.58
query14	0.77	0.80	0.76
query15	0.84	0.82	0.81
query16	0.36	0.37	0.37
query17	0.99	0.96	0.98
query18	0.25	0.24	0.25
query19	1.77	1.68	1.76
query20	0.01	0.00	0.00
query21	15.39	0.78	0.67
query22	4.87	6.03	2.36
query23	18.27	1.30	1.28
query24	2.11	0.23	0.22
query25	0.16	0.09	0.09
query26	0.30	0.20	0.20
query27	0.46	0.23	0.22
query28	13.27	1.02	1.00
query29	12.57	3.31	3.25
query30	0.25	0.06	0.05
query31	2.85	0.39	0.38
query32	3.30	0.48	0.47
query33	2.97	2.98	2.91
query34	17.15	4.34	4.39
query35	4.40	4.42	4.47
query36	0.64	0.49	0.48
query37	0.18	0.15	0.15
query38	0.14	0.14	0.15
query39	0.04	0.03	0.04
query40	0.16	0.13	0.13
query41	0.09	0.06	0.04
query42	0.06	0.04	0.05
query43	0.04	0.04	0.03
Total cold run time: 110.41 s
Total hot run time: 31 s

hubgeter (Contributor Author) commented Jul 8, 2024

run p1

hubgeter (Contributor Author) commented Jul 8, 2024

run buildall

github-actions bot (Contributor) commented Jul 8, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40865 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 28fe6a842715020d74fb7bfe46dedbf1c2ee4500, data reload: false

------ Round 1 ----------------------------------
q1	17645	4443	4292	4292
q2	2026	192	190	190
q3	10453	1187	1087	1087
q4	10192	819	813	813
q5	7490	2685	2662	2662
q6	225	144	143	143
q7	966	618	626	618
q8	9231	2079	2106	2079
q9	8964	6531	6518	6518
q10	9023	3760	3743	3743
q11	477	241	245	241
q12	564	243	235	235
q13	18989	3008	3022	3008
q14	268	242	233	233
q15	520	492	500	492
q16	522	383	377	377
q17	978	704	729	704
q18	8181	7667	7348	7348
q19	1683	1562	1463	1463
q20	652	321	335	321
q21	5155	3976	3952	3952
q22	425	346	357	346
Total cold run time: 114629 ms
Total hot run time: 40865 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4329	4251	4241	4241
q2	359	274	264	264
q3	2998	2717	2713	2713
q4	1925	1651	1704	1651
q5	5318	5296	5340	5296
q6	218	134	135	134
q7	2150	1788	1743	1743
q8	3238	3404	3363	3363
q9	8357	8383	8386	8383
q10	3938	3683	3736	3683
q11	584	501	486	486
q12	839	622	609	609
q13	17455	3036	3013	3013
q14	287	264	274	264
q15	528	474	471	471
q16	478	425	418	418
q17	1790	1479	1488	1479
q18	7624	7632	7272	7272
q19	2457	1641	1551	1551
q20	2010	1756	1812	1756
q21	4932	4794	4876	4794
q22	648	575	551	551
Total cold run time: 72462 ms
Total hot run time: 54135 ms

@doris-robot

TPC-DS: Total hot run time: 173943 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 28fe6a842715020d74fb7bfe46dedbf1c2ee4500, data reload: false

query1	913	372	371	371
query2	6462	2333	2462	2333
query3	6655	206	216	206
query4	28378	17585	17338	17338
query5	4222	476	477	476
query6	251	195	168	168
query7	4600	291	286	286
query8	311	297	306	297
query9	8598	2409	2400	2400
query10	630	308	282	282
query11	10697	10186	10056	10056
query12	145	89	88	88
query13	1662	402	391	391
query14	10443	7806	7722	7722
query15	246	192	177	177
query16	7765	322	312	312
query17	1807	564	548	548
query18	1820	290	287	287
query19	200	163	162	162
query20	91	82	88	82
query21	218	143	130	130
query22	4352	3996	3955	3955
query23	33847	33203	33054	33054
query24	8792	2798	2802	2798
query25	613	388	385	385
query26	702	150	153	150
query27	2194	276	279	276
query28	5687	2104	2087	2087
query29	909	651	634	634
query30	295	149	150	149
query31	987	755	750	750
query32	99	54	55	54
query33	690	308	308	308
query34	878	504	499	499
query35	688	628	594	594
query36	1086	940	938	938
query37	133	83	81	81
query38	2983	2772	2746	2746
query39	860	812	814	812
query40	206	120	121	120
query41	54	51	52	51
query42	117	101	99	99
query43	576	578	547	547
query44	1124	747	754	747
query45	194	164	159	159
query46	1082	732	728	728
query47	1862	1761	1767	1761
query48	383	321	294	294
query49	1042	413	406	406
query50	782	404	415	404
query51	6898	6845	6740	6740
query52	109	89	97	89
query53	364	294	294	294
query54	847	484	446	446
query55	77	73	76	73
query56	288	266	267	266
query57	1151	1022	1048	1022
query58	256	252	259	252
query59	3572	3089	3287	3089
query60	297	274	279	274
query61	95	98	101	98
query62	808	644	617	617
query63	330	291	286	286
query64	9250	2178	1631	1631
query65	3185	3117	3121	3117
query66	822	344	325	325
query67	15695	15073	15136	15073
query68	4688	549	538	538
query69	569	337	331	331
query70	1189	1140	1118	1118
query71	393	287	279	279
query72	8473	5585	5312	5312
query73	749	328	326	326
query74	5912	5565	5633	5565
query75	4273	2616	2693	2616
query76	3032	960	973	960
query77	726	313	306	306
query78	9590	8987	10229	8987
query79	4455	518	530	518
query80	2781	522	475	475
query81	599	221	220	220
query82	938	141	133	133
query83	314	166	169	166
query84	271	88	89	88
query85	2115	316	299	299
query86	480	330	295	295
query87	3299	3105	3177	3105
query88	4066	2441	2439	2439
query89	490	385	385	385
query90	1952	188	187	187
query91	129	105	104	104
query92	108	49	49	49
query93	3705	513	514	513
query94	1395	212	213	212
query95	415	322	308	308
query96	598	274	270	270
query97	3187	3049	3006	3006
query98	209	207	192	192
query99	1489	1260	1251	1251
Total cold run time: 285964 ms
Total hot run time: 173943 ms

@doris-robot

ClickBench: Total hot run time: 31.26 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 28fe6a842715020d74fb7bfe46dedbf1c2ee4500, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.04	0.05
query3	0.22	0.05	0.05
query4	1.68	0.07	0.07
query5	0.50	0.48	0.48
query6	1.13	0.74	0.72
query7	0.01	0.01	0.01
query8	0.06	0.05	0.05
query9	0.55	0.49	0.50
query10	0.54	0.55	0.55
query11	0.16	0.12	0.11
query12	0.15	0.12	0.12
query13	0.59	0.60	0.58
query14	0.76	0.77	0.78
query15	0.85	0.83	0.81
query16	0.36	0.38	0.37
query17	1.00	0.97	1.02
query18	0.22	0.22	0.22
query19	1.79	1.70	1.66
query20	0.01	0.00	0.00
query21	15.39	0.77	0.64
query22	4.41	6.44	2.58
query23	18.26	1.32	1.27
query24	2.13	0.24	0.23
query25	0.17	0.09	0.09
query26	0.29	0.21	0.20
query27	0.45	0.24	0.23
query28	13.25	1.01	1.00
query29	12.60	3.27	3.30
query30	0.25	0.06	0.05
query31	2.86	0.38	0.38
query32	3.29	0.49	0.48
query33	2.86	2.99	2.91
query34	17.06	4.35	4.35
query35	4.44	4.44	4.43
query36	0.65	0.48	0.48
query37	0.19	0.15	0.15
query38	0.15	0.14	0.14
query39	0.05	0.03	0.04
query40	0.14	0.12	0.13
query41	0.09	0.04	0.04
query42	0.06	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.77 s
Total hot run time: 31.26 s

hubgeter (Contributor Author) commented Jul 9, 2024

run buildall

github-actions bot (Contributor) commented Jul 9, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40165 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c5a286eb7f901677c92831d7c794d993b32b3a4e, data reload: false

------ Round 1 ----------------------------------
q1	17596	4441	4286	4286
q2	2009	191	190	190
q3	10454	1212	1166	1166
q4	10191	835	723	723
q5	7476	2654	2655	2654
q6	221	138	140	138
q7	954	602	606	602
q8	9226	2104	2123	2104
q9	8941	6517	6445	6445
q10	8883	3796	3734	3734
q11	481	239	243	239
q12	454	231	237	231
q13	18025	2977	2985	2977
q14	266	225	243	225
q15	535	488	498	488
q16	529	375	374	374
q17	980	761	712	712
q18	8242	7552	7527	7527
q19	7300	1452	1508	1452
q20	697	323	336	323
q21	4923	3236	3880	3236
q22	391	339	354	339
Total cold run time: 118774 ms
Total hot run time: 40165 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4393	4254	4230	4230
q2	367	266	257	257
q3	3093	2933	2956	2933
q4	1946	1737	1720	1720
q5	5546	5543	5476	5476
q6	229	132	131	131
q7	2237	1901	1834	1834
q8	3256	3445	3449	3445
q9	8721	8822	8779	8779
q10	4133	3662	3743	3662
q11	579	490	489	489
q12	813	659	649	649
q13	16510	3168	3205	3168
q14	293	302	290	290
q15	522	477	483	477
q16	508	429	436	429
q17	1839	1542	1486	1486
q18	8127	7943	7822	7822
q19	4303	1755	1678	1678
q20	2130	1861	1851	1851
q21	5079	4784	4739	4739
q22	672	552	584	552
Total cold run time: 75296 ms
Total hot run time: 56097 ms

morningman (Contributor) left a comment

LGTM

github-actions bot added the `approved` label (indicates a PR has been approved by one committer) on Jul 9, 2024

github-actions bot (Contributor) commented Jul 9, 2024

PR approved by at least one committer and no changes requested.

@doris-robot

TPC-DS: Total hot run time: 171645 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c5a286eb7f901677c92831d7c794d993b32b3a4e, data reload: false

query1	902	367	357	357
query2	6449	2533	2426	2426
query3	6638	211	219	211
query4	27993	17324	17578	17324
query5	3635	492	480	480
query6	262	169	166	166
query7	4577	299	293	293
query8	336	291	322	291
query9	8468	2474	2448	2448
query10	577	302	294	294
query11	10621	10017	10025	10017
query12	116	84	89	84
query13	1667	373	375	373
query14	10359	7944	7420	7420
query15	298	188	190	188
query16	7589	321	327	321
query17	1777	569	534	534
query18	1632	283	287	283
query19	202	154	151	151
query20	91	83	84	83
query21	216	136	131	131
query22	4347	4290	4011	4011
query23	34154	33948	33334	33334
query24	11147	2847	2946	2847
query25	628	425	407	407
query26	1083	153	156	153
query27	2349	283	292	283
query28	7105	2168	2148	2148
query29	927	672	650	650
query30	265	157	157	157
query31	964	755	762	755
query32	101	56	57	56
query33	771	321	314	314
query34	912	521	522	521
query35	701	602	594	594
query36	1149	969	980	969
query37	153	87	86	86
query38	3035	2819	2822	2819
query39	887	874	816	816
query40	216	124	123	123
query41	56	53	53	53
query42	119	100	98	98
query43	573	561	559	559
query44	1265	799	709	709
query45	192	159	159	159
query46	1079	740	774	740
query47	1881	1759	1767	1759
query48	380	295	306	295
query49	865	405	427	405
query50	798	408	404	404
query51	7126	7083	6802	6802
query52	106	91	97	91
query53	362	292	285	285
query54	887	458	449	449
query55	74	74	74	74
query56	274	261	270	261
query57	1123	1051	1065	1051
query58	256	248	241	241
query59	3425	3140	3188	3140
query60	313	273	280	273
query61	92	94	93	93
query62	795	653	663	653
query63	317	293	292	292
query64	9355	2191	1605	1605
query65	3334	3106	3105	3105
query66	758	326	318	318
query67	15838	15007	15010	15007
query68	8291	550	540	540
query69	763	473	396	396
query70	1198	1141	1113	1113
query71	516	284	283	283
query72	8503	5781	2785	2785
query73	1373	330	331	330
query74	5961	5594	5468	5468
query75	4998	2664	2673	2664
query76	5143	899	967	899
query77	791	302	300	300
query78	9513	9039	8884	8884
query79	8231	509	513	509
query80	1048	464	510	464
query81	595	219	221	219
query82	750	133	133	133
query83	343	172	166	166
query84	275	85	89	85
query85	1371	304	300	300
query86	399	315	287	287
query87	3295	3087	3084	3084
query88	4147	2442	2456	2442
query89	541	401	388	388
query90	2018	191	194	191
query91	166	102	105	102
query92	63	49	48	48
query93	6014	501	496	496
query94	1259	211	207	207
query95	410	316	321	316
query96	614	271	273	271
query97	3154	3017	3019	3017
query98	223	203	192	192
query99	1535	1274	1227	1227
Total cold run time: 300443 ms
Total hot run time: 171645 ms

@doris-robot

ClickBench: Total hot run time: 31.53 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c5a286eb7f901677c92831d7c794d993b32b3a4e, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.05	0.06
query4	1.67	0.09	0.08
query5	0.49	0.50	0.48
query6	1.13	0.74	0.72
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.55	0.49	0.50
query10	0.55	0.54	0.54
query11	0.14	0.11	0.12
query12	0.14	0.12	0.12
query13	0.59	0.58	0.58
query14	0.77	0.78	0.78
query15	0.84	0.82	0.80
query16	0.37	0.36	0.36
query17	1.02	1.03	0.97
query18	0.23	0.22	0.21
query19	1.80	1.82	1.68
query20	0.01	0.01	0.01
query21	15.39	0.75	0.65
query22	3.95	6.35	2.80
query23	18.32	1.36	1.37
query24	2.18	0.23	0.21
query25	0.15	0.08	0.08
query26	0.30	0.21	0.20
query27	0.46	0.23	0.23
query28	13.20	1.01	0.99
query29	12.60	3.32	3.33
query30	0.26	0.06	0.06
query31	2.87	0.38	0.38
query32	3.27	0.48	0.48
query33	2.84	2.91	2.92
query34	16.99	4.30	4.31
query35	4.47	4.45	4.42
query36	0.64	0.49	0.47
query37	0.18	0.16	0.15
query38	0.15	0.14	0.15
query39	0.04	0.04	0.03
query40	0.14	0.12	0.12
query41	0.09	0.04	0.05
query42	0.05	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.3 s
Total hot run time: 31.53 s

AshinGau (Member) commented Jul 9, 2024

LGTM

morningman merged commit b48b5da into apache:master on Jul 9, 2024
27 of 31 checks passed
hubgeter added a commit to hubgeter/doris that referenced this pull request Jul 9, 2024
hubgeter added a commit to hubgeter/doris that referenced this pull request Jul 10, 2024
hubgeter added a commit to hubgeter/doris that referenced this pull request Jul 11, 2024
hubgeter added a commit to hubgeter/doris that referenced this pull request Jul 15, 2024
… without repeated deserialization. (apache#37377)

hubgeter added a commit to hubgeter/doris that referenced this pull request Jul 15, 2024
… without repeated deserialization. (apache#37377)

morningman pushed a commit that referenced this pull request Jul 16, 2024
dataroaring pushed a commit that referenced this pull request Jul 17, 2024
… without repeated deserialization. (#37377)

morningman added a commit that referenced this pull request Jul 17, 2024
… columns without repeated deserialization. (#37377)" (#38007)

Reverts #37530
This change needs more testing, so it is reverted temporarily.
morningman pushed a commit that referenced this pull request Aug 2, 2024
…rom_fixed_json (#38245)

## Proposed changes
Fix a bug in `DataTypeNullableSerDe.deserialize_column_from_fixed_json`.

The expected behavior of the `deserialize_column_from_fixed_json`
function is to insert n values into the column.

However, the `DataTypeNullableSerDe` implementation of this function
resizes the null_map column to n instead of inserting n values into it.
The two only agree when the null map starts out empty. Because the
function is only called by `_fill_partition_columns` in the parquet/orc
reader, and is not called repeatedly within a single `get_next_block`,
the bug went unnoticed.

Previous PR: #37377
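
The difference between resizing to n and inserting n values can be sketched as follows. This is an illustration under stated assumptions, not the Doris code: `NullableIntColumn`, `fill_with_resize`, and `fill_with_insert` are hypothetical names, and plain `std::vector`s stand in for the nested column and its null map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a nullable column: values plus a parallel null map.
struct NullableIntColumn {
    std::vector<int> nested;
    std::vector<std::uint8_t> null_map;  // 1 = NULL, 0 = not NULL
};

// Mirrors the described bug: the values grow by n, but the null map is
// resized to exactly n, which only matches when the column was empty.
void fill_with_resize(NullableIntColumn& col, int value, std::size_t n) {
    col.nested.insert(col.nested.end(), n, value);
    col.null_map.resize(n, 0);
}

// Intended behavior: both the values and the null map grow by n.
void fill_with_insert(NullableIntColumn& col, int value, std::size_t n) {
    col.nested.insert(col.nested.end(), n, value);
    col.null_map.insert(col.null_map.end(), n, 0);
    assert(col.nested.size() == col.null_map.size());
}
```

Calling `fill_with_resize` once on an empty column leaves both vectors at size n, which is why a single call per block hides the problem. A second call on the same column leaves the null map at n while the values reach 2n.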
hubgeter added a commit to hubgeter/doris that referenced this pull request Aug 2, 2024
… without repeated deserialization. (apache#37377)

hubgeter added a commit to hubgeter/doris that referenced this pull request Aug 2, 2024
…rom_fixed_json (apache#38245)

yiguolei pushed a commit that referenced this pull request Aug 5, 2024
… without repeated deserialization. (#37377) (#38245) (#38810)

## Proposed changes
Pick PR #38575 and the fix for its bug, #38245.
dataroaring pushed a commit that referenced this pull request Aug 11, 2024
…rom_fixed_json (#38245)

dataroaring pushed a commit that referenced this pull request Aug 16, 2024
…rom_fixed_json (#38245)

Labels: approved, dev/2.1.6-merged, dev/3.0.1-merged, reviewed
6 participants