[feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. #38432

Merged

@hubgeter (Contributor) commented Jul 26, 2024

Proposed changes

Add the `hive_parquet_use_column_names` and `hive_orc_use_column_names` session variables so that Doris can read Hive tables after columns have been renamed in Hive.

These two session variables are modeled on the `parquet_use_column_names` and `orc_use_column_names` session properties of the Trino Hive connector.

By default, both session variables are `true`, which means columns in ORC/Parquet files are matched to the Hive table schema by name. When they are set to `false`, Doris instead reads ORC/Parquet columns by their ordinal position in the Hive table definition.
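A minimal Python sketch of the two lookup modes may help. It is not the Doris implementation: the `read_file` function and the dictionary layout used for a data file are hypothetical, and details such as case-insensitive name matching and nested (struct) columns are ignored. It uses the same data as the Hive session in the example: one file written before `a` was renamed to `new_a`, and one written after.

```python
from typing import Any, Dict, List, Optional

# Hypothetical model of one ORC/Parquet data file: the column names stored
# in the file itself, plus its rows in file column order.
DataFile = Dict[str, Any]


def read_file(table_columns: List[str], data_file: DataFile,
              use_column_names: bool) -> List[List[Optional[Any]]]:
    """Project the rows of one data file onto the Hive table schema."""
    file_columns: List[str] = data_file["columns"]
    if use_column_names:
        # Name-based: look each table column up by name in the file.
        # A column the file does not contain (e.g. renamed later) reads as NULL.
        index = {name: i for i, name in enumerate(file_columns)}
        positions = [index.get(name) for name in table_columns]
    else:
        # Position-based: the i-th table column reads the i-th file column,
        # whatever name the file stored for it.
        positions = [i if i < len(file_columns) else None
                     for i in range(len(table_columns))]
    return [[row[p] if p is not None else None for p in positions]
            for row in data_file["rows"]]


# Current Hive schema after `alter table tmp change column a new_a int`.
table = ["new_a", "b"]
files = [
    {"columns": ["a", "b"], "rows": [[1, "2"]]},      # written before the rename
    {"columns": ["new_a", "b"], "rows": [[2, "4"]]},  # written after the rename
]

for mode in (True, False):
    rows = [r for f in files for r in read_file(table, f, mode)]
    print(f"use_column_names={mode}: {rows}")
```

Running it prints `[[None, '2'], [2, '4']]` for name-based lookup and `[[1, '2'], [2, '4']]` for positional lookup, mirroring the two Doris query results. Positional mode is only safe when columns have been renamed in place, not reordered, added in the middle, or dropped.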

For example:

In Hive:
hive> create table tmp (a int , b string) stored as parquet;
hive> insert into table tmp values(1,"2");
hive> alter table tmp  change column  a new_a int;
hive> insert into table tmp values(2,"4");

In Doris:
mysql> set hive_parquet_use_column_names=true;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|  NULL | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)

mysql> set hive_parquet_use_column_names=false;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|     1 | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)

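With `hive_parquet_use_column_names=true`, the file written before the rename still stores the first column as `a`, so Doris finds no `new_a` column in it and returns NULL for that row. With `false`, the first column of every file is mapped to `new_a` by position, so the original value 1 is returned.

In Hive 3, `set parquet.column.index.access=true/false` (Parquet) and `set orc.force.positional.evolution=true/false` (ORC) control the read results in a similar way. Note that the Hive properties are phrased the other way round: `true` selects positional access, which corresponds to setting the Doris variable to `false`. However, for Parquet tables in which a field inside a struct column has been renamed, Hive and Doris return different results.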
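Because Hive's properties select positional access when `true` while the Doris variables select name-based access when `true`, the polarity is easy to get wrong. The hypothetical Python helper below (not part of Doris or Hive) turns a Doris setting into the Hive 3 `set` statement with the same top-level effect; it does not model the struct-field case, where the two engines differ.

```python
# Hypothetical helper: maps a Doris `hive_*_use_column_names` value to the
# Hive 3 property with the same effect. The Hive properties select
# *positional* access when true, so the boolean is inverted.
HIVE3_POSITIONAL_PROPERTY = {
    "parquet": "parquet.column.index.access",
    "orc": "orc.force.positional.evolution",
}


def hive3_equivalent(file_format: str, use_column_names: bool) -> str:
    key = HIVE3_POSITIONAL_PROPERTY[file_format.lower()]
    positional = not use_column_names
    return f"set {key}={'true' if positional else 'false'};"


print(hive3_equivalent("parquet", False))  # set parquet.column.index.access=true;
print(hive3_equivalent("ORC", True))       # set orc.force.positional.evolution=false;
```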

@hubgeter (Contributor Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39639 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 29dadddc119b8f8a21b4bfdeb6a79104f8373165, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	18210	5137	4283	4283
q2	2016	202	195	195
q3	10502	1164	1144	1144
q4	10155	737	717	717
q5	7514	2709	2672	2672
q6	219	136	138	136
q7	961	601	595	595
q8	9212	1911	1941	1911
q9	8828	6601	6613	6601
q10	8709	3801	3804	3801
q11	521	248	256	248
q12	397	230	223	223
q13	18936	3005	3022	3005
q14	286	234	241	234
q15	517	485	484	484
q16	492	414	394	394
q17	991	706	676	676
q18	8033	7453	7448	7448
q19	4766	1075	1045	1045
q20	666	341	340	340
q21	5027	3193	3269	3193
q22	366	294	295	294
Total cold run time: 117324 ms
Total hot run time: 39639 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4503	4262	4234	4234
q2	368	285	281	281
q3	3010	2884	2897	2884
q4	2035	1722	1723	1722
q5	5646	5549	5505	5505
q6	227	132	140	132
q7	2167	1827	1864	1827
q8	3283	3427	3441	3427
q9	8728	8872	8941	8872
q10	4066	3867	3775	3775
q11	603	503	501	501
q12	820	655	693	655
q13	16345	3165	3184	3165
q14	332	303	288	288
q15	535	486	496	486
q16	502	455	469	455
q17	1838	1537	1509	1509
q18	8140	7941	7704	7704
q19	1717	1562	1639	1562
q20	2921	1890	1885	1885
q21	5042	4906	4784	4784
q22	588	508	524	508
Total cold run time: 73416 ms
Total hot run time: 56161 ms

@doris-robot

TPC-DS: Total hot run time: 173267 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 29dadddc119b8f8a21b4bfdeb6a79104f8373165, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	917	371	375	371
query2	6467	1915	1893	1893
query3	6630	202	211	202
query4	27673	17716	17600	17600
query5	3621	499	484	484
query6	276	181	169	169
query7	4582	290	296	290
query8	260	197	192	192
query9	8524	2465	2444	2444
query10	432	295	274	274
query11	10859	9989	9979	9979
query12	122	83	84	83
query13	1620	370	356	356
query14	10271	7841	7640	7640
query15	215	170	160	160
query16	7624	448	443	443
query17	1586	552	527	527
query18	1806	277	278	277
query19	191	141	142	141
query20	92	83	81	81
query21	206	101	105	101
query22	4403	4056	4004	4004
query23	34123	33445	33773	33445
query24	11281	2978	2900	2900
query25	630	410	381	381
query26	1124	156	148	148
query27	2298	272	283	272
query28	6735	2094	2086	2086
query29	837	434	428	428
query30	259	158	153	153
query31	983	796	739	739
query32	92	54	52	52
query33	767	358	324	324
query34	934	476	486	476
query35	904	760	764	760
query36	1147	969	929	929
query37	142	81	88	81
query38	2860	2747	2738	2738
query39	872	789	815	789
query40	207	123	113	113
query41	48	45	46	45
query42	127	96	102	96
query43	493	472	466	466
query44	1166	733	720	720
query45	207	177	180	177
query46	1100	740	743	740
query47	1877	1777	1760	1760
query48	366	290	296	290
query49	841	406	406	406
query50	796	405	404	404
query51	6792	6627	6657	6627
query52	100	87	90	87
query53	264	182	179	179
query54	914	438	439	438
query55	75	72	74	72
query56	283	274	276	274
query57	1122	1036	1062	1036
query58	270	282	273	273
query59	2923	2681	2686	2681
query60	315	278	290	278
query61	95	94	100	94
query62	795	639	633	633
query63	208	184	178	178
query64	9472	2251	1690	1690
query65	3207	3108	3107	3107
query66	748	325	326	325
query67	15266	15183	15074	15074
query68	4519	537	548	537
query69	454	308	304	304
query70	1168	1057	1108	1057
query71	371	272	280	272
query72	6898	5523	5850	5523
query73	740	323	322	322
query74	6052	5705	5656	5656
query75	3352	2672	2673	2672
query76	2218	913	917	913
query77	439	294	293	293
query78	9642	9938	8897	8897
query79	2275	503	513	503
query80	1511	473	479	473
query81	604	219	217	217
query82	725	136	138	136
query83	282	170	173	170
query84	244	91	77	77
query85	1684	317	297	297
query86	477	315	326	315
query87	3277	3066	3070	3066
query88	3895	2453	2461	2453
query89	389	288	290	288
query90	1715	192	195	192
query91	127	99	99	99
query92	61	50	48	48
query93	2702	529	535	529
query94	801	297	284	284
query95	356	264	263	263
query96	607	277	282	277
query97	3195	3041	3026	3026
query98	231	256	195	195
query99	1658	1274	1238	1238
Total cold run time: 277111 ms
Total hot run time: 173267 ms

@doris-robot

ClickBench: Total hot run time: 30.81 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 29dadddc119b8f8a21b4bfdeb6a79104f8373165, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.22	0.06	0.05
query4	1.67	0.08	0.08
query5	0.50	0.47	0.48
query6	1.13	0.72	0.73
query7	0.01	0.01	0.02
query8	0.05	0.04	0.05
query9	0.56	0.50	0.49
query10	0.56	0.54	0.54
query11	0.15	0.11	0.12
query12	0.15	0.13	0.12
query13	0.60	0.58	0.58
query14	0.75	0.80	0.77
query15	0.86	0.82	0.81
query16	0.37	0.36	0.39
query17	1.01	0.98	0.95
query18	0.23	0.22	0.22
query19	1.79	1.66	1.69
query20	0.02	0.01	0.01
query21	15.43	0.76	0.65
query22	4.10	7.41	2.28
query23	18.26	1.30	1.33
query24	2.06	0.23	0.23
query25	0.16	0.09	0.08
query26	0.31	0.20	0.21
query27	0.46	0.23	0.23
query28	13.36	1.01	1.00
query29	12.54	3.31	3.27
query30	0.25	0.06	0.05
query31	2.86	0.39	0.39
query32	3.27	0.48	0.47
query33	2.92	2.93	2.87
query34	17.10	4.28	4.38
query35	4.41	4.40	4.38
query36	0.66	0.50	0.50
query37	0.19	0.16	0.16
query38	0.15	0.15	0.15
query39	0.05	0.04	0.03
query40	0.14	0.13	0.12
query41	0.10	0.05	0.04
query42	0.05	0.04	0.05
query43	0.05	0.03	0.03
Total cold run time: 109.63 s
Total hot run time: 30.81 s

@hubgeter force-pushed the feature_read_hive_rename_table_parquet_orc branch from 29daddd to 41fabdf on July 27, 2024 15:37
@hubgeter (Contributor Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39625 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2733ba7049991873613e3630276fff0cc0d77501, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	18377	4451	4309	4309
q2	2020	203	198	198
q3	10504	1206	1143	1143
q4	10144	722	710	710
q5	7605	2754	2714	2714
q6	225	141	141	141
q7	972	598	595	595
q8	9217	1936	1961	1936
q9	8899	6584	6554	6554
q10	8890	3833	3780	3780
q11	464	251	260	251
q12	404	229	226	226
q13	18760	2991	3004	2991
q14	289	229	245	229
q15	520	468	484	468
q16	524	390	381	381
q17	1005	626	717	626
q18	8176	7531	7388	7388
q19	4895	1061	1105	1061
q20	678	337	349	337
q21	4906	3302	3315	3302
q22	355	287	285	285
Total cold run time: 117829 ms
Total hot run time: 39625 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4508	4251	4291	4251
q2	366	276	269	269
q3	3014	2890	2904	2890
q4	2028	1713	1756	1713
q5	5700	5598	5571	5571
q6	224	148	140	140
q7	2231	1901	1843	1843
q8	3299	3437	3778	3437
q9	8909	8893	8907	8893
q10	4147	3807	3838	3807
q11	600	499	507	499
q12	818	662	674	662
q13	16122	3136	3214	3136
q14	308	296	287	287
q15	524	496	492	492
q16	485	452	458	452
q17	1847	1522	1510	1510
q18	8193	7994	7824	7824
q19	1799	1673	1509	1509
q20	2929	1906	1861	1861
q21	6453	5014	4750	4750
q22	667	496	490	490
Total cold run time: 75171 ms
Total hot run time: 56286 ms

@doris-robot

TPC-DS: Total hot run time: 173012 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2733ba7049991873613e3630276fff0cc0d77501, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	917	377	361	361
query2	6446	2006	1819	1819
query3	6658	205	218	205
query4	28471	17550	17548	17548
query5	3695	491	493	491
query6	289	189	160	160
query7	4571	285	292	285
query8	237	196	194	194
query9	8570	2465	2434	2434
query10	447	301	271	271
query11	11744	10090	10083	10083
query12	116	85	84	84
query13	1633	377	368	368
query14	10072	7717	6961	6961
query15	224	167	169	167
query16	7076	498	469	469
query17	945	568	568	568
query18	1934	295	297	295
query19	196	144	144	144
query20	92	90	86	86
query21	215	99	103	99
query22	4180	3957	3849	3849
query23	34155	34018	33722	33722
query24	11551	2999	2911	2911
query25	637	393	388	388
query26	1244	152	158	152
query27	2416	285	283	283
query28	6915	2101	2068	2068
query29	928	431	425	425
query30	264	152	154	152
query31	971	798	761	761
query32	99	59	58	58
query33	794	347	332	332
query34	884	486	498	486
query35	909	732	751	732
query36	1116	945	931	931
query37	157	84	84	84
query38	2986	2886	2804	2804
query39	923	883	856	856
query40	212	120	120	120
query41	45	46	46	46
query42	109	110	101	101
query43	518	460	464	460
query44	1249	723	731	723
query45	207	177	176	176
query46	1091	725	738	725
query47	1831	1748	1734	1734
query48	374	294	290	290
query49	864	413	422	413
query50	790	404	400	400
query51	6784	6693	6700	6693
query52	110	85	94	85
query53	259	183	190	183
query54	905	444	439	439
query55	75	72	75	72
query56	301	276	270	270
query57	1203	1026	1045	1026
query58	255	277	273	273
query59	2828	2600	2818	2600
query60	316	282	286	282
query61	97	96	94	94
query62	807	656	659	656
query63	207	183	177	177
query64	9517	2298	1704	1704
query65	3485	3155	3142	3142
query66	971	326	323	323
query67	15412	15080	14779	14779
query68	4548	515	533	515
query69	554	323	314	314
query70	1122	1111	1106	1106
query71	441	280	278	278
query72	8435	5626	5676	5626
query73	765	336	326	326
query74	6078	5641	5714	5641
query75	3598	2678	2693	2678
query76	2561	942	973	942
query77	635	346	299	299
query78	10593	10034	9037	9037
query79	8469	535	516	516
query80	1516	509	509	509
query81	593	224	221	221
query82	1408	138	129	129
query83	343	177	178	177
query84	273	80	79	79
query85	1843	317	307	307
query86	328	329	294	294
query87	3233	3054	3057	3054
query88	4778	2485	2477	2477
query89	423	285	285	285
query90	1837	194	200	194
query91	126	103	160	103
query92	66	48	49	48
query93	5715	530	540	530
query94	700	263	282	263
query95	345	264	267	264
query96	632	281	273	273
query97	3171	2998	3044	2998
query98	223	202	193	193
query99	1591	1264	1291	1264
Total cold run time: 293442 ms
Total hot run time: 173012 ms

@doris-robot

ClickBench: Total hot run time: 30.48 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2733ba7049991873613e3630276fff0cc0d77501, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.23	0.06	0.06
query4	1.66	0.08	0.08
query5	0.51	0.51	0.48
query6	1.13	0.73	0.73
query7	0.02	0.01	0.01
query8	0.06	0.04	0.04
query9	0.56	0.50	0.48
query10	0.55	0.55	0.54
query11	0.16	0.11	0.12
query12	0.15	0.11	0.12
query13	0.60	0.58	0.59
query14	0.77	0.77	0.77
query15	0.86	0.81	0.82
query16	0.38	0.37	0.37
query17	0.96	1.01	1.00
query18	0.23	0.22	0.21
query19	1.86	1.74	1.76
query20	0.01	0.01	0.01
query21	15.66	0.80	0.68
query22	4.25	8.30	1.77
query23	18.28	1.45	1.26
query24	2.17	0.23	0.22
query25	0.16	0.09	0.09
query26	0.30	0.22	0.21
query27	0.46	0.24	0.23
query28	13.23	1.02	1.00
query29	12.64	3.25	3.29
query30	0.25	0.06	0.06
query31	2.85	0.39	0.39
query32	3.27	0.48	0.47
query33	2.92	2.88	2.98
query34	17.08	4.33	4.36
query35	4.41	4.38	4.40
query36	0.66	0.47	0.49
query37	0.18	0.16	0.15
query38	0.15	0.16	0.16
query39	0.05	0.04	0.04
query40	0.15	0.12	0.12
query41	0.10	0.04	0.04
query42	0.07	0.04	0.06
query43	0.05	0.04	0.04
Total cold run time: 110.16 s
Total hot run time: 30.48 s

@hubgeter force-pushed the feature_read_hive_rename_table_parquet_orc branch from 2733ba7 to 41fabdf on July 28, 2024 10:19
@hubgeter (Contributor Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39823 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bcf4eaa6d9442783a8ff689f5c6ee38449481871, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	18030	5188	4475	4475
q2	2551	205	204	204
q3	11764	1255	1156	1156
q4	10409	796	734	734
q5	7578	2730	2863	2730
q6	224	143	141	141
q7	989	606	611	606
q8	9278	1937	1959	1937
q9	8957	6627	6661	6627
q10	8723	3834	3790	3790
q11	464	242	254	242
q12	405	219	215	215
q13	17737	2985	2996	2985
q14	287	235	242	235
q15	529	483	498	483
q16	492	398	384	384
q17	978	662	726	662
q18	8209	7340	7436	7340
q19	1393	1057	989	989
q20	694	323	345	323
q21	5012	3281	3339	3281
q22	346	298	284	284
Total cold run time: 115049 ms
Total hot run time: 39823 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4315	4290	4278	4278
q2	368	270	268	268
q3	2992	2770	2742	2742
q4	1938	1648	1617	1617
q5	5310	5346	5321	5321
q6	224	128	131	128
q7	2119	1737	1764	1737
q8	3202	3374	3337	3337
q9	8452	8423	8459	8423
q10	3924	3736	3713	3713
q11	610	524	488	488
q12	747	588	589	588
q13	17434	2960	2966	2960
q14	292	268	286	268
q15	523	469	482	469
q16	479	419	423	419
q17	1806	1517	1461	1461
q18	7606	7582	7617	7582
q19	1683	1580	1444	1444
q20	1964	1775	1758	1758
q21	4964	4722	4674	4674
q22	577	478	492	478
Total cold run time: 71529 ms
Total hot run time: 54153 ms

@doris-robot

TPC-DS: Total hot run time: 172930 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bcf4eaa6d9442783a8ff689f5c6ee38449481871, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	923	374	355	355
query2	6440	1943	1861	1861
query3	6663	206	215	206
query4	28266	17713	17576	17576
query5	4203	497	476	476
query6	277	167	150	150
query7	4579	294	288	288
query8	248	221	194	194
query9	8750	2468	2438	2438
query10	454	281	264	264
query11	11861	10045	10042	10042
query12	133	85	89	85
query13	1627	380	378	378
query14	10284	7747	7700	7700
query15	227	172	167	167
query16	7836	491	486	486
query17	1597	564	549	549
query18	2010	288	283	283
query19	199	148	148	148
query20	94	88	94	88
query21	213	101	103	101
query22	4150	4252	3897	3897
query23	34015	33171	32972	32972
query24	12225	2916	2887	2887
query25	698	393	399	393
query26	1803	149	155	149
query27	2982	274	278	274
query28	7331	2053	2033	2033
query29	1122	438	429	429
query30	290	153	151	151
query31	952	751	764	751
query32	96	55	57	55
query33	792	349	349	349
query34	906	470	475	470
query35	873	730	755	730
query36	1097	901	929	901
query37	297	83	78	78
query38	2848	2736	2740	2736
query39	901	792	831	792
query40	272	119	116	116
query41	49	47	46	46
query42	119	102	110	102
query43	516	471	455	455
query44	1208	726	728	726
query45	213	180	178	178
query46	1094	785	735	735
query47	1838	1751	1744	1744
query48	378	304	299	299
query49	1209	435	430	430
query50	828	408	405	405
query51	6706	6695	6668	6668
query52	101	94	95	94
query53	260	183	186	183
query54	978	463	451	451
query55	75	75	75	75
query56	328	286	293	286
query57	1137	1036	1023	1023
query58	388	256	276	256
query59	2914	2828	2684	2684
query60	310	284	292	284
query61	94	114	103	103
query62	843	651	658	651
query63	207	191	183	183
query64	10476	2281	1758	1758
query65	3262	3102	3090	3090
query66	1370	335	340	335
query67	15624	14667	14648	14648
query68	9384	567	580	567
query69	766	413	324	324
query70	1402	1083	1080	1080
query71	549	271	265	265
query72	9166	5608	5912	5608
query73	2248	328	327	327
query74	6162	5721	5724	5721
query75	6137	2736	2674	2674
query76	5629	963	942	942
query77	792	325	304	304
query78	9602	9123	8963	8963
query79	9445	534	536	534
query80	921	502	510	502
query81	584	225	214	214
query82	283	135	133	133
query83	344	177	179	177
query84	267	77	80	77
query85	989	316	300	300
query86	359	329	300	300
query87	3314	3166	3052	3052
query88	5082	2491	2490	2490
query89	496	285	285	285
query90	2056	197	196	196
query91	126	101	101	101
query92	59	48	51	48
query93	5945	556	558	556
query94	1047	285	268	268
query95	364	265	263	263
query96	625	279	273	273
query97	3157	3056	3029	3029
query98	217	206	203	203
query99	1519	1279	1264	1264
Total cold run time: 312095 ms
Total hot run time: 172930 ms

@doris-robot

ClickBench: Total hot run time: 30.46 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bcf4eaa6d9442783a8ff689f5c6ee38449481871, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.04	0.04
query2	0.08	0.03	0.04
query3	0.22	0.04	0.05
query4	1.68	0.09	0.08
query5	0.51	0.49	0.48
query6	1.13	0.72	0.73
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.56	0.50	0.50
query10	0.53	0.55	0.54
query11	0.16	0.11	0.12
query12	0.15	0.12	0.12
query13	0.61	0.58	0.59
query14	0.76	0.78	0.78
query15	0.86	0.81	0.81
query16	0.36	0.37	0.36
query17	0.97	1.03	0.95
query18	0.23	0.22	0.21
query19	1.88	1.70	1.73
query20	0.02	0.01	0.03
query21	15.40	0.78	0.66
query22	3.88	7.67	1.89
query23	18.28	1.36	1.18
query24	2.19	0.24	0.23
query25	0.16	0.08	0.10
query26	0.30	0.22	0.21
query27	0.46	0.24	0.23
query28	13.20	1.02	1.02
query29	12.64	3.30	3.25
query30	0.25	0.06	0.06
query31	2.89	0.40	0.38
query32	3.28	0.49	0.47
query33	2.92	2.88	2.92
query34	17.00	4.33	4.38
query35	4.39	4.45	4.37
query36	0.65	0.49	0.49
query37	0.20	0.16	0.17
query38	0.17	0.16	0.16
query39	0.04	0.03	0.04
query40	0.17	0.13	0.13
query41	0.10	0.05	0.05
query42	0.06	0.05	0.05
query43	0.05	0.04	0.05
Total cold run time: 109.51 s
Total hot run time: 30.46 s

@hubgeter force-pushed the feature_read_hive_rename_table_parquet_orc branch from bcf4eaa to 5cfcedd on July 29, 2024 07:25
@hubgeter (Contributor Author)

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 41972 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5cfcedd8f8c84899c8483d037dce8fddc55171e0, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17647	4192	4106	4106
q2	2026	203	207	203
q3	10436	1323	1361	1323
q4	10176	815	882	815
q5	7657	2979	2972	2972
q6	222	135	135	135
q7	1035	614	608	608
q8	9435	1943	1946	1943
q9	8442	6635	6648	6635
q10	8711	3855	3834	3834
q11	435	253	251	251
q12	412	230	230	230
q13	17748	2959	2949	2949
q14	268	244	252	244
q15	530	485	493	485
q16	505	398	380	380
q17	976	918	897	897
q18	8143	7399	7344	7344
q19	1684	1220	1208	1208
q20	581	343	336	336
q21	5325	4783	4811	4783
q22	354	291	292	291
Total cold run time: 112748 ms
Total hot run time: 41972 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4092	4054	4096	4054
q2	328	226	220	220
q3	2986	2999	3162	2999
q4	2004	2028	1962	1962
q5	5575	5511	5454	5454
q6	232	131	130	130
q7	2149	1812	1837	1812
q8	3320	3416	3374	3374
q9	9089	9106	9235	9106
q10	3974	4049	3916	3916
q11	569	470	468	468
q12	809	606	611	606
q13	16380	3117	3096	3096
q14	317	288	276	276
q15	537	489	487	487
q16	463	418	423	418
q17	1756	1727	1713	1713
q18	8334	7774	7804	7774
q19	1723	1769	1759	1759
q20	2065	1833	1827	1827
q21	5734	5466	5253	5253
q22	544	473	459	459
Total cold run time: 72980 ms
Total hot run time: 57163 ms

@doris-robot

TPC-DS: Total hot run time: 169644 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5cfcedd8f8c84899c8483d037dce8fddc55171e0, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	922	376	378	376
query2	6473	1692	1645	1645
query3	6667	210	219	210
query4	20431	17344	17261	17261
query5	3664	502	506	502
query6	285	186	176	176
query7	4587	293	289	289
query8	254	206	188	188
query9	8516	2417	2426	2417
query10	421	281	275	275
query11	10453	9937	10055	9937
query12	122	87	89	87
query13	1648	385	389	385
query14	9661	7892	7779	7779
query15	202	163	161	161
query16	6956	472	436	436
query17	964	584	571	571
query18	1917	291	293	291
query19	195	148	150	148
query20	93	89	85	85
query21	206	102	114	102
query22	4160	3894	4090	3894
query23	33670	33582	33288	33288
query24	10492	3072	3092	3072
query25	694	432	447	432
query26	1680	167	161	161
query27	2982	293	300	293
query28	7373	2049	2035	2035
query29	1225	453	519	453
query30	250	160	155	155
query31	975	817	760	760
query32	102	59	57	57
query33	668	321	321	321
query34	918	497	505	497
query35	887	769	730	730
query36	1011	865	846	846
query37	203	77	78	77
query38	2919	2763	2777	2763
query39	875	806	818	806
query40	256	112	111	111
query41	46	43	47	43
query42	120	97	101	97
query43	458	426	414	414
query44	1166	721	742	721
query45	202	174	176	174
query46	1098	799	775	775
query47	1774	1688	1722	1688
query48	378	294	287	287
query49	951	410	417	410
query50	908	435	435	435
query51	6955	6587	6741	6587
query52	109	87	88	87
query53	256	175	176	175
query54	635	476	446	446
query55	74	72	72	72
query56	272	245	251	245
query57	1126	1016	1013	1013
query58	283	261	275	261
query59	2465	2305	2329	2305
query60	302	276	285	276
query61	108	96	124	96
query62	874	660	658	658
query63	211	176	183	176
query64	5622	1955	1884	1884
query65	3179	3107	3076	3076
query66	1314	329	328	328
query67	15354	14615	14654	14615
query68	4514	575	585	575
query69	737	391	322	322
query70	1108	1027	1068	1027
query71	479	273	285	273
query72	8054	2673	2540	2540
query73	796	327	329	327
query74	5947	5702	5559	5559
query75	4179	2693	2724	2693
query76	3407	1348	1409	1348
query77	695	314	308	308
query78	9432	8919	8923	8919
query79	2854	546	534	534
query80	1084	520	500	500
query81	547	234	220	220
query82	1324	134	137	134
query83	327	168	169	168
query84	272	85	85	85
query85	1383	376	302	302
query86	423	296	302	296
query87	3268	3085	3144	3085
query88	3798	2486	2460	2460
query89	392	286	298	286
query90	1991	194	187	187
query91	123	102	100	100
query92	59	50	53	50
query93	2883	633	645	633
query94	904	292	258	258
query95	383	274	272	272
query96	608	290	276	276
query97	3206	3016	3053	3016
query98	227	204	191	191
query99	1588	1284	1304	1284
Total cold run time: 269209 ms
Total hot run time: 169644 ms

@doris-robot

ClickBench: Total hot run time: 30.55 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5cfcedd8f8c84899c8483d037dce8fddc55171e0, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.03	0.03
query2	0.07	0.03	0.04
query3	0.22	0.05	0.05
query4	1.69	0.08	0.07
query5	0.48	0.48	0.48
query6	1.14	0.73	0.71
query7	0.02	0.02	0.01
query8	0.06	0.04	0.04
query9	0.58	0.50	0.51
query10	0.55	0.54	0.55
query11	0.16	0.12	0.12
query12	0.15	0.13	0.12
query13	0.62	0.61	0.60
query14	0.78	0.79	0.80
query15	0.90	0.86	0.85
query16	0.35	0.35	0.36
query17	0.99	1.03	0.98
query18	0.22	0.21	0.22
query19	1.86	1.76	1.77
query20	0.02	0.01	0.01
query21	15.39	0.77	0.66
query22	3.84	7.24	1.74
query23	18.08	1.42	1.36
query24	2.25	0.22	0.22
query25	0.18	0.08	0.09
query26	0.31	0.21	0.21
query27	0.45	0.25	0.24
query28	13.18	1.00	0.98
query29	12.55	3.36	3.34
query30	0.25	0.06	0.06
query31	2.85	0.40	0.40
query32	3.25	0.49	0.48
query33	2.87	2.98	2.94
query34	15.46	4.28	4.26
query35	4.31	4.27	4.28
query36	0.69	0.47	0.48
query37	0.18	0.16	0.16
query38	0.16	0.14	0.15
query39	0.04	0.04	0.04
query40	0.15	0.12	0.14
query41	0.10	0.05	0.04
query42	0.06	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 107.54 s
Total hot run time: 30.55 s

@morningman (Contributor) left a comment

LGTM

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Jul 30, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@morningman merged commit 1157db4 into apache:master on Aug 2, 2024
29 of 32 checks passed
hubgeter added a commit to hubgeter/doris that referenced this pull request Aug 2, 2024
…es. (apache#38432)

hubgeter added a commit to hubgeter/doris that referenced this pull request Aug 2, 2024
…es. (apache#38432)

yiguolei pushed a commit that referenced this pull request Aug 5, 2024
…es. (#38432) (#38809)

bp #38432 

hubgeter added a commit to hubgeter/doris that referenced this pull request Oct 14, 2024
…es. (apache#38432)

hubgeter added a commit to hubgeter/doris that referenced this pull request Oct 16, 2024
…es. (apache#38432)

Labels: approved, dev/2.1.6-merged, dev/3.0.3-merged, meta-change, reviewed
6 participants