
[fix](tvf) Support fs.defaultFS with postfix '/' #33202

Merged: 2 commits, Apr 3, 2024

@morningman (Contributor)

Proposed changes

For an HDFS table-valued function (TVF) query like:

```
select count(*) from hdfs(
"uri" = "hdfs://HDFS8000871/path/to/1.parquet",
"fs.defaultFS" = "hdfs://HDFS8000871/",
"format" = "parquet"
);
```

Previously, if `fs.defaultFS` ended with `/`, the query failed with an error like:

```
reason: RemoteException: File does not exist: /user/doris/path/to/1.parquet
```

The resolved path is wrong: it carries an unexpected `/user/doris` prefix. To avoid the error, users had to set `fs.defaultFS` to `hdfs://HDFS8000871`, without the trailing slash.

This PR fixes the issue, so `fs.defaultFS` may now end with `/`.
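The description does not say how the trailing slash produced the `/user/doris` prefix. One plausible mechanism is sketched below as a hypothetical Java illustration, not the code changed by this PR: if the in-filesystem path is derived by cutting the `fs.defaultFS` prefix off the `uri`, a trailing `/` on `fs.defaultFS` also swallows the leading `/` of the path. HDFS resolves a relative path against the client's working directory, which by default is the user's home directory `/user/<user>`. Stripping trailing slashes from `fs.defaultFS` keeps the path absolute. The class and method names, the prefix-cutting step, and the `doris` user are assumptions for illustration.

```java
// Hypothetical sketch, not the actual Doris implementation.
public final class DefaultFsPathSketch {

    // Assumption: the client runs as user "doris", so HDFS's default
    // working directory (the user's home directory) is /user/doris.
    static final String WORKING_DIR = "/user/doris";

    /** Strips trailing '/' characters, e.g. "hdfs://ns/" -> "hdfs://ns". */
    static String normalizeDefaultFs(String defaultFs) {
        String s = defaultFs;
        // Stop before eating the "//" that follows the scheme.
        while (s.endsWith("/") && !s.endsWith("://")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    /** Hypothetical step: derive the path by cutting the defaultFS prefix off the uri. */
    static String pathUnderDefaultFs(String uri, String defaultFs) {
        if (!uri.startsWith(defaultFs)) {
            throw new IllegalArgumentException(uri + " is not under " + defaultFs);
        }
        return uri.substring(defaultFs.length());
    }

    /** Mimics HDFS: a path without a leading '/' is resolved against the working directory. */
    static String resolve(String path) {
        return path.startsWith("/") ? path : WORKING_DIR + "/" + path;
    }

    public static void main(String[] args) {
        String uri = "hdfs://HDFS8000871/path/to/1.parquet";
        String defaultFs = "hdfs://HDFS8000871/";

        // Without normalization the prefix swallows the leading '/'.
        System.out.println(resolve(pathUnderDefaultFs(uri, defaultFs)));
        // prints /user/doris/path/to/1.parquet

        // With normalization the path stays absolute.
        System.out.println(resolve(pathUnderDefaultFs(uri, normalizeDefaultFs(defaultFs))));
        // prints /path/to/1.parquet
    }
}
```

Normalizing once, where `fs.defaultFS` is read, keeps every later prefix operation consistent; checking for a leading `/` at each use site would also work but is easier to miss.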


@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@morningman (Contributor, Author)

run buildall

github-actions bot commented Apr 3, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 38926 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b1984bf83636051f2ef3a2aab0487920a8824df3, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17927	4196	4169	4169
q2	2599	200	188	188
q3	11106	1307	1397	1307
q4	10516	886	1104	886
q5	8740	2991	2936	2936
q6	219	134	133	133
q7	1102	610	607	607
q8	9400	2064	2022	2022
q9	6752	6221	6155	6155
q10	8427	3523	3501	3501
q11	414	254	228	228
q12	382	221	210	210
q13	17777	2912	2932	2912
q14	275	244	246	244
q15	538	497	474	474
q16	495	384	376	376
q17	944	895	887	887
q18	7333	6444	6453	6444
q19	1600	1550	1534	1534
q20	608	305	303	303
q21	3543	3116	3111	3111
q22	370	299	309	299
Total cold run time: 111067 ms
Total hot run time: 38926 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4045	4032	4025	4025
q2	329	217	216	216
q3	2958	2970	2953	2953
q4	1920	1859	1869	1859
q5	5241	5213	5235	5213
q6	209	124	122	122
q7	2285	1850	1835	1835
q8	3219	3276	3290	3276
q9	8509	8489	8497	8489
q10	3774	3868	3846	3846
q11	546	464	455	455
q12	700	550	542	542
q13	16765	2878	2881	2878
q14	296	265	266	265
q15	515	466	478	466
q16	456	400	408	400
q17	1718	1666	1668	1666
q18	7618	7175	7248	7175
q19	1636	1652	1641	1641
q20	1954	1750	1723	1723
q21	5100	4690	4712	4690
q22	499	420	431	420
Total cold run time: 70292 ms
Total hot run time: 54155 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.65% (8881/24909)
Line Coverage: 27.38% (72908/266312)
Region Coverage: 26.55% (37710/142010)
Branch Coverage: 23.35% (19215/82308)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b1984bf83636051f2ef3a2aab0487920a8824df3_b1984bf83636051f2ef3a2aab0487920a8824df3/report/index.html

@doris-robot

TPC-DS: Total hot run time: 181250 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b1984bf83636051f2ef3a2aab0487920a8824df3, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1217	363	1122	363
query2	6472	1965	1845	1845
query3	6661	224	227	224
query4	24540	21541	21520	21520
query5	4201	414	400	400
query6	282	180	181	180
query7	4615	307	301	301
query8	230	179	180	179
query9	8467	2177	2184	2177
query10	578	274	257	257
query11	14937	14478	14566	14478
query12	145	95	98	95
query13	1639	399	380	380
query14	8648	7046	6893	6893
query15	210	183	180	180
query16	7131	274	278	274
query17	998	605	582	582
query18	1905	288	287	287
query19	210	158	162	158
query20	100	94	96	94
query21	199	132	131	131
query22	4940	4805	4851	4805
query23	33688	32902	32974	32902
query24	12066	3148	3172	3148
query25	704	399	403	399
query26	1894	171	161	161
query27	2967	331	334	331
query28	6724	1818	1800	1800
query29	1391	600	568	568
query30	320	165	159	159
query31	1032	731	720	720
query32	100	60	63	60
query33	700	247	249	247
query34	1029	493	498	493
query35	819	686	721	686
query36	1026	814	845	814
query37	283	79	75	75
query38	3540	3396	3391	3391
query39	1566	1516	1535	1516
query40	287	131	135	131
query41	46	43	43	43
query42	121	104	100	100
query43	450	422	416	416
query44	1062	710	702	702
query45	265	255	250	250
query46	1086	784	778	778
query47	1893	1793	1777	1777
query48	377	310	304	304
query49	1150	358	355	355
query50	816	388	393	388
query51	6747	6618	6517	6517
query52	110	97	96	96
query53	350	291	287	287
query54	309	227	227	227
query55	97	80	85	80
query56	233	221	215	215
query57	1205	1092	1132	1092
query58	245	210	226	210
query59	2437	2314	2471	2314
query60	249	228	248	228
query61	92	91	92	91
query62	706	447	442	442
query63	306	287	282	282
query64	6357	3027	3484	3027
query65	3039	2999	2980	2980
query66	1451	343	328	328
query67	15405	14913	14921	14913
query68	5115	548	573	548
query69	495	333	335	333
query70	1130	1124	1102	1102
query71	428	284	279	279
query72	6309	2680	2571	2571
query73	718	329	333	329
query74	6644	6287	6339	6287
query75	2996	2305	2270	2270
query76	3166	1068	1208	1068
query77	401	256	262	256
query78	10874	10109	10126	10109
query79	8263	553	544	544
query80	2055	440	434	434
query81	534	247	245	245
query82	1595	118	101	101
query83	267	164	159	159
query84	271	91	88	88
query85	2092	287	277	277
query86	502	309	272	272
query87	3671	3546	3505	3505
query88	4407	2352	2348	2348
query89	502	375	367	367
query90	2013	172	179	172
query91	139	109	103	103
query92	63	50	51	50
query93	6452	535	524	524
query94	1237	196	188	188
query95	1094	1091	1098	1091
query96	619	274	271	271
query97	2676	2492	2460	2460
query98	238	216	222	216
query99	1306	866	837	837
Total cold run time: 293337 ms
Total hot run time: 181250 ms

@doris-robot

ClickBench: Total hot run time: 29.76 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b1984bf83636051f2ef3a2aab0487920a8824df3, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.04
query2	0.07	0.04	0.04
query3	0.23	0.04	0.04
query4	1.68	0.07	0.06
query5	0.48	0.48	0.48
query6	1.16	0.64	0.65
query7	0.01	0.01	0.01
query8	0.05	0.04	0.05
query9	0.57	0.52	0.51
query10	0.57	0.57	0.55
query11	0.15	0.11	0.12
query12	0.14	0.12	0.12
query13	0.62	0.59	0.60
query14	0.77	0.80	0.79
query15	0.89	0.86	0.85
query16	0.36	0.36	0.35
query17	1.00	0.99	1.00
query18	0.25	0.24	0.26
query19	1.78	1.70	1.73
query20	0.01	0.01	0.01
query21	15.42	0.75	0.70
query22	3.21	5.28	1.37
query23	17.52	1.32	1.12
query24	1.51	0.23	0.21
query25	0.12	0.09	0.08
query26	0.30	0.18	0.20
query27	0.08	0.09	0.08
query28	13.76	0.96	0.98
query29	12.55	3.40	3.58
query30	0.25	0.07	0.06
query31	2.85	0.39	0.40
query32	3.26	0.47	0.47
query33	2.84	2.83	2.84
query34	15.48	4.34	4.30
query35	4.36	4.36	4.36
query36	0.66	0.47	0.48
query37	0.19	0.16	0.18
query38	0.17	0.17	0.15
query39	0.05	0.04	0.05
query40	0.19	0.16	0.15
query41	0.10	0.04	0.05
query42	0.05	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 105.79 s
Total hot run time: 29.76 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit b1984bf83636051f2ef3a2aab0487920a8824df3 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       17.2 seconds inserted 10000000 Rows, about 581K ops/s

@morningman (Contributor, Author)

run buildall

github-actions bot commented Apr 3, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39239 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cba9b147f2f6a3a87958d4443735ee9a7acc0f4b, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17723	4287	4186	4186
q2	2639	206	194	194
q3	13186	1282	1437	1282
q4	11136	898	1103	898
q5	7917	3053	2991	2991
q6	228	138	134	134
q7	1143	648	628	628
q8	9692	2060	2047	2047
q9	6713	6237	6174	6174
q10	8418	3519	3504	3504
q11	411	243	231	231
q12	381	216	209	209
q13	17765	2932	2953	2932
q14	274	239	251	239
q15	517	491	479	479
q16	512	380	380	380
q17	976	914	913	913
q18	7249	6590	6568	6568
q19	1589	1532	1549	1532
q20	622	315	317	315
q21	3532	3145	3101	3101
q22	362	302	310	302
Total cold run time: 112985 ms
Total hot run time: 39239 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4131	4052	4040	4040
q2	338	218	227	218
q3	2962	2943	2991	2943
q4	1878	1864	1848	1848
q5	5252	5258	5240	5240
q6	210	122	124	122
q7	2258	1847	1812	1812
q8	3230	3317	3302	3302
q9	8528	8481	8547	8481
q10	3763	3827	3830	3827
q11	543	458	445	445
q12	698	557	544	544
q13	12987	2916	2909	2909
q14	284	268	273	268
q15	520	473	468	468
q16	461	399	397	397
q17	1745	1704	1676	1676
q18	7692	7331	7258	7258
q19	1649	1639	1636	1636
q20	1932	1747	1721	1721
q21	5053	4741	4781	4741
q22	495	436	423	423
Total cold run time: 66609 ms
Total hot run time: 54319 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.65% (8880/24909)
Line Coverage: 27.38% (72906/266309)
Region Coverage: 26.55% (37703/142006)
Branch Coverage: 23.35% (19222/82304)
Coverage Report: http://coverage.selectdb-in.cc/coverage/cba9b147f2f6a3a87958d4443735ee9a7acc0f4b_cba9b147f2f6a3a87958d4443735ee9a7acc0f4b/report/index.html

@doris-robot

TPC-DS: Total hot run time: 179766 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit cba9b147f2f6a3a87958d4443735ee9a7acc0f4b, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1237	369	1130	369
query2	6471	1927	1772	1772
query3	6667	213	223	213
query4	23416	21448	21530	21448
query5	4212	424	424	424
query6	273	195	185	185
query7	4603	305	312	305
query8	234	174	172	172
query9	8461	2178	2170	2170
query10	592	260	272	260
query11	15002	14508	14352	14352
query12	145	105	94	94
query13	1649	408	394	394
query14	8560	6848	6731	6731
query15	200	182	183	182
query16	7137	268	273	268
query17	1013	594	565	565
query18	1891	283	267	267
query19	207	154	154	154
query20	96	95	89	89
query21	195	128	129	128
query22	4973	4812	4831	4812
query23	33362	32591	32856	32591
query24	12457	3161	3083	3083
query25	689	412	407	407
query26	1923	155	160	155
query27	3033	317	341	317
query28	6697	1825	1798	1798
query29	1348	602	593	593
query30	329	165	164	164
query31	1003	712	747	712
query32	96	62	59	59
query33	710	258	252	252
query34	1042	483	501	483
query35	837	691	696	691
query36	997	848	864	848
query37	281	85	79	79
query38	3526	3421	3365	3365
query39	1576	1553	1506	1506
query40	303	135	140	135
query41	49	47	47	47
query42	121	104	106	104
query43	433	397	399	397
query44	1115	722	709	709
query45	287	257	267	257
query46	1061	780	784	780
query47	1896	1807	1790	1790
query48	379	303	311	303
query49	1174	371	363	363
query50	796	393	390	390
query51	6774	6505	6612	6505
query52	108	95	104	95
query53	358	293	288	288
query54	317	244	244	244
query55	90	82	82	82
query56	247	234	231	231
query57	1197	1087	1117	1087
query58	251	222	225	222
query59	2547	2354	2300	2300
query60	262	233	250	233
query61	113	107	113	107
query62	705	444	440	440
query63	320	289	290	289
query64	6545	3547	3276	3276
query65	3086	2993	3006	2993
query66	1453	349	330	330
query67	15465	14983	14801	14801
query68	7451	603	574	574
query69	538	324	343	324
query70	1196	1115	1083	1083
query71	482	279	278	278
query72	6411	2585	2389	2389
query73	810	326	328	326
query74	6696	6286	6271	6271
query75	3339	2303	2327	2303
query76	4906	1149	1214	1149
query77	612	254	250	250
query78	10800	10097	10260	10097
query79	8266	537	537	537
query80	1225	430	421	421
query81	507	239	244	239
query82	715	106	103	103
query83	202	164	163	163
query84	268	87	88	87
query85	1446	287	273	273
query86	435	272	288	272
query87	3683	3544	3499	3499
query88	3786	2346	2344	2344
query89	551	367	379	367
query90	1961	176	180	176
query91	135	108	106	106
query92	64	52	56	52
query93	6635	542	520	520
query94	1148	194	190	190
query95	433	335	337	335
query96	615	269	272	269
query97	2651	2508	2478	2478
query98	235	210	211	210
query99	1297	835	826	826
Total cold run time: 294091 ms
Total hot run time: 179766 ms

@doris-robot

ClickBench: Total hot run time: 30.44 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit cba9b147f2f6a3a87958d4443735ee9a7acc0f4b, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.67	0.07	0.06
query5	0.48	0.48	0.48
query6	1.17	0.65	0.65
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.56	0.51	0.51
query10	0.56	0.57	0.57
query11	0.15	0.11	0.11
query12	0.14	0.12	0.12
query13	0.61	0.60	0.59
query14	0.77	0.78	0.79
query15	0.86	0.84	0.83
query16	0.35	0.36	0.35
query17	0.95	0.95	0.96
query18	0.25	0.24	0.26
query19	1.87	1.69	1.69
query20	0.02	0.01	0.01
query21	15.41	0.78	0.72
query22	3.05	5.00	2.16
query23	17.78	1.34	1.13
query24	1.68	0.31	0.22
query25	0.15	0.10	0.09
query26	0.28	0.17	0.20
query27	0.08	0.09	0.07
query28	13.53	0.96	0.93
query29	12.62	3.60	3.31
query30	0.26	0.07	0.06
query31	2.84	0.39	0.39
query32	3.26	0.47	0.48
query33	2.89	2.83	2.88
query34	15.50	4.30	4.33
query35	4.36	4.39	4.38
query36	0.65	0.47	0.48
query37	0.20	0.17	0.18
query38	0.17	0.17	0.15
query39	0.04	0.05	0.04
query40	0.18	0.15	0.14
query41	0.10	0.05	0.05
query42	0.06	0.05	0.04
query43	0.05	0.04	0.04
Total cold run time: 105.97 s
Total hot run time: 30.44 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit cba9b147f2f6a3a87958d4443735ee9a7acc0f4b with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       16.8 seconds inserted 10000000 Rows, about 595K ops/s

@wsjz left a comment

LGTM

github-actions bot commented Apr 3, 2024

PR approved by anyone and no changes requested.

github-actions bot added the `approved` label (indicates a PR has been approved by one committer) on Apr 3, 2024
github-actions bot commented Apr 3, 2024

PR approved by at least one committer and no changes requested.

@morningman merged commit 84f5fa6 into apache:master on Apr 3, 2024 (27 of 29 checks passed)
morningman added a commit to morningman/doris that referenced this pull request Apr 3, 2024
morningman added a commit to morningman/doris that referenced this pull request Apr 7, 2024
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024
Labels: approved, dev/2.0.8-merged, dev/2.1.2-merged, reviewed

4 participants