[fix](catalog) fix wrong check when using "use_meta_cache=true" #36530

Merged
merged 1 commit into from
Jun 19, 2024


@morningman morningman commented Jun 19, 2024

PR #33610 introduced a new feature, use_meta_cache=true,
but made a wrong check when reading this config.
If a user enables the Hive metastore event listener for a Hive catalog,
this may leave the FE unable to restart because of a metadata replay error:

2024-06-19 14:25:32,536 ERROR (stateListener|118) [EditLog.loadJournal():1231] Operation Type 325
java.util.NoSuchElementException: No value present
        at java.util.Optional.get(Optional.java:135) ~[?:1.8.0_341]
        at org.apache.doris.datasource.ExternalCatalog.replayInitCatalog(ExternalCatalog.java:594) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.CatalogMgr.replayInitCatalog(CatalogMgr.java:584) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:1012) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.replayJournal(Env.java:2779) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.transferToMaster(Env.java:1473) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.access$1400(Env.java:324) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env$5.runOneCycle(Env.java:2670) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
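
The top frames of the trace show `Optional.get()` being called on an empty `Optional` during replay. The sketch below reproduces only that failure pattern and is not the Doris code: `ReplaySketch`, `getDbForReplay`, `replayUnchecked`, and `replayChecked` are hypothetical names chosen for illustration, and the defensive variant is one possible way to tolerate a missing database, not necessarily the fix applied in this PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a catalog's database lookup; not the actual Doris classes.
class ReplaySketch {
    private final Map<Long, String> dbsById = new HashMap<>();

    void addDb(long id, String name) {
        dbsById.put(id, name);
    }

    // Returns an empty Optional when the database id is unknown.
    Optional<String> getDbForReplay(long id) {
        return Optional.ofNullable(dbsById.get(id));
    }

    // Fragile: Optional.get() throws NoSuchElementException ("No value present")
    // when the journal entry refers to a database that is not loaded.
    String replayUnchecked(long dbId) {
        return getDbForReplay(dbId).get();
    }

    // Defensive: report whether the entry could be applied instead of throwing,
    // so the caller can skip an entry whose database is missing.
    boolean replayChecked(long dbId) {
        Optional<String> db = getDbForReplay(dbId);
        if (!db.isPresent()) {
            return false; // database missing: skip this journal entry
        }
        return true;      // database found: the entry would be applied here
    }
}
```

With only database id 1 registered, `replayUnchecked(2L)` throws `NoSuchElementException`, while `replayChecked(2L)` returns `false`.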

TODO:
add a Hive event listener test suite in external p0

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@morningman morningman changed the title from [fix](catalog) fix wrong check when using to [fix](catalog) fix wrong check when using "use_meta_cache=true" Jun 19, 2024
@morningman

run buildall

@doris-robot

TPC-H: Total hot run time: 39812 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d2d7f0483efe5dfcd9e378f0a466122d95b138dd, data reload: false

------ Round 1 ----------------------------------
q1	17617	4581	4263	4263
q2	2025	193	193	193
q3	10464	1083	1117	1083
q4	10191	751	815	751
q5	7482	2634	2633	2633
q6	221	138	142	138
q7	957	615	593	593
q8	9243	2048	2079	2048
q9	8713	6456	6432	6432
q10	8892	3703	3734	3703
q11	457	234	235	234
q12	443	232	234	232
q13	17766	2979	2988	2979
q14	273	219	225	219
q15	527	494	491	491
q16	519	400	375	375
q17	954	709	683	683
q18	8128	7483	7451	7451
q19	5406	1413	1470	1413
q20	650	325	337	325
q21	4944	3233	3850	3233
q22	397	340	346	340
Total cold run time: 116269 ms
Total hot run time: 39812 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4365	4224	4237	4224
q2	378	284	269	269
q3	3002	2736	2725	2725
q4	1875	1572	1602	1572
q5	5270	5289	5294	5289
q6	219	129	135	129
q7	2101	1766	1720	1720
q8	3175	3303	3322	3303
q9	8287	8234	8275	8234
q10	3856	3636	3709	3636
q11	586	479	481	479
q12	762	630	582	582
q13	16404	3027	3006	3006
q14	291	254	257	254
q15	518	475	477	475
q16	477	403	438	403
q17	1770	1481	1490	1481
q18	7670	7520	7278	7278
q19	1718	1546	1616	1546
q20	1957	1818	1754	1754
q21	4715	4708	4739	4708
q22	628	539	558	539
Total cold run time: 70024 ms
Total hot run time: 53606 ms

@doris-robot

TPC-DS: Total hot run time: 172020 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d2d7f0483efe5dfcd9e378f0a466122d95b138dd, data reload: false

query1	925	385	367	367
query2	6459	2453	2286	2286
query3	6645	203	210	203
query4	19239	17338	17398	17338
query5	4133	468	468	468
query6	251	159	158	158
query7	4588	304	301	301
query8	324	279	277	277
query9	8447	2397	2387	2387
query10	614	297	297	297
query11	10651	10100	10080	10080
query12	134	92	85	85
query13	1654	378	376	376
query14	10252	6932	7238	6932
query15	239	188	192	188
query16	7807	281	274	274
query17	1656	559	558	558
query18	1927	288	284	284
query19	204	161	161	161
query20	93	82	86	82
query21	209	141	127	127
query22	4372	3950	4132	3950
query23	33656	32921	32953	32921
query24	12107	2867	2826	2826
query25	686	384	434	384
query26	1813	156	158	156
query27	3005	311	308	308
query28	7652	2039	2026	2026
query29	1258	636	612	612
query30	287	151	151	151
query31	968	749	763	749
query32	89	53	55	53
query33	765	300	284	284
query34	997	483	468	468
query35	721	611	624	611
query36	1129	965	922	922
query37	287	70	69	69
query38	2885	2743	2699	2699
query39	861	805	785	785
query40	289	129	126	126
query41	59	59	57	57
query42	127	95	105	95
query43	581	539	531	531
query44	1206	712	711	711
query45	197	167	167	167
query46	1103	761	723	723
query47	1894	1770	1767	1767
query48	359	298	299	298
query49	1199	424	403	403
query50	751	383	391	383
query51	6647	6684	6651	6651
query52	99	90	91	90
query53	355	285	284	284
query54	1057	442	432	432
query55	72	72	72	72
query56	279	268	256	256
query57	1174	1043	1035	1035
query58	248	239	246	239
query59	3477	3293	3040	3040
query60	293	268	283	268
query61	98	90	90	90
query62	680	429	431	429
query63	318	288	286	286
query64	9828	2259	1749	1749
query65	3162	3125	3087	3087
query66	1354	328	330	328
query67	15482	15086	14883	14883
query68	4632	537	536	536
query69	457	296	306	296
query70	1106	1162	1173	1162
query71	417	264	277	264
query72	7073	5597	5183	5183
query73	744	321	324	321
query74	5986	5500	5497	5497
query75	3444	2665	2647	2647
query76	2915	986	929	929
query77	450	306	349	306
query78	10410	9945	9644	9644
query79	2633	508	497	497
query80	2256	459	452	452
query81	584	223	221	221
query82	848	104	102	102
query83	291	177	173	173
query84	273	91	90	90
query85	2094	288	266	266
query86	501	295	327	295
query87	3269	3089	3066	3066
query88	4215	2345	2333	2333
query89	472	391	395	391
query90	1841	192	192	192
query91	129	99	98	98
query92	66	48	47	47
query93	2793	527	500	500
query94	1262	182	244	182
query95	414	309	310	309
query96	585	273	264	264
query97	3249	3030	3063	3030
query98	220	191	189	189
query99	1262	869	855	855
Total cold run time: 279268 ms
Total hot run time: 172020 ms

@doris-robot

ClickBench: Total hot run time: 30.24 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d2d7f0483efe5dfcd9e378f0a466122d95b138dd, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.03	0.03
query3	0.22	0.06	0.06
query4	1.67	0.09	0.08
query5	0.50	0.47	0.48
query6	1.13	0.71	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.54	0.49	0.48
query10	0.54	0.54	0.54
query11	0.16	0.11	0.11
query12	0.15	0.11	0.12
query13	0.59	0.58	0.60
query14	0.79	0.78	0.78
query15	0.83	0.81	0.81
query16	0.35	0.36	0.38
query17	1.03	0.95	0.96
query18	0.21	0.24	0.22
query19	1.78	1.66	1.67
query20	0.01	0.01	0.01
query21	15.42	0.65	0.65
query22	4.52	7.26	1.81
query23	18.26	1.34	1.27
query24	2.10	0.23	0.22
query25	0.17	0.08	0.08
query26	0.26	0.17	0.18
query27	0.09	0.09	0.08
query28	13.27	1.01	1.00
query29	12.64	3.39	3.34
query30	0.26	0.07	0.07
query31	2.86	0.40	0.37
query32	3.27	0.47	0.46
query33	2.91	2.83	2.94
query34	17.16	4.44	4.36
query35	4.49	4.44	4.43
query36	0.65	0.46	0.47
query37	0.18	0.15	0.16
query38	0.15	0.14	0.14
query39	0.04	0.03	0.04
query40	0.17	0.14	0.14
query41	0.09	0.05	0.05
query42	0.06	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.75 s
Total hot run time: 30.24 s


@Jibing-Li Jibing-Li left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jun 19, 2024

PR approved by at least one committer and no changes requested.


PR approved by anyone and no changes requested.

@morningman morningman merged commit 34d3e91 into apache:master Jun 19, 2024
29 of 32 checks passed
dataroaring pushed a commit that referenced this pull request Jun 21, 2024
morningman added a commit that referenced this pull request Jul 15, 2024
1. Redundant hashCode computation in FunctionCallExpr.java
This line is redundant and causes the number of hashCode
calculations to grow exponentially.

2. A potential deadlock in the external catalog
    The following conditions may cause this deadlock:
    - high-frequency loads.
    - querying an external table joined with an inner table on a non-master FE.
    - refreshing the external catalog frequently.

3. A workaround to avoid FE restart failure because a db is not found
    The failure was introduced by #33610 and fixed in #36530.
    This PR lets the FE restart successfully with the previously written metadata.
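
Item 1 above can be illustrated with a small sketch of how one redundant pass over child hashes compounds in a nested expression tree. This is not the `FunctionCallExpr` code: `HashCountSketch`, `Node`, `chain`, and `countCalls` are hypothetical names, and the tree is assumed to be a simple chain in which each node has exactly one child.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical expression tree that counts hashCode() invocations.
class HashCountSketch {
    static long calls = 0;

    static class Node {
        final List<Node> children = new ArrayList<>();
        final boolean redundant;

        Node(boolean redundant) {
            this.redundant = redundant;
        }

        @Override
        public int hashCode() {
            calls++;
            // List.hashCode() hashes every child once.
            int h = 31 * children.hashCode();
            if (redundant) {
                // Redundant second pass: every child is hashed again.
                h = 31 * h + children.hashCode();
            }
            return h;
        }
    }

    // Builds a chain of `depth` nodes, each with a single child.
    static Node chain(int depth, boolean redundant) {
        Node root = new Node(redundant);
        Node cur = root;
        for (int i = 1; i < depth; i++) {
            Node child = new Node(redundant);
            cur.children.add(child);
            cur = child;
        }
        return root;
    }

    // Counts how many hashCode() calls one hash of the root triggers.
    static long countCalls(int depth, boolean redundant) {
        calls = 0;
        chain(depth, redundant).hashCode();
        return calls;
    }
}
```

For a chain of depth 10, `countCalls(10, false)` is 10, while `countCalls(10, true)` is 1023 (2^10 - 1), because each level doubles the work of the level below it.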
seawinde pushed a commit to seawinde/doris that referenced this pull request Jul 17, 2024
dataroaring pushed a commit that referenced this pull request Jul 17, 2024
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.4-merged dev/3.0.0-merged p0_b reviewed

5 participants