[fix](scan) catch exceptions thrown in scanner #36101

Merged (1 commit) on Jun 11, 2024

Conversation

mrhhsg
Member

@mrhhsg mrhhsg commented Jun 11, 2024

Proposed changes

Uncaught exceptions thrown in the scanner cause the BE to crash. This PR catches them.
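As a rough illustration of the idea (not the code from this PR), an exception that escapes a worker thread's entry function terminates the whole process, so the task body can be wrapped and the exception converted into an error value. The `Status` type and `run_scan_task` helper below are hypothetical names invented for this sketch; they are not Doris APIs.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical status type, standing in for whatever error type the real scheduler uses.
struct Status {
    bool ok = true;
    std::string msg;

    static Status OK() { return Status(); }
    static Status InternalError(const std::string& m) {
        Status s;
        s.ok = false;
        s.msg = m;
        return s;
    }
};

// Hypothetical helper: runs one unit of scanner work and converts any exception
// into a Status, so nothing propagates out of the worker thread's entry function.
Status run_scan_task(const std::function<Status()>& task) {
    try {
        return task();
    } catch (const std::exception& e) {
        return Status::InternalError(std::string("scanner threw: ") + e.what());
    } catch (...) {
        return Status::InternalError("scanner threw an unknown exception");
    }
}
```

A caller would pass the scan step as a callable and check the returned `Status`; a task that throws `std::runtime_error("bad block")` yields a failed status carrying that message instead of taking the process down.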

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@mrhhsg
Member Author

mrhhsg commented Jun 11, 2024

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

Contributor

@dataroaring dataroaring left a comment

LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the "approved" label (indicates a PR has been approved by one committer) on Jun 11, 2024
Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 41139 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bb61bf44430667c1ae2a495039aa11b924b36f03, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17617	4366	4277	4277
q2	2031	188	183	183
q3	10465	1180	1174	1174
q4	10185	862	850	850
q5	7467	2680	2575	2575
q6	222	139	141	139
q7	947	602	606	602
q8	9234	2051	2077	2051
q9	9017	6523	6476	6476
q10	9001	3713	3699	3699
q11	461	244	241	241
q12	429	245	223	223
q13	18819	2977	2980	2977
q14	275	222	224	222
q15	527	464	466	464
q16	481	388	376	376
q17	971	705	645	645
q18	8351	7960	7833	7833
q19	8759	1462	1494	1462
q20	683	334	329	329
q21	5148	4012	4037	4012
q22	402	368	329	329
Total cold run time: 121492 ms
Total hot run time: 41139 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4625	4486	4335	4335
q2	363	261	273	261
q3	3116	2895	2889	2889
q4	1938	1709	1754	1709
q5	5516	5545	5551	5545
q6	218	126	132	126
q7	2242	1845	1781	1781
q8	3290	3431	3403	3403
q9	8673	8755	8698	8698
q10	4054	3874	3892	3874
q11	593	510	492	492
q12	832	606	623	606
q13	17008	3036	3130	3036
q14	311	265	288	265
q15	518	482	495	482
q16	477	430	423	423
q17	1794	1510	1491	1491
q18	8193	7769	7753	7753
q19	1759	1583	1675	1583
q20	3165	1904	1863	1863
q21	5985	4768	4870	4768
q22	704	545	539	539
Total cold run time: 75374 ms
Total hot run time: 55922 ms

@doris-robot

TPC-DS: Total hot run time: 172369 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bb61bf44430667c1ae2a495039aa11b924b36f03, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	825	382	375	375
query2	6397	2331	2272	2272
query3	6401	205	218	205
query4	19487	17545	17326	17326
query5	3889	463	469	463
query6	243	163	169	163
query7	4453	297	305	297
query8	306	288	273	273
query9	8334	2380	2366	2366
query10	565	313	296	296
query11	10419	10015	10041	10015
query12	130	87	85	85
query13	1567	367	346	346
query14	9972	7062	6386	6386
query15	241	186	182	182
query16	7183	266	256	256
query17	1386	517	509	509
query18	1901	267	268	267
query19	195	158	156	156
query20	88	83	82	82
query21	224	156	131	131
query22	4372	4123	3962	3962
query23	33618	33648	33639	33639
query24	11095	2859	2865	2859
query25	678	387	392	387
query26	1405	152	149	149
query27	3018	324	321	321
query28	7513	2035	2027	2027
query29	977	681	607	607
query30	228	147	148	147
query31	928	742	728	728
query32	88	52	54	52
query33	752	272	270	270
query34	950	488	468	468
query35	718	634	609	609
query36	1086	924	942	924
query37	156	67	67	67
query38	2846	2742	2740	2740
query39	847	780	800	780
query40	211	126	127	126
query41	52	45	48	45
query42	123	98	97	97
query43	579	538	532	532
query44	1159	731	721	721
query45	197	168	161	161
query46	1068	688	728	688
query47	1928	1736	1772	1736
query48	384	304	301	301
query49	886	409	398	398
query50	760	375	383	375
query51	6867	6712	6673	6673
query52	100	91	93	91
query53	356	295	282	282
query54	869	446	433	433
query55	73	76	71	71
query56	280	254	261	254
query57	1155	1080	1009	1009
query58	249	258	245	245
query59	3354	3295	3076	3076
query60	302	269	260	260
query61	112	83	83	83
query62	640	448	451	448
query63	321	282	296	282
query64	8950	2193	1733	1733
query65	3185	3079	3092	3079
query66	1057	319	337	319
query67	15409	14922	14950	14922
query68	5862	541	530	530
query69	715	433	376	376
query70	1187	1104	1044	1044
query71	465	280	274	274
query72	7974	5506	5648	5506
query73	768	328	327	327
query74	5844	5472	5553	5472
query75	3911	2669	2636	2636
query76	3717	888	921	888
query77	583	296	305	296
query78	10515	9985	9890	9890
query79	2113	512	516	512
query80	1292	460	447	447
query81	590	223	219	219
query82	1389	105	99	99
query83	231	167	165	165
query84	260	82	88	82
query85	1520	296	273	273
query86	467	315	326	315
query87	3222	3119	3056	3056
query88	3971	2340	2357	2340
query89	482	371	393	371
query90	1867	195	189	189
query91	125	97	98	97
query92	68	51	52	51
query93	2159	490	488	488
query94	1238	203	188	188
query95	395	320	306	306
query96	596	261	263	261
query97	3167	3016	3029	3016
query98	214	199	203	199
query99	1289	837	829	829
Total cold run time: 274303 ms
Total hot run time: 172369 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.44% (8977/24634)
Line Coverage: 27.97% (73368/262317)
Region Coverage: 27.36% (38016/138942)
Branch Coverage: 23.99% (19305/80474)
Coverage Report: http://coverage.selectdb-in.cc/coverage/bb61bf44430667c1ae2a495039aa11b924b36f03_bb61bf44430667c1ae2a495039aa11b924b36f03/report/index.html

@doris-robot

ClickBench: Total hot run time: 30.37 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bb61bf44430667c1ae2a495039aa11b924b36f03, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.03	0.03
query2	0.08	0.04	0.05
query3	0.23	0.05	0.06
query4	1.68	0.10	0.10
query5	0.51	0.49	0.51
query6	1.13	0.72	0.73
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.53	0.48	0.48
query10	0.55	0.54	0.54
query11	0.16	0.11	0.12
query12	0.15	0.11	0.12
query13	0.61	0.59	0.61
query14	0.78	0.78	0.80
query15	0.84	0.82	0.82
query16	0.36	0.36	0.37
query17	1.02	0.99	0.95
query18	0.24	0.24	0.26
query19	1.77	1.75	1.81
query20	0.02	0.01	0.01
query21	15.41	0.65	0.65
query22	3.87	8.36	1.59
query23	18.29	1.38	1.31
query24	2.12	0.21	0.23
query25	0.15	0.07	0.08
query26	0.28	0.17	0.17
query27	0.09	0.08	0.09
query28	13.19	1.02	1.00
query29	12.65	3.31	3.25
query30	0.26	0.06	0.06
query31	2.86	0.40	0.39
query32	3.25	0.48	0.48
query33	2.89	2.91	2.93
query34	17.23	4.40	4.42
query35	4.55	4.53	4.53
query36	0.65	0.46	0.48
query37	0.18	0.15	0.16
query38	0.15	0.14	0.14
query39	0.04	0.04	0.03
query40	0.17	0.15	0.14
query41	0.10	0.06	0.05
query42	0.06	0.05	0.05
query43	0.04	0.03	0.04
Total cold run time: 109.26 s
Total hot run time: 30.37 s

@mrhhsg mrhhsg merged commit da7269c into apache:master Jun 11, 2024
27 of 31 checks passed
@mrhhsg mrhhsg deleted the catch_scanner_exception branch June 11, 2024 08:22
dataroaring pushed a commit that referenced this pull request Jun 13, 2024
## Proposed changes

The uncaught exceptions thrown in the scanner will cause the BE to
crash.

dataroaring pushed a commit that referenced this pull request Jul 2, 2024
## Proposed changes

```
F20240628 01:49:16.382710 4183685 delete_handler.cpp:388] Check failed: !_is_inited reinitialize delete handler.
*** Check failure stack trace: ***
    @     0x55700470e3c6  google::LogMessage::SendToLog()
    @     0x55700470ae10  google::LogMessage::Flush()
    @     0x55700470ec09  google::LogMessageFatal::~LogMessageFatal()
    @     0x556fccf40e64  doris::DeleteHandler::init()
    @     0x556fcff46678  doris::TabletReader::_init_delete_condition()
    @     0x556fcff3a2dd  doris::TabletReader::_init_params()
    @     0x556fcff39432  doris::TabletReader::init()
    @     0x556fffb8c2dd  doris::vectorized::BlockReader::init()
    @     0x557002cca96a  doris::vectorized::NewOlapScanner::open()
    @     0x556fe892d565  doris::vectorized::ScannerScheduler::_scanner_scan()
    @     0x556fe8931a0f  _ZNSt17_Function_handlerIFvvEZZN5doris10vectorized16ScannerScheduler6submitESt10shared_ptrINS2_14ScannerContextEES4_INS2_8ScanTaskEEENK3$_1clEvEUlvE_E9_M_invokeERKSt9_Any_data
    @     0x556fd0ed95dc  doris::ThreadPool::dispatch_thread()
    @     0x556fd0eb1288  doris::Thread::supervise_thread()
    @     0x7f95143b5609  start_thread
    @     0x7f9514662133  clone
    @              (nil)  (unknown)
*** Query id: c389fc2a1ff6473c-a06f032b8970810c ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1719510556 (unix time) try "date -d @1719510556" if you are using GNU date ***
*** Current BE git commitID: b13c17d ***
*** SIGABRT unknown detail explain (@0x3fca33) received by PID 4180531 (TID 4183685 OR 0x7f89734a5700) from PID 4180531; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:421
 1# 0x00007F9514586090 in /lib/x86_64-linux-gnu/libc.so.6
 2# raise at ../sysdeps/unix/sysv/linux/raise.c:51
 3# abort at /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:81
 4# 0x0000557004718C9D in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
 5# 0x000055700470B2DA in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
 6# google::LogMessage::SendToLog() in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
 7# google::LogMessage::Flush() in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
 8# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
 9# doris::DeleteHandler::init(std::shared_ptr<doris::TabletSchema>, std::vector<std::shared_ptr<doris::RowsetMeta>, std::allocator<std::shared_ptr<doris::RowsetMeta> > > const&, long, bool) at /home/zcp/repo_center/doris_master/doris/be/src/olap/delete_handler.cpp:388
10# doris::TabletReader::_init_delete_condition(doris::TabletReader::ReaderParams const&) at /home/zcp/repo_center/doris_master/doris/be/src/olap/tablet_reader.cpp:654
11# doris::TabletReader::_init_params(doris::TabletReader::ReaderParams const&) at /home/zcp/repo_center/doris_master/doris/be/src/olap/tablet_reader.cpp:295
12# doris::TabletReader::init(doris::TabletReader::ReaderParams const&) at /home/zcp/repo_center/doris_master/doris/be/src/olap/tablet_reader.cpp:128
13# doris::vectorized::BlockReader::init(doris::TabletReader::ReaderParams const&) in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
14# doris::vectorized::NewOlapScanner::open(doris::RuntimeState*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/new_olap_scanner.cpp:219
15# doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:250
16# std::_Function_handler<void (), doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_1::operator()() const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
17# doris::ThreadPool::dispatch_thread() in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
18# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_master/doris/be/src/util/thread.cpp:499
19# start_thread at /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:478
20# __clone at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

```

related PRs: #36090,
#36101,
#36314
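The trace above ends in a glog `Check failed` abort (SIGABRT), not a C++ exception, so a try/catch around the scanner cannot intercept it. One way to make a repeated initialization recoverable is to return an error instead of asserting. The sketch below shows that pattern only; `InitStatus` and `OnceInitHandler` are hypothetical names, and this is not the change made in the referenced commits.

```cpp
#include <cassert>
#include <string>

// Hypothetical status type for this sketch only.
struct InitStatus {
    bool ok = true;
    std::string msg;
};

// Hypothetical stand-in for a handler that must be initialized exactly once.
class OnceInitHandler {
public:
    // Returns an error on a second call instead of aborting the process.
    InitStatus init() {
        InitStatus st;
        if (_is_inited) {
            st.ok = false;
            st.msg = "reinitialize delete handler";
            return st;
        }
        _is_inited = true;
        return st;
    }

    bool is_inited() const { return _is_inited; }

private:
    bool _is_inited = false;
};
```

With this shape, a caller that reopens a scanner sees a failed `InitStatus` on the second `init()` and can fail the query rather than the whole process.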
mrhhsg added a commit to mrhhsg/doris that referenced this pull request Jul 8, 2024
## Proposed changes

The uncaught exceptions thrown in the scanner will cause the BE to
crash.

mrhhsg added a commit to mrhhsg/doris that referenced this pull request Jul 11, 2024
## Proposed changes

The uncaught exceptions thrown in the scanner will cause the BE to
crash.

mrhhsg added a commit that referenced this pull request Jul 12, 2024
## Proposed changes

pick #36101

The uncaught exceptions thrown in the scanner will cause the BE to
crash.
dataroaring pushed a commit that referenced this pull request Jul 17, 2024
Labels
approved (indicates a PR has been approved by one committer), dev/2.1.5-merged, dev/3.0.0-merged, reviewed
5 participants