Graphd Crashed down with single big query #5615
Labels: affects/none, process/fixed, severity/none, type/bug
Describe the bug (required)
Your Environments (required)
How To Reproduce (required)
Steps to reproduce the behavior:
Change the graphd memory watermark configuration:
--system_memory_high_watermark_ratio=0.5
--enable_space_level_metrics=false
--memory_tracker_limit_ratio=0.5
--memory_tracker_untracked_reserved_memory_mb=1024
--memory_tracker_detail_log=true
--memory_tracker_detail_log_interval_ms=3000
Execute the query:
MATCH p=(src)-[e:ScheduleTaskRelationship*10]-(dst) where id(src) in [1113869279948709888] RETURN DISTINCT p LIMIT 300
graphd crashes. The kernel log (/var/log/messages) shows:
/var/log/messages-Jun 30 13:42:23 kernel: [ 7847] 0 7847 53675 278 58 0 0 sudo
/var/log/messages-Jun 30 13:42:23 kernel: [ 7848] 0 7848 28919 159 13 0 0 bash
/var/log/messages-Jun 30 13:42:23 kernel: [ 8263] 0 8263 4194482 3629679 7166 0 0 nebula-graphd
/var/log/messages-Jun 30 13:42:23 kernel: [ 8376] 0 8376 27024 29 9 0 0 tail
/var/log/messages-Jun 30 13:42:23 kernel: Out of memory: Kill process 8263 (nebula-graphd) score 882 or sacrifice child
/var/log/messages:Jun 30 13:42:23 kernel: Killed process 8263 (nebula-graphd) total-vm:16777928kB, anon-rss:14518716kB, file-rss:0kB, shmem-rss:0kB
/var/log/messages-Jun 30 13:42:23 kernel: AliYunDunUpdate invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
/var/log/messages-Jun 30 13:42:23 kernel: AliYunDunUpdate cpuset=/ mems_allowed=0
The kernel OOM killer killed the nebula-graphd process.
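As a rough check of the numbers in this report, the sketch below estimates the memory tracker limit from the flags above and compares it with the RSS reported by the OOM killer. The formula `(total - untracked reserve) * ratio` is an assumption; it happens to reproduce the 7.125GiB limit printed in the graphd log, but this report does not confirm that graphd computes it this way. The helper name `tracker_limit_bytes` is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def tracker_limit_bytes(total_bytes, untracked_reserved_mb, limit_ratio):
    """Assumed formula: (total memory - untracked reserve) * ratio."""
    return (total_bytes - untracked_reserved_mb * MIB) * limit_ratio


# Total memory as printed in the graphd log ("sys: .../15.250GiB").
total = 15.25 * GIB

# Flags from the reproduction steps:
#   --memory_tracker_untracked_reserved_memory_mb=1024
#   --memory_tracker_limit_ratio=0.5
limit = tracker_limit_bytes(total, untracked_reserved_mb=1024, limit_ratio=0.5)
limit_gib = limit / GIB

# RSS at kill time, from the kernel log line "anon-rss:14518716kB".
oom_rss_gib = 14518716 * 1024 / GIB

# How far the process was above the assumed tracker limit when it was killed.
overshoot = oom_rss_gib / limit_gib

print(f"assumed tracker limit: {limit_gib:.3f} GiB")
print(f"RSS at OOM kill:       {oom_rss_gib:.3f} GiB")
print(f"RSS / limit:           {overshoot:.2f}x")
```

Under that assumption, the process was holding close to twice the configured tracker limit when the kernel killed it, which matches the complaint that the limit was not enforced for this query.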
Expected behavior
When a query needs a lot of memory during execution, graphd should stop the query instead of being killed by the OOM killer.
Additional context
We did not find any relevant information in the error log.
The graphd INFO log is below:
20230630 13:41:51.127751 8315 MemoryUtils.cpp:227] sys:1.434GiB/15.250GiB 9.41% usr:31.000MiB/7.125GiB 0.42%
I20230630 13:41:54.127897 8315 MemoryUtils.cpp:227] sys:1.434GiB/15.250GiB 9.41% usr:31.000MiB/7.125GiB 0.42%
I20230630 13:41:57.129904 8315 MemoryUtils.cpp:227] sys:1.434GiB/15.250GiB 9.40% usr:31.000MiB/7.125GiB 0.42%
I20230630 13:42:00.128515 8315 MemoryUtils.cpp:227] sys:1.437GiB/15.250GiB 9.42% usr:31.000MiB/7.125GiB 0.42%
I20230630 13:42:00.460376 8313 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230630 13:42:00.460444 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:00.460459 8292 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230630 13:42:00.461441 8292 MetaClient.cpp:2680] Metad last update time: 1688024786386
I20230630 13:42:02.843515 8282 GraphService.cpp:77] Authenticating user root from 192.168.28.30:53815
I20230630 13:42:02.843633 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:02.843653 8312 MetaClient.cpp:730] Send request to meta 172.20.221.5:9559
I20230630 13:42:02.844628 8282 GraphSessionManager.cpp:139] Create session id: 1688103841596307, for user: root
I20230630 13:42:02.844677 8282 GraphService.cpp:111] Create session doFinish
I20230630 13:42:02.856063 8282 GraphSessionManager.cpp:40] Find session from cache: 1688103841596307
I20230630 13:42:02.856137 8283 ClientSession.cpp:43] Add query: USE data_asset_10022 epId: 0
I20230630 13:42:02.856153 8283 QueryInstance.cpp:80] Parsing query: USE data_asset_10022;
I20230630 13:42:02.856284 8283 Symbols.cpp:48] New variable for: __Start_0
I20230630 13:42:02.856295 8283 PlanNode.cpp:27] New variable: __Start_0
I20230630 13:42:02.856319 8283 Symbols.cpp:48] New variable for: __RegisterSpaceToSession_1
I20230630 13:42:02.856325 8283 PlanNode.cpp:27] New variable: __RegisterSpaceToSession_1
I20230630 13:42:02.856338 8283 Validator.cpp:409] root: RegisterSpaceToSession tail: Start
I20230630 13:42:02.856627 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:02.856642 8292 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230630 13:42:02.856972 8283 SwitchSpaceExecutor.cpp:45] Graph switched to data_asset_10022, space id: 127
I20230630 13:42:02.856995 8283 QueryInstance.cpp:128] Finish query: USE data_asset_10022;
I20230630 13:42:02.857013 8283 ClientSession.cpp:52] Delete query, epId: 0
I20230630 13:42:02.868083 8283 GraphSessionManager.cpp:40] Find session from cache: 1688103841596307
I20230630 13:42:02.868115 8283 ClientSession.cpp:43] Add query: MATCH p=(src)-[e:ScheduleTaskRelationship*10]-(dst) where id(src) in [1113869279948709888] RETURN DISTINCT p LIMIT 300, epId: 1
I20230630 13:42:02.868131 8283 QueryInstance.cpp:80] Parsing query: MATCH p=(src)-[e:ScheduleTaskRelationship*10]-(dst) where id(src) in [1113869279948709888] RETURN DISTINCT p LIMIT 300
I20230630 13:42:02.868371 8283 Symbols.cpp:48] New variable for: __Start_0
I20230630 13:42:02.868379 8283 PlanNode.cpp:27] New variable: __Start_0
I20230630 13:42:02.868388 8283 Validator.cpp:350] Space chosen, name: data_asset_10022 id: 127
I20230630 13:42:02.868533 8283 Symbols.cpp:48] New variable for: __VAR_0
I20230630 13:42:02.868541 8283 AnonVarGenerator.h:28] Build anon var: __VAR_0
I20230630 13:42:02.868553 8283 Symbols.cpp:48] New variable for: __PassThrough_1
I20230630 13:42:02.868558 8283 PlanNode.cpp:27] New variable: __PassThrough_1
I20230630 13:42:02.868566 8283 Symbols.cpp:48] New variable for: __Dedup_2
I20230630 13:42:02.868570 8283 PlanNode.cpp:27] New variable: __Dedup_2
I20230630 13:42:02.868579 8283 MatchPathPlanner.cpp:126] Find starts: 0, Pattern has 1 edges, root: __Dedup_2, colNames: _vid
I20230630 13:42:02.868587 8283 Symbols.cpp:48] New variable for: __Start_3
I20230630 13:42:02.868590 8283 PlanNode.cpp:27] New variable: __Start_3
I20230630 13:42:02.868599 8283 Symbols.cpp:48] New variable for: __Traverse_4
I20230630 13:42:02.868604 8283 PlanNode.cpp:27] New variable: __Traverse_4
I20230630 13:42:02.868779 8283 Symbols.cpp:48] New variable for: __AppendVertices_5
I20230630 13:42:02.868788 8283 PlanNode.cpp:27] New variable: __AppendVertices_5
I20230630 13:42:02.868871 8283 Symbols.cpp:48] New variable for: __Project_6
I20230630 13:42:02.868876 8283 PlanNode.cpp:27] New variable: __Project_6
I20230630 13:42:02.868893 8283 Symbols.cpp:48] New variable for: __Project_7
I20230630 13:42:02.868897 8283 PlanNode.cpp:27] New variable: __Project_7
I20230630 13:42:02.868913 8283 Symbols.cpp:48] New variable for: __Dedup_8
I20230630 13:42:02.868917 8283 PlanNode.cpp:27] New variable: __Dedup_8
I20230630 13:42:02.868925 8283 Symbols.cpp:48] New variable for: __Limit_9
I20230630 13:42:02.868929 8283 PlanNode.cpp:27] New variable: __Limit_9
I20230630 13:42:02.868935 8283 ReturnClausePlanner.cpp:52] return root: __Limit_9 colNames: p
I20230630 13:42:02.868942 8283 MatchPlanner.cpp:172] root(Limit_9): __Limit_9, tail(Start_3): __Start_3
I20230630 13:42:02.868948 8283 Validator.cpp:409] root: Limit tail: Start
I20230630 13:42:02.868955 8283 Validator.cpp:409] root: Limit tail: Start
I20230630 13:42:02.869010 8283 Symbols.cpp:48] New variable for: __Project_10
I20230630 13:42:02.869016 8283 PlanNode.cpp:27] New variable: __Project_10
I20230630 13:42:02.869575 8312 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.57":9779, trying to create one
I20230630 13:42:02.869596 8312 ThriftClientManager-inl.h:74] Connecting to "172.20.221.57":9779 for 1 times
I20230630 13:42:02.873762 8292 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.59":9779, trying to create one
I20230630 13:42:02.873785 8292 ThriftClientManager-inl.h:74] Connecting to "172.20.221.59":9779 for 1 times
I20230630 13:42:02.874086 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.874369 8292 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.58":9779, trying to create one
I20230630 13:42:02.874384 8292 ThriftClientManager-inl.h:74] Connecting to "172.20.221.58":9779 for 2 times
I20230630 13:42:02.913040 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.913278 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230630 13:42:02.913506 8312 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.58":9779, trying to create one
I20230630 13:42:02.913524 8312 ThriftClientManager-inl.h:74] Connecting to "172.20.221.58":9779 for 2 times
I20230630 13:42:02.937062 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230630 13:42:02.937290 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.937518 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230630 13:42:02.961143 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.961340 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230630 13:42:02.961562 8312 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.59":9779, trying to create one
I20230630 13:42:02.961585 8312 ThriftClientManager-inl.h:74] Connecting to "172.20.221.59":9779 for 3 times
I20230630 13:42:02.975271 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9779
I20230630 13:42:02.975541 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230630 13:42:02.975596 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230630 13:42:02.979934 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.59":9779
I20230630 13:42:02.980085 8292 ThriftClientManager-inl.h:53] There is no existing client to "172.20.221.57":9779, trying to create one
I20230630 13:42:02.980099 8292 ThriftClientManager-inl.h:74] Connecting to "172.20.221.57":9779 for 3 times
I20230630 13:42:02.980264 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.58":9779
I20230630 13:42:03.128459 8315 MemoryUtils.cpp:227] sys:1.539GiB/15.250GiB 10.09% usr:129.000MiB/7.125GiB 1.77%
I20230630 13:42:06.127883 8315 MemoryUtils.cpp:227] sys:3.634GiB/15.250GiB 23.83% usr:2.183GiB/7.125GiB 30.63%
I20230630 13:42:09.127878 8315 MemoryUtils.cpp:227] sys:5.809GiB/15.250GiB 38.09% usr:4.315GiB/7.125GiB 60.57%
I20230630 13:42:10.471643 8313 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230630 13:42:10.471740 8292 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:10.471755 8292 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230630 13:42:10.472738 8292 MetaClient.cpp:2680] Metad last update time: 1688024786386
I20230630 13:42:12.128513 8315 MemoryUtils.cpp:227] sys:8.024GiB/15.250GiB 52.62% usr:6.502GiB/7.125GiB 91.25%
I20230630 13:42:15.128669 8315 MemoryUtils.cpp:227] sys:10.238GiB/15.250GiB 67.13% usr:8.684GiB/7.125GiB 121.87%
I20230630 13:42:18.129509 8315 MemoryUtils.cpp:227] sys:12.467GiB/15.250GiB 81.75% usr:10.882GiB/7.125GiB 152.72%
I20230630 13:42:20.483639 8313 MetaClient.cpp:2662] Send heartbeat to "172.20.221.57":9559, clusterId 0
I20230630 13:42:20.483723 8312 ThriftClientManager-inl.h:47] Getting a client to "172.20.221.57":9559
I20230630 13:42:20.483740 8312 MetaClient.cpp:730] Send request to meta "172.20.221.57":9559
I20230630 13:42:20.485126 8312 MetaClient.cpp:2680] Metad last update time: 1688024786386
I20230630 13:42:21.128391 8315 MemoryUtils.cpp:227] sys:14.712GiB/15.250GiB 96.47% usr:13.093GiB/7.125GiB 183.76%
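The `MemoryUtils.cpp` lines in the log above show tracked usage (`usr:`) passing its own limit with no query being aborted. A minimal sketch for pulling that out of a graphd INFO log follows. The regular expression is written against the line format seen in this report only, and the names `PATTERN`, `tracker_usage` and `SAMPLE` are hypothetical helpers, not part of NebulaGraph.

```python
import re

UNIT = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}

# Matches the memory tracker detail lines as they appear in this report, e.g.
# "13:42:15.128669 8315 MemoryUtils.cpp:227] sys:... 67.13% usr:8.684GiB/7.125GiB"
PATTERN = re.compile(
    r"(\d{2}:\d{2}:\d{2})\.\d+ \d+ MemoryUtils\.cpp:\d+\] "
    r"sys:\S+ \S+ usr:([\d.]+)([KMG]iB)/([\d.]+)([KMG]iB)"
)


def tracker_usage(log_text):
    """Return a list of (time, used / limit) for each memory detail line."""
    samples = []
    for match in PATTERN.finditer(log_text):
        time, used, used_unit, limit, limit_unit = match.groups()
        used_bytes = float(used) * UNIT[used_unit]
        limit_bytes = float(limit) * UNIT[limit_unit]
        samples.append((time, used_bytes / limit_bytes))
    return samples


# Four lines copied from the log in this report.
SAMPLE = """\
I20230630 13:42:03.128459 8315 MemoryUtils.cpp:227] sys:1.539GiB/15.250GiB 10.09% usr:129.000MiB/7.125GiB 1.77%
I20230630 13:42:12.128513 8315 MemoryUtils.cpp:227] sys:8.024GiB/15.250GiB 52.62% usr:6.502GiB/7.125GiB 91.25%
I20230630 13:42:15.128669 8315 MemoryUtils.cpp:227] sys:10.238GiB/15.250GiB 67.13% usr:8.684GiB/7.125GiB 121.87%
I20230630 13:42:21.128391 8315 MemoryUtils.cpp:227] sys:14.712GiB/15.250GiB 96.47% usr:13.093GiB/7.125GiB 183.76%
"""

usage = tracker_usage(SAMPLE)

# First sample at which tracked usage exceeded the tracker limit.
first_over = next((t for t, ratio in usage if ratio > 1.0), None)

for time, ratio in usage:
    flag = "  <-- over limit" if ratio > 1.0 else ""
    print(f"{time}  usr/limit = {ratio:.2%}{flag}")
```

On these samples the limit is first exceeded at 13:42:15, about 8 seconds before the OOM kill at 13:42:23, and usage keeps growing in every later sample.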
[Screenshot: dashboard memory usage]
Nebula discussion forum thread:
https://discuss.nebula-graph.com.cn/t/topic/13424/15