[Bug] HugeGraph task scheduling mechanism #2389

Closed
1 task done
xiaoleizi2016 opened this issue Dec 15, 2023 · 2 comments · Fixed by #2401
Labels
bug Something isn't working

Comments

@xiaoleizi2016
Contributor

xiaoleizi2016 commented Dec 15, 2023

Bug Type

None

Before submitting

  • I have searched the existing issues and FAQ and confirmed that there is no identical or duplicate problem.

Environment

  • Server Version: v0.11.2
  • Backend: RocksDB
  • OS: Ubuntu 2x.x
  • Data Size: about 10 million vertices, 90 million edges

Expected & Actual behavior

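Expected: when the process restarts, HugeGraph attempts to restore the asynchronous tasks that existed before the restart and, once they are restored, starts normally.
Actual:
1. If restoring the previously existing asynchronous tasks fails, server startup is aborted. On the next restart, startup is aborted again by the same restore failure, so the server can never recover.
2. Task restoration takes the priority of task statuses into account, but the order in which tasks are processed at submission time makes it possible for some tasks to be scheduled more than once.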
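For point 1, one possible mitigation is to isolate failures per task so that a single task that cannot be restored does not abort startup. The following is only a minimal sketch of that idea, not HugeGraph code: the class `TolerantRestore`, the method `restoreAll`, and the `restore` callback are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of HugeGraph.
final class TolerantRestore {

    /**
     * Calls restore for every task. A task whose restore throws is logged
     * and collected; the exception is not propagated to the caller.
     *
     * @return the tasks that could not be restored
     */
    static <T> List<T> restoreAll(List<T> tasks, Consumer<T> restore) {
        List<T> failed = new ArrayList<>();
        for (T task : tasks) {
            try {
                restore.accept(task);
            } catch (RuntimeException e) {
                // Log and continue, so one bad task cannot block startup.
                System.err.println("Failed to restore task " + task +
                                   ": " + e.getMessage());
                failed.add(task);
            }
        }
        return failed;
    }
}
```

The returned list lets the caller decide what to do with the failed tasks afterwards (retry later, mark them as failed, or report them), instead of making that decision by crashing.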

Exception stack trace
#tailf deploy-stderr.log
at org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:269)
at com.baidu.hugegraph.server.RestServer.start(RestServer.java:64)
at com.baidu.hugegraph.server.RestServer.start(RestServer.java:133)
at com.baidu.hugegraph.dist.HugeRestServer.start(HugeRestServer.java:58)
at com.baidu.hugegraph.dist.HugeGraphServer.<init>(HugeGraphServer.java:55)
at com.baidu.hugegraph.dist.HugeGraphServer.main(HugeGraphServer.java:100)
Dec 15, 2023 4:19:32 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:9080]
Dec 15, 2023 4:19:33 PM org.glassfish.grizzly.http.server.NetworkListener shutdownNow
INFO: Stopped listener bound to [0.0.0.0:9080]

Exception in thread "main" java.lang.IllegalArgumentException: Task '37474' is already in the queue
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:163)
at com.baidu.hugegraph.util.E.checkArgument(E.java:56)
at com.baidu.hugegraph.task.StandardTaskScheduler.restore(StandardTaskScheduler.java:206)
at com.baidu.hugegraph.task.StandardTaskScheduler.restoreTasks(StandardTaskScheduler.java:194)
at com.baidu.hugegraph.StandardHugeGraph.serverStarted(StandardHugeGraph.java:223)
at com.baidu.hugegraph.auth.HugeGraphAuthProxy.serverStarted(HugeGraphAuthProxy.java:601)
at com.baidu.hugegraph.core.GraphManager.serverStarted(GraphManager.java:237)
at com.baidu.hugegraph.core.GraphManager.<init>(GraphManager.java:78)
at com.baidu.hugegraph.server.ApplicationConfig$GraphManagerFactory$1.onEvent(ApplicationConfig.java:108)
at org.glassfish.jersey.server.internal.monitoring.CompositeApplicationEventListener.onEvent(CompositeApplicationEventListener.java:74)
at org.glassfish.jersey.server.internal.monitoring.MonitoringContainerListener.onStartup(MonitoringContainerListener.java:81)
at org.glassfish.jersey.server.ApplicationHandler.onStartup(ApplicationHandler.java:1180)
at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.start(GrizzlyHttpContainer.java:357)
at org.glassfish.grizzly.http.server.HttpHandlerChain.start(HttpHandlerChain.java:398)
at org.glassfish.grizzly.http.server.HttpServer.setupHttpHandler(HttpServer.java:293)
at org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:269)
at com.baidu.hugegraph.server.RestServer.start(RestServer.java:64)
at com.baidu.hugegraph.server.RestServer.start(RestServer.java:133)
at com.baidu.hugegraph.dist.HugeRestServer.start(HugeRestServer.java:58)
at com.baidu.hugegraph.dist.HugeGraphServer.<init>(HugeGraphServer.java:55)
at com.baidu.hugegraph.dist.HugeGraphServer.main(HugeGraphServer.java:100)

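Potential bug: the RESTORING tasks fetched in step 1 may, once submitted, be scheduled and move to the RUNNING state. The step-2 query for the list of RUNNING tasks then includes the RESTORING tasks already fetched in step 1, so the task ID is duplicated and server startup fails.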
image
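The sequence described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the actual `StandardTaskScheduler` code: `RestoreRaceDemo`, its in-memory `store` and `queue`, and the `strict` flag are all hypothetical, and the scheduler is assumed to pick up a submitted task immediately.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation, not HugeGraph code.
final class RestoreRaceDemo {

    enum Status { RESTORING, RUNNING }

    // Stand-in for the persisted task store: task id -> status.
    final Map<Long, Status> store = new LinkedHashMap<>();
    // Stand-in for the scheduler's in-memory queue of task ids.
    final Set<Long> queue = new LinkedHashSet<>();

    // Returns a copy of the ids that currently have the given status.
    List<Long> query(Status status) {
        List<Long> ids = new ArrayList<>();
        for (Map.Entry<Long, Status> e : store.entrySet()) {
            if (e.getValue() == status) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // Queues a task. In strict mode a duplicate id is an error;
    // otherwise it is silently skipped.
    void submit(long id, boolean strict) {
        if (!queue.add(id)) {
            if (strict) {
                throw new IllegalArgumentException(
                          "Task '" + id + "' is already in the queue");
            }
            return;
        }
        // Assumption: the task is picked up at once and becomes RUNNING.
        store.put(id, Status.RUNNING);
    }

    // Two-step restore: query RESTORING tasks first, then RUNNING tasks.
    void restoreTasks(boolean strict) {
        for (Status status : new Status[]{Status.RESTORING, Status.RUNNING}) {
            for (long id : query(status)) {
                submit(id, strict);
            }
        }
    }
}
```

With a single task `37474` in RESTORING status, `restoreTasks(true)` fails in the second step with the same message as in the stack trace, while `restoreTasks(false)` skips the duplicate and leaves exactly one entry in the queue.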

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

xiaoleizi2016 added the bug label Dec 15, 2023
@imbajin
Member

imbajin commented Dec 15, 2023

Could you check the latest code at https://github.com/apache/incubator-hugegraph/tree/pd-store to see whether the same problem still exists?

@xiaoleizi2016
Contributor Author

xiaoleizi2016 commented Dec 18, 2023

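Judging from the code, the problem exists. Tasks are not all queried at once; instead, one batch is processed before the next batch is queried. After the statuses of the processed tasks have changed, step 2 may query the same task IDs again.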
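One way to avoid re-reading a task whose status changed between queries is to collect the IDs for every status first and only then submit them. The sketch below illustrates that idea only; it is not a description of how the linked fix works. `SnapshotRestore`, `snapshot`, and the `queryByStatus` callback are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not part of HugeGraph.
final class SnapshotRestore {

    /**
     * Queries the task ids for each status, in the given priority order,
     * before any task is submitted. A LinkedHashSet keeps the first
     * occurrence of each id and drops later duplicates.
     */
    static <S> List<Long> snapshot(List<S> statusesInPriorityOrder,
                                   Function<S, List<Long>> queryByStatus) {
        Set<Long> ids = new LinkedHashSet<>();
        for (S status : statusesInPriorityOrder) {
            ids.addAll(queryByStatus.apply(status));
        }
        return new ArrayList<>(ids);
    }
}
```

The trade-off is that all IDs are held in memory at once, which gives up the benefit of processing in batches. If that is not acceptable, keeping a set of already-submitted IDs while paging would be an alternative.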
