[Bug] hugeserver backend with postgresql schedule gremlin job unit test failed #2648

Closed
1 task done
JackyYangPassion opened this issue Aug 23, 2024 · 1 comment · Fixed by #2643
Labels
bug Something isn't working gremlin TinkerPop gremlin

Comments

@JackyYangPassion
Contributor

JackyYangPassion commented Aug 23, 2024

Bug Type

None

Before submit

  • I have searched the existing issues and FAQ and confirmed that there is no identical or duplicate problem

Environment

Expected & Actual behavior

Current problem

testGremlinJobAndCancel(org.apache.hugegraph.core.TaskCoreTest) fails when HugeServer uses PostgreSQL as the backend.

2024-08-20 02:49:47 [server-info-db-worker-1] [INFO] o.a.h.b.s.m.MysqlSessions - Connect to the jdbc url: 'jdbc:postgresql://localhost:5432/hugegraph?loggerLevel=OFF&characterEncoding=utf-8&rewriteBatchedStatements=true&useServerPrepStmts=false&autoReconnect=true&maxReconnects=3&initialTimeout=3&useSSL=false'
2024-08-20 02:49:49 [task-worker-1] [ERROR] o.a.h.t.TaskCallable - Failed to save task with error "java.lang.IllegalStateException: Can't find task scheduler for graph 'standardhugegraph[hugegraph]'": {task_name=test-gremlin-job, task_progress=0, task_create=2024-08-20T02:49:37.021+0000, task_status=success, task_update=2024-08-20T02:49:49.221+0000, task_retries=0, id=1, task_type=gremlin, task_server=server-test}
2024-08-20 02:49:49 [task-worker-1] [ERROR] o.a.h.t.HugeTask - An exception occurred when calling done()
java.lang.IllegalStateException: Can't find task scheduler for graph 'standardhugegraph[hugegraph]'
	at com.google.common.base.Preconditions.checkState(Preconditions.java:532) ~[guava-30.0-jre.jar:?]
	at org.apache.hugegraph.util.E.checkState(E.java:64) ~[hugegraph-common-1.3.0.jar:?]
	at org.apache.hugegraph.StandardHugeGraph.taskScheduler(StandardHugeGraph.java:1078) ~[classes/:?]
	at org.apache.hugegraph.task.TaskCallable.save(TaskCallable.java:107) ~[classes/:?]
	at org.apache.hugegraph.job.UserJob.done(UserJob.java:33) ~[classes/:?]
	at org.apache.hugegraph.task.HugeTask.done(HugeTask.java:362) ~[classes/:?]
	at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:381) ~[?:?]
	at java.util.concurrent.FutureTask.set(FutureTask.java:232) ~[?:?]
	at org.apache.hugegraph.task.HugeTask.set(HugeTask.java:378) ~[classes/:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:272) ~[?:?]
	at org.apache.hugegraph.task.HugeTask.run(HugeTask.java:307) ~[classes/:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]

Error:  Tests run: 731, Failures: 1, Errors: 10, Skipped: 48, Time elapsed: 288.808 s <<< FAILURE! - in org.apache.hugegraph.core.CoreTestSuite
Error:  testGremlinJobAndCancel(org.apache.hugegraph.core.TaskCoreTest)  Time elapsed: 0.129 s  <<< FAILURE!
java.lang.AssertionError: expected:<CANCELLING> but was:<RUNNING>
	at org.apache.hugegraph.core.TaskCoreTest.testGremlinJobAndCancel(TaskCoreTest.java:560)

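Root cause

The time gap between the main thread and the scheduler thread is too small. When the backend store is PostgreSQL, the task state machine takes longer to move from SCHEDULING to RUNNING, so the main thread issues the cancel while the task is still in SCHEDULING.

Detailed logs
With sleep 100 ms: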

2024-08-23 09:22:59 [task-scheduler-1] [INFO] o.a.h.t.ServerInfoManager - Switch only_single_node to true
2024-08-23 09:22:59 [task-scheduler-1] [INFO] o.a.h.t.StandardTaskScheduler - Scheduled task '1' to server 'server-test' at 2024-08-23 09:22:59.066
2024-08-23 09:22:59 [task-scheduler-1] [INFO] o.a.h.t.StandardTaskScheduler - running task '1' on server 'server-test' at 2024-08-23 09:22:59.077
2024-08-23 09:22:59 [main] [INFO] o.a.h.t.StandardTaskScheduler - Cancel task '1' in status SCHEDULING at 2024-08-23 09:22:59.086
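
The timing problem in the log above can be reproduced in isolation. The following is a minimal sketch and not HugeGraph code: `CancelRaceSketch`, `Status`, and `observeAfterSleep` are hypothetical names, and the 300 ms transition delay is an assumed stand-in for a slow backend write.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CancelRaceSketch {

    // Simplified stand-in for the two task states named in the logs
    enum Status { SCHEDULING, RUNNING }

    /**
     * Starts a worker that flips the status to RUNNING after transitionMillis
     * (an assumed model of a slow backend), then returns the status the
     * calling thread observes after sleeping mainSleepMillis.
     */
    static Status observeAfterSleep(long transitionMillis, long mainSleepMillis) {
        AtomicReference<Status> status = new AtomicReference<>(Status.SCHEDULING);
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(transitionMillis);
                status.set(Status.RUNNING);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            Thread.sleep(mainSleepMillis);
            // Read the status at the moment the "cancel" would be issued
            Status seen = status.get();
            worker.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        // Short wait: the caller reads the status before the transition happens
        System.out.println("short sleep sees " + observeAfterSleep(300, 50));
        // Longer wait: the transition has completed by the time the caller looks
        System.out.println("long sleep sees " + observeAfterSleep(300, 800));
    }
}
```

With the shorter wait the caller reads the status before the worker has updated it, which mirrors the "Cancel task '1' in status SCHEDULING" line.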

Solution

Moderately increase the main thread's wait time. The result after the fix is as follows:

With sleep 1000 ms:

2024-08-22 20:27:55 [task-scheduler-1] [INFO] o.a.h.t.StandardTaskScheduler - Scheduled task '1' to server 'server-test' at 2024-08-22 20:27:55.298
2024-08-22 20:27:55 [task-scheduler-1] [INFO] o.a.h.t.StandardTaskScheduler - running task '1' on server 'server-test' at 2024-08-22 20:27:55.310
2024-08-22 20:27:56 [main] [INFO] o.a.h.t.StandardTaskScheduler - Cancel task '1' in status RUNNING at 2024-08-22 20:27:56.088
@JackyYangPassion JackyYangPassion added the bug Something isn't working label Aug 23, 2024

dosubot bot commented Aug 23, 2024

To address the unit test failure in testGremlinJobAndCancel when using PostgreSQL as the backend in HugeServer, you can increase the sleep duration to allow more time for the task status to transition from SCHEDULING to RUNNING. This adjustment helps accommodate the longer time PostgreSQL may take for this transition.

Here's the modified part of the testGremlinJobAndCancel method with an increased sleep duration:

@Test
public void testGremlinJobAndCancel() throws TimeoutException {
    HugeGraph graph = graph();
    TaskScheduler scheduler = graph.taskScheduler();

    HugeTask<Object> task = runGremlinJob("Thread.sleep(1000 * 10);");

    // Increase sleep duration to 1000 ms
    sleepAWhile(1000);
    task = scheduler.task(task.id());
    scheduler.cancel(task);

    task = scheduler.task(task.id());
    Assert.assertEquals(TaskStatus.CANCELLING, task.status());

    task = scheduler.waitUntilTaskCompleted(task.id(), 10);
    Assert.assertEquals(TaskStatus.CANCELLED, task.status());
    Assert.assertEquals("test-gremlin-job", task.name());
    Assert.assertTrue(task.result(), task.result() == null ||
                                     task.result().endsWith("InterruptedException"));

    // Cancel success task
    HugeTask<Object> task2 = runGremlinJob("1+2");
    task2 = scheduler.waitUntilTaskCompleted(task2.id(), 10);
    Assert.assertEquals(TaskStatus.SUCCESS, task2.status());
    scheduler.cancel(task2);
    task2 = scheduler.task(task2.id());
    Assert.assertEquals(TaskStatus.SUCCESS, task2.status());
    Assert.assertEquals("3", task2.result());
}

This change should help mitigate the timing issue observed with PostgreSQL, allowing the test to pass successfully [1].
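
A fixed sleep still depends on the backend being fast enough. As an alternative that this issue does not claim was adopted, a test could poll until the expected status appears. The sketch below is a generic, self-contained helper under that assumption: `StatusPolling` and `waitUntilEquals` are hypothetical names and are not part of HugeGraph.

```java
import java.util.function.Supplier;

public class StatusPolling {

    /**
     * Polls the supplier until it returns a value equal to expected, or until
     * timeoutMillis has elapsed. Returns true if the value was seen in time.
     */
    static <T> boolean waitUntilEquals(Supplier<T> current, T expected,
                                       long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (expected.equals(current.get())) {
                return true;
            }
            // Overflow-safe comparison of nanoTime values
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Hypothetical status source: reports RUNNING 200 ms after start
        Supplier<String> status = () ->
                System.currentTimeMillis() - start >= 200 ? "RUNNING" : "SCHEDULING";
        boolean reached = waitUntilEquals(status, "RUNNING", 2000, 20);
        System.out.println("reached RUNNING: " + reached);
    }
}
```

In the test above, the supplier could wrap `scheduler.task(task.id()).status()` and the expected value could be `TaskStatus.RUNNING`, replacing the `sleepAWhile(1000)` call. That wiring is an untested assumption here.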


@dosubot dosubot bot added the gremlin TinkerPop gremlin label Aug 23, 2024
imbajin pushed a commit that referenced this issue Aug 28, 2024