JanusGraph image stop responding after query timeout #2120

Open
doryosef opened this issue May 24, 2020 · 10 comments · Fixed by #3990 or #4425

Comments

@doryosef

I'm using the default Docker image janusgraph/janusgraph:latest (BerkeleyDB and Lucene)
and connecting with the Gremlin Console.

When the JanusGraph server exceeds its 'evaluationTimeout', the server stops responding.

Server error:

java.util.concurrent.TimeoutException: Evaluation exceeded the configured 'evaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [g.V()]
        at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$1(GremlinExecutor.java:316)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.lang.Thread.run(Thread.java:748)
1318978 [pool-6-thread-1] WARN  org.janusgraph.diskstorage.log.kcvs.KCVSLog  - Could not read messages for timestamp [2020-05-24T10:12:30.449Z] (this read will be retried)
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:56)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:158)
        at org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller.run(KCVSLog.java:725)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.janusgraph.diskstorage.PermanentBackendException: Could not start BerkeleyJE transaction
        at org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:163)
        at org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:47)
        at org.janusgraph.diskstorage.keycolumnvalue.keyvalue.OrderedKeyValueStoreManagerAdapter.beginTransaction(OrderedKeyValueStoreManagerAdapter.java:68)
        at org.janusgraph.diskstorage.log.kcvs.KCVSLog.openTx(KCVSLog.java:319)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:145)
        at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:161)
        at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
        ... 9 more
Caused by: com.sleepycat.je.ThreadInterruptedException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 18.3.12) /var/lib/janusgraph/data java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
        at com.sleepycat.je.ThreadInterruptedException.wrapSelf(ThreadInterruptedException.java:105)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkOpen(EnvironmentImpl.java:1844)
        at com.sleepycat.je.Environment.checkOpen(Environment.java:2697)
        at com.sleepycat.je.Environment.beginTransactionInternal(Environment.java:1409)
        at com.sleepycat.je.Environment.beginTransaction(Environment.java:1383)
        at org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:146)
        ... 16 more
Caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 18.3.12) /var/lib/janusgraph/data java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
        at com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:67)
        at com.sleepycat.je.tree.IN.latch(IN.java:547)
        at com.sleepycat.je.dbi.CursorImpl.latchBIN(CursorImpl.java:402)
        at com.sleepycat.je.dbi.CursorImpl.cloneCursor(CursorImpl.java:230)
        at com.sleepycat.je.Cursor.beginMoveCursor(Cursor.java:5252)
        at com.sleepycat.je.Cursor.beginMoveCursor(Cursor.java:5259)
        at com.sleepycat.je.Cursor.retrieveNextNoDups(Cursor.java:3550)
        at com.sleepycat.je.Cursor.retrieveNext(Cursor.java:3312)
        at com.sleepycat.je.Cursor.getInternal(Cursor.java:1313)
        at com.sleepycat.je.Cursor.get(Cursor.java:1244)
        at com.sleepycat.je.Cursor.getNext(Cursor.java:1512)

After the query was sent to the server and the timeout was exceeded, other queries that had worked before get the same response:

Evaluation exceeded the configured 'evaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [g.V().limit(4).valueMap()]: null - try increasing the timeout with the :remote command
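The stack trace suggests a possible chain: the timeout fires inside GremlinExecutor, the thread running the evaluation is interrupted, and BerkeleyJE treats an interrupt during an operation as fatal (ThreadInterruptedException, "Environment is invalid and must be closed"). Below is a minimal plain-Java sketch of only the first half of that chain, assuming the timeout is enforced by cancelling the evaluation's Future with mayInterruptIfRunning set to true. It uses only java.util.concurrent, not TinkerPop or JanusGraph code; the class name and slowStorageRead are hypothetical stand-ins for a blocking backend call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class InterruptOnTimeoutDemo {

    // Hypothetical stand-in for a blocking backend call (e.g. a latch wait inside a storage engine).
    static String slowStorageRead() throws InterruptedException {
        Thread.sleep(10_000);
        return "completed";
    }

    // Runs one "evaluation" with a 100 ms timeout and reports what the worker thread observed.
    public static String run() throws Exception {
        ExecutorService evalPool = Executors.newSingleThreadExecutor();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CompletableFuture<String> observed = new CompletableFuture<>();
        try {
            Future<?> evaluation = evalPool.submit(() -> {
                started.countDown();
                try {
                    observed.complete(slowStorageRead());
                } catch (InterruptedException e) {
                    // The timeout reached the worker as a thread interrupt, not a clean stop.
                    observed.complete("interrupted");
                }
            });
            started.await(); // make sure the evaluation is actually running before the timer starts
            // Assumed timeout behaviour: cancel the evaluation and interrupt its thread.
            timer.schedule(() -> evaluation.cancel(true), 100, TimeUnit.MILLISECONDS);
            return observed.get(5, TimeUnit.SECONDS);
        } finally {
            evalPool.shutdownNow();
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints "interrupted"
    }
}
```

Under that assumption, any blocking call that honours interrupts, such as the BerkeleyJE latch acquisition in the trace above, sees the interrupt directly. That would be consistent with the Environment being invalidated and every later request failing.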

@cbobed

cbobed commented Jun 1, 2021

I've found the same behaviour in 0.5.3 when submitting scripts both from the console and through a driver connection. Once the server hits a timeout, it stops answering and reports a timeout for every subsequent request.

I can confirm that it happens with Berkeley + ES in versions 0.5.2 and 0.5.3 (with Cassandra + ES in those versions, it doesn't happen).

@Omig12

We've encountered the same issue using the full release of version 0.5.3 with the Cassandra + ES backend, connecting through the JavaScript driver, the Python driver, and the gremlin.sh Groovy console.

@mohamad-haddad-tribo

I faced the same issue on 0.6 (latest) + Cassandra + ES.

I have no idea why. Is there any update on how to overcome it? I had to remove all my data and then rerun the engine to get it working. Does that mean the data is corrupted?

@farodin91
Contributor

@mohamad-haddad-tribo The exception above clearly comes from BerkeleyDB. I'm sure you did not get a BerkeleyDB exception in a Cassandra setup.

@javiramos1

I'm also running into the same issue with Cassandra during data ingestion using concurrent inserts.

@jldevezas

I am using JanusGraph 0.6.0 and can confirm this is still an issue with BerkeleyDB. Once this error occurs, the server cannot recover from it. (P.S.: I know 0.6.1 has been released, but I was encountering issues with it, so I'm sticking with 0.6.0.)

@delenius

delenius commented Nov 1, 2022

Same for us on the in-memory backend :(

mad added a commit to mad/janusgraph that referenced this issue Sep 15, 2023
mad added a commit to mad/janusgraph that referenced this issue Sep 16, 2023
mad added a commit to mad/janusgraph that referenced this issue Sep 18, 2023
mad added a commit to mad/janusgraph that referenced this issue Sep 18, 2023
li-boxuan added this to the Release v1.0.0 milestone Oct 6, 2023
li-boxuan pushed a commit that referenced this issue Oct 6, 2023
janusgraph-automations pushed a commit that referenced this issue Oct 6, 2023
Fixes #2120

Signed-off-by: Pavel Ershov <[email protected]>
(cherry picked from commit cdea0d7)
janusgraph-automations pushed a commit that referenced this issue Oct 6, 2023
Fixes #2120

Signed-off-by: Pavel Ershov <[email protected]>
(cherry picked from commit cdea0d7)
li-boxuan reopened this Oct 6, 2023
@m-thirumal

The same issue occurs with the Cassandra backend in 1.0.0.

@tien
Contributor

tien commented Apr 24, 2024

I've just run into this issue with version 1.0.0.

@porunov
Member

porunov commented Oct 18, 2024

PR #4425 has been reverted by PR #4702.
Thus, I'm reopening this issue.

porunov reopened this Oct 18, 2024
porunov removed this from the Release v1.1.0 milestone Oct 18, 2024