Bulk index tasks stuck forever #31099
Comments
Pinging @elastic/es-core-infra
Do you need more details? Or should I run some commands on the cluster? I also found out recently that this bug is responsible for stuck relocations. Somehow the ongoing bulk operation prevents the relocation from finishing completely, and the workaround I found is to close / re-open the index.
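For anyone wanting to reproduce the diagnosis and the workaround above, here is a minimal sketch using the 6.x TransportClient; the connected client and the index name are placeholders, and the `indices:data/write/bulk*` pattern is assumed to match both the coordinating task and its shard-level sub-tasks:

```java
import java.util.concurrent.TimeUnit;

import org.elasticsearch.action.admin.cluster.node.tasks.list.ListTasksResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.tasks.TaskInfo;

public class StuckBulkWorkaround {
    // List bulk-related tasks with their running time, then apply the
    // close / re-open workaround described in the comment above.
    public static void apply(Client client, String index) {
        ListTasksResponse tasks = client.admin().cluster().prepareListTasks()
                .setActions("indices:data/write/bulk*") // coordinator + shard-level sub-tasks
                .setDetailed(true)
                .get();
        for (TaskInfo task : tasks.getTasks()) {
            long minutes = TimeUnit.NANOSECONDS.toMinutes(task.getRunningTimeNanos());
            System.out.println(task.getTaskId() + " " + task.getAction()
                    + " running for " + minutes + "m");
        }
        // Closing and re-opening the index aborts the stuck shard-level
        // operations and lets a blocked relocation finish.
        client.admin().indices().prepareClose(index).get();
        client.admin().indices().prepareOpen(index).get();
    }
}
```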
@bleskes this seems more of a distributed systems issue? Can you follow up?
@jeancornic what's peculiar about this issue is that the task
@bleskes sure, a JSON extract follows (
Pinging @elastic/es-distributed
@jeancornic thanks. It seems these are stuck waiting for a replica that never responded. The two tickets that you mention are similar in nature, but I don't see anything here that points in that direction. I understand that you see this regularly; is that true? If so, can you maybe upgrade to 6.3.0 and see how it goes? It will help reduce the scope of the search. Some more generic questions about your setup:
@bleskes Thanks for those comments. Migrating to 6.3.0 is not very straightforward for us, to be honest.
In the logs (on the 3 nodes
Also seeing one
but 4 hours after the bulk, probably unrelated.
@jeancornic thanks. Nothing pops up directly here, which means it's going to be a long search. It's good that it happens frequently, as it greatly increases the chance of finding out what it is. The last log message you sent is important:
Can you post the full stack trace for this? Can you also check other nodes for transport-level logs that look suspicious?
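If it helps with that check, transport-level logging can be raised dynamically through a transient cluster settings update. A minimal sketch with the 6.x TransportClient, assuming an already connected client:

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

public class TransportTraceLogging {
    // Raise transport logging to TRACE cluster-wide so suspicious
    // connection/request logs show up on every node.
    public static void enable(Client client) {
        client.admin().cluster().prepareUpdateSettings()
                .setTransientSettings(Settings.builder()
                        .put("logger.org.elasticsearch.transport", "TRACE")
                        .build())
                .get();
    }

    // Revert to the default log level by nulling out the setting.
    public static void disable(Client client) {
        client.admin().cluster().prepareUpdateSettings()
                .setTransientSettings(Settings.builder()
                        .putNull("logger.org.elasticsearch.transport")
                        .build())
                .get();
    }
}
```

TRACE on the transport layer is very verbose, so it is best enabled only while reproducing the stuck bulk and reverted right after.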
We also encountered the same problem: this task caused very high load on several nodes, and then the cluster could not accept writes. Restarting the high-load nodes let it recover. Is there a solution? Thank you.
@linyy2233 it looks like you're having a different (unrelated) problem, possibly caused by having a very large JSON number in your document? Can you open a topic on discuss.elastic.co please for further investigation? Thanks. Also, as there has been no reply on the original ticket, I'm closing this.
@jeancornic Did you ever get more info on this? Experiencing the same.
For the case where tasks are stuck on replicas (Tasks are organized in a tree hierarchy, and the leaves are tasks with an action name |
Thanks, I changed this parameter to false and it no longer appears.
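To see which shard-level (leaf) tasks are stuck on replicas, a sketch along these lines can help. It assumes a connected 6.x TransportClient and filters on the `[r]` suffix that replica-side shard bulk actions carry (e.g. `indices:data/write/bulk[s][r]`):

```java
import org.elasticsearch.action.admin.cluster.node.tasks.list.ListTasksResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.tasks.TaskInfo;

public class ReplicaTaskInspector {
    // Print replica-side leaf tasks together with their parent task id,
    // so stuck replica operations can be traced back to the coordinating
    // bulk task in the tree.
    public static void printReplicaTasks(Client client) {
        ListTasksResponse response = client.admin().cluster().prepareListTasks()
                .setDetailed(true)
                .get();
        for (TaskInfo task : response.getTasks()) {
            if (task.getAction().endsWith("[r]")) {
                System.out.println(task.getParentTaskId() + " -> "
                        + task.getTaskId() + " " + task.getAction());
            }
        }
    }
}
```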
We experience the same problem on 6.8.7. Is there any generic workaround to delete stuck tasks? Example:
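As far as I know there is no way to force-delete a task; the closest generic attempt is task cancellation, sketched below with the TransportClient (the node id and task number are placeholders taken from the tasks API output). Note that only actions implemented as cancellable honor this, so if the stuck tasks ignore the cancel, the close/re-open workaround or a restart of the affected node remains the fallback:

```java
import org.elasticsearch.action.admin.cluster.node.tasks.cancel.CancelTasksResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.tasks.TaskId;

public class TaskCanceller {
    // Ask the cluster to cancel one task, identified by the
    // "<nodeId>:<taskNumber>" pair reported by the tasks API.
    public static void cancel(Client client, String nodeId, long taskNumber) {
        CancelTasksResponse response = client.admin().cluster().prepareCancelTasks()
                .setTaskId(new TaskId(nodeId, taskNumber))
                .get();
        // Non-cancellable tasks simply won't appear in this list.
        System.out.println("Cancellation acknowledged for: " + response.getTasks());
    }
}
```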
Original issue
Elasticsearch version (bin/elasticsearch --version): 6.1.3
Plugins installed: []
JVM version (java -version):
OS version (uname -a if on a Unix-like system):
Description of the problem including expected versus actual behavior:
Seeing a weird behaviour, with bulk index actions that never finish. They are triggered by the Elasticsearch Java client (version 6.1.3), through a BulkRequestBuilder, with the default timeout (1m). It seems we never enter the ActionListener callbacks. I found out recently that it's linked with tasks running indefinitely. When checking the /_tasks API, I'm seeing tasks that have been running for several days (!).
I would expect the bulk action to run for 1 minute and then time out.
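For context, the bulk submission looks roughly like this. This is a minimal sketch against the 6.x TransportClient with a placeholder index name and document, not the reporter's actual code:

```java
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.xcontent.XContentType;

public class BulkSubmit {
    // Build a bulk request with the default 1m timeout and an async
    // listener; in the failure mode described here, neither callback
    // ever fires even though the timeout should have expired.
    public static void submit(Client client) {
        BulkRequestBuilder bulk = client.prepareBulk()
                .setTimeout("1m"); // the default, shown explicitly
        bulk.add(client.prepareIndex("my-index", "_doc")
                .setSource("{\"field\":\"value\"}", XContentType.JSON));
        bulk.execute(new ActionListener<BulkResponse>() {
            @Override
            public void onResponse(BulkResponse response) {
                System.out.println("bulk took " + response.getTook());
            }

            @Override
            public void onFailure(Exception e) {
                e.printStackTrace();
            }
        });
    }
}
```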
I came across several issues / PRs on github:
Steps to reproduce: I have no idea; it happens randomly, since we migrated from 5.6 to 6.1.3.
Provide logs (if relevant): found nothing relevant in either the ES or the client logs.