Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task fails silently during parallel _reindex operation #33764

Closed
Katulus opened this issue Sep 17, 2018 · 5 comments
Closed

Task fails silently during parallel _reindex operation #33764

Katulus opened this issue Sep 17, 2018 · 5 comments
Assignees
Labels
>bug :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level.

Comments

@Katulus
Copy link

Katulus commented Sep 17, 2018

Elasticsearch version: Version: 6.4.0, Build: default/tar/595516e/2018-08-17T23:18:47.308994Z, JVM: 10.0.2 (Note: This is official docker container docker.elastic.co/elasticsearch/elasticsearch:6.4.0)

Plugins installed: [og9AEaa ingest-geoip(6.4.0), og9AEaa ingest-user-agent (6.4.0)]

JVM version: 10.0.2

OS version: Linux 45660be5076f 4.15.0-34-generic #37-Ubuntu SMP Mon Aug 27 15:21:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

I'm running reindex operation from remote ES 1.4 cluster via Reindex API. Since remote reindex does not support slicing I'm doing manual slicing using query and limiting remote search by min < document_timestamp && document_timestamp < max and using wait_for_completion=false to generate multiple parallel tasks.
When I use up to 8 manual slices everything seems to be working fine. But when I double to 16 slices (source index has 100,000 documents so each slice has 6250 documents) the target ES 6.4 at one point loses track of reindex task.
It first correctly creates 16 reindexing tasks and returns their IDs, then it correctly gives me status of all 16 tasks when I ask for it via http://localhost:9206/_tasks/<task:id> (I have container port 9200 mapped to local 9206) but when I ask for status few times one of those 16 tasks returns 404 - not found.

Sample of reindexing script log (the problematic task is HI4YEIRMRuSdwsE4bCK7Bg:197):

... create 16 reindexing tasks ...

2018-09-17 13:48:23 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:23 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:192'}
2018-09-17 13:48:23 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:23 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:193'}
2018-09-17 13:48:23 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:23 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:194'}
2018-09-17 13:48:23 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:23 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:195'}
2018-09-17 13:48:23 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:23 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:196'}
2018-09-17 13:48:23 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:23 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:197'}
2018-09-17 13:48:23 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:23 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:198'}
2018-09-17 13:48:23 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:23 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:199'}
2018-09-17 13:48:23 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:23 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:200'}
2018-09-17 13:48:24 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:24 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:201'}
2018-09-17 13:48:24 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:24 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:202'}
2018-09-17 13:48:24 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:24 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:203'}
2018-09-17 13:48:24 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:24 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:204'}
2018-09-17 13:48:24 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:24 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:205'}
2018-09-17 13:48:24 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:24 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:208'}
2018-09-17 13:48:24 DEBUG http://localhost:9206 "POST /_reindex?pretty&refresh&wait_for_completion=false&requests_per_second=-1 HTTP/1.1" 200 70
2018-09-17 13:48:24 DEBUG {'task': 'HI4YEIRMRuSdwsE4bCK7Bg:209'}

... now ask every 5 seconds for tasks status ...

2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:192 HTTP/1.1" 200 576
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:192: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":192,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536826733750,\n            \"gte\" : 1536820483000\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184902870,"running_time_in_nanos":2272736951,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:193 HTTP/1.1" 200 576
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:193: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":193,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536832984500,\n            \"gte\" : 1536826733750\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903030,"running_time_in_nanos":2151889086,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:194 HTTP/1.1" 200 576
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:194: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":194,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536839235250,\n            \"gte\" : 1536832984500\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903061,"running_time_in_nanos":2159403370,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:195 HTTP/1.1" 200 576
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:195: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":195,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536845486000,\n            \"gte\" : 1536839235250\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903104,"running_time_in_nanos":2125611320,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:196 HTTP/1.1" 200 577
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:196: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":196,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536851736750,\n            \"gte\" : 1536845486000\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903172,"running_time_in_nanos":2076257258,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:197 HTTP/1.1" 200 574
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:197: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":197,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536857987500,\n            \"gte\" : 1536851736750\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903328,"running_time_in_nanos":1931714207,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:198 HTTP/1.1" 200 577
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:198: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":198,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536864238250,\n            \"gte\" : 1536857987500\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903489,"running_time_in_nanos":1780568785,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:199 HTTP/1.1" 200 576
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:199: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":199,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536870489000,\n            \"gte\" : 1536864238250\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903615,"running_time_in_nanos":1678313831,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:200 HTTP/1.1" 200 576
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:200: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":200,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536876739750,\n            \"gte\" : 1536870489000\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903722,"running_time_in_nanos":1584805836,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:201 HTTP/1.1" 200 576
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:201: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":201,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536882990500,\n            \"gte\" : 1536876739750\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903964,"running_time_in_nanos":1380729600,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:202 HTTP/1.1" 200 575
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:202: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":202,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536889241250,\n            \"gte\" : 1536882990500\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184904100,"running_time_in_nanos":1259823857,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:203 HTTP/1.1" 200 574
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:203: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":203,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536895492000,\n            \"gte\" : 1536889241250\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184904160,"running_time_in_nanos":1225550074,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:204 HTTP/1.1" 200 575
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:204: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":204,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536901742750,\n            \"gte\" : 1536895492000\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184904394,"running_time_in_nanos":1013116012,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:205 HTTP/1.1" 200 573
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:205: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":205,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536907993500,\n            \"gte\" : 1536901742750\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184904537,"running_time_in_nanos":896998560,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:208 HTTP/1.1" 200 574
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:208: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":208,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536914244250,\n            \"gte\" : 1536907993500\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184904632,"running_time_in_nanos":861332226,"cancellable":true,"headers":{}}}
2018-09-17 13:48:25 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:209 HTTP/1.1" 200 572
2018-09-17 13:48:25 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:209: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":209,"type":"transport","action":"indices:data/write/reindex","status":{"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536920495001,\n            \"gte\" : 1536914244250\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184904850,"running_time_in_nanos":664256833,"cancellable":true,"headers":{}}}

... second, and third round are the same, all tasks report status ...
... now 4th round ...

2018-09-17 13:48:39 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:192 HTTP/1.1" 200 581
2018-09-17 13:48:39 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:192: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":192,"type":"transport","action":"indices:data/write/reindex","status":{"total":6251,"updated":0,"created":0,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":2,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536826733750,\n            \"gte\" : 1536820483000\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184902870,"running_time_in_nanos":16132017224,"cancellable":true,"headers":{}}}
2018-09-17 13:48:39 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:193 HTTP/1.1" 200 582
2018-09-17 13:48:39 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:193: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":193,"type":"transport","action":"indices:data/write/reindex","status":{"total":6250,"updated":0,"created":0,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":2,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536832984500,\n            \"gte\" : 1536826733750\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903030,"running_time_in_nanos":15979935211,"cancellable":true,"headers":{}}}
2018-09-17 13:48:39 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:194 HTTP/1.1" 200 624
2018-09-17 13:48:39 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:194: {"completed":true,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":194,"type":"transport","action":"indices:data/write/reindex","status":{"total":6251,"updated":0,"created":6251,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536839235250,\n            \"gte\" : 1536832984500\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903061,"running_time_in_nanos":12494097836,"cancellable":true,"headers":{}},"response":{"took":12488,"timed_out":false,"total":6251,"updated":0,"created":6251,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}}
2018-09-17 13:48:39 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:195 HTTP/1.1" 200 583
2018-09-17 13:48:39 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:195: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":195,"type":"transport","action":"indices:data/write/reindex","status":{"total":6249,"updated":0,"created":0,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":1,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536845486000,\n            \"gte\" : 1536839235250\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903104,"running_time_in_nanos":16142388968,"cancellable":true,"headers":{}}}
2018-09-17 13:48:39 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:196 HTTP/1.1" 200 583
2018-09-17 13:48:39 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:196: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":196,"type":"transport","action":"indices:data/write/reindex","status":{"total":6249,"updated":0,"created":0,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":2,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536851736750,\n            \"gte\" : 1536845486000\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903172,"running_time_in_nanos":16080621544,"cancellable":true,"headers":{}}}

... and now the task HI4YEIRMRuSdwsE4bCK7Bg:197 returns "not found" instead of status as all others ...

2018-09-17 13:48:39 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:197 HTTP/1.1" 404 176
2018-09-17 13:48:39 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:197: {"error":{"root_cause":[{"type":"resource_not_found_exception","reason":"task [HI4YEIRMRuSdwsE4bCK7Bg:197] isn't running and hasn't stored its results"}],"type":"resource_not_found_exception","reason":"task [HI4YEIRMRuSdwsE4bCK7Bg:197] isn't running and hasn't stored its results"},"status":404}

... but next tasks still works correctly ...

2018-09-17 13:48:39 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:198 HTTP/1.1" 200 582
2018-09-17 13:48:39 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:198: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":198,"type":"transport","action":"indices:data/write/reindex","status":{"total":6250,"updated":0,"created":0,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":3,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536864238250,\n            \"gte\" : 1536857987500\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903489,"running_time_in_nanos":15868744097,"cancellable":true,"headers":{}}}
2018-09-17 13:48:39 DEBUG http://localhost:9206 "GET /_tasks/HI4YEIRMRuSdwsE4bCK7Bg:199 HTTP/1.1" 200 580
2018-09-17 13:48:39 DEBUG Got reindexing task result from http://localhost:9206/_tasks/HI4YEIRMRuSdwsE4bCK7Bg:199: {"completed":false,"task":{"node":"HI4YEIRMRuSdwsE4bCK7Bg","id":199,"type":"transport","action":"indices:data/write/reindex","status":{"total":6249,"updated":0,"created":0,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":2,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0},"description":"reindex from [host=es port=9200 query={\n  \"bool\" : {\n    \"must\" : [\n      {\n        \"term\" : {\n          \"_custid\" : 1\n        }\n      },\n      {\n        \"range\" : {\n          \"_idxts\" : {\n            \"lt\" : 1536870489000,\n            \"gte\" : 1536864238250\n          }\n        }\n      }\n    ]\n  }\n}][1.180914.1021.shared.7f7640] updated with Script{type=inline, lang='painless', idOrCode='ctx._source.remove('_id');ctx._type = \"_doc\";', options={}, params={}} to [1.180914.1021.shared.7f7640]","start_time_in_millis":1537184903615,"running_time_in_nanos":15752580914,"cancellable":true,"headers":{}}}

When I look at ES log I see that given task failed (full log at the bottom):

[2018-09-17T11:48:39,243][WARN ][o.e.t.LoggingTaskListener] 197 failed with exception
org.elasticsearch.transport.RemoteTransportException: [HI4YEIR][172.30.0.6:9300][indices:data/write/bulk[s][p]]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TransportService$7@15ac841d on EsThreadPoolExecutor[name = HI4YEIR/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@36d96df1[Running, pool size = 8, active threads = 8, queued tasks = 200, completed tasks = 239]]

As it looks to me (without knowing much about ES internals) the write thread pool is full because of running reindexing and when ES tries to something with task that operation is rejected because of full queue. I would expect that tasks just take longer to execute, not that they fail silently (looking from outside). If it happens I have to fail whole reindexing process and start over, which fails again.

Steps to reproduce:

  1. Have ES 1.4 instance with index that contains 100,000 documents, total index size is 92MB.
  2. Have ES 6.4 running in docker container having max Java heap set to 16GB
  3. Start 16 parallel reindexing tasks from 1.4 to 6.4 using request to http://localhost:9206/_reindex?pretty&refresh&wait_for_completion=false
  4. Every 5 seconds query ES 6.4 for tasks status using GET http://localhost:9206/_tasks/<part_1_of_task_id:number>
  5. As some point one of tasks query returns resource_not_found_exception
  6. Looking at ES logs there is an error
[2018-09-17T11:48:39,243][WARN ][o.e.t.LoggingTaskListener] 197 failed with exception
org.elasticsearch.transport.RemoteTransportException: [HI4YEIR][172.30.0.6:9300][indices:data/write/bulk[s][p]]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TransportService$7@15ac841d on EsThreadPoolExecutor[name = HI4YEIR/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@36d96df1[Running, pool size = 8, active threads = 8, queued tasks = 200, completed tasks = 239]]

ES 6.4 logs:

OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
[2018-09-17T11:48:07,965][INFO ][o.e.n.Node               ] [] initializing ...
[2018-09-17T11:48:08,017][INFO ][o.e.e.NodeEnvironment    ] [HI4YEIR] using [1] data paths, mounts [[/ (overlay)]], net usable_space [90.5gb], net total_space [196.4gb], types [overlay]
[2018-09-17T11:48:08,018][INFO ][o.e.e.NodeEnvironment    ] [HI4YEIR] heap size [15.9gb], compressed ordinary object pointers [true]
[2018-09-17T11:48:08,019][INFO ][o.e.n.Node               ] [HI4YEIR] node name derived from node ID [HI4YEIRMRuSdwsE4bCK7Bg]; set [node.name] to override
[2018-09-17T11:48:08,020][INFO ][o.e.n.Node               ] [HI4YEIR] version[6.4.0], pid[10], build[default/tar/595516e/2018-08-17T23:18:47.308994Z], OS[Linux/4.15.0-34-generic/amd64], JVM["Oracle Corporation"/OpenJDK 64-Bit Server VM/10.0.2/10.0.2+13]
[2018-09-17T11:48:08,020][INFO ][o.e.n.Node               ] [HI4YEIR] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.b7vtokTJ, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.locale.providers=COMPAT, -XX:UseAVX=2, -Des.cgroups.hierarchy.override=/, -Xms1g, -Xmx16g, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=tar]
[2018-09-17T11:48:08,705][WARN ][o.e.x.w.Watcher          ] the [action.auto_create_index] setting is configured to be restrictive [.security,.monitoring*,.watches,.triggered_watches,.watcher-history*,.ml*,.kibana*].  for the next 6 months daily history indices are allowed to be created, but please make sure that any future history indices after 6 months with the pattern [.watcher-history-YYYY.MM.dd] are allowed to be created
[2018-09-17T11:48:09,538][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [aggs-matrix-stats]
[2018-09-17T11:48:09,538][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [analysis-common]
[2018-09-17T11:48:09,538][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [ingest-common]
[2018-09-17T11:48:09,538][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [lang-expression]
[2018-09-17T11:48:09,538][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [lang-mustache]
[2018-09-17T11:48:09,538][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [lang-painless]
[2018-09-17T11:48:09,538][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [mapper-extras]
[2018-09-17T11:48:09,538][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [parent-join]
[2018-09-17T11:48:09,538][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [percolator]
[2018-09-17T11:48:09,538][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [rank-eval]
[2018-09-17T11:48:09,539][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [reindex]
[2018-09-17T11:48:09,539][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [repository-url]
[2018-09-17T11:48:09,539][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [transport-netty4]
[2018-09-17T11:48:09,539][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [tribe]
[2018-09-17T11:48:09,539][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [x-pack-core]
[2018-09-17T11:48:09,539][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [x-pack-deprecation]
[2018-09-17T11:48:09,539][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [x-pack-graph]
[2018-09-17T11:48:09,539][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [x-pack-logstash]
[2018-09-17T11:48:09,539][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [x-pack-ml]
[2018-09-17T11:48:09,539][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [x-pack-monitoring]
[2018-09-17T11:48:09,539][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [x-pack-rollup]
[2018-09-17T11:48:09,539][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [x-pack-security]
[2018-09-17T11:48:09,540][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [x-pack-sql]
[2018-09-17T11:48:09,540][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [x-pack-upgrade]
[2018-09-17T11:48:09,540][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded module [x-pack-watcher]
[2018-09-17T11:48:09,540][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded plugin [ingest-geoip]
[2018-09-17T11:48:09,540][INFO ][o.e.p.PluginsService     ] [HI4YEIR] loaded plugin [ingest-user-agent]
[2018-09-17T11:48:11,141][WARN ][o.e.d.s.ScriptModule     ] Script: returning default values for missing document values is deprecated. Set system property '-Des.scripting.exception_for_missing_value=true' to make behaviour compatible with future major versions.
[2018-09-17T11:48:12,930][INFO ][o.e.x.s.a.s.FileRolesStore] [HI4YEIR] parsed [0] roles from file [/usr/share/elasticsearch/config/roles.yml]
[2018-09-17T11:48:13,409][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/124] [Main.cc@109] controller (64 bit): Version 6.4.0 (Build cf8246175efff5) Copyright (c) 2018 Elasticsearch BV
[2018-09-17T11:48:13,901][INFO ][o.e.d.DiscoveryModule    ] [HI4YEIR] using discovery type [single-node]
[2018-09-17T11:48:14,537][INFO ][o.e.n.Node               ] [HI4YEIR] initialized
[2018-09-17T11:48:14,537][INFO ][o.e.n.Node               ] [HI4YEIR] starting ...
[2018-09-17T11:48:14,705][INFO ][o.e.t.TransportService   ] [HI4YEIR] publish_address {172.30.0.6:9300}, bound_addresses {0.0.0.0:9300}
[2018-09-17T11:48:14,723][WARN ][o.e.b.BootstrapChecks    ] [HI4YEIR] initial heap size [1073741824] not equal to maximum heap size [17179869184]; this can cause resize pauses and prevents mlockall from locking the entire heap
[2018-09-17T11:48:14,723][WARN ][o.e.b.BootstrapChecks    ] [HI4YEIR] max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2018-09-17T11:48:14,750][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [HI4YEIR] publish_address {172.30.0.6:9200}, bound_addresses {0.0.0.0:9200}
[2018-09-17T11:48:14,750][INFO ][o.e.n.Node               ] [HI4YEIR] started
[2018-09-17T11:48:14,813][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [HI4YEIR] Failed to clear cache for realms [[]]
[2018-09-17T11:48:14,867][INFO ][o.e.g.GatewayService     ] [HI4YEIR] recovered [0] indices into cluster_state
[2018-09-17T11:48:14,969][INFO ][o.e.c.m.MetaDataIndexTemplateService] [HI4YEIR] adding template [.watches] for index patterns [.watches*]
[2018-09-17T11:48:15,015][INFO ][o.e.c.m.MetaDataIndexTemplateService] [HI4YEIR] adding template [.watch-history-9] for index patterns [.watcher-history-9*]
[2018-09-17T11:48:15,039][INFO ][o.e.c.m.MetaDataIndexTemplateService] [HI4YEIR] adding template [.triggered_watches] for index patterns [.triggered_watches*]
[2018-09-17T11:48:15,068][INFO ][o.e.c.m.MetaDataIndexTemplateService] [HI4YEIR] adding template [.monitoring-logstash] for index patterns [.monitoring-logstash-6-*]
[2018-09-17T11:48:15,109][INFO ][o.e.c.m.MetaDataIndexTemplateService] [HI4YEIR] adding template [.monitoring-es] for index patterns [.monitoring-es-6-*]
[2018-09-17T11:48:15,132][INFO ][o.e.c.m.MetaDataIndexTemplateService] [HI4YEIR] adding template [.monitoring-alerts] for index patterns [.monitoring-alerts-6]
[2018-09-17T11:48:15,167][INFO ][o.e.c.m.MetaDataIndexTemplateService] [HI4YEIR] adding template [.monitoring-beats] for index patterns [.monitoring-beats-6-*]
[2018-09-17T11:48:15,193][INFO ][o.e.c.m.MetaDataIndexTemplateService] [HI4YEIR] adding template [.monitoring-kibana] for index patterns [.monitoring-kibana-6-*]
[2018-09-17T11:48:15,309][INFO ][o.e.l.LicenseService     ] [HI4YEIR] license [d8db3744-797c-4065-87a8-25b8958f8fd5] mode [basic] - valid
[2018-09-17T11:48:17,845][INFO ][o.e.c.m.MetaDataCreateIndexService] [HI4YEIR] [1.180914.1021.shared.79e425] creating index, cause [api], templates [], shards [40]/[0], mappings [_doc]
[2018-09-17T11:48:19,732][INFO ][o.e.c.r.a.AllocationService] [HI4YEIR] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[1.180914.1021.shared.79e425][39]] ...]).
[2018-09-17T11:48:19,847][INFO ][o.e.c.m.MetaDataCreateIndexService] [HI4YEIR] [1.180914.1021.shared.32dd90] creating index, cause [api], templates [], shards [40]/[0], mappings [_doc]
[2018-09-17T11:48:20,670][INFO ][o.e.c.m.MetaDataCreateIndexService] [HI4YEIR] [.monitoring-kibana-6-2018.09.17] creating index, cause [auto(bulk api)], templates [.monitoring-kibana], shards [1]/[0], mappings [doc]
[2018-09-17T11:48:21,397][INFO ][o.e.c.r.a.AllocationService] [HI4YEIR] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[1.180914.1021.shared.32dd90][38], [1.180914.1021.shared.32dd90][39]] ...]).
[2018-09-17T11:48:21,467][INFO ][o.e.c.m.MetaDataCreateIndexService] [HI4YEIR] [1.180914.1021.shared.7f7640] creating index, cause [api], templates [], shards [40]/[0], mappings [_doc]
[2018-09-17T11:48:22,840][INFO ][o.e.c.r.a.AllocationService] [HI4YEIR] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[1.180914.1021.shared.7f7640][39], [1.180914.1021.shared.7f7640][38]] ...]).
[2018-09-17T11:48:27,107][INFO ][o.e.c.m.MetaDataCreateIndexService] [HI4YEIR] [.monitoring-es-6-2018.09.17] creating index, cause [auto(bulk api)], templates [.monitoring-es], shards [1]/[0], mappings [doc]
[2018-09-17T11:48:27,626][INFO ][o.e.c.r.a.AllocationService] [HI4YEIR] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.monitoring-es-6-2018.09.17][0]] ...]).
[2018-09-17T11:48:28,727][INFO ][o.e.m.j.JvmGcMonitorService] [HI4YEIR] [gc][14] overhead, spent [270ms] collecting in the last [1s]
[2018-09-17T11:48:29,496][INFO ][o.e.c.m.MetaDataMappingService] [HI4YEIR] [1.180914.1021.shared.7f7640/9z0s7w05QTm9Zja8BWUW1g] update_mapping [_doc]
[2018-09-17T11:48:35,633][INFO ][o.e.c.m.MetaDataCreateIndexService] [HI4YEIR] [.tasks] creating index, cause [auto(task api)], templates [], shards [1]/[1], mappings [task]
[2018-09-17T11:48:35,655][INFO ][o.e.c.r.a.AllocationService] [HI4YEIR] updating number_of_replicas to [0] for indices [.tasks]
[2018-09-17T11:48:36,163][INFO ][o.e.c.r.a.AllocationService] [HI4YEIR] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.tasks][0]] ...]).
[2018-09-17T11:48:38,687][INFO ][o.e.t.LoggingTaskListener] 194 finished with response BulkByScrollResponse[took=12.4s,timed_out=false,sliceId=null,updated=0,created=6251,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[]]
[2018-09-17T11:48:38,696][INFO ][o.e.t.LoggingTaskListener] 201 finished with response BulkByScrollResponse[took=14.3s,timed_out=false,sliceId=null,updated=0,created=6249,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[]]
[2018-09-17T11:48:39,226][WARN ][o.e.t.TaskManager        ] [HI4YEIR] couldn't store response BulkByScrollResponse[took=15.8s,timed_out=false,sliceId=null,updated=0,created=6251,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[]]
org.elasticsearch.transport.RemoteTransportException: [HI4YEIR][172.30.0.6:9300][indices:data/write/bulk[s][p]]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TransportService$7@15ac841d on EsThreadPoolExecutor[name = HI4YEIR/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@36d96df1[Running, pool size = 8, active threads = 8, queued tasks = 200, completed tasks = 239]]
	at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[elasticsearch-6.4.0.jar:6.4.0]
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:832) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1365) ~[?:?]
	at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.doExecute(EsThreadPoolExecutor.java:98) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:93) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.transport.TransportService.sendLocalRequest(TransportService.java:661) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.transport.TransportService.access$000(TransportService.java:75) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.transport.TransportService$3.sendRequest(TransportService.java:131) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:605) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$1.sendRequest(SecurityServerTransportInterceptor.java:137) [x-pack-security-6.4.0.jar:6.4.0]
	at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:524) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:512) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.performAction(TransportReplicationAction.java:825) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.performLocalAction(TransportReplicationAction.java:743) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:731) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.replication.TransportReplicationAction.doExecute(TransportReplicationAction.java:169) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.replication.TransportReplicationAction.doExecute(TransportReplicationAction.java:97) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:128) [x-pack-security-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.doRun(TransportBulkAction.java:349) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:461) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:174) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:85) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:128) [x-pack-security-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.bulk.TransportSingleItemBulkWriteAction.doExecute(TransportSingleItemBulkWriteAction.java:69) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.bulk.TransportSingleItemBulkWriteAction.doExecute(TransportSingleItemBulkWriteAction.java:44) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:128) [x-pack-security-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:87) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:76) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:407) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:71) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.tasks.TaskResultsService.doStoreResult(TaskResultsService.java:159) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.tasks.TaskResultsService.storeResult(TaskResultsService.java:134) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.tasks.TaskManager.storeResult(TaskManager.java:237) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$TaskResultStoringActionListener.onResponse(TransportAction.java:196) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$TaskResultStoringActionListener.onResponse(TransportAction.java:182) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.index.reindex.AbstractAsyncBulkByScrollAction.lambda$finishHim$4(AbstractAsyncBulkByScrollAction.java:500) [reindex-6.4.0.jar:6.4.0]
	at org.elasticsearch.index.reindex.remote.RemoteScrollableHitSource.lambda$cleanup$2(RemoteScrollableHitSource.java:149) [reindex-6.4.0.jar:6.4.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.4.0.jar:6.4.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:844) [?:?]
[2018-09-17T11:48:39,243][WARN ][o.e.t.LoggingTaskListener] 197 failed with exception
org.elasticsearch.transport.RemoteTransportException: [HI4YEIR][172.30.0.6:9300][indices:data/write/bulk[s][p]]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TransportService$7@15ac841d on EsThreadPoolExecutor[name = HI4YEIR/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@36d96df1[Running, pool size = 8, active threads = 8, queued tasks = 200, completed tasks = 239]]
	at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[elasticsearch-6.4.0.jar:6.4.0]
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:832) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1365) ~[?:?]
	at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.doExecute(EsThreadPoolExecutor.java:98) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:93) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.transport.TransportService.sendLocalRequest(TransportService.java:661) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.transport.TransportService.access$000(TransportService.java:75) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.transport.TransportService$3.sendRequest(TransportService.java:131) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:605) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$1.sendRequest(SecurityServerTransportInterceptor.java:137) [x-pack-security-6.4.0.jar:6.4.0]
	at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:524) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:512) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.performAction(TransportReplicationAction.java:825) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.performLocalAction(TransportReplicationAction.java:743) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:731) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.replication.TransportReplicationAction.doExecute(TransportReplicationAction.java:169) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.replication.TransportReplicationAction.doExecute(TransportReplicationAction.java:97) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:128) [x-pack-security-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.doRun(TransportBulkAction.java:349) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:461) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:174) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:85) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:128) [x-pack-security-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.bulk.TransportSingleItemBulkWriteAction.doExecute(TransportSingleItemBulkWriteAction.java:69) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.bulk.TransportSingleItemBulkWriteAction.doExecute(TransportSingleItemBulkWriteAction.java:44) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:128) [x-pack-security-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:87) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:76) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:407) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:71) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.tasks.TaskResultsService.doStoreResult(TaskResultsService.java:159) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.tasks.TaskResultsService.storeResult(TaskResultsService.java:134) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.tasks.TaskManager.storeResult(TaskManager.java:237) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$TaskResultStoringActionListener.onResponse(TransportAction.java:196) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.action.support.TransportAction$TaskResultStoringActionListener.onResponse(TransportAction.java:182) [elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.index.reindex.AbstractAsyncBulkByScrollAction.lambda$finishHim$4(AbstractAsyncBulkByScrollAction.java:500) [reindex-6.4.0.jar:6.4.0]
	at org.elasticsearch.index.reindex.remote.RemoteScrollableHitSource.lambda$cleanup$2(RemoteScrollableHitSource.java:149) [reindex-6.4.0.jar:6.4.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.4.0.jar:6.4.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:844) [?:?]
[2018-09-17T11:48:44,932][INFO ][o.e.t.LoggingTaskListener] 202 finished with response BulkByScrollResponse[took=15.6s,timed_out=false,sliceId=null,updated=0,created=6251,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[]]
[2018-09-17T11:48:45,535][INFO ][o.e.t.LoggingTaskListener] 203 finished with response BulkByScrollResponse[took=16.2s,timed_out=false,sliceId=null,updated=0,created=6249,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[]]
[2018-09-17T11:48:45,547][INFO ][o.e.t.LoggingTaskListener] 195 finished with response BulkByScrollResponse[took=17.9s,timed_out=false,sliceId=null,updated=0,created=6249,deleted=0,batches=1,versionConflicts=0,noops=0,retries=1,throttledUntil=0s,bulk_failures=[],search_failures=[]]
[2018-09-17T11:48:45,558][INFO ][o.e.t.LoggingTaskListener] 196 finished with response BulkByScrollResponse[took=18.8s,timed_out=false,sliceId=null,updated=0,created=6249,deleted=0,batches=1,versionConflicts=0,noops=0,retries=2,throttledUntil=0s,bulk_failures=[],search_failures=[]]
[2018-09-17T11:48:45,568][INFO ][o.e.t.LoggingTaskListener] 209 finished with response BulkByScrollResponse[took=18.2s,timed_out=false,sliceId=null,updated=0,created=6251,deleted=0,batches=1,versionConflicts=0,noops=0,retries=2,throttledUntil=0s,bulk_failures=[],search_failures=[]]
[2018-09-17T11:48:45,578][INFO ][o.e.t.LoggingTaskListener] 192 finished with response BulkByScrollResponse[took=21.2s,timed_out=false,sliceId=null,updated=0,created=6251,deleted=0,batches=1,versionConflicts=0,noops=0,retries=2,throttledUntil=0s,bulk_failures=[],search_failures=[]]
[2018-09-17T11:48:48,833][INFO ][o.e.t.LoggingTaskListener] 199 finished with response BulkByScrollResponse[took=22.2s,timed_out=false,sliceId=null,updated=0,created=6249,deleted=0,batches=1,versionConflicts=0,noops=0,retries=2,throttledUntil=0s,bulk_failures=[],search_failures=[]]
@javanna javanna added the :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. label Sep 18, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@javanna
Copy link
Member

javanna commented Sep 18, 2018

At first glance, I would say that you increased the number of parallel reindex tasks a bit too much. 16 at the same time cause rejections like you have found out rather than just delays and that is expected. The way that the indexing threadpool works is that we queue up a certain number of requests, but at some point (like in your case) the queue may reach a threshold, which causes rejections. I will let somebody @elastic/es-distributed comment though before closing, just to make sure that there is nothing we can do.

@ywelsch
Copy link
Contributor

ywelsch commented Sep 18, 2018

This looks like a bug to me. I think the same retry policy should be used for storing the task result as for sending the bulk requests. @nik9000 what are your thoughts on this?

@Katulus
Copy link
Author

Katulus commented Sep 18, 2018

I admit that 16 parallel tasks is much and I don't need to run that many. I created this case to point out that the way how ES handles the rejection does not allow client to react. If it instead of removing failed task sets the task result to something like "rejected" and let client back off and retry later it would be perfectly fine.

@nik9000
Copy link
Member

nik9000 commented Oct 6, 2018

@nik9000 what are your thoughts on this?

I think we should be fairly relentless about storing the task result, yes. This is more fuel for the "don't stick task results into an index" argument.

If it instead of removing failed task sets the task result to something like "rejected" and let client back off and retry later it would be perfectly fine.

If we didn't accidentally throw the task result away you'd get something like that. It'd be a failure recorded with the reason set to "rejected" or something along those lines and it'd be up to you to figure out how to handle it. How you handle it is kind of dependant on the data that you were reindexing. Sometimes it'd be fine to just retry the reindex.

@nik9000 nik9000 self-assigned this Oct 6, 2018
@nik9000 nik9000 added :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. >bug and removed :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. labels Oct 6, 2018
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Oct 29, 2018
Adds about a minute worth of backoffs and retries to saving task
results so it is *much* more likely that a busy cluster won't lose task
results. This isn't an ideal solution to losing task results, but it is
an incremental improvement. If all of the retries fail when still log
the task result, but that is far from ideal.

Closes elastic#33764
nik9000 added a commit that referenced this issue Nov 30, 2018
Adds about a minute worth of backoffs and retries to saving task
results so it is *much* more likely that a busy cluster won't lose task
results. This isn't an ideal solution to losing task results, but it is
an incremental improvement. If all of the retries fail when still log
the task result, but that is far from ideal.

Closes #33764
nik9000 added a commit that referenced this issue Dec 3, 2018
Adds about a minute worth of backoffs and retries to saving task
results so it is *much* more likely that a busy cluster won't lose task
results. This isn't an ideal solution to losing task results, but it is
an incremental improvement. If all of the retries fail when still log
the task result, but that is far from ideal.

Closes #33764
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level.
Projects
None yet
Development

No branches or pull requests

5 participants