
Reindexing a 5.x index in 6.7.1 causes ArrayIndexOutOfBoundsException #41298

Closed
braunsonm opened this issue Apr 17, 2019 · 7 comments

@braunsonm

braunsonm commented Apr 17, 2019

Elasticsearch version (bin/elasticsearch --version):
Version: 6.7.1, Build: default/deb/2f32220/2019-04-02T15:59:27.961366Z, JVM: 1.8.0_212

Plugins installed: None

JVM version (java -version):
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-8u212-b01-1~deb9u1-b01)
OpenJDK 64-Bit Server VM (build 25.212-b01, mixed mode)

OS version (uname -a if on a Unix-like system):
Linux 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
I have successfully moved from ES 5.x to 6.7.1, and I wish to reindex my indices before moving to 7.0, as I have some that were created in 5.x that I would like to keep.

However, running the following request results in this exception:

curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "logs"
  },
  "dest": {
    "index": "logs_v6"
  }
}
'
[2019-04-17T08:31:13,268][DEBUG][o.e.a.s.TransportSearchScrollAction] [Q0LH2rW] [12] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Q0LH2rW][127.0.0.1:9300][indices:data/read/search[phase/fetch/id/scroll]]
Caused by: java.lang.ArrayIndexOutOfBoundsException
        at org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:130) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:138) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState.document(CompressingStoredFieldsReader.java:555) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.document(CompressingStoredFieldsReader.java:571) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:578) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.apache.lucene.index.CodecReader.document(CodecReader.java:84) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:341) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:435) ~[elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.search.fetch.FetchPhase.getSearchFields(FetchPhase.java:234) ~[elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.search.fetch.FetchPhase.createSearchHit(FetchPhase.java:209) ~[elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:160) ~[elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.search.SearchService.lambda$executeFetchPhase$3(SearchService.java:540) ~[elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.search.SearchService$3.doRun(SearchService.java:380) [elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.1.jar:6.7.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]

Steps to reproduce:

Unfortunately, I am not sure. I simply run the reindex request above. The logs index is otherwise fully searchable and mutable.

@jimczi jimczi added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. :Search/Search Search-related issues that do not fall into other categories labels Apr 17, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@elasticmachine
Collaborator

Pinging @elastic/es-search

@henningandersen henningandersen self-assigned this Apr 17, 2019
@henningandersen
Contributor

henningandersen commented Apr 17, 2019

@Chaosca this sounds like corruption at the Lucene level. Elasticsearch is reading out the _source for a specific document when it hits this.

First off, if you have replicas of the logs index, it is possible that one of the replicas does not have the issue. If that is the case, simply rerunning the reindex a few times should make it succeed, since the scroll query will eventually hit the good copy. Otherwise, if you can find out which node holds the bad shard copy, you could shut that node down and then retry the reindex operation (see the example below for locating it).
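
For example, something along these lines should list which nodes hold the copies of the logs index (the index name is taken from your reindex request above; the output columns vary slightly between versions):

curl -X GET "localhost:9200/_cat/shards/logs?v"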

You can also consider using the elasticsearch-shard tool to repair the shard, though you will then lose some data. That should be a last resort.
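
If it does come to that, the invocation would look roughly like the following, run with the node shut down; the index name and shard id here are just an example, and the tool asks for confirmation before removing anything:

bin/elasticsearch-shard remove-corrupted-data --index logs --shard-id 0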

@braunsonm
Author

Unfortunately, this is a single-node cluster. Running elasticsearch-shard returns the following for all 5 shards:

Exception in thread "main" ElasticsearchException[Shard does not seem to be corrupted at /home/elasticsearch/data/elasticsearch/nodes/0/indices/zyqMtQmnSLePx1yfKibWYg/4]
        at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.lambda$execute$1(RemoveCorruptedShardDataCommand.java:367)
        at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.findAndProcessShardPath(RemoveCorruptedShardDataCommand.java:211)
        at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.execute(RemoveCorruptedShardDataCommand.java:297)
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124)
        at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:77)
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124)
        at org.elasticsearch.cli.Command.main(Command.java:90)
        at org.elasticsearch.index.shard.ShardToolCli.main(ShardToolCli.java:35)

@dnhatn dnhatn added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Apr 17, 2019
@ywelsch
Contributor

ywelsch commented Apr 23, 2019

@henningandersen AFAICS, the elasticsearch-shard tool only checks shards that have a corruption marker. Given that the corruption here happened on read (and not on write), no corruption marker was added by ES. I wonder if we should add a flag to the elasticsearch-shard tool that allows running a full checkIndex when there is no corruption marker (or even make that the default).

@Chaosca you have two options to detect the corruption: 1) temporarily add a file whose name starts with corrupted_ to the shard's index folder and then run the elasticsearch-shard tool again, or 2) use the index.shard.check_on_startup index setting (see https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings), which requires closing the index, applying the setting, and then reopening it. Rough examples of both are below.
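
Roughly, option 1 would look like this, with the node stopped (using shard 4 from the path in your earlier exception; the marker file just needs a name starting with corrupted_, and you would repeat per shard as needed):

touch /home/elasticsearch/data/elasticsearch/nodes/0/indices/zyqMtQmnSLePx1yfKibWYg/4/index/corrupted_marker
bin/elasticsearch-shard remove-corrupted-data --index logs --shard-id 4

and option 2 roughly like this:

curl -X POST "localhost:9200/logs/_close"
curl -X PUT "localhost:9200/logs/_settings" -H 'Content-Type: application/json' -d'
{ "index.shard.check_on_startup": true }
'
curl -X POST "localhost:9200/logs/_open"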

henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Apr 24, 2019
The shard CLI tool would not do anything if a corruption marker was not
present. But a corruption marker is only added if a corruption is
detected during indexing/writing, not if a search or other read fails.

Changed the tool to always check shards regardless of corruption marker
presence.

Related to elastic#41298
@braunsonm
Author

@ywelsch Thank you, adding the corrupted_ file and running the shard tool resolved my issue.

henningandersen added a commit that referenced this issue May 24, 2019
The shard CLI tool would not do anything if a corruption marker was not
present. But a corruption marker is only added if a corruption is
detected during indexing/writing, not if a search or other read fails.

Changed the tool to always check shards regardless of corruption marker
presence.

Related to #41298
henningandersen added a commit that referenced this issue May 24, 2019
The shard CLI tool would not do anything if a corruption marker was not
present. But a corruption marker is only added if a corruption is
detected during indexing/writing, not if a search or other read fails.

Changed the tool to always check shards regardless of corruption marker
presence.

Related to #41298
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
The shard CLI tool would not do anything if a corruption marker was not
present. But a corruption marker is only added if a corruption is
detected during indexing/writing, not if a search or other read fails.

Changed the tool to always check shards regardless of corruption marker
presence.

Related to elastic#41298
@henningandersen
Contributor

We changed the shard CLI tool to always check shards, regardless of corruption marker. Closing this issue.
