
Reindexing a 5.x index in 6.7.1 causes ArrayIndexOutOfBoundsException #41298

Closed
braunsonm opened this issue Apr 17, 2019 · 7 comments

@braunsonm

braunsonm commented Apr 17, 2019

Elasticsearch version (bin/elasticsearch --version):
Version: 6.7.1, Build: default/deb/2f32220/2019-04-02T15:59:27.961366Z, JVM: 1.8.0_212

Plugins installed: None

JVM version (java -version):
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-8u212-b01-1~deb9u1-b01)
OpenJDK 64-Bit Server VM (build 25.212-b01, mixed mode)

OS version (uname -a if on a Unix-like system):
Linux 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
I have successfully moved from ES 5.x to 6.7.1, and I wish to reindex my indices before moving to 7.0, as I have some that were created in 5.x that I would like to keep.

However, running the following request results in this exception:

curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "logs"
  },
  "dest": {
    "index": "logs_v6"
  }
}
'
[2019-04-17T08:31:13,268][DEBUG][o.e.a.s.TransportSearchScrollAction] [Q0LH2rW] [12] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Q0LH2rW][127.0.0.1:9300][indices:data/read/search[phase/fetch/id/scroll]]
Caused by: java.lang.ArrayIndexOutOfBoundsException
        at org.apache.lucene.codecs.compressing.LZ4.decompress(LZ4.java:130) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.apache.lucene.codecs.compressing.CompressionMode$4.decompress(CompressionMode.java:138) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState.document(CompressingStoredFieldsReader.java:555) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.document(CompressingStoredFieldsReader.java:571) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:578) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.apache.lucene.index.CodecReader.document(CodecReader.java:84) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:341) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
        at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:435) ~[elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.search.fetch.FetchPhase.getSearchFields(FetchPhase.java:234) ~[elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.search.fetch.FetchPhase.createSearchHit(FetchPhase.java:209) ~[elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:160) ~[elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.search.SearchService.lambda$executeFetchPhase$3(SearchService.java:540) ~[elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.search.SearchService$3.doRun(SearchService.java:380) [elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-6.7.1.jar:6.7.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.7.1.jar:6.7.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]

Steps to reproduce:

Unfortunately, I am not sure. I simply run the reindex request above. The logs index is otherwise fully searchable and mutable.

@jimczi jimczi added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. :Search/Search Search-related issues that do not fall into other categories labels Apr 17, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@elasticmachine
Collaborator

Pinging @elastic/es-search

@henningandersen henningandersen self-assigned this Apr 17, 2019
@henningandersen
Contributor

henningandersen commented Apr 17, 2019

@Chaosca this sounds like corruption at the Lucene level. Elasticsearch is reading out the _source for a specific document when it hits this.

First off, if you have replicas of the logs index, it is possible that one of the replicas does not have the issue. If that is the case, simply rerunning the reindex a few times should make it succeed, since the scroll query will eventually hit the good copy. Otherwise, if you can find out which node holds the bad shard copy, you could shut that node down and then retry the reindex operation (see the example below for locating it).
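
For example, something along these lines should list which nodes hold the copies of the logs index (the index name is taken from your reindex request above; the output columns vary slightly between versions):

curl -X GET "localhost:9200/_cat/shards/logs?v"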

You can also consider using the elasticsearch-shard tool to repair the shard, though you will then lose some data. That should be a last resort.
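
If it does come to that, the invocation would look roughly like the following, run with the node shut down; the index name and shard id here are just an example, and the tool asks for confirmation before removing anything:

bin/elasticsearch-shard remove-corrupted-data --index logs --shard-id 0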

@braunsonm
Author

Unfortunately, this is a single-node cluster. Running elasticsearch-shard returns the following for all 5 shards:

Exception in thread "main" ElasticsearchException[Shard does not seem to be corrupted at /home/elasticsearch/data/elasticsearch/nodes/0/indices/zyqMtQmnSLePx1yfKibWYg/4]
        at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.lambda$execute$1(RemoveCorruptedShardDataCommand.java:367)
        at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.findAndProcessShardPath(RemoveCorruptedShardDataCommand.java:211)
        at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.execute(RemoveCorruptedShardDataCommand.java:297)
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124)
        at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:77)
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124)
        at org.elasticsearch.cli.Command.main(Command.java:90)
        at org.elasticsearch.index.shard.ShardToolCli.main(ShardToolCli.java:35)

@dnhatn dnhatn added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. and removed :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Apr 17, 2019
@ywelsch
Contributor

ywelsch commented Apr 23, 2019

@henningandersen AFAICS, the elasticsearch-shard tool only checks shards that have a corruption marker. Given that the corruption here happened on read (and not on write), no corruption marker was added by ES. I wonder if we should add a flag to the elasticsearch-shard tool that allows running a full checkIndex when there is no corruption marker (or even make that the default).

@Chaosca you have two options to detect the corruption: 1) temporarily add a file whose name starts with corrupted_ to the shard's index folder and then run the elasticsearch-shard tool again, or 2) use the index.shard.check_on_startup index setting (see https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings), which requires closing the index, applying the setting, and then reopening it. Rough examples of both are below.
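
Roughly, option 1 would look like this, with the node stopped (using shard 4 from the path in your earlier exception; the marker file just needs a name starting with corrupted_, and you would repeat per shard as needed):

touch /home/elasticsearch/data/elasticsearch/nodes/0/indices/zyqMtQmnSLePx1yfKibWYg/4/index/corrupted_marker
bin/elasticsearch-shard remove-corrupted-data --index logs --shard-id 4

and option 2 roughly like this:

curl -X POST "localhost:9200/logs/_close"
curl -X PUT "localhost:9200/logs/_settings" -H 'Content-Type: application/json' -d'
{ "index.shard.check_on_startup": true }
'
curl -X POST "localhost:9200/logs/_open"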

henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Apr 24, 2019
The shard CLI tool would not do anything if a corruption marker was not
present. But a corruption marker is only added if a corruption is
detected during indexing/writing, not if a search or other read fails.

Changed the tool to always check shards regardless of corruption marker
presence.

Related to elastic#41298
@braunsonm
Author

@ywelsch Thank you, adding the corrupted_ file and running the shard tool resolved my issue.

henningandersen added a commit that referenced this issue May 24, 2019
The shard CLI tool would not do anything if a corruption marker was not
present. But a corruption marker is only added if a corruption is
detected during indexing/writing, not if a search or other read fails.

Changed the tool to always check shards regardless of corruption marker
presence.

Related to #41298
henningandersen added a commit that referenced this issue May 24, 2019
The shard CLI tool would not do anything if a corruption marker was not
present. But a corruption marker is only added if a corruption is
detected during indexing/writing, not if a search or other read fails.

Changed the tool to always check shards regardless of corruption marker
presence.

Related to #41298
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
The shard CLI tool would not do anything if a corruption marker was not
present. But a corruption marker is only added if a corruption is
detected during indexing/writing, not if a search or other read fails.

Changed the tool to always check shards regardless of corruption marker
presence.

Related to elastic#41298
@henningandersen
Contributor

We changed the shard CLI tool to always check shards, regardless of corruption marker. Closing this issue.
