[CI] ShrinkActionIT.testShrinkDuringSnapshot times out #69325

dimitris-athanasiou · 2021-02-22T10:23:36Z

Build scan: https://gradle-enterprise.elastic.co/s/oadnvwmqnkegk

Repro line:

./gradlew ':x-pack:plugin:ilm:qa:multi-node:javaRestTest' --tests "org.elasticsearch.xpack.ilm.actions.ShrinkActionIT.testShrinkDuringSnapshot" -Dtests.seed=293141A690EAFF58 -Dtests.security.manager=true -Dtests.locale=es-PY -Dtests.timezone=BET -Druntime.java=11

Reproduces locally?: No

Failure history: Quite a few failures recently: see here

Failure excerpt:

java.net.SocketTimeoutException: 60.000 milliseconds timeout on connection http-outgoing-148 [ACTIVE]
        at __randomizedtesting.SeedInfo.seed([293141A690EAFF58:45481A2E74593713]:0)
        at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:869)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:283)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:286)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:286)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:286)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:270)
        at org.elasticsearch.test.rest.ESRestTestCase.ensureHealth(ESRestTestCase.java:1131)
        at org.elasticsearch.test.rest.ESRestTestCase.ensureHealth(ESRestTestCase.java:1124)
        at org.elasticsearch.test.rest.ESRestTestCase.ensureGreen(ESRestTestCase.java:1111)
        at org.elasticsearch.xpack.TimeSeriesRestDriver.createIndexWithSettings(TimeSeriesRestDriver.java:283)
        at org.elasticsearch.xpack.TimeSeriesRestDriver.createIndexWithSettings(TimeSeriesRestDriver.java:268)
        at org.elasticsearch.xpack.ilm.actions.ShrinkActionIT.testShrinkDuringSnapshot(ShrinkActionIT.java:126)

        Caused by:
        java.net.SocketTimeoutException: 60.000 milliseconds timeout on connection http-outgoing-148 [ACTIVE]
            at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
            at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)
            at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39)
            at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
            at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261)
            at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:502)
            at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211)
            at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
            at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
            at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
            at java.base/java.lang.Thread.run(Thread.java:834)


elasticmachine · 2021-02-22T10:23:39Z

Pinging @elastic/es-core-features (Team:Core/Features)

This failure was actually related to a separate assertion trip that caused the node to shut down (hence the timeout). The failed assertion was:

```
[2021-02-22T07:31:26,842][WARN ][o.e.c.s.ClusterApplierService] [javaRestTest-0] failed to apply updated cluster state in [0s]: version [2695], uuid [LEExkxqaR3qhDXvNEq6ypQ], source [Publication{term=6, version=2695}]
java.io.UncheckedIOException: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(FrozenIndexInput(_0.cfe)[length=405, file pointer=0, offset=0]))
        at org.elasticsearch.index.engine.FrozenEngine$2.acquireSearcherInternal(FrozenEngine.java:194) ~[?:?]
        at org.elasticsearch.index.engine.Engine$SearcherSupplier.acquireSearcher(Engine.java:1187) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
Caused by: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(FrozenIndexInput(_0.cfe)[length=405, file pointer=0, offset=0]))
        at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:523) ~[lucene-core-8.8.0.jar:8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:07:45]
        Suppressed: java.lang.AssertionError: current thread [Thread[elasticsearch[javaRestTest-0][clusterApplierService#updateTask][T#1],5,main]] may not read [name: __Rp_bk_o3QnydJkc1kB2l1w, numberOfParts: 1, partSize: 8192pb, partBytes: 9223372036854775807, metadata: name [_0.cfe], length [405], checksum [1x9s817], writtenBy [8.8.0]]
                at org.elasticsearch.index.store.BaseSearchableSnapshotIndexInput.assertCurrentThreadMayAccessBlobStore(BaseSearchableSnapshotIndexInput.java:263) ~[?:?]
                at org.elasticsearch.index.store.BaseSearchableSnapshotIndexInput.openInputStreamFromBlobStore(BaseSearchableSnapshotIndexInput.java:128) ~[?:?]
                at org.elasticsearch.index.store.cache.FrozenIndexInput.readDirectlyIfAlreadyClosed(FrozenIndexInput.java:424) ~[?:?]
                at org.elasticsearch.index.store.cache.FrozenIndexInput.doReadInternal(FrozenIndexInput.java:391) ~[?:?]
[2021-02-22T07:31:26,847][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [javaRestTest-0] fatal error in thread [elasticsearch[javaRestTest-0][clusterApplierService#updateTask][T#1]], exiting
java.lang.AssertionError: null
        at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:427) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
        at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:151) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
```

Tanguy has recently fixed the underlying issue in a separate PR, so this can be unmuted now.

Resolves elastic#69325
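For readers unfamiliar with this kind of check: the "may not read" assertion in the log above appears to guard blob store reads based on the calling thread. Below is a minimal sketch of how such a thread-name guard could look. It is not the actual Elasticsearch implementation; the class `BlobStoreThreadGuard`, the `FORBIDDEN_THREAD_MARKERS` list, and the `generic` thread name are hypothetical and only the applier thread name is taken from the log.

```java
import java.util.List;

public class BlobStoreThreadGuard {

    // Hypothetical deny-list: fragments of thread names that must not do blocking blob store reads.
    private static final List<String> FORBIDDEN_THREAD_MARKERS =
        List.of("clusterApplierService#updateTask");

    // Returns true if a thread with the given name is allowed to read from the blob store.
    public static boolean mayAccessBlobStore(String threadName) {
        return FORBIDDEN_THREAD_MARKERS.stream().noneMatch(threadName::contains);
    }

    // Throws an AssertionError, similar in spirit to the one in the log, when called from a forbidden thread.
    public static void ensureCurrentThreadMayAccessBlobStore(String resource) {
        String name = Thread.currentThread().getName();
        if (!mayAccessBlobStore(name)) {
            throw new AssertionError("current thread [" + name + "] may not read [" + resource + "]");
        }
    }

    public static void main(String[] args) {
        String applier = "elasticsearch[javaRestTest-0][clusterApplierService#updateTask][T#1]";
        String generic = "elasticsearch[javaRestTest-0][generic][T#1]"; // illustrative thread name
        System.out.println(mayAccessBlobStore(applier)); // prints false
        System.out.println(mayAccessBlobStore(generic)); // prints true
    }
}
```

An `AssertionError` that escapes on a node thread is treated as fatal by the uncaught exception handler, which is consistent with the node exiting and the REST client then timing out.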

dimitris-athanasiou added the >test-failure and :Data Management/ILM+SLM labels Feb 22, 2021

elasticmachine added the Team:Data Management label Feb 22, 2021

dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue Feb 22, 2021

Mute ShrinkActionIT.testShrinkDuringSnapshot

4ee2f06

Relates elastic#69325

dimitris-athanasiou mentioned this issue Feb 22, 2021

Mute ShrinkActionIT.testShrinkDuringSnapshot #69342

Merged

dimitris-athanasiou added a commit that referenced this issue Feb 22, 2021

Mute ShrinkActionIT.testShrinkDuringSnapshot (#69342)

fe971f2

Relates #69325

dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue Feb 22, 2021

[7.x] Mute ShrinkActionIT.testShrinkDuringSnapshot (elastic#69342)

82ef678

Relates elastic#69325 Backport of elastic#69342

dimitris-athanasiou mentioned this issue Feb 22, 2021

[7.x] Mute ShrinkActionIT.testShrinkDuringSnapshot (#69342) #69344

Merged

dimitris-athanasiou added a commit that referenced this issue Feb 22, 2021

[7.x] Mute ShrinkActionIT.testShrinkDuringSnapshot (#69342) (#69344)

1b5e498

Relates #69325 Backport of #69342

dakrone self-assigned this Feb 24, 2021

dakrone mentioned this issue Feb 24, 2021

Remove AwaitsFix from #69325 #69567

Merged

dakrone closed this as completed in #69567 Feb 25, 2021
