Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] ShrinkActionIT.testShrinkDuringSnapshot times out #69325

Closed
dimitris-athanasiou opened this issue Feb 22, 2021 · 1 comment · Fixed by #69567
Closed

[CI] ShrinkActionIT.testShrinkDuringSnapshot times out #69325

dimitris-athanasiou opened this issue Feb 22, 2021 · 1 comment · Fixed by #69567
Assignees
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management Team:Data Management Meta label for data/management team >test-failure Triaged test failures from CI

Comments

@dimitris-athanasiou
Copy link
Contributor

Build scan: https://gradle-enterprise.elastic.co/s/oadnvwmqnkegk

Repro line:

./gradlew ':x-pack:plugin:ilm:qa:multi-node:javaRestTest' --tests "org.elasticsearch.xpack.ilm.actions.ShrinkActionIT.testShrinkDuringSnapshot" -Dtests.seed=293141A690EAFF58 -Dtests.security.manager=true -Dtests.locale=es-PY -Dtests.timezone=BET -Druntime.java=11

Reproduces locally?: Νο

Failure history: Quite a few failures recently: see here

Failure excerpt:

java.net.SocketTimeoutException: 60.000 milliseconds timeout on connection http-outgoing-148 [ACTIVE]
        at __randomizedtesting.SeedInfo.seed([293141A690EAFF58:45481A2E74593713]:0)
        at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:869)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:283)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:286)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:286)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:286)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:270)
        at org.elasticsearch.test.rest.ESRestTestCase.ensureHealth(ESRestTestCase.java:1131)
        at org.elasticsearch.test.rest.ESRestTestCase.ensureHealth(ESRestTestCase.java:1124)
        at org.elasticsearch.test.rest.ESRestTestCase.ensureGreen(ESRestTestCase.java:1111)
        at org.elasticsearch.xpack.TimeSeriesRestDriver.createIndexWithSettings(TimeSeriesRestDriver.java:283)
        at org.elasticsearch.xpack.TimeSeriesRestDriver.createIndexWithSettings(TimeSeriesRestDriver.java:268)
        at org.elasticsearch.xpack.ilm.actions.ShrinkActionIT.testShrinkDuringSnapshot(ShrinkActionIT.java:126)

        Caused by:
        java.net.SocketTimeoutException: 60.000 milliseconds timeout on connection http-outgoing-148 [ACTIVE]
            at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
            at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)
            at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39)
            at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
            at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261)
            at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:502)
            at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211)
            at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
            at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
            at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
            at java.base/java.lang.Thread.run(Thread.java:834)
@dimitris-athanasiou dimitris-athanasiou added >test-failure Triaged test failures from CI :Data Management/ILM+SLM Index and Snapshot lifecycle management labels Feb 22, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Feb 22, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue Feb 22, 2021
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue Feb 22, 2021
@dakrone dakrone self-assigned this Feb 24, 2021
dakrone added a commit to dakrone/elasticsearch that referenced this issue Feb 24, 2021
This failure was actually related to a separate assertion trip that caused the node to shut
down (hence the timeout). The failed assertion was:

```
[2021-02-22T07:31:26,842][WARN ][o.e.c.s.ClusterApplierService] [javaRestTest-0] failed to apply updated cluster state in [0s]:
version [2695], uuid [LEExkxqaR3qhDXvNEq6ypQ], source [Publication{term=6, version=2695}]
java.io.UncheckedIOException: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(FrozenIndexInput(_0.cfe)[length=405, file pointer=0, offset=0]))
	at org.elasticsearch.index.engine.FrozenEngine$2.acquireSearcherInternal(FrozenEngine.java:194) ~[?:?]
	at org.elasticsearch.index.engine.Engine$SearcherSupplier.acquireSearcher(Engine.java:1187) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
Caused by: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(FrozenIndexInput(_0.cfe)[length=405, file pointer=0, offset=0]))
	at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:523) ~[lucene-core-8.8.0.jar:8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:07:45]
	Suppressed: java.lang.AssertionError: current thread [Thread[elasticsearch[javaRestTest-0][clusterApplierService#updateTask][T#1],5,main]] may not read [name: __Rp_bk_o3QnydJkc1kB2l1w, numberOfParts: 1, partSize: 8192pb, partBytes: 9223372036854775807, metadata: name [_0.cfe], length [405], checksum [1x9s817], writtenBy [8.8.0]]
		at org.elasticsearch.index.store.BaseSearchableSnapshotIndexInput.assertCurrentThreadMayAccessBlobStore(BaseSearchableSnapshotIndexInput.java:263) ~[?:?]
		at org.elasticsearch.index.store.BaseSearchableSnapshotIndexInput.openInputStreamFromBlobStore(BaseSearchableSnapshotIndexInput.java:128) ~[?:?]
		at org.elasticsearch.index.store.cache.FrozenIndexInput.readDirectlyIfAlreadyClosed(FrozenIndexInput.java:424) ~[?:?]
		at org.elasticsearch.index.store.cache.FrozenIndexInput.doReadInternal(FrozenIndexInput.java:391) ~[?:?]
[2021-02-22T07:31:26,847][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [javaRestTest-0] fatal error in thread [elasticsearch[javaRestTest-0][clusterApplierService#updateTask][T#1]], exiting
java.lang.AssertionError: null
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:427) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:151) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
```

Tanguy has recently fixed the underlying issue in a separate PR, so this can be unmuted now.

Resolves elastic#69325
dakrone added a commit that referenced this issue Feb 25, 2021
This failure was actually related to a separate assertion trip that caused the node to shut
down (hence the timeout). The failed assertion was:

```
[2021-02-22T07:31:26,842][WARN ][o.e.c.s.ClusterApplierService] [javaRestTest-0] failed to apply updated cluster state in [0s]:
version [2695], uuid [LEExkxqaR3qhDXvNEq6ypQ], source [Publication{term=6, version=2695}]
java.io.UncheckedIOException: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(FrozenIndexInput(_0.cfe)[length=405, file pointer=0, offset=0]))
	at org.elasticsearch.index.engine.FrozenEngine$2.acquireSearcherInternal(FrozenEngine.java:194) ~[?:?]
	at org.elasticsearch.index.engine.Engine$SearcherSupplier.acquireSearcher(Engine.java:1187) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
Caused by: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(FrozenIndexInput(_0.cfe)[length=405, file pointer=0, offset=0]))
	at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:523) ~[lucene-core-8.8.0.jar:8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:07:45]
	Suppressed: java.lang.AssertionError: current thread [Thread[elasticsearch[javaRestTest-0][clusterApplierService#updateTask][T#1],5,main]] may not read [name: __Rp_bk_o3QnydJkc1kB2l1w, numberOfParts: 1, partSize: 8192pb, partBytes: 9223372036854775807, metadata: name [_0.cfe], length [405], checksum [1x9s817], writtenBy [8.8.0]]
		at org.elasticsearch.index.store.BaseSearchableSnapshotIndexInput.assertCurrentThreadMayAccessBlobStore(BaseSearchableSnapshotIndexInput.java:263) ~[?:?]
		at org.elasticsearch.index.store.BaseSearchableSnapshotIndexInput.openInputStreamFromBlobStore(BaseSearchableSnapshotIndexInput.java:128) ~[?:?]
		at org.elasticsearch.index.store.cache.FrozenIndexInput.readDirectlyIfAlreadyClosed(FrozenIndexInput.java:424) ~[?:?]
		at org.elasticsearch.index.store.cache.FrozenIndexInput.doReadInternal(FrozenIndexInput.java:391) ~[?:?]
[2021-02-22T07:31:26,847][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [javaRestTest-0] fatal error in thread [elasticsearch[javaRestTest-0][clusterApplierService#updateTask][T#1]], exiting
java.lang.AssertionError: null
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:427) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:151) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
```

Tanguy has recently fixed the underlying issue in a separate PR, so this can be unmuted now.

Resolves #69325
dakrone added a commit to dakrone/elasticsearch that referenced this issue Feb 25, 2021
This failure was actually related to a separate assertion trip that caused the node to shut
down (hence the timeout). The failed assertion was:

```
[2021-02-22T07:31:26,842][WARN ][o.e.c.s.ClusterApplierService] [javaRestTest-0] failed to apply updated cluster state in [0s]:
version [2695], uuid [LEExkxqaR3qhDXvNEq6ypQ], source [Publication{term=6, version=2695}]
java.io.UncheckedIOException: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(FrozenIndexInput(_0.cfe)[length=405, file pointer=0, offset=0]))
	at org.elasticsearch.index.engine.FrozenEngine$2.acquireSearcherInternal(FrozenEngine.java:194) ~[?:?]
	at org.elasticsearch.index.engine.Engine$SearcherSupplier.acquireSearcher(Engine.java:1187) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
Caused by: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(FrozenIndexInput(_0.cfe)[length=405, file pointer=0, offset=0]))
	at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:523) ~[lucene-core-8.8.0.jar:8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:07:45]
	Suppressed: java.lang.AssertionError: current thread [Thread[elasticsearch[javaRestTest-0][clusterApplierService#updateTask][T#1],5,main]] may not read [name: __Rp_bk_o3QnydJkc1kB2l1w, numberOfParts: 1, partSize: 8192pb, partBytes: 9223372036854775807, metadata: name [_0.cfe], length [405], checksum [1x9s817], writtenBy [8.8.0]]
		at org.elasticsearch.index.store.BaseSearchableSnapshotIndexInput.assertCurrentThreadMayAccessBlobStore(BaseSearchableSnapshotIndexInput.java:263) ~[?:?]
		at org.elasticsearch.index.store.BaseSearchableSnapshotIndexInput.openInputStreamFromBlobStore(BaseSearchableSnapshotIndexInput.java:128) ~[?:?]
		at org.elasticsearch.index.store.cache.FrozenIndexInput.readDirectlyIfAlreadyClosed(FrozenIndexInput.java:424) ~[?:?]
		at org.elasticsearch.index.store.cache.FrozenIndexInput.doReadInternal(FrozenIndexInput.java:391) ~[?:?]
[2021-02-22T07:31:26,847][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [javaRestTest-0] fatal error in thread [elasticsearch[javaRestTest-0][clusterApplierService#updateTask][T#1]], exiting
java.lang.AssertionError: null
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:427) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:151) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
```

Tanguy has recently fixed the underlying issue in a separate PR, so this can be unmuted now.

Resolves elastic#69325
dakrone added a commit that referenced this issue Feb 25, 2021
This failure was actually related to a separate assertion trip that caused the node to shut
down (hence the timeout). The failed assertion was:

```
[2021-02-22T07:31:26,842][WARN ][o.e.c.s.ClusterApplierService] [javaRestTest-0] failed to apply updated cluster state in [0s]:
version [2695], uuid [LEExkxqaR3qhDXvNEq6ypQ], source [Publication{term=6, version=2695}]
java.io.UncheckedIOException: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(FrozenIndexInput(_0.cfe)[length=405, file pointer=0, offset=0]))
	at org.elasticsearch.index.engine.FrozenEngine$2.acquireSearcherInternal(FrozenEngine.java:194) ~[?:?]
	at org.elasticsearch.index.engine.Engine$SearcherSupplier.acquireSearcher(Engine.java:1187) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
Caused by: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(FrozenIndexInput(_0.cfe)[length=405, file pointer=0, offset=0]))
	at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:523) ~[lucene-core-8.8.0.jar:8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:07:45]
	Suppressed: java.lang.AssertionError: current thread [Thread[elasticsearch[javaRestTest-0][clusterApplierService#updateTask][T#1],5,main]] may not read [name: __Rp_bk_o3QnydJkc1kB2l1w, numberOfParts: 1, partSize: 8192pb, partBytes: 9223372036854775807, metadata: name [_0.cfe], length [405], checksum [1x9s817], writtenBy [8.8.0]]
		at org.elasticsearch.index.store.BaseSearchableSnapshotIndexInput.assertCurrentThreadMayAccessBlobStore(BaseSearchableSnapshotIndexInput.java:263) ~[?:?]
		at org.elasticsearch.index.store.BaseSearchableSnapshotIndexInput.openInputStreamFromBlobStore(BaseSearchableSnapshotIndexInput.java:128) ~[?:?]
		at org.elasticsearch.index.store.cache.FrozenIndexInput.readDirectlyIfAlreadyClosed(FrozenIndexInput.java:424) ~[?:?]
		at org.elasticsearch.index.store.cache.FrozenIndexInput.doReadInternal(FrozenIndexInput.java:391) ~[?:?]
[2021-02-22T07:31:26,847][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [javaRestTest-0] fatal error in thread [elasticsearch[javaRestTest-0][clusterApplierService#updateTask][T#1]], exiting
java.lang.AssertionError: null
	at org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:427) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
	at org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:151) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
```

Tanguy has recently fixed the underlying issue in a separate PR, so this can be unmuted now.

Resolves #69325
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management Team:Data Management Meta label for data/management team >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants