
[CI] SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress failing #47834

Closed

mark-vieira opened this issue Oct 9, 2019 · 6 comments · Fixed by #47841, #48433 or #48944

Labels: :Data Management/ILM+SLM Index and Snapshot lifecycle management · >test-failure Triaged test failures from CI

Comments

@mark-vieira (Contributor)

org.elasticsearch.xpack.slm.SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress has started failing pretty often now with the following error:

java.lang.AssertionError: expected at least one master-eligible node left in {node_sc1=org.elasticsearch.test.InternalTestCluster$NodeAndClient@6e458321}

Here are some example build scans:
https://gradle-enterprise.elastic.co/s/hlldgfw3cx4bc/tests/rfsroxnx4sflo-uvkkyo6qd6mno
https://gradle-enterprise.elastic.co/s/i4ncjbmiflt6a/tests/rfsroxnx4sflo-uvkkyo6qd6mno
https://gradle-enterprise.elastic.co/s/33mp5rqowzsbi/tests/rfsroxnx4sflo-uvkkyo6qd6mno

Looking at build-stats, this appears to be happening on both master and 7.x.

@mark-vieira mark-vieira added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI labels Oct 9, 2019
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@mark-vieira (Contributor, Author)

This has been failing often enough that I've muted this in master and 7.x.

@original-brownbear original-brownbear self-assigned this Oct 10, 2019
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 10, 2019
One of the tests in this suite stops a master node,
plus we're doing other node starts in this suite.
=> the internal test cluster should be TEST and not `SUITE`
scoped to avoid random failures like the one in elastic#47834

Closes elastic#47834
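For context, the scope change amounts to something like the annotation below; this is a minimal sketch under assumed parameters, not the actual PR diff:

```java
// A SUITE-scoped InternalTestCluster is shared by all tests in the class, so a
// test that stops the elected master can leave the shared cluster without a
// master-eligible node for whatever test runs next. TEST scope rebuilds the
// cluster for every test method, so node stops/starts cannot leak across tests.
import org.elasticsearch.test.ESIntegTestCase;
import org.elasticsearch.test.ESIntegTestCase.ClusterScope;
import org.elasticsearch.test.ESIntegTestCase.Scope;

@ClusterScope(scope = Scope.TEST)
public class SLMSnapshotBlockingIntegTests extends ESIntegTestCase {
    // test bodies unchanged
}
```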
@original-brownbear original-brownbear added :Data Management/ILM+SLM Index and Snapshot lifecycle management and removed :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Oct 10, 2019
@elasticmachine (Collaborator)

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

original-brownbear added a commit that referenced this issue Oct 10, 2019
One of the tests in this suite stops a master node,
plus we're doing other node starts in this suite.
=> the internal test cluster should be TEST and not `SUITE`
scoped to avoid random failures like the one in #47834

Closes #47834
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 10, 2019
One of the tests in this suite stops a master node,
plus we're doing other node starts in this suite.
=> the internal test cluster should be TEST and not `SUITE`
scoped to avoid random failures like the one in elastic#47834

Closes elastic#47834
original-brownbear added a commit that referenced this issue Oct 10, 2019
One of the tests in this suite stops a master node,
plus we're doing other node starts in this suite.
=> the internal test cluster should be TEST and not `SUITE`
scoped to avoid random failures like the one in #47834

Closes #47834
@mayya-sharipova (Contributor)

mayya-sharipova commented Oct 23, 2019

The test failed on the intake: https://gradle-enterprise.elastic.co/s/anqtao57gamow

org.elasticsearch.repositories.RepositoryException: [my-repo] Could not determine repository generation from root blobs
at __randomizedtesting.SeedInfo.seed([AF89D7C3B10D5638:D253C9D96B797893]:0)
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getRepositoryData(BlobStoreRepository.java:906)
at org.elasticsearch.snapshots.SnapshotsService.getRepositoryData(SnapshotsService.java:163)
at org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.buildResponse(TransportSnapshotsStatusAction.java:201)
at org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:105)
at org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:65)
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.lambda$doStart$3(TransportMasterNodeAction.java:166)
at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:769)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:834)
Caused by: java.nio.file.NoSuchFileException: /dev/shm/elastic+elasticsearch+master+multijob+fast+part2/x-pack/plugin/ilm/build/testrun/test/temp/org.elasticsearch.xpack.slm.SLMSnapshotBlockingIntegTests_AF89D7C3B10D5638-001/tempDir-002/repos/UOoOkN/index-0
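For context, the failing step derives the repository generation from the root-level `index-N` blobs, roughly like this simplified sketch (an assumed helper, not the real `BlobStoreRepository` code); a concurrent repository modification can delete the chosen blob between the listing and the read, producing the `NoSuchFileException` above:

```java
import java.util.Set;

// List root blobs named "index-N" and take the highest N as the generation.
final class RepoGen {
    static long latestGeneration(Set<String> rootBlobNames) {
        return rootBlobNames.stream()
            .filter(name -> name.startsWith("index-"))
            .mapToLong(name -> Long.parseLong(name.substring("index-".length())))
            .max()
            .orElse(-1L); // no index-N blob: empty repository
    }
}
```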

@original-brownbear (Member)

This will be trivial to fix now thanks to #48329; I'm on it :)

original-brownbear added a commit that referenced this issue Oct 24, 2019
Just like in #48329 (and using the changes from that PR),
we can run into a concurrent repo modification that we
will throw on and must retry until consistent handling of
this situation is implemented.

Closes #47834
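The retry described in that commit looks roughly like the fragment below inside an `ESIntegTestCase` test method; this is a hedged sketch where the repository and snapshot names are placeholders, not the PR's actual diff:

```java
// assertBusy retries on AssertionError, so a RepositoryException caused by a
// concurrent repo modification is wrapped and the status check is repeated
// until the repository settles.
assertBusy(() -> {
    try {
        SnapshotsStatusResponse status = client().admin().cluster()
            .prepareSnapshotStatus("my-repo")
            .setSnapshots("my-snapshot")
            .get();
        assertThat(status.getSnapshots(), hasSize(1));
    } catch (RepositoryException e) {
        // concurrent index-N modification: treat as "not consistent yet" and retry
        throw new AssertionError(e);
    }
});
```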
@dnhatn (Member)

dnhatn commented Oct 30, 2019

org.elasticsearch.client.SnapshotIT > testCreateSnapshot FAILED
    org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=repository_exception, reason=[test_repository] Could not determine repository generation from root blobs]
        at __randomizedtesting.SeedInfo.seed([BCF37B148E73CF08:4295EEA7B8C03D11]:0)
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1793)
        at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1770)
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1527)
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454)
        at org.elasticsearch.client.SnapshotClient.delete(SnapshotClient.java:344)
        at org.elasticsearch.client.ESRestHighLevelClientTestCase.execute(ESRestHighLevelClientTestCase.java:90)
        at org.elasticsearch.client.ESRestHighLevelClientTestCase.execute(ESRestHighLevelClientTestCase.java:81)
        at org.elasticsearch.client.SnapshotIT.testCreateSnapshot(SnapshotIT.java:167)

        Caused by:
        org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=no_such_file_exception, reason=/dev/shm/elastic+elasticsearch+master+multijob+fast+part1/client/rest-high-level/build/testclusters/integTest-0/repo/index-18]
            at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
            at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
            at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
            at org.elasticsearch.ElasticsearchException.failureFromXContent(ElasticsearchException.java:603)
            at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:169)
            ... 9 more
REPRODUCE WITH: ./gradlew ':client:rest-high-level:integTestRunner' --tests "org.elasticsearch.client.SnapshotIT.testCreateSnapshot" -Dtests.seed=BCF37B148E73CF08 -Dtests.security.manager=true -Dtests.locale=es-US -Dtests.timezone=America/Jujuy -Dcompiler.java=12 -Druntime.java=11

This failed again on master, but via the HLRC (high-level REST client): https://gradle-enterprise.elastic.co/s/5abbkuha6acvy/console-log

@dnhatn dnhatn reopened this Oct 30, 2019
original-brownbear added a commit that referenced this issue Nov 14, 2019
This is intended as a stop-gap solution/improvement to #38941 that
prevents repo modifications without an intermittent master failover
from causing inconsistent (outdated due to inconsistent listing of index-N blobs)
`RepositoryData` to be written.

Tracking the latest repository generation will move to the cluster state in a
separate pull request. This is intended as a low-risk change to be backported as
far as possible and motivated by the recently increased chance of #38941
causing trouble via SLM (see #47520).

Closes #47834
Closes #49048
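The stop-gap amounts to remembering the highest generation ever observed and rejecting stale listings, roughly like this sketch (class, field, and method names here are assumptions, not the actual `BlobStoreRepository` change):

```java
import java.util.concurrent.atomic.AtomicLong;

// Track the highest repository generation this node has ever observed; if a
// fresh listing of index-N blobs comes back lower, the listing is stale, so
// throw instead of writing outdated RepositoryData over a newer generation.
final class LatestRepoGenTracker {
    private final AtomicLong latestKnownRepoGen = new AtomicLong(-1L);

    long validate(long listedGen) {
        long known = latestKnownRepoGen.updateAndGet(prev -> Math.max(prev, listedGen));
        if (listedGen < known) {
            throw new IllegalStateException("stale listing: generation " + listedGen
                + " is older than known generation " + known);
        }
        return listedGen;
    }
}
```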
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Nov 14, 2019
This is intended as a stop-gap solution/improvement to elastic#38941 that
prevents repo modifications without an intermittent master failover
from causing inconsistent (outdated due to inconsistent listing of index-N blobs)
`RepositoryData` to be written.

Tracking the latest repository generation will move to the cluster state in a
separate pull request. This is intended as a low-risk change to be backported as
far as possible and motivated by the recently increased chance of elastic#38941
causing trouble via SLM (see elastic#47520).

Closes elastic#47834
Closes elastic#49048
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Nov 14, 2019
This is intended as a stop-gap solution/improvement to elastic#38941 that
prevents repo modifications without an intermittent master failover
from causing inconsistent (outdated due to inconsistent listing of index-N blobs)
`RepositoryData` to be written.

Tracking the latest repository generation will move to the cluster state in a
separate pull request. This is intended as a low-risk change to be backported as
far as possible and motivated by the recently increased chance of elastic#38941
causing trouble via SLM (see elastic#47520).

Closes elastic#47834
Closes elastic#49048
original-brownbear added a commit that referenced this issue Nov 15, 2019
This is intended as a stop-gap solution/improvement to #38941 that
prevents repo modifications without an intermittent master failover
from causing inconsistent (outdated due to inconsistent listing of index-N blobs)
`RepositoryData` to be written.

Tracking the latest repository generation will move to the cluster state in a
separate pull request. This is intended as a low-risk change to be backported as
far as possible and motivated by the recently increased chance of #38941
causing trouble via SLM (see #47520).

Closes #47834
Closes #49048
original-brownbear added a commit that referenced this issue Nov 15, 2019
This is intended as a stop-gap solution/improvement to #38941 that
prevents repo modifications without an intermittent master failover
from causing inconsistent (outdated due to inconsistent listing of index-N blobs)
`RepositoryData` to be written.

Tracking the latest repository generation will move to the cluster state in a
separate pull request. This is intended as a low-risk change to be backported as
far as possible and motivated by the recently increased chance of #38941
causing trouble via SLM (see #47520).

Closes #47834
Closes #49048