FollowerFailOverIT.testFailOverOnFollower fails on 7.x reproducibly #58778

Closed
dakrone opened this issue Jun 30, 2020 · 2 comments
Labels
:Distributed Indexing/CCR (Issues around the Cross Cluster State Replication features)
Team:Distributed (Obsolete) (Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.)
>test-failure (Triaged test failures from CI)

Comments

@dakrone
Member

dakrone commented Jun 30, 2020

Build scan:
https://gradle-enterprise.elastic.co/s/66hnn7nhkbtjw

Repro line:

./gradlew ':x-pack:plugin:ccr:internalClusterTest' --tests "org.elasticsearch.xpack.ccr.FollowerFailOverIT.testFailOverOnFollower" -Dtests.seed=ECB3C06592940245 -Dtests.security.manager=true -Dtests.locale=be-BY -Dtests.timezone=Kwajalein -Druntime.java=8

Reproduces locally?:
Yep

Applicable branches:
7.x

Failure history:
None

Failure excerpt:

org.elasticsearch.xpack.ccr.FollowerFailOverIT > testFailOverOnFollower FAILED
    java.lang.AssertionError: timed out waiting for green state
        at __randomizedtesting.SeedInfo.seed([ECB3C06592940245:33E069F906F4A17F]:0)
        at org.junit.Assert.fail(Assert.java:88)
        at org.elasticsearch.xpack.CcrIntegTestCase.ensureColor(CcrIntegTestCase.java:347)
        at org.elasticsearch.xpack.CcrIntegTestCase.ensureFollowerGreen(CcrIntegTestCase.java:321)
        at org.elasticsearch.xpack.CcrIntegTestCase.ensureFollowerGreen(CcrIntegTestCase.java:316)
        at org.elasticsearch.xpack.ccr.FollowerFailOverIT.testFailOverOnFollower(FollowerFailOverIT.java:104)
REPRODUCE WITH: ./gradlew ':x-pack:plugin:ccr:internalClusterTest' --tests "org.elasticsearch.xpack.ccr.FollowerFailOverIT.testFailOverOnFollower" -Dtests.seed=ECB3C06592940245 -Dtests.security.manager=true -Dtests.locale=be-BY -Dtests.timezone=Kwajalein -Druntime.java=8

org.elasticsearch.xpack.ccr.FollowerFailOverIT > classMethod FAILED
    com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.elasticsearch.xpack.ccr.FollowerFailOverIT: 
       1) Thread[id=148, name=Thread-5, state=TIMED_WAITING, group=TGRP-FollowerFailOverIT]
            at sun.misc.Unsafe.park(Native Method)
            at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
            at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:409)
            at org.elasticsearch.xpack.ccr.FollowerFailOverIT.lambda$testFailOverOnFollower$0(FollowerFailOverIT.java:73)
            at org.elasticsearch.xpack.ccr.FollowerFailOverIT$$Lambda$3513/1225697012.run(Unknown Source)
            at java.lang.Thread.run(Thread.java:748)
        at __randomizedtesting.SeedInfo.seed([ECB3C06592940245]:0)

The logs contain many recovery failures like this one:

  1> [2020-07-01T07:59:13,153][WARN ][o.e.i.c.IndicesClusterStateService] [follower5] [follower_test_failover][0] marking and sending shard failed due to [failed recovery]
  1> org.elasticsearch.indices.recovery.RecoveryFailedException: [follower_test_failover][0]: Recovery failed on {follower5}{wUxLqhZNQ3iWFQc86r0ZjA}{BMnTibr8QkWZJp8IlUhz_w}{127.0.0.1}{127.0.0.1:35485}{d}{xpack.installed=true}
  1> 	at org.elasticsearch.index.shard.IndexShard.lambda$executeRecovery$21(IndexShard.java:2663) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoveryListener$6(StoreRecovery.java:362) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.index.shard.StoreRecovery.lambda$restore$8(StoreRecovery.java:484) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.xpack.ccr.repository.CcrRepository.lambda$restoreShard$1(CcrRepository.java:324) ~[main/:?]
  1> 	at org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:94) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.action.ActionListener$6.onFailure(ActionListener.java:292) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.xpack.ccr.repository.CcrRepository.restoreShard(CcrRepository.java:386) ~[main/:?]
  1> 	at org.elasticsearch.index.shard.StoreRecovery.lambda$restore$10(StoreRecovery.java:507) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:112) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:226) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:106) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.common.util.concurrent.ListenableFuture.addListener(ListenableFuture.java:68) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.action.StepListener.whenComplete(StepListener.java:78) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.index.shard.StoreRecovery.restore(StoreRecovery.java:507) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.index.shard.StoreRecovery.recoverFromRepository(StoreRecovery.java:290) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.index.shard.IndexShard.restoreFromRepository(IndexShard.java:1890) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$17(IndexShard.java:2610) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) [elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:710) [elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_231]
  1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_231]
  1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231]
  1> Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed recovery
  1> 	... 26 more
  1> Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: restore failed
  1> 	... 24 more
  1> Caused by: org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: failed to restore snapshot [_latest_/_latest_]
  1> 	... 22 more
  1> Caused by: java.lang.IllegalArgumentException: this node does not have the remote_cluster_client role
  1> 	at org.elasticsearch.transport.RemoteClusterService.getRemoteClusterClient(RemoteClusterService.java:450) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.client.node.NodeClient.getRemoteClusterClient(NodeClient.java:123) ~[elasticsearch-7.9.0-SNAPSHOT.jar:7.9.0-SNAPSHOT]
  1> 	at org.elasticsearch.xpack.ccr.repository.CcrRepository.getRemoteClusterClient(CcrRepository.java:166) ~[main/:?]
  1> 	at org.elasticsearch.xpack.ccr.repository.CcrRepository.restoreShard(CcrRepository.java:336) ~[main/:?]
  1> 	... 18 more
dakrone added the >test-failure and :Distributed Indexing/CCR labels on Jun 30, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/CCR)

elasticmachine added the Team:Distributed (Obsolete) label on Jun 30, 2020
@dnhatn
Member

dnhatn commented Jun 30, 2020

Duplicate of #58534.

dnhatn closed this as completed on Jun 30, 2020