[CI] IndexFollowingIT.testFollowIndex failure #38949

iverase · 2019-02-15T12:14:09Z

This test has fail with an assertion error. In the logs what looks relevant Is the following:

  1> [2019-02-15T08:37:44,302][INFO ][o.e.c.m.MetaDataMappingService] [leaderm2] [index1/zhspcSQ3Su-DfqnT8FirZg] create_mapping [doc]
  1> [2019-02-15T08:37:44,327][INFO ][o.e.c.m.MetaDataMappingService] [follower1] [index2/48YajD45TdeuzdasRRO35A] create_mapping [doc]
  1> [2019-02-15T08:37:45,353][INFO ][o.e.c.m.MetaDataIndexStateService] [follower1] closing indices [index2/48YajD45TdeuzdasRRO35A]
  1> [2019-02-15T08:37:45,368][INFO ][o.e.c.m.MetaDataIndexStateService] [follower1] completed closing of indices [index2]
  1> [2019-02-15T08:37:45,385][WARN ][o.e.x.c.a.b.TransportBulkShardOperationsAction] [follower1] unexpected error during the primary phase for action [indices:data/write/bulk_shard_operations[s]], request [BulkShardOperationsRequest{historyUUID=N2HT1imJT2-IDpPZ0D6Mog, operations=1, maxSeqNoUpdates=-1, shardId=[index2][0], timeout=1m, index='index2', waitForActiveShards=DEFAULT}]
  1> org.elasticsearch.indices.IndexClosedException: closed
  1> 	at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:215) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]

It does not reproduce locally:

./gradlew :x-pack:plugin:ccr:internalClusterTest \
  -Dtests.seed=37D22C55D4DB9A44 \
  -Dtests.class=org.elasticsearch.xpack.ccr.IndexFollowingIT \
  -Dtests.method="testFollowIndex" \
  -Dtests.security.manager=true \
  -Dtests.locale=es-CU \
  -Dtests.timezone=America/Argentina/Tucuman \
  -Dcompiler.java=11 \
  -Druntime.java=8

Console log: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+intake/122/console

The text was updated successfully, but these errors were encountered:

elasticmachine · 2019-02-15T12:14:10Z

Pinging @elastic/es-distributed

martijnvg · 2019-02-15T13:16:54Z

The shards of the follow index could not be acked (PutFollowAction.Response#isFollowIndexShardsAcked() returned false). This is the first time that I've seen this test failed at this assertion.

It appears this failure is caught because of missing ops during file based recovery:

[2019-02-15T08:36:57,681][WARN ][o.e.i.c.IndicesClusterStateService] [follower0] [[index2][0]] marking and sending shard failed due to [failed recovery]
  1> org.elasticsearch.indices.recovery.RecoveryFailedException: [index2][0]: Recovery failed from {follower1}{iqu96z6sTPyAZFgqumPkDA}{2Y6Xuyf3SWi11ApN6Rr_vQ}{127.0.0.1}{127.0.0.1:36877}{xpack.installed=true} in
to {follower0}{zgOtImoTSEuCUs71SNSBqA}{k5d-fAtgRN2TKv_5578QOg}{127.0.0.1}{127.0.0.1:46759}{xpack.installed=true}
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$2(PeerRecoveryTargetService.java:253) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$1.handleException(PeerRecoveryTargetService.java:298) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.PlainTransportFuture.handleException(PlainTransportFuture.java:97) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TcpTransport.lambda$handleException$23(TcpTransport.java:1000) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_202]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_202]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [follower1][127.0.0.1:36877][internal:index/shard/recovery/start_recovery]
  1> Caused by: java.lang.IllegalStateException: translog replay failed to cover required sequence numbers (required range [9:63). first missing op is [9]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$phase2$25(RecoverySourceHandler.java:592) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.sendBatch(RecoverySourceHandler.java:648) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$sendBatch$26(RecoverySourceHandler.java:634) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.indices.recovery.RemoteRecoveryTargetHandler.lambda$indexTranslogOperations$4(RemoteRecoveryTargetHandler.java:133) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.PlainTransportFuture.handleResponse(PlainTransportFuture.java:87) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1108) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TcpTransport$1.doRun(TcpTransport.java:974) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    ... 3 more

@tvernum, @ywelsch or @dnhatn Have we seen this IllegalStateException failure before during file based recovery?

jpountz · 2019-02-15T14:34:19Z

@iverase Let's mute this test?

Relates to #38949

dnhatn · 2019-02-16T00:18:34Z

Thanks @martijnvg for the initial look. I have some idea of what is happening.

martijnvg · 2019-02-18T14:14:35Z

The IndexFollowingIT#testFollowIndexWithConcurrentMappingChanges test fails for the same reason as above. The follower index can't get into a green state:

[2019-02-18T13:07:38,034][INFO ][o.e.x.c.IndexFollowingIT ] [testFollowIndexWithConcurrentMappingChanges] after test
FAILURE 30.4s J1 | IndexFollowingIT.testFollowIndexWithConcurrentMappingChanges <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: timed out waiting for green state
   >    at __randomizedtesting.SeedInfo.seed([702638CC73823FFE:5754AC34DA21D6DA]:0)
   >    at org.elasticsearch.xpack.CcrIntegTestCase.ensureColor(CcrIntegTestCase.java:336)
   >    at org.elasticsearch.xpack.CcrIntegTestCase.ensureFollowerGreen(CcrIntegTestCase.java:310)
   >    at org.elasticsearch.xpack.CcrIntegTestCase.ensureFollowerGreen(CcrIntegTestCase.java:305)
   >    at org.elasticsearch.xpack.ccr.IndexFollowingIT.testFollowIndexWithConcurrentMappingChanges(IndexFollowingIT.java:269)
   >    at java.lang.Thread.run(Thread.java:748)

Error in the logs:

[2019-02-18T13:07:08,840][WARN ][o.e.c.r.a.AllocationService] [follower1] failing shard [failed shard, shard [index2][0], node[RmgnfY6vTDe-fyixPwBAsA], [R], recovery_source[peer recovery], s[INITIALIZING], 
a[id=EVB96a62RQW6pTtldBWYcA], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-02-18T13:07:08.659Z], failed_attempts[4], delayed=false, details[failed shard on node [RmgnfY6vTDe-fyixPwBAsA]: failed to perform
 indices:data/write/bulk_shard_operations[s] on replica [index2][0], node[RmgnfY6vTDe-fyixPwBAsA], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=gNA_4HuDT7S4NJfIz6RJZA], unassigned_info[[reason=ALLO
CATION_FAILED], at[2019-02-18T13:07:08.480Z], failed_attempts[3], delayed=false, details[failed shard on node [RmgnfY6vTDe-fyixPwBAsA]: failed recovery, failure RecoveryFailedException[[index2][0]: Recovery fail
ed from {follower1}{TF5Ju3gbQimfP5f5UAx1Pw}{EZ9qhrdkR-GZ-r3kGcu-QA}{127.0.0.1}{127.0.0.1:33857}{xpack.installed=true} into {follower0}{RmgnfY6vTDe-fyixPwBAsA}{k9gtij__QTqzLoW2w45oBg}{127.0.0.1}{127.0.0.1:33247}{
xpack.installed=true}]; nested: RemoteTransportException[[follower1][127.0.0.1:33857][internal:index/shard/recovery/start_recovery]]; nested: IllegalStateException[translog replay failed to cover required sequen
ce numbers (required range [23:58). first missing op is [23]]; ], allocation_status[no_attempt]], failure RemoteTransportException[[follower0][127.0.0.1:33247][indices:data/write/bulk_shard_operations[s][r]]]; n
ested: ShardNotFoundException[no such shard]; ], allocation_status[no_attempt]], message [failed recovery], failure [RecoveryFailedException[[index2][0]: Recovery failed from {follower1}{TF5Ju3gbQimfP5f5UAx1Pw}{
EZ9qhrdkR-GZ-r3kGcu-QA}{127.0.0.1}{127.0.0.1:33857}{xpack.installed=true} into {follower0}{RmgnfY6vTDe-fyixPwBAsA}{k9gtij__QTqzLoW2w45oBg}{127.0.0.1}{127.0.0.1:33247}{xpack.installed=true}]; nested: RemoteTransp
ortException[[follower1][127.0.0.1:33857][internal:index/shard/recovery/start_recovery]]; nested: IllegalStateException[translog replay failed to cover required sequence numbers (required range [23:85). first mi
ssing op is [23]]; ], markAsStale [true]]
  1> org.elasticsearch.indices.recovery.RecoveryFailedException: [index2][0]: Recovery failed from {follower1}{TF5Ju3gbQimfP5f5UAx1Pw}{EZ9qhrdkR-GZ-r3kGcu-QA}{127.0.0.1}{127.0.0.1:33857}{xpack.installed=true} in
to {follower0}{RmgnfY6vTDe-fyixPwBAsA}{k9gtij__QTqzLoW2w45oBg}{127.0.0.1}{127.0.0.1:33247}{xpack.installed=true}
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService.lambda$doRecovery$2(PeerRecoveryTargetService.java:253) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$1.handleException(PeerRecoveryTargetService.java:298) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.PlainTransportFuture.handleException(PlainTransportFuture.java:97) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TcpTransport.lambda$handleException$23(TcpTransport.java:1000) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_202]
  1>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_202]
  1>    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
  1> Caused by: org.elasticsearch.transport.RemoteTransportException: [follower1][127.0.0.1:33857][internal:index/shard/recovery/start_recovery]
  1> Caused by: java.lang.IllegalStateException: translog replay failed to cover required sequence numbers (required range [23:85). first missing op is [23]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$phase2$25(RecoverySourceHandler.java:592) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.sendBatch(RecoverySourceHandler.java:648) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$sendBatch$26(RecoverySourceHandler.java:634) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.indices.recovery.RemoteRecoveryTargetHandler.lambda$indexTranslogOperations$4(RemoteRecoveryTargetHandler.java:133) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:61) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.PlainTransportFuture.handleResponse(PlainTransportFuture.java:87) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1108) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.transport.TcpTransport$1.doRun(TcpTransport.java:974) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
  1>    ... 3 more

Error in the cluster state:

[2019-02-18T13:07:37,939][INFO ][o.e.x.c.IndexFollowingIT ] [testFollowIndexWithConcurrentMappingChanges] ensureGreen timed out, cluster state:
  1> cluster uuid: Ng9EU-PNRtKSGAho1lXeuw
  1> version: 133
  1> state uuid: MHeg6LFATgqtJNXpGR7jCQ
  1> from_diff: false
  1> meta data version: 111
  1>    coordination_metadata:
  1>       term: 1
  1>       last_committed_config: VotingConfiguration{TF5Ju3gbQimfP5f5UAx1Pw}
  1>       last_accepted_config: VotingConfiguration{TF5Ju3gbQimfP5f5UAx1Pw}
  1>       voting tombstones: []
  1>    [index2/WvAgjeByTKyKyHyfv4F3_g]: v[29], mv[26], sv[1]
  1>       0: p_term [1], isa_ids [4_9HUKWCRFabr2ctMYuJDA]
  1> metadata customs:
  1>    persistent_tasks: {"last_allocation_id":1,"tasks":[{"id":"WvAgjeByTKyKyHyfv4F3_g-0","task":{"xpack/ccr/shard_follow_task":{"params":{"remote_cluster":"leader_cluster","follow_shard_index":"index2","follo
w_shard_index_uuid":"WvAgjeByTKyKyHyfv4F3_g","follow_shard_shard":0,"leader_shard_index":"index1","leader_shard_index_uuid":"iV7aduaNSqyO9sJE9bZgOw","leader_shard_shard":0,"max_read_request_operation_count":5120
,"max_write_request_operation_count":5120,"max_outstanding_read_requests":12,"max_outstanding_write_requests":9,"max_read_request_size":"32mb","max_write_request_size":"9223372036854775807b","max_write_buffer_co
unt":2147483647,"max_write_buffer_size":"512mb","max_retry_delay":"10ms","read_poll_timeout":"10ms","headers":{}}}},"allocation_id":1,"assignment":{"executor_node":"TF5Ju3gbQimfP5f5UAx1Pw","explanation":""}}]}  
 index-graveyard: IndexGraveyard[[[index=[index2/oUo45WFIT4KD6fx44LuOug], deleteDate=2019-02-18T13:06:54.275Z], [index=[index2/Jig6XuL7S1ugGmHMbpgYZw], deleteDate=2019-02-18T13:06:55.007Z], [index=[index2/0ou9jM
_IT9Sg2LWBXoAbog], deleteDate=2019-02-18T13:06:56.962Z], [index=[follower/jblDMPVeTx2yc_6Bguu2vg], deleteDate=2019-02-18T13:07:00.029Z], [index=[follower/IYSgSyO9Su-I9uX9hKb5gQ], deleteDate=2019-02-18T13:07:03.0
73Z], [index=[index2/1On8lJ1aQuGV4y8J6I65hg], deleteDate=2019-02-18T13:07:05.700Z], [index=[index2/-Jd8S67ETySPcNWbbm2-kA], deleteDate=2019-02-18T13:07:07.622Z]]]
  1> nodes: 
  1>    {follower1}{TF5Ju3gbQimfP5f5UAx1Pw}{EZ9qhrdkR-GZ-r3kGcu-QA}{127.0.0.1}{127.0.0.1:33857}{xpack.installed=true}, local, master
  1>    {follower0}{RmgnfY6vTDe-fyixPwBAsA}{k9gtij__QTqzLoW2w45oBg}{127.0.0.1}{127.0.0.1:33247}{xpack.installed=true}
  1> routing_table (version 63):
  1> -- index [[index2/WvAgjeByTKyKyHyfv4F3_g]]
  1> ----shard_id [index2][0]
  1> --------[index2][0], node[TF5Ju3gbQimfP5f5UAx1Pw], [P], s[STARTED], a[id=4_9HUKWCRFabr2ctMYuJDA]
  1> --------[index2][0], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-02-18T13:07:08.840Z], failed_attempts[5], delayed=false, details[failed shard on node [RmgnfY6vTDe-fyixPwBAsA]: failed recovery, failure RecoveryFailedException[[index2][0]: Recovery failed from {follower1}{TF5Ju3gbQimfP5f5UAx1Pw}{EZ9qhrdkR-GZ-r3kGcu-QA}{127.0.0.1}{127.0.0.1:33857}{xpack.installed=true} into {follower0}{RmgnfY6vTDe-fyixPwBAsA}{k9gtij__QTqzLoW2w45oBg}{127.0.0.1}{127.0.0.1:33247}{xpack.installed=true}]; nested: RemoteTransportException[[follower1][127.0.0.1:33857][internal:index/shard/recovery/start_recovery]]; nested: IllegalStateException[translog replay failed to cover required sequence numbers (required range [23:85). first missing op is [23]]; ], allocation_status[no_attempt]]
  1> routing_nodes:
  1> -----node_id[TF5Ju3gbQimfP5f5UAx1Pw][V]
  1> --------[index2][0], node[TF5Ju3gbQimfP5f5UAx1Pw], [P], s[STARTED], a[id=4_9HUKWCRFabr2ctMYuJDA]
  1> -----node_id[RmgnfY6vTDe-fyixPwBAsA][V]
  1> ---- unassigned
  1> --------[index2][0], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-02-18T13:07:08.840Z], failed_attempts[5], delayed=false, details[failed shard on node [RmgnfY6vTDe-fyixPwBAsA]: failed recovery, failure RecoveryFailedException[[index2][0]: Recovery failed from {follower1}{TF5Ju3gbQimfP5f5UAx1Pw}{EZ9qhrdkR-GZ-r3kGcu-QA}{127.0.0.1}{127.0.0.1:33857}{xpack.installed=true} into {follower0}{RmgnfY6vTDe-fyixPwBAsA}{k9gtij__QTqzLoW2w45oBg}{127.0.0.1}{127.0.0.1:33247}{xpack.installed=true}]; nested: RemoteTransportException[[follower1][127.0.0.1:33857][internal:index/shard/recovery/start_recovery]]; nested: IllegalStateException[translog replay failed to cover required sequence numbers (required range [23:85). first missing op is [23]]; ], allocation_status[no_attempt]]
  1> customs:
  1>    restore: RestoreInProgress[[]]
  1> tasks: (0):

Build url: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+intake/156/console

I'm going to mute this test as well.

martijnvg · 2019-02-18T14:30:39Z

This is another instance of the test failure in the previous commit: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+release-tests/462/console

…ate-file * elastic/master: Remove tests and branches that will never execute (elastic#38772) also check ccr stats api return empty response in ensureNoCcrTasks() Add overlapping, before, after filters to intervals query (elastic#38999) Mute test elastic#38949 Add remote recovery to ShardFollowTaskReplicationTests (elastic#39007) [ML] More advanced post-test cleanup of ML indices (elastic#39049) wait for shard to be allocated before executing a resume follow api Update track-total-hits.asciidoc Force kill testcluster nodes (elastic#37353) Make pullFixture a task dependency of resolveAllDependencies (elastic#38956) set minimum supported version (elastic#39043) Enforce Completion Context Limit (elastic#38675) Mute test Don't close caches while there might still be in-flight requests. (elastic#38958) Fix elastic#38623 remove xpack namespace REST API (elastic#38625) Add data frame feature (elastic#38934) Test bi-directional index following during a rolling upgrade. (elastic#38962) Generate mvn pom for ssl-config library (elastic#39019) Mute testRetentionLeaseIsRenewedDuringRecovery

Relates to #38949

With this change, we won't wait for the local checkpoint to advance to the max_seq_no before starting phase2 of peer-recovery. We also remove the sequence number range check in peer-recovery. We can safely do these thanks to Yannick's finding. The replication group to be used is currently sampled after indexing into the primary (see `ReplicationOperation` class). This means that when initiating tracking of a new replica, we have to consider the following two cases: - There are operations for which the replication group has not been sampled yet. As we initiated the new replica as tracking, we know that those operations will be replicated to the new replica and follow the typical replication group semantics (e.g. marked as stale when unavailable). - There are operations for which the replication group has already been sampled. These operations will not be sent to the new replica. However, we know that those operations are already indexed into Lucene and the translog on the primary, as the sampling is happening after that. This means that by taking a snapshot of Lucene or the translog, we will be getting those ops as well. What we cannot guarantee anymore is that all ops up to `endingSeqNo` are available in the snapshot (i.e. also see comment in `RecoverySourceHandler` saying `We need to wait for all operations up to the current max to complete, otherwise we can not guarantee that all operations in the required range will be available for replaying from the translog of the source.`). This is not needed, though, as we can no longer guarantee that max seq no == local checkpoint. Relates #39000 Closes #38949 Co-authored-by: Yannick Welsch <[email protected]>

iverase added >test-failure Triaged test failures from CI :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features labels Feb 15, 2019

martijnvg self-assigned this Feb 15, 2019

iverase added a commit that referenced this issue Feb 15, 2019

[CI] Muting method testFollowIndex in IndexFollowingIT

296febb

Relates to #38949

iverase added a commit that referenced this issue Feb 15, 2019

[CI] Muting method testFollowIndex in IndexFollowingIT

b19b778

Relates to #38949

dnhatn self-assigned this Feb 15, 2019

This was referenced Feb 16, 2019

[CI] FollowerFailOverIT testFailOverOnFollower fails frequently #38633

Closed

Replica recovery on follower can fail after recovery from remote #39000

Closed

Do not wait for advancement of checkpoint in recovery #39006

Merged

martijnvg added a commit that referenced this issue Feb 18, 2019

Mute test #38949

395f3e1

martijnvg added a commit that referenced this issue Feb 18, 2019

Mute test #38949

4fd1f80

martijnvg added a commit that referenced this issue Feb 18, 2019

Mute test #38949

b6a7e99

martijnvg added a commit that referenced this issue Feb 18, 2019

Mute test #38949

34b3665

dnhatn mentioned this issue Feb 19, 2019

Replay history of operations in remote recovery #39153

Closed

talevy pushed a commit that referenced this issue Feb 20, 2019

[CI] Muting method testFollowIndex in IndexFollowingIT

1c250ae

Relates to #38949

dnhatn closed this as completed in #39006 Feb 25, 2019

jtibshirani mentioned this issue Apr 9, 2019

IndexFollowingIT#testFollowIndex can fail with assertion error. #41037

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CI] IndexFollowingIT.testFollowIndex failure #38949

[CI] IndexFollowingIT.testFollowIndex failure #38949

iverase commented Feb 15, 2019

elasticmachine commented Feb 15, 2019

martijnvg commented Feb 15, 2019

jpountz commented Feb 15, 2019

dnhatn commented Feb 16, 2019

martijnvg commented Feb 18, 2019

martijnvg commented Feb 18, 2019

[CI] IndexFollowingIT.testFollowIndex failure #38949

[CI] IndexFollowingIT.testFollowIndex failure #38949

Comments

iverase commented Feb 15, 2019

elasticmachine commented Feb 15, 2019

martijnvg commented Feb 15, 2019

jpountz commented Feb 15, 2019

dnhatn commented Feb 16, 2019

martijnvg commented Feb 18, 2019

martijnvg commented Feb 18, 2019