Flush old indices on primary promotion and relocation #27580
Conversation
Overall change LGTM, I've made some suggestions around code structure though.
    // but we must have everything above the local checkpoint in the commit
    requiredSeqNoRangeStart =
        Long.parseLong(phase1Snapshot.getIndexCommit().getUserData().get(SequenceNumbers.LOCAL_CHECKPOINT_KEY)) + 1;
    assert requiredSeqNoRangeStart >= 0 :
I would instead add the following two assertions after the if (isSequenceNumberBasedRecoveryPossible) { ... } else { ... }, as that's what we actually want to hold for both branches:

    assert startingSeqNo >= 0;
    assert requiredSeqNoRangeStart >= startingSeqNo;
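A minimal sketch of the suggested placement; the branch bodies are illustrative assumptions (request.startingSeqNo() and the file-based starting value are not from the diff), only the two trailing assertions are the actual suggestion:

    final long startingSeqNo;
    final long requiredSeqNoRangeStart;
    if (isSequenceNumberBasedRecoveryPossible) {
        // ops-based recovery: resume from the seq# the target asks for (assumed accessor)
        startingSeqNo = request.startingSeqNo();
        requiredSeqNoRangeStart = startingSeqNo;
    } else {
        // file-based recovery: everything above the commit's local checkpoint is required
        startingSeqNo = 0; // assumed for this sketch
        requiredSeqNoRangeStart = Long.parseLong(
            phase1Snapshot.getIndexCommit().getUserData().get(SequenceNumbers.LOCAL_CHECKPOINT_KEY)) + 1;
    }
    // the invariants that should hold after either branch:
    assert startingSeqNo >= 0;
    assert requiredSeqNoRangeStart >= startingSeqNo;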
sure
@@ -221,28 +227,33 @@ private void runUnderPrimaryPermit(CancellableThreads.Interruptable runnable) {
        });
    }

    private long determineEndingSeqNo() {
I don't like the name of this method. I would prefer not to have a separate method, and just have:

    final long endingSeqNo = shard.seqNoStats().getMaxSeqNo();
    cancellableThreads.execute(() -> shard.waitForOpsToComplete(endingSeqNo));

and then use endingSeqNo as an inclusive bound instead of an exclusive one in the remaining calculations.
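For illustration (a sketch, not code from the PR), switching to an inclusive bound removes the off-by-one adjustments in the follow-up arithmetic:

    // exclusive bound: required ops span [requiredSeqNoRangeStart, endingSeqNo)
    final long expectedOpsExclusive = endingSeqNo - requiredSeqNoRangeStart;
    // inclusive bound: required ops span [requiredSeqNoRangeStart, endingSeqNo]
    final long expectedOpsInclusive = endingSeqNo - requiredSeqNoRangeStart + 1;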
@@ -434,22 +445,25 @@ void prepareTargetForTranslog(final int totalTranslogOps) throws IOException {
     *
     * @param startingSeqNo           the sequence number to start recovery from, or {@link SequenceNumbers#UNASSIGNED_SEQ_NO} if all
     *                                ops should be sent
     * @param requiredSeqNoRangeStart the lower sequence number of the required range (ending with endingSeqNo)
     * @param endingSeqNo             the highest sequence number that should be sent
Here it's defined as inclusive; below, in the Javadocs of finalizeRecovery, it's defined as exclusive. Let's use the inclusive version.
    int ops = 0;
    long size = 0;
    int skippedOps = 0;
    int totalSentOps = 0;
    final AtomicLong targetLocalCheckpoint = new AtomicLong(SequenceNumbers.UNASSIGNED_SEQ_NO);
    final List<Translog.Operation> operations = new ArrayList<>();
    final LocalCheckpointTracker requiredOpsTracker = new LocalCheckpointTracker(endingSeqNo, requiredSeqNoRangeStart - 1);
If endingSeqNo is exclusive, then this should probably be endingSeqNo - 1. I know that it does not really matter, as we only use the markSeqNoAsCompleted method and we might as well initialize this to new LocalCheckpointTracker(requiredSeqNoRangeStart - 1, requiredSeqNoRangeStart - 1), but yeah, let's use inclusive bounds for endingSeqNo ;-)
@@ -567,6 +587,12 @@ protected SendSnapshotResult sendSnapshot(final long startingSeqNo, final Transl
        cancellableThreads.executeIO(sendBatch);
    }

    if (requiredOpsTracker.getCheckpoint() < endingSeqNo - 1) {
Again we have to use -1 here; it's easier to have endingSeqNo be inclusive.
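To make the inclusive-bounds suggestion concrete, here is a minimal sketch of how the tracker verifies that nothing in the required range was skipped; the loop and the send(op) helper are assumptions, only the tracker calls come from the diff:

    // inclusive endingSeqNo: the tracker covers [requiredSeqNoRangeStart, endingSeqNo]
    final LocalCheckpointTracker requiredOpsTracker =
        new LocalCheckpointTracker(endingSeqNo, requiredSeqNoRangeStart - 1);

    for (Translog.Operation op : operationsToSend) {   // hypothetical iteration
        send(op);                                      // hypothetical send helper
        requiredOpsTracker.markSeqNoAsCompleted(op.seqNo());
    }

    // with an inclusive bound there is no trailing -1 in the completeness check:
    if (requiredOpsTracker.getCheckpoint() < endingSeqNo) {
        throw new IllegalStateException("translog is missing ops in the required range ["
            + requiredSeqNoRangeStart + ", " + endingSeqNo + "]");
    }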
@@ -7,6 +7,7 @@
      # wait for long enough that we give delayed unassigned shards to stop being delayed
      timeout: 70s
      level: shards
      index: test_index, index_with_replicas, multi_type_index
Why do we need to wait here, and not directly in the corresponding tests? This makes, for example, the RecoveryIT test dependent on a specific yml file, which is ugly IMO.
I misread, ignore this comment.
Thanks @ywelsch. I addressed your feedback. Can you take another look?
@ywelsch I ran into another edge case. Can you please take another look at the last few commits?
     * Tests that a single empty shard index is correctly recovered. Empty shards are often an edge case.
     */
    public void testEmptyShard() throws IOException {
        final String index = "test_empty_hard";
hard -> shard
… info once recovery is done.
LGTM. Thanks @bleskes
Before, we used to ship anything in the translog above a certain point. #27580 changed this to have a strict upper bound.
…rget from an old node. #27580 added extra flushes when a shard transitions to primary, to make sure that we never replay translog ops without seq# during recovery. The current logic causes an extra flush when a primary starts while it's recovering from the store. This is not needed, as we also flush in the engine itself (to add sequence number info to the commit). This double flushing confuses tests and is unneeded. Fixes #27649
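As a rough illustration of the promotion-time flush these commits describe (a sketch with assumed routing checks; not the PR's exact code):

    // when a replica is promoted to primary, force a flush so the new commit's
    // local checkpoint covers every op that predates sequence numbers
    if (newRouting.primary() && currentRouting.primary() == false) {
        getEngine().flush(true, true); // force the flush, wait if one is ongoing
    }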
During a recovery the target shard may process both new indexing operations and old ones concurrently. When the primary is on a 6.0 node, the new indexing operations are guaranteed to have sequence numbers, but we don't have that guarantee for the old operations, as they may come from a period when the primary was on a pre-6.0 node. Having this mixture of old and new is something we do not support, and it triggers exceptions.
This PR adds a flush on primary promotion and primary relocation to make sure that any recovery from a primary on a 6.0 node is guaranteed to only need operations with sequence numbers. A recovery from store already flushes when we start the engine if there were any ops in the translog.
With these extra flushes in place we can now actively filter out operations that have no sequence numbers during recovery. Since filtering out operations is risky, I have opted to harden the logic in the recovery source handler to verify that no operations in the required sequence number range (from the local checkpoint in the commit onwards) are missed. This adds extra complexity to this PR, but I think it's worth it.
Finally, I added two tests that reproduce the problems.
Closes #27536
PS: I still need to add unit tests, but since there's some time pressure on this one, I think we can start reviewing.
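A minimal sketch of the filtering described above (the snapshot loop and counter names around it are assumptions; the UNASSIGNED_SEQ_NO sentinel and the tracker appear in the diff):

    for (Translog.Operation op : snapshotOps) {   // hypothetical snapshot iteration
        // ops from a pre-6.0 primary carry no seq#; the extra flush guarantees
        // that nothing in the required range looks like this, so they can be skipped
        if (op.seqNo() == SequenceNumbers.UNASSIGNED_SEQ_NO || op.seqNo() < startingSeqNo) {
            skippedOps++;
            continue;
        }
        operations.add(op);
        requiredOpsTracker.markSeqNoAsCompleted(op.seqNo());
        totalSentOps++;
    }
    // afterwards, the tracker's checkpoint proves that every op in the required
    // range [requiredSeqNoRangeStart, endingSeqNo] was actually sent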