Truncate tlog cli should assign global checkpoint #28192

dnhatn · 2018-01-12T02:56:43Z

We aim to always have a safe index once recovery is done. This invariant does not hold if users manually truncate the translog, because the truncate translog CLI resets the global checkpoint to unassigned. This commit assigns the max_seqno of the last commit to the global checkpoint when truncating the translog. This is safe only because the truncate translog command generates a new history UUID for that shard. With a new history UUID, sequence-based recovery between that shard and other, older shards is disabled.
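As a rough illustration of the assignment described above, the sketch below reads max_seqno from a commit's user data and uses it as the global checkpoint. It is not the code in this PR: the class name, the `max_seq_no` key, and the `-2` sentinel for "unassigned" are assumptions made for the example, and the commit user data is modeled as a plain `Map<String, String>`.

```java
import java.util.HashMap;
import java.util.Map;

final class TruncateCheckpointSketch {
    // Hypothetical key and sentinel, chosen for this example only.
    static final String MAX_SEQ_NO_KEY = "max_seq_no";
    static final long UNASSIGNED_SEQ_NO = -2L;

    // Returns the global checkpoint to write after truncation:
    // the max_seqno of the last commit if present, otherwise "unassigned".
    static long globalCheckpointFor(Map<String, String> commitUserData) {
        String maxSeqNo = commitUserData.get(MAX_SEQ_NO_KEY);
        if (maxSeqNo == null) {
            // Commit carries no sequence number information.
            return UNASSIGNED_SEQ_NO;
        }
        return Long.parseLong(maxSeqNo);
    }

    public static void main(String[] args) {
        Map<String, String> lastCommit = new HashMap<>();
        lastCommit.put(MAX_SEQ_NO_KEY, "41");
        System.out.println(globalCheckpointFor(lastCommit));      // prints 41
        System.out.println(globalCheckpointFor(new HashMap<>())); // prints -2
    }
}
```

Falling back to the "unassigned" sentinel when the key is missing keeps the sketch from guessing a checkpoint for a commit that has no sequence numbers at all.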

Relates #28181


dnhatn · 2018-01-12T02:57:32Z

Recovery from local shards and snapshots can set the global checkpoint to the local checkpoint as this is the only copy of the shard.

The invariant does not hold when recovering from local shards and snapshots. I think we can assign max_seqno to the global checkpoint in these cases.

bleskes

I left a question. By the way, can you change the PR description to note that the only reason we can safely do this is that we mark the shard with a new history UUID (which disables all seq# guarantees for other histories)?
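To make the history UUID argument concrete, here is a minimal sketch of the kind of check it implies: sequence-based recovery is only considered when both shard copies carry the same history UUID. The class and method names are hypothetical and this is not the actual recovery code.

```java
import java.util.UUID;

final class HistoryUuidSketch {
    // Sequence numbers are only comparable within one history, so
    // sequence-based recovery requires matching history UUIDs.
    static boolean canUseSequenceBasedRecovery(String sourceHistoryUuid, String targetHistoryUuid) {
        return sourceHistoryUuid != null && sourceHistoryUuid.equals(targetHistoryUuid);
    }

    // Models what truncation does in this example: start a fresh history.
    static String newHistoryUuid() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String original = newHistoryUuid();
        String afterTruncate = newHistoryUuid();
        System.out.println(canUseSequenceBasedRecovery(original, original));      // prints true
        System.out.println(canUseSequenceBasedRecovery(original, afterTruncate)); // false unless two random UUIDs collide
    }
}
```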

bleskes · 2018-01-12T10:09:16Z

server/src/main/java/org/elasticsearch/index/translog/TruncateTranslogCommand.java

@@ -135,6 +135,13 @@ protected void execute(Terminal terminal, OptionSet options, Environment env) th
                commitData = commits.get(commits.size() - 1).getUserData();
                String translogGeneration = commitData.get(Translog.TRANSLOG_GENERATION_KEY);
                String translogUUID = commitData.get(Translog.TRANSLOG_UUID_KEY);
+                final long globalCheckpoint;
+                // In order to have a safe commit invariant, we have to assign max_seqno of the last commit to the global checkpoint.


I think we need to do more here to get to a healthy shard: we also need to move the local checkpoint to the max seq#. Otherwise we have the weird state where local checkpoint < global checkpoint. By setting the local checkpoint to the max seq#, we declare all missing ops (that were in the translog) as no-ops.

I would also like to double-check why no test failed due to the above. We should get a tight grip on when our invariants (gcp <= lcp <= max seq#) kick in.
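The invariant mentioned above (gcp <= lcp <= max seq#) can be written down as a small check. The sketch below is only an illustration with hypothetical names; it shows why assigning the global checkpoint alone leaves the shard in the "weird state", and why also moving the local checkpoint restores the invariant.

```java
final class CheckpointInvariantSketch {
    // gcp <= lcp <= max seq#, as stated in the review comment.
    static boolean holds(long globalCheckpoint, long localCheckpoint, long maxSeqNo) {
        return globalCheckpoint <= localCheckpoint && localCheckpoint <= maxSeqNo;
    }

    public static void main(String[] args) {
        long maxSeqNo = 10L;
        long localCheckpointInCommit = 7L; // ops 8..10 lived only in the truncated translog

        // Only the global checkpoint is set to max_seqno: lcp < gcp.
        System.out.println(holds(maxSeqNo, localCheckpointInCommit, maxSeqNo)); // prints false

        // The local checkpoint is moved to max_seqno as well.
        System.out.println(holds(maxSeqNo, maxSeqNo, maxSeqNo)); // prints true
    }
}
```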

dnhatn · 2018-01-12T20:53:10Z

@bleskes, I've updated the description and moved the local checkpoint to max_seqno in the truncate translog CLI. Could you please take another look?

No test failed because we execute fillSeqNoGaps when recovering from the store. This brings the local checkpoint in sync with the global checkpoint and the max_seqno. The local checkpoint from the last commit can be lower than the global checkpoint if the global checkpoint advanced (i.e. the local checkpoint also advanced) but we did not flush.
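The effect of filling sequence number gaps can be sketched as follows. This is a simplified model, not the engine's fillSeqNoGaps implementation: processed operations are a plain set of sequence numbers, the local checkpoint is the highest sequence number below which everything has been processed, and every missing number up to max_seqno is recorded as a no-op. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

final class FillGapsSketch {
    // Highest seq# such that all seq# from 0 up to it are processed; -1 if none.
    static long localCheckpoint(Set<Long> processed) {
        long checkpoint = -1L;
        while (processed.contains(checkpoint + 1)) {
            checkpoint++;
        }
        return checkpoint;
    }

    // Marks every missing seq# in [0, maxSeqNo] as processed (a no-op)
    // and returns how many no-ops were added.
    static int fillGaps(Set<Long> processed, long maxSeqNo) {
        int noOps = 0;
        for (long seqNo = 0; seqNo <= maxSeqNo; seqNo++) {
            if (processed.add(seqNo)) {
                noOps++;
            }
        }
        return noOps;
    }

    public static void main(String[] args) {
        Set<Long> processed = new TreeSet<>(Arrays.asList(0L, 1L, 3L, 5L));
        System.out.println(localCheckpoint(processed)); // prints 1
        System.out.println(fillGaps(processed, 5L));    // prints 2
        System.out.println(localCheckpoint(processed)); // prints 5
    }
}
```

After the gaps are filled, the local checkpoint equals max_seqno, which is why a global checkpoint set to max_seqno no longer sits above it.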

bleskes · 2018-01-12T21:12:44Z

server/src/test/java/org/elasticsearch/index/translog/TruncateTranslogIT.java

@@ -214,6 +217,9 @@ public void testCorruptTranslogTruncation() throws Exception {
        final RecoveryState replicaRecoveryState = recoveryResponse.shardRecoveryStates().get("test").stream()
            .filter(recoveryState -> recoveryState.getPrimary() == false).findFirst().get();
        assertThat(replicaRecoveryState.getIndex().toString(), replicaRecoveryState.getIndex().recoveredFileCount(), greaterThan(0));
+        // Ensure that the global checkpoint is restored from the max seqno of the last commit.
+        final SeqNoStats seqNoStats = getSeqNoStats("test", 0);
+        assertThat(seqNoStats.getGlobalCheckpoint(), equalTo(seqNoStats.getMaxSeqNo()));


Also check the local checkpoint?

dnhatn · 2018-01-13T00:06:00Z

Thanks @bleskes for reviewing.


dnhatn added the >enhancement, review, :Distributed Indexing/Recovery, v7.0.0, and v6.2.0 labels Jan 12, 2018

dnhatn requested review from bleskes and ywelsch January 12, 2018 02:56

Test only the latest values

f6aa36b

bleskes suggested changes Jan 12, 2018


Also advances the local checkpoint

817eb34

bleskes approved these changes Jan 12, 2018


Asserts the local checkpoint

2549442

dnhatn merged commit f2db2a0 into elastic:master Jan 13, 2018

dnhatn deleted the truncate_tlog_cli_set_gcp branch January 13, 2018 00:06

dnhatn added the backport pending label Jan 13, 2018

dnhatn removed the backport pending label Jan 13, 2018

jpountz removed the :Distributed Indexing/Recovery label Jan 29, 2019

jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
