-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Truncate tlog cli should assign global checkpoint #28192
Conversation
We are targeting to always have a safe index once the recovery is done. This invariant does not hold if the translog is manually truncated by users because the truncate translog cli resets the global checkpoint to unassigned. This commit assigns the max_seqno of the last commit to the global checkpoint when truncating translog. Relates elastic#28181
The invariant does not hold for recovering from local shards and snapshots. I think we can assign the global checkpoint to max_seqno in these cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a question. Btw - can you change the PR description to note that the only reason why we can safely do this is that we mark the shard with a new history uuid (which disables all seq# guarantees for other histories).
@@ -135,6 +135,13 @@ protected void execute(Terminal terminal, OptionSet options, Environment env) th | |||
commitData = commits.get(commits.size() - 1).getUserData(); | |||
String translogGeneration = commitData.get(Translog.TRANSLOG_GENERATION_KEY); | |||
String translogUUID = commitData.get(Translog.TRANSLOG_UUID_KEY); | |||
final long globalCheckpoint; | |||
// In order to have a safe commit invariant, we have to assign max_seqno of the last commit to the global checkpoint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to do more here to get to a healthy shard - we need to also move the local checkpoint to the max seq # . Otherwise we have the weird state where local checkpoint < global checkpoint. By setting the local checkpoint to the max seq# we declare all missing ops (that were in the translog) as no ops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also like to double check why no test failed due to the above. We should get a tight grip on when our invariants (gcp <= lcp <= max seq#) kick in.
@bleskes, I've updated the description and moved the local checkpoint to max_seqno in the truncate translog cli. Could you please take another look. No test failed because we execute |
@@ -214,6 +217,9 @@ public void testCorruptTranslogTruncation() throws Exception { | |||
final RecoveryState replicaRecoveryState = recoveryResponse.shardRecoveryStates().get("test").stream() | |||
.filter(recoveryState -> recoveryState.getPrimary() == false).findFirst().get(); | |||
assertThat(replicaRecoveryState.getIndex().toString(), replicaRecoveryState.getIndex().recoveredFileCount(), greaterThan(0)); | |||
// Ensure that the global checkpoint is restored from the max seqno of the last commit. | |||
final SeqNoStats seqNoStats = getSeqNoStats("test", 0); | |||
assertThat(seqNoStats.getGlobalCheckpoint(), equalTo(seqNoStats.getMaxSeqNo())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also check the local checkpoint?
Thanks @bleskes for reviewing. |
We are targeting to always have a safe index once the recovery is done. This invariant does not hold if the translog is manually truncated by users because the truncate translog cli resets the global checkpoint to unassigned. This commit assigns the global checkpoint to the max_seqno of the last commit when truncating translog. We can only safely do it because the truncate translog command will generate a new history uuid for that shard. With a new history UUID, sequence-based recovery between that shard and other old shards will be disabled. Relates #28181
* master: TEST: init unassigned gcp in testAcquireIndexCommit Replica start peer recovery with safe commit (elastic#28181) Truncate tlog cli should assign global checkpoint (elastic#28192)
* master: (59 commits) Correct backport replica rollback to 6.2 (elastic#28181) Backport replica rollback to 6.2 (elastic#28181) Rename deleteLocalTranslog to createNewTranslog AwaitsFix #testRecoveryAfterPrimaryPromotion TEST: init unassigned gcp in testAcquireIndexCommit Replica start peer recovery with safe commit (elastic#28181) Truncate tlog cli should assign global checkpoint (elastic#28192) Fix lock accounting in releasable lock Add ability to associate an ID with tasks (elastic#27764) [DOCS] Removed differencies between text and code (elastic#27993) text fixes (elastic#28136) Update getting-started.asciidoc (elastic#28145) [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187) Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186) Do not keep 5.x commits once having 6.x commits (elastic#28188) Rename core module to server (elastic#28180) upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183) [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads Primary send safe commit in file-based recovery (elastic#28038) [Docs] Correct response json in rank-eval.asciidoc ...
* master: (74 commits) Update version of TaskInfo header serialization after backport TEST: Tightens file-based condition in peer-recovery Correct backport replica rollback to 6.2 (elastic#28181) Backport replica rollback to 6.2 (elastic#28181) Rename deleteLocalTranslog to createNewTranslog AwaitsFix #testRecoveryAfterPrimaryPromotion TEST: init unassigned gcp in testAcquireIndexCommit Replica start peer recovery with safe commit (elastic#28181) Truncate tlog cli should assign global checkpoint (elastic#28192) Fix lock accounting in releasable lock Add ability to associate an ID with tasks (elastic#27764) [DOCS] Removed differencies between text and code (elastic#27993) text fixes (elastic#28136) Update getting-started.asciidoc (elastic#28145) [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187) Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186) Do not keep 5.x commits once having 6.x commits (elastic#28188) Rename core module to server (elastic#28180) upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183) [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads ...
* compile-with-jdk-9: (56 commits) TEST: init unassigned gcp in testAcquireIndexCommit Replica start peer recovery with safe commit (elastic#28181) Truncate tlog cli should assign global checkpoint (elastic#28192) Fix lock accounting in releasable lock Add ability to associate an ID with tasks (elastic#27764) [DOCS] Removed differencies between text and code (elastic#27993) text fixes (elastic#28136) Update getting-started.asciidoc (elastic#28145) [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187) Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186) Do not keep 5.x commits once having 6.x commits (elastic#28188) Rename core module to server (elastic#28180) upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183) [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads Primary send safe commit in file-based recovery (elastic#28038) [Docs] Correct response json in rank-eval.asciidoc Add scroll parameter to _reindex API (elastic#28041) Include all sentences smaller than fragment_size in the unified highlighter (elastic#28132) Modifies the JavaAPI docs related to AggregationBuilder [Docs] Improvements in script-fields.asciidoc (elastic#28174) ...
We are targeting to always have a safe index once the recovery is done. This invariant does not hold if the translog is manually truncated by users because the truncate translog cli resets the global checkpoint to unassigned. This commit assigns the global checkpoint to the max_seqno of the last commit when truncating translog. We can only safely do it because the truncate translog command will generate a new history uuid for that shard. With a new history UUID, sequence-based recovery between that shard and other old shards will be disabled.
Relates #28181