Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Truncate tlog cli should assign global checkpoint #28192

Merged
merged 4 commits into from
Jan 13, 2018

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Jan 12, 2018

We are targeting to always have a safe index once the recovery is done. This invariant does not hold if the translog is manually truncated by users because the truncate translog cli resets the global checkpoint to unassigned. This commit assigns the global checkpoint to the max_seqno of the last commit when truncating translog. We can only safely do it because the truncate translog command will generate a new history uuid for that shard. With a new history UUID, sequence-based recovery between that shard and other old shards will be disabled.

Relates #28181

We are targeting to always have a safe index once the recovery is done.
This invariant does not hold if the translog is manually truncated by
users because the truncate translog cli resets the global checkpoint to
unassigned. This commit assigns the max_seqno of the last commit to the
global checkpoint when truncating translog.

Relates elastic#28181
@dnhatn dnhatn added >enhancement review :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v7.0.0 v6.2.0 labels Jan 12, 2018
@dnhatn
Copy link
Member Author

dnhatn commented Jan 12, 2018

Recovery from local shards and snapshots can set the global checkpoint to the local checkpoint as this is the only copy of the shard.

The invariant does not hold for recovering from local shards and snapshots. I think we can assign the global checkpoint to max_seqno in these cases.

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a question. Btw - can you change the PR description to note that the only reason why we can safely do this is that we mark the shard with a new history uuid (which disables all seq# guarantees for other histories).

@@ -135,6 +135,13 @@ protected void execute(Terminal terminal, OptionSet options, Environment env) th
commitData = commits.get(commits.size() - 1).getUserData();
String translogGeneration = commitData.get(Translog.TRANSLOG_GENERATION_KEY);
String translogUUID = commitData.get(Translog.TRANSLOG_UUID_KEY);
final long globalCheckpoint;
// In order to have a safe commit invariant, we have to assign max_seqno of the last commit to the global checkpoint.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to do more here to get to a healthy shard - we need to also move the local checkpoint to the max seq # . Otherwise we have the weird state where local checkpoint < global checkpoint. By setting the local checkpoint to the max seq# we declare all missing ops (that were in the translog) as no ops.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also like to double check why no test failed due to the above. We should get a tight grip on when our invariants (gcp <= lcp <= max seq#) kick in.

@dnhatn
Copy link
Member Author

dnhatn commented Jan 12, 2018

@bleskes, I've updated the description and moved the local checkpoint to max_seqno in the truncate translog cli. Could you please take another look.

No test failed because we execute fillSeqNoGaps when recovering from the store. This makes the local checkpoint be in sync the global checkpoint and the max_seqno. The local checkpoint from the last commit can be lower than the global checkpoint if the global checkpoint advanced (eg. the local checkpoint also advanced) but we do not flush.

@@ -214,6 +217,9 @@ public void testCorruptTranslogTruncation() throws Exception {
final RecoveryState replicaRecoveryState = recoveryResponse.shardRecoveryStates().get("test").stream()
.filter(recoveryState -> recoveryState.getPrimary() == false).findFirst().get();
assertThat(replicaRecoveryState.getIndex().toString(), replicaRecoveryState.getIndex().recoveredFileCount(), greaterThan(0));
// Ensure that the global checkpoint is restored from the max seqno of the last commit.
final SeqNoStats seqNoStats = getSeqNoStats("test", 0);
assertThat(seqNoStats.getGlobalCheckpoint(), equalTo(seqNoStats.getMaxSeqNo()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also check the local checkpoint?

@dnhatn
Copy link
Member Author

dnhatn commented Jan 13, 2018

Thanks @bleskes for reviewing.

@dnhatn dnhatn merged commit f2db2a0 into elastic:master Jan 13, 2018
@dnhatn dnhatn deleted the truncate_tlog_cli_set_gcp branch January 13, 2018 00:06
dnhatn added a commit that referenced this pull request Jan 13, 2018
We are targeting to always have a safe index once the recovery is done. 
This invariant does not hold if the translog is manually truncated by 
users because the truncate translog cli resets the global checkpoint to
unassigned. This commit assigns the global checkpoint to the max_seqno
of the last commit when truncating translog. We can only safely do it
because the truncate translog command will generate a new history uuid
for that shard. With a new history UUID, sequence-based recovery between
that shard and other old shards will be disabled.

Relates #28181
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 13, 2018
* master:
  TEST: init unassigned gcp in testAcquireIndexCommit
  Replica start peer recovery with safe commit (elastic#28181)
  Truncate tlog cli should assign global checkpoint (elastic#28192)
matarrese added a commit to matarrese/elasticsearch that referenced this pull request Jan 13, 2018
* master: (59 commits)
  Correct backport replica rollback to 6.2 (elastic#28181)
  Backport replica rollback to 6.2 (elastic#28181)
  Rename deleteLocalTranslog to createNewTranslog
  AwaitsFix #testRecoveryAfterPrimaryPromotion
  TEST: init unassigned gcp in testAcquireIndexCommit
  Replica start peer recovery with safe commit (elastic#28181)
  Truncate tlog cli should assign global checkpoint (elastic#28192)
  Fix lock accounting in releasable lock
  Add ability to associate an ID with tasks  (elastic#27764)
  [DOCS] Removed differencies between text and code (elastic#27993)
  text fixes (elastic#28136)
  Update getting-started.asciidoc (elastic#28145)
  [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187)
  Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186)
  Do not keep 5.x commits once having 6.x commits (elastic#28188)
  Rename core module to server (elastic#28180)
  upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183)
  [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads
  Primary send safe commit in file-based recovery (elastic#28038)
  [Docs] Correct response json in rank-eval.asciidoc
  ...
matarrese added a commit to matarrese/elasticsearch that referenced this pull request Jan 15, 2018
* master: (74 commits)
  Update version of TaskInfo header serialization after backport
  TEST: Tightens file-based condition in peer-recovery
  Correct backport replica rollback to 6.2 (elastic#28181)
  Backport replica rollback to 6.2 (elastic#28181)
  Rename deleteLocalTranslog to createNewTranslog
  AwaitsFix #testRecoveryAfterPrimaryPromotion
  TEST: init unassigned gcp in testAcquireIndexCommit
  Replica start peer recovery with safe commit (elastic#28181)
  Truncate tlog cli should assign global checkpoint (elastic#28192)
  Fix lock accounting in releasable lock
  Add ability to associate an ID with tasks  (elastic#27764)
  [DOCS] Removed differencies between text and code (elastic#27993)
  text fixes (elastic#28136)
  Update getting-started.asciidoc (elastic#28145)
  [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187)
  Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186)
  Do not keep 5.x commits once having 6.x commits (elastic#28188)
  Rename core module to server (elastic#28180)
  upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183)
  [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads
  ...
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 16, 2018
* compile-with-jdk-9: (56 commits)
  TEST: init unassigned gcp in testAcquireIndexCommit
  Replica start peer recovery with safe commit (elastic#28181)
  Truncate tlog cli should assign global checkpoint (elastic#28192)
  Fix lock accounting in releasable lock
  Add ability to associate an ID with tasks  (elastic#27764)
  [DOCS] Removed differencies between text and code (elastic#27993)
  text fixes (elastic#28136)
  Update getting-started.asciidoc (elastic#28145)
  [Docs] Spelling fix in painless-getting-started.asciidoc (elastic#28187)
  Fixed the cat.health REST test to accept 4ms, not just 4.0ms (elastic#28186)
  Do not keep 5.x commits once having 6.x commits (elastic#28188)
  Rename core module to server (elastic#28180)
  upgraded jna from 4.4.0-1 to 4.5.1 (elastic#28183)
  [TEST] Do not call RandomizedTest.scaledRandomIntBetween from multiple threads
  Primary send safe commit in file-based recovery (elastic#28038)
  [Docs] Correct response json in rank-eval.asciidoc
  Add scroll parameter to _reindex API (elastic#28041)
  Include all sentences smaller than fragment_size in the unified highlighter (elastic#28132)
  Modifies the JavaAPI docs related to AggregationBuilder
  [Docs] Improvements in script-fields.asciidoc (elastic#28174)
  ...
@clintongormley clintongormley added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Translog :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. labels Feb 13, 2018
@jpountz jpountz removed the :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. label Jan 29, 2019
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.2.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants