Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce retention lease state file #39004

Merged
merged 27 commits into from
Feb 18, 2019

Conversation

jasontedor
Copy link
Member

@jasontedor jasontedor commented Feb 16, 2019

This commit moves retention leases from being persisted in the Lucene commit point to being persisted in a dedicated state file.

Relates #37165
Closes #38588
Closes #39032

This commit moves retention leases from being persisted in the Lucene
commit point to being persisted in a dedicated state file.
@jasontedor jasontedor added >enhancement v7.0.0 :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. v6.7.0 v8.0.0 v7.2.0 labels Feb 16, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@jasontedor jasontedor mentioned this pull request Feb 16, 2019
24 tasks
@jasontedor
Copy link
Member Author

@elasticmachine run elasticsearch-ci/oss-distro-docs

@@ -278,7 +278,6 @@ public void testCcrAndIlmWithRollover() throws Exception {
}
}

@AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/37165")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason that we can remove this awaits fix is because the test is asserting that the shrink action is performed on the follower. The shrink action is prevented from being performed on the follower is any shard history retention leases exist on the follower with source ccr. The reason that these leases exist on the follower is because we copied them over from the commit point during recovery from remote! With this PR, this will no longer happen and this test can pass. 😌

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has helped me with my flushing woes greatly. I left a couple of questions about tests.

@@ -304,76 +289,6 @@ public void testRetentionLeaseStats() throws IOException {
}
}

public void testRecoverFromStoreReserveRetentionLeases() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems to assert something useful, namely that reinitShard() gets the same retention leases back again. Can we keep it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that reinitShard is a test-only method. However, I agree it's useful to have a test that we load the expected retention leases when we recover from store, even if we are not storing them in the commit point any longer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that testPersistence already tests this. I think that we can remove this test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to remove it.

@@ -307,6 +307,7 @@ public void finalizeRecovery(final long globalCheckpoint, ActionListener<Void> l
indexShard.updateGlobalCheckpointOnReplica(globalCheckpoint, "finalizing recovery");
// Persist the global checkpoint.
indexShard.sync();
indexShard.persistRetentionLeases();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test that shows that we need to do this here? I deleted this line and :server:unittest and :server:integtest both still passed (once). Is this expected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 162cf77.

…ate-file

* elastic/master:
  Remove tests and branches that will never execute (elastic#38772)
  also check ccr stats api return empty response in ensureNoCcrTasks()
  Add overlapping, before, after filters to intervals query (elastic#38999)
  Mute test elastic#38949
  Add remote recovery to ShardFollowTaskReplicationTests (elastic#39007)
  [ML] More advanced post-test cleanup of ML indices (elastic#39049)
  wait for shard to be allocated before executing a resume follow api
  Update track-total-hits.asciidoc
  Force kill testcluster nodes (elastic#37353)
  Make pullFixture a task dependency of resolveAllDependencies (elastic#38956)
  set minimum supported version (elastic#39043)
  Enforce Completion Context Limit (elastic#38675)
  Mute test
  Don't close caches while there might still be in-flight requests. (elastic#38958)
  Fix elastic#38623 remove xpack namespace REST API (elastic#38625)
  Add data frame feature (elastic#38934)
  Test bi-directional index following during a rolling upgrade. (elastic#38962)
  Generate mvn pom for ssl-config library (elastic#39019)
  Mute testRetentionLeaseIsRenewedDuringRecovery
@jasontedor
Copy link
Member Author

@DaveCTurner Thanks for looking. I have responded to your feedback.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments but this is really good already.

@jasontedor jasontedor requested a review from dnhatn February 18, 2019 16:58
@jasontedor
Copy link
Member Author

Thanks @dnhatn, I have responded to your feedback.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a small comment.

@jasontedor jasontedor requested a review from dnhatn February 18, 2019 20:01
Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@jasontedor jasontedor merged commit 331ef9d into elastic:master Feb 18, 2019
jasontedor added a commit that referenced this pull request Feb 18, 2019
This commit moves retention leases from being persisted in the Lucene
commit point to being persisted in a dedicated state file.
@jasontedor jasontedor deleted the retention-lease-state-file branch February 18, 2019 22:13
jasontedor added a commit that referenced this pull request Feb 18, 2019
This commit moves retention leases from being persisted in the Lucene
commit point to being persisted in a dedicated state file.
jasontedor added a commit that referenced this pull request Feb 18, 2019
This commit moves retention leases from being persisted in the Lucene
commit point to being persisted in a dedicated state file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. >enhancement v6.7.0 v7.0.0-rc2 v7.2.0 v8.0.0-alpha1
Projects
None yet
5 participants