Retry fetching follower global checkpoint when it fails. #34019

Merged: 3 commits merged into elastic:master on Sep 28, 2018

martijnvg
Member

Closes #34016

@martijnvg added the review and :Distributed Indexing/CCR (Issues around the Cross Cluster State Replication features) labels Sep 24, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@@ -374,7 +374,7 @@ static long computeDelay(int currentRetry, long maxRetryDelayInMillis) {
         return Math.min(backOffDelay, maxRetryDelayInMillis);
     }
 
-    private static boolean shouldRetry(Exception e) {
+    static boolean shouldRetry(Exception e) {
Member Author

Maybe fetchGlobalCheckpoint() should be done in ShardFollowNodeTask, like we do for updating the mapping or fetching documents from the shard changes API? Then we wouldn't need to make this method package-private, and we could completely reuse the retry logic that already exists in ShardFollowNodeTask.

Contributor

On the fence on this one. I understand your reasons, but I feel that ShardFollowNodeTask is complicated enough that we shouldn't move things in there unless strictly needed. As far as I can tell, the amount of code stays the same either way.

Member Author

I see. I'm still undecided, but let's leave it this way.
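The hunk only shows the last line of computeDelay(int currentRetry, long maxRetryDelayInMillis): whatever backOffDelay is computed, it is capped at maxRetryDelayInMillis. Below is a minimal sketch of a capped exponential backoff in that shape, assuming a hypothetical 50 ms base delay that doubles per retry. BackoffSketch and BASE_DELAY_MILLIS are illustrative names only; the real method body is not part of this diff and may differ (for example, by adding random jitter).

```java
public final class BackoffSketch {
    // Hypothetical base delay; the real constant is not part of this diff.
    static final long BASE_DELAY_MILLIS = 50;

    // Capped exponential backoff: BASE_DELAY_MILLIS * 2^currentRetry, never above maxRetryDelayInMillis.
    static long computeDelay(int currentRetry, long maxRetryDelayInMillis) {
        // Clamp the exponent so the shift cannot overflow a long for very large retry counts.
        int exponent = Math.max(0, Math.min(currentRetry, 30));
        long backOffDelay = BASE_DELAY_MILLIS * (1L << exponent);
        return Math.min(backOffDelay, maxRetryDelayInMillis);
    }

    public static void main(String[] args) {
        for (int retry = 0; retry < 8; retry++) {
            System.out.println("retry " + retry + " -> " + computeDelay(retry, 1000) + " ms");
        }
    }
}
```

With a 1000 ms cap, retries 0 through 4 wait 50, 100, 200, 400 and 800 ms; from retry 5 onward the delay stays at 1000 ms. The cap keeps a long outage on the leader side from pushing the wait between attempts to unbounded values.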

@martijnvg
Member Author

run the java11 tests

@jasontedor
Member

@elasticmachine run gradle build tests

2 similar comments

Contributor

@bleskes left a comment

Production code LGTM. Is there any way we can test this?

@martijnvg
Member Author

> Is there any way we can test this?

I think that once ShardChangesIT#testFollowIndexAndCloseNode() is unmuted, it should create the conditions under which fetching the global checkpoint is retried.
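The behavior under discussion is that a failed fetch of the follower global checkpoint is retried when shouldRetry(e) classifies the error as retryable. Below is a minimal synchronous sketch of that pattern, driven by a simulated fetch that fails twice before succeeding, which is the kind of scenario a focused unit test could exercise. RetrySketch, CheckpointFetcher, TransientException, fetchWithRetry, the retry limit and the 10 ms base delay are all hypothetical. In this sketch, non-retryable errors are simply rethrown. Elasticsearch's CCR code is asynchronous (listener-based) and schedules retries rather than blocking a thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class RetrySketch {
    // Hypothetical stand-in for a transient failure such as a node disconnect.
    static final class TransientException extends RuntimeException {
        TransientException(String message) {
            super(message);
        }
    }

    // Hypothetical abstraction over "fetch the follower global checkpoint".
    interface CheckpointFetcher {
        long fetch() throws Exception;
    }

    // Stand-in for shouldRetry(Exception); the real list of retryable errors is not shown here.
    static boolean shouldRetry(Exception e) {
        return e instanceof TransientException;
    }

    // Capped exponential backoff with a hypothetical 10 ms base delay.
    static long computeDelay(int currentRetry, long maxRetryDelayInMillis) {
        long backOffDelay = 10L * (1L << Math.max(0, Math.min(currentRetry, 30)));
        return Math.min(backOffDelay, maxRetryDelayInMillis);
    }

    // Calls the fetcher until it succeeds; rethrows non-retryable errors immediately,
    // and rethrows the last error once maxRetries retries have been used up.
    static long fetchWithRetry(CheckpointFetcher fetcher, int maxRetries, long maxRetryDelayInMillis)
            throws Exception {
        int retry = 0;
        while (true) {
            try {
                return fetcher.fetch();
            } catch (Exception e) {
                if (retry >= maxRetries || !shouldRetry(e)) {
                    throw e;
                }
                // Blocking sleep keeps the sketch simple; production code would schedule the retry instead.
                Thread.sleep(computeDelay(retry, maxRetryDelayInMillis));
                retry++;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger attempts = new AtomicInteger();
        // Simulated fetch: fails twice with a transient error, then returns a checkpoint.
        CheckpointFetcher flaky = () -> {
            if (attempts.incrementAndGet() <= 2) {
                throw new TransientException("simulated disconnect");
            }
            return 42L;
        };
        long checkpoint = fetchWithRetry(flaky, 5, 20);
        System.out.println("global checkpoint " + checkpoint + " after " + attempts.get() + " attempts");
    }
}
```

Running main prints `global checkpoint 42 after 3 attempts`. Passing the fetch in as an interface is what makes the failure injectable, so the retry path can be checked without the integration test being unmuted.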

@martijnvg merged commit 506c1c2 into elastic:master on Sep 28, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 28, 2018
* master:
  Use more precise does S3 bucket exist method (elastic#34123)
  LLREST: Introduce a strict mode (elastic#33708)
  [CCR] Adjust list retryable errors (elastic#33985)
  Fix AggregationFactories.Builder equality and hash regarding order (elastic#34005)
  MINOR: Remove some deadcode in NodeEnv and Related (elastic#34133)
  Rest-Api-Spec: Correct spelling in filter_path description (elastic#33154)
  Core: Don't rely on java time for epoch seconds formatting (elastic#34086)
  Retry errors when fetching follower global checkpoint. (elastic#34019)
  Watcher: Reenable watcher stats REST tests (elastic#34107)
  Remove special-casing of Synonym filters in AnalysisRegistry (elastic#34034)
  Rename CCR APIs (elastic#34027)
  Fixed CCR stats api serialization issues and (elastic#33983)
  Support 'string'-style queries on metadata fields when reasonable. (elastic#34089)
  Logging: Drop Settings from security logger get calls (elastic#33940)
  SQL: Internal refactoring of operators as functions (elastic#34097)