-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Segment Replication] Update RefreshPolicy.WAIT_UNTIL for replica shards with segment replication enabled to wait for replica refresh #6464
[Segment Replication] Update RefreshPolicy.WAIT_UNTIL for replica shards with segment replication enabled to wait for replica refresh #6464
Conversation
Signed-off-by: Rishikesh1159 <[email protected]>
@@ -244,6 +244,7 @@ public void onReplicationDone(SegmentReplicationState state) { | |||
runnable.run(); | |||
} | |||
} | |||
replicaShard.refresh("replication complete refresh"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Rishikesh1159 Why is this extra refresh required? The reader should refresh before replication is marked completed.
@@ -822,8 +823,9 @@ public static Translog.Location performOnReplica(BulkShardRequest request, Index | |||
} | |||
assert operationResult != null : "operation result must never be null when primary response has no failure"; | |||
location = syncOperationResultOrThrow(operationResult, location); | |||
maxSeqNo = response.getResponse().getSeqNo(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to keep track of ongoing max here as we iterate all the items, this will overwrite the value with each iteration. The last item in the list is not guaranteed to have the highest seqNo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. I will keep track of maxSeqNo here
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Rishikesh1159 <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Rishikesh1159 <[email protected]>
@Rishikesh1159 This is looking almost ready to me, some minor nits & comment. Two things I think we need to follow up with here before this is safe to use with SR:
|
Signed-off-by: Rishikesh1159 <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
…orce refresh. Signed-off-by: Rishikesh1159 <[email protected]>
Signed-off-by: Rishikesh1159 <[email protected]>
@mch2 I checked the behaviour. Looks like in case of failover scenario every replica in the replication group gets a primary term bump. Inorder for primary term bump to complete on replica shards, the replica shard must release/fire all it's listeners, if listeners are not fired the replica will be stuck in blocking state and fail after 30min. Even if newly promoted primary shard sends new checkpoints to replicas, as the replicas are in blocking state they will not be able to receive a checkpoint. So, the only way out for us here is to release/fire all listeners during a primary term bump on replica. We are doing our best effort here to give read after write consistency with wait_until requests. But this is not always guaranteed with wait_until requests, docs are not searchable immediately in case of segment replication. This is the comprimise we have to make with our current architecture. With segment replication for wait_until we can only give read(only on primary shard) after write guarantee.
From the behaviour mentioned above, we limit the listeners that a replica shard can hold. Limit will be same as primary shards limit which is set in index settings. We are putting this limit to avoid a situation where shard can go down with more and more listeners piling on replicas with no limit/cap. With current implementation of segment replication we will break the wait_unti's read after write guarantee. We will be able to provide only read (only on primary) after write guarantee. As you said to give guarantee we might need Streaming Indexing API. |
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Rishikesh1159 <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Rishikesh1159 <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
@@ -118,7 +118,8 @@ public void setupListeners() throws Exception { | |||
() -> engine.refresh("too-many-listeners"), | |||
logger, | |||
threadPool.getThreadContext(), | |||
refreshMetric | |||
refreshMetric, | |||
this::returnSeqNo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - () -> 10L instead of creating a function here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Rishikesh1159 this LGTM. Lets please open an issue to document the changes to wait_until with SR enabled.
Signed-off-by: Rishikesh1159 <[email protected]>
Signed-off-by: Rishikesh1159 <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Rishikesh1159 <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
…rds with segment replication enabled to wait for replica refresh (#6464) * Initial draft PR for wait_until with segrep Signed-off-by: Rishikesh1159 <[email protected]> * Refactor code and fix test failures. Signed-off-by: Rishikesh1159 <[email protected]> * add comments and fix tests. Signed-off-by: Rishikesh1159 <[email protected]> * Refactor code, address comments and fix test failures. Signed-off-by: Rishikesh1159 <[email protected]> * Aplly spotless check Signed-off-by: Rishikesh1159 <[email protected]> * Adress comments and add integ test. Signed-off-by: Rishikesh1159 <[email protected]> * Address comments and fix failing tests. Signed-off-by: Rishikesh1159 <[email protected]> * Fixing failing test. Signed-off-by: Rishikesh1159 <[email protected]> * Remove unused code. Signed-off-by: Rishikesh1159 <[email protected]> * Addressing comments and refactoring Signed-off-by: Rishikesh1159 <[email protected]> * Adding max refreshlisteners limit that a replica shard can hold and force refresh. Signed-off-by: Rishikesh1159 <[email protected]> * Changing assert message Signed-off-by: Rishikesh1159 <[email protected]> * Fix call to release refresh listeners on replica shards. Signed-off-by: Rishikesh1159 <[email protected]> * Fix call to release refresh listeners on replica shards. Signed-off-by: Rishikesh1159 <[email protected]> * Address comments. Signed-off-by: Rishikesh1159 <[email protected]> * Fixing compile errors. Signed-off-by: Rishikesh1159 <[email protected]> * Spoltss Apply Signed-off-by: Rishikesh1159 <[email protected]> --------- Signed-off-by: Rishikesh1159 <[email protected]> (cherry picked from commit e8a4210) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…lica shards with segment replication enabled to wait for replica refresh (opensearch-project#6464)" This reverts commit e8a4210.
…lica shards with segment replication enabled to wait for replica refresh (opensearch-project#6464)" This reverts commit e8a4210. Signed-off-by: Suraj Singh <[email protected]>
#6622) * Revert "[Segment Replication] Update RefreshPolicy.WAIT_UNTIL for replica shards with segment replication enabled to wait for replica refresh (#6464)" This reverts commit e8a4210. Signed-off-by: Suraj Singh <[email protected]> * Add missing import statement Signed-off-by: Suraj Singh <[email protected]> --------- Signed-off-by: Suraj Singh <[email protected]>
…rds with segment replication enabled to wait for replica refresh (opensearch-project#6464) * Initial draft PR for wait_until with segrep Signed-off-by: Rishikesh1159 <[email protected]> * Refactor code and fix test failures. Signed-off-by: Rishikesh1159 <[email protected]> * add comments and fix tests. Signed-off-by: Rishikesh1159 <[email protected]> * Refactor code, address comments and fix test failures. Signed-off-by: Rishikesh1159 <[email protected]> * Aplly spotless check Signed-off-by: Rishikesh1159 <[email protected]> * Adress comments and add integ test. Signed-off-by: Rishikesh1159 <[email protected]> * Address comments and fix failing tests. Signed-off-by: Rishikesh1159 <[email protected]> * Fixing failing test. Signed-off-by: Rishikesh1159 <[email protected]> * Remove unused code. Signed-off-by: Rishikesh1159 <[email protected]> * Addressing comments and refactoring Signed-off-by: Rishikesh1159 <[email protected]> * Adding max refreshlisteners limit that a replica shard can hold and force refresh. Signed-off-by: Rishikesh1159 <[email protected]> * Changing assert message Signed-off-by: Rishikesh1159 <[email protected]> * Fix call to release refresh listeners on replica shards. Signed-off-by: Rishikesh1159 <[email protected]> * Fix call to release refresh listeners on replica shards. Signed-off-by: Rishikesh1159 <[email protected]> * Address comments. Signed-off-by: Rishikesh1159 <[email protected]> * Fixing compile errors. Signed-off-by: Rishikesh1159 <[email protected]> * Spoltss Apply Signed-off-by: Rishikesh1159 <[email protected]> --------- Signed-off-by: Rishikesh1159 <[email protected]> Signed-off-by: Mingshi Liu <[email protected]>
opensearch-project#6622) * Revert "[Segment Replication] Update RefreshPolicy.WAIT_UNTIL for replica shards with segment replication enabled to wait for replica refresh (opensearch-project#6464)" This reverts commit e8a4210. Signed-off-by: Suraj Singh <[email protected]> * Add missing import statement Signed-off-by: Suraj Singh <[email protected]> --------- Signed-off-by: Suraj Singh <[email protected]> Signed-off-by: Mingshi Liu <[email protected]>
Description
This PR updates the wait_until refresh policy on replica shards with segment replication enabled. Here we are updating wait_until refresh listeners to wait until replica shards reaches a specific seqNo instead of translog location.
The wait_until refresh policy for replica and primary shards with document replication remains same as before which is based on translog location. Even the primary shards with segment replication enabled uses translog location for wait_until requests. Only change made is on replica shards with segment replication enabled.
Issues Resolved
#6045
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.