-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segment Replication - Release incorrectly retained commits on primary shards #6660
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
…rimary shards This change ensures that primary shards clean up any state when a replica is marked out of sync. This can happen when replicas fail due to store corruption or mismatching segments during file copy. Signed-off-by: Marc Handalian <[email protected]>
Gradle Check (Jenkins) Run Completed with:
|
@@ -254,8 +260,22 @@ private void cancelHandlers(Predicate<? super SegmentReplicationSourceHandler> p | |||
.filter(predicate) | |||
.map(SegmentReplicationSourceHandler::getAllocationId) | |||
.collect(Collectors.toList()); | |||
logger.trace(() -> new ParameterizedMessage("Cancelling replications for allocationIds {}", allocationIds)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: As cancellation is an unexpected event, I think we can log this with warn or debug level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is a helper that is used in multiple spots - both for cancellations on node drop and when a local primary is shutting down, this is why I thought trace was more appropriate, though these are valuable logs, will change to warn.
cancelHandlers( | ||
(handler) -> handler.getCopyState().getShard().shardId().equals(shardId) | ||
&& inSyncAllocationIds.contains(handler.getAllocationId()) == false, | ||
"Shard is no longer in-sync with the primary" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we add a unit test validating cancellation ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
refresh(INDEX_NAME); | ||
} | ||
// Refresh, this should trigger round of segment replication | ||
assertBusy(() -> { assertDocCounts(docCount, replicaNode); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think irrespective of whether replica has different segment files, this test will pass ?. Can we assert on a different allocationID on shard to ensure shard indeed failed on replica ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added this.
Not sure why this failed. Let's see current gradle check output.
|
Gradle Check (Jenkins) Run Completed with:
|
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #6660 +/- ##
============================================
+ Coverage 70.61% 70.74% +0.13%
- Complexity 59071 59177 +106
============================================
Files 4803 4803
Lines 283192 283208 +16
Branches 40837 40842 +5
============================================
+ Hits 199981 200368 +387
+ Misses 66804 66441 -363
+ Partials 16407 16399 -8
... and 522 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: Marc Handalian <[email protected]>
This comment was marked as outdated.
This comment was marked as outdated.
Gradle Check (Jenkins) Run Completed with:
|
… shards (#6660) * Segment Replication - Release incorrectly retained index commits on primary shards This change ensures that primary shards clean up any state when a replica is marked out of sync. This can happen when replicas fail due to store corruption or mismatching segments during file copy. Signed-off-by: Marc Handalian <[email protected]> * PR feedback. Signed-off-by: Marc Handalian <[email protected]> * spotless fix. Signed-off-by: Marc Handalian <[email protected]> --------- Signed-off-by: Marc Handalian <[email protected]> (cherry picked from commit 73a2279) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
… shards (#6660) (#6661) * Segment Replication - Release incorrectly retained index commits on primary shards This change ensures that primary shards clean up any state when a replica is marked out of sync. This can happen when replicas fail due to store corruption or mismatching segments during file copy. * PR feedback. * spotless fix. --------- (cherry picked from commit 73a2279) Signed-off-by: Marc Handalian <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
… shards (opensearch-project#6660) * Segment Replication - Release incorrectly retained index commits on primary shards This change ensures that primary shards clean up any state when a replica is marked out of sync. This can happen when replicas fail due to store corruption or mismatching segments during file copy. Signed-off-by: Marc Handalian <[email protected]> * PR feedback. Signed-off-by: Marc Handalian <[email protected]> * spotless fix. Signed-off-by: Marc Handalian <[email protected]> --------- Signed-off-by: Marc Handalian <[email protected]> Signed-off-by: Mingshi Liu <[email protected]>
This change ensures that primary shards clean up any state when a replica is marked out of sync. This can happen when replicas fail due to store corruption or mismatching segments during file copy.
Description
This change ensures that primary shards clean up any state when a replica is marked out of sync. This can happen when replicas fail due to store corruption or mismatching segments during file copy.
Issues Resolved
closes #6578
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.