Segment Replication - Remove seqNo field from ReplicationCheckpoint and use UserData to transfer state. #6594
This change updates getLatestSegmentInfos to return only the max seqNo from the previous commit point. This is the only way to guarantee that all operations up to this seqNo have made it into the commit point. Signed-off-by: Marc Handalian <[email protected]>
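A minimal sketch of reading the max seqNo from the previous commit point instead of scanning the live SegmentInfos. It assumes the last commit's user data is available as a `Map<String, String>` and that the value lives under the key `"max_seq_no"` (modeled on OpenSearch's commit user data; treat the exact key and the `-1` sentinel as assumptions). The class and method names are hypothetical, not OpenSearch API.

```java
import java.util.Map;

/** Hypothetical helper: derive a safe seqNo from the last commit point's user data. */
public final class CommitPointSeqNo {
    // Assumed user-data key for the commit's max seqNo.
    static final String MAX_SEQ_NO_KEY = "max_seq_no";
    // Assumed sentinel for "no operations committed yet".
    static final long NO_OPS_PERFORMED = -1L;

    private CommitPointSeqNo() {}

    /**
     * Returns the max seqNo recorded in the previous commit's user data.
     * Every op up to this value is in the fsynced commit, unlike a max
     * computed by inspecting the in-memory SegmentInfos.
     */
    static long maxSeqNoOfLastCommit(Map<String, String> lastCommitUserData) {
        String value = lastCommitUserData.get(MAX_SEQ_NO_KEY);
        return value == null ? NO_OPS_PERFORMED : Long.parseLong(value);
    }

    public static void main(String[] args) {
        Map<String, String> userData = Map.of(MAX_SEQ_NO_KEY, "41", "local_checkpoint", "41");
        System.out.println(maxSeqNoOfLastCommit(userData)); // prints 41
        System.out.println(maxSeqNoOfLastCommit(Map.of()));  // prints -1
    }
}
```

Falling back to a sentinel rather than throwing keeps a replica with no prior commit in a well-defined "nothing committed" state.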
Yep, I thought I had run these. I removed the assertion that was tripping: it asserted that our processed seqNo advanced on the replicas with every segment copy, which is no longer guaranteed. We still assert on searchable docs.
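To illustrate why the per-copy assertion no longer holds, here is a hypothetical model (not OpenSearch code) in which the replica advances its processed seqNo only when a segment copy carries a newer commit point. Refresh-only copies change which docs are searchable but leave the processed seqNo alone. The class, fields, and `onSegmentCopy` signature are assumptions made for illustration.

```java
/** Hypothetical model: the processed seqNo only moves when a newer commit point arrives. */
public final class ReplicaProcessedSeqNo {
    private long processedSeqNo = -1L;     // highest seqNo the replica treats as processed
    private long lastCommitMaxSeqNo = -1L; // max seqNo of the newest commit point seen

    /**
     * Applies one segment copy from the primary.
     *
     * @param isCommit       whether this copy includes a new commit point
     * @param commitMaxSeqNo max seqNo from that commit's user data; ignored if !isCommit
     */
    void onSegmentCopy(boolean isCommit, long commitMaxSeqNo) {
        if (isCommit && commitMaxSeqNo > lastCommitMaxSeqNo) {
            lastCommitMaxSeqNo = commitMaxSeqNo;
            processedSeqNo = commitMaxSeqNo;
        }
        // A refresh-only copy makes new docs searchable but does not advance processedSeqNo.
    }

    long processedSeqNo() {
        return processedSeqNo;
    }

    public static void main(String[] args) {
        ReplicaProcessedSeqNo replica = new ReplicaProcessedSeqNo();
        replica.onSegmentCopy(false, 0);               // refresh-only copy
        System.out.println(replica.processedSeqNo());  // prints -1
        replica.onSegmentCopy(true, 10);               // copy carrying a commit with max seqNo 10
        System.out.println(replica.processedSeqNo());  // prints 10
        replica.onSegmentCopy(false, 0);               // another refresh-only copy
        System.out.println(replica.processedSeqNo());  // prints 10
    }
}
```

In this model a test can still check searchable docs after every copy, but it can only expect the processed seqNo to move after a commit.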
Looks like a flaky, unrelated test; that seed was not reproducible.
Codecov Report
@@ Coverage Diff @@
## main #6594 +/- ##
============================================
+ Coverage 70.69% 70.83% +0.14%
- Complexity 59076 59132 +56
============================================
Files 4804 4804
Lines 283081 283074 -7
Branches 40809 40807 -2
============================================
+ Hits 200125 200528 +403
+ Misses 66526 66084 -442
- Partials 16430 16462 +32
Signed-off-by: Marc Handalian <[email protected]>
I see this test failing here while asserting per-index primary balance after one-third of the nodes are stopped. I think this can fail in certain scenarios that restrict primary shard movement to nodes already containing replica copies (due to
LGTM!
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-6594-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f4739bb8757d0153d01e83b3aaf8b76724ba3b04
# Push it to GitHub
git push --set-upstream origin backport/backport-6594-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x
Then, create a pull request where the
@mch2: The backport workflow is failing; care to raise a manual backport?
…nd use UserData to transfer state. (opensearch-project#6594) * Segment Replication - Fix incorrect maxSeqNo computation. This change updates getLatestSegmentInfos to only return the max seqNo from the previous commit point. This is the only way to guarantee that up to this seqNo has made it into the commit point. Signed-off-by: Marc Handalian <[email protected]> * Remove unnecessary seqNo field from ReplicationCheckpoint. Signed-off-by: Marc Handalian <[email protected]> --------- Signed-off-by: Marc Handalian <[email protected]> (cherry picked from commit f4739bb)
…nd use UserData to transfer state. (opensearch-project#6594) * Segment Replication - Fix incorrect maxSeqNo computation. This change updates getLatestSegmentInfos to only return the max seqNo from the previous commit point. This is the only way to guarantee that up to this seqNo has made it into the commit point. Signed-off-by: Marc Handalian <[email protected]> * Remove unnecessary seqNo field from ReplicationCheckpoint. Signed-off-by: Marc Handalian <[email protected]> --------- Signed-off-by: Marc Handalian <[email protected]> (cherry picked from commit f4739bb) Signed-off-by: Marc Handalian <[email protected]>
…nd use UserData to transfer state. (#6594) (#6601) * Segment Replication - Fix incorrect maxSeqNo computation. This change updates getLatestSegmentInfos to only return the max seqNo from the previous commit point. This is the only way to guarantee that up to this seqNo has made it into the commit point. * Remove unnecessary seqNo field from ReplicationCheckpoint. --------- (cherry picked from commit f4739bb) Signed-off-by: Marc Handalian <[email protected]>
…nd use UserData to transfer state. (opensearch-project#6594) * Segment Replication - Fix incorrect maxSeqNo computation. This change updates getLatestSegmentInfos to only return the max seqNo from the previous commit point. This is the only way to guarantee that up to this seqNo has made it into the commit point. Signed-off-by: Marc Handalian <[email protected]> * Remove unnecessary seqNo field from ReplicationCheckpoint. Signed-off-by: Marc Handalian <[email protected]> --------- Signed-off-by: Marc Handalian <[email protected]> Signed-off-by: Mingshi Liu <[email protected]>
Description
This change removes the seqNo field from ReplicationCheckpoint. On the replica, this seqNo was used only to call setLocalCheckpointOfSafeCommit, which marks the point up to which the xlog can be deleted, and to advance the processed seqNo. This logic incorrectly computed the max by querying the SegmentInfos directly, which could be set to a lower value after delete operations. Rather than relying on this seqNo, we can safely use the max seqNo of the previous commit point to advance both. For xlog purging, we continue to purge only after a commit is received from the primary, and only up to that max seqNo, which is guaranteed to be in the set of fsynced segments. For the processed seqNo, this ensures that if the replica is promoted to primary, it continues to replay ops from the xlog starting from this seqNo.
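A simplified sketch of the two uses described above: purging the xlog only up to the received commit's max seqNo, and replaying everything above it on promotion. The class and method names are hypothetical; only the idea of a "local checkpoint of the safe commit" comes from the description. It is a simplification: a real translog is trimmed by generation rather than per operation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical replica-side xlog model keyed by seqNo. */
public final class ReplicaXlogModel {
    private final TreeMap<Long, String> xlog = new TreeMap<>(); // seqNo -> operation
    private long localCheckpointOfSafeCommit = -1L;

    /** The replica writes each op to its xlog; segments arrive separately via replication. */
    void append(long seqNo, String op) {
        xlog.put(seqNo, op);
    }

    /**
     * Called when a commit point is received from the primary. Only ops up to the
     * commit's max seqNo are known to be in fsynced segments, so only those are purged.
     */
    void onCommitReceived(long commitMaxSeqNo) {
        localCheckpointOfSafeCommit = Math.max(localCheckpointOfSafeCommit, commitMaxSeqNo);
        xlog.headMap(localCheckpointOfSafeCommit, true).clear(); // drop seqNo <= checkpoint
    }

    /** On promotion to primary, replay every op above the safe commit's max seqNo. */
    List<String> opsToReplayOnPromotion() {
        return new ArrayList<>(xlog.tailMap(localCheckpointOfSafeCommit, false).values());
    }

    public static void main(String[] args) {
        ReplicaXlogModel replica = new ReplicaXlogModel();
        for (long seqNo = 0; seqNo <= 5; seqNo++) {
            replica.append(seqNo, "op-" + seqNo);
        }
        replica.onCommitReceived(3);
        System.out.println(replica.opsToReplayOnPromotion()); // prints [op-4, op-5]
    }
}
```

Taking the max in `onCommitReceived` means a stale or lower value can never move the checkpoint backwards, which mirrors the concern about SegmentInfos reporting a lower max after deletes.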
Issues Resolved
closes #6588
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.