[BUG] [Segment Replication] local checkpoint falling behind global checkpoint #3832
Comments
Looking.
I wasn't able to reproduce this when starting a server and manually doing ingestion, deletion, and occasional refreshes. I am writing a test to perform these operations at a larger scale.
@Poojita-Raj: I am not able to reproduce this failure locally. I tried manually running a server and wrote one integration test (below), without any success. Can you share more insight into how to reproduce this?
Branch: https://github.com/dreamer-89/OpenSearch/commits/segrep_snapshot (note: I needed to pull in a fix related to deleting docs).
Thanks to @mch2. The issue is reproducible when a replica is started during an indexing operation on the primary. The fix is tracked in PR #3743.
Closing this as I am not able to reproduce it. Please feel free to reopen it if there are solid steps to reproduce.
Describe the bug
Occasionally, with segment replication enabled, we see the following assertion error while adding, refreshing, or deleting documents:
java.lang.AssertionError: supposedly in-sync shard copy received a global checkpoint [0] that is higher than its local checkpoint [-1]
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The global checkpoint calculation must always take all primaries and replicas into account, since it is the global minimum checkpoint guaranteed to have been processed on all nodes. We need to ensure this error is not produced during regular operations on an index with segment replication enabled.
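To make the expected invariant concrete, here is a minimal sketch in plain Java. It is not the OpenSearch implementation: the class `CheckpointSketch`, its methods, and the allocation names are hypothetical, and `-1` is used for "no operations processed" only because that is the local checkpoint value shown in the assertion message above. It shows that if the global checkpoint is taken as the minimum local checkpoint over every in-sync copy, it cannot exceed any copy's local checkpoint, whereas leaving a copy out of the calculation can produce the reported state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from OpenSearch.
class CheckpointSketch {
    // Assumed sentinel for "no operations processed yet", matching the -1 in the error.
    static final long NO_OPS_PERFORMED = -1L;

    // Global checkpoint as the minimum local checkpoint across all in-sync copies.
    static long globalCheckpoint(Map<String, Long> inSyncLocalCheckpoints) {
        return inSyncLocalCheckpoints.values().stream()
            .mapToLong(Long::longValue)
            .min()
            .orElse(NO_OPS_PERFORMED);
    }

    // True when a copy would trip the assertion from the bug report:
    // it received a global checkpoint higher than its own local checkpoint.
    static boolean violatesInvariant(long globalCheckpoint, long localCheckpoint) {
        return globalCheckpoint > localCheckpoint;
    }

    public static void main(String[] args) {
        Map<String, Long> copies = new LinkedHashMap<>();
        copies.put("primary", 0L);                // primary has processed seq no 0
        copies.put("replica", NO_OPS_PERFORMED);  // replica has processed nothing yet

        // With every copy included, the minimum is bounded by the slowest copy.
        long global = globalCheckpoint(copies);
        System.out.println("global checkpoint = " + global);
        System.out.println("replica violates invariant: "
            + violatesInvariant(global, copies.get("replica")));

        // If the replica were left out, the primary's value would be sent to it.
        long primaryOnly = globalCheckpoint(Map.of("primary", 0L));
        System.out.println("replica violates invariant (replica ignored): "
            + violatesInvariant(primaryOnly, NO_OPS_PERFORMED));
    }
}
```

In this sketch the second case reproduces the numbers in the report (global checkpoint 0, local checkpoint -1). Whether the real failure arises this way is not established here; the comments above only state that it occurs when a replica is started during indexing on the primary.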