-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip global checkpoint sync for closed indices #41874
Conversation
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be an option to skip resync for closed indices? I think we rely on either having a perfect commit to use or doing a file based recovery anyway.
|
||
@Override | ||
protected boolean allowClosedIndices() { | ||
return true; // it's okay to sync global checkpoints on closed indices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks dangerous, since ReadOnlyEngine.syncTransLog()
does nothing. So we will accept the global checkpoint on replica, but it will not be persisted.
Looks like RecoveryTarget.finalizeRecovery
also relies on that method, so maybe we should implement it. I think this means that the global checkpoint is not persisted, but the replica is marked in sync. If this replica then later becomes primary after for instance a full restart, it could have an old global checkpoint and would fail the shard? Probably a different issue/PR (if I am right).
@henningandersen Thanks for looking.
Yes, we can skip as resync is a noop for closed indices. I explored this option then prefer other option to avoid adding one more exception for closed indices. I can make this change again - it's pretty contained.
We can add an assertion to make sure that the persisted global checkpoint equals to the in-memory value. |
@henningandersen I discussed this with Tanguy and Yannick during GAH. We need to keep resync; otherwise, replicas won't see the new primary term. I updated this PR to skip global checkpoint sync for closed indices. Can you please take another look? Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @henningandersen @ywelsch. |
The verifying-before-close step ensures the global checkpoints on all shard copies are in sync; thus, we don' t need to sync global checkpoints for closed indices. Relate #33888
The verifying-before-close step ensures the global checkpoints on all shard copies are in sync; thus, we don' t need to sync global checkpoints for closed indices. Relate elastic#33888
The verifying-before-close step ensures the global checkpoints on all shard copies are in sync; thus, we don' t need to sync global checkpoints for closed indices.
Relate #33888 (comment)