-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure that max seq # is equal to the global checkpoint when creating ReadOnlyEngines #37426
Conversation
Pinging @elastic/es-distributed |
server/src/main/java/org/elasticsearch/index/engine/ReadOnlyEngine.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/engine/ReadOnlyEngine.java
Outdated
Show resolved
Hide resolved
@elasticmachine run gradle build tests 2 |
1 similar comment
@elasticmachine run gradle build tests 2 |
6a6b959
to
08bf83e
Compare
@elasticmachine run gradle build tests 2 |
1 similar comment
@elasticmachine run gradle build tests 2 |
@elasticmachine run gradle build tests 1 |
08bf83e
to
e28c9af
Compare
@elasticmachine run gradle build tests 2 |
2 similar comments
@elasticmachine run gradle build tests 2 |
@elasticmachine run gradle build tests 2 |
e28c9af
to
b24b5a0
Compare
b24b5a0
to
67a486e
Compare
Thanks @ywelsch |
* elastic/master: (43 commits) Remove remaining occurances of "include_type_name=true" in docs (elastic#37646) SQL: Return Intervals in SQL format for CLI (elastic#37602) Publish to masters first (elastic#37673) Un-assign persistent tasks as nodes exit the cluster (elastic#37656) Fail start of non-data node if node has data (elastic#37347) Use cancel instead of timeout for aborting publications (elastic#37670) Follow stats api should return a 404 when requesting stats for a non existing index (elastic#37220) Remove deprecated FieldNamesFieldMapper.Builder#index (elastic#37305) Document that date math is locale independent Bootstrap a Zen2 cluster once quorum is discovered (elastic#37463) Upgrade to lucene-8.0.0-snapshot-83f9835. (elastic#37668) Mute failing test Fix java time formatters that round up (elastic#37604) Removes awaits fix as the fix is in. (elastic#37676) Mute failing test Ensure that max seq # is equal to the global checkpoint when creating ReadOnlyEngines (elastic#37426) Mute failing discovery disruption tests Add note about how the body is referenced (elastic#33935) Define constants for REST requests endpoints in tests (elastic#37610) Make prepare engine step of recovery source non-blocking (elastic#37573) ...
This commit changes the TransportVerifyShardBeforeCloseAction so that it issues a forced flush, forcing the translog and the Lucene commit to contain the same max seq number and global checkpoint in the case the Translog contains operations that were not written in the IndexWriter (like a Delete that touches a non existing doc). This way the assertion added in #37426 won't trip. Related to #33888
This commit changes the TransportVerifyShardBeforeCloseAction so that it issues a forced flush, forcing the translog and the Lucene commit to contain the same max seq number and global checkpoint in the case the Translog contains operations that were not written in the IndexWriter (like a Delete that touches a non existing doc). This way the assertion added in #37426 won't trip. Related to #33888
… ReadOnlyEngines (elastic#37426) Since version 6.7.0 the Close Index API guarantees that all translog operations have been correctly flushed before the index is closed. If the index is reopened as a Frozen index (which uses a ReadOnlyEngine) we can verify that the maximum sequence number from the last Lucene commit is indeed equal to the last known global checkpoint and refuses to open the read only engine if it's not the case. In this PR the check is only done for indices created on or after 6.7.0 as they are guaranteed to be closed using the new Close Index API. Related elastic#33888
…8727) The Close Index API has been refactored in 6.7.0 and it now performs pre-closing sanity checks on shards before an index is closed: the maximum sequence number must be equals to the global checkpoint. While this is a strong requirement for regular shards, we identified the need to relax this check in the case of CCR following shards. The following shards are not in charge of managing the max sequence number or global checkpoint, which are pulled from a leader shard. They also fetch and process batches of operations from the leader in an unordered way, potentially leaving gaps in the history of ops. If the following shard lags a lot it's possible that the global checkpoint and max seq number never get in sync, preventing the following shard to be closed and a new PUT Follow action to be issued on this shard (which is our recommended way to resume/restart a CCR following). This commit allows each Engine implementation to define the specific verification it must perform before closing the index. In order to allow following/frozen/closed shards to be closed whatever the max seq number or global checkpoint are, the FollowingEngine and ReadOnlyEngine do not perform any check before the index is closed. Co-authored-by: Martijn van Groningen <[email protected]> This commit also contains #37426. Related #33888
This has been backported to |
Since version 6.7.0 the Close Index API guarantees that all translog operations have been correctly flushed before the index is closed. If the index is reopened as a Frozen index (which uses a
ReadOnlyEngine
) we can verify that the maximum sequence number from the last Lucene commit is indeed equal to the last known global checkpoint and refuses to open the read only engine if it's not the case. In this PR the check is only done for indices created on or after 6.7.0 as they are guaranteed to be closed using the new Close Index API.