-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore shard started requests when primary term does not match #37899
Conversation
Pinging @elastic/es-distributed |
All tests green (except oss-distro-docs, but it fails because of another change) but I'd like more CI runs so: @elasticmachine test this please |
Green again except oss-distro-docs, let's run it one more time: @elasticmachine test this please |
server/src/main/java/org/elasticsearch/cluster/action/shard/ShardStateAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/indices/cluster/ClusterStateChanges.java
Outdated
Show resolved
Hide resolved
Thanks @ywelsch, I updated the PR. Let me know what you think |
server/src/main/java/org/elasticsearch/cluster/action/shard/ShardStateAction.java
Outdated
Show resolved
Hide resolved
shardRouting.active()); | ||
} | ||
if (this.shardRouting.primary()) { | ||
assertTrue("a primary shard can't be demoted", shardRouting.primary()); | ||
if (this.shardRouting.initializing()) { | ||
assertEquals("primary term can not be updated on an initializing primary shard: " + shardRouting, term, newPrimaryTerm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have the same assertion directly in IndexShard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already present under this form so I think we're good.
4f779c0
to
2d39807
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@elasticmachine run elasticsearch-ci/2 |
1 similar comment
@elasticmachine run elasticsearch-ci/2 |
@elasticmachine run elasticsearch-ci/default-distro |
Thanks @ywelsch |
This pull request disables BWC tests while backporting #37899 to 6.x.
This commit changes the StartedShardEntry so that it also contains the primary term of the shard to start. This way the master node can also checks that the primary term from the start request is equal to the current shard's primary term in the cluster state, and it can ignore any shard started request that would concerns a previous instance of the shard that would have been allocated to the same node. Such situation are likely to happen with frozen (or restored) indices and the replication of closed indices, because with replicated closed indices the shards will be initialized again after the index is closed and can potentially be re initialized again if the index is reopened as a frozen index. In such cases the lifecycle of the shards would be something like: * shard is STARTED * index is closed * shards is INITIALIZING (index state is CLOSED, primary term is X) * index is reopened * shards are INITIALIZING again (index state is OPENED, potentially frozen, primary term is X+1) Adding the primary term to the shard started request will allow to discard potential StartedShardEntry requests received by the master node if the request concerns the shard with primary term X because it has been moved/reinitialized in the meanwhile under the primary term X+1. Relates to #33888
This commit adapts the version used in StartedShardEntry serialization after the backport of elastic#37899 and reenables bwc tests. Related to elastic#37899 Related to elastic#38074
Backported to 6.x in 255015d |
…ersion * elastic/master: Do not set up NodeAndClusterIdStateListener in test (elastic#38110) ML: better handle task state race condition (elastic#38040) Soft-deletes policy should always fetch latest leases (elastic#37940) Handle scheduler exceptions (elastic#38014) Minor logging improvements (elastic#38084) Fix Painless void return bug (elastic#38046) Update PutFollowAction serialization post-backport (elastic#37989) fix a few versionAdded values in ElasticsearchExceptions (elastic#37877) Reenable BWC tests after backport of elastic#37899 (elastic#38093) Mute failing test Mute failing test Fail start on obsolete indices documentation (elastic#37786) SQL: Implement FIRST/LAST aggregate functions (elastic#37936) Un-mute NoMasterNodeIT.testNoMasterActionsWriteMasterBlock remove unused parser fields in RemoteResponseParsers
When a shard on a data node finished to recover, the data node sends a
StartedShardEntry
request to the master node. This request contains the shard id, an allocation id and the id of the node that holds the shard to start. This request is then processed by the master node which checks that the shard exists in the cluster state with the same allocation id as in the request.This pull request changes the
StartedShardEntry
so that it also contains the primary term of the shard to start. This way the master node can also checks that the primary term from the start request is equal to the current shard's primary term in the cluster state, and it can ignore any shard started request that would concerns a previous instance of the shard that would have been allocated to the same node.Such situation are likely to happen with frozen (or restored) indices and the replication of closed indices, because with replicated closed indices the shards will be initialized again after the index is closed and can potentially be re initialized again if the index is reopened as a frozen index. In such cases the lifecycle of the shards would be something like:
Adding the primary term to the shard started request will allow to discard potential
StartedShardEntry
requests received by the master node if the request concerns the shard with primary term X because it has been moved/reinitialized in the meanwhile under the primary term X+1.Relates to #33888