-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A replica can be promoted and started in one cluster state update #32042
Conversation
Pinging @elastic/es-distributed |
@ywelsch I'm on the fence as to whether this should go into 6.3 . Opinions? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some smaller comments but main change LGTM. With the improved testing, I think this can go into 6.3.
if (newRouting.primary() && currentRouting.isRelocationTarget() == false) { | ||
replicationTracker.activatePrimaryMode(getLocalCheckpoint()); | ||
} | ||
assert currentRouting.isRelocationTarget() == false || currentRouting.primary() == false || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this assertion confuses me: What should it express?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed it's confusing. I tried to clarify by adding the assertion message. If it helps I can flip the boolean around:
(currentRouting.isRelocationTarget() && currentRouting.primary() && replicationTracker.isPrimaryMode() == false) == false
but I think that might be just as confusing if not more. Let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's keep as is, with the assertion message I think it's ok. I wonder if we should have an assertion at the end of this method to say something like "if we have an active primary shard that's not relocating, then the replication tracker is in primary mode".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good. will add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in 3c79d57
/** | ||
* re-enables default behavior to fail tests when shards created by this class fail | ||
*/ | ||
protected void failOnShardFailures() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method is not used -> remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can do.
* @param listeners an optional set of listeners to add to the shard | ||
*/ | ||
protected IndexShard newShard(ShardRouting routing, ShardPath shardPath, IndexMetaData indexMetaData, | ||
@Nullable IndexSearcherWrapper indexSearcherWrapper, | ||
@Nullable EngineFactory engineFactory, | ||
Runnable globalCheckpointSyncer, | ||
IndexEventListener indexEventListener, IndexingOperationListener... listeners) throws IOException { | ||
IndexEventListener indexEventListener, boolean ignoreShardFailures, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this parameter used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left over of a failed attempt, will remove. good catch
…2042) When a replica is fully recovered (i.e., in `POST_RECOVERY` state) we send a request to the master to start the shard. The master changes the state of the replica and publishes a cluster state to that effect. In certain cases, that cluster state can be processed on the node hosting the replica *together* with a cluster state that promotes that, now started, replica to a primary. This can happen due to cluster state batched processing or if the master died after having committed the cluster state that starts the shard but before publishing it to the node with the replica. If the master also held the primary shard, the new master node will remove the primary (as it failed) and will also immediately promote the replica (thinking it is started). Sadly our code in IndexShard didn't allow for this which caused [assertions](https://github.com/elastic/elasticsearch/blob/13917162ad5c59a96ccb4d6a81a5044546c45c22/server/src/main/java/org/elasticsearch/index/seqno/ReplicationTracker.java#L482) to be tripped in some of our tests runs.
…2042) When a replica is fully recovered (i.e., in `POST_RECOVERY` state) we send a request to the master to start the shard. The master changes the state of the replica and publishes a cluster state to that effect. In certain cases, that cluster state can be processed on the node hosting the replica *together* with a cluster state that promotes that, now started, replica to a primary. This can happen due to cluster state batched processing or if the master died after having committed the cluster state that starts the shard but before publishing it to the node with the replica. If the master also held the primary shard, the new master node will remove the primary (as it failed) and will also immediately promote the replica (thinking it is started). Sadly our code in IndexShard didn't allow for this which caused [assertions](https://github.com/elastic/elasticsearch/blob/13917162ad5c59a96ccb4d6a81a5044546c45c22/server/src/main/java/org/elasticsearch/index/seqno/ReplicationTracker.java#L482) to be tripped in some of our tests runs.
* master: Painless: Simplify Naming in Lookup Package (#32177) Handle missing values in painless (#32207) add support for write index resolution when creating/updating documents (#31520) ECS Task IAM profile credentials ignored in repository-s3 plugin (#31864) Remove indication of future multi-homing support (#32187) Rest test - allow for snapshots to take 0 milliseconds Make x-pack-core generate a pom file Rest HL client: Add put watch action (#32026) Build: Remove pom generation for plugin zip files (#32180) Fix comments causing errors with Java 11 Fix rollup on date fields that don't support epoch_millis (#31890) Detect and prevent configuration that triggers a Gradle bug (#31912) [test] port linux package packaging tests (#31943) Revert "Introduce a Hashing Processor (#31087)" (#32178) Remove empty @return from JavaDoc Adjust SSLDriver behavior for JDK11 changes (#32145) [test] use randomized runner in packaging tests (#32109) Add support for field aliases. (#32172) Painless: Fix caching bug and clean up addPainlessClass. (#32142) Call setReferences() on custom referring tokenfilters in _analyze (#32157) Fix BwC Tests looking for UUID Pre 6.4 (#32158) Improve docs for search preferences (#32159) use before instead of onOrBefore Add more contexts to painless execute api (#30511) Add EC2 credential test for repository-s3 (#31918) A replica can be promoted and started in one cluster state update (#32042) Fix Java 11 javadoc compile problem Fix CP for namingConventions when gradle home has spaces (#31914) Fix `range` queries on `_type` field for singe type indices (#31756) [DOCS] Update TLS on Docker for 6.3 (#32114) ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are Remove versionType from translog (#31945) Switch distribution to new style Requests (#30595) Build: Skip jar tests if jar disabled Painless: Add PainlessClassBuilder (#32141) Build: Make additional test deps of check (#32015) Disable C2 from using AVX-512 on JDK 10 (#32138) Build: Move shadow customizations into common code (#32014) Painless: Fix Bug with Duplicate PainlessClasses (#32110) Remove empty @param from Javadoc Re-disable packaging tests on suse boxes Docs: Fix missing example script quote (#32010) [ML] Wait for aliases in multi-node tests (#32086) [ML] Move analyzer dependencies out of categorization config (#32123) Ensure to release translog snapshot in primary-replica resync (#32045) Handle TokenizerFactory TODOs (#32063) Relax TermVectors API to work with textual fields other than TextFieldType (#31915) Updates the build to gradle 4.9 (#32087) Mute :qa:mixed-cluster indices.stats/10_index/Index - all’ Check that client methods match API defined in the REST spec (#31825) Enable testing in FIPS140 JVM (#31666) Fix put mappings java API documentation (#31955) Add exclusion option to `keep_types` token filter (#32012) [Test] Modify assert statement for ssl handshake (#32072)
* es/6.x: (24 commits) Fix broken backport Switch full-cluster-restart to new style Requests (#32140) Fix multi level nested sort (#32204) MINOR: Remove unused `IndexDynamicSettings` (#32237) (#32248) [Tests] Remove QueryStringQueryBuilderTests#toQuery class assertions (#32236) Switch rolling restart to new style Requests (#32147) Enhance Parent circuit breaker error message (#32056) [ML] Use default request durability for .ml-state index (#32233) Enable testing in FIPS140 JVM (#31666) (#32231) Remove indices stats timeout from monitoring docs TESTS: Check for Netty resource leaks (#31861) (#32225) Rename ranking evaluation response section (#32166) Dependencies: Upgrade to joda time 2.10 (#32160) Backport SSL context names (#30953) to 6.x (#32223) Require Gradle 4.9 as minimum version (#32200) Detect old trial licenses and mimic behaviour (#32209) Painless: Simplify Naming in Lookup Package (#32177) add support for write index resolution when creating/updating documents (#31520) A replica can be promoted and started in one cluster state update (#32042) Rest test - allow for snapshots to take 0 milliseconds ...
When a replica is fully recovered (i.e., in
POST_RECOVERY
state) we send a request to the master to start the shard. The master changes the state of the replica and publishes a cluster state to that effect. In certain cases, that cluster state can be processed on the node hosting the replica together with a cluster state that promotes that, now started, replica to a primary. This can happen due to cluster state batched processing or if the master died after having committed the cluster state that starts the shard but before publishing it to the node with the replica. If the master also held the primary shard, the new master node will remove the primary (as it failed) and will also immediately promote the replica (thinking it is started).Sadly our code in IndexShard didn't allow for this which caused assertions to be tripped in some of our tests runs.