Fix flaky Segment Replication test testStartReplicaAfterPrimaryIndexesDocs. #5722
Codecov Report
@@             Coverage Diff              @@
##              main    #5722      +/-   ##
============================================
+ Coverage     71.05%   71.10%   +0.04%
+ Complexity    58744    58741       -3
============================================
  Files          4766     4766
  Lines        280030   280030
  Branches      40434    40434
============================================
+ Hits         198988   199117     +129
+ Misses        64851    64679     -172
- Partials      16191    16234      +43
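As a consistency check on the report above, the figures agree with a coverage ratio of hits / (hits + misses + partials), expressed as a percentage and rounded down to two decimals. The Java sketch below assumes that formula and a "round down, two decimals" setting; the class and method names are hypothetical. It reproduces 71.05% and 71.10%, and the +0.04% delta matches subtracting the unrounded ratios before rounding down (subtracting the already-rounded values would give 0.05 instead).

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper: recomputes the coverage figures from the raw line counts.
// Assumes coverage = hits / (hits + misses + partials), rounded down to 2 decimals.
class CoverageCheck {

    // Percentage with extra precision, so deltas can be taken before rounding.
    static BigDecimal rawPercent(long hits, long misses, long partials) {
        long lines = hits + misses + partials;
        return BigDecimal.valueOf(hits)
                .multiply(BigDecimal.valueOf(100))
                .divide(BigDecimal.valueOf(lines), 10, RoundingMode.DOWN);
    }

    // Reported coverage: rounded down (toward zero) to two decimal places.
    static BigDecimal coverage(long hits, long misses, long partials) {
        return rawPercent(hits, misses, partials).setScale(2, RoundingMode.DOWN);
    }

    // Reported delta: difference of the unrounded ratios, then rounded down.
    static BigDecimal delta(long[] base, long[] head) {
        BigDecimal before = rawPercent(base[0], base[1], base[2]);
        BigDecimal after = rawPercent(head[0], head[1], head[2]);
        return after.subtract(before).setScale(2, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        long[] base = {198_988, 64_851, 16_191}; // hits, misses, partials on main
        long[] head = {199_117, 64_679, 16_234}; // hits, misses, partials on #5722
        System.out.println(coverage(base[0], base[1], base[2]) + "% -> "
                + coverage(head[0], head[1], head[2]) + "% (delta " + delta(base, head) + "%)");
    }
}
```

Running it prints 71.05% -> 71.10% (delta 0.04%). Note that hits + misses + partials equals the 280030 tracked lines on both sides, which is why the three count deltas (+129, -172, +43) sum to zero.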
public final boolean shouldProcessCheckpoint(ReplicationCheckpoint requestCheckpoint) {
if (state().equals(IndexShardState.STARTED) == false) {
logger.trace(() -> new ParameterizedMessage("Ignoring new replication checkpoint - shard is not started {}", state()));
public boolean isSegmentReplicationAllowed() {
nit: isSegmentReplicationAllowed does not sound like it is meant for the target/replica. Maybe isSegRepSyncAllowed or isSegRepAllowedOnReplica?
This method is not invoked only for replicas; even after the primary has recovered, it will be invoked from IndicesClusterStateService#forceSegmentReplication.
*
* @param requestCheckpoint received checkpoint that is checked for processing
* @return true if checkpoint should be processed
* Checks if this shard is able to perform segment replication.
nit:
- Checks if this shard is able to perform segment replication.
+ Checks if this target shard should start a round of segment replication with the primary?
Updated. I'm intentionally not mentioning the primary here, as these paths will be reused for remote store replication.
server/src/main/java/org/opensearch/indices/cluster/IndicesClusterStateService.java
Does not seem related; re-firing the Gradle check.
This is fixed in #5737. @mch2, can you please rebase your changes against
This test was failing because we are validating post recovery if a shard is able to perform segrep while also performing validation of a passed-in checkpoint. In the post recovery test this checkpoint is always empty, yet the shard will be ahead of this checkpoint after docs are indexed. This change differentiates shard validation from checkpoint validation.
Signed-off-by: Marc Handalian <[email protected]>
Fix spotless.
Signed-off-by: Marc Handalian <[email protected]>
Fix testIsSegmentReplicationAllowed_WrongEngineType.
Signed-off-by: Marc Handalian <[email protected]>
Update warn logs in isSegmentReplicationAllowed.
Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: Marc Handalian <[email protected]>
…sDocs. (opensearch-project#5722)
* Fix flaky SR test testStartReplicaAfterPrimaryIndexesDocs.
This test was failing because we are validating post recovery if a shard is able to perform segrep while also performing validation of a passed-in checkpoint. In the post recovery test this checkpoint is always empty, yet the shard will be ahead of this checkpoint after docs are indexed. This change differentiates shard validation from checkpoint validation.
Signed-off-by: Marc Handalian <[email protected]>
Fix spotless.
Signed-off-by: Marc Handalian <[email protected]>
Fix testIsSegmentReplicationAllowed_WrongEngineType.
Signed-off-by: Marc Handalian <[email protected]>
Update warn logs in isSegmentReplicationAllowed.
Signed-off-by: Marc Handalian <[email protected]>
* PR feedback.
Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: Marc Handalian <[email protected]>
….x. (#5945)
* [Segment Replication] Add snapshot and restore tests for segment replication feature (#3993)
* [Segment Replication] Add snapshots tests with segment replication enabled
Signed-off-by: Suraj Singh <[email protected]>
* Fix spotless failures
Signed-off-by: Suraj Singh <[email protected]>
* Add changelog entry, address review comments, add failover test
Signed-off-by: Suraj Singh <[email protected]>
* Fix spotless failures
Signed-off-by: Suraj Singh <[email protected]>
* Address review comments 2
Signed-off-by: Suraj Singh <[email protected]>
Signed-off-by: Suraj Singh <[email protected]>
* Remove changelog update.
Signed-off-by: Marc Handalian <[email protected]>
* Mute flaky test testStartReplicaAfterPrimaryIndexesDocs. (#5714)
Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: Marc Handalian <[email protected]>
* Fix flaky Segment Replication test testStartReplicaAfterPrimaryIndexesDocs. (#5722)
* Fix flaky SR test testStartReplicaAfterPrimaryIndexesDocs.
This test was failing because we are validating post recovery if a shard is able to perform segrep while also performing validation of a passed-in checkpoint. In the post recovery test this checkpoint is always empty, yet the shard will be ahead of this checkpoint after docs are indexed. This change differentiates shard validation from checkpoint validation.
Signed-off-by: Marc Handalian <[email protected]>
Fix spotless.
Signed-off-by: Marc Handalian <[email protected]>
Fix testIsSegmentReplicationAllowed_WrongEngineType.
Signed-off-by: Marc Handalian <[email protected]>
Update warn logs in isSegmentReplicationAllowed.
Signed-off-by: Marc Handalian <[email protected]>
* PR feedback.
Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: Marc Handalian <[email protected]>
* [Segment Replication] Mute flaky tests (#5739)
Signed-off-by: Suraj Singh <[email protected]>
Signed-off-by: Suraj Singh <[email protected]>
* [Segment Replication] Mute flaky tests (#5742)
Signed-off-by: Suraj Singh <[email protected]>
Signed-off-by: Suraj Singh <[email protected]>
* Fix spotless.
Signed-off-by: Marc Handalian <[email protected]>
* Muting flaky SegmentReplication ITs. (#5700)
Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: Suraj Singh <[email protected]>
Signed-off-by: Marc Handalian <[email protected]>
Co-authored-by: Suraj Singh <[email protected]>
Signed-off-by: Marc Handalian [email protected]
Description
Fix flaky SegmentReplicationIT test testStartReplicaAfterPrimaryIndexesDocs.
This test was failing because, post recovery, we validate both whether the shard is able to perform segment replication (segrep) and whether a passed-in checkpoint should be processed. In the post-recovery scenario this checkpoint is passed as empty, yet once docs are indexed the shard is ahead of this empty checkpoint, so validation fails. This change separates shard validation from checkpoint validation and performs only the former post recovery.
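To illustrate the failure mode, here is a minimal Java sketch of the split under a heavily simplified model: a checkpoint is just a segment-infos version, and an "empty" checkpoint is assumed to have version 0. Every type, field, and allowed state here (Checkpoint, Shard, ShardState) is a hypothetical stand-in, not OpenSearch's IndexShard or ReplicationCheckpoint. A shard that has already indexed docs is ahead of the empty checkpoint, so the combined check rejects it, while the shard-only check accepts it.

```java
// Hypothetical, simplified model of the validation split; not OpenSearch code.
class SegRepValidationSketch {

    enum ShardState { RECOVERING, POST_RECOVERY, STARTED }

    // Stand-in for a replication checkpoint: only a segment-infos version.
    static final class Checkpoint {
        final long version;

        Checkpoint(long version) {
            this.version = version;
        }

        // Assumption of this sketch: an "empty" checkpoint carries version 0.
        static Checkpoint empty() {
            return new Checkpoint(0L);
        }

        boolean isAheadOf(Checkpoint other) {
            return version > other.version;
        }
    }

    static final class Shard {
        final boolean segRepEnabled;
        final ShardState state;
        final Checkpoint local;

        Shard(boolean segRepEnabled, ShardState state, Checkpoint local) {
            this.segRepEnabled = segRepEnabled;
            this.state = state;
            this.local = local;
        }

        // Shard-level validation: can this shard perform segment replication at all?
        // Which states qualify is an assumption of this sketch.
        boolean isSegmentReplicationAllowed() {
            return segRepEnabled && (state == ShardState.STARTED || state == ShardState.POST_RECOVERY);
        }

        // Checkpoint-level validation: only worth syncing if the request is ahead of us.
        boolean shouldProcessCheckpoint(Checkpoint requested) {
            return isSegmentReplicationAllowed() && requested.isAheadOf(local);
        }
    }

    public static void main(String[] args) {
        // A shard whose local checkpoint reached version 5 after docs were indexed.
        Shard shard = new Shard(true, ShardState.POST_RECOVERY, new Checkpoint(5L));
        // Combined check against an empty checkpoint: rejected (the flaky failure).
        System.out.println("shouldProcessCheckpoint(empty) = " + shard.shouldProcessCheckpoint(Checkpoint.empty()));
        // Shard-only check, as performed post recovery after the split: accepted.
        System.out.println("isSegmentReplicationAllowed() = " + shard.isSegmentReplicationAllowed());
    }
}
```

Keeping the two checks separate lets the post-recovery path ask only "may this shard replicate?", while the checkpoint-publish path still layers the "is this checkpoint newer?" test on top.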
This PR also introduces validation of the engine type before segment replication (SR) is invoked, to ensure NRTReplicationEngine is properly loaded on the replica. Without this check, SR would continue and blow up at a later stage with an index corruption error. This happens frequently when MockInternalEngine is randomly loaded in tests, as this method returns true by default.
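The engine-type guard can be sketched the same way. In this hypothetical Java model, Engine, NRTReplicationEngine and MockInternalEngine are empty stand-in classes that only borrow their names from the description above, and java.util.logging stands in for OpenSearch's logger. The point is the early return: a shard whose engine is not an NRTReplicationEngine is refused up front with a warning, rather than proceeding into replication and failing later.

```java
import java.util.logging.Logger;

// Hypothetical stand-ins named after the engines in the PR description; not OpenSearch classes.
class EngineTypeGuardSketch {

    interface Engine {}

    static final class NRTReplicationEngine implements Engine {}

    static final class MockInternalEngine implements Engine {}

    private static final Logger LOGGER = Logger.getLogger(EngineTypeGuardSketch.class.getName());

    // Refuse segment replication unless the replica-side engine is loaded.
    static boolean isSegmentReplicationAllowed(Engine engine) {
        if (!(engine instanceof NRTReplicationEngine)) {
            LOGGER.warning("Segment replication skipped: expected NRTReplicationEngine but found "
                    + engine.getClass().getSimpleName());
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSegmentReplicationAllowed(new NRTReplicationEngine()));
        System.out.println(isSegmentReplicationAllowed(new MockInternalEngine()));
    }
}
```

An instanceof check keeps the guard cheap and local; the trade-off is that any future replica-capable engine would need to extend or be added to the accepted type.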
I've re-run this test 1,000 times in IntelliJ without failure.
Issues Resolved
related #5669
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.