Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Remote Store Migration] Skip segrep lag computation for shard copies on docrep nodes #14119

Merged
merged 12 commits into from
Jun 11, 2024

Conversation

shourya035
Copy link
Member

Description

During the dual replication phase, we skip creating checkpoint timers for replication group members that are still left behind on docrep nodes and are yet to mover over to remote enabled nodes.

However, the segrep bytes behind calculation logic still considers those shard copies. Since those shards are still on docrep and doesn't understand the Segrep checkpoints, they add up to the bytes_behind computation logic, in turn exhibiting false sense of replication lag during the remote store migration process.

With this PR, if the primary shard:

  • is allocated to a remote enabled node
    and
  • does not have remote store based index settings (which means remote migration for that index is not yet complete)

we are skipping those allocationIds while calculating segment replication lag metrics

Added a new IT through which this issue was readily reproduced without the current code change

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@shourya035 shourya035 self-assigned this Jun 10, 2024
@shourya035 shourya035 added skip-changelog backport 2.x Backport to 2.x branch labels Jun 10, 2024
Copy link
Contributor

❌ Gradle check result for 819ffae: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Member

@ashking94 ashking94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm overall

Signed-off-by: Shourya Dutta Biswas <[email protected]>
@shourya035 shourya035 added the v2.15.0 Issues and PRs related to version 2.15.0 label Jun 11, 2024
Copy link
Contributor

❌ Gradle check result for 45bfacb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 23b6173: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❕ Gradle check result for 23b6173: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Jun 11, 2024

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 71.65%. Comparing base (b15cb0c) to head (23b6173).
Report is 394 commits behind head on main.

Current head 23b6173 differs from pull request most recent head ad5bede

Please upload reports for the commit ad5bede to get more accurate results.

Files Patch % Lines
...org/opensearch/index/seqno/ReplicationTracker.java 0.00% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #14119      +/-   ##
============================================
+ Coverage     71.42%   71.65%   +0.23%     
- Complexity    59978    61882    +1904     
============================================
  Files          4985     5113     +128     
  Lines        282275   290792    +8517     
  Branches      40946    41990    +1044     
============================================
+ Hits         201603   208374    +6771     
- Misses        63999    65217    +1218     
- Partials      16673    17201     +528     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

✅ Gradle check result for ad5bede: SUCCESS

@gbbafna gbbafna merged commit 710d818 into opensearch-project:main Jun 11, 2024
28 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jun 11, 2024
… on docrep nodes (#14119)

Signed-off-by: Shourya Dutta Biswas <[email protected]>
(cherry picked from commit 710d818)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
gbbafna pushed a commit that referenced this pull request Jun 11, 2024
… on docrep nodes (#14119) (#14163)

(cherry picked from commit 710d818)

Signed-off-by: Shourya Dutta Biswas <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
kkewwei pushed a commit to kkewwei/OpenSearch that referenced this pull request Jul 24, 2024
… on docrep nodes (opensearch-project#14119) (opensearch-project#14163)

(cherry picked from commit 710d818)

Signed-off-by: Shourya Dutta Biswas <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: kkewwei <[email protected]>
wdongyu pushed a commit to wdongyu/OpenSearch that referenced this pull request Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch skip-changelog Storage:Remote v2.15.0 Issues and PRs related to version 2.15.0
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

3 participants