Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Segment Replication stats bytes behind metric #9686

Merged
merged 3 commits into from
Sep 1, 2023

Conversation

mch2
Copy link
Member

@mch2 mch2 commented Sep 1, 2023

Description

Fix SegRep's bytesbehind metric to be accurate by computing a diff of segments referenced by each ReplicationCheckpoint.
This change adds a map of StoreFileMetadata to ReplicationCheckpoint, which is computed from the SegmentInfos used to build the checkpoint.

Related Issues

Resolves #9685

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

This metric currently gives an estimate of the bytes behind based on the difference in size of the segments
referenced by the active readers between shards. This does not give a good indication of the amount of bytes
that need to be fetched and is inaccurate after deletes and merges. Fixed by sending file metadata with each checkpoint
and computing a diff between checkpoints when SegmentReplicationShardStats is built.

Signed-off-by: Marc Handalian <[email protected]>
@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2023

Compatibility status:

Checks if related components are compatible with change b3e779a

Incompatible components

Incompatible components: [https://github.com/opensearch-project/cross-cluster-replication.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2023

Gradle Check (Jenkins) Run Completed with:

@mch2
Copy link
Member Author

mch2 commented Sep 1, 2023

Segrep specific bwc test fails here bc of wire compatibility. Will mute this until this is backported.

Signed-off-by: Marc Handalian <[email protected]>
@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2023

Compatibility status:

Checks if related components are compatible with change b0e3cfc

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git]

@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2023

Gradle Check (Jenkins) Run Completed with:

@codecov
Copy link

codecov bot commented Sep 1, 2023

Codecov Report

Merging #9686 (f6dee2b) into main (04c90c7) will decrease coverage by 0.07%.
Report is 3 commits behind head on main.
The diff coverage is 90.90%.

@@             Coverage Diff              @@
##               main    #9686      +/-   ##
============================================
- Coverage     71.15%   71.08%   -0.07%     
+ Complexity    57727    57692      -35     
============================================
  Files          4800     4800              
  Lines        272047   272064      +17     
  Branches      39703    39705       +2     
============================================
- Hits         193565   193404     -161     
- Misses        62266    62418     +152     
- Partials      16216    16242      +26     
Files Changed Coverage Δ
.../replication/checkpoint/ReplicationCheckpoint.java 84.61% <72.72%> (-3.11%) ⬇️
...org/opensearch/index/seqno/ReplicationTracker.java 69.05% <100.00%> (+0.14%) ⬆️
...in/java/org/opensearch/index/shard/IndexShard.java 69.69% <100.00%> (+0.49%) ⬆️
...x/store/remote/metadata/RemoteSegmentMetadata.java 100.00% <100.00%> (ø)

... and 451 files with indirect coverage changes

@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2023

Compatibility status:

Checks if related components are compatible with change f6dee2b

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2023

Gradle Check (Jenkins) Run Completed with:

@mch2 mch2 merged commit f9b6694 into opensearch-project:main Sep 1, 2023
@mch2 mch2 added the backport 2.x Backport to 2.x branch label Sep 1, 2023
@mch2 mch2 deleted the bytesbehind branch September 1, 2023 18:31
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-9686-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f9b6694c0eb4a4a7be326ba143edc41a6eba2f0c
# Push it to GitHub
git push --set-upstream origin backport/backport-9686-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-9686-to-2.x.

@mch2 mch2 added backport 2.x Backport to 2.x branch and removed backport 2.x Backport to 2.x branch backport-failed labels Sep 2, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 2, 2023
* Fix Segment Replication stats bytes behind metric.

This metric currently gives an estimate of the bytes behind based on the difference in size of the segments
referenced by the active readers between shards. This does not give a good indication of the amount of bytes
that need to be fetched and is inaccurate after deletes and merges. Fixed by sending file metadata with each checkpoint
and computing a diff between checkpoints when SegmentReplicationShardStats is built.

Signed-off-by: Marc Handalian <[email protected]>

* Skip SegRep bwc test until this is backported to 2.x.

Signed-off-by: Marc Handalian <[email protected]>

* Add changelog entry.

Signed-off-by: Marc Handalian <[email protected]>

---------

Signed-off-by: Marc Handalian <[email protected]>
(cherry picked from commit f9b6694)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@mch2 mch2 added the backport 2.x Backport to 2.x branch label Sep 2, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-9686-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f9b6694c0eb4a4a7be326ba143edc41a6eba2f0c
# Push it to GitHub
git push --set-upstream origin backport/backport-9686-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-9686-to-2.x.

kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
…#9686)

* Fix Segment Replication stats bytes behind metric.

This metric currently gives an estimate of the bytes behind based on the difference in size of the segments
referenced by the active readers between shards. This does not give a good indication of the amount of bytes
that need to be fetched and is inaccurate after deletes and merges. Fixed by sending file metadata with each checkpoint
and computing a diff between checkpoints when SegmentReplicationShardStats is built.

Signed-off-by: Marc Handalian <[email protected]>

* Skip SegRep bwc test until this is backported to 2.x.

Signed-off-by: Marc Handalian <[email protected]>

* Add changelog entry.

Signed-off-by: Marc Handalian <[email protected]>

---------

Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: Kaushal Kumar <[email protected]>
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
…#9686)

* Fix Segment Replication stats bytes behind metric.

This metric currently gives an estimate of the bytes behind based on the difference in size of the segments
referenced by the active readers between shards. This does not give a good indication of the amount of bytes
that need to be fetched and is inaccurate after deletes and merges. Fixed by sending file metadata with each checkpoint
and computing a diff between checkpoints when SegmentReplicationShardStats is built.

Signed-off-by: Marc Handalian <[email protected]>

* Skip SegRep bwc test until this is backported to 2.x.

Signed-off-by: Marc Handalian <[email protected]>

* Add changelog entry.

Signed-off-by: Marc Handalian <[email protected]>

---------

Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: Ivan Brusic <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…#9686)

* Fix Segment Replication stats bytes behind metric.

This metric currently gives an estimate of the bytes behind based on the difference in size of the segments
referenced by the active readers between shards. This does not give a good indication of the amount of bytes
that need to be fetched and is inaccurate after deletes and merges. Fixed by sending file metadata with each checkpoint
and computing a diff between checkpoints when SegmentReplicationShardStats is built.

Signed-off-by: Marc Handalian <[email protected]>

* Skip SegRep bwc test until this is backported to 2.x.

Signed-off-by: Marc Handalian <[email protected]>

* Add changelog entry.

Signed-off-by: Marc Handalian <[email protected]>

---------

Signed-off-by: Marc Handalian <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch backport-failed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Segment Replication] SegRep metric bytes behind is not precise.
4 participants