Add restore level safeguards to prevent file cache oversubscription #8606

Merged

Conversation

kotwanikunal
Member

@kotwanikunal kotwanikunal commented Jul 11, 2023

Description

Related Issues

Resolves #7033

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@kotwanikunal
Member Author

Follow up PR: @andrross / @reta / @Bukhtawar

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.remotestore.SegmentReplicationRemoteStoreIT.testPrimaryStopped_ReplicaPromoted

@andrross
Member

A new user-facing setting is worth a changelog entry I think.

@codecov

codecov bot commented Jul 21, 2023

Codecov Report

Merging #8606 (b936e9d) into main (883559c) will increase coverage by 0.05%.
The diff coverage is 84.78%.

@@             Coverage Diff              @@
##               main    #8606      +/-   ##
============================================
+ Coverage     71.01%   71.06%   +0.05%     
- Complexity    57178    57215      +37     
============================================
  Files          4763     4763              
  Lines        269964   270003      +39     
  Branches      39502    39508       +6     
============================================
+ Hits         191707   191875     +168     
+ Misses        62041    61971      -70     
+ Partials      16216    16157      -59     
Files Changed Coverage Δ
...ster/snapshots/restore/RestoreSnapshotRequest.java 67.74% <ø> (-1.62%) ⬇️
...rg/opensearch/common/settings/ClusterSettings.java 93.18% <ø> (ø)
.../java/org/opensearch/snapshots/RestoreService.java 57.18% <82.50%> (+3.93%) ⬆️
...a/org/opensearch/cluster/routing/RoutingTable.java 94.82% <100.00%> (+0.44%) ⬆️
...uting/allocation/decider/DiskThresholdDecider.java 74.66% <100.00%> (-0.60%) ⬇️
...search/index/store/remote/filecache/FileCache.java 72.46% <100.00%> (+0.40%) ⬆️
server/src/main/java/org/opensearch/node/Node.java 86.04% <100.00%> (ø)

... and 465 files with indirect coverage changes

@kotwanikunal
Member Author

A new user-facing setting is worth a changelog entry I think.

Added in an entry.

@kotwanikunal
Member Author

Gradle Check (Jenkins) Run Completed with:

#8928

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.client.PitIT.testDeleteAllAndListAllPits

@andrross andrross added backport 2.x Backport to 2.x branch v2.10.0 labels Jul 28, 2023
@andrross andrross merged commit a3aab67 into opensearch-project:main Jul 29, 2023
15 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-8606-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a3aab67ee86bf171a7eb480f0933e0b955fbf4f3
# Push it to GitHub
git push --set-upstream origin backport/backport-8606-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-8606-to-2.x.

kotwanikunal added a commit to kotwanikunal/OpenSearch that referenced this pull request Jul 29, 2023
kotwanikunal added a commit that referenced this pull request Jul 31, 2023
* Add safeguard limits for file cache during node level allocation (#8208)

Signed-off-by: Kunal Kotwani <[email protected]>
(cherry picked from commit 91bfa01)

* Add restore level safeguards to prevent file cache oversubscription (#8606)

Signed-off-by: Kunal Kotwani <[email protected]>
(cherry picked from commit a3aab67)
@Dileep-Dora

Hi @kotwanikunal and @andrross, from the implementation, this is what I understood:

node.search.cache.size - setting that reserves the file cache on a search node
cluster.filecache.remote_data_ratio - setting for the ratio used to allocate remote shards

For example, if cluster.filecache.remote_data_ratio is set to 5 and node.search.cache.size is set to 100GB, a node can store 500GB of data.

If I set cluster.filecache.remote_data_ratio to 10, a node can store 1TB of data.

Is my understanding correct?

Also, when can we expect this to be released? Will it be in the 2.8 version?

@andrross
Member

andrross commented Aug 8, 2023

@Dileep-Dora Your understanding is correct. This feature will be released in the 2.10 version.
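
For readers following along, the arithmetic confirmed above can be sketched in a few lines of Java. This is only an illustration of the ratio calculation as described in this thread, not the code merged in this PR: the class name `FileCacheLimitSketch`, both method names, and the idea of comparing a restore's size against the limit are assumptions made for the example.

```java
// Illustrative sketch only. FileCacheLimitSketch and its methods are
// hypothetical names; this is not the RestoreService implementation.
final class FileCacheLimitSketch {
    static final long GB = 1024L * 1024L * 1024L;

    // Upper bound on remote data per node, as described in the thread:
    // node.search.cache.size multiplied by cluster.filecache.remote_data_ratio.
    static long maxRemoteDataBytes(long fileCacheBytes, double remoteDataRatio) {
        if (fileCacheBytes < 0 || remoteDataRatio < 0) {
            throw new IllegalArgumentException("cache size and ratio must be non-negative");
        }
        return (long) (fileCacheBytes * remoteDataRatio);
    }

    // Hypothetical guard: would adding restoreBytes push the node past the bound?
    static boolean wouldOversubscribe(long existingRemoteBytes, long restoreBytes,
                                      long fileCacheBytes, double remoteDataRatio) {
        long limit = maxRemoteDataBytes(fileCacheBytes, remoteDataRatio);
        return existingRemoteBytes + restoreBytes > limit;
    }

    public static void main(String[] args) {
        // 100GB cache with ratio 5, then ratio 10 (the two cases from the question)
        System.out.println(maxRemoteDataBytes(100 * GB, 5.0) / GB + " GB");
        System.out.println(maxRemoteDataBytes(100 * GB, 10.0) / GB + " GB");
        // 450GB already present plus a 100GB restore exceeds the 500GB bound
        System.out.println(wouldOversubscribe(450 * GB, 100 * GB, 100 * GB, 5.0));
    }
}
```

With a 100GB cache, the sketch gives a 500GB bound for a ratio of 5 and a 1000GB (roughly 1TB) bound for a ratio of 10, matching the numbers in the question.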

kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Successfully merging this pull request may close these issues.

[Searchable Remote Index] Add safeguards to ensure a cluster cannot be over-subscribed
5 participants