Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close Zstd Dictionary after execution to avoid any memory leak. #9403

Merged
merged 3 commits into from
Aug 17, 2023

Conversation

mgodwan
Copy link
Member

@mgodwan mgodwan commented Aug 16, 2023

Description

While executing some test runs to evaluate the memory usage for indexing path, I found there was an increase in memory used when using Zstd but not with other compression codec implementations.

This PR closes Zstd Dictionary after execution to avoid any memory leak.

Saw the following memory utilization with the existing implementation and after the change:

Screenshot 2023-08-17 at 12 37 13 AM

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-trigger-bot
Copy link
Contributor

Compatibility status:

Checks if related components are compatible with change 8255050

Incompatible components

Incompatible components: [https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/security-analytics.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

@reta
Copy link
Collaborator

reta commented Aug 16, 2023

@mgodwan we probable need changelog here (bugfix) since the feature was released in 2.9.0, thank you

@andrross
Copy link
Member

@mgodwan Oof, good catch. Does this mean that using zstd compression before this fix will consume all the memory on a host within a relatively short period of time (like hours or days)?

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov
Copy link

codecov bot commented Aug 16, 2023

Codecov Report

Merging #9403 (9ae5684) into main (e1c40b4) will decrease coverage by 0.06%.
The diff coverage is 70.83%.

@@             Coverage Diff              @@
##               main    #9403      +/-   ##
============================================
- Coverage     71.16%   71.11%   -0.06%     
+ Complexity    57409    57381      -28     
============================================
  Files          4776     4776              
  Lines        270711   270713       +2     
  Branches      39574    39574              
============================================
- Hits         192648   192511     -137     
- Misses        61864    62036     +172     
+ Partials      16199    16166      -33     
Files Changed Coverage Δ
.../index/codec/customcodecs/ZstdCompressionMode.java 86.66% <70.83%> (+0.30%) ⬆️

... and 468 files with indirect coverage changes

Copy link
Contributor

@sarthakaggarwal97 sarthakaggarwal97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @mgodwan, this is a good catch. LGTM!

@reta reta merged commit 5cc7313 into opensearch-project:main Aug 17, 2023
@reta reta added the backport 2.x Backport to 2.x branch label Aug 17, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-9403-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5cc73134321fb5c070664f7071014a3f0dedb39e
# Push it to GitHub
git push --set-upstream origin backport/backport-9403-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-9403-to-2.x.

@reta
Copy link
Collaborator

reta commented Aug 17, 2023

@mgodwan could you please port manually to 2.x? thank you

@@ -50,6 +50,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- [Remote Store] Add Segment download stats to remotestore stats API ([#8718](https://github.com/opensearch-project/OpenSearch/pull/8718))
- [Remote Store] Add remote segment transfer stats on NodesStats API ([#9168](https://github.com/opensearch-project/OpenSearch/pull/9168))
- Return 409 Conflict HTTP status instead of 503 on failure to concurrently execute snapshots ([#8986](https://github.com/opensearch-project/OpenSearch/pull/5855))
- Fix memory leak when using Zstd Dictionary ([#9403](https://github.com/opensearch-project/OpenSearch/pull/9403))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgodwan This needs to go in the [Unreleased 2.x] section. I'll submit a quick PR to fix this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andrross for fixing this.

in.skipBytes(compressedLength);
offsetInBlock += blockLength;
offsetInBytesRef -= blockLength;
try (ZstdDictDecompress dictDecompress = new ZstdDictDecompress(bytes.bytes, 0, dictLength)) {
Copy link
Collaborator

@nknize nknize Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should use the other ctor? And do this with direct memory instead of using heap?

try (ZstdDictDecompress dictDecompress = new ZstdDictDecompress(ByteBuffer.wrap(bytes.bytes, 0, dictLength))) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The leak is within the native memory and didn't show any issues within the JVM heap. The allocations in native side of code are not freed if this was not closed, and hence lead to the leak.

@austintlee
Copy link
Contributor

Is there still a memory leak? The graph on the right still has the memory utilization climbing.

andrross pushed a commit to andrross/OpenSearch that referenced this pull request Aug 17, 2023
…search-project#9403)

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Add changelog

Signed-off-by: Mohit Godwani <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
(cherry picked from commit 5cc7313)
andrross pushed a commit to andrross/OpenSearch that referenced this pull request Aug 17, 2023
…search-project#9403)

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Add changelog

Signed-off-by: Mohit Godwani <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
(cherry picked from commit 5cc7313)
Signed-off-by: Andrew Ross <[email protected]>
@mgodwan
Copy link
Member Author

mgodwan commented Aug 18, 2023

@austintlee I ran a custom workload for 4 hours with the final code which is pushed already and don't see any increase in memory usage (The graph attached in the issue description was for the first cut where PR had not handled the decompression flow)

Screenshot 2023-08-18 at 8 14 43 PM

Screenshot 2023-08-18 at 8 20 19 PM

andrross pushed a commit to andrross/OpenSearch that referenced this pull request Aug 18, 2023
…search-project#9403)

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Add changelog

Signed-off-by: Mohit Godwani <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
(cherry picked from commit 5cc7313)
Signed-off-by: Andrew Ross <[email protected]>
reta pushed a commit that referenced this pull request Aug 18, 2023
…ory leak. (#9403) (#9425)

* Close Zstd Dictionary after execution to avoid any memory leak. (#9403)

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Add changelog

Signed-off-by: Mohit Godwani <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
(cherry picked from commit 5cc7313)
Signed-off-by: Andrew Ross <[email protected]>

* Add 1.3.13 to bwc versions

Signed-off-by: Andrew Ross <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
Signed-off-by: Andrew Ross <[email protected]>
Co-authored-by: Mohit Godwani <[email protected]>
reta pushed a commit that referenced this pull request Aug 18, 2023
… (#9424)

* Close Dictionary after every execution to avoid any memory leak



* Close Dictionary after every execution to avoid any memory leak



* Add changelog



---------


(cherry picked from commit 5cc7313)

Signed-off-by: Mohit Godwani <[email protected]>
Signed-off-by: Andrew Ross <[email protected]>
Co-authored-by: Mohit Godwani <[email protected]>
@bugmakerrrrrr
Copy link
Contributor

@andrross @mgodwan I found that ZstdDictCompress/ZstdDictDecompress has implemented Object#finalize. If GC occurs, the resources will be automatically recycled, so the actual situation is not so serious, right?

@andrross
Copy link
Member

@bugmakerrrrrr Indeed, the parent class of ZstdDictCompress/ZstdDictDecompress will release resources upon finalization. Is it still possible that off heap memory could be exhausted before the JVM triggered finalization on these instances? I think I'm still in favor of a patch release to fix this.

@mgodwan
Copy link
Member Author

mgodwan commented Aug 22, 2023

@bugmakerrrrrr Yes, I also had seen finalize method implicitly calling close. Although, I did not see the cleanup happening without the fix during the long running tests I executed where memory usage climbed from 34G to 52G over a period of 9 hours. During this time, it may possibly impact usage of native memory by other components, such as knn plugin, Lucene fst, etc.

The recommendation is usually to not rely on finalize method due to its unpredictable nature, and can cause issues with the resource exhaustion.

@reta
Copy link
Collaborator

reta commented Aug 22, 2023

@mgodwan all true for finalizers, I am wondering if you do have JVM heap usage charts, my hypothesis is that GC may not be triggering full collections (no heap pressure) and as such, finalizers might not be called.

mgodwan added a commit to mgodwan/OpenSearch that referenced this pull request Aug 23, 2023
…search-project#9403)

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Add changelog

Signed-off-by: Mohit Godwani <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
@dblock
Copy link
Member

dblock commented Aug 24, 2023

Thanks for fixing this @mgodwan.

austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Aug 25, 2023
…search-project#9403)

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Add changelog

Signed-off-by: Mohit Godwani <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
kkmr pushed a commit to kkmr/OpenSearch that referenced this pull request Aug 28, 2023
…search-project#9403)

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Add changelog

Signed-off-by: Mohit Godwani <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
Signed-off-by: Kiran Reddy <[email protected]>
@wbeckler
Copy link

What's the next step to get this merged into 2.10?

@reta
Copy link
Collaborator

reta commented Aug 31, 2023

kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
…search-project#9403)

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Add changelog

Signed-off-by: Mohit Godwani <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
Signed-off-by: Kaushal Kumar <[email protected]>
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
…search-project#9403)

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Add changelog

Signed-off-by: Mohit Godwani <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
Signed-off-by: Ivan Brusic <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…search-project#9403)

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Close Dictionary after every execution to avoid any memory leak

Signed-off-by: Mohit Godwani <[email protected]>

* Add changelog

Signed-off-by: Mohit Godwani <[email protected]>

---------

Signed-off-by: Mohit Godwani <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch backport-failed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants