
[Merged by Bors] - Fix OOM error with manual buffer size specification #380

Closed
wants to merge 48 commits into from


@fhennig (Contributor) commented Jan 24, 2023

Description

fixes #359

Automatic buffer sizing seems to be the cause of #359.

This PR requires a new operator-rs release that includes the features from stackabletech/operator-rs#544.

This PR fixes the issue by calculating the buffer size from the user-provided memory request. All values are computed following the Druid guidelines.
We also set the maximum direct memory (MaxDirectMemorySize) dynamically instead of hardcoding it. The heap calculation now takes this value into account, so the JVM should no longer try to allocate more memory than is available to the container.
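The Druid guideline behind this is that direct memory must cover at least druid.processing.buffer.sizeBytes × (druid.processing.numThreads + druid.processing.numMergeBuffers + 1). The following is a minimal Rust sketch of how that requirement and the heap calculation fit together, assuming a fixed OS reserve and values in MiB. The function names and the reserve value are hypothetical and not taken from the operator code.

```rust
/// Minimum direct memory Druid expects for its processing buffers:
/// buffer_size * (num_threads + num_merge_buffers + 1).
fn druid_direct_memory(buffer_size: u64, num_threads: u64, num_merge_buffers: u64) -> u64 {
    buffer_size * (num_threads + num_merge_buffers + 1)
}

/// Heap left over once the (hypothetical) OS reserve and the direct memory
/// are subtracted from the container limit. Returns None if the reserve and
/// direct memory together exceed the limit.
fn heap_after_direct(container: u64, os_reserve: u64, direct: u64) -> Option<u64> {
    container.checked_sub(os_reserve)?.checked_sub(direct)
}

/// Render the JVM flags; both values are in MiB in this sketch.
fn jvm_memory_flags(heap_mib: u64, direct_mib: u64) -> Vec<String> {
    vec![
        format!("-Xmx{heap_mib}m"),
        format!("-XX:MaxDirectMemorySize={direct_mib}m"),
    ]
}
```

For a 4096 MiB container with a 300 MiB reserve, 256 MiB buffers, 3 processing threads and 2 merge buffers, direct memory comes to 1536 MiB and the heap to 2260 MiB, so the flags are -Xmx2260m and -XX:MaxDirectMemorySize=1536m. Because the heap is derived after direct memory, heap plus direct memory cannot exceed what the container provides.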

The computation has a few other benefits:

  • The maximum buffer size can now be 2GB instead of 1GB, because we no longer use Druid's auto setting.
  • A fixed amount of memory is reserved for the OS instead of a proportional one. This leaves more memory for Druid when the container has more memory available.
  • There were latent bugs in the MaxDirectMemorySize setting of all roles that had not surfaced yet. They are all fixed: MaxDirectMemorySize plus heap memory no longer exceeds the maximum available memory.
  • The rounding applied to the JVM memory no longer depends on the unit. Previously it would round off the .2 of whatever quantity was given, so up to 200MB might be left unused. This is no longer the case.
  • If direct memory is maxed out, all remaining memory is allocated to the heap to make maximum use of the allocatable memory. This also applies to all roles.
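Taken together, these points describe a split of the container memory into a fixed OS reserve, direct memory with a capped buffer size, and a heap that receives everything else. The Rust sketch below illustrates that split in whole bytes, so no unit-dependent rounding occurs. It is hypothetical: split_memory, MemorySplit, the percentage-based direct memory share and the example reserve are assumptions for illustration, not the formula the operator actually uses; only the 2GB buffer cap comes from the PR description.

```rust
const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;
/// Upper bound for a single processing buffer, per the PR description (2GB).
const MAX_BUFFER_SIZE: u64 = 2 * GIB;

/// How the container memory is divided up; all values are in bytes.
#[derive(Debug, PartialEq)]
struct MemorySplit {
    buffer_size: u64,
    direct: u64,
    heap: u64,
}

/// Hypothetical split: a fixed OS reserve, a share of the remainder for
/// direct memory (limited by MAX_BUFFER_SIZE per buffer), the rest for heap.
/// `num_buffers` is num_threads + num_merge_buffers + 1.
fn split_memory(total: u64, os_reserve: u64, num_buffers: u64, direct_share_pct: u64) -> Option<MemorySplit> {
    if num_buffers == 0 || direct_share_pct > 100 {
        return None;
    }
    // The reserve is a fixed amount, not a percentage of `total`.
    let available = total.checked_sub(os_reserve)?;
    let direct_budget = available * direct_share_pct / 100;
    // Integer bytes throughout: nothing is rounded to a unit such as Gi.
    let buffer_size = (direct_budget / num_buffers).min(MAX_BUFFER_SIZE);
    let direct = buffer_size * num_buffers;
    // Everything direct memory does not use (including the surplus when the
    // buffer size hits its cap) goes to the heap.
    let heap = available - direct;
    Some(MemorySplit { buffer_size, direct, heap })
}
```

With a 64 GiB limit, a 300 MiB reserve, 6 buffers and a 40% share, each buffer hits the 2 GiB cap, so direct memory stops at 12 GiB and the remaining memory goes to the heap, as described in the last bullet. A limit smaller than the reserve yields None.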

Review Checklist

  • Code contains useful comments
  • CRD change approved (or not applicable)
  • (Integration-)Test cases added (or not applicable)
  • Documentation added (or not applicable)
  • Changelog updated (or not applicable)
  • Cargo.toml only contains references to git tags (not specific commits or branches)
  • Helm chart can be installed and deployed operator works (or not applicable)

Once the review is done, comment bors r+ (or bors merge) to merge.

@fhennig changed the title from "Added runtime settings calculation struct" to "Fix OOM error with manual buffer size specification" on Jan 24, 2023
@fhennig marked this pull request as ready for review on January 30, 2023, 11:16
@fhennig (Contributor, Author) commented Jan 30, 2023

The PR is ready for review, although the operator-rs dependency still needs to be updated.

@fhennig (Contributor, Author) commented Jan 30, 2023

I've started a test run.

@maltesander (Member) left a comment


Finished testing, works fine!
Some points:

  • Bump the versions in the test-definition to stackable23.1.0-rc1?
  • Error messages could provide better info about what is happening (see other comments) or
  • Add checks to reject values that will definitely fail to start up? (e.g. 10Mi as memory)
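The last two points (clearer error messages and rejecting limits that can never start) could look roughly like the following Rust sketch. It is hypothetical: parse_binary_quantity, check_memory_limit, MemoryError and the minimum value are invented for illustration and are not code from this PR or from operator-rs. The parser is a simplification that only accepts integer values with Ki, Mi or Gi suffixes (powers of 1024, as in Kubernetes quantities) or plain bytes.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
enum MemoryError {
    Unparseable(String),
    TooSmall { role: String, requested: u64, minimum: u64 },
}

impl fmt::Display for MemoryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MemoryError::Unparseable(q) => write!(f, "could not parse memory quantity {q:?}"),
            MemoryError::TooSmall { role, requested, minimum } => write!(
                f,
                "memory limit for role {role} is {requested} bytes, but at least {minimum} bytes \
                 are needed for the OS reserve, heap and direct memory"
            ),
        }
    }
}

impl std::error::Error for MemoryError {}

/// Parse an integer quantity with an optional binary suffix into bytes.
fn parse_binary_quantity(quantity: &str) -> Option<u64> {
    let (digits, multiplier) = if let Some(n) = quantity.strip_suffix("Gi") {
        (n, 1u64 << 30)
    } else if let Some(n) = quantity.strip_suffix("Mi") {
        (n, 1 << 20)
    } else if let Some(n) = quantity.strip_suffix("Ki") {
        (n, 1 << 10)
    } else {
        (quantity, 1)
    };
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}

/// Reject limits below a (hypothetical) minimum with a descriptive error.
fn check_memory_limit(role: &str, quantity: &str, minimum: u64) -> Result<u64, MemoryError> {
    let requested = parse_binary_quantity(quantity)
        .ok_or_else(|| MemoryError::Unparseable(quantity.to_string()))?;
    if requested < minimum {
        return Err(MemoryError::TooSmall { role: role.to_string(), requested, minimum });
    }
    Ok(requested)
}
```

With a minimum of 512Mi, a historical configured with 10Mi would be rejected with a message naming the role and both sizes, instead of failing later at JVM startup.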

rust/operator-binary/src/druid_controller.rs (outdated, resolved)
tests/templates/kuttl/resources/30-assert.yaml (outdated, resolved)
bors bot pushed a commit to stackabletech/operator-rs that referenced this pull request Feb 1, 2023
## Description

I have used these functions in my Druid PR already: stackabletech/druid-operator#380



Co-authored-by: Felix Hennig <[email protected]>
@sbernauer (Member) commented

Nice, thx!

@maltesander (Member) left a comment


@fhennig (Contributor, Author) commented Feb 2, 2023

bors merge

bors bot pushed a commit that referenced this pull request Feb 2, 2023
bors bot (Contributor) commented Feb 2, 2023

Pull request successfully merged into main.

Build succeeded:

bors bot changed the title from "Fix OOM error with manual buffer size specification" to "[Merged by Bors] - Fix OOM error with manual buffer size specification" on Feb 2, 2023
bors bot closed this on Feb 2, 2023
bors bot deleted the fix/359-memory-allocation branch on February 2, 2023, 10:34
bors bot pushed a commit that referenced this pull request Feb 16, 2023
Development

Successfully merging this pull request may close these issues.

Historicals won't start when setting memory limit
3 participants