Adjust parallelism in spark-tests script to reduce memory footprint [skip ci] #8871

Merged: 1 commit into NVIDIA:branch-23.08 on Jul 31, 2023

Conversation

@pxLi (Collaborator) commented Jul 31, 2023

Mitigates #8729.

We have not yet determined why the JDK 11 test run consumes more memory than runs on other JDK versions and is then OOM-killed by the system. This could be related to the different GC strategies across JDK versions, but the issue persisted when we tried a non-default GC with JDK 11.

This change limits the parallelism of integration tests in nightly CI.
We have confirmed that setting parallelism to 5 does not increase the test run duration (verified with multiple GPU types),
and it could significantly reduce the peak host memory footprint (from >50GiB to >40GiB) in recent CI runs.
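
For illustration only, the idea of bounding test parallelism by both a fixed cap and a host-memory budget can be sketched as below. This is not the code changed in this PR: the function name `capped_parallelism`, its parameters, and the per-worker memory figures in the usage lines are hypothetical. Only the cap of 5 comes from the description above.

```python
def capped_parallelism(requested: int,
                       host_mem_gib: float,
                       per_worker_gib: float,
                       cap: int = 5) -> int:
    """Return a worker count bounded by a fixed cap and a host-memory budget.

    All names here are hypothetical; per_worker_gib is an estimate the caller
    must supply, it is not measured by this function.
    """
    if requested < 1:
        raise ValueError("requested must be at least 1")
    if per_worker_gib <= 0:
        raise ValueError("per_worker_gib must be positive")

    # How many workers fit into the memory budget (may be 0 on small hosts).
    by_memory = int(host_mem_gib // per_worker_gib)

    # Never go above the request, the fixed cap, or the memory-derived limit,
    # but always run at least one worker so the tests still execute.
    return max(1, min(requested, cap, by_memory))


# Illustrative values only (not taken from the CI runs discussed above).
print(capped_parallelism(requested=8, host_mem_gib=64, per_worker_gib=8))
print(capped_parallelism(requested=8, host_mem_gib=16, per_worker_gib=8))
```

The fixed cap keeps the setting predictable across machines, while the memory-derived limit only lowers it further on hosts with less memory.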

@pxLi pxLi added the test Only impacts tests label Jul 31, 2023
@pxLi (Collaborator, Author) commented Jul 31, 2023

build

@pxLi pxLi marked this pull request as ready for review July 31, 2023 05:52
@jlowe jlowe merged commit c5fed02 into NVIDIA:branch-23.08 Jul 31, 2023
Labels
test Only impacts tests
3 participants