Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rapids-cmake parallel testing feature #1183

Merged

Conversation

robertmaynard
Copy link
Contributor

Description

Converts librmm over to use rapids-cmake new GPU aware parallel testing feature, which allows tests to run across all the GPUs on a machine without oversubscription.

This will allow developers to run ctest -j<N> and ctest will figure out given the current machine how many tests it can run in parallel given the current GPU set ( currently 4 tests per GPU ).

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@robertmaynard robertmaynard added feature request New feature or request non-breaking Non-breaking change labels Dec 21, 2022
@github-actions github-actions bot added CMake cpp Pertains to C++ code labels Dec 21, 2022
@robertmaynard
Copy link
Contributor Author

robertmaynard commented Dec 21, 2022

Things to be done before moving out of draft:

@codecov-commenter
Copy link

codecov-commenter commented Dec 27, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-23.02@c5c02fc). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@              Coverage Diff               @@
##             branch-23.02   #1183   +/-   ##
==============================================
  Coverage                ?   0.00%           
==============================================
  Files                   ?       6           
  Lines                   ?     414           
  Branches                ?       0           
==============================================
  Hits                    ?       0           
  Misses                  ?     414           
  Partials                ?       0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@github-actions github-actions bot added the ci label Dec 27, 2022
@ajschmidt8
Copy link
Member

Big fan of this PR! @bdice and I were planning on circling back to the GH Actions test scripts in the future to figure out how we could make some of these scripts less verbose.

@robertmaynard robertmaynard changed the base branch from branch-23.02 to branch-23.04 January 24, 2023 16:02
@robertmaynard
Copy link
Contributor Author

Moving to 23.04

@harrism
Copy link
Member

harrism commented Jan 24, 2023

Some RMM tests allocate significant memory. If tests are run in parallel on small-memory GPUs this might be a problem.

@robertmaynard
Copy link
Contributor Author

Some RMM tests allocate significant memory. If tests are run in parallel on small-memory GPUs this might be a problem.

If we have a list of high memory usage tests we can mark them as requiring the entire GPU which will remove this failure spot ( It will still allow multiple tests to run on multi-gpu systems )

@github-actions github-actions bot removed the ci label Mar 8, 2023
@robertmaynard robertmaynard force-pushed the fea/parallel_testing branch 2 times, most recently from 5559183 to fc153f9 Compare March 10, 2023 16:14
@robertmaynard robertmaynard marked this pull request as ready for review March 10, 2023 16:45
@robertmaynard robertmaynard requested a review from a team as a code owner March 10, 2023 16:45
@robertmaynard robertmaynard requested a review from a team as a code owner March 10, 2023 16:50
@github-actions github-actions bot added the ci label Mar 10, 2023
@bdice
Copy link
Contributor

bdice commented Mar 10, 2023

Tests are failing due to a lack of ctest. Maybe this needs cmake in the test dependencies of librmm? If ctest is the only way to run tests now, we should specify it in the meta.yaml. If there are still test binaries that are useful on their own and ctest is "optional," then perhaps specify it in the dependencies.yaml section for C++ tests?

@robertmaynard
Copy link
Contributor Author

Tests are failing due to a lack of ctest. Maybe this needs cmake in the test dependencies of librmm? If ctest is the only way to run tests now, we should specify it in the meta.yaml. If there are still test binaries that are useful on their own and ctest is "optional," then perhaps specify it in the dependencies.yaml section for C++ tests?

I think it has to go in the dependencies.yaml since we are just changing how our CI invokes the tests. The tests themselves can still be run without ctest if one doesn't want parallel execution support.

@robertmaynard
Copy link
Contributor Author

Looks to be reducing our c++ test time by 45sec ( 4min to 3.15min ). rmm has a couple of tests ( DEVICE_MR ) that have long execution times ( 40+sec ) and high memory usage that cause high serialization.

@vyasr
Copy link
Contributor

vyasr commented Mar 10, 2023

Looks to be reducing our c++ test time by 45sec ( 4min to 3.15min ). rmm has a couple of tests ( DEVICE_MR ) that have long execution times ( 40+sec ) and high memory usage that cause high serialization.

I'm expecting that we'll get much more mileage out of this feature in cudf (and other downstream RAPIDS repos) where the tests are much more amenable to parallelization (lower util).

@vyasr
Copy link
Contributor

vyasr commented Mar 10, 2023

Tests are failing due to a lack of ctest. Maybe this needs cmake in the test dependencies of librmm? If ctest is the only way to run tests now, we should specify it in the meta.yaml. If there are still test binaries that are useful on their own and ctest is "optional," then perhaps specify it in the dependencies.yaml section for C++ tests?

I think it has to go in the dependencies.yaml since we are just changing how our CI invokes the tests. The tests themselves can still be run without ctest if one doesn't want parallel execution support.

One minor caveat is that for cudf I plan to use ctest to configure tests with suitable environment variables to manage the preload library and other associated settings, so those tests will run differently when run via ctest than when the executables are invoked directly. Obviously that specific example is only relevant for cudf, but it does illustrate the possibility that we will have instances where ctest will run a test differently from a direct invocation. We may want to document that expectation if it translates to a recommended way to run tests (ctest over direct executable).

ci/test_cpp.sh Outdated Show resolved Hide resolved
tests/CMakeLists.txt Outdated Show resolved Hide resolved
@robertmaynard robertmaynard force-pushed the fea/parallel_testing branch 2 times, most recently from f73d445 to 0b6aace Compare March 13, 2023 18:57
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM.

ci/test_cpp.sh Show resolved Hide resolved
tests/CMakeLists.txt Outdated Show resolved Hide resolved
@robertmaynard robertmaynard added the 5 - Merge After Dependencies Depends on another PR: do not merge out of order label Mar 15, 2023
tests/CMakeLists.txt Outdated Show resolved Hide resolved
@vyasr vyasr added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 5 - Merge After Dependencies Depends on another PR: do not merge out of order labels Mar 17, 2023
rapids-cmake 23.02 offers parallel testing with load balancing
across GPUs. This feature allows multiple tests to run on
the same gpu without oversubscription, and handles setting the
CUDA_VISIBLE_DEVICE so that you can have tests executing on
different GPUS.
@robertmaynard
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 591726f into rapidsai:branch-23.04 Mar 20, 2023
@robertmaynard robertmaynard deleted the fea/parallel_testing branch March 20, 2023 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge ci CMake cpp Pertains to C++ code feature request New feature or request non-breaking Non-breaking change
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants