Use rapids-cmake parallel testing feature #1183

robertmaynard · 2022-12-21T16:50:40Z

Description

Converts librmm over to use rapids-cmake new GPU aware parallel testing feature, which allows tests to run across all the GPUs on a machine without oversubscription.

This will allow developers to run ctest -j<N> and ctest will figure out given the current machine how many tests it can run in parallel given the current GPU set ( currently 4 tests per GPU ).

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

robertmaynard · 2022-12-21T16:52:53Z

Things to be done before moving out of draft:

rapids-cmake needs to merge Add rapids_test allowing projects to run gpu tests in parallel rapids-cmake#328

codecov-commenter · 2022-12-27T15:44:42Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-23.02@c5c02fc). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files

@@              Coverage Diff               @@
##             branch-23.02   #1183   +/-   ##
==============================================
  Coverage                ?   0.00%           
==============================================
  Files                   ?       6           
  Lines                   ?     414           
  Branches                ?       0           
==============================================
  Hits                    ?       0           
  Misses                  ?     414           
  Partials                ?       0

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

ajschmidt8 · 2022-12-30T04:30:06Z

Big fan of this PR! @bdice and I were planning on circling back to the GH Actions test scripts in the future to figure out how we could make some of these scripts less verbose.

robertmaynard · 2023-01-24T16:03:04Z

Moving to 23.04

harrism · 2023-01-24T23:26:58Z

Some RMM tests allocate significant memory. If tests are run in parallel on small-memory GPUs this might be a problem.

robertmaynard · 2023-01-25T13:04:43Z

Some RMM tests allocate significant memory. If tests are run in parallel on small-memory GPUs this might be a problem.

If we have a list of high memory usage tests we can mark them as requiring the entire GPU which will remove this failure spot ( It will still allow multiple tests to run on multi-gpu systems )

bdice · 2023-03-10T18:20:41Z

Tests are failing due to a lack of ctest. Maybe this needs cmake in the test dependencies of librmm? If ctest is the only way to run tests now, we should specify it in the meta.yaml. If there are still test binaries that are useful on their own and ctest is "optional," then perhaps specify it in the dependencies.yaml section for C++ tests?

robertmaynard · 2023-03-10T19:06:53Z

Tests are failing due to a lack of ctest. Maybe this needs cmake in the test dependencies of librmm? If ctest is the only way to run tests now, we should specify it in the meta.yaml. If there are still test binaries that are useful on their own and ctest is "optional," then perhaps specify it in the dependencies.yaml section for C++ tests?

I think it has to go in the dependencies.yaml since we are just changing how our CI invokes the tests. The tests themselves can still be run without ctest if one doesn't want parallel execution support.

robertmaynard · 2023-03-10T19:45:39Z

Looks to be reducing our c++ test time by 45sec ( 4min to 3.15min ). rmm has a couple of tests ( DEVICE_MR ) that have long execution times ( 40+sec ) and high memory usage that cause high serialization.

vyasr · 2023-03-10T20:07:08Z

Looks to be reducing our c++ test time by 45sec ( 4min to 3.15min ). rmm has a couple of tests ( DEVICE_MR ) that have long execution times ( 40+sec ) and high memory usage that cause high serialization.

I'm expecting that we'll get much more mileage out of this feature in cudf (and other downstream RAPIDS repos) where the tests are much more amenable to parallelization (lower util).

vyasr · 2023-03-10T20:10:19Z

Tests are failing due to a lack of ctest. Maybe this needs cmake in the test dependencies of librmm? If ctest is the only way to run tests now, we should specify it in the meta.yaml. If there are still test binaries that are useful on their own and ctest is "optional," then perhaps specify it in the dependencies.yaml section for C++ tests?

I think it has to go in the dependencies.yaml since we are just changing how our CI invokes the tests. The tests themselves can still be run without ctest if one doesn't want parallel execution support.

One minor caveat is that for cudf I plan to use ctest to configure tests with suitable environment variables to manage the preload library and other associated settings, so those tests will run differently when run via ctest than when the executables are invoked directly. Obviously that specific example is only relevant for cudf, but it does illustrate the possibility that we will have instances where ctest will run a test differently from a direct invocation. We may want to document that expectation if it translates to a recommended way to run tests (ctest over direct executable).

ci/test_cpp.sh

tests/CMakeLists.txt

vyasr

Mostly LGTM.

ci/test_cpp.sh

tests/CMakeLists.txt

rapids-cmake 23.02 offers parallel testing with load balancing across GPUs. This feature allows multiple tests to run on the same gpu without oversubscription, and handles setting the CUDA_VISIBLE_DEVICE so that you can have tests executing on different GPUS.

robertmaynard · 2023-03-20T14:32:20Z

/merge

robertmaynard added feature request New feature or request non-breaking Non-breaking change labels Dec 21, 2022

github-actions bot added CMake cpp Pertains to C++ code labels Dec 21, 2022

github-actions bot added the ci label Dec 27, 2022

robertmaynard changed the base branch from branch-23.02 to branch-23.04 January 24, 2023 16:02

robertmaynard force-pushed the fea/parallel_testing branch from 03edaaa to fc21512 Compare March 8, 2023 17:58

github-actions bot removed the ci label Mar 8, 2023

robertmaynard force-pushed the fea/parallel_testing branch 2 times, most recently from 5559183 to fc153f9 Compare March 10, 2023 16:14

robertmaynard marked this pull request as ready for review March 10, 2023 16:45

robertmaynard requested a review from a team as a code owner March 10, 2023 16:45

robertmaynard force-pushed the fea/parallel_testing branch from fc153f9 to 17df282 Compare March 10, 2023 16:50

robertmaynard requested a review from a team as a code owner March 10, 2023 16:50

github-actions bot added the ci label Mar 10, 2023

robertmaynard force-pushed the fea/parallel_testing branch from 17df282 to 078a220 Compare March 10, 2023 19:09

vyasr reviewed Mar 10, 2023

View reviewed changes

ci/test_cpp.sh Outdated Show resolved Hide resolved

tests/CMakeLists.txt Outdated Show resolved Hide resolved

robertmaynard force-pushed the fea/parallel_testing branch 2 times, most recently from f73d445 to 0b6aace Compare March 13, 2023 18:57

robertmaynard force-pushed the fea/parallel_testing branch from 0b6aace to fec0bf2 Compare March 13, 2023 19:48

vyasr approved these changes Mar 13, 2023

View reviewed changes

ci/test_cpp.sh Show resolved Hide resolved

tests/CMakeLists.txt Outdated Show resolved Hide resolved

harrism approved these changes Mar 14, 2023

View reviewed changes

robertmaynard added the 5 - Merge After Dependencies Depends on another PR: do not merge out of order label Mar 15, 2023

vyasr reviewed Mar 17, 2023

View reviewed changes

tests/CMakeLists.txt Outdated Show resolved Hide resolved

vyasr added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 5 - Merge After Dependencies Depends on another PR: do not merge out of order labels Mar 17, 2023

raydouglass approved these changes Mar 17, 2023

View reviewed changes

robertmaynard force-pushed the fea/parallel_testing branch from 5c341dd to 6b3d19a Compare March 20, 2023 12:50

robertmaynard force-pushed the fea/parallel_testing branch from 6b3d19a to 4cf7cf6 Compare March 20, 2023 12:57

rapids-bot bot merged commit 591726f into rapidsai:branch-23.04 Mar 20, 2023

robertmaynard deleted the fea/parallel_testing branch March 20, 2023 14:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use rapids-cmake parallel testing feature #1183

Use rapids-cmake parallel testing feature #1183

robertmaynard commented Dec 21, 2022

robertmaynard commented Dec 21, 2022 •

edited

Loading

codecov-commenter commented Dec 27, 2022 •

edited

Loading

ajschmidt8 commented Dec 30, 2022

robertmaynard commented Jan 24, 2023

harrism commented Jan 24, 2023

robertmaynard commented Jan 25, 2023

bdice commented Mar 10, 2023 •

edited

Loading

robertmaynard commented Mar 10, 2023

robertmaynard commented Mar 10, 2023

vyasr commented Mar 10, 2023

vyasr commented Mar 10, 2023

vyasr left a comment

robertmaynard commented Mar 20, 2023

Use rapids-cmake parallel testing feature #1183

Use rapids-cmake parallel testing feature #1183

Conversation

robertmaynard commented Dec 21, 2022

Description

Checklist

robertmaynard commented Dec 21, 2022 • edited Loading

codecov-commenter commented Dec 27, 2022 • edited Loading

Codecov Report

ajschmidt8 commented Dec 30, 2022

robertmaynard commented Jan 24, 2023

harrism commented Jan 24, 2023

robertmaynard commented Jan 25, 2023

bdice commented Mar 10, 2023 • edited Loading

robertmaynard commented Mar 10, 2023

robertmaynard commented Mar 10, 2023

vyasr commented Mar 10, 2023

vyasr commented Mar 10, 2023

vyasr left a comment

Choose a reason for hiding this comment

robertmaynard commented Mar 20, 2023

robertmaynard commented Dec 21, 2022 •

edited

Loading

codecov-commenter commented Dec 27, 2022 •

edited

Loading

bdice commented Mar 10, 2023 •

edited

Loading