Async benchmarks always deadlock #136

gevtushenko · 2023-06-30T17:23:04Z

The recent switch to lazy loading by default in CTK 12.2 seems to have broken the async benchmarks. The deadlock can be reproduced with `nvbench.example.axes`, and it goes away when `CUDA_MODULE_LOADING=EAGER` is set. We should either mention this in the error message or set the variable ourselves.

alliepiper · 2023-08-08T13:24:39Z

We likely want eager loads by default anyway to make sure that lazy loads aren't affecting measurements. Let's look into defining that var from the NVBench main implementation.

* Set `CUDA_MODULE_LOADING=EAGER` before `main`. Fixes #136.
* Portability for `setenv`.
* Remove pre-main CUDART usage and set up the environment in `main`.
* Fail examples if they deadlock. This is the best way we have to diagnose a regression for #136.
* Add an initialize method to benchmark_manager for CUDA-related setup. Benchmarks are created statically, so their constructors cannot call the CUDA APIs without breaking the `CUDA_MODULE_LOADING` setup. This method is called from `main` after the environment has been configured.
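The deferred-initialization idea in the last item can be sketched roughly as follows. The `manager` class here is a hypothetical stand-in and not NVBench's real `benchmark_manager`; it only illustrates recording setup work during static registration and running it later from `main`.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a benchmark registry; not NVBench's real class.
class manager
{
public:
  static manager &get()
  {
    static manager instance;
    return instance;
  }

  // Safe to call from static initializers: it only records the work and
  // never touches the CUDA APIs itself.
  void add(std::function<void()> setup) { m_pending.push_back(std::move(setup)); }

  // Intended to be called from main once the environment is configured.
  // Runs each deferred setup step exactly once; later calls are no-ops.
  void initialize()
  {
    if (m_initialized)
    {
      return;
    }
    m_initialized = true;
    for (auto &setup : m_pending)
    {
      setup();
    }
    m_pending.clear();
  }

  bool initialized() const { return m_initialized; }

private:
  manager() = default;

  std::vector<std::function<void()>> m_pending; // deferred CUDA-related setup
  bool m_initialized{false};
};
```

In this arrangement `main` would first configure the environment and only then call `manager::get().initialize()`, so no CUDA call can run before the variable is in place.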

gevtushenko added the "type: bug: functional" (Does not work as intended.) label Jun 30, 2023

alliepiper added this to the 1.0 - Initial Public Release milestone Aug 8, 2023

jrhemstad added this to CCCL Aug 9, 2023

github-project-automation bot moved this to Todo in CCCL Aug 9, 2023

jrhemstad assigned alliepiper Aug 9, 2023

alliepiper added a commit to alliepiper/nvbench that referenced this issue Apr 3, 2024

Set CUDA_MODULE_LOADING=EAGER before main.

868d951

Fixes NVIDIA#136

alliepiper mentioned this issue Apr 3, 2024

Set CUDA_MODULE_LOADING=EAGER before main. #157

Merged

alliepiper added a commit to alliepiper/nvbench that referenced this issue Apr 6, 2024

Fail examples if they deadlock.

e125614

This is the best way we have to diagnose a regression for NVIDIA#136.

alliepiper closed this as completed in #157 Apr 6, 2024

github-project-automation bot moved this from Todo to Done in CCCL Apr 6, 2024
