[BUG] Broken tests in debug build #6521

Closed
12 tasks
mtjrider opened this issue Oct 13, 2020 · 4 comments · Fixed by #8432
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. tests Unit testing for project

Comments

@mtjrider
Contributor

mtjrider commented Oct 13, 2020

Summary

When libcudf.so is compiled with debug symbols, some tests fail or terminate with an error.
The following is a list of tests that either fail or error:

  • COPYING_TEST: does not compile
  • DICTIONARY_TEST: illegal memory access
  • ERROR_TEST: failure
  • FIXED_POINT_TEST: illegal memory access
  • GROUPBY_TEST: failure
  • INTEROP_TEST: illegal memory access
  • JOIN_TEST: unknown, indefinite hang
  • REDUCTION_TEST: failure
  • REPLACE_NULLS_TEST: failure
  • REPLACE_TEST: illegal memory access
  • SEARCH_TEST: illegal memory access
  • TEXT_TEST: illegal memory access

Here "illegal memory access" means that a cudaErrorIllegalAddress error was detected at some point during the test, and "failure" means that the test ran but did not pass. I have attached a more complete report detailing precisely which sub-test prompted each error or failure: cudf-debug-test-log.txt

Reproducing the failures

Build libcudf.so with debugging symbols by cloning PR #6134 and configuring with `cmake .. -DCMAKE_BUILD_TYPE=Debug`. Alternatively, if you're using RAPIDS compose, simply fetch/checkout the PR and build the C++ cudf library with `build-cudf-cpp --debug`.

Expected behavior

All tests should pass regardless of build type.

Environment details

Built from source.
print_env_out.txt

Status

  • COPYING_TEST fixed
  • DICTIONARY_TEST fixed
  • ERROR_TEST fixed
  • FIXED_POINT_TEST fixed
  • GROUPBY_TEST fixed
  • INTEROP_TEST fixed
  • JOIN_TEST fixed
  • REDUCTION_TEST fixed
  • REPLACE_NULLS_TEST fixed
  • REPLACE_TEST fixed
  • SEARCH_TEST fixed
  • TEXT_TEST fixed
@mtjrider mtjrider added bug Something isn't working Needs Triage Need team to review and classify labels Oct 13, 2020
@harrism
Member

harrism commented Oct 14, 2020

I'm hoping that the illegal memory access (cudaErrorIllegalAddress) failures all share a single cause. They may be related to rapidsai/rmm#563.

@kkraus14 kkraus14 added libcudf Affects libcudf (C++/CUDA) code. tests Unit testing for project and removed Needs Triage Need team to review and classify labels Oct 15, 2020
@github-actions

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@ttnghia
Contributor

ttnghia commented Mar 12, 2021

Does this issue still persist? I see that every test's status is "fixed".

@razajafri
Contributor

razajafri commented Apr 20, 2021

This is still a valid bug. I hit it yesterday: I ran only the Java bindings tests and saw a heap of IllegalMemoryAccess errors. The errors magically went away when I built cudf as a Release build.

Please find the logs attached
Error Log

rapids-bot bot pushed a commit that referenced this issue Jun 11, 2021
…tor in thrust::lower_bound (#8432)

Closes #6521 

The `thrust::lower_bound` call is crashing on a libcudf debug build when using the `output_indexalator`. I've opened [an issue in the thrust github](NVIDIA/thrust#1452) to keep track of this. The problem only occurs when using the `-G` nvcc compile option.

I found a workaround using a `thrust::transform` along with a device lambda containing a `thrust::lower_bound(seq)` call for each element. This PR adds the workaround, which is only used in a debug build, since the error occurs in functions that are used as utilities by other functions when working with dictionary columns.
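
For readers unfamiliar with the pattern, here is a rough host-side sketch of the workaround's shape, not the code from the PR. It assumes plain `std::vector<int>` inputs and swaps in `std::transform` and `std::lower_bound` for the Thrust calls so it compiles without CUDA; the helper name `lower_bound_indices` is hypothetical. Instead of one bulk search over all the needles, each needle gets its own sequential binary search inside a transform.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of libcudf): for each needle, return the index
// of the first haystack element that is not less than that needle.
// Host-only analogue of transform + per-element lower_bound.
std::vector<std::size_t> lower_bound_indices(std::vector<int> const& haystack,
                                             std::vector<int> const& needles)
{
  // Binary search is only meaningful on sorted input.
  assert(std::is_sorted(haystack.begin(), haystack.end()));

  std::vector<std::size_t> indices(needles.size());

  // One independent, sequential search per needle. On the device this lambda
  // body would be where the per-element sequential search runs.
  std::transform(needles.begin(), needles.end(), indices.begin(),
                 [&haystack](int needle) {
                   auto const it =
                     std::lower_bound(haystack.begin(), haystack.end(), needle);
                   return static_cast<std::size_t>(
                     std::distance(haystack.begin(), it));
                 });
  return indices;
}
```

With a haystack of `{10, 20, 20, 40}` and needles `{5, 20, 25, 50}`, this returns `{0, 1, 3, 4}`. Since every needle is searched independently, the results should match what a single bulk search over the same inputs produces.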

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Devavret Makkar (https://github.com/devavret)
  - Karthikeyan (https://github.com/karthikeyann)

URL: #8432