CUDA thrust::lower_bound fails when given a custom output iterator and compiled with the -G option [NVBug 3322776] #1452
Comments
Took a quick look at this in the debugger, and it appears that memory is getting corrupted -- the
@davidwendt Has the RAPIDS team filed an nvcc bug for this?
No, not yet. I wanted to get some help from you on putting together the details for an nvcc bug, since I don't know what is happening.
Ok, just wanted to make sure. I'm planning to spend a bit more time looking at this, in case the problem is in our libraries, before we escalate.
I spent a couple more hours looking into this, and things seem to go off the rails around this line. The lhs of the assignment (
This tuple is produced from a
I can't see anything going wrong in the source code, so this does seem like a compiler bug. I've filed NVBug 3322776 to have the compiler folks check it out.
…tor in thrust::lower_bound (#8432)
Closes #6521
The `thrust::lower_bound` call crashes in a libcudf debug build when using the `output_indexalator`. I've opened [an issue in the thrust github](NVIDIA/thrust#1452) to keep track of this. The problem only occurs when using the `-G` nvcc compile option. I found a workaround using a `thrust::transform` along with a device lambda containing a `thrust::lower_bound(seq)` call for each element. This PR adds the workaround, which is only used in a debug build, since the error occurs in functions that are used as utilities for other functions when using dictionary columns.
Authors: - David Wendt (https://github.com/davidwendt)
Approvers: - Devavret Makkar (https://github.com/devavret) - Karthikeyan (https://github.com/karthikeyann)
URL: #8432
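The shape of that workaround can be sketched on the host with the standard library. This is only an analog under stated assumptions: it swaps `std::transform` and `std::lower_bound` in for `thrust::transform` and `thrust::lower_bound(seq)`, uses a plain lambda instead of a device lambda, and the function name `lower_bound_per_element` is made up for illustration. It is not the code from the libcudf PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: instead of one vectorized lower_bound call that
// writes through the output iterator, run an independent sequential
// lower_bound for each needle and let transform do the writing.
template <class OutputIt>
OutputIt lower_bound_per_element(const std::vector<int>& haystack,  // must be sorted
                                 const std::vector<int>& needles,
                                 OutputIt out)
{
  return std::transform(
    needles.begin(), needles.end(), out,
    [&haystack](int needle) {
      // Sequential search for this single element.
      auto pos = std::lower_bound(haystack.begin(), haystack.end(), needle);
      // Return the insertion index rather than the iterator.
      return static_cast<std::ptrdiff_t>(std::distance(haystack.begin(), pos));
    });
}
```

The point of the restructuring is that the output iterator is only ever written through by `transform`, one element at a time, rather than being passed into the vectorized `lower_bound`.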
This is a tricky one. The root issue is that for

    template <class Input, class UnaryOp>
    struct for_each_f
    {
      Input input;
      UnaryOp op;

      THRUST_FUNCTION
      for_each_f(Input input, UnaryOp op)
          : input(input), op(op) {}

      template <class Size>
      THRUST_DEVICE_FUNCTION void operator()(Size idx)
      {
        op(raw_reference_cast(input[idx])); // HERE
      }
    };

This has a bad interaction with

The fix is quite simple: Thrust should avoid using

    template <class Size>
    THRUST_DEVICE_FUNCTION void operator()(Size idx)
    {
      op(raw_reference_cast(*(input + idx)));
    }
Too much code in Thrust assumes that it[n] returns the same type as *(it+n), but the standard only requires that it[n] is convertible to the type of *(it+n). Thrust should avoid using operator[] on iterators and prefer instead to use addition/dereference. Fixes NVIDIA#1452
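A minimal sketch of that mismatch, in plain host C++ rather than CUDA. The names `proxy_ref`, `odd_iterator`, `subscript_matches_deref`, and `apply_at` are hypothetical and exist only for this illustration; `odd_iterator` is not the `output_indexalator` from the reproducer, it just shows an iterator whose `it[n]` is convertible to, but not the same type as, `*(it + n)`.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <utility>

// Hypothetical proxy: convertible to int&, but not itself an int&.
struct proxy_ref {
  int* p;
  operator int&() const { return *p; }
};

// Hypothetical iterator whose operator[] returns the proxy while
// operator* returns a real reference.
struct odd_iterator {
  using iterator_category = std::random_access_iterator_tag;
  using value_type        = int;
  using difference_type   = std::ptrdiff_t;
  using pointer           = int*;
  using reference         = int&;

  int* p;

  reference operator*() const { return *p; }
  odd_iterator operator+(difference_type n) const { return {p + n}; }
  proxy_ref operator[](difference_type n) const { return proxy_ref{p + n}; }
};

// True when it[n] and *(it + n) have exactly the same type.
template <class Iterator>
constexpr bool subscript_matches_deref()
{
  return std::is_same<decltype(std::declval<Iterator>()[0]),
                      decltype(*(std::declval<Iterator>() + 0))>::value;
}

// Mirrors the shape of the fix: use addition/dereference, never operator[].
template <class Iterator, class UnaryOp>
void apply_at(Iterator input, std::ptrdiff_t idx, UnaryOp op)
{
  op(*(input + idx));
}
```

For a raw pointer, `subscript_matches_deref<int*>()` is true, while for `odd_iterator` it is false. Generic code that deduces a type from `it[n]` therefore sees `proxy_ref` instead of `int&`, whereas `apply_at` always hands the functor the iterator's real reference type.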
I believe this is a compiler issue since the problem only appears when using the -G option on nvcc. Unfortunately I'm not able to follow the thrust source code here well enough to see where the problem occurs. I've attached a smallish testcase that can reproduce the error consistently.
lb_output_itr.cu source file to reproduce the error
Compile the source file using the following command:
Running the resulting `lb_output_itr` executable gives the following result:
The `0x400000000` should be the device pointer, but the iterator object is getting trashed somewhere.
Building without the `-G` option will produce the correct result:
The `output_indexalator` iterator being used here is a simplified version from a much larger set of code and has been pared down to provide a minimal reproducer for this issue.
I've verified the error occurs with the same results on my Linux 18.04 system with the following nvcc compiler versions (and associated thrust versions): V11.0.221, V11.1.105, V11.2.142, and V11.3.109