Add example calling cuRAND #40

Merged (6 commits into numba:master, Jun 27, 2022)

Conversation

gmarkall (Member) commented Dec 6, 2021

This is an example based on the cuRAND device API example from the cuRAND documentation.

This requires numba/numba#7621

seibert (Contributor) commented Dec 6, 2021

This is great! I hope we can eventually bring this shim into the CUDA target itself and offer a well-supported path for using cuRAND.

Instead of putting logic in the shim file and calling it from relatively
bare kernels in Python, the logic is now moved into Python. This required
adding a Numba extension to support the curandState data type natively
in Numba.
gmarkall (Member, Author) commented Dec 8, 2021

@seibert The shim functions in the previous commit had some of the example's logic in them - I've updated the shim so that it's just a thin layer wrapping each cuRAND function (well, the two that are supported for now), and moved all the logic into the Numba kernels.

It would be good to move this into Numba eventually (ASAP). One thing that makes me feel a little uncertain at the moment is the way we handle arrays of cuRAND states. Because the C/C++ functions generally take a pointer to a given state, usually computed as &states[i] in C/C++, I made getitem and setitem on a CurandStates object behave a little differently from normal: getitem always returns a new CurandStates object pointing at the indicated element, which can then be passed to a cuRAND function, while setitem copies the state from the RHS of the assignment into the indicated element. So there's a little bit of asymmetry here. This might not be too problematic because the states are mainly treated as opaque objects by the user, but it makes me a bit uneasy that the design may not be clean / consistent.
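To make the asymmetry concrete, a rough sketch of the semantics described above (illustrative pseudocode only - CurandStates and curand_init refer to this PR's current design, and this snippet is not runnable on its own):

# Inside a kernel, with states a CurandStates object:
s = states[i]                 # getitem: a new CurandStates aliasing element i,
curand_init(seed, seq, 0, s)  # usable where C/C++ would pass &states[i]
states[i] = s                 # setitem: copies the state value into element i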

Do you have any thoughts on the API, whether it seems suitable, or if there's a cleaner way to design / implement the API?

seibert (Contributor) commented Dec 8, 2021

The xoroshiro RNG defines a custom dtype for the state of a given generator, which allows the parallel initialization function on the host to return a GPU array of RNG states of this dtype. Since you can't getitem a reference to an individual NumPy array element, all of the API functions take the full RNG state array and an integer index to a particular state as arguments. Then the device function implementation can modify the state in place:

# From numba.cuda.random: int64/uint32/uint64 are Numba type-casting
# functions and rotl is a bit-rotation device function defined in the
# same module.
def xoroshiro128p_next(states, index):
    '''Return the next random uint64 and advance the RNG in states[index].

    :type states: 1D array, dtype=xoroshiro128p_dtype
    :param states: array of RNG states
    :type index: int64
    :param index: offset in states to update
    :rtype: uint64
    '''
    index = int64(index)
    s0 = states[index]['s0']
    s1 = states[index]['s1']
    result = s0 + s1

    s1 ^= s0
    states[index]['s0'] = uint64(rotl(s0, uint32(55))) ^ s1 ^ (s1 << uint32(14))
    states[index]['s1'] = uint64(rotl(s1, uint32(36)))

    return result

I think this is probably the best option without confusing the user with non-standard Python semantics.
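For reference, the public API built on this pattern in numba.cuda.random can be used like the following (a minimal sketch, separate from this PR):

from numba import cuda
from numba.cuda.random import (create_xoroshiro128p_states,
                               xoroshiro128p_uniform_float32)

@cuda.jit
def fill_uniform(states, out):
    i = cuda.grid(1)
    if i < out.size:
        # The whole state array and this thread's index are passed,
        # rather than a reference to a single state
        out[i] = xoroshiro128p_uniform_float32(states, i)

n = 256
states = create_xoroshiro128p_states(n, seed=1)
out = cuda.device_array(n, dtype='float32')
fill_uniform[4, 64](states, out)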

gmarkall (Member, Author) commented Dec 8, 2021

Ah, that's a great point - so from your suggestion, I understand that a call which would be written in C as:

curand_init(seed, seq, 0, &states[i]);

can instead be written in Python as:

curand_init(seed, seq, 0, states, i)

This also reduces the amount of work in implementing the extension, which I like 😄. Will update this PR accordingly.
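A sketch of what a setup kernel might look like under this convention (curand_init here stands for the shim binding being developed in this PR, so its exact name and signature are assumptions):

from numba import cuda

@cuda.jit
def setup(states, seed):
    i = cuda.grid(1)
    # The state array and this thread's index replace the C pointer
    # &states[i]; curand_init is the (hypothetical) shim device binding
    curand_init(seed, i, 0, states, i)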

seibert (Contributor) commented Dec 8, 2021

Exactly. Numba already uses the same pattern in the API for CUDA atomic functions, again because there isn't a natural way to pass an address in Python. Instead, the user passes the device array and an offset into the array.
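For example, Numba's atomic API takes an array and an index rather than an address (a minimal sketch):

from numba import cuda

@cuda.jit
def histogram(bins, data):
    i = cuda.grid(1)
    if i < data.size:
        # Array-plus-index in place of a C address: atomically
        # increments bins[data[i]]
        cuda.atomic.add(bins, data[i], 1)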

This avoids having weird deviations from Python semantics for cuRAND
states.
gmarkall (Member, Author) commented Dec 8, 2021

Code is now updated - the one remaining thing I will look at is how to give users a way to copy a given state into local memory, which is an optimization used in the examples. But now that we're not deviating from Python / C/C++ API semantics, this should be straightforward to implement.

gmarkall (Member, Author)

Now that numba/numba#7621 is merged, this should be ready to review (I can't set the label myself as I don't have permissions).

sklam (Member) left a review comment:

  1. Add instructions about the cuda-python binding, pointing to NUMBA_CUDA_USE_NVIDIA_BINDING.
  2. I got the following error when running it:
/usr/local/cuda/include/curand_mrg32k3a.h(2138): error: A namespace scope variable without memory space annotations (__device__/__constant__/__shared__/__managed__) is considered a host variable, and host variables are not allowed in JIT mode. Consider using -default-device flag to process unannotated namespace scope variables as __device__ variables in JIT mode

/usr/local/cuda/include/curand_mrg32k3a.h(2721): error: A namespace scope variable without memory space annotations (__device__/__constant__/__shared__/__managed__) is considered a host variable, and host variables are not allowed in JIT mode. Consider using -default-device flag to process unannotated namespace scope variables as __device__ variables in JIT mode

/usr/local/cuda/include/curand_mrg32k3a.h(3367): error: A namespace scope variable without memory space annotations (__device__/__constant__/__shared__/__managed__) is considered a host variable, and host variables are not allowed in JIT mode. Consider using -default-device flag to process unannotated namespace scope variables as __device__ variables in JIT mode

/usr/local/cuda/include/cuda.h(53): catastrophic error: cannot open source file "stdlib.h"

 and 1 catastrophic error detected in the compilation of "shim.cu".
Compilation terminated.

Maybe I am running on hardware and a toolkit that are too old.

gmarkall (Member, Author)

@sklam What hardware / toolkit are you running with?

If CUDA Python is not found, then a message directing the user to the
installation instructions is presented.
gmarkall (Member, Author)

Add instructions about the cuda-python binding, pointing to NUMBA_CUDA_USE_NVIDIA_BINDING.

To address this, I've added a check for CUDA Python, and configured Numba to use the binding if it is present. If it's not present, the user is directed to install it.
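A minimal sketch of such a check (it uses the NUMBA_CUDA_USE_NVIDIA_BINDING environment variable mentioned in the review, but is not necessarily the exact code added to the example):

import os

try:
    from cuda import cuda  # noqa: F401 - the NVIDIA cuda-python binding
    # Must be set before numba.cuda is first used
    os.environ.setdefault('NUMBA_CUDA_USE_NVIDIA_BINDING', '1')
except ImportError:
    raise RuntimeError('This example requires the NVIDIA cuda-python '
                       'bindings; please see the cuda-python installation '
                       'instructions')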

sklam (Member) commented Jun 16, 2022

@sklam What hardware / toolkit are you running with?


__OS Information__
Platform Name                                 : Linux-4.4.0-87-generic-x86_64-with-glibc2.23
Platform Release                              : 4.4.0-87-generic
OS Name                                       : Linux
OS Version                                    : #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017
OS Specific Version                           : ?
Libc Version                                  : glibc 2.23
...
__CUDA Information__
CUDA Device Initialized                       : True
CUDA Driver Version                           : 11.2
CUDA Runtime Version                          : 11.0
CUDA NVIDIA Bindings Available                : True
CUDA NVIDIA Bindings In Use                   : False
CUDA Detect Output:
Found 2 CUDA devices
id 0     b'GeForce GTX 1080'                              [SUPPORTED]
                      Compute Capability: 6.1
                           PCI Device ID: 0
                              PCI Bus ID: 101
                                    UUID: GPU-15943929-e7ca-0156-3746-96a4146da70d
                                Watchdog: Disabled
             FP32/FP64 Performance Ratio: 32
id 1           b'Tesla K40c'                 [SUPPORTED (DEPRECATED)]
                      Compute Capability: 3.5
                           PCI Device ID: 0
                              PCI Bus ID: 23
                                    UUID: GPU-a916b1e7-41a8-310e-7426-a2a3e7ffb2ee
                                Watchdog: Disabled
             FP32/FP64 Performance Ratio: 3
Summary:
	2/2 devices are supported

CUDA Libraries Test Output:
Finding nvvm from Conda environment
	named  libnvvm.so.3.3.0
	trying to open library...	ok
Finding cudart from Conda environment
	named  libcudart.so.11.0.221
	trying to open library...	ok
Finding cudadevrt from Conda environment
	named  libcudadevrt.a
Finding libdevice from Conda environment
	trying to open library...	ok

gmarkall (Member, Author)

@sklam You have the CUDA 11.0 toolkit from the conda environment - is the version of the toolkit at /usr/local/cuda also 11.0, or is it something older?

gmarkall (Member, Author)

Also, does the example work if you apply the following change to your Numba installation?

diff --git a/numba/cuda/cudadrv/driver.py b/numba/cuda/cudadrv/driver.py
index 3bd84978f..cc91124fb 100644
--- a/numba/cuda/cudadrv/driver.py
+++ b/numba/cuda/cudadrv/driver.py
@@ -2767,7 +2767,8 @@ class NvrtcProgram:
         #   being optimized away.
         arch = f'--gpu-architecture=compute_{major}{minor}'.encode()
         include = f'-I{config.CUDA_INCLUDE_PATH}'.encode()
-        opts = [arch, include, b'-rdc', b'true']
+        opts = [arch, include, b'-rdc', b'true', b'-default-device',
+                b'-std=c++11']
 
         # Compile the program
         err, = nvrtc.nvrtcCompileProgram(self._program, len(opts), opts)

sklam (Member) commented Jun 22, 2022

RE: #40 (comment)

It's CUDA Version 10.0.130.

RE: #40 (comment)

With the patch, it gives:

  File "/path/to/numba/numba/cuda/cudadrv/driver.py", line 2789, in __init__
    raise NvrtcError(msg)
numba.cuda.cudadrv.driver.NvrtcError: NVRTC Compilation failure whilst compiling shim.cu:

/usr/local/cuda/include/cuda.h(53): catastrophic error: cannot open source file "stdlib.h"

1 catastrophic error detected in the compilation of "shim.cu".
Compilation terminated.

gmarkall (Member, Author)

@sklam Thanks for the info - since NVRTC uses headers from the CUDA installation, I think this is attempting to use older headers with a newer toolkit (I'd imagine this probably won't work at all with CUDA < 11). Should a note saying that at least CUDA 11.0 is required be added to the comments?

sklam (Member) commented Jun 24, 2022

@gmarkall, yes, let's leave a note in the comments about it, just in case people are confused by the error.

gmarkall (Member, Author)

@sklam Thanks, I've now added a relevant comment.

sklam (Member) left a review comment:

I don't have a machine for testing this right now; all my machines have an old CTK. But I trust @gmarkall to have fully tested this. The example will be great to have for the Numba 0.56 release.

sklam merged commit c423f54 into numba:master on Jun 27, 2022