We are attempting to create a shared-library interface to NCCL primitives that can be called from PyCUDA through ctypes.
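The ctypes side of such an interface might look roughly like the sketch below. It assumes NCCL 2.x and binds `libnccl` directly, so this project's own wrapper library and its function names are not shown. The NCCL function names and the 128-byte `ncclUniqueId` come from `nccl.h`. The Python names (`NcclUniqueId`, `bind_nccl`, `check`, `get_unique_id`, `init_rank`, `all_reduce_sum_f32`) and the copied enum values are hypothetical helpers, and you should check them against the installed header.

```python
import ctypes

# Assumption: NCCL 2.x nccl.h, where NCCL_UNIQUE_ID_BYTES is 128.
NCCL_UNIQUE_ID_BYTES = 128

# Enum values copied from NCCL 2.x nccl.h (ncclFloat32 / ncclFloat == 7, ncclSum == 0).
# Check them against the header that ships with your NCCL install.
NCCL_FLOAT32 = 7
NCCL_SUM = 0
NCCL_SUCCESS = 0


class NcclUniqueId(ctypes.Structure):
    """Mirror of `typedef struct { char internal[128]; } ncclUniqueId;`."""
    _fields_ = [("internal", ctypes.c_char * NCCL_UNIQUE_ID_BYTES)]


# ncclComm_t and cudaStream_t are opaque pointers; ctypes treats them as void*.
NcclComm = ctypes.c_void_p
CudaStream = ctypes.c_void_p


def bind_nccl(lib):
    """Declare argtypes/restype on a loaded NCCL library (e.g. a ctypes.CDLL)."""
    lib.ncclGetErrorString.argtypes = [ctypes.c_int]
    lib.ncclGetErrorString.restype = ctypes.c_char_p
    lib.ncclGetUniqueId.argtypes = [ctypes.POINTER(NcclUniqueId)]
    lib.ncclGetUniqueId.restype = ctypes.c_int
    # The unique id is passed by value, so the struct type itself is the argtype.
    lib.ncclCommInitRank.argtypes = [ctypes.POINTER(NcclComm), ctypes.c_int,
                                     NcclUniqueId, ctypes.c_int]
    lib.ncclCommInitRank.restype = ctypes.c_int
    lib.ncclAllReduce.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t,
                                  ctypes.c_int, ctypes.c_int, NcclComm, CudaStream]
    lib.ncclAllReduce.restype = ctypes.c_int
    lib.ncclCommDestroy.argtypes = [NcclComm]
    lib.ncclCommDestroy.restype = ctypes.c_int
    return lib


def check(lib, result):
    """Raise if an ncclResult_t is not ncclSuccess."""
    if result != NCCL_SUCCESS:
        msg = lib.ncclGetErrorString(result).decode()
        raise RuntimeError(f"NCCL error {result}: {msg}")


def get_unique_id(lib):
    uid = NcclUniqueId()
    check(lib, lib.ncclGetUniqueId(ctypes.byref(uid)))
    return uid


def init_rank(lib, nranks, uid, rank):
    comm = NcclComm()
    check(lib, lib.ncclCommInitRank(ctypes.byref(comm), nranks, uid, rank))
    return comm


def all_reduce_sum_f32(lib, comm, send_ptr, recv_ptr, count, stream=None):
    """Sum `count` float32 values across ranks; pointers are raw device addresses."""
    # stream=None is passed as NULL, i.e. the default stream.
    check(lib, lib.ncclAllReduce(send_ptr, recv_ptr, count,
                                 NCCL_FLOAT32, NCCL_SUM, comm, stream))
```

Loading would look like `lib = bind_nccl(ctypes.CDLL("libnccl.so"))`, assuming the library is on the loader path under that name. Every rank must receive the unique id unchanged. `bytes(uid)` yields all 128 raw bytes, whereas reading `uid.internal` would stop at the first NUL byte. `NcclUniqueId.from_buffer_copy(raw)` rebuilds the id on the receiving side.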
Requirements:

- NCCL
- PyCUDA: for CUDA driver-API allocations to work with the runtime-API calls that NCCL makes, we need access to primary contexts. Use this PyCUDA fork.
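- CUDA 7.0 or later (the driver API's primary-context functions, such as `cuDevicePrimaryCtxRetain`, first appeared in CUDA 7.0)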
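The primary-context requirement can be sketched as follows. Memory allocated while a device's primary context is current lives in the same context that the CUDA runtime, and therefore NCCL, uses on that device. This sketch assumes a recent mainline PyCUDA, which exposes `Device.retain_primary_context()`. The fork referenced above may expose this differently. `primary_context` is a hypothetical helper name.

```python
import contextlib


@contextlib.contextmanager
def primary_context(device_ordinal=0):
    """Make the device's primary context current for the duration of a with-block.

    Driver-API allocations made inside the block (pycuda.driver.mem_alloc,
    gpuarray.to_gpu, ...) then belong to the primary context, which is the
    context the CUDA runtime API, and therefore NCCL, uses on that device.
    """
    # Imported lazily so this module can be loaded on machines without CUDA.
    import pycuda.driver as cuda

    cuda.init()
    ctx = cuda.Device(device_ordinal).retain_primary_context()
    ctx.push()  # retain_primary_context() does not make the context current
    try:
        yield ctx
    finally:
        cuda.Context.pop()
```

Inside the block, `int(arr.gpudata)` on a `pycuda.gpuarray.GPUArray` gives a raw device address that can be handed to NCCL through ctypes.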