Header-only runtime API wrappers, split NVTX wrappers, etc.
Main changes since 0.3.3:
- The runtime API wrappers are now a header-only library.
- Split the NVTX wrappers and the Runtime API wrappers into two separate libraries.
- Added several fundamental types which were implicit in previous versions: cuda::size_t, cuda::dimensionality_t.
Minor API tweaks:
- Renamed launch -> enqueue_launch
- Can now schedule managed memory region attachment on streams
- Now wrapping cudaMemAdvise() advice.
- Array copying uses typed pointers
- Added: A cuda::managed::device_side_pointer_for() standalone function
- Added: A container facade for the sequence of all devices, so you can now write for (auto device : cuda::devices() ) { }.
- De-templatized: device setter RAII class
- Added: a freestanding cuda::synchronize() function instead of some wrapper methods
- Moved some type definitions from inside device_t to the device:: namespace
- Added: A subclass of memory::region_t for managed memory
- Using memory::region_t in more API functions
- Dropped cuda::kernel::maximum_dynamic_shared_memory_per_block().
- Centralized the definitions of take_ownership and do_not_take_ownership
- Made stream_t& parameters into const stream_t&, almost universally.
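To illustrate the container facade mentioned above, here is a minimal sketch of how a range-for-compatible sequence of devices can be built without storing any device objects. Everything in the sketch namespace is hypothetical: the real device_t is much richer, and the wrappers obtain the device count from the CUDA runtime rather than from a constructor argument.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a device proxy; not the wrappers' actual device_t.
struct device_t {
    int id;
};

// A minimal container facade: it holds only a count (an assumption made so
// the sketch runs without CUDA) and creates a device_t proxy on dereference.
class all_devices {
public:
    class const_iterator {
    public:
        explicit const_iterator(int index) : index_(index) {}
        device_t operator*() const { return device_t{index_}; }
        const_iterator& operator++() { ++index_; return *this; }
        bool operator!=(const const_iterator& other) const { return index_ != other.index_; }
    private:
        int index_;
    };

    explicit all_devices(int count) : count_(count) {}
    const_iterator begin() const { return const_iterator(0); }
    const_iterator end() const { return const_iterator(count_); }
    std::size_t size() const { return static_cast<std::size_t>(count_); }
private:
    int count_;
};

// Collects the ids visited by a range-based for loop over the facade.
inline std::vector<int> visited_ids(const all_devices& devices) {
    std::vector<int> ids;
    for (auto device : devices) { ids.push_back(device.id); }
    return ids;
}

} // namespace sketch
```

Since begin(), end(), operator*, operator++ and operator!= are all a range-based for loop needs, the facade stays cheap to construct and copy.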
Bug fixes:
- Cross-device waiting on events
- Error message fixes
- Not assuming the uintNN_t types are in the global namespace
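The last fix reflects a general C++ portability point: including <cstdint> guarantees the std::-qualified names, while the unqualified ones are only available if the implementation also declares them in the global namespace. A minimal sketch of the qualified style follows; the pack() helper is a made-up example and not part of the wrappers.

```cpp
#include <climits>
#include <cstdint>

namespace sketch {

// Fixed-width types are spelled with the std:: prefix, which <cstdint>
// guarantees; an unqualified uint32_t is not guaranteed to exist.
inline std::uint64_t pack(std::uint32_t high, std::uint32_t low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

static_assert(sizeof(std::uint32_t) * CHAR_BIT == 32,
              "std::uint32_t must be exactly 32 bits wide");

} // namespace sketch
```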
Build, compatibility, usability:
- Fixed support for CMake 3.8 (CMakeLists.txt was using some post-3.8 features)
- Clang-related:
- Skipping examples which clang++ doesn't support yet (need
- Only enabling separable compilation and CUDA
- const-cast'ing const void * kernel function pointers before reinterpretation, since clang won't allow it otherwise
- GNU extensions dropped when compiling examples with CUDA (clang doesn't support this)
- Fixed std::max() call issue
- CMake targets depending on the wrappers should now have a C++11 language standard requirement for compilation
- The wrappers now assert that C++11 or later is used, instead of letting compilation just fail somewhere.
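A language-standard assertion of the kind described in the last item can be written with a preprocessor guard. The sketch below is an assumption about the general approach, not the wrappers' actual check: the SKETCH_CPLUSPLUS macro, the error text and the helper function are all hypothetical.

```cpp
// MSVC may keep __cplusplus at 199711L unless told otherwise, and reports the
// real language level through _MSVC_LANG, so prefer that macro when defined.
#if defined(_MSVC_LANG)
#define SKETCH_CPLUSPLUS _MSVC_LANG
#else
#define SKETCH_CPLUSPLUS __cplusplus
#endif

// Fail early, with a readable message, rather than deep inside some header.
#if SKETCH_CPLUSPLUS < 201103L
#error "These headers require C++11 or a later version of the C++ language standard"
#endif

namespace sketch {

// Reaching this point means the guard above has passed.
constexpr bool language_standard_ok() { return SKETCH_CPLUSPLUS >= 201103L; }

} // namespace sketch
```

Placing such a guard at the top of the first header every translation unit includes gives a single clear diagnostic instead of a cascade of syntax errors.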