Releases: eyalroz/cuda-api-wrappers
Version 0.4.6: Minor bug fixes, wrap() fully supported
(v0.4.5 was discarded due to an invalid version string; this is essentially the same as v0.4.5 but with the version string fixed.)
Changes since v0.4.4:
API changes
- #298: The `wrap()` methods - which take raw CUDA handles for events, devices, streams etc. and wrap them in, well, the library's wrapper objects (as opposed to otherwise getting/creating wrapper objects directly, with no raw handles) - are now out of the `detail_::` namespace and part of the library's proper API.
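With `wrap()` now public, code that receives a raw CUDA handle from elsewhere (a third-party library, a callback, legacy code) can adopt it into a wrapper object without reaching into `detail_::`. A minimal sketch - the exact `wrap()` parameter list and the ownership tag are assumptions for illustration, not taken from these notes:

```cpp
#include <cuda/api_wrappers.hpp> // header name assumed for the 0.4.x line
#include <cuda_runtime_api.h>

int main()
{
    // A raw stream handle, e.g. one handed to us by another library
    cudaStream_t raw_stream;
    cudaStreamCreate(&raw_stream);

    auto device = cuda::device::current::get();

    // Adopt the raw handle into a stream_t wrapper; with ownership taken,
    // the wrapper destroys the underlying stream on destruction
    // (signature assumed: device id, raw handle, ownership flag)
    auto stream = cuda::stream::wrap(device.id(), raw_stream, cuda::stream::take_ownership);
    stream.synchronize();
}
```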
Bug fixes
- #300: Was hiding some CUDA 11 stream-related features due to faulty runtime API version check.
- #299: Now correctly copying stream properties.
- #296: (Probably) fixed a Win64-to-Win32 cross-build compilation issue with callback function signatures.
Note: Users' help is kindly requested in preparing for the next major release, which will cover both the runtime and the driver API, and NVRTC as well. See this branch and contact me / open relevant issues.
Version 0.4.4: MSVC compilation cleanup
Changes since v0.4.3:
Bug fixes
- Device-properties-related functions using baked-in data corrected for some compute capabilities.
New functionality
- #284: Introduced a grid-and-block-dimensions structure, `grid::complete_dimensions_t`.
- Additional variants of `cuda::memory::set()`, so that you may use either regions or plain pointers. `device_t::global_memory_t` now has an `associated_device()` member.
- #223, #272: Support for CUDA 11.0 stream attributes.
- Added `device_t::supports_block_cooperation()`.
- Additional variants of `cuda::memory::copy()` for convenience.
- #292: Device-properties-related functions requiring baked-in data now support Ampere GPUs (CC 8.0, 8.6).
- #293: Some methods of `compute_architecture_t` are now available only for `compute_capability_t`, as it is no longer reasonable to rely on microarchitecture-generation default values (e.g. amount of shared memory per block, number of in-flight threads per multiprocessor, etc.).
Changes to existing functionality
- #280 Events and streams now have "handles" rather than "ids".
- Partial revamp of the CUDA array wrapper classes (e.g. no templatization).
- #258: Block "cooperativity" is now part of the launch configuration, so fewer launch variants are necessary.
- #250: Now offering `const` variants for both regular and mapped memory.
- #269: Renamed `cuda::device::resource_limit_t` -> `cuda::device::limit_t`.
- Support for GitHub workflows.
- #267: The NVTX library now depends on `CUDA::nvToolsExt` (which it should).
- #268: Now exporting the requirement for the CUDAToolkit dependency.
- `cuda::runtime_error` can now be constructed using an r-value string reference, not just a constant l-value reference.
- Removed some unnecessary explicit namespace specifications in `error.hpp`.
- Now using a uniform parameter name in allocation functions.
- Renamed: `array_t::associated_device()` -> `array_t::device()`.
- #285, #289: Now using the `wrap()` idiom for constructing `device_t`'s.
- #273: Added device-setter RAII objects to some asynchronous stream methods.
- Rework of (global-memory) symbol handling: no more `symbol_t` type; functionality moved from `cuda::memory::` to `cuda::symbol::`; and now willing to locate an argument of any type.
Build mechanism
- Avoid always re-determining CUDA architectures by minding the cache.
- Fixed the `CompileWithWarnings.cmake` module to pass the appropriate flags to the appropriate executables (NVCC front-end vs. actual compiler; MSVC vs. GCC/clang).
Other changes
- Multiple cosmetic changes to avoid MSVC compilation warnings, e.g. explicit narrowing casts.
- Example program changes, including utility headers.
- Added a modified version of the CUDA sample program `binaryPartitionCG`.
- Some internal changes to wrapper classes with no external interface change.
- NVTX exception `what()` message fix.
- #283: Some wrapper identification string generator functions in `detail_` subnamespaces.
This version is known to work with CUDA versions up to 11.5; pre-11.0 CUDA versions are supported, but not tested routinely.
Version 0.4.3: New features, compatibility changes
Changes since v0.4.2:
New functionality
- Support for working with CUDA symbols.
- Support for asynchronous memory allocation.
- Classes for all memory regions - both managed and regular, both constant and non-constant memory (we used to have some of these only).
Changes to existing functionality
- `launch_configuration_t` is now constexpr.
- Arguably better interface for the partially-existing managed memory region classes.
- Pervasive use of regions as parameters to API functions involving memory: copying, allocating, modifying attributes, etc.
- Renamed: `no_shared_memory` -> `no_dynamic_shared_memory`.
Other changes
- CMake-based build mechanism changes to rely on CMake 3.17 changes to CUDA support (no effect on the use of the library).
- Replaced the internal `detail` namespaces with `detail_`, for `libcu++` compatibility.
- Dropped the `FindCUDAAPIWrappers.cmake` module.
This version is known to work with CUDA versions up to 11.4 (but old CUDA versions are not routinely tested).
0.4.2: Bug fixes, compatibility improvements, range-for over devices
This is a minor release, with mostly bug fixes and compatibility improvements. Other than in its version number, it is identical to 0.4.1, which was retracted due to a version numbering issue.
Changes since 0.4:
- Can now access all devices as a range: `for(auto device : cuda::devices()) { /* etc. etc. */ }`.
- Wrapper classes (specifically, events and streams) now have non-owning copy constructors.
- A stream priority range is now its own class.
Bug fixes:
- Dropped invalid stream-priority-related constant.
- The device management test was getting the direction of priority ranges backwards.
- The `p2pBandwidthLatencyTest` example program was failing with cross-device event wait attempts, due to calling `wait()` and `record()` on the wrong stream.
- Removed a spurious template specifier in `device.hpp`.
- Can now construct `cuda::launch_configuration_t` from two integers with C++14 and later.
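Since `cuda::launch_configuration_t` can now (with C++14 and later) be built from two plain integers, a launch site can be written quite tersely. A hedged sketch - the kernel is a placeholder, and the exact shape of the `cuda::enqueue_launch()` call is an assumption based on names appearing elsewhere in these notes:

```cpp
#include <cuda/api_wrappers.hpp> // header name assumed

__global__ void scale(float* data, float factor)
{
    data[blockIdx.x * blockDim.x + threadIdx.x] *= factor;
}

void scale_on_device(float* device_data)
{
    auto device = cuda::device::current::get();
    auto stream = device.default_stream();
    // Two plain integers: 128 blocks of 256 threads each
    cuda::launch_configuration_t config { 128, 256 };
    cuda::enqueue_launch(scale, stream, config, device_data, 2.0f);
    stream.synchronize();
}
```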
Build, compatibility, usability:
- CMake 3.18 and later no longer complain about the lack of a `CUDA_ARCHITECTURES` value.
- Should now be compatible with MSVC 16.8 on Windows.
Header-only runtime API wrappers, split NVTX wrappers, etc.
Main changes since 0.3.3:
- The runtime API wrappers are now a header-only library.
- Split the NVTX wrappers and the Runtime API wrappers into two separate libraries.
- Added several fundamental types which were implicit in previous versions: `cuda::size_t`, `cuda::dimensionality_t`.
Minor API tweaks:
- Renamed: `launch` -> `enqueue_launch`.
- Can now schedule managed memory region attachment on streams.
- Now wrapping `cudaMemAdvise()` advice.
- Array copying uses typed pointers.
- Added: a `cuda::managed::device_side_pointer_for()` standalone function.
- Added: a container facade for the sequence of all devices, so you can now write `for (auto device : cuda::devices()) { }`.
- De-templatized: the device setter RAII class.
- Added: a freestanding `cuda::synchronize()` function, instead of some wrapper methods.
- Moved some type definitions from inside `device_t` to the `device::` namespace.
- Added: a subclass of `memory::region_t` for managed memory.
- Using `memory::region_t` in more API functions.
- Dropped: `cuda::kernel::maximum_dynamic_shared_memory_per_block()`.
- Centralized the definitions of `take_ownership` and `do_not_take_ownership`.
- Made `stream_t&` parameters into `const stream_t&`, almost universally.
Bug fixes:
- Cross-device waiting on events
- Error message fixes
- Not assuming the `uintNN_t` types are in the default namespace.
Build, compatibility, usability:
- Fixed support for CMake 3.8 (`CMakeLists.txt` was using some post-3.8 features).
- Clang-related:
  - Skipping examples which clang++ doesn't support yet.
  - Only enabling separable compilation and CUDA
  - const-cast'ing `const void *` kernel function pointers before reinterpretation - clang won't let it.
  - GNU extensions dropped when compiling examples with CUDA (clang doesn't support these).
- Fixed a `std::max()` call issue.
- CMake targets depending on the wrappers should now have a C++11 language standard requirement for compilation
- The wrappers now assert C++11 or later is used, instead of letting you just fail somewhere.
De-templatization, no numeric handles etc.
This release includes both significant additions to the coverage by the wrappers, as well as major changes to the existing wrappers API.
Main changes since 0.2.0:
- Forget about numeric handles! The wrapper classes no longer take numeric handles as parameters in methods exposed to the user. You'll be dealing with `device_t`'s, `event_t`'s, `stream_t`'s etc. - not `device::id_t`, `device::stream_t` and `device::event_t`'s.
- Wrapper classes are no longer templated. On one hand, you no longer have to worry about the template argument of "do we assume the wrapper's device is the current one?"; on the other hand, every use of the wrapper will set the current device (even if it's already the right one). A lot of code was simplified or even removed thanks to this change.
- `device_function_t` is now named `kernel_t`, as only kernels are acceptable by the CUDA Runtime API calls mentioning "device functions". Also, a `kernel_t` is now a pair of (kernel, device), as the settings which can be made for a kernel are mostly/entirely device-specific.
- The examples' `CMakeLists.txt` has been split off from the main `CMakeLists.txt` and moved into a subdirectory, removing any dependencies it may have.
- Kernel launching now uses perfect forwarding of all parameters.
- The library is now almost completely header-only. The single exception to this rule is profiling-related code; if you don't use it, the library is header-only for you.
- Changed my email address in the code...
Main additions since 0.2.0:
- 2D and 3D Array support.
- 2D and 3D texture support.
- A single `set()` and `get()` for all memory spaces.
Plus a few bug fixes, and another example program from the CUDA samples.
Changes from 0.3.0:
- Fixed: Self-recursion in one of the memory allocation functions.
- Fixed: Added missing `inline` specifiers to some functions.
- Whitespace tweaks.
Initial versioned release
This repository has not really needed "releases" so far:
- We're gradually wrapping an API, with the underlying API changing occasionally - so breaking changes are made frequently.
- The master branch is always the most stable and rounded-out version of the code one can use.
However, with other code potentially starting to depend on this repository, and with the CMake scripts maturing somewhat (thanks goes to @codecircuit for the latter) - named/versioned releases start to make more sense, if only for referential convenience.
Of course, there's the question of a versioning scheme. If we go with semantic versioning, we're going to be switching major version numbers all the time.
For now, versions will be numbered as follows: `A.B.C` or `A.B.C-string`.
- `A` is the major version number. It will increase with major changes to the library's overall functionality relative to the previous major version. What counts as major? If a whole lot of your host-side code has to change for it to work, then the library change is major.
- `B` is the minor version number. It will increase with changes to the library's functionality - including its API; and unlike SemVer, this change is not necessarily an addition. The change may be rather big in terms of code, but not in terms of the fundamental use patterns.
- `C` is a "patch" version number. These changes are for bugfixes and minor tweaks. They often don't affect the API at all - but they might, in some small, subtle way.
Finally, why 0.2.0? Well, it's somewhat arbitrary; but the library's "core" functionality has been pretty stable for a while now, with quite a few users; so 0.1.0 feels a bit "premature", which this isn't. On the other hand, 1.0.0 would be too presumptuous, since:
- We don't have decent feature-test coverage of most of the library (though the examples cover a lot);
- We don't have full, nor even effectively-full, support of CUDA 9.x;
- We don't have good enough unit test coverage.
So 1.0.0 is a while off; enjoy 0.2.0 for now.