Releases · eyalroz/cuda-api-wrappers

07 Dec 22:50

eyalroz

v0.6.1

831666a

Version 0.6.1: Minor bug fixes

Changes since v0.6:

Bug fixes

#442 Changed a no-longer-valid use of link::input_type_t in link.hpp which was triggering an error when building with C++17.
#438 Corrected the make_cuda_host_alloc_flags() function, which was bitwise-AND-ing instead of bitwise-OR-ing.
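
To see why #438 matters: distinct single-bit flags share no bits, so AND-ing them always yields zero, while OR-ing accumulates them. The following is a minimal sketch of that difference, not the library's code; the flag names and values are hypothetical and are not the real CUDA host-allocation flags.

```cpp
#include <cassert>

// Hypothetical single-bit flags; NOT the actual CUDA host-allocation flag values.
enum : unsigned {
    demo_flag_portable       = 1u << 0,
    demo_flag_mapped         = 1u << 1,
    demo_flag_write_combined = 1u << 2
};

// The kind of mistake described above: AND keeps only bits common to both operands.
constexpr unsigned combine_with_and(unsigned lhs, unsigned rhs) { return lhs & rhs; }

// The intended behavior: OR keeps every bit set in either operand.
constexpr unsigned combine_with_or(unsigned lhs, unsigned rhs) { return lhs | rhs; }

// Exercises both combinations at run time.
inline void demo_flag_combination()
{
    assert(combine_with_and(demo_flag_portable, demo_flag_mapped) == 0u); // both flags lost
    assert(combine_with_or(demo_flag_portable, demo_flag_mapped) == 3u);  // both flags kept
}
```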

Other changes

#441 kernel_t::context() now uses wrap() and is noexcept
#436 , #437 Now respecting the CUDA_NO_HALF preprocessor define, and neither defining nor including any half-precision-related code when it is defined.


08 Oct 17:15

eyalroz

v0.6

1a52be1

Version 0.6: PTX compilation library support

Changes since v0.5.6:

PTX Compilation library

This version introduces a single major change:

#385 : Support for NVIDIA's PTX compilation library.

Note: The CUDA driver already supports compilation of PTX code, but it has limited support for various compilation options; it also requires a driver to be loaded, i.e. it requires kernel involvement and a GPU on your system. This library does not.

Value-vs-reference issues

#430 : Now passing kernel-like objects by reference rather than by value where relevant in the kernel launch wrapper functions.
#433 : Now passing program name by value rather than by reference.

Other changes

#431 : The NVTX wrappers no longer depend on a thread support library
#436 : The wrapper library now respects CUDA_NO_HALF, for when you want to avoid CUDA defining the half type
#432 : Replaced some std:: namespace qualifications, which had snuck into the codebase recently, with ::std:: (the unprefixed form causes trouble with NVIDIA's cuda::std namespace).
#435 : Updated static data tables for the Ampere/Lovelace (8.x) and Hopper architectures.
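
The std:: vs. ::std:: issue in #432 comes down to ordinary name lookup: inside namespace cuda, a plain std:: finds a nested cuda::std namespace before the global one. Here is a small sketch of that effect. The nested namespace is a stand-in for NVIDIA's cuda::std, and the names marker, demo, via_plain_std and via_global_std are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace cuda {

namespace std {      // stand-in for NVIDIA's cuda::std namespace
struct marker {};    // hypothetical type which exists only in this stand-in
}

namespace demo {
// Unqualified lookup of "std" from inside namespace cuda finds cuda::std first,
// so this alias names ::cuda::std::marker, not anything in the standard library.
using via_plain_std = std::marker;

// The leading :: forces lookup to start at the global namespace.
using via_global_std = ::std::size_t;
}

} // namespace cuda

static_assert(::std::is_same<cuda::demo::via_plain_std, cuda::std::marker>::value,
    "plain std:: resolved to cuda::std");
static_assert(::std::is_same<cuda::demo::via_global_std, ::std::size_t>::value,
    "::std:: resolved to the standard library");
```

In this sketch, writing std::size_t inside namespace cuda would fail to compile, since the stand-in cuda::std has no size_t.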


08 Oct 17:01

eyalroz

v0.5.6

23f9838

Version 0.5.6: Compatibility and partial-inclusion fixes

Changes since v0.5.5:

New functionality

#423: Add an implementation of the surface and texture reference getters for modules (getting raw references, not corresponding wrapper classes for these objects, which this library does not currently offer)

C++14-and-later compatibility fixes

#415: Resolved incompatibility of std::optional/std::experimental::optional with the internal poor_mans_optional
#416: corrected placement of inclusion of std::experimental::optional

Other changes

#428, #429 : Minor fixes and tweaks to CUDA array code (via the cuda::array_t class template)
#427, #406 : Stream and Event wrapper class instances are now non-copyable (you need to either move them or pass references/pointers to them)
#425, #426: Error and exception handling improvements (with a slight performance benefit)
#424 : Link options now passed by const-reference, not by value
#411: Add :: prefix to occurrences of std:: (which snuck in again in recent versions; these potentially clash with NVIDIA's standard library constructs)
#413: Added missing intra-library #include directives which were masked when including all APIs, but not when including individual headers. Also, removed inappropriate inline decorators from declaration-only lines
#420: Internal renaming
#417, #417: Internal placement of functionality in header files (files in cuda/api/ vs in cuda/api/multi_wrapper_impls).
#412: bandwidthtest now includes <iostream> on its own
#409: Moved pci_id_impl.hpp into the detail/ subfolder (and renamed it)
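
The non-copyability change (#427, #406) follows the usual move-only pattern for types that own a resource. Below is a minimal sketch of that pattern. The class owning_wrapper and its int "handle" are hypothetical and do not reflect the library's actual stream or event classes.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical owning wrapper around an integer "handle".
class owning_wrapper {
public:
    explicit owning_wrapper(int handle) noexcept : handle_(handle) {}

    // Copying would create two owners of the same handle, so it is disabled.
    owning_wrapper(const owning_wrapper&) = delete;
    owning_wrapper& operator=(const owning_wrapper&) = delete;

    // Moving transfers ownership and leaves the source without a handle.
    owning_wrapper(owning_wrapper&& other) noexcept : handle_(other.handle_)
    {
        other.handle_ = invalid_handle;
    }
    owning_wrapper& operator=(owning_wrapper&& other) noexcept
    {
        if (this != &other) {
            handle_ = other.handle_;
            other.handle_ = invalid_handle;
        }
        return *this;
    }

    int handle() const noexcept { return handle_; }
    bool is_valid() const noexcept { return handle_ != invalid_handle; }

private:
    enum { invalid_handle = -1 };
    int handle_;
};

// Functions which only use the object can take it by const-reference.
inline int peek_handle(const owning_wrapper& wrapper) { return wrapper.handle(); }

static_assert(!std::is_copy_constructible<owning_wrapper>::value, "copying is disabled");
static_assert(std::is_move_constructible<owning_wrapper>::value, "moving is allowed");
```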


10 Sep 18:43

eyalroz

v0.5.5

5338353

Version 0.5.5: Minor changes

Changes since v0.5.4:

Run-time compilation functionality

#397 : The NVRTC compilation options class now supports passing extra options to PTXAS, and also supports --dopt
#403 : The program builder class can now accept named header additions using std::string's for the name and/or header source (rather than only C-style const char* strings).

Bug fixes

#396 : scoped_existence_ensurer_t, the gadget for ensuring there is some current context (regardless of which) will now make sure the driver has been initialized.
#395 : Can now start profiling with our nvtx component even if the driver has not yet been initialized.

Other changes

#400 : Added an alias for waiting/synchronizing on an event: You can now execute cuda::wait(my_event), not just cuda::synchronize(my_event).
#399 : time_elapsed_between() can now accept std::pair's of events.
#398 : Added another example program, the CUDA sample bandwidthtest
#401 : Made all stream enqueuing methods const (so you can now enqueue on a stream passed by const-reference).
#404 : Can now construct grid::overall_dimensions_t from a dim3 object, so that they're more interoperable with CUDA-related values you obtained elsewhere.


19 Aug 19:06

eyalroz

v0.5.4

4aac489

v0.5.4: Minor build issue fixes

Changes since v0.5.3:

Build-related fixes

#392 Made the NVTX and NVRTC wrappers usable in multiple translation units within the same executable
#393 Made the NVTX dependency on libdl (on Linux) explicit

Other changes

#394 Avoiding redundant cuInit() call when getting a device's name


26 Jul 07:55

eyalroz

v0.5.3

475f194

v0.5.3: Asynch memory ops, NVRTC compilation improvements

Changes since v0.5.2:

Runtime program compilation (NVRTC) improvements

#379: Can get the compilation log, PTX, cubin or NVVM in a user-provided rather than self-allocated buffer
#388: A builder interface for NVRTC programs
#386: Add support for nvrtcGetSupportedArchs()
#375: Support adding arbitrary options when dynamically compiling a CUDA program
#265: Support for diag-suppress/error/warn compilation options

Runtime-compilation-related Bug fixes

#391: Fix for a CUDA 10.0 support regression
#384: Make nvrtc depend on runtime-and-driver
#376: When rendering compilation options to a string, we get an extra space
#378: Compilation log vector contains trailing '\0'
#387: nvrtc.h included in wrong file
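
A bug like #378 typically arises when a C API reports a buffer size which counts the terminating '\0', and the whole buffer is then handed on as-is. The sketch below shows one way to drop that terminator when converting to a std::string. The helper name log_to_string is hypothetical and this is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: converts a raw log buffer, which may end with a
// terminating '\0' counted in its size, into a string without that terminator.
inline std::string log_to_string(const std::vector<char>& raw_log)
{
    if (raw_log.empty()) { return std::string(); }
    std::size_t length = raw_log.size();
    if (raw_log[length - 1] == '\0') { --length; }
    return std::string(raw_log.data(), length);
}

// Exercises the helper on a buffer with a trailing terminator.
inline void demo_log_conversion()
{
    std::vector<char> raw = { 'o', 'k', '\0' };
    assert(log_to_string(raw).size() == 2);
}
```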

Other changes

#390: Avoiding a memory leak when getting a CUDA device's name
#248: Support asynchronous memory allocation (in v0.5.2 we only had allocation, no freeing)

Caveats

Continuous build testing on Windows is failing on GitHub Actions due to trouble with CMake detecting the NVTX path. Assistance from users on this matter would be appreciated.


18 Jun 21:08

eyalroz

v0.5.2

fb5a9a6

Version 0.5.2: Windows compatibility, less redundant API calls

Changes since v0.5.1:

Full MS Windows support is restored in this version (AFAICT). Also worked out some kinks and polished a few interfaces.

Bug fixes

#330, #369, #372 Corrected some launch_config_builder logic bugs.
#368 Fixed an accidental primary context deactivation in p2pBandwidthLatencyTest
#360 Was missing an implementation of context_t::create_event()
#357 All assignment operators updated to appropriately handle primary context reference unit propagation
#351 Fixed a typo in Windows-target-only code
#335 Redundant 0x in error messages
#329 marshalled_options.hpp errors with C++17
#324 marshalled_options.hpp needs cuda::span, but doesn't see it
#325 nvrtc/compilation_options.hpp needs to know about device_t

Windows compatibility

#345 Avoid non-portable assumptions regarding thread handles in vectorAdd_profiled
#344 Workaround for an MSVC SFINAE error with std::iterator_traits<Iter>
#343 std::experimental::filesystem not properly supported on Windows
#342 Don't try to use mkstemp on Windows
#341 Avoid size_t <-> unsigned overload clash on Windows
#340 Apply the CUDA_CB decoration to shared memory size-determiner function - it's actually necessary on Windows
#339 Avoid some MSVC compiler warnings
#338 Added missing inclusions to have Windows NT HANDLE defined
#337 Support for MSVC's non-standard-compliant __cplusplus value
#347 Using ::std:: rather than std::, to avoid clashes with NVIDIA's libcustd - that is included by default by CUDA 11.7's nvcc.
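
Regarding #337: by default, MSVC reports __cplusplus as 199711L regardless of the language standard in effect (unless /Zc:__cplusplus is passed), and exposes the actual value through _MSVC_LANG. A common workaround, sketched below, is to pick whichever of the two is meaningful. The macro DEMO_CPLUSPLUS and the two functions are hypothetical names, and this is not necessarily how the library handles it.

```cpp
#include <cassert>

// Use _MSVC_LANG when it is available and more informative than __cplusplus.
#if defined(_MSVC_LANG) && _MSVC_LANG > __cplusplus
#define DEMO_CPLUSPLUS _MSVC_LANG
#else
#define DEMO_CPLUSPLUS __cplusplus
#endif

// The language standard version the code is actually being compiled as.
constexpr long demo_language_version() { return DEMO_CPLUSPLUS; }

// Feature gating should consult the corrected value, not raw __cplusplus.
constexpr bool demo_has_cpp14() { return DEMO_CPLUSPLUS >= 201402L; }

static_assert(demo_language_version() >= 201103L, "this sketch needs C++11 or later");
```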

Interface tweaks

RTC compilation options

#364 marshal() and render() are now stand-alone functions.
#363 Can now render compilation options to an ::std::string (in case you want to save/print them)
#362 Add a clear_language_dialect() to rtc::compilation_options_t
#361 If an rtc::compilation_options_t is asked to set the language dialect to an empty or null string - unset it instead
#355 Support taking the C++ language dialect as an ::std::string, not just a C-style string.

Other classes

#365 module::get_kernel() can now take an ::std::string
#359 Now exposing the interface for enqueuing kernels with type-erased arguments, passed via an array of void* (so far, you could only enqueue when you passed the parameter types).
#356 (Almost) all proxy classes are now move-assignable and move-constructible, but not copy-assignable or copy-constructible. Move them or use const-ref's.
#358 link_t should have a device_id()
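
For context on #359: in the type-erased convention used by the CUDA driver's kernel launch call, each element of the void* array points at one argument. The following sketch shows how such an array can be assembled from typed arguments. The function make_argument_pointers is a hypothetical helper, not part of this library's interface.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: collects the addresses of the given (non-const lvalue)
// arguments into an array of void*, one pointer per argument, in order.
template <typename First, typename... Rest>
std::array<void*, 1 + sizeof...(Rest)> make_argument_pointers(First& first, Rest&... rest)
{
    return {{ static_cast<void*>(&first), static_cast<void*>(&rest)... }};
}

// Exercises the helper with two arguments of different types.
inline void demo_argument_pointers()
{
    int length = 3;
    float scale = 1.5f;
    auto pointers = make_argument_pointers(length, scale);
    assert(pointers.size() == 2);
    assert(*static_cast<int*>(pointers[0]) == 3);
}
```

The arguments must outlive any use of the pointer array, since the array stores only their addresses.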

Miscellaneous and internal issues

#367 Avoiding a redundant scoped context setting when enqueuing a kernel
#366 Spruced up CUDA_DEVICE_OR_THIS_SCOPE() and CUDA_CONTEXT_FOR_THIS_SCOPE()
#353 Added missing PCI function initializer to the PCI location wrappers class.
#352 Simplified the options marshalling code
#349 Prefix CMake options with CAW_, for use as a subproject (e.g. FetchContent)
#346 Fix CUDA installation in GitHub action scripts
#326 Drop redundant inclusions and make include order more "challenging" in vectorAdd examples
#328 Reduce gratuitous API calls in current_device::detail::set()
#331 Can now load a module from file into any context, not just the current context
#334 Reduce the number of redundant informative API calls
#333 Don't treat freeing in a destroyed context as an error
#303 Use CUDA_VERSION instead of CUDART_VERSION
#370 cuda::context::current::exists() now returns false, rather than throwing, if the CUDA driver has not been initialized
#373 In Debug builds, now validating launch configuration grid dimensions before enqueueing/launching anything (as CUDA tends to fail silently, e.g. for empty grids)

Caveats

Continuous build testing on Windows is failing on GitHub Actions due to trouble with CMake detecting the NVTX path. Assistance from users on this matter would be appreciated.


09 May 21:35

eyalroz

v0.5.1

b2d2c22

Version 0.5.1: Fully header-only, launch config builder

Changes since v0.5.0:

Build mechanism

#307 The library is now entirely header-only (the NVTX wrappers, which used to be compiled, are now all within headers).

New supported features

#308 Supporting both narrow/regular and wide character inputs for NVRTC compilation.
#309 Support for naming streams, devices and events with NVTX

Concepts/facilities introduced

#311 A Builder-pattern class for building launch configurations more easily.
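
As a rough illustration of the Builder pattern applied to launch configurations, here is a self-contained sketch. Every name in it (demo_launch_config, demo_launch_config_builder and its methods) is hypothetical; the library's actual builder has its own interface, which is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical, one-dimensional launch configuration.
struct demo_launch_config {
    unsigned    grid_size;           // number of blocks
    unsigned    block_size;          // threads per block
    std::size_t shared_memory_bytes; // dynamic shared memory per block
};

// Hypothetical builder: set what you know, let build() derive the rest.
class demo_launch_config_builder {
public:
    demo_launch_config_builder& overall_size(std::size_t size) { overall_size_ = size; return *this; }
    demo_launch_config_builder& block_size(unsigned size) { block_size_ = size; return *this; }
    demo_launch_config_builder& shared_memory(std::size_t bytes) { shared_memory_bytes_ = bytes; return *this; }

    demo_launch_config build() const
    {
        if (overall_size_ == 0 || block_size_ == 0) {
            throw std::invalid_argument("overall size and block size must both be positive");
        }
        // Round up, so that the grid covers all of the requested threads.
        std::size_t blocks = (overall_size_ + block_size_ - 1) / block_size_;
        return { static_cast<unsigned>(blocks), block_size_, shared_memory_bytes_ };
    }

private:
    std::size_t overall_size_ = 0;
    unsigned    block_size_ = 0;
    std::size_t shared_memory_bytes_ = 0;
};

// Exercises the builder: 1000 threads in blocks of 256 need 4 blocks.
inline void demo_builder()
{
    auto config = demo_launch_config_builder{}.overall_size(1000).block_size(256).build();
    assert(config.grid_size == 4);
}
```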

Compatibility

#304 : Now compatible with all CUDA versions between 9.0 and 11.6

Bug fixes

#320 No longer getting an error message about module::create() when including only runtime_api.hpp.
#317 No longer "leaking" references to device primary contexts which made them never be destroyed after some point. Fixing this exposed a few other latent issues involving non-existence of primary contexts: #316.
#314 No longer failing to enqueue events when there is no current context.
#305, #306 :
- Added missing named errors to cuda::status
- Now using driver error codes wherever applicable (they only started to coincide with Runtime API error codes in a recent CUDA version)
- Renamed mis-named error: cuda::status::not_ready -> cuda::status::async_operations_not_yet_completed.
#315 In one of the example programs, we were launching a kernel on the current device rather than the one the user had chosen.

Miscellaneous and internal issues

#310 NVTX wrapper now uses driver-API-style
#303 Using CUDA_VERSION instead of CUDA_RT_VERSION where relevant.
#320 Added an example program only explicitly including runtime-API-related headers.
#321 Weakened requirement from kernel parameter types from TriviallyCopyable to just being trivially copy-constructible.
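
The distinction drawn in #321 can be seen with a type whose copy construction is trivial but which is not TriviallyCopyable, for example because it has a user-provided copy assignment operator. The sketch below uses standard type traits to show this. The type counting_param and the function demo_usable_as_kernel_parameter are hypothetical, and the library's actual check may be written differently.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical parameter type: the user-provided copy assignment operator makes
// it non-TriviallyCopyable, yet its implicit copy constructor is still trivial.
struct counting_param {
    int value;
    int times_assigned;

    counting_param& operator=(const counting_param& other)
    {
        value = other.value;
        ++times_assigned;
        return *this;
    }
};

// Hypothetical check in the spirit of the weakened requirement.
template <typename T>
constexpr bool demo_usable_as_kernel_parameter()
{
    return std::is_trivially_copy_constructible<T>::value;
}

static_assert(!std::is_trivially_copyable<counting_param>::value,
    "the stricter requirement rejects this type");
static_assert(demo_usable_as_kernel_parameter<counting_param>(),
    "the weaker requirement accepts it");

// Copy construction is all that passing by value needs.
inline void demo_copy_param()
{
    counting_param original{7, 0};
    counting_param copy(original);
    assert(copy.value == 7);
}
```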

Caveats

Windows support is partially broken in this version.


19 Feb 18:01

eyalroz

v0.5.0

d793566

v0.5.0: Rewrite, Driver+Runtime API+NVRTC coverage

This is a near-complete under-the-hood rewrite of the API wrappers library, while maintaining its existing API almost entirely: The library now primarily relies on CUDA Driver API calls, with Runtime API calls used only where the driver does not straightforwardly provide the same functionality.

If you are only interested in the Runtime API, you may wish to use the latest 0.4.x release. At the moment, that is 0.4.7.

Fundamental feature set additions

#9 Driver API support
#228, #262 : NVRTC support

Wrapper classes introduced

Contexts: context_t.
Dynamically vs. statically compiled kernels: kernel_t and apriori_compiled_kernel_t
Device primary contexts: device::primary_context_t
link_t: Linking together compiled code to satisfy symbol definition requirements and complete executables.
link_options_t defining options for linking.
Virtual memory: physical_allocation_t, address_range_reservation_t and mapping_t between pairs of the former.
Modules: module_t, made up of compiled binary/PTX code - functions, global symbols etc - which may be loaded into contexts

and via NVRTC support:

Programs: rtc::program_t, made up of CUDA or PTX source code.
Compilation options: rtc::compilation_options_t, defining options for compiling programs.

(All of the classes above are under the cuda:: namespace)

Concepts/facilities introduced

Treatment of the primary context as a context and its creation or destruction
The context stack
The current context
Waiting on the value of a scalar in global device memory
Access by specific contexts to specific contexts of peer devices

Caveats

Windows support is partially broken in this version.


12 Mar 10:34

eyalroz

v0.4.7

3fc9c4c

Version 0.4.7: Minor changes

This version has very few changes relative to 0.4.6. These are:

Bug fixes

#301 : Now ensuring launch configurations can be assigned to each other.

Note: Users' help is kindly requested in preparing for the next major release, which will cover both the runtime and the driver API, and NVRTC as well. See this branch and contact me / open relevant issues.
