0.4.2: Bug fixes, compatibility improvements, range-for over devices

eyalroz released this 24 Feb 13:20

This is a minor release, consisting mostly of bug fixes and compatibility improvements. Apart from its version number, it is identical to 0.4.1, which was retracted due to a version numbering issue.

Changes since 0.4:

  • All devices can now be accessed as a range: for (auto device : cuda::devices()) { /* etc. */ }
  • Wrapper classes (specifically, events and streams) now have non-owning copy constructors.
  • A stream priority range is now its own class.
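
To illustrate what range-for access over devices involves, here is a minimal, self-contained sketch. It does not use the library or the CUDA runtime: the `sketch` namespace, `device_t`, `device_range`, and the `devices(int)` function taking an explicit count are all hypothetical stand-ins, written only to show the iterator interface a range-for loop needs. The library's actual `cuda::devices()` takes no arguments and may be implemented quite differently.

```cpp
#include <cassert>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a device wrapper; it holds only a numeric id.
struct device_t {
    int id;
};

// Hypothetical range over device ids 0 .. count-1.
class device_range {
public:
    // Minimal iterator: just the operations a range-for loop requires.
    class iterator {
    public:
        explicit iterator(int id) : id_(id) {}
        device_t operator*() const { return device_t{id_}; }
        iterator& operator++() { ++id_; return *this; }
        bool operator!=(const iterator& other) const { return id_ != other.id_; }
    private:
        int id_;
    };

    explicit device_range(int count) : count_(count) { assert(count >= 0); }
    iterator begin() const { return iterator{0}; }
    iterator end() const { return iterator{count_}; }

private:
    int count_;
};

// Assumption: the device count is passed in explicitly here,
// rather than being queried from a driver or runtime.
inline device_range devices(int count) { return device_range{count}; }

} // namespace sketch

// Gathers the ids of all devices using a range-for loop.
std::vector<int> collect_device_ids(int device_count) {
    std::vector<int> ids;
    for (auto device : sketch::devices(device_count)) {
        ids.push_back(device.id);
    }
    return ids;
}
```

Calling `collect_device_ids(3)` in this sketch yields the ids 0, 1 and 2; with a count of zero the loop body never runs.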

Bug fixes:

  • Dropped invalid stream-priority-related constant.
  • The device management test was getting the direction of priority ranges backwards.
  • The p2pBandwidthLatencyTest example program was failing with cross-device event wait attempts, due to calling wait() and record() on the wrong stream.
  • Removed a spurious template specifier in device.hpp.
  • Can now construct cuda::launch_configuration_t from two integers with C++14 and later.
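
The priority-range direction mentioned above is easy to get backwards, because in CUDA a numerically lower value means a higher stream priority, so the "greatest" priority is the smaller number. The sketch below illustrates that convention only; the `sketch` namespace, `priority_range`, `make_range`, and `outranks` are hypothetical names for this example, not the library's own stream priority range class.

```cpp
#include <cassert>

namespace sketch {

using priority_t = int;

// Hypothetical range type. Following the CUDA convention, `greatest`
// is numerically less than or equal to `least`.
struct priority_range {
    priority_t least;
    priority_t greatest;

    // True when only a single priority level is available.
    bool is_trivial() const { return least == greatest; }

    // True when p lies within the range, i.e. greatest <= p <= least.
    bool contains(priority_t p) const { return greatest <= p && p <= least; }
};

// Builds a range, checking that the direction is not reversed.
inline priority_range make_range(priority_t least, priority_t greatest) {
    assert(greatest <= least);
    return priority_range{least, greatest};
}

// True when priority a is higher than priority b (numerically smaller).
inline bool outranks(priority_t a, priority_t b) { return a < b; }

} // namespace sketch
```

For example, with `least` equal to 0 and `greatest` equal to -1, the value -1 outranks 0, and a test that assumed the opposite ordering would reject a valid range.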

Build, compatibility, usability:

  • CMake versions 3.18 and later no longer complain about a missing CUDA_ARCHITECTURES value.
  • Should now be compatible with MSVC 16.8 on Windows.