0.4.2: Bug fixes, compatibility improvements, range-for over devices
This is a minor release consisting mostly of bug fixes and compatibility improvements. Apart from its version number, it is identical to 0.4.1, which was retracted due to a version-numbering issue.
Changes since 0.4:
- Can now access all devices as a range:
  for (auto device : cuda::devices()) { /* etc. etc. */ }
- Wrapper classes (specifically, events and streams) now have non-owning copy constructors.
- A stream priority range is now its own class.
Bug fixes:
- Dropped an invalid stream-priority-related constant.
- The device management test was getting the direction of priority ranges backwards.
- The p2pBandwidthLatencyTest example program was failing on cross-device event waits, because it called wait() and record() on the wrong stream.
- Removed a spurious template specifier in device.hpp.
- Can now construct cuda::launch_configuration_t from two integers with C++14 and later.
Build, compatibility, usability:
- CMake 3.18 and later no longer complain about the lack of a CUDA_ARCHITECTURES value.
- Should now be compatible with MSVC 16.8 on Windows.