Mergeback 6.3 fixes #420

Merged (6 commits) on Nov 1, 2024
CHANGELOG.md: 13 changes (8 additions, 5 deletions)
@@ -1,22 +1,21 @@
 # Changelog for hipCUB
 
-Documentation for hipCUB is available at
-[https://rocm.docs.amd.com/projects/hipCUB/en/latest/](https://rocm.docs.amd.com/projects/hipCUB/en/latest/).
+Full documentation for hipCUB is available at [https://rocm.docs.amd.com/projects/hipCUB/en/latest/](https://rocm.docs.amd.com/projects/hipCUB/en/latest/).
 
 ## (Unreleased) hipCUB-x.x.x for ROCm 6.4.0
 
 ### Added
 * Added `ForEach`, `ForEachN`, `ForEachCopy`, `ForEachCopyN` and `Bulk` functions to have parity with CUB.
 * Added the `hipcub::CubVector` type for CUB parity.
 
-## (Unreleased) hipCUB-3.3.0 for ROCm 6.3.0
+## hipCUB-3.3.0 for ROCm 6.3.0
 
 ### Fixed
 
 * Not all headers in hipCUB included `config.hpp` which could have resulted in build errors.
 
 ### Added
-* Add support for large indices in `hipcub::DeviceSegmentedReduce::*`. rocPRIM's backend provides support for all reduce variants, but CUB's does not have support yet for `DeviceSegmentedReduce::Arg*`, so large indices support has been excluded for these as well in hipCUB.
+* Support for large indices in `hipcub::DeviceSegmentedReduce::*` has been added, with the exception of `DeviceSegmentedReduce::Arg*`. Although rocPRIM's backend provides support for all reduce variants, CUB does not support large indices in `DeviceSegmentedReduce::Arg*`. For this reason, large index support is not available for `hipcub::DeviceSegmentedReduce::Arg*`.
 * Add -t smoke option in rtest.py. It will run a subset of tests such that the total test time is in 5 minutes. Use python3 ./rtest.py --test smoke or python3 ./rtest.py -t smoke to execute smoke test.
 * Add inplace overloads of `DeviceScan` functions.
 * Add inplace overloads of `DeviceSelect::Flagged` and `DeviceSelect::If`.
@@ -27,7 +26,11 @@ Documentation for hipCUB is available at
 ### Changed
 * The NVIDIA backend now requires CUB, Thrust and libcu++ 2.4.0. If it is not found it will be downloaded from the NVIDIA CCCL repository.
 
-## (Unreleased) hipCUB-3.2.0 for ROCm 6.2.0
+### Resolved issues
+
+* Fixed an issue where `config.hpp` was not included in all hipCUB headers, resulting in build errors.
+
+## hipCUB-3.2.0 for ROCm 6.2.0
 
 ### Added
 * Add `DeviceCopy` function to have parity with CUB.
hipcub/include/hipcub/backend/rocprim/util_type.hpp: 22 changes (12 additions, 10 deletions)
@@ -547,12 +547,12 @@ struct BaseTraits<UNSIGNED_INTEGER, true, false, _UnsignedBits, T>
 
     static HIPCUB_HOST_DEVICE __forceinline__ UnsignedBits TwiddleIn(UnsignedBits key)
     {
-        return key_codec::encode(rocprim::detail::bit_cast<T>(key));
+        return key;
     }
 
     static HIPCUB_HOST_DEVICE __forceinline__ UnsignedBits TwiddleOut(UnsignedBits key)
     {
-        return key_codec::decode(rocprim::detail::bit_cast<T>(key));
+        return key;
     }
 
     static HIPCUB_HOST_DEVICE __forceinline__ T Max()
@@ -596,12 +596,12 @@ struct BaseTraits<SIGNED_INTEGER, true, false, _UnsignedBits, T>
 
     static HIPCUB_HOST_DEVICE __forceinline__ UnsignedBits TwiddleIn(UnsignedBits key)
     {
-        return key_codec::encode(rocprim::detail::bit_cast<T>(key));
+        return key ^ HIGH_BIT;
     };
 
     static HIPCUB_HOST_DEVICE __forceinline__ UnsignedBits TwiddleOut(UnsignedBits key)
     {
-        return key_codec::decode(rocprim::detail::bit_cast<T>(key));
+        return key ^ HIGH_BIT;
     };
 
     static HIPCUB_HOST_DEVICE __forceinline__ T Max()
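In this hunk, the `key_codec` call in the `SIGNED_INTEGER` traits is replaced with a plain XOR against `HIGH_BIT`. The sketch below shows why that is enough for radix sorting. It is a minimal standalone sketch that assumes 32-bit keys. The helpers `twiddle_in_signed`, `twiddle_out_signed`, `bits_of`, `value_of` and `sort_via_twiddle` are hypothetical illustrations, not hipCUB API.

Flipping the sign bit maps two's-complement values onto unsigned values in the same relative order. Because XOR is its own inverse, the same operation also undoes the mapping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the SIGNED_INTEGER TwiddleIn/TwiddleOut,
// assuming T = std::int32_t and UnsignedBits = std::uint32_t.
constexpr std::uint32_t kHighBit32 = std::uint32_t{1} << 31;

inline std::uint32_t twiddle_in_signed(std::uint32_t key) { return key ^ kHighBit32; }
inline std::uint32_t twiddle_out_signed(std::uint32_t key) { return key ^ kHighBit32; }

// Reinterpret the two's-complement bits of a signed key (a bit_cast substitute).
inline std::uint32_t bits_of(std::int32_t v) {
    std::uint32_t u;
    std::memcpy(&u, &v, sizeof u);
    return u;
}

inline std::int32_t value_of(std::uint32_t u) {
    std::int32_t v;
    std::memcpy(&v, &u, sizeof v);
    return v;
}

// Sort signed keys by comparing only their twiddled unsigned bits (as an
// unsigned radix sort would), then undo the twiddle on the way out.
inline std::vector<std::int32_t> sort_via_twiddle(const std::vector<std::int32_t>& keys) {
    std::vector<std::uint32_t> bits;
    bits.reserve(keys.size());
    for (std::int32_t k : keys) bits.push_back(twiddle_in_signed(bits_of(k)));
    std::sort(bits.begin(), bits.end());

    std::vector<std::int32_t> out;
    out.reserve(bits.size());
    for (std::uint32_t b : bits) out.push_back(value_of(twiddle_out_signed(b)));
    return out;
}
```

For example, `sort_via_twiddle({5, -1, 0, -42})` returns `{-42, -1, 0, 5}`, the same order `std::sort` gives on the signed values. `std::memcpy` stands in for `rocprim::detail::bit_cast` so the sketch stays plain C++11.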
@@ -695,12 +695,14 @@ struct BaseTraits<FLOATING_POINT, true, false, _UnsignedBits, T>
 
     static HIPCUB_HOST_DEVICE __forceinline__ UnsignedBits TwiddleIn(UnsignedBits key)
     {
-        return key_codec::encode(rocprim::detail::bit_cast<T>(key));
+        UnsignedBits mask = (key & HIGH_BIT) ? UnsignedBits(-1) : HIGH_BIT;
+        return key ^ mask;
     };
 
     static HIPCUB_HOST_DEVICE __forceinline__ UnsignedBits TwiddleOut(UnsignedBits key)
     {
-        return key_codec::decode(rocprim::detail::bit_cast<T>(key));
+        UnsignedBits mask = (key & HIGH_BIT) ? HIGH_BIT : UnsignedBits(-1);
+        return key ^ mask;
     };
 
     static HIPCUB_HOST_DEVICE __forceinline__ T Max() {
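For `FLOATING_POINT` keys, the new code picks its XOR mask based on the sign bit. The sketch below is minimal and assumes `T = float` with IEEE-754 binary32 layout and `UnsignedBits = std::uint32_t`. All helper names in it are hypothetical.

- **Negative values:** every bit is flipped. This reverses their magnitude order and clears the sign bit.
- **Non-negative values:** only the sign bit is set, which places them above all negatives.
- **`TwiddleOut`:** reads the high bit of the twiddled key to choose the inverse mask.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the FLOATING_POINT TwiddleIn/TwiddleOut,
// assuming T = float (IEEE-754 binary32) and UnsignedBits = std::uint32_t.
constexpr std::uint32_t kFloatSignBit = std::uint32_t{1} << 31;
constexpr std::uint32_t kAllOnes = static_cast<std::uint32_t>(-1);  // UnsignedBits(-1)

// Sign bit set: flip every bit. Sign bit clear: flip only the sign bit.
inline std::uint32_t twiddle_in_float(std::uint32_t key) {
    std::uint32_t mask = (key & kFloatSignBit) ? kAllOnes : kFloatSignBit;
    return key ^ mask;
}

// After twiddling, a set high bit means the original key was non-negative.
inline std::uint32_t twiddle_out_float(std::uint32_t key) {
    std::uint32_t mask = (key & kFloatSignBit) ? kFloatSignBit : kAllOnes;
    return key ^ mask;
}

inline std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

inline float bits_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// True when the twiddled unsigned keys order a and b the same way as float <.
inline bool same_order(float a, float b) {
    return (a < b) == (twiddle_in_float(float_bits(a)) < twiddle_in_float(float_bits(b)));
}
```

This sketch gives `-0.0f` and `+0.0f` different twiddled keys, with `-0.0f` ordered first, even though the two compare equal as floats. This diff does not show whether hipCUB's radix sort adds any handling of signed zeros or NaNs beyond this mask.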
@@ -751,12 +753,12 @@ struct NumericTraits<__uint128_t>
 
     static __host__ __device__ __forceinline__ UnsignedBits TwiddleIn(UnsignedBits key)
     {
-        return key_codec::encode(rocprim::detail::bit_cast<T>(key));
+        return key;
     }
 
     static __host__ __device__ __forceinline__ UnsignedBits TwiddleOut(UnsignedBits key)
     {
-        return key_codec::decode(rocprim::detail::bit_cast<T>(key));
+        return key;
     }
 
     static __host__ __device__ __forceinline__ T Max()
@@ -788,12 +790,12 @@ struct NumericTraits<__int128_t>
 
     static __host__ __device__ __forceinline__ UnsignedBits TwiddleIn(UnsignedBits key)
     {
-        return key_codec::encode(rocprim::detail::bit_cast<T>(key));
+        return key ^ HIGH_BIT;
     };
 
     static __host__ __device__ __forceinline__ UnsignedBits TwiddleOut(UnsignedBits key)
     {
-        return key_codec::decode(rocprim::detail::bit_cast<T>(key));
+        return key ^ HIGH_BIT;
     };
 
     static __host__ __device__ __forceinline__ T Max()
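The 128-bit specializations use the same sign-bit flip as the narrower signed types. The sketch below makes the following assumptions:

- The toolchain is GCC or Clang and provides `__int128_t`/`__uint128_t`.
- `HIGH_BIT` is the most significant bit of `UnsignedBits`. Its definition lies outside this diff.

The names `high_bit`, `twiddle_signed`, `to_bits` and `twiddled_less` are hypothetical.

```cpp
#include <cassert>
#include <climits>

// Assumes a GCC/Clang toolchain providing the 128-bit integer extensions.
using u128 = __uint128_t;
using i128 = __int128_t;

// Assumed meaning of HIGH_BIT: the most significant bit of UnsignedBits.
template <typename UnsignedBits>
constexpr UnsignedBits high_bit() {
    return UnsignedBits{1} << (sizeof(UnsignedBits) * CHAR_BIT - 1);
}

// Same shape as the signed TwiddleIn/TwiddleOut: XOR with the sign bit.
template <typename UnsignedBits>
constexpr UnsignedBits twiddle_signed(UnsignedBits key) {
    return key ^ high_bit<UnsignedBits>();
}

// Signed-to-unsigned conversion is modular, so this yields the two's-complement bits.
inline u128 to_bits(i128 v) { return static_cast<u128>(v); }

// Compare two signed 128-bit values using only their twiddled unsigned bits.
inline bool twiddled_less(i128 a, i128 b) {
    return twiddle_signed(to_bits(a)) < twiddle_signed(to_bits(b));
}
```

`twiddled_less` agrees with `<` on signed values because the XOR moves negatives below `HIGH_BIT` and non-negatives above it. Within each group, relative order is preserved.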
scripts/copyright-date/check-copyright.sh: 1 change (0 additions, 1 deletion)
@@ -61,7 +61,6 @@ if $forkdiff; then
     source_commit="remotes/$remote/HEAD"
 
     # don't use fork-point for finding fork point (lol)
-    # see: https://stackoverflow.com/a/53981615
     diff_hash="$(git merge-base "$source_commit" "$branch")"
 fi
 