6.4 version fix and mergeback 6.3 hotfixes (#431)

* Revert Bit Twiddle change from PR #377 (#397) An update to the TwiddleIn/Out functions from PR #377 seems to be causing a build failure in onnxruntime. This change reverts the single commit (0721c2c) that made those changes. We can re-apply the change with an appropriate fix in the future. Note: the commits in the PR were squashed, so that commit will not show up in the log. * Remove website URL from comments (#398) Referencing or using code from some websites is prohibited in this repository. This change removes an informational reference in the comments. * Add gfx1151 target (#399) (#401) Co-authored-by: Stanley Tsang <[email protected]> * Spolifroni amd/624 changelogcleanup upcoming (#411) * edited to conform to standards * edited to conform to standards * updated the changelog for 6.3 (#418) * added support gfx1151 and gfx12 to default gpu list * updated changelog * fixed minor grammar mistakes in changelog * Update CHANGELOG.md Co-authored-by: spolifroni-amd <[email protected]> * Update CHANGELOG.md Co-authored-by: spolifroni-amd <[email protected]> * Remove gfx940,gfx941 targets (#424) * Update version for 6.4 * Add extended tests changelog entry --------- Co-authored-by: Wayne Franz <[email protected]> Co-authored-by: amd-garydeng <[email protected]> Co-authored-by: spolifroni-amd <[email protected]> Co-authored-by: NguyenNhuDi <[email protected]> Co-authored-by: Val Movsik <[email protected]>
ROCm · Nov 20, 2024 · 8eb0a34 · 8eb0a34
1 parent a113f98
commit 8eb0a34
Show file tree

Hide file tree

Showing 3 changed files with 15 additions and 18 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,10 +2,11 @@
 
 Full documentation for hipCUB is available at [https://rocm.docs.amd.com/projects/hipCUB/en/latest/](https://rocm.docs.amd.com/projects/hipCUB/en/latest/).
 
-## (Unreleased) hipCUB-x.x.x for ROCm 6.4.0
+## hipCUB-3.4.0 for ROCm 6.4.0
 
 ### Added
 * Added regression tests to `rtest.py`. These tests recreate scenarios that have caused hardware problems in past emulation environments. Use `python rtest.py [--emulation|-e|--test|-t]=regression` to run these tests.
+* Added extended tests to `rtest.py`. These tests are extra tests that did not fit the criteria of smoke and regression tests. These tests will take much longer to run relative to smoke and regression tests. Use `python rtest.py [--emulation|-e|--test|-t]=extended` to run these tests.
 * Added `ForEach`, `ForEachN`, `ForEachCopy`, `ForEachCopyN` and `Bulk` functions to have parity with CUB.
 * Added the `hipcub::CubVector` type for CUB parity.
 * Added `--emulation` option for `rtest.py`
@@ -18,25 +19,19 @@ Full documentation for hipCUB is available at [https://rocm.docs.amd.com/project
 * The NVIDIA backend now requires CUB, Thrust and libcu++ 2.5.0. If it is not found it will be downloaded from the NVIDIA CCCL repository.
 * Changed the C++ version from 14 to 17. C++14 will be deprecated in the next major release.
 
-## hipCUB-3.3.0 for ROCm 6.3.0
-
-### Fixed
-* Not all headers in hipCUB included `config.hpp` which could have resulted in build errors.
+## hipCUB 3.3.0 for ROCm 6.3.0
 
 ### Added
+
 * Support for large indices in `hipcub::DeviceSegmentedReduce::*` has been added, with the exception of `DeviceSegmentedReduce::Arg*`. Although rocPRIM's backend provides support for all reduce variants, CUB does not support large indices in `DeviceSegmentedReduce::Arg*`. For this reason, large index support is not available for `hipcub::DeviceSegmentedReduce::Arg*`.
-* Add -t smoke option in rtest.py. It will run a subset of tests such that the total test time is in 5 minutes. Use python3 ./rtest.py --test smoke or python3 ./rtest.py -t smoke to execute smoke test.
-* Add inplace overloads of `DeviceScan` functions.
-* Add inplace overloads of `DeviceSelect::Flagged` and `DeviceSelect::If`.
-* Add `DeviceReduce::TransformReduce`.
-* Add `DeviceSelect::UniqueByKey` overload with `equality_op`.
-* Add support for large indices in `DeviceSelect::UniqueByKey`.
 
 ### Changed
-* The NVIDIA backend now requires CUB, Thrust and libcu++ 2.4.0. If it is not found it will be downloaded from the NVIDIA CCCL repository.
 
-### Resolved issues
+* Changed the default value of `rmake.py -a` to `default_gpus`. This is equivalent to `gfx906:xnack-,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201`.
+* The NVIDIA backend now requires CUB, Thrust, and libcu++ 2.3.2.
 
+### Resolved issues
+* Fixed an issue in `rmake.py` where the list storing cmake options would contain individual characters instead of a full string of options.
 * Fixed an issue where `config.hpp` was not included in all hipCUB headers, resulting in build errors.
 
 ## hipCUB-3.2.0 for ROCm 6.2.0

diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -91,11 +91,11 @@ if(NOT (CMAKE_CXX_COMPILER MATCHES ".*nvcc$" OR "${CMAKE_CXX_COMPILER_ID}" STREQ
     if(BUILD_ADDRESS_SANITIZER)
       # ASAN builds require xnack
       rocm_check_target_ids(DEFAULT_AMDGPU_TARGETS
-        TARGETS "gfx908:xnack+;gfx90a:xnack+;gfx940:xnack+;gfx941:xnack+;gfx942:xnack+"
+        TARGETS "gfx908:xnack+;gfx90a:xnack+;gfx942:xnack+"
       )
     else()
       rocm_check_target_ids(DEFAULT_AMDGPU_TARGETS
-        TARGETS "gfx803;gfx900:xnack-;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx940;gfx941;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1151;gfx1200;gfx1201"
+        TARGETS "gfx803;gfx900:xnack-;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1151;gfx1200;gfx1201"
       )
     endif()
     set(GPU_TARGETS "${DEFAULT_AMDGPU_TARGETS}" CACHE STRING "GPU architectures to compile for" FORCE)
@@ -115,7 +115,7 @@ if(BUILD_ADDRESS_SANITIZER)
 endif()
 
 # Setup VERSION
-set(VERSION_STRING "3.3.0")
+set(VERSION_STRING "3.4.0")
 rocm_setup_version(VERSION ${VERSION_STRING})
 
 # Print configuration summary

diff --git a/rmake.py b/rmake.py
@@ -20,6 +20,8 @@ def parse_args():
     parser = argparse.ArgumentParser(description="""
     Checks build arguments
     """)
+    default_gpus = 'gfx906:xnack-,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201'
+
     parser.add_argument('-g', '--debug', required=False, default=False,  action='store_true',
                         help='Generate Debug build (default: False)')
     parser.add_argument(      '--build_dir', type=str, required=False, default="build",
@@ -35,7 +37,7 @@ def parse_args():
                         help='Install after build (default: False)')
     parser.add_argument(      '--cmake-darg', required=False, dest='cmake_dargs', action='append', default=[],
                         help='List of additional cmake defines for builds (e.g. CMAKE_CXX_COMPILER_LAUNCHER=ccache)')
-    parser.add_argument('-a', '--architecture', dest='gpu_architecture', required=False, default="gfx906;gfx1030;gfx1100;gfx1101;gfx1102", #:sramecc+:xnack-" ) #gfx1030" ) #gfx906" ) # gfx1030" )
+    parser.add_argument('-a', '--architecture', dest='gpu_architecture', required=False, default=default_gpus, #:sramecc+:xnack-" ) #gfx1030" ) #gfx906" ) # gfx1030" )
                         help='Set GPU architectures, e.g. all, gfx000, gfx803, gfx906:xnack-;gfx1030;gfx1100 (optional, default: all)')                        
     parser.add_argument('-v', '--verbose', required=False, default=False, action='store_true',
                         help='Verbose build (default: False)')
@@ -115,7 +117,7 @@ def config_cmd():
         else:
           cmake_executable = "cmake"
         toolchain = "toolchain-linux.cmake"
-        cmake_platform_opts = f"-DROCM_DIR:PATH={rocm_path} -DCPACK_PACKAGING_INSTALL_PREFIX={rocm_path}"
+        cmake_platform_opts = [f"-DROCM_DIR:PATH={rocm_path}", f"-DCPACK_PACKAGING_INSTALL_PREFIX={rocm_path}"]
 
     tools = f"-DCMAKE_TOOLCHAIN_FILE={toolchain}"
     cmake_options.append( tools )