Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Develop Stream 2024-07-01 #377

Merged
merged 27 commits into from
Aug 3, 2024

Conversation

Beanavil
Copy link
Contributor

@Beanavil Beanavil commented Jul 1, 2024

  • Added missing thread test to CMakeLists.txt
  • Added missing includes of config.hpp
  • Added missing tests for half and bfloat16 to hipCUB's WarpReduce and Scan tests
  • Fixed test_hipcub_thread.cpp build errors with half and bfloat16 types
  • Added missing tests for half and bfloat16 to hipCUB's Thread test
  • Added IteratorCategory to rocPRIM's backend
  • Added large index support and test for hipCUB's DeviceSegmentedReduce
  • Updated TwiddleIn/Out implementation (for rocPRIM's backend) to use rocPRIM's radix_key_codec

@stanleytsang-amd
Copy link
Collaborator

For some reason, the CI builds are failing because the inline assembly for thread_load.hpp aren't being formed correctly:

In file included from ../../../../test/hipcub/test_hipcub_thread.cpp:30:
In file included from ../../../../hipcub/include/hipcub/thread/thread_load.hpp:39:
../../../../hipcub/include/hipcub/thread/../backend/rocprim/thread/thread_load.hpp:109:1: error: invalid instruction
  109 | HIPCUB_ASM_THREAD_LOAD_GROUP(LOAD_CV, "glc", "s_waitcnt", "vmcnt");
      | ^
../../../../hipcub/include/hipcub/thread/../backend/rocprim/thread/thread_load.hpp:87:5: note: expanded from macro 'HIPCUB_ASM_THREAD_LOAD_GROUP'
   87 |     HIPCUB_ASM_THREAD_LOAD(cache_modifier, llvm_cache_modifier, uint64_t, uint64_t, flat_load_dwordx2, v, wait_inst, wait_cmd); \
      |     ^
../../../../hipcub/include/hipcub/thread/../backend/rocprim/thread/thread_load.hpp:72:70: note: expanded from macro 'HIPCUB_ASM_THREAD_LOAD'
   72 |         asm volatile(#asm_operator " %0, %1 " llvm_cache_modifier "\n\t"                                     \
      |                                                                      ^
<inline asm>:2:2: note: instantiated into assembly here
    2 |         s_waitcntvmcnt(0)

However, I am unable to reproduce this compile time error with the same ROCM build used in our CI, nor in even newer builds. It's bizarre. I think we can merge in this PR, and if this error is legitimate, then it should show itself again in other environments and we can attempt to reproduce the problem then.

@stanleytsang-amd
Copy link
Collaborator

@Beanavil @Naraenda Upon consultation with @lawruble13 and looking closely at the compile error log:

  1. The CI system works by merging the PR into develop (which uses the same merge mechanism as GitHub), and then building and testing on the result of the merge
  2. It would appear as though the merge is not being applied as expected, as looking at that compile error above, it's referencing a line of code that doesn't exist in this PR, but does exist in the develop branch version of the file
  3. This might be resolved by merging develop into your branch or rebasing your branch on top of develop and making sure the file contents are in order. Can you do that ASAP and we'll fire off another CI build.

@Beanavil Beanavil force-pushed the develop_stream_20240701 branch from 17b72bd to 0399aa6 Compare July 31, 2024 17:59
@Beanavil
Copy link
Contributor Author

@stanleytsang-amd I just rebased the branch 👍🏼

@stanleytsang-amd
Copy link
Collaborator

@stanleytsang-amd I just rebased the branch 👍🏼

Wow, that was quick. Thanks! I'll rerun CI now and keep you posted.

@stanleytsang-amd
Copy link
Collaborator

Further analysis revealed that a change from 41f6679 seems to be the trigger - namely the moving of the \t and the removal of the hash symbol as part of a fix required for gfx12. It compiled and functioned fine in past testing, but newer compiler builds seem to have changed that, resulting in the above build error.
This PR fixes that, we'll get that merged in ASAP and then rerun CI for this PR. @Beanavil I think another rebase for this PR shouldn't be required, but I'll let you know if otherwise.

@stanleytsang-amd
Copy link
Collaborator

stanleytsang-amd commented Aug 1, 2024

@Beanavil @Naraenda #390 fixed the build error, but it's now exposed a potential problem with the DeviceSegmentedReduce tests, they appear to be hanging on our CI on multiple platforms. It doesn't say which test, but probably it is the new LargeIndices test? What is the expected GPU memory usage of that test?

The reason I ask is because both of the failing GPUs so far have 32GB of memory, and while I unfortunately don't have access to a MI100 or Navi32 at the moment to test, I have tested on a 48GB Navi31 and a MI300 with 192GB of memory and both execute the DeviceSegmentedReduce tests just fine. The MI200 pipeline on CI is still waiting for the next available executor, I'll check back later.

I'll see if I can find a MI100 or Navi32 to work on in the meantime.

@stanleytsang-amd
Copy link
Collaborator

stanleytsang-amd commented Aug 2, 2024

It is also hanging on MI200, so the hang doesn't seem 100% correlated with GPU memory capacity/usage. However, the I just tried the test on Navi32 with the same ROCm build and it passed...

@Beanavil @Naraenda Assuming we can't get this resolved before the end of the week, I'm considering merging in rocPRIM PR 575 tomorrow (and rocThrust) if the rocPRIM build issue is resolved (see my latest comment in the rocPRIM PR) and all tests pass, to get a head start on deeper testing for rocPRIM. I don't see explicit mention of this PR being dependent on rocPRIM PR 575, is that actually the case? Can we safely merge in rocPRIM without breaking hipCUB?

@Beanavil
Copy link
Contributor Author

Beanavil commented Aug 2, 2024

@stanleytsang-amd sorry for the late reply:

I'll see if I can find a MI100 or Navi32 to work on in the meantime.

In our CI we test MI100 (also Vega20 and V620), and all the tests passed successfully.

It is also hanging on MI200, so the hang doesn't seem 100% correlated with GPU memory capacity/usage. However, the I just tried the test on Navi32 with the same ROCm build and it passed...

Hmm, could we get some more details about the platforms on which it's hanging? I'll see if I can reproduce it within the docker image you mentioned in the meantime.

if the rocPRIM build issue is resolved (see my latest comment in the rocPRIM PR) and all tests pass, to get a head start on deeper testing for rocPRIM. I don't see explicit mention of this PR being dependent on rocPRIM PR 575, is that actually the case? Can we safely merge in rocPRIM without breaking hipCUB?

Let me double-check this just to be sure. Update: it builds fine 👍🏼

@stanleytsang-amd
Copy link
Collaborator

@stanleytsang-amd sorry for the late reply:

I'll see if I can find a MI100 or Navi32 to work on in the meantime.

In our CI we test MI100 (also Vega20 and V620), and all the tests passed successfully.

It is also hanging on MI200, so the hang doesn't seem 100% correlated with GPU memory capacity/usage. However, the I just tried the test on Navi32 with the same ROCm build and it passed...

Hmm, could we get some more details about the platforms on which it's hanging? I'll see if I can reproduce it within the docker image you mentioned in the meantime.

if the rocPRIM build issue is resolved (see my latest comment in the rocPRIM PR) and all tests pass, to get a head start on deeper testing for rocPRIM. I don't see explicit mention of this PR being dependent on rocPRIM PR 575, is that actually the case? Can we safely merge in rocPRIM without breaking hipCUB?

Let me double-check this just to be sure. Update: it builds fine 👍🏼

Thanks for verifying the independence of this hipCUB PR from the rocPRIM PR.

The Linux platforms we test on are:

  1. rhel9 && gfx1101
  2. sles15sp1 && gfx908
  3. ubuntu22 && gfx90a

The docker image I sent you yesterday is the same base image used in 1).
I've created #391 so that we can capture the exact test case that's failing in Jenkins.

@stanleytsang-amd
Copy link
Collaborator

This is a bit unorthodox but I think we will merge in this PR, in order to expose it to a larger set of testing environments - if the hang is indeed legitimate, it's my hope that one of these environments will actually allow us to reproduce the hang locally.

@stanleytsang-amd stanleytsang-amd merged commit 3b4f714 into ROCm:develop Aug 3, 2024
5 of 10 checks passed
umfranzw added a commit to umfranzw/hipCUB that referenced this pull request Aug 26, 2024
An update to the TwiddleIn/Out functions from PR ROCm#377 seems to be causing
a build failure in onnxruntime. This change reverts the single commit
(0721c2c) that made those changes.
We can re-apply the change with an appropriate fix in the future.
Note: the commits in the PR were squashed, so that commit will not
show up in the log.
umfranzw added a commit to umfranzw/hipCUB that referenced this pull request Aug 26, 2024
An update to the TwiddleIn/Out functions from PR ROCm#377 seems to be causing
a build failure in onnxruntime. This change reverts the single commit
(0721c2c) that made those changes.
We can re-apply the change with an appropriate fix in the future.
Note: the commits in the PR were squashed, so that commit will not
show up in the log.
umfranzw added a commit that referenced this pull request Aug 26, 2024
An update to the TwiddleIn/Out functions from PR #377 seems to be causing
a build failure in onnxruntime. This change reverts the single commit
(0721c2c) that made those changes.
We can re-apply the change with an appropriate fix in the future.
Note: the commits in the PR were squashed, so that commit will not
show up in the log.
umfranzw added a commit that referenced this pull request Nov 1, 2024
* Revert Bit Twiddle change from PR #377 (#397)

An update to the TwiddleIn/Out functions from PR #377 seems to be causing
a build failure in onnxruntime. This change reverts the single commit
(0721c2c) that made those changes.
We can re-apply the change with an appropriate fix in the future.
Note: the commits in the PR were squashed, so that commit will not
show up in the log.

* Remove website URL from comments (#398)

Referencing or using code from some websites is prohibited in this repository.
This change removes an informational reference in the comments.

* Add gfx1151 target (#399) (#401)

Co-authored-by: Stanley Tsang <[email protected]>

* Spolifroni amd/624 changelogcleanup upcoming (#411)

* edited to conform to standards

* edited to conform to standards

* updated the changelog for 6.3 (#418)

---------

Co-authored-by: amd-garydeng <[email protected]>
Co-authored-by: Stanley Tsang <[email protected]>
Co-authored-by: spolifroni-amd <[email protected]>
NB4444 pushed a commit to StreamHPC/hipCUB that referenced this pull request Nov 20, 2024
* Revert Bit Twiddle change from PR ROCm#377 (ROCm#397)

An update to the TwiddleIn/Out functions from PR ROCm#377 seems to be causing
a build failure in onnxruntime. This change reverts the single commit
(0721c2c) that made those changes.
We can re-apply the change with an appropriate fix in the future.
Note: the commits in the PR were squashed, so that commit will not
show up in the log.

* Remove website URL from comments (ROCm#398)

Referencing or using code from some websites is prohibited in this repository.
This change removes an informational reference in the comments.

* Add gfx1151 target (ROCm#399) (ROCm#401)

Co-authored-by: Stanley Tsang <[email protected]>

* Spolifroni amd/624 changelogcleanup upcoming (ROCm#411)

* edited to conform to standards

* edited to conform to standards

* updated the changelog for 6.3 (ROCm#418)

---------

Co-authored-by: amd-garydeng <[email protected]>
Co-authored-by: Stanley Tsang <[email protected]>
Co-authored-by: spolifroni-amd <[email protected]>
stanleytsang-amd added a commit that referenced this pull request Nov 20, 2024
* Resolve "hipCUB compilation fails with latest rocPRIM changes"

* Rework device_histogram test

* Const-qualify half_t::operator+/*

* Update CUB/Thrust/libcu++ to 2.5.0

* Add tets for large number of items for hipcub::DeviceSelect::If

* Update example_device_radix_sort.cu

* fix: reset error code in device_radix_sort test after out-of-memory error

* Expose DeviceSelect::FlaggedIf

* Add test for DeviceSelect::FlaggedIf

* Add benchmark for DeviceSelect::FlaggedIf

* Set c++ version to 17 and create warning

* Fix ambiguous variable error

* Fix nodiscard warnings

* Set CI tests for both c++14 and 17

* Fix nodiscard warnings in example

* Examples clang-format

* Fixed clang format and dates

* temp fix: wrong error on cuda machines

* Ignore error from hipGetLastError to prevent warning

* Deleted declaration of unecessary hipError_t result

* Deleted declaration of unecessary hipError_t result

* Format amending

* Format amending

* Format amending

* Cleanup

* Formatting

* Added wrapper for BindTexture

* Fixed some issues in the test

* Removed unnecessary code

* Mergeback 6.3 fixes (#420)

* Revert Bit Twiddle change from PR #377 (#397)

An update to the TwiddleIn/Out functions from PR #377 seems to be causing
a build failure in onnxruntime. This change reverts the single commit
(0721c2c) that made those changes.
We can re-apply the change with an appropriate fix in the future.
Note: the commits in the PR were squashed, so that commit will not
show up in the log.

* Remove website URL from comments (#398)

Referencing or using code from some websites is prohibited in this repository.
This change removes an informational reference in the comments.

* Add gfx1151 target (#399) (#401)

Co-authored-by: Stanley Tsang <[email protected]>

* Spolifroni amd/624 changelogcleanup upcoming (#411)

* edited to conform to standards

* edited to conform to standards

* updated the changelog for 6.3 (#418)

---------

Co-authored-by: amd-garydeng <[email protected]>
Co-authored-by: Stanley Tsang <[email protected]>
Co-authored-by: spolifroni-amd <[email protected]>

* Add parameter to specify rocPRIM branch to use (#403)

* added link to documentation (#416)

---------

Co-authored-by: Balint Soproni <[email protected]>
Co-authored-by: Beatriz Navidad Vilches <[email protected]>
Co-authored-by: Saiyang Zhang <[email protected]>
Co-authored-by: Wayne Franz <[email protected]>
Co-authored-by: amd-garydeng <[email protected]>
Co-authored-by: Stanley Tsang <[email protected]>
Co-authored-by: spolifroni-amd <[email protected]>
Co-authored-by: Lauren Wrubleski <[email protected]>
stanleytsang-amd added a commit that referenced this pull request Nov 20, 2024
* Revert Bit Twiddle change from PR #377 (#397)

An update to the TwiddleIn/Out functions from PR #377 seems to be causing
a build failure in onnxruntime. This change reverts the single commit
(0721c2c) that made those changes.
We can re-apply the change with an appropriate fix in the future.
Note: the commits in the PR were squashed, so that commit will not
show up in the log.

* Remove website URL from comments (#398)

Referencing or using code from some websites is prohibited in this repository.
This change removes an informational reference in the comments.

* Add gfx1151 target (#399) (#401)

Co-authored-by: Stanley Tsang <[email protected]>

* Spolifroni amd/624 changelogcleanup upcoming (#411)

* edited to conform to standards

* edited to conform to standards

* updated the changelog for 6.3 (#418)

* added support gfx1151 and gfx12 to default gpu list

* updated changelog

* fixed minor grammar mistakes in changelog

* Update CHANGELOG.md

Co-authored-by: spolifroni-amd <[email protected]>

* Update CHANGELOG.md

Co-authored-by: spolifroni-amd <[email protected]>

* Remove gfx940,gfx941 targets (#424)

* Update version for 6.4

* Add extended tests changelog entry

---------

Co-authored-by: Wayne Franz <[email protected]>
Co-authored-by: amd-garydeng <[email protected]>
Co-authored-by: spolifroni-amd <[email protected]>
Co-authored-by: NguyenNhuDi <[email protected]>
Co-authored-by: Val Movsik <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants