Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIP compilation #39

Merged
merged 6 commits into from
Jul 28, 2024
Merged

Conversation

ryanstocks00
Copy link
Contributor

I had to make these changes (copying functions from builtin.cu to builtin.hip) in order to get ExchCXX to succesfully compile with EXCHCXX_ENABLE_HIP=True - is this expected?

@wavefunction91
Copy link
Owner

Hi @ryanstocks00, thanks for the patch. Yes, that should more or less be what needs to be done to fi xthe HIP build. The AMD port has fallen on the back burner to some other priorities, but happy to get things fixed for those that want to (try to) use it. Unfortunately, I don't have access to any AMD GPU hardware at the moment, so I'm unable to verify if this is a fix or not (and HIP is not a part of our CI pipeline... mea culpa).

If you can provide me with a successful unit test run output with hardware details (e.g. a cat of whatever the analogous nvidia-smi is for AMD hardware... it's been a while since I've thought about it), etc, I'd be happy to merge this.

@ryanstocks00
Copy link
Contributor Author

Yep, we often have similar issues with our AMD pipeline breaking due to everyone developing on a NVIDIA system. One aspect that has helped a lot is we just point the HIP compiler at the cuda files directly as the syntax now is almost entirely compatible (with a simple e.g. #define cudaStream_t hipStream_t to make the compiler happy and only specialize the kernels to different hardware where it is required - would you be happy for a similar modification to be made here to reduce the amount of duplication? I will get the results of a unit test shortly

@ryanstocks00
Copy link
Contributor Author

I would also be interested to know how much development you foresee in these repositories going forward? Is this project still active or is it on the backburner for a bit?

@wavefunction91
Copy link
Owner

#define cudaStream_t hipStream_t

I could see how there might be a few issues with that, mostly build system related (CMake is not terribly intelligent when it comes to HIP, you really have to hold its hand), but if you want to mock something up I'd be happy to take a look at it. Another option would be to add a hipify generator to generate the HIP source at configure time - a bit more involved but more "future proof"

how much foreseen development

The eventual goal for this repo is to merge the functionality into libxc in collaboration with @susilehtola. ExchCXX was created as a stop gap as the GPU port in libxc was (and still is to my knowledge) a bit slapdash. If everything goes as planned, I'd hope we could get that worked out within the next year or so. That being said, there really isn't that much to do here functionality wise - just add new functionals as needed/requested until the libxc port is up and running. So I'd say we're probably in "bug whack-a-mole" and maintenance mode unless something bit comes up that needs attention (ala HIP not working :)).

Of course, if there's something missing that you'd like to see, please let us know.

GauXC is still a priority for dev and maintenance (solid user base and growing dev community). Happy to talk offline about what's in the works there.

@ryanstocks00
Copy link
Contributor Author

Unfortunately I am hitting some errors even with the non-MGGA kernels when running the tests

❯ ./build/test/xc_kernel_test --abort
Randomness seeded to: 2518024273

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
xc_kernel_test is a Catch2 v3.5.4 host application.
Run with -? for options

-------------------------------------------------------------------------------
HIP Interfaces
  Builtin Functionals
  EXC Regular: Unpolarized
-------------------------------------------------------------------------------
/software/projects/pawsey0799/ryan/setonix/software/ExchCXX/test/xc_kernel_test.cxx:2171
...............................................................................

/software/projects/pawsey0799/ryan/setonix/software/ExchCXX/test/xc_kernel_test.cxx:1980: FAILED:
  CHECK( exc[i] == Approx(exc_ref[i]) )
with expansion:
  2.0 == Approx( 0.0 )

===============================================================================
test cases:    7 |    6 passed | 1 failed
assertions: 9088 | 9087 passed | 1 failed

If you have any ideas what might cause this that would be very helpful, otherwise I will investigate a bit further

@ryanstocks00
Copy link
Contributor Author

I think I have fixed the failing cuda tests in cd04c1b, let me know if you are happy with this solution - I'm not sure where it is best to put the supports_unpolarized function

@ryanstocks00
Copy link
Contributor Author

@wavefunction91 Tests are now passing on AMD thanks to a refactor of the kernel tests
image
The issue was the cuda and hip tests diverged (with the cuda ones implementing the MGGA framework whilst the HIP ones did not). I have fixed it with a very simple example of how the cuda/hip code could be combined to avoid the duplication, let me know what you think! A nicer long-term solution would probably be to implement wrappers for the cuda/hip calls to avoid the macro redefinitions which can be a little confusing

Copy link
Owner

@wavefunction91 wavefunction91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough PR. Since all the CUDA -> HIP macros are local to the test driver, I'm fine with this for a stop gap, but in general we'd need to do something like we do in GauXC (type erasure, etc).

In addition to the comments, the CUDA path / function should be renamed to reflect their generality. I'll give you creative freedom on what that should be (keeping in mind that the SYCL path will not be able to be merged in the same way as the HIP/CUDA paths)

.gitignore Outdated Show resolved Hide resolved
src/builtin_interface.cxx Outdated Show resolved Hide resolved
for( auto kern : builtin_supported_kernels )
test_cuda_interface( TestInterface::EXC, EvalType::Regular,
Backend::builtin, kern, Spin::Unpolarized );
for (auto kern : builtin_supported_kernels)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll only comment here - but if you could revert these formatting changes, that would be great. We've needed a clang-format hook for a while here, happy to consider in the future / separate PR/issue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted - a clang-format hook would be fantastic (though admittedly painful to introduce to an established repo). A pre-commit system would be great, I have liked working with something like this in the past:

# This file configures https://pre-commit.com/

repos:
  - repo: https://github.com/cheshirekow/cmake-format-precommit
    rev: v0.6.13
    hooks:
      - id: cmake-format
        name: CMake Format
        entry: cmake-format
  - repo: https://github.com/pre-commit/mirrors-clang-format
    rev: v16.0.6
    hooks:
      - id: clang-format
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace
      - id: check-yaml
      - id: check-shebang-scripts-are-executable
      - id: check-executables-have-shebangs
      - id: check-case-conflict
      - id: mixed-line-ending
  • ensures that everyone is using the same version of clang-format

Copy link
Owner

@wavefunction91 wavefunction91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@wavefunction91 wavefunction91 merged commit 7b01715 into wavefunction91:master Jul 28, 2024
6 checks passed
@wavefunction91
Copy link
Owner

@ryanstocks00 If you can update the GauXC PR to point to the merged commit, that would be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants