Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Add libcu++ dependency; initial round of NV_IF_TARGET ports. #1605

Merged
merged 10 commits into from
May 17, 2022

Conversation

alliepiper
Copy link
Collaborator

@alliepiper alliepiper commented Jan 24, 2022

Requires NVIDIA/cub#448.

This PR contains an initial set of changes necessary to migrate Thrust and CUB to NV_IF_TARGET and remove dependence on __CUDA_ARCH__. It does not fully remove all usages of __CUDA_ARCH__, but rather focuses on the following:

  • Establish the libcu++ dependency for both Thrust and CUB.
  • Remove obsolete checks for unsupported CUDA architectures.
  • Migrate host/device divergent code from #ifdef __CUDA_ARCH__ to use NV_IF_TARGET.

This also includes various bug fixes for issues exposed by the above.

Future PRs will address the remaining usages of __CUDA_ARCH__ in the CDP macros and the kernel dispatch infrastructure.

Pre-written Release Notes

Breaking Changes

Other Enhancements

Bug Fixes

@alliepiper alliepiper added the blocked Cannot make progress. label Jan 24, 2022
@alliepiper alliepiper self-assigned this Jan 24, 2022
@alliepiper alliepiper marked this pull request as draft January 24, 2022 23:11
@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper alliepiper added this to the 1.17.0 milestone Jan 25, 2022
@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper alliepiper force-pushed the if_target_prep branch 2 times, most recently from 058166e to 9065dda Compare February 4, 2022 16:42
@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper alliepiper force-pushed the if_target_prep branch 2 times, most recently from 17cfee7 to e12b463 Compare February 4, 2022 22:17
@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper alliepiper changed the title Draft: libcudacxx, if-target prep/testing Add libcu++ dependency; initial round of NV_IF_TARGET ports. Mar 24, 2022
@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper alliepiper marked this pull request as ready for review March 24, 2022 21:42
@alliepiper
Copy link
Collaborator Author

run tests

1 similar comment
@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper
Copy link
Collaborator Author

Rebased. Now that the version has been bumped to 2.0.0 we can start to seriously think about merging this.

run tests

Copy link
Collaborator

@gevtushenko gevtushenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few comments that I consider critical, but the source of the issue isn't caused by this PR. So the comments shouldn't block it. The rest of the comments are quite optional.

testing/allocator.cu Show resolved Hide resolved
thrust/system/cuda/config.h Show resolved Hide resolved
thrust/system/cuda/detail/util.h Outdated Show resolved Hide resolved
thrust/system/cuda/detail/core/util.h Outdated Show resolved Hide resolved
thrust/system/cuda/detail/util.h Outdated Show resolved Hide resolved
thrust/system/cuda/detail/core/util.h Outdated Show resolved Hide resolved
#include <thrust/system/cuda/detail/util.h>
#include <thrust/system/detail/bad_alloc.h>
#include <thrust/detail/malloc_and_free.h>

#ifdef THRUST_CACHING_DEVICE_MALLOC
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any documentation on this macro, is it still in use?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea.

@alliepiper
Copy link
Collaborator Author

run tests

There's no way for a user to meaningfully use this, since libcudacxx
is a required dependency. It is checked during the initial
`find_package(Thrust)` call, before the user would have access to
Thrust's CMake API.

Updated the CMake README.md with instructions for using an explicit
libcudacxx target.
- The `g_state` flag wasn't reset between executions.
- The `destroy` method was being invoke in the current host system,
  not the system that owned the allocated memory (always cpp).
  This broke on MSVC's OpenMP implementation, where it seemed to be
  asserting the `g_state` flag before it was updated by `destroy`.
  This only happened on MSVC when host system = OMP, and appears to
  be a bug/miscompile in MSVC (repro'd on 2019). Fixed by explicitly
  tagging the allocator system to cpp.
- Added check that `destroy` is not invoked on empty vectors.
@alliepiper
Copy link
Collaborator Author

run tests

@alliepiper alliepiper merged commit b223930 into NVIDIA:main May 17, 2022
@alliepiper alliepiper deleted the if_target_prep branch May 17, 2022 17:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
helps: nvc++ Helps or needed by NVC++. P0: must have Absolutely necessary. Critical issue, major blocker, etc. release: breaking change Include in "Breaking Changes" section of release notes. type: enhancement New feature or request.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants