This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Add libcu++ dependency; initial round of NV_IF_TARGET ports. #448

Merged: @alliepiper merged 7 commits into NVIDIA:main on May 17, 2022.

@alliepiper (Collaborator) commented Mar 24, 2022

Requires NVIDIA/thrust#1605.

This PR contains an initial set of changes needed to migrate Thrust and CUB to NV_IF_TARGET and remove their dependence on __CUDA_ARCH__. It does not remove every use of __CUDA_ARCH__; instead, it focuses on the following:

  • Establish the libcu++ dependency for both Thrust and CUB.
  • Remove obsolete checks for unsupported CUDA architectures.
  • Migrate host/device divergent code from #ifdef __CUDA_ARCH__ to use NV_IF_TARGET.
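A minimal sketch of the third bullet, assuming a CUDA toolkit (or libcu++ checkout) that provides `<nv/target>`. Only `NV_IF_TARGET` and `NV_IS_HOST` come from libcu++; the function and kernel names are hypothetical and exist only to compare the two styles.

```cuda
#include <cassert>
#include <nv/target>  // libcu++: NV_IF_TARGET, NV_IS_HOST

// Before: host/device divergence selected directly with __CUDA_ARCH__.
// (Hypothetical function, shown only for comparison.)
__host__ __device__ inline int where_am_i_legacy()
{
#ifdef __CUDA_ARCH__
  return 1; // compiled for the device
#else
  return 0; // compiled for the host
#endif
}

// After: the same divergence expressed with NV_IF_TARGET. The second argument
// is used when the target satisfies NV_IS_HOST, the third one otherwise.
// Each branch is wrapped in parentheses so commas inside it stay intact.
__host__ __device__ inline int where_am_i()
{
  NV_IF_TARGET(NV_IS_HOST,
               (return 0;),  // host code
               (return 1;)); // device code
}

// Tiny kernel so the device branch is also compiled.
__global__ void where_am_i_kernel(int* out)
{
  *out = where_am_i();
}
```

Called from host code, both functions return 0; called from a kernel, both return 1. The difference is that the second version never names `__CUDA_ARCH__` itself: `<nv/target>` decides how to implement the branch for each compiler, which is what lets the same source build under nvc++.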

It also fixes several bugs exposed by these changes.

Future PRs will address the remaining uses of __CUDA_ARCH__ in the CDP (CUDA Dynamic Parallelism) macros and the kernel dispatch infrastructure.

Pre-written Release Notes

Breaking Changes

Other Enhancements

@alliepiper marked this pull request as draft March 24, 2022 13:09
@alliepiper added this to the 1.17.0 milestone Mar 24, 2022
@alliepiper added the blocked label Mar 24, 2022
@alliepiper changed the title from "libcudacxx, if-target prep" to "Add libcu++ dependency; initial round of NV_IF_TARGET ports." Mar 24, 2022
@alliepiper marked this pull request as ready for review March 24, 2022 21:42
@alliepiper added the type: enhancement, P0: must have, helps: nvc++, and release: breaking change labels and removed the blocked label Apr 4, 2022
@gevtushenko (Collaborator) left a comment

A lot of code is much cleaner now, thanks! There are a few minor changes that need to be addressed.

  • cub/agent/agent_sub_warp_merge_sort.cuh (2 threads, outdated, resolved)
  • cub/block/specializations/block_histogram_sort.cuh (outdated, resolved)
  • cub/block/block_reduce.cuh (resolved)
  • cub/detail/target.cuh (outdated, resolved)
  • cub/device/dispatch/dispatch_segmented_sort.cuh (outdated, resolved)
  • cub/device/dispatch/dispatch_spmv_orig.cuh (resolved)
  • cub/util_arch.cuh (resolved)
  • cub/util_debug.cuh (outdated, resolved)
  • experimental/defunct/test_device_seg_reduce.cu (outdated, resolved)
@alliepiper changed the milestone from 1.17.0 to 2.0.0 Apr 25, 2022
@gevtushenko (Collaborator) left a comment

Is there a plan for handling dynamic shared memory allocation without PTX_ARCH once we support redux? If so, this can be merged.

  • test/test_util.h (outdated, resolved)
  • test/test_warp_reduce.cu (outdated, resolved)
@alliepiper (Collaborator, Author) commented

> Is there a plan on how to address dynamic shared memory allocation without PTX_ARCH when we support redux?

We'll need to use NV_IF_TARGET, details TBD.

nvc++ will soon stop defining __NVCOMPILER_CUDA_ARCH__, removing the ability to determine the PTX arch at compile time.

This updates agents and collective algorithms so they no longer require the PTX_ARCH template parameter, and changes the CUB_WARP_SIZE(PTX_ARCH), etc., helpers to ignore their argument. These macros returned different values only for obsolete architectures, so ignoring the argument has no effect on currently supported ones.
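A minimal sketch of the pattern described above: the helper keeps accepting an architecture argument so existing call sites still compile, but the argument no longer changes the result. The `EXAMPLE_*` macros and `ExampleAgentPolicy` are hypothetical stand-ins, not CUB's actual definitions; the sketch relies only on the fact that all current NVIDIA architectures use 32-thread warps.

```cuda
// Hypothetical stand-ins for helpers like CUB_WARP_SIZE(PTX_ARCH).
// The argument is still accepted, so existing call sites keep compiling,
// but it is ignored: supported architectures all use 32-thread warps.
#define EXAMPLE_LOG_WARP_THREADS(unused_arch) (5)
#define EXAMPLE_WARP_THREADS(unused_arch) (1 << EXAMPLE_LOG_WARP_THREADS(0))

// A policy that might previously have been templated on PTX_ARCH as well.
// With the argument ignored, only the tuning parameters remain.
template <int BLOCK_THREADS>
struct ExampleAgentPolicy
{
  static constexpr int WARP_THREADS = EXAMPLE_WARP_THREADS(0);
  // Number of warps needed to cover BLOCK_THREADS threads (rounded up).
  static constexpr int WARPS = (BLOCK_THREADS + WARP_THREADS - 1) / WARP_THREADS;
};

// Old-style calls that still pass an architecture produce the same value.
static_assert(EXAMPLE_WARP_THREADS(350) == EXAMPLE_WARP_THREADS(800),
              "the architecture argument has no effect");
```

Keeping the parameter, rather than deleting it, trades a slightly odd signature for not breaking downstream code that still passes an architecture.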

This fixes the issue reported in NVIDIA#299. There's no clear reason why this should use `RandomBits` unconditionally.

The merge sort test with pow2 > 20 fails on a GTX 1650. Detect bad_alloc failures and skip those tests. Tests for smaller problem sizes still fail if they hit a bad_alloc.
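The skip logic in the last commit message might look roughly like the following host-side sketch. It assumes the allocation failure surfaces as `std::bad_alloc` (or a type derived from it); `TestResult`, `run_merge_sort_case`, and `kSkipAbovePow2` are hypothetical names, with the cutoff of 20 taken from the commit message.

```cuda
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <new> // std::bad_alloc

// Hypothetical result type and cutoff; 20 matches the "pow2 > 20" threshold.
enum class TestResult { Passed, Skipped };
constexpr int kSkipAbovePow2 = 20;

// Runs one merge sort test case with 2^pow2 items.
// - A bad_alloc on a large problem (pow2 > 20) is reported and skipped.
// - A bad_alloc on a smaller problem is rethrown, so the test still fails.
// - Any other exception propagates unchanged.
TestResult run_merge_sort_case(int pow2,
                               const std::function<void(std::size_t)>& test_case)
{
  const std::size_t num_items = std::size_t{1} << pow2;
  try
  {
    test_case(num_items);
    return TestResult::Passed;
  }
  catch (const std::bad_alloc&)
  {
    if (pow2 <= kSkipAbovePow2)
    {
      throw; // small sizes are expected to fit; keep this a failure
    }
    std::cout << "Skipping merge sort test with 2^" << pow2
              << " items: allocation failed\n";
    return TestResult::Skipped;
  }
}
```

Catching only `std::bad_alloc` keeps every other failure, such as a wrong result reported through a different exception type, visible as a test failure.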
@alliepiper merged commit 5571258 into NVIDIA:main May 17, 2022
@alliepiper deleted the if_target_prep branch May 17, 2022 17:48
Labels:
  • helps: nvc++ (Helps or needed by NVC++.)
  • P0: must have (Absolutely necessary. Critical issue, major blocker, etc.)
  • release: breaking change (Include in "Breaking Changes" section of release notes.)
  • type: enhancement (New feature or request.)
3 participants