
Sort pr #199 (Merged)

Merged 68 commits into nv-legate:branch-22.03 on Mar 22, 2022

Conversation

@mfoerste4 (Contributor) commented Feb 18, 2022

Added limited sort support. All communication has to be done as part of task preparation.

  • 1-D (or flattened) sort is only supported for non-distributed data; the data will be broadcast.

  • N-D data swaps the sort axis to the last dimension, ensures C order, and broadcasts the last dimension so the sort runs in a single process.

  • Sort is performed by:
    - std::stable_sort (CPU)
    - thrust::stable_sort (OpenMP)
    - thrust::stable_sort (GPU, complex data)
    - cub::DeviceRadixSort / cub::DeviceSegmentedRadixSort (GPU, all primitive types)

  • Merged with the NCCL branch to support distributed 1-D data on GPU.
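
The N-D strategy in the bullets above (swap the sort axis to the last dimension, ensure C order, sort along the last axis, swap back) can be sketched with plain NumPy. This is an illustration only, not the PR's implementation, and `sort_along_axis` is a hypothetical helper name:

```python
import numpy as np

def sort_along_axis(a, axis=-1):
    # Move the sort axis to the last dimension and force C-contiguous
    # layout, mirroring the preparation step described above.
    swapped = np.ascontiguousarray(np.moveaxis(a, axis, -1))
    # View the array as rows and stably sort each row in place.
    flat = swapped.reshape(-1, swapped.shape[-1])
    for row in flat:
        row.sort(kind="stable")
    # Move the (now sorted) last dimension back to its original position.
    return np.moveaxis(swapped, -1, axis)

a = np.array([[3, 1, 2], [9, 7, 8]])
print(sort_along_axis(a, axis=1))
```

The stable sort matches the std::stable_sort / thrust::stable_sort backends listed above; the per-row loop stands in for the single-process sort of the broadcast last dimension.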

@magnatelee magnatelee self-requested a review February 19, 2022 04:37
@@ -32,6 +32,7 @@
UnaryRedCode,
)
from .linalg.cholesky import cholesky
from .sorting import sorting
Contributor:
a minor quibble: why don't we just name things sort everywhere?

Contributor (Author):

There is no particular reasoning behind this - I will change it.

swapped_copy.copy(swapped, deep=True)

# run sort on last axis
sort_result = output.runtime.create_empty_thunk(
Contributor:

Why is this thunk necessary if swapped_copy is already a copy we can mutate? Can we do the sorting in place using swapped_copy?

Contributor (Author):

I tried to keep the logic simple here. The underlying code does not support input == output at this time. I could change this, but it will still not always be an option (with argsort the input is of a different type than the output).
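
That last constraint can be illustrated with plain NumPy (used here only for illustration; the PR's thunks are not NumPy arrays):

```python
import numpy as np

a = np.array([0.3, 0.1, 0.2])

# sort: the output has the same dtype as the input, so an in-place
# variant is at least conceivable
print(np.sort(a).dtype)

# argsort: the output is an integer index array, so the input buffer
# cannot simply be reused as the output
print(np.argsort(a).dtype)
```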

Contributor (Author):

The code skips the copy whenever possible.


if output.ndim > 1:
task.add_broadcast(input.base, input.ndim - 1)
elif output.runtime.num_gpus > 0:
Contributor:

I'm asking this again, but why do we use NCCL when there's only one GPU?

Contributor (Author):

I changed this.
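
A minimal sketch of the dispatch discussed in this thread, under the assumption that NCCL is only needed for the distributed 1-D GPU case. The helper name and plain-argument signature are hypothetical; the real code operates on thunks and a runtime object:

```python
def sort_strategy(ndim: int, num_gpus: int) -> str:
    """Hypothetical reconstruction of the communication dispatch."""
    if ndim > 1:
        # N-D case: broadcast the last axis so each process sorts locally
        return "broadcast-last-axis"
    if num_gpus > 1:
        # distributed 1-D GPU case: NCCL is needed to merge across GPUs
        return "nccl-distributed-1d"
    # single process (or a single GPU): plain local sort, no communication
    return "local-sort"

print(sort_strategy(2, 4))
print(sort_strategy(1, 1))
```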

@magnatelee (Contributor):

Added a couple of minor comments. I'll make another pass tomorrow.

src/cunumeric/sort/sort.cc (review thread resolved)
{
if (argptr == nullptr) {
// sort (in place)
#pragma omp parallel for
Contributor:

I believe your GPU code can be repurposed here, as Thrust can use OpenMP as a device. It'd be interesting to check if that performs any better than this code. I suggest we do that once we wrap up the upcoming release.

Contributor (Author):

I removed the manual pragmas and added the omp execution policy to the thrust call for now. This might not be optimal in all scenarios but keeps it simple until we decide to focus on it.

src/cunumeric/sort/sort.cu (3 review threads resolved)
@mfoerste4 mfoerste4 merged commit d97b567 into nv-legate:branch-22.03 Mar 22, 2022
@mfoerste4 mfoerste4 deleted the sort_pr branch March 22, 2022 09:11
ipdemes pushed a commit to ipdemes/cunumeric that referenced this pull request Jun 7, 2022