Sort pr #199
Conversation
merge 22.03
for more information, see https://pre-commit.ci
cunumeric/deferred.py (Outdated)
@@ -32,6 +32,7 @@
     UnaryRedCode,
 )
 from .linalg.cholesky import cholesky
+from .sorting import sorting
A minor quibble: why don't we just name things `sort` everywhere?
There is no particular reason behind this; I will change it.
cunumeric/sorting.py (Outdated)
swapped_copy.copy(swapped, deep=True)

# run sort on last axis
sort_result = output.runtime.create_empty_thunk(
Why is this thunk necessary if `swapped_copy` is already a copy we can mutate? Can we do the sorting in place using `swapped_copy`?
I tried to keep the logic simple here. The underlying code does not support input == output at this time. I could change this, but it will still not always be an option (with argsort the input is of a different type than the output).
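For illustration only (a standalone sketch, not the cunumeric task code): a plain sort can reuse its input buffer because input and output share a type, while an argsort must write `int64` indices into a separately allocated output.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

int main()
{
  std::vector<double> values{3.0, 1.0, 2.0};

  // Argsort: the result is an int64 index buffer, a different type than the
  // double input, so a separate output allocation is unavoidable.
  std::vector<int64_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), int64_t{0});
  std::stable_sort(indices.begin(), indices.end(),
                   [&](int64_t a, int64_t b) { return values[a] < values[b]; });

  // Plain sort: input and output have the same type, so in place is possible.
  std::stable_sort(values.begin(), values.end());
  return 0;
}
```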
The code skips the copy whenever possible.
cunumeric/sorting.py (Outdated)
if output.ndim > 1:
    task.add_broadcast(input.base, input.ndim - 1)
elif output.runtime.num_gpus > 0:
I'm asking this again, but why do we use NCCL when there's only one GPU?
I changed this.
Added a couple of minor comments. I'll make another pass tomorrow.
src/cunumeric/sort/sort_omp.cc (Outdated)
{
  if (argptr == nullptr) {
    // sort (in place)
#pragma omp parallel for
I believe your GPU code can be repurposed here, as Thrust can use OpenMP as a device. It'd be interesting to check if that performs any better than this code. I suggest we do that once we wrap up the upcoming release.
I removed the manual pragmas and added the OpenMP execution policy to the Thrust call for now. This might not be optimal in all scenarios, but it keeps things simple until we decide to focus on it.
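For reference, a minimal sketch of dispatching Thrust's stable sort to the OpenMP backend (the function name and flat-buffer layout are illustrative, not the actual task code):

```cpp
#include <cstddef>

#include <thrust/sort.h>
#include <thrust/system/omp/execution_policy.h>

// Hypothetical helper: sort a flat buffer of n values on the host using
// Thrust's OpenMP backend instead of hand-written parallel-for pragmas.
void sort_values_omp(double* data, std::size_t n)
{
  thrust::stable_sort(thrust::omp::par, data, data + n);
}
```

Building this typically requires compiling with OpenMP enabled (e.g. `-fopenmp`); the same `thrust::stable_sort` call can then serve both the GPU and OMP variants, differing only in the execution policy.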
for more information, see https://pre-commit.ci
Added limited sort support. All communication has to be done as part of task preparation.
1-D (or flattened) sort is only supported for non-distributed data and will be broadcast.
N-D data will swap the sort axis to the last dimension, ensure C order, and broadcast the last dimension in order to sort within a single process (sketched below).
Sort is performed by:
* std::stable_sort (CPU)
* thrust::stable_sort (OMP)
* thrust::stable_sort (GPU, complex data)
* cub::DeviceRadixSort / cub::DeviceSegmentedRadixSort (GPU, all primitive types)

Merged with the NCCL branch for distributed 1-D data on GPU.
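To make the N-D path above concrete, here is a hypothetical standalone sketch (not the actual task body): once the sort axis has been swapped to the last dimension and the data is in C order, the problem reduces to independently sorting `volume / segment_size` contiguous segments.

```cpp
#include <algorithm>
#include <cstddef>

// Sort along the last axis of a C-ordered buffer: `segment_size` is the
// extent of the sort axis, `volume` the total number of elements.
void sort_last_axis(double* data, std::size_t volume, std::size_t segment_size)
{
  for (std::size_t start = 0; start < volume; start += segment_size) {
    std::stable_sort(data + start, data + start + segment_size);
  }
}
```

The OMP and GPU backends listed above apply `thrust::stable_sort` and `cub::DeviceSegmentedRadixSort` to the same segment layout instead of this serial loop.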