Sort pr #199

Merged: 68 commits, Mar 22, 2022
Changes from 58 commits

Commits
9594f19
update OpenBLAS version to support new architectures
mfoerste4 Feb 8, 2022
69fbf7d
initial draft for sort, 1D, key sort
mfoerste4 Feb 8, 2022
dfa7adb
fixed compile error
mfoerste4 Feb 8, 2022
4c7c3a2
OpenMP non-distributed implementation, some small fixes, benchmark tool
mfoerste4 Feb 8, 2022
710c084
added missing include
mfoerste4 Feb 8, 2022
07bdb16
switch to parallel gcc sort
mfoerste4 Feb 8, 2022
58b2bf4
Enable N-D non-distributed sort
Feb 10, 2022
f585dd5
update OpenBLAS version to support new architectures
mfoerste4 Feb 8, 2022
3a08481
initial draft for sort, 1D, key sort
mfoerste4 Feb 8, 2022
131fb6d
fixed compile error
mfoerste4 Feb 8, 2022
b115835
OpenMP non-distributed implementation, some small fixes, benchmark tool
mfoerste4 Feb 8, 2022
03608cf
added missing include
mfoerste4 Feb 8, 2022
85bc3a7
switch to parallel gcc sort
mfoerste4 Feb 8, 2022
188077b
Enable N-D non-distributed sort
Feb 10, 2022
79d9c72
Merge branch 'sort_LGT-203_pr' of github.com:mfoerste4/cunumeric into…
mfoerste4 Feb 10, 2022
5cd0956
merge after rebase to 22.03
mfoerste4 Feb 10, 2022
9063d77
added cupy-style sort kernel, support axis=None, improved benchmark
mfoerste4 Feb 16, 2022
bec9143
Merge branch 'nv-legate:branch-22.03' into sort_LGT-203_pr
mfoerste4 Feb 16, 2022
5e982c2
refactoring and documentation
mfoerste4 Feb 17, 2022
e737e36
Merge branch 'nv-legate:branch-22.03' into sort_pr
mfoerste4 Feb 17, 2022
c9e4407
added argsort support and test coverage
mfoerste4 Feb 18, 2022
a5204fd
Merge branch 'nv-legate:branch-22.03' into sort_pr
mfoerste4 Feb 18, 2022
fd0d3f8
adjusted docstring
mfoerste4 Feb 18, 2022
6c385dd
extract messy code from deferred
mfoerste4 Feb 18, 2022
a12df50
conflic resolve
mfoerste4 Feb 23, 2022
878059e
Merge branch 'nv-legate-branch-22.03' into sort_pr
mfoerste4 Feb 23, 2022
49c3f3b
refactor sort c-code, simplify, reduce duplicated code
mfoerste4 Feb 25, 2022
6a06149
change argsort return type to int64
mfoerste4 Feb 25, 2022
22941d9
Merge branch 'nv-legate:branch-22.03' into sort_pr
mfoerste4 Feb 25, 2022
ca889b9
resolved earlier merge issue
mfoerste4 Feb 25, 2022
5897c68
deactivate test for dimesions > 4
mfoerste4 Feb 28, 2022
e24ecca
Distributed 1-D Sort on GPU
mfoerste4 Mar 2, 2022
c33f446
Merge branch 'sort_nccl' into merge_2203
mfoerste4 Mar 7, 2022
aea5679
Merge pull request #2 from nv-legate/merge_2203
mfoerste4 Mar 7, 2022
3eeebd2
remove explicit host memory type
mfoerste4 Mar 7, 2022
6a7e736
assume all data is dense according to mapping config
mfoerste4 Mar 7, 2022
ae3436a
Merge pull request #3 from mfoerste4/sort_nccl
mfoerste4 Mar 7, 2022
7483a2b
transform to complex datatype AFTER computation
mfoerste4 Mar 7, 2022
e6beb1d
review changes python
mfoerste4 Mar 9, 2022
c7fee99
review changes C++ signatures and cleanup
mfoerste4 Mar 9, 2022
5a02047
non-stable sort for primitive values
mfoerste4 Mar 9, 2022
0c1805f
remove copies where possible
mfoerste4 Mar 9, 2022
461ae2b
fix eager test with new default non-stable sort
mfoerste4 Mar 9, 2022
763b99c
fix naming conventions
mfoerste4 Mar 9, 2022
b210b69
minor adjustemnts, comments
mfoerste4 Mar 9, 2022
d945468
argsort also allows non-stable sort
mfoerste4 Mar 9, 2022
898a8d2
adjusted more tests to force stable sort when comparing argsort results
mfoerste4 Mar 9, 2022
09ac1c8
clarify offset iterator usage
mfoerste4 Mar 9, 2022
02e0b53
Merge branch 'sort_pr' into merge-22.03
mfoerste4 Mar 10, 2022
78cc482
Merge pull request #4 from nv-legate/merge-22.03
mfoerste4 Mar 10, 2022
9cd31bb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 10, 2022
da79f86
fixed merge conflict
mfoerste4 Mar 10, 2022
04f811b
ensure 16byte alignment for NCCL transfers
mfoerste4 Mar 11, 2022
3cb09ce
Merge branch 'nv-legate:branch-22.03' into sort_pr
mfoerste4 Mar 15, 2022
6666a05
Merge branch 'nv-legate:branch-22.03' into sort_pr
mfoerste4 Mar 16, 2022
568523f
some minor adjustments
mfoerste4 Mar 16, 2022
3afa55a
Merge branch 'sort_pr' of github.com:mfoerste4/cunumeric into sort_pr
mfoerste4 Mar 16, 2022
e1b6c31
fixed renaming
mfoerste4 Mar 16, 2022
2edd7ba
manually free temporary memory to reduce peak usage
mfoerste4 Mar 18, 2022
ee52211
refactor sort interface to prevent 1 unneeded copy
mfoerste4 Mar 18, 2022
4203492
Merge branch 'nv-legate:branch-22.03' into sort_pr
mfoerste4 Mar 18, 2022
927b54f
fixed init issue
mfoerste4 Mar 18, 2022
10e7ebb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 18, 2022
e52b017
change to thrust openmp policy
mfoerste4 Mar 18, 2022
fa9df75
Merge branch 'sort_pr' of github.com:mfoerste4/cunumeric into sort_pr
mfoerste4 Mar 18, 2022
99ec004
Merge branch 'nv-legate:branch-22.03' into sort_pr
mfoerste4 Mar 21, 2022
99798e3
removed another copy on python side in case we can sort in place
mfoerste4 Mar 21, 2022
21d47b7
Merge branch 'sort_pr' of github.com:mfoerste4/cunumeric into sort_pr
mfoerste4 Mar 21, 2022
8 changes: 8 additions & 0 deletions cunumeric/array.py
@@ -2765,6 +2765,14 @@ def setflags(self, write=None, align=None, uic=None):
        """
        self.__array__().setflags(write=write, align=align, uic=uic)

    def sort(self, axis=-1, kind="quicksort", order=None):
        self._thunk.sort(rhs=self._thunk, axis=axis, kind=kind, order=order)

    def argsort(self, axis=-1, kind="quicksort", order=None):
        self._thunk.sort(
            rhs=self._thunk, argsort=True, axis=axis, kind=kind, order=order
        )

    def squeeze(self, axis=None):
        """a.squeeze(axis=None)

1 change: 1 addition & 0 deletions cunumeric/config.py
@@ -100,6 +100,7 @@ class CuNumericOpCode(IntEnum):
    RAND = _cunumeric.CUNUMERIC_RAND
    READ = _cunumeric.CUNUMERIC_READ
    SCALAR_UNARY_RED = _cunumeric.CUNUMERIC_SCALAR_UNARY_RED
    SORT = _cunumeric.CUNUMERIC_SORT
    SYRK = _cunumeric.CUNUMERIC_SYRK
    TILE = _cunumeric.CUNUMERIC_TILE
    TRANSPOSE_COPY_2D = _cunumeric.CUNUMERIC_TRANSPOSE_COPY_2D
19 changes: 19 additions & 0 deletions cunumeric/deferred.py
@@ -32,6 +32,7 @@
    UnaryRedCode,
)
from .linalg.cholesky import cholesky
from .sort import sort
from .thunk import NumPyThunk
from .utils import get_arg_value_dtype

@@ -1541,3 +1542,21 @@ def unique(self):
        )

        return result

    @auto_convert([1])
    def sort(self, rhs, argsort=False, axis=-1, kind="quicksort", order=None):

        if kind == "stable":
            stable = True
        else:
            stable = False

        if order is not None:
            raise NotImplementedError(
                "cuNumeric does not support sorting with 'order' as "
                "ndarray only supports numeric values"
            )
        if axis is not None and (axis >= rhs.ndim or axis < -rhs.ndim):
            raise ValueError("invalid axis")

        sort(self, rhs, argsort, axis, stable)
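
Reviewer note: the argument handling in the `deferred.py` hunk can be read as the small standalone sketch below. `check_sort_args` is a hypothetical helper written for illustration only and is not part of the PR. It assumes the same three rules as the hunk: `order` is rejected, `axis` must lie in `[-ndim, ndim)` unless it is `None`, and only `kind == "stable"` selects a stable sort. One consequence worth noting is that `kind="mergesort"`, which NumPy documents as stable, maps to the non-stable path under this rule.

```python
def check_sort_args(ndim, axis=-1, kind="quicksort", order=None):
    """Hypothetical helper mirroring the checks in the deferred.py hunk.

    Returns True when a stable sort is requested, False otherwise.
    """
    if order is not None:
        raise NotImplementedError("sorting with 'order' is not supported")
    if axis is not None and not (-ndim <= axis < ndim):
        raise ValueError("invalid axis")
    # Only the literal "stable" requests a stable sort; every other kind,
    # including "mergesort", is treated as non-stable here.
    return kind == "stable"


print(check_sort_args(3, axis=-1, kind="stable"))       # True
print(check_sort_args(3, axis=None, kind="mergesort"))  # False
```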
10 changes: 10 additions & 0 deletions cunumeric/eager.py
@@ -502,6 +502,16 @@ def nonzero(self):
                result += (EagerArray(self.runtime, array),)
            return result

    def sort(self, rhs, argsort=False, axis=-1, kind="quicksort", order=None):
        self.check_eager_args(rhs, axis, kind, order)
        if self.deferred is not None:
            self.deferred.sort(rhs, argsort, axis, kind, order)
        else:
            if argsort:
                self.array = np.argsort(rhs.array, axis, kind, order)
            else:
                self.array = np.sort(rhs.array, axis, kind, order)

    def random_uniform(self):
        if self.deferred is not None:
            self.deferred.random_uniform()
162 changes: 162 additions & 0 deletions cunumeric/module.py
@@ -5721,6 +5721,168 @@ def unique(
# Sorting, searching, and counting
##################################

# Sorting


@add_boilerplate("a")
def argsort(a, axis=-1, kind="quicksort", order=None):
    """

    Returns the indices that would sort an array.

    Parameters
    ----------
    a : array_like
        Input array.
    axis : int or None, optional
        Axis to sort. By default, the index -1 (the last axis) is used. If
        None, the flattened array is used.
    kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
        Default is 'quicksort'. The underlying sort algorithm might vary.
        The code basically supports 'stable' or *not* 'stable'.
    order : str or list of str, optional
        Currently not supported

    Returns
    -------
    index_array : ndarray of ints
        Array of indices that sort a along the specified axis. It has the
        same shape as `a.shape` or is flattened in case of `axis` is None.

    Notes
    -----
    The current implementation has only limited support for distributed data.
    Distributed 1-D or flattened data will be broadcasted.

    See Also
    --------
    numpy.argsort

    Availability
    --------
    Multiple GPUs, Single CPU
    """

    result = ndarray(a.shape, np.int64)
    result._thunk.sort(
        rhs=a._thunk, argsort=True, axis=axis, kind=kind, order=order
    )
    return result


def msort(a):
    """

    Returns a sorted copy of an array sorted along the first axis.

    Parameters
    ----------
    a : array_like
        Input array.

    Returns
    -------
    out : ndarray
        Sorted array with same dtype and shape as `a`.

    Notes
    -----
    The current implementation has only limited support for distributed data.
    Distributed 1-D data will be broadcasted.

    See Also
    --------
    numpy.msort

    Availability
    --------
    Multiple GPUs, Single CPU
    """
    return sort(a, axis=0)


@add_boilerplate("a")
def sort(a, axis=-1, kind="quicksort", order=None):
    """

    Returns a sorted copy of an array.

    Parameters
    ----------
    a : array_like
        Input array.
    axis : int or None, optional
        Axis to sort. By default, the index -1 (the last axis) is used. If
        None, the flattened array is used.
    kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
        Default is 'quicksort'. The underlying sort algorithm might vary.
        The code basically supports 'stable' or *not* 'stable'.
    order : str or list of str, optional
        Currently not supported

    Returns
    -------
    out : ndarray
        Sorted array with same dtype and shape as `a`. In case `axis` is
        None the result is flattened.

    Notes
    -----
    The current implementation has only limited support for distributed data.
    Distributed 1-D or flattened data will be broadcasted.

    See Also
    --------
    numpy.sort

    Availability
    --------
    Multiple GPUs, Single CPU
    """
    result = ndarray(a.shape, a.dtype)
    result._thunk.sort(rhs=a._thunk, axis=axis, kind=kind, order=order)
    return result


@add_boilerplate("a")
def sort_complex(a):
    """

    Returns a sorted copy of an array sorted along the last axis. Sorts the
    real part first, the imaginary part second.

    Parameters
    ----------
    a : array_like
        Input array.

    Returns
    -------
    out : ndarray, complex
        Sorted array with same shape as `a`.

    Notes
    -----
    The current implementation has only limited support for distributed data.
    Distributed 1-D data will be broadcasted.

    See Also
    --------
    numpy.sort_complex

    Availability
    --------
    Multiple GPUs, Single CPU
    """

    result = sort(a)
    # force complex result upon return
    if np.issubdtype(result.dtype, np.complexfloating):
        return result
    else:
        return result.astype(np.complex64, copy=True)


# Searching


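
Reviewer note: the docstrings above describe two behaviours that are easy to misread, namely what `argsort` returns under a stable sort and how `sort_complex` orders its values. The pure-Python sketch below illustrates both on plain lists. It does not call cuNumeric or NumPy; `argsort_stable` and `sort_complex_like` are hypothetical names used only for this illustration, under the assumption that the documented semantics match NumPy's.

```python
def argsort_stable(values):
    """Indices that would sort `values`; ties keep their input order."""
    # Python's sorted() is stable, so equal keys stay in original order.
    return sorted(range(len(values)), key=lambda i: values[i])


def sort_complex_like(values):
    """Order values as complex numbers: real part first, then imaginary."""
    return sorted((complex(v) for v in values), key=lambda z: (z.real, z.imag))


data = [3, 1, 2, 1]
idx = argsort_stable(data)
print(idx)                     # [1, 3, 2, 0]
print([data[i] for i in idx])  # [1, 1, 2, 3]
print(sort_complex_like([1 + 2j, 1 - 1j, 0.5]))
```

The two equal entries at positions 1 and 3 come back in that order, which is why tests comparing `argsort` output against NumPy generally need a stable sort on both sides.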
101 changes: 101 additions & 0 deletions cunumeric/sort.py
@@ -0,0 +1,101 @@
# Copyright 2022 NVIDIA Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


from cunumeric.config import CuNumericOpCode

from legate.core import types as ty


def sort_flattened(output, input, argsort, stable):
    flattened = input.reshape((input.size,), order="C")

    # run sort flattened -- return 1D solution
    sort_result = output.runtime.create_empty_thunk(
        flattened.shape, dtype=output.dtype, inputs=(flattened,)
    )
    sort(sort_result, flattened, argsort, stable=stable)
    output.base = sort_result.base
    output.numpy_array = None


def sort_swapped(output, input, argsort, sort_axis, stable):
    assert sort_axis < input.ndim - 1 and sort_axis >= 0

    # swap axes
    swapped = input.swapaxes(sort_axis, input.ndim - 1)

    swapped_copy = output.runtime.create_empty_thunk(
        swapped.shape, dtype=input.dtype, inputs=(input, swapped)
    )
    swapped_copy.copy(swapped, deep=True)

    # run sort on last axis
    sort_result = output.runtime.create_empty_thunk(
        swapped_copy.shape, dtype=output.dtype, inputs=(swapped_copy,)
    )
    sort(sort_result, swapped_copy, argsort, stable=stable)

    output.base = sort_result.swapaxes(input.ndim - 1, sort_axis).base
    output.numpy_array = None


def sort_task(output, input, argsort, stable):
    task = output.context.create_task(CuNumericOpCode.SORT)

    needs_unbound_output = output.runtime.num_gpus > 1 and input.ndim == 1

    if needs_unbound_output:
        unbound = output.runtime.create_unbound_thunk(dtype=output.dtype)
        task.add_output(unbound.base)
    else:
        task.add_output(output.base)
        task.add_alignment(output.base, input.base)

    task.add_input(input.base)

    if output.ndim > 1:
        task.add_broadcast(input.base, input.ndim - 1)
    elif output.runtime.num_gpus > 1:
        task.add_nccl_communicator()
    elif output.runtime.num_gpus == 0 and output.runtime.num_procs > 1:
        # Distributed 1D sort on CPU not supported yet
        task.add_broadcast(input.base)

    task.add_scalar_arg(argsort, bool)  # return indices flag
    task.add_scalar_arg(input.base.shape, (ty.int32,))
    task.add_scalar_arg(stable, bool)
    task.execute()

    if needs_unbound_output:
        output.base = unbound.base
        output.numpy_array = None


def sort(output, input, argsort, axis=-1, stable=False):
    if axis is None and input.ndim > 1:
        sort_flattened(output, input, argsort, stable)
    else:
        if axis is None:
            axis = 0
        elif axis < 0:
            axis = input.ndim + axis

        if axis is not input.ndim - 1:
            sort_swapped(output, input, argsort, axis, stable)

        else:
            # run actual sort task
            sort_task(output, input, argsort, stable)
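
Reviewer note: the dispatch in `sort()` reduces every case to a sort along the last axis. `axis=None` flattens first, a negative axis is normalized, and any other axis is swapped to the end, copied, sorted, and swapped back. The sketch below reproduces that strategy with plain NumPy as a stand-in for the thunk API. `sort_along_axis` is a hypothetical name, and the sketch compares the axis with `!=` rather than `is not`, since identity comparison on integers depends on interpreter caching.

```python
import numpy as np


def sort_along_axis(a, axis=-1, stable=False):
    """Hypothetical NumPy stand-in for the dispatch in cunumeric/sort.py."""
    kind = "stable" if stable else "quicksort"
    if axis is None:
        # Flatten in C order, then sort the 1-D result.
        return np.sort(a.reshape(a.size), kind=kind)
    if axis < 0:
        axis += a.ndim
    if axis != a.ndim - 1:
        # Move the sort axis to the end, take a dense copy, sort along the
        # last axis, then swap the axes back.
        swapped = np.swapaxes(a, axis, a.ndim - 1).copy()
        swapped.sort(axis=-1, kind=kind)
        return np.swapaxes(swapped, a.ndim - 1, axis)
    return np.sort(a, axis=-1, kind=kind)


a = np.array([[3, 1, 2], [0, 5, 4]])
by_rows = sort_along_axis(a, axis=0)
flat = sort_along_axis(a, axis=None)
print(by_rows)
print(flat)
```

The deep copy after the swap corresponds to `swapped_copy.copy(swapped, deep=True)` in the diff: the swapped view is not contiguous along the new last axis, so a dense copy is made before sorting.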