-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slicing-based np.block
implementation
#306
Conversation
@@ -1437,7 +1496,7 @@ def block(arrays): | |||
# check if the 'arrays' is a balanced tree | |||
depth = check_list_depth(arrays) | |||
|
|||
result = _block(arrays, 1, depth) | |||
result = _block_slicing(arrays, depth) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think we should always use the slicing implementation? I know you were doing some experiments to figure out a reasonable threshold to toggle the behavior and if so, please leave a TODO comment.
cunumeric/module.py
Outdated
for i in range(axis + 1, common_info.ndim): | ||
idx_arr.append(slice(out_shape[i])) | ||
|
||
post_idx = (slice(None, None, None),) * len(out_shape[axis + 1 :]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slice(None)
would be sufficient.
cunumeric/module.py
Outdated
@@ -1314,33 +1361,45 @@ def _concatenate( | |||
dtype=None, | |||
casting="same_kind", | |||
common_info=None, | |||
slicing_only=False, | |||
): | |||
if axis < 0: | |||
axis += len(common_info.shape) | |||
leading_dim = _builtin_sum(arr.shape[axis] for arr in inputs) | |||
out_shape = list(common_info.shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, common_info
seems to be optional, but you always take the shape of it. If it is never a None
, then please turn it into a positional argument to document the behavior.
cunumeric/module.py
Outdated
) | ||
if slicing_only: | ||
slices = [] | ||
out_array = (out_shape, slices) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is both a very bad naming (since out_array
isn't an array!) and also untyped coding, as _concatenate
returns either a pair of a shape and a list or an ndarray. I think you should extract out the common code and separate the function for slicing_only=True
from that for slicing_only=False
.
cunumeric/module.py
Outdated
) | ||
return concatenate(reshaped, axis=-1 + (cur_depth - depth)) | ||
|
||
def _block_slicing(arr, depth): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use arrays
instead of arr
as in the caller. arr
sounds like a single ndarray, although it's really a list of arrays.
inputs = list(convert_to_cunumeric_ndarray(inp) for inp in arr) | ||
arrays = list(convert_to_cunumeric_ndarray(inp) for inp in arr) | ||
if len(arr) > 1: | ||
arrays, common_info = check_shape_dtype( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you had a chance to do this shape checking and input conversion in check_list_depth
. Why are you doing this checking again here? Is it simply because you want to reuse check_shape_dtype
to construct an ArrayInfo
?
cunumeric/module.py
Outdated
def _block(arr, cur_depth, depth): | ||
def _block_collect_slices(arr, cur_depth, depth): | ||
# collects slices for each array in `arr` | ||
# the last outcome will be slices on every dimension of the output array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"last outcome" => "outcome"
cunumeric/module.py
Outdated
) | ||
offset += cur_dim | ||
# flatten the slices | ||
slices = functools_reduce(iconcat, updated_slices) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you could use functools.chain
instead.
…ape` And refine verbose / redundant codes
* Broaden add_scalar_arg type * only need to accept 1-tuples
Reimplemented
np.block
w/ slices to avoid creating intermediate arrays by concatenation.This PR includes changes in _concatenate, which removes verbose code.
Slicing-based block performs better for most cases on a single node.
For multi-node, the communication from src to dest can be different depending on how intermediate arrays are partitioned.
We keep the slicing-based block only and will come up with some ways to avoid perf degradations for some cases in a separate PR.