Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slicing-based np.block implementation #306

Merged
merged 4 commits into from
May 16, 2022

Conversation

sbak5
Copy link
Contributor

@sbak5 sbak5 commented Apr 28, 2022

Reimplemented np.block w/ slices to avoid creating intermediate arrays by concatenation.
This PR includes changes in _concatenate, which removes verbose code.

Slicing-based block performs better for most cases on a single node.
For multi-node, the communication from src to dest can be different depending on how intermediate arrays are partitioned.
We keep the slicing-based block only and will come up with some ways to avoid perf degradations for some cases in a separate PR.

@sbak5 sbak5 requested a review from magnatelee April 29, 2022 17:52
cunumeric/module.py Show resolved Hide resolved
@@ -1437,7 +1496,7 @@ def block(arrays):
# check if the 'arrays' is a balanced tree
depth = check_list_depth(arrays)

result = _block(arrays, 1, depth)
result = _block_slicing(arrays, depth)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think we should always use the slicing implementation? I know you were doing some experiments to figure out a reasonable threshold to toggle the behavior and if so, please leave a TODO comment.

for i in range(axis + 1, common_info.ndim):
idx_arr.append(slice(out_shape[i]))

post_idx = (slice(None, None, None),) * len(out_shape[axis + 1 :])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slice(None) would be sufficient.

cunumeric/module.py Show resolved Hide resolved
@@ -1314,33 +1361,45 @@ def _concatenate(
dtype=None,
casting="same_kind",
common_info=None,
slicing_only=False,
):
if axis < 0:
axis += len(common_info.shape)
leading_dim = _builtin_sum(arr.shape[axis] for arr in inputs)
out_shape = list(common_info.shape)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, common_info seems to be optional, but you always take the shape of it. If it is never a None, then please turn it into a positional argument to document the behavior.

)
if slicing_only:
slices = []
out_array = (out_shape, slices)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is both a very bad naming (since out_array isn't an array!) and also untyped coding, as _concatenate returns either a pair of a shape and a list or an ndarray. I think you should extract out the common code and separate the function for slicing_only=True from that for slicing_only=False.

)
return concatenate(reshaped, axis=-1 + (cur_depth - depth))

def _block_slicing(arr, depth):
Copy link
Contributor

@magnatelee magnatelee May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use arrays instead of arr as in the caller. arr sounds like a single ndarray, although it's really a list of arrays.

inputs = list(convert_to_cunumeric_ndarray(inp) for inp in arr)
arrays = list(convert_to_cunumeric_ndarray(inp) for inp in arr)
if len(arr) > 1:
arrays, common_info = check_shape_dtype(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you had a chance to do this shape checking and input conversion in check_list_depth. Why are you doing this checking again here? Is it simply because you want to reuse check_shape_dtype to construct an ArrayInfo?

def _block(arr, cur_depth, depth):
def _block_collect_slices(arr, cur_depth, depth):
# collects slices for each array in `arr`
# the last outcome will be slices on every dimension of the output array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"last outcome" => "outcome"

cunumeric/module.py Show resolved Hide resolved
)
offset += cur_dim
# flatten the slices
slices = functools_reduce(iconcat, updated_slices)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you could use functools.chain instead.

@sbak5 sbak5 merged commit 0324098 into nv-legate:branch-22.05 May 16, 2022
@manopapad manopapad mentioned this pull request May 17, 2022
ipdemes pushed a commit to ipdemes/cunumeric that referenced this pull request Aug 4, 2022
* Broaden add_scalar_arg type

* only need to accept 1-tuples
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants