Fix hpZ with zero element (microsoft#5652)
Fix a corner case where the hpZ secondary partition has zero elements. This
ensures that `sec_numel` is at least zero. In this scenario the copy is
not actually necessary, but all ranks still need to synchronize at the
end of the secondary partition. This is a good solution until the
[2nd tensor all-gather vs 2nd tensor partition issue](https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/partition_parameters.py#L1706)
is properly fixed.

Fixes: microsoft#5642

---------

Co-authored-by: Logan Adams <[email protected]>
3 people authored Jun 18, 2024
1 parent 2a0c0e3 commit b33873d
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion deepspeed/runtime/zero/partition_parameters.py
@@ -1689,7 +1689,7 @@ def _partition_param_sec(self, param, buffer=None, has_been_updated=False):
one_dim_param = param.contiguous().view(-1)

# ds_numel is unpadded, so the last chunk of the secondary tensor might not be secondary_partition_size
- sec_numel = param.ds_numel - secondary_start if secondary_end > param.ds_numel else secondary_partition_size
+ sec_numel = max(0, min(param.ds_numel - secondary_start, secondary_partition_size))

# copy from full tensor to secondary tensor
param.ds_secondary_tensor.narrow(0, 0,
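To illustrate why the clamp matters, here is a minimal standalone sketch comparing the old and new expressions for `sec_numel`. It is not DeepSpeed code: the helper names `sec_numel_old` and `sec_numel_new` are hypothetical, the sizes are made up, and it assumes that `secondary_start` is `rank * secondary_partition_size` and that `secondary_end` is `secondary_start + secondary_partition_size`.

```python
# Hypothetical helpers mirroring the two expressions in the diff.
# Assumption: secondary_end = secondary_start + secondary_partition_size.

def sec_numel_old(ds_numel, secondary_start, secondary_partition_size):
    """Pre-fix expression: can go negative when the rank's slice starts past ds_numel."""
    secondary_end = secondary_start + secondary_partition_size
    if secondary_end > ds_numel:
        return ds_numel - secondary_start
    return secondary_partition_size


def sec_numel_new(ds_numel, secondary_start, secondary_partition_size):
    """Post-fix expression: clamped to the range [0, secondary_partition_size]."""
    return max(0, min(ds_numel - secondary_start, secondary_partition_size))


# Made-up example: 10 unpadded elements split across 4 ranks in chunks of 4,
# so the padded length is 16 and the last rank's slice lies entirely in padding.
ds_numel = 10
secondary_partition_size = 4

old_sizes = []
new_sizes = []
for rank in range(4):
    # Assumption: each rank's slice starts at rank * secondary_partition_size.
    secondary_start = rank * secondary_partition_size
    old_sizes.append(sec_numel_old(ds_numel, secondary_start, secondary_partition_size))
    new_sizes.append(sec_numel_new(ds_numel, secondary_start, secondary_partition_size))

print("old:", old_sizes)  # old: [4, 4, 2, -2]
print("new:", new_sizes)  # new: [4, 4, 2, 0]
```

Under these assumptions the two expressions agree whenever the rank's slice overlaps the unpadded data. They differ only when `secondary_start` is past `ds_numel`, where the old expression yields a negative length and the new one yields zero.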
