Make .zero_grad() signature consistent with PyTorch (set_grads_to_None) #2678
Is your feature request related to a problem? Please describe.
The DeepSpeed optimizers have a .zero_grad() method like all optimizers in PyTorch, but the argument that controls whether gradients get zeroed out vs. set to None is named set_grads_to_None (DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py, line 1534 in 323c266), whereas in PyTorch it is called set_to_none. This inconsistency requires other frameworks that wrap around DeepSpeed to carry a translation layer depending on whether a PyTorch optimizer or a DeepSpeed optimizer is used.
Describe the solution you'd like
Rename the argument to set_to_none.
Describe alternatives you've considered
Keeping it as is. This is not a major issue, but it would be nice to be consistent with the naming in PyTorch. Less cognitive overhead for users.
Additional context
Example of how it can be translated in other libraries for now:
Lightning-AI/pytorch-lightning#16275
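For illustration, here is a minimal sketch of such a translation shim. The helper name zero_grad_compat is hypothetical; it assumes the DeepSpeed keyword is spelled set_grads_to_None, as in the source line referenced above, and that PyTorch uses set_to_none.

```python
import inspect


def zero_grad_compat(optimizer, set_to_none: bool = True) -> None:
    """Call ``optimizer.zero_grad`` with whichever keyword name it expects.

    DeepSpeed's ZeRO stage 1/2 optimizer (at the time of this issue) spells the
    argument ``set_grads_to_None``, while ``torch.optim.Optimizer`` uses
    ``set_to_none``, so wrappers have to dispatch on the signature.
    """
    params = inspect.signature(optimizer.zero_grad).parameters
    if "set_to_none" in params:
        optimizer.zero_grad(set_to_none=set_to_none)
    elif "set_grads_to_None" in params:
        optimizer.zero_grad(set_grads_to_None=set_to_none)
    else:
        # Fall back to the optimizer's default behavior if neither keyword exists.
        optimizer.zero_grad()
```

Renaming the DeepSpeed argument to set_to_none would make this kind of dispatch unnecessary.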