Make .zero_grad() signature consistent with PyTorch (set_grads_to_None) #2678
Is your feature request related to a problem? Please describe.
The DeepSpeed optimizers have a .zero_grad() method like all optimizers in PyTorch, but the argument that controls whether gradients get zeroed out vs. set to None is named set_grads_to_None (DeepSpeed/deepspeed/runtime/zero/stage_1_and_2.py, line 1534 in 323c266), whereas in PyTorch it is called set_to_none. This inconsistency requires other frameworks that wrap around DeepSpeed to carry a translation layer depending on whether a PyTorch optimizer or a DeepSpeed optimizer is used.
Describe the solution you'd like
Rename the argument to set_to_none.
Describe alternatives you've considered
Keeping it as is. This is not a major issue, but it would be nice to be consistent with the naming in PyTorch. Less cognitive overhead for users.
Additional context
Example of how it can be translated in other libraries for now:
Lightning-AI/pytorch-lightning#16275
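For illustration, here is a minimal sketch of such a translation shim. The helper name zero_grad_compat is hypothetical; it assumes the DeepSpeed keyword is spelled set_grads_to_None, as in the source line referenced above, and that PyTorch uses set_to_none.

```python
import inspect


def zero_grad_compat(optimizer, set_to_none: bool = True) -> None:
    """Call ``optimizer.zero_grad`` with whichever keyword name it expects.

    DeepSpeed's ZeRO stage 1/2 optimizer (at the time of this issue) spells the
    argument ``set_grads_to_None``, while ``torch.optim.Optimizer`` uses
    ``set_to_none``, so wrappers have to dispatch on the signature.
    """
    params = inspect.signature(optimizer.zero_grad).parameters
    if "set_to_none" in params:
        optimizer.zero_grad(set_to_none=set_to_none)
    elif "set_grads_to_None" in params:
        optimizer.zero_grad(set_grads_to_None=set_to_none)
    else:
        # Fall back to the optimizer's default behavior if neither keyword exists.
        optimizer.zero_grad()
```

Renaming the DeepSpeed argument to set_to_none would make this kind of dispatch unnecessary.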