Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DeepSpeed] restore memory for evaluation #10114

Merged
merged 4 commits into from
Feb 10, 2021

Conversation

stas00
Copy link
Contributor

@stas00 stas00 commented Feb 10, 2021

I spent some time trying to see if we could gain from DeepSpeed during inference - and while in the future there will be goodies to make it useful at the moment we don't need it, so let's make DeepSpeed cleanly contained to train for now.

This PR has a few small tweaks:

  • frees up all the memory used by DeepSpeed at the end of training
  • makes a clean way of not switching model.to() - only for when --do_train is used with deepspeed (so this is the case where you @sgugger were concerned about eval before train - no problem now)
  • adds a warning if a user tries to use --deepspeed without --do_train
  • re-works the test suite
  • applies consistent json config formatting

@sgugger, @LysandreJik

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, apart from the auto-clean that is a bit too magic to my taste. Thanks!

Comment on lines 1043 to 1046
self.deepspeed = None
self.optimizer = None
self.lr_scheduler = None
self.model_wrapped = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super fond of having this done automatically. It should be in a free_memory method of the Trainer (or a name you like better) that is explicitly called by the user between training and evaluation IMO.
This is also useful beyond deepspeed.

Copy link
Contributor Author

@stas00 stas00 Feb 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are 2 different things.

  1. in the case of DeepSpeed this is a clean cut. We explicitly init all these at the beginning of train:
    if self.args.deepspeed:
    model, optimizer, lr_scheduler = init_deepspeed(self, num_training_steps=max_steps)
    self.model = model.module
    self.model_wrapped = model # will get further wrapped in DDP
    self.deepspeed = model # DeepSpeedEngine object
    self.optimizer = optimizer
    self.lr_scheduler = lr_scheduler

so this PR explicitly cleans these up at the end of train - this is completely opaque. A user had no way to init those explicitly and thus has no need to do anything special.

  1. wrt generic case, the story is different because the user may supply her own optimizer/lr_scheduler and in such case, yes, they need to have control over whether to clean up or not.

As you pointed out this would be useful to the user, but it's a different situation, so let's solve it separately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though I do think I need to fix this:

            self.model_wrapped = None

to restore this to self.model

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, it's true that in this case they are instantiated inside the train method so this makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants