[DeepSpeed] restore memory for evaluation #10114
Conversation
Looks good to me, apart from the auto-clean that is a bit too magic to my taste. Thanks!
src/transformers/trainer.py (Outdated)

```python
self.deepspeed = None
self.optimizer = None
self.lr_scheduler = None
self.model_wrapped = None
```
I'm not super fond of having this done automatically. It should be in a `free_memory` method of the Trainer (or a name you like better) that is explicitly called by the user between training and evaluation IMO. This is also useful beyond deepspeed.
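A hedged sketch of what such a method could look like; the name `free_memory` and its body are illustrative only, not code from this PR:

```python
import gc

import torch

class Trainer:
    # ... existing Trainer code ...

    def free_memory(self):
        # illustrative: drop the Trainer's references to training-time
        # objects so their GPU memory can be reclaimed before evaluation
        self.deepspeed = None
        self.optimizer = None
        self.lr_scheduler = None
        self.model_wrapped = None
        # nudge Python and CUDA to actually release the memory
        gc.collect()
        torch.cuda.empty_cache()
```

Under this proposal the user would call `trainer.free_memory()` explicitly between `train()` and `evaluate()`.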
These are 2 different things.

- In the case of DeepSpeed this is a clean cut. We explicitly init all these at the beginning of `train`:

  transformers/src/transformers/trainer.py, lines 765 to 771 in 22a32cf:

  ```python
  if self.args.deepspeed:
      model, optimizer, lr_scheduler = init_deepspeed(self, num_training_steps=max_steps)
      self.model = model.module
      self.model_wrapped = model  # will get further wrapped in DDP
      self.deepspeed = model  # DeepSpeedEngine object
      self.optimizer = optimizer
      self.lr_scheduler = lr_scheduler
  ```

  so this PR explicitly cleans these up at the end of `train`.
- This is completely opaque. A user had no way to init those explicitly and thus has no need to do anything special.
- Wrt the generic case, the story is different because the user may supply her own optimizer/lr_scheduler, and in such a case, yes, they need to have control over whether to clean up or not.

As you pointed out this would be useful to the user, but it's a different situation, so let's solve it separately?
Though I do think I need to fix this: `self.model_wrapped = None` needs to restore this to `self.model` instead.
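A minimal sketch of the corrected end-of-`train` cleanup this implies; the `if` guard is an assumption for illustration, only the assignments come from the diff and the fix discussed above:

```python
# sketch: release DeepSpeed training state at the end of train()
if self.deepspeed:
    self.deepspeed = None
    self.optimizer = None
    self.lr_scheduler = None
    # restore model_wrapped to the bare model rather than None,
    # per the fix discussed above
    self.model_wrapped = self.model
```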
Ah yes, it's true that in this case they are instantiated inside the `train` method, so this makes sense.
I spent some time trying to see if we could gain from DeepSpeed during inference, and while in the future there will be goodies to make it useful, at the moment we don't need it, so let's make DeepSpeed cleanly contained to `train` for now.

This PR has a few small tweaks:

- restores `model.to()` - it is skipped only for when `--do_train` is used with deepspeed (so this is the case where you @sgugger were concerned about eval before train - no problem now); see the sketch after this list
- handles `--deepspeed` without `--do_train`
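A hedged sketch of what that conditional device placement could look like; the exact condition and attribute names are assumptions for illustration, not a verbatim copy of the PR:

```python
# sketch: only let DeepSpeed manage device placement when it will
# actually be used for training; otherwise place the model ourselves,
# so evaluation-only runs still end up on the right device
if not (args.deepspeed and args.do_train):
    model = model.to(args.device)
```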
@sgugger, @LysandreJik