reclaiming memory for inference #897 (Open)

stas00 opened this issue Mar 27, 2021 · 4 comments
stas00 (Collaborator) commented Mar 27, 2021

While #896 solves the leak problem, ideally we should also have a new method to free all optimizer/scheduler-related parts to pave the way for inference. In some environments, like Google Colab, general RAM is very scarce, so every bit counts.

Here is one way to approach this:

engine, optimizer, scheduler = deepspeed.initialize(...)
# do the training, and then before inference do:
engine.free_optimizer_and_scheduler()
optimizer = None
scheduler = None
# it's then the user's responsibility to make sure they hold no remaining
# references to the optimizer/scheduler objects so that they can actually be freed

with a new deepspeed method:

def free_optimizer_and_scheduler(self):
    # break the cross-references first (scheduler -> optimizer, wrapped
    # optimizer -> inner torch optimizer), then drop the engine's own references
    self.lr_scheduler.optimizer = None
    self.optimizer.optimizer = None
    self.lr_scheduler = None
    self.optimizer = None

That way, after training is done, the lion's share of the general RAM used by DeepSpeed is reclaimed. There are probably other bits to clean up manually to reclaim even more.
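For completeness, a rough sketch of what a user could do on top of that method to squeeze out as much memory as possible (reclaim_memory is a hypothetical helper, not an existing DeepSpeed API; gc.collect() targets the general/CPU RAM that matters on Colab, while empty_cache() only helps on the GPU side):

import gc
import torch

def reclaim_memory(engine):
    # hypothetical helper: drop the training-only refs, then ask Python
    # and CUDA to actually give the memory back
    engine.free_optimizer_and_scheduler()  # the method proposed above
    gc.collect()                           # collect lingering reference cycles (CPU RAM)
    if torch.cuda.is_available():
        torch.cuda.empty_cache()           # release cached CUDA blocks back to the driver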

Let me know if it sounds good to you and I will make another PR with this feature. We can extend it in the future, if need be, to support other things that benefit inference.

Thank you.

@jeffra, @RezaYazdaniAminabadi

RezaYazdaniAminabadi (Contributor) commented
I think the things you mentioned make sense. We also need to make sure such freeing (or not allocating the memory for those training-related parts in the first place) happens automatically when we are in inference mode. I mean that the user shouldn't need to specifically call a function like free_optimizer_and_scheduler to free that memory, but should have an easy way of switching modes, like eval mode in PyTorch.

stas00 (Collaborator, Author) commented Mar 29, 2021

I agree! That would be nice indeed.

But torch's model.eval()/train() just turns some flags on and off; how would you deal with the user switching back from eval to train in DeepSpeed? Would you save the config and simply re-init the parts that were freed for eval?

RezaYazdaniAminabadi (Contributor) commented

Yes, that can be a viable option, as we eventually have to control the checkpointing through DeepSpeed if we want to seamlessly switch between these two modes. I think we can hide all the operations necessary for switching between inference and training modes, so that to the user it still feels like switching a flag on and off.
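For illustration only, a rough sketch of what such a flag-like switch could look like (the eval()/train() overrides and the _reinit_optimizer_and_scheduler helper here are assumptions for the sake of the example, not the actual DeepSpeed API):

import torch

class DeepSpeedEngine(torch.nn.Module):
    ...
    def eval(self):
        # entering inference mode: free the training-only state;
        # the config needed to rebuild it is kept on the engine
        if self.optimizer is not None:
            self.free_optimizer_and_scheduler()
        return super().eval()

    def train(self, mode=True):
        # switching back to training: rebuild what eval() freed
        if mode and self.optimizer is None:
            self._reinit_optimizer_and_scheduler()  # hypothetical: re-runs the optimizer/scheduler setup from the saved config
        return super().train(mode)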

stas00 (Collaborator, Author) commented Mar 29, 2021

yes, please!
