reclaiming memory for inference #897
Comments
I think the things you mentioned make sense. We also need to make sure that freeing (or not allocating) the memory for those training-related parts happens automatically when we are in inference mode. I mean that the user shouldn't need to specifically call a function like free_optimizer_and_scheduler to free that memory, but should have an easy way of switching modes, like eval mode in PyTorch.
I agree! That would be nice indeed. But torch's …
Yes, that can be a viable option, as we eventually have to control the checkpointing through deepspeed if we want to switch seamlessly between these two modes. I think we can hide all the operations needed to switch between inference and training modes, so that to the user it still feels like flipping a flag on and off.
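To make that concrete, here is a rough sketch of what such a switch could look like from the user's side. This is purely illustrative and not an existing DeepSpeed API; it just mirrors PyTorch's `train()`/`eval()` convention, and the memory behavior described in the comments is the proposal, not what the engine does today.

```python
# Purely illustrative sketch -- the memory-freeing behavior noted next to each
# call is the proposed behavior, not current DeepSpeed behavior.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}
engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# ... training loop ...

engine.eval()   # hypothetically: would also free/offload optimizer & scheduler state
# ... run inference on the slimmed-down engine ...
engine.train()  # hypothetically: would reallocate/restore the training state
```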
yes, please!
While #896 solves the leak problem, ideally we should also have a new method that frees all optimizer/scheduler-related parts to pave the way for inference. In some environments like Google Colab, general RAM is very scarce, so every bit counts.
Here is one way to approach this:
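For example, at the call site once training has finished (a rough sketch; the method name is the one proposed below and does not exist yet):

```python
import gc

# engine is the DeepSpeedEngine returned by deepspeed.initialize()
# ... training is done, only inference from here on ...
engine.free_optimizer_and_scheduler()  # proposed new method, sketched below
gc.collect()  # encourage Python to return the now-unreferenced objects' memory
```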
with a new deepspeed method:
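A minimal sketch of what that method could look like, assuming the engine keeps its training-only state in its `optimizer` and `lr_scheduler` attributes; the exact set of things worth releasing would need to be checked against the engine code:

```python
# Hypothetical new method for DeepSpeedEngine -- a sketch, not a final implementation.
def free_optimizer_and_scheduler(self):
    """Drop references to training-only state so its memory can be reclaimed
    before running inference."""
    self.optimizer = None
    self.lr_scheduler = None
```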
That way, after training is done, the lion's share of the general RAM used by deepspeed is reclaimed. There are probably other bits to clean up manually to reclaim even more.
Let me know if this sounds good to you and I will make another PR with this feature. We can extend it in the future, if need be, to cover other things that benefit inference.
Thank you.
@jeffra, @RezaYazdaniAminabadi