
Unload and reload models on request #471

Merged: 4 commits into oobabooga:main on Mar 27, 2023
Conversation

@Brawlence (Contributor) commented Mar 21, 2023

An important step towards running different neural networks in parallel on the same GPU.

The core idea and the use case are simple: when oobabooga is used alongside other memory hogs like Stable Diffusion (sd-api-pictures extension) or Tortoise-TTS (not yet implemented), this simple unload function leaves a lot more video memory for those other neural networks to work with. Once they finish their jobs, the LLM can be loaded back into VRAM.
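A minimal sketch of what such an unload function amounts to in PyTorch terms (the `model`/`tokenizer` globals here are illustrative stand-ins for wherever the webui actually keeps its handles, not the PR's exact code):

```python
import gc

import torch

# Illustrative stand-ins for the webui's loaded model/tokenizer handles.
model = None
tokenizer = None

def unload_model():
    """Drop the weights so other GPU programs can claim the VRAM."""
    global model, tokenizer
    model = None
    tokenizer = None
    gc.collect()              # release the Python-side references
    torch.cuda.empty_cache()  # hand the freed blocks back to the driver
    print("Model weights unloaded.")
```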

This is the first of several possible improvements to the memory handling discussed in #309.


Tested on my machine: unloading Pyg-2.7B-8bit is almost instant, and loading it back (from the RAM cache) takes ~7 seconds, which I consider an acceptable delay compared to the image generation itself.

Pyg-6B-8bit is a bit slower but still tolerable.

(Attachments: MemGraph and ConsoleLog screenshots)

Brawlence marked this pull request as ready for review on March 21, 2023, 13:29
@mastoca commented Mar 21, 2023

I like the reloading idea, as I've been switching to another model and then returning to the updated model.

I'm not sure what the app state would be in an 'unloaded' condition; perhaps we just need the reload implementation?

@Brawlence (Contributor, Author) commented

Well, the state after unloading the checkpoint would be undetermined. One won't be able to generate a response, yet the resulting error is not fatal, and one can resume the chat text generation once the model is loaded back in; that much I tested.
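Roughly, the guard amounts to something like this (a sketch only; `generate_reply` stands in for the actual generation entry point):

```python
def generate_reply(prompt):
    # Soft-fail in the 'unloaded' state instead of crashing, so the chat
    # can resume once the model is loaded back in.
    if model is None:
        return "No model is loaded! Load one before generating."
    return model.generate(prompt)  # stands in for the real generation path
```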


Now shows a message in the console when unloading weights. Also, reload_model() calls unload_model() first to free the memory, so that multiple reloads won't overfill it.
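Continuing the sketch from the PR description, `reload_model()` is essentially an unload followed by a fresh load (`load_model` and `shared.model_name` are assumptions about the surrounding code, not the exact API):

```python
def reload_model():
    """Reload the current model, unloading the old weights first so
    repeated reloads don't pile up in VRAM."""
    global model, tokenizer
    unload_model()                                    # free memory first
    print(f"Reloading {shared.model_name}...")
    model, tokenizer = load_model(shared.model_name)  # assumed loader
```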
@oobabooga (Owner) commented

In the latest gradio version, there is now a circle icon in dropdown menus that unselects the currently selected option. I have modified the PR to use this button to unload the model from memory.

(Screenshot: the circle icon in the model dropdown)

Your buttons were more functional because they allowed the very same model to be reloaded without having to locate it in the dropdown list, but I found that they occupied a lot of space for what is a very niche feature. It should still be possible to create unload/reload buttons inside an extension.
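For reference, a hedged sketch of how a cleared dropdown can drive the unload (gradio delivers `None` when the selection is cleared; the widget names here are illustrative, not the PR's actual code):

```python
import gradio as gr

def handle_model_selection(model_name):
    if model_name is None:  # the circle icon cleared the selection
        unload_model()
        return "Model unloaded from memory."
    return f"Loading {model_name}..."

with gr.Blocks() as demo:
    model_menu = gr.Dropdown(["Pyg-2.7B-8bit", "Pyg-6B-8bit"], label="Model")
    status = gr.Textbox(label="Status")
    model_menu.change(handle_model_selection, model_menu, status)
```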

oobabooga merged commit af603a1 into oobabooga:main on Mar 27, 2023
@Brawlence (Contributor, Author) commented

That's a nice way to save space!
Though I'll still need the reload_model() function in server.py, as it would be called by an extension that is trying to manage VRAM. I'll introduce it as part of the sd-api-pictures update; it will make more sense in that context.
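As a rough illustration of that flow inside the extension (`call_sd_api` is a hypothetical stand-in for the actual request to the SD web UI):

```python
def generate_picture(prompt):
    unload_model()               # free VRAM for Stable Diffusion
    image = call_sd_api(prompt)  # hypothetical call to the SD web UI API
    reload_model()               # bring the LLM back once the image is done
    return image
```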

Ph0rk0z pushed a commit to Ph0rk0z/text-generation-webui-testing that referenced this pull request on Apr 17, 2023
@catboxanon commented
The unload_model and newly added reload_model functions should be added as endpoints to the API extension; I don't think their scope should be limited to developing extensions from within this UI only. The SD web UI exposing endpoints of its own is the only reason the SD API extension is possible in the first place.
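A sketch of what such endpoints might look like (FastAPI and the route paths here are assumptions for illustration, not the actual API extension):

```python
from fastapi import FastAPI

app = FastAPI()

@app.post("/api/v1/model/unload")
def api_unload_model():
    unload_model()  # same function sketched above
    return {"status": "unloaded"}

@app.post("/api/v1/model/reload")
def api_reload_model():
    reload_model()
    return {"status": "reloaded"}
```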
