How do I run inference with multiple models without maxing out my GPU VRAM? #55
Comments
Does this mean 15 seconds per image, or something else?
I remember the GeForce GTX 790 having at least 12 GB of VRAM, so it doesn't seem to be related to VRAM.
Yes, it's 15 seconds per image once I max out the VRAM. Unfortunately the GTX 970 has only 3.5 GB of VRAM; it's almost 10 years old at this point. Maybe you're thinking of some newer AMD card with a similar name.
Actually, we can release VRAM by clearing the cache. The source code is available here: https://github.com/deepghs/imgutils/blob/main/imgutils/tagging/wd14.py#L69

Here's how you can use it:

```python
from imgutils.tagging.wd14 import _get_wd14_model

_get_wd14_model.cache_clear()
```

Once the cache is cleared, the previously loaded model will be released. However, this method is currently just a workaround; a more suitable approach would be for us to provide a complete VRAM management layer in the future. This part has already been added to the todo list.
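For the multi-model loop described in this issue, the call could be slotted in roughly like the sketch below. Only `_get_wd14_model.cache_clear()` is the workaround itself; the `get_wd14_tags` call, the model name strings, and the paths are placeholders/assumptions about the imgutils API.

```python
from pathlib import Path

from imgutils.tagging import get_wd14_tags
from imgutils.tagging.wd14 import _get_wd14_model

model_list = ['SwinV2', 'ConvNext', 'ConvNextV2']  # placeholder model names
image_paths = list(Path('dataset').glob('*.jpg'))  # placeholder image folder

for model_name in model_list:
    for image_path in image_paths:
        # Tag one image with the current model (return shape assumed from imgutils' docs).
        rating, features, chars = get_wd14_tags(image_path, model_name=model_name)
        # ... store the tags for this image/model pair ...
    # Drop the cached ONNX session before switching models,
    # so VRAM usage doesn't keep growing with every model in the list.
    _get_wd14_model.cache_clear()
```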
Works perfectly for what I need to do. Thanks for the help, and also for writing this library. I spent months trying every commercial auto-tagging tool, but they were all too generic, while this does exactly what I wanted.
I'm trying to tag a dataset using more than one WD14 model, so I wrote a simple script that iterates over all the files in a directory for every model in a list, roughly like the sketch below.
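(The sketch is a reconstruction rather than the original snippet; the `get_wd14_tags` call and the model name strings are assumptions about the imgutils API.)

```python
from pathlib import Path

from imgutils.tagging import get_wd14_tags

model_list = ['SwinV2', 'ConvNext', 'ConvNextV2', 'ViT']  # placeholder model names
image_dir = Path('dataset')                               # placeholder image folder

for model_name in model_list:
    for image_path in image_dir.glob('*.jpg'):
        # Extract tags for this image with the current model.
        rating, features, chars = get_wd14_tags(image_path, model_name=model_name)
        # ... save the tags for this image/model pair ...
```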
My problem is that after every loop over model_list, inference time increases a lot. On the first loop it takes ~0.15 seconds to extract the tags from each image, no matter how many images, but by the time I'm on the 4th loop it takes 15 seconds. However, if I run the script 4 times with a single model in the list, every model takes the same 0.15 seconds.
Running it with Task Manager open, I noticed that every time a new model is loaded, the dedicated VRAM used by Python increases by ~1.5 GB, so by the third loop my poor 970 doesn't have any free memory left. I guess it starts using system RAM, and that's why it slows down.
Is there a way to free the VRAM before loading a new model? I tried looking in the ONNX documentation but it's way above my level of understanding.
I'm running it on Win10 / onnxruntime-gpu / CUDA 11.8.