
How do I run inference with multiple models without maxing out my GPU VRAM? #55

Open
HSHallucinations opened this issue Nov 28, 2023 · 5 comments


HSHallucinations commented Nov 28, 2023

I'm trying to tag a dataset using more than one WD14 model, so I wrote a simple script that iterates over all the files in a directory for every model in a list, like this:

from imgutils.tagging import get_wd14_tags

model_list = ['SwinV2', 'ConvNextV2', 'MOAT', 'ViT']

for m in model_list:
    # directory is a pathlib.Path to the dataset root; thresh is the general tag threshold
    for child in directory.glob('**/*'):
        ratings, features, chars = get_wd14_tags(child, model_name=m, general_threshold=thresh)

My problem is that inference time increases a lot after every pass through model_list. On the first pass it takes ~0.15 seconds to extract the tags from each image, no matter how many images, but by the time I'm on the 4th pass it takes 15 seconds. However, if I run the script 4 times with a single model in the list, every model takes the same 0.15 seconds.

Running it with Task Manager open, I noticed that every time a new model is loaded, the dedicated VRAM used by Python increases by ~1.5 GB, so by the time I reach the third pass my poor 970 doesn't have any free memory left. I guess it starts spilling into system RAM, and that's why it slows down.

Is there a way to free the VRAM before loading a new model? I tried looking in the ONNX documentation but it's way above my level of understanding.

I'm running it on Windows 10 / onnxruntime-gpu / CUDA 11.8.

@narugo1992
Contributor

by the time I'm on the 4th pass it takes 15 seconds

Does this mean 15 seconds per image, or something else?

@narugo1992
Contributor

Running it with Task Manager open, I noticed that every time a new model is loaded, the dedicated VRAM used by Python increases by ~1.5 GB, so by the time I reach the third pass my poor 970 doesn't have any free memory left. I guess it starts spilling into system RAM, and that's why it slows down.

I remember the GeForce GTX 790 having at least 12 GB of VRAM, so this doesn't seem to be related to VRAM.

@HSHallucinations
Author

Yes, it's 15 seconds per image once I max out the VRAM. Unfortunately the GTX 970 has only 3.5 GB of usable VRAM; it's almost 10 years old at this point. Maybe you're thinking of some newer AMD card with a similar name.

@narugo1992
Contributor

Actually, we can release VRAM by clearing the cache. The source code is available here: https://github.com/deepghs/imgutils/blob/main/imgutils/tagging/wd14.py#L69

Here's how you can use it:

from imgutils.tagging.wd14 import _get_wd14_model

_get_wd14_model.cache_clear()

Once the cache is cleared, the previously loaded model will be released.
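
For example, applied to the loop from the original post (just a sketch; directory and thresh are assumed to be defined as in that script), you could clear the cache after finishing each model:

from imgutils.tagging import get_wd14_tags
from imgutils.tagging.wd14 import _get_wd14_model

model_list = ['SwinV2', 'ConvNextV2', 'MOAT', 'ViT']

for m in model_list:
    for child in directory.glob('**/*'):
        ratings, features, chars = get_wd14_tags(child, model_name=m, general_threshold=thresh)
    # release the cached ONNX session before the next model is loaded,
    # so only one model occupies VRAM at a time
    _get_wd14_model.cache_clear()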

However, this method is currently just a workaround. A more suitable approach would be for us to provide a complete VRAM management layer in the future. This part has already been added to the todo list.

@HSHallucinations
Author

Works perfectly for what I need to do. Thanks for the help, and also for writing this library; I spent months trying every commercial auto-tagging tool, but they were all too generic, while this does exactly what I wanted.
