CUDA out of memory #52

Open
MrChenNX opened this issue Feb 29, 2024 · 1 comment

Comments

MrChenNX commented Feb 29, 2024

Hi, I get the following error even after setting samples_per_gpu to 1 (the smallest possible batch size, I think):

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 690.00 MiB (GPU 0; 39.59 GiB total capacity; 973.79 MiB already allocated; 474.62 MiB free; 1.72 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

Does this only work on GPUs with 80 GB of memory, or is there anything else I need to take care of? Thanks in advance!

@raoyongming
Owner

Hi, the message here (GPU 0; 39.59 GiB total capacity; 973.79 MiB already allocated; 474.62 MiB free; 1.72 GiB reserved in total by PyTorch) suggests that other processes, programs, or users are occupying most of the GPU memory. All of our models can be trained on 40 GB A100 GPUs.
