CUDA out of memory #52

MrChenNX · 2024-02-29T21:54:17Z

Hi, I get the following error even after setting samples_per_gpu to 1 (the smallest batch size, I think):

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 690.00 MiB (GPU 0; 39.59 GiB total capacity; 973.79 MiB already allocated; 474.62 MiB free; 1.72 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

Does it only work on GPUs with 80GB of memory, or is there anything else I need to take care of? Thanks in advance!

raoyongming · 2024-03-01T13:22:51Z

Hi, the numbers in the message (GPU 0; 39.59 GiB total capacity; 973.79 MiB already allocated; 474.62 MiB free; 1.72 GiB reserved in total by PyTorch) suggest that other processes, programs, or users are occupying most of the memory: PyTorch has reserved only 1.72 GiB of the 39.59 GiB, yet just 474.62 MiB is free. All of our models can be trained on 40GB A100 GPUs.
