OSError: [Errno 12] Cannot allocate memory #5
Comments
Yeah, exactly!
Does this give us any information as to where we might be going wrong? Is there anything I can change myself (given that I have root permission) that would help me prevent this issue?
Have you fixed that? I am facing the same issue.
I am also running into the same problem, although I am running everything on a CPU. I have more than enough memory (the error occurs when I'm using only 10 GB out of 32 GB).
@penguinshin I fixed this bug by adding 64 GB of swap memory. When the data loader forks workers to load data, memory usage increases rapidly. You can also try setting a smaller num_workers for the DataLoader.
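A minimal sketch of that mitigation, assuming a generic PyTorch setup (the dataset, batch size, and worker count below are placeholders, not this repo's code):

```python
# Minimal sketch: lowering num_workers reduces the number of forked loader
# processes and therefore the peak host memory used while loading batches.
# The dataset and the numbers below are placeholders, not this repository's code.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 3, 32, 32),
                        torch.randint(0, 10, (1024,)))

loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=0,    # start with 0 (load in the main process); raise gradually
    pin_memory=False, # pinned host memory also adds to RAM pressure
)

for images, labels in loader:
    pass  # training step would go here
```

Because each worker is forked from the parent process, pages that the parent already holds are effectively duplicated once Python touches them, so even a handful of workers can multiply peak RAM.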
Hi! As @ZhuFengdaaa confirms, it seems to be a peak-memory problem, although I am not able to reproduce it. Again, as @ZhuFengdaaa suggests, this seems to be linked to the number of worker threads (also see https://discuss.pytorch.org/t/guidelines-for-assigning-num-workers-to-dataloader/813/6).
Another related thread: ruotianluo/self-critical.pytorch#11
What does this have to do with permissions? And what should I do about permissions?
**Fixed this** by allocating 4 GB of swap memory. You can try allocating more if 4 GB does not suffice.
Fixed the problem by allocating 64 GB of swap memory on an external disk.
Why would swap be needed? It slows everything down.
The fix is almost always:
This problem comes from CPU memory allocation. Check your CPU RAM.
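One quick way to check that, and to verify that any swap you added is actually visible to the system, is a short script with the third-party psutil package (an assumption here, not something this repo ships):

```python
# Minimal sketch using the third-party `psutil` package (an assumption,
# not part of this repository) to report host RAM and swap headroom.
import psutil

ram = psutil.virtual_memory()
swap = psutil.swap_memory()

# Report available physical memory and swap in GiB so you can see how much
# headroom the DataLoader workers have before they fork.
print(f"RAM available: {ram.available / 2**30:.1f} GiB of {ram.total / 2**30:.1f} GiB")
print(f"Swap free:     {swap.free / 2**30:.1f} GiB of {swap.total / 2**30:.1f} GiB")
```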
Hello,
I am getting a "Cannot allocate memory" error; I understand this is something related to my GPU. But it is quite surprising that I should get this error, because I am training on three 1080 Ti GPUs with a batch size of 64.
CUDA_VISIBLE_DEVICES=0,1,2 python train.py ~/DATASETS/cifar.python cifar10 -s ./snapshots --log ./logs --ngpu 3 --learning_rate 0.05 -b 64
Please suggest what I could do to avoid this issue.
Thank you!