RuntimeError: cuda runtime error (2) : out of memory #328
Comments
I have the same problem, did you solve it?
Did you try to train or test? 800x600 is more than 7 times larger than a 256x256 image (800x600/(256x256) = 7.32), so the memory requirement will be very high. One approach to save memory is to train on cropped images using --resize_or_crop resize_and_crop, and then generate the images at test time by loading only one generator network using --model test --resize_or_crop none. If it still runs into an out-of-memory error, you can try reducing the network size.
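For illustration, a minimal sketch of that two-step workflow as shell commands (the dataset path and experiment name are placeholders, not from this thread; the flags are the ones quoted above plus the repo's --loadSize/--fineSize options, with the 286/256 sizes mentioned later in this thread):

# train: resize to --loadSize, then learn on random --fineSize crops
python train.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan --resize_or_crop resize_and_crop --loadSize 286 --fineSize 256

# test: load only one generator and feed the full-size images untouched
python test.py --dataroot ./datasets/my_data --name my_experiment --model test --resize_or_crop none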
hello @taesung89

"Did you try to train or test?" I was training.

"800x600 is more than 7 times larger than 256x256 image (800x600/(256x256) = 7.32), so the memory requirement will be very high." Yes, this is another problem I have. I am already working on a server that has lots of GPUs, each one with 16 GB. When I choose a single GPU, it is allocated but not fully utilized: only 4 GB out of the 16 are used. Is there an idea of how to fully use the GPU? And accordingly, if I choose multiple GPUs, only one of them is allocated. You can refer to my issue here: #327

"One approach to save memory is to train on cropped images using --resize_or_crop resize_and_crop, and then generate the images at test time by loading only one generator network using --model test --resize_or_crop none. I think 800x600 can be dealt with this way." Exactly, this is what I did. I did a resize and crop (as in the original implementation), and then during testing I tested on the full image, and it worked.

One last question: what is your opinion concerning my case of training on rectangular 800x600 images? Do you think resizing to 286x286 and then cropping to 256x256 is a good idea, or should I skip the resizing and only crop square patches, or should I resize to smaller rectangular images instead of square ones and then do the random cropping?
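On the GPU-utilization point: in this codebase the devices are selected with --gpu_ids, and a single CycleGAN forward/backward pass at batch size 1 may simply not fill a 16 GB card; utilization usually only grows with a larger batch or image size. A hedged sketch, assuming the flag spellings used by this version of the repo (--gpu_ids, --batchSize):

# spread training over several GPUs and raise the batch size so they are actually used
python train.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan --gpu_ids 0,1,2,3 --batchSize 4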
Hello @taesung89, I also have a similar problem since there is not enough GPU memory for my 512 x 512 images. I did exactly what you suggested for the training (loadSize = 512, fineSize = 256) and the test (loadSize = 512, fineSize = 512). My question is a little different from the above. How does G work for a larger test image (512 x 512) even though it was trained on smaller ones (256 x 256 in this case)? Is any kind of upsampling technique involved at test time? I may be missing some parts. For now, the results from G look visually OK. Thank you in advance.
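No upsampling is involved: the generators in this repo are fully convolutional (convolutions, residual blocks and transposed convolutions only), so the same trained weights simply slide over a larger input and produce a correspondingly larger output. The settings described above would correspond roughly to these commands (dataset path and experiment name are placeholders):

# training: resize to 512, then learn on random 256x256 crops
python train.py --dataroot ./datasets/my_data --name my_512_experiment --model cycle_gan --resize_or_crop resize_and_crop --loadSize 512 --fineSize 256

# testing: run the trained generator directly on the full 512x512 images
python test.py --dataroot ./datasets/my_data --name my_512_experiment --model test --resize_or_crop none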
Hello,
when I change --resize_or_crop to none, I get the error below. My images are not that big, they are 800x600, and I am running on a 16 GB GPU.
create web directory ./checkpoints/carla2kitti_cyclegan/web...
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fd2831a5b90>> ignored
Traceback (most recent call last):
File "train.py", line 32, in
model.optimize_parameters()
File "/mnt/DTAA_data/DTAA/code/z637177/pytorch-CycleGAN-and-pix2pix-master/models/cycle_gan_model.py", line 138, in optimize_parameters
self.forward()
File "/mnt/DTAA_data/DTAA/code/z637177/pytorch-CycleGAN-and-pix2pix-master/models/cycle_gan_model.py", line 85, in forward
self.rec_B = self.netG_A(self.fake_A)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 112, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/mnt/DTAA_data/DTAA/code/z637177/pytorch-CycleGAN-and-pix2pix-master/models/networks.py", line 186, in forward
return self.model(input)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/modules/instancenorm.py", line 50, in forward
self.training or not self.track_running_stats, self.momentum, self.eps)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 1245, in instance_norm
eps=eps)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/onnx/init.py", line 57, in wrapper
return fn(*args, **kwargs)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 1233, in _instance_norm
training=use_input_stats, momentum=momentum, eps=eps)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 1194, in batch_norm
training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58