RuntimeError: cuda runtime error (2) : out of memory #328
Comments
I have the same problem, did you solve it?
Did you try to train or test? 800x600 is more than 7 times larger than a 256x256 image (800x600/(256x256) = 7.32), so the memory requirement will be very high. One approach to save memory is to train on cropped images using --resize_or_crop resize_and_crop, and then generate the images at test time by loading only one generator network using --model test --resize_or_crop none. If it still runs into an out-of-memory error, you can try reducing the network size.
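For illustration, a minimal sketch of that two-step workflow as shell commands (the dataset path and experiment name are placeholders, not from this thread; the flags are the ones quoted above plus the repo's --loadSize/--fineSize options, with the 286/256 sizes mentioned later in this thread):

# train: resize to --loadSize, then learn on random --fineSize crops
python train.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan --resize_or_crop resize_and_crop --loadSize 286 --fineSize 256

# test: load only one generator and feed the full-size images untouched
python test.py --dataroot ./datasets/my_data --name my_experiment --model test --resize_or_crop none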
hello @taesung89

"Did you try to train or test?" I was training.

"800x600 is more than 7 times larger than 256x256 image (800x600/(256x256) = 7.32), so the memory requirement will be very high." Yes, this is another problem I have. I am already working on a server that has lots of GPUs, each one with 16 GB. When I choose a single GPU, it is allocated but not fully utilized: only 4 GB out of the 16 are used. Is there an idea of how to fully use the GPU? And accordingly, if I choose multiple GPUs, only one of them is allocated. You can refer to my issue here: #327

"One approach to save memory is to train on cropped images using --resize_or_crop resize_and_crop, and then generate the images at test time by loading only one generator network using --model test --resize_or_crop none. I think 800x600 can be dealt with this way." Exactly, this is what I did. I did a resize and crop (as in the original implementation), and then during testing I tested on the full image, and it worked.

One last question: what is your opinion concerning my case of training on rectangular 800x600 images? Do you think resizing to 286x286 and then cropping to 256x256 is a good idea, or should I skip the resizing and only crop square patches, or should I resize to smaller rectangular images instead of square ones and then do the random cropping?
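On the GPU-utilization point: in this codebase the devices are selected with --gpu_ids, and a single CycleGAN forward/backward pass at batch size 1 may simply not fill a 16 GB card; utilization usually only grows with a larger batch or image size. A hedged sketch, assuming the flag spellings used by this version of the repo (--gpu_ids, --batchSize):

# spread training over several GPUs and raise the batch size so they are actually used
python train.py --dataroot ./datasets/my_data --name my_experiment --model cycle_gan --gpu_ids 0,1,2,3 --batchSize 4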
Hello @taesung89, I also have a similar problem since there is not enough GPU memory for my 512 x 512 images. I did exactly what you suggested for the training (loadSize = 512, fineSize = 256) and the test (loadSize = 512, fineSize = 512). My question is a little different from the above. How does G work for a larger test image (512 x 512) even though it was trained on smaller ones (256 x 256 in this case)? Is any kind of upsampling technique involved at test time? I may be missing some parts. For now, the results from G look visually OK. Thank you in advance.
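No upsampling is involved: the generators in this repo are fully convolutional (convolutions, residual blocks and transposed convolutions only), so the same trained weights simply slide over a larger input and produce a correspondingly larger output. The settings described above would correspond roughly to these commands (dataset path and experiment name are placeholders):

# training: resize to 512, then learn on random 256x256 crops
python train.py --dataroot ./datasets/my_data --name my_512_experiment --model cycle_gan --resize_or_crop resize_and_crop --loadSize 512 --fineSize 256

# testing: run the trained generator directly on the full 512x512 images
python test.py --dataroot ./datasets/my_data --name my_512_experiment --model test --resize_or_crop none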
Hello,
when I change --resize_or_crop to none, I get the error below. My images are not that big, they are 800x600, and I am running on a 16 GB GPU.
create web directory ./checkpoints/carla2kitti_cyclegan/web...
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fd2831a5b90>> ignored
Traceback (most recent call last):
File "train.py", line 32, in
model.optimize_parameters()
File "/mnt/DTAA_data/DTAA/code/z637177/pytorch-CycleGAN-and-pix2pix-master/models/cycle_gan_model.py", line 138, in optimize_parameters
self.forward()
File "/mnt/DTAA_data/DTAA/code/z637177/pytorch-CycleGAN-and-pix2pix-master/models/cycle_gan_model.py", line 85, in forward
self.rec_B = self.netG_A(self.fake_A)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 112, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/mnt/DTAA_data/DTAA/code/z637177/pytorch-CycleGAN-and-pix2pix-master/models/networks.py", line 186, in forward
return self.model(input)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/modules/instancenorm.py", line 50, in forward
self.training or not self.track_running_stats, self.momentum, self.eps)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 1245, in instance_norm
eps=eps)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/onnx/init.py", line 57, in wrapper
return fn(*args, **kwargs)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 1233, in _instance_norm
training=use_input_stats, momentum=momentum, eps=eps)
File "/home/adm.Z637177/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 1194, in batch_norm
training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58