out of memory #70

Closed
heiyuxiaokai opened this issue Jun 23, 2019 · 8 comments

Comments

@heiyuxiaokai

File "/home/fw/Softwares/FCOS/maskrcnn_benchmark/structures/boxlist_ops.py", line 84, in boxlist_iou
wh = (rb - lt + TO_REMOVE).clamp(min=0) # [N,M,2]
RuntimeError: CUDA out of memory. Tried to allocate 1.56 GiB (GPU 1; 11.92 GiB total capacity; 7.99 GiB already allocated; 1.20 GiB free; 1.74 GiB cached)

The problem seems to be in the IoU calculation. I am using RetinaNet with a batch size of 4 on 2 Titan X GPUs (12 GB each).
GPU usage at the beginning of training:
Screenshot from 2019-06-23 10-40-26
Should I set the batch size to 2?

@tianzhi0549
Owner

@heiyuxiaokai Did FCOS run out of memory?

@heiyuxiaokai
Author

@tianzhi0549 No. Maybe the IoU calculation for a particular image (one with many boxes) needs a lot of memory. FCOS does not have this step. Were the GPUs you trained this model on (4 GPUs, batch 8) 12 GB cards?
The data I use are remote sensing images, which may contain many objects.
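As a rough sanity check on the "many boxes" hypothesis, the size of the failed allocation can be turned into a number of box pairs. This is only a back-of-the-envelope sketch: it assumes the 1.56 GiB request is exactly one `[N, M, 2]` tensor of float32 values, and the anchor count used at the end is a hypothetical figure, not something reported in this thread.

```python
GIB = 2 ** 30  # bytes per GiB


def pairwise_tensor_bytes(n, m, bytes_per_value=4):
    """Bytes needed for one [N, M, 2] tensor (float32 assumed by default)."""
    return n * m * 2 * bytes_per_value


def pairs_for_allocation(gib, bytes_per_value=4):
    """Number of (N * M) box pairs that fit in an allocation of `gib` GiB."""
    return int(gib * GIB / (2 * bytes_per_value))


# Size reported in the error message above.
pairs = pairs_for_allocation(1.56)
print(f"{pairs:,} box pairs in one [N, M, 2] float32 tensor")

# Hypothetical anchor count for one image (an assumption, not from the thread).
assumed_anchors = 200_000
print(f"about {pairs // assumed_anchors} boxes against {assumed_anchors:,} anchors")
```

Under these assumptions the request corresponds to roughly 209 million box pairs, which is reached with about a thousand boxes on one side. The calculation creates several tensors of this shape (`lt`, `rb`, `wh`), so peak usage is a multiple of the single allocation.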

@tianzhi0549
Owner

@heiyuxiaokai Our GPUs are 32 GB V100s.

@heiyuxiaokai
Author

@tianzhi0549 So I should set the batch size to 2. You train with batch 8 on 4 GPUs (V100). Why don't you use a larger batch on 32 GB GPUs?

@tianzhi0549
Owner

@heiyuxiaokai We use 16 images in a mini-batch for a fair comparison.

@heiyuxiaokai
Author

The cause is too many GT boxes. It is explained here:
facebookresearch/maskrcnn-benchmark#18
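One general way to lower the peak memory of a pairwise IoU is to process one set of boxes in chunks, so that the `[N, M, 2]` intermediates only ever exist for a slice of `N`. The sketch below is not the fix from the linked issue and not code from this repository: `pairwise_iou` and `chunked_iou` are hypothetical helpers, and `TO_REMOVE = 1` is assumed to match the convention in `boxlist_ops.py`. Boxes are assumed to be in `(x1, y1, x2, y2)` format.

```python
import torch

TO_REMOVE = 1  # assumed value of the +1 pixel convention seen in the traceback


def box_area(boxes):
    """Area of [K, 4] boxes in (x1, y1, x2, y2) format."""
    w = boxes[:, 2] - boxes[:, 0] + TO_REMOVE
    h = boxes[:, 3] - boxes[:, 1] + TO_REMOVE
    return w * h


def pairwise_iou(a, b):
    """Full IoU between a [N, 4] and b [M, 4]; builds [N, M, 2] temporaries."""
    lt = torch.max(a[:, None, :2], b[None, :, :2])  # [N, M, 2]
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])  # [N, M, 2]
    wh = (rb - lt + TO_REMOVE).clamp(min=0)         # [N, M, 2]
    inter = wh[..., 0] * wh[..., 1]                 # [N, M]
    return inter / (box_area(a)[:, None] + box_area(b)[None, :] - inter)


def chunked_iou(a, b, chunk_size=256):
    """Same result as pairwise_iou, with temporaries limited to chunk_size rows."""
    out = a.new_empty((a.shape[0], b.shape[0]))
    for start in range(0, a.shape[0], chunk_size):
        end = start + chunk_size
        out[start:end] = pairwise_iou(a[start:end], b)
    return out
```

The `[N, M]` result is still allocated in full, so this only removes the larger temporaries. If the result itself does not fit, a smaller batch or computing the matching on the CPU would still be needed.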

@dreamhighchina

Did you solve this? I also get an error when computing the loss, and it fails even with a batch size of 2.

@heiyuxiaokai
Author
