It seems that because the function is written in C++, the variable types don't match. I converted the fp16 variable with .float() and it worked, but GPU memory went from 7341 MB to 8553 MB. What should I do?
I got it! For external functions such as NMS, ROI align, and ROI pooling, convert the data to float32 before the call, then convert the result and the data back to float16 afterwards. With that, GPU memory dropped to 7000 MB.
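For anyone landing here, the cast-in, cast-out pattern described above might look roughly like the sketch below. `call_in_fp32` and `fake_roi_op` are hypothetical names made up for illustration; `fake_roi_op` only stands in for a compiled op that accepts float32, it is not the repo's actual ROI align.

```python
import torch

def call_in_fp32(op, data, *args):
    """Run `op` on a float32 copy of `data`, then cast the result back.

    Hypothetical helper, not part of faster-rcnn.pytorch.
    """
    out = op(data.float(), *args)
    # Index outputs (e.g. from NMS) are integer tensors and are left alone.
    return out.to(data.dtype) if out.is_floating_point() else out

def fake_roi_op(x, scale):
    # Stand-in for a compiled extension that only accepts float32 input.
    assert x.dtype == torch.float32
    return x * scale

feats = torch.ones(2, 3).half()            # fp16 activations
pooled = call_in_fp32(fake_roi_op, feats, 2.0)
print(pooled.dtype)                        # the result is fp16 again
```

The float32 copy only lives for the duration of the call, which is probably why memory drops back once the result is converted to float16 again.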
Hi, I have the same problem. After adding .float() before the ROI align function, I got the following error:
RuntimeError: Function _ROIAlignBackward returned an invalid gradient at index 0 - expected type torch.cuda.HalfTensor but got torch.cuda.FloatTensor
Could you please help me to solve it?
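One possible cause, stated as an assumption since the calling code isn't shown: if the cast happens inside the custom autograd Function, autograd sees a Half input and expects a Half gradient back. Casting outside the Function lets autograd differentiate through `.float()` and convert the gradient back to fp16 on its own. A minimal sketch, with `FakeROIAlign` as a hypothetical stand-in for the compiled op:

```python
import torch

class FakeROIAlign(torch.autograd.Function):
    """Hypothetical stand-in for an op that works only in float32."""

    @staticmethod
    def forward(ctx, x):
        assert x.dtype == torch.float32
        return x * 2.0

    @staticmethod
    def backward(ctx, grad_out):
        # Gradient comes back as float32, matching the float32 input.
        return grad_out * 2.0

feats = torch.ones(2, 3).half().requires_grad_()   # fp16 leaf tensor

# Cast OUTSIDE the Function: the Function's input is float32, so its
# float32 gradient is valid; the .float() node casts it back to fp16.
out = FakeROIAlign.apply(feats.float())
out.sum().backward()

print(feats.grad.dtype)   # gradient on the fp16 tensor is fp16
```

If the output has to be fp16 for the following layers, calling `.half()` on `out` is differentiable in the same way.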
I don't know why this error occurs.
Traceback (most recent call last):
  File "trainval_fp16.py", line 390, in <module>
    rois_label = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
  File "/home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xfr/faster-rcnn.pytorch/lib/model/faster_rcnn/fpn_p.py", line 276, in forward
    rois, rpn_loss_cls, rpn_loss_bbox = self.RCNN_rpn(rpn_feature_maps, im_info, gt_boxes, num_boxes)
  File "/home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xfr/faster-rcnn.pytorch/lib/model/rpn/rpn_fpn.py", line 100, in forward
    im_info, cfg_key, rpn_shapes))
  File "/home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xfr/faster-rcnn.pytorch/lib/model/rpn/proposal_layer_fpn.py", line 118, in forward
    keep_idx_i = nms(proposals_single, scores_single.squeeze(1), nms_thresh)
RuntimeError: Expected object of scalar type Half but got scalar type Float for sequence elment 1 in sequence argument at position #1 'tensors' (checked_tensor_list_unwrap at /pytorch/aten/src/ATen/Utils.h:91)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f7da5b8cfe1 in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f7da5b8cdfa in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: + 0x1332f09 (0x7f7d2b941f09 in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: at::CUDAHalfType::_th_cat(c10::ArrayRefat::Tensor, long) const + 0xac (0x7f7d2b94baac in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #4: at::native::cat(c10::ArrayRefat::Tensor, long) + 0xa4 (0x7f7d21926ae4 in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #5: at::TypeDefault::cat(c10::ArrayRefat::Tensor, long) const + 0x4f (0x7f7d21b0436f in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #6: torch::autograd::VariableType::cat(c10::ArrayRefat::Tensor, long) const + 0x1e8 (0x7f7d99116928 in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #7: nms(at::Tensor const&, at::Tensor const&, float) + 0xc5 (0x7f7cc5317585 in /home/xfr/faster-rcnn.pytorch/lib/model/_C.cpython-36m-x86_64-linux-gnu.so)
frame #8: + 0x28e87 (0x7f7cc5323e87 in /home/xfr/faster-rcnn.pytorch/lib/model/_C.cpython-36m-x86_64-linux-gnu.so)
frame #9: + 0x28f7e (0x7f7cc5323f7e in /home/xfr/faster-rcnn.pytorch/lib/model/_C.cpython-36m-x86_64-linux-gnu.so)
frame #10: + 0x25e65 (0x7f7cc5320e65 in /home/xfr/faster-rcnn.pytorch/lib/model/_C.cpython-36m-x86_64-linux-gnu.so)
frame #63: __libc_start_main + 0xf0 (0x7f7dafec7830 in /lib/x86_64-linux-gnu/libc.so.6)
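Reading the trace above: the failure is in a `cat` call inside `nms`, which expected Half and found Float at sequence element 1. That suggests the two tensors passed to `nms` had different dtypes (boxes in fp16, scores in fp32). A minimal sketch of casting both to float32 before the call; `nms_fp32` and `fake_nms` are hypothetical names, and `fake_nms` only mimics the interface by ranking on score, it does no real suppression:

```python
import torch

def nms_fp32(nms_op, boxes, scores, thresh):
    """Call `nms_op` with both inputs in float32 (hypothetical helper)."""
    # The returned indices are integers, so nothing needs casting back.
    return nms_op(boxes.float(), scores.float(), thresh)

def fake_nms(boxes, scores, thresh):
    # Stand-in for the compiled nms: requires matching float32 inputs.
    assert boxes.dtype == scores.dtype == torch.float32
    return torch.argsort(scores, descending=True)

boxes = torch.tensor([[0, 0, 10, 10], [1, 1, 11, 11]]).half()  # fp16
scores = torch.tensor([0.3, 0.9])                              # fp32
keep = nms_fp32(fake_nms, boxes, scores, 0.7)
print(keep.tolist())
```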