
nms error Expected object of scalar type Half but got scalar type Float for sequence elment 1 in sequence argument at position #1 'tensors' #430

Closed
cizhenshi opened this issue Aug 14, 2019 · 5 comments


@cizhenshi

I don't know why this error happens. Here is the traceback:

Traceback (most recent call last):
File "trainval_fp16.py", line 390, in <module>
rois_label = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
File "/home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/xfr/faster-rcnn.pytorch/lib/model/faster_rcnn/fpn_p.py", line 276, in forward
rois, rpn_loss_cls, rpn_loss_bbox = self.RCNN_rpn(rpn_feature_maps, im_info, gt_boxes, num_boxes)
File "/home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/xfr/faster-rcnn.pytorch/lib/model/rpn/rpn_fpn.py", line 100, in forward
im_info, cfg_key, rpn_shapes))
File "/home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/xfr/faster-rcnn.pytorch/lib/model/rpn/proposal_layer_fpn.py", line 118, in forward
keep_idx_i = nms(proposals_single, scores_single.squeeze(1), nms_thresh)
RuntimeError: Expected object of scalar type Half but got scalar type Float for sequence elment 1 in sequence argument at position #1 'tensors' (checked_tensor_list_unwrap at /pytorch/aten/src/ATen/Utils.h:91)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f7da5b8cfe1 in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f7da5b8cdfa in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: + 0x1332f09 (0x7f7d2b941f09 in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: at::CUDAHalfType::_th_cat(c10::ArrayRef<at::Tensor>, long) const + 0xac (0x7f7d2b94baac in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #4: at::native::cat(c10::ArrayRef<at::Tensor>, long) + 0xa4 (0x7f7d21926ae4 in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #5: at::TypeDefault::cat(c10::ArrayRef<at::Tensor>, long) const + 0x4f (0x7f7d21b0436f in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #6: torch::autograd::VariableType::cat(c10::ArrayRef<at::Tensor>, long) const + 0x1e8 (0x7f7d99116928 in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #7: nms(at::Tensor const&, at::Tensor const&, float) + 0xc5 (0x7f7cc5317585 in /home/xfr/faster-rcnn.pytorch/lib/model/_C.cpython-36m-x86_64-linux-gnu.so)
frame #8: + 0x28e87 (0x7f7cc5323e87 in /home/xfr/faster-rcnn.pytorch/lib/model/_C.cpython-36m-x86_64-linux-gnu.so)
frame #9: + 0x28f7e (0x7f7cc5323f7e in /home/xfr/faster-rcnn.pytorch/lib/model/_C.cpython-36m-x86_64-linux-gnu.so)
frame #10: + 0x25e65 (0x7f7cc5320e65 in /home/xfr/faster-rcnn.pytorch/lib/model/_C.cpython-36m-x86_64-linux-gnu.so)

frame #63: __libc_start_main + 0xf0 (0x7f7dafec7830 in /lib/x86_64-linux-gnu/libc.so.6)

@cizhenshi
Author

It seems that because the function is implemented in C++, the tensor dtypes must match exactly. I converted the fp16 tensors with .float() and it worked, but GPU memory usage rose from 7341 MB to 8553 MB. What should I do?

@cizhenshi
Author

I got it! For those extern functions such as NMS, ROI Align, and ROI Pooling, you should convert the data to float32 before computing; after computing, convert the result (and the data) back to float16. With that, GPU memory usage decreased to about 7000 MB.
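The cast-in/cast-out pattern described above can be sketched as a small wrapper. This is only an illustration: `fp32_op` is a hypothetical helper name standing in for calls to the repo's extension ops (nms, roi_align, roi_pool), which only accept float32 tensors. Non-floating outputs such as the index tensor returned by NMS are passed through unchanged, since indices carry no dtype to restore.

```python
import torch

def fp32_op(fn, *tensors, **kwargs):
    """Run an extension op that only supports float32 on fp16 inputs:
    upcast half-precision tensor arguments, call the op, then downcast
    any floating-point output back to half so the rest of the fp16
    pipeline is unaffected."""
    args = [t.float() if torch.is_tensor(t) and t.dtype == torch.half else t
            for t in tensors]
    out = fn(*args, **kwargs)
    if torch.is_tensor(out) and out.is_floating_point():
        out = out.half()  # cast back to fp16 to keep memory usage down
    return out
```

At the call site from the traceback, the fix would look like `keep_idx_i = fp32_op(nms, proposals_single, scores_single.squeeze(1), nms_thresh)`; since NMS returns indices, only the inputs are actually cast.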

@ShifuShen

I got it! For those extern functions such as NMS, ROI Align, and ROI Pooling, you should convert the data to float32 before computing; after computing, convert the result back to float16, so GPU memory decreases to about 7000 MB.

Hi, I got the same problem as you. After converting with .float() before the ROI Align function, I got the following error:
RuntimeError: Function _ROIAlignBackward returned an invalid gradient at index 0 - expected type torch.cuda.HalfTensor but got torch.cuda.FloatTensor
Could you please help me solve it?

@ShifuShen

I've solved this problem by modifying the ROIAlign backward function.
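The backward-side error above happens because autograd requires the gradient returned for an input to have that input's dtype. A minimal, self-contained sketch of the fix: wrap the float32-only op in a torch.autograd.Function whose forward remembers the input dtype and whose backward casts the gradient back to it. Here `torch.sigmoid` merely stands in for the float32-only ROI Align kernel; the actual modification would go in the repo's `_ROIAlignBackward`.

```python
import torch

class FP32Only(torch.autograd.Function):
    """Illustrates the dtype fix: compute in float32 internally, but return
    outputs and gradients in the caller's original (half) dtype so autograd's
    check 'expected torch.cuda.HalfTensor' passes."""

    @staticmethod
    def forward(ctx, x):
        ctx.in_dtype = x.dtype
        y32 = torch.sigmoid(x.float())   # stand-in for the fp32-only kernel
        ctx.save_for_backward(y32)
        return y32.to(ctx.in_dtype)      # hand fp16 back to the model

    @staticmethod
    def backward(ctx, grad_out):
        y32, = ctx.saved_tensors
        grad32 = grad_out.float() * y32 * (1 - y32)  # sigmoid grad in fp32
        return grad32.to(ctx.in_dtype)   # cast back: this line is the fix
```

Without the final `.to(ctx.in_dtype)`, backward would return a FloatTensor gradient for a HalfTensor input, reproducing the `_ROIAlignBackward returned an invalid gradient` error.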

@HaxThePlanet

For me the issue was not enough VRAM; I had a game open in the background.
