
nms error Expected object of scalar type Half but got scalar type Float for sequence elment 1 in sequence argument at position #1 'tensors' #430

Closed
cizhenshi opened this issue Aug 14, 2019 · 5 comments


@cizhenshi

I don't know why this error happens. Here is the traceback:

Traceback (most recent call last):
File "trainval_fp16.py", line 390, in <module>
rois_label = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
File "/home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/xfr/faster-rcnn.pytorch/lib/model/faster_rcnn/fpn_p.py", line 276, in forward
rois, rpn_loss_cls, rpn_loss_bbox = self.RCNN_rpn(rpn_feature_maps, im_info, gt_boxes, num_boxes)
File "/home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/xfr/faster-rcnn.pytorch/lib/model/rpn/rpn_fpn.py", line 100, in forward
im_info, cfg_key, rpn_shapes))
File "/home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/xfr/faster-rcnn.pytorch/lib/model/rpn/proposal_layer_fpn.py", line 118, in forward
keep_idx_i = nms(proposals_single, scores_single.squeeze(1), nms_thresh)
RuntimeError: Expected object of scalar type Half but got scalar type Float for sequence elment 1 in sequence argument at position #1 'tensors' (checked_tensor_list_unwrap at /pytorch/aten/src/ATen/Utils.h:91)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f7da5b8cfe1 in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f7da5b8cdfa in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: + 0x1332f09 (0x7f7d2b941f09 in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: at::CUDAHalfType::_th_cat(c10::ArrayRef<at::Tensor>, long) const + 0xac (0x7f7d2b94baac in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so)
frame #4: at::native::cat(c10::ArrayRef<at::Tensor>, long) + 0xa4 (0x7f7d21926ae4 in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #5: at::TypeDefault::cat(c10::ArrayRef<at::Tensor>, long) const + 0x4f (0x7f7d21b0436f in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #6: torch::autograd::VariableType::cat(c10::ArrayRef<at::Tensor>, long) const + 0x1e8 (0x7f7d99116928 in /home/xfr/home/anaconda3/envs/py3.6/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #7: nms(at::Tensor const&, at::Tensor const&, float) + 0xc5 (0x7f7cc5317585 in /home/xfr/faster-rcnn.pytorch/lib/model/_C.cpython-36m-x86_64-linux-gnu.so)
frame #8: + 0x28e87 (0x7f7cc5323e87 in /home/xfr/faster-rcnn.pytorch/lib/model/_C.cpython-36m-x86_64-linux-gnu.so)
frame #9: + 0x28f7e (0x7f7cc5323f7e in /home/xfr/faster-rcnn.pytorch/lib/model/_C.cpython-36m-x86_64-linux-gnu.so)
frame #10: + 0x25e65 (0x7f7cc5320e65 in /home/xfr/faster-rcnn.pytorch/lib/model/_C.cpython-36m-x86_64-linux-gnu.so)

frame #63: __libc_start_main + 0xf0 (0x7f7dafec7830 in /lib/x86_64-linux-gnu/libc.so.6)

@cizhenshi
Author

It seems that because the function is implemented in C++, the tensor dtypes must match exactly. I converted the fp16 tensors with .float() and it worked, but GPU memory usage rose from 7341 MB to 8553 MB. What should I do?

@cizhenshi
Author

I got it! For those extern functions such as NMS, ROI Align, and ROI Pooling, you should convert the data to float32 before computing; after computing, convert the result (and the data) back to float16. With that, GPU memory usage decreased to about 7000 MB.
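The cast-in/cast-out pattern described above can be sketched as a small wrapper. This is only an illustration: `fp32_op` is a hypothetical helper name standing in for calls to the repo's extension ops (nms, roi_align, roi_pool), which only accept float32 tensors. Non-floating outputs such as the index tensor returned by NMS are passed through unchanged, since indices carry no dtype to restore.

```python
import torch

def fp32_op(fn, *tensors, **kwargs):
    """Run an extension op that only supports float32 on fp16 inputs:
    upcast half-precision tensor arguments, call the op, then downcast
    any floating-point output back to half so the rest of the fp16
    pipeline is unaffected."""
    args = [t.float() if torch.is_tensor(t) and t.dtype == torch.half else t
            for t in tensors]
    out = fn(*args, **kwargs)
    if torch.is_tensor(out) and out.is_floating_point():
        out = out.half()  # cast back to fp16 to keep memory usage down
    return out
```

At the call site from the traceback, the fix would look like `keep_idx_i = fp32_op(nms, proposals_single, scores_single.squeeze(1), nms_thresh)`; since NMS returns indices, only the inputs are actually cast.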

@ShifuShen

I got it! For those extern functions such as NMS, ROI Align, and ROI Pooling, you should convert the data to float32 before computing; after computing, convert the result back to float16, so GPU memory decreases to about 7000 MB.

Hi, I got the same problem as you. After converting with .float() before the ROI Align function, I got the following error:
RuntimeError: Function _ROIAlignBackward returned an invalid gradient at index 0 - expected type torch.cuda.HalfTensor but got torch.cuda.FloatTensor
Could you please help me solve it?

@ShifuShen

I've solved this problem by modifying the ROIAlign backward function.
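The backward-side error above happens because autograd requires the gradient returned for an input to have that input's dtype. A minimal, self-contained sketch of the fix: wrap the float32-only op in a torch.autograd.Function whose forward remembers the input dtype and whose backward casts the gradient back to it. Here `torch.sigmoid` merely stands in for the float32-only ROI Align kernel; the actual modification would go in the repo's `_ROIAlignBackward`.

```python
import torch

class FP32Only(torch.autograd.Function):
    """Illustrates the dtype fix: compute in float32 internally, but return
    outputs and gradients in the caller's original (half) dtype so autograd's
    check 'expected torch.cuda.HalfTensor' passes."""

    @staticmethod
    def forward(ctx, x):
        ctx.in_dtype = x.dtype
        y32 = torch.sigmoid(x.float())   # stand-in for the fp32-only kernel
        ctx.save_for_backward(y32)
        return y32.to(ctx.in_dtype)      # hand fp16 back to the model

    @staticmethod
    def backward(ctx, grad_out):
        y32, = ctx.saved_tensors
        grad32 = grad_out.float() * y32 * (1 - y32)  # sigmoid grad in fp32
        return grad32.to(ctx.in_dtype)   # cast back: this line is the fix
```

Without the final `.to(ctx.in_dtype)`, backward would return a FloatTensor gradient for a HalfTensor input, reproducing the `_ROIAlignBackward returned an invalid gradient` error.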

@HaxThePlanet

For me the issue was not enough VRAM; I had a game open in the background.
