
Expected tensor for argument #1 'input' to have the same type as tensor for argument #2 'rois'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor #832

Open
sarmientoj24 opened this issue May 12, 2020 · 7 comments

Comments

@sarmientoj24

I'm getting this error while training a Faster R-CNN. The training loop:

for epoch in range(num_epochs):
  model.train()
  i = 0
  for imgs, annotations in data_loader:
    i += 1
    total_processed += 1
    imgs = list(img.to(device) for img in imgs)
    annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
    loss_dict = model(imgs, annotations)
    losses = sum(loss for loss in loss_dict.values())

    optimizer.zero_grad()
    # With apex AMP, backward should go only through scale_loss;
    # do not also call losses.backward() directly.
    with amp.scale_loss(losses, optimizer) as scaled_loss:
      scaled_loss.backward()
    optimizer.step()

This raises a RuntimeError:

  warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-65-8fac2bdda8e5> in <module>()
     15     # imgs = torch.as_tensor(imgs, dtype=torch.float32)
     16     annotations = [{k: v.to(device) for k, v in t.items()} for t in annotations]
---> 17     loss_dict = model(new_imgs, annotations)
     18     losses = sum(loss for loss in loss_dict.values())
     19 

6 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
     69             features = OrderedDict([('0', features)])
     70         proposals, proposal_losses = self.rpn(images, features, targets)
---> 71         detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
     72         detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)
     73 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/roi_heads.py in forward(self, features, proposals, image_shapes, targets)
    752             matched_idxs = None
    753 
--> 754         box_features = self.box_roi_pool(features, proposals, image_shapes)
    755         box_features = self.box_head(box_features)
    756         class_logits, box_regression = self.box_predictor(box_features)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/torchvision/ops/poolers.py in forward(self, x, boxes, image_shapes)
    194                 output_size=self.output_size,
    195                 spatial_scale=scales[0],
--> 196                 sampling_ratio=self.sampling_ratio
    197             )
    198 

/usr/local/lib/python3.6/dist-packages/torchvision/ops/roi_align.py in roi_align(input, boxes, output_size, spatial_scale, sampling_ratio, aligned)
     43     return torch.ops.torchvision.roi_align(input, rois, spatial_scale,
     44                                            output_size[0], output_size[1],
---> 45                                            sampling_ratio, aligned)
     46 
     47 

RuntimeError: Expected tensor for argument #1 'input' to have the same type as tensor for argument #2 'rois'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor (while checking arguments for ROIAlign_forward_cuda)
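For anyone debugging this: the mismatch can be reproduced without a GPU. The sketch below uses plain tensor ops rather than the actual ROIAlign call, and the tensor names are illustrative stand-ins for the 'input' / 'rois' pair in the traceback. Mixing a float16 tensor with a float32 tensor in a single op raises a RuntimeError (the exact message varies by backend), and casting to a common dtype resolves it:

```python
import torch

# Hypothetical stand-ins: "features" plays the role of the half-precision
# 'input', "rois_like" the float32 'rois' from the error message.
features = torch.randn(4, 8, dtype=torch.float16)
rois_like = torch.randn(8, 8, dtype=torch.float32)

# Mixing dtypes in one op fails, like ROIAlign_forward_cuda does above.
try:
    features @ rois_like
except RuntimeError as err:
    print("RuntimeError:", err)

# Casting both operands to one dtype makes the op succeed.
out = features.float() @ rois_like
print(out.dtype)  # torch.float32
```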

Environment

CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: version 3.12.0

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: Tesla P100-PCIE-16GB
Nvidia driver version: 418.67
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip3] numpy==1.18.4
[pip3] torch==1.5.0+cu101
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.3.1
[pip3] torchvision==0.6.0+cu101
[conda] Could not collect

@oooolga

oooolga commented Jun 8, 2020

Did you get this error fixed? I am receiving the same runtime error. The model works perfectly when I run it on its own. I receive this runtime error only when I run it with another model simultaneously.

@sarmientoj24
Author

Not yet. No answer from the apex maintainers.

@sharat29ag

I am also getting the same error. Is the error fixed?

@yulijun1220

I came across the same problem. Is there any solution yet?

RuntimeError: Expected tensor for argument #1 'grad_output' to have the same type as tensor for argument #2 'weight'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor (while checking arguments for cudnn_convolution_backward_input)

@ucasyjz

ucasyjz commented Apr 19, 2022

I'm also running into this issue.

@iasonasxrist

Same problem here, but I fixed it by calling .float() on the tensor.
The error is about tensor.dtype: torch.float16 is the half-precision counterpart of torch.float32, and both arguments must have the same dtype.
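A minimal sketch of this workaround (the tensor names are illustrative, not from torchvision): upcast the half-precision tensor with .float() so both arguments share a dtype before the op that mixes them:

```python
import torch

# Illustrative tensors mirroring the 'input' / 'rois' pair in the error:
# a half-precision feature map and float32 boxes.
feat_half = torch.randn(1, 3, 8, 8, dtype=torch.float16)
boxes = torch.tensor([[0.0, 0.0, 4.0, 4.0]], dtype=torch.float32)

# .float() returns a float32 copy; after this, both dtypes agree and the
# HalfTensor/FloatTensor mismatch no longer occurs.
feat = feat_half.float()
print(feat.dtype, boxes.dtype)  # torch.float32 torch.float32
```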

@zhangisland

Wrapping the forward pass in with autocast(enabled=True): helped me.
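For reference, a sketch of the autocast approach this comment alludes to. It is shown with CPU autocast (bfloat16) only so it runs without CUDA; on GPU you would use torch.cuda.amp.autocast together with a GradScaler, and the model/data here are placeholders, not the Faster R-CNN setup from the issue:

```python
import torch

model = torch.nn.Linear(8, 2)
x = torch.randn(4, 8)

# Inside the autocast region, eligible ops run in lower precision, so
# mixed-dtype inputs are handled for you instead of raising a RuntimeError.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)
print(out.dtype)  # lower precision inside the region

# Cast back before mixing with float32 tensors outside the region.
out32 = out.float()
print(out32.dtype)  # torch.float32
```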


7 participants