Because the operator functions are stored as dynamic attributes, the following error arises during the forward pass when I wrap the model in nn.DataParallel:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-10-f58217f83c02> in <module>()
28 shapedata=shapedata,
29 metadata_dir=checkpoint_path, samples_dir=samples_path,
---> 30 checkpoint_path = args['checkpoint_file'])
/mnt/Data2/jingwang/git-task/Neural3DMM/train_funcs.py in train_autoencoder_dataloader(dataloader_train, dataloader_val, device, model, optim, loss_fn, bsize, start_epoch, n_epochs, eval_freq, scheduler, writer, save_recons, shapedata, metadata_dir, samples_dir, checkpoint_path)
25 cur_bsize = tx.shape[0]
26
---> 27 tx_hat = model(tx)
28 loss = loss_fn(tx, tx_hat)
29
${HOME}/Data/anaconda3/envs/py2-spiral/lib/python2.7/site-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)
${HOME}/Data/anaconda3/envs/py2-spiral/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.pyc in forward(self, *inputs, **kwargs)
150 return self.module(*inputs[0], **kwargs[0])
151 replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
--> 152 outputs = self.parallel_apply(replicas, inputs, kwargs)
153 return self.gather(outputs, self.output_device)
154
${HOME}/Data/anaconda3/envs/py2-spiral/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.pyc in parallel_apply(self, replicas, inputs, kwargs)
160
161 def parallel_apply(self, replicas, inputs, kwargs):
--> 162 return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
163
164 def gather(self, outputs, output_device):
${HOME}/Data/anaconda3/envs/py2-spiral/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.pyc in parallel_apply(modules, inputs, kwargs_tup, devices)
81 output = results[i]
82 if isinstance(output, Exception):
---> 83 raise output
84 outputs.append(output)
85 return outputs
RuntimeError: binary_op(): expected both inputs to be on same device, but input a is on cuda:1 and input b is on cuda:0
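For what it's worth, here is a minimal toy reproduction of what I mean by "dynamic attributes". The names (ToyModel, op) are hypothetical and not from the Neural3DMM code; the point is only that a tensor stored as a plain Python attribute stays on cuda:0 when DataParallel replicates the module:

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()
        # plain attribute: not a Parameter and not a registered buffer,
        # so DataParallel.replicate() will NOT copy it to the other GPUs
        self.op = torch.ones(8).cuda(0)

    def forward(self, x):
        # on the cuda:1 replica, x lives on cuda:1 but self.op is still on cuda:0
        return x * self.op

model = nn.DataParallel(ToyModel().cuda(), device_ids=[0, 1])
x = torch.randn(4, 8).cuda()
out = model(x)  # raises the same "expected both inputs to be on same device" error
```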
It seems this behavior is explained in pytorch/pytorch#8637.
I wonder if there is any way to work around this in PyTorch so that DataParallel can still be used?
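If I understand that issue correctly, one possible workaround is to register the operator tensors as buffers (so DataParallel broadcasts them to each replica's device, like parameters), or to move them to the input's device inside forward. A minimal sketch, again with hypothetical names (ToyModelFixed, op) rather than the actual Neural3DMM classes:

```python
import torch
import torch.nn as nn

class ToyModelFixed(nn.Module):
    def __init__(self):
        super(ToyModelFixed, self).__init__()
        # registered as a buffer: DataParallel.replicate() broadcasts buffers
        # (like parameters) to every replica's GPU
        self.register_buffer('op', torch.ones(8))

    def forward(self, x):
        # alternative, e.g. if the tensor cannot be a buffer (such as sparse
        # matrices on older PyTorch): move it to the input's device each call
        # op = self.op.to(x.device)
        return x * self.op

model = nn.DataParallel(ToyModelFixed().cuda(), device_ids=[0, 1])
x = torch.randn(4, 8).cuda()
out = model(x)  # each replica now multiplies by its own on-device copy of 'op'
```

I have not verified this against the actual spiral operators in this repository, so I may be missing a reason why they cannot simply be registered as buffers.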