RuntimeError using multiple GPUs #11
Hello, sorry that I missed this issue. I am not an expert in DataParallel, but I think the reason is that I have used the "batch dimension" for simulating time. Since there are operations over the time dimension, the computation cannot be split across multiple GPUs. I will try to explore the problem and find a fix, but I guess it won't be an easy one! Sorry again, and thank you for reporting this important issue.
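For context, a minimal sketch (an illustration, not from the thread) of why this breaks: `nn.DataParallel` scatters its input along dim 0 across the GPUs, and SpykeTorch uses dim 0 to carry the simulation time steps of a single sample, so a scatter would place different time steps of the same sample on different devices.

```python
import torch

# One sample as SpykeTorch shapes it: (time, channels, h, w).
# Dim 0 carries simulation time steps, not a true batch.
x = torch.randn(15, 6, 28, 28)

# nn.DataParallel splits its input along dim 0, so a 2-GPU run
# would effectively do this, separating one sample's time steps:
halves = torch.chunk(x, 2, dim=0)
print([h.shape for h in halves])  # [(8, 6, 28, 28), (7, 6, 28, 28)]
```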
My solution was to make sure both tensors are on the same device:

```python
# move both arguments of torch.where onto the device of the learning rate
torch.where(
    torch.tensor(pairings[i], device=torch.tensor(self.learning_rate[f]).device),
    *(torch.tensor(self.learning_rate[f], device=torch.tensor(self.learning_rate[f]).device))
)
```

but this is not efficient. Please let me know if you have any other suggestions. Thank you so much for replying.
I have an idea in my mind (as a quick temporary fix), but I cannot check if it works at the moment. Right now, the script iterates over the samples in a single batch and passes them to the network one by one. What if we moved this iteration into the network's forward pass? This means we pass a 5-dim input to the network, with shape (batch, time, channels, w, h); then, inside the forward function, we iterate over the batch dimension and process each sample separately. Please let me know if my explanation is not clear.
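A minimal sketch of this idea (an illustration, not SpykeTorch's actual code), assuming the existing network consumes one sample of shape (time, channels, w, h):

```python
import torch
import torch.nn as nn

class BatchedWrapper(nn.Module):
    """Accepts a 5-D (batch, time, channels, w, h) input so that
    nn.DataParallel can scatter along the true batch dimension."""
    def __init__(self, single_sample_net):
        super().__init__()
        self.net = single_sample_net  # expects (time, channels, w, h)

    def forward(self, x):
        # Iterate over the batch dimension inside the forward pass,
        # keeping each sample's full time window on a single device.
        return torch.stack([self.net(sample) for sample in x])
```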
Hi Milad & Meera,
Thank you so much for all your answers! Here is the forward function that worked for me:

```python
def forward(self, input):
    # merge the batch and time dimensions so conv2d receives a 4-D tensor
    flattened = input.flatten(0, 1)
    conv2d_out = fn.conv2d(flattened, self.weight, self.bias, self.stride,
                           self.padding, self.dilation, self.groups)
    # restore the (batch, time, channels, h, w) layout
    return conv2d_out.reshape(input.shape[0], -1, conv2d_out.shape[1],
                              conv2d_out.shape[2], conv2d_out.shape[3])
```
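To illustrate the shape flow (the example dimensions below are assumptions, not from the thread):

```python
import torch
import torch.nn.functional as fn

x = torch.randn(4, 15, 6, 28, 28)         # (batch, time, channels, h, w)
flat = x.flatten(0, 1)                     # (60, 6, 28, 28): valid 4-D conv input
weight = torch.randn(30, 6, 5, 5)
out = fn.conv2d(flat, weight)              # (60, 30, 24, 24)
out = out.reshape(4, -1, *out.shape[1:])   # (4, 15, 30, 24, 24): batch and time restored
```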
Hello,
Thank you for this library. I have been using the mozafari.py network to train my spiking network model. Since the dataset is ImageNet, I wanted to use a multi-GPU setup. I used the DataParallel module from PyTorch to train with 8 GPUs, as follows:
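The original snippet did not survive in this copy of the thread; below is a hypothetical sketch of the kind of wrapping being described, with `Net` standing in for the network built from mozafari.py:

```python
import torch
import torch.nn as nn

class Net(nn.Module):  # hypothetical stand-in for the mozafari.py network
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(6, 30, 5)

    def forward(self, x):
        return self.conv(x)

# replicate the model across 8 GPUs; inputs are scattered along dim 0
net = nn.DataParallel(Net(), device_ids=list(range(8))).cuda()
out = net(torch.randn(32, 6, 28, 28).cuda())
```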