CUDA error trying to run demo following requirements #82
Comments
I have tested CUDA and PyTorch using this demo code, and it ran correctly on the GPU.
I solved the issue by adding compute_30 to every setup.py and recompiling all the torch modules.
@AlverGant Hi, I tried adding compute_30 like this
Hi @Exspiravit, that is exactly what I did. Is your GPU a Tesla K80? If not, you have to set the architecture to match your card, as described at http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
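Instead of reading the value off a table, one option is to ask PyTorch for the card's compute capability and build the `-gencode` arguments from it. The sketch below is only illustrative: `gencode_flags` and `detect_capability` are hypothetical helper names, not part of DAIN, and the resulting flags would still have to be added to each setup.py by hand. `torch.cuda.get_device_capability` is a real PyTorch call; a Tesla K80 should report (3, 7).

```python
def gencode_flags(major, minor, include_ptx=True):
    """Build nvcc -gencode arguments for one compute capability, e.g. (3, 7)."""
    cc = "{}{}".format(major, minor)
    # Native machine code (SASS) for exactly this architecture.
    flags = ["-gencode", "arch=compute_{0},code=sm_{0}".format(cc)]
    if include_ptx:
        # Also embed PTX so that newer GPUs can JIT-compile the kernels.
        flags += ["-gencode", "arch=compute_{0},code=compute_{0}".format(cc)]
    return flags


def detect_capability():
    """Return (major, minor) for GPU 0, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    capability = detect_capability()
    if capability is None:
        print("No CUDA device visible to PyTorch")
    else:
        print(capability, gencode_flags(*capability))
```

Whether the installed CUDA toolkit can actually compile for the reported architecture is a separate question and depends on the toolkit version.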
You are getting the same error that I had, "cuda call failed". The difference is that I am using CUDA 9 and torch 1.0.0, as recommended in the developer's requirements.
I am also using an older driver, NVIDIA-SMI 384.130.
Colab Pro offers Tesla V100 GPUs, which require '-gencode', 'arch=compute_70,code=sm_70'. By default, the line '-gencode', 'arch=compute_75,code=sm_75' is the one left uncommented. To fix it, I uncommented line 44, which has '-gencode', 'arch=compute_70,code=sm_70', and (maybe this isn't the correct way) commented out line 42, which had '-gencode', 'arch=compute_75,code=sm_75'. I then recompiled DAIN and DAIN PyTorch. Sorry if this is confusing; I'm not very experienced with this stuff.
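A rough sketch of what that edit amounts to, assuming the flags sit in a Python list that is handed to nvcc. The name `nvcc_args` and the layout are assumptions for illustration; the real file and its line numbers may differ.

```python
# Hypothetical excerpt of the flag list; only the two -gencode pairs
# quoted in the comment above are taken from the thread.
nvcc_args = [
    # '-gencode', 'arch=compute_75,code=sm_75',  # commented out (compute capability 7.5)
    '-gencode', 'arch=compute_70,code=sm_70',    # uncommented for the Tesla V100
]
```

nvcc accepts several `-gencode` pairs in one build, so keeping both lines active is normally possible as well, provided the installed CUDA toolkit supports every architecture listed.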
If you do not want to build CUDA programs.
Hi,
I installed everything according to the following requirements:
Ubuntu 16 on an AWS p2.xlarge instance (Tesla K80)
NVIDIA-SMI 384.130 driver
CUDA release 9.0, V9.0.176
CUDNN libcudnn7_7.6.5.32-1+cuda9.0_amd64.deb
Pytorch==1.0.0, scipy==1.2.0
Python3
All modules compiled without errors, but when I ran the example
CUDA_VISIBLE_DEVICES=0 python demo_MiddleBury.py
I got the following error:
revise the unique id to a random numer 91561
Namespace(SAVED_MODEL=None, alpha=[0.0, 1.0], arg='./model_weights/91561-Thu-May-21-18:36/args.txt', batch_size=1, channels=3, ctx_lr_coe=1.0, datasetName='Vimeo_90K_interp', datasetPath='', dataset_split=97, debug=False, depth_lr_coe=0.001, dtype=<class 'torch.cuda.FloatTensor'>, end_frame=100, epsilon=1e-06, factor=0.2, filter_lr_coe=1.0, filter_size=4, flow_lr_coe=0.01, force=False, frame_input_dir='/content/DAIN/input_frames', frame_output_dir='/content/DAIN/output_frames', log='./model_weights/91561-Thu-May-21-18:36/log.txt', lr=0.002, netName='DAIN', no_date=False, numEpoch=100, occ_lr_coe=1.0, patience=5, rectify_lr=0.001, save_path='./model_weights/91561-Thu-May-21-18:36', save_which=1, seed=1, start_frame=1, time_step=0.5, uid=None, use_cuda=True, use_cudnn=1, weight_decay=0, workers=8)
cudnn is used
The testing model weight is: ./model_weights/best.pth
The unique id for current testing is: 85504
RubberWhale
/home/ubuntu/.local/lib/python3.5/site-packages/torch/nn/modules/upsampling.py:129: UserWarning: nn.UpsamplingNearest2d is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.{} is deprecated. Use nn.functional.interpolate instead.".format(self.name))
/home/ubuntu/.local/lib/python3.5/site-packages/torch/nn/modules/upsampling.py:129: UserWarning: nn.Upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.{} is deprecated. Use nn.functional.interpolate instead.".format(self.name))
/home/ubuntu/.local/lib/python3.5/site-packages/torch/nn/functional.py:2423: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
Traceback (most recent call last):
File "demo_MiddleBury.py", line 131, in <module>
y_s,offset,filter = model(torch.stack((X0, X1),dim = 0))
File "/home/ubuntu/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/DAIN/networks/DAIN.py", line 152, in forward
time_offsets=time_offsets[::-1])
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/home/ubuntu/.local/lib/python3.5/site-packages/torch/cuda/__init__.py", line 326, in stream
yield
File "/home/ubuntu/DAIN/networks/DAIN.py", line 149, in forward
self.forward_flownets(self.flownets, cur_offset_input, time_offsets=time_offsets),
File "/home/ubuntu/DAIN/networks/DAIN.py", line 205, in forward_flownets
temp = model(input) # this is a single direction motion results, but not a bidirectional one
File "/home/ubuntu/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/DAIN/PWCNet/PWCNet.py", line 221, in forward
corr6 = self.corr(c16, c26)
File "/home/ubuntu/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 59, in forward
result = CorrelationFunction(self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)(input1, input2)
File "/home/ubuntu/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 27, in forward
self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f3d75f03fe1 in /home/ubuntu/.local/lib/python3.5/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f3d75f03dfa in /home/ubuntu/.local/lib/python3.5/site-packages/torch/lib/libc10.so)
frame #2: correlation_forward_cuda(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, int, int, int, int, int, int) + 0x5e6 (0x7f3d7299ef26 in /usr/local/lib/python3.5/dist-packages/correlation_cuda-0.0.0-py3.5-linux-x86_64.egg/correlation_cuda.cpython-35m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x16042 (0x7f3d729ab042 in /usr/local/lib/python3.5/dist-packages/correlation_cuda-0.0.0-py3.5-linux-x86_64.egg/correlation_cuda.cpython-35m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x1627e (0x7f3d729ab27e in /usr/local/lib/python3.5/dist-packages/correlation_cuda-0.0.0-py3.5-linux-x86_64.egg/correlation_cuda.cpython-35m-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x12e76 (0x7f3d729a7e76 in /usr/local/lib/python3.5/dist-packages/correlation_cuda-0.0.0-py3.5-linux-x86_64.egg/correlation_cuda.cpython-35m-x86_64-linux-gnu.so)
frame #9: python3() [0x4ec9a3]
frame #11: python3() [0x4fc63e]
frame #14: THPFunction_do_forward(THPFunction*, _object*) + 0x15c (0x7f3db0235bdc in /home/ubuntu/.local/lib/python3.5/site-packages/torch/lib/libtorch_python.so)
frame #17: python3() [0x5b4846]
frame #21: python3() [0x4ecab7]
frame #25: python3() [0x4ec9a3]
frame #27: python3() [0x4fc63e]
frame #29: python3() [0x5b4846]
frame #33: python3() [0x4ecab7]
frame #37: python3() [0x4ec9a3]
frame #39: python3() [0x4fc63e]
frame #41: python3() [0x5b4846]
frame #44: python3() [0x54548f]
frame #47: python3() [0x4ecab7]
frame #51: python3() [0x4ec9a3]
frame #53: python3() [0x4fc63e]
frame #55: python3() [0x5b4846]
frame #58: python3() [0x544f43]
frame #60: python3() [0x622642]
Please help!
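For reference, the line "no kernel image is available for execution on the device" generally means that the compiled extension contains no code the GPU can run, which points at the architecture flags used at build time rather than at PyTorch or the driver. The check below sketches the compatibility rule as described in NVIDIA's CUDA documentation: native code built for sm_XY runs on devices with the same major version and an equal or higher minor version, while embedded PTX (code=compute_XY) can be JIT-compiled on any device of equal or higher capability. `arch_covers_device` is a hypothetical helper, and it only understands the simple two-digit `code=sm_XY` / `code=compute_XY` forms.

```python
import re

# Matches the "code=" half of a value such as "arch=compute_70,code=sm_70".
_CODE_RE = re.compile(r"code=(sm|compute)_(\d)(\d)$")


def arch_covers_device(gencode_values, device_cc):
    """Return True if any -gencode value yields code runnable on device_cc.

    device_cc is a (major, minor) tuple, e.g. (3, 7) for a Tesla K80.
    """
    device_cc = tuple(device_cc)
    for value in gencode_values:
        match = _CODE_RE.search(value)
        if match is None:
            continue  # unsupported syntax; ignored in this sketch
        kind = match.group(1)
        built = (int(match.group(2)), int(match.group(3)))
        if kind == "sm":
            # Native code: same major version, minor not newer than the device.
            if built[0] == device_cc[0] and built[1] <= device_cc[1]:
                return True
        elif built <= device_cc:
            # PTX: forward compatible with any equal or newer device.
            return True
    return False
```

Under this rule a build with only 'arch=compute_75,code=sm_75' does not cover a K80 (3, 7) or a V100 (7, 0), whereas 'arch=compute_30,code=sm_30' does cover the K80.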