Error interpolating frames #117
Comments
I think you didn't build the packages correctly. Make sure you have PyTorch >=1.0.0, <=1.4.0. If you're just interested in doing the interpolation, check my repo iBobbyTS/VFIN. It's very easy to use and includes DAIN; there are Colab notebooks that I tested, and there's also a tar file with everything (Python, PyTorch and the DAIN packages) installed correctly. Open an issue there if there are any problems with it.
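As a quick check of the version range mentioned above, the installed PyTorch version can be compared against it in a Colab cell. This is only a minimal sketch: the helper names `parse_version` and `torch_version_supported` are made up for illustration, and the bounds are simply the ones stated in this comment.

```python
import re


def parse_version(version):
    """Turn a string such as '1.4.0+cu100' into (1, 4, 0), ignoring any build suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def torch_version_supported(version, low=(1, 0, 0), high=(1, 4, 0)):
    """Return True if version lies in the inclusive range [low, high]."""
    return low <= parse_version(version) <= high


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        verdict = "in range" if torch_version_supported(torch.__version__) else "outside range"
        print(torch.__version__, verdict)
```

If the version is outside the range, the DAIN extension packages have to be rebuilt after switching PyTorch versions, since they are compiled against the installed one.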
I don't know how to install or switch to PyTorch >=1.0.0, <=1.4.0 in Colab (Win10).
To install PyTorch 1.4, simply run this:
This error has popped up a few times in the issues sections of various DAIN Colab repos. I have read that it may be related to the GPU build. I would love for someone to look into this, because I am not at all familiar with this area of coding and I've only gotten DAIN to work once (probably when I was assigned a P100). I get the same error with different DAIN Colabs, specifically (I think) when I am assigned a V100. Has anyone found a solution?
Hi! Yes, I know what the error is. I don’t know exactly how to fix it. This is the relevant error line:
This means that the native CUDA extension was not compiled for the architecture of the GPU it is running on. As Google keeps adding new GPU models to Colab, we will keep running into these issues. This explains the symptoms that @TaoTeCha is seeing. My first attempt at fixing this was in #87, where I manually added entries for all the GPUs I could find in Colab at that time (June 2020). I also added a structure that I hoped would make it easier to add more over time, but it's less than ideal. @xRoyBx Note that the version of the Colab with those fixes also suppresses a few warnings that I saw in your logs. Is it possible you're not using the latest one? 1.5 is already in master, and 1.5.1 is in a PR (#116). If anyone knows how to achieve general compatibility with future GPUs, I'd be glad to work on that.
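To illustrate the mismatch described above: each GPU reports a compute capability, and the extension has to be compiled with a matching nvcc `-gencode` entry. The sketch below builds that entry from the capability PyTorch reports. The function names `gencode_flags` and `detect_capability` are made up for illustration and are not part of DAIN; it also assumes the installed CUDA toolkit supports the detected architecture.

```python
def gencode_flags(capability):
    """Return the nvcc flags for one (major, minor) compute capability."""
    major, minor = capability
    arch = f"{major}{minor}"
    return ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]


def detect_capability():
    """Return the (major, minor) capability of GPU 0, or None if none is visible."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    capability = detect_capability()
    if capability is None:
        print("No CUDA device visible")
    else:
        print("nvcc arguments needed for this GPU:", gencode_flags(capability))
```

For a V100, which reports capability (7, 0), this yields the `arch=compute_70,code=sm_70` entry. The extension packages would still need to be rebuilt with the new flags before the kernel can run.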
I've trained and run a handful of deep learning models in Colab, but in every case the GPU was already set up and ready to go, so I know nothing about any of this. I have a couple of questions.
Thanks
Not sure if it's a coincidence, but I uncommented '-gencode', 'arch=compute_70,code=sm_70' in compiler_args.py and switched to !pip install torch==1.0.0 torchvision==0.2.1. The Colab now works with a V100. I'll probably use this a handful of times over the next week, and I'll keep you updated on whether it continues to work. Thanks!
Thanks for the interpolation fix. Unfortunately, I get this error when creating the output video; apparently it doesn't create the output frames: /content/DAIN/output_frames
Are you sure your output path exists? Do you have a folder in your drive named DAIN? When you mounted your drive, did you mount it as gdrive or just drive? Try changing the output to '/content/output.mp4' and just download it from the Colab file browser. Or try !ffmpeg instead of %shell ffmpeg.
The paths are OK, the DAIN folder is present, and the drive is mounted as Gdrive. The problem is always the same (forget my previous post, sorry): it doesn't create the output PNG frames even though the output folder exists (using a Tesla V100 in Colab).
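To separate a path problem from an interpolation problem like the one described above, it can help to count the frames in each folder before running ffmpeg. This is a minimal sketch: the helper `count_frames` is made up for illustration, and the two paths are the `frame_input_dir` and `frame_output_dir` values shown in the log of the original post.

```python
import glob
import os


def count_frames(frame_dir, pattern="*.png"):
    """Return how many files in frame_dir match pattern, or -1 if the folder is missing."""
    if not os.path.isdir(frame_dir):
        return -1
    return len(glob.glob(os.path.join(frame_dir, pattern)))


if __name__ == "__main__":
    for frame_dir in ("/content/DAIN/input_frames", "/content/DAIN/output_frames"):
        found = count_frames(frame_dir)
        if found < 0:
            print(f"{frame_dir}: folder does not exist")
        else:
            print(f"{frame_dir}: {found} PNG frames")
```

An existing but empty output folder points at the interpolation step failing rather than at the ffmpeg command or the Drive mount.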
In VFIN, you don't need to worry about that. Just specify -ot video and it will generate an mp4 in your input folder. If you specify the output with -o, you can use any extension and save it anywhere.
Using this command: I get this error
Sorry, I renamed the script to run. For now, use
For any problems with VFIN, please open an issue there.
Hello, I always get this error during interpolation and can't proceed:
/content/DAIN
revise the unique id to a random numer 68776
Namespace(SAVED_MODEL=None, alpha=[0.0, 1.0], arg='./model_weights/68776-Tue-Nov-10-11-46/args.txt', batch_size=1, channels=3, ctx_lr_coe=1.0, datasetName='Vimeo_90K_interp', datasetPath='', dataset_split=97, debug=False, depth_lr_coe=0.001, dtype=<class 'torch.cuda.FloatTensor'>, end_frame=1259, epsilon=1e-06, factor=0.2, filter_lr_coe=1.0, filter_size=4, flow_lr_coe=0.01, force=False, frame_input_dir='/content/DAIN/input_frames', frame_output_dir='/content/DAIN/output_frames', log='./model_weights/68776-Tue-Nov-10-11-46/log.txt', lr=0.002, netName='DAIN_slowmotion', no_date=False, numEpoch=100, occ_lr_coe=1.0, patience=5, rectify_lr=0.001, save_path='./model_weights/68776-Tue-Nov-10-11-46', save_which=1, seed=1, start_frame=1, time_step=0.5, uid=None, use_cuda=True, use_cudnn=1, weight_decay=0, workers=8)
cudnn is used
Interpolate 1 frames
error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
Warning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function) (THPFunction_do_forward at /pytorch/torch/csrc/autograd/python_function.cpp:622)
Traceback (most recent call last):
File "colab_interpolate.py", line 112, in <module>
y_s, offset, filter = model(torch.stack((X0, X1),dim = 0))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/content/DAIN/networks/DAIN_slowmotion.py", line 148, in forward
self.forward_flownets(self.flownets, cur_offset_input, time_offsets=time_offsets),
File "/content/DAIN/networks/DAIN_slowmotion.py", line 212, in forward_flownets
temp = model(input) # this is a single direction motion results, but not a bidirectional one
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/content/DAIN/PWCNet/PWCNet.py", line 221, in forward
corr6 = self.corr(c16, c26)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 59, in forward
result = CorrelationFunction(self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)(input1, input2)
File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 27, in forward
self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f3c81ae3193 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: correlation_forward_cuda(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, int, int, int, int, int, int) + 0x628 (0x7f3c7e117b38 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0x1bd4a (0x7f3c7e127d4a in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: + 0x18890 (0x7f3c7e124890 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: python3() [0x50a4a5]
frame #7: python3() [0x594a01]
frame #9: THPFunction_do_forward(THPFunction*, _object*) + 0x4ac (0x7f3ccaaf4d4c in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #11: python3() [0x54a971]
frame #13: python3() [0x50a433]
frame #16: python3() [0x594a01]
frame #19: python3() [0x507be4]
frame #21: python3() [0x594a01]
frame #22: python3() [0x54a971]
frame #24: python3() [0x50a433]
frame #26: python3() [0x507be4]
frame #28: python3() [0x594a01]
frame #31: python3() [0x507be4]
frame #33: python3() [0x594a01]
frame #34: python3() [0x54a971]
frame #36: python3() [0x50a433]
frame #38: python3() [0x507be4]
frame #39: python3() [0x509900]
frame #40: python3() [0x50a2fd]
frame #42: python3() [0x507be4]
frame #44: python3() [0x594a01]
frame #47: python3() [0x507be4]
frame #49: python3() [0x594a01]
frame #50: python3() [0x54a971]
frame #52: python3() [0x50a433]
frame #54: python3() [0x507be4]
frame #56: python3() [0x634e72]
frame #61: __libc_start_main + 0xe7 (0x7f3cd5d04bf7 in /lib/x86_64-linux-gnu/libc.so.6)