Error interpolating frames #117

xRoyBx · 2020-11-10T11:57:20Z

Hello, I always get this error during interpolation and can't proceed:

/content/DAIN
revise the unique id to a random numer 68776
Namespace(SAVED_MODEL=None, alpha=[0.0, 1.0], arg='./model_weights/68776-Tue-Nov-10-11-46/args.txt', batch_size=1, channels=3, ctx_lr_coe=1.0, datasetName='Vimeo_90K_interp', datasetPath='', dataset_split=97, debug=False, depth_lr_coe=0.001, dtype=<class 'torch.cuda.FloatTensor'>, end_frame=1259, epsilon=1e-06, factor=0.2, filter_lr_coe=1.0, filter_size=4, flow_lr_coe=0.01, force=False, frame_input_dir='/content/DAIN/input_frames', frame_output_dir='/content/DAIN/output_frames', log='./model_weights/68776-Tue-Nov-10-11-46/log.txt', lr=0.002, netName='DAIN_slowmotion', no_date=False, numEpoch=100, occ_lr_coe=1.0, patience=5, rectify_lr=0.001, save_path='./model_weights/68776-Tue-Nov-10-11-46', save_which=1, seed=1, start_frame=1, time_step=0.5, uid=None, use_cuda=True, use_cudnn=1, weight_decay=0, workers=8)
cudnn is used
Interpolate 1 frames
error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
Warning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function) (THPFunction_do_forward at /pytorch/torch/csrc/autograd/python_function.cpp:622)
Traceback (most recent call last):
File "colab_interpolate.py", line 112, in
y_s, offset, filter = model(torch.stack((X0, X1),dim = 0))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/content/DAIN/networks/DAIN_slowmotion.py", line 148, in forward
self.forward_flownets(self.flownets, cur_offset_input, time_offsets=time_offsets),
File "/content/DAIN/networks/DAIN_slowmotion.py", line 212, in forward_flownets
temp = model(input) # this is a single direction motion results, but not a bidirectional one
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/content/DAIN/PWCNet/PWCNet.py", line 221, in forward
corr6 = self.corr(c16, c26)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, kwargs)
File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 59, in forward
result = CorrelationFunction(self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)(input1, input2)
File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 27, in forward
self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f3c81ae3193 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: correlation_forward_cuda(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, int, int, int, int, int, int) + 0x628 (0x7f3c7e117b38 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0x1bd4a (0x7f3c7e127d4a in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: + 0x18890 (0x7f3c7e124890 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: python3() [0x50a4a5]

frame #7: python3() [0x594a01]
frame #9: THPFunction_do_forward(THPFunction, _object) + 0x4ac (0x7f3ccaaf4d4c in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #11: python3() [0x54a971]
frame #13: python3() [0x50a433]
frame #16: python3() [0x594a01]
frame #19: python3() [0x507be4]
frame #21: python3() [0x594a01]
frame #22: python3() [0x54a971]
frame #24: python3() [0x50a433]
frame #26: python3() [0x507be4]
frame #28: python3() [0x594a01]
frame #31: python3() [0x507be4]
frame #33: python3() [0x594a01]
frame #34: python3() [0x54a971]
frame #36: python3() [0x50a433]
frame #38: python3() [0x507be4]
frame #39: python3() [0x509900]
frame #40: python3() [0x50a2fd]
frame #42: python3() [0x507be4]
frame #44: python3() [0x594a01]
frame #47: python3() [0x507be4]
frame #49: python3() [0x594a01]
frame #50: python3() [0x54a971]
frame #52: python3() [0x50a433]
frame #54: python3() [0x507be4]
frame #56: python3() [0x634e72]
frame #61: __libc_start_main + 0xe7 (0x7f3cd5d04bf7 in /lib/x86_64-linux-gnu/libc.so.6)

iBobbyTS · 2020-11-10T15:08:50Z

I think the packages weren't built correctly. Make sure you have PyTorch >=1.0.0, <=1.4.0. If you're just interested in doing the interpolation, check my repo iBobbyTS/VFIN. It's very easy to use and includes DAIN; there are Colab notebooks that I tested, and there's also a tar file with everything (Python, PyTorch and the DAIN packages) installed correctly. Open an issue there if you run into any problems with it.

xRoyBx · 2020-11-10T16:22:22Z

I don't know how to install or switch to PyTorch >=1.0.0, <=1.4.0 in Colab (Win10).
Anyway, using the "official" notebook by Styler00Dollar and Alpha or other related notebooks, I get the same error.
I'll try VFIN, thanks ;)

iBobbyTS · 2020-11-10T16:43:14Z

To install PyTorch 1.4, simply run this:
pip install torch==1.4.0
If you're using a notebook like Colab, add a ! before it, like
!pip install torch==1.4.0
I'm modifying the code in VFIN very often right now, so there might be errors when someone else uses it. I'm still learning about GitHub; I might start using the branch and release systems to keep stable versions and do development somewhere else.
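To confirm that the installed version actually falls inside the range mentioned earlier in the thread (>=1.0.0, <=1.4.0), something like the following could be run in a notebook cell. It is only a sketch: the helper name torch_version_in_range is made up for illustration, it assumes plain numeric version components, and it works on a version string so it can be tried without PyTorch installed. In Colab you would pass torch.__version__ instead of a literal.

```python
def torch_version_in_range(version, low=(1, 0, 0), high=(1, 4, 0)):
    """Return True if a version string such as '1.4.0+cu100' lies in [low, high]."""
    # Drop any local build suffix such as '+cu100' before parsing.
    core = version.split("+")[0]
    # Assumes the first three components are plain integers (e.g. '1.4.0').
    parts = tuple(int(p) for p in core.split(".")[:3])
    return low <= parts <= high


# In a notebook: import torch; torch_version_in_range(torch.__version__)
ok_version = torch_version_in_range("1.4.0")
too_new = torch_version_in_range("1.7.0+cu101")
print(ok_version, too_new)
```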

TaoTeCha · 2020-11-10T22:35:45Z

This error has popped up a few times in the issues section of various DAIN colab repos. I have read it's possibly something with the GPU build. I would love for someone to look into this because I am not familiar at all with this area of coding and I've only gotten DAIN to work once (probably when I was assigned a P100). I get this same error when using different colabs for DAIN, specifically when I get a V100 I think.

Has anyone found a solution???

AlphaGit · 2020-11-10T22:52:50Z

Hi! Yes, I know what the error is. I don’t know exactly how to fix it.

This is the relevant error line:

error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device

This means that the native CUDA extension was not compiled for the architecture of the GPU it is running on. As Google keeps adding new GPU models to Colab, we will keep running into these issues. This explains the symptoms that @TaoTeCha is seeing.

My first attempt at fixing this was in #87, where I manually added build targets for all the GPU models I could find in Colab at that time (June 2020). I also added a structure that hopefully makes it easier to add more over time... but it’s less than ideal.

@xRoyBx Note that the version of the Colab with those fixes also suppresses a few warnings that I saw in your logs. Is it possible you’re not using the latest one? 1.5 is already in master, 1.5.1 is in a PR (#116).

If anyone knows how to achieve future general compatibility, I’d be glad to work on that.
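One way to see why the kernel image goes missing: a binary compiled with code=sm_XY is only guaranteed to load on GPUs with the same major compute capability and an equal or higher minor version. The sketch below applies that rule to a list of build targets. The function name and the example target list are illustrative assumptions, not the contents of the repo's build files, and the check assumes no PTX is embedded for just-in-time compilation. On a live runtime the device tuple could come from torch.cuda.get_device_capability().

```python
def has_kernel_image(device_cc, built_ccs):
    """Check whether any compiled sm_XY target can run on the given device.

    device_cc: (major, minor) compute capability of the GPU, e.g. (7, 0) for a V100.
    built_ccs: iterable of (major, minor) targets the extension was compiled for.
    """
    dev_major, dev_minor = device_cc
    # Same major version, and the device minor must be at least the built minor.
    return any(
        major == dev_major and minor <= dev_minor
        for major, minor in built_ccs
    )


# Hypothetical build that only targets Pascal cards (P100 = 6.0, P4 = 6.1).
pascal_only = [(6, 0), (6, 1)]
p100_ok = has_kernel_image((6, 0), pascal_only)
v100_ok = has_kernel_image((7, 0), pascal_only)
print(p100_ok, v100_ok)
```

With the Pascal-only list the V100 check fails, which matches the "no kernel image is available" symptom; adding (7, 0) to the list makes it pass.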

TaoTeCha · 2020-11-10T23:18:08Z

I've trained and run a handful of deep learning models in Colab, but in every case the GPU was all set up and ready to go, so I know nothing about all this. I have a couple of questions.

Why do you need to do a 15 minute 'build' with DAIN when I have never had to do this with any other model I've used?
What parameters do I need to change in the files to find a model that works with colab's V100? I'm willing to put in the trial and error work if someone enlightens me in what I should be changing.

Thanks

AlphaGit · 2020-11-11T00:31:08Z

DAIN is a mixture of different CNNs put together, some of them from previous papers. You can find more info here and in the original paper. So that you don’t have to run 6 CNNs in parallel, which is memory-expensive and incredibly slow, the authors compiled some of these “layers” into CUDA modules that run directly on the GPU for training and inference with DAIN. Building these modules is what takes ~15 minutes and gives us these headaches.
Check out this file.

TaoTeCha · 2020-11-11T02:07:43Z

Not sure if it's a coincidence but I uncommented '-gencode', 'arch=compute_70,code=sm_70' in the compiler_args.py and switched to !pip install torch==1.0.0 torchvision==0.2.1

The colab is working with a V100 now. I'll probably use this a handful of times over the next week and I'll keep you updated if it continues to work.

Thanks!
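The flag uncommented above follows a fixed pattern, so the pair of nvcc arguments for any compute capability can be generated rather than typed by hand. A minimal sketch, assuming the build script accepts a flat list of strings in the same form as the quoted flag; gencode_args is a hypothetical helper, not something taken from compiler_args.py.

```python
def gencode_args(major, minor):
    """Build the two nvcc arguments for one compute capability, e.g. (7, 0)."""
    cc = f"{major}{minor}"
    return ["-gencode", f"arch=compute_{cc},code=sm_{cc}"]


# Example: target both a P100 (6.0) and a V100 (7.0) in one build.
nvcc_args = []
for capability in [(6, 0), (7, 0)]:
    nvcc_args += gencode_args(*capability)

print(nvcc_args)
```

Each extra target lengthens the build, so it is probably worth listing only the GPUs a runtime is likely to be assigned.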

xRoyBx · 2020-11-11T12:54:27Z

Thanks for the interpolation fix. Unfortunately, I get this error when creating the output video; apparently it doesn't create output frames:

/content/DAIN/output_frames
ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
libavutil 55. 78.100 / 55. 78.100
libavcodec 57.107.100 / 57.107.100
libavformat 57. 83.100 / 57. 83.100
libavdevice 57. 10.100 / 57. 10.100
libavfilter 6.107.100 / 6.107.100
libavresample 3. 7. 0 / 3. 7. 0
libswscale 4. 8.100 / 4. 8.100
libswresample 2. 9.100 / 2. 9.100
libpostproc 54. 7.100 / 54. 7.100
[image2 @ 0x55727fd56000] Could not open file : .png
[image2 @ 0x55727fd56000] Could not find codec parameters for stream 0 (Video: png, none(pc)): unspecified size
Consider increasing the value for the 'analyzeduration' and 'probesize' options
Input #0, image2, from '.png':
Duration: 00:00:00.02, start: 0.000000, bitrate: N/A
Stream #0:0: Video: png, none(pc), 60 tbr, 60 tbn, 60 tbc
Output #0, mp4, to '/content/gdrive/My Drive/DAIN/output.mp4':
Output file #0 does not contain any stream

CalledProcessError Traceback (most recent call last)
in ()
1 # Create output video
2 get_ipython().magic('cd {FRAME_OUTPUT_DIR}')
----> 3 get_ipython().magic("shell ffmpeg -y -r {TARGET_FPS} -f image2 -pattern_type glob -i '*.png' '/content/gdrive/My Drive/{OUTPUT_FILE_PATH}'")

3 frames
/usr/local/lib/python3.6/dist-packages/google/colab/_system_commands.py in check_returncode(self)
136 if self.returncode:
137 raise subprocess.CalledProcessError(
--> 138 returncode=self.returncode, cmd=self.args, output=self.output)
139
140 def repr_pretty(self, p, cycle): # pylint:disable=unused-argument

CalledProcessError: Command 'ffmpeg -y -r 60 -f image2 -pattern_type glob -i '*.png' '/content/gdrive/My Drive/DAIN/output.mp4'' returned non-zero exit status 1.

TaoTeCha · 2020-11-11T15:55:44Z

Are you sure your output path exists? Do you have a folder in your drive named DAIN? When you mounted your drive, did you mount it as gdrive or just drive? Try changing the output to '/content/output.mp4' and just download it from the Colab file folder.

Or try !ffmpeg instead of %shell ffmpeg
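The ffmpeg message "Could not open file : .png" suggests the '*.png' glob matched nothing, so it may help to count the frames before the ffmpeg step. A rough sketch: count_png_frames is a hypothetical helper, and the demo runs on a temporary directory so it can be tried anywhere.

```python
import glob
import os
import tempfile


def count_png_frames(frame_dir):
    """Return how many .png files sit directly inside frame_dir."""
    return len(glob.glob(os.path.join(frame_dir, "*.png")))


# Demo on a throwaway directory; in the notebook, pass the real output folder
# (for example '/content/DAIN/output_frames' from the log above).
with tempfile.TemporaryDirectory() as demo_dir:
    empty_count = count_png_frames(demo_dir)
    # Create one empty placeholder file to stand in for an interpolated frame.
    open(os.path.join(demo_dir, "00001.png"), "wb").close()
    one_count = count_png_frames(demo_dir)

print(empty_count, one_count)
```

A count of zero would mean the problem is in the interpolation step rather than in ffmpeg or the Drive path.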

xRoyBx · 2020-11-11T17:00:07Z

Paths are OK, the DAIN folder is present, and the drive is mounted as Gdrive. The problem is always the same (forget my previous post, sorry): it doesn't create output PNG frames even though the output folder is present (using a Tesla V100 in Colab).

iBobbyTS · 2020-11-14T07:34:27Z

In VFIN, you don't need to worry about that. Just specify -ot video and it will generate an mp4 in your input folder. If you specify the output with -o, you can use any extension and save it anywhere.

xRoyBx · 2020-11-14T14:23:33Z

Using this command:
!/content/python/bin/python3 /content/VFIN/run.py -i "/content/drive/My Drive/VFIN/input.mp4" -o "/content/drive/My Drive/VFIN/output.mp4"

I get this error:
/content/python/bin/python3: can't open file '/content/VFIN/run.py': [Errno 2] No such file or directory

iBobbyTS · 2020-11-14T14:30:00Z

Sorry, I changed the name of the run script. For now, use
!/content/python/bin/python3 /content/VFIN/run_class.py -i "/content/drive/My Drive/VFIN/input.mp4" -o "/content/drive/My Drive/VFIN/output.mp4"
instead.
I fixed the GitHub repo and the pre-built tar file. Copy the tar file to your drive again and use it next time, and copy the notebook too, since I edited it.
By the way, you need -a DAIN -ot video to make it use DAIN and output a video.

iBobbyTS · 2020-11-14T14:32:27Z

If you have any problems with VFIN, please open an issue there.

TaoTeCha mentioned this issue Nov 11, 2020

error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device CyFeng16/MVIMP#41

Open

semel1 mentioned this issue Dec 12, 2020

Colab pro error Interpolation #98

Open
