VideoReader segfault on SOME videos. #2650

Closed
bjuncek opened this issue Sep 8, 2020 · 4 comments

Comments

@bjuncek
Contributor

bjuncek commented Sep 8, 2020

🐛 Bug

VideoReader segmentation faults on some long videos when using the video_reader backend. This issue is a continuation of #2259: torchvision segfaults when reading an entire test video.

I used to believe this affected long videos only, but it also happens on the test videos we provide, which suggests it might be related to the FFMPEG version installed on the system (the fact that our tests don't catch it points the same way).

To Reproduce

Steps to reproduce the behavior:

  1. Install torchvision from source.
  2. From the repository root, call
    vframes, _, _ = torchvision.io.read_video(path, pts_unit="sec")
    where path=$TVDIR/test/assets/videos/TrumanShow_wave_f_nm_np1_fr_med_26.avi. A full reproduction script is sketched below.
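
For reference, a minimal reproduction script (a sketch; it assumes the TVDIR environment variable points at a torchvision checkout and that the video_reader backend was built):

    import os
    import torchvision

    # Use the C++ video_reader backend, where the crash is reported.
    torchvision.set_video_backend("video_reader")

    path = os.path.join(
        os.environ["TVDIR"],
        "test/assets/videos/TrumanShow_wave_f_nm_np1_fr_med_26.avi",
    )
    # Reading the whole clip triggers the segfault inside libswscale.
    vframes, _, _ = torchvision.io.read_video(path, pts_unit="sec")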

The backtrace suggests the crash happens inside libswscale.

#0  0x00007fff88224cf2 in ?? () from /home/bjuncek/miniconda3/envs/vb/lib/libswscale.so.5
#1  0x00007fff88223bb4 in ?? () from /home/bjuncek/miniconda3/envs/vb/lib/libswscale.so.5
#2  0x00007fff881f9af4 in sws_scale () from /home/bjuncek/miniconda3/envs/vb/lib/libswscale.so.5

I've previously found this can be caused by conflicting scaling inputs (though it might also be due to a new/different FFMPEG version).

Expected behavior

The video is read successfully, without a segmentation fault.

Environment

Collecting environment information...
PyTorch version: 1.6.0
Is debug build: False
CUDA used to build PyTorch: 10.2

OS: Ubuntu 18.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Quadro RTX 8000
GPU 1: Quadro RTX 8000

Nvidia driver version: 440.33.01
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7
/usr/local/cuda-10.2.89/targets/x86_64-linux/lib/libcudnn.so.7

Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.6.0
[pip3] torchvision==0.7.0a0+78ed10c
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py38he904b0f_0
[conda] mkl_fft 1.1.0 py38h23d657b_0
[conda] mkl_random 1.1.1 py38hcb8c335_0 conda-forge
[conda] numpy 1.19.1 py38hbc911f0_0
[conda] numpy-base 1.19.1 py38hfa32c7d_0
[conda] pytorch 1.6.0 py3.8_cuda10.2.89_cudnn7.6.5_0 pytorch
[conda] torchvision 0.7.0a0+78ed10c pypi_0 pypi

Suggested fix

Removing the hidden inputs (specifically size/aspect ratio/crop) from the _read_video op could in principle fix this, but it might be backwards-compatibility breaking if users expose these manually in their code.
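
In the meantime, a possible user-side workaround (a sketch; it assumes the crash is specific to the video_reader backend, as reported above) is to switch to the pyav backend, which does not go through this C++ decoding path:

    import torchvision

    # Decode with the pure-Python PyAV backend instead of video_reader.
    torchvision.set_video_backend("pyav")

    path = "test/assets/videos/TrumanShow_wave_f_nm_np1_fr_med_26.avi"
    vframes, _, _ = torchvision.io.read_video(path, pts_unit="sec")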

cc @bjuncek

@andfoy
Contributor

andfoy commented Sep 16, 2020

More information:

Program received signal SIGSEGV, Segmentation fault.
0x00007f4d206b7fc2 in ff_yuv_420_rgb24_ssse3.loop0 ()
    at libswscale/x86/yuv_2_rgb.asm:376
376	libswscale/x86/yuv_2_rgb.asm: No such file or directory.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64
(gdb) bt
#0  0x00007f4d206b7fc2 in ff_yuv_420_rgb24_ssse3.loop0 ()
    at libswscale/x86/yuv_2_rgb.asm:376
#1  0x00007f4d206b6e84 in yuv420_rgb24_ssse3 (c=0x564bca2e59c0, src=0x7ffd52b1c9e0, 
    srcStride=0x7ffd52b1c9c0, srcSliceY=0, srcSliceH=256, dst=0x7ffd52b1ca00, 
    dstStride=0x7ffd52b1c9d0) at libswscale/x86/yuv2rgb_template.c:177
#2  0x00007f4d2068eb45 in sws_scale (c=<optimized out>, srcSlice=<optimized out>, 
    srcStride=<optimized out>, srcSliceY=<optimized out>, srcSliceH=256, 
    dst=<optimized out>, dstStride=0x7ffd52b1cd10) at libswscale/swscale.c:969
#3  0x00007f4d21f62d2b in ffmpeg::(anonymous namespace)::transformImage (
    context=0x564bca2e59c0, srcSlice=0x564bca35f100, srcStride=0x564bca35f140, 
    inFormat=..., outFormat=..., 
    out=0x564bca4c5b60 "\025\016\030\025\016\030\025\016\030\025\016\030\025\016\030\025\016\030\025\016\030\025\016\030\025\016\030\025\016\030\025\016\030\025\016\030\025\016\030\025\016\030\025\016\030\024\r\027\024\r\031\024\r\031\023\f\030\023\f\030\023\f\030\023\f\030\023\f\030\023\f\030\023\f\030\023\f\030\023\f\030\023\f\030\023\f\030\023\f\030\023\f\030\023\f\030\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\024\r\031\023\f\030\023\f\030\023\f"..., planes=0x7ffd52b1cd20, lines=0x7ffd52b1cd10)
    at /root/vision/torchvision/csrc/cpu/decoder/video_sampler.cpp:46
#4  0x00007f4d21f639a8 in ffmpeg::VideoSampler::sample (this=0x564bca5a4220, 
    srcSlice=0x564bca35f100, srcStride=0x564bca35f140, out=0x564bca4c3490)
    at /root/vision/torchvision/csrc/cpu/decoder/video_sampler.cpp:182
#5  0x00007f4d21f63c1e in ffmpeg::VideoSampler::sample (this=0x564bca5a4220, 
    frame=0x564bca35f100, out=0x564bca4c3490)

The segfaults only occur when MMX/SSE/AVX optimizations are enabled in FFmpeg.

@fmassa
Member

fmassa commented Sep 18, 2020

I believe this issue might be a bug in FFmpeg introduced in FFmpeg/FFmpeg@fc6a588 that has since been fixed in FFmpeg/FFmpeg@ba3e771.

The bug report for this issue is at https://trac.ffmpeg.org/ticket/8747.

If that's the case, then recompiling FFmpeg would solve the issue.

@andfoy
Contributor

andfoy commented Sep 18, 2020

Effectively, this issue is directly related to the regression introduced in FFmpeg 4.3 and fixed in FFmpeg/FFmpeg@ba3e771. On FFmpeg 4.2, the video reader tests pass.
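
For anyone hitting this, a quick way to check which FFmpeg you have (a sketch; it assumes the ffmpeg binary on PATH is the same build whose libswscale torchvision links against):

    import re
    import subprocess

    # Ask the ffmpeg CLI for its version string, e.g. "ffmpeg version 4.3 ...".
    out = subprocess.run(
        ["ffmpeg", "-version"], capture_output=True, text=True, check=True
    ).stdout
    match = re.search(r"ffmpeg version n?(\d+)\.(\d+)", out)
    if match:
        major, minor = map(int, match.groups())
        if (major, minor) >= (4, 3):
            print(f"FFmpeg {major}.{minor}: may be affected unless it includes ba3e771")
        else:
            print(f"FFmpeg {major}.{minor}: predates the regression")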

@bjuncek
Contributor Author

bjuncek commented Oct 15, 2020

Given that this was a known issue in FFmpeg, and that it is fixed by using a different FFmpeg version, I'm closing this issue.
