
(WIP) Initial implementation of the new videoReader API #2683

Merged: 143 commits merged from bjuncek/base_api into pytorch:master on Oct 7, 2020

Conversation

@bjuncek (Contributor) commented Sep 17, 2020

Per the description in #2660, here is a proof-of-concept implementation for video reading and metadata access; a minimal usage sketch follows the feature lists below.
THIS API IS STILL EXPERIMENTAL AND WILL LIKELY BE CHANGED/MODIFIED

Some key features:

  1. Confirms identical results to the "video_reader" backend.
  2. If merged, would fix the VideoReader segfault on SOME videos (#2650), thanks to the underlying API change.

Missing features:

  • audio stream not properly tested
  • performance: at the moment there is a constant overhead for tensor allocation
  • no support for random byte, CC, and SUB streams
  • seek not properly tested
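
A minimal usage sketch, based only on the constructor shown in the review excerpt below and on behaviors named in the commit list; the method names next, seek, and get_metadata and their signatures are inferred and may not match the final API:

import torch
import torchvision  # importing torchvision registers the custom torch.classes bindings

# Constructor as it appears in the test under review: a path plus the default stream.
reader = torch.classes.torchvision.Video("path/to/video.mp4", "video")

# Hypothetical metadata accessor; the commit list only confirms that metadata
# registration works, not the exact method name or return shape.
metadata = reader.get_metadata()

# Per the commit list, a decode step returns a (tensor, pts) pair instead of a
# tensor list, with the frame laid out as RGB x H x W.
frame, pts = reader.next("video")  # whether `next` takes a stream argument is an assumption

# Seek accuracy is covered by the new tests; the argument is assumed to be a
# timestamp in seconds.
reader.seek(2.0)
frame, pts = reader.next("video")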

.circleci/config.yml.in (review thread marked outdated and resolved)
s = min(r)
e = max(r)

reader = torch.classes.torchvision.Video(full_path, "video")
Member

For a follow-up PR: we should expose Video in torchvision, so that you can access it via torchvision.io.Video or something like that.
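
A hedged sketch of what such a thin Python wrapper might look like; the module placement, class name, and delegated method are assumptions following the comment above, not the eventual torchvision.io surface:

import torch
import torchvision  # importing torchvision registers torch.classes.torchvision.Video


class Video:
    """Illustrative thin wrapper over the C++ binding; not the final torchvision.io API."""

    def __init__(self, path: str, stream: str = "video"):
        self._c = torch.classes.torchvision.Video(path, stream)

    def __iter__(self):
        return self

    def __next__(self):
        # Assumed to mirror the C++ `next`, which returns a (frame, pts) pair.
        return self._c.next("video")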

Contributor Author

Adding that to the #2660 feature tracker.

self.assertEqual(tv_result.size(), new_api.size())

def test_partial_video_reading_fn(self):
torchvision.set_video_backend("video_reader")
Member

We might need to comment this out for now. Many of the test issues we had before were due to switching globally to the video_reader backend during the tests.
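
An alternative to commenting the line out would be to confine the global switch to this test class by saving and restoring the backend; a sketch using torchvision's public get_video_backend()/set_video_backend() helpers (the setUp/tearDown placement is illustrative, not what was actually pushed):

import unittest

import torchvision


class TestPartialVideoReading(unittest.TestCase):
    def setUp(self):
        # Remember whichever backend was active so other tests are unaffected.
        self._prev_backend = torchvision.get_video_backend()
        torchvision.set_video_backend("video_reader")

    def tearDown(self):
        # Restore the original backend even if the test body failed.
        torchvision.set_video_backend(self._prev_backend)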

Contributor Author

Sure - pushed the changes

@bjuncek (Contributor Author) commented Oct 6, 2020

@fmassa it seems to be segfaulting on Travis.
Looking at the log, it picks up av from conda, and the segfault is likely due to ffmpeg 4.3.1.

The relevant output from the raw log:

  av                 conda-forge/linux-64::av-8.0.2-py36hf21bf4b_1
  bzip2              conda-forge/linux-64::bzip2-1.0.8-h516909a_3
  ffmpeg             conda-forge/linux-64::ffmpeg-4.3.1-h167e202_0

which makes sense given that Travis is installing av from conda with no ffmpeg version check.
Should we add `conda install -c conda-forge ffmpeg=4.2.2` before this?

@fmassa (Member) commented Oct 7, 2020

Should we add conda install -c conda-forge ffmpeg=4.2.2 before this

We have disabled all IO tests in Travis; TravisCI now only compiles those blocks. So for now I would say you can just skip test_video, as we do for TestIO and TestVideoReader in:

- pytest --cov-config .coveragerc --cov torchvision --cov $TV_INSTALL_PATH -k 'not TestVideoReader and not TestVideoTransforms and not TestIO' test --ignore=test/test_datasets_download.py

@codecov bot commented Oct 7, 2020

Codecov Report

Merging #2683 into master will increase coverage by 0.68%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #2683      +/-   ##
==========================================
+ Coverage   72.42%   73.11%   +0.68%     
==========================================
  Files          96       96              
  Lines        8313     8332      +19     
  Branches     1293     1299       +6     
==========================================
+ Hits         6021     6092      +71     
+ Misses       1903     1848      -55     
- Partials      389      392       +3     
Impacted Files                  Coverage Δ
torchvision/__init__.py         68.75% <0.00%> (+3.12%) ⬆️
torchvision/ops/boxes.py        99.07% <0.00%> (+5.81%) ⬆️
torchvision/io/video.py         80.47% <0.00%> (+11.83%) ⬆️
torchvision/io/_video_opt.py    39.37% <0.00%> (+16.25%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6e639d3...aae1d4f

@fmassa (Member) left a comment

Merging this to move forward; there are a few follow-up cleanups that can be done, but let's do them in a different PR.

@fmassa fmassa merged commit 87c7864 into pytorch:master Oct 7, 2020
@bjuncek bjuncek deleted the bjuncek/base_api branch October 7, 2020 16:35
bryant1410 pushed a commit to bryant1410/vision-1 that referenced this pull request Nov 22, 2020
* adding base files

* setup modification to actually build the thing

* video api constructor registration

* FAIL metadata

* FAIL update for QS

* revert

* debugging with Victor

* adding base files

* setup modification to actually build the thing

* video api constructor registration

* FAIL metadata

* FAIL update for QS

* revert

* debugging with Victor

* metadata registration works

* API build next

* test

* Merge change

* formatting parameters to avoid the segfault

* next now works on a video

* make size of the output tensor format dependent

* Make next work on audio stream only as well

* refactoring the _setCurrentStream param

* Fixing the last frame return and sensor

* todo docs

* Formatting

* cleanup and comments

* introducing new tests for the API

* cleanup

* Comment out unnecesary format (will add following FFMPEG fix)

* Reformat parsing function

* removing the seek bug `get_decoder_params`

* Removing unnecessary code/variables

* enforce RGB24 as a reading format (will crash before ffmpeg fix)

* permute the dimensions to return (RGB x H x W)

* Changing the return type to std::tuple<torch::Tensor, double> as opposed to tensor list

* Adjusting tests for the new return type

* remove unnecessary jitter

* clangangangang

* Metadata return changes (pytorch#1)

* remove implicit calls to set a current stream (pytorch#2)

* Adding new tests to check the accuracy of the seek

* cleanup debugging statements

* adding base files

* setup modification to actually build the thing

* video api constructor registration

* FAIL metadata

* FAIL update for QS

* revert

* debugging with Victor

* adding base files

* video api constructor registration

* FAIL metadata

* FAIL update for QS

* revert

* debugging with Victor

* metadata registration works

* API build next

* test

* Merge change

* formatting parameters to avoid the segfault

* next now works on a video

* make size of the output tensor format dependent

* Make next work on audio stream only as well

* refactoring the _setCurrentStream param

* Fixing the last frame return and sensor

* todo docs

* Formatting

* cleanup and comments

* introducing new tests for the API

* cleanup

* Comment out unnecesary format (will add following FFMPEG fix)

* Reformat parsing function

* removing the seek bug `get_decoder_params`

* Removing unnecessary code/variables

* enforce RGB24 as a reading format (will crash before ffmpeg fix)

* permute the dimensions to return (RGB x H x W)

* Changing the return type to std::tuple<torch::Tensor, double> as opposed to tensor list

* Adjusting tests for the new return type

* remove unnecessary jitter

* clangangangang

* Metadata return changes (pytorch#1)

* remove implicit calls to set a current stream (pytorch#2)

* Adding new tests to check the accuracy of the seek

* cleanup debugging statements

* Addressing PR comments

* addressing Francisco's comments

* CLANG build formatting

* Updated testing to test against pyav for the video tensor reads

* Formatting

* remove pyav from pip deps and add it to conda build

* add pyav and ffmeped to conda builds

* Formatting?

* Setting up linter once and for all hopefully

* Testing pyav

* Fix to 8.0.0

* Try 6.2.0

* See what happens with av from pip

* Remove FFMPEG blocker

* What is going on?

* More tests

* Forgot something

* unblocker

* Check if cache is messing up with things

* Now try with different ffmpeg

* Now try with different ffmpeg

* Testing pyav

* Fix to 8.0.0

* Try 6.2.0

* See what happens with av from pip

* What is going on?

* More tests

* Forgot something

* Check if cache is messing up with things

* Now try with different ffmpeg

* Now try with different ffmpeg

* Do not install av

* Test with ffmpeg 4.2

* clean up video tests

* cleaning up the tests a bit to better test partial reading

* arrgh linter

* Forgot the av test

* forgot av test

* checkout build files from master

* revert circleci

* addressing Franciscos comments

* addressing Franciscos comments

* Ignore ffmpeg in travis

Co-authored-by: Francisco Massa <[email protected]>
Co-authored-by: Edgar Andrés Margffoy Tuay <[email protected]>
vfdev-5 pushed a commit to Quansight/vision that referenced this pull request Dec 4, 2020