Avoid relying on PyAV-provided video frame count #6929

Merged
merged 1 commit into from
Oct 2, 2023

Conversation

@SpecLad SpecLad commented Sep 29, 2023

Motivation and context

The video frame counts that PyAV provides are not reliable.

In particular, MP4 has a feature called "edit lists" that allows you to set a custom playback order for the media data. With edit lists, you can specify that only a particular range of frames should be played, or that a range should be played multiple times, etc. See the following for technical details:

https://developer.apple.com/documentation/quicktime-file-format/edit_list_atom
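
To make the mismatch concrete, here is a toy model of the effect. It is only a sketch: real edit lists are expressed in media time and segment durations, not frame indices, and the `played_frame_count` helper and the `(start, end)` representation are invented here purely for illustration.

```python
def played_frame_count(edit_list):
    """Count the frames a player would output for a simplified edit list.

    Each entry is a hypothetical (start, end) pair of frame indices,
    with `end` exclusive. This is an illustration only; real MP4 edit
    lists describe segments in time units, not frame indices.
    """
    return sum(end - start for start, end in edit_list)


# Suppose the raw media data holds 100 frames.
raw_frames = 100

# An edit list that plays only frames 10..59.
trimmed = [(10, 60)]

# An edit list that plays the whole range twice.
repeated = [(0, 100), (0, 100)]

for name, edits in [("trimmed", trimmed), ("repeated", repeated)]:
    print(name, "raw:", raw_frames, "played:", played_frame_count(edits))
```

In both cases the raw count stays at 100 while the number of frames that come out of playback differs, which is the kind of discrepancy described here.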

FFmpeg follows edit lists when decoding videos. However, the frame count returned by PyAV's Stream.frames property is the number of frames in the raw media data and does not reflect the modifications applied by an edit list.

When we build a video manifest, we use Stream.frames if it's non-zero. Therefore, in the presence of an edit list we will obtain a frame count that does not match the actual number of frames that we can get out of the video.

FWIW, edit lists are probably not the only way that Stream.frames could be inaccurate; they're just the reason behind the specific problem I encountered.

Since we already have to handle the situation where Stream.frames is not available, just pretend it doesn't exist and always count frames by traversing the entire video. I don't think it even matters much, since we have to do it anyway to build the rest of the manifest.
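
As a rough illustration of counting by traversal (not the actual code in this PR), the sketch below decodes every frame of the first video stream and compares the result with `Stream.frames`. The function names are made up for this example, and PyAV is imported lazily so that the counting helper itself has no dependencies.

```python
def count_decoded_frames(container, stream_index=0):
    """Count frames by decoding the whole video stream.

    `container` is expected to behave like a PyAV input container:
    it exposes `streams.video` and a `decode(stream)` iterator.
    """
    stream = container.streams.video[stream_index]
    return sum(1 for _ in container.decode(stream))


def compare_frame_counts(path):
    """Return (reported, decoded) frame counts for the file at `path`.

    `reported` is PyAV's Stream.frames, which may be 0 or may not match
    what decoding yields; `decoded` comes from a full traversal.
    """
    import av  # PyAV, imported here so the helper above stays dependency-free

    with av.open(path) as container:
        reported = container.streams.video[0].frames
        decoded = count_decoded_frames(container)
    return reported, decoded


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        reported, decoded = compare_frame_counts(sys.argv[1])
        print(f"Stream.frames: {reported}, decoded: {decoded}")
```

Running this on a file with an edit list should show the two numbers disagreeing, while on a plain file they would normally match.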

We also have to stop validating the frame count in a user-provided manifest, which is unfortunate, but it doesn't seem worthwhile to decode the entire video just for that.

How has this been tested?

I checked that dataset_manifest/create.py now calculates the correct number of frames for a file with an edit list. I also tested the same file by uploading it to CVAT.

Checklist

  • I submit my changes into the develop branch
  • I have added a description of my changes into the CHANGELOG file
  • [ ] I have updated the documentation accordingly
  • [ ] I have added tests to cover my changes
  • [ ] I have linked related issues (see GitHub docs)
  • [ ] I have increased versions of npm packages if it is necessary
    (cvat-canvas,
    cvat-core,
    cvat-data and
    cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

@SpecLad SpecLad force-pushed the no-easy-frame-count branch from ebe2f31 to b98f3d3 Compare September 29, 2023 13:14
@SpecLad SpecLad marked this pull request as ready for review September 29, 2023 13:24
codecov bot commented Sep 29, 2023

Codecov Report

Merging #6929 (b98f3d3) into develop (d497bb6) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           develop    #6929   +/-   ##
========================================
  Coverage    82.52%   82.52%           
========================================
  Files          360      360           
  Lines        38908    38895   -13     
  Branches      3544     3544           
========================================
- Hits         32108    32100    -8     
+ Misses        6800     6795    -5     

| Components | Coverage Δ |
|---|---|
| cvat-ui | 77.62% <ø> (-0.01%) ⬇️ |
| cvat-server | 87.00% <100.00%> (+0.02%) ⬆️ |

@nmanovic nmanovic left a comment

LGTM

@nmanovic nmanovic merged commit 4cdd68e into cvat-ai:develop Oct 2, 2023
@SpecLad SpecLad deleted the no-easy-frame-count branch October 9, 2023 15:51
mikhail-treskin pushed a commit to retailnext/cvat that referenced this pull request Oct 25, 2023