Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exact Seek #78

Merged
merged 36 commits into from
Jul 13, 2020
Merged

Exact Seek #78

merged 36 commits into from
Jul 13, 2020

Conversation

innerlee
Copy link
Contributor

@innerlee innerlee commented Jul 7, 2020

This pr implements exact seeking. Details see the post below.

Note that this is tested only for the cpu version.
GPU is not checked since it is hard to compile successfully.

JoannaLXY and others added 30 commits July 3, 2020 13:01
Signed-off-by: lixuanyi <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lixuanyi <[email protected]>
Signed-off-by: lixuanyi <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
.
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
.
Signed-off-by: lizz <[email protected]>
Signed-off-by: lixuanyi <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lixuanyi <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
.
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
innerlee and others added 4 commits July 7, 2020 18:31
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lizz <[email protected]>
Signed-off-by: lixuanyi <[email protected]>
@innerlee innerlee changed the title Lxy/exact Exact Seek Jul 7, 2020
@innerlee innerlee marked this pull request as ready for review July 7, 2020 11:06
@JoannaLXY
Copy link
Contributor

JoannaLXY commented Jul 7, 2020

This PR improves the SeekAccurate algorithm by adding an additional check on the curr_frame_ after calling av_seek_frame and ensures accurate seek.

Previously, the SeekAccurate function may not be able to return the correct frames when random seeking. This is due to the incorrect keyframe found by av_seek_frame in Seek. For example, a video has keyframes at frame 0, 16, 32, 48; after calling Seek(16), the expected behavior is that curr_frame_ will be at 16. But we found that this is non-deterministic as sometimes the real curr_frame_ will be at 0 or 32. (The reason is yet to be found).

Based on this fact, this PR modifies the SeekAccurate algorithm:

  1. Before every Seek call, go to frame 0 first and then seek.
  2. After Seek, check the checks the real current frame. If it is smaller than the frame we are looking for, use SkipFramesImpl to move to the correct position. If it is larger, call Seek again and use the flag AVSEEK_FLAG_BACKWARD to find to correct position. If it is correct, continue with the previous algorithm.
    One thing to note is that when checking whether the real current frame is correct or not, we inevitably decode this frame and the curr_frame_ will 'overrun' and cause problems. For example, a video has keyframes at frame 0, 16, 32, 48; the SeekAccurate(16) called Seek(16). By decoding, we found that it returns the correct frame 16 but curr_frame_ now moves to 17. If we call NextFrame now, it will give the frame 17. To solve this problem, we stored the frame when decoding and use a flag over_run_ to indicate whether to use it.

This modification is tested on 4 different datasets with 500 videos each. We randomly sample 3 frames and decode these frames using num_thread = 1. The following table shows the speed and decoding accuracy in different settings. It is tested on CPU.

Edit:

Decoding time/Accuracy Decord (Before) Decord (After)
Kinetics400 (Resized, keyframe interval at most 16) ~15.2s/70.6% ~15.3s/100%
Kinetics400 (Resized) ~145s/100% ~144s/100%
Kinetics400 (Original) ~205s/100% ~209s/100%
Something-Something v2 ~17s/100% ~14s/100%

@innerlee
Copy link
Contributor Author

innerlee commented Jul 7, 2020

The speed of the second row looks strange. May need to find out the reason of slowness.

Edit: solved. see the above edited post

Signed-off-by: lizz <[email protected]>
@zhreshold
Copy link
Member

This PR improves the SeekAccurate algorithm by adding an additional check on the curr_frame_ after calling av_seek_frame and ensures accurate seek.

Previously, the SeekAccurate function may not be able to return the correct frames when random seeking. This is due to the incorrect keyframe found by av_seek_frame in Seek. For example, a video has keyframes at frame 0, 16, 32, 48; after calling Seek(16), the expected behavior is that curr_frame_ will be at 16. But we found that this is non-deterministic as sometimes the real curr_frame_ will be at 0 or 32. (The reason is yet to be found).

Based on this fact, this PR modifies the SeekAccurate algorithm:

  1. Before every Seek call, go to frame 0 first and then seek.
  2. After Seek, check the checks the real current frame. If it is smaller than the frame we are looking for, use SkipFramesImpl to move to the correct position. If it is larger, call Seek again and use the flag AVSEEK_FLAG_BACKWARD to find to correct position. If it is correct, continue with the previous algorithm.
    One thing to note is that when checking whether the real current frame is correct or not, we inevitably decode this frame and the curr_frame_ will 'overrun' and cause problems. For example, a video has keyframes at frame 0, 16, 32, 48; the SeekAccurate(16) called Seek(16). By decoding, we found that it returns the correct frame 16 but curr_frame_ now moves to 17. If we call NextFrame now, it will give the frame 17. To solve this problem, we stored the frame when decoding and use a flag over_run_ to indicate whether to use it.

This modification is tested on 3 different datasets with 500 videos each. The following table shows the speed and decoding accuracy in different settings.

Decoding time/Accuracy PyAV Decord (Before) Decord (After)
Kinetics400(Resized) 177.5s/100% 17s/70.6% 20.67s/100%
Kinetics400(Original) 701s/100% 66.2s/100% 265.5s/100%
Something-Something v2 34.98s/100% 26.48/100% 18.27s/100%
Here, PyAV decodes the videos sequentially so it is used as ground truth. After modifying, the speed may drop but the accuracy is ensured.

awesome, first of all, thanks for putting efforts on this issue, I will dig into the solution and get back to you.
Before that, I have several comments on this

  • "1. Before every Seek call, go to frame 0 first and then seek" why would this make any difference in seek result if flag is seek backward?
  • for the results, do you know why resized kinetics 400 is bad in comparison to original k400?

@innerlee
Copy link
Contributor Author

innerlee commented Jul 8, 2020

  • Go to 0 and then seek key frame can increase the success rate of seeking to the correct keyframe. In https://github.com/dmlc/decord/pull/78/files#diff-2528f2176a6afdf33b9891ecd924c3e5R272 , if we jump from earlier time to later, the flag is 0. If we jump using the BACKWARD flag, there is a large probability that it will jump to a wrong keyframe, and increases the processing time.

  • The resized version uses -g 16 to increase the density of keyframes. The original video has very few keyframes.

Edit: I will post the resize script later

Edit:

ffmpeg -hide_banner -loglevel panic -i 'original/$f' -vf mpdecimate,scale=-2:256 -vsync vfr -c:v libx264 -g 16 -an 'resized/$f' -y

mpdecimate filter is to filter out repeated frames, vsync is to sync timestamp, -g 16 is to insert keyframes, and scale is the resize.

@zhreshold
Copy link
Member

@innerlee thanks, let me benchmark it first

@innerlee
Copy link
Contributor Author

Thanks. If it works properly, then it might be good to fix the GPU version also. Here the fix i mean make it compiles and runs. I haven't tried (or compiled) gpu version yet.

Copy link
Member

@zhreshold zhreshold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments.

Besides, I've tested with cuda build and it's fine as the changes are on higher level seeking. So if the comments are addressed I think the PR is good to go!

@@ -281,6 +286,20 @@ bool VideoReader::Seek(int64_t pos) {
return ret >= 0;
}

bool VideoReader::SeekStart() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we merge this into a special case of Seek(0) in Seek?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion! I have substituted the SeekStart() by Seek(0) and the decoding speed remains similar.

@zhreshold zhreshold merged commit dfab508 into dmlc:master Jul 13, 2020
@zhreshold
Copy link
Member

@innerlee @JoannaLXY Much appreciated for the awesome job in fixing this, now this PR is merged! 😸

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants