Extremely slow accurate seek #111
Just to substantiate: another benchmark run from a smaller (SD) file with the same duration (10m):
And some info on the video file:
On further investigation, it seems the slow iteration is caused by the video files themselves. I'll leave this open in case someone has helpful input, and will investigate further.
My intuition is that these video files might lack some sort of header information, which would require decord to scan the entire file on each individual indexing step. I've solved the problem by manually iterating over all frames and skipping those not in the batch set.
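For reference, that workaround amounts to something like the sketch below: walk the stream sequentially with next() and keep only the frames whose indices are wanted. The path and the index set are placeholders, not values from this thread.

    import decord

    # Placeholders for illustration only.
    path = "video.mp4"
    wanted = {10, 250, 4000, 9000}   # frame indices we actually need

    vr = decord.VideoReader(path, ctx=decord.cpu(0))

    # Walk the stream once, sequentially; no accurate seeks at all.
    # Every frame gets decoded, but only the wanted ones are kept.
    frames = {}
    for i in range(len(vr)):
        frame = vr.next()
        if i in wanted:
            frames[i] = frame.asnumpy()
        if len(frames) == len(wanted):
            break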
Can you print the keyframes of the testing video you have? You can call the reader's keyframe-index accessor.
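A minimal sketch of that, assuming the accessor meant is decord's VideoReader.get_key_indices() and using a placeholder path:

    import decord

    vr = decord.VideoReader("video.mp4")   # placeholder path
    print(vr.get_key_indices())            # indices of the keyframes found in the stream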
Thanks for the pointer, I'll investigate that. As far as the RTX 2000 cards go, I think I read about some minor improvements in encoding quality, but decoding has seemed pretty stable across multiple generations. By the way, a side effect of my manual iteration is that decord seems to load all frames into memory - so it's consuming ~40GB of RAM for a 200MB video file. So there is some incentive for me to get indexing working.
OK, for documentation, here are some further details.
By the way, I could get around the memory bloat by decoding the files on the GPU - even though the GPU decoder is a bit slower than the CPU (probably due to copying - my code isn't optimized towards using the zero-copy facilities from decord).
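For reference, GPU decoding is selected through the reader's context argument; a minimal sketch, assuming a CUDA-enabled decord build and a placeholder path:

    import decord

    # ctx=decord.gpu(0) decodes on the first GPU instead of the CPU.
    vr = decord.VideoReader("video.mp4", ctx=decord.gpu(0))
    frame = vr[0]                 # decoded frame lives in GPU memory
    host_copy = frame.asnumpy()   # explicit copy back to host memory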
I had a similar problem on CPU on Ubuntu, simply doing something like this: for i in range(10): ... - each iteration got slower as the loop progressed. The problem went away when I reverted from 0.4.2 back to 0.4.0.
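A minimal sketch of that kind of loop - repeatedly decoding the same batch and timing each pass - using a placeholder path and index stride; this is an illustration, not the original script:

    import time
    import decord

    vr = decord.VideoReader("video.mp4", ctx=decord.cpu(0))  # placeholder path
    indices = list(range(0, len(vr), 30))                    # e.g. one frame per second at 30 fps

    for i in range(10):
        t0 = time.time()
        vr.get_batch(indices)
        # With the regression, each pass takes noticeably longer than the previous one.
        print("pass", i, "took", round(time.time() - t0, 2), "s")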
Hi @mark-aai, [edit] Although, looking at the commits, I don't see anything in particular that stands out between 0.4.0 and 0.4.2 ... Especially since you're decoding on the CPU and not on the GPU.
Hi @trifle, the only thing I could add is that I used pip to install both times. First "pip install decord", which installed 0.4.2, and then later, as soon as I ran "pip install decord==0.4.0", the problem went away. I tried this because I checked what version was running fine on a different machine, and it was 0.4.0.
Here's another example, self-contained and using the CPU, after following the compilation instructions:
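A sketch of the kind of self-contained measurement being discussed - timing a handful of accurate seeks against the video's runtime - using a placeholder path rather than the original test file:

    import random
    import time
    import decord

    vr = decord.VideoReader("video.mp4", ctx=decord.cpu(0))  # placeholder path

    # Time a handful of accurate seeks to random positions.
    targets = random.sample(range(len(vr)), 20)
    t0 = time.time()
    for i in targets:
        vr.seek_accurate(i)
        vr.next()
    print("20 accurate seeks took", round(time.time() - t0, 1), "s")
    print("video duration is roughly", round(len(vr) / vr.get_avg_fps(), 1), "s")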
For this benchmark, accurate seeking often seems to take approximately the runtime of the video. The problem probably stems from #78. Apparently that works fine for some video formats but not for others. Is there any way we can determine which formats can't accurately seek to a keyframe, and only use this aggressive/slow strategy of seeking to 0 and then double-checking in those cases? It seems that one of these is the most likely culprit. If it's not possible to know beforehand, we could let the user choose the strategy. Either:
From the benchmarks in that PR it's clear that the new strategy doesn't necessarily slow down all formats in the same way. On the other hand, it's probably the case that many formats can accurately seek to keyframes. By the way, my problem here is that now
@innerlee @JoannaLXY Any thoughts about this?
What's the speed on v0.4.1?
Okay, I've gone ahead and bisected it manually, and for the same example (which anyone can youtube-dl themselves) it seems I pointed to the wrong culprit. (Sorry @innerlee & @JoannaLXY!)
So it's a much smaller perf regression than the one we're encountering (and worth it if it improves accuracy). I'll carry on bisecting and report back where the regression happened. It would be good to have some simple tests to avoid such big regressions even making it in.
Okay the problem was introduced here: 7b6c0e9
For random access, if you can get the indices ahead of time, then sorting them and accessing sequentially will improve speed. See the code here: https://github.com/open-mmlab/mmaction2/blob/master/mmaction/datasets/pipelines/loading.py#L971-L976
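The linked mmaction2 code is not copied here; the idea amounts to something like the following sketch - decode in ascending index order so the reader only ever moves forward, then restore the caller's original ordering (the path and index list are placeholders):

    import decord

    indices = [950, 10, 400, 120, 875]   # placeholder: arbitrary, unsorted sample indices
    vr = decord.VideoReader("video.mp4", ctx=decord.cpu(0))

    # Decode in ascending order so each access only moves forward in the stream.
    order = sorted(range(len(indices)), key=lambda k: indices[k])
    decoded = {k: vr[indices[k]].asnumpy() for k in order}

    # Restore the original ordering expected by the caller.
    frames = [decoded[k] for k in range(len(indices))]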
Yeah, that's a good tip (and quite similar to what
However, v0.4.0 is the last known good version rather than v0.4.1, due to #100.
That was mostly because of bad video files, which can be fixed on the user side. We used v0.4.1 on the whole of Kinetics-700 (re-encoded because of resizing), and found no issues.
@frankier Interesting that frame-based seeking is way slower in some cases. I am a bit confused now, since there's no FFmpeg documentation detailing the implementation strategy.
@frankier Also, my own test using the same video you provided showed a very different result. Results on a Mac (Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz):
master/HEAD:
revert 7b6c0e9:
There's no difference before/after.
Anyway, I am reverting frame-based seeking back to timestamp-based seeking in #115. I will keep this thread open, as I haven't found the root cause yet; the issue seems to be more complicated and not easily reproducible.
I am using Debian Bullseye. It looks like I have the same versions of the relevant FFmpeg libraries, although I am missing libavdevice. Here is the full info:
Everyone experiencing this seems to be on Linux (Ubuntu or Debian). @trifle had libavdevice, so that's probably not the problem. It is interesting that OS X does not seem to have the problem. I wonder if the underlying problem is somehow related to this. If so, two places to consider are:
I think the next thing to do to track down what's going on here is for someone on Debian/Ubuntu to try compiling ffmpeg manually and see if the problem persists. If this fixes it, the problem should be reported on the Debian bug tracker; otherwise it should be reported to FFmpeg.
Dear @innerlee & @zhreshold & @frankier & @trifle,

    vr = de.VideoReader(input_video)
    for i in range(len(vr)):
        frame = vr.next()

and I think, the

    vr = de.VideoReader(input_video)
    for i in range(start_index, end_index):
        vr.seek(start_index)
        frame = vr.next()

and I think, the

    vr = de.VideoReader(input_video)
    for i in range(start_index, end_index):
        frame = vr[i]

or

    vr = de.VideoReader(input_video)
    for i in range(start_index, end_index):
        vr.seek_accurate(start_index)
        frame = vr.next()

am I correct?
Hi,
I've hit an unexpected regression with accurate seeks - they seem about one order of magnitude too slow. This is with a CUDA-enabled manual install from current git HEAD:
The machine (16 physical cores, NVMe SSD) is neither CPU- nor IO-limited during the benchmark. CPU load is ~8 (out of 16), so half the cores are idle. The file is a 50MB H.264 video encoded by ffmpeg.
Results are slightly faster for random seeks but even worse for accurate seeks when running on the GPU:
This is on an RTX 2070 Super.
Other specs:
Comparison benchmarks (I don't have PyAV, so OpenCV only):
If there's anything else I can do to help debugging, let me know!