Make filter_graph_->nb_threads be 1, and make number of decoding threads customizable
#63
We also work on the Kinetics dataset and found an issue similar to #62. I suggest adding an option to disable FFmpeg multi-threading entirely. My patch is similar, so I will not open a new PR; see huww98@76333af. In our use case, we see not only a 1.24x speedup, but CPU usage also drops from 245% to 110%. That is about 2.7x more efficient in total. Here is my test code. In short, I open each file and read only the first 16 frames.

    from pathlib import Path
    import time

    from decord import VideoReader

    BASE = Path('/tmp/answering_questions')


    def main():
        # Collect every .mp4 file under BASE, recursively.
        files = list(BASE.glob('**/*.mp4'))
        # Indices of the first 16 frames of each video.
        frames = list(range(16))
        print(f"Reading {len(frames)} frames from {len(files)} files")
        # Run three passes to see how stable the timing is.
        for i in range(3):
            print(f'pass {i + 1}')
            t1 = time.perf_counter()
            for f in files:
                v = VideoReader(str(f))
                v.get_batch(frames)
            t2 = time.perf_counter()
            print(f'Time: {t2-t1}')


    if __name__ == "__main__":
        main()

The CPU usage data is reported by
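The "2.7x more efficient" figure above can be reproduced by multiplying the wall-clock speedup by the reduction in CPU usage. The snippet below is only a sketch of that arithmetic; `efficiency_gain` is a hypothetical helper name, not part of decord, and it assumes "efficiency" means work done per unit of CPU time.

```python
def efficiency_gain(speedup: float, cpu_before: float, cpu_after: float) -> float:
    """Combine a wall-clock speedup with a drop in CPU usage.

    Assumption: efficiency is throughput per unit of CPU, so the two
    ratios multiply. The CPU arguments are percentages as reported by
    a process monitor (for example 245 for 245%).
    """
    return speedup * (cpu_before / cpu_after)


# Numbers quoted in the comment above: 1.24x faster, CPU 245% -> 110%.
gain = efficiency_gain(1.24, 245, 110)
print(f"{gain:.2f}x")  # prints 2.76x
```

The result is slightly above 2.7, which matches the rounded figure quoted in the comment.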
Good catch! However, I think a better option is to expose the multithreading config to the Python side, with arguments to turn multithreading ON/OFF. @yjxiong @bryanyzhu FYI, there is possibly a good margin for speedup in multi-worker loading if we turn off the internal multithreading.
@huww98 I tested your changes. If we
Clearly, in the single-worker case we should leave the number of decoding threads on auto. @zhreshold Yeah, it would be nice to implement that option. I'm not familiar with C++, so if this change is trivial for you, could you help implement it? I would like to continue tracing other performance issues.
Signed-off-by: lizz <[email protected]>
@zhreshold I implemented the customizable number of decoding threads and left the filter graph thread count fixed at 1.
@innerlee Thanks. My test was performed in WSL2; maybe the threading overhead is larger in WSL. However, in our use case we do not care about single-video decode time. Since we always use multiple workers to increase throughput, I think multi-threading in FFmpeg can only add overhead. Disabling FFmpeg threading always benefits us.
Yes, with this PR you can set num_threads=1 in the Python constructor of VideoReader. Could you try it and check whether the speed is similar to your changes?
Late to the party. Great to know this is being figured out. The filter-graph threading does not seem to be needed in this use case. The codec threading seems helpful when the use case has just one worker. For multiple workers, it is better to use no FFmpeg-level threading at all. Maybe we can add this guidance to the usage doc.
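The guidance above could be sketched roughly as follows. This is only an illustration: `decoder_threads_for` and `open_reader` are hypothetical helpers, not part of decord, and the sketch assumes that `num_threads=0` keeps decord's automatic thread selection while `num_threads=1` disables codec-level threading, as discussed in this thread.

```python
from pathlib import Path


def decoder_threads_for(num_workers: int) -> int:
    """Pick a num_threads value from the number of loader workers.

    Hypothetical helper. Assumption: 0 means "auto" (let FFmpeg decide),
    which suits a single worker; 1 means no FFmpeg-level threading,
    which suits several workers decoding in parallel.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return 1 if num_workers > 1 else 0


def open_reader(path: Path, num_workers: int):
    """Open a video with a thread count matched to the worker count."""
    # Imported lazily so the helper above can be used without decord installed.
    from decord import VideoReader

    return VideoReader(str(path), num_threads=decoder_threads_for(num_workers))


print(decoder_threads_for(1), decoder_threads_for(8))  # prints 0 1
```

Each data-loading worker would call `open_reader` with the total worker count, so a multi-worker job decodes with one thread per video and a single-worker job keeps the automatic setting.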
looks good to me
I will be merging this now, once we figure out the impact of |
This makes the time spent on this function deterministic, and hence faster.
Testing on 32 videos shows 1.3x speedup:
Testing code: