-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add experimenta lazy loading of info extractors #8497
Conversation
Without patch
With patch
I guess disk speed / IOPS is more important than cpu power? |
Thanks for checking, it seems you get an improvement similar to mine. Although your initial times aren't as bad as those in #3029.
I don't completely understand what you say, could you elaborate? |
Yes, similar results. Not sure how they get to 10+ seconds for --version command. I doubt it has to do with CPU after seeing my results for ARM CPU.
I have no clue about Python, but in the strace results someone posted in the original issue, I've noticed a lot of repetitive file system calls like:
So I assume these are all the extractors being read? If that's the case, a slow file system could be the bottleneck and that's why I did not see a big difference in my test (fast file system). |
with python2 i get this error:
|
this is the result i get(netbook with AMD 1 GHz cpu and 2 GB RAM):
With patch:
|
@remitamine It should be fixed with jaimeMF/youtube-dl@eee1aca |
Apologies for my earlier, now removed comment, I failed to RTFM and built youtube-dl without the lazy loading support. I've tested your patch on one of the Raspberries that had the problem to begin with, results are as follows: "Vanilla" youtube-dl:
"Lazy-load" youtube-dl (this time built correctly):
My "butchered" version of youtube-dl from #3029, supporting only Youtube, Soundcloud and Bandcamp:
Note, the --get-url results should be taken lightly as this Pi is currently having some network problems. That said, the results seem to be reproducable in concurrent tests. Seems there's a significant performance improvement in this version, even more than with just removing 900+ lines of downloaders in my butchered version. I haven't tested actually downloading videos with it yet but at least in these tests the difference seems very impressive. This was tested with the zipped versions of all 3, vanilla downloaded directly from the official download site, the other 2 built with (make lazy-extractors;) make youtube-dl. |
Fails with python 2.6:
|
valid_url=valid_url, | ||
module=ie.__module__) | ||
if ie.suitable.__func__ is not InfoExtractor.suitable.__func__: | ||
s += getsource(ie.suitable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be more PEP8 with a '\n':
class YahooSearchIE(LazyLoadExtractor):
_VALID_URL = None
_module = 'youtube_dl.extractor.yahoo'
@classmethod
def suitable(cls, url):
return re.match(cls._make_valid_url(), url) is not None
@classmethod
def _make_valid_url(cls):
return 'yvsearch(?P<prefix>|[1-9][0-9]*|all):(?P<query>[\\s\\S]+)'
Error when downloading multiple URLs of the same InfoExtractor:
Another minor suggestion: print information about whether lazy extractors are used or not in the verbose log. |
Patch for some of my ideas: diff --git a/devscripts/make_lazy_extractors.py b/devscripts/make_lazy_extractors.py
index 8627d0b..5506335 100644
--- a/devscripts/make_lazy_extractors.py
+++ b/devscripts/make_lazy_extractors.py
@@ -41,14 +41,16 @@ def build_lazy_ie(ie, name):
valid_url=valid_url,
module=ie.__module__)
if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
- s += getsource(ie.suitable)
+ s += '\n' + getsource(ie.suitable)
if hasattr(ie, '_make_valid_url'):
# search extractors
s += make_valid_template.format(ie._make_valid_url())
return s
names = []
-for ie in _ALL_CLASSES:
+sorted_ies = sorted(_ALL_CLASSES, key=lambda c: c.__name__[:-2] if c.__name__ != 'GenericIE' else '')
+sorted_ies = sorted_ies[1:] + [sorted_ies[0]]
+for ie in sorted_ies:
name = ie.ie_key() + 'IE'
src = build_lazy_ie(ie, name)
module_contents.append(src) |
@yan12125 thanks a lot for your comments, I thin I have addressed all of them.
Fixed in jaimeMF/youtube-dl@a4126fd Style fixes in jaimeMF/youtube-dl@dd20d6e.
|
@SharkWipf thanks for testing it, I'm glad it improves the time. |
Something "bad" about this change is that all pull request that add a extractor would need to be updated. So I want to expose the rationale for the change: I initially started by creating a new try:
from .lazy_extractors import <something>
except ImportError:
from .extractor import <something> Apart from the need to change all imports, the main problem is that when you do |
Currently, it's impossible to make py2exe Windows build with lazy extractors enabled since |
Here are some of my measurements. Command: youtube-dl -v: Linux: python 2.7.11, non-lazy: python 3.5.1, non-lazy: python 2.7.11, lazy: python 3.5.1, lazy: Windows: python 2.7.10, non-lazy python 2.7.10, lazy For sensible use cases I've got almost similar measurements (lazy/non-lazy) being lazy one even slower in some cases (probably due to network I/O influence). |
Do you want me to add a new distutils command? I'm playing with it, but currently you need to run |
Ok then. |
Added in jaimeMF/youtube-dl@a4e1733. |
i think it will be possible to use this method to improve generic extractor load time if we add the |
@remitamine I think it's better to merge this PR first and then work on that. Note that there is a similar implementation for what you want in #6216. Note that the problem would be that the |
I guess this PR can be merged? Here are my tests:
With lazy-load:
Test environment: My Android phone with my patched Python build. An incredible improvement! Much thanks @jaimeMF. |
I'll need to rebase against the current HEAD, I'll try to do it on the weekend. |
'make lazy-extractors' creates the youtube_dl/extractor/lazy_extractors.py (imported by youtube_dl/extractor/__init__.py), which contains simplified classes that only have the 'suitable' class method and that load the appropiate class with the '__new__' method when a instance is created.
When building with python3 the unicode characters are not escaped, python2 needs to know the encoding.
not sure why it happen but it should be related to this change.
|
@remitamine it's the @classmethod
def suitable(cls, url):
# Don't return True if the url can be extracted with other youtube
# extractor, the regex would is too permissive and it would match.
other_ies = iter(klass for (name, klass) in globals().items() if name.endswith('IE') and klass is not cls)
if any(ie.suitable(url) for ie in other_ies):
return False
else:
return super(YoutubeUserIE, cls).suitable(url) It's also picking GenericIE, and since it matches all urls it returns False. I'm not sure what would be the cleanest fix, I can think of just changing it in iter(klass for (name, klass) in globals().items() if name.endswith('IE') and not name == 'GenericIE' and klass is not cls) Do you have a better suggestion? |
as i understand the suitable method here tries to see if the url match other youtube extractors, may be we can check only for extractors that starts with |
That sounds better. |
thanks for the help. |
there is an error happen when i use Lazy Extractors:
|
@remitamine should be fixed in 169d836. Thanks for pointing it. For future problems, feel free to open a new issue and ping me. |
@jaimeMF Hi, how stable is this patch? Close to production ready? |
It works, but some details may not work correctly and sometimes it breaks. |
@jaimeMF anything specific I should worry about? |
No. If you find something that doesn't work, open a new issue and we will fix it. |
@royale1223 It's not caused by this patch. This URL redirects to https://vimeo.com/ondemand/castlesinthesky/89677808, and the latter works fine with youtube-dl. Nevertheless the original URL should be supported as well. Could you open a new issue? |
@yan12125 Happens because it's not a free video. |
Hi there! just wondering where is the documentation on how to lazy load the info extractors. Is there a flag I can pass when getting the url info |
@alejomendoza clone this repository and run the following two commands:
And replace the official youtube-dl with the newly generated one. |
Move import of nieuwsblad extractor from __init__.py to extractors.py #8497
If this performance improvement is working without issues by now, why not make it the default? |
Actually lazy extractors break testing. (#13554) It should be fixed for developers before making it as the default for users. |
Inspired by @remitamine's comment (#3029 (comment))
In my computer the difference is not super spectacular.
youtube-dl --version
goes from 0.6s to 0.3-0.4s, in the case of the zipped executable it goes from 1s to 0.4s; andyoutube-dl 'http://youtube.com/watch?v=BaW_jenozKcj'
goes from 2.1s to 1.6s. On slower devices like the Raspberry Pi (#3029) the difference may be more noticeable.Since a lot of things may break, it requires to run
make lazy-extractors
first.