
Add experimental lazy loading of info extractors #8497

Merged
9 commits merged on Apr 9, 2016


jaimeMF
Collaborator

jaimeMF commented Feb 10, 2016

Inspired by @remitamine's comment (#3029 (comment))
On my computer the difference is not spectacular: youtube-dl --version goes from 0.6s to 0.3-0.4s (from 1s to 0.4s with the zipped executable), and youtube-dl 'http://youtube.com/watch?v=BaW_jenozKcj' goes from 2.1s to 1.6s. On slower devices like the Raspberry Pi (#3029) the difference may be more noticeable.

Since a lot of things may break, the feature is opt-in: you need to run make lazy-extractors first.

@kidol
Contributor

kidol commented Feb 10, 2016

ARMv7 Processor rev 2 (v7l)
4 x 1.3 GHz

Without patch

time youtube-dl --version
real 0m0.953s
user 0m0.820s
sys 0m0.110s

time youtube-dl --get-url http://youtube.com/watch?v=BaW_jenozKcj
real 0m3.058s
user 0m2.440s
sys 0m0.180s

With patch

time youtube-dl --version
real 0m0.610s
user 0m0.550s
sys 0m0.040s

time youtube-dl --get-url http://youtube.com/watch?v=BaW_jenozKcj
real 0m2.859s
user 0m2.100s
sys 0m0.210s

I guess disk speed / IOPS is more important than cpu power?

@jaimeMF
Collaborator Author

jaimeMF commented Feb 10, 2016

Thanks for checking, it seems you get an improvement similar to mine. Although your initial times aren't as bad as those in #3029.

I guess disk speed / IOPS is more important than cpu power?

I don't completely understand what you mean; could you elaborate?

@kidol
Contributor

kidol commented Feb 10, 2016

Yes, similar results. Not sure how they get to 10+ seconds for the --version command. After seeing my results on an ARM CPU, I doubt it has to do with the CPU.

I don't completely understand what you mean; could you elaborate?

I have no clue about Python, but in the strace results someone posted in the original issue, I've noticed a lot of repetitive file system calls like:

open("/usr/local/bin/youtube-dl", O_RDONLY|O_LARGEFILE)

So I assume these are all the extractors being read? If that's the case, a slow file system could be the bottleneck and that's why I did not see a big difference in my test (fast file system).

@remitamine
Collaborator

With Python 2 I get this error:

python2 __main__.py --version
Traceback (most recent call last):
  File "__main__.py", line 16, in <module>
    import youtube_dl
  File "/home/amine/youtube-dl/youtube_dl/__init__.py", line 43, in <module>
    from .extractor import gen_extractors, list_extractors
  File "/home/amine/youtube-dl/youtube_dl/extractor/__init__.py", line 4, in <module>
    from .lazy_extractors import *
  File "/home/amine/youtube-dl/youtube_dl/extractor/lazy_extractors.py", line 1763
SyntaxError: Non-ASCII character '\xc3' in file /home/amine/youtube-dl/youtube_dl/extractor/lazy_extractors.py on line 1763, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

@remitamine
Collaborator

These are the results I get (netbook with a 1 GHz AMD CPU and 2 GB RAM).
For the YouTube extraction test, I modified the extractor to return immediately during initialization, because downloading web pages can affect the timing.
Without patch:

time python __main__.py --version
2016.02.10

real    0m1.987s
user    0m1.817s
sys 0m0.157s
time python __main__.py --simulate 'http://youtube.com/watch?v=BaW_jenozKcj'

real    0m3.683s
user    0m3.517s
sys 0m0.163s

With patch:

time python __main__.py --version
2016.02.09.1

real    0m1.032s
user    0m0.963s
sys 0m0.060s
time python __main__.py --simulate 'http://youtube.com/watch?v=BaW_jenozKcj'

real    0m1.909s
user    0m1.817s
sys 0m0.067s

@jaimeMF
Collaborator Author

jaimeMF commented Feb 11, 2016

@remitamine It should be fixed with jaimeMF/youtube-dl@eee1aca
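For reference, the SyntaxError above is PEP 263's rule: Python 2 rejects non-ASCII bytes in a source file that has no encoding declaration. A minimal sketch of one way a generator script can sidestep it, assuming it writes its output as UTF-8 (write_generated_module is a hypothetical helper, not necessarily what that commit does):

```python
import io


def write_generated_module(path, source):
    """Write generated Python source as UTF-8 with a PEP 263 header.

    The '# coding: utf-8' line lets Python 2 accept non-ASCII characters
    (e.g. accented extractor descriptions) in the generated file; Python 3
    already defaults to UTF-8, so the line is harmless there.
    """
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(u'# coding: utf-8\n\n')
        f.write(source)
```

Escaping every non-ASCII character in the generated literals would be an alternative; the header is simply the smaller change.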

@SharkWipf

Apologies for my earlier (now removed) comment; I failed to RTFM and built youtube-dl without lazy loading support.

I've tested your patch on one of the Raspberries that had the problem to begin with, results are as follows:

"Vanilla" youtube-dl:

time ./youtube-dl-vanilla --version
2016.01.29

real    0m6.404s
user    0m6.140s
sys     0m0.240s
time ./youtube-dl-vanilla --get-url http://youtube.com/watch?v=BaW_jenozKcj

real    0m23.432s
user    0m8.110s
sys     0m0.490s

"Lazy-load" youtube-dl (this time built correctly):

time ./youtube-dl-lazy --version
2016.02.09.1

real    0m1.929s
user    0m1.780s
sys     0m0.150s
time ./youtube-dl-lazy --get-url http://youtube.com/watch?v=BaW_jenozKcj

real    0m13.758s
user    0m4.170s
sys     0m0.290s

My "butchered" version of youtube-dl from #3029, supporting only Youtube, Soundcloud and Bandcamp:

time ./youtube-dl-butchered --version
2016.01.29

real    0m2.358s
user    0m2.180s
sys     0m0.160s
time ./youtube-dl-butchered --get-url http://youtube.com/watch?v=BaW_jenozKcj

real    0m18.967s
user    0m3.940s
sys     0m0.420s

Note: the --get-url results should be taken with a grain of salt, as this Pi is currently having some network problems. That said, the results seem reproducible across repeated tests.

There seems to be a significant performance improvement in this version, even more than from just removing 900+ lines of downloaders in my butchered version. I haven't tested actually downloading videos with it yet, but at least in these tests the difference seems very impressive.

All three were tested as zipped builds: vanilla downloaded directly from the official download site, the other two built with (make lazy-extractors;) make youtube-dl.

@yan12125
Collaborator

Fails with python 2.6:

$ PYTHON=python2.6 make lazy-extractors
python2.6 devscripts/make_lazy_extractors.py youtube_dl/extractor/lazy_extractors.py
WARNING: Lazy loading extractors is an experimental feature that may not always work
Traceback (most recent call last):
  File "devscripts/make_lazy_extractors.py", line 53, in <module>
    src = build_lazy_ie(ie, name)
  File "devscripts/make_lazy_extractors.py", line 47, in build_lazy_ie
    s += make_valid_template.format(ie._make_valid_url())
ValueError: zero length field name in format
Makefile:99: recipe for target 'youtube_dl/extractor/lazy_extractors.py' failed
make: *** [youtube_dl/extractor/lazy_extractors.py] Error 1
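The ValueError comes from str.format() in Python 2.6, which requires explicit field numbers; auto-numbered fields such as {} only arrived in Python 2.7. A minimal sketch of the 2.6-compatible form; the template is a hypothetical reconstruction modeled on the generated _make_valid_url method quoted in the next review comment, not the script's real template:

```python
# Hypothetical template. '{0!r}' names field 0 explicitly, which Python 2.6
# requires; a bare '{}' or '{!r}' raises
# "ValueError: zero length field name in format" on 2.6.
make_valid_template = '''
    @classmethod
    def _make_valid_url(cls):
        return {0!r}
'''


def render_make_valid_url(pattern):
    """Return source for a _make_valid_url classmethod that returns pattern."""
    return make_valid_template.format(pattern)


if __name__ == '__main__':
    print(render_make_valid_url('yvsearch(?P<prefix>|[1-9][0-9]*|all):(?P<query>[\\s\\S]+)'))
```

Using !r means the regex is embedded as a valid Python string literal, with its backslashes escaped, so the generated file does not need raw strings.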

        valid_url=valid_url,
        module=ie.__module__)
    if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
        s += getsource(ie.suitable)
Collaborator


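This would be more PEP 8-friendly with a '\n' before the copied method; currently the generated code has no blank line before @classmethod: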

class YahooSearchIE(LazyLoadExtractor):
    _VALID_URL = None
    _module = 'youtube_dl.extractor.yahoo'
    @classmethod
    def suitable(cls, url):
        return re.match(cls._make_valid_url(), url) is not None

    @classmethod
    def _make_valid_url(cls):
        return 'yvsearch(?P<prefix>|[1-9][0-9]*|all):(?P<query>[\\s\\S]+)'

@yan12125
Collaborator

Error when downloading multiple URLs of the same InfoExtractor:

$ youtube-dl -vs test:youtube test:youtube_1
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-vs', 'test:youtube', 'test:youtube_1']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.02.09.1
[debug] Git HEAD: dccd778
[debug] Python version 3.5.1 - Linux-4.4.1-2-ARCH-x86_64-with-arch-Arch-Linux
[debug] exe versions: avconv v12_dev0-2370-gab9068c, avprobe v12_dev0-2370-gab9068c, ffmpeg 3.0, ffprobe 3.0, rtmpdump 2.4
[debug] Proxy map: {}
[TestURL] Test URL: http://www.youtube.com/watch?v=BaW_jenozKcj&t=1s&end=9
[youtube] BaW_jenozKc: Downloading webpage
[youtube] BaW_jenozKc: Downloading video info webpage
[youtube] BaW_jenozKc: Extracting video information
[youtube] BaW_jenozKc: Downloading MPD manifest
[TestURL] Test URL: http://www.youtube.com/watch?v=UxxajLWwzqY
[youtube] UxxajLWwzqY: Downloading webpage
[youtube] UxxajLWwzqY: Downloading video info webpage
[youtube] UxxajLWwzqY: Extracting video information
[youtube] {22} signature length 40.41, html5 player en_US-vfldIygzk
ERROR: Signature extraction failed: Traceback (most recent call last):
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/extractor/youtube.py", line 894, in _decrypt_signature
    if player_id not in self._player_cache:
AttributeError: 'YoutubeIE' object has no attribute '_player_cache'
 (caused by AttributeError("'YoutubeIE' object has no attribute '_player_cache'",)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/extractor/youtube.py", line 894, in _decrypt_signature
    if player_id not in self._player_cache:
AttributeError: 'YoutubeIE' object has no attribute '_player_cache'
Traceback (most recent call last):
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/extractor/youtube.py", line 894, in _decrypt_signature
    if player_id not in self._player_cache:
AttributeError: 'YoutubeIE' object has no attribute '_player_cache'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/YoutubeDL.py", line 668, in extract_info
    ie_result = ie.extract(url)
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/extractor/common.py", line 315, in extract
    return self._real_extract(url)
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/extractor/youtube.py", line 1395, in _real_extract
    encrypted_sig, video_id, player_url, age_gate)
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/extractor/youtube.py", line 906, in _decrypt_signature
    'Signature extraction failed: ' + tb, cause=e)
youtube_dl.utils.ExtractorError: Signature extraction failed: Traceback (most recent call last):
  File "/home/yen/Executables/Multimedia/youtube-dl/youtube_dl/extractor/youtube.py", line 894, in _decrypt_signature
    if player_id not in self._player_cache:
AttributeError: 'YoutubeIE' object has no attribute '_player_cache'
 (caused by AttributeError("'YoutubeIE' object has no attribute '_player_cache'",)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
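A plausible cause, assuming the lazy stand-in builds the real extractor inside __new__: Python only calls __init__ automatically when __new__ returns an instance of the class being called. The real YoutubeIE is not a subclass of its lazy stand-in, so its __init__ (presumably where per-instance state like _player_cache is set up) would never run. A minimal reproduction with hypothetical class names:

```python
class RealIE(object):
    def __init__(self, downloader=None):
        self._downloader = downloader
        self._player_cache = {}  # per-instance state created in __init__


class BrokenLazyIE(object):
    def __new__(cls, *args, **kwargs):
        # Returns an object that is NOT an instance of BrokenLazyIE, so
        # Python skips the __init__ step: RealIE.__init__ never runs.
        return RealIE.__new__(RealIE)


class FixedLazyIE(object):
    def __new__(cls, *args, **kwargs):
        instance = RealIE.__new__(RealIE)
        instance.__init__(*args, **kwargs)  # initialise explicitly
        return instance


broken = BrokenLazyIE('ydl')
fixed = FixedLazyIE('ydl')
print(hasattr(broken, '_player_cache'))  # False
print(hasattr(fixed, '_player_cache'))   # True
```

This would also explain why only the second URL failed: the first video in the log never reached signature decryption.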

Another minor suggestion: print information about whether lazy extractors are used or not in the verbose log.
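One way to support that, sketched under assumptions: try the generated module first, fall back to the full one, and remember which path was taken so the verbose header can report it. The traceback earlier in the thread shows extractor/__init__.py doing from .lazy_extractors import *; the module names and helper functions below are hypothetical stand-ins:

```python
import importlib


def import_extractors(lazy_name, full_name):
    """Import the generated lazy module if it exists, else the full one.

    Returns (module, lazy_loader_enabled).
    """
    try:
        return importlib.import_module(lazy_name), True
    except ImportError:
        # The generated file is optional: fall back to importing everything.
        return importlib.import_module(full_name), False


def verbose_header(lazy_loader_enabled):
    """Extra [debug] lines to print when -v is given."""
    if lazy_loader_enabled:
        return ['[debug] Lazy loading extractors enabled']
    return []


if __name__ == '__main__':
    # 'json' stands in for the full extractor package in this demo.
    mod, lazy = import_extractors('hypothetical_lazy_extractors', 'json')
    print(mod.__name__, lazy)
```

Catching ImportError this broadly also hides import errors raised inside a broken lazy module, so a generation bug can silently fall back to the slow path; printing the flag in the verbose log makes that visible.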

@yan12125
Collaborator

Patch for some of my ideas:

diff --git a/devscripts/make_lazy_extractors.py b/devscripts/make_lazy_extractors.py
index 8627d0b..5506335 100644
--- a/devscripts/make_lazy_extractors.py
+++ b/devscripts/make_lazy_extractors.py
@@ -41,14 +41,16 @@ def build_lazy_ie(ie, name):
         valid_url=valid_url,
         module=ie.__module__)
     if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
-        s += getsource(ie.suitable)
+        s += '\n' + getsource(ie.suitable)
     if hasattr(ie, '_make_valid_url'):
         # search extractors
         s += make_valid_template.format(ie._make_valid_url())
     return s

 names = []
-for ie in _ALL_CLASSES:
+sorted_ies = sorted(_ALL_CLASSES, key=lambda c: c.__name__[:-2] if c.__name__ != 'GenericIE' else '')
+sorted_ies = sorted_ies[1:] + [sorted_ies[0]]
+for ie in sorted_ies:
     name = ie.ie_key() + 'IE'
     src = build_lazy_ie(ie, name)
     module_contents.append(src)
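The reordering in this patch sorts the generated classes by ie_key (the class name minus its 'IE' suffix), gives GenericIE the empty key so it sorts first, and then moves it to the very end, presumably because GenericIE accepts any URL and must be tried only after every specific extractor. A standalone sketch; only the key function comes from the patch, the classes are hypothetical stand-ins:

```python
def order_for_lazy_module(classes):
    """Sort extractor classes by ie_key, keeping GenericIE last.

    GenericIE maps to '' so it sorts first; the slice then moves it from
    the front to the end. (If GenericIE were absent, the slice would move
    the alphabetically first class instead.)
    """
    ies = sorted(classes, key=lambda c: c.__name__[:-2] if c.__name__ != 'GenericIE' else '')
    return ies[1:] + [ies[0]]


# Stand-in classes; only their names matter here.
class YoutubeIE(object):
    pass


class GenericIE(object):
    pass


class VimeoIE(object):
    pass


class ArteTVIE(object):
    pass


print([c.__name__ for c in order_for_lazy_module([YoutubeIE, GenericIE, VimeoIE, ArteTVIE])])
# ['ArteTVIE', 'VimeoIE', 'YoutubeIE', 'GenericIE']
```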

@jaimeMF
Collaborator Author

jaimeMF commented Feb 21, 2016

@yan12125 thanks a lot for your comments; I think I have addressed all of them.

Fails with python 2.6:

Fixed in jaimeMF/youtube-dl@a4126fd

Style fixes in jaimeMF/youtube-dl@dd20d6e.

Another minor suggestion: print information about whether lazy extractors are used or not in the verbose log.

jaimeMF/youtube-dl@8c96085

Error when downloading multiple URLs of the same InfoExtractor:

jaimeMF/youtube-dl@29d8ba4

@jaimeMF
Collaborator Author

jaimeMF commented Feb 21, 2016

@SharkWipf thanks for testing it, I'm glad it improves the time.

@jaimeMF
Collaborator Author

jaimeMF commented Feb 21, 2016

Something "bad" about this change is that all pull requests that add an extractor will need to be updated, so I want to explain the rationale for the change:

I initially started by creating a new youtube_dl/lazy_extractors.py file and changed all from .extractor import <something> to:

try:
    from .lazy_extractors import <something>
except ImportError:
    from .extractor import <something>

Apart from the need to change all imports, the main problem is that when you do __import__('youtube_dl.extractor.<somemodule>'), youtube_dl/extractor/__init__.py is also run, so all extractors are loaded anyway, which makes the change useless. That's why I decided to modify youtube_dl/extractor/__init__.py instead (which also allows reusing the functions defined there). Merge conflicts might be easier to handle if the extractors were loaded in the except ImportError: branch, but I personally find 900 indented imports a bit ugly.
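The underlying Python behaviour is that importing any submodule first imports, and therefore executes, every parent package's __init__.py. A self-contained sketch that builds a throwaway package in a temporary directory to show it; the package name demo_ytdl and its contents are hypothetical:

```python
import importlib
import os
import sys
import tempfile


def parent_init_runs_on_submodule_import():
    """Return True if importing demo_ytdl.extractor.somemodule also ran
    demo_ytdl/extractor/__init__.py."""
    root = tempfile.mkdtemp()
    ext_dir = os.path.join(root, 'demo_ytdl', 'extractor')
    os.makedirs(ext_dir)
    files = {
        os.path.join(root, 'demo_ytdl', '__init__.py'): '',
        # Stands in for an __init__.py that imports every extractor.
        os.path.join(ext_dir, '__init__.py'): 'ALL_EXTRACTORS_LOADED = True\n',
        os.path.join(ext_dir, 'somemodule.py'): 'class SomeIE(object):\n    pass\n',
    }
    for path, text in files.items():
        with open(path, 'w') as f:
            f.write(text)

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        # __import__('demo_ytdl.extractor.somemodule') behaves the same way:
        # parent packages are imported before the submodule.
        importlib.import_module('demo_ytdl.extractor.somemodule')
        parent = sys.modules['demo_ytdl.extractor']
        return getattr(parent, 'ALL_EXTRACTORS_LOADED', False)
    finally:
        sys.path.remove(root)


if __name__ == '__main__':
    print(parent_init_runs_on_submodule_import())
```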

@dstftw
Collaborator

dstftw commented Feb 22, 2016

Currently, it's impossible to make a py2exe Windows build with lazy extractors enabled, since devscripts/make_lazy_extractors.py is only called via the Makefile, which is not used for the Windows build. Ideally, a single python setup.py py2exe should still be enough for a py2exe build.

@dstftw
Collaborator

dstftw commented Feb 22, 2016

Here are some of my measurements.

Command: youtube-dl -v:

Linux:

python 2.7.11, non-lazy:
real 0m0.259s
user 0m0.203s
sys 0m0.033s

python 3.5.1, non-lazy:
real 0m0.369s
user 0m0.307s
sys 0m0.050s

python 2.7.11, lazy:
real 0m0.161s
user 0m0.117s
sys 0m0.030s

python 3.5.1, lazy:
real 0m0.216s
user 0m0.160s
sys 0m0.043s

Windows:

python 2.7.10, non-lazy
0.59+s

python 2.7.10, lazy
0.40+s

For realistic use cases I got nearly identical measurements for lazy and non-lazy, with the lazy one even slower in some cases (probably due to the influence of network I/O).
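Differences this small are easily swamped by run-to-run noise, so one way to compare builds more fairly is to run each command several times and keep the fastest run. A sketch for Python 3.5+; best_wall_time is a hypothetical helper and measures wall-clock time, like the real column reported by time:

```python
import subprocess
import sys
import time


def best_wall_time(cmd, runs=5):
    """Run cmd `runs` times and return the fastest wall-clock time in seconds.

    Keeping the minimum discards slow outliers caused by cold disk caches,
    network hiccups or background load.
    """
    best = float('inf')
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == '__main__':
    # Baseline: bare interpreter startup, useful to subtract from the
    # youtube-dl numbers.
    print('python -c pass: %.3fs' % best_wall_time([sys.executable, '-c', 'pass']))
```

For example, call best_wall_time(['./youtube-dl', '--version']) once against a lazy build and once against a non-lazy one. Taking the minimum only removes outliers; it cannot remove the network latency that dominates real extractions.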

@jaimeMF
Collaborator Author

jaimeMF commented Mar 6, 2016

Currently, it's impossible to make py2exe Windows build with lazy extractors enabled since devscripts/make_lazy_extractors.py is only called via Makefile that is not used in Windows build. Ideally single python setup.py py2exe should still be kept enough for a py2exe build.

Do you want me to add a new distutils command? I'm playing with it, but currently you need to run python setup.py build_lazy_extractors py2exe. Is that what you want? (I would avoid doing it for default until we test it more).

@dstftw
Collaborator

dstftw commented Mar 6, 2016

I would avoid doing it for default until we test it more

Ok then.

@jaimeMF
Collaborator Author

jaimeMF commented Mar 7, 2016

Added in jaimeMF/youtube-dl@a4e1733.

@remitamine
Collaborator

I think it would be possible to use this method to improve the generic extractor's load time if we add the _extract_url method.
The problem is that many extractors use other names for embed URL extraction, and _extract_url is not generic. I think it's better to convert them into _extract_urls (sometimes a page contains multiple embeds, but _extract_url only extracts one).

@jaimeMF
Collaborator Author

jaimeMF commented Mar 17, 2016

@remitamine I think it's better to merge this PR first and then work on that. There is a similar implementation of what you want in #6216. Note, though, that _extract_url(s) could try to access some property or global variable, so we must be careful about how we do it.

@yan12125
Collaborator

yan12125 commented Apr 6, 2016

I guess this PR can be merged?

Here are my tests:
Without lazy load:

time python3.6 -m youtube_dl -v url
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-v', 'url']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.02.09.1
[debug] Python version 3.6.0a0 - Linux-3.10.49-perf-g4186cc1-aarch64-with-libc
[debug] exe versions: none
[debug] Proxy map: {}
ERROR: You've asked youtube-dl to download the URL "url". That doesn't make any sense. Simply remove the parameter in your command or configuration.
Traceback (most recent call last):
  File "/data/local/tmp/youtube-dl-lazy-load/youtube_dl/YoutubeDL.py", line 668, in extract_info
    ie_result = ie.extract(url)
  File "/data/local/tmp/youtube-dl-lazy-load/youtube_dl/extractor/common.py", line 315, in extract
    return self._real_extract(url)
  File "/data/local/tmp/youtube-dl-lazy-load/youtube_dl/extractor/commonmistakes.py", line 29, in _real_extract
    raise ExtractorError(msg, expected=True)
youtube_dl.utils.ExtractorError: You've asked youtube-dl to download the URL "url". That doesn't make any sense. Simply remove the parameter in your command or configuration.

    0m2.09s real     0m1.93s user     0m0.07s system

With lazy-load:

time python3.6 -m youtube_dl -v url
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-v', 'url']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.02.09.1
[debug] Lazy loading extractors enabled
[debug] Python version 3.6.0a0 - Linux-3.10.49-perf-g4186cc1-aarch64-with-libc
[debug] exe versions: none
[debug] Proxy map: {}
ERROR: You've asked youtube-dl to download the URL "url". That doesn't make any sense. Simply remove the parameter in your command or configuration.
Traceback (most recent call last):
  File "/data/local/tmp/youtube-dl-lazy-load/youtube_dl/YoutubeDL.py", line 668, in extract_info
    ie_result = ie.extract(url)
  File "/data/local/tmp/youtube-dl-lazy-load/youtube_dl/extractor/common.py", line 315, in extract
    return self._real_extract(url)
  File "/data/local/tmp/youtube-dl-lazy-load/youtube_dl/extractor/commonmistakes.py", line 29, in _real_extract
    raise ExtractorError(msg, expected=True)
youtube_dl.utils.ExtractorError: You've asked youtube-dl to download the URL "url". That doesn't make any sense. Simply remove the parameter in your command or configuration.

    0m1.19s real     0m1.07s user     0m0.09s system

Test environment: My Android phone with my patched Python build.

An incredible improvement! Much thanks @jaimeMF.

@jaimeMF
Collaborator Author

jaimeMF commented Apr 6, 2016

I'll need to rebase against the current HEAD; I'll try to do it on the weekend.

jaimeMF added 5 commits April 8, 2016 21:43
'make lazy-extractors' creates youtube_dl/extractor/lazy_extractors.py (imported by youtube_dl/extractor/__init__.py), which contains simplified classes that only have the 'suitable' class method and that load the appropriate class in their '__new__' method when an instance is created.
When building with Python 3 the Unicode characters are not escaped, so Python 2 needs to know the encoding.
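The mechanism in the first commit message above, reduced to a minimal sketch. It is not the generated file itself: the class names, the demo_extractors module and the use of importlib are assumptions for illustration. The lazy stand-in carries only what suitable() needs, and its __new__ imports the real module and returns an instance of the real class:

```python
import importlib
import re
import sys
import types


class LazyLoadExtractor(object):
    """Stand-in that can match URLs without importing the real extractor."""
    _VALID_URL = None
    _module = None  # dotted path of the module that defines the real class

    @classmethod
    def ie_key(cls):
        return cls.__name__[:-2]

    @classmethod
    def suitable(cls, url):
        # Only the regex is needed here, so the heavy module stays unimported.
        return re.match(cls._VALID_URL, url) is not None

    def __new__(cls, *args, **kwargs):
        # On instantiation, import the real module and build the real class.
        mod = importlib.import_module(cls._module)
        real_cls = getattr(mod, cls.__name__)
        instance = real_cls.__new__(real_cls)
        # The result is not an instance of cls, so Python will not call
        # __init__ for us; do it explicitly.
        instance.__init__(*args, **kwargs)
        return instance


class DemoIE(LazyLoadExtractor):
    _VALID_URL = r'https?://(?:www\.)?example\.com/v/(?P<id>\d+)'
    _module = 'demo_extractors'


def install_demo_module():
    """Register a fake 'demo_extractors' module holding the real DemoIE."""
    mod = types.ModuleType('demo_extractors')

    class DemoIE(object):  # the "real", heavyweight extractor
        def __init__(self, downloader=None):
            self._downloader = downloader

        def extract(self, url):
            return {'id': re.search(r'\d+$', url).group(0)}

    mod.DemoIE = DemoIE
    sys.modules['demo_extractors'] = mod


if __name__ == '__main__':
    print(DemoIE.suitable('https://example.com/v/42'))  # True; nothing imported yet
    install_demo_module()
    ie = DemoIE('ydl')
    print(ie.extract('https://example.com/v/42'))  # {'id': '42'}
```

Calling importlib.import_module on every instantiation is cheap after the first time, because the module is then served from sys.modules.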
@remitamine
Copy link
Collaborator

Not sure why it happens, but it should be related to this change.
The URL works without lazy extractors, but with lazy extractors it uses GenericIE instead of YoutubeUserIE.

[amine@amine youtube-dl]$ make clean 
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete
find . -name "*.class" -delete
[amine@amine youtube-dl]$ make
zip --quiet youtube-dl youtube_dl/*.py youtube_dl/*/*.py
zip --quiet --junk-paths youtube-dl youtube_dl/__main__.py
echo '#!/usr/bin/env python' > youtube-dl
cat youtube-dl.zip >> youtube-dl
rm youtube-dl.zip
chmod a+x youtube-dl
pandoc -f markdown -t plain README.md -o README.txt
/usr/bin/env python devscripts/prepare_manpage.py >youtube-dl.1.temp.md
pandoc -s -f markdown -t man youtube-dl.1.temp.md -o youtube-dl.1
rm -f youtube-dl.1.temp.md
/usr/bin/env python devscripts/bash-completion.py
/usr/bin/env python devscripts/zsh-completion.py
/usr/bin/env python devscripts/fish-completion.py
/usr/bin/env python devscripts/make_supportedsites.py docs/supportedsites.md
[amine@amine youtube-dl]$ ./youtube-dl -f best -o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' https://www.youtube.com/user/AlltimeConspiracies/videos
[youtube:user] AlltimeConspiracies: Downloading channel page
^C
ERROR: Interrupted by user
[amine@amine youtube-dl]$ make clean 
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete
find . -name "*.class" -delete
[amine@amine youtube-dl]$ make lazy-extractors 
/usr/bin/env python devscripts/make_lazy_extractors.py youtube_dl/extractor/lazy_extractors.py
WARNING: Lazy loading extractors is an experimental feature that may not always work
[amine@amine youtube-dl]$ make
zip --quiet youtube-dl youtube_dl/*.py youtube_dl/*/*.py
zip --quiet --junk-paths youtube-dl youtube_dl/__main__.py
echo '#!/usr/bin/env python' > youtube-dl
cat youtube-dl.zip >> youtube-dl
rm youtube-dl.zip
chmod a+x youtube-dl
COLUMNS=80 /usr/bin/env python youtube_dl/__main__.py --help | /usr/bin/env python devscripts/make_readme.py
/usr/bin/env python devscripts/make_contributing.py README.md CONTRIBUTING.md
pandoc -f markdown -t plain README.md -o README.txt
/usr/bin/env python devscripts/prepare_manpage.py >youtube-dl.1.temp.md
pandoc -s -f markdown -t man youtube-dl.1.temp.md -o youtube-dl.1
rm -f youtube-dl.1.temp.md
/usr/bin/env python devscripts/bash-completion.py
/usr/bin/env python devscripts/zsh-completion.py
/usr/bin/env python devscripts/fish-completion.py
/usr/bin/env python devscripts/make_supportedsites.py docs/supportedsites.md
[amine@amine youtube-dl]$ ./youtube-dl -f best -o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' https://www.youtube.com/user/AlltimeConspiracies/videos
[generic] videos: Requesting header
WARNING: Falling back on generic information extractor.
[generic] videos: Downloading webpage

@jaimeMF
Collaborator Author

jaimeMF commented Apr 15, 2016

@remitamine it's the suitable method of YoutubeUserIE:

    @classmethod
    def suitable(cls, url):
        # Don't return True if the url can be extracted with other youtube
        # extractor; the regex is too permissive and would match them too.
        other_ies = iter(klass for (name, klass) in globals().items() if name.endswith('IE') and klass is not cls)
        if any(ie.suitable(url) for ie in other_ies):
            return False
        else:
            return super(YoutubeUserIE, cls).suitable(url)

It's also picking up GenericIE, and since GenericIE matches all URLs, suitable returns False. I'm not sure what the cleanest fix would be; I can think of just changing it in youtube_dl/extractor/youtube.py to look like

iter(klass for (name, klass) in globals().items() if name.endswith('IE') and not name == 'GenericIE' and klass is not cls)

Do you have a better suggestion?

@remitamine
Collaborator

remitamine commented Apr 15, 2016

As I understand it, the suitable method here checks whether the URL matches other YouTube extractors; maybe we can check only extractors whose names start with Youtube.

@jaimeMF
Collaborator Author

jaimeMF commented Apr 15, 2016

As I understand it, the suitable method here checks whether the URL matches other YouTube extractors; maybe we can check only extractors whose names start with Youtube.

That sounds better.
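A minimal sketch of that idea with stand-in classes and simplified regexes (hypothetical; not the project's actual patterns). It also suggests why the original check broke: once suitable() is copied into the generated lazy module, globals() presumably holds every extractor, including GenericIE, whose catch-all pattern then vetoes every URL:

```python
import re


class InfoExtractor(object):
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        return re.match(cls._VALID_URL, url) is not None


class GenericIE(InfoExtractor):
    _VALID_URL = r'.*'  # catch-all, like the real generic extractor


class YoutubeIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?youtube\.com/watch\?v=(?P<id>[\w-]+)'


class YoutubeUserIE(InfoExtractor):
    # Deliberately permissive: on its own it would also match watch URLs.
    _VALID_URL = r'https?://(?:www\.)?youtube\.com/(?:user/)?(?P<id>[\w-]+)'

    @classmethod
    def suitable(cls, url):
        # Defer only to the other Youtube* extractors. A catch-all such as
        # GenericIE, if present in globals(), no longer vetoes the match.
        other_ies = (klass for name, klass in list(globals().items())
                     if name.startswith('Youtube') and name.endswith('IE')
                     and klass is not cls)
        if any(ie.suitable(url) for ie in other_ies):
            return False
        return super(YoutubeUserIE, cls).suitable(url)
```

With the prefix filter, a user URL is accepted and a watch URL is left to YoutubeIE, whether or not GenericIE shares the namespace.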

@remitamine
Collaborator

Thanks for the help.
Now it works with f3a58d4.

@remitamine
Collaborator

There is an error when I use lazy extractors:

python __main__.py -v -F 'http://www.youtube.com/watch?v=BaW_jenozKc'
[debug] System config: []
[debug] User config: ['--external-downloader', 'aria2c', '--sub-lang', 'en,ar', '--write-sub', '--sub-format', 'ass/vtt/srt/best', '-f', 'best[height<=720]/bestvideo[height<=720]+bestaudio', '--hls-prefer-native']
[debug] Command-line args: ['-v', '-F', 'http://www.youtube.com/watch?v=BaW_jenozKc']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.06.20
[debug] Lazy loading extractors enabled
[debug] Git HEAD: 1ac5705
[debug] Python version 3.5.1 - Linux-4.6.2-1-ARCH-i686-with-arch
[debug] exe versions: ffmpeg 3.0.2, ffprobe 3.0.2, rtmpdump 2.4
[debug] Proxy map: {}
Traceback (most recent call last):
  File "__main__.py", line 19, in <module>
    youtube_dl.main()
  File "/home/amine/youtube-dl/youtube_dl/__init__.py", line 420, in main
    _real_main(argv)
  File "/home/amine/youtube-dl/youtube_dl/__init__.py", line 410, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/amine/youtube-dl/youtube_dl/YoutubeDL.py", line 1740, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/home/amine/youtube-dl/youtube_dl/YoutubeDL.py", line 667, in extract_info
    if not ie.suitable(url):
  File "/home/amine/youtube-dl/youtube_dl/extractor/lazy_extractors.py", line 213, in suitable
    return False if ArteTVPlaylistIE.suitable(url) else super(ArteTVPlus7IE, cls).suitable(url)
TypeError: super(type, obj): obj must be an instance or subtype of type
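This TypeError is what super(SomeClass, cls) raises when cls does not derive from SomeClass. One plausible way to hit it (an assumption about the generated file, not a description of the eventual fix): a lazy class reuses the copied source of a suitable() that names another extractor in its super() call, but the lazy class does not inherit from that extractor's lazy stand-in. A minimal reproduction with hypothetical class names, plus a variant where mirroring the inheritance keeps super() valid:

```python
class LazyBase(object):
    @classmethod
    def suitable(cls, url):
        return url.startswith('http')


class Plus7IE(LazyBase):
    @classmethod
    def suitable(cls, url):
        # Source copied from the real extractor: it names Plus7IE explicitly.
        return super(Plus7IE, cls).suitable(url)


class FlatChildIE(LazyBase):
    # The real class inherits suitable() from Plus7IE, but this stand-in
    # derives from LazyBase, so the copied super() call gets a foreign cls.
    suitable = Plus7IE.__dict__['suitable']


class MirroredChildIE(Plus7IE):
    # Mirroring the real inheritance chain keeps super(Plus7IE, cls) valid.
    pass


try:
    FlatChildIE.suitable('http://example.com/')
except TypeError as e:
    print('TypeError:', e)
print(MirroredChildIE.suitable('http://example.com/'))  # True
```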

@jaimeMF
Collaborator Author

jaimeMF commented Jun 22, 2016

@remitamine should be fixed in 169d836. Thanks for pointing it out. For future problems, feel free to open a new issue and ping me.

@royale1223

@jaimeMF Hi, how stable is this patch? Close to production ready?

@jaimeMF
Collaborator Author

jaimeMF commented Jul 29, 2016

It works, but some details may not work correctly and sometimes it breaks.

@royale1223

@jaimeMF anything specific I should worry about?

@jaimeMF
Collaborator Author

jaimeMF commented Jul 29, 2016

@jaimeMF anything specific I should worry about?

No. If you find something that doesn't work, open a new issue and we will fix it.

@yan12125
Collaborator

yan12125 commented Sep 1, 2016

@royale1223 It's not caused by this patch. This URL redirects to https://vimeo.com/ondemand/castlesinthesky/89677808, and the latter works fine with youtube-dl. Nevertheless the original URL should be supported as well. Could you open a new issue?

@royale1223

royale1223 commented Dec 14, 2016

@yan12125 Happens because it's not a free video.

@alejomendoza

Hi there! Just wondering where the documentation on how to lazy load the info extractors is. Is there a flag I can pass when getting the URL info, as in youtube-dl -g --youtube-skip-dash-manifest https://www.youtube.com/watch?v=V0Ll64U-FuY ?

@yan12125
Collaborator

@alejomendoza clone this repository and run the following two commands:

make lazy-extractors
make youtube-dl

And replace the official youtube-dl with the newly generated one.

fluxw42 referenced this pull request in fluxw42/youtube-dl Apr 11, 2017
Move import of nieuwsblad extractor from __init__.py to extractors.py #8497
@tobimensch

If this performance improvement is working without issues by now, why not make it the default?

@yan12125
Collaborator

yan12125 commented Jul 3, 2017

Actually, lazy extractors break testing (#13554). That should be fixed for developers before making it the default for users.

9 participants