[roosterteeth] Added new extractor #6536

Closed
wants to merge 7 commits into from

ngld
Contributor

@ngld ngld commented Aug 12, 2015

This extractor allows you to download single videos or whole seasons from roosterteeth.com, achievementhunter.com and fun.haus.

The RoosterteethShowIE allows you to filter videos using a simple regex filter (the second test case contains an example). I've added that feature since I found no other way to do this with YTDL's own filters. I hope you don't mind.

This resolves #6371.

@dstftw
Collaborator

dstftw commented Aug 12, 2015

Broken on python 2:

py26yt "http://roosterteeth.com/show/red-vs-blue#;season=.* 1$" -v
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'http://roosterteeth.com/show/red-vs-blue#;season=.* 1$', u'-v']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2015.08.09
[debug] Git HEAD: 5e879ff
[debug] Python version 2.6.6 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-73993-g8a17335, ffprobe N-73993-g8a17335, rtmpdump 2.4
[debug] Proxy map: {}
Traceback (most recent call last):
  File "youtube_dl/__main__.py", line 19, in <module>
    youtube_dl.main()
  File "C:\Dev\git\youtube-dl\master\youtube_dl\__init__.py", line 410, in main
    _real_main(argv)
  File "C:\Dev\git\youtube-dl\master\youtube_dl\__init__.py", line 400, in _real_main
    retcode = ydl.download(all_urls)
  File "C:\Dev\git\youtube-dl\master\youtube_dl\YoutubeDL.py", line 1653, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "C:\Dev\git\youtube-dl\master\youtube_dl\YoutubeDL.py", line 655, in extract_info
    ie_result = ie.extract(url)
  File "C:\Dev\git\youtube-dl\master\youtube_dl\extractor\common.py", line 286, in extract
    return self._real_extract(url)
  File "C:\Dev\git\youtube-dl\master\youtube_dl\extractor\roosterteeth.py", line 51, in _real_extract
    ep_filter = compat_urllib_parse.parse_qs(params)
AttributeError: 'module' object has no attribute 'parse_qs'
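
The traceback above comes down to where `parse_qs` lives: Python 2 keeps it in the `urlparse` module, while Python 3 moved it to `urllib.parse`. A minimal, version-independent sketch is shown below. The `params` value is an assumption about what the extractor pulled out of the URL fragment, not something taken from the PR's code. youtube-dl's own `compat` module provides `compat_parse_qs` for the same purpose.

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs  # Python 2

# Assumed filter string, modelled on the fragment of the test URL above.
params = 'season=.* 1$'
ep_filter = parse_qs(params)
print(ep_filter)
```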

Doesn't work for me at all:

py34yt "http://roosterteeth.com/show/red-vs-blue#;season=.* 1$" -v
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['http://roosterteeth.com/show/red-vs-blue#;season=.* 1$', '-v']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2015.08.09
[debug] Git HEAD: 5e879ff
[debug] Python version 3.4.3 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-73993-g8a17335, ffprobe N-73993-g8a17335, rtmpdump 2.4
[debug] Proxy map: {}
[RoosterteethShow] red-vs-blue: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "C:\Dev\git\youtube-dl\master\youtube_dl\extractor\common.py", line 325, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "C:\Dev\git\youtube-dl\master\youtube_dl\YoutubeDL.py", line 1860, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "C:\python\python343\lib\urllib\request.py", line 469, in open
    response = meth(req, response)
  File "C:\python\python343\lib\urllib\request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\python\python343\lib\urllib\request.py", line 507, in error
    return self._call_chain(*args)
  File "C:\python\python343\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\python\python343\lib\urllib\request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

UPD: the site serves me a 403 captcha page.

@dstftw
Collaborator

dstftw commented Aug 12, 2015

I don't think the URL is a good place for custom extractor-specific filters that aren't supported by the website itself. Filtering should be generic or not be there at all. There is --match-title to filter by title.

@ngld
Contributor Author

ngld commented Aug 12, 2015

I've fixed the first error and removed my own filter. I'm not sure what caused the 403 error in your case. Can you try again with -v --dump-pages?

--match-title won't work in this case since the titles don't include the season number. I've tried to add a season field to the playlist item, but the RoosterteethIE extractor doesn't have access to it and its result replaces the playlist entry.

@dstftw
Collaborator

dstftw commented Aug 12, 2015

-v is already present, --dump-pages won't help since it's 403.
It's a captcha page:

<html>

<head>

<title>Rooster Teeth &middot; Argh!</title>

<style type='text/css'>body,td{cursor:default;}body{background:#fff;color:#000;font:12px Arial , Helvetica , sans-serif;}h2{color:#222;}td{font-size:11px;line-height:150%;vertical-align:top;}a{font-size:11px;text-decoration:none;line-height:150%;color:#c2262b;cursor:pointer;}a:hover{text-decoration:underline;}.secret{font-size:11px;color:#eee;}</style>

</head>

<body>

<center>

<img width="300" height="194" src="">

<br />

<div>

<form class="challenge-form" id="challenge-form" action="/cdn-cgi/l/chk_captcha" method="get">
<script type="text/javascript" src="/cdn-cgi/scripts/cf.challenge.js" data-type="custom"  data-ray="214e099963600ca7" async></script>
<noscript id="cf-captcha-bookmark" class="cf-captcha-info">
  <iframe src="//www.google.com/recaptcha/api/noscript?k=6LeT6gcAAAAAAAZ_yDmTMqPH57dJQZdQcu6VFqog" height="300" width="500" frameborder="0"></iframe>
  <input type="hidden" name="recaptcha_response_field" value="manual_challenge">
  <label for="manual_recaptcha_challenge_field">Enter confirmation code after solving challenge above</label>
  <textarea id="manual_recaptcha_challenge_field" name="recaptcha_challenge_field" rows="3" cols="40"></textarea>
  <button type="submit" class="cf-captcha-submit">Submit</button>
</noscript>
</form>


</div>



<br />

<br />

</center>

</body>

</html>
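
A sketch of how such a page could be recognised, so that a clearer error can be reported instead of a bare 403. The helper name and the choice of markers are assumptions; the markers are simply strings that appear in the HTML pasted above, and other CloudFlare challenge pages may look different.

```python
def looks_like_cloudflare_captcha(status_code, body):
    """Return True if a response resembles the captcha page shown above."""
    if status_code != 403:
        return False
    # Strings taken from the pasted page; not an exhaustive or official list.
    markers = ('/cdn-cgi/l/chk_captcha', 'cf-captcha')
    return any(marker in body for marker in markers)


sample = ('<form class="challenge-form" id="challenge-form" '
          'action="/cdn-cgi/l/chk_captcha" method="get">')
print(looks_like_cloudflare_captcha(403, sample))
```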

@dstftw
Collaborator

dstftw commented Aug 12, 2015

By the way, technically you are free to build titles in any way.

@ngld
Contributor Author

ngld commented Aug 12, 2015

That captcha page is generated by CloudFlare. Strange, I've never encountered it myself.
I'm not sure how to handle that page... Does it work after solving the captcha in your browser?

I'd like to add the season to an episode's title but the video page doesn't include the required information and I don't see a way to pass the season from RoosterteethShowIE to RoosterteethIE.

@dstftw
Collaborator

dstftw commented Aug 12, 2015

Yes, I can browse it in the browser after solving the captcha. A workaround is to pass cookies exported from the browser to youtube-dl. But there is nothing that can be done in the extractor.
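
On the command line, the cookie workaround is `youtube-dl --cookies FILE URL` with a Netscape-format cookie file exported from the browser. What that amounts to can be sketched with the Python 3 standard library alone. The helper name and the file name in the usage comment are hypothetical.

```python
from http.cookiejar import MozillaCookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def opener_with_browser_cookies(cookie_path):
    """Build a urllib opener that sends cookies from a Netscape-format file."""
    jar = MozillaCookieJar(cookie_path)
    jar.load()  # raises if the file is missing or not in Netscape format
    return build_opener(HTTPCookieProcessor(jar)), jar


# Usage (hypothetical file name, performs a real request):
# opener, jar = opener_with_browser_cookies('cookies.txt')
# page = opener.open('http://roosterteeth.com/show/red-vs-blue').read()
```

Whether the exported cookies are accepted may also depend on the request coming from the same IP address and user agent as the browser session.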
To pass custom data from one extractor to another, smuggle_url and unsmuggle_url can be used.
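
The idea behind URL smuggling can be sketched as follows: extra data is JSON-encoded and appended to the URL fragment, and the receiving extractor strips it off again. This is a simplified standalone illustration (Python 3), not youtube-dl's actual code. The function names, the key name and the episode URL are all hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode

SMUGGLE_KEY = '__smuggled'  # hypothetical key name for this sketch


def smuggle(url, data):
    """Append JSON-encoded data to the URL as a fragment."""
    return url + '#' + urlencode({SMUGGLE_KEY: json.dumps(data)})


def unsmuggle(smuggled_url, default=None):
    """Split a smuggled URL back into (original_url, data)."""
    if '#' + SMUGGLE_KEY not in smuggled_url:
        return smuggled_url, default
    url, _, fragment = smuggled_url.rpartition('#')
    return url, json.loads(parse_qs(fragment)[SMUGGLE_KEY][0])


# The show extractor would smuggle the season into each episode URL ...
episode_url = smuggle('http://roosterteeth.com/episode/example',
                      {'season': 'Season 2'})
# ... and the episode extractor would recover it.
url, data = unsmuggle(episode_url)
print(url, data)
```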

@ngld
Contributor Author

ngld commented Aug 12, 2015

Alright, now you can use --match-title '^Season 2:' if you only want episodes from the second season.
You can also use the season in the filename template (e.g. -o '%(season)s/%(raw_title)s.%(ext)s').
I'm quite happy with this solution. Thanks for helping me out.
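
The effect of the season-prefixed title and the filename template can be sketched with plain Python. The entries are made-up sample data, and the sketch assumes that title matching is a regex search and that the output template is expanded with `%`-style dict formatting. The field names `season` and `raw_title` follow the comment above.

```python
import re

# Hypothetical playlist entries.
entries = [
    {'season': 'Season 1', 'raw_title': 'Episode A', 'ext': 'mp4'},
    {'season': 'Season 2', 'raw_title': 'Episode B', 'ext': 'mp4'},
]

# Prefix each title with its season, as the updated extractor does.
for entry in entries:
    entry['title'] = '%s: %s' % (entry['season'], entry['raw_title'])

# Keep only titles that match the filter regex.
pattern = re.compile(r'^Season 2:')
selected = [e for e in entries if pattern.search(e['title'])]

# Expand the output template against each selected entry.
template = '%(season)s/%(raw_title)s.%(ext)s'
filenames = [template % e for e in selected]
print(filenames)
```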

The extractor now uses the native HLS implementation. Anything else I should change?

if 'youtubeKey' not in meta:
    raise ExtractorError('Invalid metadata for youtube video!')

res = self.url_result('https://youtube.com/watch?v=' + meta['youtubeKey'])
Collaborator

You should directly do:

res = {
    '_type': 'url_transparent',
    'url': 'https://youtube.com/watch?v=' + meta['youtubeKey'],
    'id': video_id,
}
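
A rough sketch of why `url_transparent` helps here: the result of the extractor that handles `url` is used as the base, and the remaining fields of the outer dict (for example a season-prefixed title) are laid over it. This is a simplification written for illustration, not youtube-dl's code. The function name, the sample values and the exact set of keys excluded from the overlay are assumptions, and that set differs between youtube-dl versions.

```python
def merge_transparent(outer, inner):
    """Overlay the outer result's fields on the inner extractor's result."""
    # Assumed set of keys that only route the request and are not copied.
    routing_keys = ('_type', 'url', 'ie_key')
    overrides = dict(
        (key, value) for key, value in outer.items()
        if value is not None and key not in routing_keys)
    merged = dict(inner)
    merged.update(overrides)
    return merged


# Hypothetical outer result from the Rooster Teeth extractor.
outer = {
    '_type': 'url_transparent',
    'url': 'https://youtube.com/watch?v=XXXXXXXXXXX',
    'title': 'Season 2: Episode B',
    'season': 'Season 2',
}
# Hypothetical result from the extractor that handled the YouTube URL.
inner = {'id': 'XXXXXXXXXXX', 'title': 'Episode B', 'ext': 'mp4'}

merged = merge_transparent(outer, inner)
print(merged['title'])
```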

@ngld
Contributor Author

ngld commented Oct 1, 2015

I've updated the extractor and rebased the fork. Anything else I should change?

@yan12125
Collaborator

yan12125 commented Apr 9, 2016

Since #8497 landed, please move the changes in youtube_dl/extractor/__init__.py to youtube_dl/extractor/extractors.py. Check step 5 of the new developer instructions for more information.

@antdude

antdude commented Jun 13, 2018

Still waiting for the fixes? I still can't download from, for example, https://roosterteeth.com/episode/death-battle-season-5-doctor-strange-vs-doctor-fate-marvel-vs-dc ... :(

@ngld
Contributor Author

ngld commented Jun 13, 2018

This extractor was for the old site. The video embeds have changed and this code won't work anymore.

@ngld ngld closed this Jun 13, 2018
@ngld
Contributor Author

ngld commented Jun 13, 2018

@antdude This is the PR you're looking for: #16105

Development

Successfully merging this pull request may close these issues.

roosterteeth.com now unsupported
5 participants