[Roosterteeth] Update extractor to support new site. #16105

Closed
wants to merge 14 commits into from

@ddmgy ddmgy commented Apr 6, 2018

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

This updates the extractor for Roosterteeth.com to support the recent changes to the website. It now uses the officially provided API rather than scraping HTML. Addresses issue #16094.

There are some issues that need to be fixed before it can be merged.

  • Videos that do not require logging in can be downloaded with no issue, but attempting to log in fails. I have not yet found a way to log in through the new API. The old login URL still exists and can be used to log in through a browser. However, the form is now loaded through JavaScript after the page loads, so youtube-dl seems unable to grab it. Update: Logging in and downloading FIRST-only videos now works.

  • The changes I made don't look great, as I just wanted to get it mostly working before hopefully getting feedback on the login issue. I'll be going through and fixing things up over the weekend. Update: I've cleaned things up and made user-facing strings less generic. I'm open to any suggestions for further changes.

Logging in is done by POSTing username and password to an authorization URL, and an access token is returned. This access token is a cookie in the browser, so it could be saved as a cookie here, but I'm not sure how to do that, or how to check if a saved cookie is expired. I'll look more into it.

Barring any requested changes, this seems ready to be merged.

Edit: The access token is a standard JWT that is valid for 8 hours after logging in. I have implemented saving the access token as a cookie and re-downloading it after it expires, so users who download multiple videos in a short time will not have to log in every time.
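As an illustration of the expiry check described above, here is a minimal sketch of reading the `exp` claim from a JWT using only the standard library. The function names `jwt_expiry` and `token_is_fresh` and the 60-second leeway are hypothetical, not taken from this pull request, and the sketch assumes the token carries a standard `exp` claim. It does not verify the signature; it only decides whether a saved token is worth reusing.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the `exp` claim (Unix timestamp) of a JWT, or None if absent."""
    # A JWT is three base64url segments: header.payload.signature
    payload_b64 = token.split('.')[1]
    # base64url omits padding; restore it before decoding
    payload_b64 += '=' * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64).decode('utf-8'))
    return payload.get('exp')


def token_is_fresh(token, now=None, leeway=60):
    """True if the token's `exp` is more than `leeway` seconds in the future."""
    exp = jwt_expiry(token)
    if exp is None:
        # No expiry information: treat the token as unusable and log in again
        return False
    now = time.time() if now is None else now
    return exp - leeway > now
```

With a check like this, a saved token would be reused while `token_is_fresh` returns True and a new login would be performed otherwise.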

@DIzFer

DIzFer commented Apr 28, 2018

I came here looking for this precise fix and I can confirm it works for me. Is it usual for a PR like this to sit unmerged for 3 weeks?

@ddmgy
Author

ddmgy commented Apr 28, 2018

@DIzFer Thanks for confirming that it works! It's been working for me, but I don't watch Rooster Teeth videos very often anymore, so I haven't really been able to test my fix extensively.

I don't know how long a PR usually sits waiting for a review or merge, but I imagine the maintainers simply haven't had the time to check this one yet. @dstftw seems to be the most active (or only?) developer of youtube-dl, so I would guess this PR has just gotten lost in the flood of issues and pull requests the project gets every week.

@au5ton

au5ton commented May 26, 2018

bump because this is still unmerged on the master branch

if len(video_response.get('data', [])) == 0:
raise ExtractorError('Unable to download video information')
video_attributes = video_response.get('data')[0].get('attributes')
Collaborator

Same.

Author

I'm not sure what 'Same' is meant to be referencing, but I removed the check and error, and changed video_attributes declaration to use try_get. If this is not what you meant for me to change, I can revert it.
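For readers unfamiliar with the helper, the sketch below shows the kind of change described: replacing the explicit length check with a guarded lookup. The `try_get` defined here is a simplified stand-in modeled on the helper in `youtube_dl.utils`, written out so the example is self-contained. The sample `video_response` is made up for illustration, and this is not the code from the pull request.

```python
def try_get(src, getter, expected_type=None):
    """Apply getter to src, returning None instead of raising on a failed lookup."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    # Optionally reject values of an unexpected type
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


# Made-up response shaped like the snippet under review
video_response = {'data': [{'attributes': {'title': 'Example'}}]}

# One guarded lookup replaces the len() check plus chained .get() calls
video_attributes = try_get(
    video_response, lambda x: x['data'][0]['attributes'], dict)
```

An empty or missing `data` list yields `None` here rather than an `IndexError`, so the caller can decide how to report the failure.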

@ddmgy
Author

ddmgy commented Jun 12, 2018

I've (hopefully) made the requested changes. Awaiting further review.

@RichardHancock

Thanks for this; it seems to be working on every video I've tried so far. One critique is that the thumbnails it fetches are tiny (240ish), even though 720-1080p thumbnails are available on the video page. It's a minor thing, but it would be great if someone could fix it.
Thanks

@ddmgy
Author

ddmgy commented Jul 13, 2018

@RichardHancock It should now download a larger image for the thumbnail.

@RichardHancock

@ddmgy Thanks, that works great. It seems to download with an incorrect image file extension for some videos, but that's on Rooster Teeth's end, as I had this issue when manually downloading them.

Thanks for doing this. It works just as well as the old website extractor now. Hopefully someone can get this merged soon.

@ddmgy
Author

ddmgy commented Jul 13, 2018

@RichardHancock Glad to hear it works. Would you mind telling me which videos are giving the wrong extension for thumbnails? I'll see if I can fix it.

@RichardHancock

@ddmgy It seems to be pretty random; the only ones I can find doing it right now are recent Camp Camp episodes. I assume one department is just uploading them wrong.
Here's one that I've checked: https://roosterteeth.com/episode/camp-camp-season-3-7
That one downloads a .png, but when it is opened in some programs they throw an error saying that it's actually a JPG (other programs just silently load it anyway).
It doesn't seem to be as bad as when I was manually downloading the thumbnails, as some of those images were .webp with .png extensions, which almost no program could detect and fix. I just assumed it was the same problem.

I don't really think it's anything that you need to fix; it's just their mistake. I'll post again if I find other series doing it.
Also, that latest commit you made makes it download the small thumbnails again, but I assume you're just testing things.

Thanks

… download the largest available image now.
@ddmgy
Author

ddmgy commented Jul 14, 2018

@RichardHancock Hmm, the thumbnails returned for that Camp Camp episode are a mix of PNGs and JPGs, and I'm not sure which set of those youtube-dl is actually grabbing. I would assume the API is returning the data in the same order every time, so it should be getting PNGs, but I'll test that more tonight. Actually, I see now that even when the extension on the URL shows it is a PNG, it is sometimes actually a JPG image. The image opens fine for me in everything I'm using, but you're right: this is on RoosterTeeth's side, and there's nothing I can fix there.
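For anyone who wants to check a downloaded thumbnail locally, the mismatch between extension and content can be detected from the file's first bytes. The sketch below uses the well-known PNG, JPEG and WebP signatures. The helper name `sniff_image_ext` is hypothetical, and this is not something the extractor in this pull request does.

```python
def sniff_image_ext(data):
    """Guess an image extension from the leading bytes of the file, or None."""
    # PNG files start with a fixed 8-byte signature
    if data.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'png'
    # JPEG files start with the SOI marker followed by another marker byte
    if data.startswith(b'\xff\xd8\xff'):
        return 'jpg'
    # WebP is a RIFF container whose form type at offset 8 is "WEBP"
    if data[:4] == b'RIFF' and data[8:12] == b'WEBP':
        return 'webp'
    return None
```

Reading the first 12 bytes of a saved thumbnail and passing them to this function is enough to tell whether a `.png` file is really a JPG or WebP image.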

Unfortunately, even if I always manage to grab the same set of thumbnails for a video, there is no way to tell what size each of the images is, and they don't always seem to be ordered correctly. When --write-thumbnail is passed to youtube-dl, it will only download the last thumbnail in the list of thumbnails (which is what my latest commit generates). You can use --write-all-thumbnails, which will download all 4 thumbnails, but then you'd have to check each one manually to see which is the size you want and then delete the other 3.

Well, you can still use --write-all-thumbnails and manually select which image you want to keep, but using --write-thumbnail should at least choose to download only the largest available thumbnail now. I wish there was a proper way to choose which thumbnail you want to download, but I'm not seeing any available options for that.
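The "last thumbnail in the list wins" behavior can be pictured with the sketch below. It assumes thumbnails are ordered by an optional `preference` field first and then by `width` and `height`, with missing values sorting lowest. That ordering is an assumption made for illustration, and the `order_thumbnails` helper, the URLs and the preference values are all hypothetical.

```python
def order_thumbnails(thumbnails):
    """Return thumbnails sorted so the most preferred entry comes last."""
    def key(t):
        # Missing values sort below any real value
        return (
            t.get('preference') if t.get('preference') is not None else -1,
            t.get('width') if t.get('width') is not None else -1,
            t.get('height') if t.get('height') is not None else -1,
        )
    return sorted(thumbnails, key=key)


# Sizes are unknown, so the entries are ranked by an explicit preference
thumbs = [
    {'url': 'https://example.invalid/large.png', 'preference': 2},
    {'url': 'https://example.invalid/small.png', 'preference': 0},
    {'url': 'https://example.invalid/medium.png', 'preference': 1},
]

# The last entry after sorting is the one a single-thumbnail download would use
chosen = order_thumbnails(thumbs)[-1]
```

Under that assumption, an extractor that cannot report image dimensions could still steer the choice by ranking the entries itself.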

Thanks for your feedback on this. I don't use thumbnails, so it probably never would have come to my attention that I was grabbing the wrong size.

@ddmgy
Author

ddmgy commented Sep 15, 2018

As I no longer have a FIRST subscription with roosterteeth.com, I will be unable to make any further changes to this pull request, in the unlikely event that it is reviewed again or considered for merging.

If anyone wants to take over maintenance of this pull request, leave a comment here so we can figure out a way to transfer it over.

@RichardHancock

@ddmgy I would be willing to take it over if needed.

@RichardHancock

The test for the extractor is failing at the moment,
'episode': 'S2:E10 - Million Dollars, But... The Game Announcement'
Needs to change to:
'episode': 'Million Dollars, But... The Game Announcement'
for it to pass the test

@lanodan

lanodan commented Dec 3, 2018

Hi, is there any way this could support bonus episodes? For example https://roosterteeth.com/bonus-feature/rwby-adam-short (volume 6, Adam character)

Fixed test for new rooster teeth site
@AevumDecessus

@dstftw
I've updated the tests for this change, and they're passing now. I can confirm that the module is working with the new Rooster Teeth site, both for public videos, as well as ones that require a FIRST subscription/login in order to download.

@mDuo13

mDuo13 commented Mar 10, 2019

Any updates to this or the other PR for presumably the same problem (#17843)? It's pretty inconvenient using the API workaround described in #16094, so I'd encourage @dstftw or other maintainers to merge this...

Given that the Rooster Teeth extractor doesn't work at all in the current master branch, I would think there's not as much worry about breaking it with changes like these...

@lanodan

lanodan commented Mar 10, 2019 via email

@terraboops

FWIW I used this change to download stuff... so, it does work... It'd be really nice to just merge this..................... .. .. . . . . . . . . . . . . . .

@terraboops

The requested changes seem pretty pedantic too - just merge it 🚀

@Qyriad

Qyriad commented May 9, 2019

Weren't the requested changes made anyway? Or at least, it doesn't seem to me like there's been any review since #16105 (comment).
I don't really understand why neither this nor #17843 has been merged, given that the Roosterteeth extractor is completely non-functional on master.
I've had this merged on my local installation for months.

@au5ton

au5ton commented May 9, 2019

Please merge.

@lanodan

lanodan commented May 9, 2019 via email

@jonnyrobbie

jonnyrobbie commented May 11, 2019

Weird, here it just works:

Yeah, that was my bad. I cloned the repo and switched branches, and then just ran the bin/ executable, which imports the system's vanilla version of youtube-dl. I should have run it with python -m youtube_dl.

After realizing that, it worked fine.

@axipher

axipher commented Jul 18, 2019

Any update on whether this is getting merged into the official build? Or whether it still works?

@jonnyrobbie

I tried that a few days ago and it still worked for me. But I was simply downloading single episodes by the simple episode URL. Maybe some more advanced queries still cause issues? I don't know.

@JordanCarr

The URL scheme for Rooster Teeth appears to have changed from roosterteeth.com/episode/foo to roosterteeth.com/watch/foo. The RoosterTeeth extractor doesn’t recognize the new /watch/ URLs, so youtube-dl falls back to the generic extractor, which then fails. Changing /watch/ to /episode/ in a new URL allows it to keep working.
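One way a URL pattern could accept both path forms is sketched below. The `VALID_URL` regex and the `match_id` helper are hypothetical illustrations, not the extractor's actual `_VALID_URL`. The sketch also assumes, as the comment above suggests, that the slug after /watch/ is the same one that appeared after /episode/.

```python
import re

# Hypothetical pattern accepting both the old /episode/ and new /watch/ paths
VALID_URL = (
    r'https?://(?:www\.)?roosterteeth\.com/'
    r'(?:episode|watch)/(?P<id>[^/?#&]+)'
)


def match_id(url):
    """Return the episode slug for a supported URL, or None if it doesn't match."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None
```

With this pattern, `match_id` returns the same slug for roosterteeth.com/episode/foo and roosterteeth.com/watch/foo, while other paths such as /bonus-feature/ remain unmatched.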

@agent619

agent619 commented Aug 7, 2019

I'm trying to use this version and I'm getting a 403 error:

youtube-dl --verbose https://roosterteeth.com/episode/backwardz-compatible-2019-sadism-or-masochism
[debug] System config: []
[debug] User config: [u'-o', u'~/Movies/%(title)s.%(ext)s', u'-f', u'bestvideo[height<=640]+bestaudio[height<=640]/best[height<=640]']
[debug] Custom config: []
[debug] Command-line args: [u'--verbose', u'https://roosterteeth.com/episode/backwardz-compatible-2019-sadism-or-masochism']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.08.02
[debug] Python version 2.7.11 (CPython) - Darwin-18.6.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 4.1.4, ffprobe 4.1.4, rtmpdump 2.4
[debug] Proxy map: {}
[RoosterTeeth] backwardz-compatible-2019-sadism-or-masochism: Downloading video information (1/2)
[RoosterTeeth] backwardz-compatible-2019-sadism-or-masochism: Downloading video information (2/2)
ERROR: Unable to download video information (2/2): HTTP Error 403: Forbidden (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/youtube_dl-2019.8.2-py2.7.egg/youtube_dl/extractor/common.py", line 627, in _request_webpage
return self._downloader.urlopen(url_or_request)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/youtube_dl-2019.8.2-py2.7.egg/youtube_dl/YoutubeDL.py", line 2227, in urlopen
return self._opener.open(req, timeout=self._socket_timeout)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 437, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 475, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 558, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

@JordanCarr

It looks like that video is behind the FIRST subscription and you’ve not supplied a login to access FIRST content. Try supplying a username with --username and entering your password when prompted. You could also use a cookie file if the previous method doesn’t work.

@agent619

agent619 commented Aug 7, 2019

Ah I see! Thanks for that. I have clearly misunderstood ddmgy's description - I had thought that her extractor meant that I didn't have to log in at all. I'll try this then!

@lanodan

lanodan commented Aug 12, 2019 via email
