
[WIP] [Funimation] Fixes for language matching and playlist handling to pull full shows #13515

Closed
wants to merge 2 commits into from

Conversation

kaithar

@kaithar kaithar commented Jun 29, 2017

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense

What is the purpose of your pull request?

  • Bug fix
  • Improvement

Description of your pull request and other information

First off, I've skimmed the docs, but I've not actually checked if I'm following them. This is a WIP I'm submitting so that others can test it and report errors. I'm also not certain it'll pass tests; if it doesn't, I'll fix it. That said, I've been using and testing this code for a while now, and I think I've ironed out all the bugs and corner cases, so it does work even if it's not the most polished PR. I've also left in some debug and informational print statements. These should probably be converted to something more in keeping with the style guidelines, but I'm not familiar enough with the overall code base to know what.

Second, there is some overlap between this PR and #9515 ... I'm not sure how much but I've definitely produced a competing show playlist solution. I doubt both will merge and it looks long abandoned, but I'd like to look at picking up some of the changes from that PR in this one if that's ok.

So, disclaimers done, on to what this pull is aiming at doing. As mentioned in #13225 I've been fixing some issues with the Funimation extractor.

Firstly, I've implemented a full show playlist extractor. I've made some assumptions in it, but I think they're fair. Specifically:

  1. I've only implemented logic to handle the US and UK versions of Funimation's site. There appear to be others, based on their own code, but I've not used them and can't access them anyway.
  2. I've hard-coded some language targeting logic, but it only targets English and Japanese. There might be others available on the site, but it seems like an acceptable choice.
  3. The code doesn't attempt to validate the requested options, only to replicate them to the correct episode URLs.
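A minimal sketch of point 3, with hypothetical names (episode_urls and its parameters are illustrative, not the PR's actual code): the playlist side just replays the requested options onto each episode URL and leaves validation to the episode extractor.

```python
# Hypothetical sketch: replicate the requested options onto each episode
# URL without validating them; the episode extractor handles fallback.
def episode_urls(episode_paths, version=None, lang=None):
    urls = []
    for path in episode_paths:
        url = path
        if version:  # e.g. 'uncut' or 'simulcast'
            url += version + '/'
        if lang:     # e.g. 'english' or 'japanese'
            url += '?lang=' + lang
        urls.append(url)
    return urls
```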

Secondly, I've given the actual episode extractor an overhaul. It appears Funimation aren't too picky about which language they initially offer you, and they're not entirely consistent about it, so I've had to build code that attempts to work out the best option.

The logic is that it pulls the simulcast or uncut value from the URL, plus the language choice if one is given, then tries to find a source that matches both those values and the episode the URL loads. Failing that, it first tries the desired language with each of 'uncut', 'simulcast' and 'extras', then it repeats the desired alpha and those three fallbacks for English, then Japanese. It accepts the first combination for which it finds a valid source entry, since there are episodes where those combinations exist but list no sources.
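The fallback order described above can be sketched roughly as follows; the (lang, version) keyed dict and the helper name are assumptions for illustration, not the PR's actual data structures.

```python
# Hypothetical sketch of the search order: requested combo first, then
# version fallbacks for the requested language, then the full set again
# for English and Japanese in turn.
LANG_FALLBACK = ['english', 'japanese']
VERSION_FALLBACK = ['uncut', 'simulcast', 'extras']

def pick_source(sources, want_lang, want_version):
    """Return the first (lang, version, entry) with a non-empty source
    list, following the fallback order; None if nothing has sources."""
    candidates = [(want_lang, want_version)]
    candidates += [(want_lang, v) for v in VERSION_FALLBACK]
    for lang in LANG_FALLBACK:
        candidates.append((lang, want_version))
        candidates += [(lang, v) for v in VERSION_FALLBACK]
    for lang, version in candidates:
        entry = sources.get((lang, version))
        if entry:  # skip combinations that exist but list no sources
            return lang, version, entry
    return None
```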

Points 1 and 2 for the playlist extractor are also valid for the episode extractor.

There are some oddities in the episode numbering, but unfortunately those seem to be related more to the numbering on the site than the extractor code.

There are some features I still want to add (putting the language in the info dict is top of my list), but it seems ready for review and someone asked me to post it.

@josegonzalez

Forgive my stupidity, but how do you specify the language here? None of the parameters on youtube-dl seem to show a way to set the audio language, just the subtitles.

@kaithar
Author

kaithar commented Jun 29, 2017

@josegonzalez Funimation episodes, when accessed in the browser, use the lang querystring to select the language you want, so I'm using the same method. The regex looks for an optional pattern along the lines of \?lang=(english|japanese) at the end of the url. Funimation itself doesn't support a url selection for entire shows but I've applied the same querystring method to them for allowing input.
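For illustration, the querystring match might look something like this; the exact regex and helper name in the PR may differ.

```python
import re

# Optional ?lang=english / ?lang=japanese suffix at the end of the URL,
# mirroring the lang querystring Funimation's own site uses.
_LANG_RE = re.compile(r'\?lang=(english|japanese)$')

def parse_lang(url):
    """Return 'english' or 'japanese' if the URL ends with a lang
    querystring, else None (meaning: use the fallback order)."""
    m = _LANG_RE.search(url)
    return m.group(1) if m else None
```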

It would be nice if there were an option, maybe an extension of -f, that allowed language selection to be passed to the extractor. There's a format_id add on for -f that I saw mentioned in #8891 but I've no idea how or if that actually works with the extractors. To my mind, the format supported by the upstream site should probably win out over options though, if following the principle of least surprise.

@dstftw
Collaborator

dstftw commented Jun 30, 2017

Read

youtube-dl coding conventions

and fix all issues.

@programmeroftheeve

I was messing around with this fork and found something interesting when using my account. Downloading the webpage with youtube-dl shows KANE_customdimensions with customerType 'New':

var KANE_customdimensions = {
        'event':'set-customdimensions',
        'requesttime': Date(),
        'territory': 'US',
        'sessionCookie': getCustomDimensionCookie('PIsession'),
        'userCookie': getCustomDimensionCookie('PIuser'),
        
            'customerType': 'New',
        
    }

However when using cookies gathered from chrome I am getting:

var KANE_customdimensions = {
        'event':'set-customdimensions',
        'requesttime': Date(),
        'territory': 'US',
        'sessionCookie': getCustomDimensionCookie('PIsession'),
        'userCookie': getCustomDimensionCookie('PIuser'),
        
            'userCountry': 'US',
            'planName':'Premium - Yearly',
            'userAccessLevel':'Premium',
            'autoPlay':'false',
            'customerType': 'Repeat',
            'audioPreference':'english_dubbed',
            'restrictmatureContent':'false',
            'qualityPreference':'hd_720',
            'userID':'256670',
            
            'customerGender':'None',
            'customerAge':'23',
        
    }

This helps with getting the correct formats.

@kaithar
Author

kaithar commented Jul 12, 2017

@btaidm are you using netrc to login from youtube-dl?

@programmeroftheeve

@kaithar I am passing login info from the command line. I can try using netrc to see if that would work without passing the cookie file.

@jackalblood

Hi guys, sorry to post here but I really didn't know where else to go.

I'm a self-taught Linux user and I've finally managed to get funimation working with youtube-dl. As you may have guessed, I have no idea how to pass an argument to youtube-dl to grab the English dub; by default it seems to only grab Japanese.

I see the links are appended with /a=1 for English, so I assumed that would grab the English dub, but it doesn't.

I've read around and, to be totally honest, I'm lost on what to do, or whether it's even been implemented yet, since I see this is a work in progress.

I've tried adding lang=english too, but to no avail.

Any help for a novice user would be greatly appreciated, thank you.

@mariuszskon

@jackalblood This is probably not the best place to discuss this, but from browsing the code, it appears you need to add ?lang=mylang to the very end of the URL, where mylang is either english or japanese. However, it should default to english, so if you are still getting Japanese, that means one of three things:

  1. The extractor is not working as intended.
  2. Funimation is not providing you with an English dub anyway (not available in your region, etc.). You should check if you can get the dub through your browser first.
  3. You are using the official youtube-dl and not this fork! Either wait for it to be (maybe) merged, or make sure you get the fork itself.

@mariuszskon

Funimation itself doesn't support a url selection for entire shows but I've applied the same querystring method to them for allowing input.

I think adding something to the URL is a really poor way of selecting formats based on language (especially if the site itself does not do this). youtube-dl should either add language as a format selector to prevent confusion, or extractors like this one should make a format_id which includes the language (as per @kaithar's mention of #8891; and yes, that does work).

@jackalblood

I'm guessing option 3 is my issue, since I can verify 1 and 2. I'll look into how to switch to this fork, since I'm using a pip install; just one of those things I didn't think of. Thanks very much for the information, I won't take up any more of your time. Thank you again!

@programmeroftheeve

@jackalblood
The easiest thing to do is to clone @kaithar's fork and run youtube-dl following the development instructions.

cd /path/to/cloned/repo
python -m youtube_dl args...

or

PYTHONPATH=/path/to/cloned/repo python -m youtube_dl args...

@programmeroftheeve

@kaithar
Switching to the netrc file didn't change anything.

@jackalblood

@btaidm That definitely got me working with the correct fork, but the problem persists. I've uninstalled all other versions just to be sure, and with or without the ?lang=english I'm getting Japanese, so either I'm missing something or it's just not working out for me.

@programmeroftheeve

@jackalblood
The easiest way to fix that would be to log in to your funimation account through your web browser, and then export the cookies for funimation.

For chrome, I use the cookies.txt extension.

Then pass the cookie as an argument to youtube-dl

python -m youtube_dl --cookie /path/to/cookie.txt args...

@jackalblood

@btaidm I didn't even realise I could do that; I'll give it a go as soon as I'm home, thank you. Will the cookies allow me to pull a certain language then?

@jackalblood

@btaidm That was the solution! Thank you very much for taking the time to walk me through it all; it was really very simple once you know how.

So I can absolutely confirm this fork will get the English audio if you pass your cookies.

@programmeroftheeve

@jackalblood
It should, assuming the show you're downloading has an English dub you can access, as some of the shows require a premium account to view the dubs.

Here is an example command that I have used:

python -m youtube_dl -n --cookie ~/Downloads/cookies.txt https://www.funimation.com/shows/one-piece/the-beginning-of-the-new-chapter-the-straw-hats-reunited/uncut/\?lang\=english 

argument explanation:

  • -n: use netrc for login credentials.
  • --cookie: cookie to use
  • URL: self-explanatory

With this fork, it will first try to use the uncut dub, then fall back to the simulcast, then Japanese, I believe.

@programmeroftheeve

@jackalblood
No problem, took me a little bit to understand as well.

@kaithar
Author

kaithar commented Jul 19, 2017

I think adding something to the URL is a really poor way of selecting formats based on language (especially if the site itself does not do this).

@mariuszskon Honestly I agree, the way the language selection is done on funimation is a headache. The website is horribly unreliable in this regard. The android app is actually even worse, I've had issues where I've been casting it to my TV and getting Japanese audio even though the app has English selected.
There does seem to be some content on funimation that my code still isn't handling well... Fairy Tail is my major failing case right now, I just haven't had time to go and check if there is legitimately no dub for the affected episodes.

I've been toying with the idea of either adding support for a ?hardlang= param that errors instead of falling back to alternative languages, or making ?lang= enforced even when it could fall back. I'd prefer to enforce language in a better way, but I can't see how that works with the format_id bit; I just don't know enough about the code base, and I've yet to get much feedback from the core team. I suspect the answer might be that I need to return multiple candidates from the extractor, with appropriate language tags... feedback would be really helpful.

@kaithar
Author

kaithar commented Jul 29, 2017

Ok, hopefully that will solve the language issues in a more consistent way. I've made it so that if the first experience it requests has a suitable alpha with empty sources, it will request that language's experience and redo the search before falling back further. This fixes this flow:

  1. Request page with ?lang=english
  2. Parse out the experience id, we don't know it yet but we've been given the Japanese audio id
  3. Fetch the experience file and find the episode
  4. Loop the alphas using the fallback order, find that the first match is one without sources
  5. Using the id associated with that language version, fetch the experience file and find the episode
  6. Loop the alphas using the fallback order, ignore alphas without sources
  7. If no sources are found raise an exception

Steps 4 and 5 are new... this might mean requesting the experience file twice in some situations, but it should succeed more consistently at finding the right language.
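The seven-step flow might be sketched like this; fetch_experience and the episode/alpha dict shapes here are placeholder assumptions, not the PR's actual code.

```python
# Hypothetical sketch of the two-pass search: if the best-matching alpha
# has empty sources, re-fetch the experience file for that language's id
# and redo the search before giving up (steps 4-5 above).
def resolve_sources(experience_id, fallback_order, fetch_experience):
    episode = fetch_experience(experience_id)          # step 3
    for alpha in fallback_order:                       # step 4
        match = episode.get(alpha)
        if not match:
            continue
        if match['sources']:
            return match['sources']
        # First hit has no sources: follow its language's id (step 5)
        episode = fetch_experience(match['experience_id'])
        for alpha2 in fallback_order:                  # step 6
            m2 = episode.get(alpha2)
            if m2 and m2['sources']:
                return m2['sources']
        break
    raise RuntimeError('no sources found for any language/version')  # step 7
```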

I've also added the language and language_preference keys to the format return. I've not played with the format selection to see if that's useful, but the information is there at least.

I could make the extractor return all the alphas rather than selecting what it thinks is the best match, but I have a feeling that would actually be counterproductive, and it would also mean making even more requests for experience files.

@kaithar
Author

kaithar commented Sep 3, 2017

As noted on #14089 it seems that they made some changes to the site.
I've had a look and, well, this is quite the mess they made... seems like they added cloudfront (or, more accurately, added more use of it), moved the api path, split up the experience payload and added a javascript set auth cookie.

I think I've got a working fix figured out, but I still need to give it a little more testing and remove some localisations.

@kaithar
Author

kaithar commented Oct 2, 2017

I think their site changes have settled down again now. I'll see about cleaning up code and pushing it to this PR branch.

@kaithar kaithar mentioned this pull request Oct 2, 2017
@kaithar
Author

kaithar commented Oct 9, 2017

Well, I spoke too soon... they added/turned on incapsula anti-bot nonsense ¬_¬
Not sure what I can actually do about that though.

@compguy284

@kaithar
I was able to use incapsula-cracker-py3 to get around incapsula. After that I have no idea what to do.

@kaithar
Author

kaithar commented Nov 13, 2017

@compguy284 you did? I wasn't having much luck with that, what user agent did you use?

@compguy284

@kaithar
I've been using the user-agent of the chrome version I have installed.
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36"

If you don't specify one for incapsula-cracker-py3 then it uses something else by default.
"IncapUnblockSession (https://github.com/ziplokk1/incapsula-cracker-py3)"

@kaithar
Author

kaithar commented Nov 13, 2017

@compguy284 hmmm, that's odd, I was using Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36 which should be reasonably equivalent, but it was just looping.

The bypass I've finally managed to get working involved writing a reverse proxy that sanitises the headers for every request, then using a hosts file override to send the requests through it.
So far as I can tell, Incapsula has been set up to fingerprint the order your browser sends headers in; if that order doesn't match what's expected for your browser, it won't let you through even if you pass the js tests or have a valid session cookie. I arrived at this conclusion after experimenting and finding that sending the same headers in a slightly different order would make incapsula fail even when a legit browser was making the requests.
The thing I've been puzzling over for a few days is how to make this bypass more general, since it's currently a separate process configured to mimic the cookies and characteristics of the version of chrome I'm running. There's also the issue that youtube-dl uses urllib... I used requests specifically because it documents that an OrderedDict of headers will be honoured in the generated request. urllib uses a plain dict, and I'm not sure how well things would react to being told to use an OrderedDict instead... even if that works, it doesn't help if I end up having to hard code some headers that don't match the user-chosen user agent... and if I hard code the user agent I'd probably be frowned at. So frustrating.
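To illustrate why header order matters to this kind of fingerprinting, here is a stdlib-only sketch that serialises a request with headers in an exact, caller-controlled order; the header names used are illustrative, not what Incapsula actually checks.

```python
def build_request(method, path, host, ordered_headers):
    """Serialise an HTTP/1.1 request, emitting headers in exactly the
    order given as a list of (name, value) pairs. A plain dict gave no
    such ordering guarantee before Python 3.7, which is the
    OrderedDict-vs-urllib concern described above."""
    lines = ['%s %s HTTP/1.1' % (method, path), 'Host: %s' % host]
    lines += ['%s: %s' % (name, value) for name, value in ordered_headers]
    return '\r\n'.join(lines) + '\r\n\r\n'
```

Two requests carrying identical headers in different orders produce different byte streams on the wire, which is exactly what a header-order fingerprint can distinguish.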

@GinoMan

GinoMan commented Dec 6, 2017

Wow... that is frustrating. Any luck in figuring out a workaround?

@kaithar
Author

kaithar commented Dec 8, 2017

It looks like they've stopped making major changes to the site, and the code I have has been working reliably. I'll need to try disabling my proxy solution... it's worked for a suspiciously long time for a bot detection method. I also need to clean up the code and rebase it against master.
I've been pretty busy the past couple of weeks, but I'll try to find some time this weekend to make some progress. I should probably also find time to separate out a hook patch I've made for capturing the progress output.

@wingback18

Hi, I'm getting the same error too:

Unable to extract al:web:url

@iczero

iczero commented May 7, 2019

Super simple fix for incapsula if you're not actually running a botnet or something idk

  1. Install an extension that lets you export the current domain's cookies in the Netscape cookies.txt format used by youtube-dl (for example https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg)
  2. Go to funimation.com in your browser to pass incapsula's js tests
  3. Copy out all the cookies for the domain from the extension and put them into a new file (ex. funimation-cookies.txt)
  4. Pass that cookies file to youtube-dl, along with your original arguments and a matching user-agent:
     youtube-dl --user-agent "(search google for what is my user agent)" --cookies funimation-cookies.txt <url>
  5. Proceed as normal

@DarkHorse-APP2

--user-agent

It doesn't work for me.

@dirkf dirkf closed this Aug 1, 2023
@dirkf dirkf added the defunct PR source branch is not accessible label Oct 2, 2023
Labels: defunct (PR source branch is not accessible), pending-fixes