
[WIP] [Funimation] Fixes for language matching and playlist handling to pull full shows #13515

Closed
wants to merge 2 commits into from

Conversation

kaithar

@kaithar kaithar commented Jun 29, 2017

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense

What is the purpose of your pull request?

  • Bug fix
  • Improvement

Description of your pull request and other information

First off, I've skimmed the docs, but I've not actually checked if I'm following them. This is a WIP I'm submitting so that others can test it and report errors. I'm also not certain it'll pass tests; if it doesn't, I'll fix it. That said, I've been using and testing this code for a while now, and I think I've ironed out all the bugs and corner cases, so it does work even if it's not the most polished PR. I've also left in some debug and informational print statements. These should probably be converted to something more in keeping with the style guidelines, but I'm not familiar enough with the overall code base to know what.

Second, there is some overlap between this PR and #9515 ... I'm not sure how much but I've definitely produced a competing show playlist solution. I doubt both will merge and it looks long abandoned, but I'd like to look at picking up some of the changes from that PR in this one if that's ok.

So, disclaimers done, on to what this pull is aiming at doing. As mentioned in #13225 I've been fixing some issues with the Funimation extractor.

Firstly, I've implemented a full show playlist extractor. I've made some assumptions in it, but I think they're fair. Specifically:

  1. I've only implemented logic to handle the US and UK versions of Funimation's site. There appear to be others, based on their own code, but I've not used them and can't access them anyway.
  2. I've hard-coded some language targeting logic, but it only targets English and Japanese. There might be others available on the site, but it seems like an acceptable choice.
  3. The code doesn't attempt to validate the requested options, only to replicate them to the correct episode URLs.
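A minimal sketch of point 3, with hypothetical names (episode_urls and its parameters are illustrative, not the PR's actual code): the playlist side just replays the requested options onto each episode URL and leaves validation to the episode extractor.

```python
# Hypothetical sketch: replicate the requested options onto each episode
# URL without validating them; the episode extractor handles fallback.
def episode_urls(episode_paths, version=None, lang=None):
    urls = []
    for path in episode_paths:
        url = path
        if version:  # e.g. 'uncut' or 'simulcast'
            url += version + '/'
        if lang:     # e.g. 'english' or 'japanese'
            url += '?lang=' + lang
        urls.append(url)
    return urls
```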

Secondly, I've given the actual episode extractor an overhaul. It appears Funimation aren't too picky about which language they initially offer you, and they're not entirely consistent about it, so I've had to build code that attempts to work out the best option.

The logic is that it pulls the simulcast or uncut value from the URL, plus the language choice if one is given, then tries to find a source that matches both those values and the episode the URL loads. Failing that, it first tries the desired language with each of 'uncut', 'simulcast' and 'extras', then it repeats the desired alpha and those three fallbacks for English, then Japanese. It accepts the first combination for which it finds a valid source entry, since there are episodes where those combinations exist but list no sources.
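The fallback order described above can be sketched roughly as follows; the (lang, version) keyed dict and the helper name are assumptions for illustration, not the PR's actual data structures.

```python
# Hypothetical sketch of the search order: requested combo first, then
# version fallbacks for the requested language, then the full set again
# for English and Japanese in turn.
LANG_FALLBACK = ['english', 'japanese']
VERSION_FALLBACK = ['uncut', 'simulcast', 'extras']

def pick_source(sources, want_lang, want_version):
    """Return the first (lang, version, entry) with a non-empty source
    list, following the fallback order; None if nothing has sources."""
    candidates = [(want_lang, want_version)]
    candidates += [(want_lang, v) for v in VERSION_FALLBACK]
    for lang in LANG_FALLBACK:
        candidates.append((lang, want_version))
        candidates += [(lang, v) for v in VERSION_FALLBACK]
    for lang, version in candidates:
        entry = sources.get((lang, version))
        if entry:  # skip combinations that exist but list no sources
            return lang, version, entry
    return None
```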

Points 1 and 2 for the playlist extractor are also valid for the episode extractor.

There are some oddities in the episode numbering, but unfortunately those seem to be related more to the numbering on the site than the extractor code.

There are some features I still want to add (putting the language in the info dict is top of my list), but it seems ready for review and someone asked me to post it.

@josegonzalez

Forgive my stupidity, but how do you specify the language here? None of the parameters on youtube-dl seem to show a way to set the audio language, just the subtitles.

@kaithar
Author

kaithar commented Jun 29, 2017

@josegonzalez Funimation episodes, when accessed in the browser, use the lang querystring to select the language you want, so I'm using the same method. The regex looks for an optional pattern along the lines of \?lang=(english|japanese) at the end of the url. Funimation itself doesn't support a url selection for entire shows but I've applied the same querystring method to them for allowing input.
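For illustration, the querystring match might look something like this; the exact regex and helper name in the PR may differ.

```python
import re

# Optional ?lang=english / ?lang=japanese suffix at the end of the URL,
# mirroring the lang querystring Funimation's own site uses.
_LANG_RE = re.compile(r'\?lang=(english|japanese)$')

def parse_lang(url):
    """Return 'english' or 'japanese' if the URL ends with a lang
    querystring, else None (meaning: use the fallback order)."""
    m = _LANG_RE.search(url)
    return m.group(1) if m else None
```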

It would be nice if there were an option, maybe an extension of -f, that allowed language selection to be passed to the extractor. There's a format_id add on for -f that I saw mentioned in #8891 but I've no idea how or if that actually works with the extractors. To my mind, the format supported by the upstream site should probably win out over options though, if following the principle of least surprise.

@dstftw
Collaborator

dstftw commented Jun 30, 2017

Read

youtube-dl coding conventions

and fix all issues.

@programmeroftheeve

I was messing around with this fork and found something interesting when using my account. Downloading the webpage with youtube-dl shows KANE_customdimensions with customerType 'New':

var KANE_customdimensions = {
        'event':'set-customdimensions',
        'requesttime': Date(),
        'territory': 'US',
        'sessionCookie': getCustomDimensionCookie('PIsession'),
        'userCookie': getCustomDimensionCookie('PIuser'),
        
            'customerType': 'New',
        
    }

However when using cookies gathered from chrome I am getting:

var KANE_customdimensions = {
        'event':'set-customdimensions',
        'requesttime': Date(),
        'territory': 'US',
        'sessionCookie': getCustomDimensionCookie('PIsession'),
        'userCookie': getCustomDimensionCookie('PIuser'),
        
            'userCountry': 'US',
            'planName':'Premium - Yearly',
            'userAccessLevel':'Premium',
            'autoPlay':'false',
            'customerType': 'Repeat',
            'audioPreference':'english_dubbed',
            'restrictmatureContent':'false',
            'qualityPreference':'hd_720',
            'userID':'256670',
            
            'customerGender':'None',
            'customerAge':'23',
        
    }

This helps with getting the correct formats.

@kaithar
Author

kaithar commented Jul 12, 2017

@btaidm are you using netrc to login from youtube-dl?

@programmeroftheeve

@kaithar I am passing login info from the command line. I can try using netrc to see if that would work without passing the cookie file.

@jackalblood

Hi guys, sorry to post here but I really didn't know where else to go.

I'm a self-taught Linux user and I've finally managed to get funimation working with youtube-dl. As you may have guessed, I have no idea how to pass an argument to youtube-dl to grab the English dub; by default it seems to only grab Japanese.

I see the links are appended with /a=1 for English, so I assumed that would grab the English dub, but it doesn't.

I've read around and, to be totally honest, I'm lost on what to do, or whether it's even been implemented yet, since I see this is a work in progress.

I've tried adding lang=english too, but to no avail.

Any help for a novice user would be greatly appreciated, thank you.

@mariuszskon

@jackalblood This is probably not the best place to discuss this, but from browsing the code, it appears you need to add ?lang=mylang to the very end of the URL, where mylang is either english or japanese. However, it should default to english, so if you are still getting Japanese, that means one of three things:

  1. The extractor is not working as intended.
  2. Funimation is not providing you with an English dub anyway (not available in your region, etc.). You should check if you can get the dub through your browser first.
  3. You are using the official youtube-dl and not this fork! Either wait for it to be (maybe) merged, or make sure you get the fork itself.

@mariuszskon

Funimation itself doesn't support a url selection for entire shows but I've applied the same querystring method to them for allowing input.

I think adding something to the URL is a really poor way of selecting formats based on language (especially if the site itself does not do this). youtube-dl should either add language as a format selector to prevent confusion, or extractors like this one should make a format_id which includes the language (as per @kaithar's mention of #8891; and yes, that does work).

@jackalblood

I'm guessing option 3 is my issue, since I can verify 1 and 2. I'll look into how to switch to this fork, since I'm using a pip install; just one of those things I didn't think of. Thanks very much for the information, I won't take up any more of your time. Thank you again!

@programmeroftheeve

@jackalblood
The easiest thing to do is to clone @kaithar's fork and run youtube-dl following the development instructions.

cd /path/to/cloned/repo
python -m youtube_dl args...

or

PYTHONPATH=/path/to/cloned/repo python -m youtube_dl args...

@programmeroftheeve

@kaithar
Switching to the netrc file didn't change anything.

@jackalblood

@btaidm That definitely got me working with the correct fork, but the problem persists. I've uninstalled all other versions just to be sure, and with or without the ?lang=english I'm getting Japanese, so either I'm missing something or it's just not working out for me.

@programmeroftheeve

@jackalblood
The easiest way to fix that would be to log in to your funimation account through your web browser, and then export the cookies for funimation.

For chrome, I use the cookies.txt extension.

Then pass the cookie as an argument to youtube-dl

python -m youtube_dl --cookie /path/to/cookie.txt args...

@jackalblood

@btaidm I didn't even realise I could do that; I'll give it a go as soon as I'm home, thank you. Will the cookies allow me to pull a certain language then?

@jackalblood

@btaidm That was the solution! Thank you very much for taking the time to walk me through it all; it was really very simple once you know how.

So I can absolutely confirm this fork will get the English audio if you pass your cookies.

@programmeroftheeve

@jackalblood
It should, assuming the show you're downloading has an English dub you can access, as some of the shows require a premium account to view the dubs.

Here is an example command that I have used:

python -m youtube_dl -n --cookie ~/Downloads/cookies.txt https://www.funimation.com/shows/one-piece/the-beginning-of-the-new-chapter-the-straw-hats-reunited/uncut/\?lang\=english 

argument explanation:

  • -n: use netrc for login credentials.
  • --cookie: cookie to use
  • URL: self-explanatory

With this fork, it will first try to use the uncut dub, then fall back to the simulcast, then Japanese, I believe.

@programmeroftheeve

@jackalblood
No problem, took me a little bit to understand as well.

@kaithar
Author

kaithar commented Jul 19, 2017

I think adding something to the URL is a really poor way of selecting formats based on language (especially if the site itself does not do this).

@mariuszskon Honestly I agree, the way the language selection is done on funimation is a headache. The website is horribly unreliable in this regard. The android app is actually even worse, I've had issues where I've been casting it to my TV and getting Japanese audio even though the app has English selected.
There does seem to be some content on funimation that my code still isn't handling well... Fairy Tail is my major failing case right now, I just haven't had time to go and check if there is legitimately no dub for the affected episodes.

I've been toying with the idea of either adding support for a ?hardlang= param that errors instead of falling back to alternative languages, or making ?lang= enforced even when it could fall back. I'd prefer to enforce language in a better way, but I can't see how that works with the format_id bit; I just don't know enough about the code base, and I've yet to get much feedback from the core team. I suspect the answer might be that I need to return multiple candidates from the extractor, with appropriate language tags... feedback would be really helpful.

@kaithar
Author

kaithar commented Jul 29, 2017

Ok, hopefully that will solve the language issues in a more consistent way. I've made it so that if the first experience it requests has a suitable alpha with empty sources, it will request that language's experience and redo the search before falling back further. This fixes this flow:

  1. Request page with ?lang=english
  2. Parse out the experience id, we don't know it yet but we've been given the Japanese audio id
  3. Fetch the experience file and find the episode
  4. Loop the alphas using the fallback order, find that the first match is one without sources
  5. Using the id associated with that language version, fetch the experience file and find the episode
  6. Loop the alphas using the fallback order, ignore alphas without sources
  7. If no sources are found raise an exception

Steps 4 and 5 are new... this might mean requesting the experience file twice in some situations, but it should succeed more consistently at finding the right language.
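The seven-step flow might be sketched like this; fetch_experience and the episode/alpha dict shapes here are placeholder assumptions, not the PR's actual code.

```python
# Hypothetical sketch of the two-pass search: if the best-matching alpha
# has empty sources, re-fetch the experience file for that language's id
# and redo the search before giving up (steps 4-5 above).
def resolve_sources(experience_id, fallback_order, fetch_experience):
    episode = fetch_experience(experience_id)          # step 3
    for alpha in fallback_order:                       # step 4
        match = episode.get(alpha)
        if not match:
            continue
        if match['sources']:
            return match['sources']
        # First hit has no sources: follow its language's id (step 5)
        episode = fetch_experience(match['experience_id'])
        for alpha2 in fallback_order:                  # step 6
            m2 = episode.get(alpha2)
            if m2 and m2['sources']:
                return m2['sources']
        break
    raise RuntimeError('no sources found for any language/version')  # step 7
```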

I've also added the language and language_preference keys to the format return. I've not played with the format selection to see if that's useful, but the information is there at least.

I could make the extractor return all the alphas rather than selecting what it thinks is the best match, but I have a feeling that would actually be counterproductive, and it would also mean making even more requests for experience files.

@kaithar
Author

kaithar commented Sep 3, 2017

As noted on #14089 it seems that they made some changes to the site.
I've had a look and, well, this is quite the mess they made... seems like they added cloudfront (or, more accurately, added more use of it), moved the api path, split up the experience payload and added a javascript set auth cookie.

I think I've got a working fix figured out, but I still need to give it a little more testing and remove some localisations.

@kaithar
Author

kaithar commented Oct 2, 2017

I think their site changes have settled down again now. I'll see about cleaning up code and pushing it to this PR branch.

@kaithar kaithar mentioned this pull request Oct 2, 2017
@kaithar
Author

kaithar commented Oct 9, 2017

Well, I spoke too soon... they added/turned on incapsula anti-bot nonsense ¬_¬
Not sure what I can actually do about that though.

@compguy284

@kaithar
I was able to use incapsula-cracker-py3 to get around incapsula. After that I have no idea what to do.

@kaithar
Author

kaithar commented Nov 13, 2017

@compguy284 you did? I wasn't having much luck with that, what user agent did you use?

@compguy284

@kaithar
I've been using the user-agent of the chrome version I have installed.
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36"

If you don't specify one for incapsula-cracker-py3 then it uses something else by default.
"IncapUnblockSession (https://github.com/ziplokk1/incapsula-cracker-py3)"

@kaithar
Author

kaithar commented Nov 13, 2017

@compguy284 hmmm, that's odd, I was using Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36 which should be reasonably equivalent, but it was just looping.

The bypass I've finally managed to get working involved writing a reverse proxy that sanitises the headers for every request, then using a hosts file override to send the requests through it.
So far as I can tell, Incapsula has been set up to fingerprint the order your browser sends headers in; if that order doesn't match what's expected for your browser, it won't let you through even if you pass the js tests or have a valid session cookie. I arrived at this conclusion after experimenting and finding that sending the same headers in a slightly different order would make incapsula fail even when a legit browser was making the requests.
The thing I've been puzzling over for a few days is how to make this bypass more general, since it's currently a separate process configured to mimic the cookies and characteristics of the version of chrome I'm running. There's also the issue that youtube-dl uses urllib... I used requests specifically because it documents that an OrderedDict of headers will be honoured in the generated request. urllib uses a plain dict, and I'm not sure how well things would react to being told to use an OrderedDict instead... even if that works, it doesn't help if I end up having to hard code some headers that don't match the user-chosen user agent... and if I hard code the user agent I'd probably be frowned at. So frustrating.
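To illustrate why header order matters to this kind of fingerprinting, here is a stdlib-only sketch that serialises a request with headers in an exact, caller-controlled order; the header names used are illustrative, not what Incapsula actually checks.

```python
def build_request(method, path, host, ordered_headers):
    """Serialise an HTTP/1.1 request, emitting headers in exactly the
    order given as a list of (name, value) pairs. A plain dict gave no
    such ordering guarantee before Python 3.7, which is the
    OrderedDict-vs-urllib concern described above."""
    lines = ['%s %s HTTP/1.1' % (method, path), 'Host: %s' % host]
    lines += ['%s: %s' % (name, value) for name, value in ordered_headers]
    return '\r\n'.join(lines) + '\r\n\r\n'
```

Two requests carrying identical headers in different orders produce different byte streams on the wire, which is exactly what a header-order fingerprint can distinguish.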

@GinoMan

GinoMan commented Dec 6, 2017

Wow... that is frustrating. Any luck in figuring out a workaround?

@kaithar
Author

kaithar commented Dec 8, 2017

It looks like they've stopped making major changes to the site, and the code I have has been working reliably. I'll need to try disabling my proxy solution... it's worked for a suspiciously long time for a bot detection method. I also need to clean up the code and rebase it against master.
I've been pretty busy the past couple of weeks, but I'll try to find some time this weekend to make some progress. I should probably also find time to separate out a hook patch I've made for capturing the progress output.

@wingback18

Hi, I'm getting the same error too:

Unable to extract al:web:url

@iczero

iczero commented May 7, 2019

Super simple fix for incapsula if you're not actually running a botnet or something idk

  1. Install an extension that lets you export the current domain's cookies in the Netscape cookies.txt format used by youtube-dl (for example https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg)
  2. Go to funimation.com in your browser to pass incapsula's js tests
  3. Copy out all the cookies for the domain from the extension and put them into a new file (ex. funimation-cookies.txt)
  4. Pass that cookies file to youtube-dl, along with your original arguments and a matching user-agent:
     youtube-dl --user-agent "(search google for what is my user agent)" --cookies funimation-cookies.txt <url>
  5. Proceed as normal

@DarkHorse-APP2

--user-agent

It doesn't work for me.

@dirkf dirkf closed this Aug 1, 2023
@dirkf dirkf added the defunct PR source branch is not accessible label Oct 2, 2023
Labels: defunct (PR source branch is not accessible), pending-fixes