Feature innertube client fix #219

Open
wants to merge 20 commits into
base: master

Conversation

alive4ever

The major change is the addition of the mweb innertube client. This includes some refactoring of how base_js is handled and a new way to decrypt the n signature: the relevant decryption code is extracted from the base_js file, using a technique similar to iv-org/inv_sig_helper.

This pull request also introduces three dependencies: fake-useragent, to simplify creating user agent headers for mobile and desktop browsers; flpc, to parse the nsig decryption regex; and dukpy, to execute the extracted nsig decryption code.

It also adds the ability to parse visitorData from the YT API response, and to specify your own visitorData and poToken pair as properly formatted JSON in the data/po_token_cache.txt file.

There are also several more fixes for the android and ios clients, and the innertube client is now selectable.

Finally, there are several changes in settings.py, notably to allow reloading of the tv_embedded client in case of missing player URLs, and to show a Download placeholder via the use_video_download option (credit to ~heckyel/yt-local).

This will hopefully fix #218

@alive4ever alive4ever force-pushed the feature-innertube-client-fix branch 2 times, most recently from 1da876d to 5276825 Compare November 5, 2024 15:12
@alive4ever
Author

Hi, it's been a while without any feedback.

dukpy is used here for n signature solving. I tried several Python JS bindings before settling on this setup.

  • First, I tried importing JSInterpreter from the yt-dlp project. This adds yt-dlp as a dependency, which is huge (more than 20 MB), and JS execution is slow. The nice thing is that JSInterpreter can execute JS functions from base.js without extracting the specific function first.
  • The second is py-mini-racer. It is slightly faster than JSInterpreter from yt-dlp, but the package is rather large (18 MB of site-packages files). It requires the JS function to be extracted.
  • The third is dukpy, which I consider very good, with a smaller site-packages footprint than py-mini-racer (9 MB). It also needs the JS function to be extracted.
  • The fourth is combining quickjs with jsengine, which is the fastest and smallest option (2 MB). It can call JS functions directly from base.js. The downside is that there is no pre-built wheel for the quickjs module on aarch64, which means a source install there, so I am sticking with dukpy for this reason (pre-built wheels and a faster aarch64 install).

@user234683
Owner

user234683 commented Nov 8, 2024 via email

@alive4ever
Author

Thanks for the feedback. It gives me peace of mind.

No need to rush. I am open to any suggestions for improving this pull request.

youtube/util.py Outdated
print('Unable to access ' + player_file)

signature_timestamp = None
signature_timestamp_cache = settings.data_dir + '/sts_' + player_version + 'txt'
Owner

Recommend using os.path.join here. Also, .txt, not txt

Author

Sorry for the late reply.

I've used os.path.join in place of string concatenation in this file and similar places.
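
The path fix discussed in this thread can be sketched as below. This is only an illustration: `sts_cache_path` is a hypothetical helper name, and `data_dir` stands in for `settings.data_dir`.

```python
import os


def sts_cache_path(data_dir, player_version):
    """Build the signature timestamp cache path for a player version."""
    # os.path.join picks the right separator for the platform, and the
    # extension is written with its leading dot ('.txt', not 'txt').
    return os.path.join(data_dir, 'sts_' + player_version + '.txt')
```

For example, `sts_cache_path('data', '3bb1f723')` yields a file named `sts_3bb1f723.txt` inside `data`.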

youtube/util.py Outdated
response_dict = json.loads(response)
if settings.use_visitor_data:
if not settings.use_po_token:
if response_dict['responseContext'].get('visitorData'):
Owner

We shouldn't assume 'responseContext' will be present - otherwise it will raise an exception when youtube changes something.

Author

I've put this inside a try ... except block, with a specific message for the KeyError exception.
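
A minimal sketch of that kind of defensive lookup, assuming the response has already been parsed into a dict. The function name `get_visitor_data` is hypothetical, and catching `TypeError` alongside `KeyError` is an addition of this sketch rather than something described in the thread.

```python
def get_visitor_data(response_dict):
    """Return visitorData from an API response dict, or None if absent."""
    try:
        return response_dict['responseContext']['visitorData']
    except (KeyError, TypeError):
        # KeyError: one of the keys is missing.
        # TypeError: a level is not a dict (for example None).
        # Either way, report "no visitor data" instead of crashing.
        return None
```

With this shape, a response that lacks `responseContext` simply produces `None`, so a change on YouTube's side would not raise an unhandled exception.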

youtube/util.py Outdated
if settings.use_visitor_data:
if not settings.use_po_token:
if response_dict['responseContext'].get('visitorData'):
if not os.path.exists(visitor_data_file):
Owner

Not sure how the visitor system works - but do we want to refresh this file ever? Maybe YouTube issues an updated token for example and marks the old one as invalid?

Author

I added an os.path.getmtime check to make sure that the visitorData.txt file is less than 86400 seconds old before its content is used. Otherwise, the visitor data file is deleted and replaced with a new one.
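
The age check described here could look roughly like the following. It is a sketch under assumptions, not the code from the pull request: `read_fresh_visitor_data` and `MAX_AGE_SECONDS` are hypothetical names.

```python
import os
import time

MAX_AGE_SECONDS = 86400  # 24 hours, the limit mentioned above


def read_fresh_visitor_data(path, max_age=MAX_AGE_SECONDS):
    """Return the cached visitor data, or None if it is missing or stale.

    A stale file is deleted so that a later API response can replace it.
    """
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:
        return None  # the file does not exist or cannot be inspected
    if age >= max_age:
        os.remove(path)
        return None
    with open(path, 'r') as f:
        return f.read()
```

Returning `None` in both the missing and the stale case lets the caller treat them the same way: request new visitor data and write a new file.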

youtube/util.py Outdated
else:
if os.path.exists(visitor_data_file):
print('Removing visitor_data file')
os.remove(visitor_data_file)
Owner

For anonymity's sake - do we want to consider refreshing the visitor data every day? Again, not really sure what constraints go into it

Author

Done, as I mentioned above.

server.py Outdated
with open(visitor_data_file, "r") as file:
visitor_data = file.read()
file.close()
except:
Owner

Use except Exception, otherwise you'll catch KeyboardInterrupt and SystemExit: https://stackoverflow.com/questions/54948548/what-is-wrong-with-using-a-bare-except

Author

I now use except OSError, which reports any file access error that prevents reading the visitor data file.
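
For illustration, a narrow handler of this kind might be written as follows; `read_text_or_none` is a hypothetical helper, not a function from the pull request. Because `OSError` derives from `Exception`, this handler does not swallow `KeyboardInterrupt` or `SystemExit`, which was the reviewer's concern with a bare `except`.

```python
def read_text_or_none(path):
    """Read a text file, returning None if it cannot be accessed."""
    try:
        # The with block closes the file on exit, so an explicit
        # file.close() call is not needed.
        with open(path, 'r') as f:
            return f.read()
    except OSError as e:
        print('Unable to access ' + path + ': ' + str(e))
        return None
```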


def extract_nsig_func(base_js):
for i, member in enumerate(NSIG_FUNCTION_ARRAYS):
func_array_re = regex.compile(member.replace('$', '\\$'))
Owner

I recommend against this; I would just put the three \ escapes you need into your regex instead of modifying it at runtime

Author

I was only trying to do what inv_sig_helper does: copy exactly the same regex patterns and escape the dollar signs at runtime. The replacement is only done once, for as long as the extracted nsig_func_{player_version}.js file exists.

The resulting n_sig_code is cached as data/nsig_func_{player_version}.js and loaded as info['nsig_func'] = { player_version: js_nsig_decrypt_code } for the lifetime of the youtube-local session.

So the n_sig_function extraction is only done once. Subsequent accesses either read it directly from the info['nsig_func'] dict or load it from the nsig_func_{player_version}.js file if that file already exists.
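
The two-level caching described in this reply (in-memory dict first, then the per-player file, then extraction) can be sketched like this. The names are assumptions for illustration: `cache` plays the role of `info['nsig_func']`, and `extract` is a hypothetical zero-argument callable standing in for the regex extraction from base.js.

```python
import os


def get_nsig_code(cache, data_dir, player_version, extract):
    """Return the nsig JS code for player_version, extracting at most once."""
    # 1. Already loaded during this session.
    if player_version in cache:
        return cache[player_version]
    path = os.path.join(data_dir, 'nsig_func_' + player_version + '.js')
    if os.path.exists(path):
        # 2. Extracted in an earlier session and saved to disk.
        with open(path, 'r', encoding='utf-8') as f:
            code = f.read()
    else:
        # 3. First time this player version is seen: extract and save.
        code = extract()
        with open(path, 'w', encoding='utf-8') as f:
            f.write(code)
    cache[player_version] = code
    return code
```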

func_body_re = []
for i, member in enumerate(NSIG_FUNCTION_ENDINGS):
func_body_re_item = ''
func_body_re_item += func_context.group(1)
Owner

You need to do a re.escape() on this before appending it to your regexes

Author

Done.
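
To show why the escape matters, here is a small sketch. The pattern suffix is a simplified stand-in invented for this example, not one of the real NSIG_FUNCTION_ENDINGS patterns, and `build_func_body_pattern` is a hypothetical name. JavaScript identifiers may contain `$`, which is a regex metacharacter.

```python
import re


def build_func_body_pattern(func_name):
    """Compile a pattern that finds `<func_name>=function(<arg>)`."""
    # re.escape turns '$' into '\$' so it matches a literal dollar sign
    # instead of acting as the end-of-string anchor.
    return re.compile(re.escape(func_name) + r'=function\((\w+)\)')


js = 'var x;$a$=function(b){return b};'
escaped = build_func_body_pattern('$a$').search(js)
unescaped = re.compile('$a$' + r'=function\((\w+)\)').search(js)
```

Here `escaped` is a match object whose first group is the argument name, while `unescaped` is `None` because the raw `$` is treated as an anchor.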

Comment on lines 1111 to 1107
print('jscode len is: ' + str(len(jscode)))
dukpy_session = dukpy.JSInterpreter()
# Loading the function into dukpy session
dukpy_session.evaljs(jscode)
print('n_sig = ' + n_sig)
#n_sig_result = dukpy_session.evaljs('decrypt_nsig("' + n_sig + '")')
n_sig_result = dukpy_session.evaljs("decrypt_nsig(dukpy['n'])", n=n_sig)
print('n_sig_result = ' + n_sig_result)
return n_sig_result
Owner

Have we verified that dukpy has limited execution privileges? For instance, can javascript code executed with Dukpy make network requests or open files? If so it would be a massive security hole

Also recommend removing these debugging print statements when you're done

Author

dukpy is just a Python wrapper for the duktape JS engine.

The PyPI package of dukpy has a dukpy-install command which can download npm packages from the internet.

As far as I know, the dukpy module doesn't access the internet unless told to do so. The nsig_func doesn't need internet access at runtime, which I have verified by doing manual n_sig decryption with various Python JS bindings.

I also consider dukpy a just-works lightweight JS engine for Python, since it has wheels for arm64 on PyPI and for armhf on piwheels.org, so anyone running this on a single-board computer will hopefully have no problems at runtime.

Confirming that this works on arm64.

requirements.txt Outdated
@@ -6,3 +6,6 @@ urllib3>=1.24.1
defusedxml>=0.5.0
cachetools>=4.0.0
stem>=1.8.0
fake-useragent>=1.5.1
flpc>=0.2.5
Owner

Did you do any performance testing that suggested the need for this? Is there a noticeable speedup? Would rather avoid dependencies if possible

Author

Without a break statement, the re module hangs for some time. flpc does not hang while searching for the specified regex, even without a break statement in the for loop.

I added a break statement to the for loop so that the regex engine stops working (i.e. stops testing further regex patterns) once a match is found, which mitigates the hang with the built-in re module.

I've removed flpc from the requirements and switched to the built-in re module as you wished, with very little or no performance degradation in my extended testing.
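
The early-exit idea can be sketched with the built-in re module as below. The patterns in the usage lines are toy examples, not the real nsig patterns, and `find_first_match` is a hypothetical name; returning from inside the loop has the same effect as the break described above.

```python
import re


def find_first_match(patterns, text):
    """Try each pattern in order and stop at the first one that matches."""
    for i, pattern in enumerate(patterns):
        match = re.search(pattern, text)
        if match:
            # Stop here: the remaining patterns are never run.
            return i, match
    return None, None


patterns = [r'foo\((\w+)\)', r'bar\((\w+)\)', r'ba.\((\w+)\)']
index, match = find_first_match(patterns, 'x=bar(n);')
```

In this toy run the second pattern (index 1) matches, so the third pattern is never evaluated.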

youtube/util.py Outdated
Comment on lines 911 to 915
print("Debugging headers")
for item in headers:
print(item)
print("Debugging data payload")
print(json.dumps(data, indent=4))
Owner

Recommend removing these debugging print statements

Author

Done.

@sapnolas

Any chance you could make a version of this similar to the youtube local main branch that uses Python 3.6 or earlier so it can run on Windows 7?

@alive4ever
Author

> Any chance you could make a version of this similar to the youtube local main branch that uses Python 3.6 or earlier so it can run on Windows 7?

I actually tried several times to also build packages for oldwin (32-bit), as can be seen in my GitHub Actions history, with no success. I tried several versions of Python, from oldwin to 3.8.7, and each failed to find the _socks module.

So I reverted my GitHub Actions recipe to build only for Python 3.11 on Windows.

Add a setting option to enable video download. Downloading is disabled by default.
Use fake useragent library instead of hardcoded useragent.
Allow selecting innertube client via settings page.
Update ios and web innertube client context.
Make the loading of age_restricted contents configurable.
Disabled by default.
Add functional mweb client. Also several features included:
- Fix sig decryption code
- Add nsig decryption with function body extraction taken from
  iv-org/inv_sig_helper, using dukpy to execute nsig js decryption code.
- Add support to use visitorData from api response
- Add support to use poToken by saving a json file as
  'po_token_cache.txt':
  {
    "visitorData": "long_base64_visitorData_value",
    "poToken": "long_base64_poToken_value"
  }
- More consistent request headers across several modules: server.py,
  channel.py, and comments.py
Avoid removing dist-info directories so fake-useragent can start, since
the module will call importlib.metadata on init.

Also use HEAD instead of master during archive copying step to allow
generate_release.py to run on branch other than master.
- use os.path.join to access visitorData.txt and signature timestamp cache file
- remove visitorData.txt if file is more than 24h old.
- try accessing responseContext and raise exception on KeyError
- use os.path.join to define visitorData.txt file
- use OSError exception instead of blank exception
- remove flpc import
- add re.escape to extracted function name before appending to list
- remove debugging messages
remove flpc from requirements, since built-in python
regex engine is good enough.
Return error message when n signature is None
Remove debugging messages during yt-api request
Only send visitor data header to specific google domains.
@alive4ever alive4ever force-pushed the feature-innertube-client-fix branch from f50cae3 to c7e1cac Compare November 24, 2024 02:05
Update ios innertube client context.
Update to require fake-useragent>=2.0.0
Update player version for innertube clients.
Emergency fix for player 3bb1f723 to fix both signature and `n` query
parameter decryption.

Credits to /yt-dlp team.
Fix wrong indentation in po_token conditional statements.
@alive4ever
Author

Update for today:

I'm currently experiencing 403 errors with the mweb client after about 1 minute of video playback on formats other than the integrated 360p format. I still haven't found the cause.

A full video can only be played on mweb with the integrated 360p format (itag 18).

@alive4ever
Author

alive4ever commented Dec 17, 2024

Update for today:

It seems that the 1-minute playback limit is caused by the mweb client starting to require a po_token. According to the BgUtils author, this is case 2 of mandatory po_token use.

When to Use a PoToken

YouTube's web player checks the "sps" (StreamProtectionStatus) of each media segment request (only if using UMP or SABR; our browser example uses UMP) to determine if the stream needs a PoToken.

Status 1: The stream is either already using a PoToken or does not need one.
Status 2: The stream requires a PoToken but will allow the client to request up to 1-2 MB of data before interrupting playback.
Status 3: The stream requires a PoToken and will interrupt playback immediately.

Adding data/po_token_cache.txt and enabling settings.use_po_token solves this issue for me.

Btw, the contents of po_token_cache.txt can be extracted from a browser or created using bgutils or similar tools.
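
As a rough sketch of how such a file could be read, assuming the two-key JSON layout ("visitorData" and "poToken") given in the commit message of this pull request. `load_po_token_pair` is a hypothetical helper and not necessarily how the branch loads the file.

```python
import json


def load_po_token_pair(path):
    """Return (visitorData, poToken) from the cache file, or None."""
    try:
        with open(path, 'r', encoding='utf-8') as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Missing or unreadable file, or content that is not valid JSON
        # (json.JSONDecodeError is a subclass of ValueError).
        return None
    if not isinstance(data, dict):
        return None
    visitor_data = data.get('visitorData')
    po_token = data.get('poToken')
    if not visitor_data or not po_token:
        return None  # both values are needed as a pair
    return visitor_data, po_token
```

Returning `None` for any incomplete or malformed file makes it easy for the caller to fall back to running without a poToken.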

Any suggestions for improvement are appreciated.

Successfully merging this pull request may close these issues.

Youtube-local doesn't work for some videos, mainly music videos
4 participants