
Generation of authorization headers used to access frontend api | Error when generating token fails #16

Closed
15532th opened this issue Sep 2, 2023 · 8 comments
Labels
help wanted Extra attention is needed


@15532th

15532th commented Sep 2, 2023

I suggest using this issue to keep track of the ongoing situation with the "x-web-authorizekey" header, so that the relevant information is kept in one place instead of being scattered across multiple closed PRs.

Twitcasting uses the "x-web-sessionid" and "x-web-authorizekey" client headers to confirm that requests to frontendapi subdomain endpoints come from legitimate clients. The first is simply a plaintext value embedded in the HTML of the channel's home page. The second is generated from a salt, a value produced by executing some JavaScript code in PlayerPage2.js, which is used along with other variables to calculate the hash sum that forms part of the "x-web-authorizekey" header. Since that code is minified and to some extent obfuscated, parsing it can be challenging.
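To make the relationship between the two headers concrete, here is a rough Python sketch. Everything specific in it is an assumption, not TwitCasting's actual scheme: the `data-web-sessionid` attribute name, SHA-256 as the hash, the order in which salt, timestamp, method, path and session id are concatenated, and the `<timestamp>.<digest>` output format are hypothetical placeholders for whatever PlayerPage2.js really does. The function names are likewise made up for illustration.

```python
from __future__ import annotations

import hashlib
import re
import time

# Hypothetical: assumes the session id is embedded in the channel page as a
# data-web-sessionid="..." attribute. The real markup may differ.
SESSION_ID_RE = re.compile(r'data-web-sessionid="([^"]+)"')


def extract_session_id(html: str) -> str | None:
    """Return the plaintext session id embedded in the page HTML, or None."""
    match = SESSION_ID_RE.search(html)
    return match.group(1) if match else None


def make_authorize_key(salt: str, session_id: str, method: str, path: str,
                       timestamp: int | None = None) -> str:
    """Build an x-web-authorizekey-style value under an ASSUMED scheme.

    Assumption: the key is "<timestamp>.<sha256 hex digest>", where the
    digest covers salt + timestamp + HTTP method + request path + session id.
    The real inputs, their order and the hash function are defined by
    PlayerPage2.js and may well differ.
    """
    if timestamp is None:
        timestamp = int(time.time())
    material = f"{salt}{timestamp}{method.upper()}{path}{session_id}"
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return f"{timestamp}.{digest}"


def build_auth_headers(salt: str, session_id: str, method: str,
                       path: str) -> dict[str, str]:
    """Return both client headers for a single frontendapi request."""
    return {
        "x-web-sessionid": session_id,
        "x-web-authorizekey": make_authorize_key(salt, session_id, method, path),
    }
```

The point of the sketch is the split of responsibilities: the session id is scraped as-is, while the authorize key cannot be produced without the salt, which is why salt extraction is the fragile part.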

The first attempt to handle this, by hardcoding the salt value, was made in #7. It then became apparent that the salt changes every few weeks along with regular code updates to PlayerPage2, and the code was updated in #12 to use the new value. It is worth noting that an old salt doesn't become invalid immediately after PlayerPage2.js changes; it keeps being accepted for some time alongside the new one.
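Because an old salt stays valid for a while after PlayerPage2.js changes, one option is to treat salts as an ordered list of candidates rather than a single value. A minimal sketch, assuming the caller supplies an `is_accepted` probe (for example, one cheap frontendapi request signed with that salt); `candidate_salts` and `first_working_salt` are hypothetical names, not the project's code.

```python
from __future__ import annotations

from typing import Callable, Iterable


def candidate_salts(extracted: str | None, fallbacks: Iterable[str]) -> list[str]:
    """Order salts to try: freshly extracted first, then older known values."""
    ordered: list[str] = []
    for salt in ([extracted] if extracted else []) + list(fallbacks):
        if salt not in ordered:  # skip duplicates, keep first occurrence
            ordered.append(salt)
    return ordered


def first_working_salt(salts: Iterable[str],
                       is_accepted: Callable[[str], bool]) -> str | None:
    """Return the first salt the server accepts, or None if none are.

    is_accepted is supplied by the caller, e.g. a cheap frontendapi request
    signed with that salt that returns True on success.
    """
    for salt in salts:
        if is_accepted(salt):
            return salt
    return None
```

Keeping the previous salt as a fallback means a broken extractor degrades gracefully for as long as the old value remains accepted.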

Then #14 added automated extraction: a regexp pulls the salt-related code out of PlayerPage2.js, and a JavaScript module eval()s it with Node.js to get the salt value. The regular expression it used was too strict and stopped matching after a PlayerPage2.js update, so #15 reverted to a hardcoded value that is still valid but now outdated, and likely to stop working soon as well.
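The extract-then-evaluate approach from #14 can be sketched roughly as follows. The JavaScript shape it looks for (a variable assigned just before a `.authorizeKeySalt` marker) is invented for illustration; the real salt code in PlayerPage2.js looks different, and `SALT_EXPR_RE`, `extract_salt_expression` and `evaluate_with_node` are hypothetical names. The second step assumes a `node` executable on the PATH.

```python
from __future__ import annotations

import json
import re
import subprocess

# Hypothetical pattern: assumes the salt comes from a self-contained expression
# assigned to a variable just before a (made-up) ".authorizeKeySalt" marker.
# Identifiers are matched with [\w$]+ rather than spelled out, because minified
# names change between PlayerPage2.js releases.
SALT_EXPR_RE = re.compile(
    r"(?:var|let|const)\s+[\w$]+\s*=\s*(?P<expr>[^;]+);\s*[\w$]+\.authorizeKeySalt"
)


def extract_salt_expression(js_source: str) -> str | None:
    """Return the JavaScript expression that computes the salt, or None."""
    match = SALT_EXPR_RE.search(js_source)
    return match.group("expr") if match else None


def evaluate_with_node(expr: str, timeout: float = 10.0) -> str:
    """Evaluate a JavaScript expression with Node.js and return it as a string.

    Note that this executes code taken from the page, exactly as eval() would.
    """
    script = f"process.stdout.write(JSON.stringify(String({expr})));"
    result = subprocess.run(
        ["node", "-e", script],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return json.loads(result.stdout)
```

Round-tripping the value through `JSON.stringify` and `json.loads` avoids having to parse Node's raw output by hand.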

@15532th 15532th mentioned this issue Sep 2, 2023
@ef1500 ef1500 pinned this issue Sep 8, 2023
@Trung0246

Trung0246 commented Sep 8, 2023

I think streamlink ran into something similar in streamlink/streamlink#5370 (see the pull request mentioned at the bottom).

It may be worth opening an issue over there, since they have browser-based JS execution implemented, in case the script becomes heavily obfuscated in the future.

@15532th
Author

15532th commented Sep 8, 2023

Hello and thank you for bringing it up.

The Twitch issue seems to be far worse than what we have on our hands here. As mentioned in streamlink/streamlink#5380, streamlink seems to deal with it using its own Python implementation of the Chrome DevTools Protocol (CDP). That implementation depends on other parts of the streamlink codebase and is likely not easy to extract for use as a standalone library. If things ever get as bad as described in that Twitch issue, we're probably better off using the streamlink plugin to make use of its CDP interface.

That said, both the streamlink and yt-dlp plugins seem not to use the llfmp4 stream source at all, and both support HLS, which currently doesn't need a token for frontendapi endpoints in the first place. HLS is also a much better fit for slow or unstable connections than the websocket stream, so we should probably take a closer look at it instead. @ef1500, from the comments in TwitcastWebsocket it seems an attempt was made to use it before and it didn't work out?

It may be worth opening an issue over there

Not sure I follow this part: what kind of issue are you suggesting we open?

@ef1500
Member

ef1500 commented Sep 8, 2023

Yeah, sometimes Twitcasting will opt to use an m3u8 stream to fill in the gaps if your connection is laggy, so that you still get a smooth streaming experience. However, in some streams, for reasons unknown, this playlist either doesn't exist or serves no purpose.

Not only that, but if my recollection serves me correctly, when I last tried to mess around with it, the quality of the m3u8 stream also differed drastically from the actual stream. This is partly because the m3u8 stitches .ts videos together instead of just providing a raw stream for us to work with. If you're recording a 10-hour stream, unless you have the computational power to re-encode the .ts videos on the fly, re-encoding the whole stream after it ends is going to be a nightmare. So I decided to lay off the m3u8 method and strictly stick to the websocket stream from twitcasting.

@15532th
Author

15532th commented Sep 12, 2023

Not sure I understand what filling the gaps means. Does player switch between sources on the fly or use two simultaneously?

Can you confirm that it was the playlist link itself returning 404 and not specific fragment URLs? I tried a few, and both the master playlist and all three quality playlists were always present, but I did get 404s on fragments older than a few seconds.
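Fragments going 404 after a few seconds is consistent with a live HLS sliding window: the media playlist only lists the most recent segments, so a downloader has to re-fetch it frequently and grab new segments right away. A minimal sketch of that bookkeeping, numbering segments with the standard `#EXT-X-MEDIA-SEQUENCE` tag from the HLS spec; whether TwitCasting's playlists carry that tag is an assumption, and the function names are hypothetical.

```python
from __future__ import annotations


def parse_media_playlist(text: str) -> tuple[int, list[str]]:
    """Return (media sequence number of the first segment, segment URIs)."""
    sequence = 0  # the HLS default when the tag is absent
    uris: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            sequence = int(line.split(":", 1)[1])
        elif line and not line.startswith("#"):
            uris.append(line)  # any non-tag, non-empty line is a segment URI
    return sequence, uris


def new_segments(text: str, last_seen: int | None) -> list[tuple[int, str]]:
    """Segments numbered above last_seen, oldest first."""
    first, uris = parse_media_playlist(text)
    numbered = [(first + i, uri) for i, uri in enumerate(uris)]
    if last_seen is None:
        return numbered
    return [(seq, uri) for seq, uri in numbered if seq > last_seen]
```

Polling roughly once per target duration and downloading only `new_segments(...)` each time keeps the downloader ahead of the window before older fragments are removed.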

From the streamlink plugin code, it seems the HLS stream has two re-encoded quality playlists, 64k and 220k, plus a source one whose exact bitrate varies between streams. That doesn't seem any different from how the websocket streams work.
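Choosing between those quality playlists comes down to reading the `BANDWIDTH` attribute of each `#EXT-X-STREAM-INF` entry in the master playlist. A minimal sketch with a hypothetical function name; the 64k/220k/source layout in the example is modeled on the description above rather than a captured TwitCasting playlist, and attributes such as RESOLUTION or CODECS are ignored.

```python
from __future__ import annotations

import re
from urllib.parse import urljoin

# Match BANDWIDTH= only at the start of the attribute list or after a comma,
# so AVERAGE-BANDWIDTH= is not picked up by mistake.
BANDWIDTH_RE = re.compile(r"(?:^|,)BANDWIDTH=(\d+)")
STREAM_INF = "#EXT-X-STREAM-INF:"


def parse_master_playlist(text: str, base_url: str) -> list[tuple[int, str]]:
    """Return (bandwidth, absolute URL) per variant, highest bandwidth first."""
    variants: list[tuple[int, str]] = []
    pending: int | None = None  # bandwidth of the STREAM-INF awaiting its URI
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(STREAM_INF):
            match = BANDWIDTH_RE.search(line[len(STREAM_INF):])
            pending = int(match.group(1)) if match else 0
        elif line and not line.startswith("#") and pending is not None:
            variants.append((pending, urljoin(base_url, line)))
            pending = None
    return sorted(variants, reverse=True)
```

Taking the first entry gives the source quality; relative variant URIs are resolved against the master playlist's URL, as the HLS spec requires.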

There shouldn't be any issue with processing the .ts files, either after the stream (that is exactly how ytarchive does things) or on the fly (streamlink seems to be able to pass fragments directly to VLC for playback without a problem).
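One way to handle fragments on the fly without re-encoding, assuming they are MPEG-TS segments of a single rendition: append each new fragment to one growing .ts file (the HLS spec requires each segment to continue the bitstream of the previous one), then optionally remux the result with ffmpeg's stream copy once the stream ends. A minimal sketch; `append_fragments` and `remux_command` are hypothetical names.

```python
from __future__ import annotations

from typing import BinaryIO, Iterable


def append_fragments(out: BinaryIO, fragments: Iterable[tuple[int, bytes]],
                     last_written: int | None = None) -> int | None:
    """Append (sequence, data) fragments to an open file in sequence order.

    Fragments numbered at or below last_written (saved on an earlier poll)
    are skipped. Returns the sequence number of the last fragment written.
    """
    for seq, data in sorted(fragments):
        if last_written is not None and seq <= last_written:
            continue
        out.write(data)  # raw TS bytes, no decoding or re-encoding
        last_written = seq
    return last_written


def remux_command(ts_path: str, mp4_path: str) -> list[str]:
    """ffmpeg arguments that copy the streams into MP4 without re-encoding."""
    return ["ffmpeg", "-i", ts_path, "-c", "copy", mp4_path]
```

The remux step is cheap compared with re-encoding because `-c copy` only rewrites the container, so a 10-hour recording costs roughly one pass over the file.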

That said, the frontend doesn't use the endpoint for HLS playlists, so it might eventually disappear just like HappyToken did. But websocket streams tend to stutter whenever connection latency isn't low enough, so if it's possible to use HLS even temporarily, it would still be worth it.

@ef1500
Member

ef1500 commented Sep 14, 2023

Not sure I understand what filling the gaps means. Does player switch between sources on the fly or use two simultaneously?

I don't have the time or means to re-investigate, but that's how it seemed when I initially experimented with programming a downloader. It may be different now; I'm not sure.

Can you confirm that it was the playlist link itself returning 404 and not specific fragment URLs?

Again, I'd love to, but I don't have the time or the means currently. I'm really sorry, I don't mean to disappoint.

There shouldn't be any issue with processing the .ts files, either after the stream (that is exactly how ytarchive does things) or on the fly (streamlink seems to be able to pass fragments directly to VLC for playback without a problem).

That's great news! We might be able to get the tool working again, then.

That said, the frontend doesn't use the endpoint for HLS playlists, so it might eventually disappear just like HappyToken did. But websocket streams tend to stutter whenever connection latency isn't low enough, so if it's possible to use HLS even temporarily, it would still be worth it.

I'm all for a solution to our ongoing problem, and I think the best approach would be a dynamic, adaptive one. Just to throw an idea out there and brainstorm a little, an ideal solution would:

  1. Assess the webpage and tell us if anything has changed (so people can open a new issue telling us what changed, and we can fix it faster), similar to yt-dlp.
  2. If encoding the .ts files on the fly isn't a big deal, use HLS as our primary download method, but keep everything else on the back burner (meaning the program can still attempt to download using the old methods at the user's request, or even at its own discretion if the user enables an argument that lets the program decide).
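Both ideas could be prototyped along these lines: a check that reports which extraction patterns no longer match the page (so a bug report can name exactly what changed), and a runner that tries HLS first and falls back to the older methods only when the user opts in. A minimal sketch; `check_page`, `download_with_fallback` and `MethodFailed` are hypothetical names, and the download methods are stand-in callables.

```python
from __future__ import annotations

import re
from typing import Callable, Sequence


class MethodFailed(Exception):
    """Raised by a download method that cannot proceed (hypothetical)."""


def check_page(js_source: str, patterns: dict[str, re.Pattern[str]]) -> list[str]:
    """Idea 1: names of extraction patterns that no longer match the page."""
    return [name for name, pattern in patterns.items() if not pattern.search(js_source)]


def download_with_fallback(methods: Sequence[tuple[str, Callable[[], None]]],
                           allow_fallback: bool) -> str:
    """Idea 2: try the primary method first, then the rest if allowed.

    Returns the name of the method that succeeded; otherwise raises
    MethodFailed listing every attempted method and its error.
    """
    errors: list[str] = []
    for name, method in (methods if allow_fallback else methods[:1]):
        try:
            method()
            return name
        except MethodFailed as exc:
            errors.append(f"{name}: {exc}")
    raise MethodFailed("; ".join(errors))
```

Collecting every method's error into one message keeps the original failure visible even when the fallback also fails.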

My living situation most likely won't change until around Christmas break (and only for a time). I will be able to use my PC rig and put more effort into the program around Christmastime. Until then, I can't do too much code-wise. In the meantime, I'll do my best to stay on top of pull requests, issues, and community feedback.

@ef1500 ef1500 added the help wanted Extra attention is needed label Sep 30, 2023
@15532th
Author

15532th commented Nov 20, 2023

It's surprising it took that long, but the salt extraction code broke because one of the regular expressions had a variable name hardcoded. #18 should deal with it.

There is another regexp with a plain variable name in get_encoded_array(), but I would rather leave it as is while it works: the rest of that expression is so general that the risk of false positives without the name seems too high.
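The difference between a pattern with a hardcoded variable name and a sturdier one can be shown on a made-up minified snippet. The snippet, the `HARDCODED`/`GENERIC` patterns and `extract_array` are hypothetical; they are not the regexps from #18 or the code in get_encoded_array(). The backreference is one way to keep a generic identifier match from becoming too permissive, which is the false-positive concern above.

```python
from __future__ import annotations

import re

# Fragile: the minified name "Qz" is baked in, so any PlayerPage2.js release
# that renames the variable silently breaks extraction.
HARDCODED = re.compile(r"Qz=\[([^\]]*)\]")

# Sturdier: capture whatever identifier is used, then require that the same
# name is returned right after via the backreference (?P=name). The extra
# structure keeps the generic identifier match from hitting unrelated arrays.
GENERIC = re.compile(
    r"(?P<name>[\w$]+)=\[(?P<items>[^\]]*)\];\s*return\s+(?P=name)(?![\w$])"
)


def extract_array(js_source: str) -> list[str] | None:
    """Return the string items of the array that gets returned, or None."""
    match = GENERIC.search(js_source)
    if match is None:
        return None
    return re.findall(r'"([^"]*)"', match.group("items"))
```

The trade-off is that every structural detail added to tighten the generic pattern is another thing a future release can change, so some brittleness remains either way.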

@ef1500
Member

ef1500 commented Nov 27, 2023

Merged it the other day and just tried it on a fresh clone; it worked flawlessly. I'll close this issue but keep it pinned for the time being, in case the problem comes back or other people run into similar issues with other tools.

Thank you so very much for all the time you've poured into this, I sincerely appreciate it. This tool would have been dead in the water if it weren't for your help. Cheers.

@ef1500 ef1500 closed this as completed Nov 27, 2023
@15532th
Author

15532th commented Nov 29, 2023

If this issue is staying pinned, perhaps it would make sense to change the title to include the error message shown when requesting the token fails? The exception message in GetSalt is:

twitcasting.TwitcastAPI.TwitcastingAPIError: Failed to extract value used to generate authorization headers from PlayerPage.

But there is also an error that occurs when the secret word for a password-protected stream is wrong or not provided:

Password was not accepted for livestream
Unable to extract session-id from user page
Unable to generate authorization headers: no session id provided
Got status code 400 when requesting token
Traceback (most recent call last):
    [...]
twitcasting.TwitcastAPI.TwitcastingAPIError: Error parsing token: KeyError('token'). Raw token data: '{"error":{"code":3013,"message":"Invalid Request"}}'

The cause might not be immediately obvious, since it is buried in a long stack trace and the last line talks about the token. Should the exception be raised earlier, in GetAuthSessionID, to avoid possible confusion?

The original decision against this was made when authorization headers weren't necessarily needed to obtain the token. If GenerateAuthHeaders failed because of some change in the stream page layout or anything else, raising would break not just password-protected downloads but all of them without good reason, so the token is requested even if generating the headers has failed.

Currently, failing to obtain the session id essentially means the download is impossible, so raising there is fine, but that might change again if HLS support gets implemented.
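Raising early could look roughly like this: fail as soon as the session id is missing, with a message pointing at the secret word when the stream is password-protected, while keeping a switch for the case where a method such as HLS no longer needs the headers. The signature is hypothetical (the project's actual GetAuthSessionID is not shown in this thread), and `TwitcastingAPIError` is redefined locally only so the sketch runs on its own.

```python
from __future__ import annotations


class TwitcastingAPIError(Exception):
    """Local stand-in for twitcasting.TwitcastAPI.TwitcastingAPIError."""


def get_auth_session_id(session_id: str | None, password_protected: bool,
                        session_id_required: bool = True) -> str | None:
    """Fail fast with a clear message instead of a confusing token error later.

    session_id_required is a hypothetical switch: it can be turned off if a
    download method (e.g. HLS) ever stops needing authorization headers.
    """
    if session_id:
        return session_id
    if not session_id_required:
        return None  # let the caller continue without auth headers
    hint = (" The stream is password-protected; check that the secret word"
            " is provided and correct." if password_protected else "")
    raise TwitcastingAPIError("Unable to extract session-id from user page." + hint)
```

With this in place, the "Error parsing token" traceback above would be replaced by a single error that names the actual cause.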

@ef1500 ef1500 changed the title Generation of authorization headers used to access frontend api Generation of authorization headers used to access frontend api | Error when generating token fails Nov 29, 2023