Improve youtube api calls #1985

Merged
merged 5 commits into from
Jun 7, 2021

Conversation

SamantazFox
Member

@SamantazFox SamantazFox commented Apr 7, 2021

See #1981 and #1929 (comment) for previous discussions on the subject.

This PR is only a partial job, as some code cleaning and some other fixes are required before continuing. See #1985 (comment) and #1985 (comment) for further details.

@SamantazFox SamantazFox force-pushed the improve-youtube-api-helper branch 2 times, most recently from 05acacc to 2a23501 Compare April 7, 2021 04:40
@unixfox
Member

unixfox commented Apr 9, 2021

Thank you for your work. Great improvements. That's the way to go in order to reduce the amount of reCAPTCHA and have a more stable way of fetching the data from YouTube.

@SamantazFox
Member Author

Thank you for your work. Great improvements. That's the way to go in order to reduce the amount of reCAPTCHA and have a more stable way of fetching the data from YouTube.

No probs ^^

I'm currently facing an issue, though: the current way of fetching video info (ytInitialData and ytInitialPlayerResponse) is a bit tricky because cookies are saved, too, and there are workarounds for region-locked videos.

@AudricV
Contributor

AudricV commented Apr 10, 2021

@SamantazFox Note also that there are two endpoints used for videos: player (its JSON contains the streamingData, for example) and next (for additional stats and next videos). Both take the same body for the POST request. Note that on the player endpoint, the cipher algorithm is different, so cipher-protected URLs will be broken (streams returned 403 in my tests when I tried an implementation in NewPipe Extractor) unless you change the cipher decryption in Invidious.
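
To make the "same body" point concrete, here is a minimal Crystal sketch. It assumes the commonly observed request layout for these endpoints (a `context.client` object plus a `videoId`), a placeholder client version, and a URL prefix that is not stated in this thread. The helper names (`make_context`, `video_request_body`, `endpoint_url`) are hypothetical and are not taken from this PR's code. Any API key or extra headers are left out.

```crystal
require "json"

# Hypothetical helper: the client context sent with every request.
# The version string is a placeholder, not a value from this PR.
def make_context(client_name = "WEB", client_version = "2.20210330.08.00")
  {
    "client" => {
      "clientName"    => client_name,
      "clientVersion" => client_version,
    },
  }
end

# Builds the JSON body that both `player` and `next` would receive.
def video_request_body(video_id : String) : String
  {
    "context" => make_context,
    "videoId" => video_id,
  }.to_json
end

# Assumed URL layout: only the last path segment differs per endpoint.
def endpoint_url(endpoint : String) : String
  "https://www.youtube.com/youtubei/v1/#{endpoint}"
end

body = video_request_body("aaaaaaaaaaa") # placeholder video ID
puts endpoint_url("player")
puts endpoint_url("next")
puts body
```

Under these assumptions, switching from one endpoint to the other only changes the URL, which is why a single helper can serve both.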

@SamantazFox
Member Author

@SamantazFox Note also that there are two endpoints used for videos: player (in the JSON, the streamingData is present for example) and next (for additional stats and next videos) (same body for the POST request).

Thanks! For now, I'm only interested in using the player endpoint, as next video(s) and other stats are not used in invidious (Invidious relies on playlist data to know what to play next in the case of a playlist, and doesn't autoplay next video in other cases).

@AudricV
Contributor

AudricV commented Apr 10, 2021

When I said additional stats, it's likes/dislikes for example and the next videos, which are part of the watch page of Invidious, even if they are not from a playlist.

@FireMasterK
Contributor

FireMasterK commented Apr 10, 2021

There's no rate limit according to my testing. (I tested 35k requests).

You may be interested in using the mobile client name and version to remove the need for signature ciphers for the API. (ANDROID and 16.02.35)

Another thing I noticed - Invidious has support for protobuf requests?! You can replicate the exact Android app for even more API stability with that!

@SamantazFox
Member Author

When I said additional stats, it's likes/dislikes for example and the next videos, which are part of the watch page of Invidious, even if they are not from a playlist.

Oh, okay, nice! Thanks for the clarification!

There's no rate limit according to my testing. (I tested 35k requests).

Nifty :D

You may be interested in using the mobile client name and version to remove the need for signature ciphers for the API. (ANDROID and 16.02.35)

What do you mean by "signature ciphers"?

Another thing I noticed - Invidious has support for protobuf requests?! You can replicate the exact Android app for even more API stability with that!

Yes, thanks to Omar Roth's protodec! I don't have the app, so if you can provide the different protobuf objects on Android (along with metadata, like playlist ID, video ID, channel ID, etc...), that would be cool!

We could also randomize between the two, to reduce even more the risk of rate limiting.
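
A minimal sketch of that randomization idea, assuming each request simply picks one of two client identities. The `ClientConfig` record, the `CLIENTS` list and `pick_client` are hypothetical names, and the WEB version string is a placeholder. Only the ANDROID name and version come from the comment above.

```crystal
# Hypothetical record describing one client identity.
record ClientConfig, name : String, version : String

CLIENTS = [
  # Placeholder version, not taken from this thread.
  ClientConfig.new("WEB", "2.20210330.08.00"),
  # Name and version suggested by FireMasterK above.
  ClientConfig.new("ANDROID", "16.02.35"),
]

# Picks one client at random for the next request.
def pick_client : ClientConfig
  CLIENTS.sample
end

client = pick_client
puts "#{client.name} #{client.version}"
```

Whether spreading requests over several clients actually lowers the risk of rate limiting is not established in this thread. The sketch only shows the selection step.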

@AudricV
Contributor

AudricV commented Apr 11, 2021

What do you mean by "signature ciphers"?

See https://github.com/iv-org/invidious/blob/master/src/invidious/helpers/signatures.cr. It's for the protected contents, especially music contents (YouTube Music tracks, videoclips, ...).

@SamantazFox
Member Author

SamantazFox commented Apr 11, 2021

See https://github.com/iv-org/invidious/blob/master/src/invidious/helpers/signatures.cr. It's for the protected contents, especially music contents (YouTube Music tracks, videoclips, ...).

Oooh, right! From my experience with Invidious (about a year or so), playback of protected videos never worked. I guess that's yet another subject entirely.

Also, from my understanding, we will always need those ciphers, as they're required to get the video stream from YouTube, no matter which API we use to get the JSON:

fmt_stream = info["streamingData"]?.try &.["formats"]?.try &.as_a.map &.as_h || [] of Hash(String, JSON::Any)
fmt_stream.each do |fmt|
  if s = (fmt["cipher"]? || fmt["signatureCipher"]?).try { |h| HTTP::Params.parse(h.as_s) }
    s.each do |k, v|
      fmt[k] = JSON::Any.new(v)
    end
    fmt["url"] = JSON::Any.new("#{fmt["url"]}#{DECRYPT_FUNCTION.decrypt_signature(fmt)}")
  end
  fmt["url"] = JSON::Any.new("#{fmt["url"]}&host=#{URI.parse(fmt["url"].as_s).host}")
  fmt["url"] = JSON::Any.new("#{fmt["url"]}&region=#{self.info["region"]}") if self.info["region"]?
end
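
As an isolated illustration of the cipher step in the snippet above, the sketch below parses a `signatureCipher`-style query string and appends a signature to the stream URL. It is only a sketch under assumptions: `fake_decrypt` is a stand-in (it merely reverses the string) for the real decryption that `DECRYPT_FUNCTION` performs, `resolve_cipher` is a hypothetical name, the `s`, `sp` and `url` keys are the commonly observed ones, and the fallback parameter name `signature` is a guess.

```crystal
require "http"

# Stand-in for the real signature decryption, which depends on YouTube's
# player code. Reversing the string is purely illustrative.
def fake_decrypt(signature : String) : String
  signature.reverse
end

# Turns a `signatureCipher` query string into a stream URL with the
# (pretend-)decrypted signature appended.
def resolve_cipher(signature_cipher : String) : String
  params = HTTP::Params.parse(signature_cipher)
  url = params["url"]                 # percent-decoded by the parser
  name = params["sp"]? || "signature" # query parameter carrying the signature
  sig = fake_decrypt(params["s"])
  "#{url}&#{name}=#{sig}"
end

# Made-up input, not a real YouTube value.
cipher = "s=CBA&sp=sig&url=https%3A%2F%2Fexample.com%2Fvideoplayback%3Fid%3D1"
puts resolve_cipher(cipher)
```

Formats that already carry a plain `url` field would skip this step entirely, which is the case the mobile client suggestion above is aiming for.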

@AudricV
Contributor

AudricV commented Apr 11, 2021

Not really, see TeamNewPipe/NewPipeExtractor#562.

@TheFrenchGhosty TheFrenchGhosty added the unfinished More work is needed on this PR, or on something this PR uses. label Apr 13, 2021
@TheFrenchGhosty TheFrenchGhosty marked this pull request as draft April 19, 2021 20:35
@syeopite syeopite mentioned this pull request May 3, 2021
12 tasks
@SamantazFox
Member Author

Small update: I've been using this branch on my personal instance for 3 weeks, and it works without issues. I'll try to continue working on that this weekend (I'm still having some server issues).

@AudricV
Contributor

AudricV commented May 6, 2021

Nice ;)

@unixfox
Member

unixfox commented May 23, 2021

What's left to be worked on here? In case I could help.

@AudricV
Contributor

AudricV commented May 23, 2021

What's left to be worked on here? In case I could help.

The use of the new internal API for the videos in Invidious (see my issue for more details).

@syeopite
Member

A lot of scraping needs to be replaced by this as well but that may be a bit out of scope for this specific PR

@SamantazFox
Member Author

SamantazFox commented May 23, 2021

What's left to be worked on here? In case I could help.

@SamantazFox
Member Author

A lot of scraping needs to be replaced by this as well but that may be a bit out of scope for this specific PR

Do you have examples?

@AudricV
Contributor

AudricV commented May 23, 2021

This is required if we need to use the video and next endpoints of the youtube API.

player and next, not video and next ;)

@SamantazFox
Member Author

@TiA4f8R ah, yeah, thanks ^^

@syeopite
Member

A lot of scraping needs to be replaced by this as well but that may be a bit out of scope for this specific PR

Do you have examples?

Here's one. There are a lot of areas in Invidious that scrape YouTube's pages.

@AudricV
Contributor

AudricV commented May 24, 2021

You can't imagine how much I love you for that and the rest <3

I really appreciate it, but say thank you to @FireMasterK. Without them, I would not have been able to find this.

As the number of API endpoint functions grows, this will
prevent ugly code copy/pasta
@SamantazFox
Member Author

* The `fetch_channel_playlists` and `get_about_info` functions should use the `browse` endpoint:
  https://github.com/SamantazFox/invidious/blob/improve-youtube-api-helper/src/invidious/channels.cr#L358
  https://github.com/SamantazFox/invidious/blob/improve-youtube-api-helper/src/invidious/channels.cr#L777

I thought I could do that easily, but the channel's routes code should be cleaned first.

@SamantazFox
Member Author

Oh come on.... It's just a useless comma....
https://github.com/iv-org/invidious/pull/1985/checks?check_run_id=2656341135

@SamantazFox SamantazFox force-pushed the improve-youtube-api-helper branch from 4d54387 to b7fe212 Compare May 24, 2021 13:25
@SamantazFox SamantazFox marked this pull request as ready for review May 24, 2021 13:26
@SamantazFox SamantazFox changed the title [WIP] Improve youtube api calls Improve youtube api calls May 24, 2021
@SamantazFox SamantazFox requested a review from saltycrys May 24, 2021 13:34
Member

@saltycrys saltycrys left a comment

Great job!

@SamantazFox SamantazFox added need-testing This feature needs to be deployed and tested to see if it's working, and doesn't break something and removed unfinished More work is needed on this PR, or on something this PR uses. labels May 24, 2021
@unixfox
Member

unixfox commented May 24, 2021

I'm a bit lost on this PR, is all the work already done? Like are we still parsing the HTML of the www.youtube.com webpage?
If this PR is done then I would be very happy to test this PR.

@SamantazFox
Member Author

SamantazFox commented May 25, 2021

I'm a bit lost on this PR, is all the work already done?

No, there is still stuff to do. As explained here, there are some other things that need to be fixed/sorted out first.

TL;DR: I don't properly understand the region bypass code, the channel community tab needs to be fixed first, and the channel routes need to be moved out of invidious.cr and cleaned up.

Like are we still parsing the HTML of the www.youtube.com webpage?

Yes. here, here, here and probably in many other places....

@unixfox
Member

unixfox commented May 25, 2021

Ok so we can keep this PR as a draft until we use the majority of the youtube mobile API?

@SamantazFox
Member Author

Ok so we can keep this PR as a draft until we use the majority of the youtube mobile API?

I'm not sure I understand what you mean by "use the majority of the YT mobile API"?

@unixfox
Member

unixfox commented May 25, 2021

Ok so we can keep this PR as a draft until we use the majority of the youtube mobile API?

I'm not sure I understand what you mean by "use the majority of the YT mobile API"?

Well, I meant that until we stop scraping the HTML of the YouTube webpages for most of the logic in Invidious, we can set this PR as a draft because, like you said in #1985 (comment), there is still a lot of code that needs to be converted to use the YouTube mobile API.

@SamantazFox
Member Author

SamantazFox commented May 25, 2021

Well, I meant that until we stop scraping the HTML of the YouTube webpages for most of the logic in Invidious, we can set this PR as a draft because, like you said in #1985 (comment), there is still a lot of code that needs to be converted to use the YouTube mobile API.

Ah, ok. I'll see what I can do for the player, but this PR will need to be merged before I can do anything about the channel's "about" tab.

The get_channel_about function is currently used in all channel-related routes (here, here, here, ...) as a way to get the channel's UCID from a user-name.

I'd prefer not to touch that atm, and instead make a separate PR that will clean the code, move it to src/invidious/routes, and properly handle the different URL forms (e.g. /c/<ucid>, /channel/<ucid>, /u/<username>, /user/<username>, and many more) by using the resolve_url API endpoint, as pointed out by @TiA4f8R here: #1985 (comment).
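
For illustration only, here is a minimal Crystal sketch of recognizing the four URL forms listed above before handing the identifier to whatever lookup is used. `parse_channel_path` is a hypothetical helper, not code from this PR. It makes no call to the resolve_url endpoint and does not cover the "many more" forms.

```crystal
# Splits a channel path into its URL form and identifier.
# Returns nil when the path is not one of the four listed forms.
def parse_channel_path(path : String) : NamedTuple(form: String, id: String)?
  if m = path.match(%r{\A/(c|channel|u|user)/([^/?#]+)})
    {form: m[1], id: m[2]}
  end
end

# Placeholder paths, not real channels.
["/channel/UCxxxxxxxxxxxxxxxxxxxxxx", "/user/someone", "/watch"].each do |path|
  p parse_channel_path(path)
end
```

Keeping the form next to the identifier lets a route decide later whether a lookup is needed at all.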

Edit/Note: The code as-is is perfectly functional, and everything that could easily use those API endpoints now does. I've been running those modifications on my instance for the whole time this PR was open, until this weekend.

@unixfox
Member

unixfox commented Jun 2, 2021

PR live in testing on https://yewtu.be

@Perflyst Perflyst added in-testing This feature has been deployed and is being tested and removed need-testing This feature needs to be deployed and tested to see if it's working, and doesn't break something labels Jun 2, 2021
@SamantazFox
Member Author

PR live in testing on https://yewtu.be

@unixfox How is it going?

@unixfox
Member

unixfox commented Jun 7, 2021

PR live in testing on yewtu.be

@unixfox How is it going?

No issues on my side, it's rock solid.

@syeopite syeopite merged commit d827346 into iv-org:master Jun 7, 2021
@SamantazFox SamantazFox removed the in-testing This feature has been deployed and is being tested label Jun 7, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 8, 2021
@SamantazFox SamantazFox deleted the improve-youtube-api-helper branch February 7, 2022 16:05