
Feature: Select info methods #950

Closed
konsumer opened this issue Jun 24, 2021 · 2 comments

konsumer commented Jun 24, 2021

I am making a very light proxy, where I need the raw stream URLs for "regular" videos, meaning not age-restricted or otherwise weird. As it is, this library makes 3 requests for info in getBasicInfo:

let info = await pipeline([id, options], validate, retryOptions, [
    getWatchHTMLPage,
    getWatchJSONPage,
    getVideoInfoPage,
  ]);

This data is needed for special videos and for collecting other important info, but not all use cases need it.

I am thinking just exporting getWatchHTMLPage would do the trick for me, and I could skip the other stuff. Alternatively, an option for getBasicInfo to make fewer requests would also support my use case.
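
For illustration only, such an option might look like this (the watchPageOnly name is made up, not an existing ytdl-core option):

// Hypothetical: only fetch the watch HTML page, skip the JSON/video-info fallbacks.
const info = await ytdl.getBasicInfo(id, { watchPageOnly: true })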

I am happy to PR for the feature, if interested. I would much prefer to share the effort of keeping getWatchHTMLPage up-to-date, rather than maintain my own copy of something very similar.

In my current code, I am using this, and filtering URLs to get stream links that work:

import fetch from 'node-fetch'

// ytInitialPlayerResponse is embedded as a JSON blob in the watch-page HTML.
// No `g` flag: a stateful lastIndex would break repeated exec() calls.
const regex = /var ytInitialPlayerResponse = (.+);<\/script>/

export async function getWatchHTMLPage (id) {
  const r = await fetch(`https://www.youtube.com/watch?v=${id}`)
  const str = await r.text()
  const m = regex.exec(str)
  if (m && m.length === 2) {
    return JSON.parse(m[1])
  }
}

// test
getWatchHTMLPage('K-281doxOMc')
  .then(console.log)
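
To give a rough idea of the filtering step, here is a sketch only; it assumes the usual playerResponse.streamingData shape, where formats without a signatureCipher carry a direct url:

// Sketch: collect formats that expose a plain url (nothing to decipher).
const player = await getWatchHTMLPage('K-281doxOMc')
const formats = [
  ...(player?.streamingData?.formats || []),
  ...(player?.streamingData?.adaptiveFormats || [])
]
const playable = formats.filter(f => f.url && !f.signatureCipher)
console.log(playable.map(f => f.url))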

But like I said, exporting getWatchHTMLPage here would be preferred.

This is somewhat related to #945, as I started working in this direction when that started failing (which in my case is resolved by simply not calling the other data functions).

gatecrasher777 (Contributor) commented Jun 25, 2021

The library only executes those extra requests if the previous one failed.

If you want something really light and efficient, the innertube player?key= POST requests return the player response already in JSON format. These requests use much less bandwidth and overhead than fetching the watch pages, but you need to maintain a session in which you get all the info needed to make those calls from an initial call to the YouTube home page (which also gives you the html5player to be used for deciphering URLs).

The player?key= POST requests don't work for age-restricted videos, though. For those you still need to use the watch page with a logged-in token/cookie.

The player requests also work in an unlimited, anonymous way, i.e. without cookies and without 429s.
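
A rough sketch of what such a call can look like, assuming the API key and client version have already been scraped from the home-page HTML (the parameter names and exact request shape here are illustrative, not taken from this library):

import fetch from 'node-fetch'

// Sketch only: apiKey and clientVersion are assumed to come from the
// home-page session described above.
async function getPlayerResponse (videoId, apiKey, clientVersion) {
  const r = await fetch(`https://www.youtube.com/youtubei/v1/player?key=${apiKey}`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      context: { client: { clientName: 'WEB', clientVersion } },
      videoId
    })
  })
  return r.json() // same playerResponse JSON that the watch page embeds
}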

konsumer (Author) commented Jun 25, 2021

> The library only executes those extra requests if the previous one failed.

Ah, yes, sorry, I just had a look at pipeline. Cool!

> But you need to maintain a session

I use redirects in a lambda, so maintaining a session would probably be problematic without drastically changing how it works.

I think exporting them might be helpful for other things (like more atomic testing, or working around failures that crash data collection), but I can totally live with it as is. Feel free to close this issue.
