Doesn't work in production #38

Donald646 · 2024-08-11T19:12:58Z

This gets blocked in production

emilthemaker · 2024-08-11T20:01:15Z

Mind specifying what's happening?

Donald646 · 2024-08-11T20:04:12Z

I'm using Next.js, Supabase, and Vercel.

It seems like it's getting blocked once I go to production. I get the error "Transcript is disabled", but I know that's not true, because it works locally.

I also get this error:

Error: An error occurred in the Server Components render. The specific message is omitted in production builds to avoid leaking sensitive details. A digest property is included in this error instance which may provide additional details about the nature of the error.

I am using server components for it. Even when I switch to API routes it still gets blocked.

Donald646 · 2024-08-11T20:05:33Z

import { NextResponse } from 'next/server';
import { YoutubeTranscript } from 'youtube-transcript';

export async function POST(request: Request) {
  const { videoUrl } = await request.json();

  try {
    const transcript = await YoutubeTranscript.fetchTranscript(videoUrl);
    let newTranscript = '';
    for (let i = 0; i < transcript.length; i++) {
      newTranscript += `Timestamp: ${transcript[i].offset}, Text: ${transcript[i].text}\n`;
    }
    return NextResponse.json({ transcript: newTranscript });
  } catch (error) {
    console.error('Error fetching transcript:', error);
    return NextResponse.json(
      { error: 'Failed to fetch video transcript. Please check the URL and try again.' },
      { status: 500 }
    );
  }
}

This is my code above.

Donald646 · 2024-08-11T20:09:59Z

I saw the other issue #11, and it seems to be the same issue I'm facing, but I'm bringing it up again because there seems to be no good solution. If YouTube is blocking sites, would this library be useless?

kellenmace · 2024-08-12T00:50:48Z

I was encountering this issue on Vercel, too. I believe it exists because YouTube returns different HTML for video pages depending on where the request comes from. When I scrape a page like https://www.youtube.com/watch?v=rB9ql0L0cUQ on my local machine, the HTML contains a script tag with a ytInitialPlayerResponse variable, and within that object are the URLs of the caption tracks (this is what the youtube-transcript library relies on to get the transcript URLs). When I scrape the same page in a Serverless Function on Vercel, though, the ytInitialPlayerResponse object does not contain the caption track URLs, so the code that youtube-transcript runs errors out.
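One way to check this explanation from your own deployment is a small diagnostic that fetches a watch page and looks for the caption track data. This is a heuristic sketch under assumptions: it assumes Node 18+ (global fetch), treats a "captionTracks" key in the HTML as a sign that the caption track URLs were served, and reuses the "Sign in to confirm you're not a bot" wording reported later in this thread. diagnoseWatchPage and diagnoseVideo are hypothetical names, not part of youtube-transcript.

```typescript
// Possible outcomes of inspecting a YouTube watch page (heuristic, not an official API).
type PageDiagnosis = 'captions-present' | 'bot-check' | 'no-captions';

// Wording reported in this thread; YouTube may change it at any time.
const BOT_CHECK = /sign in to confirm you.{0,10}not a bot/i;

// Classify watch-page HTML by looking for the bot check or caption track data.
function diagnoseWatchPage(html: string): PageDiagnosis {
  if (BOT_CHECK.test(html)) return 'bot-check';
  // youtube-transcript relies on caption track URLs inside ytInitialPlayerResponse.
  if (html.includes('"captionTracks":')) return 'captions-present';
  return 'no-captions';
}

// Fetch a watch page and classify it; run this locally and on the server to compare.
async function diagnoseVideo(videoId: string): Promise<PageDiagnosis> {
  const res = await fetch(`https://www.youtube.com/watch?v=${encodeURIComponent(videoId)}`, {
    headers: { 'Accept-Language': 'en-US,en;q=0.9' },
  });
  return diagnoseWatchPage(await res.text());
}
```

Calling diagnoseVideo('rB9ql0L0cUQ') locally and from a deployed function and comparing the two results should show whether the server is being served a page without caption tracks.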

Option 1

If all you need is the lines of text from the transcript, then the method described in #11 where you use the youtubei.js NPM package to access the transcript may work just fine.

Option 2

If you need all of the transcript data, including the start time and duration for every line, the easiest solution is to run your youtube-transcript code somewhere else. You can create a Google Cloud Function that executes the code and call it from your app, for example. This isn't ideal, since you have to have this one bit of functionality live outside of your Vercel-hosted app, but it works.
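For Option 2, the Vercel side only needs to call the externally hosted function and validate what comes back. A minimal sketch, assuming the remote function (for example a Google Cloud Function) accepts a JSON body with a videoId and returns youtube-transcript's text/duration/offset objects as a JSON array. The endpoint, request body shape, and function names are hypothetical.

```typescript
// Assumed response shape: the remote function returns youtube-transcript's
// { text, duration, offset } objects as a JSON array.
interface TranscriptLine {
  text: string;
  duration: number;
  offset: number;
}

// Validate the JSON coming back from the remote function before trusting it.
function parseTranscriptPayload(payload: unknown): TranscriptLine[] {
  if (!Array.isArray(payload)) {
    throw new Error('Expected an array of transcript lines');
  }
  return payload.map((item, i) => {
    const line = item as Partial<TranscriptLine>;
    if (typeof line.text !== 'string' || typeof line.offset !== 'number' || typeof line.duration !== 'number') {
      throw new Error(`Malformed transcript line at index ${i}`);
    }
    return { text: line.text, offset: line.offset, duration: line.duration };
  });
}

// Call a transcript function hosted outside Vercel (hypothetical endpoint and body).
async function fetchTranscriptRemotely(endpoint: string, videoId: string): Promise<TranscriptLine[]> {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ videoId }),
  });
  if (!res.ok) {
    throw new Error(`Remote transcript function failed with status ${res.status}`);
  }
  return parseTranscriptPayload(await res.json());
}
```

Keeping the validation in its own function means a change on the remote side fails loudly in your app instead of producing an empty or garbled transcript.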

Donald646 · 2024-08-12T01:07:48Z

I see, thank you. The durations are a pretty core part of my app. The person who posted the youtubei.js code said it's also possible to get the durations, so I'll check that out first.

terrytjw · 2024-08-12T04:52:13Z

I'm experiencing this issue in prod too; it works perfectly fine locally. @Donald646, did you manage to get it to work in prod?

Donald646 · 2024-08-12T04:53:43Z

@terrytjw No, I haven't. I don't think it works. I'm planning to switch to youtubei.js, but it seems like the issue is prevalent over there as well.

terrytjw · 2024-08-12T05:00:53Z

Hey @kellenmace, other than a Google Cloud Function, would AWS Lambda work too?

terrytjw · 2024-08-12T05:01:40Z

@Donald646 have you tried the Google Cloud Function approach?

Donald646 · 2024-08-12T05:04:19Z

No, I haven't yet. From another issue I was looking at for youtubei.js, they said it didn't work either, but they weren't getting transcripts, so I don't know.

SuspiciousLookingOwl/youtubei#113

leandronorcio · 2024-08-12T05:17:42Z

I can confirm it’s not working on AWS Lambda.

Donald646 · 2024-08-12T05:36:04Z

Yeah, this library is cooked. It's kinda useless if it doesn't work in production.

emilthemaker · 2024-08-12T14:26:36Z

Working for me with youtubei.js but no clue why 😅

I'm on Supabase Edge Functions. Transcript works, but some other properties appear broken.

metaloozee · 2024-08-13T06:34:32Z

YouTube is always changing how it operates and now enforces sign-in, which makes it impossible for data scrapers to obtain the data. This is a global problem.

If you check a few different libraries, including ytdl and youtubei.js, you'll find similar issues where YouTube simply returns the message "Sign in to confirm you are not a bot".

colouredFunk · 2024-08-13T06:38:53Z

It's working for me on a regular Linux production server.

metaloozee · 2024-08-13T06:39:02Z

Wait till their system finds out that you're just a bot.

emilthemaker · 2024-08-13T09:55:56Z

I'm getting the bot login thing too but can still scrape transcripts


colouredFunk · 2024-08-13T12:46:40Z

It's stopped working for me now too. I'm not seeing any error message; it just hangs...

aleksa-codes · 2024-08-16T09:51:01Z

youtubei.js is working for me in production. I switched to it based on the code in that issue: #11 (comment). I managed to get the timestamps as well; if anyone is interested, I can share the code with the timestamps.

Edit: it does not work for everything in production. I was not able to get basicInfo such as title, duration, and description, but getting the transcript still works. Again, in my case it works locally but not on Vercel prod. I found out the hard way, after I had migrated all my code to use youtubei.js for all of the data I was previously getting in different ways. I get "Sign in required". Probably related to: LuanRT/YouTube.js#696

Lipcsyy · 2024-08-16T13:21:43Z

It says video unavailable for me. Did this occur to you too?

aleksa-codes · 2024-08-16T13:25:42Z

I was getting that error only with enable_safety_mode: true as an option in Innertube.create({}). If that's your case, remove it or set it to false.

aleksa-codes · 2024-08-16T14:06:37Z

@Lipcsyy Sorry to follow up, but I noticed that the code here #11 (comment) uses a URL in youtube.getInfo(). However, the YouTube.js documentation doesn't mention using a URL as target. According to the docs, the function expects the video ID.

Here's the function I'm using to extract the ID from the URL:

const getYouTubeVideoId = (input: string): string => {
  const regExp: RegExp =
    /(?:https?:\/\/)?(?:www\.)?(?:youtube\.com\/(?:.*[?&]v=|(?:v|e(?:mbed)?)\/|shorts\/|live\/)|youtu\.be\/)([a-zA-Z0-9_-]{11})/;
  const match: RegExpMatchArray | null = input.match(regExp);

  return match && match[1] ? match[1] : input;
};

There might be a simpler approach, but I hope this helps!
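As a possibly simpler alternative to the regex, the standard URL API can do most of the parsing. This is a sketch, not a drop-in replacement: extractVideoId is a hypothetical name, it only recognizes youtube.com, www.youtube.com, m.youtube.com and youtu.be links (plus bare 11-character IDs), and it returns null instead of echoing the input back when nothing matches.

```typescript
// Parse a YouTube link (or bare ID) into its 11-character video ID, or null.
function extractVideoId(input: string): string | null {
  const ID = /^[A-Za-z0-9_-]{11}$/;
  const trimmed = input.trim();
  // Already a bare video ID.
  if (ID.test(trimmed)) return trimmed;

  let url: URL;
  try {
    // Allow inputs without a scheme, e.g. "youtu.be/abc".
    url = new URL(/^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`);
  } catch {
    return null;
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, '');
  let candidate: string | null = null;
  if (host === 'youtu.be') {
    candidate = url.pathname.split('/')[1] ?? null;
  } else if (host === 'youtube.com') {
    candidate = url.searchParams.get('v');
    if (!candidate) {
      // Path forms such as /embed/ID, /shorts/ID, /live/ID, /v/ID.
      const [, kind, id] = url.pathname.split('/');
      if (['embed', 'shorts', 'live', 'v', 'e'].includes(kind)) candidate = id ?? null;
    }
  }
  return candidate && ID.test(candidate) ? candidate : null;
}
```

The result can be passed straight to youtube.getInfo(), which, per the YouTube.js docs mentioned above, expects a video ID rather than a URL.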

adnjoo · 2024-08-20T22:32:58Z

I have the same issue with Next.js deployed to Vercel

rn [Error]: [YoutubeTranscript] 🚨 Transcript is disabled on this video (Osj9tv8aOqM)

tonymanh-dev · 2024-08-21T13:55:39Z

Any solution for that?

adnjoo · 2024-08-21T19:17:27Z

@tonymanh-dev

https://www.npmjs.com/package/youtubei.js/

e.g. https://github.com/adnjoo/fast-youtube-summary/blob/main/web/app/(api)/summarize/route.ts#L3-L24

tushar453030 · 2024-08-22T19:04:28Z

Hey @adnjoo

I implemented the same logic as yours in my Express server, but when I hit the APIs it gives the error InnertubeError: This video is unavailable.

What could be the reason?

tushar453030 · 2024-08-22T19:10:20Z

@adnjoo

By any chance, have you ever come across the error below?

When I hit the APIs locally, using Postman or the UI, it works fine and I get the transcript, but when the backend is hosted on Vercel or AWS it gives this error:

https://www.youtube.com/watch?v=EGkGRs6YhoM
YoutubeTranscriptDisabledError: [YoutubeTranscript] 🚨 Transcript is disabled on this video

mrgoonie · 2024-08-23T01:07:09Z

YouTube has been updated; this script is cooked.

tushar453030 · 2024-08-23T13:36:16Z

Hi @mrgoonie,
Check out this repo shared by @adnjoo; it has a hosted site. It's working, and the code is the same as mine.
https://github.com/adnjoo/fast-youtube-summary/tree/main

The only difference is that his code is in Next.js and mine is in Express.

M-YasirGhaffar · 2024-08-26T21:27:48Z

I think the error in production is due to the hosting server's IP address. I also saw this error in the server logs:

Error: YoutubeTranscriptTooManyRequestError: [YoutubeTranscript] 🚨 YouTube is receiving too many requests from this IP and now requires solving a captcha to continue
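Since several different failures reach the API route posted at the top of this thread only as a generic 500, one option is to classify the error before responding. A minimal sketch, assuming the error messages keep the wording quoted in this thread (youtube-transcript's message text is not a stable API). classifyTranscriptError and the chosen status codes are illustrative assumptions.

```typescript
// Status and hint to return instead of a generic 500.
interface TranscriptFailure {
  status: number;
  hint: string;
}

// Map youtube-transcript error messages (as quoted in this thread) to a response.
function classifyTranscriptError(error: unknown): TranscriptFailure {
  const message = error instanceof Error ? error.message : String(error);
  if (/too many requests from this IP|solving a captcha/i.test(message)) {
    return { status: 429, hint: 'YouTube is rate-limiting or blocking this server IP.' };
  }
  if (/transcript is disabled/i.test(message)) {
    // On hosted servers this can be a block in disguise: the page may lack
    // caption track URLs even though the video has captions.
    return { status: 404, hint: 'Reported as disabled; if it works locally, suspect an IP block.' };
  }
  return { status: 500, hint: 'Unexpected error while fetching the transcript.' };
}
```

In the route's catch block, the result could feed NextResponse.json({ error: failure.hint }, { status: failure.status }). Returning 429 for the captcha case makes an IP block easy to tell apart from a genuinely missing transcript.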

mrgoonie · 2024-08-27T10:19:04Z

That's exactly the reason. As I mentioned earlier, YouTube has just been updated and will block your server's IP address if it crawls YouTube URLs too many times.

So I came up with a solution: adding a proxy layer to every fetch.

Here is the demo: https://app.digicord.site/youtube/transcript

There is a FREE API on the page in case anyone needs it. Cheers!

mrgoonie · 2024-08-27T10:21:17Z

@tushar453030 It's working because your IP address has not been blocked. It's not working for me because mine is cooked 😄

swarajbachu · 2024-09-07T06:47:06Z

So it's still not solved 🥲. I thought this would be easy, then saw the error in production.

ngocsangyem · 2024-09-14T17:29:17Z

Hi @tushar453030, try passing the video ID instead of the whole URL. It worked on my side with this code:

import { Innertube } from 'youtubei.js';

const init = async () => {
    const youtube = await Innertube.create({
        lang: "en",
        location: "US",
        retrieve_player: false,
    });

    try {
        const info = await youtube.getInfo('axYAW7PuSIM');
        const transcriptData = await info.getTranscript();
        const mappedData = transcriptData.transcript.content.body.initial_segments.map(
            (segment) => segment.snippet.text
        );

        console.log('transcript', mappedData);

    } catch (error) {
        console.error("Error fetching transcript:", error);
        throw error;
    }
};

init();
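The snippet above keeps only the text of each segment. If durations matter, a variant can keep the timing too. This sketch assumes each segment exposes start_ms and end_ms as millisecond strings next to snippet.text, which is how recent youtubei.js versions appear to model transcript segments; verify against the version you install. SegmentLike, TimedLine, and toTimedLines are hypothetical names.

```typescript
// Assumed shape of a youtubei.js transcript segment (check your installed version).
interface SegmentLike {
  start_ms?: string;
  end_ms?: string;
  snippet?: { text?: string };
}

interface TimedLine {
  text: string;
  offsetSeconds: number;
  durationSeconds: number;
}

// Keep timing as well as text for every segment that has both.
function toTimedLines(segments: SegmentLike[]): TimedLine[] {
  const lines: TimedLine[] = [];
  for (const segment of segments) {
    const text = segment.snippet?.text;
    const start = Number(segment.start_ms);
    const end = Number(segment.end_ms);
    // Skip section headers or anything without text or parseable timing.
    if (!text || !Number.isFinite(start) || !Number.isFinite(end)) continue;
    lines.push({ text, offsetSeconds: start / 1000, durationSeconds: (end - start) / 1000 });
  }
  return lines;
}
```

In the snippet above it would be called as toTimedLines(transcriptData.transcript.content.body.initial_segments); depending on your youtubei.js typings, that call may need a cast or null checks.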

hatemmezlini · 2024-10-29T19:44:44Z

If anyone is still struggling with this: it's due to YouTube banning ISP IPs (which is why it doesn't work in production). I have made a working solution using OAuth2 that doesn't use proxies and avoids the YouTube ban. You can use it for free on Apify: https://apify.com/invideoiq/video-transcript-scraper. You only pay for Apify usage, and Apify gives you $5 of free credit, which will get you around 5000 transcripts.

swarajbachu · 2024-10-30T09:17:02Z

So are you using the Google APIs directly?

hatemmezlini · 2024-10-30T09:42:09Z

@swarajbachu No. There is an OAuth plugin that was created by the yt-dlp developers. It uses the YouTube on TV client because the token is never refreshed on TVs. All I had to do was create a dummy account; it asked me for the password once, and then I saved the token and kept passing it in every request. I believe something similar could be developed here.

blake41 · 2024-11-23T12:19:23Z

@mrgoonie What are you using for the proxy?

mrgoonie · 2024-11-23T14:23:29Z

There are plenty of them on the internet. I picked IPRoyal, which is good enough for me, but if you crawl a lot, you will need many proxies to rotate 😅
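Rotating through several proxies can be as simple as a round-robin over a list of proxy URLs. A minimal sketch under assumptions: the proxy URLs come from your provider, ProxyRotator and maskProxyUrl are hypothetical helpers, and rotation alone does not guarantee that YouTube won't block the proxies too.

```typescript
// Hypothetical round-robin rotation over proxy URLs supplied by a proxy provider.
class ProxyRotator {
  private readonly proxies: readonly string[];
  private index = 0;

  constructor(proxies: readonly string[]) {
    if (proxies.length === 0) throw new Error('At least one proxy URL is required');
    this.proxies = proxies;
  }

  // Return the next proxy URL, wrapping around at the end of the list.
  next(): string {
    const proxy = this.proxies[this.index];
    this.index = (this.index + 1) % this.proxies.length;
    return proxy;
  }
}

// Replace the password in a proxy URL before it is logged.
function maskProxyUrl(proxyUrl: string): string {
  const url = new URL(proxyUrl);
  if (url.password) url.password = '****';
  return url.toString();
}
```

With undici, each request could then go through the next proxy, e.g. fetch(url, { dispatcher: new ProxyAgent(rotator.next()) }) using undici's fetch and ProxyAgent. In practice you would likely create one ProxyAgent per proxy up front and reuse it rather than building a new one per request.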

blake41 · 2024-11-23T18:52:51Z

@hatemmezlini I followed the extractor wiki and grabbed a proof-of-origin (PO) token. Is that the token you're referring to? I'm still getting blocked on my prod server even when I pass the PO token using this library. Or which OAuth plugin for yt-dlp are you referring to?

petesampras12 · 2024-12-02T19:19:04Z

@mrgoonie Fantastic! Is it possible to limit the request to only get the "content" part and not the "chunks"? It takes a long time to retrieve the transcript for a 3-minute video, but perhaps it would be quicker if I could get only "content"? Appreciate it!

petesampras12 · 2024-12-16T12:42:05Z

Sadly, both youtube-transcript and digicord's variant (https://app.digicord.site/youtube/transcript) have now stopped working.

Anyone else know of a working API?

blake41 · 2024-12-16T13:58:17Z

I run all my requests through a residential proxy. Unless you are making an insane number of requests, it's quite cheap.

savnani5 · 2024-12-19T00:59:10Z

I am trying this, but I'm still getting blocked. What library are you using to fetch?

blake41 · 2024-12-19T14:26:06Z

Your requests must not be going through the proxy.

import { setGlobalDispatcher, ProxyAgent } from 'undici';
import dotenv from 'dotenv';

// Load environment variables
dotenv.config();

// Set up a global proxy if one is configured
if (process.env.PROXY_HOST && process.env.PROXY_PORT) {
  const proxyUrl = process.env.PROXY_USERNAME && process.env.PROXY_PASSWORD
    ? `http://${encodeURIComponent(process.env.PROXY_USERNAME)}:${encodeURIComponent(process.env.PROXY_PASSWORD)}@${process.env.PROXY_HOST}:${process.env.PROXY_PORT}`
    : `http://${process.env.PROXY_HOST}:${process.env.PROXY_PORT}`;

  // Mask the password before logging the proxy URL
  console.log('Setting up global proxy dispatcher:', proxyUrl.replace(/:[^:@]*@/, ':****@'));
  const dispatcher = new ProxyAgent({ uri: proxyUrl });
  // Route requests that use undici's global dispatcher through the proxy
  setGlobalDispatcher(dispatcher);
}

petesampras12 · 2024-12-19T16:00:30Z

@savnani5 Did you get it to work?

savnani5 · 2024-12-19T21:51:00Z

@blake41 I was using youtube-transcript and ytdl-core with proxies, and it was throwing a 500 Internal Server Error. Got it working for now with Innertube! Thanks

savnani5 · 2024-12-19T21:52:15Z

Yes, working with youtubei.js (innertube)!

petesampras12 · 2024-12-22T11:33:56Z

Awesome! Is it this one? (https://github.com/haxzie/innerTube.js/)

I can't find any method that gets the transcript?

savnani5 · 2024-12-23T19:31:56Z

Here, this one: #11 (comment)

M-YasirGhaffar mentioned this issue Nov 17, 2024

Transcript Retrieval Issue in Production Environments M-YasirGhaffar/youtube-video-summarizer-using-gemini-api#1
