
[pornhub] Workaround scrape detection #5930

Closed

Cupcake-iOS opened this issue Jun 9, 2015 · 9 comments

@Cupcake-iOS

Hello, I'm trying to download a video from pornhub.com, and it gives me the following error message.

➜  youtube-dl --verbose "http://www.pornhub.com/view_video.php?viewkey=1290284933"
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--verbose', u'http://www.pornhub.com/view_video.php?viewkey=1290284933']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.06.04.1
[debug] Python version 2.7.6 - Darwin-14.3.0-x86_64-i386-64bit
[debug] exe versions: none
[debug] Proxy map: {}
[PornHub] 1290284933: Downloading webpage
ERROR: Unable to extract title; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 650, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 273, in extract
    return self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/pornhub.py", line 55, in _real_extract
    video_title = self._html_search_regex(r'<h1 [^>]+>([^<]+)', webpage, 'title')
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 564, in _html_search_regex
    res = self._search_regex(pattern, string, name, default, fatal, flags, group)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 555, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
RegexNotFoundError: Unable to extract title; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
@yan12125
Collaborator

yan12125 commented Jun 9, 2015

Works for me. Could you add the --write-pages option:

youtube-dl --verbose --write-pages "http://www.pornhub.com/view_video.php?viewkey=1290284933"

And upload/paste all *.dump files?

@Cupcake-iOS
Author

Still the same. Please check the following dump file content:

<html><head><script type="text/javascript"><!--
function leastFactor(n) {
 if (isNaN(n) || !isFinite(n)) return NaN;
 if (n==0) return 0;
 if (n%1 || n*n<2) return 1;
 if (n%2==0) return 2;
 if (n%3==0) return 3;
 if (n%5==0) return 5;
 var m=Math.sqrt(n);
 for (var i=7;i<=m;i+=30) {
  if (n%i==0)      return i;
  if (n%(i+4)==0)  return i+4;
  if (n%(i+6)==0)  return i+6;
  if (n%(i+10)==0) return i+10;
  if (n%(i+12)==0) return i+12;
  if (n%(i+16)==0) return i+16;
  if (n%(i+22)==0) return i+22;
  if (n%(i+24)==0) return i+24;
 }
 return n;
}
function go() {
 var p=2012283879083; var s=2097722137; var n;
if ((s >> 9) & 1)/*
else p-=
*/p+=/*
p+= */194539741*
10;/*
else p-=
*/else /*
p+= */p-=96891998*  10;/* 120886108*
*/if ((s >> 9) & 1) p+=/*
else p-=
*/60068856*/*
p+= */10;/*
else p-=
*/else 
p-=/* 120886108*
*/125939562*/*
else p-=
*/10;   if ((s >> 14) & 1)  p+=/* 120886108*
*/116707458*/* 120886108*
*/17;/*
*13;
*/else /*
p+= */p-=/*
else p-=
*/31885004*
15;/*
else p-=
*/if ((s >> 3) & 1)
p+=/*
p+= */158004163*/*
p+= */4;/* 120886108*
*/else /*
else p-=
*/p-=/*
else p-=
*/72068438* 4;  if ((s >> 8) & 1)/*
p+= */p+=
143157909*/* 120886108*
*/11;/*
else p-=
*/else /*
else p-=
*/p-=144627391*/* 120886108*
*/9;/* 120886108*
*/ p-=6212272203;
 n=leastFactor(p);
{ document.cookie="RNKEY="+n+"*"+p/n+":"+s+":711807541:1";
  document.location.reload(true); }
}
//--></script></head>
<body onload="go()">
Loading ...
</body>
</html>

@yan12125
Collaborator

yan12125 commented Jun 9, 2015

Is the content from

curl -v "http://www.pornhub.com/view_video.php?viewkey=1290284933"

the same as the dump files?

@Cupcake-iOS
Author

No. Comparing them with a diff tool, the go() function has some differences.

@yan12125
Collaborator

yan12125 commented Jun 9, 2015

It seems that, from your location, pornhub is blocking download tools such as youtube-dl. I'm afraid there's no simple way to bypass it: parsing this complicated page and passing the correct value back to pornhub would be horrible.

@Cupcake-iOS
Author

Got it and thanks a lot!

@dstftw dstftw reopened this Oct 10, 2015
@dstftw dstftw changed the title Unable to extract title [pornhub] Workaround scrape detection Oct 10, 2015
@Hrxn

Hrxn commented Oct 11, 2015

Does a new WAN interface IP address help?

@auggie5

auggie5 commented Jan 21, 2019

This is really a delay mechanism used when an IP address makes too many requests too quickly. What the function is doing is requiring the client to do an expensive calculation before loading the page, to slow it down.
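
For reference, the "expensive calculation" is just trial-division factoring of a roughly 13-digit number. A rough Python translation of the page's leastFactor() (an illustrative sketch, not code from youtube-dl):

import math

def least_factor(n):
    # smallest factor of n by trial division, mirroring the page's leastFactor()
    if n == 0:
        return 0
    if n % 2 == 0:
        return 2
    if n % 3 == 0:
        return 3
    if n % 5 == 0:
        return 5
    i, limit = 7, int(math.sqrt(n))
    while i <= limit:
        # the offsets skip multiples of 2, 3 and 5, exactly like the JS loop
        for off in (0, 4, 6, 10, 12, 16, 22, 24):
            if n % (i + off) == 0:
                return i + off
        i += 30
    return n  # no factor found below sqrt(n), so n is prime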

This is sometimes triggered by downloading a large playlist with a high percentage of private videos using --ignore-errors.

youtube-dl will try to download the page for a private video, fail immediately because it is private, and go on to the next one right away. When the next and the next are also private, it can end up making many requests in rapid succession and trigger this response. Once you start getting it, you keep getting it for some percentage of videos, and that percentage seems to go up the more videos you try (and fail) to request in rapid succession; that is naturally what happens when downloading a playlist once a high percentage of the responses are this challenge page.

A partial mitigation could be to avoid doing that. If it's possible to identify a video as private from the playlist itself without having to try and fail to download it, there wouldn't be so many requests all at once.

It can also happen when resuming a half-downloaded playlist: a page request is made for every video in the first half of the playlist with no delay between them, because the videos themselves have already been downloaded and are skipped.
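
If simply slowing down the burst is enough to stay under the threshold, throttling with the --sleep-interval option may help; note that it sleeps before each actual download, so it is not guaranteed to cover the rapid-fire requests for private or already-downloaded entries. PLAYLIST_URL below is a placeholder:

youtube-dl --ignore-errors --sleep-interval 5 "PLAYLIST_URL"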

Parsing this would be easy with a JavaScript library, if you're willing to take on that much of a dependency. It's more work without one, but still possible.

What changes in each case is the contents of the go() function. Here is a second example to compare with the first above:

<html><head><script type="text/javascript"><!--
function leastFactor(n) {
if (isNaN(n) || !isFinite(n)) return NaN;
if (typeof phantom !== 'undefined') return 'phantom';
if (typeof module !== 'undefined' && module.exports) return 'node';
if (n==0) return 0;
if (n%1 || n*n<2) return 1;
if (n%2==0) return 2;
if (n%3==0) return 3;
if (n%5==0) return 5;
var m=Math.sqrt(n);
for (var i=7;i<=m;i+=30) {
if (n%i==0) return i;
if (n%(i+4)==0) return i+4;
if (n%(i+6)==0) return i+6;
if (n%(i+10)==0) return i+10;
if (n%(i+12)==0) return i+12;
if (n%(i+16)==0) return i+16;
if (n%(i+22)==0) return i+22;
if (n%(i+24)==0) return i+24;
}
return n;
}
function go() {
var p=2124985838984; var s=1542009973; var n;
if ((s >> 14) & 1)/* 120886108*
*/p+=29515459*
17;/* 120886108*
*/else p-=/*
else p-=
*/41180606*/*
else p-=
*/15;/*
else p-=
*/if ((s >> 1) & 1)p+= 476030721*/*
*13;
*/2;/*
else p-=
*/else
p-=/* 120886108*
*/358135816*/* 120886108*
*/2; if ((s >> 5) & 1)/*
else p-=
*/p+= 39310863*
6;/*
*13;
*/else p-=/*
else p-=
*/79174251*6;/*
p+= */if ((s >> 1) & 1) p+=920457937*
2;/*
else p-=
*/else /* 120886108*
*/p-=
528665489* 2;
if ((s >> 7) & 1) p+=
41707797*/*
p+= */8; else p-=/*
p+= */39753674*8;/* 120886108*
*/ p+=1416771189;
n=leastFactor(p);
{ document.cookie="RNKEY="+n+"*"+p/n+":"+s+":2676614135:1";
document.location.reload(true); }
}
//--></script></head>
<body onload="go()">
Loading ...
</body>
</html>

It's calculating some numbers and constructing a cookie from them.
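
One way to do that from Python, sketched under the assumption that the third-party js2py package is an acceptable dependency and that the challenge page always looks like the dumps above (solve_rnkey is a hypothetical helper, not part of youtube-dl):

import re
import js2py  # third-party JavaScript interpreter: pip install js2py

def solve_rnkey(webpage):
    # pull the inline challenge script out of the page
    script = re.search(
        r'<script[^>]*><!--(.*?)//--></script>', webpage, re.DOTALL).group(1)
    ctx = js2py.EvalJs()
    # fake just enough of `document` for go() to run; reload() becomes a no-op
    ctx.execute('var document = {cookie: "", location: {reload: function(x) {}}};')
    ctx.execute(script)
    ctx.execute('go(); var rnkey = document.cookie;')
    return ctx.rnkey  # the cookie string go() would have set, e.g. "RNKEY=..."

The returned string would then have to be sent back as a Cookie header on the retried request; whether the site accepts it from a non-browser client is a separate question.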

@mjolnir870

This will occur even when just downloading a large playlist. Resolving issue #17571 would help avoid tripping high request thresholds when you use an archive file. Right now, if you download a playlist of 100 files with an archive file, it stores identifiers for those 100 files. If the playlist is later updated to 105 files and you download it again, youtube-dl still downloads all 105 pages even though the archive file should allow it to skip 100 of them. You can trip the delay mechanism very rapidly this way, because the 100 pages already recorded in the archive file are downloaded and discarded within a minute or two.
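
For context, the archive workflow described above is roughly the following, with PLAYLIST_URL as a placeholder:

youtube-dl --download-archive archive.txt --ignore-errors "PLAYLIST_URL"

--download-archive records one ID per downloaded video in archive.txt and skips those IDs on later runs; the point above is that skipping an already-archived entry still costs a page request per video.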
