[pornhub] Workaround scrape detection #5930
Comments
Works for me. Could you add an option
and upload/paste all *.dump files?
Still the same. Please check the following dump file content:
Is the content from
the same as the dump files?
No. Using a diff tool, the go() function shows some different results.
It seems that from your location pornhub is blocking downloading tools such as youtube-dl. I'm afraid there's no simple way to bypass it. It's horrible to parse this complicated page and pass the correct value to pornhub.
Got it, and thanks a lot!
Does a new WAN interface IP address help?
This is really a delay mechanism used when an IP address makes too many requests too quickly. What the function is doing is requiring the client to do an expensive calculation before loading the page, to slow it down.

This is sometimes triggered by downloading a large playlist with a high percentage of private videos using --ignore-errors. youtube-dl will try to download the page for a private video, fail immediately because it is private, and go on to the next right away. When the next and the next are also private, it can be making many requests in rapid succession and trigger this response. Once you start getting it, you keep getting it for some percentage of videos. The percentage seems to go up the more videos you try (and fail) to request in rapid succession, which is naturally what happens when it's downloading a playlist and you start getting a high percentage of this response.

A partial mitigation could be to avoid doing that. If it's possible to identify a video as private from the playlist itself, without having to try and fail to download it, there wouldn't be so many requests all at once. It can also happen when resuming a half-downloaded playlist, because a request is made for every video in the first half of the playlist with no delay between them, since they have all already been downloaded.

Parsing this would be easy with a JavaScript library, if you're willing to take on that much of a dependency. More work without it, but still possible. What changes in each case is the contents of the go() function. Here is a second example to compare with the first above:
It's calculating some numbers and constructing a cookie from them.
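To illustrate the idea without pulling in a JavaScript engine, here is a minimal sketch of solving such a challenge by extracting the operands from the go() body with a regex and redoing the arithmetic in Python. The page snippet, the variable names (p, s), the cookie name RNKEY, and the cookie layout are all hypothetical stand-ins; the real go() body differs on every page load and would need its actual expression parsed.

```python
import re

# Hypothetical, simplified example of the kind of inline challenge the
# page might serve; the real go() body is obfuscated and changes per load.
page = """
<script>
function go() {
    var p = 1508;
    var s = 247;
    document.cookie = "RNKEY=" + p + "*" + s + ":" + (p * s) + ":expire";
    go_after();
}
</script>
"""

def solve_challenge(html):
    # Pull the two operands out of the go() body, redo the multiplication
    # the browser would perform, and assemble the cookie value the same
    # way the script does.
    m = re.search(r'var p = (\d+);\s*var s = (\d+);', html)
    if not m:
        return None
    p, s = int(m.group(1)), int(m.group(2))
    return 'RNKEY=%d*%d:%d:expire' % (p, s, p * s)

print(solve_challenge(page))  # → RNKEY=1508*247:372476:expire
```

A client would then resend the original request with this cookie attached; the fragile part is that the regex must track whatever arithmetic the obfuscated go() actually performs.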
This will occur even with just downloading from a large playlist. Resolving issue #17571 would help avoid tripping high request thresholds if you use an archive file. Right now, if you download a playlist of 100 files and have an archive file, it will store identifiers for the 100 files. If the playlist gets updated to have 105 files and you download it again, youtube-dl still downloads 105 pages even though the archive file should allow it to ignore 100 pages. You can very rapidly get the delay mechanism this way, because the 100 pages in the archive file are downloaded and discarded within a minute or two.
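Until that issue is resolved, a partial user-side mitigation is to space out requests and skip known-downloaded videos using standard youtube-dl options. A configuration fragment along these lines (the file path and interval values are just illustrative defaults):

```
# ~/.config/youtube-dl/config
# Sleep between downloads so a playlist crawl is less likely to trip
# the request-rate threshold
--sleep-interval 5
--max-sleep-interval 15
# Record finished videos and skip them on later runs
--download-archive archive.txt
```

Note that, per the issue above, the archive file does not currently prevent the playlist pages themselves from being re-fetched, so this only reduces the request rate rather than the request count.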
Hello, I tried to download a video from pornhub.com, and it gives me the following error message.