Various RSS PDF download issues #2

Open
xthursdayx opened this issue May 4, 2021 · 1 comment

Comments

xthursdayx commented May 4, 2021

Thanks for this useful script!

For some reason, I can only download the blog I'm after through its RSS feed (with the -p option). However, every time I run the command, the scraping and downloading stop at one particular post.

I've tried using the -a, -s, and -p flags to download only a specific year (or month) after the post that seems to be causing the problem, but I get the following error:

title: Baru Samarinda
link: https://blog-name.blogspot.com//2015/07/baru-samarinda-terima.html
Download html as PDF, please be patient...18/71
file path: /home/xthursdayx/blogspot-downloader/blog name.blogspot.com /Baru Samarinda Terima Penganugerahan P....pdf
pdfkit IOError

I also tried running python3 blogspot_downloader.py -lo http://blog-name.blogspot.com/, saving the results to urls.list, and then running python3 blogspot_downloader.py -p -1 <urls.list, which produced the following error:

URL: Create single pdf: /home/vidrir/blogspot-downloader/flores borneo.blogspot.com.pdf
IOError --one:  wkhtmltopdf reported an error:
Loading page (1/2)
Error: Failed to load https://blog-name.blogspot.com/2015/07/baru-samarinda-terima.html.html?action=backlinks&widgetId=Blog1&widgetType=Blog&responseType=js&postID=6408130842748688230&xssi_token=AOuZoY7PJVRw0EwDhHe-xNsCx9cPbEV4gQ400A1620183076462, with network status code 302 and http status code 400 - Error transferring https://blog-name.blogspot.com/2015/007/baru-samarinda-terima.html?action=backlinks&widgetId=Blog1&widgetType=Blog&responseType=js&postID=6408130842748688230&xssi_token=AOuZoY7PJVRw0EwDhHe-xNsCx9cPbEV4gQ%3A1620183076462 - server replied:
Printing pages (2/2)
Done
Exit with code 1 due to network error: ProtocolInvalidOperationError

Any idea what my problem is? Thanks for the help!

**blog name and post changed for the owner's benefit.

xthursdayx changed the title from "Slowing down website downloads" to "Various RSS PDF download issues" on May 5, 2021
limkokhole (Owner) commented Nov 30, 2021

The blog page no longer exists. If you get an error, try running without -p: that option relies on the third-party library pdfkit and the wkhtmltopdf tool, both of which are outside my control, unlike the EPUB output (pypub is bundled with this script). Also, -p does not support multiple links.

I have also fixed pypub so that it can now download some images and text. So -p is not recommended unless the EPUB version fails because of JavaScript.
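
For anyone who still needs PDF output, here is a minimal sketch of how a pdfkit call could be wrapped so that one failing post does not stop the whole run. It is not taken from blogspot_downloader.py: the names `convert_all`, `pdfkit_convert`, and `WKHTMLTOPDF_OPTIONS` and the `post_NNN.pdf` file naming are hypothetical. The two `load-*-error-handling` switches are wkhtmltopdf options, but whether they get past the failing backlinks request shown in the log above is an assumption, not something verified against that blog.

```python
import os
from typing import Callable, Iterable, List, Tuple

# Options forwarded to wkhtmltopdf by pdfkit. Telling wkhtmltopdf to ignore
# sub-resource load errors MAY avoid the network error above (assumption).
WKHTMLTOPDF_OPTIONS = {
    "load-error-handling": "ignore",
    "load-media-error-handling": "ignore",
}


def pdfkit_convert(url: str, out_path: str) -> None:
    """Render one URL to a PDF file with pdfkit (requires wkhtmltopdf)."""
    import pdfkit  # imported lazily so convert_all is usable without pdfkit

    pdfkit.from_url(url, out_path, options=WKHTMLTOPDF_OPTIONS)


def convert_all(
    urls: Iterable[str],
    out_dir: str,
    convert: Callable[[str, str], None] = pdfkit_convert,
) -> List[Tuple[str, str]]:
    """Convert each URL in turn and return a list of (url, error) failures."""
    failed: List[Tuple[str, str]] = []
    for index, url in enumerate(urls, start=1):
        out_path = os.path.join(out_dir, f"post_{index:03d}.pdf")
        try:
            convert(url, out_path)
        except OSError as err:
            # pdfkit raises IOError (an alias of OSError in Python 3) when
            # wkhtmltopdf reports an error; record it and keep going.
            failed.append((url, str(err)))
    return failed
```

Calling `convert_all(urls, "out")` with the default converter would need pdfkit and wkhtmltopdf installed and an existing `out` directory. The returned list shows which posts were skipped, so they can be retried individually or downloaded as EPUB instead.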
