A shell script to download a pdf file from a scribd document. This script can download too big files (such as 150 pages), forbidden pages (marked with “You are not reading a free preview”)… If you have a problem with a file just tell me at tobias.bora -@- gmail -.- com.
To use it, make sure you have installed phantomjs, pdftk and ImageMagick with
sudo apt-get install phantomjs imagemagick pdftk
You should check that phantomjs version is greater than 1.6 because an important function came. (If it’s not possible to install a newer version, see below how to correct that)
After, just run :
$ ./scribd_download.sh <your url>
If you have any problem and you want to try more options, run with no option
$ ./scribd_download.sh
and you’ll get a list of all options.
For time benchmark, for big files you can count around 120 pages for 5mn of execution. If you don’t want to use pdftk (it can be longer with it) go in the script file uncomment the line
# pdf_convert_mode="convert"
like this
pdf_convert_mode="convert"
Example :
$ ./scribd_download.sh http://fr.scribd.com/doc/63942746/chopin-nocturne-n-20-partition
The Scribd structure often changes, so if you have any problem, please contact me at tobias.bora -@- gmail -.- com, or let a message in the “Issues” section.
Yes you can. To do that, please edit the file scribd_dowload.sh and modify the line
zoom_precision=2
like this
zoom_precision=1
Note that you may lose some precision. It will be easier to deal with this case in a futur version.
Just open the hidden folder .tmp in the current folder. You can see every pages in png format and check that there is no problem.
- Auto-dectect PhantomJs version and correct zoom precision
- Deal with documents with different page size
- More flexible command line options (resolution, pages…)
- Deal with documents with only one big obfuscated “page”
- Add a graphical interface
- Port it in a multi-OS language