
A little shell script to download a pdf file from a scribd document. This script isn't perfect, but it's enough for me.


b44x/Scribd-downloader

 
 

Scribd-downloader

A shell script that downloads a Scribd document as a PDF file. It can handle large documents (150 pages, for example) and restricted pages (those marked with “You are not reading a free preview”)… If you have a problem with a file, just tell me at tobias.bora -@- gmail -.- com.

To use it, first install phantomjs, pdftk and ImageMagick:

sudo apt-get install phantomjs imagemagick pdftk

Check that your phantomjs version is greater than 1.6, because the script relies on an important function added in recent versions. (If you can’t install a newer version, see the FAQ below for a workaround.)
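The version check can be automated. The following is a minimal sketch, not part of the script itself: `version_gt` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (version sort, as in GNU coreutils). The installed version is read from `phantomjs --version`.

```shell
#!/bin/sh
# Hypothetical helper (not part of scribd_download.sh).
# Succeeds when version $1 is strictly greater than version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

if command -v phantomjs >/dev/null 2>&1; then
    installed="$(phantomjs --version)"
    if version_gt "$installed" "1.6"; then
        echo "phantomjs $installed is recent enough"
    else
        echo "phantomjs $installed is too old, see the FAQ for a workaround"
    fi
else
    echo "phantomjs is not installed"
fi
```

Version sort is used instead of a plain string comparison so that a version such as 1.10.0 is correctly treated as newer than 1.6.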

Then just run:

$ ./scribd_download.sh <your url>

If you run into a problem and want to try other options, run the script with no arguments

$ ./scribd_download.sh

and you’ll get a list of all options.

As a rough benchmark, large files take about 5 minutes per 120 pages. If you don’t want to use pdftk (it can be slower), open the script file and uncomment the line

# pdf_convert_mode="convert"

like this

pdf_convert_mode="convert"

Example:

$ ./scribd_download.sh http://fr.scribd.com/doc/63942746/chopin-nocturne-n-20-partition

The Scribd page structure changes often, so if you have any problem, please contact me at tobias.bora -@- gmail -.- com, or leave a message in the “Issues” section.

FAQ

I can’t install a version of PhantomJS newer than 1.6. Can I use an older version?

Yes, you can. To do so, edit the file scribd_download.sh and change the line

zoom_precision=2

like this

zoom_precision=1

Note that you may lose some precision. A future version should handle this case more easily.
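If you prefer not to edit the file by hand, the change can be scripted. This is only a sketch: `set_zoom_precision` is a hypothetical helper, and it assumes the assignment sits on its own line exactly as `zoom_precision=<number>`, with no indentation or trailing comment.

```shell
#!/bin/sh
# Hypothetical helper (not part of scribd_download.sh).
# Rewrites the zoom_precision assignment in the given script file.
# Assumes the line looks exactly like `zoom_precision=<number>`.
set_zoom_precision() {
    file="$1"
    value="$2"
    tmp="$(mktemp)" || return 1
    # Write the edited copy to a temporary file, then copy it back so the
    # original file keeps its permissions (including the executable bit).
    sed "s/^zoom_precision=[0-9][0-9]*\$/zoom_precision=$value/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (run from the folder that contains the script):
#   set_zoom_precision scribd_download.sh 1
```

A temporary file is used rather than `sed -i` because the in-place flag behaves differently between GNU and BSD sed.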

Can I see the pages as they are created when a document takes a long time to download?

Just open the hidden folder .tmp in the current folder. You can see every page in png format and check that there is no problem.
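To follow progress from a terminal instead, you can count the page images produced so far. A minimal sketch, assuming the pages are written directly into .tmp with a `.png` extension (the exact file names are not documented here); `count_pages` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper (not part of scribd_download.sh).
# Prints how many PNG files sit directly inside a directory (default: .tmp).
# Assumes page images use the .png extension.
count_pages() {
    dir="${1:-.tmp}"
    [ -d "$dir" ] || { echo 0; return 0; }
    find "$dir" -maxdepth 1 -name '*.png' | wc -l | tr -d ' '
}

echo "pages rendered so far: $(count_pages .tmp)"
```

Run it again from a second terminal while the download is in progress to see the count grow.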

To Do

  • Auto-detect the PhantomJS version and adjust the zoom precision accordingly
  • Handle documents whose pages have different sizes
  • More flexible command line options (resolution, pages…)
  • Handle documents that consist of a single big obfuscated “page”
  • Add a graphical interface
  • Port it to a cross-platform language
