pdf2images

Convert a PDF file to image files ROBUSTLY.

Example

$ pdf2images -h
usage: pdf2images [-h] [--max-size MAX_SIZE] pdf_file output_dir

positional arguments:
  pdf_file
  output_dir

optional arguments:
  -h, --help           show this help message and exit
  --max-size MAX_SIZE  max size of either side of the image
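
The same invocation can be scripted. The sketch below is only an illustration built from the arguments listed in the help text above: the helper `build_command` and the names `paper.pdf` and `thumbnails` are hypothetical, and the command is run only if the `pdf2images` executable and the input file are actually present.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def build_command(pdf_file: str, output_dir: str,
                  max_size: Optional[int] = None) -> List[str]:
    """Build the argument list for the pdf2images CLI (hypothetical helper)."""
    cmd = ["pdf2images"]
    if max_size is not None:
        # --max-size limits the longer side of each output image.
        cmd += ["--max-size", str(max_size)]
    cmd += [pdf_file, output_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_command("paper.pdf", "thumbnails", max_size=512)
    print(" ".join(cmd))
    # Only run when the tool is installed and the example file exists.
    if shutil.which("pdf2images") and os.path.exists("paper.pdf"):
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.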

Why another "pdf-to-image" package

Once in a while, I need to convert a PDF file (usually slides or an academic paper) into image files (thumbnails) so that readers can get a quick glance at the content without downloading the PDF file.

However, I found that none of the existing pdf2image solutions can robustly process all PDF files, since many PDF files are in a non-standard format or rely on extensions. Each solution breaks in some cases.

On the bright side, for almost any plausible case, at least one of them can process the file successfully.

So I combined them into an ensemble to make conversion work across most cases.
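
The ensemble idea can be pictured as a fallback chain: try one converter after another and keep the first result that works. The following is a minimal sketch under that assumption, not the package's actual implementation. The names `convert_with_fallback`, `broken_backend`, and `working_backend` are hypothetical, and the two backends are stand-ins rather than real PDF libraries.

```python
from typing import Callable, List, Sequence, Tuple

# A backend takes (pdf_file, output_dir) and returns the image paths it wrote.
Backend = Callable[[str, str], List[str]]


def convert_with_fallback(
    pdf_file: str,
    output_dir: str,
    backends: Sequence[Tuple[str, Backend]],
) -> Tuple[str, List[str]]:
    """Try each backend in order; return the first non-empty result."""
    errors = []
    for name, backend in backends:
        try:
            images = backend(pdf_file, output_dir)
        except Exception as exc:  # any failure means: move on to the next backend
            errors.append(f"{name}: {exc}")
            continue
        if images:
            return name, images
        errors.append(f"{name}: produced no images")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def broken_backend(pdf_file: str, output_dir: str) -> List[str]:
    # Stand-in for a converter that cannot handle this particular file.
    raise ValueError("unsupported PDF feature")


def working_backend(pdf_file: str, output_dir: str) -> List[str]:
    # Stand-in for a converter that succeeds.
    return [f"{output_dir}/page-0.png"]


if __name__ == "__main__":
    chain = [("broken", broken_backend), ("working", working_backend)]
    print(convert_with_fallback("paper.pdf", "out", chain))
```

Collecting the error of every failed backend keeps the final exception informative when no converter can handle a file.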

Installation

As mentioned above, pdf2images combines multiple PDF manipulation libraries. Here is the list of the libraries used:

wand, an ImageMagick python wrapper.
pdftotext command line tool provided by xpdf
preview-generator
qpdf

wand and preview-generator are Python packages that are installed automatically along with pdf2images. However, you have to install xpdf and qpdf manually.

On Ubuntu:

sudo apt install -y qpdf xpdf libimage-exiftool-perl poppler-utils

On Arch Linux:

sudo pacman -S --noconfirm qpdf xpdf perl-image-exiftool

On macOS:

brew install freetype imagemagick qpdf xpdf exiftool libmagic ghostscript

pdf2images itself is installed with pip:

pip install pdf2images
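
Because xpdf and qpdf have to be installed manually, it can be useful to check that their executables are visible on the PATH. The sketch below is an optional helper, not something shipped with pdf2images: `find_tools` is a hypothetical name, and the executable names `qpdf` and `pdftotext` are assumptions based on the library list above.

```python
import shutil
from typing import Dict, Iterable, Optional

# Assumed executable names for the manually installed dependencies.
REQUIRED_TOOLS = ("qpdf", "pdftotext")


def find_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```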

Robustness

This package has successfully processed hundreds of thousands of arXiv papers (for generating thumbnails).

Gallery

The following images were converted from a slide from the Deep Learning Book.

Development

pip3 install -r requirements.dev.txt
pre-commit install
