If you like this project, show your support by donating or giving a 🌟 star to this repository.
Escraper, a.k.a. Project X29, is a simple project to scrape email addresses from PDFs and photos. Just feed it the input file and get the output as a .txt file.
(Assume we have an input file called card.pdf, which is a business card that includes some email addresses we want to extract.)
Execute this:
$ escraper -p card.pdf
After this we will get an output file called card.pdf.txt, which will contain all the email addresses present in card.pdf.
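To make the extraction step concrete, here is a minimal C++ sketch of how email addresses could be pulled out of text once it has been read from a PDF or recognized by OCR. It is not taken from escraper's source: the function name `extract_emails` and the regular expression are assumptions chosen for illustration, and the pattern is a deliberately simple approximation rather than a full RFC 5322 matcher.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not part of escraper): returns the unique email-like
// substrings found in `text`, in order of first appearance.
std::vector<std::string> extract_emails(const std::string& text) {
    // Simplified pattern: local part, '@', domain, dot, TLD of 2+ letters.
    static const std::regex pattern(
        R"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})");

    std::vector<std::string> emails;
    std::set<std::string> seen;  // used to skip duplicates

    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        const std::string match = it->str();
        if (seen.insert(match).second) {
            emails.push_back(match);
        }
    }
    return emails;
}
```

Writing each returned address on its own line would give a file similar in spirit to card.pdf.txt, though the actual output format of escraper may differ.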
- Extract emails from a PDF file:
$ escraper -p/--pdf FILENAME
- Extract emails from an image file:
$ escraper -i/--image FILENAME
- Choose a custom output file:
$ escraper -o/--out OUTPUT
- Prerequisites:
- A C++ compiler
sudo apt install build-essential
- ImageMagick Library
sudo apt install graphicsmagick-libmagick-dev-compat
- Tesseract OCR Library
sudo apt install tesseract-ocr libtesseract-dev libleptonica-dev
- Make
sudo apt install make
- Git Clone or Download this repo
git clone https://github.com/bauripalash/escraper
- cd into the project folder
cd escraper
- Run make
make
- Now you'll have a binary called escraper
If you like this project, consider giving it a 🌟 star or donating. Follow me on socials: [Twitter] | [Facebook] | [Instagram] | or even [GitHub]