-
Notifications
You must be signed in to change notification settings - Fork 9
my take at a PDF text extraction utility
License
oyvindberg/PDFExtract
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
####################################### # How to build PDFExtract from source ####################################### # 1. create a folder for the projects and cd to it # # 2. install TEI P5 model # git clone http://github.com/elacin/TEI-P5-Java-model.git cd TEI-P5-Java-model/ #this chooses version 0.3, which is currently used by PDFExtract git checkout 29d668e mvn install cd .. # # 3. install patched PDFBox # svn checkout http://svn.apache.org/repos/asf/pdfbox/trunk/ pdfbox #apply patch (tested against pdfbox svn r1157684) cd pdfbox patch -p0 < ../PDFExtract/parent/patch/pdfbox_poms.patch patch -p0 < ../PDFExtract/parent/patch/pdfbox-font-bounding-boxes.patch patch -p0 < ../PDFExtract/parent/patch/pdfbox-drawer-visibility.patch mvn install cd .. # # 4. install PDFExtract # git clone http://github.com/elacin/PDFExtract.git cd PDFExtract/parent mvn -DskipTests=true assembly:assembly #yes, some cleanup of tests is in order # the binary distribution will end up as PDFExtract/pdfextract-cli/target/pdfextract-cli-${VERSION}-bin.tar.bz2
About
my take at a PDF text extraction utility
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published