Extract text together with the corresponding page numbers from PDF files.
Wraps the command-line tool pdftotext (poppler-utils).
- poppler-utils (version >= 22.05.0) must be installed and available on the PATH.
go get "github.com/heussd/pdftotext-go"
- See tests for code examples.
Version 22.05.0 of poppler introduced a new parameter, -tsv, which extracts PDF content together with metadata as TSV. This library depends on that output, which is why older versions of poppler-utils are not supported.
- amitaifrey for finding and fixing a bug