This notebook demonstrates four methods for extracting text from PDF files, using the Python packages PyPDF2, pdfminer.six, and PyMuPDF, and the GROBID tool.
The text output of each method is compared by Levenshtein distance, cosine similarity, TF-IDF similarity, and processing time.
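
As a rough illustration of the comparison metrics named above, here is a minimal standard-library sketch. It is not the notebook's own code, which may well rely on dedicated packages instead. All function names (`levenshtein`, `cosine`, `count_vector`, `tfidf_vectors`, `timed`) and the sample strings are hypothetical, tokenization is assumed to be simple whitespace splitting, and the smoothed IDF formula is one common variant chosen for the example.

```python
import math
import time
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance, computed with a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def count_vector(text: str) -> Counter:
    """Term counts after lowercasing and whitespace splitting (an assumption)."""
    return Counter(text.lower().split())


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def tfidf_vectors(texts: list) -> list:
    """TF-IDF weights using a smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    counts = [count_vector(t) for t in texts]
    n = len(counts)
    # Iterating a Counter yields each distinct term once, so this counts documents.
    df = Counter(term for c in counts for term in c)
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    return [{term: tf * idf[term] for term, tf in c.items()} for c in counts]


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Hypothetical sample strings standing in for a reference text and an extractor's output.
reference = "Text extraction from PDF files is rarely lossless."
extracted = "Text extraction from PDF flles is rarely loss less."

distance, seconds = timed(levenshtein, reference, extracted)
plain_similarity = cosine(count_vector(reference), count_vector(extracted))
tfidf_ref, tfidf_ext = tfidf_vectors([reference, extracted])
print(distance, plain_similarity, cosine(tfidf_ref, tfidf_ext), seconds)
```

In practice, the `timed` helper would wrap each extraction call rather than a metric, so that processing time reflects the extractor itself.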