Add pdfminer to requirements - and bug report. #3

Open
Theblackcat98 opened this issue Mar 7, 2024 · 1 comment

Comments

@Theblackcat98

Add pdfminer to requirements.

  • In requirements.txt pdfminer isn't listed.

When running the PDF-to-Text option I get:

TypeError: TextConverter.__init__() got an unexpected keyword argument 'codec'

Full Error:

Traceback (most recent call last):
  File "/home/user/Apps/PDF-TOOLBOX/pdf-toolbox.py", line 535, in <module>
    program()
  File "/home/user/Apps/PDF-TOOLBOX/pdf-toolbox.py", line 526, in program
    menu()
  File "/home/user/Apps/PDF-TOOLBOX/pdf-toolbox.py", line 446, in menu
    pdf2txt()
  File "/home/user/Apps/PDF-TOOLBOX/pdf-toolbox.py", line 356, in pdf2txt
    print(convert_pdf_to_txt(x)) 
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/Apps/PDF-TOOLBOX/pdf-toolbox.py", line 334, in convert_pdf_to_txt
    with TextConverter(rsrcmgr, retstr, codec=codec,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: TextConverter.__init__() got an unexpected keyword argument 'codec'
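
Since more than one PyPI distribution installs a package named pdfminer, it may help to check which one the environment actually has before changing any code. The snippet below is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical, and the tuple of distribution names is an assumption that can be extended.

```python
from importlib import metadata


def installed_versions(names=("pdfminer.six", "pdfminer")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Whatever it reports is worth including in the bug report, because the constructor arguments of TextConverter depend on the installed distribution and version.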
@Siddharth-Latthe-07

@Theblackcat98 The error means that the TextConverter class in the pdfminer package you have installed does not accept the codec argument, most likely because of API differences between pdfminer releases.
Try these steps and let me know if they work:

  1. Add pdfminer.six to requirements.txt and install the latest version with pip install pdfminer.six.
  2. Modify the PDF-to-text function:
    Update the convert_pdf_to_txt function to remove the codec argument from the TextConverter initialization.
from io import StringIO

from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage

def convert_pdf_to_txt(path):
    """Extract the text of the PDF at `path` and return it as a string."""
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    laparams = LAParams()
    # No codec argument: the converter writes str into the StringIO buffer.
    device = TextConverter(rsrcmgr, retstr, laparams=laparams)
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    try:
        # "with" closes the file even if a page fails to parse.
        with open(path, 'rb') as fp:
            for page in PDFPage.get_pages(fp):
                interpreter.process_page(page)
        return retstr.getvalue()
    finally:
        # Release the converter and the buffer whether or not parsing succeeded.
        device.close()
        retstr.close()
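
If the script has to run against installs that may or may not accept codec, one option is to pass only the keyword arguments the constructor declares. This is a rough sketch, not something from the project: supported_kwargs is a hypothetical helper, and OldConverter and NewConverter are stand-in classes written for the demonstration, not pdfminer's real classes or signatures.

```python
import inspect


def supported_kwargs(factory, **kwargs):
    """Return only the keyword arguments that `factory` declares."""
    params = inspect.signature(factory).parameters
    # A callable that takes **kwargs accepts every keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {key: value for key, value in kwargs.items() if key in params}


# Stand-in classes for the demonstration only (hypothetical signatures).
class OldConverter:
    def __init__(self, rsrcmgr, outfp, codec="utf-8", laparams=None):
        self.codec = codec
        self.laparams = laparams


class NewConverter:
    def __init__(self, rsrcmgr, outfp, laparams=None):
        self.laparams = laparams


if __name__ == "__main__":
    wanted = {"codec": "utf-8", "laparams": None}
    print(sorted(supported_kwargs(OldConverter, **wanted)))  # ['codec', 'laparams']
    print(sorted(supported_kwargs(NewConverter, **wanted)))  # ['laparams']
```

In convert_pdf_to_txt this would look like TextConverter(rsrcmgr, retstr, **supported_kwargs(TextConverter, codec=codec, laparams=laparams)). Simply dropping codec, as in the function above, is the simpler fix when only one pdfminer version needs to be supported.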

Thanks.
