TypeError: ord() expected string of length 1, but int found #274

Closed
pbienst opened this issue Jan 10, 2020 · 3 comments
Comments

@pbienst

pbienst commented Jan 10, 2020

Fresh install on Ubuntu, using 'sudo apt-get install libtika-java' and 'sudo pip3 install tika'.

Parsing any PDF file with 'parsed = parser.from_file("a.pdf")' fails with:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tika/parser.py", line 36, in from_file
    output = parse1(service, filename, serverEndpoint, headers=headers, config_path=config_path, requestOptions=requestOptions)
  File "/usr/local/lib/python3.6/dist-packages/tika/tika.py", line 321, in parse1
    headers.update({'Accept': responseMimeType, 'Content-Disposition': make_content_disposition_header(path.encode('utf-8') if type(path) is unicode_string else path)})
  File "/usr/local/lib/python3.6/dist-packages/tika/tika.py", line 126, in make_content_disposition_header
    return build_header(os.path.basename(fn)).decode('ascii')
  File "/usr/local/lib/python3.6/dist-packages/rfc6266.py", line 430, in build_header
    if is_token(filename):
  File "/usr/local/lib/python3.6/dist-packages/rfc6266.py", line 370, in is_token
    return all(is_token_char(ch) for ch in candidate)
  File "/usr/local/lib/python3.6/dist-packages/rfc6266.py", line 370, in <genexpr>
    return all(is_token_char(ch) for ch in candidate)
  File "/usr/local/lib/python3.6/dist-packages/rfc6266.py", line 357, in is_token_char
    asciicode = ord(ch)
TypeError: ord() expected string of length 1, but int found
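
For readers following the traceback: the parse1 frame seems to encode the path to bytes before it reaches rfc6266, and on Python 3 iterating over bytes yields ints rather than 1-character strings, which is what ord() rejects. Below is a minimal sketch of that behavior. The is_token_char and is_token functions are simplified stand-ins, not the actual rfc6266 code (the ASCII range is an assumption), and as_text is a hypothetical helper, not part of tika or rfc6266.

```python
import os


def is_token_char(ch):
    # Simplified stand-in for the rfc6266 check. The printable-ASCII range
    # is an assumption; only the ord() call matters for this illustration.
    asciicode = ord(ch)
    return 32 < asciicode < 127


def is_token(candidate):
    # Same shape as the generator expression shown in the traceback.
    return all(is_token_char(ch) for ch in candidate)


def as_text(name, encoding="utf-8"):
    # Hypothetical helper: decode bytes back to str before per-character checks.
    return name.decode(encoding) if isinstance(name, bytes) else name


path = "a.pdf"
# basename of a bytes path is still bytes, as in the parse1 frame.
encoded = os.path.basename(path.encode("utf-8"))

str_ok = is_token(path)  # iterating a str yields 1-character strings

try:
    is_token(encoded)  # iterating bytes yields ints on Python 3
    bytes_error = None
except TypeError as exc:
    bytes_error = str(exc)

decoded_ok = is_token(as_text(encoded))
print(str_ok, bytes_error, decoded_ok)
```

If this reading is right, the failure would not depend on the filename's charset or the locale, since even a plain ASCII name becomes bytes once encoded.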
@chrismattmann
Owner

Looks like an issue with the filename and Unicode, @pbienst. What charset are you using, and can you give an example of the filename? What is your locale?

@pbienst
Author

pbienst commented Jan 12, 2020

As I mentioned in my report, the filename is simply "a.pdf"...
Here is the pdf: https://www.dropbox.com/s/rfoisx2bw651l08/a.pdf?dl=0

@chrismattmann
Owner

The file isn't there anymore, so I can't test, @pbienst.
