This repository has been archived by the owner on Dec 18, 2019. It is now read-only.
My OCR language is configured as "French", but when I scan a document I see a TesseractError on the console:
Extracting boxes ...
Exception in thread Thread-7:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/chris/tmp/paperwork/src/paperwork/frontend/workers.py", line 44, in __wrapper
self.do(**kwargs)
File "/home/chris/tmp/paperwork/src/paperwork/frontend/mainwindow.py", line 415, in do
self.__scan_progress_cb)
File "/home/chris/tmp/paperwork/src/paperwork/backend/img/doc.py", line 257, in scan_single_page
self.__add_img(img, ocrlang, resolution, scanner_calibration, callback)
File "/home/chris/tmp/paperwork/src/paperwork/backend/img/doc.py", line 233, in __add_img
scanner_calibration, callback)
File "/home/chris/tmp/paperwork/src/paperwork/backend/img/page.py", line 373, in make
(bmpfile, txt, boxes) = self.__ocr(outfiles, ocrlang, callback)
File "/home/chris/tmp/paperwork/src/paperwork/backend/img/page.py", line 353, in __ocr
lang=ocrlang, builder=pyocr.builders.WordBoxBuilder())
File "/usr/lib/python2.7/site-packages/pyocr/tesseract.py", line 225, in image_to_string
raise TesseractError(status, errors)
TesseractError: (1, 'Error opening data file /usr/share/tessdata/eng.traineddata\nPlease make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.\nFailed loading language 'eng'\nTesseract couldn't load any languages!\nCould not initialize tesseract.\n')
Paperwork tries to run OCR with the English language data, although I have configured French in the settings window. I don't have the English language data installed.
$ ls /usr/share/tessdata/
configs fra.traineddata tessconfigs
$ cat ~/.paperwork.conf
[Global]
[OCR]
lang = fra
ocrtime = 21.6917388439
I tried to deactivate OCR and then re-enable the French language, but the problem is still there.
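The two manual checks above (listing the tessdata directory and reading the config file) can be sketched in Python. This is only an illustrative helper, not part of Paperwork or pyocr: the function names `configured_lang` and `installed_langs` are hypothetical, and the code is written for Python 3 rather than the Python 2.7 shown in the traceback.

```python
import configparser
import os


def configured_lang(conf_path):
    """Return the [OCR] 'lang' value from a Paperwork-style config file.

    Returns None if the file, the section, or the option is missing.
    """
    parser = configparser.ConfigParser()
    parser.read(conf_path)  # silently skips files that do not exist
    return parser.get("OCR", "lang", fallback=None)


def installed_langs(tessdata_dir):
    """List the language codes that have a <code>.traineddata file."""
    suffix = ".traineddata"
    return sorted(
        name[: -len(suffix)]
        for name in os.listdir(tessdata_dir)
        if name.endswith(suffix)
    )
```

With the setup reported here, `configured_lang(os.path.expanduser("~/.paperwork.conf"))` should give `'fra'`, and `installed_langs("/usr/share/tessdata")` should contain `'fra'` but not `'eng'`, which matches the missing `eng.traineddata` in the error message.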
I added a print in pyocr (pyocr/src/tesseract.py:run_tesseract()) to make sure Paperwork specifies 'fra' as the wanted language: check.
I ran 'mv /usr/local/share/tessdata/eng.traineddata /usr/local/share/tessdata/old.eng.traineddata' to make sure Tesseract never tries to use 'eng' when we specify 'fra': check.
I advise you to check a few things on your side:
Make sure you're using Tesseract v3 and not Tesseract v2 (tesseract --version), preferably Tesseract v3.01.
Make sure you have the French training data installed. Otherwise I assume Tesseract might want to fall back on the English data. Note, however, that Paperwork shouldn't display 'French' in the settings window if the French data isn't installed. If it does anyway, please file another bug report.
Make sure you don't have any warnings in Paperwork's verbose output. For instance, please look for "Warning: Failed to figure out system language" (you shouldn't get it, since you've explicitly configured the language to use; however, if it pops up, it would give us a good hint).
Make sure you don't have a 'paperwork.conf' (without the leading dot) wandering around in the directory from which you run Paperwork. It would be used instead of your ~/.paperwork.conf.
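To check the last point quickly, here is a minimal Python 3 sketch of the lookup order described above: a 'paperwork.conf' in the current directory first, then '~/.paperwork.conf'. It is not Paperwork's actual code. The function names `config_candidates` and `effective_config` are hypothetical, and the order is assumed from the description in this comment only.

```python
import os


def config_candidates(cwd, home):
    """Return candidate config paths, in the assumed order of precedence."""
    return [
        os.path.join(cwd, "paperwork.conf"),    # local file, no leading dot
        os.path.join(home, ".paperwork.conf"),  # per-user file
    ]


def effective_config(cwd, home):
    """Return the first candidate that exists, or None if there is none."""
    for path in config_candidates(cwd, home):
        if os.path.isfile(path):
            return path
    return None
```

Calling `effective_config(os.getcwd(), os.path.expanduser("~"))` from the directory where you start Paperwork shows which file would win under that assumption. If it returns a path ending in 'paperwork.conf' without the dot, that stray file is probably shadowing your settings.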