bug: too many tesseract errors are ignored #1086
Labels: bug (Something isn't working)
Comments
awalker4 added a commit to Unstructured-IO/unstructured-inference that referenced this issue on Aug 22, 2023
We've seen a 500 error in `unstructured-api` due to an uncaught `TesseractError` in the `entire_page` path. I can't reproduce it, but we can at least add a try/except. The last fix was too aggressive, which we're tracking [here](Unstructured-IO/unstructured#1086), so we may need to adjust this fix as well.
Note: the same adjustment should happen to Unstructured-IO/unstructured-inference#183.
awalker4 added a commit to Unstructured-IO/unstructured-inference that referenced this issue on Aug 22, 2023
We've seen a 500 error in `unstructured-api` due to an uncaught `TesseractError` in the `entire_page` path. I can't reproduce it, but we can at least add a try/except. The last fix was too aggressive, which we're tracking [here](Unstructured-IO/unstructured#1086), so we may need to adjust this fix as well. Closes #179
@cragwolfe @awalker4 can you please confirm whether this issue is good to close?
Closing because this issue has been open for more than 180 days.
Describe the bug
While #1074 introduces an important fix for some PDFs and images that previously failed to partition, it also causes other legitimate Tesseract errors, which should be bubbled up to the caller, to be silently ignored.
Definition of Done
`test_unstructured/partition/test_image.py::test_partition_image_raises_with_invalid_language` is no longer skipped (after #1074 merges).
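One way to picture the gap between #1074 and the Definition of Done is a narrower error filter: swallow only Tesseract failures known to be harmless and re-raise everything else, so an invalid language still surfaces as an error. This is a minimal sketch under stated assumptions, not the fix in `unstructured`. `TesseractError` here is a local stand-in for `pytesseract.TesseractError`. `run_ocr` and the `IGNORABLE_FRAGMENTS` allow-list are hypothetical names, and the issue does not say which errors #1074 actually needs to tolerate; the fragment below is an assumed example.

```python
from typing import Callable, Iterable, Optional, Tuple


class TesseractError(Exception):
    """Local stand-in for pytesseract.TesseractError so the sketch is self-contained."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(status, message)
        self.status = status
        self.message = message


# Hypothetical allow-list of message fragments for errors that are safe to ignore.
# The real set of errors #1074 must tolerate is an assumption, not taken from the issue.
IGNORABLE_FRAGMENTS: Tuple[str, ...] = ("Image too small to scale",)


def run_ocr(
    ocr: Callable[[], str],
    ignorable: Iterable[str] = IGNORABLE_FRAGMENTS,
) -> Optional[str]:
    """Run an OCR callable, swallowing only allow-listed Tesseract errors."""
    fragments = tuple(ignorable)
    try:
        return ocr()
    except TesseractError as err:
        if any(fragment in err.message for fragment in fragments):
            # Known-benign failure: return no text instead of failing the whole partition.
            return None
        # Anything else (e.g. an invalid language) propagates to the caller.
        raise
```

The design choice is that the default stays "raise": an unrecognized error is never hidden, which is the behavior `test_partition_image_raises_with_invalid_language` expects. Matching on message text is brittle across Tesseract versions, so an allow-list like this would need to be kept small and tested.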